SF130118

Discussion of computer chess matches and engine tournaments.


IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: SF130118

Post by IWB »

This is the IPON-RRRL list with SF8:


   # PLAYER              : RATING  ERROR     (%)    D(%)  OppAvg   CFS(next)    POINTS       W       D       L  PLAYED
   1 Houdini 6.02        :   3349     10   81.4%    32.4    3068     100        2687.0    2152    1070      78    3300
   2 Komodo 11.2.2       :   3318     10   78.5%    34.8    3070      99        2591.5    2017    1149     134    3300
   3 Stockfish 8         :   3301      9   76.9%    40.3    3071     100        2538.5    1873    1331      96    3300
   4 Shredder 13         :   3124      8   55.8%    51.2    3083     100        1842.5     997    1691     612    3300
   5 Fizbo 2             :   3096      8   52.1%    41.7    3085     100        1719.5    1032    1375     893    3300
   6 Ginkgo 2.0          :   3066      8   47.9%    50.1    3087      81        1582.0     756    1652     892    3300
   7 Gull 3              :   3060      8   47.2%    46.9    3087      95        1557.5     783    1549     968    3300
   8 Andscacs 0.92       :   3050      8   45.8%    45.2    3088     100        1512.5     766    1493    1041    3300
   9 Booot 6.2           :   3031      8   43.2%    49.9    3089      69        1425.5     602    1647    1051    3300
  10 Jonny 8.00          :   3028      8   42.8%    47.0    3090      90        1412.0     637    1550    1113    3300
  11 Fritz 16            :   3020      8   41.7%    46.7    3090      79        1376.5     606    1541    1153    3300
  12 Equinox 3.30        :   3014      8   41.0%    47.9    3091     100        1354.0     563    1582    1155    3300
  13 Chiron 4            :   2994      8   38.3%    45.8    3092      55        1263.5     507    1513    1280    3300
  14 Critter 1.6a        :   2993      8   38.2%    46.1    3092     100        1260.0     500    1520    1280    3300
  15 Nirvanachess 2.4    :   2969      8   35.0%    44.5    3094      90        1156.0     422    1468    1410    3300
  16 Hannibal 1.7        :   2961      8   34.0%    44.1    3094     ---        1121.5     394    1455    1451    3300
And this would be the list with a possible new SF:


   # PLAYER                        : RATING  ERROR     (%)    D(%)  OppAvg   CFS(next)    POINTS       W       D       L  PLAYED
   1 Stockfish 130118 64 POPCNT    :   3349     10   81.0%    35.0    3072      66        2672.0    2095    1154      51    3300
   2 Houdini 6.02                  :   3346     10   80.7%    32.1    3072     100        2663.0    2133    1060     107    3300
   3 Komodo 11.2.2                 :   3317     10   78.0%    34.2    3074     100        2575.0    2010    1130     160    3300
   4 Shredder 13                   :   3126      8   55.8%    51.2    3086     100        1840.5     995    1691     614    3300
   5 Fizbo 2                       :   3097      8   51.9%    41.4    3088     100        1711.5    1029    1365     906    3300
   6 Ginkgo 2.0                    :   3068      8   48.0%    50.2    3090      93        1583.5     755    1657     888    3300
   7 Gull 3                        :   3060      8   46.8%    46.4    3091      97        1546.0     780    1532     988    3300
   8 Andscacs 0.92                 :   3049      8   45.3%    44.3    3092     100        1496.5     765    1463    1072    3300
   9 Booot 6.2                     :   3031      8   43.0%    49.6    3093      74        1420.0     601    1638    1061    3300
  10 Jonny 8.00                    :   3028      8   42.5%    46.7    3093      92        1403.0     633    1540    1127    3300
  11 Fritz 16                      :   3019      8   41.4%    46.1    3094      75        1365.0     604    1522    1174    3300
  12 Equinox 3.30                  :   3015      8   40.8%    47.6    3094     100        1346.5     561    1571    1168    3300
  13 Chiron 4                      :   2996      8   38.3%    45.8    3095      66        1262.5     506    1513    1281    3300
  14 Critter 1.6a                  :   2993      8   37.9%    45.5    3095     100        1251.5     501    1501    1298    3300
  15 Nirvanachess 2.4              :   2970      9   35.0%    44.4    3097      95        1154.0     422    1464    1414    3300
  16 Hannibal 1.7                  :   2960      8   33.6%    43.4    3098     ---        1109.5     394    1431    1475    3300
That is a 48 Elo increase.
It is close, but it would be enough.
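The arithmetic behind "48 Elo" and "close" can be checked directly from the two tables. The Python sketch below copies three rows (Stockfish 8 from the first list, Stockfish 130118 and Houdini 6.02 from the second) and recomputes points, score and draw percentages. The `Row` type is a hypothetical helper for illustration, not part of any rating tool.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One engine's line from the IPON-RRRL tables above (hypothetical helper)."""
    name: str
    rating: int
    wins: int
    draws: int
    losses: int

    @property
    def played(self) -> int:
        return self.wins + self.draws + self.losses

    @property
    def points(self) -> float:
        # A draw counts as half a point.
        return self.wins + self.draws / 2

    @property
    def score_pct(self) -> float:
        return 100 * self.points / self.played

    @property
    def draw_pct(self) -> float:
        return 100 * self.draws / self.played


SF8 = Row("Stockfish 8", 3301, 1873, 1331, 96)                  # first list
SF_NEW = Row("Stockfish 130118 64 POPCNT", 3349, 2095, 1154, 51)  # second list
HOUDINI = Row("Houdini 6.02", 3346, 2133, 1060, 107)            # second list

if __name__ == "__main__":
    for r in (SF8, SF_NEW, HOUDINI):
        print(f"{r.name:28} {r.points:7.1f} {r.score_pct:5.1f}%  draws {r.draw_pct:4.1f}%")
    print("Elo gain over SF8:", SF_NEW.rating - SF8.rating)       # 48
    print("Lead over Houdini:", SF_NEW.rating - HOUDINI.rating)   # 3
```

The recomputed values match the POINTS, (%) and D(%) columns. The new build also draws noticeably less than SF8 (35.0% vs 40.3%), while its lead over Houdini is only 3 Elo.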


pohl4711
Posts: 2432
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SF130118

Post by pohl4711 »

At the moment, a test run with Contempt=+20 shows a small Elo gain vs. master in the Framework. It would be very interesting to repeat the latest unofficial IPON test run of Stockfish 180113 with Contempt=+20. Given the large number of weaker engines Stockfish plays against, I would expect a measurable Elo gain with Contempt=+20. And the result would help to find a good new default setting for Contempt in Stockfish, which is being discussed at the moment on the Fishcooking forum.

Stefan
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: SF130118

Post by tpoppins »

IWB wrote:
APassionForCriminalJustic wrote:
Jouni wrote: Thanks for the test. Obviously SF needs contempt 20 to reach Houdini's rating.
Stockfish doesn't need anything. It's already better than all of these engines including Houdini. The only thing that matters is match versus match rather than match percentage versus all engines. So with Houdini losing badly versus Stockfish if Houdini stays at the top of the list then it simply makes this rating list meaningless.
Yea, who needs statistics and mathematics. All fake news. Fxxx it!
Indeed. Following that reasoning further, Tal should have never been the world champion because he kept losing badly to Korchnoi.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: SF130118

Post by IWB »

You might gain some Elo, but the performance against the toughest contenders goes down. That might boost them, and SF could end up with more Elo but a smaller lead over the second engine ... or not even be the first engine anymore!

I ran an experiment on this on 18.06.16 (see the news on my site) with mixed results. Additionally, I expect the loss/gain from contempt to be within my error margins, so nothing conclusive ...

I am not a big fan of contempt (for any engine*) because basically you give up the goal of "best chess" and aim for a good result in rating lists (playing worse chess because you constantly overestimate your position in the hope that your, now better, opponent might blunder!). This might be OK if you are the undisputed leader (as some engines were years ago), but it isn't today, because as soon as something better appears your engine is hit even harder. And currently SF is not even "undisputed", being 3 Elo ahead of the second engine (which in itself is too low for my test setup to give a definitive answer).
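The basic idea being argued about here, which the commit message further down describes as keeping the tension instead of simplifying when slightly behind, can be shown with the textbook form of contempt: the engine values a draw at minus the contempt instead of zero. The Python sketch below is only that generic illustration with made-up evaluations; the `Candidate` type and move names are hypothetical, and it is not Stockfish's own contempt implementation, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str      # hypothetical move label
    eval_cp: int   # evaluation of the resulting position for the engine, in centipawns
    is_draw: bool  # True if the move ends the game in a draw (e.g. by repetition)


def value(c: Candidate, contempt_cp: int) -> int:
    # Textbook contempt: a draw is valued at -contempt instead of 0,
    # so a slightly worse but still playable position can outrank it.
    return -contempt_cp if c.is_draw else c.eval_cp


def choose(candidates, contempt_cp: int) -> str:
    # Pick the highest-valued candidate; on ties max() keeps the first one listed.
    return max(candidates, key=lambda c: value(c, contempt_cp)).name


if __name__ == "__main__":
    moves = [Candidate("repeat", 0, True), Candidate("keep_tension", -10, False)]
    print(choose(moves, 0))   # repeat
    print(choose(moves, 20))  # keep_tension
```

With contempt 0 the engine takes the draw; with contempt 20 it prefers a position it evaluates as 10 centipawns worse. That trade is exactly the "overestimating your position" IWB objects to: it pays off against weaker opponents who are likely to go wrong, and costs against equal or stronger ones.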

Nonetheless, if the three SF guys do not decide to release something, I will run the test toward the end of the week (it has to fit into my schedule), because I am curious myself.

Ingo

*Except in a head-to-head tourney where you know your opponent.
Jouni
Posts: 3278
Joined: Wed Mar 08, 2006 8:15 pm

Re: SF130118

Post by Jouni »

Your result of +48 is very close to NCM's +50 and Pohl's +52! But SF should use contempt, because Houdini has a default of 2 (=20) and Komodo 10. Note that the contempt implementation was radically changed recently. I am really disappointed that the Stockfish team has not even discussed SF9! Maybe only after TCEC11 is won?
Jouni
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: SF130118

Post by IWB »

I know that the contempt handling was changed, but unless they have found something revolutionary, which I highly doubt, I expect it not to help much. Just check my 2016 testing of SF AND Komodo: no "big" gain for Komodo and its contempt either.
If someone wants to use contempt as the default, so be it (I can't stop them anyhow), but I am not convinced that it is really a good thing for chess in general (see my last post for the reasons).
Yes, it may gain a few single-digit Elo, but that is usually within the error margin of any rating list, so why do it at all?

(And yes, I know that 99% of the people checking a rating list just look at the ranking and not at the error bars, unfortunately. They think that if one engine leads by even 1 Elo, it is the better engine. If that is your goal as a developer ...)
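To put the error-margin argument in numbers, here is a rough Python sketch of a 95% error bar computed from a single engine's W/D/L counts. It uses the normal approximation and the slope of the logistic Elo curve at the observed score. It treats the opponents' ratings as exact, so it is not how the rating list's own tool computes the ERROR column; it only shows the order of magnitude.

```python
import math


def elo_error_95(wins: int, draws: int, losses: int) -> float:
    """Approximate 95% error bar, in Elo, on a performance from W/D/L counts.

    Steps: per-game score variance from the W/D/L mix, standard error of
    the mean score, then conversion to Elo using the derivative of
    Elo = 400 * log10(p / (1 - p)) at the observed score p.
    """
    n = wins + draws + losses
    p = (wins + draws / 2) / n
    var = (wins * (1 - p) ** 2 + draws * (0.5 - p) ** 2 + losses * p ** 2) / n
    se = math.sqrt(var / n)
    slope = 400 / (math.log(10) * p * (1 - p))  # d(Elo)/d(score) at p
    return 1.96 * se * slope


if __name__ == "__main__":
    # W/D/L counts copied from the IPON tables above.
    print(f"Stockfish 8:      +/- {elo_error_95(1873, 1331, 96):.1f} Elo")
    print(f"Stockfish 130118: +/- {elo_error_95(2095, 1154, 51):.1f} Elo")
    print(f"Houdini 6.02:     +/- {elo_error_95(2133, 1060, 107):.1f} Elo")
```

For these rows it gives roughly ±9 to ±10 Elo per engine, the same range as the tables' ERROR column. A 3 Elo lead is therefore well inside the noise of a 3300-game run.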
Last edited by IWB on Sun Jan 21, 2018 9:46 am, edited 1 time in total.
pohl4711
Posts: 2432
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SF130118

Post by pohl4711 »

I repeated the 5000-game test run of Stockfish 171206 with some different Contempt settings. Let's see whether the new contempt algorithm of Stockfish can lower the draw rate and/or gain some Elo.



Games Completed = 5000 of 5000 (Avg game length = 424.849 sec)
Settings = Gauntlet/512MB/180000ms+1000ms/M 400cp for 4 moves, D 130 moves
1. Stockfish 171206 bmi2 3521.5/5000 2299-256-2445 (L: m=0 t=0 i=0 a=256) (D: r=1631 i=411 f=109 s=17 a=277)
2. Komodo 11.2.2 x64 412.5/1000 86-261-653 (L: m=0 t=0 i=0 a=261) (D: r=379 i=178 f=35 s=2 a=59)
3. Houdini 6 pext 462.0/1000 115-191-694 (L: m=1 t=0 i=0 a=190) (D: r=510 i=77 f=35 s=10 a=62)
4. Fire 6.1 popc 209.0/1000 11-593-396 (L: m=0 t=0 i=0 a=593) (D: r=270 i=39 f=15 s=3 a=69)
5. Shredder 13 x64 207.0/1000 15-601-384 (L: m=0 t=0 i=0 a=601) (D: r=260 i=69 f=13 s=2 a=40)
6. Fizbo 1.9 bmi2 188.0/1000 29-653-318 (L: m=1 t=0 i=0 a=652) (D: r=212 i=48 f=11 s=0 a=47)


Games Completed = 5000 of 5000 (Avg game length = 437.087 sec)
Settings = Gauntlet/512MB/180000ms+1000ms/M 400cp for 4 moves, D 130 moves
1. Stockfish 171206 C=15 3551.0/5000 2400-298-2302 (L: m=0 t=0 i=0 a=298) (D: r=1412 i=430 f=128 s=17 a=315)
2. Komodo 11.2.2 x64 412.5/1000 105-280-615 (L: m=0 t=0 i=0 a=280) (D: r=327 i=175 f=27 s=2 a=84)
3. Houdini 6 pext 461.5/1000 126-203-671 (L: m=0 t=0 i=0 a=203) (D: r=432 i=121 f=31 s=11 a=76)
4. Fire 6.1 popc 199.0/1000 21-623-356 (L: m=1 t=0 i=0 a=622) (D: r=226 i=34 f=28 s=1 a=67)
5. Shredder 13 x64 186.5/1000 11-638-351 (L: m=0 t=0 i=0 a=638) (D: r=223 i=53 f=24 s=3 a=48)
6. Fizbo 1.9 bmi2 189.5/1000 35-656-309 (L: m=1 t=0 i=0 a=655) (D: r=204 i=47 f=18 s=0 a=40)


Games Completed = 5000 of 5000 (Avg game length = 439.036 sec)
Settings = Gauntlet/512MB/180000ms+1000ms/M 400cp for 4 moves, D 130 moves
1. Stockfish 171206 C=25 3554.0/5000 2463-355-2182 (L: m=1 t=0 i=0 a=354) (D: r=1293 i=455 f=118 s=19 a=297)
2. Komodo 11.2.2 x64 434.0/1000 139-271-590 (L: m=0 t=0 i=0 a=271) (D: r=282 i=194 f=36 s=1 a=77)
3. Houdini 6 pext 470.0/1000 150-210-640 (L: m=0 t=0 i=0 a=210) (D: r=436 i=100 f=34 s=16 a=54)
4. Fire 6.1 popc 175.0/1000 11-661-328 (L: m=0 t=0 i=0 a=661) (D: r=192 i=37 f=16 s=1 a=82)
5. Shredder 13 x64 186.5/1000 17-644-339 (L: m=0 t=0 i=0 a=644) (D: r=205 i=64 f=18 s=1 a=51)
6. Fizbo 1.9 bmi2 180.5/1000 38-677-285 (L: m=2 t=0 i=1 a=674) (D: r=178 i=60 f=14 s=0 a=33)


Games Completed = 5000 of 5000 (Avg game length = 442.943 sec)
Settings = Gauntlet/512MB/180000ms+1000ms/M 400cp for 4 moves, D 130 moves
Time = 554778 sec elapsed, 0 sec remaining
1. Stockfish 171206 C=40 3618.0/5000 2598-362-2040 (L: m=0 t=0 i=0 a=362) (D: r=1140 i=447 f=93 s=25 a=335)
2. Komodo 11.2.2 x64 403.0/1000 118-312-570 (L: m=0 t=0 i=0 a=312) (D: r=234 i=212 f=23 s=4 a=97)
3. Houdini 6 pext 483.0/1000 183-217-600 (L: m=0 t=0 i=0 a=217) (D: r=380 i=97 f=33 s=16 a=74)
4. Fire 6.1 popc 172.5/1000 12-667-321 (L: m=2 t=0 i=0 a=665) (D: r=197 i=31 f=14 s=3 a=76)
5. Shredder 13 x64 164.5/1000 17-688-295 (L: m=0 t=0 i=0 a=688) (D: r=171 i=60 f=15 s=2 a=47)
6. Fizbo 1.9 bmi2 159.0/1000 32-714-254 (L: m=3 t=0 i=0 a=711) (D: r=158 i=47 f=8 s=0 a=41)




Conclusions (all comparisons to the result of Stockfish 171206 with the default Contempt=0):

1) C=+15 gained +5 Elo. Draws overall lowered from 48.9% to 46.0%. 3fold-draws lowered from 32.6% to 28.2%

2) C=+25 gained +5 Elo. Draws overall lowered from 48.9% to 43.6%. 3fold-draws lowered from 32.6% to 25.9%

3) C=+40 gained +17 Elo. Draws overall lowered from 48.9% to 40.8%. 3fold-draws lowered from 32.6% to 22.8%
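The percentages in these conclusions can be re-derived from the four Stockfish summary lines. The Python sketch below copies those lines, omitting the (L: ...) group. It assumes the triple after the score is wins-losses-draws and that "r=" in the (D: ...) group counts threefold-repetition draws. Both assumptions come from how the posted numbers add up (2299 + 2445/2 = 3521.5), not from the testing tool's documentation. Elo is a plain logistic conversion of the score fraction, so it only approximates whatever method produced the posted figures.

```python
import math
import re

# Stockfish 171206 summary lines from the four gauntlets above ((L: ...) group omitted).
RUNS = {
    "C=0": "3521.5/5000 2299-256-2445 (D: r=1631 i=411 f=109 s=17 a=277)",
    "C=15": "3551.0/5000 2400-298-2302 (D: r=1412 i=430 f=128 s=17 a=315)",
    "C=25": "3554.0/5000 2463-355-2182 (D: r=1293 i=455 f=118 s=19 a=297)",
    "C=40": "3618.0/5000 2598-362-2040 (D: r=1140 i=447 f=93 s=25 a=335)",
}


def summarize(line: str) -> dict:
    """Score fraction, draw rate and threefold-draw rate from one summary line."""
    wins, losses, draws = map(int, re.search(r"(\d+)-(\d+)-(\d+)", line).groups())
    threefold = int(re.search(r"r=(\d+)", line).group(1))
    games = wins + losses + draws
    return {
        "score": (wins + draws / 2) / games,
        "draw_pct": 100 * draws / games,
        "threefold_pct": 100 * threefold / games,
    }


def elo_from_score(p: float) -> float:
    """Logistic Elo difference implied by a score fraction 0 < p < 1."""
    return -400 * math.log10(1 / p - 1)


def compare(runs: dict, baseline: str = "C=0") -> dict:
    """Attach each run's Elo change relative to the baseline run."""
    base_elo = elo_from_score(summarize(runs[baseline])["score"])
    out = {}
    for name, line in runs.items():
        s = summarize(line)
        s["elo_gain"] = elo_from_score(s["score"]) - base_elo
        out[name] = s
    return out


if __name__ == "__main__":
    for name, s in compare(RUNS).items():
        print(f"{name:5} Elo {s['elo_gain']:+5.1f}  draws {s['draw_pct']:.1f}%  "
              f"threefold {s['threefold_pct']:.1f}%")
```

The draw and threefold percentages match the conclusions to one decimal. The Elo changes come out near +5, +5 and +16. The last one is about a point below the posted +17, presumably because the posted figure was computed with a different method.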
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: SF130118

Post by IWB »

Nice example of what I meant:

With C=25 and C=40, Houdini got more points against SF than with the default contempt (and 0.5 points less with C=15, which I would consider statistical noise).
However, Komodo scored exactly the same at C15, gained at C25 but suffered at C40 ... it's a bit strange and ... inconclusive!

Assuming the trend would be identical on my list, both Komodo and Houdini would gain points with C20!

Ingo


Edited PS: Since you mentioned it: I don't mind SF's draw rate at all, I mind the result! Looking at my list, SF's draw rate is not that bad, AND there is another engine whose draw rate is much worse than SF's :-)
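The per-opponent reading above can be checked the same way. In the Python sketch below, the Komodo and Houdini triples are copied from the four runs. Each triple is read as wins-losses-draws from the opponent's side, an assumption that fits the posted points (e.g. 86 + 653/2 = 412.5). The sketch shows each opponent's point change relative to Stockfish's default contempt.

```python
# (wins, losses, draws) of each opponent against Stockfish 171206, per contempt value.
RESULTS = {
    "Komodo 11.2.2": {0: (86, 261, 653), 15: (105, 280, 615),
                      25: (139, 271, 590), 40: (118, 312, 570)},
    "Houdini 6": {0: (115, 191, 694), 15: (126, 203, 671),
                  25: (150, 210, 640), 40: (183, 217, 600)},
}


def points(wld) -> float:
    """Points scored from a (wins, losses, draws) triple; a draw is half a point."""
    wins, _losses, draws = wld
    return wins + draws / 2


def change_vs_default(results: dict) -> dict:
    """Each opponent's point change relative to the contempt-0 run."""
    return {
        opp: {c: points(r) - points(runs[0]) for c, r in runs.items() if c != 0}
        for opp, runs in results.items()
    }


if __name__ == "__main__":
    for opp, deltas in change_vs_default(RESULTS).items():
        print(opp, {c: f"{d:+.1f}" for c, d in deltas.items()})
```

It shows Komodo unchanged at C=15, +21.5 points at C=25 and -9.5 at C=40. Houdini is at -0.5, +8 and +21 over the same three settings.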
pohl4711
Posts: 2432
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SF130118

Post by pohl4711 »

Author: Stéphane Nicolet
Date: Tue Jan 23 14:26:45 2018 +0100
Timestamp: 1516714005

Contempt 20

Set the default contempt value of Stockfish to 20 centipawns.

The contempt feature of Stockfish tries to prevent the engine from
simplifying the position too quickly when it feels that it is very
slightly behind, instead keeping the tension a little bit longer.

Various tests in November 2017 have proved that our current
implementation works well against SF7 (which is about 130 Elo weaker
than current master) and that the Elo gain is an increasing function
of contempt, going (against SF7) from +0 Elo when contempt is set at
zero centipawns, to +30 Elo when contempt is 40 centipawns.

See pull request 1325 for details:

https://github.com/official-stockfish/S ... /pull/1325

This November discussion left open the decision of which "default"
value for contempt we should use for Stockfish, taking into account
the various uses of Stockfish (opening preparation for humans, computer
online tournaments, analysis tool for web pages, human/computer play,
etc).

This pull request proposes to set the default contempt value of SF
to twenty centipawns, which turns out to be the highest value which
is not a regression against current master, as this seemed to be a
good compromise between risk and safety. A couple of SPRT[-3..1]
tests were done to bisect this value:

Contempt 10: http://tests.stockfishchess.org/tests/v ... 02977e2901 (PASSED)
Contempt 15: http://tests.stockfishchess.org/tests/v ... 02977e28fa (PASSED)
Contempt 20: http://tests.stockfishchess.org/tests/v ... 02977e28fc (PASSED)
Contempt 25: http://tests.stockfishchess.org/tests/v ... 02977e2904 (FAILED)

Surprisingly, a test at "very long time control" hinted that using
contempt 20 is not only non-regressive against contempt 0, but
may actually exhibit some small Elo gain, giving a likelihood of
superiority of 88.7% after 8500 games:
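The SPRT[-3..1] tests and the "likelihood of superiority" in this commit message can be illustrated with textbook formulas. The Python sketch below uses a normal-approximation log-likelihood ratio with Wald's stopping bounds, assuming alpha = beta = 0.05, and the usual LOS formula computed from wins and losses. Fishtest's own SPRT may define the Elo bounds and the likelihood model differently. The game counts in the example are made up and are not taken from the linked tests.

```python
import math


def expected_score(elo: float) -> float:
    """Logistic expected score for an Elo difference."""
    return 1 / (1 + 10 ** (-elo / 400))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Normal-approximation log-likelihood ratio of H1 (elo1) against H0 (elo0)."""
    n = wins + draws + losses
    mean = (wins + draws / 2) / n
    var = (wins * (1 - mean) ** 2 + draws * (0.5 - mean) ** 2 + losses * mean ** 2) / n
    s0, s1 = expected_score(elo0), expected_score(elo1)
    # Sum over games of [(x - s0)^2 - (x - s1)^2] / (2 var), written via the mean.
    return n * (s1 - s0) * (2 * mean - s0 - s1) / (2 * var)


def sprt_decision(wins, draws, losses, elo0=-3.0, elo1=1.0, alpha=0.05, beta=0.05) -> str:
    """Wald's SPRT: stop when the LLR leaves the (lower, upper) band."""
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    if llr >= upper:
        return "PASSED"
    if llr <= lower:
        return "FAILED"
    return "continue"


def los(wins: int, losses: int) -> float:
    """Likelihood of superiority from decisive games only (draws ignored)."""
    return 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(sprt_decision(20000, 40000, 20000))  # even result, many games
    print(sprt_decision(18000, 40000, 22000))  # clearly worse result
    print(f"LOS: {los(120, 100):.3f}")
```

An even result passes [-3..1] once enough games are played, because 0 Elo is closer to the upper bound than to the lower one. This fits the commit's use of these bounds as a "not a regression" test. Equal wins and losses give an LOS of exactly 50%.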
Jouni
Posts: 3278
Joined: Wed Mar 08, 2006 8:15 pm

Re: SF130118

Post by Jouni »

Currently a nice +15 in the Stockfish master test and +10 in NCM! More against weaker engines?
Jouni