for kai laskos (about doubling speed elo gain)

syzygy · Post by **syzygy** » Sat Jun 01, 2019 4:38 pm

Laskos wrote: ↑Sat Jun 01, 2019 9:57 am
jp wrote: ↑Sat Jun 01, 2019 9:42 am ... which is relevant to the W/L question Kai raised (though it's only Komodo 9.3).

Laskos wrote: ↑Fri May 31, 2019 8:49 pm This is not that hard. The harder thing seems to be to get the behavior of Win/Loss ratio to LTC. From the latest months' data of Andreas, I seem to get that Win/Loss ratio increases from rapid to LTC, almost consistently already. It might be (not very likely) related to the contempt used recently by many top engines.

Yes, but the data is contradictory when the whole FGRL data are used. Also, I am getting on my hardware higher W/L ratio for Lc0 T40 against SF at 4m + 4s compared to 1m + 1s TC from balanced openings, but pretty stable Elo difference. Not LTC at all, but the issue is not easy to me. Also, self-play might have a bit different behavior compared to the general case. Contempt too might interfere, as it's lowering the draw rate, but to unclear proportion of wins and losses.

Isn't it to be expected that the draw rate will be lower (and so the W/L will be higher) between the same two engines as TC increases? Both engines will make fewer mistakes and there will therefore be less noise. At STC, mistakes by both engines cancel each other out so far as they don't represent strength difference. At LTC, far fewer mistakes are made that cancel each other out (read: games tend to be won only by the stronger engine).

(And of course this is not about STC/LTC but about shorter/longer TC or, equivalently, lower/higher speed. What is 4m+4s on one computer is 4h+4m on another.)

I'm pretty sure contempt only lowers the draw rate by creating more noise, so it won't give more accurate Elo estimates.

jp · Post by jp » Sat Jun 01, 2019 4:53 pm

syzygy wrote: ↑Sat Jun 01, 2019 4:38 pm Isn't it to be expected that the draw rate will be lower (and so the W/L will be higher) between the same two engines as TC increases? Both engines will make fewer mistakes and there will therefore be less noise. At STC, mistakes by both engines cancel each other out so far as they don't represent strength difference. At LTC, far fewer mistakes are made that cancel each other out (read: games tend to be won only by the stronger engine).

I might have expected the opposite, that the draw rate would be higher as TC increases.

syzygy · Post by **syzygy** » Sat Jun 01, 2019 9:15 pm

jp wrote: ↑Sat Jun 01, 2019 4:53 pm
syzygy wrote: ↑Sat Jun 01, 2019 4:38 pm Isn't it to be expected that the draw rate will be lower (and so the W/L will be higher) between the same two engines as TC increases? Both engines will make fewer mistakes and there will therefore be less noise. At STC, mistakes by both engines cancel each other out so far as they don't represent strength difference. At LTC, far fewer mistakes are made that cancel each other out (read: games tend to be won only by the stronger engine).
I might have expected the opposite, that the draw rate would be higher as TC increases.

Ouch yes, and that is what I had wanted to write!

As the TC increases, the draw rate increases. If the Elo difference stays the same, that means the W/L ratio will be higher. So what I wrote still makes sense, I think; I should just have written "draw rate will be higher" instead of lower.

Suppose engine A on average scores 55-45 against engine B. With zero noise, that means 10 wins for A and 90 draws (so an infinite W/L ratio). With maximum noise, it means 55 wins for A and 45 wins for B (so a W/L ratio that is just above 1).
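
The 55-45 arithmetic above can be sketched in a few lines. This is only an illustration: the function names `wdl_counts` and `wl_ratio` are made up for the example, and the 100-game match is the hypothetical one from the post, not measured data.

```python
import math

def wdl_counts(games, points, draws):
    """Split a match result into (wins, draws, losses) for the stronger side.

    games  -- total number of games played
    points -- points scored by engine A (win = 1, draw = 1/2)
    draws  -- number of drawn games (must be even so the counts stay integers)
    """
    if draws % 2:
        raise ValueError("use an even number of draws in this sketch")
    wins = points - draws // 2
    losses = games - draws - wins
    if wins < 0 or losses < 0:
        raise ValueError("this score is impossible with that many draws")
    return wins, draws, losses

def wl_ratio(wins, losses):
    """Win/loss ratio; infinite when the stronger side never loses."""
    return math.inf if losses == 0 else wins / losses

# Same 55-45 score, from "maximum noise" (no draws) to "zero noise" (90 draws).
for draws in (0, 50, 90):
    w, d, l = wdl_counts(100, 55, draws)
    print(f"draws={d:2d}  wins={w:2d}  losses={l:2d}  W/L={wl_ratio(w, l):.2f}")
```

With the score held fixed, every extra pair of draws removes one win and one loss, so the W/L ratio can only go up as the draw rate rises.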

jp · Post by jp » Mon Jun 03, 2019 2:42 pm

For comparison, Kai's old Houdini 3 results --

Laskos wrote: ↑Thu Jul 25, 2013 2:16 am

1)  4k nodes vs 2k nodes          +3862 -352  =786   +303
2)  8k nodes vs 4k nodes          +3713 -374  =913   +280
3)  16k nodes vs 8k nodes         +3399 -436 =1165   +237
4)  32k nodes vs 16k nodes        +3151 -474 =1374   +208
5)  64k nodes vs 32k nodes        +2862 -494 =1641   +179
6)  128k nodes vs 64k nodes       +2613 -501 =1881   +156
7)  256k nodes vs 128k nodes       +942 -201  =855   +136
8)  512k nodes vs 256k nodes       +900 -166  =930   +134
9)  1024k nodes vs 512k nodes      +806 -167 =1026   +115
10) 2048k nodes vs 1024k nodes     +344  -83  =572    +93
11) 4096k nodes vs 2048k nodes     +307  -85  =607    +79
12) 8192k nodes vs 4096k nodes     +290  -70  =640    +78
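
For readers who want to relate the columns to the W/L discussion, here is a rough sketch that recomputes an Elo difference, the W/L ratio and the draw rate from the +wins -losses =draws counts. It assumes the standard logistic Elo conversion; the post does not say which tool produced the last column, and this formula lands within about a point of the listed values for the rows spot-checked, not exactly on them. The names `elo_from_wdl` and `summarize` are made up for the example.

```python
import math

# (label, wins, losses, draws) copied from three rows of the table
ROWS = [
    ("4k vs 2k", 3862, 352, 786),
    ("256k vs 128k", 942, 201, 855),
    ("8192k vs 4096k", 290, 70, 640),
]

def elo_from_wdl(wins, losses, draws):
    """Elo difference implied by a match score (logistic model, an assumption)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

def summarize(wins, losses, draws):
    """Collect the three quantities discussed in the thread for one match."""
    games = wins + losses + draws
    return {
        "elo": elo_from_wdl(wins, losses, draws),
        "wl_ratio": wins / losses,
        "draw_rate": draws / games,
    }

for label, w, l, d in ROWS:
    s = summarize(w, l, d)
    print(f"{label:>16}: Elo {s['elo']:+6.1f}  "
          f"W/L {s['wl_ratio']:5.2f}  draws {s['draw_rate']:.1%}")
```

On these three rows the draw rate climbs from roughly 16% to 64% as the node count grows, while the W/L ratio drops from about 11 to about 4, because the Elo gain per doubling shrinks at the same time.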

Laskos wrote: ↑Thu Jul 25, 2013 11:43 am I also derived the rule-of-thumb formula for gain in Elo points from doubling nodes (or time):

N = number of nodes per move
Elo gain for doubling is ~ 18100/{log(N)}^2
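
A quick way to try the rule of thumb against the table is sketched below. Two things are assumptions, since the quoted post does not spell them out: that "log" means the natural logarithm, and that N is the smaller node count of each pairing. With those choices the formula gives values in the same range as the table. The function name is made up for the example.

```python
import math

def elo_gain_for_doubling(nodes):
    """Rule-of-thumb Elo gain from doubling `nodes` nodes per move.

    Assumes natural log and that `nodes` is the count before doubling;
    neither is stated explicitly in the quoted post.
    """
    return 18100.0 / math.log(nodes) ** 2

# (nodes before doubling, Elo gain listed in the table)
SAMPLES = [(2_000, 303), (128_000, 136), (4_096_000, 78)]

for nodes, listed in SAMPLES:
    predicted = elo_gain_for_doubling(nodes)
    print(f"N={nodes:>9,}: formula {predicted:6.1f}  table {listed:+d}")
```

The fit is loose at the low-node end and tighter at the high-node end, which is about what one would expect from a rule of thumb.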

Laskos · Post by **Laskos** » Wed Jun 05, 2019 9:44 am

syzygy wrote: ↑Sat Jun 01, 2019 4:38 pm
Laskos wrote: ↑Sat Jun 01, 2019 9:57 am
jp wrote: ↑Sat Jun 01, 2019 9:42 am ... which is relevant to the W/L question Kai raised (though it's only Komodo 9.3).

Laskos wrote: ↑Fri May 31, 2019 8:49 pm This is not that hard. The harder thing seems to be to get the behavior of Win/Loss ratio to LTC. From the latest months' data of Andreas, I seem to get that Win/Loss ratio increases from rapid to LTC, almost consistently already. It might be (not very likely) related to the contempt used recently by many top engines.

Yes, but the data is contradictory when the whole FGRL data are used. Also, I am getting on my hardware higher W/L ratio for Lc0 T40 against SF at 4m + 4s compared to 1m + 1s TC from balanced openings, but pretty stable Elo difference. Not LTC at all, but the issue is not easy to me. Also, self-play might have a bit different behavior compared to the general case. Contempt too might interfere, as it's lowering the draw rate, but to unclear proportion of wins and losses.
Isn't it to be expected that the draw rate will be lower (and so the W/L will be higher) between the same two engines as TC increases? Both engines will make fewer mistakes and there will therefore be less noise. At STC, mistakes by both engines cancel each other out so far as they don't represent strength difference. At LTC, far fewer mistakes are made that cancel each other out (read: games tend to be won only by the stronger engine).

(And of course this is not about STC/LTC but about shorter/longer TC or, equivalently, lower/higher speed. What is 4m+4s on one computer is 4h+4m on another.)

I'm pretty sure contempt only lowers the draw rate by creating more noise, so it won't give more accurate Elo estimates.

Yes, looks about right, although there was some consensus that stronger play (whether from longer TC, faster hardware, etc.) compresses the Elo differences somewhat while increasing the draw rate. From the same FGRL rating list some two years ago, it was pretty clear that Elo differences diminish from 10m + 6s to 60m + 15s TC (and draw rates increase). Wilo (draws discarded) or normalized Elo seemed more stable when going from one TC to another.
But several months ago I saw something different with FGRL: Elo is almost not compressed at all between these 2 TCs, and Elo again seems a good measure to separate engines by strength. I have this data:

Elo top 10:

10m + 6s:

   # PLAYER              : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)
   1 Stockfish 9         :  169.9    8.3    1984.5    2700    73.5     100    
   2 Houdini 6           :  128.7    8.2    1844.5    2700    68.3     100    
   3 Komodo 12           :  108.0    7.9    1770.0    2700    65.6     100    
   4 Fire 7.1            :   16.3    7.4    1419.5    2700    52.6     100    
   5 Ethereal 11.00      :  -29.2    7.3    1240.0    2700    45.9     100    
   6 Deep Shredder 13    :  -55.8    7.4    1136.0    2700    42.1      84    
   7 Booot 6.3.1         :  -61.3    7.4    1114.5    2700    41.3      96    
   8 Fizbo 2             :  -70.9    7.5    1077.5    2700    39.9     100    
   9 Andscacs 0.94       :  -90.4    7.6    1003.0    2700    37.1     100    
  10 Gull 3              : -115.3    7.6     910.5    2700    33.7     ---    

White advantage = 37.17 +/- 2.00
Draw rate (equal opponents) = 68.70 % +/- 0.51
SD=100.9

60m + 15s:

   # PLAYER              : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)
   1 Stockfish 9         :  157.4   11.4     972.5    1350    72.0     100    
   2 Houdini 6           :  130.2   11.0     925.5    1350    68.6      98    
   3 Komodo 12           :  113.4   10.6     895.5    1350    66.3     100    
   4 Fire 7.1            :    3.6   10.2     684.5    1350    50.7     100    
   5 Ethereal 11.00      :  -26.9   10.2     624.0    1350    46.2     100    
   6 Booot 6.3.1         :  -46.4   10.2     585.5    1350    43.4      63    
   7 Deep Shredder 13    :  -48.9    9.8     580.5    1350    43.0     100    
   8 Andscacs 0.94       :  -74.2    9.5     531.5    1350    39.4      91    
   9 Fizbo 2             :  -83.8   10.0     513.0    1350    38.0     100    
  10 Gull 3              : -124.4   10.5     437.5    1350    32.4     ---    

White advantage = 33.35 +/- 2.81
Draw rate (equal opponents) = 70.50 % +/- 0.71
SD=98.8

The compression in Elo ratings is a negligible 2.1%. Two years ago I got a significant Elo compression of, IIRC, some 15% between these 2 TCs. I am not sure why this changed, but your reasoning applies to a higher degree now than 2 years ago. I agree with your reasoning, but what matters is how much it affects Elo ratings quantitatively. OTOH, I seem to remember that the longer-TC CCRL and CEGT rating lists are somewhat Elo-compressed compared to the shorter-TC ones.
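
The post does not say how the 2.1% figure was obtained. One reading that happens to reproduce it is the relative drop in the reported SD of the ratings between the two lists (100.9 at 10m + 6s, 98.8 at 60m + 15s). The sketch below works under that assumption only, and the function name is made up for the example.

```python
def elo_compression(sd_short_tc, sd_long_tc):
    """Relative shrinkage of the rating spread from the shorter to the longer TC.

    Assumes compression is measured as 1 - SD_long / SD_short, which is a
    guess that matches the quoted 2.1%, not a method stated in the post.
    """
    return 1.0 - sd_long_tc / sd_short_tc

compression = elo_compression(100.9, 98.8)
print(f"Elo compression: {compression:.1%}")
```

Under the same reading, the roughly 15% compression recalled from two years earlier would correspond to the longer-TC spread being about 0.85 times the shorter-TC spread.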
