jp wrote: ↑Sat Jun 01, 2019 9:42 am
... which is relevant to the W/L question Kai raised (though it's only Komodo 9.3).
Laskos wrote: ↑Fri May 31, 2019 8:49 pm
This is not that hard. The harder thing seems to be to get the behavior of Win/Loss ratio to LTC. From the latest months' data of Andreas, I seem to get that Win/Loss ratio increases from rapid to LTC, almost consistently already. It might be (not very likely) related to the contempt used recently by many top engines.
Yes, but the data are contradictory when the whole FGRL data set is used. Also, on my hardware I am getting a higher W/L ratio for Lc0 T40 against SF at 4m + 4s than at 1m + 1s TC from balanced openings, but a pretty stable Elo difference. Not LTC at all, but the issue is not easy for me. Also, self-play might behave a bit differently from the general case. Contempt too might interfere, as it lowers the draw rate, but in an unclear proportion of wins and losses.
Isn't it to be expected that the draw rate will be lower (and so the W/L will be higher) between the same two engines as TC increases? Both engines will make fewer mistakes and there will therefore be less noise. At STC, mistakes by both engines cancel each other out so far as they don't represent strength difference. At LTC, far fewer mistakes are made that cancel each other out (read: games tend to be won only by the stronger engine).
(And of course this is not about STC/LTC but about shorter/longer TC or, equivalently, lower/higher speed. What is 4m+4s on one computer is 4h+4m on another.)
I'm pretty sure contempt only lowers the draw rate by creating more noise, so it won't give more accurate Elo estimates.
syzygy wrote: ↑Sat Jun 01, 2019 4:38 pm
Isn't it to be expected that the draw rate will be lower (and so the W/L will be higher) between the same two engines as TC increases? Both engines will make fewer mistakes and there will therefore be less noise. At STC, mistakes by both engines cancel each other out so far as they don't represent strength difference. At LTC, far fewer mistakes are made that cancel each other out (read: games tend to be won only by the stronger engine).
I might have expected the opposite, that the draw rate would be higher as TC increases.
Ouch yes, and that is what I had wanted to write!
As the TC increases, the draw rate increases. If the Elo difference stays the same, that means the W/L ratio will be higher. So what I wrote still makes sense, I think; I should just have written "draw rate will be higher" instead of "lower".
Suppose engine A scores 55-45 on average against engine B over 100 games. With zero noise, that means 10 wins for A and 90 draws (so an infinite W/L ratio). With maximum noise, it means 55 wins for A and 45 wins for B (so a W/L ratio that is just above 1).
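To make the 55-45 example concrete, here is a minimal sketch that holds the score fixed and varies only the number of draws. The function names and the intermediate draw counts (30 and 60) are my own, added for illustration; only the two endpoints (0 and 90 draws) come from the example above.

```python
from fractions import Fraction
import math

def wdl_from_score(games, points, draws):
    """Split a fixed score into wins and losses for a given number of draws.

    points = wins + draws / 2 and games = wins + draws + losses.
    Fractions keep the arithmetic exact.
    """
    wins = Fraction(points) - Fraction(draws, 2)
    losses = games - draws - wins
    if wins < 0 or losses < 0:
        raise ValueError("too many draws for this score")
    return wins, losses

def wl_ratio(wins, losses):
    """W/L ratio; infinite when the stronger side never loses."""
    return math.inf if losses == 0 else float(wins / losses)

# Engine A scores 55 points out of 100 in every case; only the draw count changes.
for draws in (0, 30, 60, 90):
    w, l = wdl_from_score(100, 55, draws)
    print(f"draws={draws:2d}  wins={w}  losses={l}  W/L={wl_ratio(w, l):.3f}")
```

With the score held at 55-45, every extra pair of draws removes one win and one loss, so the W/L ratio rises monotonically from 55/45 towards infinity.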
Yes, looks about right, although there was some consensus that stronger entities (stronger through anything: TC, hardware, etc.) somewhat compress the Elo differences while increasing the draw rate. From the same FGRL rating list some two years ago, it was pretty clear that Elo differences diminish from 10m + 6s to 60m + 15s TC (and draw rates increase). Wilo (draws discarded) or normalized Elo seemed more stable when going from one TC to the other.
But several months ago I saw something different with FGRL: Elo almost doesn't compress at all between these 2 TCs. And Elo again seems a good measure to separate engines by strength. I have this data:
The compression in Elo ratings is a negligible 2.1%. 2 years ago I got a significant Elo compression of IIRC some 15% between these 2 TCs. I am not sure why this changed; your reasoning applies to a higher degree now than 2 years ago. I agree with your reasoning, but what matters is how much it affects Elo ratings quantitatively. OTOH, I seem to remember that the longer-TC CCRL and CEGT rating lists are somewhat Elo-compressed compared to the shorter-TC ones.
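For readers unfamiliar with the two measures, here is a rough sketch of how Elo and Wilo can be computed from W/D/L counts. It assumes the usual logistic Elo formula and takes "Wilo" to mean that same formula applied with draws discarded, as described above. The function names and the W/D/L counts are made up for illustration and are not FGRL data. Normalized Elo is not covered.

```python
import math

def elo_from_score(score):
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return 400 * math.log10(score / (1 - score))

def elo(wins, draws, losses):
    """Ordinary Elo difference: draws count as half a point."""
    games = wins + draws + losses
    return elo_from_score((wins + 0.5 * draws) / games)

def wilo(wins, losses):
    """Elo difference with draws discarded; reduces to 400 * log10(W / L)."""
    return 400 * math.log10(wins / losses)

# Two hypothetical results with the same 55% score but different draw rates.
for w, d, l in [(55, 0, 45), (25, 60, 15)]:
    print(f"+{w} ={d} -{l}  Elo={elo(w, d, l):.1f}  Wilo={wilo(w, l):.1f}")
```

The two measures coincide when there are no draws. At a fixed score, Wilo grows as the draw rate grows, so a Wilo that stays put while draw rates rise goes together with a shrinking Elo difference.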