From 17.8. to 28.8., these LTC patches passed:
Total: 30893 W: 5154 L: 4910 D: 20829 Elo +2.74
Total: 36203 W: 6046 L: 5781 D: 24376 Elo +2.54
Total: 41084 W: 6769 L: 6680 D: 27635 Elo +0.75
Total: 13655 W: 2294 L: 2099 D: 9262 Elo +4.96
Wow, a nice >10 Elo gain? No, the regression test gave just 1.0 Elo. Also, in NCM, after more than 100,000 games there is no visible progress.
Observation from SF development
Re: Observation from SF development
There are error bars, of course, on all of the measurements.
And the measurements are really valid only for the exact conditions of the tests.
But if you look at Pohl's site, you will see a relentless and merciless march towards the stars.
Eventually, they would reach infinity.
So the numbers may not add up like cordwood and create the giant pile of Elo instantly.
But the technique clearly does work.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Re: Observation from SF development
Jouni wrote: Thu Aug 30, 2018 4:46 pm
From 17.8. to 28.8., these LTC patches passed:
Total: 30893 W: 5154 L: 4910 D: 20829 Elo +2.74
Total: 36203 W: 6046 L: 5781 D: 24376 Elo +2.54
Total: 41084 W: 6769 L: 6680 D: 27635 Elo +0.75
Total: 13655 W: 2294 L: 2099 D: 9262 Elo +4.96
Wow, a nice >10 Elo gain? No, the regression test gave just 1.0 Elo. Also, in NCM, after more than 100,000 games there is no visible progress.
The numbers are wrong.
If you want unbiased estimates, you should use a fixed number of games, not SPRT.
They have many patches that fail at long time control, and I guess most of the patches that passed at long time control would fail if they were tested again.
Imagine that every patch is a 0 Elo improvement and you use SPRT.
If you test enough patches, some are going to pass with a biased estimate of +2 Elo or even +4 Elo.
After enough time you can easily get a sum of 100 Elo, but when you test with 40000 games there is no visible progress.
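To make that thought experiment concrete, here is a minimal Python sketch. It is a simplified stand-in, not fishtest's actual code: it assumes every patch is exactly 0 Elo with 16.5% wins, 67% draws and 16.5% losses (roughly the rates in the tests quoted here), and it runs a normal-approximation SPRT between 0 and 5 logistic Elo, checked every 100 games. The names `run_sprt` and `simulate` are hypothetical.

```python
import math
import random


def elo_to_score(elo):
    """Expected score for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def score_to_elo(score):
    """Naive Elo estimate from a match score, as if the match had a fixed length."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def llr(wins, draws, losses, s0, s1):
    """Normal-approximation log-likelihood ratio of H1 (score s1) versus H0 (score s0)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - mean * mean
    if var <= 0.0:
        return 0.0  # every game drawn so far: no information yet
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)


def run_sprt(rng, win_p, draw_p, elo0, elo1, alpha=0.05, beta=0.05,
             batch=100, max_games=500_000):
    """Play batches of games until the LLR leaves (lower, upper).

    Returns (accepted, naive_elo_estimate, games_played).
    """
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    lower = math.log(beta / (1.0 - alpha))   # about -2.94 for 5%/5%
    upper = math.log((1.0 - beta) / alpha)   # about +2.94 for 5%/5%
    wins = draws = losses = 0
    while wins + draws + losses < max_games:
        for _ in range(batch):
            r = rng.random()
            if r < win_p:
                wins += 1
            elif r < win_p + draw_p:
                draws += 1
            else:
                losses += 1
        value = llr(wins, draws, losses, s0, s1)
        if value >= upper or value <= lower:
            n = wins + draws + losses
            return value >= upper, score_to_elo((wins + 0.5 * draws) / n), n
    return False, None, max_games  # inconclusive, counted as not accepted


def simulate(num_patches=300, elo1=5.0, seed=2018):
    """Run SPRT on zero-Elo patches and collect the naive Elo of accepted ones."""
    rng = random.Random(seed)
    win_p, draw_p = 0.165, 0.67  # true score is exactly 0.5, i.e. 0 Elo
    accepted = []
    for _ in range(num_patches):
        ok, elo, _ = run_sprt(rng, win_p, draw_p, elo0=0.0, elo1=elo1)
        if ok:
            accepted.append(elo)
    return accepted


if __name__ == "__main__":
    accepted = simulate()
    print(f"{len(accepted)} of 300 zero-Elo patches accepted")
    if accepted:
        mean = sum(accepted) / len(accepted)
        print(f"mean naive Elo of accepted patches: {mean:+.2f}")
        print(f"summed 'gain': {sum(accepted):+.1f} Elo (true gain: 0)")
```

With this LLR, a patch can only be accepted if its sample score lies above the midpoint of the two hypothesised scores, so every accepted zero-Elo patch reports a positive estimate of roughly half of `elo1` or more. Summing those estimates over many patches builds exactly the kind of paper gain described above.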
It may be more interesting if they tested every patch with 40000 games after it passes and used that result as the estimate. They should also keep negative numbers if they get them, and if you want an unbiased estimate, you are not allowed to revert patches that scored less than 50% in those 40000 games after they already passed SPRT.
I guess we would see a lower Elo sum, even if they always test against the previous version.
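The retest idea can be sketched the same way. This minimal Python sketch assumes, hypothetically, that every accepted patch is really 0 Elo with 16.5% wins and 67% draws, re-measures each one with a fresh fixed-length match of 40000 games, and compares the sum that keeps negative results with the sum that drops them. The helper names are illustrative, not fishtest's.

```python
import math
import random


def score_to_elo(score):
    """Elo difference implied by a match score (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def fixed_match_elo(rng, games, win_p, draw_p):
    """Play a fixed number of games and return the naive Elo estimate."""
    points = 0.0
    for _ in range(games):
        r = rng.random()
        if r < win_p:
            points += 1.0
        elif r < win_p + draw_p:
            points += 0.5
    return score_to_elo(points / games)


def retest_sum(num_patches=10, games=40_000, seed=30):
    """Re-measure each patch with a fresh fixed-length match.

    Returns (sum keeping negatives, sum with negatives dropped).
    """
    rng = random.Random(seed)
    estimates = [fixed_match_elo(rng, games, 0.165, 0.67)
                 for _ in range(num_patches)]
    return sum(estimates), sum(max(e, 0.0) for e in estimates)


if __name__ == "__main__":
    honest, cherry_picked = retest_sum()
    print(f"sum keeping negatives: {honest:+.1f} Elo (true gain: 0)")
    print(f"sum dropping negatives: {cherry_picked:+.1f} Elo")
```

Because each retest uses fresh games and a length fixed in advance, its estimate does not depend on the earlier SPRT decision and is approximately unbiased; dropping the negative retests, as the second sum does, reintroduces an upward bias.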
Uri
Re: Observation from SF development
BTW, the latest regression test shows a regression: -1.7 Elo. So the trend to infinity is slowing down.
Jouni
Re: Observation from SF development.
Hello Jouni:
If those four tests were fixed-length matches, I reproduce your Elo estimates, and the 95% confidence intervals would be +0.53 to +4.96, +0.50 to +4.59, -1.17 to +2.67 and +1.66 to +8.27, respectively. Please compare these intervals with the correct ones shown below.
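The fixed-game figures can be reproduced with a short Python sketch. It assumes a normal approximation to the mean per-game score (win = 1, draw = 0.5, loss = 0), with the variance taken from the observed W/D/L counts, and maps the score interval to Elo through the logistic formula. This is a common textbook method and not necessarily the exact one used in the post; the function name `fixed_games_elo` is hypothetical.

```python
import math


def score_to_elo(score):
    """Elo difference implied by an expected score (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def fixed_games_elo(wins, losses, draws, z=1.96):
    """Naive Elo estimate and ~95% confidence interval for a fixed-length match."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score, from the observed W/D/L frequencies.
    var = (wins + 0.25 * draws) / n - mean * mean
    half_width = z * math.sqrt(var / n)
    return (score_to_elo(mean),
            score_to_elo(mean - half_width),
            score_to_elo(mean + half_width))


if __name__ == "__main__":
    # W/L/D counts from the four LTC tests quoted in this thread.
    tests = [(5154, 4910, 20829), (6046, 5781, 24376),
             (6769, 6680, 27635), (2294, 2099, 9262)]
    for w, l, d in tests:
        elo, lo, hi = fixed_games_elo(w, l, d)
        print(f"Elo {elo:+.2f}  95% CI [{lo:+.2f}, {hi:+.2f}]")
```

For the first test this gives about Elo +2.74 with an interval of roughly +0.53 to +4.96. These intervals treat each run as if its length had been fixed in advance, so they do not account for the SPRT stopping rule.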
Michel van den Bergh worked out the corresponding math for Elo estimates from SPRT tests, and anyone can see the results now by clicking on either of the two places that I wrote in bold:
Here are the results:
http://tests.stockfishchess.org/html/li ... 02bdba9914
http://tests.stockfishchess.org/html/li ... 02bdbadfa9
http://tests.stockfishchess.org/html/li ... 02bdbb1b85
http://tests.stockfishchess.org/html/li ... 02bdbb8038
Another important issue is that Elo gains from different tests are not additive. Fishtest is full of examples: take all the patches between two consecutive 40000-game regression/progression matches, and you will see that the difference between those two matches is usually less than the sum of the gains of the intermediate patches. Again, Uri already said that.
Regards from Spain.
Ajedrecista.
As Uri already said, error bars exist, and the numbers are wrong simply because Elo estimates from SPRT tests involve different math than Elo estimates from matches with a fixed number of games.
The two places to click are the box with the SPRT stats and the word 'SPRT' in a fishtest entry such as this one:
18-08-26 Viz tweak_se_malus diff
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 13655 W: 2294 L: 2099 D: 9262
sprt @ 60+0.6 th 1
LTC now...
Here are the results:
http://tests.stockfishchess.org/html/li ... 02bdba9914
Code:
TC 60+0.6
SPRT elo0: 0.00 alpha: 0.05 elo1: 5.00 beta: 0.05
LLR 2.96 [-2.94,2.94] (accepted)
Elo 2.30 [-0.17,4.63] (95%)
LOS 96.6%
Games 30893 [w:16.7%, l:15.9%, d:67.4%]
Code:
TC 60+0.6
SPRT elo0: 0.00 alpha: 0.05 elo1: 5.00 beta: 0.05
LLR 2.95 [-2.94,2.94] (accepted)
Elo 2.10 [-0.23,4.27] (95%)
LOS 96.2%
Games 36203 [w:16.7%, l:16.0%, d:67.3%]
Code:
TC 60+0.6
SPRT elo0: -3.00 alpha: 0.05 elo1: 1.00 beta: 0.05
LLR 2.96 [-2.94,2.94] (accepted)
Elo 0.39 [-1.72,2.41] (95%)
LOS 64.7%
Games 41084 [w:16.5%, l:16.3%, d:67.3%]
Code:
TC 60+0.6
SPRT elo0: 0.00 alpha: 0.05 elo1: 4.00 beta: 0.05
LLR 2.95 [-2.94,2.94] (accepted)
Elo 4.60 [1.19,7.96] (95%)
LOS 99.6%
Games 13655 [w:16.8%, l:15.4%, d:67.8%]
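The LLR bounds printed in these boxes follow from the error rates alone. Assuming the standard Wald SPRT thresholds (which these numbers match), a short Python check:

```python
import math


def sprt_bounds(alpha, beta):
    """Wald SPRT thresholds for the log-likelihood ratio.

    Stop and reject the patch when LLR <= lower; stop and accept it when LLR >= upper.
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


if __name__ == "__main__":
    lower, upper = sprt_bounds(alpha=0.05, beta=0.05)
    print(f"[{lower:.2f},{upper:.2f}]")  # prints [-2.94,2.94]
```

With alpha = beta = 0.05 both bounds are ±ln 19 ≈ ±2.944. The LLR values themselves (2.95, 2.96) depend on the statistical model fishtest uses for the games, which this snippet does not attempt to reproduce.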