gladius wrote: Since the beta announcement, we have had up to six computers testing at once, and have played over 400,000 games!
Am I wrong, or have almost 1.6 million games been played by now? That is quite a lot.
Now:
Stockfish Testing Framework wrote: Retest regression with new hash size of 128MB, previous result with 32MB hash size was +16.62 ELO
20000 @ 60+0.05 th 1 (3698d9aa5573ca Vs. sf_2.3.1_base):
Code:
ELO: 16.12 +-7.3 (95%) LOS: 100.0% Total: 8756 W: 1748 L: 1342 D: 5666
If I include the draw ratio of roughly 64.71%, I obtain an error bar of roughly ± 4.32 Elo with 95% confidence (of course, LOS ~ 100%).
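For anyone who wants to check the arithmetic, here is a minimal Python sketch of how these numbers can be obtained from the W/L/D counts. The function names (`elo_from_score`, `elo_stats`) are my own, and the formulas are the usual normal-approximation ones. I am assuming, not claiming, that the framework's ±7.3 comes from ignoring draws in the variance; the LOS formula is the common erf approximation.

```python
import math

def elo_from_score(s):
    """Convert an expected score (0 < s < 1) into an Elo difference."""
    return 400.0 * math.log10(s / (1.0 - s))

def elo_stats(wins, losses, draws, z=1.96):
    """Elo estimate plus two 95% error bars (hypothetical helper)."""
    n = wins + losses + draws
    w = wins / n
    d = draws / n
    s = w + 0.5 * d                      # mean score per game

    # Per-game variance when a draw counts as 0.5 (trinomial model).
    var_with_draws = w + 0.25 * d - s * s
    # Per-game variance if every game were a plain win/loss with p = s.
    var_no_draws = s * (1.0 - s)

    def half_width(var):
        se = math.sqrt(var / n)          # standard error of the mean score
        hi = elo_from_score(s + z * se)
        lo = elo_from_score(s - z * se)
        return (hi - lo) / 2.0

    # Likelihood of superiority, usual normal approximation on W - L.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

    return {
        "elo": elo_from_score(s),
        "draw_ratio": d,
        "error_with_draws": half_width(var_with_draws),
        "error_no_draws": half_width(var_no_draws),
        "los": los,
    }

result = elo_stats(wins=1748, losses=1342, draws=5666)
for key, value in result.items():
    print(f"{key}: {value:.4f}")
```

With the counts above this should give about 16.1 Elo, with an error bar near ±4.3 when draws are included and near ±7.3 when they are ignored, which is consistent with both figures quoted in this post.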
Given that self-play tests tend to exaggerate Elo improvements, can it be said that the real improvement (for example, in IPON) is around 10 Elo ± error bars?
I guess that it is too soon for a 2.3.2 or 2.4 release... Thanks in advance for your attention.
Regards from Spain.
Ajedrecista.
