CCRL 40/40 Rating List - Custom engine selection
1004662 games played by 2384 programs, run by 22 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz),
about 15 minutes on a modern Intel CPU.
Computed on March 16, 2019 with Bayeselo based on 1'004'662 games
Tested by CCRL team, 2005-2019, http://computerchess.org.uk/ccrl/4040/
Engine                     Elo    +    -   Score   AvOp  Games
FrankWalter 2.2.8 64-bit  2457  +20  -20   50.3%   -3.9    892
FrankWalter 2.3.3 64-bit  2447  +20  -20   47.8%  +14.9    867
FrankWalter 2.2.3 64-bit  2392  +36  -35   59.5%  -74.7    301
It's a shame to see FW regress, but it happens I suppose, especially when testing at long time controls. My own tests at shorter TC indicated progress, but I've since identified a bug in the transposition table, so I hope the new version (2.3.5) will not disappoint and will show progress at long time controls! Especially because Graham intends to field FW2.3.5 in D6, so I will need every bit of strength I can find ;-)
Thank you for testing at 40/40, it really addresses some areas that are problematic for me to test.
So many tests were posted this week because I had a large backlog dating back to January. Now that I have some time this week, I'm clearing it. On the other hand, what's posted here is only the tip of the iceberg, figuratively speaking.
I wouldn't call v2.3.3 a clear regression, not with 10 Elo separating it from v2.2.8 and the combined error margins being about +/-28 Elo. It failed to prove superiority, that's all. These things happen all the time; there are a lot of engines that oscillate back and forth before finally making a spurt forward.
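For anyone wondering where a figure of roughly +/-28 comes from, here is a rough sketch of the arithmetic. It assumes the two +/-20 margins from the list can be treated as independent and combined in quadrature, which is only an approximation and not how Bayeselo itself compares two engines. The function name combined_margin is made up for this illustration.

```python
import math

def combined_margin(margin_a, margin_b):
    """Combine two error margins in quadrature.

    Assumption: the two margins are independent, so the margin of the
    rating difference is sqrt(a^2 + b^2). This is an approximation.
    """
    return math.hypot(margin_a, margin_b)

# Ratings and margins taken from the list above.
elo_228, err_228 = 2457, 20   # FrankWalter 2.2.8
elo_233, err_233 = 2447, 20   # FrankWalter 2.3.3

diff = elo_228 - elo_233
margin = combined_margin(err_228, err_233)

# The gap is "within noise" if it is smaller than the combined margin.
within_noise = abs(diff) <= margin

print(f"difference: {diff} Elo, combined margin: +/-{margin:.1f} Elo")
print("within noise" if within_noise else "outside noise")
```

With these numbers the 10 Elo gap sits well inside the combined margin, so under this approximation the list cannot tell the two versions apart.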
Good to find confirmation of the usefulness of testing at LTC. Some are of the opinion that 40/40 is a waste of time; IMO that is an extremely simplistic view.