FrankWalter 2.3.3 64-bit Gauntlet for CCRL 40/40

Post by **tpoppins** » Wed Mar 20, 2019 8:10 am

Games: PGN

CCRL 40/40 Rating List - Custom engine selection
1004662 games played by 2384 programs, run by 22 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz), 
about 15 minutes on a modern Intel CPU.
Computed on March 16, 2019 with Bayeselo based on 1'004'662 games
Tested by CCRL team, 2005-2019, http://computerchess.org.uk/ccrl/4040/

           Engine                     Elo   +    -   Score  AvOp  Games
  FrankWalter 2.2.8 64-bit           2457  +20  -20  50.3%   -3.9   892
  FrankWalter 2.3.3 64-bit           2447  +20  -20  47.8%  +14.9   867
  FrankWalter 2.2.3 64-bit           2392  +36  -35  59.5%  -74.7   301

Post by **ljgw** » Wed Mar 20, 2019 10:38 am

Wow, what an enormous amount of testing, Tirsa!

Thank you in particular for testing FW2.3.3:-)

It's a shame to see FW regress, but it happens, I suppose, especially when testing at long time controls. My own tests at shorter TC indicated progress, but I've since identified a bug in the transposition table, so I hope the new version (2.3.5) will not disappoint and will show progress at long time controls, especially since Graham intends to field FW2.3.5 in D6 and I will need every bit of strength I can find ;-)

Thank you for testing at 40/40; it really addresses some areas that are problematic for me to test.

--Laurens

Post by **tpoppins** » Wed Mar 20, 2019 7:36 pm

You're welcome, Laurens.

I posted so many tests this week because I had a large backlog dating back to January; now that I have some time, I'm clearing it. On the other hand, what's posted here is only the tip of the iceberg, figuratively speaking.

I wouldn't call v2.3.3 a clear regression, not with 10 Elo separating it from v2.2.8 and a compound error margin of +/-28 Elo. It failed to prove superiority, that's all. These things happen all the time: many engines oscillate back and forth before finally making a spurt forward.
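The +/-28 Elo figure can be reproduced by combining the two +/-20 margins from the table in quadrature (root-sum-of-squares). The sketch below assumes the two ratings' errors are independent and symmetric. This is a rough rule of thumb, not how Bayeselo itself computes intervals. The function names are illustrative only.

```python
import math


def combined_margin(margin_a: float, margin_b: float) -> float:
    """Combine two independent error margins in quadrature (root-sum-of-squares).

    Assumes the rating errors are independent; Bayeselo's own intervals
    may differ because ratings on the same list are correlated.
    """
    return math.hypot(margin_a, margin_b)


def difference_is_significant(elo_a: float, elo_b: float,
                              margin_a: float, margin_b: float) -> bool:
    """True if the Elo gap exceeds the combined margin (rough rule of thumb)."""
    return abs(elo_a - elo_b) > combined_margin(margin_a, margin_b)


# Figures from the CCRL table above (both versions have +/-20 margins).
elo_228, elo_233 = 2457, 2447
margin = combined_margin(20, 20)
print(f"difference: {elo_228 - elo_233} Elo, combined margin: +/-{margin:.1f} Elo")
print("significant:", difference_is_significant(elo_228, elo_233, 20, 20))
```

Under these assumptions the 10 Elo gap is well inside the roughly 28 Elo combined margin, which matches the "failed to prove superiority" reading rather than a confirmed regression.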

It's good to find confirmation of the usefulness of testing at LTC. Some are of the opinion that 40/40 is a waste of time; IMO that is an extremely simplistic view.
