Crafty 25.3 64-bit Gauntlet for CCRL 40/15
-
- Posts: 41463
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Crafty 25.3 64-bit Gauntlet for CCRL 40/15
gbanksnz at gmail.com
-
- Posts: 41463
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: Crafty 25.3 64-bit Gauntlet for CCRL 40/15
CCRL 40/15 Rating List - Custom engine selection
1116816 games played by 2606 programs, run by 23 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 15 minutes on an Intel i7-4770k.
Computed on January 25, 2020 with Bayeselo based on 1'116'816 games
Tested by CCRL team, 2005-2020, http://ccrl.chessdom.com/ccrl/4040/
Rank Engine               Elo    +    -  Score   AvOp  Games
   1 Crafty 25.2 64-bit  2931  +11  -11  48.8%   +6.0   2688
     Crafty 25.3 64-bit  2904  +28  -28  49.0%   +8.5    400
     Crafty 25.0 64-bit  2864  +20  -20  46.9%  +19.5    794
     Crafty 25.1 64-bit  2853  +30  -30  42.8%  +47.6    356
     Crafty 23.8 64-bit  2811  +23  -23  51.5%   -8.7    604
     Crafty 24.1 64-bit  2803  +22  -22  49.5%   +2.9    659
     Crafty 23.6 64-bit  2788  +25  -25  53.0%  -19.7    478
     Crafty 24.0 64-bit  2772  +28  -28  47.0%  +19.5    396
     Crafty 23.3 64-bit  2756  +32  -32  50.5%   -3.6    309
     Crafty 23.5 64-bit  2747  +28  -28  51.2%  -10.2    406
     Crafty 23.4 64-bit  2736  +28  -28  46.7%  +20.4    419
     Crafty 23.4 32-bit  2733  +20  -20  50.8%   -5.5    861
     Crafty 23.3 32-bit  2716  +32  -32  50.5%   -4.3    309
     Crafty 23.2 64-bit  2711  +30  -30  49.0%   +6.9    357
     Crafty 23.2 32-bit  2696  +32  -32  47.8%  +15.7    320
     Crafty 23.1 32-bit  2687  +18  -18  46.2%  +24.9    970
     Crafty 23.0 32-bit  2630  +29  -29  49.9%   +1.7    380
     Crafty 22.8 32-bit  2596  +32  -32  48.7%   +6.9    315
     Crafty 22.4 32-bit  2580  +32  -32  48.1%  +12.2    318
     Crafty 22.10 32-bit 2573  +32  -32  47.3%  +14.4    320
     Crafty 22.1 32-bit  2565  +28  -28  50.2%   -3.6    421
     Crafty 21.6 32-bit  2550  +34  -34  47.8%  +12.6    302
     Crafty 21.5 32-bit  2542  +32  -32  48.3%  +13.7    344
     Crafty 22.0 32-bit  2538  +33  -33  48.5%   +9.6    304
     Crafty 20.14 32-bit 2517  +27  -27  47.4%  +19.5    482
     Crafty 20.13 32-bit 2510  +33  -33  48.7%   +9.8    312
     Crafty 20.11 32-bit 2502  +33  -33  49.8%   +3.2    307
gbanksnz at gmail.com
-
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Crafty 25.3 64-bit Gauntlet for CCRL 40/15
Hmm, what's wrong with 25.3 or 25.2? How can a supposedly equal version (assuming bugfixes only) be nearly 30 Elo weaker in the CCRL list?
CEGT, on the other hand, has 25.3 about 30 Elo stronger than 25.2, so the relative spread is 60 Elo. That's insanely huge and scary.
We can talk error bars, but this seems way off, especially at this TC.
Is something wrong with Crafty, or does independent testing actually produce huge noise in this case?
Martin Sedlak
-
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Crafty 25.3 64-bit Gauntlet for CCRL 40/15
This is not meant as a criticism, I'm just trying to understand the discrepancy.
From what I've seen, the change going from 400 to 3k games in CCRL is typically less than 10 Elo points.
I don't understand the CEGT results for 25.3 either.
Martin Sedlak
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: Crafty 25.3 64-bit Gauntlet for CCRL 40/15
1 Crafty 25.2 64-bit 2931 +11 -11 48.8% +6.0 2688
Crafty 25.3 64-bit 2904 +28 -28 49.0% +8.5 400
-
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Crafty 25.3 64-bit Gauntlet for CCRL 40/15
CEGT:
Crafty 25.3 x64 1CPU 2823 14 14
Crafty 25.2 x64 1CPU 2791 12 12
Too soon only if you assume CCRL has the worst case for 25.3 and CEGT has the best case. What's the probability of two independent lists hitting the opposite extremes of the error bars?
Martin Sedlak
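One rough way to put a number on the question above is to compare the 25.3 minus 25.2 gap on the two lists. The sketch below is only illustrative. It assumes each +/- value is a 95% interval on a normal distribution, that all four ratings are independent, and that the CCRL and CEGT Elo scales are comparable. None of this holds exactly: ratings within one list are correlated, and the lists differ in opponents, hardware and books. The helper name diff_sigma is made up for this example.

```python
import math

def diff_sigma(bar_a, bar_b, z95=1.96):
    """Std. dev. of a rating difference, treating each +/- bar as a
    95% interval and the two ratings as independent (an assumption)."""
    return math.hypot(bar_a, bar_b) / z95

# 25.3 minus 25.2 on each list, using the figures quoted in this thread
ccrl_diff = 2904 - 2931            # CCRL: 25.3 is 27 Elo lower
cegt_diff = 2823 - 2791            # CEGT: 25.3 is 32 Elo higher
ccrl_sigma = diff_sigma(28, 11)    # CCRL error bars for 25.3 and 25.2
cegt_sigma = diff_sigma(14, 12)    # CEGT error bars for 25.3 and 25.2

# How far apart are the two lists, in standard deviations?
gap = cegt_diff - ccrl_diff
gap_sigma = math.hypot(ccrl_sigma, cegt_sigma)
z = gap / gap_sigma
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"gap = {gap} Elo, sigma = {gap_sigma:.1f}, z = {z:.2f}, p = {p_two_sided:.4f}")
```

Under these assumptions the gap comes out at a little over three standard deviations, so sampling noise alone looks unlikely, but the assumptions themselves are shaky enough that this should be read as a sanity check and not as a verdict.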
-
- Posts: 41463
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: Crafty 25.3 64-bit Gauntlet for CCRL 40/15
mar wrote: ↑Fri Jan 31, 2020 11:45 am
Hmm, what's wrong with 25.3 or 25.2, how can a supposedly equal version (assuming bugfixes only) be nearly 30 elo weaker in CCRL list?
CEGT on the other hand has 25.3 30 elo stronger than 25.2, so the relative spread is 60 elo, that's insanely huge and scary
we can talk error bars, but this seems way off, especially at this TC
is something wrong with Crafty or does independent testing actually produce huge noise in this case?

All I can say is that I've run all of the 25.3 games and most (if not all) of the 25.2 games.
gbanksnz at gmail.com
-
- Posts: 4889
- Joined: Thu Mar 09, 2006 6:34 am
- Location: Pen Argyl, Pennsylvania
Re: Crafty 25.3 64-bit Gauntlet for CCRL 40/15
mar wrote: ↑Fri Jan 31, 2020 11:45 am
Hmm, what's wrong with 25.3 or 25.2, how can a supposedly equal version (assuming bugfixes only) be nearly 30 elo weaker in CCRL list?
CEGT on the other hand has 25.3 30 elo stronger than 25.2, so the relative spread is 60 elo, that's insanely huge and scary
we can talk error bars, but this seems way off, especially at this TC
is something wrong with Crafty or does independent testing actually produce huge noise in this case?

Error bars are not rock solid, and even less so after just 400 games. They come with a confidence level, typically 95%, which means about one result in 20 will fall outside them. With two runs, the chance that at least one of them falls outside its error bars is roughly 10%, about one time in 10. It happens far more often than people realize, and it makes for good debates and exclamations that something is wrong. That is why Bob typically tested changes over well more than 100,000 games: he likes to get close to a 100% confidence level.
Generally speaking, if both versions report the same bench node count in single-CPU mode, they are functionally the same. Other Elo noise can come from different operating systems and compilers, as well as from different time controls, CPUs, opening books or positions, and so on.
The statistical artifact shown may mean nothing at all. Try version 25.6. Most of the changes since 25.2 have been bug fixes. Some of the bugs were very rare; others would depend on how users ran Crafty (for example, enabling draw offers or allowing resignations). There are a lot of variables that may not be consistent from one user to the next.
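The arithmetic behind the "one in 10" figure, and behind how slowly error bars shrink with more games, can be sketched as follows. This is a minimal illustration that assumes the runs are independent and that error bars scale with one over the square root of the number of games; the function names are made up for this example.

```python
import math

def p_any_miss(n_runs, confidence=0.95):
    """Chance that at least one of n independent runs lands outside
    its own confidence interval."""
    return 1.0 - confidence ** n_runs

def scaled_bar(bar, games, new_games):
    """Rough error bar after new_games, assuming it shrinks as
    1 / sqrt(number of games)."""
    return bar * math.sqrt(games / new_games)

# Two runs at 95% confidence: a bit under 10%
print(f"{p_any_miss(2):.4f}")

# +/-28 at 400 games, projected to the 2688 games played by 25.2
print(f"{scaled_bar(28, 400, 2688):.1f}")
```

The second figure lands close to the +/-11 that the CCRL table shows for 25.2 at 2688 games, which suggests the square-root rule is a fair approximation here.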