Jumbo 0.6.96 64-bit Gauntlet for CCRL 40/40

Graham Banks
Posts: 41455
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Jumbo 0.6.96 64-bit Gauntlet for CCRL 40/40

Post by Graham Banks »

gbanksnz at gmail.com
Graham Banks
Posts: 41455
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Jumbo 0.6.96 64-bit Gauntlet for CCRL 40/40

Post by Graham Banks »

CCRL 40/40 Rating List - Custom engine selection
1009555 games played by 2389 programs, run by 22 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz), about 15 minutes on a modern Intel CPU.
Computed on March 23, 2019 with Bayeselo based on 1'009'555 games
Tested by CCRL team, 2005-2019, http://computerchess.org.uk/ccrl/4040/

Rank Engine                 Elo    +    -  Score   AvOp  Games
   1 Jumbo 0.6.51 64-bit   2492  +28  -28  50.4%   -2.4    416
     Jumbo 0.6.66 64-bit   2490  +17  -17  51.1%   -6.6   1172
     Jumbo 0.6.10 64-bit   2483  +24  -24  52.5%  -19.4    585
     Jumbo 0.6.35 64-bit   2478  +20  -20  49.2%   +4.9    942
     Jumbo 0.6.96 64-bit   2474  +27  -27  46.6%  +25.8    460
     Jumbo 0.6.31 64-bit   2469  +25  -25  47.3%  +18.9    566
     Jumbo 0.4.34 64-bit   2421  +25  -25  53.0%  -22.3    532
     Jumbo 0.5.3 64-bit    2369  +28  -28  45.4%  +33.7    457
     Jumbo 0.4.17 64-bit   2342  +28  -27  51.9%  -13.5    462
     Jumbo 0.4.0 64-bit    2276  +35  -35  49.7%   +2.4    302
gbanksnz at gmail.com
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Jumbo 0.6.96 64-bit Gauntlet for CCRL 40/40

Post by tpoppins »

Image
Tirsa Poppins
CCRL
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Jumbo 0.6.96 64-bit Gauntlet for CCRL 40/40

Post by Sven »

Thanks for the tests! When comparing the current CCRL Jumbo ratings to my own test results for versions 0.6.*, I see some significant differences for 0.6.66 and 0.6.96 that I do not understand. For older Jumbo versions, the CCRL ratings and my own test ratings match pretty well. Here are my relative ratings (generated with BayesElo from many games between many different Jumbo versions, TC 40/0:03):

Name              Elo    +    -  games  score  oppo.  draws
Jumbo 0.6.96      166    8    7   6387    51%    156    33%
Jumbo 0.6.66      122   11   11   2592    50%    119    32%
Jumbo 0.6.51       83   10   10   3248    48%     95    34%
Jumbo 0.6.10       81    7    7   9004    52%     69    33%
Jumbo 0.6.31       80   13   13   2055    50%     78    30%
Jumbo 0.6.35       74   11   11   2800    51%     71    35%

Is it possible that Jumbo gauntlets were run on very old hardware? That would be a plausible explanation at least for not seeing the jump from 0.6.51 to 0.6.66 on CCRL since one of the main changes from 0.6.51 to 0.6.66 was a compiler flag optimization that will only help with SSE 4.2 or newer.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Jumbo 0.6.96 64-bit Gauntlet for CCRL 40/40

Post by tpoppins »

I don't think you can expect LTC results to mirror, or even closely follow, those obtained in hyper-bullet tests, Sven. Sometimes they do, other times they don't.

If you're looking for a confirmation of your results, our blitz list would be a better place to look. Currently v0.6.96 is +20 Elo over v0.6.66 there which, given that self-play performance is typically about 2x the performance vs. other engines, looks about right, albeit with error margins as wide as the difference. The margins should contract appreciably after the next update as the game counts will nearly double for each version; then we should see the difference more clearly.
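
The "looks about right" arithmetic can be written out as a rough sketch. The self-play ratings are taken from Sven's BayesElo table earlier in the thread; the factor of 2 is only the rule of thumb quoted above, not a measured constant, and the variable names are made up for illustration.

```python
# Self-play ratings from Sven's BayesElo table earlier in the thread.
selfplay_elo = {"0.6.96": 166, "0.6.66": 122}

# Rule of thumb from the post (an assumption, not a law): a gap measured
# in self-play is roughly twice the gap measured against other engines.
SELFPLAY_INFLATION = 2.0

selfplay_gap = selfplay_elo["0.6.96"] - selfplay_elo["0.6.66"]
expected_gap_vs_others = selfplay_gap / SELFPLAY_INFLATION

print(selfplay_gap)            # 44
print(expected_gap_vs_others)  # 22.0
```

Under that rule of thumb, the 44 Elo self-play gap would shrink to about 22 Elo against other engines, which is close to the +20 currently shown on the blitz list.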

All of my hardware supports POPCNT. AFAIK, one of Graham's boxes doesn't. Taken by itself this factor is largely irrelevant: the speedup offered by POPCNT (~5%) translates to about 5 Elo, a difference that will never show on the 40/40 list due to the number of games required to confirm it.
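
To put a rough number on "the number of games required", here is a back-of-envelope sketch. It assumes the standard logistic Elo model, near-equal opponents, a 95% normal interval (z = 1.96), and a draw ratio of 33% borrowed from Sven's self-play table; the real 40/40 draw ratio may differ. The function names are hypothetical, and this is not how BayesElo computes its margins.

```python
import math

def elo_to_score(elo_diff):
    """Expected score for a given Elo advantage (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, draw_ratio, z=1.96):
    """Games needed before the z-sigma half-width of the measured score
    shrinks to the score edge that elo_diff corresponds to.
    Assumes near-equal opponents, so wins and losses split what draws leave."""
    score_edge = elo_to_score(elo_diff) - 0.5
    # Per-game score variance at a 50% expected score is (1 - draw_ratio) / 4.
    per_game_sd = math.sqrt((1.0 - draw_ratio) / 4.0)
    return math.ceil((z * per_game_sd / score_edge) ** 2)

# Assumed inputs: 5 Elo difference, 33% draws.
n = games_needed(elo_diff=5, draw_ratio=0.33)
print(n)
```

With these assumptions the answer is on the order of 12,000 games per version, compared with the 460 games v0.6.96 has on the 40/40 list so far.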

Most of the v0.6.66 games are mine; at the moment all of the v0.6.96 games are Graham's, but I'm running a fill-up test to bring the number closer to that of v0.6.66 for a higher-LOS comparison. The results should be in by Saturday night, then we'll see. Frankly, I don't expect v0.6.96 to do better than get level with the previous version, but seeing the difference on the blitz list sparked some hope.
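
For a feel of what the current 40/40 numbers say about LOS (likelihood of superiority), here is a simplified sketch. It assumes the +/- margins in the list are 95% intervals and treats the two ratings as independent normal estimates. BayesElo itself works from the full covariance of the ratings, so this is only an approximation, and `prob_superior` is a hypothetical helper name.

```python
import math

def prob_superior(elo_a, margin_a, elo_b, margin_b, z=1.96):
    """Approximate probability that A's true rating exceeds B's.
    Treats each +/- margin as a z-sigma interval of an independent normal error."""
    sd_a = margin_a / z
    sd_b = margin_b / z
    sd_diff = math.hypot(sd_a, sd_b)  # standard deviation of the rating difference
    z_score = (elo_a - elo_b) / sd_diff
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1.0 + math.erf(z_score / math.sqrt(2.0)))

# Ratings and margins from the CCRL 40/40 list earlier in the thread:
# v0.6.96 at 2474 +/-27, v0.6.66 at 2490 +/-17.
p = prob_superior(2474, 27, 2490, 17)
print(f"{p:.2f}")
```

Under these assumptions the chance that v0.6.96 is really the stronger version on the 40/40 list comes out at roughly 0.16, so the 16 Elo deficit is well within the noise. More games for v0.6.96 would narrow its margin and move this figure further from 0.5 in one direction or the other.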

IMO, bullet TCs are fine for no-regression testing but to find tangible gains at LTC you'd need to slow down your TC by a factor of 20 at least.
Tirsa Poppins
CCRL
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Jumbo 0.6.96 64-bit Gauntlet for CCRL 40/40

Post by Michael Sherwin »

Or maybe on the gauntlet machine they don't serve peanuts. :P
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Graham Banks
Posts: 41455
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Jumbo 0.6.96 64-bit Gauntlet for CCRL 40/40

Post by Graham Banks »

Michael Sherwin wrote: Fri Apr 05, 2019 5:46 am Or maybe on the gauntlet machine they don't serve peanuts. :P
:lol:
gbanksnz at gmail.com
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Jumbo 0.6.96 64-bit Gauntlet for CCRL 40/40

Post by Sven »

tpoppins wrote: Fri Apr 05, 2019 2:46 am I don't think you can expect LTC results to mirror, or even closely follow, those obtained in hyper-bullet tests, Sven. Sometimes they do, other times they don't.
Of course I agree.

tpoppins wrote: If you're looking for a confirmation of your results our blitz list would be a better place to look. Currently v0.6.96 is +20 Elo over v0.6.66 there [...]
You are right, but even there the noticeable improvement of v0.6.66 over previous versions is invisible (see below).

tpoppins wrote: All of my hardware supports POPCNT. AFAIK, one of Graham's boxes doesn't. Taken by itself this factor is largely irrelevant: the speedup offered by POPCNT (~5%) translates to about 5 Elo, a difference that will never show on the 40/40 list due to the number of games required to confirm it.
My point was not about POPCNT. In 0.6.66 I introduced a compiler optimization that has no effect on pre-SSE4.2 hardware but should improve playing strength significantly at all time controls on modern hardware.

tpoppins wrote: IMO, bullet TCs are fine for no-regression testing but to find tangible gains at LTC you'd need to slow down your TC by a factor of 20 at least.
Certainly correct but there are some kinds of changes for which we know that they help for all time controls. Among these are, for instance:
- certain compile improvements
- fixing some severe bugs
- some kinds of evaluation improvements that do not slow down the engine (e.g. parameter tuning)

Anyway, I was just curious about the hardware that was used. You explained it, and it's ok for me, no problem at all - considering the very low strength level of Jumbo and also the current error bars the whole issue is certainly almost irrelevant. Thanks for your comments and your big efforts!
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Jumbo 0.6.96 64-bit Gauntlet for CCRL 40/40

Post by Michael Sherwin »

I have started 4 matches with Jumbo and RomiX.

First match is Jumbo 66 bb versus RomiXNoRL at 2+6 using the nooman 3 move 500 pgn 1000 games.

Second match is Jumbo 66 bb versus RomiXNoRL at 40/10 using the Sherwin50.pgn 100 games.

Third match is Jumbo 96 bb versus RomiXRL at 2+6 using Sherwin50.pgn 1000 games.

Fourth match is Jumbo 96 bb versus RomiXRL at 40/10 using nooman 3 move 500 pgn 1000 games, random with no repeated positions.

I hope I set everything up correctly because I got a little confused. Test 4 will take a while. I'm doing the learning test for myself. But the test with no learning is for Sven. So Sven if you want me to add one more test I can since this is running on a 6 core processor. Just let me know.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Jumbo 0.6.96 64-bit Gauntlet for CCRL 40/40

Post by Sven »

Michael Sherwin wrote: Fri Apr 05, 2019 9:47 am So Sven if you want me to add one more test I can since this is running on a 6 core processor. Just let me know.
Sounds interesting, thanks for your tests! If you actually have some spare resources left for another test, then I would be interested in something that would allow comparing Jumbo 0.6.96 to 0.6.66 under identical conditions, e.g. "Jumbo 96 bb versus RomiXNoRL at 2+6 using the nooman 3 move 500 pgn 1000 games" (that is your test 1 but with 96).
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)