Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Post by tpoppins »

Games: PGN
Test setup: details

Code:

CCRL 40/40 Rating List - Custom engine selection
816025 games played by 2140 programs, run by 21 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz), 
about 15 minutes on a modern Intel CPU.
Computed on April 5, 2018 with Bayeselo based on 816'025 games
Tested by CCRL team, 2005-2018, http://computerchess.org.uk/ccrl/4040/

              Engine                  Elo   +    -   Score  AvOp  Games
  Dorpsgek Dillinger 64-bit          2202  +21  -21  49.2%   +4.7   790
  Dorpsgek Eves-Temptation 64-bit    2200  +26  -26  51.5%  -10.8   525
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Post by ZirconiumX »

This is strange. The patches all tested as improvements, but the net result is a slight regression, even with the bugfix I found.
Some believe in the almighty dollar.

I believe in the almighty printf statement.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Post by Ras »

ZirconiumX wrote:the net result is a slight regression
With these error margins, it looks like noise.
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Post by ZirconiumX »

But the expected improvement is outside the noise, which is what I'm worried about.
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Post by CMCanavessi »

ZirconiumX wrote:But the expected improvement is outside the noise, which is what I'm worried about.
I've just started a gauntlet of Dillinger vs. 16 other engines of very similar strength (8 rounds, 128 games in total). Once it's done I'll run the same gauntlet with Eve's Temptation and report back the results.
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Post by tpoppins »

ZirconiumX wrote:But the expected improvement is outside the noise, which is what I'm worried about.
The expected improvement (+45) is within the error margins (47) in this case. The older version's margins must also be taken into account, not just those of the new version.
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Post by tpoppins »

CMCanavessi wrote:
ZirconiumX wrote:But the expected improvement is outside the noise, which is what I'm worried about.
I've just started a gauntlet of Dillinger vs. 16 other engines of very similar strength (8 rounds, 128 games in total). Once it's done I'll run the same gauntlet with Eve's Temptation and report back the results.
The combined error margins will be close to 100.
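
For a sense of scale, here is a rough sketch of how a 95% Elo margin for a single 128-game gauntlet could be estimated from its win/draw/loss counts. The function names are hypothetical, the 51/27/50 split is only an illustrative breakdown consistent with a 64.5/128 score, and the estimate ignores the uncertainty in the opponents' own ratings, so it is not how Bayeselo or any rating tool in this thread computes its margins.

```python
import math

def elo_from_score(s: float) -> float:
    """Elo difference implied by an expected score s (0 < s < 1), logistic model."""
    return -400.0 * math.log10(1.0 / s - 1.0)

def elo_margin_95(wins: int, draws: int, losses: int) -> float:
    """Approximate half-width of a 95% interval on performance, in Elo."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    lo = elo_from_score(s - 1.96 * se)
    hi = elo_from_score(s + 1.96 * se)
    return (hi - lo) / 2.0

# Illustrative split only: 51 wins, 27 draws, 50 losses gives 64.5/128.
margin = elo_margin_95(51, 27, 50)
print(f"+/- {margin:.0f} Elo")
```

Under these assumptions each 128-game run carries a margin in the region of 50 Elo on its own, so two such runs added together land near the figure quoted above.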
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Post by tpoppins »

tpoppins wrote:

Code:

CCRL 40/40 Rating List - Custom engine selection
816025 games played by 2140 programs, run by 21 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz), 
about 15 minutes on a modern Intel CPU.
Computed on April 5, 2018 with Bayeselo based on 816'025 games
Tested by CCRL team, 2005-2018, http://computerchess.org.uk/ccrl/4040/

              Engine                  Elo   +    -   Score  AvOp  Games
  Dorpsgek Dillinger 64-bit          2202  +21  -21  49.2%   +4.7   790
  Dorpsgek Eves-Temptation 64-bit    2200  +26  -26  51.5%  -10.8   525
This table doesn't show the LOS (likelihood of superiority; the corresponding CCRL page does show it), which is a mere 53.9%. So with the current number of games for each version it's impossible to tell which one is stronger.
I intend to rectify this situation by the next update.
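
As a rough cross-check of that LOS figure, the sketch below approximates it from the two table rows alone. It assumes the +/- columns are 95% intervals, that both ratings are normally distributed, and that their errors are independent. Bayeselo works from the full covariance of the rating pool, so this will not reproduce 53.9% exactly; `approx_los` is a hypothetical helper, not part of any tool mentioned here.

```python
import math

def approx_los(elo_a: float, margin_a: float,
               elo_b: float, margin_b: float,
               z95: float = 1.96) -> float:
    """Approximate probability that engine A is stronger than engine B."""
    # Convert the assumed 95% margins back to standard deviations.
    sigma_a = margin_a / z95
    sigma_b = margin_b / z95
    # Standard deviation of the difference, assuming independent errors.
    sigma_diff = math.hypot(sigma_a, sigma_b)
    z = (elo_a - elo_b) / sigma_diff
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Dillinger (2202 +/-21) vs. Eves-Temptation (2200 +/-26) from the table.
los = approx_los(2202, 21, 2200, 26)
print(f"{los:.1%}")
```

With a 2 Elo gap and margins this wide, the approximation comes out at roughly 55%, barely better than a coin flip, which agrees with the conclusion above.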
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Post by CMCanavessi »

This is what I've gotten so far:

Dillinger:

Code:

    Engine                    Score          Do
01: Dorpsgek Dillinger x64    64.5/128 ········ 
02: Galjoen 0.38 x64          6.5/8    1111===1 
03: Clarabit 1.00 x64         6.0/8    01101111 
03: CPW-Engine 1.1 x64        6.0/8    10111101 
05: Galjoen 0.37.2.1 x64      5.5/8    011101=1 
06: CT800 V1.12 x64           4.5/8    110011=0 
06: Isa 2.0.64 x64            4.5/8    =100==11 
06: GopherCheck 0.2.3 x64     4.5/8    10===101 
06: AdroitChess 0.4 x32       4.5/8    0=1==011 
10: CT800 V1.20 x64           4.0/8    00011011 
11: Isa 2.0.61 x64            3.5/8    0=10=1=0 
11: Zevra v1.8.4 r650 x64     3.5/8    =0011001 
13: Prophet v3.0 20170909 x64 3.0/8    =1=000== 
14: Parrot 070116 x32         2.5/8    01=000== 
15: Dumb 1.1 x64              2.0/8    010000== 
15: Monarch 1.7 x32           2.0/8    01100000 
17: Gunborg 1.35 x64          1.0/8    00000010 

128 games played / Tournament finished
Name of the tournament: 042 - Dorpsgek Dillinger Gauntlet

Eve's Temptation:

Code:

    Engine                        Score          Do
01: Dorpsgek Eve's Temptation x64 67.5/128 ········ 
02: Galjoen 0.37.2.1 x64          6.0/8    111=011= 
03: Galjoen 0.38 x64              5.0/8    1011=01= 
04: AdroitChess 0.4 x32           4.5/8    0111=100 
04: Prophet v3.0 20170909 x64     4.5/8    0101=011 
04: CT800 V1.20 x64               4.5/8    0=011110 
04: Isa 2.0.61 x64                4.5/8    110=01== 
08: Clarabit 1.00 x64             4.0/8    =1100=10 
08: CT800 V1.12 x64               4.0/8    =101010= 
08: Isa 2.0.64 x64                4.0/8    =01===01 
08: Zevra v1.8.4 r650 x64         4.0/8    =10=0011 
08: CPW-Engine 1.1 x64            4.0/8    =11000=1 
13: Parrot 070116 x32             3.5/8    11100=00 
14: Monarch 1.7 x32               2.5/8    =1==0000 
14: GopherCheck 0.2.3 x64         2.5/8    00=10001 
14: Gunborg 1.35 x64              2.5/8    1=0==000 
17: Dumb 1.1 x64                  0.5/8    0=000000 

128 games played / Tournament finished

And the current ratings (Dillinger has played many more games):


Code:

 137 Dumb 1.0 x64                           :  2226.3     126   44   15   67    41    12  2324.3    63    63.0
 138 Isa 2.0.61 x64                         :  2213.1     142   46   26   70    42    18  2308.5    64    58.6
 139 Dorpsgek Eve's Temptation x64          :  2194.7     128   52   31   45    53    24  2175.6    16    16.0
 140 Galjoen 0.37.2.1 x64                   :  2185.1     150   47   22   81    39    15  2311.7    65    58.1
 141 Zevra v1.8.4 r650 x64                  :  2182.8      85   15   10   60    24    12  2486.6    63    45.5
 142 GopherCheck 0.2.3 x64                  :  2181.6     211   48   39  124    32    18  2389.4   100    84.4
 143 CT800 V1.20 x64                        :  2170.1      85   16    7   62    23     8  2478.3    63    45.5
 144 Dorpsgek Dillinger x64                 :  2169.4     323   97   48  178    37    15  2315.3   104    75.4
 145 CT800 V1.12 x64                        :  2163.9     142   46   15   81    38    11  2309.2    64    58.6
 146 Dumb 1.1 x64                           :  2163.0      77   12   10   55    22    13  2493.8    62    48.0
 147 Clarabit 1.00 x64                      :  2159.5      92   48   14   30    60    15  2087.8    21    20.4
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Post by Sven »

tpoppins wrote:
ZirconiumX wrote:But the expected improvement is outside the noise, which is what I'm worried about.
The expected improvement (+45) is within the error margins (47) in this case. The older version's margins must also be taken into account, not just those of the new version.
This is not quite correct. The error margin for the rating difference R2-R1, when given error margins E1 and E2, is sqrt(E1^2 + E2^2), so in the given case it would be around 33. For the special case E1=E2 you would get E(diff)=sqrt(2) * E1. So the statement that the expected improvement was outside the error margin was in fact correct.
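
The arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes the two ratings' errors are independent, which is the condition under which the root-sum-of-squares rule holds; `diff_margin` is a hypothetical helper name.

```python
import math

def diff_margin(e1: float, e2: float) -> float:
    """Error margin of a rating difference: sqrt(E1^2 + E2^2)."""
    return math.hypot(e1, e2)

# Margins from the CCRL table: Dillinger +/-21, Eves-Temptation +/-26.
combined = diff_margin(21, 26)
print(round(combined, 1))   # 33.4
print(21 + 26)              # 47, the plain sum used earlier in the thread

# Special case E1 == E2: the margin is sqrt(2) * E1.
print(math.isclose(diff_margin(21, 21), math.sqrt(2) * 21))  # True
```

Adding the margins directly gives 47, which puts a 45 Elo gain just inside the noise, whereas combining them in quadrature gives about 33 and puts it outside.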