Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Posted: Sat Apr 07, 2018 10:56 am
by tpoppins

Games: PGN
Test setup: details

CCRL 40/40 Rating List - Custom engine selection
816025 games played by 2140 programs, run by 21 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz), 
about 15 minutes on a modern Intel CPU.
Computed on April 5, 2018 with Bayeselo based on 816'025 games
Tested by CCRL team, 2005-2018, http://computerchess.org.uk/ccrl/4040/

              Engine                  Elo   +    -   Score  AvOp  Games
  Dorpsgek Dillinger 64-bit          2202  +21  -21  49.2%   +4.7   790
  Dorpsgek Eves-Temptation 64-bit    2200  +26  -26  51.5%  -10.8   525

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Posted: Sat Apr 07, 2018 1:06 pm
by ZirconiumX
This is strange. The patches all tested as improvements, but the net result is a slight regression, even with the bugfix I found.

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Posted: Sat Apr 07, 2018 1:59 pm
by Ras
ZirconiumX wrote:the net result is a slight regression
With these error margins, it looks like noise.

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Posted: Sat Apr 07, 2018 8:14 pm
by ZirconiumX
But the expected improvement is outside the noise, which is what I'm worried about.

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Posted: Sat Apr 07, 2018 8:38 pm
by CMCanavessi
ZirconiumX wrote:But the expected improvement is outside the noise, which is what I'm worried about.
I've just started a gauntlet of Dillinger vs. 16 other engines of very similar strength (8 rounds, 128 games in total). After it's done I'll run the same with Eve's Temptation and report back the results.

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Posted: Sun Apr 08, 2018 7:13 am
by tpoppins
ZirconiumX wrote:But the expected improvement is outside the noise, which is what I'm worried about.
The expected improvement (+45) is within the error margins (47) in this case. The older version's margins must also be taken into account, not just those of the new version.

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Posted: Sun Apr 08, 2018 7:29 am
by tpoppins
CMCanavessi wrote:
ZirconiumX wrote:But the expected improvement is outside the noise, which is what I'm worried about.
I've just started a gauntlet of Dillinger vs. 16 other engines of very similar strength (8 rounds, 128 games in total). After it's done I'll run the same with Eve's Temptation and report back the results.
The combined error margins will be close to 100.
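For readers wondering where a figure of that size comes from, here is a rough sketch of the error margin of a single 128-game gauntlet. It assumes independent games, the logistic Elo curve and a normal approximation at 95% confidence; the function name and the draw_ratio parameter are made up for illustration, and this is not how Bayeselo computes its margins.

```python
import math

def elo_margin_95(games, score=0.5, draw_ratio=0.0):
    """Approximate 95% Elo error margin for one match (illustrative helper)."""
    # Per-game result is 1, 0.5 or 0. With win probability w and draw
    # probability d, the mean is w + d/2 and the variance is w + d/4 - mean^2.
    wins = score - draw_ratio / 2.0
    variance = wins + draw_ratio / 4.0 - score ** 2
    std_err = math.sqrt(variance / games)
    # Slope of the logistic Elo curve at this score (Elo per unit of score).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * std_err * slope

# One 128-game gauntlet near a 50% score; assumption: no draws (worst case).
print(round(elo_margin_95(128), 1))
```

Under these assumptions each gauntlet carries a margin of several tens of Elo, so two of them together reach the order of magnitude mentioned above. A non-zero draw_ratio shrinks the margin somewhat.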

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Posted: Sun Apr 08, 2018 7:48 am
by tpoppins
tpoppins wrote:

CCRL 40/40 Rating List - Custom engine selection
816025 games played by 2140 programs, run by 21 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz), 
about 15 minutes on a modern Intel CPU.
Computed on April 5, 2018 with Bayeselo based on 816'025 games
Tested by CCRL team, 2005-2018, http://computerchess.org.uk/ccrl/4040/

              Engine                  Elo   +    -   Score  AvOp  Games
  Dorpsgek Dillinger 64-bit          2202  +21  -21  49.2%   +4.7   790
  Dorpsgek Eves-Temptation 64-bit    2200  +26  -26  51.5%  -10.8   525
This table doesn't show the LOS (the corresponding CCRL page does), which is a mere 53.9%. Thus with the current number of games for each version it's impossible to tell which one is stronger.
I intend to rectify this situation by the next update.
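As a rough cross-check of that LOS figure, the sketch below approximates the likelihood of superiority from the two ratings and their margins alone. It assumes the listed margins are 95% intervals and that the two estimates are independent and normally distributed; Bayeselo works from the full covariance, so its number will differ somewhat. The function name is hypothetical.

```python
import math

def los_from_margins(elo_a, margin_a, elo_b, margin_b, z=1.96):
    """Approximate probability that A is stronger than B (illustrative helper)."""
    # Convert the assumed 95% margins back to standard deviations.
    sigma_a = margin_a / z
    sigma_b = margin_b / z
    # Standard deviation of the difference of two independent estimates.
    sigma_diff = math.hypot(sigma_a, sigma_b)
    # Normal CDF evaluated at the standardized rating difference.
    x = (elo_a - elo_b) / sigma_diff
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Dillinger (2202 +/- 21) vs. Eve's Temptation (2200 +/- 26), from the table.
print(los_from_margins(2202, 21, 2200, 26))
```

With the table's values this lands at roughly 55%, in the same neighbourhood as the 53.9% shown on the CCRL page and just as inconclusive.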

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Posted: Sun Apr 08, 2018 4:45 pm
by CMCanavessi
This is what I've gotten so far:

Dillinger:

    Engine                    Score          Do
01: Dorpsgek Dillinger x64    64.5/128 ········ 
02: Galjoen 0.38 x64          6.5/8    1111===1 
03: Clarabit 1.00 x64         6.0/8    01101111 
03: CPW-Engine 1.1 x64        6.0/8    10111101 
05: Galjoen 0.37.2.1 x64      5.5/8    011101=1 
06: CT800 V1.12 x64           4.5/8    110011=0 
06: Isa 2.0.64 x64            4.5/8    =100==11 
06: GopherCheck 0.2.3 x64     4.5/8    10===101 
06: AdroitChess 0.4 x32       4.5/8    0=1==011 
10: CT800 V1.20 x64           4.0/8    00011011 
11: Isa 2.0.61 x64            3.5/8    0=10=1=0 
11: Zevra v1.8.4 r650 x64     3.5/8    =0011001 
13: Prophet v3.0 20170909 x64 3.0/8    =1=000== 
14: Parrot 070116 x32         2.5/8    01=000== 
15: Dumb 1.1 x64              2.0/8    010000== 
15: Monarch 1.7 x32           2.0/8    01100000 
17: Gunborg 1.35 x64          1.0/8    00000010 

128 games played / Tournament finished
Name of the tournament: 042 - Dorpsgek Dillinger Gauntlet

Eve's Temptation:

    Engine                        Score          Do
01: Dorpsgek Eve's Temptation x64 67.5/128 ········ 
02: Galjoen 0.37.2.1 x64          6.0/8    111=011= 
03: Galjoen 0.38 x64              5.0/8    1011=01= 
04: AdroitChess 0.4 x32           4.5/8    0111=100 
04: Prophet v3.0 20170909 x64     4.5/8    0101=011 
04: CT800 V1.20 x64               4.5/8    0=011110 
04: Isa 2.0.61 x64                4.5/8    110=01== 
08: Clarabit 1.00 x64             4.0/8    =1100=10 
08: CT800 V1.12 x64               4.0/8    =101010= 
08: Isa 2.0.64 x64                4.0/8    =01===01 
08: Zevra v1.8.4 r650 x64         4.0/8    =10=0011 
08: CPW-Engine 1.1 x64            4.0/8    =11000=1 
13: Parrot 070116 x32             3.5/8    11100=00 
14: Monarch 1.7 x32               2.5/8    =1==0000 
14: GopherCheck 0.2.3 x64         2.5/8    00=10001 
14: Gunborg 1.35 x64              2.5/8    1=0==000 
17: Dumb 1.1 x64                  0.5/8    0=000000 

128 games played / Tournament finished

And the current ratings (Dillinger has played many more games):


 137 Dumb 1.0 x64                           :  2226.3     126   44   15   67    41    12  2324.3    63    63.0
 138 Isa 2.0.61 x64                         :  2213.1     142   46   26   70    42    18  2308.5    64    58.6
 139 Dorpsgek Eve's Temptation x64          :  2194.7     128   52   31   45    53    24  2175.6    16    16.0
 140 Galjoen 0.37.2.1 x64                   :  2185.1     150   47   22   81    39    15  2311.7    65    58.1
 141 Zevra v1.8.4 r650 x64                  :  2182.8      85   15   10   60    24    12  2486.6    63    45.5
 142 GopherCheck 0.2.3 x64                  :  2181.6     211   48   39  124    32    18  2389.4   100    84.4
 143 CT800 V1.20 x64                        :  2170.1      85   16    7   62    23     8  2478.3    63    45.5
 144 Dorpsgek Dillinger x64                 :  2169.4     323   97   48  178    37    15  2315.3   104    75.4
 145 CT800 V1.12 x64                        :  2163.9     142   46   15   81    38    11  2309.2    64    58.6
 146 Dumb 1.1 x64                           :  2163.0      77   12   10   55    22    13  2493.8    62    48.0
 147 Clarabit 1.00 x64                      :  2159.5      92   48   14   30    60    15  2087.8    21    20.4
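To put the two gauntlet scores above (64.5/128 and 67.5/128) on a common scale, they can be converted to an Elo performance relative to the field with the standard logistic formula. This is only a sketch: it treats the 16 opponents as a single opponent of average strength, and the function name is made up for illustration.

```python
import math

def score_to_elo(points, games):
    """Elo difference implied by a match score, logistic model (illustrative)."""
    p = points / games
    # Inverse of the expected-score formula p = 1 / (1 + 10^(-d/400)).
    return 400.0 * math.log10(p / (1.0 - p))

dillinger = score_to_elo(64.5, 128)   # Dillinger gauntlet score
eves = score_to_elo(67.5, 128)        # Eve's Temptation gauntlet score
print(round(dillinger, 1), round(eves, 1), round(eves - dillinger, 1))
```

The three-point gap between the gauntlets works out to a difference of well under 20 Elo, which is small compared with the margins expected from 128 games per version.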

Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40

Posted: Sun Apr 08, 2018 7:36 pm
by Sven
tpoppins wrote:
ZirconiumX wrote:But the expected improvement is outside the noise, which is what I'm worried about.
The expected improvement (+45) is within the error margins (47) in this case. The older version's margins must also be taken into account, not just those of the new version.
This is not quite correct. The error margin for the rating difference R2-R1, when given error margins E1 and E2, is sqrt(E1^2 + E2^2), so in the given case it would be around 33. For the special case E1=E2 you would get E(diff)=sqrt(2) * E1. So the statement that the expected improvement was outside the error margin was in fact correct.
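The two ways of combining the margins discussed in this thread can be compared with a few lines of Python. This is a minimal sketch of the formula quoted above, assuming the two rating estimates are independent; the function name is hypothetical.

```python
import math

def combined_margin(e1, e2):
    """Error margin of a rating difference, margins added in quadrature."""
    return math.hypot(e1, e2)  # sqrt(e1^2 + e2^2)

e_old, e_new = 21, 26                  # margins from the CCRL table
print(e_old + e_new)                   # plain sum of the margins: 47
print(round(combined_margin(e_old, e_new), 1))  # quadrature: about 33
print(round(combined_margin(21, 21) / 21, 3))   # equal margins: factor sqrt(2)
```

With the table's margins the quadrature value is about 33, so an expected gain of +45 lies outside it, while the plain sum of 47 would put it inside.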