Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40
Posted: Sat Apr 07, 2018 10:56 am
by tpoppins
Games:
PGN
Test setup:
details
Code:
CCRL 40/40 Rating List - Custom engine selection
816025 games played by 2140 programs, run by 21 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz),
about 15 minutes on a modern Intel CPU.
Computed on April 5, 2018 with Bayeselo based on 816'025 games
Tested by CCRL team, 2005-2018, http://computerchess.org.uk/ccrl/4040/
Engine                            Elo    +    -  Score   AvOp  Games
Dorpsgek Dillinger 64-bit        2202  +21  -21  49.2%   +4.7    790
Dorpsgek Eves-Temptation 64-bit  2200  +26  -26  51.5%  -10.8    525
Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40
Posted: Sat Apr 07, 2018 1:06 pm
by ZirconiumX
This is strange. The patches all tested as improvements, but the net result is a slight regression, even with the bugfix I found.
Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40
Posted: Sat Apr 07, 2018 1:59 pm
by Ras
ZirconiumX wrote:the net result is a slight regression
With these error margins, it looks like noise.
Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40
Posted: Sat Apr 07, 2018 8:14 pm
by ZirconiumX
But the expected improvement is outside the noise, which is what I'm worried about.
Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40
Posted: Sat Apr 07, 2018 8:38 pm
by CMCanavessi
ZirconiumX wrote:But the expected improvement is outside the noise, which is what I'm worried about.
I've just started a gauntlet of Dillinger vs. 16 other engines of very similar strength (8 rounds, 128 games in total). After it's done I'll run the same with Eve's Temptation and report back the results.
Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40
Posted: Sun Apr 08, 2018 7:13 am
by tpoppins
ZirconiumX wrote:But the expected improvement is outside the noise, which is what I'm worried about.
The expected improvement (+45) is within the error margins (47) in this case. The older version's margins must also be taken into account, not just those of the new version.
Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40
Posted: Sun Apr 08, 2018 7:29 am
by tpoppins
CMCanavessi wrote:ZirconiumX wrote:But the expected improvement is outside the noise, which is what I'm worried about.
I've just started a gauntlet of Dillinger vs. 16 other engines of very similar strength (8 rounds, 128 games in total). After it's done I'll run the same with Eve's Temptation and report back the results.
The combined error margins will be close to 100.
Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40
Posted: Sun Apr 08, 2018 7:48 am
by tpoppins
tpoppins wrote:Code:
CCRL 40/40 Rating List - Custom engine selection
816025 games played by 2140 programs, run by 21 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz),
about 15 minutes on a modern Intel CPU.
Computed on April 5, 2018 with Bayeselo based on 816'025 games
Tested by CCRL team, 2005-2018, http://computerchess.org.uk/ccrl/4040/
Engine                            Elo    +    -  Score   AvOp  Games
Dorpsgek Dillinger 64-bit        2202  +21  -21  49.2%   +4.7    790
Dorpsgek Eves-Temptation 64-bit  2200  +26  -26  51.5%  -10.8    525
This table doesn't show the LOS (likelihood of superiority), which the corresponding CCRL page does; it is a mere 53.9%. Thus with the current number of games for each version it's impossible to tell which one is stronger.
I intend to rectify this situation by the next update.
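As a rough illustration of why a 2-point gap with margins of this size gives an LOS barely above 50%, here is a minimal sketch of a normal-approximation estimate. It is not how Bayeselo computes LOS (Bayeselo works from its full model, not from the published margins). The function name `los_estimate` is hypothetical, and the sketch assumes that the +/- columns are 95% intervals (z = 1.96) and that the two ratings' errors are independent. Under those assumptions the result lands in the same mid-50s range as the 53.9% quoted above but is not expected to match it exactly.

```python
import math

def los_estimate(elo_a, margin_a, elo_b, margin_b, z=1.96):
    """Approximate probability that A is stronger than B.

    Assumptions (not from the thread): margins are 95% intervals,
    errors are independent and normally distributed.
    """
    sigma_a = margin_a / z
    sigma_b = margin_b / z
    sigma_diff = math.hypot(sigma_a, sigma_b)  # std. dev. of the difference
    x = (elo_a - elo_b) / sigma_diff
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # normal CDF

# Figures from the CCRL table: Dillinger 2202 +/-21, Eve's Temptation 2200 +/-26.
p = los_estimate(2202, 21, 2200, 26)
print(f"approximate LOS of Dillinger over Eve's Temptation: {p:.1%}")
```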
Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40
Posted: Sun Apr 08, 2018 4:45 pm
by CMCanavessi
This is what I've gotten so far:
Dillinger:
Code:
Engine Score Do
01: Dorpsgek Dillinger x64 64.5/128 ········
02: Galjoen 0.38 x64 6.5/8 1111===1
03: Clarabit 1.00 x64 6.0/8 01101111
03: CPW-Engine 1.1 x64 6.0/8 10111101
05: Galjoen 0.37.2.1 x64 5.5/8 011101=1
06: CT800 V1.12 x64 4.5/8 110011=0
06: Isa 2.0.64 x64 4.5/8 =100==11
06: GopherCheck 0.2.3 x64 4.5/8 10===101
06: AdroitChess 0.4 x32 4.5/8 0=1==011
10: CT800 V1.20 x64 4.0/8 00011011
11: Isa 2.0.61 x64 3.5/8 0=10=1=0
11: Zevra v1.8.4 r650 x64 3.5/8 =0011001
13: Prophet v3.0 20170909 x64 3.0/8 =1=000==
14: Parrot 070116 x32 2.5/8 01=000==
15: Dumb 1.1 x64 2.0/8 010000==
15: Monarch 1.7 x32 2.0/8 01100000
17: Gunborg 1.35 x64 1.0/8 00000010
128 games played / Tournament finished
Name of the tournament: 042 - Dorpsgek Dillinger Gauntlet
Eve's Temptation:
Code:
Engine Score Do
01: Dorpsgek Eve's Temptation x64 67.5/128 ········
02: Galjoen 0.37.2.1 x64 6.0/8 111=011=
03: Galjoen 0.38 x64 5.0/8 1011=01=
04: AdroitChess 0.4 x32 4.5/8 0111=100
04: Prophet v3.0 20170909 x64 4.5/8 0101=011
04: CT800 V1.20 x64 4.5/8 0=011110
04: Isa 2.0.61 x64 4.5/8 110=01==
08: Clarabit 1.00 x64 4.0/8 =1100=10
08: CT800 V1.12 x64 4.0/8 =101010=
08: Isa 2.0.64 x64 4.0/8 =01===01
08: Zevra v1.8.4 r650 x64 4.0/8 =10=0011
08: CPW-Engine 1.1 x64 4.0/8 =11000=1
13: Parrot 070116 x32 3.5/8 11100=00
14: Monarch 1.7 x32 2.5/8 =1==0000
14: GopherCheck 0.2.3 x64 2.5/8 00=10001
14: Gunborg 1.35 x64 2.5/8 1=0==000
17: Dumb 1.1 x64 0.5/8 0=000000
128 games played / Tournament finished
And the current ratings (Dillinger has played many more games):
Code:
137 Dumb 1.0 x64 : 2226.3 126 44 15 67 41 12 2324.3 63 63.0
138 Isa 2.0.61 x64 : 2213.1 142 46 26 70 42 18 2308.5 64 58.6
139 Dorpsgek Eve's Temptation x64 : 2194.7 128 52 31 45 53 24 2175.6 16 16.0
140 Galjoen 0.37.2.1 x64 : 2185.1 150 47 22 81 39 15 2311.7 65 58.1
141 Zevra v1.8.4 r650 x64 : 2182.8 85 15 10 60 24 12 2486.6 63 45.5
142 GopherCheck 0.2.3 x64 : 2181.6 211 48 39 124 32 18 2389.4 100 84.4
143 CT800 V1.20 x64 : 2170.1 85 16 7 62 23 8 2478.3 63 45.5
144 Dorpsgek Dillinger x64 : 2169.4 323 97 48 178 37 15 2315.3 104 75.4
145 CT800 V1.12 x64 : 2163.9 142 46 15 81 38 11 2309.2 64 58.6
146 Dumb 1.1 x64 : 2163.0 77 12 10 55 22 13 2493.8 62 48.0
147 Clarabit 1.00 x64 : 2159.5 92 48 14 30 60 15 2087.8 21 20.4
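For reference, the two gauntlet totals above (64.5/128 and 67.5/128 against the same 16 opponents) can be turned into a rough Elo performance gap with the standard logistic formula. This is only a sketch: the helper name `elo_from_score` is hypothetical, and treating the whole field as a single average opponent is a simplification, not the method used by the rating tool that produced the list above.

```python
import math

def elo_from_score(points: float, games: int) -> float:
    """Elo difference relative to the field implied by a score fraction.

    Uses the logistic Elo model; the field is treated as one average
    opponent, which is a simplifying assumption.
    """
    p = points / games
    return -400.0 * math.log10(1.0 / p - 1.0)

# Totals taken from the two gauntlet tables.
dillinger = elo_from_score(64.5, 128)
eves = elo_from_score(67.5, 128)

print(f"Dillinger vs field:        {dillinger:+.1f}")
print(f"Eve's Temptation vs field: {eves:+.1f}")
print(f"gap between versions:      {eves - dillinger:+.1f}")
```

With only 128 games per version, a gap of this size is far smaller than the error margins discussed earlier in the thread, so it says little about which version is stronger.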
Re: Dorpsgek Eve's Temptation 64-bit Gauntlet for CCRL 40/40
Posted: Sun Apr 08, 2018 7:36 pm
by Sven
tpoppins wrote:ZirconiumX wrote:But the expected improvement is outside the noise, which is what I'm worried about.
The expected improvement (+45) is within the error margins (47) in this case. The older version's margins must also be taken into account, not just those of the new version.
This is not quite correct. The error margin for the rating difference R2-R1, when given error margins E1 and E2, is sqrt(E1^2 + E2^2), so in the given case it would be around 33. For the special case E1=E2 you would get E(diff)=sqrt(2) * E1. So the statement that the expected improvement was outside the error margin was in fact correct.
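The arithmetic in this post can be checked with a few lines of Python. This is a minimal sketch: the function name `combined_margin` is hypothetical, and the quadrature formula assumes the two versions' rating errors are independent.

```python
import math

def combined_margin(e1: float, e2: float) -> float:
    """Error margin of a rating difference: sqrt(e1^2 + e2^2).

    Assumes the two error margins are independent.
    """
    return math.hypot(e1, e2)

# Margins from the CCRL table: Dillinger +/-21, Eve's Temptation +/-26.
quadrature = combined_margin(21, 26)
linear_sum = 21 + 26      # the 47 quoted earlier in the thread
expected_gain = 45        # expected improvement quoted earlier in the thread

print(f"quadrature margin: {quadrature:.1f}")
print(f"linear sum:        {linear_sum}")
print(f"gain exceeds quadrature margin: {expected_gain > quadrature}")
print(f"gain exceeds linear sum:        {expected_gain > linear_sum}")
```

The two ways of combining the margins lead to opposite conclusions here: 45 is above the quadrature value but below the simple sum.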