Page 2 of 4
Re: Automated tuning... finally... (Topple v0.3.0)
Posted: Thu Jan 10, 2019 2:33 pm
by CMCanavessi
konsolas wrote: ↑Tue Jan 08, 2019 10:03 pm
It's been a long while since I last updated Topple, but I've finally rewritten the evaluation and implemented Texel tuning with a simple linear search: self play shows an elo gain of about +150.
The release can be found here:
https://github.com/konsolas/ToppleChess ... tag/v0.3.0
I've provided 3 builds for Windows, but it should be easy to build from source on Linux and macOS with CMake (you may need to remove the -static flag for clang to work on macOS).
It would be awesome to find out how the new Topple performs against other engines
I don't know what the others will get, but this might well be a case of "self-play elo increase doesn't translate to real elo increase".
I ran two gauntlets, one with Topple v0.2.1 and one with Topple v0.3.0, against the same opponents over the same number of games, and the results were almost identical (v0.2.1 even scored slightly more points, well within the error margins)
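The Texel tuning with a simple linear search that konsolas describes in the quoted announcement can be sketched roughly as follows. This is a minimal illustration, not Topple's actual code; the function names and the scaling constant k = 1.13 are made up for the example (in practice k is fitted to the training data):

```python
def sigmoid(score_cp, k=1.13):
    # Map a centipawn evaluation to an expected game result in [0, 1];
    # k is normally fitted to the data (1.13 here is arbitrary).
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

def tuning_error(weights, positions, evaluate, k=1.13):
    # Mean squared error between predicted and actual results over
    # (position, result) pairs, with result in {0, 0.5, 1}.
    return sum((result - sigmoid(evaluate(weights, pos), k)) ** 2
               for pos, result in positions) / len(positions)

def texel_tune(weights, positions, evaluate, step=1):
    # Simple coordinate-wise local search: nudge each weight by +-step
    # and keep any change that lowers the error, until nothing improves.
    best = tuning_error(weights, positions, evaluate)
    improved = True
    while improved:
        improved = False
        for i in range(len(weights)):
            for delta in (step, -step):
                weights[i] += delta
                err = tuning_error(weights, positions, evaluate)
                if err < best:
                    best = err
                    improved = True
                    break  # keep this change, move to the next weight
                weights[i] -= delta  # revert and try the other direction
    return weights, best
```

The point of the thread is precisely that driving this error down on a position set guarantees nothing about Elo against other engines; the tuned weights still have to be validated in games.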
Code: Select all
Engine Score To
01: Topple v0.2.1 x64 118.0/240 ····
02: Counter 2.8 x64 4.0/4 1111
02: SOS 5.1 x32 4.0/4 1111
02: Fruit 1.0 x32 4.0/4 1111
02: Myrddin 0.87 x64 4.0/4 1111
06: DanaSah 7.0 x32 3.5/4 11=1
06: Asymptote v0.3 x64 3.5/4 11=1
06: Francesca M.A.D. 0.19 x32 3.5/4 1=11
06: tomitankChess 2.0 x64 3.5/4 11=1
06: Coiled 0.6 x64 3.5/4 1=11
11: Coiled 0.4 x64 3.0/4 0111
11: K2 v.0.91 x32 3.0/4 0111
11: Fruit 1.5 x32 3.0/4 1011
11: Nemeton 1.7 x32 3.0/4 1=1=
11: Winter 0.2 x64 3.0/4 1101
11: GreKo 2017 x64 3.0/4 1011
11: Abrok 5.0 x32 3.0/4 1==1
11: Simplex 0.9.8 x64 3.0/4 1==1
19: RubiChess 0.8.1 x64 2.5/4 1=10
19: TCB 0052 x32 2.5/4 1=01
19: Rotor 0.8 x32 2.5/4 =110
19: Minic 0.29 x64 2.5/4 01=1
19: Drosophila 1.5 x64 2.5/4 0=11
24: RomiChess P3n x64 2.0/4 0101
24: Galjoen 0.39.2 x64 2.0/4 0101
24: Schooner v1.8 x64 2.0/4 =10=
24: Counter 2.6 x64 2.0/4 1010
24: Betsabe II 1.75 x32 2.0/4 1001
24: Absolute Zero 2.4.7.2 x64 2.0/4 1=0=
24: Jazz Orchestra 840 x64 2.0/4 1100
24: Plisk 0.2.7_d x64 2.0/4 0110
24: Betsabe II 1.84 x32 2.0/4 ==10
24: Nemeton 1.61 x32 2.0/4 1010
24: Pharaon 3.5.1 x32 2.0/4 0011
24: Gromit3 3.0.0 x32 2.0/4 0101
24: RubiChess 0.9 x64 2.0/4 1100
37: Sungorus 1.4 x64 1.5/4 01=0
37: Hermann 2.8 x64 1.5/4 100=
37: Dimitri 4.00 x32 1.5/4 =100
37: MadChess 2.2 x64 1.5/4 0=10
37: Orion 0.5 x64 1.5/4 00=1
37: Ares 1.005.2.1 x64 1.5/4 010=
37: Ifrit m1.8 x64 1.5/4 =010
37: ChessBrainVB 3.60 x32 1.5/4 100=
37: Eeyore 1.52 x64 1.5/4 100=
37: Galjoen 0.38 x64 1.5/4 =0==
37: FrankWalter 2.2.8 x64 1.5/4 1=00
48: Counter 2.1.0 x64 1.0/4 1000
48: Dimitri 3.93 x32 1.0/4 0=0=
48: Nemeton 1.8 x32 1.0/4 0010
48: Orion 0.4 x64 1.0/4 0010
48: Kingfisher v1.1 x64 1.0/4 0010
48: Gaia 3.5 x64 1.0/4 0100
48: Monolith 0.3 x64 1.0/4 0010
48: Giraffe 20161023 x64 1.0/4 0010
56: Bumblebee 1.0.36898e1 x64 0.5/4 000=
56: TJchess 1.3 x64 0.5/4 0=00
56: AnMon 5.75 x32 0.5/4 00=0
59: Zevra v2.0 r172 x64 0.0/4 0000
59: Zevra v2.1.1 r216 x64 0.0/4 0000
59: Bearded Neural v44.5 x64 0.0/4 0000
240 games played / Tournament finished
Name of the tournament: 191 - Topple v0.2.1 Gauntlet
Code: Select all
Engine Score To
01: Topple v0.3.0 x64 115.5/240 ····
02: Rotor 0.8 x32 4.0/4 1111
02: Drosophila 1.5 x64 4.0/4 1111
04: Coiled 0.4 x64 3.5/4 =111
04: Counter 2.6 x64 3.5/4 1=11
06: GreKo 2017 x64 3.0/4 0111
06: SOS 5.1 x32 3.0/4 1011
06: AnMon 5.75 x32 3.0/4 1011
06: tomitankChess 2.0 x64 3.0/4 1101
06: Monolith 0.3 x64 3.0/4 =11=
06: FrankWalter 2.2.8 x64 3.0/4 1110
06: Francesca M.A.D. 0.19 x32 3.0/4 =1=1
06: RubiChess 0.8.1 x64 3.0/4 =11=
06: Ares 1.005.2.1 x64 3.0/4 1101
06: Winter 0.2 x64 3.0/4 1011
16: ChessBrainVB 3.60 x32 2.5/4 =101
16: Hermann 2.8 x64 2.5/4 01=1
16: DanaSah 7.0 x32 2.5/4 101=
16: Nemeton 1.8 x32 2.5/4 1=01
16: Orion 0.5 x64 2.5/4 11=0
16: Pharaon 3.5.1 x32 2.5/4 01=1
16: Fruit 1.0 x32 2.5/4 110=
16: Minic 0.29 x64 2.5/4 1===
16: Zevra v2.0 r172 x64 2.5/4 101=
16: Asymptote v0.3 x64 2.5/4 0=11
16: Simplex 0.9.8 x64 2.5/4 =101
16: Coiled 0.6 x64 2.5/4 01=1
28: K2 v.0.91 x32 2.0/4 ==10
28: Counter 2.8 x64 2.0/4 1100
28: Zevra v2.1.1 r216 x64 2.0/4 0==1
28: MadChess 2.2 x64 2.0/4 1001
28: Betsabe II 1.75 x32 2.0/4 0101
28: Myrddin 0.87 x64 2.0/4 1010
28: Bumblebee 1.0.36898e1 x64 2.0/4 0101
28: Betsabe II 1.84 x32 2.0/4 1001
28: Galjoen 0.38 x64 2.0/4 0101
28: Absolute Zero 2.4.7.2 x64 2.0/4 01==
28: Jazz Orchestra 840 x64 2.0/4 0110
28: Giraffe 20161023 x64 2.0/4 0110
28: Nemeton 1.61 x32 2.0/4 1=0=
28: RubiChess 0.9 x64 2.0/4 01==
28: Gaia 3.5 x64 2.0/4 0011
28: Counter 2.1.0 x64 2.0/4 1100
44: Schooner v1.8 x64 1.5/4 01=0
44: Nemeton 1.7 x32 1.5/4 ==0=
44: Dimitri 3.93 x32 1.5/4 1=00
44: Sungorus 1.4 x64 1.5/4 0=10
44: Gromit3 3.0.0 x32 1.5/4 0=01
44: Dimitri 4.00 x32 1.5/4 0=10
50: Kingfisher v1.1 x64 1.0/4 1000
50: Galjoen 0.39.2 x64 1.0/4 1000
50: Fruit 1.5 x32 1.0/4 0010
50: Orion 0.4 x64 1.0/4 0010
50: Plisk 0.2.7_d x64 1.0/4 0001
50: TJchess 1.3 x64 1.0/4 1000
56: Abrok 5.0 x32 0.5/4 =000
56: Eeyore 1.52 x64 0.5/4 000=
56: RomiChess P3n x64 0.5/4 000=
56: Bearded Neural v44.5 x64 0.5/4 00=0
56: Ifrit m1.8 x64 0.5/4 0=00
61: TCB 0052 x32 0.0/4 0000
240 games played / Tournament finished
Name of the tournament: 192 - Topple v0.3.0 Gauntlet
Let's see what other testers get...
Re: Automated tuning... finally... (Topple v0.3.0)
Posted: Thu Jan 10, 2019 5:29 pm
by konsolas
CMCanavessi wrote: ↑Thu Jan 10, 2019 2:33 pm
...
I ran two gauntlets, one with Topple v0.2.1 and one with Topple v0.3.0, against the same opponents over the same number of games, and the results were almost identical (v0.2.1 even scored slightly more points, well within the error margins)
...
Ah, that's certainly disappointing. I suppose I should change how I test new versions so I don't get overinflated Elo estimates in the future. Thank you very much for testing. I've updated the release page on GitHub.
I'm quite surprised, since v0.3.0 shares very little evaluation code with v0.2.1
Re: Automated tuning... finally... (Topple v0.3.0)
Posted: Thu Jan 10, 2019 6:27 pm
by Guenther
konsolas wrote: ↑Thu Jan 10, 2019 5:29 pm
Ah, that's certainly disappointing. I suppose I should change how I test new versions so I don't get overinflated Elo estimates in the future. Thank you very much for testing. I've updated the release page on GitHub.
I'm quite surprised, since v0.3.0 shares very little evaluation code with v0.2.1
If I am not mistaken Topple does not support ponder, right?
Carlos does some unusual asymmetric testing with ponder always on, so the error bars are higher, and the result also depends on how often an engine faces opponents that also cannot ponder.
This means the CCRL/CEGT results could be very different.
viewtopic.php?f=2&t=68701&start=70#p785495
Post by CMCanavessi » Tue Jan 08, 2019 9:25 pm
TC is 1 minute + 1 second, ponder is ON for engines that support it, 1 thread, 2 move book, random openings with NO reverse games.
Post by xr_a_y » Wed Jan 09, 2019 7:05 am
Ok, those are 2 domains where Minic is not good. I'll activate pondering soon (this is usually worth 40-60 elo) and work on sudden death TC, because the current heuristic isn't smart enough. I guess some "emergency time" management should be added as well.
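A sudden-death time heuristic with an "emergency time" reserve of the kind xr_a_y mentions might look like the sketch below. This is purely hypothetical (not Minic's code); the reserve size, the 30-move horizon, and the increment factor are invented constants:

```python
def allocate_time(remaining_ms, increment_ms, emergency_ms=2000, horizon=30):
    # Keep an emergency reserve so the engine never plans to flag,
    # spread the rest over an assumed remaining-moves horizon, and
    # bank most of the increment on top.
    usable = max(remaining_ms - emergency_ms, 0)
    budget = usable / horizon + 0.8 * increment_ms
    # Never plan to spend more than half of the usable time on one move.
    return min(budget, usable / 2)
```

Real time managers add refinements on top of this (extending on fail-lows, cutting short on obvious moves), but even a static split like this avoids the worst sudden-death losses on time.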
Re: Automated tuning... finally... (Topple v0.3.0)
Posted: Thu Jan 10, 2019 6:50 pm
by CMCanavessi
Guenther wrote: ↑Thu Jan 10, 2019 6:27 pm
...
If I am not mistaken Topple does not support ponder, right?
...
Yeah, but the conditions were exactly the same for both runs, same opponents, same number of games, same TC, same hash size, same everything.
Re: Automated tuning... finally... (Topple v0.3.0)
Posted: Thu Jan 10, 2019 6:57 pm
by Guenther
CMCanavessi wrote: ↑Thu Jan 10, 2019 6:50 pm
...
Yeah, but the conditions were exactly the same for both runs, same opponents, same number of games, same TC, same hash size, same everything.
Same openings too? I don't know how many lines your 2_moves book contains, but you wrote they were randomly selected without repeats?
(ofc 200 games still would give an error of maybe +-30)
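Guenther's rough +-30 figure can be sanity-checked with a standard normal-approximation error margin. The sketch below is a back-of-envelope calculation, not any rating tool's code; the 30% draw ratio and the 95% z-value are assumptions, and the margin shrinks as the draw ratio rises:

```python
import math

def elo_margin(games, score=0.5, draw_ratio=0.3, z=1.96):
    # Per-game variance of the score: wins count 1, draws 0.5.
    win = score - draw_ratio / 2
    var = win + draw_ratio / 4 - score ** 2
    se = math.sqrt(var / games)
    # Slope of the Elo curve at `score` converts score error into Elo.
    slope = 400 / math.log(10) / (score * (1 - score))
    return z * se * slope
```

At 200 games and 30% draws this gives roughly +-40 Elo; at higher draw ratios it comes down toward the +-30 quoted above.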
Re: Automated tuning... finally... (Topple v0.3.0)
Posted: Thu Jan 10, 2019 7:05 pm
by CMCanavessi
Guenther wrote: ↑Thu Jan 10, 2019 6:57 pm
...
Same openings too? I don't know how many lines your 2_moves book contains, but you wrote they were randomly selected without repeats?
(ofc 200 games still would give an error of maybe +-30)
Yep, openings can change things a bit. The book contains 200 openings plus 1 entry with no moves (bookless), so 201 entries total. Still, the +150 elo seen by konsolas would be way higher than the margin of error from the different openings: the engines were all well within +-150 elo of each other, so if v0.2.1 scored almost 50%, a +150 elo engine would have scored at least 60-65%, if not more.
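Under the standard logistic Elo model this estimate is, if anything, conservative; a +150 difference corresponds to an expected score of about 70%:

```python
def expected_score(elo_diff):
    # Expected score of the stronger side under the logistic Elo model.
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))
```

expected_score(150) evaluates to roughly 0.70, so a genuinely +150 engine would have been expected to take about 168 of the 240 gauntlet points rather than 115.5.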
Re: Automated tuning... finally... (Topple v0.3.0)
Posted: Thu Jan 10, 2019 7:10 pm
by Guenther
CMCanavessi wrote: ↑Thu Jan 10, 2019 7:05 pm
...
Yep, openings can change things a bit. The book contains 200 openings plus 1 entry with no moves (bookless), so 201 entries total. Still, the +150 elo seen by konsolas would be way higher than the margin of error from the different openings: the engines were all well within +-150 elo of each other, so if v0.2.1 scored almost 50%, a +150 elo engine would have scored at least 60-65%, if not more.
Well, +150 seems unrealistic then, and I would also suppose a self-play rating test 'fata morgana'.
(I had not read the part of this thread with that high expectation before.)
Re: Automated tuning... finally... (Topple v0.3.0)
Posted: Fri Jan 11, 2019 8:35 am
by Daniel Anulliero
Hi Konsolas,
Wow, indeed these results must be very disappointing...
Do you use the same openings in your tests?
I think it's necessary to use the same openings.
For Isa, I test every new version in a self-test against 2-3 other versions.
I play 500 games against each, using 250 different openings with colours reversed, at TC 1 minute + 250 milliseconds.
I stop the test if it is doing badly after 200 games.
So, at the end, a dev version has played 1500 games, and the error bar is relatively small.
Good luck
Dany
Re: Automated tuning... finally... (Topple v0.3.0)
Posted: Sat Jan 12, 2019 1:50 pm
by konsolas
Thanks Daniel,
I've built up a small collection of engines to run tournaments with, so hopefully I can have a more accurate picture of strength improvements in the future:
Code: Select all
Rank Name Elo +/- Games Score Draws
1 Critter_1.6a_32bit 744 nan 110 98.6% 2.7%
2 gaviota-1.0-win32 261 82 110 81.8% 7.3%
3 pawny_1.2.x64.SSE4.2 236 76 110 79.5% 10.0%
4 ToppleDebug 114 61 120 65.8% 13.3%
5 GarboChess2-32 61 60 110 58.6% 17.3%
6 orion64-v0.5-bmi2 54 62 110 57.7% 11.8%
7 Topple2 29 59 120 54.2% 11.7%
8 Topple2E 26 59 120 53.8% 10.8%
9 simplex-098-32-ja -77 65 110 39.1% 7.3%
10 chispa403-blend -80 65 110 38.6% 6.4%
11 bikjump -241 83 110 20.0% 1.8%
12 drosophila-win64 -inf nan 110 0.0% 0.0%
13 Godel -inf nan 110 0.0% 0.0%
730 of 780 games finished.
Topple2 = Topple v0.2.1, Topple2E = Topple v0.3.1, ToppleDebug = current dev build of Topple.
This reflects CMCanavessi's results (where v0.2.1 was very similar to v0.3.1), but I think there is sufficient evidence to suggest that the current development build is likely to be stronger.
Re: Automated tuning... finally... (Topple v0.3.0)
Posted: Sat Jan 12, 2019 2:33 pm
by Guenther
konsolas wrote: ↑Sat Jan 12, 2019 1:50 pm
...
Code: Select all
Rank Name Elo +/- Games Score Draws
1 Critter_1.6a_32bit 744 nan 110 98.6% 2.7%
2 gaviota-1.0-win32 261 82 110 81.8% 7.3%
--------------------------------------------------------------------------
3 pawny_1.2.x64.SSE4.2 236 76 110 79.5% 10.0%
4 ToppleDebug 114 61 120 65.8% 13.3%
5 GarboChess2-32 61 60 110 58.6% 17.3%
6 orion64-v0.5-bmi2 54 62 110 57.7% 11.8%
7 Topple2 29 59 120 54.2% 11.7%
8 Topple2E 26 59 120 53.8% 10.8%
9 simplex-098-32-ja -77 65 110 39.1% 7.3%
10 chispa403-blend -80 65 110 38.6% 6.4%
11 bikjump -241 83 110 20.0% 1.8%
--------------------------------------------------------------------------
12 drosophila-win64 -inf nan 110 0.0% 0.0%
13 Godel -inf nan 110 0.0% 0.0%
730 of 780 games finished.
...
Hi Vincent, you should remove opponents from your pool which are too strong or too weak; they just add random noise and an unnecessarily large error to the rating calculations.
BTW Drosophila and Godel seem to have a problem in your environment, because 0% is unlikely in real games considering their strength.
(probably always crashing, or always losing on time; you should check for unusual result tags too)
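The -inf and nan entries in the table follow directly from inverting the logistic Elo model, which diverges at 0% and 100% scores; a quick sketch (illustrative, not the tournament tool's code):

```python
import math

def performance_elo(score):
    # Invert the logistic Elo model. It is undefined at 0% and 100%,
    # which is why 0/110 or near-perfect results show up as -inf/nan.
    if score <= 0.0:
        return float('-inf')
    if score >= 1.0:
        return float('inf')
    return -400 * math.log10(1 / score - 1)
```

For instance, bikjump's 20.0% corresponds to about -241, matching the table, while Drosophila's and Godel's 0.0% have no finite Elo at all, which is also why Guenther suspects crashes or time losses rather than genuine play.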