konsolas wrote: ↑Tue Jan 08, 2019 10:03 pm
It's been a long while since I last updated Topple, but I've finally rewritten the evaluation and implemented Texel tuning with a simple linear search: self-play shows an Elo gain of about +150.
The release can be found here: https://github.com/konsolas/ToppleChess ... tag/v0.3.0
I've provided 3 builds for Windows, but it should be easy to build from source on Linux and macOS with CMake (you may need to remove the -static flag for the Clang build to work on macOS).
It would be awesome to find out how the new Topple performs against other engines.
I don't know what the others will get, but this might well be a case of "self-play Elo increase doesn't translate to real Elo increase".
I ran 2 gauntlets, one with Topple v0.2.1 and one with Topple v0.3.0: same opponents, same number of games, and the results were almost identical (v0.2.1 even got more points, well inside the error margins).
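For readers who haven't seen Texel tuning: the idea is to squash the static eval through a sigmoid so it predicts game results, then adjust the eval parameters to minimise the squared prediction error over a large set of positions. Below is a minimal sketch of the "simple linear search" (coordinate descent) variant konsolas mentions - an illustration only, not Topple's actual code, and evaluate() is a hypothetical stand-in for the engine's evaluation function:

Code: Select all

def sigmoid(score_cp, k=1.13):
    # Map a centipawn eval to an expected game score in [0, 1];
    # the scaling constant k is itself fitted to the data set.
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

def mse(params, positions, evaluate):
    # positions is a list of (fen, result), result in {0.0, 0.5, 1.0}.
    total = sum((result - sigmoid(evaluate(fen, params))) ** 2
                for fen, result in positions)
    return total / len(positions)

def texel_tune(params, positions, evaluate, step=1):
    # Coordinate descent: nudge each parameter by +-step, keep the
    # change if the error drops, loop until a full pass improves nothing.
    best = mse(params, positions, evaluate)
    improved = True
    while improved:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                params[i] += delta
                err = mse(params, positions, evaluate)
                if err < best:
                    best, improved = err, True
                    break  # keep this change, move to the next parameter
                params[i] -= delta  # revert
    return params, best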
konsolas wrote: ↑Tue Jan 08, 2019 10:03 pm
...
Ah, that's certainly disappointing. I suppose I should change how I test new versions so I don't get overinflated Elo estimates in the future. Thank you very much for testing. I've updated the release page on GitHub.
I'm quite surprised, since v0.3.0 shares very little evaluation code with v0.2.1.
konsolas wrote: ↑Thu Jan 10, 2019 5:29 pm
...
If I am not mistaken, Topple does not support ponder, right?
Carlos does some unusual asymmetric testing with ponder always on, so the error bars are higher, and the result also depends on how often an engine faces opponents which cannot ponder either.
This means the CCRL/CEGT result could be very different.
Post by CMCanavessi » Tue Jan 08, 2019 9:25 pm
TC is 1 minute + 1 second, ponder is ON for engines that support it, 1 thread, 2-move book, random openings with NO reversed games.
Post by xr_a_y » Wed Jan 09, 2019 7:05 am
OK, those are 2 domains where Minic is not good. I'll activate pondering soon (this is usually worth 40-60 Elo) and work on the sudden death TC, because the current heuristic isn't smart enough. I guess some "emergency time" management should be added as well.
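Since the last quoted post brings up sudden death TC and "emergency time": a common scheme is to spread the remaining clock over an assumed number of moves still to be played, and always keep a reserve that a single move can never eat into. A sketch under those assumptions (illustrative only - this is not Minic's code, and all the constants are made up):

Code: Select all

def allocate_time(remaining_ms, increment_ms, moves_to_go=None,
                  overhead_ms=50, horizon=30):
    # With no movestogo from the GUI (sudden death), assume the game
    # lasts roughly `horizon` more moves.
    if moves_to_go is None:
        moves_to_go = horizon
    # Emergency reserve: never plan to spend the last few time slices,
    # so GUI lag or a lost increment cannot flag the engine.
    usable = max(remaining_ms - 10 * overhead_ms, 0)
    budget = usable / moves_to_go + 0.8 * increment_ms
    # Hard cap: a single move may not use more than half the clock.
    return min(budget, remaining_ms / 2)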
konsolas wrote: ↑Thu Jan 10, 2019 5:29 pm
...
Yeah, but the conditions were exactly the same for both runs: same opponents, same number of games, same TC, same hash size, same everything.
konsolas wrote: ↑Thu Jan 10, 2019 5:29 pm
...
Same openings too? I don't know how many lines your 2_moves book contains, but you wrote they were randomly selected w/o repeating?
(Of course, 200 games would still give an error of maybe +-30.)
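That +-30 guess is easy to sanity-check: at a ~50% score the per-game score variance is 0.25*(1 - d) for draw ratio d, and the slope of the Elo curve at 50% is 400/(ln 10 * 0.25), about 695 Elo per unit of score. A quick check, where the 35% draw ratio is an assumed figure:

Code: Select all

import math

def elo_error(n_games, draw_ratio=0.35, z=1.96):
    # 95% confidence half-width of the Elo estimate for a ~50% result.
    var = 0.25 * (1.0 - draw_ratio)          # per-game score variance
    se_score = math.sqrt(var / n_games)      # standard error of the mean score
    slope = 400.0 / (math.log(10.0) * 0.25)  # ~695 Elo per unit of score at 50%
    return z * slope * se_score

print(round(elo_error(200)))  # ~39 Elo, the same ballpark as the +-30 above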
konsolas wrote: ↑Thu Jan 10, 2019 5:29 pm
...
Yep, openings can change things a bit. The book contains 200 openings plus 1 entry with no moves (bookless), so 201 entries in total. Still, the +150 Elo that konsolas saw would be way beyond the error margin introduced by the different openings: the engines were all well within +-150 Elo of each other, so if v0.2.1 scored almost 50%, a +150 Elo engine should have scored at least 60-65% if not more.
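That 60-65% figure is if anything conservative: under the logistic Elo model, a +150 Elo engine scores about 70% against equally rated opposition, as a two-line check shows:

Code: Select all

def expected_score(elo_diff):
    # Logistic Elo model: expected score for a player elo_diff ahead.
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

print(expected_score(150))  # ~0.70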
CMCanavessi wrote: ↑Thu Jan 10, 2019 7:05 pm
...
Well, +150 does not seem realistic then, and I would also suspect a self-play rating test 'fata morgana'.
(I had not read the part of this thread with that high expectation before.)
Hi Konsolas,
Wow, indeed these results must be very disappointing...
Do you use the same openings in your tests?
I think it's necessary to use the same openings.
In Isa, I test every new version in a self-test against 2-3 other versions.
I play 500 games against each of the others: 250 different openings with colours reversed, TC 1 minute + 250 milliseconds.
I stop the test early if it is going badly after 200 games.
So, at the end, a dev version has played 1500 games, and the error bar is relatively small.
Good luck,
Dany
Thanks Daniel,
I've built up a small collection of engines now to run tournaments with, so hopefully I can get a more accurate picture of strength improvements in the future:
Topple2 = Topple v0.2.1, Topple2E = Topple v0.3.1, ToppleDebug = current dev build of Topple.
This reflects CMCanavessi's results (where v0.2.1 was very similar to v0.3.1), but I think there is sufficient evidence to suggest that the current development build is likely to be stronger.
konsolas wrote: ↑Sat Jan 12, 2019 1:50 pm
...
Hi Vincent, you should remove the opponents which are too strong or too weak from your pool; they just add random noise and an unnecessarily bigger error to the rating calculations.
BTW, Drosophila and Godel seem to have a problem in your environment, because 0% is unlikely in real games considering their strength.
(They are probably always crashing, or always losing on time - you should check for unusual result tags too.)
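That "0% is unlikely" point can be quantified with a rough binomial estimate. Ignoring draws (which only makes a zero score less likely, since a draw already yields half a point), the chance of 0/n shrinks quickly even for a badly outrated engine; the 400 Elo gap and 30 games below are illustrative numbers, not Carlos's actual data:

Code: Select all

def prob_zero_points(elo_gap, n_games):
    # Underdog's expected score when trailing by elo_gap.
    p = 1.0 / (1.0 + 10.0 ** (elo_gap / 400.0))
    # Probability of losing every single game (pure win/loss model).
    return (1.0 - p) ** n_games

print(prob_zero_points(400, 30))  # ~0.057: already improbable at -400 Elo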