Statistical Interpretation
Posted: Sat Dec 10, 2016 3:06 am
Help is needed interpreting the statistics from a test. The purpose of the test was to determine whether a history move has a significant effect on playing strength. This is the refutation history called countermove in Stockfish, and counter_move[ply - 1] in Crafty. The history move is stored in (piece type, to square) format.
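To make the idea concrete, here is a minimal sketch of such a table. It assumes the table is indexed by the piece type and destination square of the opponent's previous move, and that the stored reply is itself a (piece type, to square) pair. The names CounterMoveTable, store and probe and the square numbering are hypothetical, chosen for illustration; this is not the actual Blackburne, Stockfish or Crafty code.

```python
# Hypothetical sketch of a countermove (refutation) table.
# Assumed numbering: piece types 0..5, squares a1 = 0 ... h8 = 63.
PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING = range(6)
PIECE_TYPES = 6
SQUARES = 64


class CounterMoveTable:
    """Maps the opponent's previous move to the reply that last refuted it."""

    def __init__(self):
        # One slot per (piece type, to square) of the previous move.
        self.table = [[None] * SQUARES for _ in range(PIECE_TYPES)]

    def store(self, prev_piece, prev_to, reply):
        # Typically called when `reply` causes a beta cutoff after the
        # previous move; the newest refutation overwrites the old one.
        self.table[prev_piece][prev_to] = reply

    def probe(self, prev_piece, prev_to):
        # Returns the remembered reply, or None if nothing is stored.
        return self.table[prev_piece][prev_to]


table = CounterMoveTable()
# Previous move: knight to square 21 (f3); refuting reply: pawn to square 28 (e4).
table.store(KNIGHT, 21, (PAWN, 28))
print(table.probe(KNIGHT, 21))
```

During move ordering, the probed reply would be tried early if it is legal in the current position; where exactly it ranks relative to killers and captures is an engine-specific choice.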
Engine Blackburne 1.9.0 is a 32-bit GAS-assembled program with a lot of debug code, and has the history move installed. 1.9.0a is the same engine with no history installed. Popochin is a third, public-domain program of comparable strength, included to establish a baseline. A round-robin test was run in which each engine played 100 games, 50 against each of the other two (150 games in total). The results are strange: the version with the history move comes out slightly ahead in the direct match against the version without it (52%), but scores badly against the comparison engine (41%, where the version without it scores 59%).
From the results, can it be concluded that the refutation history counter_move is useless or detrimental to the search?
Name of the tournament: Arena 3.0 tournament
Level: Blitz 2/0
-----------------Blackburne1.9.0-----------------
Blackburne1.9.0 - Blackburne1.9.0a : 26.0/50 18-16-16 52% +14
Blackburne1.9.0 - Popochin : 20.5/50 17-26-7 41% -63
-----------------Blackburne1.9.0a-----------------
Blackburne1.9.0a - Blackburne1.9.0 : 24.0/50 16-18-16 48% -14
Blackburne1.9.0a - Popochin : 29.5/50 23-14-13 59% +63
-----------------Popochin-----------------
Popochin - Blackburne1.9.0 : 29.5/50 26-17-7 59% +63
Popochin - Blackburne1.9.0a : 20.5/50 14-23-13 41% -63
Games : 150 (finished)
White Wins : 66 (44.0 %)
Black Wins : 48 (32.0 %)
Draws : 36 (24.0 %)
White Perf. : 56.0 %
Black Perf. : 44.0 %
Engine Score S-B
1: Blackburne1.9.0a 53.5/100 2591.0
2: Popochin 50.0/100 2468.5
3: Blackburne1.9.0 46.5/100 2416.0
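As a rough way to frame the question, the win-loss-draw counts above can be turned into an Elo difference with an approximate 95% interval. The sketch below assumes the usual logistic Elo formula and a normal approximation to the per-game score. The function names elo_diff and match_summary are made up for illustration, and the interval is only a back-of-the-envelope estimate, not what Arena itself reports.

```python
import math


def elo_diff(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_summary(wins, losses, draws, z=1.96):
    """Return (score, elo, elo_low, elo_high) for one pairing.

    Uses a normal approximation; assumes score +/- margin stays inside (0, 1).
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result, where a game scores 1, 0.5 or 0.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    margin = z * math.sqrt(variance / games)
    return score, elo_diff(score), elo_diff(score - margin), elo_diff(score + margin)


# Win-loss-draw counts from the first engine's point of view, as in the crosstable.
RESULTS = {
    "1.9.0 vs 1.9.0a": (18, 16, 16),
    "1.9.0 vs Popochin": (17, 26, 7),
    "1.9.0a vs Popochin": (23, 14, 13),
}

for name, (w, l, d) in RESULTS.items():
    score, elo, lo, hi = match_summary(w, l, d)
    print(f"{name}: {score:.0%}, Elo {elo:+.0f} (approx. 95% interval {lo:+.0f} to {hi:+.0f})")
```

Under this approximation all three intervals straddle zero, which would suggest that 50 games per pairing is too few to separate the two versions in either direction.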