Hello:
I have tested EXchess 6.50b w32 this time, from depth 1 up to depth 10, exactly the same as I did for Quazar. Conditions were the same as the first post of this topic. Here is BayesElo 0057.2 output, with error bars for 95% confidence:
Code:
version 0057.2, Copyright (C) 1997-2010 Remi Coulom.
compiled Apr 5 2012 17:26:01.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under the terms and conditions of the GNU General Public License.
See http://www.gnu.org/copyleft/gpl.html for details.
ResultSet>readpgn 01_Vs_02.pgn
2500 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>readpgn 02_Vs_03.pgn
5000 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>readpgn 03_Vs_04.pgn
7500 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>readpgn 04_Vs_05.pgn
10000 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>readpgn 05_Vs_06.pgn
12500 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>readpgn 06_Vs_07.pgn
15000 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>readpgn 07_Vs_08.pgn
17500 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>readpgn 08_Vs_09.pgn
20000 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>readpgn 09_Vs_10.pgn
22500 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>elo
ResultSet-EloRating>mm 1 1
Iteration 100: 0.0033999
Iteration 200: 0.000231515
Iteration 300: 1.61843e-005
00:00:00,00
ResultSet-EloRating>confidence 0.95
0.95
ResultSet-EloRating>ratings
Rank  Name          Elo     Diff      +      -  Games   Score    Oppo.   Draws     Win  W-L-D
   1  Depth_10   720.75     0.00  11.33  11.33   2500  68.82%   589.21  33.96%  51.84%  1296-355-849
   2  Depth_9    589.21  -131.54   8.01   8.01   5000  50.01%   589.18  34.06%  32.98%  1649-1648-1703
   3  Depth_8    457.61  -131.59   8.11   8.11   5000  50.92%   448.50  32.00%  34.92%  1746-1654-1600
   4  Depth_7    307.79  -149.82   8.27   8.27   5000  50.88%   300.41  28.96%  36.40%  1820-1732-1448
   5  Depth_6    143.20  -164.59   8.34   8.34   5000  49.27%   147.45  27.22%  35.66%  1783-1856-1361
   6  Depth_5    -12.89  -156.09   8.56   8.56   5000  52.56%   -36.32  24.00%  40.56%  2028-1772-1200
   7  Depth_4   -215.84  -202.95   8.94   8.94   5000  50.13%  -221.68  19.06%  40.60%  2030-2017-953
   8  Depth_3   -430.46  -214.62   9.02   9.02   5000  49.74%  -426.48  17.44%  41.02%  2051-2077-872
   9  Depth_2   -637.12  -206.66   9.35   9.35   5000  54.40%  -676.36  18.32%  45.24%  2262-1822-916
  10  Depth_1   -922.25  -285.13  13.87  13.87   2500  15.36%  -637.12  18.24%   6.24%  156-1888-456
ResultSet-EloRating>x
ResultSet>x
The total number of games is 22500. The curve fit is not as good as the one I got with Quazar, but it is still OK. Plotting ratings on the y axis against ln(depth) on the x axis, and fitting only the points with depth > 5 (an arbitrary choice again; with Quazar it was depth > 4, so the Quazar fit used one more point than this one), I get Y(depth_i) ~ 1126.3*ln(depth_i) - 1880.4 (R² ~ 0.9993). The slope of the least-squares line is smaller than in the Quazar case, which corresponds to less Elo gain as depth grows in the EXchess test. Both fits look good because R² > 0.999 for Quazar and EXchess alike, so this logarithmic approach does not seem very bad.
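As a rough check, the fit can be reproduced from the five BayesElo ratings for depths 6 to 10. This is only a minimal sketch in plain Python; the function name `fit_log_line` is hypothetical, and it assumes an ordinary least-squares line of rating against ln(depth), which is how the fit is described above.

```python
import math

# Ratings copied from the BayesElo table in this post (Depth_6 .. Depth_10).
ratings = {6: 143.20, 7: 307.79, 8: 457.61, 9: 589.21, 10: 720.75}

def fit_log_line(points):
    """Ordinary least squares for y = a*ln(depth) + b; returns (a, b, r2)."""
    xs = [math.log(d) for d in points]
    ys = [points[d] for d in points]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

slope, intercept, r2 = fit_log_line(ratings)
print(f"Y(depth) ~ {slope:.1f}*ln(depth) {intercept:+.1f}, R^2 ~ {r2:.4f}")
```

The slope, intercept and R² obtained this way should agree with the quoted values up to rounding.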
At this point, the expected Elo gain between depth 10 and depth 11 (under these conditions) is more or less Y(11) - Y(10) ~ 1126.3*ln(11/10) ~ 107.3 Elo. Who knows? I should remark that the shape of the curve is again very similar to the ones in the Quazar and Houdini tests.
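The extrapolation only needs the slope, because the intercept cancels in Y(11) - Y(10). A small sketch, where `predicted_gain` is a hypothetical helper and the slope is the one quoted in this post:

```python
import math

SLOPE = 1126.3  # slope of the fitted line quoted in this post

def predicted_gain(depth_from, depth_to, slope=SLOPE):
    """Elo gain predicted by Y = slope*ln(depth) + b; the intercept b cancels."""
    return slope * math.log(depth_to / depth_from)

gain_10_to_11 = predicted_gain(10, 11)
print(f"Predicted gain from depth 10 to 11: {gain_10_to_11:.1f} Elo")
```

The same helper can be called with other depth pairs, but the fit only used depths 6 to 10, so anything outside that range is an extrapolation.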
Comparing ratings with estimates, and calculating errors (rounded to 0.1 Elo) as error_i = Y(depth_i) - rating_i:
Code:
Depth:   Rating:   Error:
------   -------   ------
   6      143.2     -5.5
   7      307.8      3.5
   8      457.6      4.1
   9      589.2      5.1
  10      720.8     -7.7
Errors are lower than in the Quazar fit: if I calculate the average absolute error for N points as SUM(|error_i|)/N, then:
Code:
Quazar: N = 6, SUM(|error_i|) ~ 43.8; |average error| ~ 43.8/6 = 7.3
EXchess: N = 5, SUM(|error_i|) ~ 25.9; |average error| ~ 25.9/5 = 5.18
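The EXchess errors and their average can be recomputed from the fitted line and the BayesElo ratings. This is just a sketch; the names `fitted`, `errors` and `mean_abs_error` are hypothetical, and the result can differ slightly from the figure above because the post sums errors already rounded to 0.1 Elo.

```python
import math

SLOPE, INTERCEPT = 1126.3, -1880.4  # fitted line quoted in this post
ratings = {6: 143.20, 7: 307.79, 8: 457.61, 9: 589.21, 10: 720.75}

def fitted(depth):
    """Rating estimated by the logarithmic fit for a given depth."""
    return SLOPE * math.log(depth) + INTERCEPT

# error_i = Y(depth_i) - rating_i, as defined in the post.
errors = {d: fitted(d) - r for d, r in ratings.items()}
mean_abs_error = sum(abs(e) for e in errors.values()) / len(errors)

for d, e in errors.items():
    print(f"depth {d:2d}: error {e:+.1f}")
print(f"average absolute error: {mean_abs_error:.2f}")
```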
The growth of the draw ratio with increasing depths is evident again, and very easy to see in BayesElo output.
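The draw ratio of each 2500-game match can also be read from the win-loss-draw counts in the match log included in this post. A minimal sketch (the dictionary layout and the name `draw_ratio` are just illustrative choices):

```python
# (wins, losses, draws) of the shallower engine in each 2500-game match,
# copied from the "Score of ..." lines of the match log in this post.
matches = {
    (1, 2): (156, 1888, 456),
    (2, 3): (374, 1666, 460),
    (3, 4): (385, 1703, 412),
    (4, 5): (327, 1632, 541),
    (5, 6): (396, 1445, 659),
    (6, 7): (338, 1460, 702),
    (7, 8): (360, 1394, 746),
    (8, 9): (352, 1294, 854),
    (9, 10): (355, 1296, 849),
}

def draw_ratio(wins, losses, draws):
    """Fraction of drawn games in a match."""
    return draws / (wins + losses + draws)

draw_ratios = {pair: draw_ratio(*wld) for pair, wld in matches.items()}
for (shallow, deep), ratio in draw_ratios.items():
    print(f"Depth_{shallow} vs Depth_{deep}: {100 * ratio:.2f}% draws")
```

With these counts the trend is upward overall rather than strictly monotonic: the Depth_3 vs Depth_4 match has fewer draws than the two before it, and Depth_9 vs Depth_10 has slightly fewer than Depth_8 vs Depth_9.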
I provide all PGN files (around 8.36 MB because they are compressed), the win-loss-draw statistics of each 2500-game match, and the openings used in a PGN file:
Fixed_depth_testing_of_EXchess_v6.50b_win32.rar (8.38 MB)
IIRC, this Zippyshare link will die after 30 days of inactivity.
Code:
Finished game 2500 (Depth_1 vs Depth_2): 1/2-1/2 {Draw by 3-fold repetition}
Score of Depth_1 vs Depth_2: 156 - 1888 - 456 [0.15] 2500
ELO difference: -296
Finished match
----------------------------------------------------------------------------
Finished game 2500 (Depth_2 vs Depth_3): 0-1 {Black wins by adjudication}
Score of Depth_2 vs Depth_3: 374 - 1666 - 460 [0.24] 2500
ELO difference: -199
Finished match
----------------------------------------------------------------------------
Finished game 2500 (Depth_3 vs Depth_4): 0-1 {Black wins by adjudication}
Score of Depth_3 vs Depth_4: 385 - 1703 - 412 [0.24] 2500
ELO difference: -204
Finished match
----------------------------------------------------------------------------
Finished game 2500 (Depth_4 vs Depth_5): 1-0 {White wins by adjudication}
Score of Depth_4 vs Depth_5: 327 - 1632 - 541 [0.24] 2500
ELO difference: -201
Finished match
----------------------------------------------------------------------------
Finished game 2500 (Depth_5 vs Depth_6): 1/2-1/2 {Draw by 3-fold repetition}
Score of Depth_5 vs Depth_6: 396 - 1445 - 659 [0.29] 2500
ELO difference: -155
Finished match
----------------------------------------------------------------------------
Finished game 2500 (Depth_6 vs Depth_7): 0-1 {Black wins by adjudication}
Score of Depth_6 vs Depth_7: 338 - 1460 - 702 [0.28] 2500
ELO difference: -168
Finished match
----------------------------------------------------------------------------
Finished game 2500 (Depth_7 vs Depth_8): 0-1 {Black mates}
Score of Depth_7 vs Depth_8: 360 - 1394 - 746 [0.29] 2500
ELO difference: -153
Finished match
Warning: QObject::killTimers: timers cannot be stopped from another thread
----------------------------------------------------------------------------
Finished game 2500 (Depth_8 vs Depth_9): 1/2-1/2 {Draw by adjudication}
Score of Depth_8 vs Depth_9: 352 - 1294 - 854 [0.31] 2500
ELO difference: -138
Finished match
----------------------------------------------------------------------------
Finished game 2500 (Depth_9 vs Depth_10): 0-1 {Black wins by adjudication}
Score of Depth_9 vs Depth_10: 355 - 1296 - 849 [0.31] 2500
ELO difference: -138
Finished match
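The "ELO difference" lines in the log above appear consistent with the usual logistic conversion from match score to Elo, diff = -400*log10(1/score - 1). That the GUI uses exactly this formula and rounding is an assumption here, not something stated in the log; `elo_difference` is a hypothetical helper for checking the numbers:

```python
import math

def elo_difference(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)

# Two matches taken from the log: Depth_1 vs Depth_2 and Depth_9 vs Depth_10.
diff_1_vs_2 = elo_difference(156, 1888, 456)
diff_9_vs_10 = elo_difference(355, 1296, 849)
print(f"Depth_1 vs Depth_2:  {diff_1_vs_2:.1f}")
print(f"Depth_9 vs Depth_10: {diff_9_vs_10:.1f}")
```

These per-match differences are not the same as the BayesElo "Diff" column, since BayesElo estimates all ratings jointly from the whole set of games.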
I think that I will not run more fixed depth tests. Any suggestions, results, corrections... are welcome. Good luck to the people who are running fixed depth tests now. As Ed already said (I added bold):
Rebel wrote: Excellent, I hope my experiment will produce similar results, the more engines the more reliable. Will take some weeks I fear.
Regards from Spain.
Ajedrecista.