Re: Off-topic, sorry...
Posted: Thu Aug 16, 2012 12:49 pm
Hello again:
I got a gap of 353 Elo this time (the match took around 75 minutes to run), which looks much more reasonable IMHO. Jarkko obtained a gap of 326 ± 55 Elo (with 95% confidence) in his experiment. In view of this result, I will continue with these conditions.
I also ran BayesElo on the output PGN; its session log is included below.
The error bars seem tiny IMHO: they look more like 1-sigma intervals (~68.27% confidence) than the usual 95% intervals (~1.96 sigma). If I multiply them by 1.96, I get around ± 17 or ± 18 Elo, which is more or less what I obtain with my own method (although my method is less reliable for such unbalanced matches, say outside the 15%-85% score range, as in this case).
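As a cross-check on those numbers, here is a minimal Python sketch of one common way to put a 95% interval on an Elo difference: a normal approximation on the mean per-game score (win = 1, draw = 0.5, loss = 0), with the interval endpoints mapped through the logistic Elo formula. It is not necessarily the "own method" mentioned above, and the function names are illustrative.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (low, estimate, high) Elo for the player with these results.

    Normal approximation on the mean per-game score; the score interval
    mean +/- z * standard_error is then converted to Elo point by point.
    """
    n = wins + losses + draws
    w, d = wins / n, draws / n
    mean = w + 0.5 * d
    variance = w + 0.25 * d - mean ** 2  # E[x^2] - E[x]^2 for a single game
    std_error = math.sqrt(variance / n)
    return (elo_from_score(mean - z * std_error),
            elo_from_score(mean),
            elo_from_score(mean + z * std_error))


# Depth_2 vs Depth_3: 140 wins, 2060 losses, 300 draws for Depth_2.
low, est, high = elo_interval(140, 2060, 300)
print(f"{est:.1f} Elo, 95% interval [{low:.1f}, {high:.1f}]")
```

For this match the sketch gives roughly -353 Elo with an interval of about -371 to -336, i.e. around 17 to 19 Elo either side, in line with the ± 17-18 figure above. The interval comes out slightly asymmetric because the Elo curve is not linear in the score.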
Adam Hair wrote:
> Ajedrecista wrote:
>> Viewing some command line examples, I finally wrote this one:
>> Code:
>> cutechess-cli -engine conf="Depth_2" depth=2 -engine conf="Depth_3" depth=3 -each tc=40/4 -concurrency 2 -draw 10 5 -resign 5 900 -games 2 -rounds 1250 -pgnin klo_250_eco_a00-e97_variations.pgn -pgnout 02_Vs_03.pgn
>> It can be improved for sure: I think that the resign factor is good (-resign 5 900), but I also think that I was too conservative with the draw factor (-draw 10 5); I am not sure if openings are played with reversed colours using -games 2 -rounds 1250... I finally used some opening positions from this PGN file. Any suggestions to improve the accuracy of the matches (including the input PGN file) will be welcome, as usual.
> For reversed colors, you have to add the switch '-repeat', and the argument for '-games' has to be even (so '-games 2' is fine).
> I think your draw factor may cause some false draws. I have seen plenty of examples where the advantage changes hands during a game, passing through scores near 0 before favoring the other engine. However, I do not know how often that will occur under these conditions. I would change the number of full moves to something greater than 80, and change the draw score to 50 centipawns.
> I am glad you have cutechess working. I would be interested in any tests and results you share with us.
> Adam
I added -repeat and changed the draw factor to -draw 100 50:
Code:
cutechess-cli -engine conf="Depth_2" depth=2 -engine conf="Depth_3" depth=3 -each tc=40/4 -concurrency 2 -draw 100 50 -resign 5 900 -games 2 -rounds 1250 -pgnin klo_250_eco_a00-e97_variations.pgn -repeat -pgnout 02_Vs_03.pgn
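To make the settings discussed above concrete, here is a hypothetical Python sketch of the rules as they are described in this thread. Three readings are assumptions, not cutechess-cli's actual code: -draw N SCORE means "at least N full moves played and both engines' evaluations within SCORE centipawns of zero"; -resign N SCORE means "an engine's evaluation at or below -SCORE for N consecutive moves"; and -repeat with -games 2 means "two games per opening with colours swapped". The names schedule and adjudicate, and the sample evaluations, are illustrative only.

```python
from typing import List, Optional, Tuple


def schedule(openings: List[str], rounds: int,
             games_per_round: int = 2) -> List[Tuple[str, str, str]]:
    """Hypothetical -repeat pairing: (opening, white, black) for engines A and B.

    Every game of a round reuses the round's opening and the engines swap
    colours from one game to the next, so an even game count gives each
    engine both colours with every opening.
    """
    games = []
    for r in range(rounds):
        opening = openings[r % len(openings)]
        for g in range(games_per_round):
            white, black = ("A", "B") if g % 2 == 0 else ("B", "A")
            games.append((opening, white, black))
    return games


def adjudicate(white_cp: List[int], black_cp: List[int],
               draw_moves: int, draw_score: int,
               resign_moves: int, resign_score: int) -> Optional[str]:
    """Hypothetical adjudication over per-move evaluations in centipawns.

    white_cp[i] and black_cp[i] are each engine's own evaluation (from its
    own side's point of view) on full move i + 1. Returns "1-0", "0-1",
    "1/2-1/2", or None if neither rule fires.
    """
    white_bad = black_bad = 0  # consecutive moves at or below -resign_score
    for move, (w, b) in enumerate(zip(white_cp, black_cp), start=1):
        white_bad = white_bad + 1 if w <= -resign_score else 0
        black_bad = black_bad + 1 if b <= -resign_score else 0
        if white_bad >= resign_moves:
            return "0-1"
        if black_bad >= resign_moves:
            return "1-0"
        if move >= draw_moves and abs(w) <= draw_score and abs(b) <= draw_score:
            return "1/2-1/2"
    return None


# Made-up game: evaluations hover near 0 early on, then White takes over.
white = [20, 10, 3, 2, 1, 0, 2, 3, 1, 2] + [50 * k for k in range(1, 30)]
black = [-x for x in white]
print(len(schedule(["op1", "op2"], 1250)))          # 2500 games
print(adjudicate(white, black, 10, 5, 5, 900))      # 1/2-1/2 at move 10
print(adjudicate(white, black, 100, 50, 5, 900))    # 1-0 via the resign rule
```

Under these assumed rules, -draw 10 5 scores the made-up game as a draw at move 10 even though White later wins clearly, while -draw 100 50 never triggers and the resign rule ends the game instead. That is the kind of false draw Adam describes.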
I repeated the (depth 2) versus (depth 3) match:
Code:
Finished game 2500 (Depth_2 vs Depth_3): 0-1 {Black wins by adjudication}
Score of Depth_2 vs Depth_3: 140 - 2060 - 300 [0.12] 2500
ELO difference: -353
Finished match
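The -353 in the cutechess-cli log can be reproduced from the W-L-D counts with the usual logistic Elo formula, counting a draw as half a point. A minimal Python sketch; that cutechess-cli computes it exactly this way is an assumption, but the numbers agree:

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match result under the logistic model.

    The score fraction s = (wins + draws / 2) / games maps to
    -400 * log10(1 / s - 1) Elo.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)


# From the log line "Score of Depth_2 vs Depth_3: 140 - 2060 - 300".
wins, losses, draws = 140, 2060, 300
print(wins + losses + draws)                      # 2500 (1250 rounds x 2 games)
print(round((wins + 0.5 * draws) / 2500, 3))      # 0.116, shown as [0.12]
print(round(elo_from_wld(wins, losses, draws)))   # -353
```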
Code:
ResultSet>readpgn 02_Vs_03.pgn
2500 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>elo
ResultSet-EloRating>mm
00:00:00,00
ResultSet-EloRating>exactdist
00:00:00,00
ResultSet-EloRating>ratings
Rank Name          Elo     Diff      +      -  Games    Score    Oppo.    Draws      Win  W-L-D
   1 Depth_3    170.44     0.00   9.05   8.78   2500   88.40%  -170.44   12.00%   82.40%  2060-140-300
   2 Depth_2   -170.44  -340.88   8.78   9.05   2500   11.60%   170.44   12.00%    5.60%  140-2060-300
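Another reading of this table may be worth checking before concluding that the bars are 1-sigma. With only two players and the ratings centred on 0 (170.44 and -170.44), each printed rating is half of the 340.88 difference, so per-player bars would have to be doubled to apply to the difference. The hedged Python arithmetic below compares both readings; whether BayesElo's "+" and "-" columns default to 95% intervals is an assumption to verify in its documentation, and scaled_bars is an illustrative helper.

```python
from typing import Tuple


def scaled_bars(plus: float, minus: float, factor: float) -> Tuple[float, float]:
    """Scale asymmetric (+, -) error bars by a constant factor."""
    return plus * factor, minus * factor


# Depth_3's row: Elo 170.44, "+" 9.05, "-" 8.78. Depth_2 sits at exactly
# -170.44 because the ratings are centred on 0, so each rating is diff / 2.
diff = 170.44 - (-170.44)
print(f"difference: {diff:.2f}")

# Reading A (the post's): the bars are 1-sigma, so rescale to 95% with 1.96.
plus_a, minus_a = scaled_bars(9.05, 8.78, 1.96)
# Reading B (an assumption): the bars are already 95% bars on each rating;
# since a rating is diff / 2, bars on the difference are twice as wide.
plus_b, minus_b = scaled_bars(9.05, 8.78, 2.0)
print(f"A: +{plus_a:.1f} / -{minus_a:.1f}   B: +{plus_b:.1f} / -{minus_b:.1f}")
```

Both readings land at roughly ± 17-18 Elo on the difference, so these numbers alone cannot tell them apart.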
The next match will be (depth 1) versus (depth 2); I will open a new topic when I will have enough matches or when I will finish this experiment.
Regards from Spain.
Ajedrecista.