After much effort spent trying to improve Telepath and Ares, with nothing raising the rating, I found some endgames that were being played poorly! I fixed them and tested them on static positions, and everything looked great.
Next, I ran 400 games against a baseline opponent using cutechess 0.20 and got nothing: not the slightest improvement. After some other changes, the result was the same.
Finally, I hit on an idea that bore fruit: my opening book might be too prone to draws. This was a promising thought because it should be easy to test. So, I ran two tests of 400 games each, using the same new version against the same baseline version, with the opening book as the only difference. I used the cutechess-cli -repeat option, which forces each book line to be used twice in succession with the opponents switching colors, giving each program even chances. The new book is a version of Performance.bin that came with Scid a couple of years ago.
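To make the pairing scheme concrete, here is a small illustrative sketch of the schedule the -repeat option is described as producing: every book line is played twice in a row, with colors reversed for the second game. The function name and the opening strings are hypothetical, and this is not cutechess-cli's actual code. One consequence of the scheme is that a 400-game run of this kind draws on 200 distinct book lines.

```python
from typing import List, Tuple

def repeat_schedule(openings: List[str], engine_a: str,
                    engine_b: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: pair each opening line with two games.

    Returns (opening, white, black) tuples in play order; the second game
    of each pair reuses the same line with the colors swapped.
    """
    games = []
    for line in openings:
        games.append((line, engine_a, engine_b))  # first game: engine_a has White
        games.append((line, engine_b, engine_a))  # repeat: colors reversed
    return games

if __name__ == "__main__":
    book_lines = ["1. e4 c5", "1. d4 Nf6"]  # hypothetical opening lines
    for opening, white, black in repeat_schedule(book_lines, "V2", "V1"):
        print(f"{opening:10} White: {white:3} Black: {black}")
```

Playing both sides of the same line cancels most of the bias a lopsided opening would otherwise give one engine.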
Here are the results:
Old book: V2 is 94 Elo stronger than V1, with error margins of +/- 30.
New book: V2 is 131 Elo stronger than V1, with error margins of +/- 30.
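For readers unsure where figures like "94 +/- 30" come from, here is a minimal sketch of one common way to turn a win/draw/loss record into an Elo estimate with a 95% margin: a normal approximation on the mean score, mapped through the logistic Elo curve. This is not necessarily the exact method behind the numbers above, and the function names and W/D/L counts in the usage lines are hypothetical.

```python
import math

def score_to_elo(score: float) -> float:
    """Elo difference implied by an expected score, using the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_estimate(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, lower, upper) for a W/D/L record from one side's view.

    Assumes the score is not 0 or 1 and that score +/- z * stderr stays
    inside (0, 1); z = 1.96 gives an approximate 95% interval.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(variance / n)
    return (score_to_elo(score),
            score_to_elo(score - z * stderr),
            score_to_elo(score + z * stderr))

if __name__ == "__main__":
    # Two hypothetical 400-game records with the same 62.5% score.
    for record in [(200, 100, 100), (150, 200, 50)]:
        elo, lo, hi = elo_estimate(*record)
        print(record, f"{elo:.1f} Elo, 95% interval [{lo:.1f}, {hi:.1f}]")
```

With the same score, the record with more draws has lower per-game variance and so a tighter interval. Separately, if a drawish book turns would-be wins into draws, the score itself moves toward 50%, which would compress the measured Elo gap in the way the post suspects.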
Note: All previous tests had shown every new version to be 91 to 100 Elo stronger, with margins of +/- 30. That consistency, whatever I changed, is what led me to suspect the book might be drawish.
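One way to check the drawishness suspicion directly, rather than only inferring it from Elo differences, is to compare the draw rate of each run. A minimal sketch, assuming the games from each run were saved as PGN with standard `[Result "..."]` tags. The helper names are hypothetical, and a regex over tags is a shortcut rather than a full PGN parser.

```python
import re
from collections import Counter

# Matches a PGN Result tag at the start of a line, e.g. [Result "1/2-1/2"].
RESULT_TAG = re.compile(r'^\[Result "([^"]+)"\]', re.MULTILINE)

def result_counts(pgn_text: str) -> Counter:
    """Count results ("1-0", "0-1", "1/2-1/2", "*") found in PGN Result tags."""
    return Counter(RESULT_TAG.findall(pgn_text))

def draw_rate(pgn_text: str) -> float:
    """Fraction of finished games that were drawn (unfinished "*" games ignored)."""
    counts = result_counts(pgn_text)
    finished = counts["1-0"] + counts["0-1"] + counts["1/2-1/2"]
    return counts["1/2-1/2"] / finished if finished else 0.0

if __name__ == "__main__":
    # Tiny inline sample; in practice, read each run's PGN file instead, e.g.
    # open("old_book_run.pgn").read()  (hypothetical file name).
    sample = (
        '[Event "test"]\n[Result "1-0"]\n\n1. e4 e5 1-0\n\n'
        '[Event "test"]\n[Result "1/2-1/2"]\n\n1. d4 d5 1/2-1/2\n'
    )
    print(result_counts(sample), draw_rate(sample))
```

Comparing this number between the old-book and new-book runs, played by the same two versions, would show whether the new book actually lowers the draw rate.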
So, it seems my book was too prone to draws. A 37-point gain against a 30-point margin at the 95% confidence level looks significant.
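The significance check can be made explicit. A minimal sketch, assuming the two 400-game runs are independent and each +/- 30 figure is a symmetric 95% margin from a normal approximation: the margin on the difference of two such estimates is the root-sum-square of the individual margins, not either margin alone. The helper names are hypothetical.

```python
import math

def difference_margin(margin_a: float, margin_b: float) -> float:
    """95% margin on (A - B) for two independent estimates with 95% margins."""
    # Standard errors add in quadrature; the common z-factor cancels out.
    return math.hypot(margin_a, margin_b)

def gain_is_significant(elo_new: float, margin_new: float,
                        elo_old: float, margin_old: float) -> bool:
    """True if the gap between two independent Elo estimates exceeds its margin."""
    return abs(elo_new - elo_old) > difference_margin(margin_new, margin_old)

if __name__ == "__main__":
    # Figures from the two 400-game runs in the post.
    print(round(difference_margin(30, 30), 1))    # 42.4
    print(gain_is_significant(131, 30, 94, 30))   # False
```

Under these assumptions the margin on the difference is about +/- 42 Elo, so the 37-point gain sits just inside it. The earlier runs that all landed at 91 to 100 Elo add supporting evidence, and more games with the new book would tighten the comparison.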
How do you guys test your books for quality? I'm asking everyone.
Stats, Testing and Opening books
CRoberson