Stats, Testing and Opening books

Post by CRoberson » Fri Dec 30, 2011 7:30 pm

After much effort trying to get improvements out of Telepath and Ares, with nothing improving the rating, I found some endgames that were being played poorly! I fixed them, tested them on static positions, and everything looked great.

Then I tested with 400 games against a baseline opponent using cutechess 0.20 and got nothing. Not the slightest improvement. After some other changes, the result was the same again.

Finally, I hit on an idea that bore fruit: my opening book is too prone to draws. This is a great thought, as it should be easy to test. So, I ran two tests of 400 games each, using the same new version against the same baseline version, with the only difference being the opening book. I used the cutechess-cli -repeat option, which forces each book line to be used twice in succession, with the opponents switching colors, to give each program even chances. The new book is a version of Performance.bin that came with Scid a couple of years ago.
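The pairing scheme described for -repeat can be modelled in a few lines. This is a minimal Python sketch of the schedule only, not cutechess-cli's own code; `paired_schedule` and the sample book lines are hypothetical. Because each line is played once with each color, any edge a single opening gives to White or Black largely cancels within the pair.

```python
from typing import List, Tuple


def paired_schedule(openings: List[str], engine_a: str, engine_b: str) -> List[Tuple[str, str, str]]:
    """Return (opening, white, black) triples in which every opening is
    played twice in a row, with the engines swapping colors on the repeat."""
    games: List[Tuple[str, str, str]] = []
    for line in openings:
        games.append((line, engine_a, engine_b))  # first game: engine_a has White
        games.append((line, engine_b, engine_a))  # repeat: colors reversed
    return games


if __name__ == "__main__":
    # Hypothetical book lines, for illustration only.
    book_lines = ["1.e4 e5 2.Nf3 Nc6", "1.d4 d5 2.c4 e6"]
    for opening, white, black in paired_schedule(book_lines, "V2", "V1"):
        print(f"{white} (White) vs {black} (Black): {opening}")
```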

Here are the results:
Old book - V2 is 94 Elo stronger than V1, with a statistical margin of +/- 30.
New book - V2 is 131 Elo stronger than V1, with a statistical margin of +/- 30.
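Figures like "94 Elo, +/- 30" usually come from converting the match score to an Elo difference with the logistic formula and putting a confidence interval on the score. A minimal Python sketch, assuming the standard logistic Elo model and a normal approximation; `elo_with_margin` is a hypothetical helper, not necessarily how cutechess-cli computes its numbers, and the W/D/L record in the usage part is invented for illustration.

```python
import math


def elo_from_score(s: float) -> float:
    """Elo difference implied by an average score s (0 < s < 1), logistic model."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, margin) for a W/D/L record; z = 1.96 is roughly 95% confidence.

    The interval is built on the mean per-game score (normal approximation),
    then both ends are mapped through the Elo curve; the margin is half its width.
    """
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the score, with win = 1, draw = 0.5, loss = 0.
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se = math.sqrt(var / n)
    low, high = s - z * se, s + z * se
    margin = (elo_from_score(high) - elo_from_score(low)) / 2.0
    return elo_from_score(s), margin


if __name__ == "__main__":
    # Hypothetical 400-game record, not the results from this thread.
    elo, margin = elo_with_margin(160, 140, 100)
    print(f"{elo:+.0f} Elo +/- {margin:.0f}")
```

The margin shrinks roughly with the square root of the number of games, and it also depends on the draw rate, since draws lower the per-game variance of the score.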

Note: all previous tests had shown every new version to be 91 to 100 Elo stronger, with margins of +/- 30. This is what led me to suspect the book might be drawish.
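The suspicion that a drawish book hides improvements can be made concrete with a simple model: if the stronger version wins the same share of the decisive games whatever the book, extra draws pull the score toward 50% and shrink the measured Elo gap. A minimal Python sketch under that assumption; `measured_elo`, the 70% decisive-game share and the draw rates are all hypothetical.

```python
import math


def elo_from_score(s: float) -> float:
    """Elo difference implied by an average score s (0 < s < 1), logistic model."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def measured_elo(decisive_win_share: float, draw_rate: float) -> float:
    """Elo a match would report if the stronger side wins a fixed share of the
    decisive games, while draws (worth 0.5 each) occur at draw_rate."""
    s = 0.5 * draw_rate + (1.0 - draw_rate) * decisive_win_share
    return elo_from_score(s)


if __name__ == "__main__":
    # Hypothetical: the stronger version wins 70% of the decisive games.
    for draw_rate in (0.3, 0.5, 0.7):
        print(f"draw rate {draw_rate:.0%}: {measured_elo(0.7, draw_rate):+.0f} Elo")
```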

So, it seems my book was too prone to draws. A 37-point gain against a 30-point margin at the 95% confidence level looks significant.
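One caution when comparing the two runs: each +/- 30 applies to its own estimate, so the margin on the 37-point difference is wider. Assuming the two 400-game matches are independent and their errors roughly normal and symmetric in Elo terms, the margins combine in quadrature, as in this minimal Python sketch (`margin_of_difference` is a hypothetical helper).

```python
import math


def margin_of_difference(margin_a: float, margin_b: float) -> float:
    """Margin on (A - B) for two independent estimates at the same confidence
    level: the underlying standard errors add in quadrature."""
    return math.hypot(margin_a, margin_b)


if __name__ == "__main__":
    old_book, new_book, margin = 94.0, 131.0, 30.0  # figures from the post
    gain = new_book - old_book
    combined = margin_of_difference(margin, margin)
    z = 1.96 * gain / combined  # gain measured in standard errors
    print(f"gain {gain:.0f} Elo, margin on the gain +/- {combined:.1f}, z = {z:.2f}")
```

By this rough check the margin on the gain is about +/- 42, giving z of about 1.7: short of the 1.96 needed for a two-sided 95% test, though above the 1.645 one-sided threshold for "the new book is better".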

How do you guys test your books for quality? I'm asking everyone.