Stats, Testing and Opening books



CRoberson
Location: North Carolina, USA


Post by CRoberson

After much effort trying to get improvements out of Telepath and Ares, with nothing improving the rating, I found some endgames that were being played poorly! I fixed them, tested on static positions, and everything looked great.

Then I tested with 400 games against a baseline opponent using cutechess 0.20, and got nothing. Not the slightest improvement. After some other changes, the same again.

Finally, I hit on an idea that bore fruit: my opening book might be too prone to draws. This was a good candidate because it should be easy to test. So I ran two tests of 400 games each, using the same new version against the same baseline version, with the opening book as the only difference. I used the cutechess-cli -repeat option, which plays each book line twice in succession with the engines switching colors, so that each program gets even chances. The new book is a version of Performance.bin that came with Scid a couple of years ago.
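The pairing that -repeat produces can be pictured as a simple schedule: every opening line is played twice, once with each engine as White, so any built-in advantage of a particular line cancels out across the pair. The Python below is a minimal sketch of that pairing idea only, not cutechess-cli's actual code; the function name `paired_schedule` and the sample lines are hypothetical.

```python
from typing import List, Tuple


def paired_schedule(openings: List[str], engine_a: str,
                    engine_b: str) -> List[Tuple[str, str, str]]:
    """Return (opening, white, black) triples, each opening played twice with colors reversed."""
    games = []
    for line in openings:
        games.append((line, engine_a, engine_b))  # first game: engine_a has White
        games.append((line, engine_b, engine_a))  # repeat: colors swapped
    return games


if __name__ == "__main__":
    # Hypothetical book lines, for illustration only.
    book_lines = ["1.e4 e5 2.Nf3 Nc6", "1.d4 d5 2.c4 e6"]
    for opening, white, black in paired_schedule(book_lines, "V2", "V1"):
        print(f"{white} (White) vs {black} (Black): {opening}")
```

Under this scheme a 400-game match is 200 such color-swapped pairs.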

Here are the results:
Old book: V2 is 94 Elo stronger than V1, with a statistical margin of +/- 30.
New book: V2 is 131 Elo stronger than V1, with a statistical margin of +/- 30.
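For anyone wanting to reproduce figures like these from raw win/draw/loss counts, here is a minimal sketch of one common method: convert the score to Elo with the logistic formula, estimate the standard error of the mean score from the per-game results, and map the 95% score interval through the Elo curve. I believe cutechess-cli derives its "+/-" in a similar way, but treat that as an assumption; the function names and the counts in the example are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an expected score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int):
    """Return (elo_diff, margin_95, draw_ratio) from one engine's W/D/L counts."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)  # standard error of the mean score
    # Map the score interval through the nonlinear Elo curve, then halve its width.
    lo = elo_from_score(score - Z95 * stderr)
    hi = elo_from_score(score + Z95 * stderr)
    return elo_from_score(score), (hi - lo) / 2.0, draws / n


if __name__ == "__main__":
    # Hypothetical 400-game result, not taken from the runs above.
    elo, margin, draw_ratio = elo_with_margin(wins=200, draws=100, losses=100)
    print(f"{elo:+.1f} +/- {margin:.1f} Elo, draw ratio {draw_ratio:.0%}")
```

With those made-up counts the sketch gives roughly +89 Elo with a margin of about 30, the same order as the +/- 30 reported for the 400-game runs. It also shows why a drawish book can hide strength: for the same ratio of wins to losses, a higher draw ratio pulls the score toward 50% and so shrinks the Elo figure.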

Note: all previous tests had shown every new version to be 91 to 100 Elo stronger, with margins of +/- 30. That consistency, no matter what I changed, is what led me to think the book might be drawish.
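Since the suspicion is specifically that the old book produces too many draws, a more direct check is to compare the draw rates of the two runs alongside the Elo figures. The cutechess-cli match summary includes win, loss and draw counts, so a standard pooled two-proportion z-test can be applied to them. This is a minimal sketch assuming independent games (the -repeat pairs share openings, so the result is only approximate); the function name `draw_rate_z` and the example counts are hypothetical, not from the tests above.

```python
import math


def draw_rate_z(draws_a: int, games_a: int, draws_b: int, games_b: int) -> float:
    """Pooled two-proportion z statistic for (draw rate of run A) - (draw rate of run B)."""
    pooled = (draws_a + draws_b) / (games_a + games_b)
    if pooled in (0.0, 1.0):
        raise ValueError("draw rate is 0% or 100% in both runs; z is undefined")
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / games_a + 1.0 / games_b))
    return (draws_a / games_a - draws_b / games_b) / se


if __name__ == "__main__":
    # Hypothetical draw counts for two 400-game runs (old book vs new book).
    z = draw_rate_z(160, 400, 120, 400)
    # |z| > 1.96 suggests the draw rates differ at roughly the 95% level (two-sided).
    print(f"z = {z:.2f}")
```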

So it seems my book was too prone to draws. A 37-point gain against a 30-point margin at the 95% confidence level looks significant.
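One step worth making explicit: each +/- 30 applies to one run on its own. Assuming the two 400-game runs are independent and each +/- 30 is a 95% half-width from an approximately normal estimate, the margins on the difference combine in quadrature, giving about +/- 42. A 37-point gap would then sit just inside the two-sided 95% margin, so a longer run would help confirm it. This is a rough sketch of that arithmetic only; because the Elo scale is nonlinear, treating the margins as symmetric normal errors is an approximation, and the helper name `diff_margin` is hypothetical.

```python
import math


def diff_margin(margin_a: float, margin_b: float) -> float:
    """Approximate 95% margin of (A - B) for independent estimates A and B.

    Assumes each margin is z * standard_error with the same z, so the
    standard errors, and hence the margins, add in quadrature.
    """
    return math.hypot(margin_a, margin_b)


if __name__ == "__main__":
    old_book, new_book, margin = 94.0, 131.0, 30.0  # figures reported above
    gain = new_book - old_book
    m = diff_margin(margin, margin)
    verdict = "outside" if gain > m else "inside"
    print(f"gain {gain:.0f} Elo, margin on the gain +/- {m:.1f} ({verdict} the margin)")
```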

How do you guys test your books for quality? I'm asking everyone.