Bookbuilding 101

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Harvey Williamson, bob

Dan Honeycutt
Posts: 5162
Joined: Mon Feb 27, 2006 3:31 pm
Location: Atlanta, Georgia

Bookbuilding 101

Post by Dan Honeycutt » Mon Aug 06, 2012 4:15 pm

Hi All,

I'm working on a book making utility. My general idea is:

(1) Feed it a pgn file or files of decent quality to obtain moves and win/loss statistics for the various positions.

(2) Use an engine to evaluate each book position for some specified time. Of course this could take days for a book of any size but, hey, it's a one-shot deal.

(3) Assign a score to each move based on the win/loss percentage and the engine evaluation. Popularity could also be a factor. The move score would determine the probability that a move is selected.
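To make (1) concrete, here's a rough sketch of the stats collection I have in mind. Move lists stand in for parsed PGN, and all the names are placeholders; a real version would hash positions instead of move prefixes so transpositions share statistics:

```python
from collections import defaultdict

def collect_stats(games):
    """Tally [wins, draws, losses] (from White's perspective) for every
    line prefix seen in the input games."""
    stats = defaultdict(lambda: [0, 0, 0])
    result_index = {"1-0": 0, "1/2-1/2": 1, "0-1": 2}
    for moves, result in games:
        for ply in range(len(moves)):
            prefix = tuple(moves[:ply + 1])
            stats[prefix][result_index[result]] += 1
    return stats

games = [
    (["e4", "e5", "Nf3"], "1-0"),
    (["e4", "c5"], "0-1"),
    (["e4", "e5", "Bc4"], "1/2-1/2"),
]
stats = collect_stats(games)
print(stats[("e4",)])  # [1, 1, 1] - one win, one draw, one loss after 1. e4
```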

I don't know squat about book making. Does what I'm doing make sense? What else should I be doing? Any and all comments are welcome.

Best
Dan H.

ZirconiumX
Posts: 1327
Joined: Sun Jul 17, 2011 9:14 am

Re: Bookbuilding 101

Post by ZirconiumX » Mon Aug 06, 2012 4:23 pm

Dan Honeycutt wrote:Hi All,

I'm working on a book making utility. My general idea is:

(1) Feed it a pgn file or files of decent quality to obtain moves and win/loss statistics for the various positions.

(2) Use an engine to evaluate each book position for some specified time. Of course this could take days for a book of any size but, hey, it's a one-shot deal.

(3) Assign a score to each move based on the win/loss percentage and the engine evaluation. Popularity could also be a factor. The move score would determine the probability that a move is selected.

I don't know squat about book making. Does what I'm doing make sense? What else should I be doing? Any and all comments are welcome.

Best
Dan H.
1) Yes, this is standard. You could also take the Elo of the players (if available) into account and use it as a filter: moves played by higher-rated players are usually more trustworthy than the blunders of a few 700-Elo players.

2) I'm not so sure about this one. If you evaluate the positions with an engine, you imprint that engine's personality on the book, so a book analysed with Houdini may not be suitable for HIARCS, and vice versa.

3) I'd suggest something like (wins * 3) + draws. If you count losses, your engine will prefer playing safe over picking a sharp variation. Popularity is also era-dependent: a popularity-weighted book can become outdated over time. For instance, in the Najdorf, 6. Bg5 e6 7. f4 was the norm a while back; now, however, 6. Be3 is the main move.
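That rule is easy to sketch; here's a minimal version that turns (wins * 3) + draws into selection probabilities (the data and helper names are made up for illustration):

```python
def move_score(wins, draws, losses):
    # (wins * 3) + draws: losses are deliberately ignored so that
    # sharp lines aren't punished just for being double-edged
    return wins * 3 + draws

def selection_weights(candidates):
    """Turn raw move scores into selection probabilities.
    candidates: {move: (wins, draws, losses)}."""
    scores = {mv: move_score(w, d, l) for mv, (w, d, l) in candidates.items()}
    total = sum(scores.values()) or 1
    return {mv: s / total for mv, s in scores.items()}

weights = selection_weights({"e4": (40, 30, 30), "d4": (35, 40, 25)})
```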

Matthew:out
Some believe in the almighty dollar.

I believe in the almighty printf statement.

jdart
Posts: 3785
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: Bookbuilding 101

Post by jdart » Mon Aug 06, 2012 4:29 pm

This is basically correct but you need to start evaluating at the leaf nodes of the book and back up the scores, using the minimax algorithm. For a large book and a lot of nodes, that is a lot of CPU time.
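The back-up might look like this over a book stored as a nested move tree; the leaf evals would come from the engine, and the data structure here is just an illustrative assumption:

```python
def backup(node, side_to_move=+1):
    """Minimax back-up of leaf evaluations through the book tree.
    Evals are centipawns from White's view: White (+1) maximizes,
    Black (-1) minimizes. Interior nodes get their eval overwritten."""
    if "children" not in node or not node["children"]:
        return node["eval"]
    best = None
    for child in node["children"].values():
        v = backup(child, -side_to_move)
        if best is None or (v > best if side_to_move > 0 else v < best):
            best = v
    node["eval"] = best
    return best

book = {"children": {
    "e4": {"children": {"e5": {"eval": 20}, "c5": {"eval": 35}}},
    "d4": {"children": {"d5": {"eval": 15}}},
}}
backup(book)  # root is White to move; Black minimizes at the reply nodes
```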

--Jon

bob
Posts: 20392
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Bookbuilding 101

Post by bob » Mon Aug 06, 2012 4:38 pm

jdart wrote:This is basically correct but you need to start evaluating at the leaf nodes of the book and back up the scores, using the minimax algorithm. For a large book and a lot of nodes, that is a lot of CPU time.

--Jon
I did this in early versions of Crafty. CPU time for minimaxing is basically not an issue. The I/O, however, is expensive if you use a large book.

But the problem is in the minimaxing. You are choosing an opening line based on whatever info you get at the end of that line, minimaxed with all of the alternatives along that line. That caused me great grief and I gave up on the idea. You choose lines based on a position that might have resulted from really lousy play. I eventually tried to minimax based on GM analysis, but I found positions where "white is clearly winning, but black had a forced mate" and gave up on that as well.

I even tried doing a search at each book position to make sure there were no tactical errors along interior book paths. But shallow searches are confused by gambits, so you can't trust them, and long searches on a large book become computationally intractable. Another approach I have tried (with better success) is to create a book, use book learning, and play millions of games against a variety of opponents, letting the learning weed out the bad lines. But gambits are again a problem, since shallow searches won't understand the reason behind the gambit every time...

In short, most likely, hand-tuned books will still be the optimal solution for quite a while longer.

Harvey Williamson
Posts: 1763
Joined: Sun May 25, 2008 9:12 pm
Location: Media City, UK

Re: Bookbuilding 101

Post by Harvey Williamson » Mon Aug 06, 2012 4:48 pm

Dan Honeycutt wrote: (2) Use an engine to evaluate each book position for some specified time. Of course this could take days for a book of any size but, hey, it's a one-shot deal.


Best
Dan H.
You might find this will take years :)

Jim Ablett
Posts: 1327
Joined: Fri Jul 14, 2006 5:56 am
Location: London, England

Re: Bookbuilding 101

Post by Jim Ablett » Mon Aug 06, 2012 5:38 pm

The Bookbuilder program does something similar:

http://superchess.com/

Jim.

Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

Re: Bookbuilding 101

Post by Don » Mon Aug 06, 2012 5:48 pm

bob wrote:
jdart wrote:This is basically correct but you need to start evaluating at the leaf nodes of the book and back up the scores, using the minimax algorithm. For a large book and a lot of nodes, that is a lot of CPU time.

--Jon
I did this in early versions of Crafty. CPU time for minimaxing is basically not an issue. The I/O, however, is expensive if you use a large book.

But the problem is in the minimaxing. You are choosing an opening line based on whatever info you get at the end of that line, minimaxed with all of the alternatives along that line. That caused me great grief and I gave up on the idea. You choose lines based on a position that might have resulted from really lousy play. I eventually tried to minimax based on GM analysis, but I found positions where "white is clearly winning, but black had a forced mate" and gave up on that as well.

I even tried doing a search at each book position to make sure there were no tactical errors along interior book paths. But shallow searches are confused by gambits, so you can't trust them, and long searches on a large book become computationally intractable. Another approach I have tried (with better success) is to create a book, use book learning, and play millions of games against a variety of opponents, letting the learning weed out the bad lines. But gambits are again a problem, since shallow searches won't understand the reason behind the gambit every time...

In short, most likely, hand-tuned books will still be the optimal solution for quite a while longer.
Hi Bob,

I built this large book for automated testing, including every line played at least N times by human players in a large database of games collected from TWIC.

I have tens of thousands of games now of Komodo vs other strong programs. I can see from the statistics that Komodo should play certain moves and avoid others. However, the statistics are deceiving and I am looking for a general solution. Mini-maxing the results (by win percentage) may be part of the solution, but it has serious sample-size issues. You cannot go to some end node where only 5 games were played and mini-max a score back to the first move and expect it to be valid. From computer go with Monte Carlo it's clear that sample size trumps score in many cases, so it pays to view the score of a node as having something to do with the total result, not just the success of the best move. For example, if one move is played one time and is a win, you don't mini-max 100%.

In my data I have the situation where if Komodo plays 1. e4 it will win slightly more games. But if you look closely at the data you will see that Komodo does much better if the opponent plays 1... e5, and not as well if the opponent plays 1... c5, and the data is pretty conclusive that Komodo should really play 1. d4. In other words, 1. e4 looks good only because it has good success when the opponent plays 1... e5, and many of the games in the opening book had ...e5 as the response.

So the question, and I would like to get people's thoughts on this, is whether there is a graceful way to minimax these scores while taking into consideration evidence from sibling moves. MCTS solves this problem by controlling how often a move is played: the successful moves get played more often than the unsuccessful ones, but all moves still get some attention, and it's been proved that if you do this right it has a mini-max effect. But this is a testing book where each opening line is selected, not each move choice; we play all the lines, both sides, as is common for rating and testing chess programs. I have about 28,000 lines, which at least covers anything being played relatively frequently.

You can do things such as only trusting samples larger than 1000 games, but that is not very satisfying and probably not correct either. My sense is that the score of any given node needs to be some combination of the combined result of all its children AND the best response or responses, and this can be recursive.

Here is a canonical example:

At node N, move A is played 5 times and scores 70%;
move B is played 15 times and scores 60%.

What should the score of node N be? With mini-max we would believe 70; if we combine all results it would be 62.5; but my intuition is that it should be something between the two.
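One way to land "between the two" is a shrinkage estimate: pull the best move's score toward the aggregate of all children by an amount that shrinks as its sample grows. This is only a sketch; the constant k is a tunable assumption, not a derived value:

```python
def node_score(moves, k=10):
    """moves: {name: (games, win_rate)}. Blend the best move's rate with
    the aggregate rate of all children; small samples get pulled harder
    toward the aggregate."""
    total_games = sum(n for n, _ in moves.values())
    aggregate = sum(n * r for n, r in moves.values()) / total_games
    best_n, best_r = max(moves.values(), key=lambda t: t[1])
    return (best_n * best_r + k * aggregate) / (best_n + k)

# The example above: A played 5 times at 70%, B played 15 times at 60%.
# The aggregate is 62.5%; the blended score lands between 62.5 and 70.
s = node_score({"A": (5, 0.70), "B": (15, 0.60)})
```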
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.

Codesquid
Posts: 138
Joined: Tue Aug 23, 2011 8:25 pm
Location: Germany

Re: Bookbuilding 101

Post by Codesquid » Mon Aug 06, 2012 5:56 pm

Dan Honeycutt wrote:(1) Feed it a pgn file or files of decent quality to obtain moves and win/loss statistics for the various positions.
Standard approach that works well. I see a couple of interesting problems with it though:
  • A good book may need to be specific to whether you want to play human-human, human-engine or engine-engine games. In particular, there is no psychological component in engine-engine matches.
  • Unfortunately a pgn does not tell you why a given game ended the way it did, unless all games have been extensively analyzed and annotated before. Was the game won because of the opening or because of the opponent's suboptimal play (or even a blunder) at some point?
  • In regular over-the-board games, I've often seen a blunder by one side answered with another blunder by the other side, which failed to see the first blunder for what it was. Engines, on the other hand, don't have this problem.
  • Potential copyright issues with game collections. I don't think chess games themselves can be copyrighted, but a specific collection of games might be copyrightable.
Dan Honeycutt wrote:(2) Use an engine to evaluate each book position for some specified time. Of course this could take days for a book of any size but, hey, it's a one-shot deal.
I'm doing this with Octochess and it seems to work quite well for me. It takes a lot of time though: 100% CPU for 8 months now and still counting. My book is limited to lines of 20 plies, and that's already pushing it; getting a successful book hit 20 plies into a game is a very rare thing.

To decide which lines to take into my book, I combine the following:
  • Actual games the engine plays against itself, other engines and human players.
  • A limited amount of games from top grandmasters. Limited mainly due to the sheer volume.
  • From known lines, branch off into more promising looking lines according to Octochess' evaluation.
Currently I'm at 40k positions in the book.

To give you an example line:
1. e4 e6 2. d4 d5 3. exd5 exd5 4. Nf3 Nf6 5. Bd3 Be7 6. c4 Bg4 7. cxd5 Nxd5 8. Nc3 Nc6 9. O-O O-O 10. h3 Bxf3 11. Qxf3
With perfect play from both sides, according to known lines evaluated by Octochess' own evaluation, this is the best line, with a 17-centipawn advantage for White.
Dan Honeycutt wrote:(3) Assign a score to each move based on the win/loss percentage and the engine evaluation. Popularity could also be a factor. The move score would determine the probability that a move is selected.
A very interesting idea. Finding a good formula to combine both win/loss and evaluation will be the most challenging part.

Only question is, will it combine the best of both worlds, or the worst? :)
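For what it's worth, one hypothetical recipe (every constant below is an assumption, not a tested value): map the centipawn eval to an expected score with a logistic curve, then blend it with the observed result percentage, trusting the statistics more as the sample grows:

```python
def expected_score(cp, scale=400):
    """Logistic map from a centipawn eval to an expected score in [0, 1];
    scale=400 mimics the Elo curve and is an arbitrary choice here."""
    return 1.0 / (1.0 + 10 ** (-cp / scale))

def combined_score(games, wins, draws, eval_cp, k=50):
    """With few games, trust the engine; with many, trust the results.
    k controls the crossover point and would need tuning."""
    stat = (wins + 0.5 * draws) / games  # observed score percentage
    w = games / (games + k)              # weight on the statistics
    return w * stat + (1 - w) * expected_score(eval_cp)

s = combined_score(games=200, wins=90, draws=60, eval_cp=17)
```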
nanos gigantium humeris insidentes

Richard Allbert
Posts: 766
Joined: Wed Jul 19, 2006 7:58 am

Re: Bookbuilding 101

Post by Richard Allbert » Mon Aug 06, 2012 6:03 pm

I agree with Harvey: evaluating each position is going to make the whole thing take a long time.

I wrote a bookmaker for Jabba. What I did was hunt down all the Playchess engine-room pgn files I could find, since in the engine room there you basically have the top engines playing against themselves with different books.

I found the same thing Bob wrote about: the I/O is the bottleneck.

I then computed wins, losses and number of games, and simply wrote the positions into the book according to thresholds for win rate and number of games.
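That threshold filter is simple to sketch (the cutoffs here are made-up values). Note that frequency alone can still let an often-repeated blunder through:

```python
def filter_book(stats, min_games=10, min_score=0.45):
    """Keep only positions with enough games and an acceptable score.
    stats: {position_key: (wins, draws, losses)}."""
    book = {}
    for key, (w, d, l) in stats.items():
        games = w + d + l
        if games >= min_games and (w + 0.5 * d) / games >= min_score:
            book[key] = (w, d, l)
    return book

book = filter_book({
    "posA": (6, 3, 1),   # 10 games, scores 75% - kept
    "posB": (1, 0, 0),   # too few games - dropped
    "posC": (2, 2, 8),   # 12 games, scores 25% - dropped
})
```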

The problem is that Jabba has bigger weaknesses than the book does, so I'm not sure whether the result was a good book or not :)

Another method could be to take the 150 ECO positions (there was an epd file posted here with them), which run to about move 6 or so, play very fast matches with good engines from these positions, and build a book from the result.

Initially, I made the mistake of using a massive human games DB, unchecked, assuming that the bad moves would be filtered out by the "add to book if > X games played" rule, and watched in horror as Jabba blundered a piece in the opening - a blunder that had apparently occurred enough times to make it into the book! :) :)

Good luck

Richard

Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

Re: Bookbuilding 101

Post by Don » Mon Aug 06, 2012 6:16 pm

Richard Allbert wrote:I agree with Harvey, the evaluation of each position is going to make the whole thing last a long time.

I wrote a bookmaker for Jabba - what I did was hunt down all the Playchess Engine room pgn files I could find - as in the engine room there you basically have the top engines playing vs themselves with different books.

I found out here what Bob wrote - the IO is the bottleneck.

I then computed win / loss / no. games, and simply wrote the positions in the book according to thresholds for win and number of games.

Problem is Jabba has larger weaknesses than the book, so I'm not sure if the result was a good book or not :)

Another method could be to take 150 ECO positions (there was an epd file posted here with them) to about move 6 or so, and play very fast matches with good engines from these positions, and build a book from the result.

Initially, I made the mistake of using a massive human games DB, unchecked, assuming that the bad moves would be filtered out by the "add to book if > X games played" rule, and watched in horror as Jabba blundered a piece in the opening - a blunder that had apparently occurred enough times to make it into the book! :) :)

Good luck

Richard
In the book I talked about, I was horrified to see that 1. h4 made it into the book. It had been played at least 10 times. The TWIC database has mostly games by strong players, but it also includes important tournaments that can have very weak players in them, such as tournaments for the very young (where the strongest players may not even be masters and the weakest may be quite weak, or even beginners). I did not filter by Elo.

I manually removed 1. h4 but didn't go beyond that. Most of the blunders are probably in the first 3 or 4 plies, but I'm sure there are a few beyond that which still survive the "at least 10 times" frequency test.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.

Post Reply