I think you should decide for yourself what you should be doing.

Dan Honeycutt wrote:
Hi All,
I'm working on a book making utility. My general idea is:
(1) Feed it one or more PGN files of decent quality to obtain moves and win/loss statistics for the various positions.
(2) Use an engine to evaluate each book position for some specified time. Of course this could take days for a book of any size but, hey, it's a one-shot deal.
(3) Assign a score to each move based on the win/loss percentage and the engine evaluation. Popularity could also be a factor. The move score would determine the probability that a move is selected.
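A minimal sketch of step (3): blend the statistics and the engine evaluation into one score, then pick moves with probability proportional to that score. The weights and the eval squashing here are purely illustrative placeholders, not a recommendation:

```python
import random

# Illustrative weights -- placeholders, to be tuned.
W_STATS, W_EVAL, W_POP = 0.5, 0.4, 0.1

def move_score(win_pct, engine_eval_pawns, popularity):
    """Blend win/loss statistics, engine eval and popularity into one score."""
    # Squash the engine eval (in pawns) into [0, 1] so the units are comparable.
    eval01 = 0.5 + max(-0.5, min(0.5, engine_eval_pawns / 4.0))
    return W_STATS * win_pct + W_EVAL * eval01 + W_POP * popularity

def pick_book_move(candidates):
    """candidates: list of (move, win_pct, eval_pawns, popularity).
    Selection probability is proportional to the move score."""
    scores = [move_score(w, e, p) for _, w, e, p in candidates]
    r = random.uniform(0, sum(scores))
    for (move, *_), s in zip(candidates, scores):
        r -= s
        if r <= 0:
            return move
    return candidates[-1][0]   # guard against floating-point leftovers
```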
I don't know squat about book making. Does what I'm doing make sense? What else should I be doing? Any and all comments are welcome.
I'm storing the book as a CSV file: one line per position+move, plus additional data such as distance from root and path errors.
On Unix you can do a fast lookup with the 'look' command ('look -b' on Ubuntu). It is faster than the 0.1-second minmove time of the servers anyway.
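The same trick works in a few lines of Python: binary-search the sort(1)-sorted book file by byte offset, roughly what 'look -b' does. The CSV layout below is made up for illustration, not the actual book format:

```python
import os

def book_lookup(path, key):
    """Return all lines of a sorted text file that start with `key`,
    by binary search over byte offsets (roughly what `look -b` does).
    Assumes the file was sorted bytewise, e.g. with LC_ALL=C sort."""
    kb = key.encode()
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        lo, hi = 0, size
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid)
            f.readline()                 # advance to the next full line
            line = f.readline()
            if not line or line >= kb:   # first line >= key lies at/left of mid
                hi = mid
            else:
                lo = mid + 1
        f.seek(lo)
        if lo:
            f.readline()                 # skip the partial line at offset lo
        matches = []
        for raw in f:                    # short forward scan collects the hits
            if raw.startswith(kb):
                matches.append(raw.decode().rstrip("\n"))
            elif raw > kb:
                break
        return matches
```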
During book updating I load the graph into memory. Memory is cheap. My book contains approximately 2.5M positions.
I started with the 1M most frequently occurring positions from PGN collections. Each position is evaluated by a deep search. Moves that lead to other nodes in the graph are excluded from this search; otherwise you can't back-propagate the values properly.
The graph is minimaxed, and path errors are calculated for each side.
(A path error is the cumulative error from the root to a node. It is useful to keep them separated by color. If many paths lead to the same position, you can normalize on the smallest combined error.)
I play book moves as long as the path error for the engine doesn't exceed 0.1 pawns. All positions in the graph where one side's path error stays within this 0.1-pawn budget are considered 'repertoire': positions that can occur in games.
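A toy sketch of the minimaxing and the per-color path errors. The graph, the evals and all names are made up for illustration, and the sketch assumes a tree (with converging paths you would keep the minimum combined error over all paths, as noted above):

```python
from functools import lru_cache

# Hypothetical toy graph: node -> list of (move, child).
# Leaf evals in pawns, always from White's point of view.
GRAPH = {
    "root": [("e4", "A"), ("d4", "B")],
    "A": [("e5", "A1"), ("c5", "A2")],
    "B": [], "A1": [], "A2": [],
}
EVAL = {"B": 0.10, "A1": 0.20, "A2": 0.35}

@lru_cache(maxsize=None)
def minimax(node, white_to_move):
    """Back-propagate the leaf evals through the graph (White maximizes)."""
    kids = GRAPH.get(node, [])
    if not kids:
        return EVAL[node]
    vals = [minimax(c, not white_to_move) for _, c in kids]
    return max(vals) if white_to_move else min(vals)

def path_errors(node, white_to_move, err=(0.0, 0.0), out=None):
    """err = (white_err, black_err): cumulative loss versus the minimaxed
    value along the path from the root, kept separate per color."""
    if out is None:
        out = {}
    out[node] = err
    best = minimax(node, white_to_move)
    for _, child in GRAPH.get(node, []):
        loss = abs(minimax(child, not white_to_move) - best)
        e = (err[0] + loss, err[1]) if white_to_move else (err[0], err[1] + loss)
        path_errors(child, not white_to_move, e, out)
    return out

errs = path_errors("root", True)
# 'Repertoire' for White: positions reachable while White's path error <= 0.1
repertoire = {n for n, (we, _) in errs.items() if we <= 0.1}
```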
I extend the book continuously with:
1. Actually played lines, up to the point where the program got into a lost position
2. Lines from general PGN games where one side plays the computer's repertoire and ends up in a bad position. These games are filtered for blunders.
3. Drop-out expansion from repertoire nodes, with the provision that I extend one move at a time (not all of them at once, as in classical drop-out expansion)
4. Moves played in PGN games more than 3 times from repertoire positions.
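Step 4 might look like this in outline. The game representation and the position keys are assumptions for illustration (the real book keys and thresholds live elsewhere):

```python
from collections import Counter

def extension_candidates(games, repertoire, min_count=4):
    """Collect (position, move) pairs played more than 3 times from
    repertoire positions. `games` is a list of games, each a list of
    (position_key, move) pairs; position keys are assumed to be
    FEN-like strings -- illustrative, not the actual book format."""
    counts = Counter()
    for game in games:
        for pos, move in game:
            if pos in repertoire:
                counts[(pos, move)] += 1
    return [pm for pm, n in counts.items() if n >= min_count]
```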
Note that in none of these steps do I ever use the PGN "Result" tags; I always rely on the engine's own judgment.
The book grown this way is needed to dampen the engine's eagerness to play into gambits. Without a book it does that too easily; with deeper analysis and experience, such lines get weeded out.
This updating is a continuous effort. The machine that plays on FICS and ICC runs background processes for it. Since I don't want these processes to take any CPU time away from the engine while it is playing, I schedule them with 'idprio' (the counterpart of 'rtprio'), which unfortunately is not available on Linux.
I have not observed any I/O issues. The whole effort is nicely CPU bound.