Symbolic has a makebook command which takes four parameters:
1) Name of PGN input file
2) Name of book output file
3) Maximum variation ply length
4) Minimum game count for a position/move to appear
The variation length is usually set to 64, but a smaller figure like 32 or 40 would likely be okay and might even be better.
The minimum game count is the lower bound on the number of games in which a position/move pair must appear for the pair to be included in the book. Larger values mean smaller books and better safety from noise. Smaller values mean more knowledge in the book. Sixteen is a good value here for large or poorly edited PGN files.
Symbolic chugs through the input PGN file one game at a time. A game is used for the book only if the game started normally and had a definite conclusion. A valid game is played for the maximum variation length or to its end, whichever comes first. For each position and its immediately following move, the move and the 128 bit position signature is stored in a big fat binary tree with the signature as the key value in each node. Also in each node is a WLD (Win/Lose/Draw) counter vector; this vector gets bumped in the slot corresponding to the game's outcome.
All data is normalized from a White-To-Move standpoint before tree insertion/adjustment.
When the scan of the input PGN file is complete, Symbolic then writes the big fat binary tree to the book output file in signature order from low to high. Only those nodes with a WLD total greater than or equal to the minimum game count are output. All data is written in binary format and in small endian order: 4 bytes for the move, 12 bytes for the WLD, and 16 bytes for the signature for a total of 32 bytes. This makes the book file portable across different hosts.
To probe the book, a simple binary search is performed to find a matching position signature. If one is found, then the program backs up its file read pointer 32 bytes at a time to the very first position/move record match for the position being probed. Then the program advances to the last matching record, collecting the moves and their WLD statistics. The result is then fed to whatever routine which asked for the probe. Note that appropriate Side-To-Move mapping of data in and out is required if Black is on the move.
Because the book file is not very large compared to the amount of memory available, at program start time the entire book is loaded into memory to enable very fast probing. If needed, the program can switch books or have multiple books.
Sample log excerpt:
Code: Select all
2015-07-12 05:25:50.995 = C'tor: BookEngine
2015-07-12 05:25:50.995 = Loading opening book from file: Book
2015-07-12 05:25:50.995 = Book file byte count: 7,261,184
2015-07-12 05:25:50.995 = Book file move count: 226,912
Here is a list of the book files in use. Each file was built form the same PGN input but with a different minimum game count parameter.
Code: Select all
232 -rw-r--r--@ 1 sje staff 116896 Sep 28 2012 mondo1024.book
456 -rw-r--r--@ 1 sje staff 232288 Sep 28 2012 mondo512.book
904 -rw-r--r--@ 1 sje staff 462784 Sep 28 2012 mondo256.book
1776 -rw-r--r--@ 1 sje staff 907520 Sep 28 2012 mondo128.book
3496 -rw-r--r--@ 1 sje staff 1787360 Sep 28 2012 mondo64.book
6976 -rw-r--r--@ 1 sje staff 3568032 Sep 28 2012 mondo32.book
14192 -rw-r--r--@ 1 sje staff 7261184 Jul 1 17:49 mondo16.book
30104 -rw-r--r--@ 1 sje staff 15410656 Sep 28 2012 mondo8.book
72656 -rw-r--r--@ 1 sje staff 37198208 Sep 28 2012 mondo4.book
1) Building the book can eat a lot of memory for storing the tree.
2) It would probably be safe to use 64 bit signatures instead of 128 bit signatures.
3) Stored moves could be packed into 16 bits instead of 32, but as with signatures, I prefer that the internal and external data formats match.
4) I've tried many alternatives over the years, and this is the fastest and simplest.