Bookbuilding 101

marcelk · Post by **marcelk** » Tue Aug 07, 2012 7:09 pm

Dan Honeycutt wrote:Hi All,

I'm working on a book making utility. My general idea is:

(1) Feed it a pgn file or files of decent quality to obtain moves and win/loss statistics for the various positions.

(2) Use an engine to evaluate each book position for some specified time. Of course this could take days for a book of any size but, hey, it's a one-shot deal.

(3) Assign a score to each move based on the win/loss percentage and the engine evaluation. Popularity could also be a factor. The move score would determine the probability that a move is selected.

I don't know squat about book making. Does what I'm doing make sense? What else should I be doing? Any and all comments are welcome.

Best
Dan H.
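Step (3) of the plan quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions: the logistic mapping from evaluation to expected score, the 50/50 blend, and the logarithmic popularity bonus are all hypothetical choices, not something proposed in the thread.

```python
import math
import random

def expected_score(eval_pawns, scale=1.0):
    """Map an engine evaluation (pawns, mover's view) to a value in (0, 1).
    The logistic shape and the scale are assumptions for illustration."""
    return 1.0 / (1.0 + math.exp(-eval_pawns / scale))

def move_probabilities(moves, eval_weight=0.5):
    """moves: list of dicts with keys move, wins, draws, losses, eval.
    Returns a dict mapping each move to its selection probability."""
    weights = {}
    for m in moves:
        games = m["wins"] + m["draws"] + m["losses"]
        # Empirical score from the game statistics (0.5 if no games).
        stats = (m["wins"] + 0.5 * m["draws"]) / games if games else 0.5
        score = eval_weight * expected_score(m["eval"]) + (1 - eval_weight) * stats
        # Hypothetical popularity factor: grows slowly with the game count.
        weights[m["move"]] = score * math.log(2 + games)
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}

def pick_move(probs, rng=random):
    """Choose a move at random according to its probability."""
    return rng.choices(list(probs), weights=list(probs.values()))[0]
```

The weights would need tuning against real results; the point is only that the three inputs can be folded into one number per move and then normalized.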

I think you should decide for yourself what you should be doing.

I'm storing the book as a CSV file: one line per position and move, plus additional data such as distance from root and path errors.

On Unix you can do a fast lookup with the 'look' command (on Ubuntu, 'look -b'). It is faster than the 0.1-second minimum move time of the servers anyway.
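For anyone without 'look', the same kind of lookup can be sketched in Python as a binary search over byte offsets in the sorted file. This is a minimal sketch, assuming the CSV is sorted bytewise (for example with LC_ALL=C sort) and that each line starts with the position key; the function names are made up for the example.

```python
import os

def _line_at(f, pos):
    """Return the first complete line that starts at byte offset >= pos."""
    if pos == 0:
        f.seek(0)
    else:
        f.seek(pos - 1)
        f.readline()  # skip the rest of the line containing byte pos-1
    return f.readline()

def look(path, key):
    """Return all lines of a bytewise-sorted file that start with key."""
    prefix = key.encode()
    n = len(prefix)
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        # Find the smallest offset whose following line is >= the key.
        while lo < hi:
            mid = (lo + hi) // 2
            line = _line_at(f, mid)
            if line == b"" or line[:n] >= prefix:
                hi = mid
            else:
                lo = mid + 1
        # Collect the run of matching lines starting there.
        matches = []
        line = _line_at(f, lo)
        while line and line.startswith(prefix):
            matches.append(line.decode().rstrip("\n"))
            line = f.readline()
        return matches
```

Including the field separator in the key, as in look("book.csv", key + ","), keeps one key from matching a longer key that merely begins with it. The file name here is hypothetical.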

During book updating I load the graph into memory. Memory is cheap. My book contains approximately 2.5M positions.

I have started with the 1M most frequently occurring positions from PGN collections. Each position is evaluated by a deep search. Moves that lead to other nodes in the graph are excluded from this search, otherwise you can't back propagate the values properly.

The graph is mini-maxed, and path errors are calculated for each side.
(A path error is the cumulative error from root to node. It is useful to keep them separated by color. If there are many paths leading to the same position, you can normalize on the smallest combined error.)

I play moves as long as the path error for the engine doesn't exceed 0.1 pawns. All positions in the graph where one side's path error is below this 0.1-pawn limit are considered 'repertoire': positions that can occur in games.
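The mini-max, path-error and repertoire steps can be sketched as below. This is a rough sketch under several assumptions that are not from the post: the graph is acyclic, the root has White to move, scores are in pawns from the side to move's point of view, and static_eval holds each node's search result with the in-book moves excluded. All names are hypothetical.

```python
import heapq

def minimax(graph, static_eval, root):
    """Negamax over an acyclic book graph.
    graph[node] maps a move to the child it reaches."""
    value = {}
    def visit(node):
        if node not in value:
            best = static_eval[node]  # best move that leaves the book
            for child in graph.get(node, {}).values():
                best = max(best, -visit(child))
            value[node] = best
        return value[node]
    visit(root)
    return value

def path_errors(graph, value, root):
    """Cumulative (white_error, black_error) from root to each node,
    taking the path with the smallest combined error (Dijkstra)."""
    best = {}
    heap = [(0.0, 0.0, 0.0, 0, root)]  # combined, white, black, ply, node
    while heap:
        _, w, b, ply, node = heapq.heappop(heap)
        if node in best:
            continue
        best[node] = (w, b)
        for child in graph.get(node, {}).values():
            # What the mover gives up compared with the best choice here.
            loss = value[node] + value[child]
            nw, nb = (w + loss, b) if ply % 2 == 0 else (w, b + loss)
            heapq.heappush(heap, (nw + nb, nw, nb, ply + 1, child))
    return best

def repertoire(errors, limit=0.1):
    """Nodes that at least one side can reach within the error limit."""
    return {n for n, (w, b) in errors.items() if w <= limit or b <= limit}
```

A node where both sides have already exceeded the limit drops out of the repertoire, because neither color would steer into it on its own.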

I extend the book continuously with:
1. Actually played lines until the program got into a lost position
2. Lines from general PGN games where one side plays the computer's repertoire and ends up in a bad position. These games are filtered for blunders.
3. Drop-out expansion from repertoire nodes, with the provision that I extend one move at a time (not all of them as in classical drop-out expansion)
4. Moves played in PGN games more than 3 times from repertoire positions.

Note that in none of these steps do I ever use the PGN "Result" tags; I always use the engine's own judgment.

The book grown this way is needed to dampen the engine's eagerness to play into gambits. Without a book it will do that too easily. With deeper analysis and experience, such lines get weeded out.

This updating is a continuous effort. The machine which plays on FICS and ICC has background processes for this. Since I don't want these processes to take any CPU time away from when the engine is playing, I schedule them with 'idprio' (the counterpart of 'rtprio'), which unfortunately is not available on Linux.
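As a rough illustration of launching such a background job at idle priority, the sketch below picks a command prefix per platform. The 'idprio 31' form is the FreeBSD usage; the Linux branch using 'chrt --idle 0' (the SCHED_IDLE policy from util-linux) and the 'nice' fallback are assumptions of this sketch, not something from the post, and they are not exact equivalents of idprio. The job name is hypothetical.

```python
import sys

def idle_command(cmd, platform=sys.platform):
    """Prefix cmd so that it only runs when the CPU is otherwise idle."""
    if platform.startswith("freebsd"):
        return ["idprio", "31"] + cmd          # lowest idle priority
    if platform.startswith("linux"):
        return ["chrt", "--idle", "0"] + cmd   # SCHED_IDLE, an approximation
    return ["nice", "-n", "19"] + cmd          # weaker fallback

# Example: subprocess.Popen(idle_command(["./update_book"]))
```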

I have not observed any I/O issues. The whole effort is nicely CPU bound.

jefk · Post by **jefk** » Wed Aug 08, 2012 7:06 pm

Jim Ablett wrote:Bookbuilder program does something similar >
http://superchess.com/
Jim.

yes that's me, but what i found out over the years is that GMs have a large amount of theoretical opening knowledge, and in computer chess we are only approaching that, so you have to get that first.

and then you can think about improvements.
so it's basically a lot of manual work, fine-tuning, and so on.
books like the Quality Chess GM Repertoire series, for example,
are quite good. if you want to reinvent the wheel by letting
the computer rediscover chess opening theory, it will
cost you a lot of time, and i know that because i did it.

but the end results are better than when you only let
an engine such as Crafty play its book-learned moves, which
are still lousy in many cases; eg 1.e4 d5?
scandinavian, 1-0.

of course a Chessbase format book like the hiarcs14 book is
pretty good, but it's in the Chessbase format and not ideal.

the best way to have access to the constantly evolving chess theory is a database which some GMs and correspondence chess masters are
willing to update. the New in Chess guys probably do something like
that for themselves, and at convekta.com as well i suppose.

but obviously in human chess most GMs like to keep their
ideas private and would not contribute to such a database.

currently i consider my chesspartner userbook one of the
best computer chess books; it had many inputs from an American
IM, but the book is still private, although i do sometimes think about
how to release it. but i won't just simply give it away.

first plan is writing an (e-)book about opening theory, almost
finished, the advanced stage is costing me headaches,
and then maybe a paper book.

problem with my CP book is that it's only for the CP interface
and not for other interfaces such as Arena, Chessbase and so on.

and yes i did some book tests at slow time controls on a
Quad and the results were better than Canbaz's perfect2012 book.
(he scored about 53%, my book scored about 55%).

oh PS and for evaluating some of the end(game) nodes i
have tuned some parameters of Houdini, as i still haven't
got Komodo5 mp, still waiting for that

jef

jdart · Post by **jdart** » Fri Aug 10, 2012 4:09 am

jefk wrote:Crafty play its booklearned moves, which still are lousy in many cases; eg 1.e4 d5? scandinavian, 1-0.

It is not so simple as that. Crafty does play e4 d5, but it is selecting reasonable lines and not doing too badly. Here are a couple examples:

[Event "?"]
[Site "chessclub.com"]
[Date "2012.07.26"]
[Round "?"]
[White "Arasan 15.0"]
[Black "crafty"]
[Result "1/2-1/2"]
[ECO "B01"]
[WhiteElo "2489"]
[BlackElo "2611"]
[TimeControl "900+3"]

1. e4 d5 2. exd5 Qxd5 3. Nc3 Qa5 4. d4 c6 5. Bc4 Bf5 6. Nf3 Nf6
7. Bd2 e6 8. Qe2 Bb4 9. O-O-O Nbd7 10. a3 Bxc3 11. Bxc3 Qc7 12. Ne5
b5 13. Bb3 Be4 14. Bb4 Bd5 15. Nxd7 Qxd7 16. Qe5 Bxb3 17. cxb3 a5
18. Bc5 Qb7 19. Qd6 Ne4 20. Qf4 Nf6 21. Kb1 O-O-O 22. Rc1 Qc7 23. Qf3
Rd5 24. a4 b4 25. Rc2 Re8 26. Rhc1 Kb7 27. h3 Rf5 28. Qd3 Nd7 29. g4
Nxc5 30. Rxc5 Rxc5 31. Rxc5 g6 32. Rb5+ Ka7 33. Rc5 Rd8 34. Qb5 Rd6
35. g5 Qb6 36. Qxa5+ Qxa5 37. Rxa5+ Kb6 38. Rc5 Rxd4 39. Rc4 e5
40. Kc2 c5 41. Rxd4 exd4 42. Kd3 Kc6 43. Kc4 Kb7 44. f3 Kc6 45. Kd3
Kd5 46. h4 Kc6 1/2-1/2 {Game drawn by mutual agreement}

[Event "?"]
[Site "chessclub.com"]
[Date "2012.08.01"]
[Round "?"]
[White "Arasan 14.3"]
[Black "crafty"]
[Result "1/2-1/2"]
[ECO "B01"]
[WhiteElo "2709"]
[BlackElo "2672"]
[TimeControl "300+4"]

1. e4 d5 2. exd5 Qxd5 3. Nc3 Qa5 4. d4 c6 5. Nf3 Nf6 6. Bd2 Bf5
7. Bc4 e6 8. Qe2 Bb4 9. Ne5 Nbd7 10. O-O-O Nxe5 11. dxe5 Nd5 12. Ne4
Bxe4 13. Qxe4 Bxd2+ 14. Rxd2 Nb6 15. Bb3 Nd7 16. f4 O-O-O 17. Rd6 Nc5
18. Qe3 Nxb3+ 19. Qxb3 g5 20. g3 gxf4 21. gxf4 Rxd6 22. exd6 Qd5
23. Qxd5 cxd5 24. Rg1 Rf8 25. Rg7 Kd7 26. Kd2 Kxd6 27. Rxh7 Ke7
28. Rh3 Rg8 29. Rg3 Rh8 30. h3 Rh4 31. Ke3 b5 32. a3 Kd6 33. Rf3 f6
34. Kf2 Rh8 35. Kg3 Rg8+ 36. Kf2 a5 37. c3 Rh8 38. b4 a4 39. Kg2 Rh6
40. Kg3 e5 41. fxe5+ fxe5 42. Rf8 Rg6+ 43. Kf2 e4 44. Rd8+ Ke6
45. Re8+ Kd7 46. Rb8 Rf6+ 47. Ke1 Kc6 48. Rc8+ Kd6 49. Rc5 Rh6
50. Rxb5 Rxh3 51. Kd2 d4 52. c4 Rd3+ 53. Ke2 Rxa3 54. Rd5+ Ke6
55. Rxd4 Ke5 56. Rd1 Rb3 57. b5 a3 58. Rd7 Rb1 59. Ra7 Rb2+ 60. Ke3
Rb3+ 61. Kf2 Kd4 62. b6 Rb2+ 63. Kf1 Kxc4 64. Rxa3 Kd4 65. b7 Rxb7
66. Ra4+ Ke3 67. Ra3+ Kf4 68. Kg2 Rb2+ 69. Kf1 Rc2 70. Ke1 Rh2
71. Rb3 Rh6 72. Ke2 Rh5 73. Kf2 Rd5 74. Ke2 Ra5 75. Rb2 Ra3 76. Rc2
Rh3 77. Kf1 Kf3 78. Rf2+ Ke3 79. Re2+ Kd3 80. Ra2 Rh1+ 81. Kg2 Rd1
82. Ra3+ Ke2 83. Ra2+ Ke1 84. Ra3 Rd2+ 85. Kg1 Rd3 86. Ra1+ Ke2
87. Kg2 e3 88. Kg3 Rc3 89. Kg2 Rd3 90. Kg3 Rc3 91. Kg2 Rc6 92. Ra2+
Kd3 93. Ra3+ Rc3 94. Ra1 Ke2 95. Ra2+ Kd1 96. Kf1 Rc4 97. Ra1+ Kd2
98. Ra2+ 1/2-1/2 {Game drawn by mutual agreement}

Ozymandias · Post by **Ozymandias** » Fri Aug 10, 2012 12:17 pm

"it takes a PGN as input and outputs a subset (the valid PGNs) to stdout"
I only see the subset in the command line box, no pgn created that I could find.

Norm Pollock · Post by **Norm Pollock** » Fri Aug 10, 2012 2:50 pm

jdart wrote:It is one of several utility programs I don't distribute with the Arasan source. But I don't mind if you use it.

I have uploaded a Windows executable to

http://www.arasanchess.org/playchess2.exe

It is a filter so it takes a PGN as input and outputs a subset (the valid PGNs) to stdout. It can handle comments but not annotated games with variations.

Caveat: its eval is only as good as the Arasan search engine that is embedded in it. Also the margins for considering a game valid are hard coded.

Jon-

Thanks Jon. With a few adjustments this program can be a very valuable tool. There are many games in pgn databases with incorrect results recorded. I base this assumption on the number of games I found with incorrect results where the result is a checkmate or stalemate. A program named "joined" by Andreas Stabel is able to find such games with faux results. But there is no similar tool for games that end by mutual agreement.

What this tool needs most is the additional output of games that are considered "suspect". Right now it only outputs games that seem OK.

-Norm

Ozymandias · Post by **Ozymandias** » Fri Aug 10, 2012 3:13 pm

"Right now it only outputs games that seem OK"
Hi Norm,
Since you're an expert in PGN tools, it's no wonder you found such games. Could you tell me where?

jdart · Post by **jdart** » Fri Aug 10, 2012 4:21 pm

To save the output you need to redirect to a file, for example:

playchess2 input.pgn > output.pgn

--Jon

jefk · Post by **jefk** » Fri Aug 10, 2012 5:46 pm

jdart wrote:
jefk wrote:Crafty play its booklearned moves, which still are lousy in many cases; eg 1.e4 d5? scandinavian, 1-0.
It is not so simple as that. Crafty does play e4 d5, but it is selecting reasonable lines and not doing too badly. Here are a couple examples:
{Game drawn by mutual agreement}

well i know, it was a bit ironic. the Scandinavian is a well established defense,
and it might score well against humans. but with houdini and some sound lines i didn't have many problems with it. so i still prefer the sicilian or french against e4.

Besides that, the Crafty book also seems to run out for White after 1.d4 Nf6; the engine then starts thinking for itself and comes up with 2.Nf3 or 2.c4 after a long time. they should fix that i think.

but currently, since August 3 or so, Crafty isn't online anymore on the ICC.

hope they come back

jef

Norm Pollock · Post by **Norm Pollock** » Fri Aug 10, 2012 5:48 pm

Ozymandias wrote:"Right now it only outputs games that seem OK"
Hi Norm,
Since you're an expert in PGN tools, it's no wonder you found such games. Could you tell me where?

I will only say that the dbs on my page DO NOT have incorrect results in games ending in stalemate or checkmate. To see if other dbs have incorrect results from checkmate/stalemate, use the command

joined -q -p -v126 inputfile.pgn

You can get "joined" from Jim Ablett's site.

If you get too much output due to Chessbase's proprietary interpretation of the PGN Standard, in particular the disambiguation rule, you could first "treat" your db with this command:

pgn-extract -s -C -N -V -oout.pgn inputfile.pgn

then

joined -q -p -v126 out.pgn

Norm Pollock · Post by **Norm Pollock** » Fri Aug 10, 2012 5:50 pm

jdart wrote:To save the output you need to redirect to a file, for example:

playchess2 input.pgn > output.pgn

--Jon

I did realize that. But what I am asking for is a way to capture the games that were excluded. Namely, the games that seem to have an incorrect result based on the evaluation of the final position.

-Norm
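Until the tool itself can emit the excluded games, one workaround is to diff the input against the filtered output. The sketch below is only that, and it rests on an unverified assumption: that the filter writes each kept game with the same tags and movetext as the input, apart from whitespace. If the tool rewrites the games, the comparison key would need to change.

```python
import re

def split_games(pgn_text):
    """Split PGN text into games; each game is assumed to begin
    with an [Event tag at the start of a line."""
    parts = re.split(r"(?m)^(?=\[Event )", pgn_text)
    return [p.strip() for p in parts if p.strip()]

def game_key(game):
    """Whitespace-insensitive key for comparing two copies of a game."""
    return " ".join(game.split())

def excluded_games(input_text, output_text):
    """Games present in the input but missing from the filtered output."""
    kept = {game_key(g) for g in split_games(output_text)}
    return [g for g in split_games(input_text) if game_key(g) not in kept]
```

After running playchess2 input.pgn > output.pgn, reading both files and writing the result of excluded_games joined by blank lines to a third file gives the "suspect" set.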
