creating an opening book

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

flok

creating an opening book

Post by flok »

Hi,

I'm currently building an opening book. I'll describe my approach and I would appreciate your feedback.

First I store all unique FEN strings for the first 7 plies (starting from the start position), including quiescence nodes (attack + promotion moves).
I use the FEN string because with Zobrist hashing I get 14% collisions for 2.5 million nodes.
After that I iterate through that list of FEN strings and calculate the score for each position with a search depth of 15 plies (about 1 s each).

Then, when my program wants to probe the book, I generate the FEN string for each move and find the position (and thus the move) with the best (highest) score.
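In rough Python (using the python-chess package just for move generation; the book here is simply a dict from the first four FEN fields to a score), the probe step would look something like this:

Code: Select all

import chess

def probe_book(board, book):
    best_move, best_score = None, None
    for move in board.legal_moves:
        board.push(move)
        key = " ".join(board.fen().split(" ")[:4])   # drop the move counters
        board.pop()
        score = book.get(key)
        if score is None:
            continue
        # depending on the sign convention of the stored scores you may need
        # to negate them here, since they describe the position after the move
        if best_score is None or score > best_score:
            best_move, best_score = move, score
    return best_move, best_score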
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: creating an opening book

Post by stegemma »

flok wrote:Hi,

I'm currently building an opening book. I'll describe my approach and I would appreciate your feedback.

First I store all unique FEN strings for the first 7 plies (starting from the start position), including quiescence nodes (attack + promotion moves).
I use the FEN string because with Zobrist hashing I get 14% collisions for 2.5 million nodes.
After that I iterate through that list of FEN strings and calculate the score for each position with a search depth of 15 plies (about 1 s each).

Then, when my program wants to probe the book, I generate the FEN string for each move and find the position (and thus the move) with the best (highest) score.
To save space and avoid collisions you could compress the FEN string in different ways; the simplest is just removing the slashes "/", but a binary form could be better. There are a lot of past threads about the shortest way to encode a position. Just removing the slashes, and maybe the first space after the board field, would save 8 bytes per position. The castling flags could be replaced by a single character, for example the hexadecimal value of:

KQkq
0000 (one binary digit per castling right)

0 = no castling
1 = q castling
2 = k castling
3 = kq castling
...
F = KQkq

That saves more than 9 bytes. In a similar way, you can save another byte (when applicable) for en passant, numbering the sixteen possible en passant squares from 0 to F.

Those "compression" is zero cost, because you still would have to build the FEN string at runtime and this way you just have to build a shorter string, that would be almost human readable.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: creating an opening book

Post by Ferdy »

flok wrote:Hi,

I'm currently building an opening book. I'll describe my approach and I would appreciate your feedback.

First I store all unique FEN strings for the first 7 plies (starting from the start position), including quiescence nodes (attack + promotion moves).
I use the FEN string because with Zobrist hashing I get 14% collisions for 2.5 million nodes.
After that I iterate through that list of FEN strings and calculate the score for each position with a search depth of 15 plies (about 1 s each).

Then, when my program wants to probe the book, I generate the FEN string for each move and find the position (and thus the move) with the best (highest) score.
I think it is a good start. Perhaps some positions may need more than 15 plies of analysis.

I thought of storing not only the best reply move and score but also the 2nd and 3rd best moves, especially when the difference between the 1st and 3rd best is not that big, say 10 cp or less. Later this can be used once you have implemented a random book-play mode in the engine.
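A small sketch of what that random book play could look like (candidates being a list of (move, score_cp) pairs, e.g. the top three stored for the position):

Code: Select all

import random

def pick_book_move(candidates, margin_cp=10):
    """Pick at random among the moves within margin_cp of the best one."""
    if not candidates:
        return None
    best = max(score for _, score in candidates)
    playable = [move for move, score in candidates if best - score <= margin_cp]
    return random.choice(playable)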

This can be extended, of course: when the position is used in a game and that game results in a loss, you could update the loss counter of the FEN, say
fen; e2e4; 5; 15; sf6; 2015.01.25; 1; 2; 4;
where:
fen is the FEN string
e2e4 is the move recommended in this position
5 is the search score from the analysis, in cp
15 is the analysis depth
sf6 is the engine that analyzed the position
2015.01.25 is the date the analysis was done
1 is the win count
2 is the loss count
4 is the draw count

Later you may decide whether or not to keep this FEN, having the result stats at hand.

Then you can take a large file of high-quality games, walk through the opening moves and game results, and update your FEN book's result stats, a sort of learning. Perhaps add a new field in your book, say
fen; e2e4; 5; 15; sf6; 2015.01.25; 1; 2; 4; 5;4;24
where 5;4;24 are the learning (W/L/D) stat fields. Selecting a book move becomes fun when you take those fields into account.

We can make this a little more elaborate by recording the average Elo of the players in the learning stats, say
... 2800; 2750; 2780
where 2800 is the average rating for the win learning stat, and likewise 2750 and 2780 for losses and draws respectively.
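To make the layout concrete, here is a rough sketch of how such a semicolon-separated record could be held and parsed (field names are only suggestions, and the Elo extension is left out for brevity):

Code: Select all

from dataclasses import dataclass

@dataclass
class BookEntry:
    fen: str
    move: str
    score_cp: int
    depth: int
    engine: str
    date: str
    wins: int = 0
    losses: int = 0
    draws: int = 0
    learn_wins: int = 0
    learn_losses: int = 0
    learn_draws: int = 0

def parse_entry(line):
    # e.g. "fen; e2e4; 5; 15; sf6; 2015.01.25; 1; 2; 4; 5;4;24"
    f = [x.strip() for x in line.strip().rstrip(";").split(";")]
    counters = [int(x) for x in f[6:]]
    return BookEntry(f[0], f[1], int(f[2]), int(f[3]), f[4], f[5], *counters)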

Some of these ideas I learned from Lissandrello, the author of Neurone.
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: creating an opening book

Post by Robert Pope »

flok wrote: ...because with Zobrist hashing I get 14% collisions for 2.5 million nodes.
Really? That seems crazy high.

I just ran a simulation of 3 million nodes and got 0 collisions with a 64-bit hash, and only 6 collisions with a 40-bit hash.
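For comparison, the birthday bound puts the expected number of colliding pairs among n random k-bit hashes at roughly n*(n-1)/2^(k+1):

Code: Select all

def expected_collisions(n, bits):
    # expected number of colliding pairs among n uniformly random `bits`-bit hashes
    return n * (n - 1) / 2 ** (bits + 1)

print(expected_collisions(2_500_000, 64))  # ~1.7e-7: essentially zero
print(expected_collisions(3_000_000, 40))  # ~4.1: a handful, in line with the simulation
# 14% of 2.5 million would be ~350,000 collisions, many orders of magnitude
# more than a sound 64-bit Zobrist hash should produce.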
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: creating an opening book

Post by jdart »

Yeah, you should get few collisions, but you need at least a 64-bit hashcode.

Basing opening selection on search scores will work most of the time but there are openings where that won't help, at least with a moderate-depth search.

For example, this position from Karjakin-Carlsen, Baku 2008:

[D] 3qr1k1/pp1bppb1/3p1npB/2r1n2p/3NP2P/1BN2P2/PPPQ2P1/1K1R3R b - - 0 15

Carlsen played Nc4, and it is the best-scoring move, but it may not be your engine's first choice.

Gambit openings are even trickier, and often there is a known "only move" that is challenging to find through search.

--Jon
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: creating an opening book

Post by mvk »

Some random points:

- The hash collision rate seems wrong, or your hashing method is too weak.

- Storing plain FENs is perfectly OK for books and simplifies the code. I use one line per move, repeating the FEN on each line. No sweat. This way I can just use the Unix 'look' command with popen() for a binary search on the file (a sketch follows this list). No need to cram bits these days.

- I would advise doing exclusion searches. This way you can chain the book positions into a big tree, minimax it, and get higher-quality moves near the root.

- Are you proposing a 7-ply full-width book? At least you can apply the alpha-beta logic and skip most positions. Maybe I misunderstand: perft(7) is much larger than 2.5M, and sqrt(perft(7)) is about 60k.

- Don't do fixed-depth searches. Strength correlates with nodes searched more than with depth reached (given the same program). There is a lot of variability among same-depth positions. I switched to using a node-count based search for my book when it became larger.

- 1 second is too low if you want to use the move at slower time levels. To first order, the time advantage of playing a book move instantly doesn't compensate for the quality problem of playing a shallowly-searched move. If, during the game, you can outsearch the book move, don't use the book move but search.
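For what it's worth, a rough sketch of that look-based probe (using Python's subprocess here instead of popen(); the book file name is made up and the file must be pre-sorted, e.g. with LC_ALL=C sort):

Code: Select all

import subprocess

def book_lines(fen_key, book_file="book.txt"):
    """Return every line of the sorted book file starting with this FEN key."""
    out = subprocess.run(["look", fen_key, book_file],
                         capture_output=True, text=True)
    return out.stdout.splitlines()

# each line holds the FEN first, then one book move and its data, e.g.
#   rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - e2e4 30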
flok

Re: creating an opening book

Post by flok »

mvk wrote:Some random points:
- The hash collision rate seems wrong, or your hashing method is too weak.
Multiple people said this, so it must be true :-)

I store all the data in a MySQL database. Very convenient.
At the moment 369,494 hashes out of 3,290,575 unique (FEN) entries have 2 or 3 collisions.
I looked at one and it looks like this:

Code: Select all

mysql> select * from entries where hash=-9221985131600632708;
+----------------------------------------------------------------+----------------------+-------+------+
| fen                                                            | hash                 | plies | eval |
+----------------------------------------------------------------+----------------------+-------+------+
| rnbqkbnr/1ppppp1p/8/p5p1/1P1P4/6P1/P1P1PP1P/RNBQKBNR b KQkq -  | -9221985131600632708 |    15 |   23 |
| rnbqkbnr/1ppppp1p/8/p5p1/1P1P4/6P1/P1P1PP1P/RNBQKBNR b KQkq b3 | -9221985131600632708 |    16 |   29 |
| rnbqkbnr/1ppppp1p/8/p5p1/1P1P4/6P1/P1P1PP1P/RNBQKBNR b KQkq d3 | -9221985131600632708 |    16 |   23 |
+----------------------------------------------------------------+----------------------+-------+------+
3 rows in set (2.23 sec)
I checked 5 more and it is always one entry that was not reached by a double pawn push (no en passant square) and then 2 colliding entries that were (with en passant squares set).
A bit puzzling. Thinking out loud: what I think happened is that the b3 one played d3 earlier and vice versa. Then you would indeed end up with the same FEN string (apart from the en passant field) and also the same hash code. If that is indeed what is happening, then I would expect maybe even 8 collisions.
- Storing plain FENs is perfectly ok for books and simplifies code. I use one line per move, repeating the FEN on each line. No sweat. This way I can just use the Unix 'look' command with popen() for a binary search on the file. No need to cram bits these days.
Yeah, totally agree. I take it a step further by storing the data in MySQL. For 3.2 million records the search time is reported as 0.00 seconds.
- I would advise doing exclusion searches. This way you can chain the book positions into a big tree, minimax it, and get higher-quality moves near the root.
What is an exclusion search? I googled it but got no relevant hits.
- Are you proposing a 7-ply full-width book? At least you can apply the alpha-beta logic and skip most positions. Maybe I misunderstand: perft(7) is much larger than 2.5M, and sqrt(perft(7)) is about 60k.
Yes. The values given were not the final 7-ply result but somewhere between 6 and 7 plies. Also it will end up slightly larger than the 7-ply node count because I do quiescence search as well.
So I would calculate the evaluation of all moves in the first 7 plies (+ quiescence) and for each of these 3,195,901,860 moves do a 15-ply search. Hmmm, maybe I'll cut that back a bit; that would take over 100 years at 1 second per move :-)
- Don't do fixed-depth searches. Strength correlates with nodes searched more than with depth reached (given the same program). There is a lot of variability among same-depth positions. I switched to using a node-count based search for my book when it became larger.
Yes, I was planning to let my script update things after an initial fill. I want to have some initial data so that I can experiment with it. The script that retrieves the FEN strings from the database and then sets the score (+ depth) is implemented so that it can do either a fixed-depth search or a fixed-time search. In fixed-time mode it can check whether it reached a greater depth and then update the record with the newly found score.
- 1 second is too low if you want to use the move at slower time levels. To first order, the time advantage of playing a book move instantly doesn't compensate for the quality problem of playing a shallowly-searched move.
Yes, the 1 second was a consequence of the selected ply depth.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: creating an opening book

Post by Dann Corbit »

jdart wrote:Yeah, you should get few collisions, but you need at least a 64-bit hashcode.

Basing opening selection on search scores will work most of the time but there are openings where that won't help, at least with a moderate-depth search.

For example, this position from Karjakin-Carlsen, Baku 2008:

[D] 3qr1k1/pp1bppb1/3p1npB/2r1n2p/3NP2P/1BN2P2/PPPQ2P1/1K1R3R b - - 0 15

Carlsen played Nc4, and it is the best-scoring move, but it may not be your engine's first choice.

Gambit openings are even trickier, and often there is a known "only move" that is challenging to find through search.

--Jon
This is an interesting position.

Here is a recent correspondence game that uses that opening {edit -- no, it does not have that opening. I inserted a game from another position I am studying in the same data set.}:

Code: Select all

[Event "WBCCC-2014"]
[Site "http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?tid=29172;pg=1"]
[Date "2014-08-27"]
[White "Michael_Glatthaar"]
[Black "Garvin_Gray"]
[WhiteTime "18 days, 9 hours, 20 minutes"]
[BlackTime "8 days, 7 hours, 2 minutes"]
[Result "1-0"]

1. d4 d5 2. c4 e6 3. Nc3 c6 4. Nf3 Nf6 5. Bg5 dxc4 6. e4 b5 7. e5 h6 8. Bh4 g5
9. Nxg5 hxg5 10. Bxg5 Nbd7 11. g3 Bb7 12. Bg2 Qb6 13. exf6 O-O-O 14. O-O Ne5
15. dxe5 Rxd1 16. Rfxd1 c5 17. Bf4 Rg8 18. Ne4 Qa5 19. Ng5 Bxg2 20. Kxg2 Qa4
21. Bd2 Qa6 22. Ne4 Qa4 23. Nc3 Qa5 24. Be3 Qc7 25. Bf4 Qc6+ 26. f3 b4 27. Ne4
a5 28. Rd2 Qa4 29. h4 Kb7 30. h5 Kc6 31. Rf2 Kc7 32. g4 c3 33. bxc3 c4 34. Kg3
Kb6 35. Rd2 Qa3 36. Be3+ Kc6 37. Rd4 Qb2 38. Rc1 Qxa2 39. cxb4 1-0
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: creating an opening book

Post by mvk »

flok wrote:
- I would advise doing exclusion searches. This way you can chain the book positions into a big tree, minimax it, and get higher quality moves near the root.
What is an exclusion search? I googled it but got no relevant hits.
An exclusion search is a search where you remove some moves from the root position. In this suggestion, exclude any move that leads to other book positions. With that, you can then do a proper minimax.

One more thing: Your original post suggests that you want to do a 1-ply book probe, and pick the best hit. This can be problematic, especially when the board position is not in book. It can lead to playing book moves backwards, or responding to a blundering opponent with a tame move, instead of capitalising on the blunder. At least check if the board position is in book as well if you want to do this. That mitigates the problems (but doesn't solve them all).
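As a rough sketch of that minimax step (my own data layout: the book maps a position key to its exclusion-search score plus the keys of its book children; scores are from the side to move's point of view, and transpositions/cycles are ignored for simplicity):

Code: Select all

def book_value(book, key, cache=None):
    """Negamax over the book tree: a node is worth the better of its
    exclusion-search score (covering the non-book moves) and the negated
    values of its book children."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    excl_score, children = book[key]          # (int, list of child keys)
    best = excl_score
    for child in children:
        if child in book:
            best = max(best, -book_value(book, child, cache))
    cache[key] = best
    return best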
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: creating an opening book

Post by syzygy »

flok wrote:I looked at one and it looks like this:

Code: Select all

mysql> select * from entries where hash=-9221985131600632708;
+----------------------------------------------------------------+----------------------+-------+------+
| fen                                                            | hash                 | plies | eval |
+----------------------------------------------------------------+----------------------+-------+------+
| rnbqkbnr/1ppppp1p/8/p5p1/1P1P4/6P1/P1P1PP1P/RNBQKBNR b KQkq -  | -9221985131600632708 |    15 |   23 |
| rnbqkbnr/1ppppp1p/8/p5p1/1P1P4/6P1/P1P1PP1P/RNBQKBNR b KQkq b3 | -9221985131600632708 |    16 |   29 |
| rnbqkbnr/1ppppp1p/8/p5p1/1P1P4/6P1/P1P1PP1P/RNBQKBNR b KQkq d3 | -9221985131600632708 |    16 |   23 |
+----------------------------------------------------------------+----------------------+-------+------+
3 rows in set (2.23 sec)
So the problem obviously is that you are not hashing in en passant rights.

It seems likely that you are also not hashing in castling rights. That will give even more collisions.

https://chessprogramming.wikispaces.com/Zobrist+Hashing
* Four numbers to indicate the castling rights, though usually 16 (2^4) are used for speed
* Eight numbers to indicate the file of a valid En passant square, if any
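A minimal sketch of a Zobrist scheme that includes both terms (the key tables and layout here are only an illustration):

Code: Select all

import random

rng = random.Random(2015)
PIECE_KEYS   = [[rng.getrandbits(64) for _ in range(64)] for _ in range(12)]  # 12 piece types x 64 squares
CASTLE_KEYS  = [rng.getrandbits(64) for _ in range(16)]   # one key per KQkq combination
EP_FILE_KEYS = [rng.getrandbits(64) for _ in range(8)]    # one key per en passant file
SIDE_KEY     = rng.getrandbits(64)                        # xored in when black is to move

def zobrist(pieces, side_to_move, castle_bits, ep_file):
    """pieces: iterable of (piece_index 0-11, square 0-63); ep_file: 0-7 or None."""
    h = 0
    for piece, square in pieces:
        h ^= PIECE_KEYS[piece][square]
    h ^= CASTLE_KEYS[castle_bits]              # castling rights are part of the hash
    if ep_file is not None:
        h ^= EP_FILE_KEYS[ep_file]             # ...and so is the en passant file
    if side_to_move == "b":
        h ^= SIDE_KEY
    return h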