Some opening book design questions

phhnguyen · Post by **phhnguyen** » Fri Feb 23, 2018 3:02 am

noobpwnftw wrote:Crazy or not, here are some stats on binary encoded FENs from 8.85 billion stored positions in bytes:
min-keylen = 17, max-keylen = 45, avg-keylen = 35.19, avg-zkey = 7.34

After compression(as shown by avg-zkey) it is actually less than 8 bytes, yet seeking among keys are close to O(1). You can apply data specific filters during compression since you have quite a lot of similar rows in the opening positions.

Then, you can convert it to any other formats without having to iterate from a starting position or do de-duplication.

Thanks for very interesting information. That is amazing when you could compress FEN to such small size!

I know the main advantage of using FEN over hash key is that we can convert two ways (FEN <--> board). However I for opening book I see it has some main drawbacks, comparing to hash key:
- The average FEN size is still too large. A hash key takes only 8 bytes instead of over 35 bytes in uncompress format. In my previous book I cut down to 7 bytes only to save more space.
- FENs have not fixed lengths, therefore you should store their lengths and / or a pointer tables somewhere to access the data. It requires more memory and computing.
- Compare an array (FEN) must be slower than a number. Converting board into FEN is a bit slow than computing hash key and that is not easy to update on fly.

Perhaps you take a look into piece list? Once I have done a roughly calculating. A piece list for Xiangqi may take under 24 bytes per chess position. It is significantly smaller than 35 bytes and it has fixed size.

noobpwnftw wrote: Certain positions in Xiangqi depend on whether the previous move contained a check or chase to determine the upcoming repetition is draw or lose. GUIs usually check it based on history moves, but when a move would cause reputation lose is found on the current moving side then it may be too late to avoid it without a major sacrifice. A small forward search in the book must be performed to avoid such cases.

I think a good book maker can check out those problems first before storing a move into a book.

noobpwnftw wrote: If you store the book as "position_after_move -> score" then for each position you need to generate all moves and probe multiple times, I suggest you store the book as "position -> [move1 -> score1, move2 -> score2, ...]".

By that you need to store moves and move lengths into books and can easily make books few times bigger.

Actually generating, making move, searching new positions out from a book may take few more milliseconds. That is not worth to save that computing time.

noobpwnftw wrote:
The last question, some opening variants may eventually lead to the same mid-game positions that can be reached by engine search, the book might not cover all paths towards it. Just like how WDL tablebases would benefit mid-game search.

I am not sure about this point. So those positions are still reachable in opening? If yes they could be searched as usual I guess.

noobpwnftw wrote: Try out this one:
http://www.chessdb.cn/query_en/
The opening part contains 8.85 billion of unique positions with at least one move per position in ~32GB, covers almost every opening variation in Xiangqi. Each leaf is evaluated by engine no less than 20 plies and then back propagated to the root.

Good effort! Keep up your excellent work!!!

phhnguyen · Post by **phhnguyen** » Fri Feb 23, 2018 3:10 am

hgm wrote: I am not saying Polyglot format is optimal. OTOH, its drawbacks are hardly important at todays storage and memory capacities, and compatibility with other software can be very useful.

I think Polyglot data may be double size than an optimal one. It may be almost nothing for small books but it may also be a trouble for downloading and using huge books.

I am still considering between modifying Polyglot or developing a total new one. For Xiangqi I could re-use very little from Polyglot thus compatibility is not an issue.

Just curious questions: do you (xboard) load all data of a book into memory or search from hard disk? If it is loaded do you release it after being out of the book?

noobpwnftw · Post by **noobpwnftw** » Fri Feb 23, 2018 7:56 am

1. Probing radix trees does not require comparisons of uncompressed form, so effectively I'm comparing for 7.34 bytes per position on average.
2. Fixed lengths or not do not really matter since you won't be storing your book as "data[index]" anyway. Sorted skip lists would seek just as fast.
3. https://en.wikipedia.org/wiki/Judy_array

How can you represent this position with 24 bytes of piece list?
r1bakabr1/9/nc2c1n2/p1p1p3p/6p2/2P6/P3P1P1P/1CN1B1C1N/4A4/R3KAB1R b
My binary FEN would took 35 bytes:
0x60327236088548140581101018218518181985982909090cd0b0c0d83a83e82fab0e10

I don't see an advantage of using piece lists on a 90-square board.

I think a good book maker can check out those problems first before storing a move into a book.

This is something you can't rely on when designing the book format.

By that you need to store moves and move lengths into books and can easily make books few times bigger.

Individual move data are of fixed length, storing one extra byte of move count won't make the book few times bigger. Also, storing a move instead of storing a key usually saves space.

phhnguyen · Post by **phhnguyen** » Fri Feb 23, 2018 9:53 am

noobpwnftw wrote:1. Probing radix trees does not require comparisons of uncompressed form, so effectively I'm comparing for 7.34 bytes per position on average.
2. Fixed lengths or not do not really matter since you won't be storing your book as "data[index]" anyway. Sorted skip lists would seek just as fast.
3. https://en.wikipedia.org/wiki/Judy_array

How can you represent this position with 24 bytes of piece list?
r1bakabr1/9/nc2c1n2/p1p1p3p/6p2/2P6/P3P1P1P/1CN1B1C1N/4A4/R3KAB1R b
My binary FEN would took 35 bytes:
0x60327236088548140581101018218518181985982909090cd0b0c0d83a83e82fab0e10

I don't see an advantage of using piece lists on a 90-square board.

The define of piece list in CPW:
https://chessprogramming.wikispaces.com/Piece-Lists

It is simply an array of 32 items for 32 pieces. Lucky for Xiangqi it has no promotion thus we don't need to save piece type into piece list, only positions of pieces.

I usually write a piece list for 2 side as:

Code: Select all

byte pieceList&#91;2&#93;&#91;16&#93;;

The first index (0) is always King, the 2nd and 3th for two Advisors... the last 5 indexes for 5 Pawns.

With that "standard" pieceList, it takes only 32 bytes for whole board, already smaller than your average length 35 bytes of your FENs.

You may see that there are some redundant bits here. For 90 squares we don't need whole byte but 7 bits only. 6 pieces of Rooks Cannons, Horses need 6 x 7 = 42 bits. The rest 26 pieces need roughly 6 bits each, take 26 x 6 = 156 bit. Total is 42 + 156 = 198 bits = 24.75 bytes.

You may reduce more since you are EGTB expert, say King + Advisors may take totally under 1200 combinations (they need about 11 bits for 3 pieces), Elephants can be located in totally 7 positions (need 3 bits only). All may reduce the size to 21 - 22 bytes.

noobpwnftw wrote: Individual move data are of fixed length, storing one extra byte of move count won't make the book few times bigger. Also, storing a move instead of storing a key usually saves space.

Do you mean you did not store position for each entry but moves only? I knew that method but it has some big drawbacks for chess engines (but it is good for your website).

For each entry I store only 7 bytes key and 1 byte weight, total is 8 bytes.

noobpwnftw · Post by **noobpwnftw** » Fri Feb 23, 2018 3:23 pm

It would then require 12(RCN) * 7 + 6(P) * 10 + 4(B) * 4 + 3(A) * 4 + 4(K) * 2 + 1(turn) = 181 bits or even less if palace states are combined.

Although keys(either hash or other formats) are known before probing the book, unlike EGTBs that you won't store keys at all(accessed with data[index] = value), opening book is a sparse array so you always store at least partial key information for each position for its reference to hold, picking a key that compresses well is important. As you may see the majority of book data are just the keys.

Piece list seems very viable, I would expect it compresses well since only a few bits are changed when making a move.

jdart · Post by **jdart** » Sat Feb 24, 2018 4:14 am

I used to use memory-mapped files and only load pages from the book as needed, but with the size book I have and the size of memory on modern systems it does not really make sense to do that anymore.

But if you do want to do that you should probably just borrow an existing framework like LMDB.

--Jon

noobpwnftw · Post by **noobpwnftw** » Sat Feb 24, 2018 4:55 pm

Actually, Chess has every necessary parts ready(i.e. results from fishtest) to make a book by brute force. It might practically ultra-weak solve the opening.

Ras · Post by **Ras** » Sun Feb 25, 2018 10:50 am

phhnguyen wrote:For chess do you check with vertical flipping? Any drawback?

In the CT800 dedicated unit, there is first a "normal" book lookup, and only if that fails, a reversed one. Works as defence against humans who might try to lose a tempo for throwing the machine out of its book early.

In my previous implementation I did binary search based on sorted positions. That required to search whole data every time. Any tips/tricks to reduce search scope?

The CT800 has sorted positions, but also "cache indices" that are derived when compiling the book. You get the "cache index" by shifting down the position key by the appropriate number of bits. The search ends when the book key is greater than the position key (book is sorted in ascending position order).

The position key is made of a combined CRC-40, which is a CRC-32 as primary key, and only if that matches, a secondary CRC-8 is checked. Both CRCs have polynomials that are prime to each other.

Book moves are not taken directly from the book. Instead, a book lookup scans for all moves that are recorded for that positionkey and then matches them against the list of legal moves for the board position. Not only this prevents any risk of doing an illegal move, but it also allows to drop the en passant state when calculating the position key.

The book can always contain an en passant move, and if the position is reached via a transposition where that en passant move is not possible, the move matching will filter out that move from the book.

I don’t understand why Polyglot, ChessBase store moves in their opening positions?

The CT800 has active and passive book knowledge. For active moves, the move is recorded for its current position, and the resulting position also. For passive moves, only the resulting position. If the opponent makes that move, the CT800 has answers, but it won't do a passive move by itself.

jdart · Post by **jdart** » Thu Mar 01, 2018 2:25 am

I forgot to mention:

There is a trick you can use to store the moves, which is to implement a move generator that can produce moves in a repeatable sort order, and then store the move indices in the book. You don't have to store the move itself, because you can just run the move generator, get the full list of moves and then pick the nth move, where n is returned from the book.

--Jon

Aleks Peshkov · Post by **Aleks Peshkov** » Thu Mar 01, 2018 9:47 am

jdart wrote:Color-flipped positions *can* occur in chess, but I would be surprised if they were frequent enough to make this optimization a significant win.

--Jon

It is easy to make color-independent but side-to-move dependent board representation. So all hashing and evaluation are transparently symmetrical.

Some opening book design questions

Re: Some opening book design questions

Re: Some opening book design questions

Re: Some opening book design questions

Re: Some opening book design questions

Re: Some opening book design questions

Re: Some opening book design questions

Re: Some opening book design questions

Re: Some opening book design questions

Re: Some opening book design questions

Re: Some opening book design questions