Thanks for the very interesting information. It is amazing that you can compress FENs to such a small size!

noobpwnftw wrote:
Crazy or not, here are some stats on binary-encoded FENs from 8.85 billion stored positions, in bytes:
min-keylen = 17, max-keylen = 45, avg-keylen = 35.19, avg-zkey = 7.34
After compression (as shown by avg-zkey) it is actually less than 8 bytes, yet seeking among keys is still close to O(1). You can apply data-specific filters during compression, since you have quite a lot of similar rows in the opening positions.
Then you can convert it to any other format without having to iterate from a starting position or do de-duplication.
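As an aside, the quoted key lengths look plausible for a simple nibble-packed encoding. Here is a minimal sketch of one possible scheme (my own assumption, not the actual encoding behind those stats): 4 bits per occupied square plus run-length-coded empty squares, which naturally produces variable-length keys.

```python
# Hypothetical binary FEN packing (NOT the exact scheme behind the quoted
# stats): one nibble per piece, a 0-nibble followed by a count-nibble for
# runs of empty squares. Piece letters assume Xiangqi FEN (KABNRCP).

PIECE_CODES = {p: i + 1 for i, p in enumerate("KABNRCPkabnrcp")}  # 1..14; 0 = run marker

def encode_board(board_fen: str) -> bytes:
    """Pack the board field of a Xiangqi FEN into nibbles."""
    nibbles = []
    for ch in board_fen:
        if ch == '/':
            continue                    # rank breaks are implicit on a fixed 10x9 board
        elif ch.isdigit():
            nibbles += [0, int(ch)]     # 0-nibble = "empty run", next nibble = length (1..9)
        else:
            nibbles.append(PIECE_CODES[ch])
    if len(nibbles) % 2:
        nibbles.append(0)               # pad to a whole byte
    return bytes((a << 4) | b for a, b in zip(nibbles[::2], nibbles[1::2]))

start = "rnbakabnr/9/1c5c1/p1p1p1p1p/9/9/P1P1P1P1P/1C5C1/9/RNBAKABNR"
print(len(encode_board(start)))  # 34 bytes — close to the quoted 35.19 average
```

The start position comes out at 34 bytes here, in the same ballpark as the quoted average, though the real scheme presumably packs tighter at the extremes.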
I know the main advantage of using FEN over a hash key is that the conversion works both ways (FEN <--> board). However, for an opening book I see some major drawbacks compared to a hash key:
- The average FEN size is still too large. A hash key takes only 8 bytes, versus over 35 bytes for an uncompressed FEN. In my previous book I even cut the key down to 7 bytes to save more space.
- FENs do not have fixed lengths, so you must store their lengths and/or a pointer table somewhere to access the data. That requires more memory and computation.
- Comparing an array (a FEN) must be slower than comparing a number. Converting a board into a FEN is also slower than computing a hash key, and a FEN is not easy to update on the fly.
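To illustrate the fixed 8-byte key and the on-the-fly updating mentioned above, here is a minimal Zobrist hashing sketch. The 14-piece-type / 90-square dimensions assume Xiangqi, and the board representation is a bare dict; a real engine would use a fixed random table and hash more state than just piece placement and side to move.

```python
import random

random.seed(12345)            # fixed seed so the table (and keys) are reproducible

# One 64-bit random number per (piece type, square): a standard Zobrist table.
PIECES, SQUARES = 14, 90      # assumed Xiangqi dimensions
ZOBRIST = [[random.getrandbits(64) for _ in range(SQUARES)] for _ in range(PIECES)]
SIDE_TO_MOVE = random.getrandbits(64)

def full_hash(board):
    """board: dict square -> piece index. Recompute from scratch, O(pieces)."""
    h = 0
    for sq, pc in board.items():
        h ^= ZOBRIST[pc][sq]
    return h

def update_hash(h, pc, from_sq, to_sq, captured=None):
    """Incremental update after a move: O(1), unlike rebuilding a FEN string."""
    h ^= ZOBRIST[pc][from_sq] ^ ZOBRIST[pc][to_sq]
    if captured is not None:
        h ^= ZOBRIST[captured][to_sq]     # remove the captured piece
    return h ^ SIDE_TO_MOVE               # flip side to move

# The incremental update agrees with a from-scratch recomputation:
board = {0: 4, 45: 5, 89: 4}              # three pieces on arbitrary squares
h0 = full_hash(board)
h1 = update_hash(h0, pc=4, from_sq=0, to_sq=10)
assert h1 == full_hash({10: 4, 45: 5, 89: 4}) ^ SIDE_TO_MOVE
```

Because updates are XOR-based, unmaking the move (moving piece 4 back from 10 to 0) restores the original key exactly, which is what makes the key cheap to maintain during search.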
Perhaps you could take a look at piece lists? I once did a rough calculation: a piece list for Xiangqi may take under 24 bytes per position. That is significantly smaller than 35 bytes, and it has a fixed size.
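For what it's worth, here is my own back-of-the-envelope version of such a fixed-size piece list. It lands at 27 bytes; getting under 24 bytes, as estimated above, presumably requires packing piece-specific constraints (e.g. advisors and elephants can only ever reach a handful of squares).

```python
# Sketch of a fixed-size piece-list encoding for Xiangqi: 32 pieces in a
# fixed canonical order, each slot holding a square 0..89 or 90 = "captured".
NUM_PIECES, SLOT_VALUES = 32, 91

def encode_piece_list(squares):
    """Pack 32 slots of 0..90 into one base-91 integer.
    91**32 < 2**216, so 27 bytes always suffice and the length is fixed."""
    assert len(squares) == NUM_PIECES
    n = 0
    for s in squares:
        n = n * SLOT_VALUES + s
    return n.to_bytes(27, "big")

def decode_piece_list(data):
    """Inverse of encode_piece_list: recover the 32 slot values."""
    n = int.from_bytes(data, "big")
    slots = []
    for _ in range(NUM_PIECES):
        n, s = divmod(n, SLOT_VALUES)
        slots.append(s)
    return slots[::-1]

squares = list(range(16)) + list(range(74, 90))   # dummy placement of 32 pieces
assert decode_piece_list(encode_piece_list(squares)) == squares
print(len(encode_piece_list(squares)))  # 27 — fixed length, vs ~35 bytes avg for FEN
```

Fixed length also removes the need for the length/pointer tables that variable-length FENs require.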
I think a good book maker should check those problems before storing a move into the book.

noobpwnftw wrote:
Certain positions in Xiangqi depend on whether the previous moves contained a check or a chase to determine whether the upcoming repetition is a draw or a loss. GUIs usually check this based on the move history, but by the time a move causing a repetition loss for the current side is found, it may be too late to avoid it without a major sacrifice. A small forward search in the book must be performed to avoid such cases.
By that you would need to store moves and move lengths in the book, which can easily make it a few times bigger.

noobpwnftw wrote:
If you store the book as "position_after_move -> score", then for each position you need to generate all moves and probe multiple times. I suggest you store the book as "position -> [move1 -> score1, move2 -> score2, ...]".
Actually, generating the moves, making them, and probing the resulting positions takes only a few extra milliseconds. It is not worth changing the format just to save that computing time.
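To make the two layouts being discussed concrete, here is a toy sketch. Positions, moves, and the move generator are hypothetical stand-ins (strings and concatenation), not a real Xiangqi implementation.

```python
# Toy stand-ins so the sketch is self-contained: a position is a string,
# and "making a move" is just concatenation.
def gen_moves(pos):
    return ["h2e2", "c3c4", "b0c2"]

def make_move(pos, mv):
    return pos + "/" + mv

# Layout A: "position_after_move -> score". Probing requires one move
# generation plus one make-move and one lookup per legal move.
book_a = {"start/h2e2": 12, "start/c3c4": -5}

def probe_a(pos):
    found = []
    for mv in gen_moves(pos):
        child = make_move(pos, mv)
        if child in book_a:
            found.append((mv, book_a[child]))
    return found

# Layout B: "position -> [(move, score), ...]". A single lookup.
book_b = {"start": [("h2e2", 12), ("c3c4", -5)]}

def probe_b(pos):
    return book_b.get(pos, [])

assert probe_a("start") == probe_b("start") == [("h2e2", 12), ("c3c4", -5)]
```

Both layouts yield the same answer; the trade-off is one probe versus one probe per legal move, which is the few-milliseconds difference discussed above.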
I am not sure about this point. Are those positions still reachable in the opening? If so, they could be searched as usual, I guess.

noobpwnftw wrote:
The last question: some opening variations may eventually lead to the same mid-game positions that can be reached by engine search, but the book might not cover all paths towards them. Just like how WDL tablebases benefit mid-game search.
Good effort! Keep up your excellent work!!!

noobpwnftw wrote:
Try out this one:
http://www.chessdb.cn/query_en/
The opening part contains 8.85 billion unique positions, with at least one move per position, in ~32GB; it covers almost every opening variation in Xiangqi. Each leaf is evaluated by an engine to no less than 20 plies and then backpropagated to the root.