cms271828 wrote:Hi, I'm trying to put together an opening book, but still I'm a bit unsure how.
I want to build an array of 64-bit values,and use the zobrist key of the current position to index the correct entry.
So with a 64-bit key, and table size 2^16, that would leave 16 free bits in the entry to store some info.
I have a lot of pgn files to use, I would like to do it so that for positions that are the same(with same side to move), I use the move that is played the most frequently.
Perhaps one way (I'm using java), when I scan through the pgn files, I get a new position, I could create an object containing all the moves played for that position, and their frequencies. Then when I goto fill in that free 16 bits, I can use the move played most frequently.
I think I would need to xor in the side to move into the position, so same positions with different sides to move, would be treated as different positions.
I have a couple of other concerns, such as how deep into a pgn game should I scan, seems pointless to use anything later than opening moves, and doing this could overwrite positions that occur more frequently too.
Is my idea any good, thanks for any help.

If you only use 16 bit in one entry you only have one move to play.
That would be a boring opening database. And it could easily be attacked.
Consider using a list of moves (with statistics attached to it). Either make
it a fixed list of 0 to 5 moves using another 64=4*16 move bits and
generic probability (50, 25, 13, 8, 4 percent) or a real data structure
with fixed M moves (move, win percentage) or (move, #win, #draw, #loss)
or use a dynamic data structure for a different number of moves for each position.
The move list of a position may stop after <= 5 percent or <= 10 times played.
These filters of win percentage or number of moves played from a
position in a PGN database can be used as a filter when you retrieve
opening positions from the PGN database. Another filter may be the
players' ELO (> 2200) or ELO difference (< 200).
I did the PGN scanning and some of the filters in a Python program
myself but I did not have the time to finish my own opening book.
I have not decided whether I use an hash index plus linear lists or a
kind of tree with the key split in portions of 8 bit, each an index in an
array of pointers to the next 8 bit branch or NULL if there is no more entry.
If you want the position-move-entries to have a fixed size you can store a
pointer or offset in them, pointing to a raw data array with all the dynamic infos.
If this array grows too big you can split it into parts and load the
parts on the fly in a cache.
Harald