Dense board representation as hash code

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

koedem
Posts: 105
Joined: Fri Mar 18, 2016 10:45 pm

Dense board representation as hash code

Post by koedem »

So I was wondering how expensive it would be to get a collision-free hash code for positions, so that in search one doesn't have to worry about collisions. It could also help for perfting, where a hash collision really makes a difference.

It's trivial to get 256 bits, and user Pio had the nice idea for 192 bits: store the occupancy bitboard (64 bits) plus 32 * 4-bit piece types of the occupied squares in some deterministic order. This has the nice side effect that you can make it colour agnostic by having the order depend on who is to move.
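
To make that concrete, here is a rough sketch of how such a 192-bit key could be built (layout and names are just my illustration, not Pio's exact scheme):

Code: Select all

#include <cstdint>

// 64-bit occupancy + 32 x 4-bit piece codes, emitted in a fixed scan order.
struct DenseKey192 {
    uint64_t occupancy;   // one bit per occupied square
    uint64_t pieces[2];   // up to 32 piece codes of 4 bits each, packed LSB first
};

// pieceCodeAt(sq) is assumed to return a 4-bit code (type + colour) for the
// piece on 'sq' from your own board representation.
DenseKey192 encode(uint64_t occupancy, int (*pieceCodeAt)(int sq)) {
    DenseKey192 k{occupancy, {0, 0}};
    uint64_t occ = occupancy;
    int idx = 0;
    while (occ) {
        int sq = __builtin_ctzll(occ);           // deterministic: low squares first
        occ &= occ - 1;
        uint64_t code = pieceCodeAt(sq) & 0xF;
        k.pieces[idx / 16] |= code << (4 * (idx % 16));
        ++idx;                                    // at most 32 pieces -> 128 bits
    }
    return k;
}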

A further improvement he came up with was Huffman encoding the pieces:
King positions will need 12 bits. Pawns will be encoded as 1, knights as 011, bishops as 010, rooks as 001 and queens as 000.
Considering kings take 12 bits here, it should be denser to just take them out of the bitboard part and store their positions as 6 bits each at a fixed place. A better way may be to have 1 bit denote whether a king is on its starting square, and if it is, you can use those spare 6 bits to encode its castling rights.
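
As a sketch of how the resulting bit stream could be written (the names are mine and this is untested; kings stay out of the stream as described):

Code: Select all

#include <cstdint>

enum PieceType { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };

struct BitWriter {
    uint64_t words[4] = {0, 0, 0, 0};
    int pos = 0;
    void put(uint32_t bits, int n) {              // append the n low bits of 'bits'
        int word = pos / 64, off = pos % 64;
        words[word] |= uint64_t(bits) << off;
        if (off + n > 64)                         // code straddles a word boundary
            words[word + 1] |= uint64_t(bits) >> (64 - off);
        pos += n;
    }
};

void appendPiece(BitWriter& w, PieceType pt) {
    switch (pt) {
        case PAWN:   w.put(0b1,   1); break;
        case KNIGHT: w.put(0b011, 3); break;
        case BISHOP: w.put(0b010, 3); break;
        case ROOK:   w.put(0b001, 3); break;
        case QUEEN:  w.put(0b000, 3); break;
        case KING:   break;   // kept out of the stream: stored as a fixed 6-bit square
    }
}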

These changes together probably save at least some 20 bits. (Things get a bit tricky when pawns promote into bigger pieces: the first promotion requires a capture, so nothing would break there, but the second promotion may not involve a capture. So if you want to be safe you probably have to give 16 of those saved 32 bits back to ensure there are no problems in some exotic positions; or one may use the trick further below.)
There is some fiddling to do with e.p. too, but if done smartly that may not take any more space.

You then will want to save 20 or so more bits through the choice of bucket; the simplest here would be to just fold the key through XOR. This may not give a nice distribution, though the hope could be that the Huffman compression and possibly some future ideas might at least help scramble the data a bit. If not, you'd have to throw an additional hash function on it, I suppose.
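
By folding I mean something as simple as this (assuming a power-of-two table size):

Code: Select all

#include <cstdint>

// Fold a 192-bit key (three 64-bit words) down to a bucket index.
uint64_t bucketIndex(const uint64_t key[3], uint64_t numBuckets) {
    uint64_t folded = key[0] ^ key[1] ^ key[2];   // 192 -> 64 bits
    folded ^= folded >> 32;                       // mix the high half into the low bits
    return folded & (numBuckets - 1);             // numBuckets must be a power of two
}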

With 40 saved bits you would have quite a bit of space for the perft count or for the search information. However, in the perft case a select few positions may need more than 40 bits. So one could additionally allocate one bit to tell whether this is a 192- or 256-bit hash entry; in the latter case you would have enough space for the hash code of exotic positions and for a high perft count.
You would then have a bucket of 4 * 192 = 3 * 256 bits = 96 bytes. That might not be ideal for cache-line alignment, but you can probably use some smart prefetching to make that not make a difference.
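
Layout-wise I picture the bucket as nothing more than 96 bytes of raw storage that gets interpreted either way; something like this (just a sketch):

Code: Select all

#include <cstdint>

// 768 bits of raw storage, viewed as either four 192-bit or three 256-bit
// entries depending on a per-entry size flag.
struct alignas(32) Bucket {
    uint64_t raw[12];                     // 4 * 192 = 3 * 256 = 768 bits = 96 bytes
};
static_assert(sizeof(Bucket) == 96, "bucket should be exactly 96 bytes");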


Overall a TT entry in this scheme usually costs 24 bytes. This is more expensive than usual, however not *that* much more expensive, and you do get colour agnosticism, which may help in some opening positions for search, or with perft efficiency, depending on the position.

Generating the hash code is quite a bit more expensive than a Zobrist hash. One might still try to make it somewhat incremental, though that will be a bit of a headache... On the other hand, in the search you might (?) save yourself the move validation part if you use lockless hashing?
While I don't think this beats Zobrist hashing methods in terms of play strength it might be an interesting idea to think about and maybe test.

Thoughts? Any good ways to squeeze out another few bits? (I mean, there have to be; for example you could get something out of the occupancy bitboard, as it can have at most 32 set bits, 30 without the kings; though compressing that may not be so trivial.)
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Dense board representation as hash code

Post by xr_a_y »

There are things like compressed bitsets already. Worth a try?

https://github.com/lemire/cbitset
https://github.com/lemire/Concise
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Dense board representation as hash code

Post by hgm »

Unfortunately Huffman encoding cannot be updated incrementally in an efficient way. A Zobrist approach with a very long key is much more practical. Huffman encoding of a position with all pieces present would take 32*1 + 16*3 + 12*5 + 4*6 = 164 bits (where castling rights can be kludged in by representing virgin K and R as Pawns), which gives a good estimate of how many positions are possible. With a key longer than 164 bits it should also be possible to make a Zobrist scheme collision free.
koedem
Posts: 105
Joined: Fri Mar 18, 2016 10:45 pm

Re: Dense board representation as hash code

Post by koedem »

hgm wrote: Sat Feb 01, 2020 11:48 am Unfortunately Huffman encoding cannot be updated incrementally in an efficient way. A Zobrist approach with a very long key is much more practical. Huffman encoding of a position with all pieces present would take 32*1 + 16*3 + 12*5 + 4*6 = 164 bits (where castling rights can be kludged in by representing virgin K and R as Pawns), which gives a good estimate of how many positions are possible. With a key longer than 164 bits it should also be possible to make a Zobrist scheme collision free.
Well, I would use the same codes for the pieces at all times, so I am not really Huffman encoding; I just used it to create my constant piece encodings.
Schematically an update would look like this: a piece moves from x to y, you change the bitboard and then figure out which ordinal position it had before the move and which it will have after. Then you just shift the corresponding part of the representation. (You might even use a rotate that feeds the rotated-out bits back in, so 110101010101 might become 010101010111 right away.)
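
For the plain 4-bits-per-piece variant (not the Huffman one, where the shift widths vary) that update would look roughly like this, untested:

Code: Select all

#include <cstdint>

// Ordinal position of the piece on 'sq' in the low-to-high scan order.
int pieceIndex(uint64_t occupancy, int sq) {
    return __builtin_popcountll(occupancy & ((1ULL << sq) - 1));
}

// Rotate the 4-bit code at oldIdx up to newIdx (newIdx >= oldIdx); the codes
// in between shift down by one nibble.  'codes' is the packed 32 x 4-bit stream.
unsigned __int128 moveCodeUp(unsigned __int128 codes, int oldIdx, int newIdx) {
    int lo = 4 * oldIdx, width = 4 * (newIdx - oldIdx + 1);
    unsigned __int128 ones = ~(unsigned __int128)0;
    unsigned __int128 mask = (width == 128 ? ones
                                           : ((unsigned __int128)1 << width) - 1) << lo;
    unsigned __int128 field = (codes & mask) >> lo;
    unsigned __int128 rot = (field >> 4) | ((field & 0xF) << (width - 4));
    return (codes & ~mask) | (rot << lo);
}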

Of course with large enough keys a Zobrist approach would still work, but it doesn't seem obvious to me that this would work for such a small number of bits. The problem with Zobrist is that a piece on a square is always encoded in the same way, no matter the rest of the position.

And with e.g. 12 * 64 = 768 values and only, let's say, 200 bits to be generous, it seems hard for me to believe that every possible subset of size 32 would be linearly independent. (There might be a rigorous way to show that, too.)
Of course that is not strictly necessary, as many of those subsets correspond to illegal positions; however, even if such keys were to exist, which I am not sure they do, it sounds like a very difficult problem to find them.

EDIT: 2^188 < (768 choose 32) < 2^189, which gives a lower bound of 189 bits. Though the true bound will be even higher.

EDIT 2: Wait, I'm stupid, you'd need all subsets of size 64 to be linearly independent, right? Since you remove 32 keys for 32 pieces and replace them with new ones. That would place a new lower bound at as much as 314 Bits.

Anyway, I should just go to sleep before I make even more errors.
koedem
Posts: 105
Joined: Fri Mar 18, 2016 10:45 pm

Re: Dense board representation as hash code

Post by koedem »

After some sleep I realize I wrote a bunch of nonsense. :D
2^188 < (768 choose 32) < 2^189, which gives a lower bound of 189 bits.
should be correct though, I think. Simply by starting from an empty board, all possible positions are a subset of all combinations of 32 keys. (Plus positions with fewer pieces, but those should be negligible.) So all those combinations need to be different, which means they must be able to take on (768 choose 32) < 2^189 different values. This does not yet imply the existence of such keys, as it may be impossible to find keys that so densely "cover" these values without any collisions; however, 200 bits or so could well be possible.
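
(For what it's worth, that bound is easy to double-check without big integers:)

Code: Select all

#include <cmath>
#include <cstdio>

int main() {
    // log2 C(768,32) = log2( 768! / (32! * 736!) ), evaluated via lgamma
    double log2binom = (std::lgamma(769.0) - std::lgamma(33.0) - std::lgamma(737.0))
                       / std::log(2.0);
    std::printf("log2 of (768 choose 32) = %.2f\n", log2binom);  // about 188.1
    return 0;
}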

However even then I don't see an easy way to make this argument constructive, i.e. to find such keys.

On a side note, it's not possible to edit posts anymore after a couple of hours even if no one responded?
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Dense board representation as hash code

Post by hgm »

With the "768 choose 32" it seems you also want to encode positions with 32 white Kings and such...

For the King you need only 6 independent keys, corresponding to the 6 bits of its square number. If you know there will be at most two Rooks, you can do with 12 independent keys for those (in theory even 11), instead of 64.

Note that for positions without supernumerary pieces one can also get a 187-bit representation by storing the square numbers of the pieces in a fixed order in radix-65 format and the Pawns in radix-49 format. The resulting hash key can be described as an additive (rather than XOR) Zobrist key, so that it can be updated in the familiar incremental way. This considers all pieces as distinguishable, so that in principle a lot could still be gained by exploiting the interchangeability of pieces of equal type. Especially for Pawns, where there are two factors of 8! = 40,320 to gain (2 x 15 bits!). And that doesn't even account for the fact that Pawns do not have unlimited lateral mobility.
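
In code the additive update would look roughly like this (my sketch, restricted to the 16 non-Pawn pieces so that the roughly 97-bit partial key fits into a GCC/Clang __int128; the Pawns would get the same treatment in radix 49, and the full ~187-bit key needs a wider integer):

Code: Select all

#include <cstdint>

// Additive radix key for the 16 non-Pawn pieces: piece i has fixed weight 65^i,
// its "digit" is its square number (64 = captured).
struct RadixKey {
    unsigned __int128 w[16];
    unsigned __int128 key = 0;

    RadixKey() {
        unsigned __int128 weight = 1;
        for (int i = 0; i < 16; ++i) { w[i] = weight; weight *= 65; }  // 65^15 ~ 2^90
    }
    // digit = square number 0..63, or 64 for a captured piece
    void setPiece (int i, int digit)          { key += (unsigned __int128)digit * w[i]; }
    // incremental update on a move; the unsigned wrap-around still gives the right result
    void movePiece(int i, int oldD, int newD) { key += (unsigned __int128)(newD - oldD) * w[i]; }
};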
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Dense board representation as hash code

Post by hgm »

One more thought: a radix-49 representation of the pawn structure takes log2(49) ≈ 5.6 bits/pawn, i.e. 90 bits for 16 pawns. This is very inefficient, because (as remarked) it treats all pawns as distinguishable. Uri Blass once proposed a 64-bit representation, consisting of a 48-bit 'pawn-occupancy' bitboard of the 48-square area where pawns can be, plus 16 bits indicating the colour of the 16 pawns. This is in fact a re-shuffled Huffman encoding with empty = 0, white P = 10 and black P = 11, where the leading bits of all Huffman codes are collected into the occupancy bitboard.
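
In code that packing is about as simple as it sounds (a sketch, taking ordinary white/black Pawn bitboards as input):

Code: Select all

#include <cstdint>

// 48-bit occupancy of the squares a2..h7, plus one colour bit per pawn
// (in the same low-to-high scan order), packed into a single 64-bit word.
uint64_t pawnCode(uint64_t whitePawns, uint64_t blackPawns) {
    uint64_t occ48 = (whitePawns | blackPawns) >> 8;   // ranks 2..7 -> bits 0..47
    uint64_t code = occ48;
    uint64_t occ = occ48;
    int n = 0;
    while (occ) {
        int sq = __builtin_ctzll(occ);
        occ &= occ - 1;
        if ((blackPawns >> (sq + 8)) & 1)               // 1 = black Pawn, 0 = white
            code |= 1ULL << (48 + n);
        ++n;
    }
    return code;
}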

The Huffman encoding of single squares is not optimal, however, because the fraction of occupied squares (33%) is rather far from 50% and 25% (the powers of 2 for which Huffman encoding is perfect). This can be cured by encoding groups of squares. (2/3)^5 = 32/243 and (2/3)^3 = 8/27 are rather close to a (negative) power of 2 (32/256 and 8/32), so that encoding groups of 5 or 3 squares can be more efficient. In particular, a stretch of 5 empty squares can be encoded as 3 bits, the 5 cases with 1 Pawn in it with 4 bits, those with a pair of Pawns as 5 bits, etc. Stretches of 3 empty squares can be encoded with 2 bits, with 1 Pawn in them as 3 bits, with a pair as 4 bits, etc.

If the 48-square area is then broken up into 9 stretches of 5 squares, plus 1 of 3 squares, encoding a completely empty board would take 9x3 + 2 = 29 bits. Every Pawn would increase the size of the code by 1 bit, so the worst case, with 16 Pawns, would take 45 bits.
User avatar
Deberger
Posts: 91
Joined: Sat Nov 02, 2019 6:42 pm
Full name: ɹǝƃɹǝqǝᗡ ǝɔnɹꓭ

Re: Dense board representation as hash code

Post by Deberger »

Vladan Vučković, author of the Axon and Achilles chess engines, describes a compact chessboard representation (C.C.R.) employing a 4-bit piece coding technique.

Features include:

* Generation of the 64-bit Transposition Keys Without Zobrist Method

* The Efficient C.C.R. Conversion to Bitboard

PDF: https://pdfs.semanticscholar.org/474d/c ... 45e7ba.pdf