Hash value composition

ydebilloez · Post by **ydebilloez** » Thu Jun 17, 2021 11:28 am

Dear all,

I would like to see if following modifications are better/equal/worse for hash value calculation.
In order to avoid collision, a same hash value should not represent two different positions, but in chess, we have the 3-fold repetition rule that adds some information to the same position.
Same colour of play and same number of available moves are important elements to calculate if we are in the same position. So why we don't incorporate this in a special way on the hash value?

Proposal:
1 bit - colour to move
n bits - available pieces on board (2 (KK) to 32) - we could take the lower x bits (see below)
n bits - available moves - we could take the lower x bits (see below)
or 1 bit - ep move possible (see below)
and/or 4 bits - castle rights
64 - above bits - remaining hash value available bits

I have seen implementations xor-ing the special ep/colour to move/castle rights inside the hash fields, but something tells me this is not the best approach.

If we set the first bit for the colour to move as a separate bit, we actually avoid possible collisions in 50% of the positions, and while we reduce the available bits for the hash value by 1, we do not loose hash precision. Hash collission should be equal to an implementation where we xor it into the hash key. I do this in Belofte.

If we set e.g. 3 bits of the available pieces in the next 3 bits, we avoid hash collisions when the number of pieces is different. While capturing 8 pieces in a combination could generate a collision with 0 or 16, and 1=9=17, ... etc. We loose a number of hash values, but at the same time we gain on testing 3-fold repetition as the bit change would break out of the tests. To be discussed if 1-2-4 bits is more optimal or 5 bits that represents the complete game (0-31 is within 2-32 possible range). I do use 3 bits in Belofte.

We could factor in the number of available moves, which would take into account possible castle (4 bits) and ep rights (1 bit) or even ep field (3 bits). Alternatively, just take ep or castle rights individually, while there is no need for movegeneration to calculate the hash, it consumes more bits. I am playing with this in Belofte.

Incorporating this in the hash gives me a feeling that it would greatly reduce hash collisions in a real search tree.

When using the available moves in the same way as available pieces, we would with 3 bits cover effectively castle rights and ep rights. I have a feeling that this would reduce possible hash collisions but it comes at a cost and I am not sure that it would perform better at reducing the number of hash collisions in a search tree compared to its alternative.

Is anyone using any of above techniques? What are your thoughts?

hgm · Post by **hgm** » Thu Jun 17, 2021 12:36 pm

'Something tells me' seems a euphemism for 'I have no clue, and am just trying random things'.

Dedicating 1 key bit to the stm should have no effect, as the tree contains approximately as many positions with white and black to move. Using the piece count should drive up the number of key collisions, as the number of pieces doean't vay that much across the tree, and in an end-game search might just span a few of the 31 possible values, highly driving up the number of collisions.

It also depends on how you derive index and signature from the key. If you derive the index from the part that contains the piece-count bits, and only 8 pieces are left, 75% of the hash table would remain completely unused.

ydebilloez · Post by **ydebilloez** » Thu Jun 17, 2021 1:50 pm

hgm wrote: ↑Thu Jun 17, 2021 12:36 pm 'Something tells me' seems a euphemism for 'I have no clue, and am just trying random things'.

In a game, you reduce the number of pieces from 32 to minimum 3, in a number of plies, 'lets say' on average 80. Lets 'assume' the game terminates at move 40 with still 16 pieces on board. This means that 1 out of 5 moves are capture moves (let's 'assume'). Having a search tree of 20 deep with no captures 'seems' unreasonable.
I am not trying to do random things, I am just trying to do clueless things.

hgm wrote: ↑Thu Jun 17, 2021 12:36 pm It also depends on how you derive index and signature from the key. If you derive the index from the part that contains the piece-count bits, and only 8 pieces are left, 75% of the hash table would remain completely unused.

Not talking about signature and index, just about different hash values to represent a position.

If we have 8 pieces on board, and no captures, the total number of combinations is:
64×63×62×61×60×59×58×55 = 1.722011284×10¹⁴ (king in corner prohibits other king of 4 cases)
Hashkey of 60 bits (64 minus stm and piece count): 1.152921505×10¹⁸

I am not good enough in maths to see how many possible collisions we get from this, but the number of available hash values is still 1000 times bigger than the actual hashes, and searching through the tree without any capture move and not getting a cut-off because of a bad move 'seems' unreasonable.

Lets go on with the cluelessnes:
If we have - let's 'assume' - 35 moves on average, and 20 ply: this is 7.609583502×10³⁰ positions. A hash table based on 64 bits has 1.844674407×10¹⁹ possible combinations. We have theoretically 4×10¹¹ collisions. If we throw away 3 bits to include the piececount lower 3 bits, we get 3×10¹² collisions.

However, we cannot get there from here - with or without capture - so whilst 35 different moves are possible, there are a lot of move sequences that lead to the same position and that are no collision and that are true identical positions. We really need to test with real searches to validate.

Pio · Post by **Pio** » Thu Jun 17, 2021 3:04 pm

ydebilloez wrote: ↑Thu Jun 17, 2021 1:50 pm
hgm wrote: ↑Thu Jun 17, 2021 12:36 pm 'Something tells me' seems a euphemism for 'I have no clue, and am just trying random things'.
In a game, you reduce the number of pieces from 32 to minimum 3, in a number of plies, 'lets say' on average 80. Lets 'assume' the game terminates at move 40 with still 16 pieces on board. This means that 1 out of 5 moves are capture moves (let's 'assume'). Having a search tree of 20 deep with no captures 'seems' unreasonable.
I am not trying to do random things, I am just trying to do clueless things.

hgm wrote: ↑Thu Jun 17, 2021 12:36 pm It also depends on how you derive index and signature from the key. If you derive the index from the part that contains the piece-count bits, and only 8 pieces are left, 75% of the hash table would remain completely unused.
Not talking about signature and index, just about different hash values to represent a position.

If we have 8 pieces on board, and no captures, the total number of combinations is:
64×63×62×61×60×59×58×55 = 1.722011284×10¹⁴ (king in corner prohibits other king of 4 cases)
Hashkey of 60 bits (64 minus stm and piece count): 1.152921505×10¹⁸

I am not good enough in maths to see how many possible collisions we get from this, but the number of available hash values is still 1000 times bigger than the actual hashes, and searching through the tree without any capture move and not getting a cut-off because of a bad move 'seems' unreasonable.

Lets go on with the cluelessnes:
If we have - let's 'assume' - 35 moves on average, and 20 ply: this is 7.609583502×10³⁰ positions. A hash table based on 64 bits has 1.844674407×10¹⁹ possible combinations. We have theoretically 4×10¹¹ collisions. If we throw away 3 bits to include the piececount lower 3 bits, we get 3×10¹² collisions.

However, we cannot get there from here - with or without capture - so whilst 35 different moves are possible, there are a lot of move sequences that lead to the same position and that are no collision and that are true identical positions. We really need to test with real searches to validate.

How do you think for example that the piece count helps? You make your hash 8 times more likely to collide if the piece count is the same. If the piece count is not the same the random zobrist will fail on those three bits 1/8 times while your will not fail. To maximise the gain of your idea you need the piece counts to be as evenly distributed as possible which you cannot get better than random. So to make your method the best you will have to make piece count random and you will get Zobrist.

So you can see that h.g.m is right.

ydebilloez · Post by **ydebilloez** » Thu Jun 17, 2021 4:36 pm

Pio wrote: ↑Thu Jun 17, 2021 3:04 pm To maximise the gain of your idea you need the piece counts to be as evenly distributed as possible which you cannot get better than random.

So you can see that h.g.m is right.

HGM is always right, we should look at getting pure random and getting information in there is walking away from pure random. So far I agree.

I believe the case of 1 bit for stm cannot be refuted. We effectively create 2 hashtables, one for white to move and one for black to move, each 63 bits instead of 64 bits for a global one. How this translates to collision count in real world needs to proven.

Lets go further on this idea: Lets work with 16 hashtables, each of 60 bits:
bit 1: stm, bit 2-4: piececount & 0x07, bits 5-64: zobrist key

The advantage of putting the piececount in clear is that:
- when looking for 3 fold repetition backwards, you can stop looking if bit 2-4 change
- when cleaning up the hashtable, you can clean all entries that are n or more pieces down, you can also clean when playing a capture move you can clean all your current entries, (if your search is deep enough, why not use 4 bits / 16 pieces ?)

Under the assumption that you cannot get there from here, by using 1 bit for stm, 3 bits for piececount, you actually create 16 different hashtables in the same array. Collisions with the same number of pieces on board are possible in the remaining 60 bits, but never with another number of pieces on board or another colour to move. If your zobrist keys are pure random, and generating legal moves from a certain position, the number of unique relevant positions with good moves (not brute-force) should be way lower than the hash collision threshold,

Optimal spread is with pure random, but movesequences are not pure random and are somewhat impacted by killer moves. The impact of killer moves could validate the idea.

ydebilloez · Post by **ydebilloez** » Thu Jun 17, 2021 4:49 pm

Lets put it simple. Use 1 bit for number of pieces on board is even or not even, remaining 63 bits are zobrist.
When being 1 piece down (or 3, 5, 7), I can never collide with 0 (or 2 or 4) down...
or
Use 1 bit for stm, remaining 63 bits are zobrist.

While dividing the number of possible hash values by 2, I also divide the possible colliding positions by 2.

Combining both: I divide the number of hash values by 4, I also divide the number of possible colliding positions by 4.
and so on...

mvanthoor · Post by **mvanthoor** » Thu Jun 17, 2021 5:46 pm

ydebilloez wrote: ↑Thu Jun 17, 2021 11:28 am Dear all,

I would like to see if following modifications are better/equal/worse for hash value calculation.
In order to avoid collision, a same hash value should not represent two different positions, but in chess, we have the 3-fold repetition rule that adds some information to the same position.

Why are you trying to re-invent an alternative for Zobrist hashing?

The Zobrist-hash literally contains everything you would ever need to know so you can uniquely identify a position.

"This position has been on the board before" is, in my opinion, not a property of a position; it doesn't make it a different position. So, we just use the Zobrist-hash to identify the position, and keep a list of all the hashes we encountered during play in a game_history table. To find out if your current position is a third repetition, you just run through the game_history table, backwards, and if you find your hash twice, you know your position is a third repetition.

As soon as you find a capture, a pawn move, or a castling move, you don't have to search backwards any further, because you can never ever reach the position in which the capture, castling, or pawn move was made.

Putting the repetition into the hash makes a repeating position unique, which makes it _more difficult_ to actually detect the repetition.

ydebilloez · Post by **ydebilloez** » Thu Jun 17, 2021 6:38 pm

mvanthoor wrote: ↑Thu Jun 17, 2021 5:46 pm Why are you trying to re-invent an alternative for Zobrist hashing?
The Zobrist-hash literally contains everything you would ever need to know so you can uniquely identify a position.

Hi Marcel,
I am not re-inventing an alternative for Zobrist, I do it with some enhancements. Instead of hashing the colour in the Zobrist hash, I prepend the Zobrist hash with the colour.

Normally, you do something like this:

Code: Select all

hash ^= randomvalues[colour-to-move];

In my code, I do:

Code: Select all

If white-to-move: hash &= ~whitemask, else hash |= whitemask

Which is setting or clearing the bit. (In my code, the bit is initialised to zero and only set when needed.)

All random values that make up the keys have the bits blanked out.

Code: Select all

 foreach() randomvalues [] ~= whitemask

(I actually create only 60 bits of random values. All 4 first bits are blanked out.)

You end up with a zobrist key that has it first bit set when black is to move, cleared when white is to move.
All random values are aligned to 64 - number of specific bits (1 + 3).

mvanthoor wrote: ↑Thu Jun 17, 2021 5:46 pm To find out if your current position is a third repetition, you just run through the game_history table, backwards, and if you find your hash twice, you know your position is a third repetition.
As soon as you find a capture, a pawn move, or a castling move, you don't have to search backwards any further, because you can never ever reach the position in which the capture, castling, or pawn move was made.

This is exactly what I proposed to do (and actually do in Belofte), just looking at the bits in the hash key instead of looking up the move played. If the bits of the number of pieces changes (because of capture), I stop looking backwards. I also use the 50-moves counter to count maximum n positions backwards.

Please note as mentioned before: While dividing the number of possible hash values by 2, I also divide the possible colliding positions by 2.

odomobo · Post by **odomobo** » Thu Jun 17, 2021 7:11 pm

63-bit zobrist with 1-bit stm is nearly identical to 64-bit zobrist. In one case, you have 1 population, and the other you have 2 populations, with half the number of items but twice the chance of collisions. I believe the case of 1 population is a miniscule bit worse (much, much less than a 1% difference in collision rate).

I think a better approach would be to find a specific weakness in zobrist before trying to improve it. However, I think that will be difficult, because zobrist is pretty close to the perfect hashing algorithm as far as I can tell.

hgm · Post by **hgm** » Thu Jun 17, 2021 11:17 pm

ydebilloez wrote: ↑Thu Jun 17, 2021 1:50 pmIn a game, you reduce the number of pieces from 32 to minimum 3, in a number of plies, 'lets say' on average 80. Lets 'assume' the game terminates at move 40 with still 16 pieces on board. This means that 1 out of 5 moves are capture moves (let's 'assume'). Having a search tree of 20 deep with no captures 'seems' unreasonable.

No, but reasonable is that you would have on average 4 captures, with most positions reached through 2 - 6 captures. So the number of pieces would vary across the tree from 26 - 30, (if the root had 32 pieces), and positions with 2 - 24 pieces would virtually not occur. So effectively your tree uses only 25% of all possible keys, driving up the number of key collisions by a factor 4.

With the castling rights it is even worse: most of the game (often all of the game, if you play from an opening book) neither side will have castling rights. So those 4 bits will be 0 for all positions in the tree, and you will have driven up the number of collisons by a factor 16.

Hash value composition

Hash value composition

Re: Hash value composition

Re: Hash value composition

Re: Hash value composition

Re: Hash value composition

Re: Hash value composition

Re: Hash value composition

Re: Hash value composition

Re: Hash value composition

Re: Hash value composition