Zobrist key random numbers

bob · Post by **bob** » Fri Jan 23, 2009 5:39 pm

Aleks Peshkov wrote:Considering Hamming distance it is important to notice, that 0 value (empty square) is very frequent in real board (at least 50% of all values and much more in an endgame). I suspect that a set of 64 different keys for all empty squares can improve collision situation significantly.

It is also intuitively obvious that some distance of 32+8 = 40, is not better then 32-8 = 24 distance, so the average should be maximized toward 32, not 64 for 64-bit sets.

You can't get to 32. For the 788 numbers I need, you can get to 28. But then, as someone pointed out via private email, the problem then becomes that any two numbers, XORed with each other, produce a third number in the set... That is really painful...

Gerd Isenberg · Post by **Gerd Isenberg** » Fri Jan 23, 2009 6:03 pm

bob wrote:
Aleks Peshkov wrote:Considering Hamming distance it is important to notice, that 0 value (empty square) is very frequent in real board (at least 50% of all values and much more in an endgame). I suspect that a set of 64 different keys for all empty squares can improve collision situation significantly.

It is also intuitively obvious that some distance of 32+8 = 40, is not better then 32-8 = 24 distance, so the average should be maximized toward 32, not 64 for 64-bit sets.
You can't get to 32. For the 788 numbers I need, you can get to 28. But then, as someone pointed out via private email, the problem then becomes that any two numbers, XORed with each other, produce a third number in the set... That is really painful...

That was for the minimum hd, not the average hd.

If you have 14 as min HD, what is your max HD and avg HD?
If you have 23 as min HD, what was your max HD and avg HD?

wgarvin · Post by **wgarvin** » Fri Jan 23, 2009 7:42 pm

Maximizing Hamming distance between any 2 keys in the set of 788 (or however many) does not seem like a good way to optimize the Zobrist keys, because is not a great model for how the keys are actually used. All interesting board positions contain more than 2 of the keys from the entire set of 788 (one white king + one black king + several others). Since all of the used entries are XORed together to make the Zobrist key, it seems reasonable to take a group of related keys (e.g. the 64 keys for "white bishop on square X") and XOR together all subsets of up to N of those keys, where N is the maximum number of that piece that typically appears in a position. I was hasty in reading the thread, but I think this was more or less what hgm suggested already.

Now: Most positions will contain a small number of the 64 keys for each piece (for pawns, the position will contain 0 to 8 keys out of 48 keys for each color, and for pieces it will usually contain 0 to 2 keys out of 64 keys for each color). Maybe you could take advantage of that property to design keys that have less collisions on actual chess positions than randomly-generated keys do (though I doubt it will make much difference, in the end).

So I would suggest that you first try to design two sets of 48 zobrist keys for the pawns, where (following hgm's idea) you can take all subsets of N=6 or N=7 of those 48 keys, and XOR them together, and then measure the hamming distance between every one of those subsets. (N=6 seems like it would be >8 billion combinations... using N=4 might be more practical, and hopefully good enough). First try to maximize that, so that at least several bits are being flipped by each placement of a pawn. Keep the final list of subsets for the two sets of pawn keys (call them X and Y).

After making two sets of pawn keys, it should be relatively simple to create sets of 64 keys for each other piece, with the property that for all subsets of N=2 have decent hamming distances from each other (and from all of the values on the X and Y lists).

Hmm, another idea (that might be dumb): When measuring bitcounts or hamming differences or whatever... Maybe give a higher weight to the lowest 28 bits or so of the key ? Because the lower bits control the hardware address in the transposition table, while the higher bits are only used to disambiguate the inevitable collisions. You don't want situations where the positions from a search tree are competing for the available transposition table entries any more than necessary. I'm not sure if that would matter much, though. And ultimately I'll be surprised if any of this results in lower collision rates than you get from randomly-assigned Zobrist keys.

hgm · Post by **hgm** » Fri Jan 23, 2009 8:35 pm

That last remark was indeed the conclusion I arrived at when I was toying with this. Unless your hash key is vvery short, there is nothing to worry about. And if there is something to worry about, I could not find a way to improve the situation by being smart.

bob · Post by **bob** » Fri Jan 23, 2009 8:41 pm

Gerd Isenberg wrote:
bob wrote:
Aleks Peshkov wrote:Considering Hamming distance it is important to notice, that 0 value (empty square) is very frequent in real board (at least 50% of all values and much more in an endgame). I suspect that a set of 64 different keys for all empty squares can improve collision situation significantly.

It is also intuitively obvious that some distance of 32+8 = 40, is not better then 32-8 = 24 distance, so the average should be maximized toward 32, not 64 for 64-bit sets.
You can't get to 32. For the 788 numbers I need, you can get to 28. But then, as someone pointed out via private email, the problem then becomes that any two numbers, XORed with each other, produce a third number in the set... That is really painful...
That was for the minimum hd, not the average hd.

If you have 14 as min HD, what is your max HD and avg HD?
If you have 23 as min HD, what was your max HD and avg HD?

You are correct. I overlooked the "the average" in the previous post...

bob · Post by **bob** » Fri Jan 23, 2009 8:43 pm

wgarvin wrote:Maximizing Hamming distance between any 2 keys in the set of 788 (or however many) does not seem like a good way to optimize the Zobrist keys, because is not a great model for how the keys are actually used. All interesting board positions contain more than 2 of the keys from the entire set of 788 (one white king + one black king + several others). Since all of the used entries are XORed together to make the Zobrist key, it seems reasonable to take a group of related keys (e.g. the 64 keys for "white bishop on square X") and XOR together all subsets of up to N of those keys, where N is the maximum number of that piece that typically appears in a position. I was hasty in reading the thread, but I think this was more or less what hgm suggested already.

Now: Most positions will contain a small number of the 64 keys for each piece (for pawns, the position will contain 0 to 8 keys out of 48 keys for each color, and for pieces it will usually contain 0 to 2 keys out of 64 keys for each color). Maybe you could take advantage of that property to design keys that have less collisions on actual chess positions than randomly-generated keys do (though I doubt it will make much difference, in the end).

So I would suggest that you first try to design two sets of 48 zobrist keys for the pawns, where (following hgm's idea) you can take all subsets of N=6 or N=7 of those 48 keys, and XOR them together, and then measure the hamming distance between every one of those subsets. (N=6 seems like it would be >8 billion combinations... using N=4 might be more practical, and hopefully good enough). First try to maximize that, so that at least several bits are being flipped by each placement of a pawn. Keep the final list of subsets for the two sets of pawn keys (call them X and Y).

After making two sets of pawn keys, it should be relatively simple to create sets of 64 keys for each other piece, with the property that for all subsets of N=2 have decent hamming distances from each other (and from all of the values on the X and Y lists).

Hmm, another idea (that might be dumb): When measuring bitcounts or hamming differences or whatever... Maybe give a higher weight to the lowest 28 bits or so of the key ? Because the lower bits control the hardware address in the transposition table, while the higher bits are only used to disambiguate the inevitable collisions. You don't want situations where the positions from a search tree are competing for the available transposition table entries any more than necessary. I'm not sure if that would matter much, though. And ultimately I'll be surprised if any of this results in lower collision rates than you get from randomly-assigned Zobrist keys.

I am more concerned about a more strict sense of that, even. What is important with hashing, is what keys can be produced from a given starting position. At some depth, you can only move pawns D/2 squares forward/diagonally, which further limits the specific sets of numbers that are interesting. But I am (currently) leaving this as "an interesting topic to continue when I have lots of free time, maybe later this Summer..."

bob · Post by **bob** » Fri Jan 23, 2009 11:01 pm

hgm wrote:
Zach Wegner wrote:Random idea:

zobrist[piece][square] = rotate_left(deBruijn[piece], square);
This is not such a crazy idea, and in fact you don't even need a DeBruijn number for it.

To save cache space I rotate the Zobrist keys bytewise in my engines. So in stead of taking 768 random long long ints, (6KB) I have a 675-byte table of random chars. Something like:
Code: Select all
char Zobrist&#91;12&#93;&#91;64&#93;;

hashKey ^= *&#40;long long int*) &Zobrist&#91;pieceType&#93;&#91;squareNr&#93;;
I measured the effect on collision rate compared to the traditional method, and there is none. My guess is that it would even work if the randoms overlap 63 out of 64 bits, rather than 56 out of 64. But this would make them more expensive to read, and the table is now already so small that further reduction is not worth it anyway.

You do realize how bad it is to do unaligned memory accesses???

hgm · Post by **hgm** » Fri Jan 23, 2009 11:35 pm

Of course!

I measured that first:

The memory access has a latency that is 2 clocks longer than for the aligned access. But it is not a 2-cycle pipeline stall; execution of other instructions continues as usual during these cycles. Even another memory load could be done in one of the two cycles. (Not in the other, of course, as there are really two loads needed , when the data is distributed over two words.)

So the cost is fairly small, in any case much smaller than the latency of an L1 cache miss that has to be satisfied from L2 (20 clocks on an AMD, or on an Intel when the replaced line is dirty). So it saves a lot of time if I make sure the Zobrist table will reside in L1 always, because it is small enough to fit there together with the other core data.

Remember that a Zobrist table on a board of 81 squares with 13 different piece types for each color can be rather big, and waste most of your L1 cache, if not already overflowing it on some machines. So it comes in very handy that this is possible.

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Jan 24, 2009 12:12 am

hgm wrote:Of course!

I measured that first:

The memory access has a latency that is 2 clocks longer than for the aligned access. But it is not a 2-cycle pipeline stall; execution of other instructions continues as usual during these cycles. Even another memory load could be done in one of the two cycles. (Not in the other, of course, as there are really two loads needed , when the data is distributed over two words.)

So the cost is fairly small, in any case much smaller than the latency of an L1 cache miss that has to be satisfied from L2 (20 clocks on an AMD, or on an Intel when the replaced line is dirty). So it saves a lot of time if I make sure the Zobrist table will reside in L1 always, because it is small enough to fit there together with the other core data.

Remember that a Zobrist table on a board of 81 squares with 13 different piece types for each color can be rather big, and waste most of your L1 cache, if not already overflowing it on some machines. So it comes in very handy that this is possible.

It would crash if that is your declaration, and if the array becomes located at the end of a 4 KByte page, if piecetype with code 11 is on square > 56 - or is that type some pawn?

One may try unaligned SSE2 or SSE3 load instruction and 15 (or 16 for padding) additional bytes.

Code: Select all

char Zobrist&#91;12*64+16&#93;;

__m128 zob;
unsigned long long hashkey = ...;
zob = _mm_loadu_si128&#40; (__m128*)&#40;Zobrist + 64*pieceType + squareNr&#41; ); 
zob = _mm_xor_si128&#40; zob, _mm_cvtsi64x_si128&#40;hashkey&#41; );
hashkey = _mm_cvtsi128_si64&#40; zob );

Greg Strong · Post by **Greg Strong** » Sat Jan 24, 2009 1:55 am

Gerd Isenberg wrote:One may try unaligned SSE2 or SSE3 load instruction and 15 (or 16 for padding) additional bytes.
Code: Select all
char Zobrist&#91;12*64+16&#93;;

__m128 zob;
unsigned long long hashkey = ...;
zob = _mm_loadu_si128&#40; (__m128*)&#40;Zobrist + 64*pieceType + squareNr&#41; ); 
zob = _mm_xor_si128&#40; zob, _mm_cvtsi64x_si128&#40;hashkey&#41; );
hashkey = _mm_cvtsi128_si64&#40; zob );

What's the advantage of doing this with SSE instructions?

Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers