Zobrist key independence
Posted: Mon Feb 17, 2020 4:20 pm
Zobrist hashing requires roughly 64 (squares) x 6 (piece types) x 2 (colors) = 768 keys. (Actually 32 less, for the squares inaccessible by Pawns, but then you also need some keys for turn to move and castling / e.p. rights. With 64-bit keys there must always be collisions, as the number of possible chess positions is far larger than 2^64. (More like 2^160, actually.) For TT hashing we don't care too much, though, if there are collisions with keys for wildly different positions, as these will never occur in the same search tree. What does hurt are low-order dependencies, where just a few keys XORed together produce 0.
So we want a set of keys that is as free from the lowest-order dependencies as possible. Now we can take 8 different keys out of 768 in slightly over 2^61 ways (768!/(760!*8!)). With randomly chosen 64-bit keys there is a 1-in-2^64 probability for each of those XOR combinations of eight to be 0. So it is not too far-fetched that we will be spared the misfortune of having 8 keys that XOR to 0 (an 8th-order dependency). But having a 9th-order dependency seems unavoidable, as there are nearly 2^68 combinations of 9 keys possible, so that we expect 0 to be hit by them on average 16 times. The probability that it will then happen 0 times is exp(-16) or about 10^-7.
With a smaller number of keys the probability of those having a dependency of a given order of course becomes smaller. In a set of (say) 200 keys, one can select 10 of those in 200!/(190!*10!) ~ 2^54 ways. So with a random set of 200 64-bit keys you would almost certainly have no 10th-order dependency, and even the chances of having no 12th-order dependency would be pretty good.
Redistributing the pain
Now we can reduce the number of required independent keys enormously, by making use of the fact that in common chess positions some combinations of Zobrist keys can never occur, so that they do not need to be independent. E.g. there will surely be only a single King of each color, so only one of the 64 of its Zobrist keys can occur. We can construct the 64 King keys using only 6 keys from the 'maximally independent' set of keys (the 'basis keys'), by assigning a key to each bit of the square number, and then XORing all the keys corresponding to the bits in the square number that are 1. That will cause 3rd-order dependency in the King keys, but that doesn't matter, as there will only be a single one of those in the total key. So that the fact that that same contribution could also have been made by several pairs of keys (e.g. 1 = 2^3 = 4^5 = ...) is harmless: we cannot confuse it with positions that had two Kings, as these will not occur. Even the fact that a1 has the same key as an empty square is harmless: the confusion between a1 being King or empty will always be resolved by whether there is a King elsewhere.
For other pieces it is a bit more tricky. If we are willing to ignore positions with 'super-numerary' pieces (e.g. 3 Rooks; who would ever promote to Rook if he already has two?), we could in theory encode the Rook constellation in 12 bits (2 for each square number). But encoding those bits by a key for each, like with the King location, would not give a Zobrist encoding, as the Rooks would be considered distinguishable. What we are looking for is a set of 64 Zobrist 'proto-keys' without any 4th-order (or lower) dependencty. By primitive trial and error I was able to construct such a set o 16-bit keys: just generate 64 random 16-bit keys, and calculate and tabulate all XOR's of pairs of those (and single ones), to see if a duplicat occurs. If no two pairs give the same result, no four of them will give zero. The same algorithm applied to 15-bit keys did not give any result in decent time. (Perhaps expert in generation of magics would know how to do better!)
From this set of 64 proto-keys, plus 16 64-bit basis keys, one can construct 64 Rook keys as follows: assign each of the 16 basis keys to a bit of the proto-keys, and XOR those that correspond to set bits of the proto-key for a given square with each other to make the 64-bit key for that square. The same set of 64 proto-keys can be used for all piece types that are present in maximally two copies. I would think it acceptable for N, B, R and Q; this allows one super-numerary Q. But it would run the risk of confusing positions with 3 Knights with those that have a fewer number, although that risk is not very large: with 16-bit proto-keys to encode 2^11 constellations of two identical pieces, only 1/32 of the 2^16 possible proto-keys are used by the two-piece constellations. And the 3% of 3-Knight positions that would also be in use by a 2-Knight positions will likely have the two Knights in completely different locations, not likely to occur in the same tree. One could throw in a 17th basis key XORed into all 64 Knight keys; that would allow distinguishing positions with even and odd number of Knights, so that 3-Knight positions could only collide with single-Knight positions (which occupy only 64 out of 2^16 = 1/1024 of the proto-key space).
For Pawns one could simply use 48 basis keys, one for each square that can have a Pawn. The total number of basis keys needed to construct the 768-key set this way is thus 2*48 + 8*16 + 2*6 = 96 + 128 + 12 = 236 keys. The number of keys that we really want to be independent is thus reduced by more than a factor 3! The chances on even an 11th-order dependency in a so much smaller set are only 1 in 8, as one can choose 11 out of 236 basis keys in about 2^61 ways.
So we want a set of keys that is as free from the lowest-order dependencies as possible. Now we can take 8 different keys out of 768 in slightly over 2^61 ways (768!/(760!*8!)). With randomly chosen 64-bit keys there is a 1-in-2^64 probability for each of those XOR combinations of eight to be 0. So it is not too far-fetched that we will be spared the misfortune of having 8 keys that XOR to 0 (an 8th-order dependency). But having a 9th-order dependency seems unavoidable, as there are nearly 2^68 combinations of 9 keys possible, so that we expect 0 to be hit by them on average 16 times. The probability that it will then happen 0 times is exp(-16) or about 10^-7.
With a smaller number of keys the probability of those having a dependency of a given order of course becomes smaller. In a set of (say) 200 keys, one can select 10 of those in 200!/(190!*10!) ~ 2^54 ways. So with a random set of 200 64-bit keys you would almost certainly have no 10th-order dependency, and even the chances of having no 12th-order dependency would be pretty good.
Redistributing the pain
Now we can reduce the number of required independent keys enormously, by making use of the fact that in common chess positions some combinations of Zobrist keys can never occur, so that they do not need to be independent. E.g. there will surely be only a single King of each color, so only one of the 64 of its Zobrist keys can occur. We can construct the 64 King keys using only 6 keys from the 'maximally independent' set of keys (the 'basis keys'), by assigning a key to each bit of the square number, and then XORing all the keys corresponding to the bits in the square number that are 1. That will cause 3rd-order dependency in the King keys, but that doesn't matter, as there will only be a single one of those in the total key. So that the fact that that same contribution could also have been made by several pairs of keys (e.g. 1 = 2^3 = 4^5 = ...) is harmless: we cannot confuse it with positions that had two Kings, as these will not occur. Even the fact that a1 has the same key as an empty square is harmless: the confusion between a1 being King or empty will always be resolved by whether there is a King elsewhere.
For other pieces it is a bit more tricky. If we are willing to ignore positions with 'super-numerary' pieces (e.g. 3 Rooks; who would ever promote to Rook if he already has two?), we could in theory encode the Rook constellation in 12 bits (2 for each square number). But encoding those bits by a key for each, like with the King location, would not give a Zobrist encoding, as the Rooks would be considered distinguishable. What we are looking for is a set of 64 Zobrist 'proto-keys' without any 4th-order (or lower) dependencty. By primitive trial and error I was able to construct such a set o 16-bit keys: just generate 64 random 16-bit keys, and calculate and tabulate all XOR's of pairs of those (and single ones), to see if a duplicat occurs. If no two pairs give the same result, no four of them will give zero. The same algorithm applied to 15-bit keys did not give any result in decent time. (Perhaps expert in generation of magics would know how to do better!)
From this set of 64 proto-keys, plus 16 64-bit basis keys, one can construct 64 Rook keys as follows: assign each of the 16 basis keys to a bit of the proto-keys, and XOR those that correspond to set bits of the proto-key for a given square with each other to make the 64-bit key for that square. The same set of 64 proto-keys can be used for all piece types that are present in maximally two copies. I would think it acceptable for N, B, R and Q; this allows one super-numerary Q. But it would run the risk of confusing positions with 3 Knights with those that have a fewer number, although that risk is not very large: with 16-bit proto-keys to encode 2^11 constellations of two identical pieces, only 1/32 of the 2^16 possible proto-keys are used by the two-piece constellations. And the 3% of 3-Knight positions that would also be in use by a 2-Knight positions will likely have the two Knights in completely different locations, not likely to occur in the same tree. One could throw in a 17th basis key XORed into all 64 Knight keys; that would allow distinguishing positions with even and odd number of Knights, so that 3-Knight positions could only collide with single-Knight positions (which occupy only 64 out of 2^16 = 1/1024 of the proto-key space).
For Pawns one could simply use 48 basis keys, one for each square that can have a Pawn. The total number of basis keys needed to construct the 768-key set this way is thus 2*48 + 8*16 + 2*6 = 96 + 128 + 12 = 236 keys. The number of keys that we really want to be independent is thus reduced by more than a factor 3! The chances on even an 11th-order dependency in a so much smaller set are only 1 in 8, as one can choose 11 out of 236 basis keys in about 2^61 ways.