most similar hashes of two positions
Moderators: bob, hgm, Harvey Williamson
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.

 Posts: 11265
 Joined: Wed Mar 08, 2006 7:57 pm
 Location: Redmond, WA USA
 Contact:
Re: most similar hashes of two positions
It does not matter how close they are as long as one bit is different.
Re: most similar hashes of two positions
Here is a pair with distance 4.
First has polyglot key 7837007157572575988
[d] r1bq1knr/pp2bppp/2n1p3/3p4/6P1/3Q1N2/PPP1PP1P/RNB1KB1R w KQ  0 1
Second has polyglot key 7908924022443647732
[d] rn2kbr1/pp3ppp/4p2n/2pq1b2/3P4/2N1PN2/PP1B1PPP/R2QKB1R w KQq  0 1
First has polyglot key 7837007157572575988
[d] r1bq1knr/pp2bppp/2n1p3/3p4/6P1/3Q1N2/PPP1PP1P/RNB1KB1R w KQ  0 1
Second has polyglot key 7908924022443647732
[d] rn2kbr1/pp3ppp/4p2n/2pq1b2/3P4/2N1PN2/PP1B1PPP/R2QKB1R w KQq  0 1
Re: most similar hashes of two positions
If some people really want to try this, then it would seem better to ask for the first person to find two legal positions (i.e. reachable from the root) having identical Polyglot hash keys. Another option is to ask for two positions with identical Polyglot hash keys that "differ" in the fewest number of moves and unmoves.brtzsnr wrote:In case of a tie we count the lowest 32 bits. The competition runs until 31st of August (including).
It should really not be very difficult to find two different positions mapping to the same polyglot key.
Just do this:
 in addition to the 8byte polyglot key, add a 1byte hash key;
 do a big perft while storing all positions (8byte key plus 1byte key) in a huge hash table with not too small buckets;
 if the 8byte polyglot keys collide, check the 1byte hash key to see whether the positions are identical or whether you have two positions mapping to the same polyglot hash key;
 use the hashtable to skip traversing subtrees you've traversed before.
To find the actual fens once you find two positions mapping to the same polyglot hash key, print out the fen of the current position and the hash key. Redo the perft to find back the first position mapping to that key and print its fen.
Regarding the hash table. One could take 64byte buckets. If 32GB is available for the hash table, that gives 2^35 / 64 = 2^29 buckets. That means 29 bits of the 64+8=72 bits of hash key do not need to be stored (as they are already encoded in the index number of the bucket). That leaves 43 bits per position. That means about 11.9 positions per bucket, which unfortunately has to be rounded down to 11 positions. Maybe add a 7bit hash key instead of an 8bit hash key, then you can store 12 positions in a bucket.

 Posts: 1092
 Joined: Thu Jul 16, 2009 8:47 am
 Location: Almere, The Netherlands
Re: most similar hashes of two positions
I wonder if you will find Hamming distance zero. With reasonable 64 bit Zobrist keys (which the Polyglot keys probably are) this seems not to happen very often.
 hgm
 Posts: 24667
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
 Contact:
Re: most similar hashes of two positions
IIRC Joerg Oster already solved this problem a long time ago, by presenting a position that had key 0. I believe this was indeed with Polyglot keys. The position looked comparatively normal.
He did this by normal techniques from linear algebra, as the keys can be considered 768 (or so) vectors in a 64dimensional space.
There should already be two positions with each 5 pieces with equal key: QRBN together have 512 keys = 2^9. So there are 2^45/5! ~ 2^38 possible combinations of 5 of those. According to the birthday paradox this gives 2^75 pairs of combinations, each with a 1/2^64 probability of being equal. So we would expect 2048 of those pairs to have equal keys. These would then represent two 5men positions (without Kings and Pawns) that have the same key. You can then add Kings and Pawns to some of the empty squares they have in common to make a realistic positon.
He did this by normal techniques from linear algebra, as the keys can be considered 768 (or so) vectors in a 64dimensional space.
There should already be two positions with each 5 pieces with equal key: QRBN together have 512 keys = 2^9. So there are 2^45/5! ~ 2^38 possible combinations of 5 of those. According to the birthday paradox this gives 2^75 pairs of combinations, each with a 1/2^64 probability of being equal. So we would expect 2048 of those pairs to have equal keys. These would then represent two 5men positions (without Kings and Pawns) that have the same key. You can then add Kings and Pawns to some of the empty squares they have in common to make a realistic positon.
Last edited by hgm on Wed Aug 12, 2015 9:34 pm, edited 1 time in total.

 Posts: 1092
 Joined: Thu Jul 16, 2009 8:47 am
 Location: Almere, The Netherlands
Re: most similar hashes of two positions
According to the Mtheory we live in a 11 dimensional space, 64 dimensions seem a bit overwhelming, at least to me this is.hgm wrote:IIRC Joerg Oster already solved this problem a long time ago, by presenting a position that had key 0. I believe this was indeed with Polyglot keys. The position looked comparatively normal.
He did this by normal techniques from linear algebra, as the keys can be considered 768 (or so) vectors in a 64dimensional space.
There should already be two positions with each 4 pieces with equal key: QRBN together have 512 keys = 2^9. So there are 2^36 possible combinations of 4 of those. According to the birthday paradox this gives 2^71 pairs of combinations, each with a 1/2^64 probability of being equal. So we would expect 128 of those pairs to have equal keys. These would then represent two 4men positions (without Kings and Pawns) that have the same key. You canthen add Kings and Pawns to some of the empty squares they have in common to make a realistic positon.
Re: most similar hashes of two positions
Skipper is collision free and uses 8 or 9 x 64 bits. Only one advantage: it is simple for me to understand and work with.bob wrote:I am not sure what there is to decide. You can't represent all chess positions with just 64 bits. So far the best has been around 160 bits or so. mapping 2^160 positions into 2^64 bits clearly will have literally gazillions of positions with a hamming distance of zero. Something like 2^96 such positions roughly...brtzsnr wrote:Based on the thread "Worst Advice" (http://www.talkchess.com/forum/viewtopic.php?t=57235) I propose the following challenge:
Find two distinct positions whose Polyglot keys (http://hgm.nubati.net/book_format.html) have the lowest Hamming distance (fewer distinct bits).
The challenge has two categories:
1. The positions can be crafted.
2. The positions must be reached from the starting position (standard chess rules) by a series of legal moves.
In case of a tie we count the lowest 32 bits. The competition runs until 31st of August (including).
The goal of the challenge is to decide whether the hash move validity checking is necessary. Polyglot keys are used in order to be able to verify the result and to eliminate poor choices of the Zobrist hash values.
Please post your FENs.
Collisions (false matches with 64 bit signatures) absolutely happen. If an illegal move will crash your engine, you'd better check 'em or suffer the occasional crash since these are just like death, taxes, and such, there is no escaping them.
Re: most similar hashes of two positions
With 2
12090907956623999240 [d]rn1qk2r/ppp1ppbn/8/3P1b2/3P4/8/PP2PPPP/RNBQKB1R w KQkq  0 1
16414363598899675400 [d]r2qkbnr/ppQb1ppp/2n1p3/8/P7/2Pp4/1P3PPP/RNB1KBNR b KQkq  0 1
12090907956623999240 [d]rn1qk2r/ppp1ppbn/8/3P1b2/3P4/8/PP2PPPP/RNBQKB1R w KQkq  0 1
16414363598899675400 [d]r2qkbnr/ppQb1ppp/2n1p3/8/P7/2Pp4/1P3PPP/RNB1KBNR b KQkq  0 1
Re: most similar hashes of two positions
Hamming distance of the Zobrist keys is not a good measure, but what does make sense is to look at all combinations of Zobrist keys that add up to 0. Those combinations form a subspace of the 768dimensional vector space of all linear combinations of Zobrist keys. An element of this subspace with small Hamming distance corresponds to a small number of Zobrist keys that add up to 0. This combination might correspond to a short series of chess moves and unmoves that convert one position into another position with the same key (whenever that series of moves and unmoves is legal). Such "close" positions are more likely to occur in the same search tree than two very remote positions.mar wrote:Hamming distance is not a good measure of hash key quality (this has been shown to me by hgm some time ago).brtzsnr wrote:Based on the thread "Worst Advice" (http://www.talkchess.com/forum/viewtopic.php?t=57235) I propose the following challenge:
Find two distinct positions whose Polyglot keys (http://hgm.nubati.net/book_format.html) have the lowest Hamming distance (fewer distinct bits).
Actually you can craft Zobrist hashes to maximize Hamming distance but such will perform very very badly.
Re: most similar hashes of two positions
About a year ago I went through the posts looking for hash collision info. I thought I did a reasonably good search, but I don't remember seeing HGM giving an expose on the subject. Can you point out the discussion?mar wrote:Hamming distance is not a good measure of hash key quality (this has been shown to me by hgm some time ago).brtzsnr wrote:Based on the thread "Worst Advice" (http://www.talkchess.com/forum/viewtopic.php?t=57235) I propose the following challenge:
Find two distinct positions whose Polyglot keys (http://hgm.nubati.net/book_format.html) have the lowest Hamming distance (fewer distinct bits).
Actually you can craft Zobrist hashes to maximize Hamming distance but such will perform very very badly.
As I recall, a few years back, you initiated a rather long discussion on this topic. You were trying to produce keys with the maximum possible Hamming distance until someone (offline) demonstrated that this would produce horrible collision problems. IE. XOR'ing any two keys would produce a key already in the set. Unfortunately this conversation degenerated and ended prior to determining what the optimal hamming distance was. I remember thinking to myself that the optimal distance was (hash_key_size_in_bits minus log2(number_of_Zorbist_keys))/4 e.g. (for 784 zorbist keys and a 64bit hash it would be (64log2(784))/4 = 13.6). If this is true, then it would also seem that it's impossible to create a set of 64bit Zorbist keys that can guarantee that no collisions will occur when two positions are separated by 4 or more plies. IE. 13.6 / 2^4 < 1.bob wrote:I am not sure what there is to decide. You can't represent all chess positions with just 64 bits. So far the best has been around 160 bits or so. mapping 2^160 positions into 2^64 bits clearly will have literally gazillions of positions with a hamming distance of zero. Something like 2^96 such positions roughly...
Collisions (false matches with 64 bit signatures) absolutely happen. If an illegal move will crash your engine, you'd better check 'em or suffer the occasional crash since these are just like death, taxes, and such, there is no escaping them.
It does seem possible, at least in theory, to be able to create a set of 64bit Zorbist keys which could guarantee that no collisions would occur between 2 positions separated by 3plies or less. IE. 13.6 / 2^3 > 1 . Although, I have no clue as to how to go about constructing such a set.
Do these sound like reasonable conclusions?
On a somewhat related subject...
You did a collision rate test a few years back in which you showed that very high collision rates (on the order 10^6) had a relatively minor if any effect on engine move selection. I tried a few months ago to calculate what the collision rate for a given TT size and Zorbist key size but I couldn't find anyway to do it. A guess would be 1 / (2^(key_size_in_bits minus log2(#_of_table_entries)). This would work if all table entries were IID. There not, of course, and you can't have more collisions than table hits. Since the number of table hits in the middle game run between 15% and 20% this would seem to put a upper limit on possible collisions rates during this part of the game. So it's likely the only way to find out is to run some tests. The piece of information that would be nice to know is: With a 64bit key size what is the number of table entries that will cause a measurable decline in engine ELO due to excess collisions.
If it takes a collision rate > ~10^6 to cause a noticable change in engine ELO and 20 excess bit in the key vice the number of table entries will produce an error rate of about 10^6 then it would appear that table sizes as large as 16 trillion entries (2^44 entries) would be possible.
I know what a lot of you are thinking, you're thinking that when the number of table entries exceeds the square root of the possible number of zorbist keys collisions become almost gauranteed. While this may be true, It doesn't tell you how many collisions you will have, just that the probability of having at least one becomes very large. In a table of 2^44 entries you have to have greater than 2^24 probes that return invalid data due to a false positive per 2^44 probes to have an error rate of ~10^6.
In any case, I don't think this can be tested on a full 64bit key because of the amount of TT memory needed and the amount of time required to generate the required numbe of cache probes. i.e. a machine probing at 100,000,000 nodes per second would require over 2.5 months to completely fill the table.
Thoughts?
Regards,
Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.