most similar hashes of two positions


petero2
Posts: 685
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: most similar hashes of two positions

Post by petero2 »

hgm wrote:IIRC Joerg Oster already solved this problem a long time ago, by presenting a position that had key 0. I believe this was indeed with Polyglot keys. The position looked comparatively normal.
That was actually me. Here are some links:

http://talkchess.com/forum/viewtopic.ph ... 96&t=45155

http://talkchess.com/forum/viewtopic.ph ... 00&t=45155

http://talkchess.com/forum/viewtopic.ph ... 41&t=45155

http://talkchess.com/forum/viewtopic.ph ... 04&t=45155
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: most similar hashes of two positions

Post by Joerg Oster »

hgm wrote:IIRC Joerg Oster already solved this problem a long time ago, by presenting a position that had key 0.
Definitely not me. :lol:
Jörg Oster
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: most similar hashes of two positions

Post by hgm »

petero2 wrote:That was actually me. Here are some links:
My humblest apologies! I never was good with names. :oops:
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: most similar hashes of two positions

Post by syzygy »

petero2 wrote:
hgm wrote:IIRC Joerg Oster already solved this problem a long time ago, by presenting a position that had key 0. I believe this was indeed with Polyglot keys. The position looked comparatively normal.
That was actually me. Here are some links:

http://talkchess.com/forum/viewtopic.ph ... 96&t=45155

http://talkchess.com/forum/viewtopic.ph ... 00&t=45155

http://talkchess.com/forum/viewtopic.ph ... 41&t=45155

http://talkchess.com/forum/viewtopic.ph ... 04&t=45155
So you have "won": distance 0 between the initial position and any of:
petero2 wrote:[D]3rk1nq/1p1p4/P1p3p1/1p2p3/3PP3/6P1/1PP4P/R2QK1R1 w - - 0 1
Some other positions that have the same hash as the initial position:

Code:

3rkn1b/p1p4p/3p4/1Pp3p1/PP3p1p/5P2/3P4/1N1RKNBB w - - 0 1
2b1kqrn/5p2/3pp3/5P1p/3pp3/1P2P2P/2PPP3/1QB1KR2 w - - 0 1
1r2kn1b/2p1p3/P7/P3p1p1/1P4Pp/5P2/P3P1P1/BR2KR1Q w - - 0 1
qr1nkrnb/5p2/8/p4P1P/P1p1P2p/8/P1P1P2P/RNN1KRQ1 w - - 0 1
rqn1kn2/2p5/3p4/1P1p4/6P1/pP3P2/1PP3PP/1R1BK1N1 w - - 0 1
nr1rkq2/p7/1pp2p2/6p1/3PpP2/1PP5/2P1P3/BBNRKNQR w - - 0 1
b2bknqr/3p3p/p6p/7P/1p1p1p1P/P1P2P2/1P6/1B2KB2 w - - 0 1
1bbnk2q/4p2p/2p2pp1/p2pPP2/1P2P3/7P/8/QBRNK1N1 w - - 0 1
2b1kqr1/p2p3p/3p4/p2PpP2/PpP2p2/6P1/8/RRB1KQ1N w - - 0 1
2qrkn1r/3p1p2/8/2Pp1p1p/5P2/2PP4/1P2P1PP/1RR1K1N1 w - - 0 1
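
For anyone who wants to check such claims: the Hamming distance between two 64-bit Polyglot keys is just the popcount of their XOR, so "distance 0" simply means the keys are identical. A minimal sketch, assuming the two keys have already been computed (the constant is the start-position key listed in the Polyglot book format description):

Code:

#include <stdint.h>
#include <stdio.h>

/* Hamming distance between two 64-bit hash keys:
   the number of bits in which they differ, i.e. popcount of the XOR. */
static int hamming_distance(uint64_t a, uint64_t b)
{
    uint64_t x = a ^ b;
    int d = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        d++;
    }
    return d;
}

int main(void)
{
    uint64_t keyStart = 0x463B96181691FC9CULL; /* Polyglot key of the initial position */
    uint64_t keyOther = keyStart;              /* plug in the key of a candidate position */
    printf("Hamming distance = %d\n", hamming_distance(keyStart, keyOther));
    return 0;
}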
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: most similar hashes of two positions

Post by brtzsnr »

Yes. Peter won the 'crafted' category before the contest started. Congratulations, Peter! Very nice find.

The second category is still open. As a reminder, in the second category the positions must be reached by following a series of legal moves from the start position.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: most similar hashes of two positions

Post by syzygy »

brtzsnr wrote:The second category is still open. As a reminder in the second category the positions must be reached by following a series of legal moves from the start position.
What do you mean by that? Practically any reasonable-looking position is reachable by a series of legal moves from the initial position.

I did not look in detail at all positions listed by Peter, but they do seem to be reasonable. I'm sure most if not all of them are reachable by a series of legal moves from the initial position.

The initial position itself is reachable by the empty series of legal moves.
petero2
Posts: 685
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: most similar hashes of two positions

Post by petero2 »

hgm wrote:
petero2 wrote:That was actually me. Here are some links:
My humblest apologies! I never was good with names. :oops:
No problem. Here is a link to the source code in case someone wants to find more collisions:

https://dl.dropboxusercontent.com/u/896 ... ashcoll.7z
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: most similar hashes of two positions

Post by bob »

Zenmastur wrote:
mar wrote:
brtzsnr wrote:Based on the thread "Worst Advice" (http://www.talkchess.com/forum/viewtopic.php?t=57235) I propose the following challenge:

Find two distinct positions whose Polyglot keys (http://hgm.nubati.net/book_format.html) have the lowest Hamming distance (fewest differing bits).
Hamming distance is not a good measure of hash key quality (this has been shown to me by hgm some time ago).
Actually you can craft Zobrist hashes to maximize Hamming distance, but such keys will perform very, very badly.
About a year ago I went through the posts looking for hash collision info. I thought I did a reasonably good search, but I don't remember seeing HGM giving an exposé on the subject. Can you point out the discussion?

bob wrote:I am not sure what there is to decide. You can't represent all chess positions with just 64 bits. So far the best has been around 160 bits or so. Mapping 2^160 positions onto 2^64 keys clearly means there are literally gazillions of positions with a Hamming distance of zero from each other, something like 2^96 such positions per key, roughly...

Collisions (false matches with 64-bit signatures) absolutely happen. If an illegal move will crash your engine, you'd better check them or suffer the occasional crash; like death and taxes, there is no escaping them.
As I recall, a few years back you initiated a rather long discussion on this topic. You were trying to produce keys with the maximum possible Hamming distance, until someone (offline) demonstrated that this would produce horrible collision problems, i.e. XOR'ing any two keys would produce a key already in the set. Unfortunately that conversation degenerated and ended before determining what the optimal Hamming distance was. I remember thinking to myself that the optimal distance was (hash_key_size_in_bits - log2(number_of_Zobrist_keys)) / 4, e.g. for 784 Zobrist keys and a 64-bit hash it would be (64 - log2(784)) / 4 = 13.6. If this is true, then it would also seem impossible to create a set of 64-bit Zobrist keys that can guarantee that no collisions occur when two positions are separated by 4 or more plies, i.e. 13.6 / 2^4 < 1.

It does seem possible, at least in theory, to create a set of 64-bit Zobrist keys which could guarantee that no collisions occur between two positions separated by 3 plies or less, i.e. 13.6 / 2^3 > 1. Although I have no clue how to go about constructing such a set.

Do these sound like reasonable conclusions?
Yes. And I think it was HGM that led me to conclude the idea was bad. Burton Wendroff and Tony Warnock wrote a paper in the ICGA in the early 80's that addressed just this issue, and they discussed the problem in terms of "plies". They had a scheme that would let you protect N consecutive plies from collisions, but the downside was that after that the probability of collisions went up sharply, perhaps exponentially.


On a somewhat related subject...

You did a collision-rate test a few years back in which you showed that very high collision rates (on the order of 10^-6) had a relatively minor, if any, effect on engine move selection. I tried a few months ago to calculate what the collision rate would be for a given TT size and Zobrist key size, but I couldn't find any way to do it. A guess would be 1 / 2^(key_size_in_bits - log2(#_of_table_entries)). This would work if all table entries were IID. They're not, of course, and you can't have more collisions than table hits. Since the number of table hits in the middlegame runs between 15% and 20%, this would seem to put an upper limit on possible collision rates during this part of the game. So it's likely the only way to find out is to run some tests. The piece of information that would be nice to know is: with a 64-bit key, what number of table entries will cause a measurable decline in engine ELO due to excess collisions?
You CAN have more collisions than hits, because EVERY node in the tree results in a TT probe, and a collision would produce a false match. It's a two-step process: compute the address from the signature (a huge number of different signatures map to the same table address), then compare the signature to the one stored in the table. True collisions could drive the hit rate up, or not, since I (and probably others) only count a position as a hit if the signature matches AND the draft is useful; there are far more "hits" where the draft is too small. It is hard to figure out how to even measure hits in such an environment. If you end up using the entry, it is a hit in my terminology, which means collisions would certainly fall under "hits", since only the collisions that mean something have the ability to change anything.
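
(A minimal sketch of that two-step probe, with made-up field names, just to make the terminology concrete:)

Code:

#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t signature;   /* full 64-bit key of the stored position */
    int16_t  score;
    uint8_t  draft;       /* depth of the search that produced the entry */
    uint8_t  flags;       /* bound type etc. */
    uint16_t move;        /* hash move */
} TTEntry;

static TTEntry *table;    /* allocated and cleared elsewhere             */
static uint64_t tt_mask;  /* number_of_entries - 1 (power-of-two sizes)  */

/* Step 1: map the signature to a table address (many signatures share it).
   Step 2: compare the stored signature; only then is the entry "for" this
   position, and it still only helps if the draft is sufficient. */
static const TTEntry *tt_probe(uint64_t key, int needed_draft)
{
    TTEntry *e = &table[key & tt_mask];
    if (e->signature != key)
        return NULL;      /* same address, different position            */
    if (e->draft < needed_draft)
        return NULL;      /* signature match, but draft too small        */
    return e;             /* usable hit (or, very rarely, a collision)   */
}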

But the big problem today is speed. On the 20-core box I use a lot, 100M nodes per second is typical. That produces a billion nodes every 10 seconds, which allows PLENTY of probes to encounter a collision. Twenty years ago a 64-bit signature might never produce a collision; today it is not uncommon. I ran a test yesterday on a single position (Fine 70) and after 5-6 minutes Crafty reported a single "bad move from hash table" error, which signifies a collision. But that is a lower bound for that position, because there are almost certainly more collisions where the hash move just happened to be legal.

64 bits is a lot of "protection" until you begin to see searches where the total nodes can't be counted in a 32-bit counter. I see 15-25 billion node searches on ICC all the time...

time=3:23(99%) nodes=18300692792(18.3B) fh1=91% pred=11 nps=90.0M
time=3:06(99%) nodes=16568754647(16.6B) fh1=91% pred=11 nps=89.0M
time=4:57(99%) nodes=27050978878(27.1B) fh1=91% pred=14 nps=90.8M
time=3:46(99%) nodes=19049838535(19.0B) fh1=89% pred=11 nps=84.3M

Those were from the last ACCA tournament. Those kinds of numbers tend to drive up collisions dramatically compared to 20 years ago, when 60K nodes per second was very fast.

If it takes a collision rate > ~10^-6 to cause a noticeable change in engine ELO, and 20 excess bits in the key relative to the number of table entries produce an error rate of about 10^-6, then it would appear that table sizes as large as 16 trillion entries (2^44 entries) would be possible.

I know what a lot of you are thinking: when the number of table entries exceeds the square root of the number of possible Zobrist keys, collisions become almost guaranteed. While this may be true, it doesn't tell you how many collisions you will have, just that the probability of having at least one becomes very large. In a table of 2^44 entries you would have to have more than 2^24 probes that return invalid data due to a false positive per 2^44 probes to reach an error rate of ~10^-6.

In any case, I don't think this can be tested with a full 64-bit key because of the amount of TT memory needed and the amount of time required to generate the required number of cache probes, i.e. a machine probing at 100,000,000 nodes per second would need over 2.5 months to completely fill the table.

Thoughts?

Regards,

Zen
It is an issue. But based on the paper Cozzie and I wrote, I don't worry about collisions causing problems with game play. In my case, though, I have to be ultra-careful about illegal moves that can corrupt the board, since Crafty doesn't do copy/make. If I make a castling move when it is not legal (king or rook has moved), I am doomed: there is now at least one extra piece on the board, I might have lost an opponent's piece, and there is no way to recover, leaving the board corrupted, which will eventually lose the game. Or I create two kings, so that side can never be checkmated. :)

So I screen moves to the best of my ability (within computational time constraints) and don't give a second thought to the occasional incorrect evaluation.
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: most similar hashes of two positions

Post by hgm »

On the subject of hash keys:

A few years ago Daniel Shawul investigated keys that could be generated 'on the fly' by combining a piece-type part and a square part (rather than preparing a piece-square table of them, as we usually do). You would then only need nrOfPieceTypes * nrOfColors + nrOfBoardSquares keys, instead of nrOfPieceTypes * nrOfColors * nrOfBoardSquares. E.g. for chess 12 + 64 = 76 instead of 12*64 = 768, saving a factor 10 in table space (of these nasty 8-byte keys).

IIRC this worked quite well if you multiplied (i.e. pieceKey[coloredType]*squareKey[square]). For purely randomly chosen piece and square keys this was a little bit inferior, as the keys then get a bias towards being even (only 25% would be odd). But that can be repaired by choosing the original keys predominantly odd (e.g. ~(rand()*rand())).

Using pieceKey + squareKey, and the usual XOR for constructing the total key, surprisingly works only slightly worse.
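
In code the idea looks roughly like this (a sketch with made-up names, not Daniel's actual implementation):

Code:

#include <stdint.h>

/* On-the-fly Zobrist keys: 12 colored piece types + 64 squares = 76 random
   numbers instead of the usual 12*64 = 768 piece-square table. Fill both
   arrays with random 64-bit numbers at startup, predominantly odd if you
   use the multiplicative variant. */
static uint64_t pieceKey[12];
static uint64_t squareKey[64];

/* Multiplicative combination (the variant that worked best). */
static uint64_t keyMul(int coloredType, int square)
{
    return pieceKey[coloredType] * squareKey[square];
}

/* Additive combination (only slightly worse). */
static uint64_t keyAdd(int coloredType, int square)
{
    return pieceKey[coloredType] + squareKey[square];
}

/* Either way, the total position key is still maintained with XOR, e.g.
   hashKey ^= keyMul(coloredType, square) whenever a piece enters or
   leaves that square. */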

I have experimented myself with 'overlapping keys'. When each key reused 7 bytes of the previous one, shifted left 8 bits, plus 8 new random bits, there was not a significant increase in the number of collisions compared to completely independent random keys. This saved a factor 8 in cache load.
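
As a sketch (again with made-up names): the key table then degenerates into a byte array, and each key is an unaligned 8-byte window into it, so N keys need only N+7 random bytes:

Code:

#include <stdint.h>
#include <string.h>

#define NKEYS 768                     /* e.g. the usual piece-square table size */
static uint8_t keyBytes[NKEYS + 7];   /* fill with random bytes at startup      */

/* Key i is the 8-byte window starting at byte i; consecutive keys share
   7 bytes, which is what saves the factor 8 in table size and cache load. */
static uint64_t overlappingKey(int i)
{
    uint64_t k;
    memcpy(&k, &keyBytes[i], sizeof k);   /* unaligned load done portably */
    return k;
}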
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: most similar hashes of two positions

Post by bob »

hgm wrote:On the subject of hash keys:

A few years ago Daniel Shawul investigated keys that could be generated 'on the fly' by combining a piece-type part and a square part (rather than preparing a piece-square table of them, as we usually do). You would then only need nrOfPieceTypes * nrOfColors + nrOfBoardSquares keys, instead of nrOfPieceTypes * nrOfColors * nrOfBoardSquares. E.g. for chess 12 + 64 = 76 instead of 12*64 = 768, saving a factor 10 in table space (of these nasty 8-byte keys).

IIRC this worked quite well if you multiplied (i.e. pieceKey[coloredType]*squareKey[square]). For purely randomly chosen piece and square keys this was a little bit inferior, as the keys then get a bias towards being even (only 25% would be odd). But that can be repaired by choosing the original keys predominantly odd (e.g. ~(rand()*rand())).

Using pieceKey + squareKey, and the usual XOR for constructing the total key, surprisingly works only slightly worse.

I have experimented myself with 'overlapping keys'. When each key reused 7 bytes of the previous one, shifted left 8 bits, plus 8 new random bits, there was not a significant increase in the number of collisions compared to completely independent random keys. This saved a factor 8 in cache load.
The only issue I would see is alignment. I might give it a try to see what it does to speed.