Hash collision?

hgm · Post by **hgm** » Sat Jun 08, 2019 7:08 pm

It is not only that there are relatively few nodes close to the root, but also that these need comparatively high depth. So even if you are so extremely unlucky that you get a key collision in a node close to the root, the entry you retrieve instead will have virtually zero chance to have a usable depth for that node. So basically the rareness of nodes close to the root has to be squared.

If the collisions are completely random and independent, they will hardly ever propagate to a level two ply closer to the root. What will usually happen is that the intervening cut-node, which sees its cut-move spoiled, will simply pick another cut move. A true disadvantage will usually be inherited by all grand children, so for those there isn't such an easy way out.

As to Martin's question: the flaw is that in every other hash slot you will have exactly the same problem. So it doesn't help you that I-don't-know-how-many bits in the key first direct you to a bucket where only positions that have the same value for those bits go. Wherever you end up, there will be something there, and it will be a 1/nrOfSignatures probability that it happens to have the same signature as the sought position by pure chance. Only when all other buckets were empty mapping to another bucket would guarantee that you get no collision there.

chrisw · Post by **chrisw** » Sat Jun 08, 2019 8:04 pm

hgm wrote: ↑Sat Jun 08, 2019 7:08 pm It is not only that there are relatively few nodes close to the root, but also that these need comparatively high depth. So even if you are so extremely unlucky that you get a key collision in a node close to the root, the entry you retrieve instead will have virtually zero chance to have a usable depth for that node. So basically the rareness of nodes close to the root has to be squared.

If the collisions are completely random and independent, they will hardly ever propagate to a level two ply closer to the root. What will usually happen is that the intervening cut-node, which sees its cut-move spoiled, will simply pick another cut move. A true disadvantage will usually be inherited by all grand children, so for those there isn't such an easy way out.

well, speaking as someone whose program(s) always contained hard to imagine bugs, it seems that any evaluation bug (for example, getting mate at a glance wrong in some rare situation, like en passent involved) will find its way back to the root and cause a loss, when evaluation thought it had a win.
Isn’t there an adage to the effect that “search will find and amplify your evaluation bugs”?

As to Martin's question: the flaw is that in every other hash slot you will have exactly the same problem. So it doesn't help you that I-don't-know-how-many bits in the key first direct you to a bucket where only positions that have the same value for those bits go. Wherever you end up, there will be something there, and it will be a 1/nrOfSignatures probability that it happens to have the same signature as the sought position by pure chance. Only when all other buckets were empty mapping to another bucket would guarantee that you get no collision there.

mar · Post by **mar** » Sat Jun 08, 2019 9:22 pm

hgm wrote: ↑Sat Jun 08, 2019 7:08 pm As to Martin's question: the flaw is that in every other hash slot you will have exactly the same problem. So it doesn't help you that I-don't-know-how-many bits in the key first direct you to a bucket where only positions that have the same value for those bits go. Wherever you end up, there will be something there, and it will be a 1/nrOfSignatures probability that it happens to have the same signature as the sought position by pure chance. Only when all other buckets were empty mapping to another bucket would guarantee that you get no collision there.

Thanks for the explanation.

So the whole "effective hash size" is nonsense and everything boils down to how many bits of hash you store in the TT, inversely proportional to the number of slots in the bucket.

hgm · Post by **hgm** » Sat Jun 08, 2019 11:26 pm

chrisw wrote: ↑Sat Jun 08, 2019 8:04 pmwell, speaking as someone whose program(s) always contained hard to imagine bugs, it seems that any evaluation bug (for example, getting mate at a glance wrong in some rare situation, like en passent involved) will find its way back to the root and cause a loss, when evaluation thought it had a win.
Isn’t there an adage to the effect that “search will find and amplify your evaluation bugs”?

Wouldn't that simply be because an evaluation bug would show up consistently in many leaves, so that there is no escaping from it already at the root?

chrisw · Post by **chrisw** » Sun Jun 09, 2019 1:29 pm

hgm wrote: ↑Sat Jun 08, 2019 11:26 pm
chrisw wrote: ↑Sat Jun 08, 2019 8:04 pmwell, speaking as someone whose program(s) always contained hard to imagine bugs, it seems that any evaluation bug (for example, getting mate at a glance wrong in some rare situation, like en passent involved) will find its way back to the root and cause a loss, when evaluation thought it had a win.
Isn’t there an adage to the effect that “search will find and amplify your evaluation bugs”?
Wouldn't that simply be because an evaluation bug would show up consistently in many leaves, so that there is no escaping from it already at the root?

Yes, that’s one possibility. The other is that it’s a bug of type only appearing at the end of a “forcing” sequence, for example, an incorrect mate at a glance evaluation. In which case its passage back to the root is more guaranteed.
I guess we can make the case that a game terminating evaluation pulled out out of a collision at the end of a forcing sequence would have similar effect, but we are piling chance upon chance upon chance here. Collision. Mate score found. Prior moves forcing. The first may be 1 in 20000 as some suggest, the second, a mate bound, 1 in a 100 at best, the third, maybe 1 in a 100, guessing. But that would be another 1 in 10000.
Indicating a serendipitously game altering hash fail in 1 in 200,000,000 nodes. Which clearly is not the case.
Looked at from the other angle, Stockfish must have played 10,000 observed tournament games by now, with nobody, AFAIK, reporting a total fail. At ten seconds a move thats 10*100*10000 positions with 10^7 nps, or 10^14 nodes and potential for serendipitous hash failure.

hgm · Post by **hgm** » Sun Jun 09, 2019 1:58 pm

Indeed, score mutations would only propagate as far towards the root as the cut-moves are singular (by definition, I would say). So we live by the grace of the fact that in chess long series of singular moves are quite rare, and although they might occur in the tree, there will almost never be one extending all the way from root to leaf. Game-terminating scores would indeed enhance this probability, as a leaf where that happens can be close to the root even in a deep search. Non-game-terminating errors would be corrected in the next iteration, where the search goes beyond them. This would be worse if the hash probing would consider a mate score valid to any depth that should nominally be sufficient to see it (e.g. use a mate-in-3 score for a d=20 probe, even when it was obtained in a d=5 search), as then collision close to the root with a mate score from a shallow search would not be rejected anymore.

chrisw · Post by **chrisw** » Sun Jun 09, 2019 2:13 pm

hgm wrote: ↑Sun Jun 09, 2019 1:58 pm Indeed, score mutations would only propagate as far towards the root as the cut-moves are singular (by definition, I would say). So we live by the grace of the fact that in chess long series of singular moves are quite rare, and although they might occur in the tree, there will almost never be one extending all the way from root to leaf. Game-terminating scores would indeed enhance this probability, as a leaf where that happens can be close to the root even in a deep search. Non-game-terminating errors would be corrected in the next iteration, where the search goes beyond them. This would be worse if the hash probing would consider a mate score valid to any depth that should nominally be sufficient to see it (e.g. use a mate-in-3 score for a d=20 probe, even when it was obtained in a d=5 search), as then collision close to the root with a mate score from a shallow search would not be rejected anymore.

We may be living by the grace of the fact that such a score mutation or serendipitous hash collision won't be repeatable. Rare random bugs dependent of the phase of the moon never get found. So, they may exist, but they can't be proven.

konsolas · Post by **konsolas** » Sun Jun 09, 2019 8:11 pm

Although the likelihood of an incorrect cutoff affecting the search in a meaningful way seems small, would it be worth verifying other aspects of the hash entry to reduce collisions? For example, checking that the stored move is pseudo-legal.

chrisw · Post by **chrisw** » Sun Jun 09, 2019 10:29 pm

konsolas wrote: ↑Sun Jun 09, 2019 8:11 pm Although the likelihood of an incorrect cutoff affecting the search in a meaningful way seems small, would it be worth verifying other aspects of the hash entry to reduce collisions? For example, checking that the stored move is pseudo-legal.

In the cases where the hash entry wasn’t treated as terminal and contained a continuation move, then it’s unlikely any programmer didn’t have validation checks on the move. Feeding unchecked bad move data into some random branch in the tree is liable to cause bad crashes. For a terminal hash entry, you probably wouldn’t want to incur the time penalty of verification, even though that would reduce the risk a factor of ten or whatever. Better just add another 4 bits to the key instead.

syzygy · Post by **syzygy** » Mon Jun 10, 2019 1:08 am

Joost Buijs wrote: ↑Sat Jun 08, 2019 4:07 pm
syzygy wrote: ↑Sat Jun 08, 2019 2:51 pm
In Stockfish, the ratio of false hits is one in every 2^16/3 = 21845 probes, so one in every 16384 is unlikely to do a lot of damage.
Maybe Stockfish doesn't probe on every node (position) it generates, I don't know. But if it does, and you run on a fairly modern PC it can have like 1000 false hits a second, personally I can't imagine that this doesn't hurt. But I assume it has all been tested safe and sound.

I don't like the SF approach (3 positions per 32-byte bucket, 16 bits for the key) myself, but it is difficult to argue against it as long as there are no test results showing that it hurts at very long time controls (compared with 4 positions per 64-byte bucket, 32 bits for the key).

Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?

Re: Hash collision?