Clemens Pruell wrote: Until now I've used the TT for move ordering purposes only (e.g. to retrieve the PV move from a previous iteration). Having read a very instructive paper by Prof. Hyatt about TT key collisions, in which a test program chose a different root move 50% of the time when the TT was used for 'immediate' cutoffs (transpositions in the sense that the 'same' position 'most likely' occurred before and was searched to the same depth, so instead of searching it again the stored score is returned), I'm somewhat afraid that using the TT in this way is an 'unsound' form of forward pruning, because it seems to happen very often that two totally different positions get the same Zobrist key. In extreme cases this could cause the program to blunder horribly.
However, in order to improve my program's overall search depth I'd still like to use the TT for its primary purpose, but I don't know of a 'safe' workaround for the problem of possible key collisions. I've been thinking of using a second TT with completely different 64-bit Zobrist keys; perhaps this could lower the chance of key collisions (it must be incredibly unlikely that two totally different positions get the same two 64-bit Zobrist keys). However, this certainly means a significant overhead from constantly updating two different Zobrist keys, and of course the required memory would be doubled, too.
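For what it's worth, the incremental cost of a second key is small: each make/unmake just does the same XOR updates against a second table of randoms. A minimal sketch, assuming piece/square Zobrist tables (all names here are hypothetical, and a simple xorshift generator stands in for a proper random source):

```c
#include <stdint.h>

/* Two independent Zobrist table sets (12 piece types x 64 squares);
   a real engine fills these with high-quality 64-bit randoms at
   startup.  A simple xorshift generator stands in here. */
static uint64_t zobrist1[12][64];
static uint64_t zobrist2[12][64];

static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;

static uint64_t rand64(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

static void init_zobrist(void)
{
    for (int p = 0; p < 12; p++)
        for (int sq = 0; sq < 64; sq++) {
            zobrist1[p][sq] = rand64();
            zobrist2[p][sq] = rand64();
        }
}

typedef struct {
    uint64_t key1;  /* primary Zobrist signature */
    uint64_t key2;  /* secondary signature for extra verification */
} Keys;

/* Moving a piece costs only two extra XORs for the second key:
   remove the piece from 'from', place it on 'to', in both key sets. */
static void move_piece(Keys *k, int piece, int from, int to)
{
    k->key1 ^= zobrist1[piece][from] ^ zobrist1[piece][to];
    k->key2 ^= zobrist2[piece][from] ^ zobrist2[piece][to];
}
```

Since XOR is its own inverse, unmaking a move is the same call with the squares swapped, so both keys stay incrementally correct.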
So my question: do you think it's worth the effort to use two Zobrist keys plus twice as much memory to reduce the chance (and the potentially very negative consequences) of TT key collisions? Have you tried something similar?
Other than that, as an additional layer of safety, maybe it's sufficient to verify that the stored hash move is at least legal in the current node. But when analyzing positions that descend from the same root, I suppose it's also quite likely that the same move is legal in many different positions, so it could be mere bad luck that not only the Zobrist key is identical but the hash move is also legal in both positions, while the position stored in the TT is nevertheless very different from the current node.
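That legality check can at least be made cheap. A minimal sanity-check sketch, on a hypothetical board representation (0 = empty, positive = white piece, negative = black piece, `side` = +1 or -1; all names illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical minimal board: 64 squares, 0 = empty, positive values
   for white pieces, negative for black.  side = +1 (white) or -1. */
typedef struct {
    int squares[64];
    int side;
} Board;

/* A cheap sanity check on a hash move before trusting it: the
   from-square must hold one of our own pieces and the to-square must
   not.  This catches many, but not all, false matches; a full
   legality test would also verify the piece can actually reach 'to'
   and that our king is not left in check. */
static bool hash_move_plausible(const Board *b, int from, int to)
{
    if (from < 0 || from > 63 || to < 0 || to > 63 || from == to)
        return false;
    if (b->squares[from] * b->side <= 0)   /* not our piece on 'from' */
        return false;
    if (b->squares[to] * b->side > 0)      /* own piece on 'to' */
        return false;
    return true;
}
```

As the post notes, a plausible-looking hash move is no proof the stored position is the same one, but a check like this does stop the worst blunders (playing a move whose piece isn't even on the board).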
Thanks for any good tips on how to reduce the negative effects of key collisions. I'm also wondering how much of a speedup I could expect if my program used the TT not only for move ordering, but also for (apparent) transposition cutoffs. What is a typical number for your own programs? 10%? 90%?
best wishes
First, if you use a 64-bit Zobrist key, and your random numbers are really random and are true 64-bit values, you should _rarely_ see two different positions that produce the same Zobrist signature.
Note that the signature is a 64-bit value, while for addressing the table you use only a subset of those bits. But you should store the non-address bits in the table for a key-match test. You already know the address bits match, since you are probing that address; you must verify that the other N bits of the signature match what is stored in the table as well.
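A minimal sketch of that store/probe logic (the table size and field names are just illustrative; storing only the non-address bits instead of the full key works the same way):

```c
#include <stdint.h>
#include <stddef.h>

#define TT_BITS 20                       /* 2^20 entries, illustrative */
#define TT_SIZE (1u << TT_BITS)

typedef struct {
    uint64_t signature;  /* full 64-bit key kept for verification;
                            keeping just the non-address bits also works */
    int16_t  score;
    uint8_t  depth;
    uint8_t  flags;
} TTEntry;

static TTEntry table[TT_SIZE];

static void tt_store(uint64_t key, int16_t score, uint8_t depth,
                     uint8_t flags)
{
    TTEntry *e = &table[key & (TT_SIZE - 1)];  /* low bits address the table */
    e->signature = key;
    e->score = score;
    e->depth = depth;
    e->flags = flags;
}

/* Return the entry only if the stored signature matches the full key,
   i.e. the non-address bits agree too; otherwise treat it as a miss. */
static const TTEntry *tt_probe(uint64_t key)
{
    const TTEntry *e = &table[key & (TT_SIZE - 1)];
    return (e->signature == key) ? e : NULL;
}
```

Two keys that share the low 20 bits land on the same entry, but the signature comparison rejects the one that wasn't stored.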
If you are doing that, and you are _really_ seeing lots of true hash key collisions (two distinctly different positions producing the same Zobrist signature), then you need to do some debugging, because something is wrong. Even a bad implementation might see that only once every 10M or 100M nodes, and a good one far less. I once ran such a test to see what the real probability is, and posted the results on rec.games.chess.computers 15+ years ago. One false match in a billion nodes is on the high side. If you are seeing a lot, something is wrong and has to be corrected.
The "birthday paradox" will certainly cause an occasional collision. I used to test the hash move for legality and print an error message if it failed. This was a way of detecting signature collisions, although not a highly accurate one. I would typically see 2-3 such errors per GAME played on ICC. If you figure 60 minutes for a long game, that's 3600 seconds, at about 1M nodes per second back then: 2-3 errors per roughly 4 billion nodes. The true rate was probably a bit higher. The larger the hash table, the greater the probability, because entries that would otherwise be overwritten quickly survive long enough to produce false matches... I'll try to find the real numbers somewhere and post them...
But: if you look at the hashing paper in the JICGA that I wrote with Cozzie's help, one hash collision every 1M nodes won't make any measurable difference in the program. Even one error every 1000 nodes doesn't make a big difference...
But certainly you don't want that many...