TT Key Collisions, Workarounds?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: TT Key Collisions, Workarounds?

Post by hgm »

Well, I don't know which articles exactly you were quoting, but the way you describe their conclusion, it does not sound like the 50% different move choice has anything to do with hash collisions whatsoever. The differences come simply from whether you take TT cutoffs or not...
tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 4:57 pm
Location: Germany

Re: TT Key Collisions, Workarounds?

Post by tpetzke »

Unfortunately, 64 bits are not enough to uniquely identify every possible chess position; a perfect hash function would require far more bits, 143 or so.

≈ 64! / (32! * 8! * 8! * (2!)^6) ≈ 10^43 ≈ 2^143

This count includes illegal positions and excludes positions that arise after captures and promotions.

This rough calculation means you need about 143 bits to encode every position uniquely. The true number is probably somewhat higher; I have heard figures of around 155 bits.
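
You can reproduce that back-of-the-envelope number with a few lines of code, e.g. (just a sketch in C++ using lgamma for the factorials; it lands at roughly 142 bits, in line with the estimate above):

Code: Select all

#include <cmath>
#include <cstdio>

// ln(n!) via the log-gamma function
static double log_factorial(int n) { return std::lgamma(n + 1.0); }

int main() {
    // 64! / (32! * 8! * 8! * (2!)^6): 32 empty squares, 8 pawns per side,
    // and 6 indistinguishable pairs (N, B, R for each colour)
    double ln_count = log_factorial(64) - log_factorial(32)
                    - 2.0 * log_factorial(8) - 6.0 * std::log(2.0);
    std::printf("~10^%.1f positions, ~%.1f bits\n",
                ln_count / std::log(10.0), ln_count / std::log(2.0));
    return 0;
}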

If you see a lot of key collisions, try seeding with a different number, or your random number generator may be weak and should be replaced. I use a Mersenne Twister implementation, but that is probably overkill.
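
For what it's worth, filling the Zobrist tables from a 64-bit Mersenne Twister is only a few lines (a sketch; the table layout and the fixed seed are just illustrative):

Code: Select all

#include <cstdint>
#include <random>

// 12 piece types (6 per colour) on 64 squares, plus castling rights,
// en-passant file and side-to-move keys
struct ZobristKeys {
    std::uint64_t piece[12][64];
    std::uint64_t castling[16];
    std::uint64_t epFile[8];
    std::uint64_t sideToMove;
};

ZobristKeys initZobrist(std::uint64_t seed = 0x9E3779B97F4A7C15ULL) {
    std::mt19937_64 rng(seed);              // 64-bit Mersenne Twister
    ZobristKeys z;
    for (auto &pieceTable : z.piece)
        for (auto &key : pieceTable) key = rng();
    for (auto &key : z.castling) key = rng();
    for (auto &key : z.epFile)   key = rng();
    z.sideToMove = rng();
    return z;
}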

Thomas...
Clemens Pruell

Re: TT Key Collisions, Workarounds?

Post by Clemens Pruell »

http://www.cis.uab.edu/hyatt/collisions.html

Very interesting article. Seeing the tables with the SS/SG/MS entries made me think that type-1 collisions are the rule rather than the exception.
Reading this paper was a major surprise to me, because I didn't expect type-1 collisions to happen so frequently. Hence my current goal is to find some way of hashing chess positions perfectly (which won't work with Zobrist keys), because there -must- exist a perfect hashing function that eliminates every chance of type-1 collisions completely.
Wouldn't it be great to use a TT with a perfect hashing function in a chess program and not have to worry about inconsistent and unreliable playing strength due to type-1 TT collisions, whether they are as unlikely as once in 30,000 years or as frequent as that article suggests?
Anyway, I'll keep looking for a way of hashing chess positions perfectly (to 64-bit keys); I'm sure there -must- exist such a function.
Thanks for the tips.
Best wishes
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: TT Key Collisions, Workarounds?

Post by Desperado »

Sorry Clemens, that sounds confusing. Maybe you can give an example.
Clemens Pruell wrote:
Desperado wrote: Just a little something, because you wrote "the score of the hash move". The score belongs to the position, but not necessarily to a move. Think of a fail-high situation where you store a score and a move. Later, at the same position, you fail low and keep the move. Now there is a move and a score, but they do not necessarily correspond. (As I said, just a little something.)

Michael
Well, I've treated the score as a property of the hash move; my reasoning is:

If this move is played, and I follow this variation ...
OK, so you follow the "tMove"...

Example: let us say we are at ply 3, depth 6, and the hash entry has transDepth 2.
...
until I'm at the same depth as the hash move,
...
Case 1:
- We can follow further transmoves until we are at ply 3 + 2 = 5.
- Where is your leaf node, and which entry are you following (ply 3, ply 4)?
Otherwise:
- Which variation do you follow?
- We are here (somewhere around ply 5):
* Do you define this node as a leaf because of the hash entry at ply 3 + 2 = 5?
* And it just happens to have the same score as the hash entry at ply 3?
...
I'll reach a leaf node with the same score that was stored together with the hash move. ...
What happens now?

conclusion:
===========

Of course there are many more questions I could ask,
but maybe I _totally_ missed the point.

Maybe you do what everyone is doing:
first exploring the trans-move, integrated into a normal search.
Otherwise it would be nice if you could rephrase what you meant,
or give an example.

Thx, Michael
tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 4:57 pm
Location: Germany

Re: TT Key Collisions, Workarounds?

Post by tpetzke »

I'll keep looking for a way of hashing chess positions perfectly (to 64-bit keys); I'm sure there -must- exist such a function.
Stop looking! It's like pouring a full barrel of water into a cocktail glass and hoping nothing spills over. 64 bits are simply not enough to encode 2^155 different boards.

Thomas...
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: TT Key Collisions, Workarounds?

Post by Sven »

Clemens Pruell wrote:http://www.cis.uab.edu/hyatt/collisions.html

Very interesting article. Seeing the tables with the SS/SG/MS entries made me think that type-1 collisions are the rule rather than the exception.
Reading this paper was a major surprise to me, because I didn't expect type-1 collisions to happen so frequently. Hence my current goal is to find some way of hashing chess positions perfectly (which won't work with Zobrist keys), because there -must- exist a perfect hashing function that eliminates every chance of type-1 collisions completely.
Wouldn't it be great to use a TT with a perfect hashing function in a chess program and not have to worry about inconsistent and unreliable playing strength due to type-1 TT collisions, whether they are as unlikely as once in 30,000 years or as frequent as that article suggests?
Anyway, I'll keep looking for a way of hashing chess positions perfectly (to 64-bit keys); I'm sure there -must- exist such a function.
Thanks for the tips.
Best wishes
If you read the paper you referred to above again, especially the final section "Conclusions", you will realize that there is absolutely no reason to be afraid of these collisions. Take this paragraph, for instance:
Hyatt and Cozzie in 'The effect of hash collisions in a Computer Chess program' wrote:The numbers are surprising, but they clearly show that errors do not influence the search nearly as much as has always been suspected. Even with programs searching at one million nodes per second and beyond, the errors do not wreck the search and cause substantial problems. For example, in the tactical positions, there are just a few key positions that influence the final result. In a tree search space of 120 million nodes, a signature collision on one of such a small set of critical nodes is very unlikely. One point this certainly underlines is that past claims that programs search a large number of useless chess positions is clearly right on target. One can almost change scores at will and still have little adverse effect on the quality of the moves produced.
Sven
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: TT Key Collisions, Workarounds?

Post by wgarvin »

Clemens Pruell wrote:Yes, of course I store the full Zobrist key in order to detect type-B (index) collisions. It's only the type-A collisions which seem to happen more often than one would think.
Type A collisions should be extremely rare. (More than one in a game should be astronomically unlikely). Have you considered whether your Zobrist numbers are random enough?

If you are just using the C library rand() or something similar, you should replace it with some custom PRNG code. Use the Mersenne Twister or something like the RKISS generator that Stockfish uses.
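
If you don't want to pull in a full Mersenne Twister, even a tiny generator like the one below is a huge step up from rand() (this is xorshift64*, not RKISS itself, just a comparably small sketch):

Code: Select all

#include <cstdint>

// xorshift64* -- a tiny PRNG with full 64-bit output, unlike the
// typical 15- or 31-bit rand() of the C library
struct Xorshift64Star {
    std::uint64_t state;
    explicit Xorshift64Star(std::uint64_t seed) : state(seed ? seed : 1) {}
    std::uint64_t next() {
        state ^= state >> 12;
        state ^= state << 25;
        state ^= state >> 27;
        return state * 0x2545F4914F6CDD1DULL;
    }
};

Seed it once at startup and draw all Zobrist numbers from the same instance.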
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: TT Key Collisions, Workarounds?

Post by Don »

Clemens Pruell wrote: Anyway, I'll keep looking for a way of hashing chess positions perfectly (to 64-bit keys); I'm sure there -must- exist such a function.
Thanks for the tips.
Best wishes
This is an impossible task. Imagine trying to assign a 3-digit number to every man, woman and child in the world! There are several billion humans and only 1000 three-digit numbers, so many people would have to share the same number.

There is something called perfect hashing, which means a hash function that produces NO collisions. It's not practical for computer chess, but even if it were, it would require a lot more than 64 bits.

It would be possible to design a function that produces unique keys for any chess position, but this would not technically be "hashing" and would not be practical. One example is the FEN representation, which generates a unique value for each position - but it is not practical as a hash key, since it requires many hundreds of bits.

You can design a fairly compact way to represent a chess position and try to treat it as a key, but you would have trouble getting a good key distribution. FEN can be compressed a lot, for example by eliminating the '/' between ranks and allowing two-digit numbers to represent long runs of empty squares. The opening position might look like this:

rnbqkbnrpppppppp32PPPPPPPPRNBQKBNRw

You can do much better by abandoning the ASCII representation and using your own condensed alphabet. You don't need 8 bits per token when there are far fewer than 256 distinct tokens to represent.

With a lot of work you can probably get amazing compression, but the result HAS to take more than 64 bits in total, since there are more positions than could ever fit in 64 bits, no matter how clever you are with the representation.
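
Just to illustrate the point, even a crude exact encoding at 4 bits per square (12 piece codes plus "empty" fit easily into 16 values) already needs 256 bits before castling rights, en passant and side to move are added (a sketch; the square encoding is made up):

Code: Select all

#include <array>
#include <cstdint>

// Pack a board given as 64 square codes (0 = empty, 1..12 = pieces)
// into 4 bits per square = 256 bits, i.e. four 64-bit words.
std::array<std::uint64_t, 4> packBoard(const std::uint8_t squares[64]) {
    std::array<std::uint64_t, 4> key{};
    for (int sq = 0; sq < 64; ++sq)
        key[sq / 16] |= std::uint64_t(squares[sq] & 0xF) << (4 * (sq % 16));
    return key;
}

That is exact (no collisions are possible), but it is already four times the size of a 64-bit signature and makes a poor table index.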
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: TT Key Collisions, Workarounds?

Post by jwes »

Clemens Pruell wrote:
hgm wrote:you can expect one collision per 30,000 years.
Thanks a lot, now that seems to be a really low chance of key collisions, so I'll use the TT for its primary purpose from now on :)
Big thanks.
That is a very optimistic estimate. The actual value depends on the hash table size: if you have 256M hash table entries and you probe 4M times a second, you can expect a key collision every 16K seconds, or about every 4 1/2 hours.
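
The arithmetic behind that figure is simply (stored positions) * (probes per second) / 2^64; a quick sketch to reproduce it:

Code: Select all

#include <cmath>
#include <cstdio>

int main() {
    // Assumes the table is full of distinct positions and treats every
    // probe as an independent chance of a false 64-bit key match
    double entries = std::ldexp(1.0, 28);   // 256M stored positions
    double probes  = std::ldexp(1.0, 22);   // ~4M probes per second
    double perSec  = entries * probes / std::ldexp(1.0, 64);
    std::printf("one collision every %.0f s (%.1f hours)\n",
                1.0 / perSec, 1.0 / perSec / 3600.0);
    return 0;
}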
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: TT Key Collisions, Workarounds?

Post by bob »

Clemens Pruell wrote:Until now I've used the TT for move ordering purposes only (e.g. retrieving the PV move from a previous iteration). Having read a very instructive paper by Prof. Hyatt about TT key collisions, in which a test program would choose a different root move 50% of the time when the TT is used for 'immediate' cutoffs (transpositions in the sense that the 'same' position 'most likely' occurred before and was searched to the same depth, so instead of searching it again the stored score is returned), I'm somewhat afraid that using the TT in this way is an 'unsound' kind of forward pruning, because it seems to happen very often that two totally different positions get the same Zobrist key. In extreme cases this could cause the program to blunder horribly.
However, in order to improve my program's overall search depth I'd still like to use the TT for its primary purpose, but I don't know of a 'safe' workaround for the problem of possible key collisions. I've been thinking of using a second TT with completely different 64-bit Zobrist keys; perhaps this could lower the chance of key collisions (it must be incredibly unlikely that two totally different positions get the same two 64-bit Zobrist keys). However, that certainly means a significant overhead from constantly updating two different Zobrist keys, and of course the required memory would be doubled, too.
So my question: do you think it's worth the effort to use two Zobrist keys and twice as much memory to reduce the chance (and the potentially very negative consequences) of TT key collisions? Have you tried something similar?
Other than that, as an additional layer of safety, maybe it's sufficient to verify that the stored hash move is at least legal in the current node. But when analyzing positions that descend from the same root node, I suppose it's very likely that the same move is legal in many different positions from that root.
So it could again be mere luck that not only the Zobrist key is identical but the hash move also happens to be legal in both positions, while the position in the TT is still very different from the current node despite having the same Zobrist key.
Thanks for any good tips on how to reduce the negative effects of key collisions. I'm also wondering how much of a speedup I could expect if my program used the TT not only for move ordering, but also for (apparent) transposition cutoffs. What is a typical number for your own programs? 10%? 90%?
Best wishes
First, if you use a 64-bit Zobrist key, and your random numbers are really random and are true 64-bit values, you should _rarely_ see two different positions that produce the same Zobrist signature.

Note that the signature is a 64-bit value, while for addressing the table you use a subset of those bits. You should store the non-address bits in the table for a key match test: you already know the address bits match, since you are probing that address, but you must verify that the other N bits of the signature match what is stored in the table as well.
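
A minimal sketch of such a probe (the entry layout and sizes are only illustrative; storing just the top 32 bits instead of all the non-address bits is a common space saving that leaves a tiny residual risk):

Code: Select all

#include <cstddef>
#include <cstdint>

// Illustrative entry: the low bits of the key address the table, the
// high 32 bits are kept for the match test described above.
struct TTEntry {
    std::uint32_t key32;    // upper 32 bits of the Zobrist signature
    std::int16_t  score;
    std::uint16_t move;
    std::uint8_t  depth;
    std::uint8_t  bound;    // exact / lower / upper
};

constexpr std::size_t TT_SIZE = std::size_t(1) << 20;   // entries, power of two
static TTEntry table[TT_SIZE];

TTEntry* probe(std::uint64_t key) {
    TTEntry &e = table[key & (TT_SIZE - 1)];  // address bits match by construction
    // The remaining signature bits must match too, otherwise the entry
    // belongs to a different position and is rejected.
    return e.key32 == std::uint32_t(key >> 32) ? &e : nullptr;
}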

If you are doing that, and you are _really_ seeing lots of positions that produce true hash key collisions (two distinctly different positions producing the same Zobrist signature), then you need to do some debugging, because something is wrong. You might see that once every 10M or 100M nodes in a bad implementation, much less often otherwise. I once ran such a test to see what the real probability is, and posted the results on rec.games.chess.computers 15+ years ago. One false match in a billion is high; if you are seeing a lot, something is wrong and has to be corrected.

The "birthday paradox" will certainly cause an occasional collision. I used to test the hash move for legality and print an error message if it failed. This was a way of determining if I had such a signature collision, although it was not highly accurate. I would typically see 2-3 errors per GAME played on ICC. If you figure 60 minutes for a long game, 3600 seconds, at 1M nodes per second back then, 2-3 errors per 4 billion nodes. And it was probably just a bit higher. The larger the hash, the greater the probability because you can't get false matches on things that are overwritten frequently... I'll try to find the real numbers somewhere and post them...

But if you look at the hashing paper in the JICGA that I wrote with Cozzie's help, one hash collision every 1M nodes won't make any measurable difference to the program. Even one error every 1000 nodes doesn't make a big difference...

But certainly you don't want that many...