Collision Detection in Transposition Table

cms271828 · Post by **cms271828** » Sun Apr 25, 2010 8:06 pm

Hi

We know 64 bit zobrist key is considered large enough to avoid collisions, and 32 apparently is not.

I'm considering using a 54 bit zobrist key so I can make my TT 2^22 elements (instead of 2^21), each one being one java long(64 bits) (instead of 2 java longs).

I believe I can get it down to 54 bits, but my question is about collisions.

I can't remember how its worked out, I know how the 'birthday paradox' works, but not sure how to relate it, cause it also depends on how many nodes per second you can search, say 3,000,000 for my engine on my pc.

So if a collision came about once a month or year, then I guess thats no problem.

Can anyone help, thanks

jwes · Post by **jwes** » Sun Apr 25, 2010 9:28 pm

cms271828 wrote:Hi

We know 64 bit zobrist key is considered large enough to avoid collisions, and 32 apparently is not.

I'm considering using a 54 bit zobrist key so I can make my TT 2^22 elements (instead of 2^21), each one being one java long(64 bits) (instead of 2 java longs).

I believe I can get it down to 54 bits, but my question is about collisions.

I can't remember how its worked out, I know how the 'birthday paradox' works, but not sure how to relate it, cause it also depends on how many nodes per second you can search, say 3,000,000 for my engine on my pc.

So if a collision came about once a month or year, then I guess thats no problem.

Can anyone help, thanks

Once the hash table is full, the probability of a position not seen before giving a false positive is 1/(2^bits in stored part of key). In your case, that 1/2^32. Assuming 3,000,000 nodes per second gives about 2,300,000 new positions per second (a WAG), that would be about 2 collisions per hour.

cms271828 · Post by **cms271828** » Sun Apr 25, 2010 10:20 pm

Thanks, I'm starting to confuse myself now.

Getting mixed up between the probability 2 different positions will aquire the same 54 bit zobrist key, and...

The probability 2 different positions will map to the same entry in the table(With 2^22 entries, that entries position would be number key & (2^22 -1))

I figured the 2nd one doesn't matter(as long as the 32bits stored are distinct), cause you can just record/probe the next few entries.

But the 1st one is a definate problem, as it may try and play a move stored for a different position.

I'm not sure if the calculation in the last reply accounts for this. I was sure it was analagous to the birthday problem.

Any thoughts, thanks

hgm · Post by **hgm** » Sun Apr 25, 2010 10:50 pm

Note that Bob showed that making one error every 1000 probes (returning a wrong entry) has virtually no effect on move choice, and only little on the score. That would mean a 10 bit hash lock is already sufficient.

I am not sure the result is valid, though: as far as I understood it, it assigned the errors randomly based on the probe number, rather than the probe key. That would mean another path to the same position might get the good result on the next probe, which might allow the search to work around errors. With a key collision you would consistently get the same position wrong.

jwes · Post by **jwes** » Mon Apr 26, 2010 4:29 am

cms271828 wrote:Thanks, I'm starting to confuse myself now.

Getting mixed up between the probability 2 different positions will aquire the same 54 bit zobrist key, and...

The probability 2 different positions will map to the same entry in the table(With 2^22 entries, that entries position would be number key & (2^22 -1))

I figured the 2nd one doesn't matter(as long as the 32bits stored are distinct), cause you can just record/probe the next few entries.

But the 1st one is a definate problem, as it may try and play a move stored for a different position.

I'm not sure if the calculation in the last reply accounts for this. I was sure it was analagous to the birthday problem.

Any thoughts, thanks

The calculations I gave are for all 56 bits matching. I think a more exact estimate of the probability of a false positive for a hash probe is p=(probability that a probe finds an occupied entry)*(probability that a probe does not return a true match)*(number of entries that have the same address part of the hash key)/2^(the number of bits of the hash key stored in each entry (assuming the stored bits and the bits used for addressing are different). This is off the top of my head and could be wrong. It is somewhat different from the birthday problem because of the buckets.

abulmo · Post by **abulmo** » Wed Apr 28, 2010 9:55 am

jwes wrote: Once the hash table is full, the probability of a position not seen before giving a false positive is 1/(2^bits in stored part of key). In your case, that 1/2^32. Assuming 3,000,000 nodes per second gives about 2,300,000 new positions per second (a WAG), that would be about 2 collisions per hour.

I do not understand this computation.
I suppose most programs use an index different from the stored part of the key. In that case, doesn't the collision number also depend on the size of the hash table ?
If the program has got a hash table with 1 million entries, & it reads 1 million entries per second, there is only a 1/2^32 risk of collision per second, which is less than 1 collision per century.

rbarreira · Post by **rbarreira** » Sat Jun 19, 2010 7:34 pm

cms271828 wrote:Hi

We know 64 bit zobrist key is considered large enough to avoid collisions, and 32 apparently is not.

I'm considering using a 54 bit zobrist key so I can make my TT 2^22 elements (instead of 2^21), each one being one java long(64 bits) (instead of 2 java longs).

I believe I can get it down to 54 bits, but my question is about collisions.

I can't remember how its worked out, I know how the 'birthday paradox' works, but not sure how to relate it, cause it also depends on how many nodes per second you can search, say 3,000,000 for my engine on my pc.

So if a collision came about once a month or year, then I guess thats no problem.

Can anyone help, thanks

If the wikipedia article on the birthday problem has the correct formulas, this is the probability that there will be a 54-bit collision with a full hash table of 2^22 elements:

(1-e^(-2^22*(2^22-1)/(2*2^54))) approx 0.05%

This is approximately 1000 times more than with a 64 bits hash value, but it doesn't look that bad. I think you may be able to get away with that hash length.

edit - many people are using hash tables with 2^27 elements with a 64-bit hash, which has the same collision probability as your case. So you really should be fine.

Don · Post by **Don** » Tue Jun 22, 2010 7:09 am

cms271828 wrote:Hi

We know 64 bit zobrist key is considered large enough to avoid collisions, and 32 apparently is not.

I'm considering using a 54 bit zobrist key so I can make my TT 2^22 elements (instead of 2^21), each one being one java long(64 bits) (instead of 2 java longs).

I believe I can get it down to 54 bits, but my question is about collisions.

I can't remember how its worked out, I know how the 'birthday paradox' works, but not sure how to relate it, cause it also depends on how many nodes per second you can search, say 3,000,000 for my engine on my pc.

So if a collision came about once a month or year, then I guess thats no problem.

Can anyone help, thanks

The real issue isn't how many collisions, but how often it actually changes a move - or really more to the point how often does it change the result of a game?

But if you want collisions numbers you can use 3 or 4 extra bits to give you a good idea. In other words, pretend you are using 60 bits instead of 64 bits. When the 60 bits appear to collide, let the 4 bits determine if the collision was real. With only a small amount of error you will see how often a 60 bit key would give you collisions.

I suggest that if you are currently using 56 bits, do my experiment with 50 or 52 bits (using the other bits to estimate collision rates) and see if your collision rate seems reasonable. If, for example, you can play several hours before getting a collision you can be pretty happy. If you are using 4 check bits and you get a collision every hour, you can assume that with the extra 4 bits you are going to improve by a factor of 16. It's up to you to decide if that is acceptable or not.

If your collision rate is once every 16 hours (just a random example), you can probably assume that the collision is "move changing" only a fraction as often as that - in other words it's like having 3 or 4 extra bits of safety.

A few years ago I tested a program using 32 bits vs 64 bits for the hash key and found that I was getting fairly frequent collisions with 32 bits, but after a LOT of games I could not prove it was playing weaker.

But that was a few years ago. At todays search speeds the problem of course gets worse. Actually, does it really get worse? You might get more collisions but a collision on a super deep search is less likely to affect the move played.

My gut feeling is that 40 bits is enough - you will probably find collisions if you look hard enough, but it won't affect the strength in any kind of measurable way. However, if you are like me it's better to err on the safe side and go a few more bits that you think you really need.

Collision Detection in Transposition Table

Collision Detection in Transposition Table

Re: Collision Detection in Transposition Table

Re: Collision Detection in Transposition Table

Re: Collision Detection in Transposition Table

Re: Collision Detection in Transposition Table

Re: Collision Detection in Transposition Table

Re: Collision Detection in Transposition Table

Re: Collision Detection in Transposition Table