Hash collision?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Hash collision?

Post by chrisw »

Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.

What’s the formula connecting X and n?
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Hash collision?

Post by Ras »

Bob Hyatt has already done the research on the required hash length many years ago. The result was that 32 bit are not sufficient, 64 bit are, and you can even get away with 48 bit.

Obviously, you also have the implicit storage via the hash table index, minus the bits for your number of slots. You can get away with storing 32 bits if you have at least 16 bits implicit storage, i.e. 64k entries. Let's assume a bucket length of 3, then you need 18 bits, i.e. 256k entries. That shouldn't be an issue today.
Rasmus Althoff
https://www.ct800.net
elpapa
Posts: 211
Joined: Sun Jan 18, 2009 11:27 pm
Location: Sweden
Full name: Patrik Karlsson

Re: Hash collision?

Post by elpapa »

chrisw wrote: Fri Jun 07, 2019 4:43 pm Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.

What’s the formula connecting X and n?
This will occupy my mind for days and in the end I won't be able to solve it, thank you very much.
User avatar
silentshark
Posts: 327
Joined: Sat Mar 27, 2010 7:15 pm

Re: Hash collision?

Post by silentshark »

This is all useful. Thanks.

So the big question for me, remains. 16 bits - good enough, or will screw you up sometimes?

(Of course, maybe I have another bug elsewhere, but hey.)
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Hash collision?

Post by chrisw »

elpapa wrote: Fri Jun 07, 2019 9:04 pm
chrisw wrote: Fri Jun 07, 2019 4:43 pm Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.

What’s the formula connecting X and n?
This will occupy my mind for days and in the end I won't be able to solve it, thank you very much.
it’s similar to the how many people would need to be in a room for 50/50 chance that two have the same birthday paradox. well, not really a paradox.
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hash collision?

Post by hgm »

The number of collisions you get is trivial to calculate: it only depends on the size of the signature (in so far that doesn't overlap with the part of the key used as table index). With 32-bit signature one in every 4G probes will give you a false match, once the table is entirely filled. (Which in practice will always be the case.) If your probing scheme probes in clusters of 4, this of course also multiplies the collision rate by 4, so you get one error in a billion. At 1Mnps that means one collision every 1000 sec. This is bad/unacceptable if collisions can make your engine crash, and completely unobservable if they cannot. To really affect play you should have thousands of collisions per search.

With a 16-bit signature you would have 1 collision in 65k probes, even 1 in 16k if you probe 4 entries. That means that at 1Mnps you would get 60 collisions/sec. That should still be way below the rate where there is a realistic chance that they would affect the move choice or score, as long as they can never crash the engine.

The total size of the table has no influence on any of this.
Uri Blass
Posts: 10268
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Hash collision?

Post by Uri Blass »

chrisw wrote: Fri Jun 07, 2019 4:43 pm Puzzle. Suppose a search takes X nodes and the hashtable is 2^n in size. You would like to be 95% confident no two nodes have a hash collision.

What’s the formula connecting X and n?
More interesting question for me:
What is the probability that an engine get a wrong mate score in tablebase positions because of hash collision or a different reason when there is mate in n as a function of n.
By wrong mate score I mean mate that is faster than possible or mate for the opponent.

It may be interesting to have an automatic tool that simply get an epd file of random million mate in n positions from the 7 piece tablebases
and gives an engine to analyze them for m seconds without tablebases and give statistics about the results of the searches.

This tool may be good for testing engines.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Hash collision?

Post by Ras »

hgm wrote: Sat Jun 08, 2019 12:26 amThe total size of the table has no influence on any of this.
I don't agree. Imagine you had enough memory for a table with 2^64 entries, then you wouldn't need to store a signature at all because the full signature itself is only 64 bit (usually) so that the signature would be identical to the index position. That's how the implicit bits of the signature (via the index position) come into play.
Rasmus Althoff
https://www.ct800.net
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Hash collision?

Post by Dann Corbit »

Ras wrote: Sat Jun 08, 2019 3:19 am
hgm wrote: Sat Jun 08, 2019 12:26 amThe total size of the table has no influence on any of this.
I don't agree. Imagine you had enough memory for a table with 2^64 entries, then you wouldn't need to store a signature at all because the full signature itself is only 64 bit (usually) so that the signature would be identical to the index position. That's how the implicit bits of the signature (via the index position) come into play.
Take the other extreme: A hash table that stores a single entry.
Almost every operation overwrites it (except for real transpositions).
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Hash collision?

Post by Ras »

Dann Corbit wrote: Sat Jun 08, 2019 3:22 amTake the other extreme: A hash table that stores a single entry.
Yeah then you would need to store the full 64 bit signature. Basically, the information stored is the length of the signature plus the implicit index bits minus the bucket length bits. Of course, you have to store bits that are not already encoded in the index bits.

I store the upper 32 bits. Then I abuse the flag for exact/alpha/beta which needs 2 bits, but is a uint8_t, so that allows to fudge in 6 additional bits. The bucket length is 3 which loses me 2 bits again, that gives 36 bits. With 2^12 = 4096 entries, that makes 48 bits, and with 10 bytes per entry, that's 40 kB. Even my embedded STM32 platform is sufficient for that, let alone smartphones or PCs.

On top of that, I check whether the hash move is pseudo-legal, so that gives a little more redundancy.
Rasmus Althoff
https://www.ct800.net