Don Dailey

Joined: 29 Apr 2008
Posts: 4325

Post subject: Re: how to measure frequency of hash collisions.    Posted: Sat Jun 16, 2012 10:23 pm

Daniel Shawul wrote:
 Don wrote: You can estimate the collision rate by reserving N bits of the key for checking. So if your key is 64 bits, pretend it's only 60 bits and 4 bits are for collision testing. If the 4 bits do not match it was a collision. You can extrapolate to get the 64 bit collision rate estimate - each time you add a bit you can expect half the number of collisions. Don

But that won't work because in a hash collision, the hash signature (all 64 bits) are the same for two completely different positions... You need a key from another sequence of random numbers (be it from the same or different hash function). Am I missing something ?

Yes, what you are missing is that you are not testing the 64 bit key. You are testing a 60 bit key. In my example just pretend that you are generating a 60 bit key and then a totally independent 4 bit key. Let's say, for example, that you get a collision on the 60 bit key once out of 1 million matches (as judged by the 4 bit verify key). You can expect that had this been a 64 bit key instead of 60 bits, you would get only 1/16 as many collisions, since 2^4 is 16. Each extra bit cuts the number of expected collisions in half.
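To make the arithmetic concrete, here is a minimal sketch of the extrapolation step. The function name and the sample count are made up for illustration; the only rule it encodes is the one above, that each extra key bit halves the expected number of collisions.

```python
def extrapolate_collisions(observed, check_bits):
    """Scale a collision count seen on the shortened key to the full key.

    observed   -- collisions counted on the shortened (e.g. 60 bit) key
    check_bits -- number of bits borrowed for verification (e.g. 4)
    """
    # Each borrowed bit doubles the collision count, so divide by 2^check_bits.
    return observed / (2 ** check_bits)


# Hypothetical count: 16 collisions seen on the 60 bit key with 4 check bits.
print(extrapolate_collisions(16, 4))
```

With 4 check bits the divisor is 16, so 16 observed collisions on the 60 bit key correspond to an estimate of about one collision on the full 64 bit key.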

This is superior to any of the methods proposed so far because it does not require modifying the program to add more key space; you just borrow a few bits from the already existing key. So this could be added to the program with no additional overhead. In fact, the program could continue to use the full 64 bit key for normal operation while doing the 60 bit collision detection test purely for statistical purposes, and the result could even be written to the program's log file. If you use 4 bits for collision detection, then you count how many of these 60 bit keys collided and divide by 16 to get an ESTIMATE of how often a 64 bit key would have collided.
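A rough sketch of how the bookkeeping could sit next to a normal hash probe is shown below. Everything here is an assumption for illustration: the class name, the choice of the top 4 bits as the check bits, and the idea of calling record() on every probe. The program still decides hits on the full 64 bit key; the counters are only for statistics.

```python
KEY_BITS = 64
CHECK_BITS = 4                       # bits borrowed for verification (assumed: the top 4)
MATCH_BITS = KEY_BITS - CHECK_BITS   # the pretend 60 bit key
MATCH_MASK = (1 << MATCH_BITS) - 1   # selects the low 60 bits


class CollisionStats:
    def __init__(self):
        self.matches = 0      # probes where the 60 bit keys matched
        self.collisions = 0   # of those, probes where the 4 check bits differed

    def record(self, stored_key, probe_key):
        """Update the counters; return True on a normal full 64 bit hit."""
        if (stored_key & MATCH_MASK) == (probe_key & MATCH_MASK):
            self.matches += 1
            # Same 60 bit key but different check bits: a 60 bit collision.
            if (stored_key >> MATCH_BITS) != (probe_key >> MATCH_BITS):
                self.collisions += 1
        # The search itself keeps using the full key, as before.
        return stored_key == probe_key

    def estimated_full_key_collisions(self):
        # Divide by 2^CHECK_BITS (16 here) to estimate 64 bit collisions.
        return self.collisions / (1 << CHECK_BITS)


stats = CollisionStats()
key = 0x123456789ABCDEF0
stats.record(key, key)               # ordinary hit
stats.record(key, key ^ (1 << 63))   # same low 60 bits, different check bits
print(stats.matches, stats.collisions, stats.estimated_full_key_collisions())
```

The two counters plus the estimate are the kind of numbers that could be dumped into the log file at the end of a search.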
