I got an interesting post of
yours about frequency of collisions you made a long time ago. It seems to be the birthday paradox problem with different m and n.
Wikipedia also cites the same formula:
Code: Select all
Pr(n,m) = 1 - exp(m * m / 2 * n)
So for 64 bit keys (n = 2^64), one or more collision happens at the level of sqrt(n) as you said = 2^32 = 4G. It is not as bad as I thought. But that will require us to compare each key with the rest which makes it less likely to get a collision doing just that. It will probably take a lot more than 4G.
I understood that part but got lost with further discussion that involve hash tables. In this case the comparisons are applied only to the key stored at the same index. I can't even make a general statement as to whether using a larger/small hash table will increase or decrease the chances of finding a collision... I mean with larger tables the comparisons will be lower but then smaller tables would replace entries quite often. So what is an optimal size for that?
Another doubt I have is with using search to get more collisions since the keys will be more "dependent". If I was able to compare all the keys with each other, then I think the brute force approach that scans the set of keys uniformly would be better. Also supposedly zobrist XOR supposedly changes the key significantly even after a single move. So why would search that has dependency show more collisions than brute force? Infact it may narrow the test domain since correlated positions are being tested.
I thik that one XOR changes the hashkey as much as 12 moves (XORs) do so it may not be better to use search. It is just a doubt I have and use of hash tables for collision search is making it difficult for me to understand.