sje wrote:
> I have serious doubts about the reliability of 64-bit signatures over extended calculation times. I've given examples of 64-bit false positives and have shown that the number of required signature bits is roughly proportional to log2(N), where N is the probe count. Others have posted about how their chess engines have had 64-bit false positives and how such cases are handled in a search.
>
> A second area of doubt comes from the possibility of undetected disk I/O errors: an application could beat on a disk harder in a single month than its entire projected lifetime of ordinary use. While undetected hard drive I/O errors are uncommon with modern hardware, they are common enough that I've seen a couple of them.

I addressed both of those concerns in my algorithm description. To summarize, in case you don't want to read the whole post describing the algorithm:
I use 64+64-bit signatures, although some of the bits in the second signature are ignored; how many are ignored depends on the hash table size. I computed the probability of a hash collision ruining the computation to be 1/28000.
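A rough sketch of that kind of estimate in Python. The probe count and the number of extra compared bits below are illustrative placeholders, not the actual figures from my run:

```python
import math

# Illustrative assumptions (NOT the real computation's numbers):
probes = 10**12           # total hash probes over the whole computation
effective_bits = 64 + 20  # 64-bit primary signature plus however many bits
                          # of the second signature are actually compared

# Each probe matches a wrong entry with probability 2**-effective_bits, so
# the expected number of false positives grows linearly in the probe count.
expected_false_positives = probes / 2.0**effective_bits

# Inverting the estimate: to keep the failure probability below `target`,
# you need roughly log2(probes / target) signature bits -- the log2(N)
# growth sje describes.
def required_bits(probe_count, target):
    return math.ceil(math.log2(probe_count / target))
```

With these placeholder numbers, `required_bits(10**12, 1/28000)` comes out to 55 bits, comfortably below the 84 bits compared in the sketch above.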
Disk I/O errors are handled because all data is compressed with the xz program, which embeds CRC32 checksums that detect data corruption on decompression.
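For illustration, Python's `lzma` module writes the same xz container format and can be asked for CRC32 integrity checks, so the corruption-detection behavior is easy to demonstrate. This is a sketch of the mechanism, not my actual pipeline:

```python
import lzma

data = b"tablebase entries " * 1000

# Compress into the xz container with a CRC32 integrity check.
compressed = lzma.compress(data, check=lzma.CHECK_CRC32)

# Normal round trip: decompression verifies the checksum silently.
assert lzma.decompress(compressed) == data

# Simulate an undetected disk error: flip one byte mid-stream.
corrupted = bytearray(compressed)
corrupted[len(corrupted) // 2] ^= 0xFF

# Decompression now fails instead of silently returning bad data.
try:
    lzma.decompress(bytes(corrupted))
except lzma.LZMAError:
    print("corruption detected")
```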
I also addressed the cosmic ray concern by using a lockless hash table algorithm that is almost immune to random bit errors.
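The post doesn't spell out the lockless scheme, but the well-known XOR trick from Hyatt-style lockless transposition tables shows how such a table can reject entries hit by torn writes or random bit flips; the class and names below are illustrative, not the actual implementation:

```python
class LocklessTable:
    """Sketch of an XOR-verified lockless hash table (Hyatt/Mann style)."""

    def __init__(self, size_bits):
        self.mask = (1 << size_bits) - 1
        self.keys = [0] * (self.mask + 1)
        self.data = [0] * (self.mask + 1)

    def store(self, key, value):
        i = key & self.mask
        # Store key XOR value: if either word is later torn or bit-flipped,
        # the probe-time check below fails and the entry reads as a miss.
        self.keys[i] = key ^ value
        self.data[i] = value

    def probe(self, key):
        i = key & self.mask
        value = self.data[i]
        if self.keys[i] ^ value == key:  # both words must still be consistent
            return value
        return None                      # mismatch: treat as a table miss
```

A corrupted entry is thus returned as "not found" rather than as wrong data, which is why random bit errors degrade the table instead of poisoning the computation.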