mar wrote:But hashing byte by byte can never be fast no matter what you do, this is why CRC is slow like hell even with tables.
There are a number of processors out there that support CRC in hardware, and if you can leverage that, it should be somewhat faster. Besides, CRC is not really designed for hashing but for detecting bit errors, though in practice it does indeed perform reasonably well as a hash function. I'm using a combined CRC-40 for the opening book to identify positions.
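For reference, this is the table-driven, byte-at-a-time scheme the quote is talking about: a minimal standard CRC-32 (reflected, polynomial 0xEDB88320), one table lookup per input byte. This is just a textbook sketch, not mar's or anyone else's actual code.

```c
#include <stdint.h>
#include <stddef.h>

/* Standard reflected CRC-32 (poly 0xEDB88320), table-driven:
   the "CRC with tables" scheme from the quote. */
static uint32_t crc_table[256];

static void crc32_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[i] = c;
    }
}

static uint32_t crc32(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t c = 0xFFFFFFFFu;
    while (len--)
        c = crc_table[(c ^ *p++) & 0xFF] ^ (c >> 8); /* one lookup per byte */
    return c ^ 0xFFFFFFFFu;
}
```

The inner loop has a serial data dependency through `c`, which is exactly why byte-by-byte CRC can't go fast: each lookup waits on the previous one. Hardware CRC instructions (e.g. SSE4.2's CRC-32C) sidestep that by consuming 8 bytes per instruction.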
Therefore I find it rather dubious that the guy claims xxHash (easily the fastest and best hash function out there) is slower than his byte-by-byte Zobrist hashing of text.
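The post doesn't show what "Zobhash" actually looks like, but a plausible Zobrist-style string hash would be something like this sketch: one random 64-bit key per byte value, with a rotate per position so that permutations of the same bytes hash differently. The key-generation PRNG (splitmix64) and the rotate-by-one mixing are my assumptions, not the benchmarked code.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical Zobrist-style string hash (NOT the benchmarked
   "Zobhash"): one random 64-bit key per byte value, rotate per
   position so "ab" and "ba" hash differently. */
static uint64_t zkey[256];

static uint64_t splitmix64(uint64_t *s) /* generates the key table */
{
    uint64_t z = (*s += 0x9E3779B97F4A7C15u);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9u;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBu;
    return z ^ (z >> 31);
}

static void zob_init(void)
{
    uint64_t seed = 0x12345678u;
    for (int i = 0; i < 256; i++)
        zkey[i] = splitmix64(&seed);
}

static uint64_t zob_hash(const uint8_t *p, size_t len)
{
    uint64_t h = 0;
    while (len--) {
        h = (h << 1) | (h >> 63); /* rotate so position matters */
        h ^= zkey[*p++];          /* one table lookup + xor per byte */
    }
    return h;
}
```

Note this has the same serial per-byte dependency as table-driven CRC, so on long inputs it should lose badly to a hash that consumes 4 or 8 bytes per step; only on very short strings could the race plausibly be close.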
Taking a look at the xxHash sources, there is some unrolling that kicks in only for len >= 16, but many English words are a lot shorter than that. So I'd expect xxHash to take a performance hit on such short inputs in this application, which are not what the xxHash author used in his benchmarking.
The next thing is that the Zobrist guy gives benchmarks with and without converting the strings to byte arrays, and the conversion itself seems to hurt performance badly. "Zobhash" is listed with and without, and without the conversion it is 5 times faster. The speedup for "GetHashCode .net" is even greater, and the number of collisions also goes down without conversion. I'm not sure what this conversion actually does, but it seems to have a major influence, to the extent that it dominates the execution time.
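Whatever the C# conversion step actually was, the cost model is easy to sketch: the converted path does an extra allocation plus a full copy pass over the data before a single byte is hashed. Here is a minimal C model of that, with FNV-1a standing in for the hash; the byte layout assumes a little-endian machine, and none of this is the benchmarked code.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* FNV-1a stands in for whatever hash was benchmarked. */
static uint64_t fnv1a(const uint8_t *p, size_t len)
{
    uint64_t h = 1469598103934665603u;
    while (len--) { h ^= *p++; h *= 1099511628211u; }
    return h;
}

/* Direct path: hash the 16-bit code units in place, no copy. */
static uint64_t hash_direct(const uint16_t *s, size_t n)
{
    return fnv1a((const uint8_t *)s, n * 2);
}

/* Converted path: allocate a byte array and copy first (an extra
   pass over the data), then hash.  Same result on little-endian,
   strictly more work -- the model for the conversion overhead. */
static uint64_t hash_converted(const uint16_t *s, size_t n)
{
    uint8_t *buf = malloc(n * 2);
    for (size_t i = 0; i < n; i++) {     /* the conversion pass */
        buf[2 * i]     = (uint8_t)(s[i] & 0xFF);
        buf[2 * i + 1] = (uint8_t)(s[i] >> 8);
    }
    uint64_t h = fnv1a(buf, n * 2);
    free(buf);
    return h;
}
```

For inputs as short as single words, a malloc plus a copy can easily cost more than the hashing itself, which would explain both the 5x figure and why the effect is bigger for a fast hash like GetHashCode.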