Zobrist hashing for text

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Cardoso
Posts: 362
Joined: Thu Mar 16, 2006 7:39 pm
Location: Portugal
Full name: Alvaro Cardoso

Zobrist hashing for text

Post by Cardoso »

Who would thought of that?
Apparently someone did:
https://www.codeproject.com/Articles/12 ... on-ZobHash
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Zobrist hashing for text

Post by mar »

Yes zobrist hashing can be used outside of the domain of chess (and has been done before).
But hashing byte by byte can never be fast no matter what you do, this is why CRC is slow like hell even with tables.

Therefore I find it rather dubious considering the guy claims xxHash (easily the fastest and best hash function out there) is slower than his byte-by-byte zobrist hashing of text, but that's obviously nonsense.
xxHash does 32/64 bits at a time (actually 128/256 bit chunks) (fast hashes work this way, Murmur3 is also awesome).

So no idea what he did (most likely something rather suboptimal), but it's obvious his numbers don't make sense.

See here where Yann Collet evaluates performance of his xxHash:
https://github.com/Cyan4973/xxHash
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Zobrist hashing for text

Post by mar »

Here are my results:

Code: Select all

all hashes are 32-bit, no loop unrolling for zobrist/fnv, VS2017 64-bit C++ compiler
avg word len: 9.4 chars
total words: 428395

timings (best of 10):
xxHash: 5462 us (2% slower than FNV-1a, 23% faster than zobrist)
zobrist: 6743 us (26% slower than FNV-1a, 23% slower than xxHash)
FNV-1a: 5351 us (2% faster than xxHash, 26% faster than zobrist)

zobrist: 14 collisions (*depends on values, I used MT in C++)
xxHash (seed 0): 14 collisions (*depends on seed, not a typo - different 14)
FNV-1a: 19 collisions
So we can say that zobrist hashing of strings has good quality and is certainly comparable in performance.
BUT the figures the author gives are zobrist being 48% faster than xxHash and 70% faster than fnv-1a, I wonder how he came up with those...

xxHash has a bit of distadvantage of having internal state of 128-bits in this test, but it shines at hashing large buffers
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Zobrist hashing for text

Post by Ras »

mar wrote:But hashing byte by byte can never be fast no matter what you do, this is why CRC is slow like hell even with tables.
There are a number of processors out there that support CRC in hardware, and if you can leverage that, it should be somewhat faster. Besides, CRC is not really designed for hashing, but for detecting bit errors. Though in practice, it performs indeed reasonably well as hash function. I'm using a combined CRC-40 for the opening book to identify positions.
Therefore I find it rather dubious considering the guy claims xxHash (easily the fastest and best hash function out there) is slower than his byte-by-byte zobrist hashing of text
Taking a look at the xxHash sources, there is some unrolling that kicks in only for len>=16, but many English words are a lot shorter than that. So I'd expect xxHash to take a performance hit due to short inputs in this application, which are not what the xxHash author has used in his benchmarking.

The next thing is that the Zobrist guy has given benchmarks with and without string conversion to byte arrays, and the conversion itself seems to impact performance negatively. "Zobhash" is given with and without, and without, it is 5 times faster. The speedup for "GetHashCode .net" is even more - and the number of collisions also goes down without conversion. I'm not sure what this conversion actually does, but it seems to have a major influence to the extent that it dominates the execution time.