Zobrist keys  measure of quality?
Moderators: hgm, Dann Corbit, Harvey Williamson
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Zobrist keys  measure of quality?
I always thought that one can't do better than using "truly" random keys (or at least using statistically good PRNG  read passing various statistical tests).
However as Steven showed recently (while working on Oscar)  even PRNG that fails some statistical tests is good enough for zobrist keys that just work.
So I tried to mesasure something else  given any two features (=keys) I measured maximum number of same bits.
In my case, I have 1049 keys (yes I know it's much more than necessary) and for my PRNG it was 50, for random numbers I got from random.org I got 49.
The average is obviously very close to 32 (half of my key size).
Then I used a brute force loop to try to improve and managed to reduce this so that no two features have more than 42 common bits,
with the idea that this should (in theory) lower the possibility of cancellation,
but one would have to conduct collision tests if this is indeed true (which I haven't).
However as Steven showed recently (while working on Oscar)  even PRNG that fails some statistical tests is good enough for zobrist keys that just work.
So I tried to mesasure something else  given any two features (=keys) I measured maximum number of same bits.
In my case, I have 1049 keys (yes I know it's much more than necessary) and for my PRNG it was 50, for random numbers I got from random.org I got 49.
The average is obviously very close to 32 (half of my key size).
Then I used a brute force loop to try to improve and managed to reduce this so that no two features have more than 42 common bits,
with the idea that this should (in theory) lower the possibility of cancellation,
but one would have to conduct collision tests if this is indeed true (which I haven't).
Re: Zobrist keys  measure of quality?
I once looked at this briefly, and came to the conclusion that it doesn't really matter.
There is some discussion on the topic at https://chessprogramming.wikispaces.com/Zobrist+Hashing under "Linear independence".
There is some discussion on the topic at https://chessprogramming.wikispaces.com/Zobrist+Hashing under "Linear independence".
 hgm
 Posts: 26246
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
 Contact:
Re: Zobrist keys  measure of quality?
This theory has already been debunked here many times. E.g. with four 12bit keys, 0xF00, 0x0F0, 0x00F and 0xFFF any pair has 8 out of 12 different (the 'Hamming distance'), while they are dependent. (The XOR of all of them is 0.) While a simple 1, 2, 4, 8 would have provided collisionfree keys, while they would all have 10 bits in common.mar wrote:..., with the idea that this should (in theory) lower the possibility of cancellation,
In fact there exists an algorithm to generate a set of keys with maximum Hamming distance. This results in a set where any key is the XOR of 2 others. Which is absolutely the worst possible key set.
A good way to test the quality of your key set is to store all XORs of 3 keys in a hash table (using that XOR as the hash). With 1000 keys you get 166M XORs. If two diffent sets give the same XOR (i.e. you get a collission in that hash table) your key set contains a 6thorder dependency, which is pretty bad. To detect 7thorder dependencies you can check for all products of 4 keys if it occurs in the table. (This takes very long with 1000 keys, as you have ~40G combinations.)
Re: Zobrist keys  measure of quality?
Thanks, so now I see that it's nonsense (or wrong measure)... Well... I also found in the links posted by Evert that it has been discussed already in 2001.hgm wrote: This theory has already been debunked here many times. E.g. with four 12bit keys, 0xF00, 0x0F0, 0x00F and 0xFFF any pair has 8 out of 12 different (the 'Hamming distance'), while they are dependent. (The XOR of all of them is 0.) While a simple 1, 2, 4, 8 would have provided collisionfree keys, while they would all have 10 bits in common.

 Posts: 719
 Joined: Tue May 22, 2007 9:13 am
Re: Zobrist keys  measure of quality?
Zobrist hashing is also known as Tabulation hashing in the literature, and in the last few years it has been shown that it has a lot of nice statistical properties, even though Zobrist hashing is not even 4independent in general. See e.g. http://arxiv.org/pdf/1011.5200.pdf or google for the authors' names for more accessible introductions.hgm wrote:This theory has already been debunked here many times. E.g. with four 12bit keys, 0xF00, 0x0F0, 0x00F and 0xFFF any pair has 8 out of 12 different (the 'Hamming distance'), while they are dependent. (The XOR of all of them is 0.) While a simple 1, 2, 4, 8 would have provided collisionfree keys, while they would all have 10 bits in common.mar wrote:..., with the idea that this should (in theory) lower the possibility of cancellation,
In fact there exists an algorithm to generate a set of keys with maximum Hamming distance. This results in a set where any key is the XOR of 2 others. Which is absolutely the worst possible key set.
A good way to test the quality of your key set is to store all XORs of 3 keys in a hash table (using that XOR as the hash). With 1000 keys you get 166M XORs. If two diffent sets give the same XOR (i.e. you get a collission in that hash table) your key set contains a 6thorder dependency, which is pretty bad. To detect 7thorder dependencies you can check for all products of 4 keys if it occurs in the table. (This takes very long with 1000 keys, as you have ~40G combinations.)
Re: Zobrist keys  measure of quality?
Whether it would be nonsense or not, I am using 64 bit zobrist atoms consisting of 32 0bits and 32 1bits. Thus applying a xored Zobrist atom always will switch 32 bits.mar wrote:Thanks, so now I see that it's nonsense (or wrong measure)... Well... I also found in the links posted by Evert that it has been discussed already in 2001.
Random values traditionally chosen as zobrist atoms could in extrem encounter a zero value which would then switch nothing at all. Thus I concluded, the bigger the absolute difference between zero and onebits, the greater the chance of producing indentic overlayed keys.
This is pure intention, but I am sure, that using such equal bitted (randomized) Zobrist atoms as I do cannot harm anything.

 Posts: 928
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
 Full name: Álvaro Begué (RuyDos)
Re: Zobrist keys  measure of quality?
I studied the number of collisions for a hash function in a different setting (hashing short strings, like ticker symbols in the stock market) and I found that cyclic redundancy codes (CRC) performed better than could be expected from purely random numbers. You can describe CRC as a Zobrist key, by making the ith entry in the Zobrist table be the coefficients of the polynomial x^(i+64) modulo P(x), where P(x) is a 65th degree polynomial.
If you want to test out this idea, here's some C++ code to generate the Zobrist table:
If you want to test out this idea, here's some C++ code to generate the Zobrist table:
Code: Select all
#include <cstdio>
int main() {
unsigned long long const P = 0x42F0E1EBA9EA3693ull;
unsigned long long h = P;
for (int i = 0; i < 1000; ++i) {
std::printf("%016llx\n", h);
h = (h & 0x8000000000000000u) ? (h << 1) ^ P : h << 1;
}
}
Re: Zobrist keys  measure of quality?
Interesting discussion. This seems like a good forum for the sort of programming I am doing at the moment. Has anyone done any work on quasirandom numbers and hash tables? I tried Marsaglia's simple Weyl number idea with hash tables. It seems better than using a linear probe offset of 1.

 Posts: 928
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
 Full name: Álvaro Begué (RuyDos)
Re: Zobrist keys  measure of quality?
Arg, there is a typo and the forum won't let me edit the message. Here's the corrected code:
Code: Select all
#include <cstdio>
int main() {
unsigned long long const P = 0x42F0E1EBA9EA3693ull;
unsigned long long h = P;
for (int i = 0; i < 1000; ++i) {
std::printf("%016llx\n", h);
h = (h & 0x8000000000000000ull) ? (h << 1) ^ P : (h << 1);
}
}
Re: Zobrist keys  measure of quality?
Interesting, I did a similar test of hash functions some time ago (looking for a good generalpurpose hashing function) and found that CRC32 indeed has pretty good distribution (but also quite slow unless you use slicing tables).AlvaroBegue wrote:I studied the number of collisions for a hash function in a different setting (hashing short strings, like ticker symbols in the stock market) and I found that cyclic redundancy codes (CRC) performed better than could be expected from purely random numbers.
I also found that Hsieh's SuperFastHash is not a) that super fast and b) produces more collisions that other hashes I tried.
I used english wordlist of 1.6M words, hashing strings.
I'm posting my hash function here (originally inspired by MurmurHash3), I hereby place it in public domain. It only has 32bit output though.
Another disadvantage right now is that it requires 32bit aligned input and is not incremental.
Also it passed smhasher tests.
Code: Select all
// "random" keys
// primes are essential for multiplicative keys
static const UInt KEY1 = 0xdb91908du; // prime
static const UInt KEY2 = 0x6be5be6fu; // prime
static const UInt KEY3 = 0x53a28fc9u; // prime
static const UInt KEY4 = 0xf8ffd50bu; // prime
// rotate (left, could be right, doesn't matter because left = 32right)
static inline UInt Rotate( UInt u, Byte amount )
{
CORE_ASSERT( amount && amount < 32 );
return (u << amount)  (u >> (Byte)(32amount));
}
// faster than Murmur3 32bit
static inline UInt MyHashUInt( UInt u, UInt h )
{
// passes smhasher tests
u = Rotate( u*KEY1, 1 );
return u = Rotate( h ^ KEY2, 11 );
}
static inline UInt MyHash( const void *buf, size_t len, UInt seed = 0 )
{
const Byte *b = (const Byte *)buf;
CORE_ASSERT( len < 4  !((UIntPtr)b & 3) );
const UInt *u = (const UInt *)b;
size_t ulen = len/4;
UInt h = KEY3 ^ seed;
h ^= len * KEY4;
b += ulen*4;
len &= 3;
while ( ulen ) {
h = MyHashUInt( *u++, h );
}
// process remaining data if any:
UInt last = 0;
switch( len )
{
case 3:
last = (UInt)b[2] << 16;
case 2:
last = (UInt)b[1] << 8;
case 1:
last = (UInt)b[0];
h = MyHashUInt( last, h );
default:
break;
}
// finalize (avalanche)
h ^= h >> 7;
h ^= h << 21;
return h;
}
It's also quite fast except that clang unfortunately missed to fold one of the rotations (I assume because another optimization is applied before rotfolding).
I will look at your code in more detail later, thanks.
EDIT: it also produces different results on little vs big endian machines but since I used it as generalpurpose hashing function for my hashset, I don't really care as long as it's fast and has good distribution.