Zobrist Number Statistics and What to Look For
Zobrist Number Statistics and What to Look For
I've been working on a function to create a list of good Zobrist hash numbers. (Actually, just the 'key seeds'; the function itself is not mine.) I understand that the desire is to get a very good set of random numbers, with a mean of 0.5 and a standard deviation of 0.288675. (I'm not too sure exactly why this is, but....) I can generate a list that gives me:
Mean Value: 0.499998
Standard Deviation: 0.288679
I can't seem to get it any better than that. But what I did was create a list of 32-bit integers, not 64-bit. I planned to just 'combine' them into 64-bit hash numbers. Will this affect the results of my hashing? Or will the mean and standard deviation be kept intact, more or less?
What else should I consider when trying to produce a good list of Zobrist numbers? I assume that you have to make sure that there are no identical numbers in the list, but are there any other considerations to keep in mind? Is there some kind of real-world test that you can perform to see how your Zobrist numbers will actually perform? Or can you be confident that your numbers will perform well if the statistics are good (mean and standard deviation)?
 hgm
 Posts: 23086
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
 Contact:
Re: Zobrist Number Statistics and What to Look For
None of this has the slightest effect on your collision rate. What you could test for is subgroups of a small number of keys that XOR to 0. If you have 3 keys A, B and C and A^B^C == 0, that would be very bad. (Unless they would all be for different squares of the same King.) But usually you would not find groups of 6 such 'dependent' keys even with 32-bit keys, and trying groups of 7 or 8 is too time-consuming to do exhaustively. With 64-bit keys it seems pretty hopeless to ever find a dependence, as you might need to combine 12 or more keys.
Re: Zobrist Number Statistics and What to Look For
munchkin wrote: But what I did was create a list of 32-bit integers, not 64-bit. I planned to just 'combine' them into 64-bit hash numbers. Will this affect the results of my hashing?

As long as the 32-bit integers are random (independent and uniformly distributed), this is absolutely fine. You could even just generate a long list of random bits, then form your Zobrist keys from blocks of 64 bits. As long as the bits are random, this is exactly as good as generating good random 64-bit integers directly.
munchkin wrote: Or will the mean and standard deviation be kept intact, more or less?

When you have a good set of random numbers, their mean and standard deviation will be around those values as a result of the numbers being random. Or rather, as a result of them being uniformly distributed. The other way around, this is not the case! I can easily produce 32-bit numbers that have a mean of 0.5 (after dividing by 2^32) and the correct standard deviation, but that are absolutely not random. For example: 32768 + k * 65536, with k ranging from 0 to 65535. This set of numbers will give very bad hashing results.
Once you have generated your random numbers, the only reason to look at their mean and standard deviation is for a sanity check. If the mean is completely off, something is wrong with your random generator. But a more useful check is to verify that not all your numbers are even (I remember someone posting his Zobrist keys here that all turned out to be divisible by 1024 or so... that is a sign of a very bad random number generator).
A set of 100 numbers a1, a2, ...., a100 with a mean of exactly 1/2 is not random, because a100 is completely determined by a1, a2, ..., a99.
 Ajedrecista
 Posts: 1373
 Joined: Wed Jul 13, 2011 7:04 pm
 Location: Madrid, Spain.
 Contact:
Re: Zobrist Number Statistics and what to look for.
Hello Andrew:

munchkin wrote: I've been working on a function to create a list of good Zobrist hash numbers. (Actually, just the 'key seeds'; the function itself is not mine.) I understand that the desire is to get a very good set of random numbers, with a mean of 0.5 and a standard deviation of 0.288675. (I'm not too sure exactly why this is, but....) I can generate a list that gives me:
Mean Value: 0.499998
Standard Deviation: 0.288679

I can answer your question of why the standard deviation must be around 0.288675: since your random numbers are in the range [0, 1] (maybe 0 and/or 1 are not in the interval), you want a uniform distribution.

Uniform distribution (continuous)
Uniform distribution (discrete)

If you have a uniform distribution between x = a and x = b (b > a), then:

Code: Select all
Mean = (a + b)/2
Variance = (b - a)²/12
Standard deviation = sqrt(variance)

In this case: a = 0 and b = 1:

Code: Select all
Mean = (0 + 1)/2 = 0.5
Standard deviation = (1 - 0)/sqrt(12) = 1/sqrt(12) ~ 0.288675

By the way, I think that you got very good results!

Regards from Spain.

Ajedrecista.
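For completeness, the 1/sqrt(12) above comes from the standard derivation for the continuous uniform distribution on [a, b]:

```latex
E[X]   = \int_a^b \frac{x}{b-a}\,dx = \frac{a+b}{2}
E[X^2] = \int_a^b \frac{x^2}{b-a}\,dx = \frac{a^2+ab+b^2}{3}
\operatorname{Var}(X) = E[X^2] - E[X]^2
                      = \frac{a^2+ab+b^2}{3} - \frac{(a+b)^2}{4}
                      = \frac{(b-a)^2}{12}
```

With a = 0 and b = 1 this gives a variance of 1/12, hence a standard deviation of 1/sqrt(12) ~ 0.288675, matching the formulas above.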
Re: Zobrist Number Statistics and What to Look For
munchkin wrote: I've been working on a function to create a list of good Zobrist hash numbers. (Actually, just the 'key seeds'; the function itself is not mine.) I understand that the desire is to get a very good set of random numbers, with a mean of 0.5 and a standard deviation of 0.288675. (I'm not too sure exactly why this is, but....) I can generate a list that gives me:
Mean Value: 0.499998
Standard Deviation: 0.288679
I can't seem to get it any better than that. But what I did was create a list of 32-bit integers, not 64-bit. I planned to just 'combine' them into 64-bit hash numbers. Will this affect the results of my hashing? Or will the mean and standard deviation be kept intact, more or less?
What else should I consider when trying to produce a good list of Zobrist numbers? I assume that you have to make sure that there are no identical numbers in the list, but are there any other considerations to keep in mind? Is there some kind of real-world test that you can perform to see how your Zobrist numbers will actually perform? Or can you be confident that your numbers will perform well if the statistics are good (mean and standard deviation)?

If you generate 64-bit random numbers, you know your problem has been solved. By generating 32-bit numbers, you know you risk a multilinear relationship. Still, you post here that you generate 32-bit numbers.
Why are you posting it like this?
Empirically we can show that using 32-bit numbers in a sequential manner (2 adjacent numbers forming a 64-bit number, with the top 32 bits just a few generator cycles away from the low 32 bits) is not safe if the same SIMPLE generator generates them, if you are going to use the generated 64-bit number 'as is' and run it on a couple of dozen cores during a game; provided that you want 0 errors.
I know many authors who simply don't care about a bunch of errors, in which case I bet some 8-bit RNGs will do. If you care about getting 0 errors, then combining 32-bit numbers is a problem, even if the generator is the Mersenne Twister.
Of course, in some cases you can deliver a simplistic proof of why this isn't safe.
Suppose we have a function R that is the few-cycle operation that happens after each randomization.
We then have 64-bit entries, consisting of two 32-bit halves, that fulfill:
{R(x), x}
And we're gonna XOR up to 32 numbers that we can pick ourselves.
So effectively your 64-bit hash key loses a bunch of bits compared to a fully semi-random 64-bit number.
You will notice this loss of a few bits in computer chess, provided that R(x) is a simple function. Of course it's always possible to design an R(x) that's strong enough, yet is that an interesting discussion here?
For some it will be; they just want to discuss.
In my case, of course, I'm using 128-bit Zobrist keys in Diep, of which I effectively use around 80 bits in storage.
That might give you a clue why the idea of {R(x),x} is not a good one.
Re: Zobrist Number Statistics and What to Look For
syzygy wrote: [...] A set of 100 numbers a1, a2, ...., a100 with a mean of exactly 1/2 is not random, because a100 is completely determined by a1, a2, ..., a99.

That's a beginner's flaw. The mean of 100 random numbers means absolutely nothing. It doesn't begin to prove the numbers are not random, nor does it prove they are. In fact, out of a lot of generated sets of 100 numbers, you would expect one set to have a mean of exactly .5, just as you would expect one set to have a run of sequential values. Out of a string of 52, you had BETTER find a consecutive string of 5 numbers every now and then, as the probability of a royal flush in poker is not 0.0...

Your logic fails badly, because that means the 100th number is NEVER random, since it will cause the rest of the group to average SOME value or another... If you pick a number out of thin air, such as .55, you can use the same argument, but it is still wrong... UNLESS you actually generate the numbers such that they average .5... Just because they do is meaningless, however.
Re: Zobrist Number Statistics and What to Look For
bob wrote: That's a beginner's flaw. The mean of 100 random numbers means absolutely nothing. Doesn't begin to prove the numbers are not random. Nor does it prove they are. [...]

Correct that the mean doesn't say much for Zobrist hashing.
Very interesting, however, is the other way around. Suppose there exist intelligent other forms of life in the universe which are somehow communicating in space.
They wouldn't want early forms of life, which haven't matured yet, to obtain their knowledge, as that would mean such early lifeforms would directly have weapons to kill each other. See how close Kennedy was to nearly wiping out the entire USA, Russia and Europe in a third world war (enough books on that); it's that the Russian response was appropriate, yet at sea they would have clean-swept the ocean, as became clear some years ago. Every single Soviet submarine commander had nuclear torpedoes with a small warhead, enough to sink anything.
A single accident between the ships, or a misunderstanding of what was happening, and world war 3 would have started.
Obviously, more intelligent life forms would not want to risk that their knowledge, in the hands of some individuals (for example from experiment Earth), would cause a new form of life to cease to exist and/or mutually wipe each other out. So they would encrypt their communication.
This is a likely scenario under that assumption. Again, it's an assumption.
So then the next question is: is there a way to RECOGNIZE whether seemingly random noise in fact is just encrypted data?
One way to do that, IMHO, relies on the assumption that most semi-random number generators have a perfect spread. Too perfect.
You start with a large array of bits, say sized 2^n, with n a tad bigger and 2^n / 8 being the largest block of memory you can allocate;
then you generate random numbers and fill up the memory block. Once it has been completely filled, you calculate how long that took.
Typically, the perfect spreads of semi-random number generators fill up the block in O(n log n).
Pseudo C code to show it for full bytes (it is more efficient, of course, to use 1 bit instead of 8 bits to store 1 bit of info):
Code: Select all
done = 0;
nentries = topower(2, n);
for (i = 0; i < nentries; i++)
    buffer[i] = 0;
didn = 0;
while (done < nentries) {
    didn++;
    rand = getnextrand();
    indexnr = rand & (nentries - 1);
    if (!buffer[indexnr]) {
        done++;
        buffer[indexnr] = 1;
    }
}
printf("Finished: needed n=%llu tries to fill array of 2 to power %u\n", didn, n);
Re: Zobrist Number Statistics and What to Look For
To repeat what others have stated: if you are deliberately affecting or selecting your numbers (e.g. in order to get the "mean" you want), they're no longer random.

 Posts: 913
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
 Full name: Álvaro Begué (RuyDos)
Re: Zobrist Number Statistics and What to Look For
As others have said, what you want to use is random numbers. In order to generate a table of high-quality random numbers, what I have done in the past is generate several such tables by different methods, and then XOR them together. The resulting table will be at least as random as the most random of the ingredients.
Some ideas of where to get random numbers (in decreasing order of quality):
* /dev/random in Linux (you may have to type on your keyboard for a while for the OS to gain enough entropy)
* /dev/urandom in Linux (for this one you don't need to wait)
* rand() from the Linux C library
* Some chunk of a large compressed file
Re: Zobrist Number Statistics and What to Look For
AlvaroBegue wrote: * Some chunk of a large compressed file

Be careful with this one. You need something like LZMA (7-Zip), which uses an arithmetic coder for the last step, because that effectively almost-randomizes the bits. Don't use anything based on Huffman coding, because it might give you long runs of predictable bits (WinZip deflate, PDF, etc.). An arithmetic coder with any dynamic model will give you effectively random bits, even if the model is not too great. Taking bits from a lossy .JPG file should be fine, but don't use a .PNG file, etc.
An easy way to generate good Zobrist keys is to grab a good PRNG like the Mersenne Twister and just use that. Don't ever use the built-in C library rand() unless you only use something like 12 bits from each call; many crappy rand() implementations exist out there. Something like the Mersenne Twister is much better. Any PRNG that can pass something like the Diehard tests is definitely good enough.
AFAIK, nobody here has shown a way to generate a significantly better set of Zobrist keys than just a set of random or pseudo-random numbers. Optimizing them for specific properties (e.g. knowing that you will seldom have more than two black bishops on the board at a time, you might optimize all of the bishop keys to mix better with other pieces at the cost of being weaker with >3 bishops) might be interesting, but I don't think anyone has demonstrated a significant benefit from something like that.