Zobrist Number Statistics and What to Look For

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

munchkin

Zobrist Number Statistics and What to Look For

Post by munchkin »

I've been working on a function to create a list of good Zobrist hash numbers. (Actually, just the 'key seeds'; the function itself is not mine.) I understand that the desire is to get a very good set of random numbers, with a mean of 0.5 and a standard deviation of 0.288675. (I'm not too sure exactly why this is, but....) I can generate a list that gives me:

Mean Value: 0.499998
Standard Deviation: 0.288679

I can't seem to get it any better than that. But what I did was create a list of 32-bit integers -- not 64-bits. I planned to just 'combine' them into 64-bit hash numbers. Will this affect the results of my hashing? Or will the mean and standard deviation be kept intact, more or less?

What else should I consider when trying to produce a good list of Zobrist numbers? I assume that you have to make sure that there are no identical numbers in the list, but are there any other considerations to keep in mind? Is there some kind of real-world test that you can perform to see how your Zobrist numbers will actually perform? Or can you be confident that your numbers will perform well if the statistics are good? (Mean and Standard Deviation.)
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Zobrist Number Statistics and What to Look For

Post by hgm »

None of this has the slightest effect on your collision rate. What you could test for is sub-groups of a small number of keys that XOR to 0. If you have 3 keys A, B and C and A^B^C == 0, that would be very bad. (Unless they would all be for different squares of the same King.) But usually you would not find groups of 6 such 'dependent' keys even with 32-bit keys, and trying groups of 7 or 8 is too time consuming to do exhaustively. With 64-bit keys it seems pretty hopeless to ever find a dependence, as you might need to combine 12 or more keys.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Zobrist Number Statistics and What to Look For

Post by syzygy »

munchkin wrote:But what I did was create a list of 32-bit integers -- not 64-bits. I planned to just 'combine' them into 64-bit hash numbers. Will this affect the results of my hashing?
As long as the 32-bit integers are random (independent and uniformly distributed), this is absolutely fine. You could even just generate a long list of random bits, then form your Zobrist keys from blocks of 64 bits. As long as the bits are random, this is exactly as good as generating good random 64-bit integers directly.
Or will the mean and standard deviation be kept intact, more or less?
When you have a good set of random numbers, their mean and standard deviation will be around the numbers you got as the result of them being random. Or rather, as the result of them being uniformly distributed. The other way around this is not the case!! I can easily produce 32-bit numbers that have a mean of 0.5 (after dividing by 2^32) and the correct standard deviation, but that are absolutely not random. For example: 32768 + k * 65536, with k ranging from 0 to 65535. This set of numbers will give very bad hashing results.

Once you have generated your random numbers, the only reason to look at their mean and standard deviation is for a sanity check. If the mean is completely off, something is wrong with your random generator. But a more useful check is to verify that not all your numbers are even (I remember someone posting his Zobrist keys here that all turned out to be divisible by 1024 or so... that is a sign of a very bad random number generator).

A set of 100 numbers a1, a2, ...., a100 with a mean of exactly 1/2 is not random, because a100 is completely determined by a1, a2, ..., a99.
User avatar
Ajedrecista
Posts: 1971
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Zobrist Number Statistics and what to look for.

Post by Ajedrecista »

Hello Andrew:
munchkin wrote:I've been working on a function to create a list of good Zobrist hash numbers. (Actually, just the 'key seeds' the function is not mine.) I understand that the desire is to get a very good set of random numbers, with a mean of 0.5 and a standard deviation of 0.288675. (I'm not too sure exactly why this is, but....) I can generate a list that gives me:

Mean Value: 0.499998
Standard Deviation: 0.288679
I can answer your question of why the standard deviation must be around 0.288675: as the random numbers are in the range [0, 1] (maybe 0 and/or 1 are not in the interval), you want a uniform distribution.

Uniform distribution (continuous)

Uniform distribution (discrete)

If you have a uniform distribution between x = a and x = b (b > a) then:

Code: Select all

Mean = (a + b)/2
Variance = (b - a)²/12
Standard deviation = sqrt(variance)
In this case: a = 0 and b = 1:

Code: Select all

Mean = (0 + 1)/2 = 0.5
Standard deviation = (1 - 0)/sqrt(12) = 1/sqrt(12) ~ 0.288675
By the way, I think that you got very good results!

Regards from Spain.

Ajedrecista.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Zobrist Number Statistics and What to Look For

Post by diep »

munchkin wrote:I've been working on a function to create a list of good Zobrist hash numbers. (Actually, just the 'key seeds' the function is not mine.) I understand that the desire is to get a very good set of random numbers, with a mean of 0.5 and a standard deviation of 0.288675. (I'm not too sure exactly why this is, but....) I can generate a list that gives me:

Mean Value: 0.499998
Standard Deviation: 0.288679

I can't seem to get it any better than that. But what I did was create a list of 32-bit integers -- not 64-bits. I planned to just 'combine' them into 64-bit hash numbers. Will this affect the results of my hashing? Or will the mean and standard deviation be kept intact, more or less?

What else should I consider when trying to produce a good list of Zobrist numbers? I assume that you have to make sure that there are no identical numbers in the list, but are there any other considerations to keep in mind? Is there some kind of real-world test that you can perform to see how your Zobrist numbers will actually perform? Or can you be confident that your numbers will perform well if the statistics are good? (Mean and Standard Deviation.)
If you generate 64-bit random numbers, you know your problem has been solved. Generating 32-bit numbers, you know you risk a multilinear relationship. Still you post here that you generate 32-bit numbers.

Why are you posting it like this?

Empirically we can show that using 32-bit numbers in a sequential manner (2 adjacent numbers forming a 64-bit number, with the top 32 bits just a few generator cycles away from the low 32 bits) is not safe if the same SIMPLE generator produces them, if you are going to use the generated 64-bit number 'as is' and run it on a couple of dozen cores during a game; provided that you want 0 errors.

I know many authors who simply don't care about a bunch of errors, in which case I bet some 8-bit RNGs will do. If you care about getting 0 errors, then combining 32-bit numbers is a problem, even if the generator is a Mersenne Twister.

Of course in some cases you can deliver a simplistic proof of why this isn't safe.

Suppose we have a function R that is the few cycle operation that happens after each randomization.

So we then have 64-bit entries, consisting of two 32-bit halves, that fulfill:

{R(x), x}

And we're gonna XOR up to 32 numbers that we can pick ourselves.

So effectively your 64-bit hash key loses a bunch of bits compared to a fully semi-random 64-bit number.

This loss of a few bits you will notice in computer chess, provided that R(x) is a simple function. Of course it's always possible to design an R(x) that's strong enough, yet is that an interesting discussion here?

For some it will be - they just want to discuss.

In my case of course - I'm using 128-bit Zobrist keys in Diep, of which I effectively use around 80 bits in storage.

That might give you a clue why the idea of {R(x),x} is not a good one.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Zobrist Number Statistics and What to Look For

Post by bob »

syzygy wrote:
munchkin wrote:But what I did was create a list of 32-bit integers -- not 64-bits. I planned to just 'combine' them into 64-bit hash numbers. Will this affect the results of my hashing?
As long as the 32-bit integers are random (independent and uniformly distributed), this is absolutely fine. You could even just generate a long list of random bits, then form your Zobrist keys from blocks of 64 bits. As long as the bits are random, this is exactly as good as generating good random 64-bit integers directly.
Or will the mean and standard deviation be kept intact, more or less?
When you have a good set of random numbers, their mean and standard deviation will be around the numbers you got as the result of them being random. Or rather, as the result of them being uniformly distributed. The other way around this is not the case!! I can easily produce 32-bit numbers that have a mean of 0.5 (after dividing by 2^32) and the correct standard deviation, but that are absolutely not random. For example: 32768 + k * 65536, with k ranging from 0 to 65535. This set of numbers will give very bad hashing results.

Once you have generated your random numbers, the only reason to look at their mean and standard deviation is for a sanity check. If the mean is completely off, something is wrong with your random generator. But a more useful check is to verify that not all your numbers are even (I remember someone posting his Zobrist keys here that all turned out to be divisible by 1024 or so... that is a sign of a very bad random number generator).

A set of 100 numbers a1, a2, ...., a100 with a mean of exactly 1/2 is not random, because a100 is completely determined by a1, a2, ..., a99.
That's a beginner's flaw. The mean of 100 random numbers means absolutely nothing. Doesn't begin to prove the numbers are not random. Nor does it prove they are. In fact, out of a lot of generated sets of 100 numbers, you would expect some set to have a mean of exactly .5, just as you would expect one set to have a run of sequential values. Out of a shuffled deck of 52 cards, you had BETTER find a consecutive string of 5 every now and then, as the probability of a royal flush in poker is not 0.0...

Your logic fails badly, because that means the 100th number is NEVER random, since it will cause the rest of the group to average SOME value or another... If you pick a number out of thin air, such as .55, you can use the same argument, but it is still wrong... UNLESS you actually generate the numbers such that they average .5... Just because they do is meaningless, however.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Zobrist Number Statistics and What to Look For

Post by diep »

bob wrote:
syzygy wrote:
munchkin wrote:But what I did was create a list of 32-bit integers -- not 64-bits. I planned to just 'combine' them into 64-bit hash numbers. Will this affect the results of my hashing?
As long as the 32-bit integers are random (independent and uniformly distributed), this is absolutely fine. You could even just generate a long list of random bits, then form your Zobrist keys from blocks of 64 bits. As long as the bits are random, this is exactly as good as generating good random 64-bit integers directly.
Or will the mean and standard deviation be kept intact, more or less?
When you have a good set of random numbers, their mean and standard deviation will be around the numbers you got as the result of them being random. Or rather, as the result of them being uniformly distributed. The other way around this is not the case!! I can easily produce 32-bit numbers that have a mean of 0.5 (after dividing by 2^32) and the correct standard deviation, but that are absolutely not random. For example: 32768 + k * 65536, with k ranging from 0 to 65535. This set of numbers will give very bad hashing results.

Once you have generated your random numbers, the only reason to look at their mean and standard deviation is for a sanity check. If the mean is completely off, something is wrong with your random generator. But a more useful check is to verify that not all your numbers are even (I remember someone posting his Zobrist keys here that all turned out to be divisible by 1024 or so... that is a sign of a very bad random number generator).

A set of 100 numbers a1, a2, ...., a100 with a mean of exactly 1/2 is not random, because a100 is completely determined by a1, a2, ..., a99.
That's a beginner's flaw. The mean of 100 random numbers means absolutely nothing. Doesn't begin to prove the numbers are not random. Nor does it prove they are. In fact, out of a lot of generated sets of 100 numbers, you would expect some set to have a mean of exactly .5, just as you would expect one set to have a run of sequential values. Out of a shuffled deck of 52 cards, you had BETTER find a consecutive string of 5 every now and then, as the probability of a royal flush in poker is not 0.0...

Your logic fails badly, because that means the 100th number is NEVER random, since it will cause the rest of the group to average SOME value or another... If you pick a number out of thin air, such as .55, you can use the same argument, but it is still wrong... UNLESS you actually generate the numbers such that they average .5... Just because they do is meaningless, however.
Correct that the mean doesn't say much for Zobrist Hashing.

Very interesting, however, is the other way around. Suppose there exist intelligent other forms of life in the universe which are communicating in space somehow.

They wouldn't want early forms of life which haven't matured yet to obtain their knowledge, as that would mean such early lifeforms directly have weapons to kill each other. See how close Kennedy came to nearly wiping out the entire USA, Russia and Europe in a third world war - enough books on that; it's that the Russian response was appropriate - yet at sea they would have clean-swept the ocean, as became clear some years ago. Every single Soviet submarine commander had nuclear torpedoes with a small warhead - enough to sink anything.

A single accident between the ships or a misunderstanding of what was happening - and world war 3 would have started.

Obviously more intelligent life forms would not want to risk that their knowledge, in the hands of some individuals, for example from experiment Earth, would cause a new form of life to cease to exist and/or mutually wipe each other out. So they would encrypt their communication.

This is a likely scenario under that assumption. Again it's an assumption.

So then the next question is: is there a way to RECOGNIZE whether seemingly random noise in fact is just encrypted data?

One way to do that, IMHO, relies on the assumption that most semi-random number generators have a perfect spread. Too perfect.

If you start with a large array of bits, say sized 2^n, with n a tad bigger and 2^n / 8 being the largest block of memory you can allocate,
then you generate random numbers and fill up the memory block. When it has been perfectly filled, you calculate what time it took to fill it up.

Typically perfect spreads of semi random number generators fill up the block in O ( n log n ).

Pseudo C code to show it for full bytes (more efficient is to use 1 bit instead of 8 bits to store 1 bit of info, of course):

Code: Select all

done = 0;
didn = 0;
nentries = 1ULL << n;   /* 2^n */
for (i = 0; i < nentries; i++)
    buffer[i] = 0;

while (done < nentries) {
    didn++;
    r = getnextrand();
    indexnr = r & (nentries - 1);
    if (!buffer[indexnr]) {
        done++;
        buffer[indexnr] = 1;
    }
}

printf("Finished: needed %llu tries to fill array of 2 to the power %u\n", didn, n);
Now 'didn' should be smaller than or about equal to nentries * log(nentries), which we of course call O(n log n), to give an indication that our data might not be truly random but artificially created.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Zobrist Number Statistics and What to Look For

Post by rbarreira »

To repeat what others have stated - if you are deliberately affecting or selecting your numbers (e.g. in order to get the "mean" you want), they're no longer random.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Zobrist Number Statistics and What to Look For

Post by AlvaroBegue »

As others have said, what you want to use is random numbers. In order to generate a table of high-quality random numbers, what I have done in the past is generate several such tables by different methods, and then XOR them together. The resulting table will be at least as random as the most random of the ingredients.

Some ideas of where to get random numbers (in decreasing order of quality):
* /dev/random in Linux (you may have to type on your keyboard for a while for the OS to gain enough entropy)
* /dev/urandom in Linux (for this one you don't need to wait)
* rand() from the Linux C library
* Some chunk of a large compressed file
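The XOR-combination step itself is trivial. A sketch of mine (the table size and the three source tables are illustrative; each would come from one of the sources above):

```c
#include <stdint.h>

#define NKEYS 781  /* e.g. 12 pieces x 64 squares plus castling/ep/side */

/* XOR several independently generated tables together; the result is
 * at least as random as the most random ingredient. */
void combine_tables(uint64_t out[NKEYS], const uint64_t a[NKEYS],
                    const uint64_t b[NKEYS], const uint64_t c[NKEYS])
{
    for (int i = 0; i < NKEYS; i++)
        out[i] = a[i] ^ b[i] ^ c[i];
}
```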
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Zobrist Number Statistics and What to Look For

Post by wgarvin »

AlvaroBegue wrote: * Some chunk of a large compressed file
Be careful with this one. You need something like LZMA (7-Zip), which uses an arithmetic coder for the last step, because that effectively almost-randomizes the bits. Don't use anything that uses Huffman coding, because it might give you long runs of predictable bits [WinZip deflate, PDF, etc.]. An arithmetic coder with any dynamic model will give you effectively random bits, even if the model is not too great. Taking bits from a lossy .JPG file should be fine, but don't use a .PNG file, etc.


An easy way to generate good Zobrist keys is to grab a good PRNG like the Mersenne Twister and just use that. Don't ever use the built-in C library rand() stuff unless you only use like 12 bits from each call... many crappy rand() implementations exist out there. Something like the Mersenne Twister is much better. Any PRNG that can pass something like the Diehard tests is definitely good enough.
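As one concrete alternative to the Mersenne Twister (my example, not from the post): splitmix64, a tiny public-domain generator by Sebastiano Vigna that passes the usual statistical test batteries, is an easy drop-in. The table layout below is illustrative:

```c
#include <stdint.h>

/* splitmix64 (public domain, Vigna): a small, well-tested PRNG,
 * shown here as one easy way to fill a Zobrist table. */
static uint64_t sm_state = 0x9e3779b97f4a7c15ULL;  /* any seed works */

uint64_t splitmix64(void)
{
    uint64_t z = (sm_state += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

uint64_t zobrist[12][64];  /* piece type x square, illustrative layout */

void init_zobrist(void)
{
    for (int p = 0; p < 12; p++)
        for (int s = 0; s < 64; s++)
            zobrist[p][s] = splitmix64();
}
```

A nice property: splitmix64's output function is a bijection of its state, so distinct states always yield distinct outputs and the filled table contains no duplicate keys.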

AFAIK, nobody here has shown a way to generate a significantly better set of Zobrist keys than just a set of random or pseudo-random numbers. Optimizing them for specific properties (e.g. knowing that you will seldom have more than two black bishops on the board at a time, you might optimize all of the bishop keys to mix better with other pieces at a cost of being weaker with >3 bishops...) might be interesting but I don't think anyone has demonstrated a significant benefit from something like that.