Zobrist key random numbers

bob · Post by **bob** » Thu Jan 22, 2009 12:05 am

Has anyone done any sort of study (excluding the Wendroff/Warnock paper in the JICCA) on the random numbers you use for hashing?

I was looking for an assignment for a parallel programming course, and decided to use the random number generation of the Zobrist values as a relatively simple project. Crafty needs 788 numbers. 768 are used by the piece tables, 6 for each side, 64 squares per piece which is 12 * 64 or 768. I also have 16 EP target square random numbers that are used to alter the hash when an EP capture on a specific square is possible. Now up to 784. I then have 4 more for castling rights, since each side can castle to either side.

I first tested the basic numbers I get from the PRNG I use in Crafty, and discovered that the min hamming distance between any two numbers in that set was 14 bits which seemed low. I tested the numbers in two of the other programs I use for testing and found similar results. I then wrote a pretty brain-dead (but parallel) algorithm to start generating random numbers and replacing one of the original 788 if the new number increased the min hamming distance. I've had this running for about 2 hours and have raised the min hamming distance from 14 to 23. But at 23, the changes are becoming _very_ infrequent, and while each update improves the overall hamming distance average, pulling 23 up to 24 has not happened yet. I'm going to let this grind overnight on an 8-cpu machine to see if 24 is reachable.

My questions are:

(1) does anyone know how to compute the theoretical minimum hamming distance in a set of 788 64 bit numbers that can be produced? In my case, can this be raised to 24 or beyond? (I have the test set to stop when the PRNG starts to "cycle".

(2) has anyone tested their numbers and tried alternative generators to see if this can be improved on? I am currently using the existing Crafty PRNG, but am going to replace it with a 64 bit version from the internet when the current run goes long enough to convince me it is not going to improve things any further.

Once I get this done, I am going to release Crafty version 23.0, which will include these hard-coded numbers. The version number will change because this will break old opening books that used different RNs to produce the hash signatures stored in those books. Anyone can copy them if they want. But I am curious just how "good" these numbers can be...

hgm · Post by **hgm** » Thu Jan 22, 2009 12:40 am

I investigated the performance of Zobrist hashing schemes and never found any deviation from the expected number of collisions when using simple random generators (known to be very bad from other tests). Hamming distance seemed totally irrelevant: if I intentionally included some keys with very low Hamming distance (e.g. 1, 2, 4, 8) the collision rate did not increase at all.

I also measured the minimum 'distance' of positions that could collide (i.e. how many pieces would you have to change or displace), by systematically looking for small subsets of the keys that would XOR to zero. For a total set of a given size (e.g. 768) you can calculate how many combinations of 6, 7, 8 etc. keys you can make (e.g. for 6 keys this would be 768!(762!*6!) or B(768,6)), and how large the subset can be before you must have two that are equal (because the number of pairs is larger than the number of bit patterns of your key length). Then I calculated how many such collisions you had when you are well beyond the number where collisions are unavoidable for a number of actual random sets. (for 64-bit keys you cannot quite do that, so I did the tests for somewhat shorter key-lengths, by ignoring some bits in the comparison). It was always very close to the expected number.

My conclusion was that there is nothing to gain here. A poor RNG does not perform measurably worse than a perfect one, and minimum hamming distance did not correlate in any way with performance. The set of randoms that you need is so large that the smallest number of dependent keys is so much smaller than the key length, that small hamming distances do not make it any easier to construct a dependency that it already is in the limit of perfect randomness.

Dann Corbit · Post by **Dann Corbit** » Thu Jan 22, 2009 2:07 am

bob wrote:Has anyone done any sort of study (excluding the Wendroff/Warnock paper in the JICCA) on the random numbers you use for hashing?

I was looking for an assignment for a parallel programming course, and decided to use the random number generation of the Zobrist values as a relatively simple project. Crafty needs 788 numbers. 768 are used by the piece tables, 6 for each side, 64 squares per piece which is 12 * 64 or 768. I also have 16 EP target square random numbers that are used to alter the hash when an EP capture on a specific square is possible. Now up to 784. I then have 4 more for castling rights, since each side can castle to either side.

I first tested the basic numbers I get from the PRNG I use in Crafty, and discovered that the min hamming distance between any two numbers in that set was 14 bits which seemed low. I tested the numbers in two of the other programs I use for testing and found similar results. I then wrote a pretty brain-dead (but parallel) algorithm to start generating random numbers and replacing one of the original 788 if the new number increased the min hamming distance. I've had this running for about 2 hours and have raised the min hamming distance from 14 to 23. But at 23, the changes are becoming _very_ infrequent, and while each update improves the overall hamming distance average, pulling 23 up to 24 has not happened yet. I'm going to let this grind overnight on an 8-cpu machine to see if 24 is reachable.

My questions are:

(1) does anyone know how to compute the theoretical minimum hamming distance in a set of 788 64 bit numbers that can be produced? In my case, can this be raised to 24 or beyond? (I have the test set to stop when the PRNG starts to "cycle".

(2) has anyone tested their numbers and tried alternative generators to see if this can be improved on? I am currently using the existing Crafty PRNG, but am going to replace it with a 64 bit version from the internet when the current run goes long enough to convince me it is not going to improve things any further.

Once I get this done, I am going to release Crafty version 23.0, which will include these hard-coded numbers. The version number will change because this will break old opening books that used different RNs to produce the hash signatures stored in those books. Anyone can copy them if they want. But I am curious just how "good" these numbers can be...

Perhaps this program can be useful. You could easily do a search through the pairs until you find a good collection. If the value you get for minimum hamming distance across all pairs is not large enough then just use a bigger starting array. It seems to me that 40 should be easily achievable.

Code: Select all

/* 
   A C-program for MT19937-64 &#40;2004/9/29 version&#41;.
   Coded by Takuji Nishimura and Makoto Matsumoto.

   This is a 64-bit version of Mersenne Twister pseudorandom number
   generator.

   Before using, initialize the state by using init_genrand64&#40;seed&#41;  
   or init_by_array64&#40;init_key, key_length&#41;.

   Copyright &#40;C&#41; 2004, Makoto Matsumoto and Takuji Nishimura,
   All rights reserved.                          

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met&#58;

     1. Redistributions of source code must retain the above copyright
        notice, this list of conditions and the following disclaimer.

     2. Redistributions in binary form must reproduce the above copyright
        notice, this list of conditions and the following disclaimer in the
        documentation and/or other materials provided with the distribution.

     3. The names of its contributors may not be used to endorse or promote 
        products derived from this software without specific prior written 
        permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES &#40;INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION&#41; HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT &#40;INCLUDING
   NEGLIGENCE OR OTHERWISE&#41; ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

   References&#58;
   T. Nishimura, ``Tables of 64-bit Mersenne Twisters''
     ACM Transactions on Modeling and 
     Computer Simulation 10. &#40;2000&#41; 348--357.
   M. Matsumoto and T. Nishimura,
     ``Mersenne Twister&#58; a 623-dimensionally equidistributed
       uniform pseudorandom number generator''
     ACM Transactions on Modeling and 
     Computer Simulation 8. &#40;Jan. 1998&#41; 3--30.

   Any feedback is very welcome.
   http&#58;//www.math.hiroshima-u.ac.jp/~m-mat/MT/emt.html
   email&#58; m-mat @ math.sci.hiroshima-u.ac.jp &#40;remove spaces&#41;
*/


#include <stdlib.h>
#include <stdio.h>

#define NN 312
#define MM 156
#define MATRIX_A 0xB5026F5AA96619E9ULL
#define UM 0xFFFFFFFF80000000ULL /* Most significant 33 bits */
#define LM 0x7FFFFFFFULL /* Least significant 31 bits */


/* The array for the state vector */
static unsigned long long mt&#91;NN&#93;; 
/* mti==NN+1 means mt&#91;NN&#93; is not initialized */
static int mti=NN+1; 

/* initializes mt&#91;NN&#93; with a seed */
void init_genrand64&#40;unsigned long long seed&#41;
&#123;
    mt&#91;0&#93; = seed;
    for &#40;mti=1; mti<NN; mti++) 
        mt&#91;mti&#93; =  &#40;6364136223846793005ULL * &#40;mt&#91;mti-1&#93; ^ &#40;mt&#91;mti-1&#93; >> 62&#41;) + mti&#41;;
&#125;

/* initialize by an array with array-length */
/* init_key is the array for initializing keys */
/* key_length is its length */
void init_by_array64&#40;unsigned long long init_key&#91;&#93;,
		     unsigned long long key_length&#41;
&#123;
    unsigned long long i, j, k;
    init_genrand64&#40;19650218ULL&#41;;
    i=1; j=0;
    k = &#40;NN>key_length ? NN &#58; key_length&#41;;
    for (; k; k--) &#123;
        mt&#91;i&#93; = &#40;mt&#91;i&#93; ^ (&#40;mt&#91;i-1&#93; ^ &#40;mt&#91;i-1&#93; >> 62&#41;) * 3935559000370003845ULL&#41;)
          + init_key&#91;j&#93; + j; /* non linear */
        i++; j++;
        if &#40;i>=NN&#41; &#123; mt&#91;0&#93; = mt&#91;NN-1&#93;; i=1; &#125;
        if &#40;j>=key_length&#41; j=0;
    &#125;
    for &#40;k=NN-1; k; k--) &#123;
        mt&#91;i&#93; = &#40;mt&#91;i&#93; ^ (&#40;mt&#91;i-1&#93; ^ &#40;mt&#91;i-1&#93; >> 62&#41;) * 2862933555777941757ULL&#41;)
          - i; /* non linear */
        i++;
        if &#40;i>=NN&#41; &#123; mt&#91;0&#93; = mt&#91;NN-1&#93;; i=1; &#125;
    &#125;

    mt&#91;0&#93; = 1ULL << 63; /* MSB is 1; assuring non-zero initial array */ 
&#125;

/* generates a random number on &#91;0, 2^64-1&#93;-interval */
unsigned long long genrand64_int64&#40;void&#41;
&#123;
    int i;
    unsigned long long x;
    static unsigned long long mag01&#91;2&#93;=&#123;0ULL, MATRIX_A&#125;;

    if &#40;mti >= NN&#41; &#123; /* generate NN words at one time */

        /* if init_genrand64&#40;) has not been called, */
        /* a default initial seed is used     */
        if &#40;mti == NN+1&#41; 
            init_genrand64&#40;5489ULL&#41;; 

        for &#40;i=0;i<NN-MM;i++) &#123;
            x = &#40;mt&#91;i&#93;&UM&#41;|&#40;mt&#91;i+1&#93;&LM&#41;;
            mt&#91;i&#93; = mt&#91;i+MM&#93; ^ &#40;x>>1&#41; ^ mag01&#91;&#40;int&#41;&#40;x&1ULL&#41;&#93;;
        &#125;
        for (;i<NN-1;i++) &#123;
            x = &#40;mt&#91;i&#93;&UM&#41;|&#40;mt&#91;i+1&#93;&LM&#41;;
            mt&#91;i&#93; = mt&#91;i+&#40;MM-NN&#41;&#93; ^ &#40;x>>1&#41; ^ mag01&#91;&#40;int&#41;&#40;x&1ULL&#41;&#93;;
        &#125;
        x = &#40;mt&#91;NN-1&#93;&UM&#41;|&#40;mt&#91;0&#93;&LM&#41;;
        mt&#91;NN-1&#93; = mt&#91;MM-1&#93; ^ &#40;x>>1&#41; ^ mag01&#91;&#40;int&#41;&#40;x&1ULL&#41;&#93;;

        mti = 0;
    &#125;
  
    x = mt&#91;mti++&#93;;

    x ^= &#40;x >> 29&#41; & 0x5555555555555555ULL;
    x ^= &#40;x << 17&#41; & 0x71D67FFFEDA60000ULL;
    x ^= &#40;x << 37&#41; & 0xFFF7EEE000000000ULL;
    x ^= &#40;x >> 43&#41;;

    return x;
&#125;

/* generates a random number on &#91;0, 2^63-1&#93;-interval */
long long genrand64_int63&#40;void&#41;
&#123;
    return &#40;long long&#41;&#40;genrand64_int64&#40;) >> 1&#41;;
&#125;

/* generates a random number on &#91;0,1&#93;-real-interval */
double genrand64_real1&#40;void&#41;
&#123;
    return &#40;genrand64_int64&#40;) >> 11&#41; * &#40;1.0/9007199254740991.0&#41;;
&#125;

/* generates a random number on &#91;0,1&#41;-real-interval */
double genrand64_real2&#40;void&#41;
&#123;
    return &#40;genrand64_int64&#40;) >> 11&#41; * &#40;1.0/9007199254740992.0&#41;;
&#125;

/* generates a random number on &#40;0,1&#41;-real-interval */
double genrand64_real3&#40;void&#41;
&#123;
    return (&#40;genrand64_int64&#40;) >> 12&#41; + 0.5&#41; * &#40;1.0/4503599627370496.0&#41;;
&#125;

unsigned long long hamdist&#40;unsigned long long x, unsigned long long y&#41;
&#123;
    unsigned long long dist = 0,
                    val = x ^ y;
    while &#40;val&#41; &#123;
        ++dist;
        val &= val - 1;
    &#125;
    return dist;
&#125;

unsigned long long ullData&#91;1000000&#93;;
unsigned long long dist_distrib&#91;64&#93;;

int main&#40;void&#41;
&#123;
    int i;
    unsigned long long init&#91;4&#93;=&#123;0x12345ULL, 0x23456ULL, 0x34567ULL, 0x45678ULL&#125;, length=4;
    unsigned long long total = 0;
    unsigned long long hdmin = 64;
    unsigned long long hdmax = 0;
    init_by_array64&#40;init, length&#41;;
    for &#40;i=0; i<1000000; i++) &#123;
      ullData&#91;i&#93; = genrand64_int64&#40;);
    &#125;
   
    for &#40;i=0; i<&#40;1000000-1&#41;; i++) &#123;
    	int hd = hamdist&#40;ullData&#91;i&#93;, ullData&#91;i+1&#93;);
        dist_distrib&#91;hd&#93;++;
    	if &#40;hd < hdmin&#41; hdmin = hd;
    	if &#40;hd > hdmax&#41; hdmax = hd;
      total += hd;
    &#125;
    total /= &#40;1000000-1&#41;;
    printf&#40;"Average hamming distance = %llu, min = %llu max=%llu\n", total, hdmin, hdmax&#41;;
	for &#40;i = 0; i < 64; i++)
		if &#40;dist_distrib&#91;i&#93;) printf&#40;"%llu = %llu\n", i, dist_distrib&#91;i&#93;);
    return 0;
&#125;

/*
Average hamming distance = 32, min = 12 max=50
12 = 2
13 = 1
15 = 4
16 = 23
17 = 82
18 = 188
19 = 487
20 = 1066
21 = 2223
22 = 4277
23 = 7813
24 = 13587
25 = 21686
26 = 32457
27 = 46194
28 = 60881
29 = 74706
30 = 87989
31 = 96209
32 = 98930
33 = 96867
34 = 88449
35 = 75442
36 = 60821
37 = 45594
38 = 32511
39 = 21653
40 = 13716
41 = 7790
42 = 4218
43 = 2228
44 = 1109
45 = 471
46 = 195
47 = 85
48 = 35
49 = 8
50 = 2
*/

Dann Corbit · Post by **Dann Corbit** » Thu Jan 22, 2009 2:22 am

With 100x larger data set here are the stats:

Code: Select all

Average hamming distance = 32, min = 10 max=53
10 = 1
11 = 1
12 = 11
13 = 73
14 = 265
15 = 922
16 = 2723
17 = 7215
18 = 19540
19 = 47277
20 = 105986
21 = 222360
22 = 435355
23 = 795915
24 = 1357838
25 = 2176159
26 = 3263129
27 = 4590274
28 = 6064558
29 = 7528863
30 = 8780912
31 = 9631937
32 = 9931467
33 = 9636934
34 = 8783648
35 = 7526321
36 = 6065347
37 = 4589512
38 = 3261400
39 = 2175312
40 = 1360070
41 = 795468
42 = 435574
43 = 222908
44 = 106306
45 = 47323
46 = 19646
47 = 7510
48 = 2706
49 = 870
50 = 264
51 = 72
52 = 21
53 = 6

Dann Corbit · Post by **Dann Corbit** » Thu Jan 22, 2009 3:32 am

A little thought shows my initial guess of 40 to be impossible (not sure what the real limit is).

I will post a set of 26 bit min hamming distance tomorrow.

So far:

Code: Select all

7266447313870364031,
4946485549665804864,
16945909448695747420,
16394063075524226720,
4873882236456199058,
14877448043947020171,
13857871200353263164,
5249110015610582907,
1235879089597390050,
17320312680810499042,
8942268601720066061,
14226945236717732373,
9383926873555417063,
11510704754157191257,
6489677788245343319,
236502320419669032,
13670483975188204088,
8904234204977263924,
17251681303478610375,
13075804672185204371,
10831805955733617705,
1074659097419704618,
17119870085051257224,
10949279256701751503,
17618792803942051220,
957923366004347591,
1012818702180800310,
974125122787311712,
1861591264068118966,
3809841506498447207,
13408683141069553686,
13900005529547645957,
16475327524349230602,
13554353441017344755,
4738086532119935073,
6788940719869959076,
11670856244972073775,
2488756775360218862,
11016608897122070904,
13978444093099579683,
11628184459157386459,
4249175367970221090,
7306216312942796257,
2889379661594013754,
17310575136995821873,
3435082195390932486,
7444710627467609883,
11216615872596820107,
5770016989916553752,
1845293119018433391,
1259990987526807294,
15531278677382192198,
16317667579273275294,
9134927878875794005,
9030000260502624168,
3115849849651526279,
10637177623943913351,
16428252083632828910,
15722061259110909195,
1524477318846038424,
3443401252003315103,
9689406052675046032,
3609534220891499022,
9281161952002617309,
147649915388326849,
10131486500045494660,
8517521928922081563,
2836463788873877488,
9570032848237657586,
10863668391515109722,
1552803150609742001,
2790375502200557313,
9038024825497430573,
91888768090624853,
18145740993490813283,
12654149022202498759,
4206418103243872757,
7233523623977397383,
16215672393055652545,
10738887894178969425,
18204068107936511697,
14383128117587060157,
15849831596023436356,
17675390291979846750,
2977578850677699908,
4893323648315952120,
1286466090879300568,
9888911152046881084,
3183884142852054126,
13772224761560358678,
9094920381744576493,
16129742540997802980,
16945398958854163835,
17650197551112766205,
4852422776459831315,
9362357443732597760,
2374087830147132659,
15000861190071109560,
2902081674035190433,
15810933559568492050,
12678167969995957047,
1798389442866720091,
14599659047605763816,
17782026137486407602,
3027116614594121997,
6496187495111652573,
9859717582333494749,
5930885232297252456,
4061277839767617930,
12035499765197893420,
4370560942875640999,
5634383789002676726,
8907175724738102010,
16593287713411177161,
3325238609365118367,
17125462708636491154,
10188578131352743024,
15161949016576385831,
6657699133553403854,
2352226978179009098,
3859532089524191730,
17631674258545638758,
17581014988572303447,
7458819045178438285,
5488240155574034654,
7788457792396395633,
1234217808918573134,
10045923023026503261,
2803693151878023143,
15042951576086758915,
1809970995609003345,
6472832596573079460,
10531013421297635766,
9687689322966646953,
5309220766259687196,
16099895359080387221,
15463291357721078947,
14892243053368326929,
15913028219213787856,
17114854711077298999,
6881183452302138939,
9743703211409053724,
6120184892838179370,
4488025338083630045,
6932643716586102064,
7085729815153112724,
2577286319771781656,
3212578193372804795,
7140055923216537951,
11545784650358033961,
15097420044622846756,
9577429288174440901,
1651506457155992685,
17206783877956047023,
4977537220776361209,
16799175153209517804,
17623046094747753280,
16695848152965592420,
807005292018537624,
5694219642130963465,
17782774995076800672,
5025236171457631467,
11601588411391151733,
13800826249802546619,
8320715347883164621,
15765965395656039061,
9783656346475123313,
15660593116602060011,
12686638988178810220,
4601829680043422926,
13649469918759116343,
11072681934824066811,
3484285465118294452,
5250162757012130834,
11822278712571422607,
16820676274862823347,
2258483262623939360,
18059638503665417019,
4038533795659869059,
9217581272623145121,
10474525469047663535,
1704011107366220345,
15761773994985082820,
1396662462031085091,
11499399103501446733,
8253913244279442889,
18014518805502248392,
10766750626076003259,
13347551569433141972,
12285514714001968960,
12601751059746685848,
4088019966658540820,
7550841428494759028,
14457212971848820997,
16415384305117522006,
3274893893751255634,
10659638677637579954,
6338371175383690179,
18162230369769586150,
428321177825102899,
15817001710821818046,
12235635129931490851,
13527122555395203005,
2178128838908381672,
14367385536076681562,
12915933956577960132,
4652351249928028414,
4951391482976146214,
7296771005593252278,
14127467184305310300,
16956161371242220606,
2288730399829329044,
10224299563023357164,
15302392990651373659,
5732367270286632815,
7889985811968217208,
5521408468444455621,
3010954896914979971,
14717388125128732814,
17543622707664214218,
16699503790769821674,
5300069125018992847,
4491717662350224528,
6179887503359229642,
10444048407325898982,
12310419282952883445,
9615039512616912255,
8912906368760218223,
2736639230152280465,
1466229153547721039,
2954109638740141000,
11746874332443504878,
10637552291778782845,
17962194647643634986,
7190730870912935711,
15378933026600388229,
7415343666878271746,
4247889442676475641,
17929428741300739597,
6205303292541642307,
11123091740953949733,
784117286214737662,
11137709922435249100,
14725228923085766997,
10386402860586212951,

bob · Post by **bob** » Thu Jan 22, 2009 4:36 am

My code has been churning for 6 hours now. It has computed 788 numbers with a minimum hamming distance of 23. It has been stuck on 23 for several hours. I'm now testing with the 64 bit twister from the internet... It still occasionally finds a number that reduces the overall average, but it has not been able to raise the minimum in 4 hours or so now. I'm going to let it grind overnight, but suspect it won't do any better.

hgm · Post by **hgm** » Thu Jan 22, 2009 9:47 am

And now does this set give you a lower key-collission rate than the first set of randoms that you generated? Or was it a total waste of time?

Another point that is not addressed, and that potentially could give an improvement of the minimum collision distance is that the keys do not form a homogeneous group. Eg.g. 0xFFFFFFFF00000000, 0xFFFFFFFFFFFFFFFF and 0x00000000FFFFFFFF are 3 keys that have hamming distance 32, 32 and 64, and thus could easily be part of a larger set with minimum hamming distance 32, which you would consider "unacheivably good". Nevertheless, the set totally sucks, as the three keys are dependent, (they XOR to zero). With 12-bit keys 0xFF0, 0xF0F and 0x0FF all have hamming distance 8, which is pretty good for 12-bit keys, and they also all have distance 8 to 0x000. Nevertheless they are dependent, and would spoil any set that containst them.

In fact, any set of keys with a large minimal hamming distance can be transformed into a set with exactly the same hamming distance between any pair of keys, by XORing all keys with the same constant. But if I take that constant to be the XOR of 3 arbitrary keys from the set, those 3 keys in the transformed set will be dependent!

But if you put these 3 keys all in the subset of 64 for the white King keys, say at f2, g2, and h2, the dependency would not hurt at all. Collisions would occur between positions that have a single King on f2, and otherwise identical positions that had two white Kings on g2 and h2. Who cares? OTOH, if the 3 offending keys were all in the white Pawn sub-set, you could not see the difference between having a single Pawn at f2, or two Pawns at g2 & h2. This is a collision that is extremely hurtful for an engine.

So if you want to improve collission rate, forget about hamming distance, and look for small sub-sets of dependent keys. (An efficient way to do this is exhaustively generate all XORs of N keys, and put the resulting XORs in a hash table. When two sets of N XOR to the same value, you have detected a dependent sub-set of 2N keys. To also detect odd-membered subsets, also put the XORs of N-1 keys in the hash table.)

If the hash key is too long to get any collisions for the N that you can manage considering available time and memory, just look at a shorter part of the key. On a small laptop N=3 is typically done in a minute or so; for a set of 768 there are 768*767*766/6 = 75M = 2^26 different XOR products, or 2^51 pairs. So for a key shorter 51 bits there will almost certainly be some collisions, revealing dependent sub-sets of 6 keys. Similarly, with N=4 (screening all 8-key sub-sets) you will generate 2^66.5 pairs, each having a 2^-64 probability to collide with a 64-bit key. So on the average you will have 2^2.5 = 6 collissions. But you can easily 'hide' those collisions by assigning the keys in the dependent subsets all to the same piece.

bob · Post by **bob** » Thu Jan 22, 2009 5:08 pm

hgm wrote:And now does this set give you a lower key-collission rate than the first set of randoms that you generated? Or was it a total waste of time?

Don't know yet. First step was to find the best set according to the minimum hamming distance criterion that I could find. Next step is to test old and new to see what happens with regard to collisions. My intent is to use the "perft" code to enumerate a tree saving each hash signature along the way and looking for duplicate signatures with different actual positions.

More later...

Another point that is not addressed, and that potentially could give an improvement of the minimum collision distance is that the keys do not form a homogeneous group. Eg.g. 0xFFFFFFFF00000000, 0xFFFFFFFFFFFFFFFF and 0x00000000FFFFFFFF are 3 keys that have hamming distance 32, 32 and 64, and thus could easily be part of a larger set with minimum hamming distance 32, which you would consider "unacheivably good". Nevertheless, the set totally sucks, as the three keys are dependent, (they XOR to zero). With 12-bit keys 0xFF0, 0xF0F and 0x0FF all have hamming distance 8, which is pretty good for 12-bit keys, and they also all have distance 8 to 0x000. Nevertheless they are dependent, and would spoil any set that containst them.

Actually that is not possible, and the "why" is trivial to see. You have 785 _other_ numbers that have to have 32 different bits from all 3 of the above keys, which is not possible at all...

In fact, any set of keys with a large minimal hamming distance can be transformed into a set with exactly the same hamming distance between any pair of keys, by XORing all keys with the same constant. But if I take that constant to be the XOR of 3 arbitrary keys from the set, those 3 keys in the transformed set will be dependent!

But if you put these 3 keys all in the subset of 64 for the white King keys, say at f2, g2, and h2, the dependency would not hurt at all. Collisions would occur between positions that have a single King on f2, and otherwise identical positions that had two white Kings on g2 and h2. Who cares? OTOH, if the 3 offending keys were all in the white Pawn sub-set, you could not see the difference between having a single Pawn at f2, or two Pawns at g2 & h2. This is a collision that is extremely hurtful for an engine.

So if you want to improve collission rate, forget about hamming distance, and look for small sub-sets of dependent keys. (An efficient way to do this is exhaustively generate all XORs of N keys, and put the resulting XORs in a hash table. When two sets of N XOR to the same value, you have detected a dependent sub-set of 2N keys. To also detect odd-membered subsets, also put the XORs of N-1 keys in the hash table.)

If the hash key is too long to get any collisions for the N that you can manage considering available time and memory, just look at a shorter part of the key. On a small laptop N=3 is typically done in a minute or so; for a set of 768 there are 768*767*766/6 = 75M = 2^26 different XOR products, or 2^51 pairs. So for a key shorter 51 bits there will almost certainly be some collisions, revealing dependent sub-sets of 6 keys. Similarly, with N=4 (screening all 8-key sub-sets) you will generate 2^66.5 pairs, each having a 2^-64 probability to collide with a 64-bit key. So on the average you will have 2^2.5 = 6 collissions. But you can easily 'hide' those collisions by assigning the keys in the dependent subsets all to the same piece.

I am not worried about being perfect. Wendroff already addressed that and it is impossible with 64 bits for obvious reasons. My first cut, out of a natural curiosity, was to find the set of 788 numbers with the largest min hamming distance between any pair from the set. My next approach is to look at the piece tables more closely. I'd like to have min hamming distance be between pairs in different piece boards, but within a piece table, I would prefer to have large min hamming distances between squares that can be reached quickly.

Probably the most trivial case is a light-squared bishop. Only 32 random numbers are important and there is no need to even compute the hamming distances between light squares and dark squares. For kings the closer the two squares are physically, the more important their min hamming distance is. Etc...

hgm · Post by **hgm** » Thu Jan 22, 2009 5:51 pm

bob wrote:
hgm wrote:... With 12-bit keys 0xFF0, 0xF0F and 0x0FF all have hamming distance 8, which is pretty good for 12-bit keys, and they also all have distance 8 to 0x000. Nevertheless they are dependent, and would spoil any set that containst them.
Actually that is not possible, and the "why" is trivial to see. You have 785 _other_ numbers that have to have 32 different bits from all 3 of the above keys, which is not possible at all...

I don't understand what you are saying here. What exactly do you consider to be impossible? It can't be that the keys I gave are dependent (because they obviously are). And if three keys are dependent, thety do spoil the entire set, no matter how large the set is. But a 12-bit example can of course not be projected to a real chess board needing 788 keys.

So to not discuss too much in abstracto, let me revise the example to 64-bit:

Having keys

Code: Select all

k1 = 0xFFFFFFFFFFC00000
k2 = 0xFFFFF800003FFFFF
k3 = 0x000007FFFFFFFFFF

In terms of hamming distance, we have

d(k1,k2) = 42
d(k2,k3) = 42
d(k1,k3) = 43
d(k1,0) = 42
d(k2,0) = 43
d(k3,0) = 43

Furthermore, each of the keys would have a hamming distance of on the average 32 to any random 64-bit number drawn from a homogeneous distribution. So in terms of hamming distance, they don't spoil anything, and are in fact very likely to drive up the average, despite the effort you take to maximize the distance within the remaining 785 keys.

But they would make the set totally useless if they were used such that they could appear together in a key, such as Pawn keys for f2, g2, h2. No matter what the other 785 keys are, positions with a full Pawn shield for the King on g1 would be constantly confused with those where the king is totally bare, all other pieces being positioned identically.

Zach Wegner · Post by **Zach Wegner** » Thu Jan 22, 2009 6:30 pm

Random idea:

zobrist[piece][square] = rotate_left(deBruijn[piece], square);

Zobrist key random numbers

Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers

Re: Zobrist key random numbers