Zobrist key independence

Rebel · Post by **Rebel** » Wed Feb 26, 2020 4:48 pm

Just for the record, what's your definition of a bucket?

I use 4 slots, I guess that's not the same.

In the example I used a HT of 128 Mb.

hgm · Post by **hgm** » Wed Feb 26, 2020 5:56 pm

With 'bucket' I mean the group of entries a given key could end up in. But on second thought I think it is really the number of entries that matters: if you have larger buckets, fewer of those will fit in the table. So the number of bits used to select the bucket will be smaller, which leaves more bits in the stored key that are not guaranteed to be the same (and thus redundant). That reduces the probability of an accidental match, but then this is completely compensated by the fact that you probe multiple entries.

So with 128MB hash (and I assume 16-byte entries) you have 8M =2^23 entries. With 48-bit key you would then expect 1 collission in 2^25 = 32M probes. You see 1 in ~200M, which is lower. But 1 is not really a statistically significant number; it could just be that you were lucky.

chrisw · Post by **chrisw** » Wed Feb 26, 2020 6:19 pm

hgm wrote: ↑Wed Feb 26, 2020 5:56 pm With 'bucket' I mean the group of entries a given key could end up in. But on second thought I think it is really the number of entries that matters: if you have larger buckets, fewer of those will fit in the table. So the number of bits used to select the bucket will be smaller, which leaves more bits in the stored key that are not guaranteed to be the same (and thus redundant). That reduces the probability of an accidental match, but then this is completely compensated by the fact that you probe multiple entries.

So with 128MB hash (and I assume 16-byte entries) you have 8M =2^23 entries. With 48-bit key you would then expect 1 collission in 2^25 = 32M probes. You see 1 in ~200M, which is lower. But 1 is not really a statistically significant number; it could just be that you were lucky.

Deliberately stupid question: what is the point of buckets? Is there an objective other than trying to find the least consequent slot to overwrite into?
Second stupid question: how do we define least consequent?

mar · Post by **mar** » Wed Feb 26, 2020 6:49 pm

The point of buckets is really simple: CPU cache. cache line size is 64 bytes on x86/x64 so with 16 bytes/entry you have 4 entries per bucket.
(assuming your TT is cacheline-aligned)
You can also use various replacement schemes for various entries (as has been suggested by hgm in another thread), so that would be another benefit.

Rebel · Post by **Rebel** » Wed Feb 26, 2020 7:41 pm

hgm wrote: ↑Wed Feb 26, 2020 5:56 pm With 'bucket' I mean the group of entries a given key could end up in. But on second thought I think it is really the number of entries that matters: if you have larger buckets, fewer of those will fit in the table. So the number of bits used to select the bucket will be smaller, which leaves more bits in the stored key that are not guaranteed to be the same (and thus redundant). That reduces the probability of an accidental match, but then this is completely compensated by the fact that you probe multiple entries.

So with 128MB hash (and I assume 16-byte entries) you have 8M =2^23 entries. With 48-bit key you would then expect 1 collission in 2^25 = 32M probes. You see 1 in ~200M, which is lower. But 1 is not really a statistically significant number; it could just be that you were lucky.

My bucket (or slots) is 4, (remaining) depth replacement scheme to keep the best entries and yes 16 byte entries. If we take the previous posted results and what you called "probes" (call the HT) are not the same as found keys. Only on found keys a hash collision can occur.

Code: Select all

POS       Probes       Found Keys       Collisions
Pos 1   21.558.717   2.855.652 (13%)         0
Pos 2   50.555.311   5.499.316 (10%)         0
Pos 3   58.852.315   7.175.444 (12%)         0
Pos 4   75.216.172  14.076.588 (18%)         1

bob · Post by **bob** » Thu Feb 27, 2020 1:38 am

Here's my approach:

Code: Select all

/*
 ************************************************************
 *                                                          *
 *  Now we search for an entry to overwrite in three        *
 *  passes.  A bucket contains 4 entries in Crafty.  All    *
 *  accessed by the low order bits of the current hash      *
 *  signature.                                              *
 *                                                          *
 *  Pass 1:  If any signature in the bucket matches the     *
 *    current signature, we are going to overwrite this     *
 *    entry, period.  It might seem worthwhile to check the *
 *    draft and not overwrite if the table draft is greater *
 *    than the current remaining depth, but after you think *
 *    about it, this is a bad idea.  If the draft is        *
 *    greater than or equal the current remaining depth,    *
 *    then we should never get here unless the stored bound *
 *    or score is unusable because of the current alpha/    *
 *    beta window.  So we are overwriting to avoid losing   *
 *    the current result.                                   *
 *                                                          *
 *  Pass 2:  If any of the entries come from a previous     *
 *    search (not iteration) then we choose the entry from  *
 *    this set that has the smallest draft, since it is the *
 *    least potentially usable result.                      *
 *                                                          *
 *  Pass 3:  If neither of the above two found an entry to  *
 *    overwrite, we simply choose the entry from the bucket *
 *    with the smallest draft and overwrite that.           *
 *                                                          *
 ************************************************************
 */

Rebel · Post by **Rebel** » Thu Feb 27, 2020 11:30 am

Time for some volume testing, 500 positions from the STS test.
First 100 at 30 seconds
Second 100 at 1 minute
Third 100 at 2:30
Fourth 100 at 5:00
Last 100 at 10:00

First 100 below.
Probes means number of HT lookup's
Found keys, those match the position in the HT
Coll, collisions
TP, in case the collision is a transposition and thus has the potential to cause harm.

First 100 at 30 seconds per move. 20 collisions of which 13 transpositions.

Code: Select all

       Probes          Found Keys     Coll   TP
     15.708.253      2.451.861 (15%)     0    0
     17.786.894      2.877.969 (16%)     2    2
     17.002.583      2.261.897 (13%)     0    0
     19.172.410      3.698.957 (19%)     0    0
     22.055.680      5.569.780 (25%)     0    0
     20.247.514      4.618.226 (22%)     0    0
     18.648.606      2.582.235 (13%)     0    0
     17.990.221      2.473.495 (13%)     0    0
     16.944.258      2.400.943 (14%)     0    0
     19.569.871      2.525.649 (12%)     0    0
     20.723.537      3.875.531 (18%)     0    0
     22.114.911      3.234.268 (14%)     0    0
     18.119.229      2.656.125 (14%)     0    0
     16.401.223      1.979.946 (12%)     0    0
     20.347.874      4.546.725 (22%)     0    0
     16.588.564      2.736.373 (16%)     1    1
     22.080.157      3.952.163 (17%)     0    0
     23.742.664      3.536.947 (14%)     0    0
     22.886.758      5.946.657 (25%)     0    0
     17.502.757      2.770.054 (15%)     0    0
     18.461.098      2.436.395 (13%)     0    0
     25.265.608      5.419.638 (21%)     0    0
     21.225.435      4.139.089 (19%)     0    0
     23.365.571      4.130.027 (17%)     1    1
     16.891.814      2.330.492 (13%)     1    0
     16.887.244      3.371.409 (19%)     0    0
     22.892.556      3.732.848 (16%)     0    0
     17.485.622      2.802.948 (16%)     0    0
     20.112.662      4.170.295 (20%)     0    0
     16.842.545      2.465.090 (14%)     0    0
     18.230.852      2.374.021 (13%)     0    0
     18.497.973      2.882.729 (15%)     0    0
     16.059.760      2.880.054 (17%)     0    0
     22.871.140      5.119.680 (22%)     0    0
     19.977.250      3.445.492 (17%)     0    0
     17.857.349      2.440.920 (13%)     0    0
     16.533.635      2.107.979 (12%)     0    0
     20.022.200      3.781.403 (18%)     0    0
     17.431.843      3.131.102 (17%)     1    0
     20.011.543      3.114.406 (15%)     0    0
     20.558.669      2.996.039 (14%)     2    1
     18.953.070      2.702.890 (14%)     0    0
     19.760.054      3.483.652 (17%)     0    0
     17.966.764      3.480.242 (19%)     0    0
     22.691.710      4.824.762 (21%)     0    0
     26.541.331      7.162.938 (26%)     0    0
     19.977.882      3.426.015 (17%)     0    0
     22.408.617      4.786.573 (21%)     0    0
     17.197.878      2.237.960 (13%)     2    1
     22.331.088      4.235.397 (18%)     0    0
     24.123.290      5.434.076 (22%)     0    0
     21.218.709      4.605.075 (21%)     1    1
     20.950.233      3.702.643 (17%)     0    0
     20.562.674      3.957.659 (19%)     0    0
     20.411.578      3.633.875 (17%)     0    0
     23.749.954      3.530.708 (14%)     1    1
     17.559.989      2.460.171 (14%)     0    0
     19.064.927      3.707.974 (19%)     0    0
     19.573.506      3.200.948 (16%)     0    0
     26.125.677      6.990.136 (26%)     0    0
     18.523.700      2.695.777 (14%)     0    0
     20.856.950      3.586.884 (17%)     1    0
     25.300.452      4.611.175 (18%)     1    0
     17.303.255      3.005.826 (17%)     0    0
     19.873.124      3.232.224 (16%)     0    0
     19.293.430      3.536.106 (18%)     0    0
     19.064.185      3.216.010 (16%)     0    0
     23.277.802      3.957.564 (17%)     0    0
     24.193.241      6.010.303 (24%)     0    0
     21.114.189      4.071.833 (19%)     0    0
     20.741.350      3.572.927 (17%)     1    1
     19.379.385      3.337.443 (17%)     0    0
     22.180.720      6.150.667 (27%)     0    0
     21.624.838      4.207.473 (19%)     0    0
     21.967.179      4.853.402 (22%)     0    0
     18.432.731      3.114.884 (16%)     1    0
     19.250.583      4.254.427 (22%)     0    0
     20.556.278      3.714.631 (18%)     1    1
     20.562.305      4.698.648 (22%)     0    0
     20.302.775      4.374.173 (21%)     0    0
     23.071.204      3.617.612 (15%)     0    0
     17.106.654      2.446.673 (14%)     0    0
     17.141.058      2.467.106 (14%)     0    0
     17.960.827      2.886.650 (16%)     0    0
     17.992.248      2.712.582 (15%)     0    0
     19.187.041      3.450.448 (17%)     0    0
     21.903.327      3.814.130 (17%)     0    0
     18.028.073      2.143.395 (11%)     0    0
     19.528.982      2.875.116 (14%)     0    0
     19.122.584      3.753.892 (19%)     0    0
     18.423.714      3.294.996 (17%)     0    0
     17.505.090      2.708.692 (15%)     0    0
     18.504.489      2.730.105 (14%)     0    0
     17.717.061      2.254.488 (12%)     0    0
     18.278.367      3.337.586 (18%)     0    0
     18.451.235      2.264.715 (12%)     0    0
     23.776.056      4.925.944 (20%)     2    2
     18.417.504      2.675.981 (14%)     0    0
     18.569.066      3.303.984 (17%)     0    0
     23.398.836      4.576.620 (19%)     1    1

PS, HGM, if you (as OP) think I am spoiling your thread I will post follow ups in the other thread, you know that thread which started as hash collisions but changed into human collisions

hgm · Post by **hgm** » Thu Feb 27, 2020 2:00 pm

Well, hash collisions are sort of on topic. Replacement schemes, although very interesting in themselves, are not.

What do you mean, by the collision being a 'transposition'?

If I make a very rough estimate, you searched about 20M nodes per position, x100 positions = 2G nodes, and only 20 collisions, so 1 in 100M. That is still lower than the 1 in 32M that I predicted from your key length / TT size. Part of the explanation could be that you use separate positions. Initially there would be nothing related to the position in the TT, (you might even have cleared it), and it takes time to fill up. While the table is partly empty there obviously is less you could collide with. The node count of individual searches (~20M) is larger than your TT size (8M), but some of these nodes will be transpositions. So it could be that the TT is not even fully filled at the end of the search, so that on average probes are colliding with a less than half-filled TT, reducing the number of collisions correspondingly.

Although the techniques I discussed reduce the number of required basis keys for which you care about dependence from 749 to 213, this is not nearly enough reduction to make a set of 64-bit keys collision free. For a Zobrist scheme with purely random keys, very high-order dependencies (of which there will always be many more than of the lower-order dependencies) will always cause collisions between positions with so many differences that they will never occur in the same tree. But if the basis keys represent bits in the proto-keys for pair or triplet of identical pieces, XORing many basis keys from that same group makes the position not much more different than what it was than when you just XOR in a single basis key. For a pair of Rooks the best you can hope for is two Rooks now in a different location, each of which they can reach (on an empty board) in at most 2 moves. So you will never get something that is removed by more than 4 moves, no matter how many proto-key bits differ in the colliding positions.

So having only very-high-order dependency in the basis key (which due to the reduction in the required number you are now able to do) is not necessary helpful. What you really care about is that a dependencies affects as many different pairs/triplets as possible. A third-order dependency between three basis keys that affects three pairs, will typically cause a collission with a position that has 6 pieces moved to a different location that they could not have reached in a single move, i.e. be at least 12 ply removed (and 24 ply if they were all of the same color). While a 6th-order dependency between basis keys for 2 pairs would only move 4 pieces around.

So I am now toying with this vague idea to construct the basis key so that they have intentionally low-order dependencies touching many piece types, in the hope this would make it easier to exclude high-order dependencies that involve just a few piece types. One idea is this: I pick completely independent basis keys for two piece types. Say we use pair encoding, each pair needing 13 basis keys. We can use keys 1<<N, (N<39) as 39 fully independent keys, which we could use to construct Zobrist keys for 3 pairs. Call the sets for these 3 pairs {A, B, C, ...}, {a, b, c, ...} and {1, 2, 3, ...}. Now for the 4th pair we could use basis keys {Aa1, Bb2, Cc3, ...}. They would all be 4th-order dependent with the sets for the first three, but all 4 involved keys would always be in a different group. So to make a collision through this dependence, 4 pairs would have to be messed up. For the 5th group we could use {Ab3, Bc4, Cd5, ...}. This would also have such 'maximally distributed' 4th-order dependencies with the first 3 sets, and with most of the 4th set, as making, (say) A disappear from Ab3 by multiplying with Aa1 would leave us with ab13, which still needs keys from set 2 and 3 to get 0. Except that multiplying all keys of set 4 and 5 together would produce 0. I wonder if there is a solution to that, and how many sets could be constructed this way from just the 39 fully-independent keys. Eight sets would be enough to handle all NBRQ pairs...

Rebel · Post by **Rebel** » Thu Feb 27, 2020 5:11 pm

hgm wrote: ↑Thu Feb 27, 2020 2:00 pm Well, hash collisions are sort of on topic. Replacement schemes, although very interesting in themselves, are not.

I think it's best to start a new thread (I just did) because your try is more important.

hgm wrote: ↑Thu Feb 27, 2020 2:00 pm What do you mean, by the collision being a 'transposition'?

A HT match serves 2 purposes, to get the best move and if the tree can be cut off, a transposition. When a key gives a collision which is not a transposition there is no harm, just a bad move ordering.

Zobrist key independence

Re: Zobrist key independence

Re: Zobrist key independence

Re: Zobrist key independence

Re: Zobrist key independence

Re: Zobrist key independence

Re: Zobrist key independence

Re: Zobrist key independence

Re: Zobrist key independence

Re: Zobrist key independence