Measuring Hash Collisions (once again)

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Alayan
Posts: 550
Joined: Tue Nov 19, 2019 8:48 pm
Full name: Alayan Feh

Re: Measuring Hash Collisions (once again)

Post by Alayan »

hgm wrote: Sat Feb 29, 2020 9:20 am Note that I am not convinced by the effect of 7-men EGT in Stockfish. There should not be any lines of chessic relevance from the opening position that reach the 7-men stage, meaning that the scores obtained in these probes should never be able to propagate to the root. It sounds more like an accidental effect of replacing different TT entries because you don't search exactly the same tree, similar to what you would have if you run with a TT of a different size, or using a different set of Zobrist keys. Or that EGT probing slightly slows down the NPS. If root moves are so close in score that the choice between them is basically random, such insignificant things can affect the choice. (And Elo-wise you would not care at all.)
It's indeed insignificant; the point was that with close moves, tiny changes far down in the tree can sometimes affect the move choice indirectly. But that's because the moves are extremely close in the first place, and I agree that Elo-wise, changing the choice between two equal moves is irrelevant.
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Measuring Hash Collisions (once again)

Post by petero2 »

hgm wrote: Sat Feb 29, 2020 3:03 pm That pretty much confirms that 16 (non-redundant) bits in the signature is already good enough from an Elo perspective. And with full verification of the hash move, Elo would be the only concern.
petero2 wrote: Sat Feb 29, 2020 1:22 pm Some of the bits in the Zobrist key are used to determine which bucket a position belongs to. Each entry also stores the whole 64 bit key, even though some of the bits are redundant since they were used to determine the bucket and thus are the same for all positions belonging to that bucket.
So XOR the signature (or index) with the incremental eval, to make those bits non-redundant!
That would indeed reduce the number of redundant bits, but it would likely not completely eliminate them, depending on your engine.

I added code to Texel to estimate how many bits of redundancy this would remove, by logging for each evaluation score how many times that score occurred during search, then computing the entropy of the distribution of scores. I got the following results by searching for about a minute from the starting position:

Code: Select all

Material only score               : ~  3 bits (*)
PST score only                    : ~  8 bits
Material + PST                    : ~ 10 bits
PST_MG + PST_EG                   : ~ 16 bits
(MTRL+PST)_MG + (MTRL+PST)_EG     : ~ 17.5 bits
Separate PST score per piece type : almost as good as 64 bit Zobrist hash
If the engine only keeps track of the total material score incrementally, you would only gain about 3 bits.
If the engine keeps track of the total piece square table (PST) score incrementally, you would gain about 8 bits.
If the engine incrementally keeps track of both middle game and end game PST scores, you would gain about 16 bits.

(*) For more tactical positions, the material score would work better though. For WAC71 I got about 6.3 bits.
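
For illustration, a minimal sketch of the kind of logging and entropy computation described above (this is not Texel's actual code; the class and names are made up):

Code: Select all

#include <cmath>
#include <cstdint>
#include <unordered_map>

// Hypothetical logger (not Texel's actual code): call tally() with the
// incremental eval score of every node searched, then read off the entropy.
class ScoreEntropy {
public:
    void tally(int score) { ++counts[score]; ++total; }

    // Shannon entropy H = -sum p_i * log2(p_i) of the observed score
    // distribution, i.e. roughly how many extra signature bits XORing
    // the score into the key or index would buy.
    double bits() const {
        double h = 0.0;
        for (const auto& kv : counts) {
            double p = static_cast<double>(kv.second) / total;
            h -= p * std::log2(p);
        }
        return h;
    }

private:
    std::unordered_map<int, std::uint64_t> counts;
    std::uint64_t total = 0;
};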

In Texel I actually keep track of PST scores for each piece type, since I use different MG/EG interpolation factors for different piece types. If I combine all PST scores using a decent mixing function, I get a hash value that is almost as good as a 64 bit Zobrist hash value. I think computing this value would be at least as expensive as computing an additional 64 bit Zobrist hash value though, so it is probably not useful in practice.
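
To make that concrete, here is a rough sketch of such a combination; the mixing function is a generic splitmix64-style finalizer, not what Texel uses, and the array layout is made up:

Code: Select all

#include <cstdint>

// Generic 64-bit mixing step (a splitmix64-style finalizer), not Texel's.
static std::uint64_t mix64(std::uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Hypothetical: fold one incremental PST score per piece type (P,N,B,R,Q,K)
// into a single 64-bit value that could replace or extend a Zobrist key.
std::uint64_t combinePstScores(const std::int32_t pstScore[6]) {
    std::uint64_t h = 0x9e3779b97f4a7c15ULL;  // arbitrary non-zero seed
    for (int pt = 0; pt < 6; ++pt)
        h = mix64(h ^ static_cast<std::uint32_t>(pstScore[pt])   // score in low half
                    ^ (static_cast<std::uint64_t>(pt) << 32));   // piece-type tag up high
    return h;
}
Recomputing this costs two multiplies per piece type, a dozen in total, which is indeed comparable to simply maintaining a second 64-bit Zobrist key incrementally.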

I think this trick would work better in an engine that has been automatically tuned, since that tends to add a small amount of "noise" to the piece square tables.

For an engine using 64-bit Zobrist keys, I don't think this trick is worthwhile, since even if you use a huge 256GB hash table (2^32 buckets) you would still have 32 non-redundant bits to store in the hash entries, which according to my previous test is more than enough not to cause any Elo loss.

I think this trick would be useful for a 32 bit engine using 32 bit Zobrist keys though, especially if the incremental eval includes both base piece values and PST corrections.
User avatar
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Measuring Hash Collisions (once again)

Post by hgm »

The incremental evaluation in KingSlayer is a 32-bit quantity that encodes both the end-game and the opening score for PST plus material (so that a multiply by the game phase followed by a right-shift can do the linear interpolation). The PSTs used for the incremental update likewise contain combined values. (The upper bits contain the difference of the opening and end-game values, so that the piece base value mostly cancels there and fewer bits are needed for it.)
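
For readers unfamiliar with packed scores, a generic sketch of the idea follows; this is not KingSlayer's exact bit layout, and the interpolation here unpacks both halves instead of using the single multiply-and-shift described above:

Code: Select all

#include <cstdint>

// Generic packed middle-game/end-game score: MG in the upper 16 bits,
// EG in the lower 16 bits of one 32-bit integer.
using Score = std::int32_t;

inline Score makeScore(int mg, int eg) {
    return static_cast<Score>(static_cast<std::uint32_t>(mg) << 16) + eg;
}
inline int egValue(Score s) {
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(s));
}
inline int mgValue(Score s) {
    // add half of the low range first so a negative EG half rounds correctly
    return static_cast<std::int16_t>(
        static_cast<std::uint16_t>((static_cast<std::uint32_t>(s) + 0x8000) >> 16));
}

// Linear interpolation by game phase, 0 = pure end game .. 256 = opening.
inline int interpolate(Score s, int phase) {
    return (mgValue(s) * phase + egValue(s) * (256 - phase)) / 256;
}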

So XORing this into the index should give me about 16 extra bits, at the expense of only a single XOR instruction. I also gain an extra bit by XORing the stm key only into the index. (I also do that for the e.p. status, but the entropy of that should be close to 0.) This doesn't cost anything extra, as I would have to XOR the stm into the key somehow anyway, and then duplicate the key to derive the index from it. It doesn't matter (speed-wise) whether I do the XOR before or after the duplication.
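
A minimal sketch of that index derivation (names invented; the real KingSlayer code surely differs in detail):

Code: Select all

#include <cstdint>

// Hypothetical probe address: the 32-bit incremental eval and the
// side-to-move key are XORed into the index only, so their entropy comes
// on top of the full Zobrist key that is stored and verified as signature.
std::uint64_t ttIndex(std::uint64_t zobristKey, std::uint32_t incrEval,
                      std::uint64_t stmKey, bool blackToMove,
                      std::uint64_t indexMask /* buckets - 1, power of two */) {
    std::uint64_t k = zobristKey;
    if (blackToMove) k ^= stmKey;      // stm folded in after duplicating the key
    return (k ^ incrEval) & indexMask; // eval adds up to ~16 bits of entropy here
}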

With an effective key length of 80+ bits I felt I did not have to bother at all with hash-move verification or crash resistance in the face of faulty hash moves.
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Measuring Hash Collisions (once again)

Post by Rebel »

Rebel wrote: Sat Feb 29, 2020 3:23 pm With Zobrist 8-bit checksum, 48 vs 56 bit hash key.

Code: Select all

KEY = 48 bit           KEY = 56 bit
POS  Time   Coll  TP    Coll  TP
100  0:30     26  17       0   0
100  1:00     41  23       0   0
100  2:30    136  98       0   0
100  5:00    263 211       0   0
100 10:00    438 355       0   0
Not a single collision with a 56-bit key.
Update:

Last step: playing games at fixed depth, to measure move and/or score differences due to collisions.

Code: Select all

     KEY = 48 bit           KEY = 56 bit
Depth  Time  Coll TPE    Coll TPE   Moves Scores    Played
 14     5s     40   6       0   0     0     0        1418
 15    10s     68  10       0   0     0     0        2225
 16    20s    115  28       0   0     0     0        1445
 17    40s    368  68      13   7     1     2        1527
Time - estimated average time per move.
Coll - number of collisions.
TPE - transposition errors.
Moves - number of different moves played due to collisions.
Scores - number of different scores due to collisions.
Played - number of moves played.

Remarks

1. When measuring collisions there seems to be a difference between loading positions from EPD with a clean hash table and playing games, where the HT is never cleared and each position is close to the previous one. On 100 positions from EPD not a single collision was reported at 10 minutes thinking time, while in games with an average of 40 seconds per move (although covering many more positions) 13 collisions were discovered.

2. It's surprising to notice how well a 48-bit key already does despite the transposition errors. For an SP program like ProDeo doing 2.5 million NPS it's safe to conclude that a 56-bit hash key is satisfactory, and due to the lack of an SMP version it makes no sense (time-wise) to extend this test to a 56 vs 64 bit key comparison.

3. Based on the results it's hard to estimate how many collisions a 64-bit hash key will produce on a system doing 75-100 million NPS at a long time control.

4. Finally, at depth=17 the 56-bit version gives 2 score differences and 1 move difference:

48. Qe2 {+0.21/17 309s}
48. Qe2 {+0.19/17 286s}

and

46. Qg4 {+0.58/17 111s}
46. Qe4 {+0.85/17 62s}
90% of coding is debugging, the other 10% is writing bugs.
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Measuring Hash Collisions (once again)

Post by Rebel »

I am going to do a 64-bit hash key check after all: self-play games on a fixed number of nodes.

On an average middle-game position, 56-bit hash key, NPS ± 2.6 million, no collisions discovered.

Code: Select all

  Nodes    Time        Probes    Found Keys         Coll  TPE   r1bk1n1r/pp1n1q1p/2p2p1R/3p4/3PpN2/2NB2Q1/PPP2PP1/2K1R3 w - -
1 billion  6:07   250.064.018    150.353.406 (60%)     0    0   00:05:04  20.00  1.41  1.Bxe4 dxe4 2.Nxe4 Rg8 3.Nd6 Qe7 4.Rxe7 Rxg3 
2 billion 12:31   511.038.816    304.021.955 (59%)     0    0   00:05:11  20.00  1.41  1.Bxe4 dxe4 2.Nxe4 Rg8 3.Nd6 Qe7 4.Rxe7 Rxg3 
4 billion 25:07 1.018.575.173    600.861.953 (58%)     0    0   00:14:49  21.00  1.41  1.Bxe4 dxe4 2.Nxe4 Rg8 3.Nd6 Qe7 4.Rxe7 Rxg3 
But a question first, on TCEC hardware:
1. how many NPS is Stockfish (or another engine) doing ?
2. what is the TCEC time control nowadays ?
90% of coding is debugging, the other 10% is writing bugs.
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Measuring Hash Collisions (once again)

Post by Rebel »

I have the 64-bit hash key collision check running now: 5 games, 1 billion nodes, average time per move 7-8 minutes. So far, after 5 x 50 = 250 moves, no collisions have been reported, that is, not yet.

The math: my Stockfish on 20 cores does 25 million NPS, so doing 1 billion nodes with ProDeo equals about 40 seconds of Stockfish running on 20 cores.

Now I have heard an experienced programmer say, Stockfish doing 75 million NPS will generate hundreds of collisions each second. Not my experience, ProDeo produces zero.

From Bob's and Anthony's study I have understood it to be about 1 collision per one million; again, not my experience.

I am wondering how these huge differences in numbers can exist; one reason could be the creation of the random keys at program start. Unlike most of you (I think) I don't create 64-bit hash keys at once. I create a first set of 32-bit keys with formula X and a second set of 32-bit keys with formula Y, then combine them. Does this create so much better keys that I get far fewer collisions?
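
For comparison, the two ways of filling the key table could look roughly like this; the generators below are just stand-ins, not ProDeo's actual formula X and formula Y, and "combine" is shown here as concatenation since the exact combination is not specified:

Code: Select all

#include <cstdint>
#include <random>

// ProDeo-style (as described above, with stand-in generators): two
// independently generated 32-bit sets glued together into one 64-bit key.
std::uint64_t combinedKey(std::mt19937& formulaX, std::mt19937& formulaY) {
    std::uint32_t hi = static_cast<std::uint32_t>(formulaX());  // first 32-bit set
    std::uint32_t lo = static_cast<std::uint32_t>(formulaY());  // second 32-bit set
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// The more common approach: draw all 64 bits from one generator.
std::uint64_t directKey(std::mt19937_64& gen) {
    return gen();
}
Assuming both halves are reasonably uniform and independent, keys built the first way should be statistically equivalent to keys drawn directly from a 64-bit generator, so the generation method by itself is unlikely to explain a large difference in collision counts.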
90% of coding is debugging, the other 10% is writing bugs.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Measuring Hash Collisions (once again)

Post by syzygy »

Rebel wrote: Tue Mar 03, 2020 8:21 am Now I have heard an experienced programmer say, Stockfish doing 75 million NPS will generate hundreds of collisions each second. Not my experience, ProDeo produces zero.
Stockfish stores just 16 bits in a TT entry. With 3 entries per bucket, this means 1 false hit per 2^16/3=21845.33 probes that should not have hit.

It is pretty easy to calculate the number of false hits if the hashing scheme is known (number of bits stored in a TT entry, number of TT entries per TT bucket). This is all assuming that Zobrist keys follow an ideal uniform distribution (which they pretty much do if nothing is terribly wrong).
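
For illustration, a back-of-the-envelope version of that calculation; the NPS figure and the fraction of probes that should miss are placeholder assumptions, not measurements:

Code: Select all

#include <cmath>
#include <cstdio>

int main() {
    const double signatureBits = 16.0;  // bits stored per TT entry (Stockfish-like)
    const double entriesPerBkt = 3.0;   // signatures compared per probe
    const double nps           = 75e6;  // assumed nodes (and probes) per second
    const double missFraction  = 0.5;   // assumed share of probes that should not hit

    // With uniformly distributed keys, a probe that should miss still matches
    // some signature in its bucket with probability entries / 2^bits.
    const double falseHitProb = entriesPerBkt / std::pow(2.0, signatureBits);

    std::printf("1 false hit per %.1f non-matching probes\n", 1.0 / falseHitProb);
    std::printf("~%.0f false hits per second at %.0f MNPS\n",
                nps * missFraction * falseHitProb, nps / 1e6);
    return 0;
}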
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Measuring Hash Collisions (once again)

Post by Rebel »

syzygy wrote: Tue Mar 03, 2020 8:55 am
Rebel wrote: Tue Mar 03, 2020 8:21 am Now I have heard an experienced programmer say, Stockfish doing 75 million NPS will generate hundreds of collisions each second. Not my experience, ProDeo produces zero.
Stockfish stores just 16 bits in a TT entry. With 3 entries per bucket, this means 1 false hit per 2^16/3=21845.33 probes that should not have hit.
How can that work, any idea?
90% of coding is debugging, the other 10% is writing bugs.
User avatar
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Measuring Hash Collisions (once again)

Post by hgm »

Rebel wrote: Tue Mar 03, 2020 2:13 pm How can that work, any idea?
Well, as Bob pointed out, collisions have virtually zero impact on Elo (even when they are what you call 'transpositions', so that their score is used). Unless they are more frequent than one every 10K nodes. Which with a 16-bit signature and 3-way bucket they aren't.

It can be added that even if the collisions did cause a measurable Elo decrease, this would not be without compensation. By storing a 16-bit signature instead of a 48-bit key you can nearly double the number of hash entries for the same TT size, and that in itself should be worth about 8 Elo. So if the collisions by themselves cause an Elo drop smaller than 8, this actually is the superior design, as the FishTest apparently demonstrated.
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Measuring Hash Collisions (once again)

Post by Rebel »

hgm wrote: Tue Mar 03, 2020 2:36 pm
Rebel wrote: Tue Mar 03, 2020 2:13 pm How can that work, any idea?
Well, as Bob pointed out, collisions have virtually zero impact on Elo (even when they are what you call 'transpositions', so that their score is used). Unless they are more frequent than one every 10K nodes. Which with a 16-bit signature and 3-way bucket they aren't.

It can be added that even if the collisions did cause a measurable Elo decrease, this would not be without compensation. By storing a 16-bit signature instead of a 48-bit key you can nearly double the number of hash entries for the same TT size, and that in itself should be worth about 8 Elo. So if the collisions by themselves cause an Elo drop smaller than 8, this actually is the superior design, as the FishTest apparently demonstrated.
That's quite a fun thing to do next, a hash key of 16 bits, wow...
90% of coding is debugging, the other 10% is writing bugs.