hash collisions

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 5:06 pm

Re: hash collisions

Post by D Sceviour »

jdart wrote: Tue Jan 28, 2020 5:09 am I recently got a note from the TCEC team basically saying that, since they are going to be running on 176 cores next season, they want to be sure engines can handle this well. They mention this testing technique suggested by the Demolito author:
I found a crashing bug linked to HT races. I don't know if this is the only bug causing crashes, but it sure is the most likely candidate (i.e. rare crashes whose probability of occurrence increases with the number of threads).

This might be of interest to other programmers. The debugging technique consists of simulating, in a single-threaded search, the effect of SMP races. The important thing to note is that, even if HT reads/writes are racing, the read and write operations are still atomic on 64-bit words. So the result of an HT race is not completely random, but of the form e'=(key1,data2), mixing two entries e1=(key1,data1) and e2=(key2,data2) that were racing. This can be simulated in single-threaded mode (which is 100% deterministic), with as high a probability of crashing as one desires, by reducing the number of key bits used (i.e. generating more collisions). For example, using only 8 bits instead of 64, Demolito crashes very easily:

if (((e->keyXorData ^ e->data) & 0xFFull) == (key & 0xFFull)) {...}
Would it be simpler to set a very small hash table - say 8k?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: hash collisions

Post by bob »

Daniel Shawul wrote: Fri Jan 31, 2020 5:39 pm
jdart wrote: Fri Jan 31, 2020 5:03 pm There are two different things:

1. The possibility that distinct positions produce the same hash.
2. The possibility that two threads compete, one writing the hash entry while another is reading the same entry.

The lockless hash method prevents 2. I don't think it prevents 1 or resolves the possible issues from it, in particular fetching an illegal move.

---Jon
Yes, but if you do legality testing, you don't need to solve (2). Many engines do not use lockless hashing, including Stockfish I think.
In fact, I used to do legality testing only; I removed it after I added lockless hashing, and it has been pretty safe until now.
So lockless hashing looks redundant now. Or one can move to 128-bit hash keys, as used in perfts, and still use the hash move without legality testing.
None of that helps with the base problem, namely interleaved updates. You have two choices if you want to avoid the minor overhead (one xor instruction) of lockless hashing.

(1) reduce a hash entry to 64 bits, which means every write is automatically "atomic", or
(2) use a semaphore (lock) to protect the hash entry while reading or writing, to prevent any interleaved update or out-of-order memory write from corrupting a position.

Both of those are somewhat distasteful from a performance perspective, particularly as core counts increase, leaving lockless hashing as the most practical solution. I don't see how ANY parallel search program can avoid choosing lockless hashing or (1) or (2) above; otherwise you need to be completely immune to corrupted hash entries. A wrong score is essentially harmless: you can tolerate millions of those without even noticing. But if something can cause you to crash and burn, you have to take action.

After the paper I wrote with Cozzie, it became pretty obvious that the problems could be safely ignored. But since I had a working solution (to a problem that is really not a problem), I chose to leave the lockless hashing in. There is ANOTHER place where it was much more important for me: pawn hashing. There I store lots of data, masks for passed pawns, open files, king shelter, etc. I really couldn't afford for some of those to be wrong, because a mask tells me how many pawns to loop over, for example; if it has too many bits set, it would (and did) cause me problems. The "lockless" approach works with arbitrary-sized structs, so pawn hashing was a simple fix. I don't store moves there, so legality is not an issue from that perspective, but I do store lots of other stuff that has to be correct to avoid breaking my pawn evaluation.
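A minimal sketch of the xor trick discussed here, with illustrative names (Entry, tt_store, tt_probe) rather than any engine's actual code:

Code: Select all

/* Key and data are one 64-bit word each, so every individual read and
   write is atomic on x86-64. Storing key ^ data lets the probe detect
   a mismatched (key, data) pair left behind by an interleaved update. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t key_xor_data;   /* key ^ data, may be written racily */
    uint64_t data;           /* packed move/score/depth/bound */
} Entry;

static void tt_store(Entry *e, uint64_t key, uint64_t data)
{
    e->key_xor_data = key ^ data;
    e->data = data;
}

static bool tt_probe(const Entry *e, uint64_t key, uint64_t *data)
{
    uint64_t d = e->data;
    if ((e->key_xor_data ^ d) != key)   /* torn entry or other position */
        return false;
    *data = d;
    return true;
}
If a race mixes e1=(key1,data1) and e2=(key2,data2), the probe computes key1 ^ data1 ^ data2, which matches no probing key unless data1 == data2, so the torn entry simply reads as a miss.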
JohnWoe
Posts: 491
Joined: Sat Mar 02, 2013 11:31 pm

Re: hash collisions

Post by JohnWoe »

There are 2 hash collision scenarios:
1. The program crashes.
2. The program plays a bad move and loses. (It might also play a neutral or even a good move...)

Case 1 is clearly bad: a chess engine shouldn't crash on any input.
Case 2 can be tolerated if it happens seldom enough.

In my engine I use hash for 3 things:

Code: Select all

Search: I store the index of good moves, so hash collisions don't matter at all.
Eval:   I store eval scores. Here a hash collision clearly hurts, but it happens roughly once every 2^64 nodes, so I'm not worried.
3-rep:  Any decent chess engine should know something about repetitions in search.
To me it seems like a non-issue; the cure is worse than the disease. One could get perfect hashing by storing the full position as an array of uint64_t.

I think in the future we might have something like uint128_t.
In Sapeli, all I would then have to do is change one line:

Code: Select all

#define BITBOARD uint128_t
And that's it. :P
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: hash collisions

Post by bob »

As a note, that does NOTHING for the SMP race condition being discussed. A 64-bit word is written atomically, so if two threads write to the same address at the same time, the word comes entirely from position A or entirely from position B. But with 128 bits you get two separate, non-atomic writes whose interleaving can produce a key neither thread ever wrote. As I mentioned, so long as a hashing collision won't cause a crash, a small number of collisions is perfectly acceptable.
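To make the torn write concrete, here is a single-threaded simulation of that interleaving; the schedule is spelled out explicitly here, whereas in a real SMP search it happens nondeterministically:

Code: Select all

#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t hi, lo; } Key128;

int main(void)
{
    Key128 slot;
    const Key128 a = { 0xAAAAAAAAAAAAAAAAull, 0xAAAAAAAAAAAAAAAAull };
    const Key128 b = { 0xBBBBBBBBBBBBBBBBull, 0xBBBBBBBBBBBBBBBBull };

    slot.hi = a.hi;   /* thread 1 writes the high word of A ...       */
    slot.hi = b.hi;   /* ... thread 2's full write of B lands next ... */
    slot.lo = b.lo;
    slot.lo = a.lo;   /* ... then thread 1 finishes with A's low word  */

    /* The slot now holds B's high half and A's low half: a 128-bit
       key that neither thread ever wrote. */
    printf("%016llx%016llx\n",
           (unsigned long long)slot.hi, (unsigned long long)slot.lo);
    return 0;
}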
ydebilloez
Posts: 163
Joined: Tue Jun 27, 2017 11:01 pm
Location: Lubumbashi
Full name: Yves De Billoëz

Re: hash collisions

Post by ydebilloez »

I want to add some ideas for avoiding hash collisions. They may not be new, but I have not found them described elsewhere.

We initialize the hash keys with random numbers, or with pseudo-random numbers that are repeatable and shared between runs or implementations. However, while randomness gives us the feeling that a collision is unlikely, collisions remain inherently possible as long as we use a hash instead of the full position description.

A chessboard could be encoded as FEN in a maximum of approximately 344 bits (or somewhat less if we exploit the fact that pawns cannot stand on ranks 1 and 8). As such, the upper limit for a lossless key is about 43 bytes (or somewhat less). There are certainly more compact representations, but a full board representation will still remain much larger than typical hash key lengths. The lower end used in programs is 16 or 32 bits, with discussion here about increasing it to 64 or 128 bits to make collisions less likely.

The basic idea expressed here is that as long as we don't use the full board representation (in its most compressed form), collisions, however improbable, will theoretically continue to exist. So we need optimisations that reduce collisions based on the normal evolution of a game.

First idea:
If we count the number of pieces on the board, somewhere between 3 (two lone kings don't interest us) and 32 (the initial position), and use this value directly as part of the hash key, we achieve, in my opinion, 2 things.

We create aging, since over a chess game the number of pieces only decreases.
And we create 29 different subsets of hash keys that can never collide with one another. We could use the aging to automatically clean out invalidated keys from the table. (Idea: to optimise storage, we could store a value like 32 minus the pieces on board, so that newly created keys always end up with higher values.)

Embedding the number of pieces also has the advantage that the hash is optimised for detecting threefold repetition: only the first 5 bits need to be compared when travelling through the hash.

Second idea: (certainly not new)
In my opinion, the side to move should not be xored across the hash key but should occupy a fixed bit. There is no added value in xoring the complete hash key with the side to move: as a single bit it carries the same information without obscuring the risk of collision. Put differently, the Zobrist key representing the side to move would be 0x00..01, and all other keys would leave that bit 0.

Also in my opinion, we should not reserve special bits for castling rights; hashing them into the global hash is just as fine... (although some examples on this site have shown that the cost of validating castling legality via the transposition table is expensive!!)

Conclusion: what about the following structure: (5 bits pieces captured)(hash key)(side-to-move bit)? See the sketch below.

For TT purposes we can continue to use legal-move checking and all the other techniques. This is just about reducing collisions by splitting the game stage on some absolute data. In my engine I am testing with 26/58-bit keys + 5 + 1...
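As a sketch of the proposed layout (helper names are hypothetical; the 5 + 58 + 1 split follows the post):

Code: Select all

#include <stdint.h>

/* 5 bits of "pieces captured" aging, a 58-bit Zobrist slice, and a
   fixed side-to-move bit: 5 + 58 + 1 = 64. */
static uint64_t make_key(uint64_t zobrist, int piece_count, int side_to_move)
{
    uint64_t age  = (uint64_t)(32 - piece_count) & 0x1F;  /* grows as pieces leave */
    uint64_t core = (zobrist >> 6) & ((1ull << 58) - 1);  /* 58-bit key slice */
    return (age << 59) | (core << 1) | (uint64_t)(side_to_move & 1);
}

/* Positions with different piece counts can never repeat, so a
   repetition scan can reject on the top 5 bits alone. */
static int same_piece_count(uint64_t k1, uint64_t k2)
{
    return (k1 >> 59) == (k2 >> 59);
}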
Yves De Billoëz @ macchess belofte chess
Once owner of a Mephisto I, II, challenger, ... chess computer.
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: hash collisions

Post by Rebel »

bob wrote: Fri Jan 31, 2020 6:00 am Several years ago Anthony Cozzie and I wrote a paper for the ICGA journal addressing this issue. My original question was "how many hash collisions per search is enough to cause a different (worse) move to be played?" I set up the experiment with Crafty and discovered that our search trees are so large, and so redundant, that we could tolerate far more collisions than I thought possible. I contacted Cozzie and explained how I had tested. He modified Zappa to try the same test and got similar numbers to mine. The bottom line was that one collision out of every 1M positions caused zero problems, which surprised me. In fact, one error every 10K nodes had very little effect, not that I would suggest using such a small hash signature to produce this error rate.
Allow me to ask: how did you recognize a collision?
bob wrote: Fri Jan 31, 2020 6:00 am Bottom line, a 32-bit key works more than well enough (I still use 64 bits myself, as one day search speeds will reach the point where bigger is better, since errors do accumulate at ultra-high NPS values).
My first TT implementation used a 32-bit key; it worked well in the middlegame but went terribly wrong in the late endgame. Moving to a 48-bit key solved the problem.

During that period, to check for collisions, I created a special version that also stored the full board in the TT. I had it running for hours per position; no collisions were reported with a 48-bit key. Alright, that was 1989/90, hardware allowing only 2,500 NPS or so :D
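A sketch of that debug harness (the Position type here is a placeholder; a real engine would store its own representation plus side to move, castling rights and en-passant square):

Code: Select all

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { uint8_t squares[64]; } Position;  /* placeholder board */

typedef struct {
    uint64_t key;      /* normal hash signature */
    Position board;    /* debug-only copy of the full position */
    /* ... move, score, depth, bound ... */
} DebugEntry;

static long collisions = 0;

/* Equal key but different board = a genuine collision. */
static void check_probe(const DebugEntry *e, uint64_t key, const Position *pos)
{
    if (e->key == key && memcmp(&e->board, pos, sizeof *pos) != 0)
        fprintf(stderr, "collision #%ld detected\n", ++collisions);
}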
90% of coding is debugging, the other 10% is writing bugs.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: hash collisions

Post by bob »

I believe the most compact board representation to date is 168 bits. You can probably find the discussion on rec.games.chess.computer, but it was at least 20 years ago.

As I mentioned, one can tolerate far more collisions than you might think. See the ICGA paper I wrote with Cozzie, where we measured this carefully. The only significant issue is if your program can't survive a collision for whatever reason. My primary hashing couldn't care less; pawn hashing, however, will cause crashes on collisions. Here a collision means a key that does not go with its data, rather than two different positions mapping to the same hash key; the former is very likely in SMP searches.
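One way to extend the xor idea to a multi-word entry like a pawn hash record, in the spirit described above (field names here are illustrative, not Crafty's): fold every data word into the stored key on write and verify the fold on read, so any torn mix of two entries reads as a miss.

Code: Select all

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t key;         /* key xor-folded with all data words */
    uint64_t passed;      /* passed-pawn mask (example fields)  */
    uint64_t open_files;
    uint64_t score;
} PawnEntry;

static uint64_t fold(const PawnEntry *e)
{
    return e->passed ^ e->open_files ^ e->score;
}

static void pawn_store(PawnEntry *e, uint64_t key, const PawnEntry *src)
{
    *e = *src;
    e->key = key ^ fold(e);
}

static bool pawn_probe(const PawnEntry *e, uint64_t key, PawnEntry *out)
{
    PawnEntry copy;
    memcpy(&copy, e, sizeof copy);        /* snapshot the racy entry */
    if ((copy.key ^ fold(&copy)) != key)  /* torn write or wrong position */
        return false;
    *out = copy;
    return true;
}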
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: hash collisions

Post by lucasart »

D Sceviour wrote: Sat Feb 01, 2020 5:00 am
jdart wrote: Tue Jan 28, 2020 5:09 am I recently got a note from the TCEC team basically saying that, since they are going to be running on 176 cores next season, they want to be sure engines can handle this well. They mention this testing technique suggested by the Demolito author:
I found a crashing bug linked to HT races. I don't know if this is the only bug causing crashes, but it sure is the most likely candidate (i.e. rare crashes whose probability of occurrence increases with the number of threads).

This might be of interest to other programmers. The debugging technique consists of simulating, in a single-threaded search, the effect of SMP races. The important thing to note is that, even if HT reads/writes are racing, the read and write operations are still atomic on 64-bit words. So the result of an HT race is not completely random, but of the form e'=(key1,data2), mixing two entries e1=(key1,data1) and e2=(key2,data2) that were racing. This can be simulated in single-threaded mode (which is 100% deterministic), with as high a probability of crashing as one desires, by reducing the number of key bits used (i.e. generating more collisions). For example, using only 8 bits instead of 64, Demolito crashes very easily:

if (((e->keyXorData ^ e->data) & 0xFFull) == (key & 0xFFull)) {...}
Would it be simpler to set a very small hash table - say 8k?
No. It's not the same thing. The purpose of the exercise is to reproduce races (an SMP phenomenon) with collisions (a single-threaded, deterministic phenomenon which can be reproduced with arbitrary frequency of occurrence). Even with a very small hash table, with 64-bit keys you won't have many collisions.
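Spelled out as a sketch (macro names are mine, not Demolito's): compare only the low KEY_BITS of the signature, and false matches occur deterministically and as often as you like, regardless of table size; 64 restores normal behaviour.

Code: Select all

#include <stdbool.h>
#include <stdint.h>

#define KEY_BITS 8   /* lower this to amplify collisions for testing */
#define KEY_MASK ((KEY_BITS >= 64) ? ~0ull : ((1ull << KEY_BITS) - 1))

typedef struct { uint64_t key_xor_data, data; } Entry;

/* With KEY_BITS == 8, nearly every probe can return a foreign entry,
   stress-testing the engine's tolerance of bogus hash data. */
static bool probe_matches(const Entry *e, uint64_t key)
{
    return ((e->key_xor_data ^ e->data) & KEY_MASK) == (key & KEY_MASK);
}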
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: hash collisions

Post by lucasart »

bob wrote: Sat Feb 01, 2020 5:44 am
Daniel Shawul wrote: Fri Jan 31, 2020 5:39 pm
jdart wrote: Fri Jan 31, 2020 5:03 pm There are two different things:

1. The possibility that distinct positions produce the same hash.
2. The possibility that two threads compete, one writing the hash entry while another is reading the same entry.

The lockless hash method prevents 2. I don't think it prevents 1 or resolves the possible issues from it, in particular fetching an illegal move.

---Jon
Yes, but if you do legality testing, you don't need to solve (2). Many engines do not use lockless hashing, including Stockfish I think.
In fact, I used to do legality testing only; I removed it after I added lockless hashing, and it has been pretty safe until now.
So lockless hashing looks redundant now. Or one can move to 128-bit hash keys, as used in perfts, and still use the hash move without legality testing.
None of that helps with the base problem, namely interleaved updates. You have two choices if you want to avoid the minor overhead (one xor instruction) of lockless hashing.

(1) reduce a hash entry to 64 bits, which means every write is automatically "atomic", or
(2) use a semaphore (lock) to protect the hash entry while reading or writing, to prevent any interleaved update or out-of-order memory write from corrupting a position.

Both of those are somewhat distasteful from a performance perspective, particularly as core counts increase, leaving lockless hashing as the most practical solution. I don't see how ANY parallel search program can avoid choosing lockless hashing or (1) or (2) above; otherwise you need to be completely immune to corrupted hash entries. A wrong score is essentially harmless: you can tolerate millions of those without even noticing. But if something can cause you to crash and burn, you have to take action.

After the paper I wrote with Cozzie, it became pretty obvious that the problems could be safely ignored. But since I had a working solution (to a problem that is really not a problem), I chose to leave the lockless hashing in. There is ANOTHER place where it was much more important for me: pawn hashing. There I store lots of data, masks for passed pawns, open files, king shelter, etc. I really couldn't afford for some of those to be wrong, because a mask tells me how many pawns to loop over, for example; if it has too many bits set, it would (and did) cause me problems. The "lockless" approach works with arbitrary-sized structs, so pawn hashing was a simple fix. I don't store moves there, so legality is not an issue from that perspective, but I do store lots of other stuff that has to be correct to avoid breaking my pawn evaluation.
Completely agree with you on all points. This also corresponds to my experience.

(1) forces you to make sacrifices in what you store in hash entries, which inevitably loses Elo (e.g. dropping the eval means you have to recalculate it; dropping the score and storing only the move means you lose HT pruning; OK, you search the hash move first and PVS makes it cheap, but still).

(2) has an irreducible overhead: the cost of acquiring/releasing an uncontended lock. And that's the best-case scenario, with a strategy of splitting the HT into enough parts to make it almost uncontended. But in practice I'm not sure the hit ratio on the HT is homogeneous; some entries get hit more than others, so...

Just accepting races and not caring has no Elo impact (as done in SF, for example). But, as you say, the cost of fixing this (with the simple xor trick) is very small: peace of mind for a minuscule amount of added code complexity, a reasonable trade-off. The only problem with the xor trick is that the C standard offers no guarantee that it will compile the way you think it should. So, to be pedantic, you have to write it with relaxed atomics, which is verbose, and that's annoying.
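For the record, the pedantic C11 form might look like this (a sketch; names are illustrative): the same xor trick, with each word declared _Atomic and accessed through relaxed loads and stores, so the racing accesses are defined behaviour instead of a data race.

Code: Select all

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    _Atomic uint64_t key_xor_data;
    _Atomic uint64_t data;
} Entry;

static void tt_store(Entry *e, uint64_t key, uint64_t data)
{
    atomic_store_explicit(&e->key_xor_data, key ^ data, memory_order_relaxed);
    atomic_store_explicit(&e->data, data, memory_order_relaxed);
}

static bool tt_probe(const Entry *e, uint64_t key, uint64_t *data)
{
    uint64_t d = atomic_load_explicit(&e->data, memory_order_relaxed);
    uint64_t k = atomic_load_explicit(&e->key_xor_data, memory_order_relaxed);
    if ((k ^ d) != key)
        return false;
    *data = d;
    return true;
}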
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: hash collisions

Post by mar »

lucasart wrote: Tue Feb 04, 2020 1:19 pm The only problem with the xor trick is that the C standard offers no guarantee that it will compile the way you think it should. So, to be pedantic, you have to write it with relaxed atomics, which is verbose, and that's annoying.
I don't see any compiler out there miscompiling this. This has been discussed in the past IIRC.
Do you remember the hypothetical science-fiction case where the xor trick would break? (out of curiosity)
Martin Sedlak