TalkChess.com

Posted: **Wed May 09, 2018 12:39 pm**

Hi.

I tried to save some space in my transposition table by saving only 16 bits of the zobrist key instead of 32 bits as I did before.

Now with default tt size of ~128MB I'm getting a (not detected) hash collision just by searhing from startpos up to depth 11:

First Position:
Key: 631bda933644df0e
FEN: rnbqkb1r/pppp1ppp/7B/8/3QP3/8/PPP2PPP/RN2KBNR b KQkq - 0 4

Second Position:
Key: 631b27f21cc4df0e
FEN: r1bqkb1r/ppp1p1pp/2n4n/3p1P2/8/2NB1P2/PPPP2PP/R1BQK1NR b KQkq - 0 5

The first 16 bits are saved in the TT entry, the last 22 bits are used for the index number in the table so you can see the collision can't be detected.

My question is: Is this collision "normal" with only 16 bits saved? Or is there a problem in my zobrist code or maybe even in the random number generator I'm using (http://www.burtleburtle.net/bob/rand/smallprng.html)?

Any help is appreciated.
- Andreas

Posted: **Wed May 09, 2018 2:34 pm**

There are two things to check, the size of the hash, and how to get the index. Have a look at https://chessprogramming.wikispaces.com ... on%20Table if not already and also this http://preshing.com/20110504/hash-colli ... abilities/.

In Weini, I compute theoretical (type-2) collision rate using this approximation. My hash are 64bits, so the risk of type-1 collision is quite low.

Code: Select all

	double n = (double)Stats::visitednodes;
	double m = (double)Definitions::ttSize;
	double col = n-m+m*std::pow(1.-1./m,n);
        LOG(logINFO) << "col rate (theorical)    " << (n>0?100.f*col/n:0) << "%";

Posted: **Wed May 09, 2018 3:04 pm**

Thanks for your answer. Of course I know the chessprogramming page and my problem deals with this statement:

"... many programmers choose to trade-off space for accuracy and only store that part of the hash key not already considered as index, or even less".

In my engine I always used the "even less". I have a (default) 128MB tt size which leads (considering entry size and bucket number) to an index length of 22bit (lower bits of the key) and I'm storing (upper) 32bit of the key inside the entry. So there remain 64-22-32 = 10bit for undetectable hash collision. Probability of two different hash keys differ only in these 10bit should be VERY low.

Now I try to store only 16 bit which increases the probability to "two hash keys differ only in these (64-22-16=)26bit". Before I start to calculate, I had a look at the stockfish source and they also store only 16bit of the key in the tt entry. So I guess it should be possible, shouldn't it?

I will try a different PRNG and see if it works better...

- Andreas

Posted: **Wed May 09, 2018 3:05 pm**

16 bits can distinguish no more than 65536 positions, so this is normal.

That the other 22 bits are identical for these positions is no surprise, since all positions mapped to that TT entry have those 22 bits in common. Once the TT is full, every new probe of a TT entry will find a position in that entry that necessarily shares those 22 bits with the position being probed. If the probe is a real hit, the other 16 bits will also be the same. If the probe is not a real hit, the other bits will still be the same once for every 65536 positions probed.

So if you search a million positions per second, you should expect a dozen or so undetected errors per second.

In Stockfish the situation is even worse since each TT bucket contains three positions, which triples the number of false hits. Apparently this does not affect playing strength in a measurable way.

Posted: **Wed May 09, 2018 3:17 pm**

xr_a_y wrote: ↑Wed May 09, 2018 2:34 pm There are two things to check, the size of the hash, and how to get the index. Have a look at https://chessprogramming.wikispaces.com ... on%20Table if not already and also this http://preshing.com/20110504/hash-colli ... abilities/.

In Weini, I compute theoretical (type-2) collision rate using this approximation. My hash are 64bits, so the risk of type-1 collision is quite low.
Code: Select all
	double n = (double)Stats::visitednodes;
	double m = (double)Definitions::ttSize;
	double col = n-m+m*std::pow(1.-1./m,n);
        LOG(logINFO) << "col rate (theorical)    " << (n>0?100.f*col/n:0) << "%";

This formula is always mentioned when this topic comes up but imo not very useful since it ignores the reality that the hash table gets full and its entries overwritten. It does not matter that many positions will have identical hash keys if those positions are not stored in the TT at the same time.

Once you assume the TT is full, which it will always be if you don't clear the TT between moves, the math becomes very simple. With N bits stored in the TT entry and M entries per bucket, the error rate is 1 error per 2^N/M probes that do not hit.

Posted: **Wed May 09, 2018 3:36 pm**

You could also check if the hash move is legal in the current position, as further verification.

One final twist to that trick consist of storing the move XORed with the additional bits of the hash key that you are not saving or using for addressing. That way the probability that the move you read back is valid is drastically reduced. If you use a 16-bit move encoding and there are only 40 legal moves, the trick will reduce the undetected collision rate by a factor of ~1600.

Posted: **Wed May 09, 2018 3:56 pm**

syzygy wrote: ↑Wed May 09, 2018 3:05 pm In Stockfish the situation is even worse since each TT bucket contains three positions, which triples the number of false hits. Apparently this does not affect playing strength in a measurable way.

This is really a miracle.
My engine starts to play really crap if I e.g. store the static evaluation of the position in tt and use it with (not reliable working) 16bit hash key detection.
Changing to a different PRNG doesn't change anything as you expected. So I will stay at 32bit keys in the tt.

Thanks!

Posted: **Wed May 09, 2018 3:59 pm**

AlvaroBegue wrote: ↑Wed May 09, 2018 3:36 pm You could also check if the hash move is legal in the current position, as further verification.

One final twist to that trick consist of storing the move XORed with the additional bits of the hash key that you are not saving or using for addressing. That way the probability that the move you read back is valid is drastically reduced. If you use a 16-bit move encoding and there are only 40 legal moves, the trick will reduce the undetected collision rate by a factor of ~1600.

Good idea. WIll try if the descrease of speed by this extra check can be compensated by the saved space in tt.
Thanks.

Posted: **Wed May 09, 2018 4:00 pm**

Hash collisions have been looked at and have surprisingly little impact.
See:
http://www.craftychess.com/hyatt/collisions.html

Posted: **Wed May 09, 2018 4:36 pm**

brianr wrote: ↑Wed May 09, 2018 4:00 pm Hash collisions have been looked at and have surprisingly little impact.
See:
http://www.craftychess.com/hyatt/collisions.html

Indeed, an a-b search is quite permissive for an occasional error now and then.
The biggest problem is that there is a possibility that the engine will crash when it gets an illegal move from the TT without verifying it.
To be on the safe side I store the whole 64 bit key and still have room enough for everything else.
I use 4 entries of 16 byte in a bucket, each bucket aligned on a 64 byte cache line, I don't see any reason why I should try to make it smaller.
Nowadays a PC has so much memory that you will never use it completely for your TT anyway.

TalkChess.com

Not detected collisions in tt probing

Not detected collisions in tt probing

Re: Not detected collisions in tt probing

Re: Not detected collisions in tt probing

Re: Not detected collisions in tt probing

Re: Not detected collisions in tt probing

Re: Not detected collisions in tt probing

Re: Not detected collisions in tt probing

Re: Not detected collisions in tt probing

Re: Not detected collisions in tt probing

Re: Not detected collisions in tt probing