Not detected collisions in tt probing

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

RubiChess
Posts: 584
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Not detected collisions in tt probing

Post by RubiChess »

Hi.

I tried to save some space in my transposition table by saving only 16 bits of the zobrist key instead of 32 bits as I did before.

Now with default tt size of ~128MB I'm getting a (not detected) hash collision just by searhing from startpos up to depth 11:

First Position:
Key: 631bda933644df0e
FEN: rnbqkb1r/pppp1ppp/7B/8/3QP3/8/PPP2PPP/RN2KBNR b KQkq - 0 4

Second Position:
Key: 631b27f21cc4df0e
FEN: r1bqkb1r/ppp1p1pp/2n4n/3p1P2/8/2NB1P2/PPPP2PP/R1BQK1NR b KQkq - 0 5

The first 16 bits are saved in the TT entry, the last 22 bits are used for the index number in the table so you can see the collision can't be detected.

My question is: Is this collision "normal" with only 16 bits saved? Or is there a problem in my zobrist code or maybe even in the random number generator I'm using (http://www.burtleburtle.net/bob/rand/smallprng.html)?

Any help is appreciated.
- Andreas
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Not detected collisions in tt probing

Post by xr_a_y »

There are two things to check, the size of the hash, and how to get the index. Have a look at https://chessprogramming.wikispaces.com ... on%20Table if not already and also this http://preshing.com/20110504/hash-colli ... abilities/.

In Weini, I compute theoretical (type-2) collision rate using this approximation. My hash are 64bits, so the risk of type-1 collision is quite low.

Code: Select all

	double n = (double)Stats::visitednodes;
	double m = (double)Definitions::ttSize;
	double col = n-m+m*std::pow(1.-1./m,n);
        LOG(logINFO) << "col rate (theorical)    " << (n>0?100.f*col/n:0) << "%";
RubiChess
Posts: 584
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: Not detected collisions in tt probing

Post by RubiChess »

Thanks for your answer. Of course I know the chessprogramming page and my problem deals with this statement:

"... many programmers choose to trade-off space for accuracy and only store that part of the hash key not already considered as index, or even less".

In my engine I always used the "even less". I have a (default) 128MB tt size which leads (considering entry size and bucket number) to an index length of 22bit (lower bits of the key) and I'm storing (upper) 32bit of the key inside the entry. So there remain 64-22-32 = 10bit for undetectable hash collision. Probability of two different hash keys differ only in these 10bit should be VERY low.

Now I try to store only 16 bit which increases the probability to "two hash keys differ only in these (64-22-16=)26bit". Before I start to calculate, I had a look at the stockfish source and they also store only 16bit of the key in the tt entry. So I guess it should be possible, shouldn't it?

I will try a different PRNG and see if it works better...

- Andreas
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Not detected collisions in tt probing

Post by syzygy »

16 bits can distinguish no more than 65536 positions, so this is normal.

That the other 22 bits are identical for these positions is no surprise, since all positions mapped to that TT entry have those 22 bits in common. Once the TT is full, every new probe of a TT entry will find a position in that entry that necessarily shares those 22 bits with the position being probed. If the probe is a real hit, the other 16 bits will also be the same. If the probe is not a real hit, the other bits will still be the same once for every 65536 positions probed.

So if you search a million positions per second, you should expect a dozen or so undetected errors per second.

In Stockfish the situation is even worse since each TT bucket contains three positions, which triples the number of false hits. Apparently this does not affect playing strength in a measurable way.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Not detected collisions in tt probing

Post by syzygy »

xr_a_y wrote: Wed May 09, 2018 2:34 pm There are two things to check, the size of the hash, and how to get the index. Have a look at https://chessprogramming.wikispaces.com ... on%20Table if not already and also this http://preshing.com/20110504/hash-colli ... abilities/.

In Weini, I compute theoretical (type-2) collision rate using this approximation. My hash are 64bits, so the risk of type-1 collision is quite low.

Code: Select all

	double n = (double)Stats::visitednodes;
	double m = (double)Definitions::ttSize;
	double col = n-m+m*std::pow(1.-1./m,n);
        LOG(logINFO) << "col rate (theorical)    " << (n>0?100.f*col/n:0) << "%";
This formula is always mentioned when this topic comes up but imo not very useful since it ignores the reality that the hash table gets full and its entries overwritten. It does not matter that many positions will have identical hash keys if those positions are not stored in the TT at the same time.

Once you assume the TT is full, which it will always be if you don't clear the TT between moves, the math becomes very simple. With N bits stored in the TT entry and M entries per bucket, the error rate is 1 error per 2^N/M probes that do not hit.
Last edited by syzygy on Wed May 09, 2018 5:06 pm, edited 1 time in total.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Not detected collisions in tt probing

Post by AlvaroBegue »

You could also check if the hash move is legal in the current position, as further verification.

One final twist to that trick consist of storing the move XORed with the additional bits of the hash key that you are not saving or using for addressing. That way the probability that the move you read back is valid is drastically reduced. If you use a 16-bit move encoding and there are only 40 legal moves, the trick will reduce the undetected collision rate by a factor of ~1600.
RubiChess
Posts: 584
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: Not detected collisions in tt probing

Post by RubiChess »

syzygy wrote: Wed May 09, 2018 3:05 pm In Stockfish the situation is even worse since each TT bucket contains three positions, which triples the number of false hits. Apparently this does not affect playing strength in a measurable way.
This is really a miracle.
My engine starts to play really crap if I e.g. store the static evaluation of the position in tt and use it with (not reliable working) 16bit hash key detection.
Changing to a different PRNG doesn't change anything as you expected. So I will stay at 32bit keys in the tt.

Thanks!
RubiChess
Posts: 584
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: Not detected collisions in tt probing

Post by RubiChess »

AlvaroBegue wrote: Wed May 09, 2018 3:36 pm You could also check if the hash move is legal in the current position, as further verification.

One final twist to that trick consist of storing the move XORed with the additional bits of the hash key that you are not saving or using for addressing. That way the probability that the move you read back is valid is drastically reduced. If you use a 16-bit move encoding and there are only 40 legal moves, the trick will reduce the undetected collision rate by a factor of ~1600.
Good idea. WIll try if the descrease of speed by this extra check can be compensated by the saved space in tt.
Thanks.
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: Not detected collisions in tt probing

Post by brianr »

Hash collisions have been looked at and have surprisingly little impact.
See:
http://www.craftychess.com/hyatt/collisions.html
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Not detected collisions in tt probing

Post by Joost Buijs »

brianr wrote: Wed May 09, 2018 4:00 pm Hash collisions have been looked at and have surprisingly little impact.
See:
http://www.craftychess.com/hyatt/collisions.html
Indeed, an a-b search is quite permissive for an occasional error now and then.
The biggest problem is that there is a possibility that the engine will crash when it gets an illegal move from the TT without verifying it.
To be on the safe side I store the whole 64 bit key and still have room enough for everything else.
I use 4 entries of 16 byte in a bucket, each bucket aligned on a 64 byte cache line, I don't see any reason why I should try to make it smaller.
Nowadays a PC has so much memory that you will never use it completely for your TT anyway.