TT Key Collisions, Workarounds?

hgm · Post by **hgm** » Wed Aug 17, 2011 10:12 am

The size of the hash table actually has little effect, if you assume that it will fill up. The length of the (non-address part of the) signature determines the probability that you will find an unrelated position there that by coincidence had the same signature.

bob wrote:But certainly you don't want that many...

Why do you so easily assume this as a certainty? Of course the collisions in itself are never helping you, but the increased packing density of the hash table because of the shorter signature does, and might more than offset the detrimental effect of collisions.

You can acheive a 1 in 1000 error rate with a signature as short as 10 bits. Together with a 6-bit depth, 16-bit score and 16-bit move + flags that would make a 6-byte entry, of which you could cram 10 into a cache line, rather than the usual 4 or 5. That means doubling the effective hash capacity, which should be worth something...

rbarreira · Post by **rbarreira** » Wed Aug 17, 2011 10:45 am

wgarvin wrote:
Clemens Pruell wrote:Yes, of course I store the full Zobrist Key in order to detect Type B (Index-) Collisions. It's only the Type A - collisions which seem to happen more often than one would think.
Type A collisions should be extremely rare. (More than one in a game should be astronomically unlikely). Have you considered whether your Zobrist numbers are random enough?

If you are just using C library rand() or something, you should replace it with some custom PRNG code. Use the Mersenne twister or something like RKISS that Stockfish uses.

Or people could just stop dicking around with pseudo-random numbers. Instead, get a file with random bits from random.org and load that into your Zobrist key array:

http://www.random.org/files/

wgarvin · Post by **wgarvin** » Wed Aug 17, 2011 1:49 pm

rbarreira wrote:
wgarvin wrote:
Clemens Pruell wrote:Yes, of course I store the full Zobrist Key in order to detect Type B (Index-) Collisions. It's only the Type A - collisions which seem to happen more often than one would think.
Type A collisions should be extremely rare. (More than one in a game should be astronomically unlikely). Have you considered whether your Zobrist numbers are random enough?

If you are just using C library rand() or something, you should replace it with some custom PRNG code. Use the Mersenne twister or something like RKISS that Stockfish uses.
Or people could just stop dicking around with pseudo-random numbers. Instead, get a file with random bits from random.org and load that into your Zobrist key array:

http://www.random.org/files/

Why bother? Random numbers have absolutely no benefit over well chosen pseudo-random numbers for this application.

rbarreira · Post by **rbarreira** » Wed Aug 17, 2011 2:12 pm

wgarvin wrote:
rbarreira wrote:
wgarvin wrote:
Clemens Pruell wrote:Yes, of course I store the full Zobrist Key in order to detect Type B (Index-) Collisions. It's only the Type A - collisions which seem to happen more often than one would think.
Type A collisions should be extremely rare. (More than one in a game should be astronomically unlikely). Have you considered whether your Zobrist numbers are random enough?

If you are just using C library rand() or something, you should replace it with some custom PRNG code. Use the Mersenne twister or something like RKISS that Stockfish uses.
Or people could just stop dicking around with pseudo-random numbers. Instead, get a file with random bits from random.org and load that into your Zobrist key array:

http://www.random.org/files/
Why bother? Random numbers have absolutely no benefit over well chosen pseudo-random numbers for this application.

Because then we wouldn't need to have discussions about which pseudo-random number generator is good enough (as seen in this thread), which generator code is available and publicly licensed for one's chosen programming language with portable code guaranteed to return the same data on 32-bit and 64-bit architectures, etc...

I don't see why it's a "bother" compared to the alternatives...

Don · Post by **Don** » Wed Aug 17, 2011 2:51 pm

rbarreira wrote:
wgarvin wrote:
rbarreira wrote:
wgarvin wrote:
Clemens Pruell wrote:Yes, of course I store the full Zobrist Key in order to detect Type B (Index-) Collisions. It's only the Type A - collisions which seem to happen more often than one would think.
Type A collisions should be extremely rare. (More than one in a game should be astronomically unlikely). Have you considered whether your Zobrist numbers are random enough?

If you are just using C library rand() or something, you should replace it with some custom PRNG code. Use the Mersenne twister or something like RKISS that Stockfish uses.
Or people could just stop dicking around with pseudo-random numbers. Instead, get a file with random bits from random.org and load that into your Zobrist key array:

http://www.random.org/files/
Why bother? Random numbers have absolutely no benefit over well chosen pseudo-random numbers for this application.
Because then we wouldn't need to have discussions about which pseudo-random number generator is good enough (as seen in this thread), which generator code is available and publicly licensed for one's chosen programming language with portable code guaranteed to return the same data on 32-bit and 64-bit architectures, etc...

I don't see why it's a "bother" compared to the alternatives...

The issue about whether it's legal is one that is hyper exaggerated only by a few. Since the issue with Rybka and the talk of the clones you see a mentality with some people who are looking to nit-pick any imagined infraction as proof that "everyone is doing the same thing." This is just a lot of "sour grapes" immaturity. So even if you get random bits from random.org it would not surprise me if there were people carefully checking each program to make sure we didn't copy our random bits from someone else - it's that ridiculous.

We are engineers and we sometimes make a big deal out of the quality of the random numbers as you say. It's in our nature. If you go back a few years you will see that it was even worse - we were carefully tuning the random numbers to minimize the hamming distance between any two numbers and being very anal about the whole thing. It didn't help in any way that could be measured but it made us feel good

I think the majority of programmers like building their data structures on the fly when practical to minimize the size of the binary and make the source code more compact - so I doubt you will see much usage of random.org.

You can also get true random numbers on any Linux machine. You simply read /dev/random or /dev/urandom - one of them produces random numbers based on chaotic system state and the other combines this with a pseudo random number generator to simulate random numbers better than any software PRNG ever could.

tpetzke · Post by **tpetzke** » Wed Aug 17, 2011 4:42 pm

Whats wrong in suggesting a different random number generator when the one you use might be the cause of trouble.

Some compiler implement very simple random generators. MSVC rand() uses something like this

(((rand = rand * 214013L + 2531011L) >> 16) & 0x7fff);

Maybe this is enough for a chess engine, maybe not. Trying an alternative will probably not hurt.

Thomas...

Don · Post by **Don** » Wed Aug 17, 2011 4:50 pm

tpetzke wrote:Whats wrong in suggesting a different random number generator when the one you use might be the cause of trouble.

Some compiler implement very simple random generators. MSVC rand() uses something like this

(((rand = rand * 214013L + 2531011L) >> 16) & 0x7fff);

Maybe this is enough for a chess engine, maybe not. Trying an alternative will probably not hurt.

Thomas...

There is nothing wrong with that. The issue has come up so often that I think people get annoyed, but they shouldn't.

You are probably right that the worst ones might cause trouble.

By the way, years ago I had a bug in one program where I was throwing away a bit of precision and this also affected the address. So I was effectively using only half the hash table and missing a bit of verification too.

bob · Post by **bob** » Wed Aug 17, 2011 5:16 pm

hgm wrote:The size of the hash table actually has little effect, if you assume that it will fill up. The length of the (non-address part of the) signature determines the probability that you will find an unrelated position there that by coincidence had the same signature.

bob wrote:But certainly you don't want that many...
Why do you so easily assume this as a certainty? Of course the collisions in itself are never helping you, but the increased packing density of the hash table because of the shorter signature does, and might more than offset the detrimental effect of collisions.

You can acheive a 1 in 1000 error rate with a signature as short as 10 bits. Together with a 6-bit depth, 16-bit score and 16-bit move + flags that would make a 6-byte entry, of which you could cram 10 into a cache line, rather than the usual 4 or 5. That means doubling the effective hash capacity, which should be worth something...

There is a secondary issue. If you have just one entry in the table, then yes, you can get a collision, but with every position overwriting that entry, you would have to get a collision on two consecutive probes which is far more unlikely than if you have a bunch of possible entries sitting around to collide with...

Neither is very probable, but I did find that about 2-3 times per game I would see just one position where the key matched, but the stored move was illegal. This leads me to believe that there were more collisions than 2-3, because in some positions that are different, the same legal move exists...

hgm · Post by **hgm** » Wed Aug 17, 2011 9:25 pm

bob wrote:There is a secondary issue. If you have just one entry in the table, then yes, you can get a collision, but with every position overwriting that entry, you would have to get a collision on two consecutive probes which is far more unlikely than if you have a bunch of possible entries sitting around to collide with...

I don't think that is true. To get a collisions in any of the entries, you need two consecutive positions mapping to that entry collide, which is exactly as unlikely as for two consecutive probes in general (if the signature does not contain address bits). For any entry it holds that you will only detect collisions with the last position stored there. Not with all positions stored there before.

Only when a probe compares with multiple entries (like 4 or 8), it drives up the colision probability by that same factor. So the bucket size might matter, but not the total hash size.

kbhearn · Post by **kbhearn** » Fri Aug 19, 2011 3:45 am

One thing you could do if you want to eliminate the possibility is only generate the zobrist key for the sake of indexing, and store a compressed form of the entire position in the hash table for position verification, though your entry size will get rather large reducing the number of entries you can have in the table. It'd probably take at least 32 bytes per hash entry (and the simplest/fastest way to stuff a position in from bitboard representations would take 32 bytes just for the position itself).

TT Key Collisions, Workarounds?

Re: TT Key Collisions, Workarounds?

Re: TT Key Collisions, Workarounds?

Re: TT Key Collisions, Workarounds?

Re: TT Key Collisions, Workarounds?

Re: TT Key Collisions, Workarounds?

Re: TT Key Collisions, Workarounds?

Re: TT Key Collisions, Workarounds?

Re: TT Key Collisions, Workarounds?

Re: TT Key Collisions, Workarounds?

Re: TT Key Collisions, Workarounds?