Is 64 bits enough as a hash length

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mathmoi
Posts: 286
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Is 64 bits enough as a hash length

Post by mathmoi »

Hi,

We all know that we need to check the validity of a TT move before we execute it on the board because otherwise we could end up corrupting the engine's internal board - and probably crash the engine.

What are the probabilities of this occurring? Let's say with an 8 GB TT and searching 15 million nps (both values are higher than what I expect to be able to achieve with my current hardware).

8*1024^3/16 = 536870912 entries in the TT

I need the 29 lower bits of the hash to find the index in the TT.

So the remaining 35 bits are the ones that can cause a hash collision. That means I'll get a collision every 2^35 probes.

2^35/15000000 = 2290.6 seconds between collisions.

That means that if I don't check the validity of the move in the TT, I'll probably crash about every 38 minutes.

However, if I use a 96-bit hash (32+64 bits) I can use the 29 lower bits of the 32-bit part to get an index and store the full 64-bit part in the TT to look for collisions. Now I'll get a collision every 2^64 probes, or:

2^64/15000000 = 1229782938247 seconds between collisions, or about 40000 years.

Now my question to you: can it be a good idea to use a 96-bit hash and completely avoid the validity check of the TT moves (since we don't mind crashing every 40000 years)?
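The back-of-envelope numbers above can be re-derived mechanically. This is just a sketch of the arithmetic, with the table size, entry size, and probe rate from this post as assumptions:

```cpp
#include <cassert>
#include <cstdint>

// Assumptions from the post: an 8 GB table of 16-byte entries, probed at
// 15M probes/second, indexed by the low bits of a 64-bit Zobrist key.
constexpr uint64_t kTableBytes = 8ULL << 30;               // 8 GB
constexpr uint64_t kEntrySize  = 16;                       // bytes per TT entry
constexpr uint64_t kEntries    = kTableBytes / kEntrySize; // 536870912 = 2^29
constexpr int      kIndexBits  = 29;                       // log2(kEntries)
constexpr int      kSpareBits  = 64 - kIndexBits;          // 35 bits left in the key

// With a full table every probe lands on some entry, and a false match
// requires the spare key bits to agree by chance: once per 2^kSpareBits probes.
inline double seconds_between_collisions(double probes_per_second) {
    return double(1ULL << kSpareBits) / probes_per_second;
}
```

At 15 million probes per second this gives about 2290 seconds, i.e. roughly 38 minutes between false matches for the 64-bit scheme.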

MP
Harald
Posts: 318
Joined: Thu Mar 09, 2006 1:07 am

Re: Is 64 bits enough as a hash length

Post by Harald »

mathmoi wrote: Now my question to you: can it be a good idea to use a 96-bit hash and completely avoid the validity check of the TT moves (since we don't mind crashing every 40000 years)?
Some other possibilities and thoughts:

You can test the best move from hash even when you do not need it, because you just use the value for a cut-off. Or you can skip this test.
You can test the hash move when you need it as the first move in your search, or you can skip the test.
You can test the from and to squares and pieces, with or without an in-check test.

The test scenarios are scalable. How many bits they are worth I don't know.

Even if you have a collision (and no crash), it is probably somewhere
in the search outside the principal variation and does not change the move to play.

Now the problem is not so easy any more. Think again. :-)

Harald
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Is 64 bits enough as a hash length

Post by bob »

mathmoi wrote: Now my question to you: can it be a good idea to use a 96-bit hash and completely avoid the validity check of the TT moves (since we don't mind crashing every 40000 years)?
It is completely worthless to use 96 bits. If you look at the paper Cozzie and I wrote about the effect of hash collisions on the quality of the search, the results were surprising: one error every 1,000,000 nodes made absolutely no measurable difference to the program's playing skill. So worrying about improving it beyond 1 error in 2^64 is pointless. BTW, assuming that the upper bits are the only collision component is incorrect; all the bits matter. The larger the table, the more entries you have and the greater the chance for collisions to actually happen in a way that will affect the program. But collisions are very unlikely anyway with 64-bit signatures, assuming you have decent Zobrist random numbers...
mathmoi
Posts: 286
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: Is 64 bits enough as a hash length

Post by mathmoi »

bob wrote: It is completely worthless to use 96 bits. If you look at the paper Cozzie and I wrote about the effect of hash collisions on the quality of the search, the results were surprising: one error every 1,000,000 nodes made absolutely no measurable difference to the program's playing skill. So worrying about improving it beyond 1 error in 2^64 is pointless.
I read your paper about the effect of hash collisions on the quality of the search. What I'm trying to avoid is not a bad evaluation caused by a collision; I'm trying to avoid a crash of my engine caused by an illegal move taken from the TT.
bob wrote: BTW assuming that the upper bits are the only collision component is incorrect. All the bits matter. The larger the table, the more entries you have and the greater chance for collisions to actually happen in a way that will affect the program.
I think my assumption is correct if the TT is full (as it is most of the time). Suppose I use the 24 lower bits as an index. These bits will _always_ match the hash of the position stored at that index. So the remaining 64-24 bits will make the difference between a collision or not.

Am I wrong?

MP
plattyaj

Re: Is 64 bits enough as a hash length

Post by plattyaj »

mathmoi wrote: I think my assumption is correct if the TT is full (as it is most of the time). Suppose I use the 24 lower bits as an index. These bits will _always_ match the hash of the position stored at that index. So the remaining 64-24 bits will make the difference between a collision or not.

Am I wrong?

MP
Are you storing any of the hash code itself in the hash entry? If you rely on the lower 24 bits as both the index and the stored code, then yes, you will be somewhat inaccurate. Usually you would store more than that. My engine, patzer though it is ;), stores the full 64-bit key so I can compare.
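The store-the-full-key approach can be sketched roughly like this. All names, sizes, and the entry layout are hypothetical, not taken from any particular engine:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A TT entry that keeps the full 64-bit Zobrist key, so a probe can verify
// it really found the probed position. (Key 0 doubles as "empty" here,
// which a real engine would want to handle more carefully.)
struct TTEntry {
    uint64_t key   = 0;   // full Zobrist key of the stored position
    uint16_t move  = 0;   // packed best move
    int16_t  score = 0;
};

struct TranspositionTable {
    std::vector<TTEntry> entries;
    uint64_t mask;

    explicit TranspositionTable(std::size_t size_pow2)
        : entries(size_pow2), mask(size_pow2 - 1) {}

    void store(uint64_t key, uint16_t move, int16_t score) {
        TTEntry& e = entries[key & mask];   // low bits choose the slot
        e.key = key; e.move = move; e.score = score;
    }

    // Returns nullptr unless the full key matches: the index bits alone
    // prove nothing, since every position in this slot shares them.
    const TTEntry* probe(uint64_t key) const {
        const TTEntry& e = entries[key & mask];
        return e.key == key ? &e : nullptr;
    }
};
```

A probe for a key that shares the slot's low bits but differs in the high bits misses cleanly instead of returning a foreign move.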

Andy.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Is 64 bits enough as a hash length

Post by Dann Corbit »

mathmoi wrote: (snippage)
I think my assumption is correct if the TT is full (as it is most of the time). Suppose I use the 24 lower bits as an index. These bits will _always_ match the hash of the position stored at that index. So the remaining 64-24 bits will make the difference between a collision or not.

Am I wrong?

MP
Right. Only one out of every 2^24 positions maps to a given slot, so the bits left over, the ones stored in the hash entry, only have to disambiguate that tiny subset.
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Is 64 bits enough as a hash length

Post by Tord Romstad »

mathmoi wrote: I read your paper about the effect of hash collisions on the quality of the search. What I'm trying to avoid is not a bad evaluation caused by a collision; I'm trying to avoid a crash of my engine caused by an illegal move taken from the TT.
Why not simply test the TT move for legality before you play it, as virtually everybody else does? It is not a very expensive or complicated operation.
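Such a test can indeed be cheap. A minimal sketch, assuming a hypothetical 0x88 mailbox board (0 = empty, positive = white piece, negative = black); a real engine would also verify piece-specific move geometry:

```cpp
#include <cassert>

// Cheap sanity test for a move pulled from the TT, on a hypothetical
// 0x88 mailbox board. Rejects moves that could corrupt the board state;
// it deliberately does NOT prove full legality (checks, pins, geometry).
bool tt_move_plausible(const int board[128], int from, int to, bool white_to_move) {
    if ((from & 0x88) || (to & 0x88)) return false;   // off-board square
    int mover = board[from];
    if (mover == 0) return false;                      // no piece to move
    if (white_to_move ? mover < 0 : mover > 0) return false;  // not our piece
    int target = board[to];
    if (target != 0 && ((mover > 0) == (target > 0))) return false; // own-piece capture
    return true;   // plausible; engine-specific checks would follow here
}
```

Rejecting off-board squares and empty or wrong-colored from-squares already rules out the moves most likely to corrupt a piece list or board array.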

Tord
User avatar
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Is 64 bits enough as a hash length

Post by hgm »

mathmoi wrote: That means that if I don't check the validity of the move in the TT I'll probably crash about every 38 minutes.
Although I agree with the calculation up to here, at this point you seem to draw an unjustified conclusion:

The only thing you have shown is that you will have a key collision every 38 minutes (and if you probe 2 or 4 entries at a time, even every 19 or 9.5 minutes). So one might assume the hash move will be essentially a random one, not related to the position.

But will such a move crash your engine? Not necessarily! It depends on your engine design. In micro-Max, which has a 0x88 board and no piece list, any (FromSqr, ToSqr) combination can be harmlessly interpreted as a move, as long as both squares are on the board. (And they always are when the move was valid in some other position, since the board doesn't change size during the game.) It just moves the contents of FromSqr to ToSqr, remembering the old contents of ToSqr so that the position can be perfectly restored on UnMake(). It does not matter if FromSqr contains an own piece, an opponent's piece, or is empty.

In Joker, which has a piece list, most illegal moves are still executed perfectly. I made sure that the empty square has an encoding that is a valid piece number in the piece list, so that if an empty square is "moved", updating its position does not lead to memory corruption. (I needed this anyway, because non-captures are treated as captures of an empty square.)

The only illegal move that could crash Joker was a castling when there was a piece on the new location of the Rook, because UnMake() always restored that square to an empty one. But the probability that the hash move is a castling is not very large. I have now fixed that in the special code in MakeMove() and UnMake() that handles the Rook part of castling. At the expense of some overhead, of course, but this code is hardly ever executed: only a minor fraction of moves are castlings, and during most of the game castling is no longer possible anyway.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Is 64 bits enough as a hash length

Post by Dann Corbit »

Dann Corbit wrote: Only one out of every 2^24 positions maps to a given slot, so the bits left over, the ones stored in the hash entry, only have to disambiguate that tiny subset.
We also know (for certain) that the lower 24 bits of every position that goes into slot[x] match, by definition of our hash addressing. So those bits (I would think) are not very interesting nor worth storing.
jromang

Re: Is 64 bits enough as a hash length

Post by jromang »

bob wrote: assuming you have decent Zobrist random numbers...
What are decent numbers? I blindly use a Mersenne Twister random generator and have not checked the results; is there a special thing to do to select good Zobrist keys?