31 bit hash values. How often will it fail?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

31 bit hash values. How often will it fail?

Post by Carey »

First a little background.

As some of you know, I care much more about classic chess programs than modern ones. I'd much rather look at the source for Machack VI, Chess 3/4, etc. than Crafty, Fritz, Rybka, Fruit, etc. etc.

I am currently working on trying to get a classic chess program running. Sorry, I can't give the name. The author isn't ready to release it, so I can't say a lot about it. Annoying, but that's the way it is.

I can say that it is a 32 bit-ish program designed to run on mainframes.

It uses a 31 bit hash. The hash is done as a standard Zobrist style.

To keep the numbers simple, let's say that it searches 10,000 nodes per move. An average game is 50 moves by each side. (Also figure we will do capture searches and select moves to much deeper than other lines. Perhaps as far as 10 or 15 plies. So lots of chances for transpositions.)

How often will we get a hash collision / failure?

For that size tree, I'd think not very often. But the 'birthday paradox' thing might disagree with me.

If we were doing 1,000 nodes and 100,000 nodes per move, how would the numbers change?



The reason I'm asking is that we are having significant troubles getting the program to run properly. It is not running as well as it used to in tournaments.

Some of that could be due to a variety of things, including compiler differences and even the possibility this was an experimental buggy version rather than a tournament version. (Yeah... joys of dealing with classic chess programs. You take whatever version you can get...)

I was thinking that it could also be the hash table.

31 bits isn't much and I'm not sure the program can handle collisions gracefully or not.


The simplest solution would be to convert to 64 bit hash and see if it still fails.

Unfortunately, for a variety of programming issues, it wouldn't be easy. It can be done if I have to, but I'd rather be reasonably sure I have to before I make the effort.

So I thought I'd ask first and see what the math says...

Carey
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 31 bit hash values. How often will it fail?

Post by bob »

Carey wrote:First a little background.

As some of you know, I care much more about classic chess programs than modern ones. I'd much rather look at the source for Machack VI, Chess 3/4, etc. than Crafty, Fritz, Rybka, Fruit, etc. etc.

I am currently working on trying to get a classic chess program running. Sorry, I can't give the name. The author isn't ready to release it, so I can't say a lot about it. Annoying, but that's the way it is.

I can say that it is a 32 bit-ish program designed to run on mainframes.

It uses a 31 bit hash. The hash is done as a standard Zobrist style.

To keep the numbers simple, let's say that it searches 10,000 nodes per move. An average game is 50 moves by each side. (Also figure we will do capture searches and select moves to much deeper than other lines. Perhaps as far as 10 or 15 plies. So lots of chances for transpositions.)

How often will we get a hash collision / failure?

For that size tree, I'd think not very often. But the 'birthday paradox' thing might disagree with me.

If we were doing 1,000 nodes and 100,000 nodes per move, how would the numbers change?



The reason I'm asking is that we are having significant troubles getting the program to run properly. It is not running as well as it used to in tournaments.

Some of that could be due to a variety of things, including compiler differences and even the possibility this was an experimental buggy version rather than a tournament version. (Yeah... joys of dealing with classic chess programs. You take whatever version you can get...)

I was thinking that it could also be the hash table.

31 bits isn't much and I'm not sure the program can handle collisions gracefully or not.


The simplest solution would be to convert to 64 bit hash and see if it still fails.

Unfortunately, for a variety of programming issues, it wouldn't be easy. It can be done if I have to, but I'd rather be reasonably sure I have to before I make the effort.

So I thought I'd ask first and see what the math says...

Carey
You will get a significant number of collisions. But the good news is, they won't affect the final search enough to measure, so you can ignore that. I'm surprised anyone used 31 (31 and not 32?) bit signatures.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: 31 bit hash values. How often will it fail?

Post by sje »

Experience says that a 32 bit hash is too short for organizing even a modest sized opening book. Forty bits is a bare minimum here, and 48 bits should be used for reliability.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: 31 bit hash values. How often will it fail?

Post by sje »

Also, a note on terminology:

A collision occurs when two different hashes map to the same entry slot in a transposition table. Collisions happen all the time and are a necessary result of using a transposition table smaller than the number of possible different hashes.

A false positive match is what can cause big problems; this occurs when two different positions have the same hash. (It is impossible to get a false negative mismatch.)
Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Re: 31 bit hash values. How often will it fail?

Post by Carey »

bob wrote:You will get a significant number of collisions. But the good news is, they won't affect the final search enough to measure, so you can ignore that. I'm surprised anyone used 31 (31 and not 32?) bit signatures.
It might effect the search.

I don't pretend to understand the program (it's fairly complex) but he's not doing the trans table like we do these days. Can't say more.

I'm not sure how the flow goes, but I suspect that it's more than just simply a returned score. It might actually be able to effect the tree itself somehow. (As I said, the code is complicated. And there are no docs on what's going on.)

That's why I was wanting numbers. To see how often we can expect it for various tree sizes.

He said that he thought he always verified the hash collisions by checking some other info, but it doesn't look like this version does that.

(And I did happen to mention your work on collisions & hash failures etc. and that it's not as big a problem as normally expects. Provided the program can handle it without failure. Which we might be seeing.)


And yes, it's 31 bits. The program is 32-bit'ish. Meaning most of the program doesn't care whether it's 16 bit, 32 bit, or 36 bit.

Only the hash vales are 32 bit. But, if it's negative, he negates it so the sign bit is always positive. So technically 31 bits.

During the search, he sometimes negates it again, but it doesn't appear to be for color reasons. So basically it's a 31 bit hash value.

As to why he chost 31 bits... Can't say for sure. Possibly portability though. That way he could do 36 & 32 bit systems and maybe even odd systems that didn't use regular 32 bit binary signed numbers. Just a guess. I didn't ask.
Last edited by Carey on Sun Sep 07, 2008 8:23 pm, edited 1 time in total.
Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Re: 31 bit hash values. How often will it fail?

Post by Carey »

sje wrote:Experience says that a 32 bit hash is too short for organizing even a modest sized opening book. Forty bits is a bare minimum here, and 48 bits should be used for reliability.
The opening book works fine.

it's not a massive book, but it works okay.
Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Re: 31 bit hash values. How often will it fail?

Post by Carey »

sje wrote:Also, a note on terminology:

A collision occurs when two different hashes map to the same entry slot in a transposition table. Collisions happen all the time and are a necessary result of using a transposition table smaller than the number of possible different hashes.

A false positive match is what can cause big problems; this occurs when two different positions have the same hash. (It is impossible to get a false negative mismatch.)
I should have been a bit more careful with my wording.

However, the way things are programmed, it's not quite as inaccurate as what you suggest... :wink:

But yes, I should have worded it better. My fault. I don't write well and it just happened.
Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Re: 31 bit hash values. How often will it fail?

Post by Carey »

Carey wrote:And yes, it's 31 bits. The program is 32-bit'ish. Meaning most of the program doesn't care whether it's 16 bit, 32 bit, or 36 bit.

Only the hash vales are 32 bit. But, if it's negative, he negates it so the sign bit is always positive. So technically 31 bits.

During the search, he sometimes negates it again, but it doesn't appear to be for color reasons. So basically it's a 31 bit hash value.

As to why he chost 31 bits... Can't say for sure. Possibly portability though. That way he could do 36 & 32 bit systems and maybe even odd systems that didn't use regular 32 bit binary signed numbers. Just a guess. I didn't ask.

I don't know what the heck I was thinking of. They aren't 31 bit integers.

And it doesn't negate it if the hash is negative.

Both of those are absolutely looney and my only excuse is I screwed up. My brain froze or something.


The hash numbers are 30 bits.... Yeah, 30 bits. The highest one I see is 0x3F4C4A95, which is a 30 bit number.

It does negate the hash value during creation, but it is based on the color not whether the hash value itself is negative.

But basically, we are working with a mere 30 bits of hash....

And still, no I don't know why it's only 30 bits.


I think I'm going to expand this and make the changes to go to 64 bits just to see if this odd program might play differently.

I know Hyatt has run tests on modern programs, but this one is definetly not modern and does its own thing its own way.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 31 bit hash values. How often will it fail?

Post by bob »

Carey wrote:
Carey wrote:And yes, it's 31 bits. The program is 32-bit'ish. Meaning most of the program doesn't care whether it's 16 bit, 32 bit, or 36 bit.

Only the hash vales are 32 bit. But, if it's negative, he negates it so the sign bit is always positive. So technically 31 bits.

During the search, he sometimes negates it again, but it doesn't appear to be for color reasons. So basically it's a 31 bit hash value.

As to why he chost 31 bits... Can't say for sure. Possibly portability though. That way he could do 36 & 32 bit systems and maybe even odd systems that didn't use regular 32 bit binary signed numbers. Just a guess. I didn't ask.

I don't know what the heck I was thinking of. They aren't 31 bit integers.

And it doesn't negate it if the hash is negative.

Both of those are absolutely looney and my only excuse is I screwed up. My brain froze or something.


The hash numbers are 30 bits.... Yeah, 30 bits. The highest one I see is 0x3F4C4A95, which is a 30 bit number.

It does negate the hash value during creation, but it is based on the color not whether the hash value itself is negative.

But basically, we are working with a mere 30 bits of hash....

And still, no I don't know why it's only 30 bits.


I think I'm going to expand this and make the changes to go to 64 bits just to see if this odd program might play differently.

I know Hyatt has run tests on modern programs, but this one is definetly not modern and does its own thing its own way.
you haven't mentioned the "era" of the program, which would give insight into the search techniques used. Pre-1989 would mean no null-move or singular extensions, no reductions. Probably basic alpha/beta with a couple of types of extensions....
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: 31 bit hash values. How often will it fail?

Post by sje »

The old CDC 6000 series machines used 60 bit words, so one memory address could hold a pair of 30 bit hash values. Oh, and subtraction was all one's complement and only upper case letters were supported with the native six bit byte length. Gosh, I'm glad those days are over.

Numeric storage in some (er, most) Lisp systems uses several (2-8) bits per value to indicate type. For example, fast integers (i.e., FIXNUMs) in GNU clisp are limited to 24 bits.

Some old DEC models (e.g., pdp-10) used 36 bit operands.