Hashing based on move lists

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Hashing based on move lists

Post by lucasart »

Gusev wrote:
lucasart wrote:
ZirconiumX wrote:
mcostalba wrote:
ZirconiumX wrote: Please demonstrate this, since with my stupid eyes I cannot see anything.
For instance Ra1, Ba8 and Ra1, Bh1


To moderation: Please be so kind to consider taking action against the troll: for many days already he has been dirtying up the threads. Normally this kind of person disappears after a few days, but this one seems persistent, so please consider blocking the account.
OK, talking to Dmitri confirms it was XOR, so my mistake.

With XOR, not OR, this doesn't collide.

Matthew:out
XOR is better, but you will still have collisions. What you propose is exactly like Zobrist hashing, except that instead of generating the Zobrist keys at random, you use attack bitboards as the keys.

Whether this scheme collides less than Zobrist (assuming a proper random generator with good entropy and non-zero keys, at least) is not clear.

I can't think of a trivial example that makes this collide, but I didn't give it much thought. I'm sure it's possible. At least I see no reason why it would collide less than random Zobrist keys.
Zobrist merely tries to ensure that, for any pair of positions, we are as unlikely to experience a collision as possible. This is not a chess-specific approach. Intuitively, it is somewhat wasteful in its generality, as it is trying to take equal care of pairs of positions one or both of which is illegal/unreachable and pairs of legal reachable positions (the only ones of practical interest). I am trying to design chess-specific hashing "biased" so as to minimize collisions between legal reachable positions. In order to achieve that, pseudo-randomness has to be abandoned. Instead, "intelligent design" is needed, so to speak. I am not claiming that I have already achieved that. But each example that makes legal reachable positions collide can lead to improvement of the directly designed technique. In the meanwhile, bad luck shall remain bad luck, for it's inherent to the non-chess-specific Zobrist approach that employs pseudo-randomness.
Abstract arguments and trying to demonstrate points by words, mind control, or whatever you want, will not work. At least not here in this forum.

You need to demonstrate your affirmations. The same goes for Matthew, who has made many unfounded affirmations about this Zobrist method being clearly better, etc. The fact that a trivial counter-example cannot be found does not mean the method works. That kind of simplistic reasoning is not scientific. It's excusable in a 14-year-old, but you should know better.

We can discuss this endlessly, but it's a waste of time and space. So far, no empirical study has been done to demonstrate that your zobrist keys are better than random ones. Please do such a study and come back with empirical evidence. At least theoretically, there's no reason why your zobrist should be better than random ones.

I remember measuring the quality of random generators for the purpose of zobrist hashing as follows:
- reduce the number of bits until collisions are measurable
- and measure the frequency of collisions in realistic scenarios (like a bunch of perft, started from a large collection of starting positions)

The conclusion was that:
- Mersenne Twister and Bob Jenkins' 64-bit PRNG are equally good (properly seeded, obviously).
- LCGs, even "good" ones (i.e. with period 2^64 that pass many of the dieharder tests), are measurably worse.

I would not be surprised at all if your Zobrist keys fail in a similar way to an LCG under proper testing. An LCG is also resistant to trivial collision counter-examples, and it has the added benefit of passing many (though not all) statistical tests as a random generator.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
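For readers following along, here is a minimal sketch of the classic Zobrist scheme the thread keeps comparing against (illustrative names; side-to-move, castling and en-passant keys are omitted, and `rand()` stands in for whatever PRNG is under debate). The proposal in this thread amounts to replacing the `rand64()` values below with attack bitboards.

```c
#include <stdint.h>
#include <stdlib.h>

/* 12 piece types (6 per colour) x 64 squares. */
static uint64_t zobrist_keys[12][64];

/* Quick-and-dirty 64-bit values built from rand(); which PRNG to use
   here is exactly what the thread is arguing about. */
static uint64_t rand64(void) {
    uint64_t r = 0;
    for (int i = 0; i < 4; ++i)
        r = (r << 16) | (uint64_t)(rand() & 0xffff);
    return r;
}

void zobrist_init(void) {
    for (int p = 0; p < 12; ++p)
        for (int s = 0; s < 64; ++s)
            zobrist_keys[p][s] = rand64();
}

/* Incremental update: XOR is its own inverse, so a quiet move is two
   XORs (piece off the from-square, piece onto the to-square). */
uint64_t zobrist_move(uint64_t key, int piece, int from, int to) {
    return key ^ zobrist_keys[piece][from] ^ zobrist_keys[piece][to];
}
```

Because XOR is self-inverse, unmaking a move restores the original key exactly; that is the property the earlier OR-based variant lacked.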
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

You likely already have some 128 bit keys on your system

Post by sje »

You likely already have some 128 bit keys on your system:

http://en.wikipedia.org/wiki/Universall ... identifier

A UUID is 128 bits long, and for good reason: the improbability of accidental duplication (false positives). A pseudorandom UUID has only 122 random bits, with the remaining six bits going to field selector values.

Command line calls
On Linux: dbus-uuidgen
On Mac OS/X: uuidgen
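As a sketch of where those 122 bits come from: an RFC 4122 version-4 UUID is 16 random bytes with four version bits and two variant bits stamped on top (illustrative C, with `rand()` as a stand-in for a real entropy source).

```c
#include <stdint.h>
#include <stdlib.h>

/* Fill 16 bytes with (pseudo)random data, then stamp the RFC 4122
   fields: 4 version bits (0100 = "random") and 2 variant bits (10).
   That is why only 122 of the 128 bits are actually random. */
void uuid4(uint8_t out[16]) {
    for (int i = 0; i < 16; ++i)
        out[i] = (uint8_t)(rand() & 0xff);
    out[6] = (out[6] & 0x0f) | 0x40;  /* version nibble = 4 */
    out[8] = (out[8] & 0x3f) | 0x80;  /* variant bits = 10xxxxxx */
}
```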
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: A brief comment on using 128 bit Zobrist hashing

Post by jdart »

In fact some engines use 32, like Stockfish.
32 bit hash codes were insufficient, even 10 years ago.

I think Stockfish stores only 32 bits of the hash in the hash table, but the other bits are used to compute which part of the hash table to probe, so the effective hash code size is larger.

--Jon
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: A brief comment on using 128 bit Zobrist hashing

Post by michiguel »

jdart wrote:
In fact some engines use 32, like Stockfish.
32 bit hash codes were insufficient, even 10 years ago.

I think Stockfish stores only 32 bits of the hash in the hash table, but the other bits are used to compute which part of the hash table to probe, so the effective hash code size is larger.

--Jon
If the most significant 32 bits are stored, and the least significant bits are used to address the entry, you get "extra bits". For 1 GB of hash memory with 16-byte entries, there are 2^26 entries, so the 26 least significant bits participate in the addressing. The effective number of bits used is then 32 + 26 = 58.

Using the least significant bits to calculate the address and storing them at the same time is somewhat wasteful.

Miguel
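Miguel's arithmetic can be sketched as the usual key split (illustrative names, assuming a power-of-two number of entries):

```c
#include <stdint.h>

/* Split one 64-bit key: the low bits address the table, the high 32
   bits are stored in the entry as the signature. With 1 GiB of
   16-byte entries there are 2^26 entries, so 26 + 32 = 58 bits of
   the key effectively participate. */
#define ENTRY_SIZE 16

unsigned index_bits(uint64_t table_bytes) {
    uint64_t entries = table_bytes / ENTRY_SIZE;
    unsigned bits = 0;
    while ((entries >>= 1) != 0)   /* log2 of the entry count */
        ++bits;
    return bits;
}

uint64_t table_index(uint64_t key, unsigned bits) {
    return key & ((1ULL << bits) - 1);   /* least significant bits */
}

uint32_t stored_signature(uint64_t key) {
    return (uint32_t)(key >> 32);        /* most significant 32 bits */
}
```

With these definitions the bits in between (here bits 26..31) are neither stored nor used for addressing, which is the slack Miguel's point about wastefulness refers to.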
Gusev
Posts: 1476
Joined: Mon Jan 28, 2013 2:51 pm

Re: Hashing based on move lists

Post by Gusev »

lucasart wrote:
Gusev wrote:
lucasart wrote:
ZirconiumX wrote:
mcostalba wrote:
ZirconiumX wrote: Please demonstrate this, since with my stupid eyes I cannot see anything.
For instance Ra1, Ba8 and Ra1, Bh1


To moderation: Please be so kind to consider taking action against the troll: for many days already he has been dirtying up the threads. Normally this kind of person disappears after a few days, but this one seems persistent, so please consider blocking the account.
OK, talking to Dmitri confirms it was XOR, so my mistake.

With XOR, not OR, this doesn't collide.

Matthew:out
XOR is better, but you will still have collisions. What you propose is exactly like Zobrist hashing, except that instead of generating the Zobrist keys at random, you use attack bitboards as the keys.

Whether this scheme collides less than Zobrist (assuming a proper random generator with good entropy and non-zero keys, at least) is not clear.

I can't think of a trivial example that makes this collide, but I didn't give it much thought. I'm sure it's possible. At least I see no reason why it would collide less than random Zobrist keys.
Zobrist merely tries to ensure that, for any pair of positions, we are as unlikely to experience a collision as possible. This is not a chess-specific approach. Intuitively, it is somewhat wasteful in its generality, as it is trying to take equal care of pairs of positions one or both of which is illegal/unreachable and pairs of legal reachable positions (the only ones of practical interest). I am trying to design chess-specific hashing "biased" so as to minimize collisions between legal reachable positions. In order to achieve that, pseudo-randomness has to be abandoned. Instead, "intelligent design" is needed, so to speak. I am not claiming that I have already achieved that. But each example that makes legal reachable positions collide can lead to improvement of the directly designed technique. In the meanwhile, bad luck shall remain bad luck, for it's inherent to the non-chess-specific Zobrist approach that employs pseudo-randomness.
Abstract arguments and trying to demonstrate points by words, mind control, or whatever you want, will not work. At least not here in this forum.

You need to demonstrate your affirmations. The same goes for Matthew, who has made many unfounded affirmations about this Zobrist method being clearly better, etc. The fact that a trivial counter-example cannot be found does not mean the method works. That kind of simplistic reasoning is not scientific. It's excusable in a 14-year-old, but you should know better.

We can discuss this endlessly, but it's a waste of time and space. So far, no empirical study has been done to demonstrate that your zobrist keys are better than random ones. Please do such a study and come back with empirical evidence. At least theoretically, there's no reason why your zobrist should be better than random ones.

I remember measuring the quality of random generators for the purpose of zobrist hashing as follows:
- reduce the number of bits until collisions are measurable
- and measure the frequency of collisions in realistic scenarios (like a bunch of perft, started from a large collection of starting positions)

The conclusion was that:
- Mersenne Twister and Bob Jenkins' 64-bit PRNG are equally good (properly seeded, obviously).
- LCGs, even "good" ones (i.e. with period 2^64 that pass many of the dieharder tests), are measurably worse.

I would not be surprised at all if your Zobrist keys fail in a similar way to an LCG under proper testing. An LCG is also resistant to trivial collision counter-examples, and it has the added benefit of passing many (though not all) statistical tests as a random generator.
First you stated that you saw no reason why it would collide less than random Zobrist. I responded to that: because pseudo-random keys are generated in a way that does not take into account the specifics of the game of chess. The traditional approach is simply too general to be optimal for the specific task. Intuitively, we should be able to do better.

Now you're asking to demonstrate that chess-specific hashing can in fact be designed to collide less. That's the next step. By "trying to demonstrate points by words", I merely meant to help you see the reason that you didn't see before. The words were not offered as positive proof of an actual achievement. The opposite is true: This idea clearly needs some refinement in order for it to work properly.
Gusev
Posts: 1476
Joined: Mon Jan 28, 2013 2:51 pm

Re: Hashing based on move lists

Post by Gusev »

syzygy wrote:
Gusev wrote:Zobrist merely tries to ensure that, for any pair of positions, we are as unlikely to experience a collision as possible. This is not a chess-specific approach. Intuitively, it is somewhat wasteful in its generality, as it is trying to take equal care of pairs of positions one or both of which is illegal/unreachable and pairs of legal reachable positions (the only ones of practical interest). I am trying to design chess-specific hashing "biased" so as to minimize collisions between legal reachable positions. In order to achieve that, pseudo-randomness has to be abandoned. Instead, "intelligent design" is needed, so to speak. I am not claiming that I have already achieved that. But each example that makes legal reachable positions collide can lead to improvement of the directly designed technique. In the meanwhile, bad luck shall remain bad luck, for it's inherent to the non-chess-specific Zobrist approach that employs pseudo-randomness.
I'm afraid you're trying to solve a problem that simply does not exist. Zobrist hashing works very well and certainly well enough that nothing can be gained by further minimising collisions. Any alternative scheme that is less efficient (in terms of cpu cycles and memory usage) is a sure loss, and it is difficult to see how a hashing scheme could be more efficient.
"Nothing can be gained": that depends on how we measure the gain. In terms of Elo points, the gain may very well be minuscule. In terms of the number of collisions, it may be possible to achieve something. I don't see why chess-specific hashing would be less efficient: the contributions of individual pieces are computed once, at initialization. After that, they are used as before.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: Hashing based on move lists

Post by mvk »

lucasart wrote:I remember measuring the quality of random generators for the purpose of zobrist hashing as follows:
- reduce the number of bits until collisions are measurable
- and measure the frequency of collisions in realistic scenarios (like a bunch of perft, started from a large collection of starting positions)

The conclusion was that:
- Mersenne Twister and Bob Jenkins' 64-bit PRNG are equally good (properly seeded, obviously).
- LCGs, even "good" ones (i.e. with period 2^64 that pass many of the dieharder tests), are measurably worse.
That is interesting. Do you by any chance also remember references to those measurements showing that LCGs make poor Zobrist keys?
I have always believed that the Mersenne Twister application for chess Zobrist hash codes was a bit of a fad, so I would like to see my belief corrected if it is wrong.

PS: I'm using Park and Miller's generator to produce individual bytes of the table.
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hashing based on move lists

Post by hgm »

Indeed, it may be possible to gain something in terms of collisions, and that gain might actually be translated into a measurable Elo gain by reducing the size of the signature stored in the hash table, and thus cramming more entries in the same amount of memory.

But such a gain is very unlikely to be found by someone who obviously doesn't have the slightest idea what he is doing, dreaming up stuff behind his desk without making even the simplest tests on it, so that even the most disastrously flawed schemes make it up to these pages, and still are defended by mindless blabbering after already having been exposed as utter crap...
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Hashing based on move lists

Post by Henk »

hgm wrote: But such a gain is very unlikely to be found by someone who obviously doesn't have the slightest idea what he is doing, dreaming up stuff behind his desk without making even the simplest tests on it, so that even the most disastrously flawed schemes make it up to these pages, and still are defended by mindless blabbering after already having been exposed as utter crap...
This website cannot be used for brainstorming sessions, am I right?

One also cannot expect serious articles on chess research, because who can afford that? Organizations, millionaires?
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Hashing based on move lists

Post by lucasart »

mvk wrote:
lucasart wrote:I remember measuring the quality of random generators for the purpose of zobrist hashing as follows:
- reduce the number of bits until collisions are measurable
- and measure the frequency of collisions in realistic scenarios (like a bunch of perft, started from a large collection of starting positions)

The conclusion was that:
- Mersenne Twister and Bob Jenkins' 64-bit PRNG are equally good (properly seeded, obviously).
- LCGs, even "good" ones (i.e. with period 2^64 that pass many of the dieharder tests), are measurably worse.
That is interesting. Do you by any chance also remember references to those measurements showing that LCGs make poor Zobrist keys?
I have always believed that the Mersenne Twister application for chess Zobrist hash codes was a bit of a fad, so I would like to see my belief corrected if it is wrong.

PS: I'm using Park and Miller's generator to produce individual bytes of the table.
Unfortunately I didn't keep the data or the code (I rewrote my engine a while back). But as I remember, the results were not like day and night either. Perhaps it was something like: a 32-bit Zobrist key with an LCG collides about as often as a 28-bit key with a good PRNG, or something like that.

Regarding which PRNG to use, I think there are lots of myths flying around. One of them is "cryptographically secure": cryptographic security (an often misunderstood concept) has nothing to do with how good a generator is for statistics (as a general-purpose PRNG) or for Zobrist hashing.

Mersenne Twister is slow and very complicated. Besides, it has an important flaw: it takes a long time to recover from weak states, so seeding it properly is important and tricky. I can't think of any application where it makes sense to use the Mersenne Twister, but somehow it's popular. It seems most people think: if it's that complicated and ugly, it must be complicated and ugly for a reason. And the myth endures...

I don't know of any statistical test, or Zobrist-related property, that Bob Jenkins' small PRNG doesn't pass. It's simple, fast, and a no-brainer to seed. What more do we need?

PS: What 'n' do you use in the Lehmer generator?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
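For reference, the generator praised above, Bob Jenkins' small noncryptographic PRNG in its 64-bit variant, fits in a few lines of C:

```c
#include <stdint.h>

/* Bob Jenkins' small noncryptographic PRNG, 64-bit variant
   (rotation constants 7, 13, 37 per Jenkins' published version). */
typedef struct { uint64_t a, b, c, d; } ranctx;

#define rot64(x, k) (((x) << (k)) | ((x) >> (64 - (k))))

uint64_t ranval(ranctx *x) {
    uint64_t e = x->a - rot64(x->b, 7);
    x->a = x->b ^ rot64(x->c, 13);
    x->b = x->c + rot64(x->d, 37);
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

/* Seeding really is a no-brainer: set the state from the seed, then
   discard 20 outputs to mix it in. */
void raninit(ranctx *x, uint64_t seed) {
    x->a = 0xf1ea5eed;
    x->b = x->c = x->d = seed;
    for (int i = 0; i < 20; ++i)
        (void)ranval(x);
}
```

Filling a Zobrist key table is then one `raninit` followed by one `ranval` per key.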