Beginning hashtable question.

Fguy64 · Post by **Fguy64** » Mon Oct 19, 2009 3:56 pm

OK, I was reading through Bruce Moreland's page on Hash Tables...
http://web.archive.org/web/200404200216 ... ng/hashing.

and there is a comment near the top of the page That left me scratchin my head.

The hash array is indexed via a Zobrist key. You take the key for the position, modulo it by the number of elements in your table, and that's the hash element that corresponds to this position. Since many positions are apt to map to the same element in the hash table, the table elements contain a verification value, which can be used to make sure that the element in the table is the one that you are trying to find. An obvious verification value is the full 64-bit key, so that is the first field in the example above.

Note the sentences I have bolded. I feel like saying well yeah, that is why we are doing the zobrist thing. It's not at all clear to my why he would even bother to say all of this, since it seems to me to be obvious from the beginning that we use a zobrist key precisely to avoid "many positions ... mapping to the same hash element". It makes me think that I misunderstand what is meant by that phrase.

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Mon Oct 19, 2009 4:12 pm

Imagine a hashtable with 1 entry.

You do

hashkey % tablesize = tableindex.
entry = table[tableindex]
if (entry.hashkey == myhashkey) blabla

All positions map to the same index. But only 1/2^64 positions will map to the same hashkey.

Fguy64 · Post by **Fguy64** » Mon Oct 19, 2009 4:27 pm

Gian-Carlo Pascutto wrote:Imagine a hashtable with 1 entry.

You do

hashkey % tablesize = tableindex.
entry = table[tableindex]
if (entry.hashkey == myhashkey) blabla

All positions map to the same index. But only 1/2^64 positions will map to the same hashkey.

Ok that seems clear enough. But what isn't clear is why we are even talking about mapping to an array index as opposed to searching on a key. I feel like I am missing something fundamental here. And it's not clear why we modulo the position key with the number of elements in the table anyway, since that doen't add any information about the position, and the position key is supposed to be unique anyway, with the exception of 1/2^64 % of the cases, which I am not concerned about anyway.

plattyaj · Post by **plattyaj** » Mon Oct 19, 2009 5:43 pm

Fguy64 wrote:Ok that seems clear enough. But what isn't clear is why we are even talking about mapping to an array index as opposed to searching on a key. I feel like I am missing something fundamental here. And it's not clear why we modulo the position key with the number of elements in the table anyway, since that doen't add any information about the position, and the position key is supposed to be unique anyway, with the exception of 1/2^64 % of the cases, which I am not concerned about anyway.

That's just the logic to put the entry into the hash itself. For Java, that mostly lives in the HashTable class. Whether you choose to use that class or write something based around an array indexed as above is a decision you'll have to make. I suspect the performance of the HashTable class will prove to be a big overhead but I know that right now you are not really after maximal performance anyway.

Andy.

bob · Post by **bob** » Mon Oct 19, 2009 6:11 pm

Fguy64 wrote:
Gian-Carlo Pascutto wrote:Imagine a hashtable with 1 entry.

You do

hashkey % tablesize = tableindex.
entry = table[tableindex]
if (entry.hashkey == myhashkey) blabla

All positions map to the same index. But only 1/2^64 positions will map to the same hashkey.
Ok that seems clear enough. But what isn't clear is why we are even talking about mapping to an array index as opposed to searching on a key. I feel like I am missing something fundamental here. And it's not clear why we modulo the position key with the number of elements in the table anyway, since that doen't add any information about the position, and the position key is supposed to be unique anyway, with the exception of 1/2^64 % of the cases, which I am not concerned about anyway.

You can't "search for a key". It would take impossibly long to do this once for each node. You take the low order bits (hence the term hash table) of the board signature (which is a representation of the chess board) to index to a specific entry (or perhaps to a small set (bucket) of entries) but you still have to be sure that this "entry" (or entries) goes with the current position, by matching the signatures (also called the "lock" or "check" bits).

Sven · Post by **Sven** » Mon Oct 19, 2009 6:29 pm

Fguy64 wrote:
Gian-Carlo Pascutto wrote:Imagine a hashtable with 1 entry.

You do

hashkey % tablesize = tableindex.
entry = table[tableindex]
if (entry.hashkey == myhashkey) blabla

All positions map to the same index. But only 1/2^64 positions will map to the same hashkey.
Ok that seems clear enough. But what isn't clear is why we are even talking about mapping to an array index as opposed to searching on a key. I feel like I am missing something fundamental here. And it's not clear why we modulo the position key with the number of elements in the table anyway, since that doen't add any information about the position, and the position key is supposed to be unique anyway, with the exception of 1/2^64 % of the cases, which I am not concerned about anyway.

A hash table is not organized like a map or another dynamic data structure, this would be very inefficient. I guess your problem may be that you expect accessing the hash table to be something where the entry must be searched somehow, for instance via binary search, Bayer tree or whatever.

Actually you don't have to search, the entry is just there, you just need its array index ... The only thing you do is that you take only part of the 64 bit key, and here the "modulo 2^xx" is usually most efficient (in C/C++ this maps to one shift operation). You cannot take the whole 64 bit key as table (array) index, for obvious reasons. Even if RAM for 2^64 hash table entries were available on a sunny day in the future, a *very* huge portion of that table would remain unused during the game - consider the relatively small number of positions visited during all searching within one chess game. No need to talk about that, I hope.

Now there are 2^(64-xx) hash keys each being mapped to the same table index due to the "modulo". So when looking up the hash table with a given 64 bit key K, you need to confirm that the data which is stored at index K%(2^xx) really belongs to K and not to another hash key with the same xx lower bits. This can be done by storing the upper (64-xx) bits as part of the hash entry and use that part for verification. (You can even store the whole 64 bits. Some engines seem to store less than (64-xx).) In case of a mismatch, this is not a "collision" but simply the situation that your hash table currently has no data for the hash key K.

However, if the verification bits do match then you can be pretty sure (although not 100.000000%) that the data stored in that hash entry belong to the position you are looking up. A collision is the highly unlikely case that two different chess positions have the same 64 bit hash key *and* one of them is the current position and the other one stored in the table. Why Zobrist keys make this unlikely, and how you could or should deal with this case, is another long story, much has been written about it and I skip it here.

Sven

Fguy64 · Post by **Fguy64** » Mon Oct 19, 2009 7:58 pm

Sven Schüle wrote:
Fguy64 wrote:
Gian-Carlo Pascutto wrote:Imagine a hashtable with 1 entry.

You do

hashkey % tablesize = tableindex.
entry = table[tableindex]
if (entry.hashkey == myhashkey) blabla

All positions map to the same index. But only 1/2^64 positions will map to the same hashkey.
Ok that seems clear enough. But what isn't clear is why we are even talking about mapping to an array index as opposed to searching on a key. I feel like I am missing something fundamental here. And it's not clear why we modulo the position key with the number of elements in the table anyway, since that doen't add any information about the position, and the position key is supposed to be unique anyway, with the exception of 1/2^64 % of the cases, which I am not concerned about anyway.
A hash table is not organized like a map or another dynamic data structure, this would be very inefficient. I guess your problem may be that you expect accessing the hash table to be something where the entry must be searched somehow, for instance via binary search, Bayer tree or whatever.

Actually you don't have to search, the entry is just there, you just need its array index ... The only thing you do is that you take only part of the 64 bit key, and here the "modulo 2^xx" is usually most efficient ...

Sven

Indeed, that was the case, ok well, not sure how I could have missed such an important point, but on we go, I guess I have a little more research to do before I try an implementation.

Fred.

Fguy64 · Post by **Fguy64** » Mon Feb 08, 2010 9:57 pm

Greg Strong wrote:This works but it's expensive to loop through the 64 squares xor-ing in appropriate hashes. The beauty of the zobrist hash scheme is that you can maintain it incrementally (at least the piece-on-square portion.) When you move a piece off a square, xor that piece-square-key to remove it. When you place a piece on a square, xor that new piece-square-key. Then, when you need your actual position hashcode, just take the incrementally-maintained key and xor in the appropriate side-to-move, castling rights, and ep square keys.

You can also test this scheme by performing your original loop-through-squares approach and compare that result to make sure that your incrementially maintained key approach is correct.

Time to take another run at hash tables.

Anyways, the bolded remark suggests to me the following, which is useful for understanding if true: Given any key A, if you XOR key B twice you get back key A. So repeatedly XORing the same key into a position key is akin to repeatedly moving a piece off and back onto a square?

Greg Strong · Post by **Greg Strong** » Mon Feb 08, 2010 11:26 pm

Correct.

Michel · Post by **Michel** » Tue Feb 09, 2010 7:51 am

Of course, this is the very nature of hashing: once you hash your position to the zobrist number then there is no way to get the position back.

I am interested if this statement is really true. A chess position contains a lot of redundancy so it seems not implausible that one might be able to revert a Zobrist hash.

Beginning hashtable question.

Re: Beginning hashtable question.

Re: Beginning hashtable question.

Re: Beginning hashtable question.

Re: Beginning hashtable question.

Re: Beginning hashtable question.

Re: Beginning hashtable question.

Re: Beginning hashtable question.

Re: Beginning hashtable question.

Re: Beginning hashtable question.

Re: Beginning hashtable question.