Crafty Transpostion Table Question

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Eric Stock

Crafty Transpostion Table Question

Post by Eric Stock »

Hi I am just curious why Crafty XORs its 64 bit hash key with its 64 bits of hash information (score, type, depth...etc) before it stores a hash entry. Why doesn't it just store the 64 bit hash the way it is? By the way, this code is from Crafty23.0.

The line of code I am interested in is

htable->word2 = word2 ^ word1;

There are two occurrences of this line in the last 5 lines of the code I copied


Thanks,
Eric Stock


Code: Select all

void HashStore(TREE * RESTRICT tree, int ply, int depth, int wtm, int type,
    int value, int bestmove) {
  register BITBOARD word1, word2;
  register HASH_ENTRY *htable;
  register int draft, age, hwhich;

/*
 ************************************************************
 *                                                          *
 *   "Fill in the blank" and build a table entry from       *
 *   current search information.                            *
 *                                                          *
 ************************************************************
 */
  word1 = transposition_id;
  word1 = &#40;word1 << 4&#41; | type;
  if &#40;value > MATE - 300&#41;
    value += ply - 1;
  else if &#40;value < -MATE + 300&#41;
    value -= ply - 1;
  word1 = &#40;word1 << 25&#41; | bestmove;
  word1 = &#40;word1 << 15&#41; | depth;
  word1 = &#40;word1 << 17&#41; | &#40;value + 65536&#41;;
  word2 = &#40;wtm&#41; ? HashKey &#58; ~HashKey;
/*
 ************************************************************
 *                                                          *
 *   If the draft of this entry is greater than the draft   *
 *   of the entry in the "depth-priority" table, or if the  *
 *   entry in the depth-priority table is from an old       *
 *   search, move that entry to the always-store table and  *
 *   then replace the depth-priority table entry by the new *
 *   hash result.                                           *
 *                                                          *
 ************************************************************
 */
  htable = trans_ref + 3 * &#40;word2 & hash_mask&#41;;
  draft = &#40;htable->word1 >> 17&#41; & 0x7fff;
  age = htable->word1 >> 61;
  if &#40;age != transposition_id || &#40;depth >= draft&#41;) &#123;
    if &#40;word2 != &#40;htable->word2 ^ htable->word1&#41;) &#123;
      hwhich = ((&#40;htable->word2 ^ htable->word1&#41; >> log_hash&#41; & 1&#41; + 1;
      &#40;htable + hwhich&#41;->word1 = htable->word1;
      &#40;htable + hwhich&#41;->word2 = htable->word2;
    &#125;
    htable->word1 = word1;
    htable->word2 = word2 ^ word1;
  &#125; else &#123;
    hwhich = (&#40;word2 >> log_hash&#41; & 1&#41; + 1;
    &#40;htable + hwhich&#41;->word1 = word1;
    &#40;htable + hwhich&#41;->word2 = word2 ^ word1;
  &#125;
&#125;
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty Transpostion Table Question

Post by bob »

Eric Stock wrote:Hi I am just curious why Crafty XORs its 64 bit hash key with its 64 bits of hash information (score, type, depth...etc) before it stores a hash entry. Why doesn't it just store the 64 bit hash the way it is? By the way, this code is from Crafty23.0.

The line of code I am interested in is

htable->word2 = word2 ^ word1;

There are two occurrences of this line in the last 5 lines of the code I copied


Thanks,
Eric Stock


Code: Select all

void HashStore&#40;TREE * RESTRICT tree, int ply, int depth, int wtm, int type,
    int value, int bestmove&#41; &#123;
  register BITBOARD word1, word2;
  register HASH_ENTRY *htable;
  register int draft, age, hwhich;

/*
 ************************************************************
 *                                                          *
 *   "Fill in the blank" and build a table entry from       *
 *   current search information.                            *
 *                                                          *
 ************************************************************
 */
  word1 = transposition_id;
  word1 = &#40;word1 << 4&#41; | type;
  if &#40;value > MATE - 300&#41;
    value += ply - 1;
  else if &#40;value < -MATE + 300&#41;
    value -= ply - 1;
  word1 = &#40;word1 << 25&#41; | bestmove;
  word1 = &#40;word1 << 15&#41; | depth;
  word1 = &#40;word1 << 17&#41; | &#40;value + 65536&#41;;
  word2 = &#40;wtm&#41; ? HashKey &#58; ~HashKey;
/*
 ************************************************************
 *                                                          *
 *   If the draft of this entry is greater than the draft   *
 *   of the entry in the "depth-priority" table, or if the  *
 *   entry in the depth-priority table is from an old       *
 *   search, move that entry to the always-store table and  *
 *   then replace the depth-priority table entry by the new *
 *   hash result.                                           *
 *                                                          *
 ************************************************************
 */
  htable = trans_ref + 3 * &#40;word2 & hash_mask&#41;;
  draft = &#40;htable->word1 >> 17&#41; & 0x7fff;
  age = htable->word1 >> 61;
  if &#40;age != transposition_id || &#40;depth >= draft&#41;) &#123;
    if &#40;word2 != &#40;htable->word2 ^ htable->word1&#41;) &#123;
      hwhich = ((&#40;htable->word2 ^ htable->word1&#41; >> log_hash&#41; & 1&#41; + 1;
      &#40;htable + hwhich&#41;->word1 = htable->word1;
      &#40;htable + hwhich&#41;->word2 = htable->word2;
    &#125;
    htable->word1 = word1;
    htable->word2 = word2 ^ word1;
  &#125; else &#123;
    hwhich = (&#40;word2 >> log_hash&#41; & 1&#41; + 1;
    &#40;htable + hwhich&#41;->word1 = word1;
    &#40;htable + hwhich&#41;->word2 = word2 ^ word1;
  &#125;
&#125;
SImple. This is the "lockless hashing" idea for parallel search.

The issue is that different threads can store into the same hash table address at approximately the same time. There are two 64 bit values, the signature (A) and the score/etc (B). If two processors try to store at the same time, you want to get eihere A1,B1 or A2,B2 stored in the table, but due to timing you can get A1,B2 or A2, B1 which is a problem.

If, when you store A1, B1, you xor them to modify the stored signature, and when you probe that entry, you xor again to recover the original signature, you eliminate the possibility of getting a match on A1, but getting data B2 which would be wrong.

The lockless hashing paper is on my web page at www.cis.uab.edu, where you click on faculty then my name...

If you don't do this, you have to use a normal Lock()/Unlock() which is way excessive overhead due to all the hash storing and probing going on.
Eric Stock

Re: Crafty Transpostion Table Question

Post by Eric Stock »

Excellent..thank you. I'll check out the paper.

Eric
CThinker
Posts: 388
Joined: Wed Mar 08, 2006 10:08 pm

Re: Crafty Transpostion Table Question

Post by CThinker »

bob wrote:
Eric Stock wrote:Hi I am just curious why Crafty XORs its 64 bit hash key with its 64 bits of hash information (score, type, depth...etc) before it stores a hash entry. Why doesn't it just store the 64 bit hash the way it is? By the way, this code is from Crafty23.0.

The line of code I am interested in is

htable->word2 = word2 ^ word1;

There are two occurrences of this line in the last 5 lines of the code I copied


Thanks,
Eric Stock


Code: Select all

void HashStore&#40;TREE * RESTRICT tree, int ply, int depth, int wtm, int type,
    int value, int bestmove&#41; &#123;
  register BITBOARD word1, word2;
  register HASH_ENTRY *htable;
  register int draft, age, hwhich;

/*
 ************************************************************
 *                                                          *
 *   "Fill in the blank" and build a table entry from       *
 *   current search information.                            *
 *                                                          *
 ************************************************************
 */
  word1 = transposition_id;
  word1 = &#40;word1 << 4&#41; | type;
  if &#40;value > MATE - 300&#41;
    value += ply - 1;
  else if &#40;value < -MATE + 300&#41;
    value -= ply - 1;
  word1 = &#40;word1 << 25&#41; | bestmove;
  word1 = &#40;word1 << 15&#41; | depth;
  word1 = &#40;word1 << 17&#41; | &#40;value + 65536&#41;;
  word2 = &#40;wtm&#41; ? HashKey &#58; ~HashKey;
/*
 ************************************************************
 *                                                          *
 *   If the draft of this entry is greater than the draft   *
 *   of the entry in the "depth-priority" table, or if the  *
 *   entry in the depth-priority table is from an old       *
 *   search, move that entry to the always-store table and  *
 *   then replace the depth-priority table entry by the new *
 *   hash result.                                           *
 *                                                          *
 ************************************************************
 */
  htable = trans_ref + 3 * &#40;word2 & hash_mask&#41;;
  draft = &#40;htable->word1 >> 17&#41; & 0x7fff;
  age = htable->word1 >> 61;
  if &#40;age != transposition_id || &#40;depth >= draft&#41;) &#123;
    if &#40;word2 != &#40;htable->word2 ^ htable->word1&#41;) &#123;
      hwhich = ((&#40;htable->word2 ^ htable->word1&#41; >> log_hash&#41; & 1&#41; + 1;
      &#40;htable + hwhich&#41;->word1 = htable->word1;
      &#40;htable + hwhich&#41;->word2 = htable->word2;
    &#125;
    htable->word1 = word1;
    htable->word2 = word2 ^ word1;
  &#125; else &#123;
    hwhich = (&#40;word2 >> log_hash&#41; & 1&#41; + 1;
    &#40;htable + hwhich&#41;->word1 = word1;
    &#40;htable + hwhich&#41;->word2 = word2 ^ word1;
  &#125;
&#125;
SImple. This is the "lockless hashing" idea for parallel search.

The issue is that different threads can store into the same hash table address at approximately the same time. There are two 64 bit values, the signature (A) and the score/etc (B). If two processors try to store at the same time, you want to get eihere A1,B1 or A2,B2 stored in the table, but due to timing you can get A1,B2 or A2, B1 which is a problem.

If, when you store A1, B1, you xor them to modify the stored signature, and when you probe that entry, you xor again to recover the original signature, you eliminate the possibility of getting a match on A1, but getting data B2 which would be wrong.

The lockless hashing paper is on my web page at www.cis.uab.edu, where you click on faculty then my name...

If you don't do this, you have to use a normal Lock()/Unlock() which is way excessive overhead due to all the hash storing and probing going on.
Bob, I thought you got rid of hash lock code? Did you not come up with another findings saying that the result of hash errors is insignificant?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty Transpostion Table Question

Post by bob »

CThinker wrote:
bob wrote:
Eric Stock wrote:Hi I am just curious why Crafty XORs its 64 bit hash key with its 64 bits of hash information (score, type, depth...etc) before it stores a hash entry. Why doesn't it just store the 64 bit hash the way it is? By the way, this code is from Crafty23.0.

The line of code I am interested in is

htable->word2 = word2 ^ word1;

There are two occurrences of this line in the last 5 lines of the code I copied


Thanks,
Eric Stock


Code: Select all

void HashStore&#40;TREE * RESTRICT tree, int ply, int depth, int wtm, int type,
    int value, int bestmove&#41; &#123;
  register BITBOARD word1, word2;
  register HASH_ENTRY *htable;
  register int draft, age, hwhich;

/*
 ************************************************************
 *                                                          *
 *   "Fill in the blank" and build a table entry from       *
 *   current search information.                            *
 *                                                          *
 ************************************************************
 */
  word1 = transposition_id;
  word1 = &#40;word1 << 4&#41; | type;
  if &#40;value > MATE - 300&#41;
    value += ply - 1;
  else if &#40;value < -MATE + 300&#41;
    value -= ply - 1;
  word1 = &#40;word1 << 25&#41; | bestmove;
  word1 = &#40;word1 << 15&#41; | depth;
  word1 = &#40;word1 << 17&#41; | &#40;value + 65536&#41;;
  word2 = &#40;wtm&#41; ? HashKey &#58; ~HashKey;
/*
 ************************************************************
 *                                                          *
 *   If the draft of this entry is greater than the draft   *
 *   of the entry in the "depth-priority" table, or if the  *
 *   entry in the depth-priority table is from an old       *
 *   search, move that entry to the always-store table and  *
 *   then replace the depth-priority table entry by the new *
 *   hash result.                                           *
 *                                                          *
 ************************************************************
 */
  htable = trans_ref + 3 * &#40;word2 & hash_mask&#41;;
  draft = &#40;htable->word1 >> 17&#41; & 0x7fff;
  age = htable->word1 >> 61;
  if &#40;age != transposition_id || &#40;depth >= draft&#41;) &#123;
    if &#40;word2 != &#40;htable->word2 ^ htable->word1&#41;) &#123;
      hwhich = ((&#40;htable->word2 ^ htable->word1&#41; >> log_hash&#41; & 1&#41; + 1;
      &#40;htable + hwhich&#41;->word1 = htable->word1;
      &#40;htable + hwhich&#41;->word2 = htable->word2;
    &#125;
    htable->word1 = word1;
    htable->word2 = word2 ^ word1;
  &#125; else &#123;
    hwhich = (&#40;word2 >> log_hash&#41; & 1&#41; + 1;
    &#40;htable + hwhich&#41;->word1 = word1;
    &#40;htable + hwhich&#41;->word2 = word2 ^ word1;
  &#125;
&#125;
SImple. This is the "lockless hashing" idea for parallel search.

The issue is that different threads can store into the same hash table address at approximately the same time. There are two 64 bit values, the signature (A) and the score/etc (B). If two processors try to store at the same time, you want to get eihere A1,B1 or A2,B2 stored in the table, but due to timing you can get A1,B2 or A2, B1 which is a problem.

If, when you store A1, B1, you xor them to modify the stored signature, and when you probe that entry, you xor again to recover the original signature, you eliminate the possibility of getting a match on A1, but getting data B2 which would be wrong.

The lockless hashing paper is on my web page at www.cis.uab.edu, where you click on faculty then my name...

If you don't do this, you have to use a normal Lock()/Unlock() which is way excessive overhead due to all the hash storing and probing going on.
Bob, I thought you got rid of hash lock code? Did you not come up with another findings saying that the result of hash errors is insignificant?
Not with respect to hash scores. But the "best move" is an issue. You can choose one of the following options if your engine makes the "best move" during the search:

(1) validate the move, assuming that making an illegal move will corrupt your board or other data structures;

(2) use an atomic lock to make sure that the move part of an entry goes with the corresponding signature part;

(3) use the lockless approach which is very low cost.

Any of the above will work just fine, but (1) introduces overhead for the validation procedure, while 2 introduces tons of memory/cache issues as well as stalling processes when there really is no conflict. I just took the solution with the least impact on NPS since all work equally well.
Engin
Posts: 918
Joined: Mon Jan 05, 2009 7:40 pm
Location: Germany
Full name: Engin Üstün

Re: Crafty Transpostion Table Question

Post by Engin »

hello,

i dont understand this:

hwhich = (((htable->word2 ^ htable->word1) >> log_hash) & 1) + 1;

what is log_hash and why &1 +1 ?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty Transpostion Table Question

Post by bob »

Engin wrote:hello,

i dont understand this:

hwhich = (((htable->word2 ^ htable->word1) >> log_hash) & 1) + 1;

what is log_hash and why &1 +1 ?
That is old code. The idea was based on Ken Thompson's "belle hash table" which used 2 tables, one "depth-preferred" and one "always store" where the always-store table was 2x bigger. To be more cache friendly, I merged these two tables so that a hash probe goes to a group of 3 entries. The 0th entry is the "depth-preferred" table entry, the 1st and 2nd entries are the "always store" entry. I use one of the bits of the hash signature to select which entry, but since &1 gives me a value of 0 or 1, I add 1 to it to make it 1 or 2.

Crafty has not used that for several versions...

Log hash is simply log2(number_of_entries). Tells me how many bits to shift off so that I get the next sequential bit for the "hwhich" calculation.
Engin
Posts: 918
Joined: Mon Jan 05, 2009 7:40 pm
Location: Germany
Full name: Engin Üstün

Re: Crafty Transpostion Table Question

Post by Engin »

ok, i understand now, many thanks, i am using on Tornado other way with 4 clusters, 1 prefer depth, and 3 for allways store, i dont using for SMP any lock/unlock (hoped its not get corrupt datas) in TT, but i made separate hashes for eval and pawn hashtables for each thread, they cant crash together anytime, but for the cost of that nether of any dont know what others are stored in his hashes.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty Transpostion Table Question

Post by bob »

Engin wrote:ok, i understand now, many thanks, i am using on Tornado other way with 4 clusters, 1 prefer depth, and 3 for allways store, i dont using for SMP any lock/unlock (hoped its not get corrupt datas) in TT, but i made separate hashes for eval and pawn hashtables for each thread, they cant crash together anytime, but for the cost of that nether of any dont know what others are stored in his hashes.
If an entry is more than 8 bytes, your program must either be able to live on through corrupt data, or else validate the data before using it. Unless you want to use locks which are very expensive on hash tables.
LiquidNitrogenOverclocker

Re: Crafty Transpostion Table Question

Post by LiquidNitrogenOverclocker »

bob wrote: Any of the above will work just fine, but (1) introduces overhead for the validation procedure, while 2 introduces tons of memory/cache issues as well as stalling processes when there really is no conflict. I just took the solution with the least impact on NPS since all work equally well.
I had to read over this a few times. Then the lightbulb went off.

That's awesome!

:)