Speaking of the hash table

diep · Post by **diep** » Wed Dec 19, 2012 1:31 pm

From crafty 22.9:

int HashProbe&#40;TREE * RESTRICT tree, int ply, int depth, int wtm, int *alpha,
    int beta&#41;
&#123;
  register BITBOARD word1, word2;
  register int type, draft, avoid_null = 0, val, probe;
  BITBOARD temp_hashkey;
  HASH_ENTRY *htable;

/*
 ************************************************************
 *                                                          *
 *   We loop thru two entries.  The first entry is the      *
 *   depth-priority entry &#40;the first in a triplet of        *
 *   entries&#41;.  The second is one of the two following      *
 *   entries, dictated by the rightmost bit of the hash     *
 *   signature that was not used to address the table       *
 *   triplet.                                               *
 *                                                          *
 *   We also return an "avoid_null" status if neither hash  *
 *   entry has enough draft to terminate the current search *
 *   but one of them does have enough draft to prove that   *
 *   a null-move search would not fail high.  This avoids   *
 *   the null-move search overhead in positions where it is *
 *   simply a waste of time to try it.                      *
 *                                                          *
 ************************************************************
 */
  tree->hash_move&#91;ply&#93; = 0;
  temp_hashkey = &#40;wtm&#41; ? HashKey &#58; ~HashKey;
  htable = trans_ref + 3 * &#40;temp_hashkey & hash_mask&#41;;
  for &#40;probe = 0; probe < 2; probe++) &#123;
    word1 = htable->word1;
    word2 = htable->word2;
    if &#40;word2 == temp_hashkey&#41; &#123;
      htable->word1 =
          &#40;word1 & 0x1fffffffffffffffULL&#41; | (&#40;BITBOARD&#41; transposition_id << 61&#41;;
      val = &#40;word1 & 0x1ffff&#41; - 65536;
      draft = &#40;word1 >> 17&#41; & 0x7fff;
      if (!tree->hash_move&#91;ply&#93;)
        tree->hash_move&#91;ply&#93; = &#40;word1 >> 32&#41; & 0x1fffff;
      type = &#40;word1 >> 59&#41; & 3;
      if (&#40;type & UPPER&#41; && depth - null_depth - 1 <= draft && val < beta&#41;
        avoid_null = AVOID_NULL_MOVE;
      if &#40;depth <= draft&#41; &#123;
        if &#40;val > MATE - 300&#41;
          val -= ply - 1;
        else if &#40;val < -MATE + 300&#41;
          val += ply - 1;
        switch &#40;type&#41; &#123;
        case EXACT&#58;
          *alpha = val;
          if &#40;draft != MAX_DRAFT&#41;
            return &#40;EXACT&#41;;
          else
            return &#40;EXACTEGTB&#41;;
        case UPPER&#58;
          if &#40;val <= *alpha&#41;
            return &#40;UPPER&#41;;
          break;
        case LOWER&#58;
          if &#40;val >= beta&#41;
            return &#40;LOWER&#41;;
          break;
        &#125;
      &#125;
    &#125;
    htable += (&#40;temp_hashkey >> log_hash&#41; & 1&#41; + 1;
  &#125;
  return &#40;avoid_null&#41;;
&#125;

So you're doing a store of the current positions ID if the position matches, something i did do in Diep 10+ years ago as well.

It's about this line:

Code: Select all

" htable->word1 =
          &#40;word1 & 0x1fffffffffffffffULL&#41; | (&#40;BITBOARD&#41; transposition_id << 61&#41;;
"

That's a 8 bytes write. You write 8 bytes in the hashprobe() routine, which is a bug.

This is why you could post about a dead old crafty 22.9 and not 23.x series which have a better implementation there.

Also between the read and the write there is a branch that gets mispredicted as odds is tiny you take it. Let's say a clock or 20 you lose there in a non-pgo'ed compile, as it's an expensive branch, which increases the bug's odds to occur.

So there is quite some clocks between the read and the 8 bytes write.

If another core writes a different entry to this slot, then you have with quite some big odds, i'd guess 70% roughly, an illegal move in there...

So for this bug to exist there is no need for some instruction stream interruption (which has very small odds of occuring at exactly this spot).

It's a bug.

Sven · Post by **Sven** » Wed Dec 19, 2012 2:52 pm

diep wrote:So you're doing a store of the current positions ID if the position matches, something i did do in Diep 10+ years ago as well.

It's about this line:
Code: Select all
" htable->word1 =
          &#40;word1 & 0x1fffffffffffffffULL&#41; | (&#40;BITBOARD&#41; transposition_id << 61&#41;;
"
That's a 8 bytes write. You write 8 bytes in the hashprobe() routine, which is a bug.

This line updates the "age" entry. In my opinion there is no bug in it, nor is it a bug to do that update within the HashProbe() at all, nor is it a bug to only update the first 8 bytes (since the second 8 bytes remain unchanged due to the "if (word2 == temp_hashkey)" condition).

diep wrote:This is why you could post about a dead old crafty 22.9 and not 23.x series which have a better implementation there.

The "better implementation" is indeed present in newer Crafty versions, but it is still doing the same "age" update, now based on the lockless hashing scheme which was (re-)introduced after 22.9. The XORing trick of course requires to write both 8-bytes words while updating "age" itself does not yet require to do so.

Sven

bob · Post by **bob** » Wed Dec 19, 2012 5:14 pm

diep wrote:From crafty 22.9:

Code: Select all

int HashProbe&#40;TREE * RESTRICT tree, int ply, int depth, int wtm, int *alpha,
    int beta&#41;
&#123;
  register BITBOARD word1, word2;
  register int type, draft, avoid_null = 0, val, probe;
  BITBOARD temp_hashkey;
  HASH_ENTRY *htable;

/*
 ************************************************************
 *                                                          *
 *   We loop thru two entries.  The first entry is the      *
 *   depth-priority entry &#40;the first in a triplet of        *
 *   entries&#41;.  The second is one of the two following      *
 *   entries, dictated by the rightmost bit of the hash     *
 *   signature that was not used to address the table       *
 *   triplet.                                               *
 *                                                          *
 *   We also return an "avoid_null" status if neither hash  *
 *   entry has enough draft to terminate the current search *
 *   but one of them does have enough draft to prove that   *
 *   a null-move search would not fail high.  This avoids   *
 *   the null-move search overhead in positions where it is *
 *   simply a waste of time to try it.                      *
 *                                                          *
 ************************************************************
 */
  tree->hash_move&#91;ply&#93; = 0;
  temp_hashkey = &#40;wtm&#41; ? HashKey &#58; ~HashKey;
  htable = trans_ref + 3 * &#40;temp_hashkey & hash_mask&#41;;
  for &#40;probe = 0; probe < 2; probe++) &#123;
    word1 = htable->word1;
    word2 = htable->word2;
    if &#40;word2 == temp_hashkey&#41; &#123;
      htable->word1 =
          &#40;word1 & 0x1fffffffffffffffULL&#41; | (&#40;BITBOARD&#41; transposition_id << 61&#41;;
      val = &#40;word1 & 0x1ffff&#41; - 65536;
      draft = &#40;word1 >> 17&#41; & 0x7fff;
      if (!tree->hash_move&#91;ply&#93;)
        tree->hash_move&#91;ply&#93; = &#40;word1 >> 32&#41; & 0x1fffff;
      type = &#40;word1 >> 59&#41; & 3;
      if (&#40;type & UPPER&#41; && depth - null_depth - 1 <= draft && val < beta&#41;
        avoid_null = AVOID_NULL_MOVE;
      if &#40;depth <= draft&#41; &#123;
        if &#40;val > MATE - 300&#41;
          val -= ply - 1;
        else if &#40;val < -MATE + 300&#41;
          val += ply - 1;
        switch &#40;type&#41; &#123;
        case EXACT&#58;
          *alpha = val;
          if &#40;draft != MAX_DRAFT&#41;
            return &#40;EXACT&#41;;
          else
            return &#40;EXACTEGTB&#41;;
        case UPPER&#58;
          if &#40;val <= *alpha&#41;
            return &#40;UPPER&#41;;
          break;
        case LOWER&#58;
          if &#40;val >= beta&#41;
            return &#40;LOWER&#41;;
          break;
        &#125;
      &#125;
    &#125;
    htable += (&#40;temp_hashkey >> log_hash&#41; & 1&#41; + 1;
  &#125;
  return &#40;avoid_null&#41;;
&#125;

So you're doing a store of the current positions ID if the position matches, something i did do in Diep 10+ years ago as well.

It's about this line:

Code: Select all

" htable->word1 =
          &#40;word1 & 0x1fffffffffffffffULL&#41; | (&#40;BITBOARD&#41; transposition_id << 61&#41;;
"

That's a 8 bytes write. You write 8 bytes in the hashprobe() routine, which is a bug.

Did you see my post where I commented that out, just to PROVE to you that is NOT where the illegal hash move stuff originates? Had the same number (roughly) of bad move messages with or without. The 128 bit stores get split. As I have repeatedly told you. repeatedly.

This is why you could post about a dead old crafty 22.9 and not 23.x series which have a better implementation there.

Current versions use lockless hashing so I get ZERO "bad move" errors. What are you talking about? I posted 22.9 because it clearly shows that a 128 bit write (actually two 64 bit writes) can get split exactly as I explained. Again, want a contact at Intel that can explain this to you? There are ZERO 128 bit atomic writes on Intel. ZERO.

Also between the read and the write there is a branch that gets mispredicted as odds is tiny you take it. Let's say a clock or 20 you lose there in a non-pgo'ed compile, as it's an expensive branch, which increases the bug's odds to occur.

So there is quite some clocks between the read and the 8 bytes write.

If another core writes a different entry to this slot, then you have with quite some big odds, i'd guess 70% roughly, an illegal move in there...

just take 22.9, comment that line OUT, and then run with 8 threads. Come back after you have learned that you are posting nonsense and that the bad move errors still occur at the SAME frequency as when that line was compiled in.

So for this bug to exist there is no need for some instruction stream interruption (which has very small odds of occuring at exactly this spot).

It's a bug.

There's a bug. But it is in your head, not in my code or understanding of the intel architecture...

diep · Post by **diep** » Wed Dec 19, 2012 7:14 pm

Sven Schüle wrote:
diep wrote:So you're doing a store of the current positions ID if the position matches, something i did do in Diep 10+ years ago as well.

It's about this line:
Code: Select all
" htable->word1 =
          &#40;word1 & 0x1fffffffffffffffULL&#41; | (&#40;BITBOARD&#41; transposition_id << 61&#41;;
"
That's a 8 bytes write. You write 8 bytes in the hashprobe() routine, which is a bug.
This line updates the "age" entry. In my opinion there is no bug in it, nor is it a bug to do that update within the HashProbe() at all, nor is it a bug to only update the first 8 bytes (since the second 8 bytes remain unchanged due to the "if (word2 == temp_hashkey)" condition).

diep wrote:This is why you could post about a dead old crafty 22.9 and not 23.x series which have a better implementation there.
The "better implementation" is indeed present in newer Crafty versions, but it is still doing the same "age" update, now based on the lockless hashing scheme which was (re-)introduced after 22.9. The XORing trick of course requires to write both 8-bytes words while updating "age" itself does not yet require to do so.

Sven

Sven if you ship 8 bytes to the memory controller it writes 8 bytes there, it doesn't update 5 most significant bits only.

It's all about statistics. The odds this goes wrong and causes Ab or aB is quite huge compared to if you store 0 bytes or 16.

So it's a clear SMP bug.

bob · Post by **bob** » Thu Dec 20, 2012 1:06 am

diep wrote:
Sven Schüle wrote:
diep wrote:So you're doing a store of the current positions ID if the position matches, something i did do in Diep 10+ years ago as well.

It's about this line:
Code: Select all
" htable->word1 =
          &#40;word1 & 0x1fffffffffffffffULL&#41; | (&#40;BITBOARD&#41; transposition_id << 61&#41;;
"
That's a 8 bytes write. You write 8 bytes in the hashprobe() routine, which is a bug.
This line updates the "age" entry. In my opinion there is no bug in it, nor is it a bug to do that update within the HashProbe() at all, nor is it a bug to only update the first 8 bytes (since the second 8 bytes remain unchanged due to the "if (word2 == temp_hashkey)" condition).

diep wrote:This is why you could post about a dead old crafty 22.9 and not 23.x series which have a better implementation there.
The "better implementation" is indeed present in newer Crafty versions, but it is still doing the same "age" update, now based on the lockless hashing scheme which was (re-)introduced after 22.9. The XORing trick of course requires to write both 8-bytes words while updating "age" itself does not yet require to do so.

Sven
Sven if you ship 8 bytes to the memory controller it writes 8 bytes there, it doesn't update 5 most significant bits only.

It's all about statistics. The odds this goes wrong and causes Ab or aB is quite huge compared to if you store 0 bytes or 16.

So it's a clear SMP bug.

Your argument is ridiculous. I have two choices. Read 16 bytes, update the age, and write all 16 back, or just update the age and write 8 back. Same problem either way. Another CPU can STILL read aB under either circumstance, absolutely nothing can be done to prevent it.

I don't update the age unless I know the sigs match, which was tested RIGHT before the age update. If it can break there, then two consecutive 8 byte stores can ALSO break. How hard is that to understand? Most of us get it...

Sven · Post by **Sven** » Thu Dec 20, 2012 10:28 am

diep wrote:Sven if you ship 8 bytes to the memory controller it writes 8 bytes there, it doesn't update 5 most significant bits only.

I never stated anything about updating 5 bits only (apart from the fact that the "age" field in 22.9 is 3 bits, not 5). These 8 bytes consist of 61 unchanged bits and 3 potentially changed bits. Of course 8 bytes are written to memory but the effect is that at most these 3 bits are changed. I fail to see, though, why this should matter.

Sven

diep · Post by **diep** » Thu Dec 20, 2012 5:37 pm

Sven Schüle wrote:
diep wrote:Sven if you ship 8 bytes to the memory controller it writes 8 bytes there, it doesn't update 5 most significant bits only.
I never stated anything about updating 5 bits only (apart from the fact that the "age" field in 22.9 is 3 bits, not 5). These 8 bytes consist of 61 unchanged bits and 3 potentially changed bits. Of course 8 bytes are written to memory but the effect is that at most these 3 bits are changed. I fail to see, though, why this should matter.

Sven

Sven there is 20 clocks or so that the branch takes; some other core might write to the hashtable entry another entry. A different position.

After that position B has been written by core2, the full 16 bytes (following cache coherency protocol, especially cacheline principle), after that we write 8 bytes from position A here without also writing the key again.

How many clocks is a node in crafty?

1200 clocks or so at bob's core2 Xeon? It's getting 20M nps there or so.

From this about 30% will be updated i'd guess.

1200 * 100 / 30 = 4000 clocks

This branch we can guess will eat at such point 20 clocks as it'll get mispredicted usually (in a normal compile).

20 / 4000 = 1/200

Multiply that chance by the chance that some other core at that moment is doing an update AND doing an update at exactly this slot.

Then you have the number of errors Bob spoke about...

So this is a real fullscale bug that CAN have impact at the game and screw you.

The odds for the bug in crafty to occur is simply huge.

It's not a write error that has to occur in order for this bug to happen, that's why it's a bug!

Having a normal implementation without the bug, odds dramatically fall to get a write error, as to get a write error, we need more than just 2 cores writing at this specific slot. We also need the scheduler to stop our instruction stream, as otherwise the delay in the cache coherency protocol will take care that we do things atomically correct.

It has been designed to go ok in such occasions....

At some kernels this means it won't happen any time soon in our lifetime... ...as the instruction stream won't get stopped at that spot (special control codes excepted - but we discussed that before how seldom this happens and usually your program is already busy terminating in such occasion).

bob · Post by **bob** » Thu Dec 20, 2012 5:55 pm

diep wrote:
Sven Schüle wrote:
diep wrote:Sven if you ship 8 bytes to the memory controller it writes 8 bytes there, it doesn't update 5 most significant bits only.
I never stated anything about updating 5 bits only (apart from the fact that the "age" field in 22.9 is 3 bits, not 5). These 8 bytes consist of 61 unchanged bits and 3 potentially changed bits. Of course 8 bytes are written to memory but the effect is that at most these 3 bits are changed. I fail to see, though, why this should matter.

Sven
Sven there is 20 clocks or so that the branch takes; some other core might write to the hashtable entry another entry. A different position.

After that position B has been written by core2, the full 16 bytes (following cache coherency protocol, especially cacheline principle), after that we write 8 bytes from position A here without also writing the key again.

How many clocks is a node in crafty?

1200 clocks or so at bob's core2 Xeon? It's getting 20M nps there or so.

From this about 30% will be updated i'd guess.

1200 * 100 / 30 = 4000 clocks

This branch we can guess will eat at such point 20 clocks as it'll get mispredicted usually (in a normal compile).

20 / 4000 = 1/200

Multiply that chance by the chance that some other core at that moment is doing an update AND doing an update at exactly this slot.

Then you have the number of errors Bob spoke about...

So this is a real fullscale bug that CAN have impact at the game and screw you.

The odds for the bug in crafty to occur is simply huge.

It's not a write error that has to occur in order for this bug to happen, that's why it's a bug!

Having a normal implementation without the bug, odds dramatically fall to get a write error, as to get a write error, we need more than just 2 cores writing at this specific slot. We also need the scheduler to stop our instruction stream, as otherwise the delay in the cache coherency protocol will take care that we do things atomically correct.

It has been designed to go ok in such occasions....

At some kernels this means it won't happen any time soon in our lifetime... ...as the instruction stream won't get stopped at that spot (special control codes excepted - but we discussed that before how seldom this happens and usually your program is already busy terminating in such occasion).

Funny, apparently you fail to read. I ran the test with that store completely removed. I Still get just as many bad hash moves without as I did with. The problem is that multiple caches are fighting for these entries, reading or writing 16 byte entries, but having to read/write 8 bytes at a time. The reads/writes get split and interleaved. And you get bad hash moves. I've been trying to explain this to you for a week. You seem to think that 16 bytes are read/written atomically. They absolutely are not. You can only read/write 8 bytes in an atomic fashion.

jeez...

diep · Post by **diep** » Thu Dec 20, 2012 5:56 pm

Bob don't post nonsense here.

bob · Post by **bob** » Thu Dec 20, 2012 10:16 pm

diep wrote:Bob don't post nonsense here.

I'm not. You are doing an excellent job of that. Your statements contradict what everyone else here has tried to explain, it contradicts Intel engineers, it contradicts Intel documentation, and it contradicts actual experimental results that I posted...

THAT, I would call "nonsense".

Speaking of the hash table

Re: Speaking of the hash table

Re: Speaking of the hash table

Re: Speaking of the hash table

Re: Speaking of the hash table

Re: Speaking of the hash table

Re: Speaking of the hash table

Re: Speaking of the hash table

Re: Speaking of the hash table

Re: Speaking of the hash table

Re: Speaking of the hash table