hgm wrote:Well, I am not sure why you quote the unbalanced use of the slots like you do. It should depend on the replacement scheme, and when you replace the least deep entry I don't see any reason why that should be in the first slot. So when the table gets fully loaded I expect all slots to be used equally. But on average you have to search through more slots to find an entry back if you do it like your approach 3. E.g. if all entries have the same draft, always replacing the first indeed makes the entries in the later slots stay there very long, which is likely to be detrimental. (Poor use of the slot; the entry is likely to be stale.) But if you would decide randomly which one to replace in that case, that effect would disappear again. But the fact that you store in a slot later in the sequence than you could have means you have to search more slots before you find it.
With the XOR solution you can afford to always replace the first of equal drafts, because 'first' does not mean the same on different probes, so it does not bias use of one slot over another.
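To make the '+' vs '^' difference concrete, here is a small illustrative sketch (function names, the power-of-two table size, and the 4-slot bucket are my assumptions, not code from the thread). With '+' addressing from an aligned bucket base, "the first probe" is always physical slot 0 of the bucket; with '^' addressing the same four slots are probed, but which one comes first depends on the key's low bits:

```python
BUCKET = 4  # slots per bucket (illustrative)

def probes_plus(key, table_size):
    # '+' addressing: the bucket base is aligned, so the first probe is
    # always physical slot 0 of the bucket. Always replacing 'the first'
    # then churns that one slot while the later ones go stale.
    base = (key % table_size) & ~(BUCKET - 1)
    return [base + i for i in range(BUCKET)]

def probes_xor(key, table_size):
    # '^' addressing: XOR with 0..3 only flips the low two bits, so the
    # same aligned group of 4 slots is probed, but the starting slot
    # depends on the key -- 'first' differs between probes.
    start = key % table_size
    return [start ^ i for i in range(BUCKET)]

from collections import Counter
# In-bucket offset of the first probe, over many keys:
Counter(probes_plus(k, 64)[0] % BUCKET for k in range(256))  # only offset 0
Counter(probes_xor(k, 64)[0] % BUCKET for k in range(256))   # all 4 offsets equally
```

So with XOR, "always replace the first of equal drafts" spreads replacement evenly over the physical slots, which is exactly the point made above.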
In my engine Spartacs I use 5 slots in a hash bucket: four depth-preferred, and one replace-always slot. The basic slots are only 12 bytes, so there is room for 4 'aging' fields in the bucket as well. The depth-preferred slots are grouped into two pairs. The primary probe always goes to one of the four depth-preferred slots. If the entry is not there, I check the other slot of the pair, and then the replace-always slot. If it is not in any of those 3, it is a miss, and the new entry will replace a slot in order of preference:
1) The primary (depth-preferred) slot if it was stale (based on its aging field)
2) The depth-preferred slot with the lowest draft if the new draft is higher
3) The primary slot if the depth-preferred drafts were equal and the new draft is higher
4) The replace-always slot if none of the above can be done
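The four-step preference above can be sketched as a small decision function (a hedged illustration only: the `Entry` fields, `choose_victim` name, and the aging test are my assumptions, not Spartacs code):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    draft: int  # depth of the stored search result
    age: int    # value of the aging counter when the entry was stored

def choose_victim(primary: Entry, pair_mate: Entry, always: Entry,
                  new_draft: int, current_age: int) -> Entry:
    # 1) the primary depth-preferred slot, if its aging field marks it stale
    if primary.age != current_age:
        return primary
    # 2)+3) the depth-preferred slot of the pair with the lowest draft,
    #    if the new draft is higher; on equal drafts prefer the primary
    victim = primary if primary.draft <= pair_mate.draft else pair_mate
    if new_draft > victim.draft:
        return victim
    # 4) otherwise fall back to the replace-always slot
    return always
```

Note that steps 2 and 3 collapse into one comparison once ties are broken toward the primary slot; the XOR addressing is what makes that tie-break fair across physical slots.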
Well, first: the numbers given for "case 3" are arbitrary, taken from memory from when I played around with it, and are only meant to show that there can be significantly unbalanced usage.
If I remember correctly, the scenario was exactly the one you described as "likely to be detrimental". It was, absolutely! Unfortunately I only used depth-preferred replacement, with the slot offset applied via the "+" operator. This leads to biased usage.
Indeed, for testing purposes I XORed in a random number (a costly rand()) for analysis. (But at that point I missed the final step of replacing the "+" with "^".)
I simply forgot about the random nature of the hash number, where XORing in an offset would balance out the slot usage (no explicit XOR with rndNumber is needed, since the hash key already is one).
And it is absolutely as you explained: key ^ id can be (k, k+1, k+2, k+3).
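That identity is easy to check: when the low two bits of the key are zero, XORing in the slot id is the same as adding it; otherwise XOR just permutes the same aligned group of four. A quick illustrative check:

```python
k = 0b101100  # low two bits are 0: XOR and addition coincide
assert [k ^ i for i in range(4)] == [k + i for i in range(4)]

k = 0b101101  # low bits nonzero: same 4 slots, different order
assert [k ^ i for i in range(4)] == [45, 44, 47, 46]
assert sorted(k ^ i for i in range(4)) == [44, 45, 46, 47]
```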
Now, keeping in mind that slot usage is also balanced with block addressing, I still don't see a general reason to use block addressing. Maybe the answer (or at least one answer) is given in your example. (I still have to figure out what details your example includes.)
cheers