hgm wrote:Well, I am not sure why you quote the unbalanced use of the slots like you do. It should depend on the replacement scheme, and when you replace the least deep entry I don't see any reason why that should be in the first slot. So when the table gets fully loaded I expect all slots to be used equally. But on average you have to search through more slots to find an entry back if you do it like your approach 3. E.g. if all entries have the same draft, always replacing the first indeed makes the entries in the later slots stay there very long, which is likely to be detrimental. (Poor use of the slot; the entry is likely to be stale.) But if you would decide randomly which one to replace in that case, that effect would disappear again. But the fact that you store in a slot later in the sequence than you could have means you have to search more slots before you find it.
With the XOR solution you can afford to always replace the first of equal drafts, because 'first' does not mean the same on different probes, so it does not bias use of one slot over another.
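To make the '+' vs '^' difference concrete, here is a small illustrative sketch (function names, the power-of-two table size, and the 4-slot bucket are my assumptions, not code from the thread). With '+' addressing from an aligned bucket base, "the first probe" is always physical slot 0 of the bucket; with '^' addressing the same four slots are probed, but which one comes first depends on the key's low bits:

```python
BUCKET = 4  # slots per bucket (illustrative)

def probes_plus(key, table_size):
    # '+' addressing: the bucket base is aligned, so the first probe is
    # always physical slot 0 of the bucket. Always replacing 'the first'
    # then churns that one slot while the later ones go stale.
    base = (key % table_size) & ~(BUCKET - 1)
    return [base + i for i in range(BUCKET)]

def probes_xor(key, table_size):
    # '^' addressing: XOR with 0..3 only flips the low two bits, so the
    # same aligned group of 4 slots is probed, but the starting slot
    # depends on the key -- 'first' differs between probes.
    start = key % table_size
    return [start ^ i for i in range(BUCKET)]

from collections import Counter
# In-bucket offset of the first probe, over many keys:
Counter(probes_plus(k, 64)[0] % BUCKET for k in range(256))  # only offset 0
Counter(probes_xor(k, 64)[0] % BUCKET for k in range(256))   # all 4 offsets equally
```

So with XOR, "always replace the first of equal drafts" spreads replacement evenly over the physical slots, which is exactly the point made above.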
In my engine Spartacs I use 5 slots in a hash bucket: four depth-preferred, and one replace-always slot. The basic slots are only 12 bytes, so there is room for 4 'aging' fields in the bucket as well. The depth-preferred slots are grouped into two pairs. The primary probe always goes to one of the four depth-preferred slots. If the entry is not there, I check the other slot of the pair, and then the replace-always slot. If it is not in any of those 3, it is a miss, and the new entry will replace a slot in order of preference:
1) The primary (depth-preferred) slot if it was stale (based on its aging field)
2) The depth-preferred slot with the lowest draft if the new draft is higher
3) The primary slot if the depth-preferred drafts were equal and the new draft is higher
4) The replace-always slot if none of the above can be done
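The four-step preference above can be sketched as a small decision function (a hedged illustration only: the `Entry` fields, `choose_victim` name, and the aging test are my assumptions, not Spartacs code):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    draft: int  # depth of the stored search result
    age: int    # value of the aging counter when the entry was stored

def choose_victim(primary: Entry, pair_mate: Entry, always: Entry,
                  new_draft: int, current_age: int) -> Entry:
    # 1) the primary depth-preferred slot, if its aging field marks it stale
    if primary.age != current_age:
        return primary
    # 2)+3) the depth-preferred slot of the pair with the lowest draft,
    #    if the new draft is higher; on equal drafts prefer the primary
    victim = primary if primary.draft <= pair_mate.draft else pair_mate
    if new_draft > victim.draft:
        return victim
    # 4) otherwise fall back to the replace-always slot
    return always
```

Note that steps 2 and 3 collapse into one comparison once ties are broken toward the primary slot; the XOR addressing is what makes that tie-break fair across physical slots.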
Well, first: the numbers given for "case 3" are arbitrary, taken from memory from when I played around with it, and are only meant to show that there can be significantly unbalanced usage.
If I remember correctly, the scenario was exactly the one you described as "likely to be detrimental". It was, absolutely! Unfortunately I only used depth-preferred replacement, with the slot offset applied via the "+" operator. This leads to biased usage.
Indeed, for testing purposes I XORed in a random number (a costly rand()) for analysis. (But at that point I missed the final step of replacing the "+" with "^".)
I simply forgot about the random nature of the hash number, where XORing in an offset would balance out the slot usage (no explicit XOR with rndNumber is needed, since the hash key already is one).
And it is absolutely as you explained: key ^ id can be (k, k+1, k+2, k+3).
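That identity is easy to check: when the low two bits of the key are zero, XORing in the slot id is the same as adding it; otherwise XOR just permutes the same aligned group of four. A quick illustrative check:

```python
k = 0b101100  # low two bits are 0: XOR and addition coincide
assert [k ^ i for i in range(4)] == [k + i for i in range(4)]

k = 0b101101  # low bits nonzero: same 4 slots, different order
assert [k ^ i for i in range(4)] == [45, 44, 47, 46]
assert sorted(k ^ i for i in range(4)) == [44, 45, 46, 47]
```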
Now, keeping in mind that slot usage is also balanced with block addressing, I still don't see a general reason to use block addressing. Maybe the answer (or at least one answer) is given in your example. (I still have to figure out what details your example includes.)
cheers