Now I understand the idea. For nelems > 2G (still quite unusual today) you could use more than 32 bits in the formula above. In my opinion "nelems" should be called "nIndices" as in "number of hash table indices", which is not necessarily the same as "number of hash table elements" if you have more than one slot per index.

mar wrote:
Ok, I will try, hope you don't mind.

Sven Schüle wrote:
I don't understand your remark. "key & 0xffffffffull" and "key >> 32" are not the same, so it seems you want to imply that you could use either the lower or the upper 32 bits of the hash key in that way. But can you explain why and how it works, in either of the two cases, to multiply by "nelem" (which I assume to be the total number of hash table entries) and then to use the upper 32 bits of the result? To me it seems this operation will often result in "0", at least for smaller hash tables.
Also, why is a multiplication by "nelem" significantly faster than a "modulo nelem"?
The "traditional" way is to use a "modulo nelems" operation, which can be optimized (manually or by the compiler) into a bitwise AND if "nelems" (or rather "nIndices") is a power of 2.
Sven
I think Volker's point is that it doesn't really matter whether you use the lower or the upper 32 bits to compute the entry index. While the two are not functionally equivalent, the choice shouldn't matter at all, because you should have good pseudorandom bits in either the upper or the lower part of the 64-bit key.
Of course once you have more than 2G entries the distribution will probably not be very good.
How it works: think of a value in 0..0xffffffff as a real number in [0, 1); this trick was called fixed point in the good old times.
So you multiply it by entry range and only keep the integer part.
Since zobrist hashes should have a good distribution of bits, it should work just fine.
As for performance: integer division is always slow. Of course, if you know in advance that entry_count is a power of two, you can use a mask instead.
But the point is that forcing the TT size to be a power of two means you won't be able to utilize memory efficiently. Say you have 16 gigs of RAM, some of it goes to the system and so on, so you can only utilize up to 8 gigs for the hash table if you stick to powers of two.
Using Volker's method you should be able to utilize say 12 gigs without problems.
If "nIndices" is a power of two, say 2^K, then the "multiplication" approach uses the upper K of the 32 (or more) bits for the index, while the "modulo" approach uses the lower K bits. In general this should be almost equivalent, although I am not perfectly sure whether any trouble might arise from using the upper K bits when the 32 bits are the lower 32 of the key, so that a lot of consecutive hash keys would be mapped to the same table slot. It is just some doubt in my head; I can't explain why.
If the only advantage of the "multiplication" approach is the ability to use more than 50% of the available RAM for the transposition table then I would still favor the "modulo" approach together with the restriction to a power of 2 as number of hash table indices, in order to allow for a small speedup by using a shift instead of the "modulo".
For those programs where the difference between using 8GB and, say, 12GB TT hash is really important (I doubt there are many!), it would be an option to buy 20 or 24GB RAM instead of "only" 16GB, so that a doubling to 16GB TT hash would be possible with the "power of 2" approach.
Sven