hgm wrote:None of my engines stores more than 32 bits of the hash key. Joker uses 9-byte hash entries, 7 per bucket.
5 entries of 12 bytes each is also a nice design, if you want to store dual bounds + depth (2x 2-byte score, 2x 1-byte depth, 2-byte move, 4-byte key). Then you have 4 bytes left in a 64-byte bucket, which can be used for 4 aging fields (while the 5th entry is always-replace, and so does not need an aging field).
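That 5×12-byte layout can be sketched as a C struct. This is a minimal illustration of the bucket described above; the field names and their ordering are my assumptions, not taken from Joker or any other engine:

```c
#include <stdint.h>

/* One 12-byte entry: dual bounds + depths, a move, and 32 key bits. */
typedef struct {
    int16_t  lo_score;   /* lower-bound score */
    int16_t  hi_score;   /* upper-bound score */
    uint8_t  lo_depth;   /* depth of the lower bound */
    uint8_t  hi_depth;   /* depth of the upper bound */
    uint16_t move;       /* hash move */
    uint32_t key32;      /* 32 bits of the hash signature */
} HashEntry;             /* 12 bytes, no padding needed */

/* 5 entries + 4 aging bytes = one 64-byte cache line.  The 5th
   entry is always-replace and therefore needs no aging field. */
typedef struct {
    HashEntry entry[5];  /* 60 bytes */
    uint8_t   age[4];    /* aging fields for entries 0..3 */
} HashBucket;

_Static_assert(sizeof(HashEntry) == 12, "entry must be 12 bytes");
_Static_assert(sizeof(HashBucket) == 64, "bucket must fill a cache line");
```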
Good answer!
Regards,
Forrest
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
Zenmastur wrote:That I could understand if most chess programs were written in Lisp OR they had dynamically re-sizeable TT. Most don't.
"Most" ≠ "All"
With all other factors held equal, the false-positive rate is governed by the number of signature bits recoverable from a table entry — the bits stored in its contents plus the bits implied by its address.
40 bits will be enough for storing a moderately sized opening book...
What do you consider moderate in size for a book?
sje wrote:... 64 bits will not be enough for calculating perft(13) and higher...
Not sure why you would need a TT to calculate perft, but if you do need one, perft(8) has a ~40% chance of suffering a collision with a 64-bit key. Perft(13) will have a collision rate of about 0.1. Having 10% of your entries overwritten doesn't sound like a usable system to me!
sje wrote:It's not necessary that the length of a transposition table be a power of two. Another approach is to have the entry count of the table be some prime p and the address of an entry given by the value key mod p. For some years now, the cost of a division isn't much different than the cost of a mask and shift; using key mod p can be an improvement because it allows for better utilization of available memory. After all, there are many primes from which to choose.
How much of a difference is not much?
Regards,
Forrest
sje wrote:It's not necessary that the length of a transposition table be a power of two. Another approach is to have the entry count of the table be some prime p and the address of an entry given by the value key mod p. For some years now, the cost of a division isn't much different than the cost of a mask and shift; using key mod p can be an improvement because it allows for better utilization of available memory. After all, there are many primes from which to choose.
No added benefit in requiring p to be prime.
Agreed!
But I would like to know how much of a difference there is between mod and shift/mask.
syzygy wrote:
Zenmastur wrote:So I'm still left wondering why they choose to waste memory?
The easy answer is that they do that because it's easy.
Hash entries have a fixed size independent of the size of the hash table (i.e. independent of the number of bits used for indexing). So it does not make sense to eliminate exactly those bits that are used for indexing, unless the size of your hash table is fixed. Also keep in mind that most want hash entries to be 16 bytes, so that 4 entries fit exactly in a cacheline. Hash entries of 12 bytes are less efficient to access.
To the first statement: that's one of the things I was wondering about. To the second: I guess that depends on how you measure efficiency.
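As an aside on the cacheline point, a 16-byte entry and a cache-line-aligned table might be sketched like this (field names and the helper are illustrative assumptions, not from any particular engine):

```c
#include <stdint.h>
#include <stdlib.h>

/* A 16-byte entry, so four fit exactly in one 64-byte cache line. */
typedef struct {
    uint32_t key32;    /* stored 32 bits of the signature */
    uint16_t move;
    int16_t  score;
    int16_t  eval;
    uint8_t  depth;
    uint8_t  genbound; /* generation/age + bound type */
    uint32_t pad;      /* explicit padding up to 16 bytes */
} TTEntry;

/* Allocate the table cache-line aligned, so each group of four
   consecutive entries shares one line.  nentries should be a
   multiple of 4, since C11 aligned_alloc requires the total size
   to be a multiple of the alignment. */
TTEntry *alloc_tt(size_t nentries)
{
    return aligned_alloc(64, nentries * sizeof(TTEntry));
}
```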
syzygy wrote: ...Storing 32 bits of the key in the hash entry and using the rest for indexing into the table seems to be the best compromise to me, also because nowadays you probably use most of the other 32 bits for indexing into the table anyway.
Agreed.
Regards,
Forrest
Collisions happen all the time. They aren't much of a problem. False-positive matches can be a problem, and this problem is greatly mitigated by long signatures. Symbolic (and Oscar) use 128-bit signatures, which drives the probability of a false positive down to practically nothing.
The use of a prime is a tradition first employed with the idea that a relatively large prime is unlikely to accidentally be a dominating factor of the signature generation function.
why don't you simply test? replace the usual
htable = signature & mask;
by
htable = signature % mask;
then search to a fixed depth that takes maybe 5 minutes total. Run both and you can determine whether there is any cost or not. Note that you don't need to change anything else since the above two are equivalent in their final result. So a one line change and a benchmark run.
bob wrote:why don't you simply test? replace the usual
htable = signature & mask;
by
htable = signature % mask;
then search to a fixed depth that takes maybe 5 minutes total. Run both and you can determine whether there is any cost or not. Note that you don't need to change anything else since the above two are equivalent in their final result. So a one line change and a benchmark run.
Better do:
htable = signature % (mask + 1);
So that the search tree is identical as well. Or precompute 'mask + 1' outside of the critical path.
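A standalone version of that experiment might use a kernel like the one below. The sizes and iteration counts are arbitrary choices for illustration; note the volatile divisor, without which the compiler would strength-reduce the modulo by a compile-time power of two back into a mask and the comparison would measure nothing:

```c
#include <stdint.h>

enum { SIZE = 1 << 20, MASK = SIZE - 1 };

/* volatile prevents the compiler from folding % into & */
static volatile uint32_t table_size = SIZE;

uint64_t bench(int use_mod, long iters)
{
    uint64_t key = 0x9E3779B97F4A7C15ull, sum = 0;
    for (long i = 0; i < iters; i++) {
        /* cheap LCG standing in for successive hash signatures */
        key = key * 6364136223846793005ull + 1442695040888963407ull;
        sum += use_mod ? key % table_size : key & MASK;
    }
    return sum;  /* consume the result so the loop is not removed */
}
```

Timing bench(0, N) against bench(1, N) gives the raw cost difference; since table_size here equals mask + 1, both variants also return the same checksum, mirroring the identical-tree point above.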
bob wrote:why don't you simply test? replace the usual
htable = signature & mask;
by
htable = signature % mask;
then search to a fixed depth that takes maybe 5 minutes total. Run both and you can determine whether there is any cost or not. Note that you don't need to change anything else since the above two are equivalent in their final result. So a one line change and a benchmark run.
Better do:
htable = signature % (mask + 1);
So that the search tree is identical as well. Or precompute 'mask + 1' outside of the critical path.
Should have used size rather than mask when doing the mod; was not thinking...
For key mod p to work well, it's not necessary that p be prime. But p should be odd (and so coprime with 2^N) so that all N bits of the key get a chance to contribute to the entry address generation.
Divisions take some 30 clock cycles, compared to 1-3 for shifts and multiplies. And this is not only latency, but even the throughput is only 1 per 30 cycles, as the divide unit is not pipelined.
But I guess you could pre-calculate (1<<N)/size, so that you can do the modulo as key - ((((1<<N)/size)*key)>>N)*size with sufficiently large N to not lose precision.
I would be surprised if an optimizer would be smart enough to do that, though, if you just write key%size.
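Done by hand, that reciprocal trick might look like the sketch below. It is shown for 32-bit keys with N = 32 so the intermediate product fits in 64 bits; a real engine hashing 64-bit keys would need 128-bit arithmetic. The names are illustrative:

```c
#include <stdint.h>

/* Precompute m = floor(2^32 / size) once, then approximate
   key / size as (m * key) >> 32, avoiding a hardware divide on
   every probe.  Truncation can leave the quotient one too low,
   so one conditional subtract corrects the remainder. */
typedef struct {
    uint32_t size;   /* number of table entries (need not be 2^k) */
    uint64_t magic;  /* floor(2^32 / size), precomputed once */
} ModState;

ModState mod_init(uint32_t size)
{
    ModState s = { size, ((uint64_t)1 << 32) / size };
    return s;
}

uint32_t fast_mod(const ModState *s, uint32_t key)
{
    uint32_t q = (uint32_t)((s->magic * key) >> 32);  /* ~ key / size */
    uint32_t r = key - q * s->size;
    if (r >= s->size)   /* quotient was one low: fix the remainder */
        r -= s->size;
    return r;
}
```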