hgm wrote:None of my engines stores more than 32 bits of the hash key. Joker uses 9-byte hash entries, 7 per bucket.
5 entries of 12 bytes each is also a nice design, if you want to store dual bounds + depth (2x 2-byte score, 2x 1-byte depth, 2-byte move, 4-byte key). Then you have 4 bytes left in a 64-byte bucket, which can be used for 4 aging fields (while the 5th entry is always-replace, and so does not need an aging field).
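That 5×12-byte layout can be sketched as a C struct. This is a minimal illustration of the bucket described above; the field names and their ordering are my assumptions, not taken from Joker or any other engine:

```c
#include <stdint.h>

/* One 12-byte entry: dual bounds + depths, a move, and 32 key bits. */
typedef struct {
    int16_t  lo_score;   /* lower-bound score */
    int16_t  hi_score;   /* upper-bound score */
    uint8_t  lo_depth;   /* depth of the lower bound */
    uint8_t  hi_depth;   /* depth of the upper bound */
    uint16_t move;       /* hash move */
    uint32_t key32;      /* 32 bits of the hash signature */
} HashEntry;             /* 12 bytes, no padding needed */

/* 5 entries + 4 aging bytes = one 64-byte cache line.  The 5th
   entry is always-replace and therefore needs no aging field. */
typedef struct {
    HashEntry entry[5];  /* 60 bytes */
    uint8_t   age[4];    /* aging fields for entries 0..3 */
} HashBucket;

_Static_assert(sizeof(HashEntry) == 12, "entry must be 12 bytes");
_Static_assert(sizeof(HashBucket) == 64, "bucket must fill a cache line");
```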
Good answer!
Regards,
Forrest
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
Zenmastur wrote:That I could understand if most chess programs were written in Lisp OR they had dynamically re-sizeable TT. Most don't.
"Most" ≠ "All"
With all other factors held equal, the false-positive rate is governed by the number of signature bits recoverable from a table entry — the bits stored in its contents plus the bits implied by its address.
40 bits will be enough for storing a moderately sized opening book...
What do you consider moderate in size for a book?
sje wrote:... 64 bits will not be enough for calculating perft(13) and higher...
Not sure why you would need a TT to calculate perft, but if you do need one, perft(8) has a ~40% chance of suffering a collision with a 64-bit key. Perft(13) will have a collision rate of about 0.1. Having 10% of your entries overwritten doesn't sound like a usable system to me!
sje wrote:It's not necessary that the length of a transposition table be a power of two. Another approach is to have the entry count of the table be some prime p and the address of an entry given by the value key mod p. For some years now, the cost of a division isn't much different than the cost of a mask and shift; using key mod p can be an improvement because it allows for better utilization of available memory. After all, there are many primes from which to choose.
How much of a difference is not much?
Regards,
Forrest
sje wrote:It's not necessary that the length of a transposition table be a power of two. Another approach is to have the entry count of the table be some prime p and the address of an entry given by the value key mod p. For some years now, the cost of a division isn't much different than the cost of a mask and shift; using key mod p can be an improvement because it allows for better utilization of available memory. After all, there are many primes from which to choose.
No added benefit in requiring p to be prime.
Agreed!
But I would like to know how much of a difference there is between mod and shift/mask.
syzygy wrote:
Zenmastur wrote:So I'm still left wondering why they choose to waste memory?
The easy answer is that they do that because it's easy.
Hash entries have a fixed size independent of the size of the hash table (i.e. independent of the number of bits used for indexing). So it does not make sense to eliminate exactly those bits that are used for indexing, unless the size of your hash table is fixed. Also keep in mind that most want hash entries to be 16 bytes, so that 4 entries fit exactly in a cacheline. Hash entries of 12 bytes are less efficient to access.
To the first statement: that's one of the things I was wondering about. To the second: I guess that depends on how you measure efficiency.
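As an aside on the cacheline point, a 16-byte entry and a cache-line-aligned table might be sketched like this (field names and the helper are illustrative assumptions, not from any particular engine):

```c
#include <stdint.h>
#include <stdlib.h>

/* A 16-byte entry, so four fit exactly in one 64-byte cache line. */
typedef struct {
    uint32_t key32;    /* stored 32 bits of the signature */
    uint16_t move;
    int16_t  score;
    int16_t  eval;
    uint8_t  depth;
    uint8_t  genbound; /* generation/age + bound type */
    uint32_t pad;      /* explicit padding up to 16 bytes */
} TTEntry;

/* Allocate the table cache-line aligned, so each group of four
   consecutive entries shares one line.  nentries should be a
   multiple of 4, since C11 aligned_alloc requires the total size
   to be a multiple of the alignment. */
TTEntry *alloc_tt(size_t nentries)
{
    return aligned_alloc(64, nentries * sizeof(TTEntry));
}
```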
syzygy wrote: ...Storing 32 bits of the key in the hash entry and using the rest for indexing into the table seems to be the best compromise to me, also because nowadays you probably use most of the other 32 bits for indexing into the table anyway.
Agreed.
Regards,
Forrest
Collisions happen all the time. They aren't much of a problem. False-positive matches can be a problem, and this problem is greatly mitigated by long signatures. Symbolic (and Oscar) use 128-bit signatures, which drives the probability of a false positive down to practically nothing.
The use of a prime is a tradition first employed with the idea that a relatively large prime is unlikely to accidentally be a dominating factor of the signature generation function.
why don't you simply test? replace the usual
htable = signature & mask;
by
htable = signature % mask;
then search to a fixed depth that takes maybe 5 minutes total. Run both and you can determine whether there is any cost or not. Note that you don't need to change anything else since the above two are equivalent in their final result. So a one line change and a benchmark run.
bob wrote:why don't you simply test? replace the usual
htable = signature & mask;
by
htable = signature % mask;
then search to a fixed depth that takes maybe 5 minutes total. Run both and you can determine whether there is any cost or not. Note that you don't need to change anything else since the above two are equivalent in their final result. So a one line change and a benchmark run.
Better do:
htable = signature % (mask + 1);
So that the search tree is identical as well. Or precompute 'mask + 1' outside of the critical path.
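A standalone version of that experiment might use a kernel like the one below. The sizes and iteration counts are arbitrary choices for illustration; note the volatile divisor, without which the compiler would strength-reduce the modulo by a compile-time power of two back into a mask and the comparison would measure nothing:

```c
#include <stdint.h>

enum { SIZE = 1 << 20, MASK = SIZE - 1 };

/* volatile prevents the compiler from folding % into & */
static volatile uint32_t table_size = SIZE;

uint64_t bench(int use_mod, long iters)
{
    uint64_t key = 0x9E3779B97F4A7C15ull, sum = 0;
    for (long i = 0; i < iters; i++) {
        /* cheap LCG standing in for successive hash signatures */
        key = key * 6364136223846793005ull + 1442695040888963407ull;
        sum += use_mod ? key % table_size : key & MASK;
    }
    return sum;  /* consume the result so the loop is not removed */
}
```

Timing bench(0, N) against bench(1, N) gives the raw cost difference; since table_size here equals mask + 1, both variants also return the same checksum, mirroring the identical-tree point above.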
bob wrote:why don't you simply test? replace the usual
htable = signature & mask;
by
htable = signature % mask;
then search to a fixed depth that takes maybe 5 minutes total. Run both and you can determine whether there is any cost or not. Note that you don't need to change anything else since the above two are equivalent in their final result. So a one line change and a benchmark run.
Better do:
htable = signature % (mask + 1);
So that the search tree is identical as well. Or precompute 'mask + 1' outside of the critical path.
Should have used size rather than mask when doing the mod; was not thinking...
For key mod p to work well, it's not necessary that p be prime. But p should be odd (and so coprime with 2^N) so that all N bits of the key get a chance to contribute to the entry address generation.
Divisions take some 30 clock cycles, compared to 1-3 for shifts and multiplies. And this is not only latency, but even the throughput is only 1 per 30 cycles, as the divide unit is not pipelined.
But I guess you could pre-calculate (1<<N)/size, so that you can do the modulo as key - ((((1<<N)/size)*key)>>N)*size with sufficiently large N to not lose precision.
I would be surprised if an optimizer would be smart enough to do that, though, if you just write key%size.
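Done by hand, that reciprocal trick might look like the sketch below. It is shown for 32-bit keys with N = 32 so the intermediate product fits in 64 bits; a real engine hashing 64-bit keys would need 128-bit arithmetic. The names are illustrative:

```c
#include <stdint.h>

/* Precompute m = floor(2^32 / size) once, then approximate
   key / size as (m * key) >> 32, avoiding a hardware divide on
   every probe.  Truncation can leave the quotient one too low,
   so one conditional subtract corrects the remainder. */
typedef struct {
    uint32_t size;   /* number of table entries (need not be 2^k) */
    uint64_t magic;  /* floor(2^32 / size), precomputed once */
} ModState;

ModState mod_init(uint32_t size)
{
    ModState s = { size, ((uint64_t)1 << 32) / size };
    return s;
}

uint32_t fast_mod(const ModState *s, uint32_t key)
{
    uint32_t q = (uint32_t)((s->magic * key) >> 32);  /* ~ key / size */
    uint32_t r = key - q * s->size;
    if (r >= s->size)   /* quotient was one low: fix the remainder */
        r -= s->size;
    return r;
}
```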