Transposition Age Tutorial

bob · Post by **bob** » Thu Jan 28, 2016 8:18 pm

Aleks Peshkov wrote:
bob wrote:What kind of info can you store in 8 bytes? Obviously you eliminate more than just depth and age.
I have 12-bit move encoding. So 12b move + 16b eval + 2bits for bound flags is enough inside 64bit slot. The rest is lock key.
What's the benefit?
My main reason is that 3 slots are much easier to keep ordered, then to shuffle 4.

The benefit of (3x free + 2x rigid) against 4x free way slot allocation is good (almost 5/4 better) when the table space is critically low, slowly diminishes to negative gain when the table is about exactly 4x of search space and becomes slightly better again as table grows relative to search space.

(2x free + 4x rigid) performed much worse. I tried rigid depth: 3-2-1-0, 2-1-0-0 and 1-1-0-0.

Why shuffle? Hash hits are more uncommon than hits. Stores happen at every node. Shuffling incurs overhead that will be wasted the majority of the time. Speed is irrelevant for hashing in general, as a 1% penalty can't be measured. Accuracy is more important, as every hit saves a lot more than that 1% cost, overall. I even changed to 8 entries per bucket and noticed no degradation in Elo. I still use only 4 because a cache line is the most convenient thing to use in terms of speed.

hgm · Post by **hgm** » Thu Jan 28, 2016 8:20 pm

bob wrote:Yes, but the bound is the important information, and with that you need draft.

Well, as he uses them only for draft=0 nodes, he knows the draft too.

Aleks Peshkov · Post by **Aleks Peshkov** » Thu Jan 28, 2016 8:51 pm

bob wrote:Why shuffle? Hash hits are more uncommon than hits. Stores happen at every node. Shuffling incurs overhead that will be wasted the majority of the time. Speed is irrelevant for hashing in general, as a 1% penalty can't be measured. Accuracy is more important, as every hit saves a lot more than that 1% cost, overall.

I do not understand the question.

On hash hit:
1) I update the hit entry age if needed.
2) If (1) happened I move the entry from outdated group to to fresh age group.
3) I rotate between the same depth entries the recently used one to the top of them.
While it looks complex there is only 3 slots to work with.

What do I do on write is complex to explain, but as my bucket is ordered all reordering happens on single pass with only 2 ifs and max 3 swaps. Update of all (up to total 5) 0-depth entries and keep them in LRU order is also obvious.

The gain is not great, but the maintenance cost is also low. There is only one potential disadvantage: more chance of losing information during simultaneous writes in several threads.

Aleks Peshkov · Post by **Aleks Peshkov** » Thu Jan 28, 2016 9:00 pm

hgm wrote:
bob wrote:Yes, but the bound is the important information, and with that you need draft.
Well, as he uses them only for draft=0 nodes, he knows the draft too. :o

Well, I think it is worse a try to ignore draft/age/eval completely and to store only the best move inside 4-bytes slots. Cash collision is not dangerous here.

bob · Post by **bob** » Fri Jan 29, 2016 1:10 am

Aleks Peshkov wrote:
D Sceviour wrote:The first test results for using a Crafty style multiple hash entry (compared to an always overwrite scheme) are disappointing so far. There is obviously a slowdown in performance, as more hash entries have to be tested. A quick tournament test showed no change in elo, so expected improvement look nominal unless there is another type of error missing.
You need scale transposition table size to test time control if you want to optimize normal table size with tournament time control.

Note that crafty has always been an "always store" program, it just chooses which of the 4 bucket entries to overwrite, but one will definitely go. I've compared this to the old belle approach of one always store table, and one depth-preferred table, with the always store being larger or smaller (1/2 or 2x). The 4 entry bucket is better, period. In fact, this is similar to set associativity in cache, with N-way = 4. 8 is guaranteed to perform better regarding tree size, only question is, does the computational cost offset this or turn it into a loss. I have not tried a 128 byte bucket, but with modern cache pre-fetching it might be a gain rather than a loss. I'll try it once the current pruning tests complete.

D Sceviour · Post by **D Sceviour** » Fri Jan 29, 2016 3:37 am

Aleks Peshkov wrote:
D Sceviour wrote:The first test results for using a Crafty style multiple hash entry (compared to an always overwrite scheme) are disappointing so far. There is obviously a slowdown in performance, as more hash entries have to be tested. A quick tournament test showed no change in elo, so expected improvement look nominal unless there is another type of error missing.
You need scale transposition table size to test time control if you want to optimize normal table size with tournament time control.

Is this correct, that a four-entry hash table has to allocate four times as much memory as a single-entry (always overwrite) hash table to achieve the same or better results?

bob · Post by **bob** » Fri Jan 29, 2016 7:13 am

D Sceviour wrote:
Aleks Peshkov wrote:
D Sceviour wrote:The first test results for using a Crafty style multiple hash entry (compared to an always overwrite scheme) are disappointing so far. There is obviously a slowdown in performance, as more hash entries have to be tested. A quick tournament test showed no change in elo, so expected improvement look nominal unless there is another type of error missing.
You need scale transposition table size to test time control if you want to optimize normal table size with tournament time control.
Is this correct, that a four-entry hash table has to allocate four times as much memory as a single-entry (always overwrite) hash table to achieve the same or better results?

Absolutely not. But I am not sure that is exactly what he is saying, either...

kbhearn · Post by **kbhearn** » Fri Jan 29, 2016 7:29 am

D Sceviour wrote:
Aleks Peshkov wrote:
D Sceviour wrote:The first test results for using a Crafty style multiple hash entry (compared to an always overwrite scheme) are disappointing so far. There is obviously a slowdown in performance, as more hash entries have to be tested. A quick tournament test showed no change in elo, so expected improvement look nominal unless there is another type of error missing.
You need scale transposition table size to test time control if you want to optimize normal table size with tournament time control.
Is this correct, that a four-entry hash table has to allocate four times as much memory as a single-entry (always overwrite) hash table to achieve the same or better results?

he's saying that to test efficiency of replacement schemes it's important that contention in the hash table is high enough that a reasonable amount of things need to actually be replaced. If your hash table is big enough that 90% of nodes don't have to fight for their slot you won't be able to distinguish between scheme A and scheme B and it'll only matter which is faster.

So what he's saying is if you're designing your engine for say a 4 GB hash when playing 40 moves in 120 minutes and your test games are running at say 40 moves in 1 minute, then you should probably be testing with only a 32MB hash table to get comparable hash pressure to compare the schemes.

Aleks Peshkov · Post by **Aleks Peshkov** » Fri Jan 29, 2016 2:12 pm

kbhearn wrote:So what he's saying is if you're designing your engine for say a 4 GB hash when playing 40 moves in 120 minutes and your test games are running at say 40 moves in 1 minute, then you should probably be testing with only a 32MB hash table to get comparable hash pressure to compare the schemes.

Exactly.

I tested various replacement schemes and some of them were better to another ones at some limited range of hash tables sizes (with always constant search limits and constant starting position). And with any larger or smaller table usage pressure they were worse.

I tested schemes where odd ply searches were better and even plies were worse comparing to simpler replacement algorithms.

hgm · Post by **hgm** » Fri Jan 29, 2016 2:47 pm

D Sceviour wrote:Is this correct, that a four-entry hash table has to allocate four times as much memory as a single-entry (always overwrite) hash table to achieve the same or better results?

No, that is certainly not correct. You should always compare replacement schemes under the constraint that they use the same amount of total memory. The idea of the schemes is to make optimal use of the resources that you have, and their merit is how well they do that.

And when you are testing the schemes, you should always do it under conditions of very heavy table overloading. If your tree fits entirely in the hash table, replacement should have no effect, as it is never done. (So the most stupid algorithm, that doesn't need to waste any time on deciding what to replace, would probably come out as best.) Note that the number of unique positions in a tree is often an order of magnitude lower than the number of nodes the search reports, due to transpositions. In the graphs I showed earlier you can see that the search only starts to slow down when the overload factor gets above 10 (= total nodes in tree 10 times larger than the number of hash entries).

Transposition Age Tutorial

Re: Transposition Age Tutorial

Re: Transposition Age Tutorial

Re: Transposition Age Tutorial

Re: Transposition Age Tutorial

Re: Transposition Age Tutorial

Re: Transposition Age Tutorial

Re: Transposition Age Tutorial

Re: Transposition Age Tutorial

Re: Transposition Age Tutorial

Re: Transposition Age Tutorial