Data structure choice for TT


Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Data structure choice for TT

Post by Tony P. »

Let's share thoughts on unusual data structures that may further improve the utility of a TT. To be clear, I'm not writing an engine yet, just theorycrafting :mrgreen:

I've been considering what would happen if an engine had an expensive eval or move ordering and ran at 10-100 times lower nps than the top AB engines. At that precision and speed, and possibly with a larger entry size, a more conservative and flexible TT entry replacement strategy would probably be needed; the choice isn't as easy as in the current NN engines, which are compute bound.

Hence I've been searching for alternatives to fixed-size buckets. I guess chaining is too impractical for chess because of frequent cache misses, so open addressing methods are better candidates.

Let me give an example. When introduced to Robin Hood hashing,
dragontamer5788 wrote: Mon Nov 11, 2019 4:31 pm I do believe the "best hash table" award goes to linear probing with Robin Hood hashing these days, but I guess that's a bit off topic. Cuckoo hashing has some nifty parallel implementations, for what it's worth.
I visited the link and found a comment by Wolfgang Brehm (posted 12 hours ago, oh irony) linking to the GitHub page detailing his invention, called the patchmap.
Wolfgang Brehm wrote: This approach might sound very similar to the one suggested in "Ordered Hash Tables" (1974) by O. Amble and D. E. Knuth, but there is a major difference: the hash tables there are ordered first by the hash of the keys and then by the value of the key. If the entries are instead ordered only by the hash value, there exists a global order, which all of a sudden allows advanced search strategies like interpolation search and binary search.

In fact the approach is much more similar to Robin Hood hashing with linear probing, because if two keys collide at the same position, the key whose hash value is lower will also have been displaced more and will therefore displace the key with the larger hash value, thereby keeping up the same order. There has, however, been no hash table using Robin Hood hashing with linear probing that takes advantage of this fact.
Wolfgang's article cited there acknowledges slowish deletion and a cost of insertion that grows steeply as the load factor approaches 1. In chess, however, there's no need to delete an entry on demand (other than clearing the whole table), nor to always search for an empty slot during insertion: with a good enough replacement strategy, after a cheap lookup shows no match but a collision, usually only a few adjacent entries (though optionally a less limited, variable number of them) have to be probed before a replaceable one is found. As a bonus, the patchmap never requires rehashing.
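
To make this concrete, here's roughly the kind of store loop I have in mind: a bounded linear-probing window with replacement as the fallback, rather than the patchmap's full displacement logic (untested, with a made-up 16-byte entry layout):

Code: Select all

#include <cstddef>
#include <cstdint>
#include <vector>

struct Entry {
    std::uint64_t key;    // full hash; 0 means empty
    std::uint32_t move;
    std::int16_t  score;
    std::uint8_t  depth;
    std::uint8_t  flags;
};

constexpr std::size_t WINDOW = 8;  // max slots probed past the home index

void tt_store(std::vector<Entry>& table, const Entry& fresh) {
    const std::size_t mask = table.size() - 1;   // table size is a power of two
    const std::size_t home = fresh.key & mask;
    std::size_t victim = home;
    for (std::size_t i = 0; i < WINDOW; ++i) {
        std::size_t idx = (home + i) & mask;
        Entry& slot = table[idx];
        if (slot.key == 0 || slot.key == fresh.key) { slot = fresh; return; }
        if (slot.depth < table[victim].depth) victim = idx;  // remember shallowest
    }
    table[victim] = fresh;  // no free slot in the window: overwrite the shallowest
}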
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Data structure choice for TT

Post by hgm »

Note that the hash table used as the transposition table in alpha-beta tree search is not really a standard hash table. Normally, loss of data by overwriting would be completely unacceptable, so the 'standard' hashing methods have developed all kinds of techniques for adding new entries to a nearly full table. These can sometimes become very time consuming when you are looking for an empty slot in an almost saturated table, and can interfere with the ability to find entries after others are deleted. The discussion of the various techniques focuses on this aspect: e.g. "do I have to do 100 retries in a table that is 99% filled, or just 20?"

For chess this is all totally unimportant. It is unavoidable that your hash table gets overloaded, as you search more nodes than you have memory to store. So you will have to live with replacement. Once you accept that, expensive methods for avoiding replacement while you are filling the last 1% (say) of an initially empty TT become totally non-competitive. Just start, from the very beginning, with the replacement you would be forced to do anyway once the table is 100.0% filled.

The commonly used replacement methods are pretty much optimal in this respect. Just hash the position to a set of two to four table entries in the same memory cache line, and overwrite the entry there that you consider least valuable. No extra memory accesses needed, not much work to find the entry during probing, and overwriting of other entries cannot prevent the surviving entries from being found... Ideal in every respect. And the distribution of the importance of the entries in a typical game tree is such that you will almost always find one of the most unimportant entries in a set of four, which is then suitable for overwriting.
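
Schematically, probing and storing then look like this (simplified; a real scheme would also weigh aging and entry type, not just depth):

Code: Select all

#include <cstddef>
#include <cstdint>

struct Entry  { std::uint64_t key; std::uint32_t move; std::int16_t score;
                std::uint8_t depth; std::uint8_t flags; };  // 16 bytes
struct Bucket { Entry e[4]; };                              // one 64-byte cache line

Entry* probe(Bucket* table, std::size_t nBuckets, std::uint64_t key) {
    Bucket& b = table[key % nBuckets];  // one cache-line fetch covers the whole set
    for (int i = 0; i < 4; ++i)
        if (b.e[i].key == key) return &b.e[i];
    return nullptr;
}

void store(Bucket* table, std::size_t nBuckets, const Entry& fresh) {
    Bucket& b = table[fresh.key % nBuckets];
    int victim = 0;                           // overwrite the least valuable entry;
    for (int i = 1; i < 4; ++i)               // here 'least valuable' = lowest depth
        if (b.e[i].depth < b.e[victim].depth) victim = i;
    b.e[victim] = fresh;
}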

The only issue is really what type of entries would be least important to keep. This is where the replacement schemes differ.

My current favorite is to have a bucket that contains one 'always replace' slot, where everything goes that cannot be put elsewhere, one under-cut slot for retaining the deeper entries of the current search, and two depth-preferred slots, which keep the best two depths that ever mapped into the bucket until they age (which they wouldn't during an analysis session).
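
The store decision for such a bucket would look something like this (Entry and Bucket as in the sketch above; aging and same-key updates omitted):

Code: Select all

// Slot 0 = always-replace, slot 1 = under-cut, slots 2-3 = depth-preferred.
void store_bucket(Bucket& b, const Entry& fresh) {
    // Depth-preferred: claim a slot if we beat the lower of the two depths.
    int lo = (b.e[2].depth <= b.e[3].depth) ? 2 : 3;
    if (fresh.depth >= b.e[lo].depth) { b.e[lo] = fresh; return; }
    // Under-cut: also accept a depth one below what is stored, so this slot
    // keeps cycling instead of silting up with one high depth forever.
    if (fresh.depth + 1 >= b.e[1].depth) { b.e[1] = fresh; return; }
    b.e[0] = fresh;  // everything else goes to the always-replace slot
}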
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: Data structure choice for TT

Post by Tony P. »

hgm wrote: Fri Jul 10, 2020 10:21 am And the distribution of the importance of the entries in a typical game tree is such that you will almost always find one of the most unimportant entries in a set of four, which is then suitable for overwriting.
Thanks a lot for this insight!

My mistake was that I was overestimating the ratio of the number of important entries (I'll denote it as k) to the number of buckets (m) and hugely overestimating the expected number of entries that don't fit in the table because of multicollisions.

As a corollary of von Mises's birthday theorem (e.g. cited as Theorem 2.1 in this preprint), the expected number of j-collisions is ~ m(k/m)^j/j! (as m → ∞ with k/m → 0) if the distribution of hashes is close enough to uniform, so at low nps in engine games, the fraction of 3-collisions is small enough (considering aging), which justifies 2-slot buckets :shock:
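
To put made-up numbers on it: with m = 2^24 buckets and k = 2^21 important entries (k/m = 1/8), the formula gives ~ 2^24 * (1/8)^3 / 3! ≈ 5500 buckets with a 3-way pile-up, so well under 1% of the k important entries are involved in one.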

If there were ever a need for slots bigger than 32 bytes (e.g. to store data about a few notable quiet moves for ordering/MCTS), maybe even 2x64-byte buckets wouldn't cause a slowdown on CPUs with adjacent cache line prefetch.

Also, Zobrist hashing is 3-independent (which I didn't know) but not 4-independent, so now I wouldn't feel that bad if I had to make buckets of 2 instead of 3 or 4.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Data structure choice for TT

Post by syzygy »

Tony P. wrote: Mon Jul 13, 2020 8:27 am As a corollary of von Mises's birthday theorem (e.g. cited as Theorem 2.1 in this preprint), the number of j-collisions is ~m(k/m)^j/j! (m->inf, k/m->0) if the distribution of hashes is close enough to uniform, so at low nps in engine games, the ratio of 3-collisions is small enough (considering aging), which justifies 2-slot buckets :shock:
Calculations become much easier if you assume that the TT is filled 100%. Since engines don't clean the TT between moves, this assumption is entirely realistic. No need for the birthday paradox anymore.

(Admittedly this assumption is "more true" if you are interested in hashing errors. If you are interested in useful entries being overwritten, then things are more complicated since you would need to have a model of which entries are still useful. In any event, since "replace always" seems to work best in practice, the only parameter left seems to be bucket size.)
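
(To illustrate the easy calculation with made-up numbers: if each entry stores a 48-bit verification key and a probe checks 4 entries of a full table, a probe of a non-matching position gives a false match with probability about 4/2^48. At 10 Mnps with one probe per node, that is one hashing error per 2^48/(4*10^7) ≈ 7*10^6 seconds, i.e. one every few months of continuous search.)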
Tony P. wrote: Mon Jul 13, 2020 8:27 am Also, Zobrist hashing is 3-independent (which I didn't know) but not 4-independent, so now I wouldn't feel that bad if I had to make buckets of 2 instead of 3 or 4.
I don't know what you mean by N-independent, but I have never seen any indication that Zobrist hashing performs less well than an ideal "fully random" and uniformly distributed hash function. (Except when something obvious goes wrong when generating the Zobrist keys, such as generating only even numbers.)
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Data structure choice for TT

Post by hgm »

Even buckets of one (pure 'always replace') work acceptably for game play; this is what micro-Max uses. For analysis, however, it is really inconvenient if results from deep searches you did before are lost from the table. With buckets of two (always overwriting the lowest stored depth) you already improve that a lot, but it will still happen occasionally that two important entries map to the same bucket and flush each other out of the table. Also, in a long analysis session almost every bucket will eventually contain an entry with such a large depth that it will virtually never be replaced, so that the search itself is then basically restricted to an always-replace table of half the size, which performs noticeably worse when the tree starts to overload the table.

Even with larger buckets you will eventually run into the problem that all entries that can be overwritten only by a higher depth will get filled. For this reason I also put one or more under-cut slots in a bucket. When the depth-preferred slots are all filled, the search can still run efficiently using the always-replace and under-cut slots, the latter making sure the TT still makes some distinction between precious and less-precious search results, and hangs on to the more precious (high-depth) ones a lot longer.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Data structure choice for TT

Post by Pio »

syzygy wrote: Mon Jul 13, 2020 12:19 pm I don't know what you mean by N-independent, but I have never seen any indication that Zobrist hashing performs less well than an ideal "fully random" and uniformly distributed hash function. (Except when something obvious goes wrong when generating the Zobrist keys, such as generating only even numbers.)
One problem with Zobrist hashing is that when two positions do collide, there is a high probability that lots of positions in both positions' subtrees will collide as well.
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: Data structure choice for TT

Post by Tony P. »

Yes, that's exactly why bucketing is needed. However, I can't easily construct a 3-collision (3 positions with the same hash) based on a 2-collision. Surely, a 3-collision may spawn further ones in the subtrees, because if h(pos1) = h(pos2) = h(pos3), then h(pos1·move) = h(pos2·move) = h(pos3·move) for every move that's legal in all 3 positions (that applies to any homomorphic hash function, with xor replaced by the respective group operation). But I expect fewer moves on average to be legal in all three than in the 2-collision case, considering that the colliding positions are likely to be far from the root, in very different parts of the game tree.
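
In symbols: Zobrist's update rule is h(pos·move) = h(pos) xor delta(move), so h(pos1) = h(pos2) immediately forces h(pos1·move) = h(pos2·move) for every move legal in both positions, and a collision propagates through the shared parts of the two subtrees.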

Anyway, I can't tell if xxHash or some other hashing method performs better than Zobrist until I actually write code and A/B test :D

What are some examples of fast and strong engines with non-Zobrist hashes?
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: Data structure choice for TT

Post by Tony P. »

syzygy wrote: Mon Jul 13, 2020 12:19 pm I don't know what you mean by N-independent
Oops, sorry for the confusion. Here's the definition in Wikipedia.

Actually, the N-independence of a family of hash functions guarantees nothing about the performance of a particular member with fixed parameters :(, and it assumes nothing about the distribution of keys. But at least it gives hope that, with high enough probability, the chosen parameters yield a near-uniform distribution of hashes under a naturally occurring (not adversarial) distribution of keys. However, I did read the CPW anecdote about an unlucky choice of the seed :D
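
For reference, the definition amounts to this: a family H of hash functions with range size r is N-independent if, for h drawn uniformly at random from H, any N distinct keys x1, ..., xN are mapped to any prescribed values y1, ..., yN with probability exactly r^(-N), i.e. the hashes of any N distinct keys are uniform and mutually independent.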
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: Data structure choice for TT

Post by Tony P. »

Tony P. wrote: Mon Jul 13, 2020 5:05 pm What are some examples of fast and strong engines with non-Zobrist hashes?
Oops, Leela is an obvious example :oops: (of a strong one). Its hash function, based on a few additions, multiplications, bit shifts and concatenations, doesn't seem slow.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Data structure choice for TT

Post by Pio »

Tony P. wrote: Mon Jul 13, 2020 5:52 pm
Tony P. wrote: Mon Jul 13, 2020 5:05 pm What are some examples of fast and strong engines with non-Zobrist hashes?
Oops, Leela is an obvious example :oops: (of a strong one). Its hash function, based on a few additions, multiplications, bit shifts and concatenations, doesn't seem slow.
My engine also doesn't use Zobrist; instead it uses a series of multiplications, shifts and XORs, but of course my engine is really bad. I think Zobrist hashing is extremely smart and much better than what I currently do.

I have one idea for a maybe new type of replacement strategy. What about using a scheme that keeps the recently read entries and kicks out the seldom-accessed ones? My idea is as follows (let's take a four-entry bucket as an example):

1) When saving an entry, always save it as the last entry, kicking out the existing entry in its place, i.e. save the new position as entry 4 by overwriting what is there (this should not be done if the positions are the same and the depth is less for the new one; then you just keep the old information, since it is more precious)

2) When reading the TT and getting a hit, you bump up the read entry and bump down the entry it displaces. For example, if you access/read entry 3, you swap it with entry 2: the hit moves up to slot 2 and the previous entry 2 moves down to slot 3. If you read the first entry, you don't do anything.
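
In code it would look something like this (untested; Entry being whatever your engine stores):

Code: Select all

#include <cstdint>
#include <utility>

struct Entry  { std::uint64_t key; std::int16_t score; std::uint8_t depth; /* ... */ };
struct Bucket { Entry e[4]; };

// 2) On a hit, bump the entry up one slot and push its neighbour down.
Entry* probe(Bucket& b, std::uint64_t key) {
    for (int i = 0; i < 4; ++i) {
        if (b.e[i].key != key) continue;
        if (i == 0) return &b.e[0];          // already at the front: do nothing
        std::swap(b.e[i], b.e[i - 1]);
        return &b.e[i - 1];
    }
    return nullptr;
}

// 1) Always store into the last slot, unless the same position is already
//    there with a greater depth (the old information is more precious).
void store(Bucket& b, const Entry& fresh) {
    if (b.e[3].key == fresh.key && fresh.depth < b.e[3].depth) return;
    b.e[3] = fresh;
}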

I guess my replacement strategy will work better for engines that also probe the TT at interior nodes, since otherwise it might kick out deeper entries too often.

Some nice properties of the strategy are that:
1) it is very simple
2) unreachable positions will quickly be kicked out, since they won't be accessed
3) you will not need an age field, and can thus make the entries a little bit smaller and simplify the logic
4) if the TT is probed at interior nodes, it will keep more of the deeper entries

What do you think?

/Pio