Undercut replacement


Undercut replacement

Post by hgm »

As aging TT entries by search number renders an engine useless for deep interactive analysis, I prefer to treat all searches of an analysis session as belonging to the same search. This has the problem that the depth-preferred part of the TT will eventually fill up with entries of quite high depth, relegating all intermediate-depth entries to the always-replace part of the table, where they get overwritten very quickly by the flood of QS nodes (or d=1 nodes when I do not hash QS).

This is rather detrimental to the search speed. Because of the way a tree is traversed, positions are revisited after a time proportional to the effort that was needed to search them, and this interval furthermore increases for later visits, until the position isn't revisited at all. So ideally you would want to keep an entry for a time that is proportional to the time needed to produce entries of that depth. Then a slot in the TT is either used to store a deep entry for a long time, saving a lot of work on a hit, or (sequentially) a lot of low-depth entries, each hit on them saving less work, but the total savings amounting to the same. Keeping an entry longer than that is counter-productive: its hit rate falls over time, and keeping it around with a low hit rate goes at the expense of time during which shallower entries could have been stored with a high hit rate (and vice versa).

I now use a very simple replacement scheme to achieve that: undercut replacement instead of pure always-replace, in combination with depth-preferred (with aging). 'Undercut' replacement can be seen as a weakened version of depth-preferred: instead of only replacing when the new depth is larger than or equal to the stored depth, it also allows replacement by entries that are exactly 1 ply less deep. For d=0 or d=1 entries this makes no difference, as they will always get replaced. But entries of larger depth will get replaced less frequently, as most attempts to conquer their slot will not have enough depth to manage that. E.g. an entry with d=8 is immune to replacement by all nodes with d=0 to 6, and will survive for a long time in the table. Entries that are refused for the undercut slot will be deferred to a true always-replace slot of the hash bucket.

An easy implementation of this is:

Code: Select all

HashEntry *h = hashTable + (index & mask); // mask has LSB cleared, so h points to an even (always-replace) slot now
if(h->lock == key || (++h)->lock == key) { // hash hit (tested always-replace slot first, as that gives most hits)
  ...
} else { // hash miss; replace ('h' now points to highest (odd) member of pair, which is the undercut slot)
  h -= (depth < h->depth - 1);  // if new depth not enough to replace that, go back to always-replace slot
  h->lock = key; // replace
  h->depth = depth;
  ...
}
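For reference, the surrounding declarations could look something like this (the layout, field widths and the helper name AllocateHash are just one possible choice for illustration; only 'lock', 'depth' and 'flags' are actually used by the snippets in this post):

Code: Select all

#include <stdint.h>
#include <stdlib.h>

typedef struct {          // illustrative 16-byte layout
    uint64_t lock;        // signature derived from the hash key, used to verify a hit
    int16_t  score;
    uint16_t move;
    uint8_t  depth;
    uint8_t  flags;       // low nibble: bound type / in-check; high nibble free for an undercut counter
    uint16_t filler;      // padding
} HashEntry;

static HashEntry *hashTable;
static uint64_t mask;     // set to (nrOfEntries - 1) & ~1ULL, so its LSB is cleared

static void AllocateHash(uint64_t nrOfEntries)   // nrOfEntries must be a power of 2
{
    hashTable = calloc((size_t)nrOfEntries, sizeof(HashEntry));
    mask = (nrOfEntries - 1) & ~1ULL;            // 'index & mask' then always lands on the even slot of a pair
}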
Although in this scheme the survival time of entries is proportional to their production time (assuming this grows exponentially with depth), the d=0 entries are still overrepresented, because there are no d = -1 arrivals attempting to clear them from the table. This can be remedied by not allowing an undercut on every attempt (for the other depths), but throttling successful attempts down by (approximately) the branching factor. To achieve that, the decision to allow replacement of the undercut slot could be

Code: Select all

  h -= (depth < h->depth - !(nodeCount & 15));
This would always allow replacement by equal or higher depth, but only allows undercut in 1 out of 16 cases, which statistically has the desired effect of driving up the survival time by a factor 16. It would be even better to equip each hash bucket with its own counter, rather than using nodeCount for this, so that the entry would always be replaced only on the 16th attempt, rather than with a probability 1/16 in every attempt, making the survival time more homogeneous. This would require extra space in the hash entry, though, which in general is only competitive if there were unused bits. Only a few bits are needed, though. E.g. suppose we had a 'flags' byte in the entry that still has the high nibble available (the rest being used for bound-type and in-check flags); then we could do

Code: Select all

  h -= (depth == h->depth - 1 && (h->flags += 16) >= 16 || depth < h->depth - 1); // supposes 'flags' unsigned; undercut only allowed when the high-nibble counter wraps to 0 (1 in 16 attempts)
This scheme can be used in combination with depth-preferred, e.g. having 2 depth-preferred, 1 undercut and 1 always-replace slot in each hash bucket. The depth-preferred slots would accumulate the deepest two entries the bucket has ever seen, and even if these would completely fill up with depths similar to what you typically reach near the root (during an analysis session), half the table would still remain useful for helping along any new search in a close-to-optimal way.
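For illustration, the replacement decision on a hash miss in such a 4-slot bucket could be done like this (the probing of the 4 slots for a hit is not shown; the slot order and bucket alignment are just one possible choice, and the throttled undercut test from above could be substituted for the plain one used here):

Code: Select all

// slots 0,1 = depth-preferred, slot 2 = undercut, slot 3 = always-replace
HashEntry *bucket = hashTable + (index & mask);  // for this scheme 'mask' has its two lowest bits cleared
HashEntry *h;

if(depth >= bucket[0].depth)          h = bucket + 0;  // qualifies for the deepest slot
else if(depth >= bucket[1].depth)     h = bucket + 1;  // or for the second depth-preferred slot
else if(depth >= bucket[2].depth - 1) h = bucket + 2;  // undercut slot: equal or 1 ply shallower suffices
else                                  h = bucket + 3;  // everything else goes to the always-replace slot

h->lock = key;                                         // store the new entry in the chosen slot
h->depth = depth;                                      // (score, move, flags etc. stored as usual)

Note that a new deepest entry simply overwrites slot 0 in this sketch; demoting the old slot-0 contents to slot 1 instead would preserve the second-deepest entry more faithfully, at the cost of losing whatever slot 1 held.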

Re: Undercut replacement

Post by odomobo »

Have you done performance analysis on how this affects real games, and how it affects analysis sessions? It sounds interesting, but I haven't worked out (in my head) the soundness of the approach.

Re: Undercut replacement

Post by hgm »

Replacement schemes would only affect games when you heavily overload the TT (i.e. number of nodes per search more than 10 times the TT size). With 256MB hash (very small by today's memory standards) I have some 20M entries, so I would need 200M nodes per search before I could start to see anything. At 2Mnps (my typical engine speed) that would mean 100 sec/move. There really is no opportunity to play games at a TC slower than that, so for games it doesn't really matter how the scheme performs, as long as it isn't completely wrecked. The described scheme is upward compatible with the 'shallowest-of-two' scheme + aging (known to work quite well) in a TT 2/3 the size (while a factor 2 in size typically means a factor 1.06 in speed).

So it is really only the analysis performance that counts. This is hard to quantify. But I notice that with schemes that have no way to purge entries from the depth-preferred part of the table (when there are no search numbers to give a clue about aging), new searches get very sluggish in a long analysis session (days to weeks). Starting another engine process (with an empty hash table) then reaches the desired depth much quicker (but doesn't have the results of other analyses in its TT, so it isn't much help). With undercut replacement they don't get sluggish. But when I only use undercut + always-replace, the chance of losing the result of a previously analyzed position is too large, and when I return to its parent the result is often gone.