Hashed PVs

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Hashed PVs

Post by xmas79 »

bob wrote:That is pretty much what I described when I started doing this a few years back. My default size is 65536 entries, and I don't see very many <HT> truncated PVs displayed...
I tried a fixed size, but I realized that pollution "helps". If I clear my table between iterations I get truncated PVs. If I don't clear the hashtable I get full PVs, but I have to decide how to overwrite (always replace?). So I'm wondering what the "correct" size is and/or when to clear (if at all). I have now switched to a std::vector implementation, since this happens only a few times and this code is not performance-critical. And I'm also investigating how many iterations I can "use" without polluting too much....
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hashed PVs

Post by hgm »

You should not clear the PV hash unless you also clear the regular hash. The whole idea works because for every exact entry in the hash table (which could cause a hash cutoff in a PV node), the PV on which it was based is present in the PV hash. If it were missing, you would be in exactly the same situation as when you don't hash the PV: you could not cut off without terminating the PV at that move. And it is very possible that the next iteration (or even the next move) gets hits on entries produced during a previous search/iteration.

(You could use a mixed approach, where you forbid a PV hash cut when the PV hash causes a miss, in the hope that it was just an occasional overwrite, and that after the next move (which is stored in the regular hash) you will again hit upon an exact entry that does have an accompanying PV. It would be cleaner to just use a more advanced replacement scheme in the PV hash, however, e.g. using some aging there to decide which PVs can be safely overwritten, or marking the entries in the PV hash as free when you overwrite the exact entry in the main hash to which they correspond.)
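To make that mixed approach concrete, here is a minimal sketch of the cutoff decision; TTEntry, probe_pv_hash and the flag name are illustrative assumptions, not any engine's actual API:

Code: Select all

#include <cstdint>

struct TTEntry { int flag; int depth; int score; };
enum { EXACT = 1 };

// Assumed helper: true if the PV belonging to this position is still
// present in the PV hash (i.e. was not overwritten), false on a miss.
bool probe_pv_hash(uint64_t key, int *pv_out);

// May a hash cutoff be taken at this PV node?
bool pv_hash_cut_ok(const TTEntry *tt, uint64_t key, int depth, int *pv_out)
{
    if (tt->flag != EXACT || tt->depth < depth)
        return false;             // no exact, deep-enough entry anyway
    // Only cut when the accompanying PV survived; on a PV-hash miss,
    // decline the cutoff and search on, hoping the overwrite was a fluke.
    return probe_pv_hash(key, pv_out);
}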
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hashed PVs

Post by bob »

xmas79 wrote:
bob wrote:That is pretty much what I described when I started doing this a few years back. My default size is 65536 entries, and I don't see very many <HT> truncated PVs displayed...
I tried a fixed size, but I realized that pollution "helps". If I clear my table between iterations I get truncated PVs. If I don't clear the hashtable I get full PVs, but I have to decide how to overwrite (always replace?). So I'm wondering what the "correct" size is and/or when to clear (if at all). I have now switched to a std::vector implementation, since this happens only a few times and this code is not performance-critical. And I'm also investigating how many iterations I can "use" without polluting too much....
I use the traditional replacement concept based on "age" of the entry, where an entry ages each time a new search is started (not new iteration).
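Schematically, with illustrative types and a two-entry bucket (the names and bucket layout below are assumptions for illustration, not anyone's actual code):

Code: Select all

#include <cstdint>

enum { MAX_PLY = 64 };
typedef int move_t;
struct PVEntry { uint64_t key; int age; int length; move_t pv[MAX_PLY]; };

int search_age;  // incremented once per new search, NOT per iteration

// Pick which of a bucket's two entries to overwrite.
PVEntry *victim(PVEntry bucket[2], uint64_t key)
{
    if (bucket[0].key == key) return &bucket[0];  // same position: update it
    if (bucket[1].key == key) return &bucket[1];
    if (bucket[0].age != search_age) return &bucket[0];  // stale search first
    if (bucket[1].age != search_age) return &bucket[1];
    return &bucket[0];  // both from the current search: always-replace
}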
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Hashed PVs

Post by xmas79 »

After I switched to std::vector, during a test with Fine #70 my program suddenly crashed at ply 41, saying there was a PV error at ply 64.... The first thing I said was: "What? I have only 63 plies available!", as I have a MAX_PLY constant set to 64 (and a max allowable search depth set to "MAX_PLY-1")... I traced the problem to the same thing happening over and over... In fact, I saw that the offending PV was 67 plies long even though I only have 64 slots. TT overwrites? Yes! That means that during search my engine encountered a position where it retrieved a PV longer than its depth (which is normal behaviour, as we all have qsearch), but the problem was that this PV was backed up all the way to the root node, adding moves upon moves, reaching that length. As I was doing memcpy, I was overwriting other variables...
Increasing the MAX_PLY value actually fixed the problem (as I don't really search 128 plies, and probably never will), but the only real fix is to truncate PVs during backup when the final length is >= MAX_PLY.

Natale.
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hashed PVs

Post by hgm »

Actually the most common reason for overly long PVs is hash-table grafting, not QS. You get hits on an entry that itself might have a 63-ply-long PV, and that adds to the depth you are already at when you get the hit. This is exactly why allowing TT cutoffs can improve engine play; you would never see such long PVs and beyond-the-horizon mates if you did not allow TT cutoffs in PV nodes.

Note that the 'PV stack' method I use in Fairy-Max does not restrict the length of individual PVs; the only requirement is that the space reserved for the stack is sufficiently large to hold all PVs. Of course if you want to hash PVs, there would be a length limit to what can be stored in the PV hash table.
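For illustration, this is the general shape of such a PV stack; it is a sketch with assumed helper names (NextMove, MakeMove, UnMake, Evaluate), not Fairy-Max's literal code:

Code: Select all

#define PV_STACK_SIZE 10000
int pvStack[PV_STACK_SIZE];  /* 0 terminates a PV; moves are packed into ints */
int pvPtr;                   /* global stack pointer */

extern int  NextMove(void);  /* stubs standing in for the real engine parts */
extern void MakeMove(int m);
extern void UnMake(int m);
extern int  Evaluate(void);

int Search(int alpha, int beta, int depth)
{
    int myPV = pvPtr;           /* this node's PV starts here */
    int m;
    pvStack[pvPtr++] = 0;       /* start with an empty PV */
    if (depth == 0) return Evaluate();  /* leaf: the empty PV stays behind */
    while ((m = NextMove())) {
        int child = pvPtr;      /* the daughter builds its PV here */
        MakeMove(m);
        int score = -Search(-beta, -alpha, depth - 1);
        UnMake(m);
        if (score > alpha) {    /* graft: m followed by the daughter's PV */
            int s = child, d = myPV;
            pvStack[d++] = m;
            while ((pvStack[d++] = pvStack[s++]) != 0)
                ;               /* copy the daughter PV, terminator included */
            child = d;          /* our PV now ends here */
            alpha = score;
        }
        pvPtr = child;          /* pop the daughter's PV (or keep the graft) */
        if (alpha >= beta) break;
    }
    return alpha;               /* this node's PV remains at pvStack[myPV..] */
}

The downward copy is safe because the destination always trails the source, and a PV of any length fits as long as the stack as a whole does.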
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Hashed PVs

Post by Sven »

xmas79 wrote:After I switched to std::vector, during a test with Fine #70 my program suddenly crashed at ply 41, saying there was a PV error at ply 64.... The first thing I said was: "What? I have only 63 plies available!", as I have a MAX_PLY constant set to 64 (and a max allowable search depth set to "MAX_PLY-1")... I traced the problem to the same thing happening over and over... In fact, I saw that the offending PV was 67 plies long even though I only have 64 slots. TT overwrites? Yes! That means that during search my engine encountered a position where it retrieved a PV longer than its depth (which is normal behaviour, as we all have qsearch), but the problem was that this PV was backed up all the way to the root node, adding moves upon moves, reaching that length. As I was doing memcpy, I was overwriting other variables...
Increasing the MAX_PLY value actually fixed the problem (as I don't really search 128 plies, and probably never will), but the only real fix is to truncate PVs during backup when the final length is >= MAX_PLY.

Natale.
I see nothing special in that kind of problem. Handle fixed-size arrays, like a PV, in the same way as fixed-size character strings. In the following example:

Code: Select all

#include <cstdio>
#include <cstring>

#define Min(a,b) ((a)<(b)?(a):(b))

char str[10+1];
char const s1[] = "hello, ";
char const s2[] = "world";
void bad() {
    strcpy(str, s1); // dangerous!!
    strcat(str, s2); // not only dangerous but wrong!!
}
void better() {
    size_t len1 = strlen(s1);
    size_t len2 = strlen(s2);
    strncpy(str, s1, Min(len1, sizeof(str)-1)); // avoid buffer overflow
    strncpy(str+len1, s2, Min(len2, sizeof(str)-1-len1)); // avoid buffer overflow again
    str[sizeof(str)-1] = '\0'; // ensure trailing zero in any case
}
the first function crashes while the second does not (if I did not make a stupid error typing the code). Appending a sub-PV from hash to an existing PV should follow the second form, of course. When using memcpy() you should calculate how much space is still available in the destination PV buffer.
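For a PV the same discipline might look like this; pv_t, move_t and backup_pv are illustrative names, not code from an actual engine:

Code: Select all

#include <cstring>

enum { MAX_PLY = 64 };
typedef int move_t;
typedef struct { int length; int score; move_t moves[MAX_PLY]; } pv_t;

// Build dst = bestMove followed by src, truncating to whatever fits.
void backup_pv(pv_t *dst, move_t bestMove, const pv_t *src)
{
    int n = src->length;
    if (n > MAX_PLY - 1)   // how much room is left after bestMove?
        n = MAX_PLY - 1;   // truncate instead of overflowing
    dst->moves[0] = bestMove;
    memcpy(&dst->moves[1], src->moves, (size_t)n * sizeof(move_t));
    dst->length = n + 1;
}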

I don't know how using std::vector influences your problem, to me it seems you are mixing a fixed-size array with a dynamic-sized vector - or I did not read your message carefully enough.

Sven
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Hashed PVs

Post by xmas79 »

Hi Sven,
Sven Schüle wrote:I don't know how using std::vector influences your problem, to me it seems you are mixing a fixed-size array with a dynamic-sized vector - or I did not read your message carefully enough.
The fixed-array issue was related to the length of PVs only, as my PV is a structure like the following:

Code: Select all

int length;
int score;
move_t Moves[MAX_PLY];
That way, when I had a PV of length 64 and that PV was going to be backed up, I memcpy'ed all the moves of the source PV, from ply 0 to ply 63, into the destination PV's move slots 1 to 64, hence the buffer overflow problem.

The std::vector is now used in place of a hashtable, so instead of having a hashtable of PVs, where each entry is a PV, I now have a std::vector of PVs. This was a quick test and it performs well (as PV nodes are far fewer than ALL/CUT nodes). The problem arises when a long search runs and I have a bunch of PVs stored in the vector. In that situation the engine can get a lot of hash hits, and the bottleneck becomes the O(n) linear search through the vector. I think I'll move to a mixed approach, with a fixed-size hashtable whose entries are pointers to a vector of PVs. That way I can access a group of PVs in O(1), and then search linearly inside that group (grouped by Zobrist key, of course)...
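A rough shape of that mixed layout (all sizes and names below are illustrative assumptions):

Code: Select all

#include <cstdint>
#include <vector>

enum { MAX_PLY = 128, TABLE_BITS = 12 };
typedef int move_t;
struct PV { uint64_t key; int length; move_t moves[MAX_PLY]; };

std::vector<PV> table[1 << TABLE_BITS];  // fixed-size table of PV groups

std::vector<PV> &bucket_for(uint64_t key)
{
    return table[key & ((1 << TABLE_BITS) - 1)];  // O(1) bucket access
}

const PV *probe(uint64_t key)
{
    for (const PV &pv : bucket_for(key))  // short linear scan inside the group
        if (pv.key == key) return &pv;
    return nullptr;
}

void store(const PV &pv)
{
    std::vector<PV> &b = bucket_for(pv.key);
    for (PV &old : b)
        if (old.key == pv.key) { old = pv; return; }  // same position: replace
    b.push_back(pv);  // otherwise append to the group
}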
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hashed PVs

Post by bob »

xmas79 wrote:After I switched to std::vector, during a test with Fine #70 my program suddenly crashed at ply 41, saying there was a PV error at ply 64.... The first thing I said was: "What? I have only 63 plies available!", as I have a MAX_PLY constant set to 64 (and a max allowable search depth set to "MAX_PLY-1")... I traced the problem to the same thing happening over and over... In fact, I saw that the offending PV was 67 plies long even though I only have 64 slots. TT overwrites? Yes! That means that during search my engine encountered a position where it retrieved a PV longer than its depth (which is normal behaviour, as we all have qsearch), but the problem was that this PV was backed up all the way to the root node, adding moves upon moves, reaching that length. As I was doing memcpy, I was overwriting other variables...
Increasing the MAX_PLY value actually fixed the problem (as I don't really search 128 plies, and probably never will), but the only real fix is to truncate PVs during backup when the final length is >= MAX_PLY.

Natale.
Chances are, no matter what you set max ply to, you will get a PV that is longer. You have to have a cutoff point where you realize "I can't copy all of this hash-recovered PV onto the end of the current one, because it simply is too long."

That bit me as soon as I wrote this code a few years ago.

I still get some PVs that end with <HT> as a result. I could change MAXPLY to 128 or 256, but I am not sure I want to display PVs that long; I am not sure they are sensible.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Using variable length structures for PV storage

Post by sje »

Even with a large but fixed length hash-keyed table, there is always the possibility of overwrites which lose data.

As I've written earlier, Symbolic uses a MoveList class for storing PV data and other variable-length lists of moves. I've written my own TWList/TWNode class templates for this, although there may be equally good templates in the standard C++ template library. In any case, PV nodes are only those where (α < score < β), and these are relatively uncommon. Most of these nodes occur deep in the search, where the PV data is short and quickly formed (or copied). Therefore, the time taken to manipulate PV data is small and not likely to be critical, so there is no need to worry about slight inefficiencies. It is more important to assure the correct and complete recovery of data in all cases.
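In standard C++ the same idea could be approximated with std::vector standing in for TWList (a sketch; Symbolic's actual classes are not shown here):

Code: Select all

#include <vector>

typedef int move_t;

// A PV as a variable-length move list: no MAX_PLY limit, no truncation.
typedef std::vector<move_t> PV;

// At a PV node (alpha < score < beta), which is comparatively rare:
void update_pv(PV &parent, move_t best, const PV &child)
{
    parent.clear();
    parent.push_back(best);  // best move first,
    parent.insert(parent.end(), child.begin(), child.end());  // then the child's PV
}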
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Using variable length structures for PV storage

Post by hgm »

This is why I originally had in mind to store the PV with the hash entry (requisitioning its neighbor entry for it). Then it would be easier to guarantee that the hashed PV only disappears when the position to which it belongs is overwritten in the TT. Problem is that this severely restricts the length of the hashed PV. (With 16-byte entries, you could only store 16 PV moves even if you pack the moves into single-byte encoding.)

If I were to implement this now I probably would do it by storing the PV as a linked list of 16-byte cells, each containing 7 PV moves and a 2-byte index of a successor cell. TT nodes with exact score would contain a similar 2-byte index in the field that normally holds the hash move (as the hash move is already stored as the first move of the PV). The cells would be recycled to the free list whenever the exact TT entry that points to them is overwritten. Very few PV nodes would need a PV of more than 14 moves (only those close to the root), so usually there would only be one cell per node.
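In outline, such a cell pool might look like this (field widths chosen to give exactly 16 bytes; all names are assumptions):

Code: Select all

#include <cstdint>

struct PVCell {           // exactly 16 bytes
    uint16_t move[7];     // 7 PV moves in 2-byte encoding
    uint16_t next;        // index of the successor cell, 0 = none
};

PVCell   pool[1 << 16];   // a 2-byte index can address 64K cells
uint16_t freeHead;        // free list, threaded through .next; at startup all
                          // cells (except cell 0, the null index) are linked here

uint16_t alloc_cell(void) // pop a cell from the free list
{
    uint16_t c = freeHead;
    freeHead = pool[c].next;
    return c;
}

void free_chain(uint16_t c)  // recycle a whole PV when the exact TT entry
{                            // that points to it gets overwritten
    while (c) {
        uint16_t n = pool[c].next;
        pool[c].next = freeHead;
        freeHead = c;
        c = n;
    }
}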