Cache over-writing and PV's

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Cache over-writing and PV's

Post by gladius »

bob wrote:Which is what I wanted when I came up with this idea. I want to see the exact endpoint that led to this score, not some halfway point where I have to use imagination to guess what happened after that <HT> sentinel in the first example.
This is quite a nice solution! I've prototyped it in SF and it works very well.

Here is the code for anyone interested:
https://github.com/glinscott/Stockfish/ ... v?expand=1

Warning: not production ready :)
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Cache over-writing and PV's

Post by Rein Halbersma »

bob wrote: What you will discover is that if you double the search time, you double the number of PV stores as a max. I've been writing up the hash path idea. Here is a bit of data. I used fine #70 which is a real stress-test for hashing.

Total probes/stores done = 387,877,352
Total times exact code executed (either HashProbe() OR HashStore()) = 2813

So almost 400M nodes searched, and I either probed OR stored an entry in the hash path table 2800 times. You can't even measure that in terms of time expended, it is so small. The benefit is that (a) you don't do this ridiculous "don't allow EXACT hash hits on the PV" which gives up some efficiency and (b) you get PVs that are complete every time except for the random hash hit. But I use a bucket size of 16 for this table since it is so infrequently accessed, which really drives overwrites down, and it doesn't require a huge table either.
Nice writeup! So just to make sure: you could in principle even use a dynamically allocated array (std::vector in C++) for this secondary hash, that just doubles its capacity (using realloc, or std::vector automatic expansion) when it becomes full? And then you'd be 100% protected from the rare hash overwrite at little to no extra cost. Is that correct?
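The bucketed side table bob describes above (rarely accessed, 16-entry buckets, full paths stored for EXACT scores) might look roughly like the sketch below. All names, sizes, and the replacement policy are illustrative guesses, not Crafty's actual code:

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a "hash path" side table: buckets of 16
// entries, each entry keeping the full move path (PV) that produced
// an EXACT score. Layout and names are illustrative only.
struct PathEntry {
    uint64_t key = 0;              // full Zobrist key of the position
    std::vector<uint16_t> moves;   // the stored PV, as encoded moves
    int age = 0;                   // search counter, for replacement
};

constexpr int kBucketSize = 16;    // large buckets: the table is probed
constexpr int kBuckets = 256;      // so rarely that overwrites are rare

struct HashPathTable {
    std::array<std::array<PathEntry, kBucketSize>, kBuckets> buckets{};

    void store(uint64_t key, const std::vector<uint16_t>& pv, int age) {
        auto& bucket = buckets[key % kBuckets];
        PathEntry* victim = &bucket[0];
        for (auto& e : bucket) {
            // reuse a matching or empty slot; otherwise evict the oldest
            if (e.key == key || e.moves.empty()) { victim = &e; break; }
            if (e.age < victim->age) victim = &e;
        }
        *victim = PathEntry{key, pv, age};
    }

    const PathEntry* probe(uint64_t key) const {
        const auto& bucket = buckets[key % kBuckets];
        for (const auto& e : bucket)
            if (e.key == key && !e.moves.empty()) return &e;
        return nullptr;  // rare case: path was overwritten or never stored
    }
};
```

With only a few thousand probes/stores per search, even a linear scan of a 16-entry bucket costs effectively nothing.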
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Cache over-writing and PV's

Post by bob »

Rein Halbersma wrote:
bob wrote: What you will discover is that if you double the search time, you double the number of PV stores as a max. I've been writing up the hash path idea. Here is a bit of data. I used fine #70 which is a real stress-test for hashing.

Total probes/stores done = 387,877,352
Total times exact code executed (either HashProbe() OR HashStore()) = 2813

So almost 400M nodes searched, and I either probed OR stored an entry in the hash path table 2800 times. You can't even measure that in terms of time expended, it is so small. The benefit is that (a) you don't do this ridiculous "don't allow EXACT hash hits on the PV" which gives up some efficiency and (b) you get PVs that are complete every time except for the random hash hit. But I use a bucket size of 16 for this table since it is so infrequently accessed, which really drives overwrites down, and it doesn't require a huge table either.
Nice writeup! So just to make sure: you could in principle even use a dynamically allocated array (std::vector in C++) for this secondary hash, that just doubles its capacity (using realloc, or std::vector automatic expansion) when it becomes full? And then you'd be 100% protected from the rare hash overwrite at little to no extra cost. Is that correct?
It would seem so. However, not knowing what kind of overhead that might incur, one would obviously want to test it. With so few stores, there is nothing that says a linear list would not work equally well, so that you never overwrite anything. I started off with a hash-based approach because it seemed like the most logical solution; whether it is the best one, I do not know for certain. There could be more stores in some oddly pathological position, obviously. All of my tests showed minimal numbers (I let it run on ICC for a few days playing some pretty long games, 60 minutes and above) and I didn't see a single "alarming" probe/store total, which at least suggests it is reasonably bounded.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Cache over-writing and PV's

Post by Rein Halbersma »

bob wrote:
Rein Halbersma wrote:
bob wrote: What you will discover is that if you double the search time, you double the number of PV stores as a max. I've been writing up the hash path idea. Here is a bit of data. I used fine #70 which is a real stress-test for hashing.

Total probes/stores done = 387,877,352
Total times exact code executed (either HashProbe() OR HashStore()) = 2813

So almost 400M nodes searched, and I either probed OR stored an entry in the hash path table 2800 times. You can't even measure that in terms of time expended, it is so small. The benefit is that (a) you don't do this ridiculous "don't allow EXACT hash hits on the PV" which gives up some efficiency and (b) you get PVs that are complete every time except for the random hash hit. But I use a bucket size of 16 for this table since it is so infrequently accessed, which really drives overwrites down, and it doesn't require a huge table either.
Nice writeup! So just to make sure: you could in principle even use a dynamically allocated array (std::vector in C++) for this secondary hash, that just doubles its capacity (using realloc, or std::vector automatic expansion) when it becomes full? And then you'd be 100% protected from the rare hash overwrite at little to no extra cost. Is that correct?
It would seem so. However, not knowing what kind of overhead that might incur, one would obviously want to test it. With so few stores, there is nothing that says a linear list would not work equally well, so that you never overwrite anything. I started off with a hash-based approach because it seemed like the most logical solution; whether it is the best one, I do not know for certain. There could be more stores in some oddly pathological position, obviously. All of my tests showed minimal numbers (I let it run on ICC for a few days playing some pretty long games, 60 minutes and above) and I didn't see a single "alarming" probe/store total, which at least suggests it is reasonably bounded.
2800 entries is about a dozen capacity doublings for 400M nodes. In C++, std::vector has exactly that behavior, and on average every element in the array is reallocated only about once (the last 50% never, the 25% before that once, the 12.5% before that twice, etc.). So a dozen calls to realloc, about 2800 copies of entries, and that's it.

Of course, in an engine, you keep the allocated memory of that secondary hash table in between searches. I think that's what I'll try in my program. I need 100% guaranteed PVs.
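The append-only alternative discussed above (grow a vector, never overwrite, keep the allocation between searches) could be sketched like this; the types and names are hypothetical, not from any particular engine:

```cpp
#include <cstdint>
#include <vector>

// Sketch of an append-only PV store: since only a few thousand paths
// are stored per search, a plain growable array can hold them all,
// so nothing is ever overwritten. Names are illustrative only.
struct StoredPv {
    uint64_t key;                 // Zobrist key of the PV node
    std::vector<uint16_t> moves;  // the full path from that node
};

struct PvStore {
    std::vector<StoredPv> entries;

    void store(uint64_t key, std::vector<uint16_t> pv) {
        // push_back doubles capacity when full; each element is moved
        // O(1) amortized times, as noted in the post above
        entries.push_back({key, std::move(pv)});
    }

    const StoredPv* probe(uint64_t key) const {
        // a linear scan is fine at ~2800 entries per search
        for (const auto& e : entries)
            if (e.key == key) return &e;
        return nullptr;
    }

    void new_search() {
        // keep the allocated capacity between searches:
        // clear() destroys the elements but does not release memory
        entries.clear();
    }
};
```

The linear probe is deliberately naive; at these entry counts it is cheaper than maintaining any index structure.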
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Cache over-writing and PV's

Post by syzygy »

gladius wrote:
bob wrote:Which is what I wanted when I came up with this idea. I want to see the exact endpoint that led to this score, not some halfway point where I have to use imagination to guess what happened after that <HT> sentinel in the first example.
This is quite a nice solution! I've prototyped it in SF and it works very well.

Here is the code for anyone interested:
https://github.com/glinscott/Stockfish/ ... v?expand=1

Warning: not production ready :)
How about storing in each entry only the first move of the PV and walking down the PV to obtain it in full (basically as is already being done for the regular tt table)?

The PVCache would need more entries, as overwriting an entry will more easily lose (part of) the PV, but each entry would take up far less memory.
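The move-only scheme proposed above (store just the best move per PV node, then re-walk from the root) might look like the following sketch. Here next_key() is only a stand-in for making the move on the board and computing the child's Zobrist key, and every name is illustrative:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using Move = uint16_t;

// Placeholder for make_move + incremental hash update: maps a position
// key and a move to the child position's key. A real engine would play
// the move on the board instead.
uint64_t next_key(uint64_t key, Move m) {
    return key * 6364136223846793005ULL + m + 1442695040888963407ULL;
}

// Reconstruct the PV by repeatedly probing a move-only cache, applying
// the stored best move at each node until an entry is missing.
std::vector<Move> walk_pv(const std::unordered_map<uint64_t, Move>& pv_cache,
                          uint64_t root_key, int max_len = 128) {
    std::vector<Move> pv;
    uint64_t key = root_key;
    while ((int)pv.size() < max_len) {
        auto it = pv_cache.find(key);
        if (it == pv_cache.end()) break;  // entry overwritten or absent:
                                          // the PV is truncated here
        pv.push_back(it->second);
        key = next_key(key, it->second);
    }
    return pv;
}
```

The truncation case in the loop is exactly the failure mode discussed later in the thread: losing any single entry cuts the reconstructed PV short.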
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Cache over-writing and PV's

Post by syzygy »

Rein Halbersma wrote:Of course, in an engine, you keep the allocated memory of that secondary hash table in between searches. I think that's what I'll try in my program. I need 100% guaranteed PVs.
If you make sure not to overwrite any entries, then only storing the best move (and maybe a score) for each PV node seems an even better idea. Since you don't lose any entries, you will always be able to reconstruct the full PV by walking the full path.

But of course nowadays it might not matter whether each entry takes up 16 bytes or 256 bytes...
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Cache over-writing and PV's

Post by gladius »

syzygy wrote:
gladius wrote:
bob wrote:Which is what I wanted when I came up with this idea. I want to see the exact endpoint that led to this score, not some halfway point where I have to use imagination to guess what happened after that <HT> sentinel in the first example.
This is quite a nice solution! I've prototyped it in SF and it works very well.

Here is the code for anyone interested:
https://github.com/glinscott/Stockfish/ ... v?expand=1

Warning: not production ready :)
How about storing in each entry only the first move of the PV and walking down the PV to obtain it in full (basically as is already being done for the regular tt table)?

The PVCache would need more entries, as overwriting an entry will more easily lose (part of) the PV, but each entry would take up far less memory.
I tried it, and it didn't work very well unfortunately. Even with a (relatively) huge table of 1 million entries, I was still getting a lot of truncated PVs. Here is the branch if you are interested in what it looked like:
https://github.com/glinscott/Stockfish/ ... h?expand=1

There is certainly more experimentation I can do, but the accuracy of the crafty method is going to be tough to beat I think.
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Cache over-writing and PV's

Post by xmas79 »

Rein Halbersma wrote:
bob wrote: What you will discover is that if you double the search time, you double the number of PV stores as a max. I've been writing up the hash path idea. Here is a bit of data. I used fine #70 which is a real stress-test for hashing.

Total probes/stores done = 387,877,352
Total times exact code executed (either HashProbe() OR HashStore()) = 2813

So almost 400M nodes searched, and I either probed OR stored an entry in the hash path table 2800 times. You can't even measure that in terms of time expended, it is so small. The benefit is that (a) you don't do this ridiculous "don't allow EXACT hash hits on the PV" which gives up some efficiency and (b) you get PVs that are complete every time except for the random hash hit. But I use a bucket size of 16 for this table since it is so infrequently accessed, which really drives overwrites down, and it doesn't require a huge table either.
Nice writeup! So just to make sure: you could in principle even use a dynamically allocated array (std::vector in C++) for this secondary hash, that just doubles its capacity (using realloc, or std::vector automatic expansion) when it becomes full? And then you'd be 100% protected from the rare hash overwrite at little to no extra cost. Is that correct?
This was my first approach; then I rewrote it as a hashtable of std::vectors of PVs to keep the overhead even smaller.
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Cache over-writing and PV's

Post by xmas79 »

bob wrote:...Which is what I wanted when I came up with this idea. I want to see the exact endpoint that led to this score, not some halfway point where I have to use imagination to guess what happened after that <HT> sentinel in the first example.
Yes, that's what I usually get in my engine too.

But another question arises: I noticed that sometimes there is a difference between the score of the PV retrieved from the "hashed paths" table and the score stored in the probed TT entry. How can that happen? I also noticed that sometimes the score MUST be forced to be different, because the retrieved path contains a repetition that must be scored as a draw. So when I retrieve the PV on an exact hash hit, I play it out on the board, take care of repetitions, and update the score accordingly. Is this a good approach?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Cache over-writing and PV's

Post by bob »

xmas79 wrote:
bob wrote:...Which is what I wanted when I came up with this idea. I want to see the exact endpoint that led to this score, not some halfway point where I have to use imagination to guess what happened after that <HT> sentinel in the first example.
Yes, that's what I usually get in my engine too.

But another question arises: I noticed that sometimes there is a difference between the score of the PV retrieved from the "hashed paths" table and the score stored in the probed TT entry. How can that happen? I also noticed that sometimes the score MUST be forced to be different, because the retrieved path contains a repetition that must be scored as a draw. So when I retrieve the PV on an exact hash hit, I play it out on the board, take care of repetitions, and update the score accordingly. Is this a good approach?
Different depths. Remember that the PV you yank from the hash path table goes to the endpoint where that PV was stored. If you then overwrite that entry in the main TT with a shallower entry, a deeper entry, or an entry with a different bound, the hash score won't match the score at the end of the retrieved PV. But if you make every move from root to tip, the evaluation should match the score backed up to the root exactly, unless there is a bug somewhere.

Draws are, and always have been, problematic. When you graft paths, as the ttable does, repetitions will be overlooked, since the ttable has no path information, just a score and such. Not much you can do there, except to disallow exact hash matches anywhere, which seems much worse than the problem.
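The replay-and-adjust idea from the last two posts (play the retrieved PV out and force a draw score when a repetition appears) can be sketched as below. Position keys stand in for actually making moves on the board, and real repetition detection would also need side-to-move and twofold/threefold handling, so this is only an illustration:

```cpp
#include <cstdint>
#include <vector>

constexpr int kDrawScore = 0;

// After retrieving a PV on an exact hash hit, replay it and check
// whether any position along it repeats one already seen, either on
// the game path or earlier in the PV itself. If so, the line is a
// draw by repetition and the hashed score must be overridden.
int adjust_score_for_repetition(int tt_score,
                                const std::vector<uint64_t>& game_path,
                                const std::vector<uint64_t>& pv_keys) {
    std::vector<uint64_t> seen = game_path;  // positions before the PV
    for (uint64_t key : pv_keys) {
        for (uint64_t prev : seen)
            if (prev == key)
                return kDrawScore;  // repetition inside the line: draw
        seen.push_back(key);
    }
    return tt_score;  // no repetition: trust the hashed score
}
```

This is the check the ttable itself cannot perform, since grafted entries carry no path information.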