Cache over-writing and PV's


Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Cache over-writing and PV's

Post by Adam Hair »

bob wrote:
Adam Hair wrote:
bob wrote:
syzygy wrote:
gladius wrote:
hgm wrote:Note that the design goal of Stockfish is not to be a good or useful engine. Its goal is to optimize beating a small set of opponents in bullet games under conditions where hash replacement virtually does not occur. Any patches that would improve its hashing, or solve the problems that you are experiencing, would be instantly rejected, as they would not cause a higher win rate in the bullet games.
This is just not true. The goal is indeed to be a "good and useful" engine. The *means* to achieve this is indeed by self-testing in bullet matches most of the time. This has proven to be a very effective strategy for improving strength. Changes that affect hash strategy are certainly tested at hash levels where replacement plays a major role (4mb hash at 60s).
I'm still waiting for any of the critics to explain how they are successfully using 10-minute searches to improve their engines. Those engines must be really excellent at playing long time controls! I'm sure they'd blow SF away.

Oh wait...
Now there's an argument.

But SF testing is far from perfect and has too many regression errors. Any time you see a comment like "you should only test the same idea 3-4 times max, as otherwise you will likely get a false positive," it makes you wonder... or it should. The early SPRT terminations make testing go faster, but at a significant cost.
I am not sure that I understand what you mean by "significant cost". I may be misremembering, but I think that the SPRT bounds (which are used to determine when to pass or reject a change) have been set to hold both false positives and negatives to 5%.
bob wrote: And not all modifications work well at bullet. Some only show a gain at longer time controls. So that IS a valid criticism to make for anyone that uses bullet games in the wrong conditions.
I can only quote the FAQ, which was something like this:

Q: How many times can I resubmit the same fix (I assume with a slight change of some kind)?

A: 4-5 runs total should be the max to avoid a false positive.

I've been more of a "run more games to reduce that probability" type myself. Yes, quick SPRT-type tests using a LOS are useful, but (for me) not as a "confirmation" of anything.
I am guessing that you have not taken the time to really figure out what SPRT is about. SPRT allows for more efficient use of resources while giving better control over regressions than fixed-game testing. In other words, for any fixed-game test you decide to run, an SPRT test can be designed to be as precise or more precise while needing fewer games on average.
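For readers who have not seen the mechanics, here is a minimal sketch of the sequential decision rule being discussed. It is not fishtest's actual code: the function names are mine, and the log-likelihood ratio uses the common normal ("GSPRT") approximation computed from a (wins, draws, losses) record. Only the bound formulas and the alpha = beta = 0.05 error rates are the standard SPRT setup.

Code:

// Minimal SPRT sketch: decide between H0 (elo = elo0) and H1 (elo = elo1)
// from a (wins, draws, losses) record, with alpha = beta = 0.05.
// Uses the common normal approximation to the log-likelihood ratio;
// illustrative only, not any testing framework's actual code.
#include <cmath>
#include <cstdio>

// Expected score for an Elo difference under the logistic model.
static double expected_score(double elo) {
    return 1.0 / (1.0 + std::pow(10.0, -elo / 400.0));
}

// Approximate log-likelihood ratio log( P(data | elo1) / P(data | elo0) ).
static double llr(double wins, double draws, double losses,
                  double elo0, double elo1) {
    double n = wins + draws + losses;
    if (wins == 0 || losses == 0 || n == 0) return 0.0;
    double score = (wins + 0.5 * draws) / n;     // observed mean score
    double m2    = (wins + 0.25 * draws) / n;    // E[x^2] per game
    double var   = m2 - score * score;           // per-game variance
    double var_s = var / n;                      // variance of the mean
    double s0 = expected_score(elo0), s1 = expected_score(elo1);
    return (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var_s);
}

int main() {
    const double alpha = 0.05, beta = 0.05;
    const double lower = std::log(beta / (1.0 - alpha));  // ~ -2.94: accept H0
    const double upper = std::log((1.0 - beta) / alpha);  // ~ +2.94: accept H1
    double v = llr(2900, 4200, 2800, 0.0, 5.0);           // made-up match result
    if      (v >= upper) std::puts("accept H1 (pass the patch)");
    else if (v <= lower) std::puts("accept H0 (reject the patch)");
    else                 std::puts("keep playing games");
}

The point Adam is making is visible in main(): the test stops as soon as the LLR crosses either bound, so clearly good or clearly bad patches resolve quickly, while the error rates stay pinned at the chosen alpha and beta.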
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Cache over-writing and PV's

Post by bob »

Adam Hair wrote:
bob wrote:
Adam Hair wrote:
bob wrote:
syzygy wrote:
gladius wrote:
hgm wrote:Note that the design goal of Stockfish is not to be a good or useful engine. Its goal is to optimize beating a small set of opponents in bullet games under conditions where hash replacement virtually does not occur. Any patches that would improve its hashing, or solve the problems that you are experiencing, would be instantly rejected, as they would not cause a higher win rate in the bullet games.
This is just not true. The goal is indeed to be a "good and useful" engine. The *means* to achieve this is indeed by self-testing in bullet matches most of the time. This has proven to be a very effective strategy for improving strength. Changes that affect hash strategy are certainly tested at hash levels where replacement plays a major role (4mb hash at 60s).
I'm still waiting for any of the critics to explain how they are successfully using 10-minute searches to improve their engines. Those engines must be really excellent at playing long time controls! I'm sure they'd blow SF away.

Oh wait...
Now there's an argument.

But SF testing is far from perfect and has too many regression errors. Any time you see a comment like "you should only test the same idea 3-4 times max, as otherwise you will likely get a false positive," it makes you wonder... or it should. The early SPRT terminations make testing go faster, but at a significant cost.
I am not sure that I understand what you mean by "significant cost". I may be misremembering, but I think that the SPRT bounds (which are used to determine when to pass or reject a change) have been set to hold both false positives and negatives to 5%.
bob wrote: And not all modifications work well at bullet. Some only show a gain at longer time controls. So that IS a valid criticism to make for anyone that uses bullet games in the wrong conditions.
I can only quote the FAQ, which was something like this:

Q: How many times can I resubmit the same fix (I assume with a slight change of some kind)?

A: 4-5 runs total should be the max to avoid a false positive.

I've been more of a "run more games to reduce that probability" type myself. Yes, quick SPRT-type tests using a LOS are useful, but (for me) not as a "confirmation" of anything.
I am guessing that you have not taken the time to really figure out what SPRT is about. SPRT allows for more efficient use of resources while giving better control over regressions than fixed-game testing. In other words, for any fixed-game test you decide to run, an SPRT test can be designed to be as precise or more precise while needing fewer games on average.
I understand the idea; I've even run some tests of it. I don't like the risk, personally. I ran a few cluster tests using SPRT/LOS, and I found the occasional early cutoff a bit problematic, in that there was a noticeable chance of error unless you tightened the confidence up to at least 98% or better. You are not going to be able to use FEWER games to get a more accurate result, however; that defies statistics. I've also become less enamored with self-testing after running some cluster tests while working on some singular-extension stuff. In places where self-testing showed a clear gain, gauntlet testing showed a clear loss.

I don't believe there is a "something for nothing" effect going on here. I won't disagree that it can use fewer games to detect things that are really bad or really good, but the smaller the change, the greater the risk, and quite a few changes are very small Elo issues. The cases that cause me concern are those where I have seen Crafty looking pretty bad at the 5000-game mark, back to even by 15K-20K, and actually showing a positive gain at 30K or more.

I really prefer to test thoroughly and then submit, since computing resources are not exactly a limiting issue for me; that way I don't have to continually worry about regressions or test for them very often.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Cache over-writing and PV's

Post by bob »

xmas79 wrote:
syzygy wrote:"Oh boy", seems I got you upset.
Please don't give too much importance to yourself...
Even if you get SF (or any other engine) to return a long PV, putting a lot of trust in the second half of the PV isn't very wise.
Yes... That's also the case for the OP, where the PV was badly truncated at the 4th half-move while the score says it's mate in 24 moves, isn't it? Please again...

It seems that you have time (and answers) to reply to other posts, but you still fail to answer my simple question... Oh wait... You don't have an answer... Just because you cannot have one... Well, I will live with that... And so will all the other people on this forum who are reading these words and watching you in such an embarrassing situation...

Bye bye...
There is a solution that has been in Crafty for quite a while now. It works well and has no measurable cost, which makes it the best of both worlds. I've written a short paper describing it but have not taken the time to actually finish it up. I probably should tackle that pretty soon.

Those who think bogus PVs are OK are entitled to their opinion. But you are correct that it is useful to show a real PV, and NOT just because of the users. The "developer" can really take advantage of an accurate PV as well, to help find bugs.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Cache over-writing and PV's

Post by Rein Halbersma »

bob wrote:
Rein Halbersma wrote:
hgm wrote: The more reliable way is the 'triangular-array method'. The problem there is that it suffers from hash cuts, which is why some engines do not allow hash cuts in PV nodes. Crafty does allow the hash cuts, but has a separate hash table to hash complete PVs, so the tail of the PV can be restored on a hash cutoff. (And if that is overwritten, there is of course no hash cut.)
I use the triangular-array method. At the root in my iterative deepening, immediately after my pvs() has returned, I insert the PV into the TT. This guarantees that on the next iteration, the PV is searched as the "left-most" path in the search tree. This also guarantees there can be no hash cuts along the initial PV, and no special side-table is necessary. Has worked splendidly for me for years: in DEBUG mode I assert() that the returned score from pvs() at the root is obtained by walking down the PV and calling eval() at the PV-leaf.

The only PV issues I ever have are grafting and search instability, but they are not defects of the triangular-array method.
Sorry, but that is wrong. I have ALWAYS stored the PV from the depth n-1 search before starting the depth n search. But you can STILL get exact hash hits, whether from the previous iteration via a transposition, or from a previous search. And these exact hash hits terminate the PV instantly by providing a score (but no PV) for the sub-tree below the hit. I have reported on a workable solution that I use in Crafty. The overhead is so small I could not measure it, and the short PVs disappear.
After studying Crafty 24.1, I think I know the difference between our approaches. In Crafty, you break from the recursive search if you find in the TT an EXACT node of sufficient draft, regardless of whether the stored score is outside the current alpha/beta window. Is this a correct summary?

In my program, a TT hit only causes a cutoff if the EXACT node has value <= alpha or value >= beta. So transpositions from later nodes or earlier positions with exact scores never yield cutoffs, and I always get a full PV. I have an assert() in place to verify that walking down the PV will give a node with static eval equal to the propagated score. This assert() has never been hit during testing.
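For concreteness, here is a minimal sketch of the triangular-array bookkeeping Rein and HGM are referring to. The array and function names are illustrative, not taken from any particular engine: each ply owns one row of the array, and whenever a move raises alpha at a node, that move is prepended to the child's line.

Code:

// Minimal triangular PV array sketch (illustrative names, not any engine's code).
#include <cstring>

using Move = unsigned;                 // placeholder move type
constexpr int MAX_PLY = 128;

// pv[ply] holds the best line found so far from the node at 'ply';
// pv_len[ply] is its length in moves.
Move pv[MAX_PLY][MAX_PLY];
int  pv_len[MAX_PLY];

// Called at the start of every node: no line from here until a move raises alpha.
void clear_pv(int ply) { pv_len[ply] = 0; }

// Called when 'move' at 'ply' raised alpha: prepend it to the child's line.
void update_pv(int ply, Move move) {
    pv[ply][0] = move;
    std::memcpy(&pv[ply][1], &pv[ply + 1][0],
                pv_len[ply + 1] * sizeof(Move));
    pv_len[ply] = pv_len[ply + 1] + 1;
}

With this bookkeeping, pv[0] holds the complete root PV at the end of an iteration, and the clipping being debated in this thread can only happen if an EXACT hash hit is allowed to end the search of a PV node early.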
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Cache over-writing and PV's

Post by hgm »

Many programs do not take hash cuts in PV nodes, and this is of course a guarantee against clipped PVs, which in the triangular-array method can only occur through hash cutoffs.

There is a price, however: you will also never be able to see things (like checkmates) beyond the horizon through hash grafting. By not accepting a sufficient-draft cutoff, you will start to re-search that node at lower depth. That re-search will often follow the original PV line for that node (which would be pretty cheap, as all side branches will experience hash cutoffs), but it can also stumble on a leaf that has a hashed mate-in-2 score, do QS there, conclude it sacrificed a Queen for nothing, disastrously fail low on that branch, and then have to look for alternatives.

So the Crafty method is in principle much better.
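A sketch of the two probe policies being contrasted here; the entry layout and names are assumptions for illustration, not code from any of the engines mentioned.

Code:

enum Bound { EXACT, LOWER, UPPER };

struct TTEntry { int depth; int score; Bound bound; };

// Shared cutoff condition: the entry must have sufficient draft and a bound
// that actually decides the current alpha/beta window.
static bool usable(const TTEntry& e, int depth, int alpha, int beta) {
    return e.depth >= depth &&
           (e.bound == EXACT ||
            (e.bound == LOWER && e.score >= beta) ||
            (e.bound == UPPER && e.score <= alpha));
}

// Policy A: never take hash cuts at PV nodes.  PVs are never clipped, but a
// deep result grafted in via a transposition cannot be used here, and the
// subtree is re-searched at lower depth.
bool can_cut_policy_A(const TTEntry& e, int depth, int alpha, int beta, bool pv_node) {
    return !pv_node && usable(e, depth, alpha, beta);
}

// Policy B (the "Crafty method" described in this thread): take the cut even
// at a PV node, and recover the tail of the PV from a separate PV store.
bool can_cut_policy_B(const TTEntry& e, int depth, int alpha, int beta, bool pv_node) {
    (void)pv_node;  // cut regardless; the PV tail comes from the side table
    return usable(e, depth, alpha, beta);
}

Policy A trades some search efficiency for PV integrity; policy B keeps the efficiency and repairs the PV afterwards.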
Stan Arts
Posts: 179
Joined: Fri Feb 14, 2014 10:53 pm
Location: the Netherlands

Re: Cache over-writing and PV's

Post by Stan Arts »

hgm wrote: Basically temporally inverted hash grafting:

The engine searches the best defense first, and this pushes the mate over the horizon. Then it starts to search the inferior defense, which gets it to the same hopeless position faster, so that now it has enough depth left to see it gets checkmated.


I see. Yes, it's similar to the effect of getting extended PVs via transpositions.

Interestingly, I've always thought of that as a feature rather than a shortcoming: getting extra information out of the program that even the program technically doesn't see yet. Unfortunately, to me the effect seems rare rather than usual. I'd love to get epic PVs all the time, but I don't.

But in a theoretical/absolute sense it's not correct, indeed.

As for bug hunting, though, even the hash PVs serve me well. (I got inspired to try a chess program for the Commodore 64 at the beginning of the year, with a PC program alongside to get my logic right. But of course this new program is much faster and more elegant than my last, so that was too much to resist, and it has grown into a full program.)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Cache over-writing and PV's

Post by bob »

Rein Halbersma wrote:
bob wrote:
Rein Halbersma wrote:
hgm wrote: The more reliable way is the 'triangular-array method'. The problem there is that it suffers from hash cuts, which is why some engines do not allow hash cuts in PV nodes. Crafty does allow the hash cuts, but has a separate hash table to hash complete PVs, so the tail of the PV can be restored on a hash cutoff. (And if that is overwritten, there is of course no hash cut.)
I use the triangular-array method. At the root in my iterative deepening, immediately after my pvs() has returned, I insert the PV into the TT. This guarantees that on the next iteration, the PV is searched as the "left-most" path in the search tree. This also guarantees there can be no hash cuts along the initial PV, and no special side-table is necessary. Has worked splendidly for me for years: in DEBUG mode I assert() that the returned score from pvs() at the root is obtained by walking down the PV and calling eval() at the PV-leaf.

The only PV issues I ever have are grafting and search instability, but they are not defects of the triangular-array method.
Sorry, but that is wrong. I have ALWAYS stored the PV from the depth n-1 search before starting the depth n search. But you can STILL get exact hash hits, whether from the previous iteration via a transposition, or from a previous search. And these exact hash hits terminate the PV instantly by providing a score (but no PV) for the sub-tree below the hit. I have reported on a workable solution that I use in Crafty. The overhead is so small I could not measure it, and the short PVs disappear.
After studying Crafty 24.1, I think I know the difference between our approaches. In Crafty, you break from the recursive search if you find in the TT an EXACT node of sufficient draft, regardless of whether the stored score is outside the current alpha/beta window. Is this a correct summary?

In my program, a TT hit only causes a cutoff if the EXACT node has value <= alpha or value >= beta. So transpositions from later nodes or earlier positions with exact scores never yield cutoffs, and I always get a full PV. I have an assert() in place to verify that walking down the PV will give a node with static eval equal to the propagated score. This assert() has never been hit during testing.
That is the point of the hash entry. But scores outside the window won't produce a short PV; only EXACT entries cause that problem, because you should stop the search instantly when you get an EXACT hit, since that entry gives you the result for the search from this point forward in the tree. What you are doing is throwing away part of the efficiency the ttable offers: why even store an EXACT entry if you are going to refuse to use it?

My approach saves the actual path from an EXACT entry to the terminal position that produced that backed-up score. When I get an EXACT hit, I not only terminate the search at this point, but I graft the stored PV onto the existing PV so that I have the exact path that leads from the root to the terminal node that produced this score. No short PVs at all.
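A minimal sketch of what such a PV side table and graft could look like. This is not Crafty's actual code; the table layout, sizes, and names are assumptions. The idea follows what is described above: whenever an EXACT score is stored, the line below that node is saved separately, keyed by the position hash, and on an EXACT hit the stored tail becomes the PV from that node (if the side entry has been overwritten, the caller simply declines the cut, exactly as HGM described earlier).

Code:

// Sketch of a PV side table plus grafting on EXACT hash hits.
// Not Crafty's actual code; layout and names are illustrative.
#include <cstring>

using Move = unsigned;
using Key  = unsigned long long;
constexpr int MAX_PLY = 128;
constexpr int PV_TABLE_SIZE = 1 << 16;   // side table, separate from the main TT

// Triangular array from the earlier sketch.
Move pv[MAX_PLY][MAX_PLY];
int  pv_len[MAX_PLY];

struct PVEntry {
    Key  key = 0;
    int  len = 0;
    Move line[MAX_PLY];
};

static PVEntry pv_table[PV_TABLE_SIZE];

// When an EXACT score is stored for 'key', also remember the line below it.
void store_pv(Key key, const Move* line, int len) {
    if (len > MAX_PLY) len = MAX_PLY;
    PVEntry& e = pv_table[key & (PV_TABLE_SIZE - 1)];
    e.key = key;
    e.len = len;
    std::memcpy(e.line, line, len * sizeof(Move));
}

// On an EXACT hash hit at 'ply': splice the stored tail in as this node's PV,
// so the backed-up score still comes with the full path that produced it.
// Returns false if the side entry was overwritten; the caller then skips the cut.
bool graft_pv(Key key, int ply) {
    const PVEntry& e = pv_table[key & (PV_TABLE_SIZE - 1)];
    if (e.key != key) return false;
    std::memcpy(&pv[ply][0], e.line, e.len * sizeof(Move));
    pv_len[ply] = e.len;
    return true;
}

In this sketch the grafting onto the existing PV happens automatically: once pv[ply] holds the spliced tail and the node's score raises alpha one ply up, the regular update_pv() call copies it upward like any other child line.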
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Cache over-writing and PV's

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
Zenmastur wrote:I don't know if I agree 100%, but I do know that if you're dredging your cache for the PV and the time control is long enough that major portions of the cache are overwritten before the move is made or analysis stopped, the last PV output is likely to be trashed. This makes it completely unreliable and is a nightmare as far as time is concerned.
Clipping of the PV in SF does not make the non-clipped part unreliable.
I save the PV's and then use the engine to search each and every move in it to ferret out all the trash moves. I can't ever recall a case where the PV was accurate. I always find errors and the evaluations always change.
Sure, but what do you expect?

If the root position is searched to depth 40, then the position after 20 half moves was searched only to depth 20. That's just how things work.

Even if you get SF (or any other engine) to return a long PV, putting a lot of trust in the second half of the PV isn't very wise.
Unless you do it right. Then it is perfectly accurate.
I'm talking about "perfectly accurate" PVs. The second half of such a PV is hardly reliable from a "move quality" point of view and will contain "trash moves".

Note that the OP is complaining about the moves of "perfectly accurate" PVs. (Those output by somewhat recent versions of SF are accurate in so far as they are not clipped.)
I think EVERYBODY here but YOU is talking about the PV from the root position to the terminal position that produces that evaluation. The OP was complaining about a mate score with no PV that leads to the checkmate position.

Using the "an EXACT entry is good, but I am not going to use it because I can get a truncated PV" is a tad out-of-date, because a solution exists that has no measurable down-side.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Cache over-writing and PV's

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:
syzygy wrote:
Zenmastur wrote:I don't know if I agree 100%, but I do know that if you're dredging your cache for the PV and the time control is long enough that major portions of the cache are overwritten before the move is made or analysis stopped, the last PV output is likely to be trashed. This makes it completely unreliable and is a nightmare as far as time is concerned.
Clipping of the PV in SF does not make the non-clipped part unreliable.
I save the PV's and then use the engine to search each and every move in it to ferret out all the trash moves. I can't ever recall a case where the PV was accurate. I always find errors and the evaluations always change.
Sure, but what do you expect?

If the root position is searched to depth 40, then the position after 20 half moves was searched only to depth 20. That's just how things work.

Even if you get SF (or any other engine) to return a long PV, putting a lot of trust in the second half of the PV isn't very wise.
Unless you do it right. Then it is perfectly accurate.
I'm talking about "perfectly accurate" PVs. The second half of such a PV is hardly reliable from a "move quality" point of view and will contain "trash moves".

Note that the OP is complaining about the moves of "perfectly accurate" PVs. (Those output by somewhat recent versions of SF are accurate in so far as they are not clipped.)
I think EVERYBODY here but YOU is talking about the PV from the root position to the terminal position that produces that evaluation.
I know very well what SF is doing and the PV it displays is on a path towards that terminal node, hence "perfectly accurate". If the PV is not clipped, it ends in that terminal node. I am the one that fixed that some months ago. Before the fix, the ending of the PV could be completely bogus. Not anymore.

If you walk along this PV while letting the engine search, then obviously the score keeps changing as the engine is searching deeper and deeper. This is what the OP must be talking about in the part I am quoting, unless he is not using SF5 or something more recent.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Cache over-writing and PV's

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:
syzygy wrote:
Zenmastur wrote:I don't know if I agree 100%, but I do know that if you're dredging your cache for the PV and the time control is long enough that major portions of the cache are overwritten before the move is made or analysis stopped, the last PV output is likely to be trashed. This makes it completely unreliable and is a nightmare as far as time is concerned.
Clipping of the PV in SF does not make the non-clipped part unreliable.
I save the PV's and then use the engine to search each and every move in it to ferret out all the trash moves. I can't ever recall a case where the PV was accurate. I always find errors and the evaluations always change.
Sure, but what do you expect?

If the root position is searched to depth 40, then the position after 20 half moves was searched only to depth 20. That's just how things work.

Even if you get SF (or any other engine) to return a long PV, putting a lot of trust in the second half of the PV isn't very wise.
Unless you do it right. Then it is perfectly accurate.
I'm talking about "perfectly accurate" PVs. The second half of such a PV is hardly reliable from a "move quality" point of view and will contain "trash moves".

Note that the OP is complaining about the moves of "perfectly accurate" PVs. (Those output by somewhat recent versions of SF are accurate in so far as they are not clipped.)
I think EVERYBODY here but YOU is talking about the PV from the root position to the terminal position that produces that evaluation.
I know very well what SF is doing and the PV it displays is on a path towards that terminal node, hence "perfectly accurate". If the PV is not clipped, it ends in that terminal node. I am the one that fixed that some months ago. Before the fix, the ending of the PV could be completely bogus. Not anymore.

If you walk along this PV while letting the engine search, then obviously the score keeps changing as the engine is searching deeper and deeper. This is what the OP must be talking about in the part I am quoting, unless he is not using SF5 or something more recent.
The thread has digressed as most here do. Originally he was asking about why the search can't seem to "connect" iteration to iteration and keep things in a rational way so that N+1 doesn't suddenly take 100X as much time as iteration N.

Then someone broached the short-PV problem. If you retrieve from the hash, you either throw away some efficiency by not allowing EXACT matches along the PV, or else you get the short-PV problem.

I've never understood why anyone would not use the "triangular array" method unless they simply don't understand it. It is essentially free, computationally, and with the PV grafting approach I added to Crafty, the PVs always perfectly connect the root to the terminal position that produced the score unless the path is simply too long, which can happen with multiple grafts on one path.

I have a hard time grasping the concept of "I have this useful information, but I am not going to use it..."
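One last sketch, tying Bob's point about connecting iterations to the trick Rein described earlier in the thread: before starting iteration N+1, walk the PV from iteration N and re-store each of its moves in the transposition table, so the new iteration searches that line first even if the entries were overwritten. The Position and TT interfaces here are stand-ins, not any engine's actual API.

Code:

// Re-seed the TT with the previous iteration's PV before searching deeper.
// Engine internals are stubbed out; a real engine would use its own types.
#include <cstdint>
#include <vector>

using Move = unsigned;

// --- Stand-ins for engine internals (assumptions, not a real API) -----------
struct Position {
    std::uint64_t hash = 0;
    void make(Move m) { hash ^= (hash << 13) ^ m; }   // toy key update, placeholder
    std::uint64_t key() const { return hash; }
};

// Store 'm' as the preferred (hash) move for position 'key'.  A real engine
// would write it into its TT entry, with minimal draft so it only steers
// move ordering on the next iteration.
void tt_store_move(std::uint64_t key, Move m) { (void)key; (void)m; }
// -----------------------------------------------------------------------------

// Walk the PV from the root, storing each move at the position it is played
// from.  Taking 'pos' by value avoids having to unmake the moves afterwards.
void seed_tt_with_pv(Position pos, const std::vector<Move>& pv) {
    for (Move m : pv) {
        tt_store_move(pos.key(), m);
        pos.make(m);
    }
}

Whether this step is worth doing depends on the engine; Rein reports it has worked splendidly for years, and Bob describes doing the same thing in Crafty in combination with the PV-grafting table sketched above.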