TT with Null Move Cuts A PV Node/Line

Edsel Apostol · Post by **Edsel Apostol** » Wed Feb 26, 2014 7:58 pm

bob wrote:
Edsel Apostol wrote:
bob wrote:
Edsel Apostol wrote:
bob wrote:
Edsel Apostol wrote:You can try the following:

-don't do null move in the PV nodes
-don't do cutoff from the hash table in the PV nodes, just use the hash move
-store the principal variation to the hash table after every iteration in case the PV positions stored in it has been overwritten already
Seemed to me he was describing a problem trying to construct the PV from the hash table. The above don't help that at all. Perhaps I misunderstood his question?
I think his problem is incomplete PV. He tried various ways to collect the PV (one of them is using the exact nodes from the main TT, others are triangular array, etc) but still fails to display the PV at it's full length from time to time. I'm suggesting ways on how it can be solved. That's what we've been doing in our engine and we don't have that truncated PV issue.
Yes, but your "fix" is not so good. You ignore any hash hit in the PV that would stop the search. Why throw out good information like that, it hurts efficiency?

My solution uses all available information, and doesn't return short PVs either. There is no measurable overhead in my PV storing approach, and I don't give up any search efficiency by not honoring TT entries with sufficient depth, when at a PV node.

You can actually eliminate ALL search instability by throwing the TT out completely, or else just using it to store a best move, but no bound/score/draft, etc.
It's a design trade-off. A simple solution for a bit of inefficiency.

In my limited testing in our engine, returning exact score in PV nodes from hash table if it is outside the alpha and beta window makes it weaker. If I limit it to return exact score if it satisfies (score > alpha && score < beta), then it is at least at par with the one without the cut-offs. So performance-wise they are at par, with the disadvantage that the principal variation is broken at times.

I think the reason why it behaves that way is that with the new search window (previous iteration lets say is -20, 20, now lets say it's 0, 40) the search may stumble upon a better continuation, which wouldn't happen if it is being cut-off at PV nodes when an exact score from a different search window is encountered.

Of course, YMMV with Crafty. It may have more stable search than ours to take advantage of doing cut-offs in PV nodes.
I hate to point this out, but it sounds like you have a bug if this weakens your program. What is a TT hit? The result from a previous search of this same position. I don't see how using that could hurt. I don't expect it to help a lot with PVS because so few nodes are searched with anything but a null-window, but I don't see any rational way it could hurt.

I don't see how your example can happen, however. A true score is a "constant" for a position. Alpha/Beta guarantees to return the same score that a pure minimax search would return. Hashing can sometimes let you find a better move due to searching deeper via tree grafting, but it should not make you choose a worse move, that would be defective.

It could hurt I think if the current search window is different from the previous search window. Modern engines don't use pure alpha-beta anymore. There are lots of heuristics relying on the value of the window, any change on the window may change the principal variation as well. I think that the result of the search with the current window is more accurate than the result of the previous iteration with a different window, therefore it would be better to continue the search with the current window than to do the cutoff. It's just my theory and I could be wrong.

Anyways, I've checked the latest Crafty and it seems to return a hash hit for exact nodes even if the score is outside the current alpha/beta window? Have you tested to return a hash hit only if the exact score from hash table satisfies (score > alpha and score < beta)?

bob · Post by **bob** » Wed Feb 26, 2014 10:59 pm

Code: Select all

   8 Crafty-24.0-2        2642    4    4 30080   66%  2514   22% 
   9 Crafty-24.0-1        2640    4    4 30080   65%  2514   22% 
  10 Crafty-24.0R08-1     2640    4    4 30080   65%  2514   22% 
  11 Crafty-24.0R08-2     2638    4    4 30080   65%  2514   21%

version with R08 is the version that disables hash hits/cutoffs in PV nods but still maintains the best move. 24.0 is normal version. All are within the error bar of each other, but there is an indication that making this change has some minor Elo cost. When I finish some other tests I have already started, I might try taking this out to 100K games each to get the error bar down to +/- 1, rather than +/-4

bob · Post by **bob** » Wed Feb 26, 2014 11:03 pm

Edsel Apostol wrote:
bob wrote:
Edsel Apostol wrote:
bob wrote:
Edsel Apostol wrote:
bob wrote:
Edsel Apostol wrote:You can try the following:

-don't do null move in the PV nodes
-don't do cutoff from the hash table in the PV nodes, just use the hash move
-store the principal variation to the hash table after every iteration in case the PV positions stored in it has been overwritten already
Seemed to me he was describing a problem trying to construct the PV from the hash table. The above don't help that at all. Perhaps I misunderstood his question?
I think his problem is incomplete PV. He tried various ways to collect the PV (one of them is using the exact nodes from the main TT, others are triangular array, etc) but still fails to display the PV at it's full length from time to time. I'm suggesting ways on how it can be solved. That's what we've been doing in our engine and we don't have that truncated PV issue.
Yes, but your "fix" is not so good. You ignore any hash hit in the PV that would stop the search. Why throw out good information like that, it hurts efficiency?

My solution uses all available information, and doesn't return short PVs either. There is no measurable overhead in my PV storing approach, and I don't give up any search efficiency by not honoring TT entries with sufficient depth, when at a PV node.

You can actually eliminate ALL search instability by throwing the TT out completely, or else just using it to store a best move, but no bound/score/draft, etc.
It's a design trade-off. A simple solution for a bit of inefficiency.

In my limited testing in our engine, returning exact score in PV nodes from hash table if it is outside the alpha and beta window makes it weaker. If I limit it to return exact score if it satisfies (score > alpha && score < beta), then it is at least at par with the one without the cut-offs. So performance-wise they are at par, with the disadvantage that the principal variation is broken at times.

I think the reason why it behaves that way is that with the new search window (previous iteration lets say is -20, 20, now lets say it's 0, 40) the search may stumble upon a better continuation, which wouldn't happen if it is being cut-off at PV nodes when an exact score from a different search window is encountered.

Of course, YMMV with Crafty. It may have more stable search than ours to take advantage of doing cut-offs in PV nodes.
I hate to point this out, but it sounds like you have a bug if this weakens your program. What is a TT hit? The result from a previous search of this same position. I don't see how using that could hurt. I don't expect it to help a lot with PVS because so few nodes are searched with anything but a null-window, but I don't see any rational way it could hurt.

I don't see how your example can happen, however. A true score is a "constant" for a position. Alpha/Beta guarantees to return the same score that a pure minimax search would return. Hashing can sometimes let you find a better move due to searching deeper via tree grafting, but it should not make you choose a worse move, that would be defective.
It could hurt I think if the current search window is different from the previous search window. Modern engines don't use pure alpha-beta anymore. There are lots of heuristics relying on the value of the window, any change on the window may change the principal variation as well. I think that the result of the search with the current window is more accurate than the result of the previous iteration with a different window, therefore it would be better to continue the search with the current window than to do the cutoff. It's just my theory and I could be wrong.

Anyways, I've checked the latest Crafty and it seems to return a hash hit for exact nodes even if the score is outside the current alpha/beta window? Have you tested to return a hash hit only if the exact score from hash table satisfies (score > alpha and score < beta)?

Not many EXACT hits will come from the previous iteration, they will usually be one ply short of the current draft requirement. Most of those come from either (a) inside the current search or (b) the previous search which was a ponder of the wrong move, so that you get some high-draft entries. With PVS, almost all nodes are null-window anyway, so the bounds as tight as possible.

Only time I say any real benefit to fail-soft was when I experimented with mtd(f) since it gives you at least a hint of how far to shift the window for the next search.

Evert · Post by **Evert** » Wed Feb 26, 2014 11:39 pm

bob wrote: That returned score > alpha+1 is simply a lower bound on the true score and I want that real number.

If I can indeed rely on that, there is no issue.

I remember a description of search instability where a fail-high could not fail high if it was researched with an open window. Am I remembering that wrong? Or am I mixing things up (that discussion was on aspiration windows rather than null-move search in PVS)?

xmas79 · Post by **xmas79** » Thu Feb 27, 2014 12:33 am

I remember a description of search instability where a fail-high could not fail high if it was researched with an open window. Am I remembering that wrong? Or am I mixing things up (that discussion was on aspiration windows rather than null-move search in PVS)?

Please define "open window"... If you have [alpha,beta] window and fail high on the window [alpha,alpha+1], how you would research? [alpha,beta]? In fail-soft yes, you can still (and will) fail high if the returned score was > beta.

A new fail-high on the research can happen (for example) if you have failed high because search found you won a knight (and you searched with a window [alpha, alpha+1]), then research with a window like [alpha, alpha + 500] and search cannot fail high on the knight win, but will fail high if you win a queen.

I think that search instability can be better understood at root, where it usually triggers a fail-high followed by a fail-low (if you shift the research window like [beta, beta+margin], or a fail-high followed by an exact score (if you use a window like [alpha, beta+margin]). And it seems to me that in the context of the (re)search it helps more than it hurts.

bob · Post by **bob** » Thu Feb 27, 2014 5:06 am

Evert wrote:
bob wrote: That returned score > alpha+1 is simply a lower bound on the true score and I want that real number.
If I can indeed rely on that, there is no issue.

I remember a description of search instability where a fail-high could not fail high if it was researched with an open window. Am I remembering that wrong? Or am I mixing things up (that discussion was on aspiration windows rather than null-move search in PVS)?

Here's the likely scenario you are thinking about. You have a deep draft entry that simply says "LOWER" with a deep depth value. That will cause a cutoff each time you get a hash hit until your search depth passes the depth of that entry. But when you relax beta, now it is larger than the "LOWER" bound you have, so the hash entry won't be used, and your depth is not deep enough to see what the tt entry was hinting about, so you promptly fail low. But that IS a good move and should be played, unless something else fails high.

mvk · Post by **mvk** » Thu Feb 27, 2014 6:29 am

bob wrote:
Code: Select all
   8 Crafty-24.0-2        2642    4    4 30080   66%  2514   22% 
   9 Crafty-24.0-1        2640    4    4 30080   65%  2514   22% 
  10 Crafty-24.0R08-1     2640    4    4 30080   65%  2514   22% 
  11 Crafty-24.0R08-2     2638    4    4 30080   65%  2514   21% 
version with R08 is the version that disables hash hits/cutoffs in PV nods but still maintains the best move. 24.0 is normal version. All are within the error bar of each other, but there is an indication that making this change has some minor Elo cost. When I finish some other tests I have already started, I might try taking this out to 100K games each to get the error bar down to +/- 1, rather than +/-4

Few things:
1a. What is "1" and "2"? Can you show the WDL table or compute LoS?
1b. Are the games that "1" and "2" play independent (different book lines?)

2. It occurs to me that 4 versions are of interest:

variant 1. Don't allow TT-pruning in PV nodes & follow the best move from TT. (This is R08?)

variant 2. Don't allow TT-pruning in PV nodes & follow the best move from PV table first (and TT table second). This is Rookie v3's behaviour. As you have the PV in your external data structure, you should be able to emulate this.

variant 3. Allow TT-pruning in PV nodes & follow the best move from TT (Crafty default behaviour?)

variant 4. Allow TT-pruning in PV nodes & follow the best move from PV table

How are you normally following the PV? From TT only or from a real PV table? In my case I always follow the actual PV, even if the TT is saturated.

P.S. In general I agree that information should not get thrown out. But it is also a tradeoff how to spend programmer time. My progress rate is still ~1 elo for 4 hours of effort. That means that if I spend 8 hours on something like this which is not essential to strength, and for which a simple solution is available, I cannot spend those 8 hours on something else. There is an opportunity cost attached to such features. (In this case it would be much simpler if pointers fit in TT entries, then you can at least use malloc/free for the PV's instead of rolling your own memory management system.)

bob · Post by **bob** » Thu Feb 27, 2014 8:54 pm

mvk wrote:
bob wrote:
Code: Select all
   8 Crafty-24.0-2        2642    4    4 30080   66%  2514   22% 
   9 Crafty-24.0-1        2640    4    4 30080   65%  2514   22% 
  10 Crafty-24.0R08-1     2640    4    4 30080   65%  2514   22% 
  11 Crafty-24.0R08-2     2638    4    4 30080   65%  2514   21% 
version with R08 is the version that disables hash hits/cutoffs in PV nods but still maintains the best move. 24.0 is normal version. All are within the error bar of each other, but there is an indication that making this change has some minor Elo cost. When I finish some other tests I have already started, I might try taking this out to 100K games each to get the error bar down to +/- 1, rather than +/-4
Few things:
1a. What is "1" and "2"? Can you show the WDL table or compute LoS?
1b. Are the games that "1" and "2" play independent (different book lines?)

2. It occurs to me that 4 versions are of interest:

variant 1. Don't allow TT-pruning in PV nodes & follow the best move from TT. (This is R08?)

variant 2. Don't allow TT-pruning in PV nodes & follow the best move from PV table first (and TT table second). This is Rookie v3's behaviour. As you have the PV in your external data structure, you should be able to emulate this.

variant 3. Allow TT-pruning in PV nodes & follow the best move from TT (Crafty default behaviour?)

variant 4. Allow TT-pruning in PV nodes & follow the best move from PV table

How are you normally following the PV? From TT only or from a real PV table? In my case I always follow the actual PV, even if the TT is saturated.

P.S. In general I agree that information should not get thrown out. But it is also a tradeoff how to spend programmer time. My progress rate is still ~1 elo for 4 hours of effort. That means that if I spend 8 hours on something like this which is not essential to strength, and for which a simple solution is available, I cannot spend those 8 hours on something else. There is an opportunity cost attached to such features. (In this case it would be much simpler if pointers fit in TT entries, then you can at least use malloc/free for the PV's instead of rolling your own memory management system.)

Sorry. -1 and -2 are simply two separate runs. Sort of a "verification" run. This is a statistical sample, obviously. One expects to occasionally get a 2-3 sigma result. But very rarely two times in a row. The two matches used the same opponents, the same time controls, the same starting positions. I'd normally expect the results to be VERY close (which they are here). I ran the second run just to make sure no oddball sample rolled in.

I ALWAYS follow the PV first for each iteration. At the end of an iteration, I very carefully walk the tt to make sure every PV move is present. If not, I add an entry. That guarantees that the left-hand-side of search iteration N+1 will follow the PV from iteration N until the last ply where it might not have a move if the previous PV had no q-search captures on the end to make it long enough. Both 24.0 and 24.0R08 do this exactly the same way. Only difference is not allowing hash hits along the PV to cause a cutoff or provide an exact score. Added exactly 2 lines of code to Crafty to do this.

The main thing I do differently is that when I store a normal tt entry that is EXACT, I store the PV from beyond this node in a separate PV hash table. If I get a hash hit later for this position, the PV isn't truncated, I copy the PV I saved onto the current PV to give me the exact sequence of moves that lead to the score the hash hit provided.

Some of your variants don't apply to Crafty, because it always installs the PV at the beginning of an iteration, since occasionally they get overwritten by the end of the previous iteration, particularly those out near the tips. This guarantees me that the PV from N-1 is always the first thing searched at iteration N.

Here's the LOS table from Bayeselo, just the 4 versions in question:

Code: Select all

                     Cr Cr Cr Cr 
Crafty-24.0-1           93 98 99
Crafty-24.0-2         6    72 90
Crafty-24.0R08-1      1 27    76
Crafty-24.0R08-2      0  9 23

bob · Post by **bob** » Thu Feb 27, 2014 8:57 pm

Cheney wrote:I've read some more and something clicked

First, the issue I have experienced with my PVTT is a bug with the MakeNullMove. Not only was I not properly restoring the board state when unmaking the move, but during the nullmove, I also had a bad hash. This appears to have lead to more collisions and writing the TT with an invalid move. Later, when I would walk the PVTT and play the moves out, the board became corrupt.

Robert, I read what you wrote in your PV example. All along I was under the impression that a beta cut was killing the triangular array when in fact, it is an EXACT hash:
You find the EXACT hash entry, and you would normally cut the PV off at that point.
I get it and this explains the Triangular Array being truncated. Thank you.

I will look at Crafty. I expect to see where you write the TT that there is something similar to an array attached to that TT entry.

All this review lead me to something else which did not make sense: the oscillation of cutting and not cutting across PLYs when using a zero window. But I think I get that now and will start another post.

I don't really "attach" anything. I have a second TT organized a bit differently, but addressed with the same hash signature approach. I didn't want a link because almost no tt entries are EXACT and the link would be wasted. A second tt accessed a bit differently (bucket with 16 PVs as opposed to bucked with 4 normal entries)...

mvk · Post by **mvk** » Thu Feb 27, 2014 11:49 pm

bob wrote: Sorry. -1 and -2 are simply two separate runs. Sort of a "verification" run. This is a statistical sample, obviously. One expects to occasionally get a 2-3 sigma result. But very rarely two times in a row. The two matches used the same opponents, the same time controls, the same starting positions. I'd normally expect the results to be VERY close (which they are here). I ran the second run just to make sure no oddball sample rolled in.

The results would be very close, but if there is bias from the opening selection, then you don't see that. That is why I would do a repeat always with different openings (essentially increasing the nr of games in the first place).

Anyway, the test setup with your two variants looks fine with me. We effectively follow the PV in the same way. I'm also starting to think that a simple log of historic PV node results (best moves) is enough to reconstruct the PV, and then you can just enumerate the invocations and use that for addressing. No need to recycle that memory, not within the same game.

TT with Null Move Cuts A PV Node/Line

Re: TT with Null Move Cuts A PV Node/Line

test results

Re: TT with Null Move Cuts A PV Node/Line

Re: TT with Null Move Cuts A PV Node/Line

Re: TT with Null Move Cuts A PV Node/Line

Re: TT with Null Move Cuts A PV Node/Line

Re: test results

Re: test results

Re: TT with Null Move Cuts A PV Node/Line

Re: test results