ROC analysis of Late Move Reductions

Discussion of chess software programming and technical issues.

Moderator: Ras

syzygy
Posts: 5713
Joined: Tue Feb 28, 2012 11:56 pm

Re: ROC analysis of Late Move Reductions

Post by syzygy »

Don wrote: This is all in the context of how we define a cut node and an all node, which I made clear in my post, which you apparently missed.

I won't repeat it in full here, but basically we estimate which node type it will be BEFORE it is even searched, and we stick with that definition. And our definition has the merit of being much easier to reason about.
It might be better to not even call it an estimation.
What you are doing is taking into account the role the node is playing in the currently ongoing verification that a sibling move of a PV move is not better than the PV move. Of course once that verification has failed, everything changes. But while doing the verification, it makes sense to realise in what part of the tree you are.
The classical definition is useless for a chess program if you actually intend to do anything useful with it.
When your search finishes, your ALL and CUT nodes are the classical ALL and CUT nodes.

Nodes that you label as CUT nodes while the search is ongoing but that don't give a cut are actually not ALL nodes: they will likely not be part of the final verification tree at all. (Of course some of those will make a reappearance as an ALL node during a re-search. At that point they will play a very different role and I suppose it makes sense to treat them differently that time around.)

In short, in my view it makes no sense to define a node being visited during the search as an ALL node only because all its moves are searched. Firstly, this definition is rather useless, because with this definition you're already finished with the node at the time you find out its type. Secondly, this definition does not equal the classical definition which only applies to complete verification trees that form a "proof" of the PV.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: ROC analysis of Late Move Reductions

Post by Don »

syzygy wrote:
Don wrote: This is all in the context of how we define a cut node and an all node, which I made clear in my post, which you apparently missed.

I won't repeat it in full here, but basically we estimate which node type it will be BEFORE it is even searched, and we stick with that definition. And our definition has the merit of being much easier to reason about.
It might be better to not even call it an estimation.
What you are doing is taking into account the role the node is playing in the currently ongoing verification that a sibling move of a PV move is not better than the PV move. Of course once that verification has failed, everything changes. But while doing the verification, it makes sense to realise in what part of the tree you are.
The classical definition is useless for a chess program if you actually intend to do anything useful with it.
When your search finishes, your ALL and CUT nodes are the classical ALL and CUT nodes.
Well said. This is probably why it works so well for us, and I agree with you that it's much more than an estimation.

Nodes that you label as CUT nodes while the search is ongoing but that don't give a cut are actually not ALL nodes: they will likely not be part of the final verification tree at all. (Of course some of those will make a reappearance as an ALL node during a re-search. At that point they will play a very different role and I suppose it makes sense to treat them differently that time around.)

In short, in my view it makes no sense to define a node being visited during the search as an ALL node only because all its moves are searched. Firstly, this definition is rather useless, because with this definition you're already finished with the node at the time you find out its type. Secondly, this definition does not equal the classical definition which only applies to complete verification trees that form a "proof" of the PV.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Steve Maughan
Posts: 1295
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: ROC analysis of Late Move Reductions

Post by Steve Maughan »

Hi Robert,
Houdini wrote: (...) The approach requires some special management of the hash table in order to distinguish information coming from CUT and ALL nodes - in view of the different reduction levels the information is not interchangeable.
This is an interesting idea.

So do you have CUT and ALL as part of the Zobrist key - or do you set a flag in the hash record? I assume you then don't use a CUT hash record to prune an ALL node at a given depth - correct?

Thanks,

Steve
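
One plausible way to implement the flag Steve asks about, shown here as a rough sketch rather than Houdini's actual scheme; all type and field names are hypothetical:

Code:

#include <cstdint>

// Hypothetical transposition-table entry carrying a node-type flag.
// Storing the flag in the record, rather than folding it into the
// Zobrist key, keeps probing cheap: the probe itself decides whether
// the stored bound is usable for the current node type.
enum NodeType : uint8_t { CUT_NODE, ALL_NODE };

struct TTEntry {
    uint64_t key;       // full Zobrist key for verification
    int16_t  score;
    int8_t   depth;     // draft the entry was searched to
    uint8_t  bound;     // lower / upper / exact
    uint8_t  nodeType;  // expected type of the node that stored it
};

// Only trust an entry produced at the same expected node type, since
// different reduction levels make the stored drafts incomparable.
bool usable(const TTEntry& e, uint64_t key, int depth, NodeType nt) {
    return e.key == key && e.nodeType == nt && e.depth >= depth;
}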
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: ROC analysis of Late Move Reductions

Post by bob »

Don wrote:
bob wrote:
Don wrote:
bob wrote:
Don wrote:
Daniel Shawul wrote: In light of the LMR euphoria, I thought I would bump this thread up, which actually has some substance. I ran some tests regarding LMR and pruning some weeks ago. Reducing ALL nodes more is a loss for me after 10000 games, as you mentioned, but I did it only in the upper parts (depth > 6). If I did it everywhere it was a total loss. As I mentioned before, Rybka and co. seem to reduce CUT nodes more, which is weird.
We also reduce cut nodes more. It appears to be safer to do that, and I will explain our reasoning on this. I don't know for sure if this reasoning is correct, but many of our tests show that reducing them more has an unexpected effect, the opposite of what you might expect: it can actually slow down the search slightly and improve the quality. It's even easy to reason about why this might be happening. Here is how it goes. Keep in mind all of this is from a PVS search framework:

"reduce cut nodes more"? A normal cut node only searches one move, and you never reduce the first move searched. I'm not following here...
This is all in the context of how we define a cut node and an all node, which I made clear in my post, which you apparently missed.
I saw it, in fact. It is an exact repeat of what I did in the DTS implementation of Cray Blitz. But the concept of "reducing a cut node" is a contradiction in terms if you add "but don't reduce the first move." A "CUT" node would only have one move searched if it is a true CUT node.
What if the second move gives the cutoff? Do you consider that an all node just because the moves were ordered wrong?
This is wrong thinking. A node is an ALL node if and only if every move is searched and returns a score <= alpha. A node is a CUT node if and only if you do not search every move before returning a value >= beta. However, for parallel search, we need a stronger definition of CUT, namely that the node fails high on the first move most of the time (Schaeffer used the > 90% of the time definition).
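
In code form, the after-the-fact classification Bob describes might look like the following generic sketch (no particular engine's code):

Code:

enum NodeType { CUT_NODE, ALL_NODE, OTHER };

struct SearchResult {
    int bestScore;      // value returned by the node
    int movesSearched;  // moves actually searched
    int movesTotal;     // legal moves available
};

// Classical post-hoc definitions: a CUT node returned >= beta without
// searching every move; an ALL node searched every move and none
// reached alpha. Anything else (e.g. a node that raised alpha) is
// neither, which is why the type is only known after the search.
NodeType classify(const SearchResult& r, int alpha, int beta) {
    if (r.bestScore >= beta && r.movesSearched < r.movesTotal)
        return CUT_NODE;
    if (r.movesSearched == r.movesTotal && r.bestScore <= alpha)
        return ALL_NODE;
    return OTHER;
}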

If you "think" a node is a cut node, and the first move does not fail high, the node right below that CUT node is instantly a CUT node itself since there is a refutation for your first move tried.

That's a problem in that a parallel search at the supposed ALL node that follows a CUT node is wasted effort, since the ALL node will quickly fail high, introducing search overhead.

I am still nowhere near convinced that reducing one type of node more than another (or vice-versa) is theoretically correct. Explanations sound good. But that doesn't make them correct. Chess is about depth. Anytime you reduce the depth, you add risk. I don't see any justification for varying the risk depending on whether the node is CUT or ALL. Simply seems wrong. This "sorta-like-IID" makes little sense. If you didn't spot something last iteration, you won't spot it this iteration unless you search a ply deeper. It might take deeper depth to prove that every move at this ALL node is bad (< alpha), where a shallow search won't see this. It might take a deeper depth to prove that some move at this CUT node is good (> beta) where a shallower search won't see this. I don't see anything that differentiates one of the two cases from the other. If I reduce the depth, I am not going to see the tactical issue I need to see. I'm just as interested in seeing a new best move show up as I am in proving the current best move is really best...

My original comment still stands. I believe one reduces moves on the basis of the move, not the position. It is not uncommon for a CUT node to become an ALL node, and vice-versa. If you make decisions on the expected node type, you introduce yet another instability, since you would do things differently depending on whether it is really a CUT node or an ALL node.
We feel differently. We don't do null pruning at PV nodes, for example, and repeated experiments have shown us that doing it there hurts the program. I don't really understand why; I just know that the node type matters for Komodo. Komodo seems to be a relatively strong program despite all these blunders I am making.
This is a different issue, and avoiding null-moves on PV nodes is theoretically sound/correct. A null is an illegal move. At a PV node, a null could be chosen as best, which would be illegal. So if the null-move does ANYTHING other than fail low, you have wasted effort since you have to re-search to find a legal move to play, anyway. Avoiding wasted effort is always correct.
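
The PV exclusion Bob defends is the standard guard on null-move pruning. A minimal, self-contained sketch follows; the names are illustrative and the stubs exist only so it compiles:

Code:

// Stubs so the sketch compiles; a real engine supplies these.
struct Position {
    bool inCheck() const { return false; }
    void makeNullMove() {}
    void unmakeNullMove() {}
};
int qsearch(Position&, int alpha, int) { return alpha; }  // stub

const int R = 3;  // null-move depth reduction (a common choice)

int search(Position& pos, int alpha, int beta, int depth, bool isPV) {
    if (depth <= 0) return qsearch(pos, alpha, beta);

    // Null move only at non-PV nodes: a null can never be the move
    // actually played, so anything but a fail low at a PV node is
    // wasted effort that forces a re-search with a legal move.
    if (!isPV && !pos.inCheck() && depth > R) {
        pos.makeNullMove();
        int score = -search(pos, -beta, -beta + 1, depth - 1 - R, false);
        pos.unmakeNullMove();
        if (score >= beta)
            return score;  // still above beta after passing: prune
    }
    // ... normal move loop would follow here ...
    return alpha;
}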



I won't repeat it in full here, but basically we estimate which node type it will be BEFORE it is even searched, and we stick with that definition. And our definition has the merit of being much easier to reason about.

The classical definition is useless for a chess program if you actually intend to do anything useful with it.


Also, you don't know a node is a "CUT" node until you get a beta cutoff...

You can guess. You can also flip a coin I suppose??

If you are at a cut node, the previous node was an all node. At all nodes you expect all moves to fail low, and at cut nodes you expect the first move to fail high and exit. The insight is this: what happens if a cut node does produce a cutoff? The answer is that the previous move (which is at an ALL node) will fail high and get re-searched. The re-search saves the day! If the heavy reduction at the cut node causes the search to FAIL to produce the expected cutoff (and it would have otherwise), there is NO damage done because of this.

What if the reduced search has the opposite effect? BTW, the above is incorrectly written. I assume you meant "if a cut node does NOT produce a cutoff", making it an ALL node instead. More dangerously, what if the CUT position DOES produce a cutoff? You re-search to verify, I assume? Safe enough. What if it doesn't? When you get back here again, from the previous ALL node (where the move was expected to fail low but did not), you are STILL at the same type of node, expecting the same result, not changing the reduction rules...


There is no free lunch, however. The more aggressive cut-node reductions should save time, but that may not necessarily happen. Because it can trigger additional re-searches at the previous node, it can actually cause the search to take MORE time. But strangely enough we have seen a slight quality improvement, probably due to these additional re-searches, which with LMR will be done at greater depths (because these re-searches won't be reduced).

The way you categorize cut and all nodes could change all of this, however. We have a very anal definition, because in reality you cannot know until a search is complete, so we decide BEFORE the first move is searched. It cannot change, because it always toggles back and forth. If the previous node was a cut node, we consider this an ALL node. The only exception is for PV nodes, which are a special case of ALL nodes. In the PVS framework, the first zero-width window searched from a PV node is by our definition a CUT node, and if you have to do a re-search then it is suddenly promoted to a PV node (as per PVS search), and only then can the cut and all nodes swap positions. In other words, these internal search failures can force the status of every node in the subtree to swap if they propagate back to the last PV node.

It sounds more complicated than it is, but the definition is simple: if you are an odd distance from the last PV node it's a cut node, otherwise it's an all node.
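
Don's rule amounts to a parity toggle rooted at the last PV node. A sketch of how the typing might look inside a PVS search (generic names, not Komodo's code):

Code:

enum NodeType { PV_NODE, CUT_NODE, ALL_NODE };

// The type is fixed BEFORE the node is searched. From a PV node the
// first move stays PV and every sibling scout search starts as a CUT
// node; below that the type simply alternates, so an odd distance
// from the last PV node means CUT and an even distance means ALL.
NodeType childType(NodeType parent, bool firstMove) {
    if (parent == PV_NODE)
        return firstMove ? PV_NODE : CUT_NODE;
    return (parent == CUT_NODE) ? ALL_NODE : CUT_NODE;
}

// Inside a PV node's move loop (PVS), a scout that fails high is
// promoted to a PV node for the re-search, which is the moment the
// CUT/ALL labels in the whole subtree swap:
//
//   score = -search(child, -alpha - 1, -alpha, depth - 1, CUT_NODE);
//   if (score > alpha)  // fail high on the zero-width window
//       score = -search(child, -beta, -alpha, depth - 1, PV_NODE);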


Anyway, I set expected CUT nodes to ALL after a few moves are searched and no cut-offs occur, so most will be ALL nodes anyway. The other test I did was to reduce losing captures by a much lower value (1 ply), while other moves could be reduced by as much as 6 plies. That seemed to improve Scorpio a bit, so I have that enabled by default now. I suspect that captures, checks and other moves that are not reduced form the soul of an LMR-heavy engine, so tampering with those even with small reduction values may be damaging.
Also, I did some tests with aggressive pruning, but this is a loss as always. Move-count pruning never worked for me above depth=2, and it probably never will. I am better off increasing the LMR reduction in the upper part of the tree by one more ply than worrying about pruning in the last plies.
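
Daniel's scheme, with the reduction for losing captures capped at 1 ply while quiet late moves may be reduced by up to 6, might be sketched like this; the caps are his numbers, while the formula and names are illustrative:

Code:

#include <algorithm>

// Illustrative LMR amount: tactical moves are not reduced at all,
// losing captures at most 1 ply, late quiet moves up to 6 plies.
int lmrReduction(int depth, int moveNumber, bool givesCheck,
                 bool isCapture, bool isLosingCapture) {
    if (depth < 3 || moveNumber < 3) return 0;   // never reduce early moves
    if (givesCheck || (isCapture && !isLosingCapture))
        return 0;                                // keep tactical moves whole
    int r = std::min(6, 1 + depth / 6 + moveNumber / 8);  // ad hoc growth
    if (isLosingCapture)
        r = std::min(r, 1);                      // cap at 1 ply, per Daniel
    return r;
}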
The problem I saw when doing this kind of node-type guessing in DTS is that once you fail to fail high on the first move of a CUT node, EVERYTHING along the path is now "in doubt". Every node type could be flipping as you find a new best move, or as you find the best move is worse than the aspiration bound.

I ended up doing this twice.

Once going down the tree, then again going back up the tree, so that the current CUT node is no longer considered CUT (since we didn't fail high on the first move) and the previous ALL node is now likely a CUT node since we are not failing high, and a fail-low here turns into a fail-high one ply back.

It was complicated, and had a lot of error associated with it.