Nodes type clarification.

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27809
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Nodes type clarification.

Post by hgm »

bob wrote:The problem that I recognized is that PV nodes are _critical_. A wrong move first and the tree explodes. That's why I only do IID there. At ALL/CUT nodes, thanks to null-move reductions, and and LMR-type reductions, the effort expended by IID was more expensive than the savings gained. I have tried this many times, but never got better results than when simply applying IID when there is no hash move on a PV node. PV nodes are the only place where reductions don't apply. The root PV node best move usually takes 90% of the search effort if not more. Getting it right is critical. Ditto for other PV nodes along the left-most edge of the graph. For my tests, IID elsewhere made the tree larger, not smaller. Perhaps if you don't do good captures first (SEE) or killer moves, this might pay off. But at traditional CUT nodes, I get a "good enough move" ordered first 92% of the time or so at no cost in extra searching. IID would need to improve on that while not losing more than it gains, which I don't see happening in my code at least.
It might be that my move ordering is worse (in uMax it will certainly be), but the reason why it does not pay off for you in CUT nodes might also be because of the way you implemented it. If you do not use self-deepening, doing IID in a CUT-node runs into the inefficiency I pointed out above, revisiting the daughter ALL-node many times in a row, generating moves every time you re-enter it.

In Joker I prevent this by using self-deepening: the ALL-node will not return if it knows it will be immediately called again with increased depth, but will increase the depth itself (so it can continue to use the already generated and sorted move list). So the main effect of the IID is that in the cases where the first move is not cut move (i.e. your 8%) you are very likely informed before full depth is reached:

Code: Select all

Search(depth, maxDepth)
{
    MoveGen();
    Probehash();
    bestScore = hash.score;
    for&#40;iidDepth=hash.depth+1; iidDepth <= maxDepth && &#40;iidDepth <= depth || bestScore <= alpha&#41;; iidDept++)
    &#123;
        for&#40;ALL_MOVES&#41;
        &#123;
            Make&#40;);
            score = -Search&#40;iidDepth-1, depth-1&#41;;
            UnMake&#40;);
            if&#40;score>bestScore&#41; bestScore = score;
        &#125;
    &#125;
    StoreHash&#40;);
    return bestScore;
&#125;
User avatar
hgm
Posts: 27809
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Nodes type clarification.

Post by hgm »

bob wrote:The flaw is that this is an ALL node, ...
Well, this is the assumption on which the rest of your post is founded, and it is wrong.

The node is not an ALL node. You merely think it is an ALL node. That makes all the difference.

What you gain by searching it at reduced depth, is that you find the cut-move in this so-called ALL node at highly reduced cost. If there is one. And sometimes there is one. You say it yourself: only in 92% of the cases the first move you try (uninformed) is a cut-move. So in 8% of the cases, it is not, and fails low. That means that in this 8% of the cases the daughter to which this move leads, which you expected to be an ALL node, failed high.

That is a non-negligible fraction. If you can cut the search effort tenfold for those 8% of the classified ALL node by abandoning the no-good cut-move before you reach full depth, there is significant savings.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Nodes type clarification.

Post by bob »

hgm wrote:
bob wrote:The flaw is that this is an ALL node, ...
Well, this is the assumption on which the rest of your post is founded, and it is wrong.

The node is not an ALL node. You merely think it is an ALL node. That makes all the difference.
The general idea is that such "types" are correct about 92-95% of the time. But you are missing the _key_ point. At a node where we are going to fail high, as opposed to a node where we will search all moves, when we fail high, we do so 90+% of the time on the _first_ move searched. Without using IID at all. So you have 10% (or less) to make an improvement in. Not much room. IID is not free. This was added to Crafty in version 8.9, which was 12-14 years ago. I don't recall who first suggested it (perhaps deep thought guys, I am not sure). I tried using it (a) everywhere [horribly slow], (b) at all CUT nodes [slower, but not as slow as everywhere] and finally just on PV nodes where ordering is absolutely critical because of the size of the tree. This latter worked for me and I have been using it ever since.


What you gain by searching it at reduced depth, is that you find the cut-move in this so-called ALL node at highly reduced cost. If there is one. And sometimes there is one. You say it yourself: only in 92% of the cases the first move you try (uninformed) is a cut-move. So in 8% of the cases, it is not, and fails low. That means that in this 8% of the cases the daughter to which this move leads, which you expected to be an ALL node, failed high.
Not quite. 92% of the time when I fail high, I fail high on the _first_ move I search. One ply deeper in the tree, on the 8% that don't fail high, I will get a fail high there. 8% of the time your IID will help. 92% of the time at that same node I am going to search all nodes normally and IID offers zero. Help in 8%, harm in 92%, the math does not seem to support doing this.

On the other hand, at PV nodes, where I don't have a hash move, IID does help. We know it does because normal iterative deepening is set up to handle this exact case, to feed the best moves from the previous search into the current (one ply deeper) search. But in this case, we don't have hash moves at every ply because of the fail-high at the last iteration that did not get resolved to an exact score. And any effort expended to get a good first move will pay off.

But I don't see how it is going to pay off when it helps 8% and hurts 92% of the non-PV nodes.


That is a non-negligible fraction. If you can cut the search effort tenfold for those 8% of the classified ALL node by abandoning the no-good cut-move before you reach full depth, there is significant savings.


I agree. But what about the wasted effort at the _other_ 92% of the nodes where you do this to get a best move that doesn't matter because they turn out to really be ALL nodes?

Anything that can bring the fail-high percentage on the first move up is good, if the cost of doing so is not more than the cost of not getting the right move first. One could search a perfectly ordered tree, but the effort to order the tree perfectly would cost far more than missing the best move 8% of the time.
User avatar
hgm
Posts: 27809
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Nodes type clarification.

Post by hgm »

bob wrote:Not quite. 92% of the time when I fail high, I fail high on the _first_ move I search. One ply deeper in the tree, on the 8% that don't fail high, I will get a fail high there. 8% of the time your IID will help. 92% of the time at that same node I am going to search all nodes normally and IID offers zero. Help in 8%, harm in 92%, the math does not seem to support doing this.
If that is what your 92%-number really means, it is not really relevant to this discussion, and you are misinterpreting it.

Misinterpretation, because the 92% of the hindsight cut-nodes that searched the cut-move first all require much less search than the 8% that found the cut-move only late. For all we know, these 8% might on the average have found the cut-move only after searching 9 other moves, so that each of them would take 10 times longer than the other 92%. That would make the ratio of the search effort 92% vs 10*8%=80%, or after renormalization 53.5% vs 46.5%. Then the nodes where you did not get it right first time would nearly consume half the search time, and the savings that were possible on them would be very substantial. 8% might be the large majority in terms of search effort, rather than the minor fraction you suggest it to be.

The number would be irrelevant to the IID discussion, because the decision to do IID is not taken in hindsight, based on the true nature of the node, but in advance, based on the classification. If the classification was wrong most of the time, you might as well not have bothered with it, so I assume that in the large majority of cases the classification is correct So when you find that 92% of all hindsight cut-nodes cut off on the first move, it really does not tell us anything about the minor fraction of those that were initially classified as ALL nodes. It might very well be that 95% of the hindsght cut-nodes were initially classified as CUT nodes, and 96% of those (so 91.2% of the total) would cut off on the first move, while of the remaining 5% only 16% (= 0.8% of the total) would cut off on the first move. This would bring the total of first-move cuts to 92%, while for the nodes we are discussing 84% would in fact have to search multiple moves.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Nodes type clarification.

Post by bob »

hgm wrote:
bob wrote:Not quite. 92% of the time when I fail high, I fail high on the _first_ move I search. One ply deeper in the tree, on the 8% that don't fail high, I will get a fail high there. 8% of the time your IID will help. 92% of the time at that same node I am going to search all nodes normally and IID offers zero. Help in 8%, harm in 92%, the math does not seem to support doing this.
If that is what your 92%-number really means, it is not really relevant to this discussion, and you are misinterpreting it.

Misinterpretation, because the 92% of the hindsight cut-nodes that searched the cut-move first all require much less search than the 8% that found the cut-move only late. For all we know, these 8% might on the average have found the cut-move only after searching 9 other moves, so that each of them would take 10 times longer than the other 92%. That would make the ratio of the search effort 92% vs 10*8%=80%, or after renormalization 53.5% vs 46.5%. Then the nodes where you did not get it right first time would nearly consume half the search time, and the savings that were possible on them would be very substantial. 8% might be the large majority in terms of search effort, rather than the minor fraction you suggest it to be.
I think I posted some better numbers on r.g.c.c or here a few years back, where I actually captured exact counts. 92% failed high on the first move, another 3% (a guess since it was long ago) on the second. But even more importantly, if we didn't fail high by the 4th or 5th move, we were not going to fail high at all...


The number would be irrelevant to the IID discussion, because the decision to do IID is not taken in hindsight, based on the true nature of the node, but in advance, based on the classification.
I don't try to "classify nodes". The only such thing I do is that if I am searching a position where alpha and beta are the same as they were at the root node, then I am on a PV node and I am willing to do IID there. But nowhere else.

If the classification was wrong most of the time, you might as well not have bothered with it, so I assume that in the large majority of cases the classification is correct So when you find that 92% of all hindsight cut-nodes cut off on the first move, it really does not tell us anything about the minor fraction of those that were initially classified as ALL nodes. It might very well be that 95% of the hindsght cut-nodes were initially classified as CUT nodes, and 96% of those (so 91.2% of the total) would cut off on the first move, while of the remaining 5% only 16% (= 0.8% of the total) would cut off on the first move. This would bring the total of first-move cuts to 92%, while for the nodes we are discussing 84% would in fact have to search multiple moves.
What it does say, however, is that even when the hash move is not available, winning captures and killers are "good enough" in most cases. Good enough to use at zero cost, as opposed to a shallow search which is not free at all.

I experimented with IID a bunch when I first read about the idea. The key is to test it in games, not in test positions where move ordering is, by definition, bad, because there is a surprise tactical shot that won't be discovered until sufficient depth is reached. In games, what I do at present was the only case that actually offered any improvement, and it is something that is not done very often. Maybe once every 10 moves in a real game where I get a fail high and no best score before time runs out...
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: Nodes type clarification.

Post by Pradu »

xsadar wrote:With reference to PV nodes (which is all I commented on above), assuming good move ordering just predicts that the left side of the tree is composed of pv nodes and that they don't occur elsewhere, which is exactly what I said.

With alpha-beta you don't use null windows or re-searches, so if you mispredict you're just out of luck. With PVS, when your move ordering fails, you can get a predicted PV node in the middle of a tree due to a re-search. You can't do that in normal alpha-beta. That was my point.
Sorry, I misread your post.
User avatar
hgm
Posts: 27809
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Nodes type clarification.

Post by hgm »

bob wrote:I think I posted some better numbers on r.g.c.c or here a few years back, where I actually captured exact counts. 92% failed high on the first move, another 3% (a guess since it was long ago) on the second. But even more importantly, if we didn't fail high by the 4th or 5th move, we were not going to fail high at all...
Somehow I doubt the relevance of this statistic. The overwhelming majority of the cut-nodes where you had no hash move are d=1 nodes. These are nodes where you could not do IID in the first place The IID situation only occurs if you have no hash hit for a deep search.

And the probability that the first move of your static ordering will be a cutmove at, say, d=10, will not be the same as that it is cut-move at d=1. You would have to add the probability to that for low-failing of the hash move at d=2, 3, ... 10. That could add up. Even if there as only 1% probability that a subsequent iteration wil have to change hash move, after 10 iterations this would add 10%. By doing IID you could at least pre-empt a hash-move change at very low depth at very low cost.

Intuitively, I would expect that the probability for a hash-move change is higher at very early iteration than at very deep iterations (in non-PV nodes, where the asymptotic score lies well outside the window).
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Nodes type clarification.

Post by bob »

hgm wrote:
bob wrote:I think I posted some better numbers on r.g.c.c or here a few years back, where I actually captured exact counts. 92% failed high on the first move, another 3% (a guess since it was long ago) on the second. But even more importantly, if we didn't fail high by the 4th or 5th move, we were not going to fail high at all...
Somehow I doubt the relevance of this statistic. The overwhelming majority of the cut-nodes where you had no hash move are d=1 nodes. These are nodes where you could not do IID in the first place The IID situation only occurs if you have no hash hit for a deep search.
My point was that I don't care about "cut" nodes in the general sense. Good captures, killers, do well there. But in the specific case of PV nodes, where a good first move is critical, I apply IID when I have no hash move. It doesn't happen very often. Only on the first move after a search that fails high and then times out before the actual PV/Score is resolved... Or when any move fails high, at ply=2 I probably will have no clue about a "good move" to try. The measured effect is an improvement of 20-25% in some positions, zero-10% in the majority of positions... When I tried it anywhere else, it slowed me down...


And the probability that the first move of your static ordering will be a cutmove at, say, d=10, will not be the same as that it is cut-move at d=1. You would have to add the probability to that for low-failing of the hash move at d=2, 3, ... 10. That could add up. Even if there as only 1% probability that a subsequent iteration wil have to change hash move, after 10 iterations this would add 10%. By doing IID you could at least pre-empt a hash-move change at very low depth at very low cost.

Intuitively, I would expect that the probability for a hash-move change is higher at very early iteration than at very deep iterations (in non-PV nodes, where the asymptotic score lies well outside the window).
User avatar
hgm
Posts: 27809
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Nodes type clarification.

Post by hgm »

Bob, one more question about this 92% cutoff-rate by the first move. Does this include cutoffs cause by the null move? I suppose this is the first move you try, in nodes where you have no hash move.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Nodes type clarification.

Post by bob »

hgm wrote:Bob, one more question about this 92% cutoff-rate by the first move. Does this include cutoffs cause by the null move? I suppose this is the first move you try, in nodes where you have no hash move.
No. It is a measure of the case "I tried a real move, what percent of the time, when I get a fail high, does the fail high happen on the first move searched? This first move could be a hash move, or if no hash move, the first capture. Or if no captures are possible, the first killer, or just the first move generated.

I display this number, because something tells me the number tells more about the position that we suspect. For example, if this number is > 95%, what makes move ordering so good? One could speculate that this position is "well understood" and there are no unexpected tactical/positional tricks to deal with. On the other hand, if this drops to < 90%, it says that for some reason, on this iteration, previous move ordering info is failing, which likely means there is something going on that makes previous "good moves" fail.

This might be an indicator that we need to think a bit longer since anything that breaks move ordering implies that moves that were good previously are no longer good. This suggests that we are breaking new tactical ground and probably need to use a little extra time to understand what is going on.

I think this "percentage" is more a measure of "do I understand this position?" type idea, rather than "Is my move ordering good?" We know that this kind of move ordering works well, but we also know that it comes from previous searches (except for captures of course). If something invalidates that, it suggests that something unexpected is going on.

Hsu had a clever way of extending search time. I once quizzed him about it at an ACM event. His basic idea was expressed as "If the actual tree size can not be expressed as a canonical number based on the alpha/beta formula, then we use extra time..." This might be a way of determining when extra time is needed without requiring a score drop or fail low.

If you output this number, you will notice that on normal positions it is quite high, but it drops on fail high / fail low root positions. When you think about it, it actually makes sense as to why this is happening. And it is providing information about the current tree that might be useful beyond just a number to display to measure move ordering effectiveness.