Singular Margin based on depth?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Singular Margin based on depth?

Post by bob »

lkaufman wrote:
bob wrote:
The tree is exponential in behavior. If you extend a root move, that entire sub-tree gets extended. If you wait until you are near the tips, you make the singular test many times, and you will not extend every branch. But if you extended at the root, every one of those "tip positions" is automatically extended.

For example, suppose your root singular margin results in one of every 10 moves getting extended. Those entire sub-trees are now one ply deeper. Suppose instead you don't extend those positions, but near the tips of each one you extend one of every 5 moves. With the former, 100% of the sub-tree is extended by one ply. With the latter, only 20% of it is extended, even though we are extending twice as frequently at the tips as at the root. That is roughly 80% less growth from the extension.


I still don't get it. In your example the root extension would cause the total tree size to grow by 10% (never mind the sub-tree), whereas the near-leaf extension would make the total tree size grow by 20%. What am I missing?
There are other root branches. But if you extend at the root, the entire sub-tree for that root move is extended by a ply. If you extend at the tips, only a PART of that particular sub-tree is extended, only 20% in the example I gave.

So extend at the root, 100% of the sub-tree is extended by one ply. Extend at the tip and only the tips that get extended get bigger, the rest are unchanged, which is obviously a "smaller number".
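
To put rough numbers on the "extend at the root vs. extend at the tips" argument, here is a minimal back-of-the-envelope sketch (not from the thread): it assumes a uniform effective branching factor, ignores alpha-beta and transposition effects entirely, and the branching factor, depth, and 20% tip-extension rate are purely illustrative.

Code: Select all

// Back-of-the-envelope model (hypothetical numbers): compare the node-count
// growth from extending a root move by one ply versus extending 20% of the
// moves at the tips by one ply.  Uniform branching factor, no alpha-beta.
#include <cmath>
#include <cstdio>

// Number of nodes below the root of a uniform tree: b + b^2 + ... + b^d.
static double nodesBelow(double b, int d) {
    double total = 0.0, layer = 1.0;
    for (int i = 0; i < d; ++i) { layer *= b; total += layer; }
    return total;
}

int main() {
    const double b = 2.0;   // assumed effective branching factor
    const int    d = 20;    // assumed depth of one root move's sub-tree

    double base    = nodesBelow(b, d);                    // no extension
    double rootExt = nodesBelow(b, d + 1);                // root move extended: whole sub-tree one ply deeper
    double tipExt  = base + 0.2 * std::pow(b, d + 1);     // 20% of the tips searched one ply deeper

    std::printf("no extension     : %12.0f nodes\n", base);
    std::printf("extended at root : %12.0f nodes (+%.0f)\n", rootExt, rootExt - base);
    std::printf("extended at tips : %12.0f nodes (+%.0f)\n", tipExt,  tipExt  - base);
    return 0;
}

Under this toy model the root extension adds exactly five times as many nodes as the tip extensions, independent of the branching factor and depth chosen, which is where the "80% less" figure above comes from.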
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Singular Margin based on depth?

Post by bob »

BubbaTough wrote:
lkaufman wrote: I still don't get it. In your example the root extension would cause the total tree size to grow by 10% (never mind the sub-tree), whereas the near-leaf extension would make the total tree size grow by 20%. What am I missing?
It is not completely clear to me that extending a few deep nodes causes a worse explosion than extending a lot of shallow nodes (except in obvious cases where all the shallow nodes are subsumed within the few deep nodes). My intuition is that this is very much the type of thing that is sometimes true and sometimes not, and has to be measured on a case-by-case basis. I would guess the particular settings of the program in question were not deduced by pure logic, but were one of many, many haphazardly concocted things that were tried and happened to work best. If people here can come up with convincing explanations post-mortem, that is great, but I wouldn't read too much into it. Try whatever makes sense to you; when you find out it doesn't work, try the opposite, and after you have found something that works great via exhaustive search, you can make up a wonderful philosophy on why your solution makes so much sense.

-Sam
In a tree search, I go by the idea that "once you make a decision, you have to live with it until that sub-tree is completed." The closer to the root you are, the bigger the sub-tree and the longer you have to live with the consequences of that decision.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Singular Margin based on depth?

Post by bob »

lkaufman wrote:We've tried both ways on this, but the results were just one elo apart, not significant. In cases like this, I think you have to go by logic and what "ought" to work, unless you think even a tiny plus like one elo should count for more than your own reasoning.

I am always suspicious of such results. In fact, that is what led me to conclude that the original Fruit "history pruning" was wrong. Fruit uses a pruning percentage number that I tried tuning on our cluster, and there was no pattern. I can generally see a normal parabolic-type curve with a clear maximum at the optimal setting and a drop-off in Elo on either side of it. The history pruning value in Fruit behaved randomly, which suggested to me that it was ineffectual. I turned that part of the pruning off completely (actually what we today call LMR, a reduction) and just reduced everything but moves that could be considered tactical, and it played better. That was what I had already found in Crafty, after trying several different ways of updating the history counter (the discussions can be found here in the archives).

When something offers no improvement (and I rarely test to 1 Elo resolution, since that requires more than 100K games) I toss it as "random noise".
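
As an aside, the "parabola versus random noise" check described above can be sketched in a few lines. This is not bob's actual cluster tooling; the sweep values, the error-bar constant, and the structure are all assumptions, and the error bar in particular depends heavily on draw rate and opponent pool.

Code: Select all

// Sketch of the sanity check described above: sweep one parameter, measure
// Elo at each setting, and only believe the "optimum" if the spread across
// settings clearly exceeds measurement noise.  All numbers are made up.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct Sample { double value; double elo; double games; };  // one sweep point (hypothetical)

int main() {
    std::vector<Sample> sweep = {        // illustrative sweep of a pruning margin
        {0.50, -3.0, 30000}, {0.60, +1.0, 30000}, {0.70, +2.0, 30000},
        {0.80, +0.5, 30000}, {0.90, -2.5, 30000},
    };

    double lo = sweep.front().elo, hi = sweep.front().elo;
    for (const Sample& s : sweep) { lo = std::min(lo, s.elo); hi = std::max(hi, s.elo); }

    // Very rough 2-sigma Elo error bar for n games; ~350/sqrt(n) is a common
    // rule of thumb and is draw-rate dependent (100K+ games for ~1 Elo).
    double noise = 350.0 / std::sqrt(sweep.front().games);

    if (hi - lo < noise)
        std::printf("spread %.1f Elo within +/-%.1f noise: parameter looks ineffectual\n", hi - lo, noise);
    else
        std::printf("spread %.1f Elo exceeds +/-%.1f noise: a real optimum may exist\n", hi - lo, noise);
    return 0;
}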
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Singular Margin based on depth?

Post by bob »

BubbaTough wrote:
José Carlos wrote:

Code: Select all

Consider this example:

D2           A
          /  |  \
D1      B    C    D
       /|\  /|\  /|\
D0    E F GH I JK L M


  In this tree, A is searched to depth 2, which leads to 13 nodes searched.
  If you extend A, it will be searched to depth 3:

D3                            A
                     /        |             \
D2          B                 C                          D
        /   |   \          /  |   \                 /    |    \
D1    E     F     G     H     I        J        K        L        M
     /|\   /|\   /|\   /|\   /|\     / | \    / | \    / | \    / | \
D0  N O P Q R S T U V W X Y Z A' B' C' D' E' F' G' H' I' J' K' L' M' N'

  40 nodes.
  Now suppose you take the original tree and extend most leaf nodes, like this:

D2                            A
                     /        |             \
D1          B                 C                          D
        /   |   \          /  |   \                 /    |    \
D0    E+1   F+1   G+1   H+1   I+1      J+1      K+1      L        M
     /|\   /|\   /|\   /|\   /|\     / | \    / | \  
D0  N O P Q R S T U V W X Y Z A' B' C' D' E' F' G' H'

  I've marked the extended nodes with "+1". As you see, you have fewer nodes (34).

  I hope the tree reads correctly.
Good example for the root. I think most everyone is convinced (or can be) that extending at the root causes at least as many nodes to be searched as extending farther down the tree. To me, this is kind of obvious, as the nodes extended near the leaves are of course a subset of the nodes extended at the root. It is not at all clear to me, however, that this kind of subset relationship holds for the original topic (extending a node where the highest-ranked move is better than all others by a certain margin).

-Sam
This is probably worthy of a long discussion / investigation, since Hsu and Campbell certainly did not venture out into this area...

Extending near the root is expensive. Should you use a wider SE margin there so that you extend less often? That would make sense in terms of cost, since the cost goes down if you extend less. But then, what is the justification for extending more frequently near the tips? Is it that extensions there are less costly, so you can afford extra ones when you are almost "out of depth"?

I've not given it any great amount of thought, as it is not a simple question.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Singular Margin based on depth?

Post by bob »

lkaufman wrote:I'll simplify and rephrase the question: Would always extending the first move near the root lead to a bigger tree than always extending the first move near the leaf?
Yes.

If you extend the first move ALWAYS, the first point is that the first move usually has the biggest tree (in about 85% of searches the best move does not change on the next iteration). If you extend it at the root, ALL the moves at the tips of that tree are extended. Whereas if you extend just the first move at each tip, the non-first-move sub-trees are not extended.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Singular Margin based on depth?

Post by bob »

lkaufman wrote:
rvida wrote:
lkaufman wrote: "At lower depths the overhead is smaller." I don't get this. If you are using a depth reduction of 50% (or any percentage), the cost of the exclusion search should be a HIGHER percentage at lower depths than at higher ones. Can you explain this comment?
The absolute number of extra nodes is important, not the percentage.
Why?
It would seem to me the two numbers are equivalent... Increasing total nodes by 10% increases them by a specific number of nodes once you know the previous tree size...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Singular Margin based on depth?

Post by bob »

Eelco de Groot wrote:
rvida wrote:
lkaufman wrote:One of the ideas in the Ippolit based programs (and in Critter) is that when doing the test for Singularity (for Singular Extension), the margin is proportional to depth. Aside from whether this tests well or not, can anyone explain the rationale for this idea? It is not at all obvious to us why a deeper search should require a larger margin. One might even argue that move ordering gets better with depth, so a smaller margin might be needed at higher depths.
Original purpose was to limit the overhead of the exclusion search. Larger margin causes it to fail-high faster on non-singular moves. At lower depths the overhead is smaller and a smaller margin is affordable.

But with other improvements in Critter's search this became (almost) a non-issue and now I use a fixed margin.
I vote for Richard's answer in this thread :P My own rationale has changed a bit as well. The main purpose I see for the singular extension is the case where the singular move has to fail low, i.e. after a deeper search the move no longer produces a beta cutoff and so of course is no longer singular either. Only then has it had some value to search the hash move deeper. But if, in the position where you test for singularity, there are at least two moves that give a beta cutoff, then even if the first one fails the second one may still take over, so in that case there is little point in a singular extension. Basically this means you do not need a margin at all; theoretically it makes more sense just to be sure there are two moves that fail high against beta. The next factor is that the null-window results of the singularity test degrade very quickly, like all null-window searches. That the null-window result against the best move is better than some margin is not worth very much in precision, because you use very shallow searches (typically half depth), but also because if these go deeper the lines become less accurate (the search 'degrades', so to speak, and should drop further below beta if it cannot reach beta in the first place because of imprecise moves), and the hash result itself is not fixed, so you test against a moving target.

The larger margin then is there because you would still like to use this very imprecise null-window result even after it has degraded. But other than being able to use degraded null-window searches, I don't think the margin has a real meaning; the tuning depends more on how fast your null-window searches degrade in accuracy, not on some actual distance to beta or, even more imprecise, the (distance to the) hash result of the stored beta cutoff.

Regards, Eelco
I don't agree. The "margin" was not chosen to minimize the cost of the singular search. It was chosen to set a threshold for which moves need to be extended. Do you want to just extend moves that are 1/3 of a pawn better than all the rest, which will recognize deep positional threats, or do you just want to extend moves that are a pawn better than the rest, to recognize tactical threats only?

If you use that margin for any other purpose, then I would not call that "singular extensions" as defined in the literature.
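
For readers following along, here is a schematic of the exclusion-search form of the singularity test that the Ippolit/Critter/Stockfish family of programs uses (a sketch under assumptions, not any engine's actual code): 'search' stands in for the engine's own negamax, and the type names, margin numbers, and half-depth reduction are placeholders for illustration. The margin line is exactly where the "fixed versus depth-proportional" question in this thread lives.

Code: Select all

// Schematic singularity test via an exclusion search (sketch only).
// 'SearchFn' stands in for the engine's negamax; Move/Value/Depth and the
// margin numbers are hypothetical placeholders.
#include <functional>

struct Position;                       // engine-defined elsewhere
using Move  = int;
using Value = int;
using Depth = int;

using SearchFn = std::function<Value(Position& pos, Value alpha, Value beta,
                                     Depth depth, Move excludedMove)>;

// True if ttMove looks singular: with ttMove excluded, a reduced-depth search
// of the remaining moves stays below the hash score minus a margin.
bool isSingular(Position& pos, const SearchFn& search,
                Move ttMove, Value ttValue, Depth depth) {
    // The two margin schemes debated in the thread (numbers are made up):
    //   fixed margin:        Value margin = 50;        // centipawns
    //   depth-scaled margin: Value margin = 2 * depth; // grows with depth
    Value margin       = 2 * depth;
    Value singularBeta = ttValue - margin;

    // Exclusion search: same position, ttMove forbidden, roughly half depth,
    // null window just below singularBeta.
    Value v = search(pos, singularBeta - 1, singularBeta, depth / 2, ttMove);

    return v < singularBeta;           // no alternative comes close: extend ttMove
}

The tradeoff rvida describes is visible here: a larger margin lets the exclusion search fail high sooner on non-singular moves, which is cheaper, while the "literature" reading above treats the margin purely as the threshold for how much better than the rest a move must be to deserve the extension.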
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Singular Margin based on depth?

Post by bob »

lkaufman wrote:
Eelco de Groot wrote:
rvida wrote:
lkaufman wrote:One of the ideas in the Ippolit based programs (and in Critter) is that when doing the test for Singularity (for Singular Extension), the margin is proportional to depth. Aside from whether this tests well or not, can anyone explain the rationale for this idea? It is not at all obvious to us why a deeper search should require a larger margin. One might even argue that move ordering gets better with depth, so a smaller margin might be needed at higher depths.
Original purpose was to limit the overhead of the exclusion search. Larger margin causes it to fail-high faster on non-singular moves. At lower depths the overhead is smaller and a smaller margin is affordable.

But with other improvements in Critter's search this became (almost) a non-issue and now I use a fixed margin.
I vote for Richard's answer in this thread :P My own rationale has changed a bit as well. The main purpose I see for the singular extension is the case where the singular move has to fail low, i.e. after a deeper search the move no longer produces a beta cutoff and so of course is no longer singular either. Only then has it had some value to search the hash move deeper. But if, in the position where you test for singularity, there are at least two moves that give a beta cutoff, then even if the first one fails the second one may still take over, so in that case there is little point in a singular extension. Basically this means you do not need a margin at all; theoretically it makes more sense just to be sure there are two moves that fail high against beta. The next factor is that the null-window results of the singularity test degrade very quickly, like all null-window searches. That the null-window result against the best move is better than some margin is not worth very much in precision, because you use very shallow searches (typically half depth), but also because if these go deeper the lines become less accurate (the search 'degrades', so to speak, and should drop further below beta if it cannot reach beta in the first place because of imprecise moves), and the hash result itself is not fixed, so you test against a moving target.

The larger margin then is there because you would still like to use this very imprecise null-window result even after it has degraded. But other than being able to use degraded null-window searches, I don't think the margin has a real meaning; the tuning depends more on how fast your null-window searches degrade in accuracy, not on some actual distance to beta or, even more imprecise, the (distance to the) hash result of the stored beta cutoff.

Regards, Eelco
I'm not completely following your explanation about degraded nullwindow searches, but my own interpretation of it would be that if you reduce the search depth (for the singular test) by a percentage of depth remaining rather than by a constant number of plies, you need a larger margin due to the increased gap between singular depth and actual depth. Is that what you are saying? If so it sounds right.
What do you (and others reading this) think about the question of whether the margin should be from beta, from the hash score, or from some function of both?
Yes. Hsu did not use a "variable depth reduction." Whether that is a good idea or not is completely unknown so far as I know. They used a fixed reduction, and tuned the window to optimize the number of extended nodes to produce the best result.

I think it is risky to say, near the root, "I have a lot of depth remaining, so I am going to whack off some percentage of it and use the resulting singular-margin search to decide whether this move is singular or not." It seems risky to whack the depth significantly near those root positions, because now you may well lack the necessary depth to see why the move actually is (or is not) singular...
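
To make that depth concern concrete, here is a trivial comparison (illustrative only; the 3-ply fixed reduction is an assumed value, not taken from any paper) of how many plies the singular verification search gets under a fixed reduction versus the half-depth rule mentioned earlier in the thread.

Code: Select all

// How deep is the exclusion search?  Fixed reduction (the Hsu/Campbell-style
// approach described above) versus proportional reduction (half depth).
#include <cstdio>

int main() {
    const int fixedReduction = 3;                        // assumed fixed reduction, plies
    for (int depth = 6; depth <= 30; depth += 6) {
        int fixedDepth        = depth - fixedReduction;  // stays close to full depth near the root
        int proportionalDepth = depth / 2;               // whacks off many plies near the root
        std::printf("remaining depth %2d: fixed-reduction test %2d plies, half-depth test %2d plies\n",
                    depth, fixedDepth, proportionalDepth);
    }
    return 0;
}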
Eelco de Groot
Posts: 4561
Joined: Sun Mar 12, 2006 2:40 am

Re: Singular Margin based on depth?

Post by Eelco de Groot »

lkaufman wrote:
Eelco de Groot wrote:
rvida wrote:
lkaufman wrote:One of the ideas in the Ippolit based programs (and in Critter) is that when doing the test for Singularity (for Singular Extension), the margin is proportional to depth. Aside from whether this tests well or not, can anyone explain the rationale for this idea? It is not at all obvious to us why a deeper search should require a larger margin. One might even argue that move ordering gets better with depth, so a smaller margin might be needed at higher depths.
Original purpose was to limit the overhead of the exclusion search. Larger margin causes it to fail-high faster on non-singular moves. At lower depths the overhead is smaller and a smaller margin is affordable.

But with other improvements in Critter's search this became (almost) a non-issue and now I use a fixed margin.
I vote for Richard's answer in this thread :P My own rationale has changed a bit as well. The main purpose I see for the singular extension is the case where the singular move has to fail low, i.e. after a deeper search the move no longer produces a beta cutoff and so of course is no longer singular either. Only then has it had some value to search the hash move deeper. But if, in the position where you test for singularity, there are at least two moves that give a beta cutoff, then even if the first one fails the second one may still take over, so in that case there is little point in a singular extension. Basically this means you do not need a margin at all; theoretically it makes more sense just to be sure there are two moves that fail high against beta. The next factor is that the null-window results of the singularity test degrade very quickly, like all null-window searches. That the null-window result against the best move is better than some margin is not worth very much in precision, because you use very shallow searches (typically half depth), but also because if these go deeper the lines become less accurate (the search 'degrades', so to speak, and should drop further below beta if it cannot reach beta in the first place because of imprecise moves), and the hash result itself is not fixed, so you test against a moving target.

The larger margin then is there because you would still like to use this very imprecise null-window result even after it has degraded. But other than being able to use degraded null-window searches, I don't think the margin has a real meaning; the tuning depends more on how fast your null-window searches degrade in accuracy, not on some actual distance to beta or, even more imprecise, the (distance to the) hash result of the stored beta cutoff.

Regards, Eelco
I'm not completely following your explanation about degraded nullwindow searches, but my own interpretation of it would be that if you reduce the search depth (for the singular test) by a percentage of depth remaining rather than by a constant number of plies, you need a larger margin due to the increased gap between singular depth and actual depth. Is that what you are saying? If so it sounds right.
What do you (and others reading this) think about the question of whether the margin should be from beta, from the hash score, or from some function of both?
Yes, I think that is another rationale for an increased margin. But even assuming the margin does work this way, at least for Stockfish (and I am pretty sure this has been tested, although I have no idea what other variations it was tested against), I am just not so sure that why we think we are doing this is actually the reason it works. I have posted about this before; maybe I am not expressing myself very clearly now, but I am trying to say it in different words. In my opinion this is not just about extending the singular move but about improving the quality of the tree, in case you have to fall back on another move than the singular one. For instance, if you use a half-depth search it is surely very imprecise, as Bob is saying, but it is in line with what you do in Internal Iterative Deepening, and it improves the quality of the search tree behind the moves that are not the singular move, which pays off if the singular move fails low against beta.

Also, if you do not test against beta but against the value of the hash move, some reasoning for that is possible too. At very small depths the result returned by a null-window search against beta is just a lower bound, assuming it is a cut node and the hash move fails high (>= beta), but it is also reasonably correlated with the static evaluation, because the depth is so small. So even if the static evaluation has some systematic error, the same systematic error is probably also present after a small search from the other moves in that node. So to find a move that is positionally close to the best move found so far in this CUT node, it is probably better not to test against beta but against something closer to a real positional evaluation (in this case the hash result), even if, alpha-beta technically, it is just a lower bound above beta. After eight plies or so, I fear this no longer works; I can't see why it should. The null-window test against beta should lose its correlation with the static eval of the position. Does that make any sense?
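
On the question of where the margin should be measured from, the two candidates reduce to two ways of building the exclusion-search window. Here is a tiny sketch of both (assumed names and a made-up depth scaling, purely for illustration).

Code: Select all

// Two ways to anchor the null window of the singularity test (sketch only;
// the 2*depth scaling and the struct/function names are assumptions).
#include <cstdio>

using Value = int;
using Depth = int;

struct SingularWindow { Value alpha, beta; };

// Margin measured from the stored hash score of the fail-high move.
SingularWindow windowFromHashScore(Value ttValue, Depth depth) {
    Value margin = 2 * depth;
    return { ttValue - margin - 1, ttValue - margin };
}

// Margin measured from beta itself.
SingularWindow windowFromBeta(Value beta, Depth depth) {
    Value margin = 2 * depth;
    return { beta - margin - 1, beta - margin };
}

int main() {
    SingularWindow a = windowFromHashScore(120, 10);   // e.g. hash score 120 cp at depth 10
    SingularWindow b = windowFromBeta(80, 10);         // e.g. beta 80 cp at depth 10
    std::printf("from hash score: (%d, %d)\n", a.alpha, a.beta);
    std::printf("from beta      : (%d, %d)\n", b.alpha, b.beta);
    return 0;
}

The argument above favors the first form at small depths, where the shallow exclusion search still correlates with the static evaluation, with the caveat that this correlation is not expected to survive much beyond eight plies or so.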

There seem to be several things in play here, which is why I am not so sure this is just about extending the singular move. Also, it has to be tested, but I don't believe the way the singular extension is done now, at least in Stockfish, is optimal. You just have to test it, and if what I am saying does not work, it is simply yet another theory fit for the dustbin.

Eelco
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
-- Brian W. Kernighan