Null Move in Quiescent search

mvk · Post by **mvk** » Thu Dec 10, 2015 9:19 pm

In PDS with 2 moments, variation/sigma always goes down with deeper search, even in the fully uncorrelated back propagation model, as long as the node is not singular (in which case sigma stays the same). At least, it won't pathologically go up when expanding the tree. For correlated back propagation, if such a thing is even possible, I think it should be similar, but I haven't thought it through yet. My hunch there is that unaccounted correlation can only invoke a false sense of independence, and with that a too fast reduction of sigma, but still no increase. Therefore I wonder if there are pathologies in artificial trees under back propagation of probability distributions of higher orders than what minimax does (which in a way is nothing more than a back propagation of PDs that have only the first moment). That would be interesting to check.

I actually think that sigma should always go down (i.e.: accuracy improves in every game) with deeper search as long as the evaluation has some predictive value. And then it doesn't matter if you back propagate one order with minimax or more with another method. In Nau's trees, indeed maybe the real problem is just that the evaluations are not predictions of the search to begin with?

[Edit: I just found this article]
http://www.sciencedirect.com/science/ar ... 0206000117

Luštrek e.a. wrote:Deeper searches in game-playing programs relying on the minimax principle generally produce better results. Theoretical analyses, however, suggest that in many cases minimaxing amplifies the noise introduced by the heuristic function used to evaluate the leaves of the game tree, leading to what is known as pathological behavior, where deeper searches produce worse results. In most minimax models analyzed in previous research, positions’ true values and sometimes also heuristic values were only losses and wins. In contrast to this, a model is proposed in this paper that uses real numbers for both true and heuristic values. This model did not behave pathologically in the experiments performed. The mechanism that causes deeper searches to produce better evaluations is explained. A comparison with chess is made, indicating that the model realistically reflects position evaluations in chess-playing programs. Conditions under which the pathology might appear in a real-value model are also examined. The essential difference between our real-value model and the common two-value model, which causes the pathology in the two-value model, is identified. Most previous research reports that the pathology tends to disappear when there are dependences between the values of sibling nodes in a game tree. In this paper, another explanation is presented which indicates that in the two-value models the error of the heuristic evaluation was not modeled realistically.

hgm · Post by **hgm** » Thu Dec 10, 2015 9:44 pm

OK, so it is apparently widely recognized that the problem is not so much with minimax as well with the tree model. It is true that fixed-depth minimax is not the most stable search algorithm. E.g. singular extension seems to solve the problem even for the rather pathological tree model used by Beal.

Funny thing is that mini-Shogi might actually be a game where minimax has problems. Most moves seem to be singular there.

Michel · Post by **Michel** » Fri Dec 11, 2015 9:59 am

I am also aware of the fact that it is possible to make synthetic game trees which do not (appear to) show pathology in experiments. One can even make plausible sounding a posteriori explanations why this is so.

However for a satisfying (scientific) theory one would like to be able to make testable predictions about the boundary between pathological/non-pathological behaviour. I think this has not really been achieved yet.

@Marcel
The pathology Nau observed is with a predictive static evaluation function. The pathology is caused purely by the minimax search. Anyway it is good to learn that PD search does not suffer from pathology.

hgm · Post by **hgm** » Fri Dec 11, 2015 11:56 am

To me your first remark sounds a bit like putting the world upside-down. No explanations, be it a posteriori or otherwise, are needed at all for anything. It is completely obvious that you can make trees that obey any statistics you choose. Just start with a root and expand all the nodes according to the statistics you wanted.

If you want every node to have b children, and the probability that a won node has n = 1, ..., b lost children equal to f_n, the fraction of errors remaining after two levels of minimaxing is b*(f_1*E + f_2*E*E + ... + f_b*N^b), if E is the original error rate << 1. In the limit of low error rate this becomes pathological when b*f_1 > 1. When f_1=0 it would even be quadratically convergent.

What needs an explanation is why the f_n should have the form of a binomial distribution, as if the probabilities that sibbling nodes are won should be independent of each other. That is a very special property, and a highly unnatural one at that. I wonder if there would exist any game that behaves that way. It is a very strict requirement, from which all kind of things follow that will not be obeyed in practice. Like this relation between branching ratio and fraction of lost positions. In real live these are also independent parameters.

It would be interesting to see what kind of statistics apply to trees of Chess-like games. E.g. one could analyze a tablebase to determine the f_n.

Michel · Post by **Michel** » Sat Dec 12, 2015 11:29 am

Nice!

I seem to get somewhat more complicated formulas but the first order approximation is the same. The error rate is multiplied by b*f_1 after two rounds of minimaxing. So indeed a clear threshold between pathology and non-pathology.

The model is highly simplified but it appears to give an indication that singular moves put a limit on the effectivity of minimax, regardless of the quality of the evaluation function. I was not aware of that.

PS. Why do you consider a binomial distribution strange for the f_n? A binomial distribution is just a bell shaped curve so it seems rather natural. It does not imply that sibling nodes are independent.

I am more worried about the assumption that the errors are independent. If an evaluation function makes a certain error it is likely to make the same error in sibling nodes.

hgm · Post by **hgm** » Sat Dec 12, 2015 12:54 pm

After some more thinking I started to realize that the whole tree model used by Beal is pathologic. The fraction of lost positions (side-to-move point of view) k is in equilibrium if it obeys (1-k)^b = k. But it is not a stable equilibrium! If you work your way down from the leaves with the recursion

k_(d+1) = (1 - k_d)^b = F(k_d)

where k_d is the fraction of lost positions at remaining depth d, it diverges away from the equilibrium, because dF/dk < -1 at the equilibrium point K:

F'(K) = -b*(1-K)^(b-1) = -b*F(K)/(1-K) = -b*K/(1-K)

Now b = ln(K)/ln(1-K), so F'(K) = -K*ln(K)/((1-K)*ln(1-K)), which is < -1 for 0 < K < 1/2 due to the assymmetry of x*ln(x), which starts steeper at x=0 and reaches its maximum at x = 1/e. So only for b=1, where K = 0.5, you would reach the stability margin.

Now the factor b*K/(1-K) happens to be exactly the same factor as which the errors blow up per level of minimaxing. So I think this just reflects the general instability of the tree. The slightest deviation from the equilibrium fraction of k at the leaves would exponentially amplify towards the root, and jump around (chaotically?) when it gets of order 1. Starting with random result assignment at the leaves is just not a viable prescription for generating a homogeneous tree.

D Sceviour · Post by **D Sceviour** » Sat Dec 12, 2015 3:21 pm

hgm wrote:
Michel wrote:Why minimax search works so well in chess like games remains somewhat of a mystery (not solved in this thesis AFAICS).
I don't think it is that much of a mystery. The thesis uses a tree model where the probability that a node is a win or a loss is independent for all nodes. This leads to a rather high fraction of singular moves. E.g. to have 1 out of every 16 nodes a loss for the side to move requires a branching ratio b = ln(1/16)/ln(15/16) = 43 (which is reasonable for Chess). But the fraction of won positions where the winning move is singular is b*(1/15)^2 = 19%.

In Chess singular moves are much rarer than that. One factor contributing to this is that the the probability that a node is a win is not independent. The game-theoretical result of especially sibbling nodes will be highly correlated. Many positions are quiet, and most moves then won't affect the result much, so that many of the daughter nodes will inherit the result from the parent.

I agree. My empirical estimate is singularity occurs less than 2%. I wish I could contribute further but it has been too long since I calculated stability functions.

Michel · Post by **Michel** » Sat Dec 12, 2015 5:37 pm

hgm wrote:After some more thinking I started to realize that the whole tree model used by Beal is pathologic. The fraction of lost positions (side-to-move point of view) k is in equilibrium if it obeys (1-k)^b = k. But it is not a stable equilibrium! If you work your way down from the leaves with the recursion

k_(d+1) = (1 - k_d)^b = F(k_d)

where k_d is the fraction of lost positions at remaining depth d, it diverges away from the equilibrium, because dF/dk < -1 at the equilibrium point K:

F'(K) = -b*(1-K)^(b-1) = -b*F(K)/(1-K) = -b*K/(1-K)

Now b = ln(K)/ln(1-K), so F'(K) = -K*ln(K)/((1-K)*ln(1-K)), which is < -1 for 0 < K < 1/2 due to the assymmetry of x*ln(x), which starts steeper at x=0 and reaches its maximum at x = 1/e. So only for b=1, where K = 0.5, you would reach the stability margin.

Now the factor b*K/(1-K) happens to be exactly the same factor as which the errors blow up per level of minimaxing. So I think this just reflects the general instability of the tree. The slightest deviation from the equilibrium fraction of k at the leaves would exponentially amplify towards the root, and jump around (chaotically?) when it gets of order 1. Starting with random result assignment at the leaves is just not a viable prescription for generating a homogeneous tree.

Yes I have seen computations like that before. If you generate a tree by assigning a certain win probability to leaf nodes then in the limit (preserving parity of depth) the root position is either always won or always lost depending on a certain threshold.

Generating from the root appears to give more natural results. But theoretically it seems a bit weird.

hgm · Post by **hgm** » Sun Dec 13, 2015 9:34 am

To me it is generating random outcomes at the leaves that seems wierd. Like all preceding moves did not have any effect on the outcome at all. You play a number of moves, and at the end you flip a coin to determine the outcome, and the probability does depend in no way on what you played. The moves only serve to alter your luck. What kind of game is that?

As to the binomial distribution:
If you suppose all sibbling nodes are equivalent, i.e. that the frequency of ++-++ is equal to that of -++++, +-+++ etc., then a binomial distribution fixes the frequency of any individual outcome in a way to make them independent. Only if you assume non-equivalence you could still get a binomial outcome with dependent win probabilities. E.g. that a + implies all its 'older' sibblings must also be +.

Michel · Post by **Michel** » Sun Dec 13, 2015 10:41 am

hgm wrote:To me it is generating random outcomes at the leaves that seems wierd. Like all preceding moves did not have any effect on the outcome at all. You play a number of moves, and at the end you flip a coin to determine the outcome, and the probability does depend in no way on what you played. The moves only serve to alter your luck. What kind of game is that?

I agree that the leaf outcomes should not be independent. But ultimately the value at the root is determined by the value at the leaves, not the other way round.

Of course this has the effect that the leaf values depend conditionally on the root value (in a probabilistic sense). This is why generating from the root may make sense.

hgm wrote: As to the binomial distribution:
If you suppose all sibbling nodes are equivalent, i.e. that the frequency of ++-++ is equal to that of -++++, +-+++ etc., then a binomial distribution fixes the frequency of any individual outcome in a way to make them independent. Only if you assume non-equivalence you could still get a binomial outcome with dependent win probabilities. E.g. that a + implies all its 'older' sibblings must also be +.

Your model did not include such an ordering!

Concerning the model: it seems to yield the counter intuitive conclusion that in tactical situations (lots of singular moves) one should not search at all (actually depth 1 to get a move) and in quiet positions one should search as deeply possible.

One can probably fix this by including hard wins/losses (E=0). I recall reading an article once which showed that enough hard scores also have a negative effect on pathology (this is intuitively clear, if all scores are hard then there is no pathology).

Null Move in Quiescent search

Re: Null Move in Quiescent search

Re: Null Move in Quiescent search

Re: Null Move in Quiescent search

Re: Null Move in Quiescent search

Re: Null Move in Quiescent search

Re: Null Move in Quiescent search

Re: Null Move in Quiescent search

Re: Null Move in Quiescent search

Re: Null Move in Quiescent search

Re: Null Move in Quiescent search