beyond minimax

hgm · Post by **hgm** » Fri Apr 27, 2007 11:28 pm

Minimax is the theoretically perfect way to decide upon a move in a zero-sum games, provided that the scores in the end leaves are exact. Fortunately, in Chess only the evaluations of checkmates and legal draws are exact, and search trees are seldomly deep enough that their leaves consist entirely of those. It is not obvious that minimax is also a correct algorithm when the scores at the nodes are inaccurate guesses.

Indeed in practice minimax has a few obvious shortcomings, that can be traced to this:
1) Pure minimax doesn't care about the length of the path. The winner is in no hurry to get to the win, as long as he remains able to enforce it later. The loser sees no reason to prolong the game.
2) Minimax is quite arrogant, and sees no reason for avoiding avoidable risks. If it can reach a desirable end leaf through a path A->B->C, where all other positions reachable from B are immediately lost, and there is another line A->B'->C where from B' there are also moves to positions C', C", ... that are only slightly worse than C, it sees no reason to prefer B' over B.

The latter is of course exceedingly stupid: if, once in B, we can search further ahead beyond C, and the score at C drops a lot because of this, there is no way to avoid swallowing the bitter pill. If in B' we would have discovered that C turns bad, we still have the option to move to C' or C", which are unlikely to all suffer the same score drop as C. Yet, minimax would assign the moves A->B and A->B' exactly equal scores, namey the evaluation of C, and the choice between the two would depend on random factors. One should never waive free insurance...

It is not so difficult to adapt minimax to take account of the path length or insurance effects. The score of a node in pure minimax is the score of the best move in that node. We could easily derive that score differently, so that the second-best, and later moves are not entirely ignored. An obvious mathematical generalization would be to replace the maximum of N moves by the Nth root of the sum of the Nth powers of all the moves. For N->infinity, this is the same as the maximum, as the largest score dominates the sum more and more if N get higher. The other extreme, N=1, would take the sum of all the move scores to make the node score, i.e. all moves would weigh equally no matter how low their score. (Probably not a good idea for Chess, unless your opponent has the level of a random move generator...)

Now such methods of score propagation towards the root strongly interfere with alpha-beta pruning. But there are plenty of alternatives to just a plain maximum that do not hurt alpha-beta pruning too much, but do allow some tinkering with the score of the best move, possibly based on scores of other moves, to make the score of the node. The method I have been using for a long time to make the winner prefer the shortest path to a win is one of them: I add a bonus to the score of the best move if this score is negative, to make the score of the node. The losing side is then rewarded for doing moves, offering it an incentive to fight the loss as hard and long as it can.

Similarly, one could add a small bonus to the score of the best move if there were other moves that had an equal, or nearly equal score. The more moves that are nearly as good as the best the higher the bonus, so that it will prefer the path that offers the most alternatives if the original goal turns sour. These bonuses could be infinitesimal, meaning they would only act as a tie breaker to otherwise truly equal scoring positions (in practice most-likely the same position, then). Or you could be more generous, making the program willing to compromise a little between best score and certaintty that it will realize it.

Of course the search window will have to be pre-compensated for the later tinkering: it must never be possible that a move score that was out-of-window in the search of the move (so that it is merely a bound) will be brought in window by the tinkering (as it would be interpreted as if it were exact, then). So if you are going to add 1 point to the score of the move afterwards, you should subtract it from the window with which the move is searched first. The fail-low and fail-high regions of the move searched will than be nicely mapped back to outside the original window for the node. If only negative scores will get one point added, you have subtract 1 only from those window limits that were negative.

Similarly, you could for instance subtract a 1-point penalty from a move that is the only good move. In that case you would have to shift the window for search of the move by 1 point upward. But as you are not certain that you will have to give the penalty, you need an accurate score in the original window as well. So the total window where you need an exact score would only have beta shifted up by one.

If you get a beta cutoff even with this up-shifted beta, you are done. If you get a score equal to the original beta, it is now not a cutoff, as you might have to subtract one point of the score. So you search on, but as soon as you find a move that is equal in score (again the original beta) you now know that there are two moves with equal scores, and you will not have to subtract the single-good-move penalty. So you shift back beta to its original value, and take the beta cutoff after all. You could even decide to do that if your second move approaches the best to within a certain limit (e.g. 20 cP), provided that alpha is enough below beta to see that.

Another trick you could play is make the node score dependent on cuurent evaluation, as well as best move, to discourage the planning of risky pathways through 'bad neighborhoods': subtract a small penalty from the best-move score if the current eval is bad (or the current QS score). As the eval is known before you have to search moves, the search window is easily adapted to this.

Dan Andersson · Post by **Dan Andersson** » Sat Apr 28, 2007 1:43 am

Have you looked at UCT? It is being utilized in some new Go programs in conjunction with Monte-Carlo evaluation. A bit memory intensive since it stores a tree in memory.

http://en.wikipedia.org/wiki/Upper_Conf ... _%28UCT%29
http://hal.inria.fr/inria-00117266
And a new analysis of Bandit type searches:
http://hal.inria.fr/inria-00136198

MvH Dan Andersson

hgm · Post by **hgm** » Sat Apr 28, 2007 10:41 am

Hmm, It sounds interesting, although it seems a different pproach from what I propose here. I would not have to store anything.

The term 'Monte Carlo' does come up a number of times, and this reminds me of another idea I had to implement the 'emergency exit' idea without much hassle:

You could add a small pseudo-random number to the score of each move, before taking the maximum, but otherwise do completely regular minimax. If the random is on the order of the noise that you will typically have in the evaluation (noise being defined as the difference between the evaluation of a quiet node and its score on a deep search), but slightly smaller, it will hardly affect the accuracy of the program. But in a node where you have many moves with approximately best score, the highest random bonus is likely to be higher than in a node where there is only a single best move. So the search would be statistically drawn to a path that goes through nodes that provide many alternatives.

Even if it was drawn by a side branch that was made PV only by virtue of a lucky random bonus, by the time you get there, you will search deeper, and if the true best line really is better, you would see it at that point, and switch back to it.

Of course the pseudo-randoms used should not be really random: to get consistent hashing, they would have to be the same each time you revisit the same node. But that can be achieved easily enough by deriving the pseudo-random from the hash key of that node, and the From and To squares of the move (e.g. XOR all that and thake the low bits as an index in some small table). I already do something similar to randomize the move choice, in the evaluation (add a random derived from the hash key of the evaluated position).

beyond minimax

beyond minimax

Re: beyond minimax

Re: beyond minimax