A new algorithm accounting for the uncertainty in eval funcs

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

tholden
Posts: 15
Joined: Thu Oct 23, 2014 1:46 pm

Re: A new algorithm accounting for the uncertainty in eval f

Post by tholden »

Maybe I confused things by saying "game lattice". I meant "game directed acyclic graph". (DAGs can always be made into lattices in the order theoretic sense, by adding extra terminal nodes.)
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: A new algorithm accounting for the uncertainty in eval f

Post by AlvaroBegue »

tholden wrote:Maybe I confused things by saying "game lattice". I meant "game directed acyclic graph". (DAGs can always be made into lattices in the order theoretic sense, by adding extra terminal nodes.)
How does that work with this game DAG with root at `a'? How do you turn that into a lattice?

Code: Select all

  d     e
  |\   /|
  | \ / |
  |  X  | <-- [The "X" is not a node]
  | / \ |
  |/   \|
  b     c
   \   /
    \ /
     a

tholden
Posts: 15
Joined: Thu Oct 23, 2014 1:46 pm

Re: A new algorithm accounting for the uncertainty in eval f

Post by tholden »

This is somewhat off topic... I really just meant DAG, and the (wrong!) comment was just by way of excuse for my terminological confusion... But you're right. For some reason I was under the impression that all DAGs were semi-lattices, but this isn't true. In your example, d and e have three lower bounds, but no greatest lower bound. Sorry! So turning your game into a lattice would require adding both a terminal node and your X as a node, which rather destroys the structure.

I should remind myself of definitions before using terms...
tholden
Posts: 15
Joined: Thu Oct 23, 2014 1:46 pm

Re: A new algorithm accounting for the uncertainty in eval f

Post by tholden »

Returning to topic: having read those Baum and Smith papers, I see that using a DAG rather than a tree would cause some difficulties I had not previously considered, since it introduces non-independence between the random events whose probabilities are being combined.

The practical importance of this will be unclear until we implement the algorithm; my hunch is that it would not have a massive impact on play. However, if it turned out to be important, one possibility would be to follow Baum and Smith (1998) in assuming independence between sufficiently remote branches while solving exactly on small sub-trees. Another would be to ignore the effect completely during search, but then perform a final Monte Carlo step (sampling from the probabilities at the leaf nodes) in order to choose the move with the highest expected value at the root.
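
To make the second option more concrete, here is a minimal sketch of that final Monte Carlo step (Python; the Node class, the p_win field and the function names are purely illustrative and not from the doc, draws are ignored for simplicity, and each leaf just carries an estimated win probability for the player to move at the root):

Code: Select all

import random

class Node:
    def __init__(self, children=None, p_win=None):
        self.children = children or []   # empty list for leaf nodes
        self.p_win = p_win               # leaves only: estimated P(win) for the root player

def draw_leaf_outcomes(node, drawn):
    """One joint sample: assign win (1) or loss (0) to every leaf.
    Shared leaves in the DAG are sampled only once per iteration."""
    if id(node) in drawn:
        return
    if not node.children:
        drawn[id(node)] = 1 if random.random() < node.p_win else 0
    else:
        drawn[id(node)] = None           # mark internal node as visited
        for child in node.children:
            draw_leaf_outcomes(child, drawn)

def backup(node, drawn, maximizing):
    """Propagate the sampled leaf outcomes back towards the root by minimax."""
    if not node.children:
        return drawn[id(node)]
    values = [backup(child, drawn, not maximizing) for child in node.children]
    return max(values) if maximizing else min(values)

def choose_root_move(root, samples=1000):
    """Pick the root move whose child has the highest average sampled value."""
    totals = [0.0] * len(root.children)
    for _ in range(samples):
        drawn = {}
        draw_leaf_outcomes(root, drawn)
        for i, child in enumerate(root.children):
            totals[i] += backup(child, drawn, maximizing=False)
    return max(range(len(root.children)), key=lambda i: totals[i])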
tholden
Posts: 15
Joined: Thu Oct 23, 2014 1:46 pm

Re: A new algorithm accounting for the uncertainty in eval f

Post by tholden »

Actually, the proposed random search algorithm means the DAG problem could be solved even during the search stage, at much lower cost than for the Baum and Smith algorithm.

To choose where to search, take a single sample from the distribution at every leaf node (i.e. an assignment of W, L or D to each leaf). Propagate this back to the root. Then, starting from the root, at each node choose uniformly at random from amongst the weakly dominant child nodes (usually the child nodes marked W), and set that child to be the new current node.

The algorithm given in the doc for choosing where to search was of complexity proportional to the height of the DAG times its branching factor, whereas this is proportional to the number of nodes in the DAG, so it's potentially quite a bit slower. However, this may be ameliorated by using the same sample from the leaf nodes for many explorations of the tree. In the extreme version of this, we would explore every weakly dominant path for each sample from the leaves.
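
Here is a minimal sketch of that descent step, assuming the same kind of node objects as before and a helper value_of(node) (illustrative, not from the doc) that returns the backed-up 0/1 value for the current leaf sample, from the root player's perspective:

Code: Select all

import random

def descend_to_expand(root, value_of):
    """Walk down from the root, at each node choosing uniformly at random
    amongst the weakly dominant children (those achieving the node's
    backed-up sampled value), until a leaf is reached; expand that leaf."""
    node, maximizing = root, True
    while node.children:
        best = max if maximizing else min
        target = best(value_of(child) for child in node.children)
        dominant = [c for c in node.children if value_of(c) == target]
        node = random.choice(dominant)
        maximizing = not maximizing
    return node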
Hrvoje Horvatic
Posts: 22
Joined: Mon Nov 10, 2014 1:53 pm

Re: A new algorithm accounting for the uncertainty in eval f

Post by Hrvoje Horvatic »

Hello Tom,

I've been trying to beat alpha-beta/PVS for ages, so I'm sympathetic to your attempt... but as I see it, it will fail for the same reason as the approach of Baum & Smith... because chess is BOTH a strategic AND a tactical game... the multiplication propagation/backup rule (similar to the probability distribution propagation of B&S) is basically suited to strategic games like go, but in my opinion it will fail in chess...

I'm ignoring for the moment that it is computationally more expensive to do probability multiplications than simple integer minimaxing, because if the method worked, it would pay for itself in better search decisions... so I will focus on the main problem - tactics...

Let's contrast strategy with tactics to get this straight... in strategic analysis, if a position contains many bad moves it is a bad position, and if it contains many good moves it is a good position... you "average" move scores between different positions to decide which position is better... one way this "averaging" could be done is the multiplication rule... what this actually means is that all the moves influence the decision... but in tactical positions you very often end up in situations where all the moves except one are bad - but that one is the winning move... and "averaging" that move with the rest of the bad moves will trash the score...

..just my 50 cents...

I still believe there has to be "a middle way" between single-minded "choose one best move" of minimax and "averaging" across all moves in a manner similar to what you say, but I also believe that if you pit alpha-beta against "probability based" propagation rules, it will win every time...

Of course, maybe I have misread your paper, maybe you have somehow already solved that problem, and I just don't get it... :twisted:
tholden
Posts: 15
Joined: Thu Oct 23, 2014 1:46 pm

Re: A new algorithm accounting for the uncertainty in eval f

Post by tholden »

The expense of multiplication seems like a non-issue to me. Time spent updating probabilities will be a tiny proportion of total search time, and by avoiding branch instructions, updating via multiplication can actually be faster on some hardware than updating via a max. In any case, if the variant of the algorithm for DAGs were used, then no probabilities would ever need to be updated.

I'm also not sure it's fair to say that my algorithm would unduly penalise situations with one good move and many bad ones. It is easy to see that the probability of a win at a parent (where it is our move) must be at least as great as the probability of a win at any of its child nodes, so nodes with a strong child will themselves be strong. Of course, they would be stronger still if they had several strong children, but this is as it should be. The probability update law does not average probabilities; rather, it is just the probabilistic version of choosing the maximum/minimum.
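
For concreteness, at a node where we are to move, the rule under the independence assumption is the probabilistic analogue of taking a maximum. A small sketch with made-up numbers:

Code: Select all

# Probabilistic analogue of taking the maximum, assuming independent
# child outcomes: the parent is a win unless every child is a loss.
def p_win_parent(child_win_probs):
    p_all_children_lose = 1.0
    for p in child_win_probs:
        p_all_children_lose *= (1.0 - p)
    return 1.0 - p_all_children_lose

# One strong child keeps the parent strong; the rule does not average.
print(p_win_parent([0.9, 0.05, 0.05]))   # ~0.91
print(sum([0.9, 0.05, 0.05]) / 3)        # ~0.33, what naive averaging would give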

A related worry is that the algorithm will never explore nodes with only one good child. This seems unlikely: a well-calibrated evaluation function will almost never return a high probability that a node is a loss, so with the weak-dominance search heuristic, many nodes will be explored, at least to moderate depths. Further exploration could, if needed, be obtained by applying the weak-dominance heuristic at nodes off the principal variation.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: A new algorithm accounting for the uncertainty in eval f

Post by AlvaroBegue »

I finally had the time to carefully read your proposal, and I think the propagation rule you propose is not workable.

In order to make things a bit easier, let's try to apply your algorithm to the game of go, with non-integer komi (meaning, there are no draws). For a position that has been expanded to depth 1, the propagation rule says the position will be a win as long as one of the moves leads to a win. If there are 300 moves available and the probability of winning with any one of them is 1%, the propagated probability according to minimax is 1%, but according to your rule it's 95%. This seems pretty absurd and can't possibly work.

The idea of deciding where to expand the tree by treating the problem as a multi-armed bandit is precisely the standard technique for MCTS. What you seem to be proposing is a hybrid that would grow a tree in MCTS fashion, use an evaluation function to evaluate the leaves (instead of a random playout), and then use a messed-up rule to back up scores.

I don't mean to sound too harsh: I like how you want to think about the big picture instead of following the orthodoxy, but I think your propagation rule is particularly broken.
tholden
Posts: 15
Joined: Thu Oct 23, 2014 1:46 pm

Re: A new algorithm accounting for the uncertainty in eval f

Post by tholden »

I'm not sure what exactly you're disagreeing with. If a single event (losing after a particular move) has probability 99%, then the probability of that event happening 300 times in 300 independent trials is 99%^300 (i.e. around 5%). As I presume you know, this isn't "my propagation rule"; these are just the laws of probability. Thus the probability of at least one win in those trials is 95%.
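
A quick check of the arithmetic:

Code: Select all

p_all_300_lose = 0.99 ** 300             # ~0.049, i.e. around 5%
p_at_least_one_win = 1 - p_all_300_lose  # ~0.951, i.e. around 95%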

Thus, if the independence assumption is reasonable, then it is a consequence of my algorithm that moves enabling more degrees of freedom later on should be preferred. But this is a common prescription, as I understand it, with many chess evaluation functions incorporating a mobility (degrees-of-freedom) term.

You may perhaps be disagreeing with the independence assumption. I only know the basics of Go, but I would expect there to be more correlation in the errors of Go evaluation functions than in chess ones. If the output of the evaluation function only changes a little with each move, then much of the uncertainty in the quality of each of those 300 child nodes may be driven by uncertainty about the quality of the parent, meaning it may be much more likely that either all 300 nodes are losses or all 300 nodes are wins. I suggested one way of dealing with non-independence in footnote 4 of the doc. Perhaps this might be more important for Go.

It is an interesting question whether there's a less ad hoc way of dealing with non-independence than that given in footnote 4. Perhaps something could be done using the evaluation function at the parent node, and the laws of conditional probability. I need to think a bit.

Generally, though, I would hope that problems caused by non-independence can be resolved by searching to greater depths.

I'm happy to hear that the multi-armed bandit approach is used in MCTS. I'm less crazy than I thought!
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: A new algorithm accounting for the uncertainty in eval f

Post by AlvaroBegue »

What I am disagreeing with is that it's a good idea to give a 95% winning probability to a position where whatever you do you only have a 1% probability of winning. Chances are the position is too desperate and what you do next is irrelevant.

This is not particular to go: in a closed position with inferior material in chess, you'll see that whatever you do, you are just waiting for the opponent to break the pawn barrier and destroy you, and you may have many moves that accomplish nothing.

If anything, the minimax rule already gives a depth-1 value that is too optimistic, because if there is some noise in the evaluation function, you'll pick the score that happens to get the most advantageous noise. I feel the mechanism you propose for backing up probabilities has a lot more of that problem. When you then compare moves that have been explored to different depths, the scores will tell you more about the parity of the subtree than about the position.

I hadn't seen footnote 4, but now that I have read it, I am not sure I understand it.