Exactly, that is why averaging is the better planning method in most cases. At first, I didn't realize why moves like Bxh7 and anything that gets major piece closer to opponent king was getting such a high score. Once the opponent makes the somewhat obvious escape move after our optimisitic Bxh7 with a +10 pawns score, the score goes back down to 0 because the escape actually happened on the board.Michel wrote:Daniel Shawul wrote:That is why I do not use averaging as a backup operator for MCTS in chess -- I return the minimax value of the best bandit instead. This gives a more stable search and i think is better tactically as well. A typical example is a position where you are threatening a mate-in-1 but the opponent has one or two escape move that keeps the score at 0. With averaging as a backup operator, the engine would hugely prefer these position as it can get mate scores from all the other non-escape moves. Since Mate scores are very high , even one of them ruins the actual score. My engine with this backup operator exhibits a wildly varying score from move to move. However, with minimax as a backup operator, the engine will keep its score at 0, and the score remains stable like a standard alpha-beta search. Thus, the averaging backup operator is preferred as the better planning method for games dominated with strategy, and the minimax operator for tactical games like chess. Note that the tactical gain i get from using the minimax operator, though significant, is miniscule compared to that from expanding the tree rapidly every visit.Usually the k-armed-bandit problem is formulated with arms that have a benign distribution, like Gaussians with fixed standard deviation, only differing in expectation value. Unfortunately random sampling of a Chess tree is not like that at all. It is more like dealing with bandits where most of the contribution to the average comes from a few outcome with very small probablilty but huge payout. You really have no idea of the average until you have enough visits to representatively sample the unlikely events. Likewise, in Chess you will only get a meaningful value when you have enough visits to discover the tactics that define the main line, and start exploiting that.
I have no idea how they removed the tactical blindness from A0 -- I am currently stuck at TSCP + 100 elo with the MCTS + qsearch approach.
You are obviously much more an expert than I. But I guess the idea is that an escapable mate threat should not be scored as zero (the minimax score) since it is still has a significant value as a threat (restricting the opponent), and moreover there could be nearby lines where the near mate is actually a mate.
I had that to/from a logistic score at first, but it seemed to me that it was an unnecessary intermediate step so now I am using raw scores instead. The stability issue with the minmax operator was still there before i switched to backing up raw scores instead. Though it was more convenient to do so, maybe I shouldn't have removed for the same reasons as why elo scale is needed. The minimax backup returns the score of a single leaf node, so it seems the conversion may not to be needed atleast for this case.BTW. I assume before applying the averaging operator you first convert scores to a winning probability? It seems obvious but I just wanted to make sure.
Daniel