You seem to forget that the "innovation" is mostly the neural network they use. There isn't anything special about MCTS itself, but apparently using a neural network + reinforcement learning instead of random playouts works quite well.

syzygy wrote:
Yes, it seems the randomness refers just to the random playouts. But without any randomness left, I don't see what the M and the C are still doing in AlphaZero's MCTS.

AlvaroBegue wrote:
Randomness has never been a requirement in the tree part of the search.

syzygy wrote:
[...] And the randomness of the selection is also quite limited (if not completely absent - the paper states that the edge is selected that maximizes an "upper confidence bound").
The "move probabilities" term in the paper initially suggested to me that the selection of the next node followed those probabilities. Apparently it does not, so why they are called probabilities is still rather unclear to me. (Probably it has something to do with the "confidence" in UCB.)

The earliest description I remember of this type of algorithm is MoGo's UCT search, which picked the move that maximized a formula called UCB1 ("UCB" stands for "Upper Confidence Bound", just like in the recent paper). Actually, I imagine most implementations don't have any random element in the tree part of the playout.
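For reference, here is a minimal sketch of UCB1-style selection (names and data layout are my own, not from any of the papers). Note that the rule is fully deterministic: the only thing resembling randomness is that unvisited children are forced to be tried first.

```python
import math

def ucb1(value_sum, visits, parent_visits, c=1.4):
    """UCB1 score: mean value (exploitation) plus an exploration bonus
    that shrinks as a child accumulates visits."""
    if visits == 0:
        return float("inf")  # always try unvisited children first
    mean = value_sum / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus

def select(children, parent_visits):
    # Deterministically pick the child maximizing the upper confidence bound.
    return max(children, key=lambda ch: ucb1(ch["w"], ch["n"], parent_visits))
```

A rarely-visited child with a mediocre mean can still be selected because its exploration bonus is large; that is where the "upper confidence bound" name comes from.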
To me it feels more like the tree expansion of proof-number search or of book-building algorithms. The innovation in MCTS, at least to me, lay solely in the random playouts (which should not work very well for chess).

AlphaGo initially combined the results of playouts with the evaluation at the leaf of the tree (that was probably its biggest innovation), and then AlphaGo Zero and AlphaZero do away with the playouts altogether, using only an evaluation function. Still, the whole mechanism for growing the tree and for backing up values is the same, so the MCTS label feels like it fits.
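The shared backup mechanism being described can be sketched as follows (a toy illustration of my own; the `Node` layout is hypothetical, and whether the leaf value comes from a random playout or a neural-network evaluation makes no difference to this step):

```python
class Node:
    def __init__(self):
        self.children = {}    # move -> Node
        self.visits = 0
        self.value_sum = 0.0

def backup(path, leaf_value):
    """Propagate the leaf evaluation up the selected path, flipping the
    sign at each ply (a win for one side is a loss for the other)."""
    v = leaf_value
    for node in reversed(path):
        node.visits += 1
        node.value_sum += v
        v = -v
```

Only the source of `leaf_value` changed between classic MCTS and AlphaZero; the tree growth and this backup stay the same, which is why the label arguably still fits.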
Standard MCTS doesn't really work for chess because random playouts aren't any good. The move probabilities (or whatever you want to call them) from the NN provide a kind of "soft pruning": completely stupid moves get a very small probability and therefore won't be visited very often, unlike standard MCTS, which may visit those nodes far more often.
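The soft-pruning effect can be seen in a PUCT-style selection term of the kind the AlphaZero paper describes (this is a simplified sketch with a made-up constant, not the paper's exact formula): the NN prior multiplies the exploration bonus, so a low-prior move starts with almost no incentive to be visited.

```python
import math

def puct_score(q, prior, child_visits, parent_visits, c_puct=1.5):
    """Mean value Q plus an exploration bonus scaled by the NN prior:
    moves the network considers stupid get a tiny bonus and are
    therefore rarely visited (soft pruning)."""
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

# Two unvisited moves with equal Q: the prior alone decides which
# one gets explored first.
good = puct_score(q=0.0, prior=0.4, child_visits=0, parent_visits=100)
bad = puct_score(q=0.0, prior=0.01, child_visits=0, parent_visits=100)
```

Here `good` is 40x larger than `bad`, so the low-prior move is effectively pruned without ever being hard-excluded: if its Q turns out to be high after a few visits, it can still overtake.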