Hi again,
I've been working on a Monte Carlo Tree Search engine for a while now. Initially, I used immediate evaluation in the simulation step, but I recently switched to 2-ply Alpha-Beta searches. I am finding that the Alpha-Beta searches are much better at estimating win probabilities, and they help my MCTS pick up on subtle tactics that it missed before. I believe the playing strength has improved significantly with the new approach, even though the search depth is only between 8 and 20 ply at 20 seconds per move.
I am using the bandit algorithm UCB1, but I'm unsure if it is the right tool for this search. I also wonder what the best policy for selecting a final move is, given that I now expand about 20x fewer nodes than before. I am currently using "robust child."
Does anyone have experience with Monte Carlo Alpha-Beta? If so, what selection policies do you use?
Selection in Monte Carlo Alpha-Beta
Moderator: Ras
-
- Posts: 84
- Joined: Wed Aug 04, 2021 12:42 am
- Full name: Ellie Moore
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Selection in Monte Carlo Alpha-Beta
RedBedHed wrote: ↑Fri Sep 16, 2022 8:45 am
UCB1 is the mathematically perfect solution to "exploration vs. exploitation," so it is a correct answer to the question: this move looks good, but is it really the best one?
UCB1 is guaranteed to converge to the best solution. I could share my implementation, but suffice it to say that it can be computed insanely fast:
Food for thought: the fast inverse-sqrt trick, ln(x) = c * log2(x), and, most important of all, log2(x) for a power of two is just std::countr_zero(x).
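For example, a rough sketch of those tricks (assuming C++20 for <bit>; the function names are mine, not from any engine):
Code: Select all
#include <bit>      // std::countr_zero, std::bit_cast
#include <cmath>
#include <cstdint>

// ln(x) = ln(2) * log2(x); when x is a power of two, log2(x) is just
// the number of trailing zero bits, so no transcendental call at all.
inline double ln_pow2(std::uint64_t x) {
    constexpr double LN2 = 0.6931471805599453;
    return LN2 * static_cast<double>(std::countr_zero(x));
}

// Quake-style fast inverse square root with one Newton-Raphson step.
// Trades a little accuracy for speed, which is fine inside UCB1.
inline float fast_rsqrt(float x) {
    std::uint32_t i = std::bit_cast<std::uint32_t>(x);
    i = 0x5f3759df - (i >> 1);
    float y = std::bit_cast<float>(i);
    return y * (1.5f - 0.5f * x * y * y);
}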
You can also keep a direct pointer to the best leaf node in each node and update it during backpropagation. Selection of a new leaf node can then be done instantly (I found).
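A toy illustration of that cached-pointer idea (not my actual engine code; leafPriority is just a stand-in for whatever priority you order leaves by):
Code: Select all
struct Node {
    Node*  parent   = nullptr;
    Node*  bestLeaf = nullptr;  // cached best descendant leaf
    double value    = 0.0;      // accumulated reward
    int    visits   = 0;
};

// Placeholder priority; a real engine would use its UCB1 score here.
double leafPriority(const Node* n) {
    return n->visits ? n->value / n->visits : 0.0;
}

// Refresh the cached pointer on the way back up the tree, so the next
// selection is a single dereference instead of a full descent.
void backpropagate(Node* leaf, double reward) {
    for (Node* n = leaf; n != nullptr; n = n->parent) {
        n->value  += reward;
        n->visits += 1;
        if (!n->bestLeaf || leafPriority(leaf) > leafPriority(n->bestLeaf))
            n->bestLeaf = leaf;
    }
}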
Stronger selection policies than UCB1 do exist, so convergence can be made faster. But in practice the way this is done is to train a policy network that selects the next node among all candidates. (Just train it so that engine strength increases, that's all.)
Classic MCTS requires a noisy playout to the end of the game, so a single simulation really just returns -1, 0, or 1. You can replace that random playout with a very strong evaluation function (like Lc0 does). So in summary, the strongest MCTS engine is: policy network + evaluation network.
One fact that surprised me: the Lc0 evaluation network alone is so good that with one node visit per move it plays above 2000 Elo.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Selection in Monte Carlo Alpha-Beta
UCB1 is not much good, is it? A tree search in chess is nothing like repeated sampling of a distribution with finite variance.
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Selection in Monte Carlo Alpha-Beta
It's good. UCB1 selects a move depending on whether it leads to a good outcome. A move whose average outcome is good may still be a VERY bad move (one forced losing line among 100 winning ones).
So for the final move you don't select the child with the best average win rate; you select the child with the highest visit count according to UCB1.
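In code, the difference is just which field you maximize at the root (a minimal sketch, assuming a node with reward and visit counters):
Code: Select all
#include <vector>

struct Node {
    double value  = 0.0;  // accumulated reward
    int    visits = 0;
};

// "Robust child": pick the most-visited move, not the best average.
// A high mean with few visits can hide a single forced losing line.
const Node* selectFinalMove(const std::vector<Node>& children) {
    const Node* best = nullptr;
    for (const Node& c : children)
        if (!best || c.visits > best->visits)
            best = &c;
    return best;
}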
Chess is a perfect-information game and exactly the same problem as the https://en.wikipedia.org/wiki/Multi-armed_bandit
This move looks good, but 10 moves later it turns out to be bad. Alpha-Beta solves this; MCTS can solve it too.
The only problem with MCTS is that you have to grow a tree in memory.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 3230
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Selection in Monte Carlo Alpha-Beta
AFAIK you can use MCTS-AB with UCT and UCB1 with a win/draw/loss score, but you can also add a score-based heuristic function for position evaluation and then use PUCT (sketched below):
https://www.chessprogramming.org/Christ ... Rosin#PUCT
then it would look more like a Best-First Minimax Search:
https://www.chessprogramming.org/Best-F ... max_Search
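Roughly, the PUCT score per child looks like this (a sketch following the AlphaZero formulation; the cPuct constant is illustrative, engines tune it):
Code: Select all
#include <cmath>

// PUCT (Rosin / AlphaZero style): the exploration bonus is scaled by
// a prior P(s,a) from a policy (heuristic or network) instead of the
// log(N) term used by UCB1.
double puct(double q,           // mean value of the child, Q(s,a)
            double prior,       // policy prior for the move, P(s,a)
            int parentVisits,   // N(s)
            int childVisits,    // N(s,a)
            double cPuct = 1.5) {
    return q + cPuct * prior
                 * std::sqrt(static_cast<double>(parentVisits))
                 / (1.0 + childVisits);
}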
--
Srdja
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Selection in Monte Carlo Alpha-Beta
Apparently Komodo Dragon's MCTS mode does it like that, too: https://youtu.be/JUQoSVqiavw?t=1689
-
- Posts: 84
- Joined: Wed Aug 04, 2021 12:42 am
- Full name: Ellie Moore
Re: Selection in Monte Carlo Alpha-Beta
dangi12012 wrote: ↑Fri Sep 16, 2022 11:43 am
Woah. Yeah, it is pretty wild that Lc0 and AlphaZero are so strong evaluation-wise. I don't think I could ever hope to train a network like that on my laptop, sadly. :/
I haven't really thought about the speed of UCB1. I have it implemented with std::log and std::sqrt, so I will look into this ASAP.

-
- Posts: 84
- Joined: Wed Aug 04, 2021 12:42 am
- Full name: Ellie Moore
Re: Selection in Monte Carlo Alpha-Beta
Yeah, I was thinking the same thing... But I can confirm that UCB1 does work. My understanding of it is this:
Each node is a multi-armed bandit, and each action is an arm. Rewards are winning probabilities based on normalized evaluations back-propagated up the tree. UCB1 starts by exploring a lot, and it exploits more as nodes accrue visits.
The algorithm is:
Code: Select all
child_value / child_visits + sqrt(2 * ln(parent_visits) / child_visits)
The exploration term is:
Code: Select all
sqrt(2 * ln(parent_visits) / child_visits)
So when the child's visit count is low, UCB1 will likely try it.
The exploitation term is just the winning probability:
Code: Select all
child_value/child_visits
So when child (and parent) visit counts are high, UCB1 will pick the nodes with stronger winning probabilities.
During a search, nodes that have been visited very frequently (closer to the root) are exploited more often, while nodes closer to the leaves still need exploring. But I think the idea is that, given enough time, the search will converge on what minimax would consider the PV line, while expanding only a fraction of the nodes.
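Put together, the selection step looks roughly like this (a simplified sketch with my own names, not my exact code):
Code: Select all
#include <cmath>
#include <limits>
#include <vector>

struct Node {
    double value  = 0.0;  // summed back-propagated win probabilities
    int    visits = 0;
    std::vector<Node> children;
};

// UCB1 selection: mean reward (exploitation) plus a visit-count bonus
// (exploration). Unvisited children score infinity so that each arm
// is tried at least once.
Node* selectChild(Node& parent) {
    Node* best = nullptr;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (Node& c : parent.children) {
        double score = c.visits == 0
            ? std::numeric_limits<double>::infinity()
            : c.value / c.visits
              + std::sqrt(2.0 * std::log(static_cast<double>(parent.visits))
                          / c.visits);
        if (score > bestScore) { bestScore = score; best = &c; }
    }
    return best;
}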
Last edited by RedBedHed on Fri Sep 16, 2022 8:12 pm, edited 1 time in total.
-
- Posts: 84
- Joined: Wed Aug 04, 2021 12:42 am
- Full name: Ellie Moore
Re: Selection in Monte Carlo Alpha-Beta
smatovic wrote: ↑Fri Sep 16, 2022 5:14 pm
Ah okay, so PUCT really is worth trying then... I just wish I understood it, lol. I need to take some probability courses; I've been studying computer science for too long. :'( Do you happen to know where I can find lecture materials related to PUCT?
-
- Posts: 3230
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Selection in Monte Carlo Alpha-Beta
https://www.chessprogramming.org/Christ ... ite_note-4
https://www.chessprogramming.org/Christ ... ite_note-5
https://www.chessprogramming.org/Christ ... ite_note-6
You can take a look at the original paper, at DeepMind's AlphaZero paper, or at the implementation in Lc0, for example.
--
Srdja