This is a question for Daniel Shawul (but also anyone else who might know...): You mentioned in some posts that a "minmax backup operator" worked better than the averaging operator for you. But you never mentioned exactly what you were doing - could you please elaborate a bit?
I wanted to try something like this, but I'm confused because in the leaf nodes you always have 1 win, loss or draw, and if you back up maximal/minimal scores, don't you always end up with 1/0/-1 and nothing in between? What am I missing?
cheers
Martin
MCTS with minmax backup operator?
Moderator: Ras
-
- Posts: 72
- Joined: Mon Mar 07, 2016 4:41 pm
- Location: Zürich, Switzerland
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MCTS with minmax backup operator?
I keep actual scores in centipawns but that is irrelvant to the minmax/averaging stuff.fierz wrote:This is a question for Daniel Shawul (but also anyone else who might know...): You mentioned in some posts that a "minmax backup operator" worked better than the averaging operator for you. But you never mentioned exactly what you were doing - could you please elaborate a bit?
I wanted to try something like this, but I'm confused because in the leaf nodes you always have 1 win, loss or draw, and if you back up maximal/minimal scores, don't you always end up with 1/0/-1 and nothing in between? What am I missing?
cheers
Martin
In the internal nodes, the score of a node is calculated from its children. Given childrens with scores of say [1,-1,-1,-1,0,0,], the minmax methods picks the first child and assigns a score of +1 to the parent node; the averaging method assigns a score of -1/3 to the parent.
Now you could also do the minmax and averaging by considering results from previous searches (expansions in case of MCTS). Infact, what AlphaZero does is to keep a single averaged score from different searches. So if the node in the above example had an old score of +1 from 1 previous visit, the average score will be (-1/3+1)/2. You could do the same with the minimax score as well; average it with the previous minimax score. So this one is similar to averaging scores from different iterative depths ...
The result for me clearly demonstrates that any kind of averaging deterioriates elo, except for the case of "avg" and "avg-old" (look below for definitions).
Code: Select all
1. scorpio-mcts-min 425 35 35 687 84.9% -69 14.8%
2. scorpio-mcts-mix 321 32 32 687 77.7% -52 22.6%
3. scorpio-mcts-ab 284 31 31 687 73.7% -46 16.6%
4. scorpio-mcts-min-old -58 30 30 688 42.3% 10 20.6%
5. scorpio-mcts-mix-old -102 31 31 687 38.6% 16 20.5%
6. scorpio-mcts-avg-old -194 32 32 687 30.0% 32 23.3%
7. scorpio-mcts-avg -672 60 60 687 2.9% 111 4.7%
mix = 3:1 ration minmax averaging
avg = Averaging
min-old = Averages current minmax score with previous minmax score
avg-old = Averages current average score with previous average score ( AlphaZero uses this)
mix-od = Averages current mix score with old mix score
Daniel
-
- Posts: 72
- Joined: Mon Mar 07, 2016 4:41 pm
- Location: Zürich, Switzerland
Re: MCTS with minmax backup operator?
Thanks for the quick reply!
On the averaging part, perhaps I was doing something stupid so far, I have been keeping track of wins/draws/losses for each playout; in your example
Do you mean to say that it's better to compute (old average = -1/3) + 1 / 2 = +1/3? Did you ever try both versions?
And on the minmax part, you do the same then? I will try!
On the averaging part, perhaps I was doing something stupid so far, I have been keeping track of wins/draws/losses for each playout; in your example
and a parent value of -1/3, if I did a new expansion and added a child with +1, I would have [1,-1,-1,-1,0,0,1] and a new average of -1/7.Given childrens with scores of say [1,-1,-1,-1,0,0,]
Do you mean to say that it's better to compute (old average = -1/3) + 1 / 2 = +1/3? Did you ever try both versions?
And on the minmax part, you do the same then? I will try!
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MCTS with minmax backup operator?
It is Ok to keep track of the total number of wins/losses with the number of visits which allows you to compute the average and standard deviation of the score eventually. Standard implementation of UCT actually does this.fierz wrote:Thanks for the quick reply!
On the averaging part, perhaps I was doing something stupid so far, I have been keeping track of wins/draws/losses for each playout; in your example
and a parent value of -1/3, if I did a new expansion and added a child with +1, I would have [1,-1,-1,-1,0,0,1] and a new average of -1/7.Given childrens with scores of say [1,-1,-1,-1,0,0,]
Yes, as I mentioned above. Those are "avg" and "avg-old" and this is the only one helped by averaging with previous score of the node.Do you mean to say that it's better to compute (old average = -1/3) + 1 / 2 = +1/3? Did you ever try both versions?
And on the minmax part, you do the same then? I will try!
Three of the six versions (those with "old" suffix) I mentioned before average with old scores kept for a node. This only helps the "avg" operator though; both the "min" and "mix" versions suffer from averaging with old scores. I hope the definitions I gave above are not confusing.
Note that when doing one playout from root to leaf, I have scores for siblings at each leve, that were not selected for expansion, from a qsearch() I do for each child when they are first created.
-
- Posts: 72
- Joined: Mon Mar 07, 2016 4:41 pm
- Location: Zürich, Switzerland
Re: MCTS with minmax backup operator?
Thanks again for the prompt reply.
One more dumb question: if you average a single value of the last rollout with say 100'000 previous results, isn't that a bit extreme (it kind of makes sense with 1 vs 6 as before, but with large numbers....)?
Is the A0 algorithm published somewhere? I looked at the A0 paper but there are no details in there
One more dumb question: if you average a single value of the last rollout with say 100'000 previous results, isn't that a bit extreme (it kind of makes sense with 1 vs 6 as before, but with large numbers....)?
Is the A0 algorithm published somewhere? I looked at the A0 paper but there are no details in there

-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MCTS with minmax backup operator?
The averaging is weighted by the number of visits, so in your example it hardly changes anything. The "min" version I have doesn't care if you had an old score with 100000 vistis, it just takes the value from the current ply. That is what an alpha-beta engine actually does, i.e the score can swing wildy going from depth d to d+1 since it doesnt average those two. This is what I found to perform the best for chess.fierz wrote:Thanks again for the prompt reply.
One more dumb question: if you average a single value of the last rollout with say 100'000 previous results, isn't that a bit extreme (it kind of makes sense with 1 vs 6 as before, but with large numbers....)?
The A0 algorithm is a standard UCT implementation for which there are many sources. You should probably look into the AlphaGo / AlphaGo Zero paper for mode details not the alphazero arxiv paper.Is the A0 algorithm published somewhere? I looked at the A0 paper but there are no details in there
-
- Posts: 72
- Joined: Mon Mar 07, 2016 4:41 pm
- Location: Zürich, Switzerland
Re: MCTS with minmax backup operator?
Dear Daniel,
I'm sorry to ask again (feeling pretty dumb...) but I just don't get it: with only 6 nodes, and adding a seventh node, you explained that the seventh node is added to the previous average directly, i.e.
new average = 1/2 ( old average(of six) + new value)
instead of
new average = 1/7 ( 6*old average + new value)
Now you say that the averaging is weighted by the number of visits? I don't get it
- that seems to imply the second formula above, but before you said it was the first one? What am I missing? I'm doing the second one, which is standard UCT as far as I can find it in the web, so how is A0 and your program different?
Regarding max/min: my engine is doing rollouts in the end, I can only get the values win/lose/draw. If I min/max from there, all I can back up to the root is also win/lose/draw, and that's different from alphabeta with eval where I can back up intermediate values and will switch....
I'm sorry to ask again (feeling pretty dumb...) but I just don't get it: with only 6 nodes, and adding a seventh node, you explained that the seventh node is added to the previous average directly, i.e.
new average = 1/2 ( old average(of six) + new value)
instead of
new average = 1/7 ( 6*old average + new value)
Now you say that the averaging is weighted by the number of visits? I don't get it

Regarding max/min: my engine is doing rollouts in the end, I can only get the values win/lose/draw. If I min/max from there, all I can back up to the root is also win/lose/draw, and that's different from alphabeta with eval where I can back up intermediate values and will switch....
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MCTS with minmax backup operator?
I didn't care to be precise because all I wanted is to convey how the averaging and minmax backup are done to get the parent node score.fierz wrote:Dear Daniel,
I'm sorry to ask again (feeling pretty dumb...) but I just don't get it: with only 6 nodes, and adding a seventh node, you explained that the seventh node is added to the previous average directly, i.e.
new average = 1/2 ( old average(of six) + new value)
instead of
new average = 1/7 ( 6*old average + new value)
Now you say that the averaging is weighted by the number of visits? I don't get it- that seems to imply the second formula above, but before you said it was the first one? What am I missing? I'm doing the second one, which is standard UCT as far as I can find it in the web, so how is A0 and your program different?
Regarding max/min: my engine is doing rollouts in the end, I can only get the values win/lose/draw. If I min/max from there, all I can back up to the root is also win/lose/draw, and that's different from alphabeta with eval where I can back up intermediate values and will switch....
The averaging is weighted by the number of visits to each child, not by the number of children.
In my previous example with average scores for the six children of [1,-1,-1,-1,0,0] and corresponding number of visits of say [4,2,2,1,1,2], you would get an average score of
Code: Select all
parent_score = (4*1+2*-1+2*-1+1*-1+1*0+2*0) / (4+2+2+1+1+2)
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MCTS with minmax backup operator?
I think what is confusing is that you are probably doing montecarlo playouts until you encounter terminal postions, i.e. where you have +1,0,-1 as a score. AlphaZero and my engine do not do montecarlo rollouts but use either a NN or a standard evaluation function to determine the winning probability from the leaf position. For example if the eval says +100 centi pawns up, that is a winning probability of 0.64. In your case, if you do multiple, not one, rollouts to evaluate a leaf position you can determine the winning probability by averaging the results from those rollouts. Say 10 montecarlo playouts are used to evaluate a leaf and you have 5 wins, 3 losses, 2 draws then your winning probability is a fraction which is (5*1+3*-1+2*0)/10=0.2.Regarding max/min: my engine is doing rollouts in the end, I can only get the values win/lose/draw. If I min/max from there, all I can back up to the root is also win/lose/draw, and that's different from alphabeta with eval where I can back up intermediate values and will switch....
It is only when you do a single rollout, which has a very large standard deviation not to be reliable, that you have to back up a win/lose/draw score. Infact it is recommended to use averaging when the standard deviation is high, and minmax when it is low. AlphaZero and my engine assume that the scores returned from NN/evaluation function give reliable estimates of the winning probability of the position so you could start using minmax right after one visist. Note that AlphaGo used to mix monte carlo rollouts and NN evaluation in a 50-50 manner but they found out the result from the NN is reliable enough to not use the costly montecarlo rollouts.
I hope this helps.
-
- Posts: 72
- Joined: Mon Mar 07, 2016 4:41 pm
- Location: Zürich, Switzerland
Re: MCTS with minmax backup operator?
Dear Daniel,
thanks, now it makes a lot more sense to me
I wanted to try MCTS/UCT because of its nice property of not needing an eval, just to learn something, so to remain pure I will try the multi-rollout.
best regards
Martin
thanks, now it makes a lot more sense to me

I wanted to try MCTS/UCT because of its nice property of not needing an eval, just to learn something, so to remain pure I will try the multi-rollout.
best regards
Martin