Search traps in MCTS and chess

Daniel Shawul · Post by **Daniel Shawul** » Mon Dec 25, 2017 11:25 am

The following paper gives reasons as to why MCTS is not so successful in some domains like Chess, while it is very successful in domains like Go. The premise is that Chess has so many "shallow traps", that will make MCTS very inefficient to reach the minimax value. Even if the MCTS is given 50x more time is needed by an exhaustive minimax tree, it could fail to find a level-5 or level-7 trap. It will spend, f.i, 95% of its time searching an asymetric tree of depth > 7 when a shallow trap of depth-7 exists, thus, missing to find the level-7 trap. I think I am observing this phenomenon in my MCTS chess engines, and I wonder how AlphaZero solved it. They refered it in their paper so I am sure they are aware of it. I can perfectly see that the NN being capable of solving level-3 traps involing mates but deeper traps could be a problem. If they didn't have a solution to this problem, it means AlphaZero is at the mercy of an oppoennt that creates such traps deliberately ( a more 'tactical' engine i guess).

Paper
https://www.aaai.org/ocs/index.php/ICAP ... /1458/1571
Abstract
"Upper Confidence bounds applied to Trees (UCT), a banditbased
Monte-Carlo sampling algorithm for planning, has recently
been the subject of great interest in adversarial reasoning.
UCT has been shown to outperform traditional minimax
based approaches in several challenging domains such
as Go and Kriegspiel, although minimax search still prevails
in other domains such as Chess. This work provides insights
into the properties of adversarial search spaces that play a key
role in the success or failure of UCT and similar samplingbased
approaches. We show that certain “early loss” or “shallow
trap” configurations, while unlikely in Go, occur surprisingly
often in games like Chess (even in grandmaster games).
We provide evidence that UCT, unlike minimax search, is unable
to identify such traps in Chess and spends a great deal of
time exploring much deeper game play than needed."

Daniel

CheckersGuy · Post by **CheckersGuy** » Mon Dec 25, 2017 1:17 pm

Pure mcts almost certainly has that problem but I am not sure that their version of mcts (which is mcts without random playouts) suffers greatly from that problem. In my opninion, the tactical oversight is mostly due to the random playouts which are completly missing in A0. Would love to see some more research into this but unfortunately I am at the mercy of DeepMind

hgm · Post by **hgm** » Mon Dec 25, 2017 4:33 pm

I am puzzled about MCTS anyway. Why do people continue to count explored moves that are satisfactorily proven to be sub-optimal in the statistics / value of the node? This must slow down convergence to minimax enormously, in tactical positions, where you have to explore a lot before you hit upun the PV.

CheckersGuy · Post by **CheckersGuy** » Mon Dec 25, 2017 5:38 pm

hgm wrote:I am puzzled about MCTS anyway. Why do people continue to count explored moves that are satisfactorily proven to be sub-optimal in the statistics / value of the node? This must slow down convergence to minimax enormously, in tactical positions, where you have to explore a lot before you hit upun the PV.

What threshhold would u use for a node to qualify as "proven to be sub-optimal in the stastics" ? Those inferior nodes will have a bad action value and a bad uct-value as the number of passes through that node increase. So practically speaking, the node won't be visited very often (very small percentile). As for why count those nodes anyways. It probably has something to do with the convergence of mcts to perfect ply as the number of iterations go to infinity.

Daniel Shawul · Post by **Daniel Shawul** » Mon Dec 25, 2017 5:44 pm

CheckersGuy wrote:Pure mcts almost certainly has that problem but I am not sure that their version of mcts (which is mcts without random playouts) suffers greatly from that problem. In my opninion, the tactical oversight is mostly due to the random playouts which are completly missing in A0. Would love to see some more research into this but unfortunately I am at the mercy of DeepMind

I don't use the rollouts and yet I seem to have the problem. If a tactical line has some weak looking move (e.g. queen sacrifice), it would take a lot of nodes before that line is fully explored by MCTS. This has nothing to do with the rollouts but an inherent problem of MCTS in domains like chess.

Daniel

Daniel Shawul · Post by **Daniel Shawul** » Mon Dec 25, 2017 5:54 pm

hgm wrote:I am puzzled about MCTS anyway. Why do people continue to count explored moves that are satisfactorily proven to be sub-optimal in the statistics / value of the node? This must slow down convergence to minimax enormously, in tactical positions, where you have to explore a lot before you hit upun the PV.

UCB will viist nodes that are proven statistically inferior with confidence bounds less and less (exponentially visit rate maybe). The effort spent on week looking moves is really small (especially when you use a small exploration coefficient), which is good for games like Go, but for chess it can miss shallow traps. Using a larger exploration coefficient will pull the algorithms towards minimax that can solve those traps but then it will have a huge branching factor. Unless there is an algorithmic breakthrough the only way to deal with this problem is to avoid getting into this kind of situations. This is one of the things I want explained besides the strength of the pure policy network player.

hgm · Post by **hgm** » Mon Dec 25, 2017 6:01 pm

This seems a general problem of selective search, whether it is MCTS or alpha-beta with aggressive reduction. If it superficially looks bad, it will take a long time to discover it is actuallly good. This is why selecting moves based on superficial score is a bad idea. The move-selection heuristic must also suggest moves that might sometimes be good, even if on the average they will be very bad. Randomly sacrificing Queens will probably just drive up the branching ratio. But Queen sacrifices that serve some purpose (e.g. remove a piece that protected a square near the King) should not be rejected.

The move selection heuristic should not be so much a strong player, but an inquisitive one.

cdani · Post by **cdani** » Tue Dec 26, 2017 12:19 am

My intuitive answer is that, as a human can suggest a relatively good evaluation of a position without searching concrete lines, A0 can also, so it avoids tactical traps by selecting good positions.

CheckersGuy · Post by **CheckersGuy** » Tue Dec 26, 2017 2:13 am

All the soft pruning techniques (like uct and rave) still preserve the convergence property of mcts. As the number of iterations go to infinity the values approach the correct minimax value. That's why uct/rave decay as the number of iterations increase. Hard pruning would ruin that property but may not really matter as we already know from pruning techniques in alpha-beta-search.

My guess is, that we can only find out by starting DeepZero

Daniel Shawul · Post by **Daniel Shawul** » Wed Dec 27, 2017 1:12 am

Using a minmax backup operator seems to perform much better than an averaging backup operator. The latter is used in AlphaZero but I think the minmax backup operator maybe better for Chess. The scores are much more stable with the minimax operator and also it seems to be better in solving tactics. With the minimax operator, I select the move with the highest score (instead of highest number of visits) now.

A sample game where ScorpioMCTS demolishes TSCP now
[pgn]
[Event "Computer Chess Game"]
[Site "daniel-Satellite-C855"]
[Date "2017.12.26"]
[Round "-"]
[White "scorpio"]
[Black "tscp"]
[Result "1-0"]
[TimeControl "40/180"]
[Annotator "1. +0.31 4... -0.20"]

1. e4 {+0.31/42} e5 2. Nf3 {+0.31/44 4} Nc6 3. Nc3 {+0.33/57 4} Nf6 4. Bc4
{+0.27/29 4} Bc5 {-0.20/6 6} 5. O-O {+0.24/34 4} Ng4 {-0.05/5 6} 6. h3
{+1.14/15 4} Nxf2 {+0.39/6 6} 7. Rxf2 {+2.82/35 4} Bxf2+ {+0.18/6 5} 8.
Kxf2 {+2.85/31 4} d6 {+0.05/6 5} 9. d3 {+2.83/39 4} Bd7 {+0.14/5 5} 10. Bg5
{+3.32/23 4} Qc8 {+0.02/6 5} 11. Kg1 {+3.34/21 4} O-O {+0.05/5 5} 12. Nd5
{+3.10/23 4} Re8 {+0.03/5 5} 13. Bh4 {+3.24/14 4} a5 {+0.07/6 4} 14. Qd2
{+3.48/23 4} a4 {+0.12/5 4} 15. Qg5 {+4.22/34 4} Kh8 {+0.12/5 4} 16. Qh5
{+5.07/20 4} Nd8 {-0.18/5 4} 17. Ng5 {+5.85/12 4} h6 {-0.91/6 4} 18. Nf6
{+5.65/11 4} Re7 {-0.98/6 4} 19. Bxf7 {+6.69/11 4} Bc6 {-1.22/6 4} 20. Qg6
{+19.30/5 4} Bxe4 {-10.18/6 3} 21. dxe4 {+97.07/4 4} Qf5 {-99.96/5 0.5} 22.
exf5 {+198.98/3 4} hxg5 {-99.98/3 0.1} 23. Qh5# {+199.57/1 4}
{Xboard adjudication: Checkmate} 1-0

[/pgn]

Search traps in MCTS and chess

Search traps in MCTS and chess

Re: Search traps in MCTS and chess

Re: Search traps in MCTS and chess

Re: Search traps in MCTS and chess

Re: Search traps in MCTS and chess

Re: Search traps in MCTS and chess

Re: Search traps in MCTS and chess

Re: Search traps in MCTS and chess

Re: Search traps in MCTS and chess

Minmax backup operator for MCTS