The scaling of Deep Learning MCTS Go engines

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

The scaling of Deep Learning MCTS Go engines

Post by Laskos »

Since Zen and Crazy Stone started to top the KGS ratings above 9d level, employing Deep Learning for their evaluation together with MCTS search, it is interesting to see whether they scale well with time control/hardware. We already know from the paper that AlphaGo scales well with hardware; that is why a large cluster was used to defeat Lee Sedol. I played 80 games with Crazy Stone Deep Learning at 10s vs 5s per move (a doubling of the time control) on two devices (Chinese rules, komi 7.5):

Phone: in these conditions about 3d level
PC: about 10-12 times faster, in these conditions about 6d level

Results were:

Phone: 58/80 ===> +168 Elo points for doubling time control
PC: 56/80 ===> +147 Elo points for doubling time control
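As a sanity check, the quoted Elo gains follow directly from the logistic rating model; a quick sketch (standard library only):

```python
import math

def elo_gain(wins: int, games: int) -> float:
    """Elo difference implied by a score, via the logistic model:
    elo = 400 * log10(p / (1 - p)), with p the scoring rate."""
    p = wins / games
    return 400 * math.log10(p / (1 - p))

print(round(elo_gain(58, 80)))  # phone: 168
print(round(elo_gain(56, 80)))  # PC: 147
```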

It's hard to play more games with this software, but we can say that, Elo-wise, diminishing returns are roughly not as evident as in Chess (just look at Andreas' thread on doubling of time). But diminishing returns do appear through the nature of the Go dan ratings. I took what seems to me the most rigorous Elo-like dan rating scheme for Go, GoR by the European Go Federation http://senseis.xmp.net/?GoR . It is Elo-like in the sense that it also uses a logistic curve, but with constants that differ and vary with strength. It is calibrated so that a difference of 1 rank is evaluated at 100 GoR points. But these have no one-to-one translation to winning percentages or usual Elo points: 100 GoR points (or 1 rank, say between 6d and 7d) mean different winning percentages at different strengths of the opponents. From the same resource I plotted what 100 GoR points (a 1-dan difference) mean in our usual Elo points as a function of rank (dan strength).
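The GoR-to-Elo conversion can be made explicit. In the EGF model the weaker player's winning expectancy over a GoR gap D is 1/(exp(D/a)+1), with the parameter a shrinking as strength grows, so a fixed 100-GoR gap is worth 400*100/(a*ln 10) conventional Elo points. A sketch; the a values below are back-solved from the Crazy Stone results, not taken from the official EGF tables:

```python
import math

def elo_per_100_gor(a: float) -> float:
    """Conventional Elo equivalent of a 100-point GoR gap.

    GoR expected score for the weaker side is 1/(exp(D/a) + 1),
    so a gap of D GoR points maps to 400*D/(a*ln 10) Elo points."""
    return 400 * 100 / (a * math.log(10))

# illustrative 'a' values (back-solved, not the official EGF table):
for rank, a in [("3d", 90.0), ("6d", 75.0)]:
    print(rank, round(elo_per_100_gor(a)))  # 3d: 193, 6d: 232
```

The shrinking a is exactly why a fixed Elo gain buys fewer and fewer dan ranks at higher strength.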

[Image: 100 GoR points (1 dan) expressed in conventional Elo points, as a function of dan rank]

From the test results with Crazy Stone, we see that doubling the time at 3d level means a 0.87 dan improvement, while doubling the time at 6d level means a 0.63 dan improvement. And due to the nature of Go ratings, these diminishing returns will accelerate for stronger pro players (although their ratings are messier). Extrapolating the data to pro levels, it may well be that doubling the time is worth a 0.3 stone handicap at high pro level. Still, as AlphaGo has shown, Deep Learning MCTS in Go is highly parallelizable, more so than in Chess, and combined with this good scaling with time, with diminishing returns less pronounced than in Chess, moderately big hardware with improved software will beat humans easily. I remember 10-15 years ago, before MCTS, when Go engines were not only very weak but also scaled very badly.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: The scaling of Deep Learning MCTS Go engines

Post by matthewlai »

MCTS does seem to scale much better than alpha-beta, especially with tricks like virtual loss used in AlphaGo. Most of the scaling inefficiency (as well as many other problems) from alpha-beta comes from the fact that it's depth-first. MCTS, being a best-first search, does not suffer from that.

MCTS obviously also does not scale perfectly, since it uses prior results to steer the search, but as the tree gets larger, the incremental information gain from each playout decreases, and scaling gets better.
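A minimal sketch of the two mechanisms mentioned, assuming plain UCT; `Node`, `uct_select` and `apply_virtual_loss` are illustrative names, not AlphaGo's actual code:

```python
import math

C = 1.4            # UCT exploration constant
VIRTUAL_LOSS = 1   # visits added while a playout is still in flight

class Node:
    def __init__(self):
        self.visits = 0
        self.wins = 0.0
        self.children = {}  # move -> Node

def uct_select(node: Node) -> Node:
    """Best-first step: pick the child maximising the UCT bound."""
    log_n = math.log(node.visits + 1)
    def bound(child: Node) -> float:
        if child.visits == 0:
            return float("inf")  # always try unvisited children first
        return child.wins / child.visits + C * math.sqrt(log_n / child.visits)
    return max(node.children.values(), key=bound)

def apply_virtual_loss(child: Node) -> None:
    """Pretend this playout already lost, so that parallel workers
    are steered toward different children instead of piling up on the
    same one; the loss is reverted when the real result is backed up."""
    child.visits += VIRTUAL_LOSS
```

Because selection always restarts from the root with fresh statistics, there is no depth-first stack to unwind, which is what makes the parallelisation comparatively clean.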
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: The scaling of Deep Learning MCTS Go engines

Post by Laskos »

matthewlai wrote:MCTS does seem to scale much better than alpha-beta, especially with tricks like virtual loss used in AlphaGo. Most of the scaling inefficiency (as well as many other problems) from alpha-beta comes from the fact that it's depth-first. MCTS, being a best-first search, does not suffer from that.

MCTS obviously also does not scale perfectly, since it uses prior results to steer the search, but as the tree gets larger, the incremental information gain from each playout decreases, and scaling gets better.
It's amazing how, in Go, MCTS is very selective yet tactically astute enough; I guess this wouldn't work in Chess. The scaling, Elo-wise, is much better than with alpha-beta search, probably for the (best-first) reason you gave. And although the steep increase in strength among strong humans is not clearly reflected in their dan rankings, which become messy for pros, they stand no chance against an MCTS guided by a neural-network eval on a cluster.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: The scaling of Deep Learning MCTS Go engines

Post by matthewlai »

Laskos wrote:
matthewlai wrote:MCTS does seem to scale much better than alpha-beta, especially with tricks like virtual loss used in AlphaGo. Most of the scaling inefficiency (as well as many other problems) from alpha-beta comes from the fact that it's depth-first. MCTS, being a best-first search, does not suffer from that.

MCTS obviously also does not scale perfectly, since it uses prior results to steer the search, but as the tree gets larger, the incremental information gain from each playout decreases, and scaling gets better.
It's amazing how, in Go, MCTS is very selective yet tactically astute enough; I guess this wouldn't work in Chess. The scaling, Elo-wise, is much better than with alpha-beta search, probably for the (best-first) reason you gave. And although the steep increase in strength among strong humans is not clearly reflected in their dan rankings, which become messy for pros, they stand no chance against an MCTS guided by a neural-network eval on a cluster.
I'm not sure that MCTS really won't work for chess. Like you said, Go can also be very sharp tactically, sometimes requiring a very precise sequence of moves.

I think the problem is just that people haven't tried MCTS with chess seriously, because MCTS is so new, and A-B has been around for so long. People have a tendency to stick very hard to known approaches in computer chess for some reason. Not many people are experimenting with drastically different stuff.

An interesting approach would be something like using MCTS with Q-search as the leaf evaluation function.
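A rough sketch of that idea, where `qsearch` is a hypothetical stand-in rather than a real quiescence search: map the quiescence score into [0, 1] so it can be backed up exactly like a playout result.

```python
import math

def qsearch(position: float) -> float:
    """Hypothetical stand-in for a real quiescence search; here we
    simply treat 'position' as an already-computed centipawn score."""
    return position

def leaf_value(position: float) -> float:
    """MCTS leaf evaluation: squash a centipawn score through a
    logistic so it looks like a playout result in [0, 1]."""
    return 1 / (1 + math.exp(-qsearch(position) / 400))

print(round(leaf_value(0), 2))    # 0.5  (balanced position)
print(round(leaf_value(400), 2))  # 0.73 (solid advantage)
```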
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: The scaling of Deep Learning MCTS Go engines

Post by Albert Silver »

Laskos wrote:Since Zen and Crazy Stone started to top KGS ratings above 9d level, employing Deep Learning for their evaluation and MCTS search, it is interesting to see if they scale well with time control/hardware. [...] Still, as AlphaGo has shown, Deep Learning MCTS in Go is highly parallelizable, better than in Chess, and combined with this good scaling with time, with less accentuated diminishing returns than in Chess, moderately big hardware with improved software will beat humans easily.
Would this leap forward be a one-time deal, or can one expect newer versions to be further refined and stronger?
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: The scaling of Deep Learning MCTS Go engines

Post by Michel »

There are many articles about MCTS but this one was particularly clear to me:

https://jeffbradberry.com/posts/2015/09 ... ee-search/

It contains a link to a paper with a convergence proof (for the UCT variant).

A proposed alternative explanation for the apparent underperformance of MCTS in chess is simply that static evaluation in chess is very good (it correlates highly with game outcomes), so there is no need to replace static evaluation with random playouts (which would presumably be much noisier).

This being said, current engines prune so much that at low depth their search may well resemble MCTS (a comment from Marcel van Kervinck).
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: The scaling of Deep Learning MCTS Go engines

Post by Laskos »

Albert Silver wrote:
Would this leap forward be a one-time deal, or can one expect newer versions to be further refined and stronger?
It seems to me they will steadily improve. Crazy Stone still seems 1-2 KGS ranks below AlphaGo on the same hardware. I don't even know whether Zen and Crazy Stone employ a value network to guide the MC playouts. As of now, Crazy Stone Deep Learning on an i7 PC is still a little below pro level, and the KGS server version with 32 cores and 8 GPUs is at regular pro level.

The AlphaGo of the paper is still almost completely untuned, a fact which might bring an additional 2-3 stones. All in all, I guess that in 2-3 years publicly available engines will improve by 3-4 KGS ranks, to regular pro level on an i7 with 1 GPU and to top pro on 32 cores with 8 GPUs.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: The scaling of Deep Learning MCTS Go engines

Post by matthewlai »

Michel wrote:There are many articles about MCTS but this one was particularly clear to me:

https://jeffbradberry.com/posts/2015/09 ... ee-search/

It contains a link to a paper with a convergence proof (for the UCT variant).

A proposed alternative explanation for the apparent underperformance of MCTS in chess is simply that static evaluation in chess is very good (it correlates highly with game outcomes), so there is no need to replace static evaluation with random playouts (which would presumably be much noisier).

This being said, current engines prune so much that at low depth their search may well resemble MCTS (a comment from Marcel van Kervinck).
The evaluation method (static eval vs random playouts) and search method (alpha-beta vs MCTS) are actually orthogonal.

It's also possible to have alpha-beta with random playouts (do 100 playouts for each evaluation), and MCTS with static eval (backing up the static eval instead).

Just because MCTS was initially developed with random playouts, and happened to be best for Go for a long time, doesn't mean that's the only way. AlphaGo is living proof that MCTS + static eval also works really really well.
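The orthogonality is easy to see in code: the backup phase of MCTS is identical whichever evaluator produced the leaf value. A sketch, with both evaluators as trivial stand-ins:

```python
import random

def random_playout(pos) -> float:
    """Playout-style evaluator: a noisy sample of the result in [0, 1]
    (a stand-in for playing random moves to the end of the game)."""
    return random.random()

def static_eval(pos) -> float:
    """Static-eval-style evaluator: a deterministic estimate in [0, 1]
    (a stand-in for a handcrafted or learned evaluation function)."""
    return 0.5

def backup(path, value: float) -> None:
    """The MCTS backup step: identical no matter which evaluator
    produced 'value'."""
    for node in path:
        node["visits"] += 1
        node["wins"] += value
        value = 1 - value  # flip the score for the other side

# either evaluator plugs into the same search skeleton:
path = [{"visits": 0, "wins": 0.0}, {"visits": 0, "wins": 0.0}]
backup(path, static_eval(None))
```

The same swap works in the other direction: an alpha-beta leaf could average a batch of playouts instead of calling a static eval.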
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: The scaling of Deep Learning MCTS Go engines

Post by Michel »

matthewlai wrote:AlphaGo is living proof that MCTS + static eval also works really really well.
I did not know that AlphaGo uses a static eval. Thanks for pointing that out!
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.