The scaling of Deep Learning MCTS Go engines
Posted: Sun Oct 23, 2016 9:17 am
Since Zen and Crazy Stone started to top KGS ratings above the 9d level, employing Deep Learning for their evaluation together with MCTS search, it is interesting to see whether they scale well with time control/hardware. We already know from the paper that AlphaGo scales well with hardware; that is why a large cluster was employed to defeat Lee Sedol. I played 80 games with Crazy Stone Deep Learning at 10s vs 5s per move (a doubling of the time control) on two devices (Chinese rules, komi 7.5):
Phone: in these conditions about 3d level
PC: about 10-12 times faster, in these conditions about 6d level
Results were:
Phone: 58/80 ===> +168 Elo points for doubling the time control
PC: 56/80 ===> +147 Elo points for doubling the time control
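The Elo figures above follow from the standard logistic model, inverting the expected-score formula. A minimal sketch (the function name is mine):

```python
import math

def elo_gain(wins, games):
    """Elo advantage implied by a score fraction p, using the
    standard logistic model: D = 400 * log10(p / (1 - p))."""
    p = wins / games
    return 400 * math.log10(p / (1 - p))

print(round(elo_gain(58, 80)))  # phone result: 168
print(round(elo_gain(56, 80)))  # PC result: 147
```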
It is hard to play more games with this software, but we can say that, roughly, diminishing returns are not as evident Elo-wise as in Chess (just look at Andreas's thread on doubling time). But diminishing returns do appear through the nature of Go dan ratings. I took what seems to me the most rigorous Elo-like dan rating scheme for Go: GoR by the European Go Federation, http://senseis.xmp.net/?GoR . It is Elo-like in the sense that it also uses a logistic curve, but with constants that differ and vary with strength. It is calibrated so that a difference of one rank is worth 100 GoR points. But these have no unambiguous translation into winning percentages or usual Elo points: 100 GoR points (one rank, say between 6d and 7d) mean different winning percentages at different opponent strengths. From the same resource, I plotted what 100 GoR points (a one-dan difference) mean in usual Elo points as a function of rank (dan strength).
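A sketch of how such a plot can be computed, assuming the EGF logistic form p = 1/(1 + exp(-d/a)) with a strength-dependent constant a (the sample a values below are illustrative placeholders, not the exact EGF table):

```python
import math

def gor_gap_in_elo(gor_diff, a):
    """Convert a GoR difference into standard Elo points.
    Assumed EGF model: p = 1 / (1 + exp(-gor_diff / a)),
    then invert the Elo logistic: D = 400 * log10(p / (1 - p))."""
    p = 1 / (1 + math.exp(-gor_diff / a))
    return 400 * math.log10(p / (1 - p))

# Illustrative 'a' values: larger for weaker players, smaller for dan players,
# so one dan (100 GoR) is worth more Elo the stronger the players are.
for rank, a in [("5 kyu", 160), ("1 dan", 110), ("6 dan", 75)]:
    print(rank, round(gor_gap_in_elo(100, a)))
```

Since p/(1-p) = exp(d/a), this reduces to D = 400·log10(e)·d/a, which makes the rank dependence explicit: the smaller a gets at higher ranks, the more Elo one dan represents.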
From the test results with Crazy Stone, we see that doubling the time at 3d level means a 0.87-dan improvement, while doubling at 6d level means a 0.63-dan improvement. And due to the nature of Go ratings, these diminishing returns will accelerate for stronger, pro-level players (although their ratings are messier). Extrapolating the data to pro levels, it may well be that a doubling of time is worth about 0.3 stones of handicap at high pro level. Still, as AlphaGo has shown, Deep Learning MCTS in Go is highly parallelizable, more so than in Chess, and combined with this good scaling with time and less pronounced diminishing returns than in Chess, moderately big hardware with improved software will beat humans easily. I remember 10-15 years ago, before MCTS, when Go engines were not only very weak but also scaled very badly.
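The dan figures come from dividing the measured Elo gain per doubling by the Elo value of one dan step at that rank, read off the plot. A quick check, with Elo-per-dan values back-derived from the stated figures (roughly 193 at 3d and 233 at 6d; treat them as assumed plot readings):

```python
# Elo value of one dan step at each rank, as implied by the GoR-to-Elo
# plot; these are illustrative assumptions, not exact EGF numbers.
elo_per_dan = {"3d": 193, "6d": 233}
measured_gain = {"3d": 168, "6d": 147}  # Elo per time doubling, from the games

for rank in ("3d", "6d"):
    print(rank, round(measured_gain[rank] / elo_per_dan[rank], 2))  # 0.87, 0.63
```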