Google's AlphaGo team has been working on chess

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Google's AlphaGo team has been working on chess

Post by Michel »

hgm wrote:Apparently it is worse. Stockfish needs a thousand times as many nodes in its tree, and then it still loses.
Well, something is worse. It could be the evaluation function, or perhaps A/B does not scale as well as UCT, or perhaps it is the move ordering. Most likely it is a combination of all of these things; without further testing we cannot know.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Google's AlphaGo team has been working on chess

Post by syzygy »

hgm wrote:Apparently it is worse. Stockfish needs a thousand times as many nodes in its tree, and then it still loses.
But you seem to assume that AlphaZero's NN isn't worth much, or at least not worth more than 1000 nodes in SF's tree.

If you are right about that, then what makes AlphaZero play well must be its book-building/IDEA-like approach to search. (I think we agree here, but I may not have understood you correctly.)

I'm curious to see if it can be made to work with some variation of SF's regular search in the leaf nodes. If it does, then we can start adding domain-specific knowledge to the selection step and obtain a nice improvement.
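To make that concrete, here is a minimal sketch of such a hybrid: MCTS-style tree growth where each new leaf is scored by a shallow alpha-beta search instead of an NN. The `Position` methods and `alphabeta()` are hypothetical stand-ins, not Stockfish's actual API.

[code]
INF = 10**6

# Hybrid sketch: grow an MCTS-style tree, but score each new leaf with
# a shallow alpha-beta search instead of an NN evaluation.
# Position.legal_moves(), Position.play() and alphabeta() are
# hypothetical stand-ins, not Stockfish's actual API.

class Node:
    def __init__(self, pos, parent=None):
        self.pos, self.parent = pos, parent
        self.children, self.visits, self.value_sum = [], 0, 0.0

def expand_and_evaluate(leaf, depth=4):
    # The selection step could use domain-specific knowledge here,
    # e.g. by ordering or pruning the children before they are added.
    for move in leaf.pos.legal_moves():
        leaf.children.append(Node(leaf.pos.play(move), parent=leaf))
    return alphabeta(leaf.pos, depth, -INF, INF)   # shallow A/B leaf eval

def backup(node, value):
    # Negamax-style backup: flip the sign at every ply up to the root.
    while node is not None:
        node.visits += 1
        node.value_sum += value
        value, node = -value, node.parent
[/code]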

I remain somewhat sceptical about the ability of a neural network to infer general chess-evaluation principles from a couple of million games. (Maybe that just shows my lack of knowledge about the current state of the art in neural networks.)
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Google's AlphaGo team has been working on chess

Post by syzygy »

I guess there is evidence that at very long time controls a book-building approach beats "pure" alpha-beta, in particular if an experienced human guides the search; but it should be possible to at least approach, if not surpass, the efficiency of the human.

Maybe tree building is a better term.

The question is at what time control a tree building approach can get the upper hand over pure alpha-beta. As computers get faster, that time control gets shorter.
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: Google's AlphaGo team has been working on chess

Post by CheckersGuy »

syzygy wrote:I guess there is evidence that at very long time controls a book-building approach beats "pure" alpha-beta, in particular if an experienced human guides the search; but it should be possible to at least approach, if not surpass, the efficiency of the human.

Maybe tree building is a better term.

The question is at what time control a tree building approach can get the upper hand over pure alpha-beta. As computers get faster, that time control gets shorter.
My guess is that alpha-beta will only get "worse" in the coming years. That's because, or at least it seems that way, AlphaZero's approach scales much better than alpha-beta. Therefore AlphaZero gains more with next-generation hardware than, let's say, Stockfish does.

Additionally, I am against the term "book-building approach". It's not like A0 stores the best move in a given position.
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Google's AlphaGo team has been working on chess

Post by hgm »

syzygy wrote:But you seem to assume that AlphaZero's NN isn't worth much, or at least not worth more than 1000 nodes in SF's tree.
But that is not just an assumption, we know that from a simple extrapolation, don't we? In fig. 2a of the AlphaZero paper we see that the Elo of AlphaZero drops below that of Stockfish at ~300ms/move (i.e. 24k AlphaZero nodes), and is some 120 Elo worse at 40ms/move (3.2k nodes). That is 120 Elo for a factor-8 reduction in tree size, and there are nearly 4 more factors of 8 to go before we are at a tree size of a single node. By that time Stockfish's 1000 nodes would be about 600 Elo stronger than AlphaZero's single node. If we assume 70 Elo per doubling for Stockfish, we could reduce its tree size to about 2-3 nodes to get it on par with AlphaZero's single node.
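For what it's worth, the arithmetic checks out; a quick back-of-the-envelope script, taking the 120 Elo per factor of 8 and 70 Elo per doubling from the paragraph above as given:

[code]
import math

# Back-of-the-envelope check of the extrapolation above: A0 is ~120 Elo
# behind SF at 3.2k nodes and loses ~120 Elo per factor-8 reduction.
nodes, elo_behind = 3200, 120
factors_of_8 = math.log(nodes, 8)                  # ~3.9 factors of 8 down to 1 node
elo_at_one_node = elo_behind + 120 * factors_of_8  # ~586 Elo behind SF
print(round(factors_of_8, 1), round(elo_at_one_node))

# If SF gains ~70 Elo per doubling, its 1000-node tree could shrink to
# about this many nodes before being as weak as A0's single node:
print(round(1000 / 2 ** (elo_at_one_node / 70), 1))   # ~3 nodes
[/code]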

And that makes sense. I don't believe an NN could be much more accurate than QS when the situation gets complex.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Google's AlphaGo team has been working on chess

Post by syzygy »

hgm wrote:
syzygy wrote:But you seem to assume that AlphaZero's NN isn't worth much, or at least not worth more than 1000 nodes in SF's tree.
But that is not just an assumption, we know that from a simple extrapolation, don't we? In fig. 2a of the AlphaZero paper we see that the Elo of AlphaZero drops below that of Stockfish at ~300ms/move (i.e. 24k AlphaZero nodes), and is some 120 Elo worse at 40ms/move (3.2k nodes). That is 120 Elo for a factor-8 reduction in tree size, and there are nearly 4 more factors of 8 to go before we are at a tree size of a single node. By that time Stockfish's 1000 nodes would be about 600 Elo stronger than AlphaZero's single node. If we assume 70 Elo per doubling for Stockfish, we could reduce its tree size to about 2-3 nodes to get it on par with AlphaZero's single node.

And that makes sense. I don't believe an NN could be much more accurate than QS when the situation gets complex.
QS is an interesting point. It seems AlphaZero doesn't care whether the position being expanded and NN-evaluated is a quiet position or not.

But the QS point also shows that an evaluation function only works with a minimum of search. The evaluation function can know a lot about passed pawns, but if a relatively simple combination wins the pawn, that knowledge is not worth much. For the same reason I do not yet rule out that the strength of AlphaZero's evaluation function becomes more apparent as its search takes care of the immediate tactics that its NN cannot grasp.
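For readers less familiar with QS, this is the textbook quiescence search being referred to: instead of trusting the static eval while captures are hanging, keep searching tactical moves until the position is quiet. The `pos` methods are hypothetical engine primitives.

[code]
INF = 10**6

# Textbook quiescence search: never trust the static eval while captures
# are pending; search tactical moves until the position is quiet.
# pos.evaluate(), pos.captures() and pos.play() are hypothetical.
def qsearch(pos, alpha, beta):
    stand_pat = pos.evaluate()      # static eval, side to move's view
    if stand_pat >= beta:
        return beta                 # already good enough: fail high
    alpha = max(alpha, stand_pat)
    for move in pos.captures():     # tactical moves only
        score = -qsearch(pos.play(move), -beta, -alpha)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha
[/code]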
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Google's AlphaGo team has been working on chess

Post by syzygy »

CheckersGuy wrote:Additionally, I am against the term "book-building approach". It's not like A0 stores the best move in a given position.
But I am looking at the (so-called MC) tree it builds, starting with a 1-node tree and growing it by successively expanding leaf nodes (whether by NN or some other means). That is how automatic book building works.

See e.g. Thomas Lincke's dissertation. Section 3.4 considers expansion strategies. The rule he came up with gives a high expansion priority to a move that is good or has a shallow subbook. It seems he was not aware of the multi-armed bandit theory.
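For comparison, this is what the multi-armed-bandit (UCB1) selection rule underlying UCT looks like; unlike Lincke's rule, it balances a move's observed value against how little it has been explored. The numbers in the toy example are made up.

[code]
import math

# UCB1, the bandit rule underlying UCT: pick the move maximizing average
# result plus an exploration bonus for rarely-visited moves.
def ucb1_pick(stats, c=1.4):
    """stats: list of (value_sum, visits) per move; returns a move index."""
    total = sum(n for _, n in stats)
    def score(entry):
        s, n = entry
        if n == 0:
            return float('inf')     # every move gets tried at least once
        return s / n + c * math.sqrt(math.log(total) / n)
    return max(range(len(stats)), key=lambda i: score(stats[i]))

# Toy example: move 0 scores best on average, but the barely-explored
# move 2 gets picked because of its exploration bonus.
print(ucb1_pick([(6.0, 10), (2.0, 10), (0.5, 1)]))   # prints 2
[/code]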
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Google's AlphaGo team has been working on chess

Post by Milos »

syzygy wrote:But I am looking at the (so-called MC) tree it builds, starting with a 1-node tree and growing it by successively expanding leaf nodes (whether by NN or some other means). That is how automatic book building works.

See e.g. Thomas Lincke's dissertation. Section 3.4 considers expansion strategies. The rule he came up with gives a high expansion priority to a move that is good or has a shallow subbook. It seems he was not aware of the multi-armed bandit theory.
That's probably because UCT gives a deeper but narrower book, so unless you artificially introduce noise at the root node (as they did in the self-play games of Alpha0) you'd end up with a pretty narrow book tree, which is really not very useful.
Actually, I strongly doubt that the 2, at most 3, moves they explored at the root (judging by the value of the Dirichlet noise) are really sufficient. Making an anti-Alpha0 book, or even applying Romichess's learning to SF, would soon yield a total destruction of Alpha0, especially since MCTS without rollouts offers very little if any SMP randomness across threads.
It seems to me that Alpha0 selected openings in training that were giving the biggest advantage early in the game, and stopped using all others where no immediate gain was observed, practically cutting out all closed openings from the beginning.
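For reference, the root noise in question, as described in the AlphaZero paper: P(s,a) = (1 - ε)·p_a + ε·η_a with η ~ Dir(0.3) for chess and ε = 0.25. The priors below are made-up toy numbers.

[code]
import numpy as np

# Root exploration noise from the AlphaZero paper:
#   P(s,a) = (1 - eps) * p_a + eps * eta_a,  eta ~ Dir(alpha)
# with alpha = 0.3 for chess and eps = 0.25. The priors are toy numbers.
rng = np.random.default_rng(0)
priors = np.array([0.55, 0.25, 0.10, 0.05, 0.05])
eps, alpha = 0.25, 0.3
noise = rng.dirichlet([alpha] * len(priors))
mixed = (1 - eps) * priors + eps * noise
print(mixed.round(3))   # typically only 1-2 moves get a meaningful boost
[/code]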
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Google's AlphaGo team has been working on chess

Post by Milos »

hgm wrote:But that is not just an assumption, we know that from a simple extrapolation, don't we? In fig. 2a of the AlphaZero paper we see that the Elo of AlphaZero drops below that of Stockfish at ~300ms/move (i.e. 24k AlphaZero nodes), and is some 120 Elo worse at 40ms/move (3.2k nodes). That is 120 Elo for a factor-8 reduction in tree size, and there are nearly 4 more factors of 8 to go before we are at a tree size of a single node. By that time Stockfish's 1000 nodes would be about 600 Elo stronger than AlphaZero's single node. If we assume 70 Elo per doubling for Stockfish, we could reduce its tree size to about 2-3 nodes to get it on par with AlphaZero's single node.

And that makes sense. I don't believe an NN could be much more accurate than QS when the situation gets complex.
Rarely would I agree with you, but here I agree completely.
At 40ms/move A0 would do 3200 MCT searches, and SF on 64 cores in the starting position (which is what is actually measured in Fig. 2) would search 1-2 million nodes/move.
If you scaled the time down to 12.5µs, A0 would have only 1 MCT search, i.e. only a single eval of the NN, and SF would search only 100-200 nodes; the difference would probably grow to 200-300 Elo in SF's favour.
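The per-iteration arithmetic behind those numbers:

[code]
# 3200 MCT searches in 40 ms is 12.5 us per iteration (one NN eval
# each), against SF's ~1-2M nodes in the same time, i.e. roughly 500
# SF nodes searched per single A0 eval.
iterations, move_time_ms = 3200, 40.0
print(move_time_ms / iterations * 1000)    # 12.5 (us per iteration)
print(round(1.5e6 / iterations))           # ~469 SF nodes per A0 eval
[/code]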
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Google's AlphaGo team has been working on chess

Post by Daniel Shawul »

syzygy wrote:
hgm wrote:
syzygy wrote:But you seem to assume that AlphaZero's NN isn't worth much, or at least not worth more than 1000 nodes in SF's tree.
But that is not just an assumption, we kow that from a simple extrapolation, don't we? In fig. 2a of the AlphaZero paper we see that the Elo of AlphaZero drops below that of Stockfish at ~300ms/move (i.e. 24k AlphaZero nodes), and is some 120 Elo worse at 40ms/move (3.2k nodes). That is 120 Elo for a factor 8 reduction in tree size, and there are early 4 more factors 8 to go before we are at a tree size of a single node. By that time Stockfish' 1000 nodes would be about 600 Elo stronger than AlphaZero's single node. If we assume 70 Elo per doubling for Stockfish, we could reduce its tree size to about 2-3 nodes to get it on par with AlphaZero's single node.

And that makes sense. I don't believe a NN could be much more accurate than QS, when the situation gets complex.
QS is an interesting point. It seems AlphaZero doesn't care whether the position being expanded and NN-evaluated is a quiet position or not.

But the QS point also shows that an evaluation function only works with a minimum of search. The evaluation function can know a lot about passed pawns, but if a relatively simple combination wins the pawn, that knowledge is not worth much. For the same reason I do not yet rule out that the strength of AlphaZero's evaluation function becomes more apparent as its search takes care of the immediate tactics that its NN cannot grasp.
This is something I originally missed, but the NN doesn't have to resolve a lot of tactics. This works because they have a pretty big expansion rate -- compared to previous MCTS implementations -- that expands a node right after its first visit, while previous implementations wait until a node has been visited at least 10 times or more. I had the latter implementation first, where I store the eval score and keep returning it on later visits until the node is expanded -- this sounds wasteful, but it could help in shaping the tree in a UCB manner. With this, Nebiyu got trumped by TSCP due to missing tactics as a result of its shallow tree. After I made it expand right after each visit (just like in the paper), it grows its tree quickly and captures a lot of tactics.

There is not really a big need to use shallow searches of a large depth like 8, which btw is pretty bad if you expand right after each visit: you will waste a lot of time scoring inferior moves with a depth-8 search. With alpha-beta we use just qsearch to make the eval quiet, and I think we can use the same approach in MCTS, but then we really need to expand fast, with just a qsearch. Also, expanding after each visit makes sense because now we have a deterministic eval, unlike in the previous case when rollouts were used.
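In code terms, the difference Daniel describes comes down to a single expansion threshold; something like this sketch, with hypothetical engine primitives and a qsearch leaf eval per the post:

[code]
INF = 10**6

# Sketch of the expansion policy described above. EXPAND_THRESHOLD = 1
# expands a leaf on its first visit (as in the AlphaZero paper); older
# rollout-based implementations waited for ~10+ visits, giving a much
# shallower tree. pos.legal_moves(), pos.play(), qsearch() and
# select_child() are hypothetical engine primitives.

EXPAND_THRESHOLD = 1

class Node:
    def __init__(self, pos, parent=None):
        self.pos, self.parent = pos, parent
        self.children, self.visits, self.value_sum = [], 0, 0.0

def simulate(root):
    # One MCTS iteration: select a leaf, maybe expand it, evaluate it
    # with a deterministic qsearch instead of a rollout, and back up.
    node = root
    while node.children:
        node = select_child(node)               # e.g. UCT/PUCT selection
    if node.visits + 1 >= EXPAND_THRESHOLD:
        for move in node.pos.legal_moves():
            node.children.append(Node(node.pos.play(move), parent=node))
    value = qsearch(node.pos, -INF, INF)        # deterministic leaf eval
    while node is not None:                     # negamax backup to root
        node.visits += 1
        node.value_sum += value
        value, node = -value, node.parent
[/code]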