Sorry, Ronald, none of your quotes makes it clear to me that the NN wasn't adjusted any further during the match.

syzygy wrote:
"fully trained". Adjusting the NN is called "training".

peter wrote:
Can you or Ronald please show the lines of the paper in which it is said that the NN wasn't adjusted anymore during the games?

hgm wrote:
No adjustment of the NN was done during the match.
...
It is all clearly described in the paper.
I have found only this so far:
We evaluated the fully trained instances of AlphaZero against Stockfish, Elmo and the previous version of AlphaGo Zero (trained for 3 days) in chess, shogi and Go respectively, ...
There are many other clear statements:

Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.

Recently, the AlphaGo Zero algorithm achieved superhuman performance in the game of Go, by representing Go knowledge using deep convolutional neural networks (22, 28), trained solely by reinforcement learning from games of self-play (29). In this paper, we apply a similar but fully generic algorithm, which we call AlphaZero, ...

AlphaZero learns these move probabilities and value estimates entirely from self-play; these are then used to guide its search.

The parameters θ of the deep neural network in AlphaZero are trained by self-play reinforcement learning, starting from randomly initialised parameters θ.

The games against SF were played by the (fully trained) AlphaZero on a machine with 4 TPUs. That is a completely different setup from the one used for adjusting the neural network parameters.

We trained a separate instance of AlphaZero for each game. Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks.

When playing SF8 for 100 games, the MCTS performed 80,000 simulations per second, with 1 minute for each move.

During training, each MCTS used 800 simulations.
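Just to make concrete what the quoted passages describe, here is a rough toy sketch of the claimed separation between the training phase (where the parameters θ get updated) and the match phase (where the fixed net only guides the search). This is purely an illustration, not DeepMind's code: all the names (PolicyValueNet, self_play_game, mcts_move, train, play_match) are invented, and only the numbers come from the quotes above.

[code]
# Toy sketch of "train, then freeze" -- not DeepMind's code.
# Numbers (700,000 steps, mini-batches of 4,096, 800 simulations/move in
# training, ~80,000 simulations/second in the match) are from the quotes.

import random

class PolicyValueNet:
    """Stand-in for the deep network with parameters theta."""
    def __init__(self):
        # "starting from randomly initialised parameters theta"
        self.theta = [random.uniform(-1, 1) for _ in range(8)]

    def update(self, minibatch):
        # Placeholder for one gradient step; the ONLY place theta changes.
        self.theta = [t - 0.001 * random.uniform(-1, 1) for t in self.theta]

def self_play_game(net, sims_per_move=800):
    """Training games: MCTS with 800 simulations per move, guided by net."""
    return [("position", "search_probs", "outcome")]  # placeholder data

def mcts_move(net, position, simulations):
    """Match play: choose a move by search only; net is read, never written."""
    return "e2e4"  # placeholder

def train(net, steps=700_000, batch_size=4_096):
    """Training phase: theta is adjusted after every mini-batch."""
    replay_buffer = []
    for _ in range(steps):
        replay_buffer += self_play_game(net)
        minibatch = random.choices(replay_buffer, k=batch_size)
        net.update(minibatch)          # learning happens here, and only here

def play_match(net, games=100, sims_per_move=80_000 * 60):
    """Match phase: theta is frozen; only the search runs (1 minute/move)."""
    for _ in range(games):
        move = mcts_move(net, "startpos", sims_per_move)
        # net.update() is never called here -> no learning during the match

if __name__ == "__main__":
    net = PolicyValueNet()
    train(net, steps=10, batch_size=32)   # tiny numbers just so the sketch runs
    play_match(net, games=1)
[/code]

If the paper's description is taken at face value, net.update() simply never runs during the 100 games against SF; that is the whole point of the "fully trained" reading.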
Maybe I'm too biased by now myself, but I'd still like to see the 100 games and to have a clear statement about whether A0 was "learning" by playing against SF or not.
If you're right, which of course I could well imagine too, I would still like to see a rematch under less biased conditions.
