It applies in both match play and training. If you search for it in the pseudo-code you can see how it's used. I haven't looked at lc0 code so I don't know what it corresponds to.

yanquis1972 wrote: ↑Fri Dec 07, 2018 5:06 am
I just read that as code for training; would the 1.25 value apply to match play, and does it correlate to lc0's search variable?

matthewlai wrote: ↑Fri Dec 07, 2018 2:24 am
They are all in the pseudo-code in the supplementary materials.

jp wrote: ↑Fri Dec 07, 2018 2:20 am
What were the best values/functions for CPUCT used for playing & training?

matthewlai wrote: ↑Fri Dec 07, 2018 2:15 am
That would be good, but it would also be a lot of work for us (AZ is tightly coupled with DM and Google's systems) for not really much value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.

Daniel Shawul wrote: ↑Fri Dec 07, 2018 12:35 am
While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing with supposedly fair conditions.
Code: Select all
class AlphaZeroConfig(object):

  def __init__(self):
    ### Self-Play
    self.num_actors = 5000

    self.num_sampling_moves = 30
    self.max_moves = 512  # for chess and shogi, 722 for Go.
    self.num_simulations = 800

    # Root prior exploration noise.
    self.root_dirichlet_alpha = 0.3  # for chess, 0.03 for Go and 0.15 for shogi.
    self.root_exploration_fraction = 0.25

    # UCB formula
    self.pb_c_base = 19652
    self.pb_c_init = 1.25

    ### Training
    self.training_steps = int(700e3)
    self.checkpoint_interval = int(1e3)
    self.window_size = int(1e6)
    self.batch_size = 4096

    self.weight_decay = 1e-4
    self.momentum = 0.9
    # Schedule for chess and shogi, Go starts at 2e-2 immediately.
    self.learning_rate_schedule = {
        0: 2e-1,
        100e3: 2e-2,
        300e3: 2e-3,
        500e3: 2e-4
    }
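For context on where pb_c_base and pb_c_init are actually used: in the pseudo-code they feed the per-child UCB score during tree search. The sketch below follows my reading of the published pseudo-code; the parent and child arguments are assumed to be Node objects carrying visit_count, prior and value(), which are not shown here.

Code: Select all

import math

def ucb_score(config: AlphaZeroConfig, parent: Node, child: Node) -> float:
  # Exploration rate: starts at pb_c_init (1.25) and grows slowly
  # (logarithmically) with the parent's visit count, scaled by pb_c_base.
  pb_c = math.log((parent.visit_count + config.pb_c_base + 1) /
                  config.pb_c_base) + config.pb_c_init
  pb_c *= math.sqrt(parent.visit_count) / (child.visit_count + 1)

  prior_score = pb_c * child.prior  # network policy prior for this move
  value_score = child.value()       # mean value of the child's subtree
  return prior_score + value_score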
-
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: Alphazero news
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
-
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: Alphazero news
AlphaZero got 44 cores just because that's the machine they ran on. The games were run on a single machine with 44 CPU cores and four first-generation TPUs, no pondering.

mwyoung wrote: ↑Fri Dec 07, 2018 9:34 am
I was told gen 3. But it did not say in the information posted. Here is what was posted on the site:
"For the games themselves, Stockfish used 44 CPU (central processing unit) cores and AlphaZero used a single machine with four TPUs and 44 CPU cores. Stockfish had a hash size of 32GB and used syzygy endgame tablebases."
BTW: I need to buy a 2080 ti, since 2 TPUs are equal to one 2080 ti.
So I guess you could say Stockfish got 4 TPUs, too, but it would be a bit cheesy to say since SF cannot make use of them.
AlphaZero is not CPU-bound. Most of the cores are idle during play.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
-
- Posts: 159
- Joined: Tue Apr 30, 2013 1:29 am
Re: Alphazero news
Going through the games, it's clear Alpha Zero exposes a weakness in SF's long-term planning. It gets very promising positions with long-term pressure on a regular basis, which it is sometimes able to convert. It seems to be weaker tactically; maybe a stronger engine would be able to get even more wins.
I am not convinced the newest SF would win against it. The ELO is calculated against a pool of similar engines. It's not clear if 50 or 100 ELO more against this pool is equal to 50-100 ELO more against an opponent of a different type.
Due to architectural differences and difficulty of coming up with a definition of fair hardware conditions I think it's not very important if Alpha Zero on 4TPUs is stronger than the newest SF on 44 cores or w/e. What is important is that the games show that there are paths in chess which SF is still unable to understand and the losses are very different than just playing against another alpha-beta engine on 4x or 10x the hardware.
I also feel those games, even the drawn ones, are more interesting than the average top GM game. It starts to look like very human risk aversion is what causes so many quick draws and uneventful games, not theoretical limitations of chess.
-
- Posts: 186
- Joined: Wed May 23, 2018 9:29 pm
Re: Alphazero news
The paper says that during training moves are selected "in proportion to the root visit count", without mentioning that this happens only for the first 30 plies (so I assume it happens for the entire game).
However, in the training pseudocode it looks like there is a temperature cutoff at ply 30:
matthewlai, would it be possible to clarify whether the temperature cutoff was used during training or not?
Code: Select all
def select_action(config: AlphaZeroConfig, game: Game, root: Node):
visit_counts = [(child.visit_count, action)
for action, child in root.children.iteritems()]
if len(game.history) < config.num_sampling_moves:
_, action = softmax_sample(visit_counts)
else:
_, action = max(visit_counts)
return action
There is temperature cutoff in description of the play versus stockfish though (not during training), so maybe that's what was meant in pseudocode:
"by softmax sampling with a temperature of 10.0 among moves for which the value was no more than 1% away from the best move for the first 30 plies".
That raises some more questions (not as important as the training question, though):
- "Softmax sampling with temperature 10" is a bit ambiguous; my best guess is that it means "proportional to exp(N / 10)".
- Values not more than 1% away: is that the value of Q or of N? (I guess it's Q?)
- If it's Q, what does "1% away" mean? Is it just 1% of the Q range (i.e. 0.02, as Q is from -1 to 1; e.g. if Q for the best move is -0.015, then moves with Q >= -0.035 are taken)?
Or is it a relative percentage? E.g. if Q = -0.015, then nodes with Q >= -0.01515 are sampled? (doesn't look correct)
-
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: Alphazero news
During training, we do softmax sampling by visit count up to move 30. There is no value cutoff. Temperature is 1.

crem wrote: ↑Fri Dec 07, 2018 12:34 pm
The paper says that during the training moves are selected "in proportion to the root visit count", without mention that it happens only for first 30 plies (so I assume it happens for the entire game).
However, in the training pseudocode it looks like there is a temperature cutoff at ply 30:
matthewlai, would it be possible to clarify whether temperature cutoff was used during training or not?

Code: Select all

def select_action(config: AlphaZeroConfig, game: Game, root: Node):
  visit_counts = [(child.visit_count, action)
                  for action, child in root.children.iteritems()]
  if len(game.history) < config.num_sampling_moves:
    _, action = softmax_sample(visit_counts)
  else:
    _, action = max(visit_counts)
  return action
There is temperature cutoff in description of the play versus stockfish though (not during training), so maybe that's what was meant in pseudocode:
"by softmax sampling with a temperature of 10.0 among moves for which the value was no more than 1% away from the best move for the first 30 plies".
That raises some more questions though (not as important as training question though):
- "Softmax sampling with temperature 10" is a bit ambiguous, my best guess is that that means "proportional to exp(N / 10)".
- Values not more than 1% away. Is value of Q or value of N? (I guess it's Q?)
- If it's Q, what does "1% away" mean? It is just 1% of Q range (i.e. 0.02, as Q is from -1 to 1, e.g. if Q for best move is -0.015, then moves with Q >= -0.035 are taken)?
Or it's a relative percentage? E.g. if Q=(-0.015), then nodes with Q >= (-0.01515) are sampled? (doesn't look correct)
In those games against SF, to increase diversity (this is the only place we used softmax sampling in normal gameplay), we did the same but with a higher temperature, and only considered moves whose value was within 1% of the best move's.
The definition of temperature is the standard one for softmax - exp(N / 10) is correct.
By 1% we mean in absolute value. All our values are between 0 and 1, so if the best move has a value of 0.8, we would sample from all moves with values >= 0.79.
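To make that concrete, here is a minimal sketch of how I read this (my own illustration, not DeepMind code; the tuple shapes and function names are mine): softmax sampling over visit counts with weight exp(N / T), used with T = 1 for the first 30 plies of training games, and with T = 10 plus the 1% absolute value filter for the first 30 plies of the games against Stockfish.

Code: Select all

import math
import random

def softmax_sample_by_visits(visit_counts, temperature):
  # visit_counts: list of (visit_count, action) pairs.
  # Weight of each move is exp(N / T); subtracting the maximum count first
  # keeps exp() from overflowing without changing the distribution.
  max_n = max(n for n, _ in visit_counts)
  weights = [math.exp((n - max_n) / temperature) for n, _ in visit_counts]
  r = random.uniform(0.0, sum(weights))
  for w, (_, action) in zip(weights, visit_counts):
    r -= w
    if r <= 0:
      return action
  return visit_counts[-1][1]  # guard against floating-point rounding

def sample_match_move(children, temperature=10.0, margin=0.01):
  # children: list of (visit_count, value, action), with values in [0, 1].
  # Keep only moves whose value is within 0.01 (absolute) of the best value,
  # then softmax-sample among them by visit count.
  best_value = max(v for _, v, _ in children)
  eligible = [(n, a) for n, v, a in children if v >= best_value - margin]
  return softmax_sample_by_visits(eligible, temperature)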
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
-
- Posts: 18
- Joined: Thu Apr 10, 2014 5:20 pm
Re: Alphazero news
OK, what we know:
1) Stockfish is the best engine in the world
2) LC0 guys did manage to reverse engineer A0 successfully
3) LC0 and A0 roughly at the same strength
4) NNs are not less resource-hungry than alpha-beta
5) Scalability is about the same in both methods
6) Google has unacceptable behaviour, hiding data, obfuscating opponents and hyping results
-
- Posts: 186
- Joined: Wed May 23, 2018 9:29 pm
Re: Alphazero news
The paper says: "At the end of the game, the terminal position sT is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.".matthewlai wrote: ↑Fri Dec 07, 2018 12:49 pm All our values are between 0 and 1, so if the best move has a value of 0.8, we would sample from all moves with values >= 0.79.
So it's not like that? It's 0 for loss, 1 for win and 0.5 for a draw?
Also paper says that initial Q=0 (and pseudocode also says "if self.visit_count == 0: return 0"). Does it mean that it's initialized to "loss" value?
Whether it's -1 to 1 or 0 to 1 is also important to Cpuct scaling (or C(s) in the latest version of the paper). Do c_base and c_init values assume that Q range is -1..1 or 0..1?
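For what it's worth, one common way to reconcile the two conventions (an assumption on my part, not something confirmed in this thread) is a simple affine rescaling between the paper's game outcome z in {-1, 0, +1} and search values kept in [0, 1]; under that mapping, initialising an unvisited node's value to 0 would indeed correspond to the "loss" end of the scale.

Code: Select all

def outcome_to_value(z):
  # z: game outcome as defined in the paper (-1 loss, 0 draw, +1 win).
  # Hypothetical rescaling to the [0, 1] value range mentioned above:
  #   -1 -> 0.0 (loss), 0 -> 0.5 (draw), +1 -> 1.0 (win)
  return (z + 1) / 2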
-
- Posts: 33
- Joined: Sun Oct 14, 2018 7:01 pm
- Full name: Sina Vaziri
Re: Alphazero news
That does not mean we can't improve alpha-beta engines so they can handle those paths efficiently.

OneTrickPony wrote: ↑Fri Dec 07, 2018 12:13 pm
...What is important is that the games show that there are paths in chess which SF is still unable to understand and the losses are very different than just playing against another alpha-beta engine on 4x or 10x the hardware.
-
- Posts: 3617
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: Alphazero news
So far I have only looked at the TCEC-opening games. A0 sometimes seems to play like a patzer and loses in 22 moves to an outdated SF.
Jouni
-
- Posts: 694
- Joined: Sun Nov 08, 2015 11:10 pm
- Full name: Bojun Guo