Stockfish 8 ???
They really fear real Stockfish ...
Unfortunately we weren't able to get our time machine up and running before we finished writing the paper in order to test against engines that hadn't been released yet. But don't worry, we are working hard on that!
Do you have results against modern versions? Against Stockfish 9, for example?
We did play against a development version of Stockfish that was very close to SF 9 (we finished testing just before SF 9 was released), and the result was very similar to results against SF 8. It's in Fig 2 here - http://science.sciencemag.org/content/362/6419/1140
Elo differences measured between two versions of the same (or a very similar) program are always exaggerated, and they don't usually translate into an equal improvement against other opponents.
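For intuition on what an Elo difference means in score terms, here is a minimal sketch of the standard logistic Elo model (this is general rating math, not anything AlphaZero-specific); a rating gain measured in self-play between two near-identical versions implies a score swing that typically does not carry over against an unrelated opponent.

```python
def expected_score(elo_diff):
    """Expected score (win = 1, draw = 0.5) for the side with an
    elo_diff rating advantage, under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# A 100-Elo edge corresponds to scoring roughly 64% of the points.
edge = expected_score(100)
```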
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
mwyoung wrote: ↑Fri Dec 07, 2018 12:29 am
They played a later version of Stockfish, a January version, but they did not give the score other than that A0 won.
While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing with supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly coupled with DM's and Google's systems) for not much real value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily. And who knows, maybe someone will come up with a better implementation than we did! I am personally really excited about what Leela has already achieved, and hopefully it won't be long before it reaches the level of AlphaZero!
Do you have an opinion on whether AlphaZero would beat the latest version of Stockfish?
I don't think it would be really useful for me to speculate on that.
Suffice it to say, AlphaZero today is not the same as the AlphaZero in the paper either.
matthewlai wrote: ↑Fri Dec 07, 2018 12:00 am
Unfortunately we weren't able to get our time machine up and running before we finished writing the paper in order to test against engines that hadn't been released yet. But don't worry, we are working hard on that!
I am not sure why you have to wait for your time machine. Just send the time machine you are going to create back in time to now, and you can use it.
Doesn't that violate some laws of physics or something?
I've been looking at some of the shogi games (hand-selected by Habu, the Kasparov of shogi), and they are utterly impenetrable. All known joseki (openings) and king-safety principles are thrown out the window! In some of these games, the king doesn't just sit undeveloped in the center but does the chess equivalent of heading out to the middle of the board in the middle game before coming back to the corner for safety and then winning. Astounding!
Daniel Shawul wrote: ↑Fri Dec 07, 2018 12:35 am
While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing with supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly-coupled with DM and Google's systems) for not really much value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.
What were the best values/functions for CPUCT used for playing & training?
They are all in the pseudo-code in supplementary materials.
class AlphaZeroConfig(object):
    def __init__(self):
        ### Self-Play
        self.num_actors = 5000

        self.num_sampling_moves = 30
        self.max_moves = 512  # for chess and shogi, 722 for Go.
        self.num_simulations = 800

        # Root prior exploration noise.
        self.root_dirichlet_alpha = 0.3  # for chess, 0.03 for Go and 0.15 for shogi.
        self.root_exploration_fraction = 0.25

        # UCB formula
        self.pb_c_base = 19652
        self.pb_c_init = 1.25

        ### Training
        self.training_steps = int(700e3)
        self.checkpoint_interval = int(1e3)
        self.window_size = int(1e6)
        self.batch_size = 4096

        self.weight_decay = 1e-4
        self.momentum = 0.9
        # Schedule for chess and shogi, Go starts at 2e-2 immediately.
        self.learning_rate_schedule = {
            0: 2e-1,
            100e3: 2e-2,
            300e3: 2e-3,
            500e3: 2e-4
        }
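To make the role of pb_c_base and pb_c_init concrete, here is a small illustrative rendering of how the published pseudo-code turns them into the exploration coefficient of the UCB (PUCT) score. The function below is my own sketch, not DeepMind's code; the toy numbers in the usage line are assumptions for illustration only.

```python
import math

def ucb_score(parent_visits, child_visits, child_prior, child_value,
              pb_c_base=19652, pb_c_init=1.25):
    # The exploration coefficient grows slowly (logarithmically) with the
    # parent's visit count, starting from roughly pb_c_init.
    pb_c = math.log((parent_visits + pb_c_base + 1) / pb_c_base) + pb_c_init
    # Scale by the usual PUCT visit-count ratio, which shrinks the bonus
    # for children that have already been explored.
    pb_c *= math.sqrt(parent_visits) / (child_visits + 1)
    # Prior-weighted exploration bonus plus the child's mean value.
    return pb_c * child_prior + child_value

# Example (toy numbers): an unvisited child with prior 0.5 under a
# parent that has received 800 simulations.
score = ucb_score(parent_visits=800, child_visits=0,
                  child_prior=0.5, child_value=0.0)
```

Note that with these constants the log term stays near zero until the parent has accumulated thousands of visits, so early in a search the coefficient is effectively pb_c_init.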
Can you help me locate the games AZ played against Brainfish? They don't seem to have their own file, and I don't see any identifying info in alphazero_vs_stockfish_all.pgn
noobpwnftw wrote: ↑Fri Dec 07, 2018 1:09 am
I have a few questions:
TCEC SuFi used a 120' + 15" TC and Division P used a 90' + 10" TC. Since DM went for mimicking TCEC conditions this time, like using a 44-core machine and the same openings, why mess with time controls (again)?
Also, a year ago the NPS of A0 was 80K; now it is only around 60K, which is about a 30% nerf. What happened, did people overclock the TPUs a year ago?
180' + 15" is the time control for Season 9 Superfinal.
In the preprint the NPS figures I believe were taken from the start position. For the peer-reviewed final paper we looked at all moves to compute those statistics.