Alphazero news

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

sovaz1997 wrote: Fri Dec 07, 2018 12:07 am
matthewlai wrote: Fri Dec 07, 2018 12:00 am
Vinvin wrote: Thu Dec 06, 2018 11:26 pm
Stockfish 8 ???
They really fear real Stockfish ...
Unfortunately we weren't able to get our time machine up and running before we finished writing the paper in order to test against engines that hadn't been released yet. But don't worry, we are working hard on that!
Do you have results against modern versions? Against Stockfish 9, for example?
We did play against a development version of Stockfish that was very close to SF 9 (we finished testing just before SF 9 was released), and the result was very similar to results against SF 8. It's in Fig 2 here - http://science.sciencemag.org/content/362/6419/1140

Elo differences measured between two versions of the same (or very similar) program are always larger than against other opponents, and they don't usually translate into an equal improvement against a different engine.
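
To make that concrete, here is a minimal sketch (purely illustrative numbers, not measured results) of how match scores map to Elo differences under the standard logistic model:

Code: Select all

import math

def elo_diff(score):
    """Elo difference implied by a match score in (0, 1), standard logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Hypothetical illustration only: a new version scoring 60% in self-play against
# its predecessor looks like roughly +70 Elo, but if the same engine only scores,
# say, 53% against an unrelated opponent, that is only roughly +21 Elo.
print(round(elo_diff(0.60)))  # ~70
print(round(elo_diff(0.53)))  # ~21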
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

mwyoung wrote: Fri Dec 07, 2018 12:29 am They played a later version of Stockfish, a January version, but they did not give the score other than that A0 won.
The score is given (in the form of a bar graph) in Fig. 2 here: http://science.sciencemag.org/content/362/6419/1140
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

Daniel Shawul wrote: Fri Dec 07, 2018 12:35 am
matthewlai wrote: Fri Dec 07, 2018 12:00 am
Vinvin wrote: Thu Dec 06, 2018 11:26 pm
Stockfish 8 ???
They really fear real Stockfish ...
Unfortunately we weren't able to get our time machine up and running before we finished writing the paper in order to test against engines that hadn't been released yet. But don't worry, we are working hard on that!
While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing with supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly-coupled with DM and Google's systems) for not really much value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily. And who knows, maybe someone will come up with a better implementation than we did! I am personally really excited about what Leela has already achieved, and hopefully it won't be long before it will be at the level of AlphaZero!
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

duncan wrote: Fri Dec 07, 2018 12:38 am
matthewlai wrote: Fri Dec 07, 2018 12:00 am
Vinvin wrote: Thu Dec 06, 2018 11:26 pm
Stockfish 8 ???
They really fear real Stockfish ...
Unfortunately we weren't able to get our time machine up and running before we finished writing the paper in order to test against engines that hadn't been released yet. But don't worry, we are working hard on that!
Do you have an opinion on whether AlphaZero would beat the latest version of Stockfish?
I don't think it would be really useful for me to speculate on that.

Suffice to say, AlphaZero today is not the same as AlphaZero in the paper either.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

duncan wrote: Fri Dec 07, 2018 12:42 am
matthewlai wrote: Fri Dec 07, 2018 12:00 am
Unfortunately we weren't able to get our time machine up and running before we finished writing the paper in order to test against engines that hadn't been released yet. But don't worry, we are working hard on that!
I am not sure why you have to wait for your time machine. Just send the time machine you are going to create back in time to now and you can use it :)
Doesn't that violate some laws of physics or something? :D
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
IanO
Posts: 496
Joined: Wed Mar 08, 2006 9:45 pm
Location: Portland, OR

Re: Alphazero news

Post by IanO »

Even more exciting: they released the full game scores of the hundred-game matches for all three games: chess, shogi, and Go!

https://deepmind.com/research/alphago/a ... resources/

I've been looking at some of the shogi games (hand-selected by Habu, the Kasparov of shogi), and they are utterly impenetrable. All known joseki (openings) and king-safety principles are thrown out the window! In some of these games, the king doesn't just sit undeveloped in the center; it does the chess equivalent of heading out to the middle of the board in the middlegame before returning to the corner for safety and then winning. Astounding!

The DeepMind blog post is here: https://deepmind.com/blog/alphazero-she ... gi-and-go/
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Alphazero news

Post by jp »

matthewlai wrote: Fri Dec 07, 2018 2:15 am
Daniel Shawul wrote: Fri Dec 07, 2018 12:35 am While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing with supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly-coupled with DM and Google's systems) for not really much value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.
What were the best values/functions for CPUCT used for playing & training?
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

jp wrote: Fri Dec 07, 2018 2:20 am
matthewlai wrote: Fri Dec 07, 2018 2:15 am
Daniel Shawul wrote: Fri Dec 07, 2018 12:35 am While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing with supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly-coupled with DM and Google's systems) for not really much value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.
What were the best values/functions for CPUCT used for playing & training?
They are all in the pseudo-code in the supplementary materials.

Code: Select all

class AlphaZeroConfig(object):

  def __init__(self):
    ### Self-Play
    self.num_actors = 5000

    self.num_sampling_moves = 30
    self.max_moves = 512  # for chess and shogi, 722 for Go.
    self.num_simulations = 800

    # Root prior exploration noise.
    self.root_dirichlet_alpha = 0.3  # for chess, 0.03 for Go and 0.15 for shogi.
    self.root_exploration_fraction = 0.25

    # UCB formula
    self.pb_c_base = 19652
    self.pb_c_init = 1.25

    ### Training
    self.training_steps = int(700e3)
    self.checkpoint_interval = int(1e3)
    self.window_size = int(1e6)
    self.batch_size = 4096

    self.weight_decay = 1e-4
    self.momentum = 0.9
    # Schedule for chess and shogi, Go starts at 2e-2 immediately.
    self.learning_rate_schedule = {
        0: 2e-1,
        100e3: 2e-2,
        300e3: 2e-3,
        500e3: 2e-4
    }
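
For reference, this is roughly how pb_c_base and pb_c_init enter the per-child exploration term in that same pseudo-code (a sketch; the node fields visit_count, prior, and value() are as defined there):

Code: Select all

import math

# Sketch of the UCB score from the supplementary pseudo-code: the network prior
# is scaled by an exploration factor that grows slowly with the parent's visit
# count (via pb_c_base / pb_c_init) and shrinks as the child is visited more,
# and is added to the child's averaged value.
def ucb_score(config, parent, child):
  pb_c = math.log((parent.visit_count + config.pb_c_base + 1) /
                  config.pb_c_base) + config.pb_c_init
  pb_c *= math.sqrt(parent.visit_count) / (child.visit_count + 1)

  prior_score = pb_c * child.prior
  value_score = child.value()
  return prior_score + value_score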
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
clumma
Posts: 186
Joined: Fri Oct 10, 2014 10:05 pm
Location: Berkeley, CA

Re: Alphazero news

Post by clumma »

Matthew: Congrats on the phenomenal success.

Can you help me locate the games AZ played against Brainfish? They don't seem to have their own file, and I don't see any identifying info in alphazero_vs_stockfish_all.pgn

Thank you!

-Carl
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

noobpwnftw wrote: Fri Dec 07, 2018 1:09 am I have a few questions:
TCEC SuFi used a 120' + 15" TC and Division P used 90' + 10"; since DM went for mimicking TCEC conditions this time, like using a 44-core machine and the same openings, why mess with the time controls (again)?

Also, a year ago the NPS of A0 was 80K; now it is only around 60K, which is about a 30% nerf. What happened, did people overclock the TPUs a year ago?
180' + 15" is the time control for Season 9 Superfinal.

In the preprint the NPS figures I believe were taken from the start position. For the peer-reviewed final paper we looked at all moves to compute those statistics.
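
To illustrate the difference between the two ways of measuring, a minimal sketch with hypothetical per-move data (not AlphaZero's actual numbers):

Code: Select all

# Hypothetical (nodes_searched, seconds_used) pairs, one per move of a game.
moves = [(4_800_000, 60.0), (2_700_000, 45.0), (3_000_000, 50.0), (2_200_000, 40.0)]

# Preprint-style figure: nodes per second measured on the start position only.
start_nps = moves[0][0] / moves[0][1]

# Final-paper-style figure: total nodes over total time across all moves.
average_nps = sum(n for n, _ in moves) / sum(t for _, t in moves)

print(start_nps, average_nps)  # the two figures can differ noticeably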
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.