Alphazero news

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

crem
Posts: 177
Joined: Wed May 23, 2018 9:29 pm

Re: Alphazero news

Post by crem »

matthewlai wrote: Fri Dec 07, 2018 12:49 pm All our values are between 0 and 1, so if the best move has a value of 0.8, we would sample from all moves with values >= 0.79.
The paper says: "At the end of the game, the terminal position sT is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.".

So it's not like that? It's 0 for a loss, 1 for a win and 0.5 for a draw?
Also, the paper says that the initial Q = 0 (and the pseudocode also says "if self.visit_count == 0: return 0"). Does that mean it is initialized to the "loss" value?


Whether it's -1 to 1 or 0 to 1 also matters for the Cpuct scaling (or C(s) in the latest version of the paper). Do the c_base and c_init values assume that the Q range is -1..1 or 0..1?
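For reference, the pseudocode released with the paper computes the exploration constant C(s) from exactly these two values (pb_c_base = 19652 and pb_c_init = 1.25 there). Below is a minimal, runnable sketch of that UCB term; the function and parameter names are mine, and it does not answer which range Q uses, it only shows where c_base and c_init enter:

[code]
import math
from dataclasses import dataclass

@dataclass
class Config:
    pb_c_base: float = 19652.0   # c_base in the paper
    pb_c_init: float = 1.25      # c_init in the paper

def exploration_constant(cfg: Config, parent_visits: int) -> float:
    # C(s): grows logarithmically with the parent's visit count.
    return math.log((parent_visits + cfg.pb_c_base + 1) / cfg.pb_c_base) + cfg.pb_c_init

def ucb_score(cfg: Config, parent_visits: int, child_visits: int,
              child_prior: float, child_q: float) -> float:
    # child_q is the Q whose range (0..1 vs -1..1) the post above asks about.
    u = (exploration_constant(cfg, parent_visits) * child_prior
         * math.sqrt(parent_visits) / (child_visits + 1))
    return child_q + u
[/code]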
USGroup1
Posts: 33
Joined: Sun Oct 14, 2018 7:01 pm
Full name: Sina Vaziri

Re: Alphazero news

Post by USGroup1 »

OneTrickPony wrote: Fri Dec 07, 2018 12:13 pm ...What is important is that the games show that there are paths in chess which SF is still unable to understand and the losses are very different than just playing against another alpha-beta engine on 4x or 10x the hardware.
That does not mean we can't improve alpha-beta engines so that they can handle those paths efficiently.
Jouni
Posts: 3293
Joined: Wed Mar 08, 2006 8:15 pm

Re: Alphazero news

Post by Jouni »

So far I have only looked at the games played from TCEC openings. A0 sometimes seems to play like a patzer and loses in 22 moves to an outdated SF :o .

[pgn]
[Event "Computer Match"]
[Site "London, UK"]
[Date "2018.01.18"]
[Round "255"]
[White "Stockfish 8"]
[Black "AlphaZero"]
[Result "1-0"]
[PlyCount "43"]
[EventDate "2018.??.??"]

1. e4 {book} e6 {book} 2. d4 {book} d5 {book} 3. Nc3 {book} Nf6 {book}
4. Bg5 {book} Be7 {book} 5. e5 {book} Nfd7 {book} 6. h4 {book} Bxg5 {book}
7. hxg5 {book} Qxg5 {book} 8. Nh3 {book} Qe7 {book} 9. Qg4 g6 10. Ng5 h6
11. O-O-O Nc6 12. Nb5 Nb6 13. Rd3 h5 14. Rf3 a6 15. Qg3 Nd8 16. Nc3 Nd7
17. Bd3 Nf8 18. Rh4 Rg8 19. Bc4 Qd7 20. Nce4 dxe4 21. Nxe4 Nh7 22. Rxh5 1-0
[/pgn]
Jouni
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: Alphazero news

Post by noobpwnftw »

Jouni wrote: Fri Dec 07, 2018 2:00 pm So far I have only looked at the games played from TCEC openings. A0 sometimes seems to play like a patzer and loses in 22 moves to an outdated SF :o .
g4 with a queen, can't defend. :D
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Alphazero news

Post by Laskos »

OneTrickPony wrote: Fri Dec 07, 2018 12:13 pm
I am not convinced the newest SF would win against it. The ELO is calculated against a pool of similar engines. It's not clear if 50 or 100 ELO more against this pool is equal to 50-100 ELO more against an opponent of a different type.
While that's true, Lc0 with the best nets on my powerful GPU and average CPU (a "Leela Ratio" of, say, 2.5) beats SF8 heavily, but loses slightly to SF10 from regular openings. Against SF8 the result is similar to what happens in this paper. My guess is that this particular "old" A0, under those TCEC conditions, is somewhat weaker than SF10.
Lc0 needs a "Leela Ratio" of 2.5 to have similar results to A0 ("Leela Ratio" 1 by definition), so Lc0 (with the best nets) is still lagging pretty significantly behind A0.
In some games it becomes apparent that they are fairly similar in playing style, strengths and weaknesses.
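For readers unfamiliar with the "Leela Ratio" used above, here is a rough sketch, assuming the commonly used definition that normalises the Lc0/SF speed balance to the nps figures reported in the 2017 paper (about 80 knps for A0 versus about 70 Mnps for Stockfish):

[code]
A0_PAPER_NPS = 80_000        # A0 evaluations per second, 2017 paper
SF_PAPER_NPS = 70_000_000    # Stockfish nodes per second, 2017 paper

def leela_ratio(lc0_nps: float, sf_nps: float) -> float:
    """> 1 means Lc0 gets relatively more compute than A0 had in the paper."""
    return (lc0_nps / sf_nps) / (A0_PAPER_NPS / SF_PAPER_NPS)

# Example: 40 knps for Lc0 against 14 Mnps for SF gives a ratio of 2.5.
print(leela_ratio(40_000, 14_000_000))  # 2.5
[/code]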
Damir
Posts: 2802
Joined: Mon Feb 11, 2008 3:53 pm
Location: Denmark
Full name: Damir Desevac

Re: Alphazero news

Post by Damir »

Where can we see all 1000 games that were played ?
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

crem wrote: Fri Dec 07, 2018 1:02 pm
matthewlai wrote: Fri Dec 07, 2018 12:49 pm All our values are between 0 and 1, so if the best move has a value of 0.8, we would sample from all moves with values >= 0.79.
The paper says: "At the end of the game, the terminal position sT is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.".

So it's not like that? It's 0 for a loss, 1 for a win and 0.5 for a draw?
Also, the paper says that the initial Q = 0 (and the pseudocode also says "if self.visit_count == 0: return 0"). Does that mean it is initialized to the "loss" value?


Whether it's -1 to 1 or 0 to 1 also matters for the Cpuct scaling (or C(s) in the latest version of the paper). Do the c_base and c_init values assume that the Q range is -1..1 or 0..1?
All the values in the search are in [0, 1]. We store them as [-1, 1] only for network training, to have the training targets centered around 0. At play time, when network evaluations come back, we shift them to [0, 1] before doing anything with them.

Yes, all values are initialized to loss value.
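In other words (a minimal sketch of that convention; the names here are mine, not DeepMind code):

[code]
def net_to_search_value(v: float) -> float:
    """Map a network output in [-1, 1] (loss..win) to the search range [0, 1]."""
    return (v + 1.0) / 2.0

class Node:
    def __init__(self, prior: float):
        self.prior = prior
        self.visit_count = 0
        self.value_sum = 0.0   # accumulated backed-up values, already in [0, 1]

    def value(self) -> float:
        # An unvisited node reports 0, i.e. the loss value in the 0..1 scheme.
        if self.visit_count == 0:
            return 0.0
        return self.value_sum / self.visit_count
[/code]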
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
trulses
Posts: 39
Joined: Wed Dec 06, 2017 5:34 pm

Re: Alphazero news

Post by trulses »

matthewlai wrote: Fri Dec 07, 2018 4:50 pm [...]
Since you're here in these posts, I just wanted to say congrats on the publication and thanks for the pseudocode.

Is most of the code for AlphaZero Python, or is the pseudocode transcribed from a different language like C++?
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

trulses wrote: Fri Dec 07, 2018 5:09 pm
matthewlai wrote: Fri Dec 07, 2018 4:50 pm [...]
Since you're here in these posts, I just wanted to say congrats on the publication and thanks for the pseudocode.

Is most of the code for AlphaZero Python, or is the pseudocode transcribed from a different language like C++?
Thanks!

AlphaZero is mostly in C++. The network training code is in Python (TensorFlow). Network inference goes through the C++ TensorFlow API.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Alphazero news

Post by Gian-Carlo Pascutto »

matthewlai wrote: Fri Dec 07, 2018 12:49 pm During training, we do softmax sampling by visit count up to move 30. There is no value cutoff. Temperature is 1.
This is a rather important difference and will explain a lot about Leela Chess Zero's endgame problems.
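A minimal sketch of that selection rule, as I read it ("softmax sampling by visit count" taken to mean sampling a root move with probability proportional to N^(1/T), with T = 1, for the first 30 plies, then playing the most-visited move):

[code]
import numpy as np

def select_training_move(visit_counts: dict, ply: int,
                         temperature: float = 1.0, sampling_plies: int = 30):
    """visit_counts maps each root move to its MCTS visit count."""
    moves = list(visit_counts)
    counts = np.array([visit_counts[m] for m in moves], dtype=np.float64)
    if ply < sampling_plies:
        probs = counts ** (1.0 / temperature)   # T = 1: proportional to visits
        probs /= probs.sum()
        return moves[np.random.choice(len(moves), p=probs)]
    return moves[int(np.argmax(counts))]        # greedy after the sampling phase
[/code]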

Thanks for clarifying some of these things. The 0..1 vs -1..1 range thing is a bit funny. I interpreted the paper as 0..1 initially because that's what older MCTS papers used, then people pointed out that the AZ papers work on a -1..1 range and we changed things. And now it turns out the original version was what AZ had after all.
matthewlai wrote: Yes, all values are initialized to loss value.
Were other settings ever considered, notably 0.5 (a draw) or the parent's value?