Alphazero news


Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Alphazero news

Post by Gian-Carlo Pascutto »

matthewlai wrote: Fri Dec 07, 2018 12:49 pm During training, we do softmax sampling by visit count up to move 30. There is no value cutoff. Temperature is 1.
This is a rather important difference and will explain a lot about Leela Chess Zero's endgame problems.

Thanks for clarifying some of these things. The 0..1 vs -1..1 range thing is a bit funny. I interpreted the paper as 0..1 initially because that's what older MCTS papers used, then people pointed out that the AZ papers work on a -1..1 range and we changed things. And now it turns out the original version was what AZ had after all.
Yes, all values are initialized to loss value.
Were other settings ever considered, notably 0.5 or parent?
Javier Ros
Posts: 200
Joined: Fri Oct 12, 2012 12:48 pm
Location: Seville (SPAIN)
Full name: Javier Ros

Re: Alphazero news

Post by Javier Ros »

matthewlai wrote: Fri Dec 07, 2018 5:20 pm
My sincere congratulations to the DeepMind team: after half a century of the alpha-beta algorithm, their new approach has revolutionized computer chess and produced genuine works of art in its games against Stockfish.

Javier Ros

Associate Professor of Applied Mathematics at the University of Seville (Spain).
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Alphazero news

Post by jp »

crem wrote: Fri Dec 07, 2018 1:02 pm Whether it's -1 to 1 or 0 to 1 is also important to Cpuct scaling (or C(s) in the latest version of the paper). Do c_base and c_init values assume that Q range is -1..1 or 0..1?
Apart from the range, how different is AZ's C(s) from what Lc0 uses?
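To make the range question concrete, here is a minimal sketch of the exploration term as described in the latest version of the paper, C(s) = log((1 + N(s) + c_base) / c_base) + c_init. The defaults c_base = 19652 and c_init = 1.25 are assumed here to match the published pseudocode, and the function names are hypothetical. It is not Lc0's or DeepMind's code. It only shows that mapping Q from -1..1 to 0..1 halves every Q difference, so the same move ordering needs half the exploration constant.

```python
import math


def exploration_rate(parent_visits, c_base=19652.0, c_init=1.25):
    """C(s): grows slowly with the parent's visit count.

    The default constants are assumptions, not values confirmed in this thread.
    """
    return math.log((1 + parent_visits + c_base) / c_base) + c_init


def puct_score(q, prior, parent_visits, child_visits, c):
    """Q(s,a) + U(s,a) with U = c * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    return q + c * prior * math.sqrt(parent_visits) / (1 + child_visits)


def to_unit_range(q_signed):
    """Map a value from the -1..1 range to the 0..1 range."""
    return (q_signed + 1) / 2


# Same child scored in both conventions: halving c in the 0..1 range
# gives exactly the rescaled -1..1 score, so the move ordering is unchanged.
c = exploration_rate(parent_visits=800)
signed = puct_score(0.2, 0.3, 800, 10, c)
unit = puct_score(to_unit_range(0.2), 0.3, 800, 10, c / 2)
```

With a fixed Cpuct the log term is simply absent. At small visit counts C(s) is almost equal to c_init, so the two only diverge in very large searches.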
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Alphazero news

Post by jp »

Gian-Carlo Pascutto wrote: Fri Dec 07, 2018 8:48 pm
matthewlai wrote: Fri Dec 07, 2018 12:49 pm During training, we do softmax sampling by visit count up to move 30. There is no value cutoff. Temperature is 1.
This is a rather important difference and will explain a lot about Leela Chess Zero's endgame problems.
How?
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Alphazero news

Post by Gian-Carlo Pascutto »

jp wrote: Fri Dec 07, 2018 9:22 pm How?
One of Leela's problems is thinking theoretically drawn endgames can be won. This happens because during the training there is an intentional non-zero chance of "blundering" and in such endgames eventually a blunder will cause the side with the advantage to win.

The blundering was implemented for the whole game because the paper says AZ works like that, but it has now been clarified that this was only done during the first 30 moves.
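A minimal sketch of the move selection described above, assuming root visit counts are available as a dict: up to move 30 a move is sampled with probability proportional to visits^(1/T) with T = 1, and after that the most-visited move is played. The function name, the dict layout and the inclusive cutoff at move 30 are assumptions for illustration, not the actual AZ or Lc0 code.

```python
import random


def select_move(visit_counts, move_number, sampling_moves=30,
                temperature=1.0, rng=random):
    """Pick a move from root visit counts ({move: visits}).

    Whether move 30 itself is still sampled is an assumption.
    """
    moves = list(visit_counts)
    if move_number > sampling_moves:
        # Past the cutoff: no intentional "blunders", play the top move.
        return max(moves, key=lambda m: visit_counts[m])
    # Inside the cutoff: sample in proportion to visits^(1/T).
    weights = [visit_counts[m] ** (1.0 / temperature) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


counts = {"e4": 700, "d4": 250, "a3": 50}
late_move = select_move(counts, move_number=45)   # always the top move
early_move = select_move(counts, move_number=10)  # may be any of the three
```

Setting sampling_moves to a very large number reproduces the "blunder for the whole game" behaviour, where a 5% move like a3 above keeps getting played now and then even in the endgame.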
main line
Posts: 60
Joined: Thu Jul 07, 2016 10:15 pm

Re: Alphazero news

Post by main line »

Jouni wrote: Fri Dec 07, 2018 2:00 pm So far I have only looked at the TCEC opening games. A0 sometimes seems to play like a patzer and loses in 22 moves to an outdated SF :o

[pgn]
[Event "Computer Match"]
[Site "London, UK"]
[Date "2018.01.18"]
[Round "255"]
[White "Stockfish 8"]
[Black "AlphaZero"]
[Result "1-0"]
[PlyCount "43"]
[EventDate "2018.??.??"]

1. e4 {book} e6 {book} 2. d4 {book} d5 {book} 3. Nc3 {book} Nf6 {book} 4. Bg5 {book} Be7 {book} 5. e5 {book} Nfd7 {book} 6. h4 {book} Bxg5 {book} 7. hxg5 {book} Qxg5 {book} 8. Nh3 {book} Qe7 {book} 9. Qg4 g6 10. Ng5 h6 11. O-O-O Nc6 12. Nb5 Nb6 13. Rd3 h5 14. Rf3 a6 15. Qg3 Nd8 16. Nc3 Nd7 17. Bd3 Nf8 18. Rh4 Rg8 19. Bc4 Qd7 20. Nce4 dxe4 21. Nxe4 Nh7 22. Rxh5 1-0
[/pgn]
And what is this? (Who is the patzer here?)

http://view.chessbase.com/cbreader/2018 ... 26031.html
jhellis3
Posts: 546
Joined: Sat Aug 17, 2013 12:36 am

Re: Alphazero news

Post by jhellis3 »

Well, I am most amused..... :D
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

Gian-Carlo Pascutto wrote: Fri Dec 07, 2018 8:48 pm Were other settings ever considered, notably 0.5 or parent?
Yes and 0 seems to work best. Assumption being that most positions have 1 or at most a few good moves. All other moves are akin to passing or worse. In most equal-ish positions, passing will give the opponent a big advantage.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Alphazero news

Post by Gian-Carlo Pascutto »

matthewlai wrote: Fri Dec 07, 2018 10:24 pm
Gian-Carlo Pascutto wrote: Fri Dec 07, 2018 8:48 pm Were other settings ever considered, notably 0.5 or parent?
Yes and 0 seems to work best. Assumption being that most positions have 1 or at most a few good moves. All other moves are akin to passing or worse. In most equal-ish positions, passing will give the opponent a big advantage.
I think this ends up explaining why FPU reductions as implemented by both LZ and lc0 work though :-)
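For readers who have not followed the FPU discussion, here is a rough sketch of the options mentioned in this exchange for the value given to a child that has never been visited, using the 0..1 range. The policy names and the fixed reduction of 0.2 are made up for illustration. The actual LZ and lc0 formulas are not reproduced here and may differ.

```python
def first_play_value(policy, parent_q, reduction=0.2, loss=0.0, draw=0.5):
    """Value assumed for an unvisited child, in the 0..1 range.

    parent_q is the parent's value from the point of view of the side
    choosing the move. The 0.2 default is an arbitrary illustrative number.
    """
    if policy == "loss":
        # Initialize to the loss value, as described for AZ above.
        return loss
    if policy == "draw":
        # The "0.5" alternative.
        return draw
    if policy == "parent":
        # Assume the unvisited move is as good as the position itself.
        return parent_q
    if policy == "reduction":
        # Simplified FPU reduction: somewhat worse than the parent.
        return max(loss, parent_q - reduction)
    raise ValueError(f"unknown policy: {policy}")


options = {p: first_play_value(p, parent_q=0.55)
           for p in ("loss", "draw", "parent", "reduction")}
```

The lower the first-play value, the more the search sticks to moves it has already tried and the policy prior has to carry the unvisited ones. That fits the assumption that most positions have only one or a few good moves.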
glennsamuel32
Posts: 136
Joined: Sat Dec 04, 2010 5:31 pm
Location: 223

Re: Alphazero news

Post by glennsamuel32 »

Hello Matthew, nice to see you back after so long!!

Does this mean Giraffe will get some updates in the future? :D
Judge without bias, or don't judge at all...