LCZero update (2)

Rein Halbersma
Posts: 749
Joined: Tue May 22, 2007 11:13 am

LCZero update (2)

Post by Rein Halbersma »

[Moderation] This thread was split off from the original LCZero update thread ( http://talkchess.com/forum/viewtopic.ph ... &start=260 ), and meant to continue the discussion, because the other was getting unmanageably long.
lucasart wrote: Sorry if that's a dumb newbie question. I'm not familiar at all with NN.

But wouldn't it save a lot of time to:
* first train the net on high-quality games (i.e. Stockfish level)
* then start with a reasonably strong net and improve it with reinforcement learning?
The original AlphaGo approach was indeed to train a NN using supervised learning on top-level human games, and then to improve this NN with self-play using reinforcement learning (it's a bit more complicated than that, since they had separate NNs for move selection and position evaluation).

The AlphaGoZero and AlphaZero approaches start from scratch using self-play reinforcement learning. The claim is that this ultimately leads to better results. Apparently, top-level human play has biases that prevent or slow down progress beyond a certain level of play. Another surprising claim is that the full self-play algorithm actually requires fewer games to reach the top level.
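For what it's worth, the mechanism behind the "zero" setup can be summarized by its training loss, which (per the AlphaZero paper, L2 regularization omitted) combines a value term against the self-play game outcome z and a policy term against the MCTS visit distribution pi. A minimal sketch on dummy numbers, just to make the idea concrete:

Code:

import numpy as np

def zero_loss(z, v, pi, p):
    # (z - v)^2: squared error between game outcome and the net's value head.
    value_loss = (z - v) ** 2
    # -pi . log(p): cross-entropy between the search policy and the net's priors.
    policy_loss = -np.sum(pi * np.log(p))
    return value_loss + policy_loss

# Dummy example: game lost (z = -1), net predicted v = 0.2, three legal moves.
pi = np.array([0.6, 0.3, 0.1])   # MCTS visit distribution from self-play
p  = np.array([0.5, 0.3, 0.2])   # network's prior over the same moves
print(zero_loss(-1.0, 0.2, pi, p))

In the supervised variant the policy target comes from human (or Stockfish) moves instead of the net's own search, which is exactly where the bias discussed below can creep in.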
Uri Blass
Posts: 10798
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: LCZero update

Post by Uri Blass »

Rein Halbersma wrote:
lucasart wrote: Sorry if that's a dumb newbie question. I'm not familiar at all with NN.

But wouldn't it save a lot of time to:
* first train the net on high-quality games (i.e. Stockfish level)
* then start with a reasonably strong net and improve it with reinforcement learning?
The original AlphaGo approach was indeed to train a NN using supervised learning on top-level human games, and then to improve this NN with self-play using reinforcement learning (it's a bit more complicated than that, since they had separate NNs for move selection and position evaluation).

The AlphaGoZero and AlphaZero approaches start from scratch using self-play reinforcement learning. The claim is that this ultimately leads to better results. Apparently, top-level human play has biases that prevent or slow down progress beyond a certain level of play. Another surprising claim is that the full self-play algorithm actually requires fewer games to reach the top level.
If bias just means a difference from perfect play, then random play has a bigger bias than top human level.

Fixing the errors of random play seems to me a harder task than fixing the errors of top humans.
David Xu
Posts: 47
Joined: Mon Oct 31, 2016 9:45 pm

Re: LCZero update

Post by David Xu »

Uri Blass wrote:
Rein Halbersma wrote:
lucasart wrote: Sorry if that's a dumb newbie question. I'm not familiar at all with NN.

But wouldn't it save a lot of time to:
* first train the net on high-quality games (i.e. Stockfish level)
* then start with a reasonably strong net and improve it with reinforcement learning?
The original AlphaGo approach was indeed to train a NN using supervised learning on top-level human games, and then to improve this NN with self-play using reinforcement learning (it's a bit more complicated than that, since they had separate NNs for move selection and position evaluation).

The AlphaGoZero and AlphaZero approaches start from scratch using self-play reinforcement learning. The claim is that this ultimately leads to better results. Apparently, top-level human play has biases that prevent or slow down progress beyond a certain level of play. Another surprising claim is that the full self-play algorithm actually requires fewer games to reach the top level.
If bias just means a difference from perfect play, then random play has a bigger bias than top human level.

Fixing the errors of random play seems to me a harder task than fixing the errors of top humans.
"Bias" does not mean the same thing as "imperfection". A bias is a systematic difference from imperfection, i.e. a difference that has a directional component. This is pertinent because it could cause the network to get stuck in a local minimum some distance away from the global optimum.
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: LCZero update

Post by CMCanavessi »

In the end, TSCP proved to still be too much for poor Leela Gen 20, even at 40/40. The match ended 8-2 in favor of TSCP.

I'll upload the full pgn in a minute
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: LCZero update

Post by CMCanavessi »

Here are the 10 games: http://www.mediafire.com/file/yydmm8u5j ... nchmark.7z
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
Werewolf
Posts: 1996
Joined: Thu Sep 18, 2008 10:24 pm

Re: LCZero update

Post by Werewolf »

Sorry to hear that, Carlos. But your engine will get stronger and catch up.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: LCZero update

Post by Milos »

CMCanavessi wrote:In the end, TSCP proved to still be too much for poor Leela Gen 20, even at 40/40. The match ended 8-2 in favor of TSCP.

I'll upload the full pgn in a minute
TSCP is about 1700 Elo on CCRL 40/4. Error margins are huge, but an 8-2 score means at least a 200 Elo difference in TSCP's favor.
That means LeelaZero is more than 2000 Elo behind SF9.
Still an extremely long way to go.
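For reference, under the standard logistic Elo model an expected score s corresponds to a rating difference of -400 * log10(1/s - 1), so an 8-2 result (80%) works out to roughly 240 Elo as a point estimate. A small sketch of that conversion (plain Python, not tied to CCRL's or anyone else's actual tooling):

Code:

import math

def elo_diff(score_fraction):
    # Rating difference implied by an expected score (0 < s < 1),
    # using the logistic model E = 1 / (1 + 10^(-d/400)).
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)

print(round(elo_diff(8 / 10)))  # 8-2 match -> ~241 Elo in TSCP's favor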
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: LCZero update

Post by JJJ »

I can still win against it at 200 ms per move, but I wouldn't try at a longer time control; I wouldn't stand a chance.
Jhoravi
Posts: 291
Joined: Wed May 08, 2013 6:49 am

Re: LCZero update

Post by Jhoravi »

CMCanavessi wrote:Here are the 10 games: http://www.mediafire.com/file/yydmm8u5j ... nchmark.7z
Thanks. But how are you able to make LCZero vary its openings when it doesn't have an opening book?
David Xu
Posts: 47
Joined: Mon Oct 31, 2016 9:45 pm

Re: LCZero update

Post by David Xu »

LCZero has a "-noise" option that applies Dirichlet noise to its move selection, thereby introducing randomness into its play.
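In case it helps to see what that does: below is a minimal numpy sketch of AlphaZero-style root noise (the function name and the epsilon/alpha values are assumptions based on the AlphaZero paper, not LCZero's actual source). The noise is blended into the network's priors over the legal moves at the root, so the search starts from slightly different move probabilities each game, which is what makes the openings vary without a book.

Code:

import numpy as np

def add_root_noise(priors, epsilon=0.25, alpha=0.3):
    # Blend the network's root move priors with a sample of Dirichlet noise.
    noise = np.random.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * np.asarray(priors) + epsilon * noise

# Example: three legal moves; the noisy priors still sum to 1 but differ per game.
print(add_root_noise([0.7, 0.2, 0.1]))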