LCZero update (2)

Rein Halbersma
Posts: 749
Joined: Tue May 22, 2007 11:13 am

LCZero update (2)

Post by Rein Halbersma »

[Moderation] This thread was split off from the original LCZero update thread ( http://talkchess.com/forum/viewtopic.ph ... &start=260 ), and meant to continue the discussion, because the other was getting unmanageably long.
lucasart wrote: Sorry if that's a dumb newbie question. I'm not familiar at all with NN.

But wouldn't it save a lot of time to:
* first train the net on high-quality games (i.e. Stockfish level)
* then start with a reasonably strong net and improve it with reinforcement learning?
The original AlphaGo approach was indeed to train a NN using supervised learning on top-level human games, and then to improve this NN with self-play using reinforcement learning (it's a bit more complicated than that, since they had separate NNs for move selection and position evaluation).

The AlphaGoZero and AlphaZero approaches start from scratch using self-play reinforcement learning. The claim is that this ultimately leads to better results. Apparently, top-level human play has biases that prevent or slow down progress beyond a certain level of play. Another surprising claim is that the full self-play algorithm actually requires fewer games to reach the top level.
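For what it's worth, the mechanism behind the "zero" setup can be summarized by its training loss, which (per the AlphaZero paper, L2 regularization omitted) combines a value term against the self-play game outcome z and a policy term against the MCTS visit distribution pi. A minimal sketch on dummy numbers, just to make the idea concrete:

Code:

import numpy as np

def zero_loss(z, v, pi, p):
    # (z - v)^2: squared error between game outcome and the net's value head.
    value_loss = (z - v) ** 2
    # -pi . log(p): cross-entropy between the search policy and the net's priors.
    policy_loss = -np.sum(pi * np.log(p))
    return value_loss + policy_loss

# Dummy example: game lost (z = -1), net predicted v = 0.2, three legal moves.
pi = np.array([0.6, 0.3, 0.1])   # MCTS visit distribution from self-play
p  = np.array([0.5, 0.3, 0.2])   # network's prior over the same moves
print(zero_loss(-1.0, 0.2, pi, p))

In the supervised variant the policy target comes from human (or Stockfish) moves instead of the net's own search, which is exactly where the bias discussed below can creep in.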
Uri Blass
Posts: 10798
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: LCZero update

Post by Uri Blass »

Rein Halbersma wrote:
lucasart wrote: Sorry if that's a dumb newbie question. I'm not familiar at all with NN.

But wouldn't it save a lot of time to:
* first train the net on high-quality games (i.e. Stockfish level)
* then start with a reasonably strong net and improve it with reinforcement learning?
The original AlphaGo approach was indeed to train a NN using supervised learning on top-level human games, and then to improve this NN with self-play using reinforcement learning (it's a bit more complicated than that, since they had separate NNs for move selection and position evaluation).

The AlphaGoZero and AlphaZero approaches start from scratch using self-play reinforcement learning. The claim is that this ultimately leads to better results. Apparently, top-level human play has biases that prevent or slow down progress beyond a certain level of play. Another surprising claim is that the full self-play algorithm actually requires fewer games to reach the top level.
If bias just means a difference from perfect play, then random play has a bigger bias than top human level.

Fixing the errors of random play seems to me a harder task than fixing the errors of top humans.
David Xu
Posts: 47
Joined: Mon Oct 31, 2016 9:45 pm

Re: LCZero update

Post by David Xu »

Uri Blass wrote:
Rein Halbersma wrote:
lucasart wrote: Sorry if that's a dumb newbie question. I'm not familiar at all with NN.

But wouldn't it save a lot of time to:
* first train the net on high-quality games (i.e. Stockfish level)
* then start with a reasonably strong net and improve it with reinforcement learning?
The original AlphaGo approach was indeed to train a NN using supervised learning on top-level human games, and then to improve this NN with self-play using reinforcement learning (it's a bit more complicated than that, since they had separate NNs for move selection and position evaluation).

The AlphaGoZero and AlphaZero approaches start from scratch using self-play reinforcement learning. The claim is that this ultimately leads to better results. Apparently, top-level human play has biases that prevent or slow down progress beyond a certain level of play. Another surprising claim is that the full self-play algorithm actually requires fewer games to reach the top level.
If bias just means a difference from perfect play, then random play has a bigger bias than top human level.

Fixing the errors of random play seems to me a harder task than fixing the errors of top humans.
"Bias" does not mean the same thing as "imperfection". A bias is a systematic difference from imperfection, i.e. a difference that has a directional component. This is pertinent because it could cause the network to get stuck in a local minimum some distance away from the global optimum.
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: LCZero update

Post by CMCanavessi »

In the end, TSCP proved to still be too much for poor Leela Gen 20, even at 40/40. The match ended 8-2 in favor of TSCP.

I'll upload the full pgn in a minute
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: LCZero update

Post by CMCanavessi »

Here are the 10 games: http://www.mediafire.com/file/yydmm8u5j ... nchmark.7z
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
Werewolf
Posts: 1996
Joined: Thu Sep 18, 2008 10:24 pm

Re: LCZero update

Post by Werewolf »

Sorry to hear that, Carlos. But your engine will get stronger and catch up.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: LCZero update

Post by Milos »

CMCanavessi wrote:In the end, TSCP proved to still be too much for poor Leela Gen 20, even at 40/40. The match ended 8-2 in favor of TSCP.

I'll upload the full pgn in a minute
TSCP is about 1700 Elo on CCRL 40/4. Error margins are huge, but an 8-2 score means at least a 200 Elo difference in TSCP's favor.
That means LeelaZero is more than 2000 Elo behind SF9.
Still an extremely long way to go.
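For reference, under the standard logistic Elo model an expected score s corresponds to a rating difference of -400 * log10(1/s - 1), so an 8-2 result (80%) works out to roughly 240 Elo as a point estimate. A small sketch of that conversion (plain Python, not tied to CCRL's or anyone else's actual tooling):

Code:

import math

def elo_diff(score_fraction):
    # Rating difference implied by an expected score (0 < s < 1),
    # using the logistic model E = 1 / (1 + 10^(-d/400)).
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)

print(round(elo_diff(8 / 10)))  # 8-2 match -> ~241 Elo in TSCP's favor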
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: LCZero update

Post by JJJ »

I can still win against it at 200 ms per move, but I wouldn't try at a longer time control; I wouldn't stand a chance.
Jhoravi
Posts: 291
Joined: Wed May 08, 2013 6:49 am

Re: LCZero update

Post by Jhoravi »

CMCanavessi wrote:Here are the 10 games: http://www.mediafire.com/file/yydmm8u5j ... nchmark.7z
Thanks. But how are you able to make LCZero vary its openings when it doesn't have an opening book?
David Xu
Posts: 47
Joined: Mon Oct 31, 2016 9:45 pm

Re: LCZero update

Post by David Xu »

LCZero has a "-noise" option that applies Dirichlet noise to its move selection, thereby introducing randomness into its play.
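In case it helps to see what that does: below is a minimal numpy sketch of AlphaZero-style root noise (the function name and the epsilon/alpha values are assumptions based on the AlphaZero paper, not LCZero's actual source). The noise is blended into the network's priors over the legal moves at the root, so the search starts from slightly different move probabilities each game, which is what makes the openings vary without a book.

Code:

import numpy as np

def add_root_noise(priors, epsilon=0.25, alpha=0.3):
    # Blend the network's root move priors with a sample of Dirichlet noise.
    noise = np.random.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * np.asarray(priors) + epsilon * noise

# Example: three legal moves; the noisy priors still sum to 1 but differ per game.
print(add_root_noise([0.7, 0.2, 0.1]))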