A Simple Alpha(Go) Zero Tutorial

BeyondCritics
Posts: 346
Joined: Sat May 05, 2012 12:48 pm
Location: Bergheim

A Simple Alpha(Go) Zero Tutorial

Post by BeyondCritics » Sat Dec 30, 2017 12:31 am


brianr
Posts: 356
Joined: Thu Mar 09, 2006 2:01 pm

Re: A Simple Alpha(Go) Zero Tutorial

Post by brianr » Sat Dec 30, 2017 1:27 pm

Great find. Thank you.

For follow-up, note the links (from the post above):
https://github.com/suragnair/alpha-zero-general

Downloadable paper:
https://github.com/suragnair/alpha-zero ... riteup.pdf

TommyTC
Posts: 18
Joined: Thu Mar 30, 2017 6:52 am

Re: A Simple Alpha(Go) Zero Tutorial

Post by TommyTC » Sun Dec 31, 2017 5:03 pm

"It assumes basic familiarity with machine learning and reinforcement learning concepts, and should be accessible if you understand neural network basics and Monte Carlo Tree Search. "

I guess "simple" is in the mind of the beholder :)

Henk
Posts: 5819
Joined: Mon May 27, 2013 8:31 am

Re: A Simple Alpha(Go) Zero Tutorial

Post by Henk » Wed Jan 03, 2018 3:51 pm

I'm using -1 + 2 / (1 + Exp(-sum)) in the output layer to get v(s) values in [-1, 1], but Exp is now consuming most of the processing time.

Are there faster alternatives?

Daniel Shawul
Posts: 3757
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

Re: A Simple Alpha(Go) Zero Tutorial

Post by Daniel Shawul » Wed Jan 03, 2018 3:56 pm

ReLU is used after the convolution steps, which leads to faster convergence and also faster computation time, but you are bound to sigmoid or tanh in the fully connected layers.
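
For reference, a minimal C sketch of the two activations being contrasted here (just an illustration, not anyone's actual engine code). Note that Henk's -1 + 2 / (1 + Exp(-sum)) is mathematically the same function as tanh(sum/2), so a library tanh call is one drop-in alternative:

[code]
#include <math.h>

/* ReLU: just a compare and a select, no transcendental call. */
static double relu(double x) {
    return x > 0.0 ? x : 0.0;
}

/* Output squashing to [-1, 1]: -1 + 2/(1 + exp(-x)) equals tanh(x/2). */
static double squash(double x) {
    return tanh(0.5 * x);
}
[/code]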

User avatar
hgm
Posts: 23718
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: A Simple Alpha(Go) Zero Tutorial

Post by hgm » Wed Jan 03, 2018 3:59 pm

Henk wrote:I'm using -1 + 2 / (1 + Exp(-sum)) in the output layer to get v(s) values in [-1, 1], but Exp is now consuming most of the processing time.

Are there faster alternatives?
Just tabulate the function, so that it requires only an array access.
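
A minimal sketch of what that could look like in C (table size and input range are arbitrary choices here, not from the post): fill the table once, then map the input onto an index by scaling and clamping, so the per-call cost is an array access instead of an exp. The scale-and-clamp step is exactly the "make it discrete" conversion Henk asks about below.

[code]
#include <math.h>

#define TABLE_SIZE 4096
#define X_MIN (-8.0)   /* outside roughly this range the function is saturated anyway */
#define X_MAX ( 8.0)

static double sigmoid_table[TABLE_SIZE];

/* Fill the table once at start-up. */
static void init_sigmoid_table(void) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        double x = X_MIN + (X_MAX - X_MIN) * i / (TABLE_SIZE - 1);
        sigmoid_table[i] = -1.0 + 2.0 / (1.0 + exp(-x));
    }
}

/* Per evaluation: scale, clamp, and one array access. */
static double fast_sigmoid(double sum) {
    int i = (int)((sum - X_MIN) * (TABLE_SIZE - 1) / (X_MAX - X_MIN));
    if (i < 0) i = 0;
    if (i >= TABLE_SIZE) i = TABLE_SIZE - 1;
    return sigmoid_table[i];
}
[/code]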

Henk
Posts: 5819
Joined: Mon May 27, 2013 8:31 am

Re: A Simple Alpha(Go) Zero Tutorial

Post by Henk » Wed Jan 03, 2018 4:05 pm

Daniel Shawul wrote:ReLU is used after the convolution steps, which leads to faster convergence and also faster computation time, but you are bound to sigmoid or tanh in the fully connected layers.
I haven't started implementing the convolution steps yet. Those layers might make it even much slower, so better not to optimize yet.

Henk
Posts: 5819
Joined: Mon May 27, 2013 8:31 am

Re: A Simple Alpha(Go) Zero Tutorial

Post by Henk » Wed Jan 03, 2018 4:07 pm

hgm wrote:
Henk wrote:I'm using -1 + 2 / (1 + Exp(-sum)) in the output layer to get v(s) values in [-1, 1], but Exp is now consuming most of the processing time.

Are there faster alternatives?
Just tabulate the function, so that it requires only an array access.
The argument is a double. Or do you mean making it discrete: first convert it into an integer and then do a lookup to get an approximation?

User avatar
hgm
Posts: 23718
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: A Simple Alpha(Go) Zero Tutorial

Post by hgm » Wed Jan 03, 2018 4:52 pm

For inference, 8-bit integers seem to be enough for cell outputs; at least that is what the Google gen-1 TPUs use. Only for the back-propagation during training is better precision needed. Of course this means that the weight × output products can be 16 bit, and a number of those will be summed to act as input to the sigmoid layer.
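
A sketch of what that integer arithmetic could look like for one cell (the bit widths follow the post; everything else, such as the rescaling step, is left out):

[code]
#include <stdint.h>

/* One cell's pre-activation: 8-bit weights and inputs, 16-bit products,
   accumulated in a wider integer before being rescaled and fed to the sigmoid. */
static int32_t dot_int8(const int8_t *w, const int8_t *x, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int16_t)w[i] * (int16_t)x[i];
    return acc;
}
[/code]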

But it should not cause any problems to do a piece-wise linear approximation of the sigmoid. E.g. quantize the input into 256 intervals, and tabulate both the function and its derivative in each interval.

The coarsest approximation of the sigmoid would be to just clip f(x) = x at -1 and +1. Even that might work.
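
A sketch of the piece-wise linear variant (256 intervals as suggested; the input range of [-8, 8] is an assumption, not from the post), with the hard clip as fallback outside the tabulated range:

[code]
#include <math.h>

#define N_INTERVALS 256
#define X_MIN (-8.0)
#define X_MAX ( 8.0)

static double seg_value[N_INTERVALS];   /* f at the left edge of each interval */
static double seg_slope[N_INTERVALS];   /* slope of f across the interval      */

static double f(double x) { return -1.0 + 2.0 / (1.0 + exp(-x)); }

static void init_pwl(void) {
    double width = (X_MAX - X_MIN) / N_INTERVALS;
    for (int i = 0; i < N_INTERVALS; i++) {
        double left = X_MIN + i * width;
        seg_value[i] = f(left);
        seg_slope[i] = (f(left + width) - f(left)) / width;
    }
}

static double pwl_sigmoid(double x) {
    if (x <= X_MIN) return -1.0;   /* clip at the ends */
    if (x >= X_MAX) return  1.0;
    double width = (X_MAX - X_MIN) / N_INTERVALS;
    int i = (int)((x - X_MIN) / width);
    return seg_value[i] + seg_slope[i] * (x - (X_MIN + i * width));
}
[/code]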

Henk
Posts: 5819
Joined: Mon May 27, 2013 8:31 am

Re: A Simple Alpha(Go) Zero Tutorial

Post by Henk » Fri Jan 05, 2018 3:22 pm

In Monte Carlo Tree Search I'm using PUCT, but I don't know what would be a reasonable value for the exploration constant C (degree of exploration). Would it be more like 0.9 or 0.1, or something else?
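
For reference, the PUCT selection term from the AlphaGo Zero paper as a small sketch (the helper name and parameter layout are just for illustration):

[code]
#include <math.h>

/* PUCT score of one child move:
   Q(s,a) + C * P(s,a) * sqrt(N(s)) / (1 + N(s,a)).
   A larger C explores more; a smaller C trusts Q(s,a) sooner. */
static double puct_score(double q, double prior,
                         int parent_visits, int child_visits, double c) {
    return q + c * prior * sqrt((double)parent_visits) / (1.0 + child_visits);
}
[/code]

Values on the order of 1 are a common starting point, but the best value generally has to be tuned per game, since it trades exploration against trusting the value estimates.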
