2-part series that explains a0 better... 12/19/2017 ...

pilgrimdan
Posts: 405
Joined: Sat Jul 02, 2011 10:49 pm

2-part series that explains a0 better... 12/19/2017 ...

Post by pilgrimdan »

https://www.chess.com/article/view/how- ... play-chess

How Does AlphaZero Play Chess?

after nine (9) hours and 44 million games of split-personality chess...

AlphaZero had (very possibly) taught itself enough to become the greatest chess player, silicon- or carbon-based, of all time...

An engine using pure MCTS would evaluate a position by generating a number of move sequences (called “playouts”)...

and averaging the final scores (win/draw/loss) that they yield...
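To make the "pure MCTS" idea in the excerpt concrete, here is a minimal sketch of evaluating a position by averaging random playouts. It is not chess: the game is a toy take-away game (take 1 to 3 stones, whoever takes the last stone wins), chosen only because it fits in a few lines. The function names and the default of 200 playouts are assumptions made for illustration and do not come from the article.

```python
import random

def random_playout(stones, rng):
    """Play random moves to the end of the game.

    Returns +1 if the player to move in the starting position wins,
    -1 otherwise. Requires stones >= 1.
    """
    player = 0  # 0 = the side to move in the starting position
    while True:
        take = rng.randint(1, min(3, stones))
        stones -= take
        if stones == 0:
            return 1 if player == 0 else -1
        player = 1 - player

def evaluate(stones, playouts=200, seed=0):
    """Average the final scores of many random playouts from one position."""
    rng = random.Random(seed)
    return sum(random_playout(stones, rng) for _ in range(playouts)) / playouts

def best_move(stones, playouts=200, seed=0):
    """Score every legal move by the average playout result after making it."""
    scores = {}
    for take in range(1, min(3, stones) + 1):
        if stones - take == 0:
            scores[take] = 1.0  # taking the last stone wins on the spot
        else:
            # The opponent is to move afterwards, so their score is negated.
            scores[take] = -evaluate(stones - take, playouts, seed)
    return max(scores, key=scores.get), scores
```

Calling `best_move(3)` scores the three legal moves and picks the one with the highest average. Because the playouts are random, a position with one precise winning line can easily be misjudged, which is the weakness the article mentions further down.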

AlphaZero creates 800 playouts on each move...

It also augments pure MCTS by preferring moves that it has not tried (much) already...

it's really creating semi-random playouts...

Notice that so far there’s absolutely nothing chess-specific in what AlphaZero is doing...

One weakness of MCTS is that since it’s based on creating semi-random playouts...

it can get it completely wrong in tense positions where there is one precise line of optimal play...

---------------------------

didn't see a link to part-2 ...
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: 2-part series that explains a0 better... 12/19/2017 ...

Post by CheckersGuy »

This article is a little misleading. There are no random playouts, or even semi-random playouts, as in a normal MCTS tree. Once a leaf node is reached, they call the NN and back up the score (no random playouts at all).
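A rough sketch of that point: one simulation walks down the tree, evaluates the leaf with the network, and backs the value up, with no playout anywhere. The selection rule below (mean value plus an exploration bonus that shrinks as a move gets visited) is along the lines of what was described for AlphaGo Zero; the constant `c_puct=1.5`, the `Node` class, and the toy game with its stand-in "network" are all assumptions made for illustration, not DeepMind's code.

```python
import math

class Node:
    """One position in the search tree."""
    def __init__(self, prior):
        self.prior = prior        # move probability suggested by the network
        self.visits = 0
        self.value_sum = 0.0      # summed values, from this node's side to move
        self.children = {}        # move -> Node

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """Return the (move, child) pair with the best value-plus-exploration score."""
    def score(child):
        # Exploration bonus: large for rarely visited children, shrinks with visits.
        bonus = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        # A child's value is from the opponent's point of view, hence the minus.
        return -child.mean_value() + bonus
    return max(node.children.items(), key=lambda item: score(item[1]))

def simulate(root, state, network, legal_moves, play):
    """Descend to a leaf, ask the network for a value, back it up. No playout."""
    node, path = root, [root]
    while node.children:
        move, node = select_child(node)
        state = play(state, move)
        path.append(node)
    priors, value = network(state)      # value is for the side to move at the leaf
    for move in legal_moves(state):
        node.children[move] = Node(priors[move])
    for visited in reversed(path):
        visited.visits += 1
        visited.value_sum += value
        value = -value                  # switch sides one ply up

# Hypothetical toy game (take 1-3 stones, last stone wins) and stand-in network.
def toy_moves(stones):
    return list(range(1, min(3, stones) + 1))

def toy_play(stones, take):
    return stones - take

def toy_network(stones):
    moves = toy_moves(stones)
    if not moves:
        return {}, -1.0                 # no stones left: side to move has lost
    return {m: 1.0 / len(moves) for m in moves}, 0.0

def search(stones, simulations=200):
    """Run a number of simulations from one position and return the root node."""
    root = Node(prior=1.0)
    for _ in range(simulations):
        simulate(root, stones, toy_network, toy_moves, toy_play)
    return root
```

After `root = search(5)`, the visit counts in `root.children` show where the search spent its effort. A real engine would replace `toy_network` with the trained NN and the toy game with chess move generation.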
pilgrimdan
Posts: 405
Joined: Sat Jul 02, 2011 10:49 pm

Re: 2-part series that explains a0 better... 12/19/2017 ...

Post by pilgrimdan »

CheckersGuy wrote:This article is a little misleading. There are no random playouts, or even semi-random playouts, as in a normal MCTS tree. Once a leaf node is reached, they call the NN and back up the score (no random playouts at all).
okay... you may well be right...

wish i knew more about NN's...

they seem very non-intuitive... (at least to me)

I am starting to read a little bit here and there about NN's...

but they are hard for me to understand...
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: 2-part series that explains a0 better... 12/19/2017 ...

Post by hgm »

Note that the NN is just one part of the algorithm, and MCTS is the other. And the NN is conceptually a very easy-to-understand part, while the MCTS is comparatively complex.

A NN is nothing but a collection of connected 'cells' (neurons). The connections are directional, i.e. connect an output of one cell to an input of another. The connections are not completely passive transmissions, but have a 'weight', i.e. they multiply the signal output by one cell with the weight before giving it to the target cell. A cell typically sums all its inputs, and then applies some non-linear operation on this (threshold detection, one or two-sided clipping, doing some non-linear arithmetic on the sum), and uses that as output.

So it is just a collection of cells sending signals to each other. Some cell inputs and some cell outputs are connected to the external world, and act as inputs and outputs of the entire NN. The connections are fixed for a given network, but the weight of the connections can be altered during training.
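As a minimal sketch of the two paragraphs above: a cell multiplies each input by the weight of its connection, sums the results, and applies a non-linear operation. The function names are made up for illustration, and one-sided clipping (often called ReLU) is just one of the non-linearities mentioned.

```python
def clip_below_zero(x):
    """One-sided clipping: negative sums become 0."""
    return max(0.0, x)

def cell_output(inputs, weights, activation=clip_below_zero):
    """Weighted sum of the inputs, followed by a non-linear operation."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return activation(total)

def layer_output(inputs, weight_rows, activation=clip_below_zero):
    """A layer is several cells looking at the same inputs, each with its own weights."""
    return [cell_output(inputs, row, activation) for row in weight_rows]
```

For example, `layer_output([1.0, 2.0], [[0.5, 1.0], [0.5, -1.0]])` gives `[2.5, 0.0]`: the second cell's sum is -1.5, which is clipped to 0. Training only changes the numbers in `weight_rows`; the wiring stays the same.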

The AlphaZero network is somewhat similar to the processing of visual information in the human brain: it is organized in layers of cells, each layer having an 8x8 organization, repeating the same set of cells and connection pattern 64 times, once for each board square. E.g. the input part of the network is a layer that for each square has a cell for each (colored) piece type (which will give an output of 1 if a piece of that type is on that square, and 0 otherwise), repeated a few times for previous positions (to make it possible to recognize repetitions), plus a cell identifying the side to move, plus a cell indicating the value of the 50-move counter, plus 4 cells for castling rights. This connects to a second layer, where there are 256 cells for each square, and each cell gets input not just from all the piece-type cells in the corresponding square of the input layer, but also from those cells in the 8 immediately adjacent squares. So each of the 256 cells in the second layer can recognize a different pattern of piece types in the 3x3 area of the board centered on that square. The same 256 patterns will be recognized in each of the 8x8 'squares' of the second layer (i.e. they use the same weights in their input connections). Such a pattern that repeats for every location in the input field is called a filter in NN jargon.
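A small sketch of one such filter, in plain Python. It builds 0/1 input planes from a board and slides a single set of 3x3 weights over all 64 squares. The piece codes (`"K"`, `"P"`), the function names, and the treatment of off-board neighbours as zeros are assumptions made for illustration; the real input has many more planes, and the second layer would run 256 filters like this one.

```python
def one_hot_planes(board, piece_types):
    """board: 8x8 list of piece codes (or None). Returns one 0/1 plane per piece type."""
    return [[[1.0 if board[r][c] == p else 0.0 for c in range(8)]
             for r in range(8)]
            for p in piece_types]

def apply_filter(planes, weights):
    """Apply one 3x3 filter to every square, using the same weights everywhere.

    planes:  [plane][row][col] input values
    weights: [plane][3][3] connection weights for the 3x3 neighbourhood
    """
    out = [[0.0] * 8 for _ in range(8)]
    for r in range(8):
        for c in range(8):
            total = 0.0
            for p, plane in enumerate(planes):
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < 8 and 0 <= cc < 8:  # off-board counts as 0
                            total += weights[p][dr + 1][dc + 1] * plane[rr][cc]
            out[r][c] = max(0.0, total)  # one-sided clipping
    return out

# Example: a filter that responds to a king with a pawn on the row index above it.
board = [[None] * 8 for _ in range(8)]
board[7][4] = "K"
board[6][4] = "P"
planes = one_hot_planes(board, ["K", "P"])
king_weights = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # king on the centre square
pawn_weights = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]   # pawn one row up
response = apply_filter(planes, [king_weights, pawn_weights])
```

Here `response[7][4]` is 2.0 (both parts of the pattern are present) and `response[0][0]` is 0.0. Moving the king and pawn elsewhere moves the response with them, since the weights are shared by all squares.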

This is then followed by many more (19 or 39) very similar layers, which recognize patterns in the patterns of the previous layer, or of the layer before that (connections skipping one layer). The cells in the last layer are associated with moves. These layers still have an 8x8 structure, and can, for instance, have for each square a cell whose activity indicates the piece you have to move, and a cell for each piece type whose activity indicates that you have to move the corresponding piece to it. (This was a bit simpler in Go, where you just have to indicate the square where the stone should be dropped, so that you just needed a single cell in each square of the output layer.)
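The "connections skipping one layer" part can be sketched as follows: the input of a pair of layers is added back to their output, so a later layer sees both the new patterns and the ones from two layers earlier. This is a simplified illustration on flat lists of numbers; `layer_a` and `layer_b` are placeholders for the real 8x8 filter layers, and the function names are made up.

```python
def clip_list(values):
    """One-sided clipping applied to every value."""
    return [max(0.0, v) for v in values]

def skip_block(x, layer_a, layer_b):
    """Two layers in a row, with the block's input added to the second layer's output."""
    hidden = clip_list(layer_a(x))
    out = layer_b(hidden)
    return clip_list([o + xi for o, xi in zip(out, x)])  # the skip connection
```

If both layers output only zeros, `skip_block` simply passes its input through unchanged (for non-negative inputs), which is one reason such stacks of many layers remain trainable.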

The AlphaZero NN also calculates an evaluation, which does not have an 8x8 structure, but is a single output depending on patterns in all the squares. So it also has layers with connections to all cells of the previous layer (not just to neighboring squares), for that purpose.
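A minimal sketch of such an evaluation output: one cell connected to every cell of the previous layer, squashed into the range -1 (loss) to +1 (win) with `tanh`. Collapsing this to a single fully connected cell is a simplification made for illustration, and the function name and weights layout are assumptions.

```python
import math

def evaluation_output(planes, weights):
    """Single output that depends on every cell in every square.

    planes:  [plane][row][col] activities of the previous layer
    weights: one weight per cell, in the same flattened order
    """
    flat = [v for plane in planes for row in plane for v in row]
    total = sum(w * x for w, x in zip(weights, flat))
    return math.tanh(total)  # always strictly between -1 and +1
```

Unlike the 3x3 filters, a change on any square of the board can move this number, which is what an evaluation of the whole position needs.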
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: 2-part series that explains a0 better... 12/19/2017 ...

Post by corres »

I think EVERYBODY would have to read your post before they take part in the debate about AlphaZero.
pilgrimdan
Posts: 405
Joined: Sat Jul 02, 2011 10:49 pm

Re: 2-part series that explains a0 better... 12/19/2017 ...

Post by pilgrimdan »

corres wrote:
I think EVERYBODY would have to read your post before they take part in the debate about AlphaZero.
okay ... thanks HG ... will need to read that many times over ... and over a span of a couple of days ...

the first reading through ... might just as well have been Greek ... so this will take time for it to sink in ...

but thanks for taking the time to post it ...
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: 2-part series that explains a0 better... 12/19/2017 ...

Post by CheckersGuy »

That sums up neural networks quite well, but I would suggest that everyone get into the math of neural networks. The machine learning course by Andrew Ng (I think you can find it on Coursera) is quite good and makes a good introduction.

The term "neural network" is sometimes a little bit fuzzy. It's all about math, and not even of the complex kind :lol:
pilgrimdan
Posts: 405
Joined: Sat Jul 02, 2011 10:49 pm

Re: 2-part series that explains a0 better... 12/19/2017 ...

Post by pilgrimdan »

CheckersGuy wrote:That sums up neural networks quite well, but I would suggest that everyone get into the math of neural networks. The machine learning course by Andrew Ng (I think you can find it on Coursera) is quite good and makes a good introduction.

The term "neural network" is sometimes a little bit fuzzy. It's all about math, and not even of the complex kind :lol:
corrections to the original article...

'Corrections: AlphaZero creates a number of playouts on each move, not 800. That was during training. And, AlphaZero contented itself with about 0.1 percent of what Stockfish examined per second, not 99.89 percent fewer.'