What does LCzero learn?

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: What does LCzero learn?

Post by Daniel Shawul »

Uri Blass wrote:
Evert wrote: What you're asking is how we can extract knowledge from the neural network. The answer is, you can't. Not easily anyway.

You ask the network to evaluate a position, and it spits out a number (or you ask it for a move and it spits out a move). How it got to that result is almost impossible to trace. You'd need to reconstruct the patterns that are encoded in the network and write them in a form that is accessible to humans. Then you need to do that for all patterns in the network, which is infeasible.

Neural networks are a black box. Simple networks can be understood fairly easily, but networks suitable for practical applications have too many connections for a human to keep track of.

Getting information like that out would be really great though. We know AlphaGo plays better Go than a human. How it does this is unknown. It would be great if we could extract the insights it has and translate them into human terms to improve our understanding of the game.

As an aside, artificial neural networks are a great tool that can be trained to perform extremely well in many applications. It's easy to forget that they're also dumb as a box of bricks. Show a human child a single drawn picture of an elephant and it can recognise elephants in other drawings, photographs and in real life. An ANN needs hundreds or thousands of images to learn that.
For now, the picture of the network's value head and policy head is too small for me to see clearly what is written in it, even when I try to increase the size of the screen.

I can understand if there is more information than humans can memorize, but the problem is that I do not even understand what type of information it learns (something that I can at least understand with tablebases, where the program simply has a score for every position of 6 or fewer pieces).

I basically do not understand what the value and policy heads are.

Looking at the article, it seems that policy is about probability: the program gives a probability to every move and updates the probabilities based on the experience of playing against itself. But I do not understand exactly by what method it evaluates probabilities in positions it has never seen before.
Here is a clear explanation of convolutional neural networks that shows how a network is able to identify hand-written digits (0-9).

https://ujjwalkarn.me/2016/08/11/intuit ... -convnets/

There is a nice visualization tool towards the end that shows what the network is doing to identify the number 8.

http://scs.ryerson.ca/~aharley/vis/conv/flat.html

To answer your questions directly:

a) What it learns is the weights of the edges connecting the neurons. It is better to think of a perceptron (one neuron) and a simple task like linear regression. A standard evaluation function is one such instance: y = sum(w_i * x_i). The weights (w_i) are, for example, the values of a pawn and a knight, and the x_i are the inputs, e.g. the number of pawns and knights. So training a neural network means finding the best fit, i.e. the best weights. That is what the weights file of leela-zero contains -- of course, in that case it has many neurons, multiple layers, convolutions, etc.
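
Here is a minimal Python sketch of that "one neuron" picture. The piece values and counts are made-up numbers just to show the form of y = sum(w_i * x_i); they are not Leela's weights.

Code:

# A linear evaluation is a single neuron: a weighted sum of input features.
# "Training" means finding the weights w_i that best fit the data.

def evaluate(features, weights):
    # y = sum(w_i * x_i), the same linear form as a classic material eval
    return sum(w * x for w, x in zip(weights, features))

# x_i: material counts for one side (pawns, knights, bishops, rooks, queens)
features = [8, 2, 2, 2, 1]
# w_i: the values the training procedure has to find (centipawns, illustrative)
weights = [100, 320, 330, 500, 900]

print(evaluate(features, weights))   # 4000 centipawns of material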

b) The policy head estimates move probabilities (useful for MCTS) and the value head returns the evaluation of the position. For training (best fitting) you need a loss function that measures how much the prediction deviates from the data, e.g. mean squared error. Note that AlphaGo used two separate NNs for policy and evaluation, but they were later merged, so you have a stack of 20 or 40 convolutional/residual blocks followed by a policy head and a value head.
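
To make the two heads concrete, here is a toy numpy sketch. The single dense layer standing in for the residual tower, the layer sizes, and the targets are all invented for illustration (1858 is just used as a move-encoding size, as in lc0); only the structure of the heads and the loss is the point.

Code:

import numpy as np

rng = np.random.default_rng(0)

trunk_out = rng.standard_normal(256)                   # features from the shared trunk
W_policy = rng.standard_normal((1858, 256)) * 0.01     # one output per encoded move
W_value = rng.standard_normal((1, 256)) * 0.01

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

policy = softmax(W_policy @ trunk_out)      # probability for every move
value = np.tanh(W_value @ trunk_out)[0]     # scalar evaluation in [-1, 1]

# Training targets: the MCTS visit distribution and the game outcome
target_policy = np.zeros(1858); target_policy[0] = 1.0   # dummy target
target_value = 1.0                                       # win = +1

loss = (np.sum(-target_policy * np.log(policy + 1e-9))  # policy: cross-entropy
        + (value - target_value) ** 2)                  # value: mean squared error
print(loss)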

The first layers of the neural network identify small features, like the existence of curves, arcs, etc., with their filters. On a chess board, which is 8x8, say you use filter sizes of 3x3. A convolutional layer slides that 3x3 pattern over the board and sees how well it matches each region; e.g. a queen on b6 next to a king on a8 might be one feature detected by a 3x3 filter. The next layers, with more convolutions, evaluate the interactions of small features that are farther apart on the board, and so on.
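
A toy example of that sliding-filter mechanic, with an invented board encoding and a trivial filter, just to show how a 3x3 window is swept over an 8x8 plane:

Code:

import numpy as np

board = np.zeros((8, 8))
board[0, 0] = 10.0     # king on a8 (row 0 = rank 8, col 0 = file a)
board[2, 1] = 5.0      # queen on b6

kernel = np.ones((3, 3))    # a filter that simply sums a 3x3 neighbourhood

response = np.zeros((6, 6))
for r in range(6):
    for c in range(6):
        patch = board[r:r+3, c:c+3]
        response[r, c] = np.sum(patch * kernel)   # how strongly this filter fires here

print(response)   # peaks where the king/queen cluster falls inside the window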

Daniel