
methods for tuning coefficients

Posted: Sun Oct 28, 2018 4:35 pm
by smcracraft
https://www.google.com/search?client=sa ... 8&oe=UTF-8

Are there others?

Anyone know the relative advantages and disadvantages?

Re: methods for tuning coefficients

Posted: Sun Oct 28, 2018 4:46 pm
by jdart
This was discussed (among other topics) quite recently in this thread: http://talkchess.com/forum3/viewtopic.php?f=7&t=68326.

--Jon

Re: methods for tuning coefficients

Posted: Sun Oct 28, 2018 4:54 pm
by jdart
To answer your question more directly though:

Nelder-Mead simplex is a "black box" optimization method, which means it does not rely on the gradient of the function being available. Therefore it is commonly used for nonlinear functions (but even in that domain it is not the best method).
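
As a minimal sketch of how that is typically driven (not from any engine; the eval_error objective and the "target" values below are made-up placeholders), SciPy's implementation looks like this:

# Minimal Nelder-Mead sketch using SciPy. eval_error is a placeholder: a real
# tuner would re-score a set of test positions with the candidate coefficients
# and return the total prediction error.
import numpy as np
from scipy.optimize import minimize

TARGET = np.array([1.0, 3.0, 3.25, 5.0, 9.0])    # pretend "true" piece values

def eval_error(coeffs):
    return float(np.sum((coeffs - TARGET) ** 2))

x0 = np.ones(5)                                  # starting guess
result = minimize(eval_error, x0, method="Nelder-Mead")
print(result.x)                                  # tuned coefficients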

Evaluation functions are mostly linear in their parameters, so you can compute the gradient. If you know the gradient, you can generally use methods that are faster/more efficient than black box methods.

The standard method in machine learning for very large data sets is Stochastic Gradient Descent or some variant of it.
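
A rough sketch of what that looks like for a linear eval, tuned Texel-style so that a sigmoid of the eval predicts the game result (the random data and hyperparameters here are purely illustrative):

# SGD sketch for a linear eval v = sum(C_i * F_i), tuned so that sigmoid(v)
# predicts the game result. The data below is random placeholder data; a real
# tuner would use feature vectors extracted from game positions.
import numpy as np

rng = np.random.default_rng(0)
n_positions, n_features = 10000, 8
F = rng.normal(size=(n_positions, n_features))   # one feature vector per position
R = rng.uniform(size=n_positions)                # game results scaled to [0, 1]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

C = np.zeros(n_features)                         # coefficients being tuned
lr = 0.01                                        # learning rate

for epoch in range(5):
    for i in rng.permutation(n_positions):       # one position at a time
        v = F[i] @ C                             # linear eval
        p = sigmoid(v)                           # predicted result
        # squared error (R[i] - p)^2; since the eval is linear in C the
        # gradient is just the feature vector times the chain-rule factors
        grad = -2.0 * (R[i] - p) * p * (1.0 - p) * F[i]
        C -= lr * grad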

There is also a family of so-called batch gradient algorithms that do multiple iterative passes over the whole data set. I use one of these called ADAM.
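
As a sketch of the Adam update applied to full-batch gradients of the same kind of sigmoid + squared-error loss (again with placeholder data; the beta/epsilon values are the commonly published defaults):

# Adam applied to full-batch gradients. Only the update rule is the point here;
# the data is a random placeholder.
import numpy as np

def full_batch_gradient(C, F, R):
    p = 1.0 / (1.0 + np.exp(-(F @ C)))                     # predictions
    return ((-2.0 * (R - p) * p * (1.0 - p))[:, None] * F).mean(axis=0)

def adam(C, F, R, lr=0.001, b1=0.9, b2=0.999, eps=1e-8, steps=500):
    m = np.zeros_like(C)                   # first-moment (mean) estimate
    v = np.zeros_like(C)                   # second-moment (variance) estimate
    for t in range(1, steps + 1):
        g = full_batch_gradient(C, F, R)
        m = b1 * m + (1.0 - b1) * g
        v = b2 * v + (1.0 - b2) * g * g
        m_hat = m / (1.0 - b1 ** t)        # bias correction
        v_hat = v / (1.0 - b2 ** t)
        C = C - lr * m_hat / (np.sqrt(v_hat) + eps)
    return C

rng = np.random.default_rng(0)
F = rng.normal(size=(10000, 8))
R = rng.uniform(size=10000)
C = adam(np.zeros(8), F, R)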

--Jon

Re: methods for tuning coefficients

Posted: Tue Oct 30, 2018 3:51 am
by CRoberson
Think of this:

The static position eval is of the form:
v = Sigma Ci x Fi, where the Ci are coefficients and the Fi are features.

Now compare that to the equation for an SLP (single layer perceptron = a neural network with only one weight layer):
O = Sigma Wi x Ii, where the Wi are weights and the Ii are inputs.

These equations are the same, and thus all chess programs have an SLP in them. Therefore, methods available
for training neural networks are open to chess programs. One interesting method is TDL (temporal difference learning),
which can be done via self-play, opponent play or even training positions. The nice thing about TDL is that you don't need an oracle,
because it is based on win/draw/loss.

I suggest reading the papers written by Gerald Tesauro.
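
For what it's worth, here is a rough sketch of a TD(lambda)-style update for a linear eval over one game, in the spirit of Tesauro's TD-Gammon work (the sigmoid squashing, the feature vectors and all constants are illustrative assumptions, not any particular program's method):

# Rough TD(lambda) sketch for one game: the eval is a linear combination of
# features squashed with a sigmoid, and successive predictions are pulled
# toward each other, with the final game result as the last "prediction".
import numpy as np

def td_lambda_game_update(W, features, result, alpha=0.01, lam=0.7):
    # features: list of per-position feature vectors in game order
    # result: final outcome scaled to [0, 1] (loss/draw/win)
    preds = [1.0 / (1.0 + np.exp(-(f @ W))) for f in features]
    preds.append(result)                     # terminal prediction = outcome
    dW = np.zeros_like(W)
    trace = np.zeros_like(W)                 # eligibility trace
    for t, f in enumerate(features):
        p = preds[t]
        grad = p * (1.0 - p) * f             # dP_t/dW for the sigmoid eval
        trace = lam * trace + grad
        dW += alpha * (preds[t + 1] - preds[t]) * trace
    return W + dW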

Re: methods for tuning coefficients

Posted: Tue Oct 30, 2018 5:34 am
by DustyMonkey
Pretty sure your Temporal Difference Learning is implemented incorrectly (and highly inefficiently) if you are only propagating WDL back.

TDL's "oracle" should be the engine's own search().

The idea being to push the static evaluation of the current position towards the score returned by a search of it.

If you don't see why this is TDL as often explained, consider that at some point previously the current position was a leaf node and its score was the current static evaluation.
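
As a toy sketch of that idea (search_score and features_of are stand-ins for engine internals, and the learning rate is arbitrary):

# Toy sketch: nudge a linear static eval toward the score the engine's own
# search returned for the same position. search_score and features_of are
# stand-ins for whatever the engine actually provides.
import numpy as np

def tune_toward_search(W, positions, features_of, search_score, alpha=1e-4):
    for pos in positions:
        f = features_of(pos)                    # feature vector of the position
        static = f @ W                          # current static eval
        target = search_score(pos)              # deeper search result as "oracle"
        W = W + alpha * (target - static) * f   # move static eval toward it
    return W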

W.R.T. this stuff all using similar maths, that is because all of this stuff can be described as operations on vectors. The differences manifest not at the pairwise multiplications, but in the summations. A traditional static evaluator uses a straight summation of the products, while ML techniques will frequently shape this sum somehow (sigmoid, relu, etc.), and more advanced information-theory approaches may be summing the squares of those products.
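
A three-line illustration of that point (arbitrary numbers):

# Same pairwise products, different treatment of the sum.
import numpy as np

w = np.array([1.0, -0.5, 2.0])                      # weights / coefficients
x = np.array([0.3, 0.8, -0.1])                      # inputs / features

linear_sum = np.dot(w, x)                           # traditional static eval
squashed   = 1.0 / (1.0 + np.exp(-np.dot(w, x)))    # sigmoid-shaped sum
rectified  = np.maximum(0.0, np.dot(w, x))          # relu-shaped sum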

Re: methods for tuning coefficients

Posted: Tue Oct 30, 2018 11:42 pm
by CRoberson
DustyMonkey wrote: Tue Oct 30, 2018 5:34 am Pretty sure your Temporal Difference Learning is implemented incorrectly (and highly inefficiently) if you are only propagating WDL back.

TDL's "oracle" should be the engine's own search().

The idea being to push the static evaluation of the current position towards the score returned by a search of it.

If you don't see why this is TDL as often explained, consider that at some point previously the current position was a leaf node and its score was the current static evaluation.

I think you are confusing TDL with Reinforcement Learning. RL is as you describe. The key difference with TDL is the assumption that moves made near the end of the game have more to do with the outcome than moves near the beginning. For example, in a 120-move chess game the first move c4 has little to nothing to do with losing. Of course, a weak player can make a critically bad move on move 2
(for example the classic Fool's Mate); I am referring to far better played games than that.

I think the Leela team would be better off with TDL instead of RL.
DustyMonkey wrote: Tue Oct 30, 2018 5:34 am W.R.T. this stuff all using similar maths, that is because all of this stuff can be described as operations on vectors. The differences manifest not at the pairwise multiplications, but in the summations. A traditional static evaluator uses a straight summation of the products, while ML techniques will frequently shape this sum somehow (sigmoid, relu, etc.), and more advanced information-theory approaches may be summing the squares of those products.

Here you are talking about neuron activation functions, and they are typically sigmoid or relu. However, they
can be linear; if so, that is the same as a chess program's method.

The static eval in a chess program (a good one) is a sophisticated and highly tuned expert system with
the math of an SLP with a linear activation function. Of course, some programs have some nonlinear features in them.