
methods for tuning coefficients

Posted: Sun Oct 28, 2018 4:35 pm
by smcracraft
https://www.google.com/search?client=sa ... 8&oe=UTF-8

Are there others?

Anyone know the relative advantages and disadvantages?

Re: methods for tuning coefficients

Posted: Sun Oct 28, 2018 4:46 pm
by jdart
This was discussed (among other topics) quite recently in this thread: http://talkchess.com/forum3/viewtopic.php?f=7&t=68326.

--Jon

Re: methods for tuning coefficients

Posted: Sun Oct 28, 2018 4:54 pm
by jdart
To answer your question more directly though:

Nelder-Mead simplex is a "black box" optimization method, which means it does not rely on the gradient of the function being available. Therefore it is commonly used for nonlinear functions (but even in that domain it is not the best method).
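
As a minimal sketch of how that is typically driven (not from any engine; the eval_error objective and the "target" values below are made-up placeholders), SciPy's implementation looks like this:

# Minimal Nelder-Mead sketch using SciPy. eval_error is a placeholder: a real
# tuner would re-score a set of test positions with the candidate coefficients
# and return the total prediction error.
import numpy as np
from scipy.optimize import minimize

TARGET = np.array([1.0, 3.0, 3.25, 5.0, 9.0])    # pretend "true" piece values

def eval_error(coeffs):
    return float(np.sum((coeffs - TARGET) ** 2))

x0 = np.ones(5)                                  # starting guess
result = minimize(eval_error, x0, method="Nelder-Mead")
print(result.x)                                  # tuned coefficients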

Evaluation functions are mostly linear in their parameters, so you can compute the gradient. If you know the gradient, you can generally use methods that are faster/more efficient than black box methods.

The standard method in machine learning for very large data sets is Stochastic Gradient Descent or some variant of it.
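
A rough sketch of what that looks like for a linear eval, tuned Texel-style so that a sigmoid of the eval predicts the game result (the random data and hyperparameters here are purely illustrative):

# SGD sketch for a linear eval v = sum(C_i * F_i), tuned so that sigmoid(v)
# predicts the game result. The data below is random placeholder data; a real
# tuner would use feature vectors extracted from game positions.
import numpy as np

rng = np.random.default_rng(0)
n_positions, n_features = 10000, 8
F = rng.normal(size=(n_positions, n_features))   # one feature vector per position
R = rng.uniform(size=n_positions)                # game results scaled to [0, 1]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

C = np.zeros(n_features)                         # coefficients being tuned
lr = 0.01                                        # learning rate

for epoch in range(5):
    for i in rng.permutation(n_positions):       # one position at a time
        v = F[i] @ C                             # linear eval
        p = sigmoid(v)                           # predicted result
        # squared error (R[i] - p)^2; since the eval is linear in C the
        # gradient is just the feature vector times the chain-rule factors
        grad = -2.0 * (R[i] - p) * p * (1.0 - p) * F[i]
        C -= lr * grad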

There is also a family of so-called batch gradient algorithms that do multiple iterative passes over the whole data set. I use one of these called ADAM.
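
As a sketch of the Adam update applied to full-batch gradients of the same kind of sigmoid + squared-error loss (again with placeholder data; the beta/epsilon values are the commonly published defaults):

# Adam applied to full-batch gradients. Only the update rule is the point here;
# the data is a random placeholder.
import numpy as np

def full_batch_gradient(C, F, R):
    p = 1.0 / (1.0 + np.exp(-(F @ C)))                     # predictions
    return ((-2.0 * (R - p) * p * (1.0 - p))[:, None] * F).mean(axis=0)

def adam(C, F, R, lr=0.001, b1=0.9, b2=0.999, eps=1e-8, steps=500):
    m = np.zeros_like(C)                   # first-moment (mean) estimate
    v = np.zeros_like(C)                   # second-moment (variance) estimate
    for t in range(1, steps + 1):
        g = full_batch_gradient(C, F, R)
        m = b1 * m + (1.0 - b1) * g
        v = b2 * v + (1.0 - b2) * g * g
        m_hat = m / (1.0 - b1 ** t)        # bias correction
        v_hat = v / (1.0 - b2 ** t)
        C = C - lr * m_hat / (np.sqrt(v_hat) + eps)
    return C

rng = np.random.default_rng(0)
F = rng.normal(size=(10000, 8))
R = rng.uniform(size=10000)
C = adam(np.zeros(8), F, R)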

--Jon

Re: methods for tuning coefficients

Posted: Tue Oct 30, 2018 3:51 am
by CRoberson
Think of this:

The static position eval is of the form:
v = Sigma Ci x Fi, where the Ci are coefficients and the Fi are features.

Now compare that to the equation for an SLP (single layer perceptron = a neural network with only one weight layer):
O = Sigma Wi x Ii, where the Wi are weights and the Ii are inputs.

These equations are the same, and thus all chess programs have an SLP in them. Therefore, methods available
for training neural networks are open to chess programs. One interesting method is TDL (temporal difference learning),
which can be done via self-play, opponent play or even training positions. The nice thing about TDL is that you don't need an oracle,
because it is based on win/draw/loss.

I suggest reading the papers written by Gerald Tesauro.
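
For what it's worth, here is a rough sketch of a TD(lambda)-style update for a linear eval over one game, in the spirit of Tesauro's TD-Gammon work (the sigmoid squashing, the feature vectors and all constants are illustrative assumptions, not any particular program's method):

# Rough TD(lambda) sketch for one game: the eval is a linear combination of
# features squashed with a sigmoid, and successive predictions are pulled
# toward each other, with the final game result as the last "prediction".
import numpy as np

def td_lambda_game_update(W, features, result, alpha=0.01, lam=0.7):
    # features: list of per-position feature vectors in game order
    # result: final outcome scaled to [0, 1] (loss/draw/win)
    preds = [1.0 / (1.0 + np.exp(-(f @ W))) for f in features]
    preds.append(result)                     # terminal prediction = outcome
    dW = np.zeros_like(W)
    trace = np.zeros_like(W)                 # eligibility trace
    for t, f in enumerate(features):
        p = preds[t]
        grad = p * (1.0 - p) * f             # dP_t/dW for the sigmoid eval
        trace = lam * trace + grad
        dW += alpha * (preds[t + 1] - preds[t]) * trace
    return W + dW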

Re: methods for tuning coefficients

Posted: Tue Oct 30, 2018 5:34 am
by DustyMonkey
Pretty sure your Temporal Difference Learning is implemented incorrectly (and highly inefficiently) if you are only propagating WDL back.

TDL's "oracle" should be the engine's own search().

The idea being to push the static evaluation of the current position towards the score returned by a search of it.

If you don't see why this is TDL as often explained, consider that at some point previously the current position was a leaf node and its score was the current static evaluation.
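
As a toy sketch of that idea (search_score and features_of are stand-ins for engine internals, and the learning rate is arbitrary):

# Toy sketch: nudge a linear static eval toward the score the engine's own
# search returned for the same position. search_score and features_of are
# stand-ins for whatever the engine actually provides.
import numpy as np

def tune_toward_search(W, positions, features_of, search_score, alpha=1e-4):
    for pos in positions:
        f = features_of(pos)                    # feature vector of the position
        static = f @ W                          # current static eval
        target = search_score(pos)              # deeper search result as "oracle"
        W = W + alpha * (target - static) * f   # move static eval toward it
    return W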

W.R.T. this stuff all using similar maths, that is because all of this stuff can be described as operations on vectors. The differences manifest not at the pairwise multiplications, but in the summations. A traditional static evaluator uses a straight summation of the products, while ML techniques will frequently shape this sum somehow (sigmoid, relu, etc.), and more advanced information-theory approaches may be summing the squares of those products.
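
A three-line illustration of that point (arbitrary numbers):

# Same pairwise products, different treatment of the sum.
import numpy as np

w = np.array([1.0, -0.5, 2.0])                      # weights / coefficients
x = np.array([0.3, 0.8, -0.1])                      # inputs / features

linear_sum = np.dot(w, x)                           # traditional static eval
squashed   = 1.0 / (1.0 + np.exp(-np.dot(w, x)))    # sigmoid-shaped sum
rectified  = np.maximum(0.0, np.dot(w, x))          # relu-shaped sum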

Re: methods for tuning coefficients

Posted: Tue Oct 30, 2018 11:42 pm
by CRoberson
DustyMonkey wrote: Tue Oct 30, 2018 5:34 am Pretty sure your Temporal Difference Learning is implemented incorrectly (and highly inefficiently) if you are only propagating WDL back.

TDL's "oracle" should be the engine's own search().

The idea being to push the static evaluation of the current position towards the score returned by a search of it.

If you don't see why this is TDL as often explained, consider that at some point previously the current position was a leaf node and its score was the current static evaluation.

I think you are confusing TDL with Reinforcement Learning. RL is as you describe. The key difference with TDL is the assumption that moves made near the end of the game have more to do with the outcome than moves near the beginning. For example, in a 120-move chess game the first move c4 has little to nothing to do with losing. Of course, a weak player can make a critically bad move on move 2
(for example the classic Fool's Mate); I am referring to far better played games than that.

I think the Leela team would be better off with TDL instead of RL.
DustyMonkey wrote: Tue Oct 30, 2018 5:34 am W.R.T. this stuff all using similar maths, that is because all of this stuff can be described as operations on vectors. The differences manifest not at the pairwise multiplications, but in the summations. A traditional static evaluator uses a straight summation of the products, while ML techniques will frequently shape this sum somehow (sigmoid, relu, etc.), and more advanced information-theory approaches may be summing the squares of those products.

Here you are talking about neuron activation functions, and they are typically sigmoid or relu. However, they
can be linear; if so, that is the same as a chess program's method.

The static eval in a chess program (a good one) is a sophisticated and highly tuned expert system with
the math of an SLP with a linear activation function. Of course, some programs have some nonlinear features in them.