tuning for the uninformed

zenpawn · Post by **zenpawn** » Sat Dec 02, 2017 6:14 pm

jdart wrote:Texel tuning is basically a form of logistic regression. There is a large literature on this, and a lot of explanatory material online. As initially described it is slightly unorthodox because it is using a squared-error loss function to handle the 3 categories of win, loss, draw. But the general principle is still the same.

--Jon

If this was in response to my post directly above it, I have Texel tuning implemented using the local optimization function given as pseudo-code on the CPW, but it sounds like there are ways to get closer first via "gradient descent or gauss-newton". Unfortunately, what I've read of these so far was over my head.

Regarding the 3-categories (win, loss, draw) being unusual, does that mean using an oracle eval would actually be more appropriate? Would it just plugin where the result (Ri) goes or does it have to be converted to a -1 to 1 range first (using the sigmoid function?)?

jdart · Post by **jdart** » Sat Dec 02, 2017 6:56 pm

The theory of logistic regression with discrete outcomes (dependent variables) deals mainly with three cases:
1. binary (0 or 1) outcomes
2. ordinal outcomes (such as poor, average, good).
3. discrete but unordered outcomes.

None of these commonly use a squared error distance to measure goodness of fit, as the Texel method does.

However, our problem is basically 2, but with a twist: there are three possible values for a game (0, 0.5 or 1), but the values are meaningful in the sense that 0.5 is really equidistant between 0 and 1. So in that case the Texel method may be a reasonable approach, it is just not as theoretically grounded.

As for using the eval as an oracle: yes, you can do this. Your label is then the oracle's value and you regress to find the best match between predicted and the oracle value. All this changes in the whole procedure is the "loss function" that measures goodness of fit. You could use mean absolute or squared difference between predicted and oracle eval, for example.

Yet another approach is to make predicted moves based on your eval match actually played moves by an "oracle" such as a strong program or a strong human player. This approach has been used in Shogi - there is a paper by Hoki and Kaneko.

But realize that then you are really solving a different problem. The regression will make your eval conform to the oracle's eval. In effect, this is a fancy way of reverse engineering the oracle's eval. It may or may not make your eval predict game results better, or actually play better, but if the oracle is much stronger then it likely will cause improvement.

--Jon

jdart · Post by **jdart** » Sat Dec 02, 2017 9:09 pm

It is has occurred me that the correct model here might be Poisson regression (https://en.wikipedia.org/wiki/Poisson_regression). If we ran for each position a match of n games, and if the probability of a win is some value p, then the distribution of game results (x2) would be integers that could be modeled as a Poisson distribution and the parameters could be tuned using that method to model the outcomes. I don't know if this is completely valid but it seems plausible. Texel tuning as initially described is the special case of n=1.

--Jon

ZirconiumX · Post by **ZirconiumX** » Sun Dec 03, 2017 11:09 am

jdart wrote:It is has occurred me that the correct model here might be Poisson regression (https://en.wikipedia.org/wiki/Poisson_regression). If we ran for each position a match of n games, and if the probability of a win is some value p, then the distribution of game results (x2) would be integers that could be modeled as a Poisson distribution and the parameters could be tuned using that method to model the outcomes. I don't know if this is completely valid but it seems plausible. Texel tuning as initially described is the special case of n=1.

--Jon

This is only mildly related to the topic, but Jon, I think you said that Arasan used a closed-form equation for the derivatives of your eval. Would you mind going over how you calculated those? It confuses me a lot.

For example, even if I take a stupid material only tapered eval with mean-squared error, I get this:

Code: Select all

Let count_p_w, count_n_w etc be the count of pawns, knights etc for white.
Let count_p_b, count_n_b etc be the count of pawns, knights etc for black.
Let phase_p, phase_n etc be the phase weight of pawns, knights, etc for tapered eval.
Let value_p_o, value_n_o etc be the material value of pawns, knights, etc in the opening.
Let value_p_e, value_n_e etc be the material value of pawns, knights, etc in the endgame

value_o = &#40;count_p_w - count_p_b&#41; * value_p_o + &#40;count_n_w - count_n_b&#41; * value_n_o + ...
value_e = &#40;count_p_w - count_p_b&#41; * value_p_e + &#40;count_n_w - count_n_b&#41; * value_n_e + ...
total_phase = 16 * phase_p + 4 * phase_n + ...
phase = &#40;count_p_w + count_p_b&#41; * phase_p + &#40;count_n_w + count_n_b&#41; * phase_n + ...
value = (&#40;phase * value_o&#41; + (&#40;total_phase - phase&#41; * value_e&#41;) / total_phase
sigmoid = 1 / &#40;1 + 10 ** (-K*value&#41;)
error = &#40;result - sigmoid&#41; * &#40;result - sigmoid&#41;

And I have no idea how to then differentiate that, because that's a lot of variables.

AlvaroBegue · Post by **AlvaroBegue** » Sun Dec 03, 2017 2:03 pm

ZirconiumX wrote:
jdart wrote:It is has occurred me that the correct model here might be Poisson regression (https://en.wikipedia.org/wiki/Poisson_regression). If we ran for each position a match of n games, and if the probability of a win is some value p, then the distribution of game results (x2) would be integers that could be modeled as a Poisson distribution and the parameters could be tuned using that method to model the outcomes. I don't know if this is completely valid but it seems plausible. Texel tuning as initially described is the special case of n=1.

--Jon
This is only mildly related to the topic, but Jon, I think you said that Arasan used a closed-form equation for the derivatives of your eval. Would you mind going over how you calculated those? It confuses me a lot.

For example, even if I take a stupid material only tapered eval with mean-squared error, I get this:
Code: Select all
Let count_p_w, count_n_w etc be the count of pawns, knights etc for white.
Let count_p_b, count_n_b etc be the count of pawns, knights etc for black.
Let phase_p, phase_n etc be the phase weight of pawns, knights, etc for tapered eval.
Let value_p_o, value_n_o etc be the material value of pawns, knights, etc in the opening.
Let value_p_e, value_n_e etc be the material value of pawns, knights, etc in the endgame

value_o = &#40;count_p_w - count_p_b&#41; * value_p_o + &#40;count_n_w - count_n_b&#41; * value_n_o + ...
value_e = &#40;count_p_w - count_p_b&#41; * value_p_e + &#40;count_n_w - count_n_b&#41; * value_n_e + ...
total_phase = 16 * phase_p + 4 * phase_n + ...
phase = &#40;count_p_w + count_p_b&#41; * phase_p + &#40;count_n_w + count_n_b&#41; * phase_n + ...
value = (&#40;phase * value_o&#41; + (&#40;total_phase - phase&#41; * value_e&#41;) / total_phase
sigmoid = 1 / &#40;1 + 10 ** (-K*value&#41;)
error = &#40;result - sigmoid&#41; * &#40;result - sigmoid&#41;
And I have no idea how to then differentiate that, because that's a lot of variables.

I'll give you the beginning of an explanation here. If you want to see the whole derivation, look up "backpropagation".

I am going to compute the partial derivative of `error' with respect to each quantity in your formulas, `d_error/d_quantity'. We'll go from the end of the computation.

d_error/d_error = 1

Then we see this:
error = (result - sigmoid)^2

Take the derivative of both sides with respect to `sigmoid' and you get

d_error/d_sigmoid = 2 * (result - sigmoid)

The next one is:
sigmoid = 1 / (1 + 10^(-K*value))

I am interested in computing `d_error/d_value', which by the chain rule is

d_error/d_value = d_error/d_sigmoid * d_sigmoid/d_value

We have already computed `d_error/d_sigmoid' before, and `d_sigmoid/d_value' can be obtained by differentiating `sigmoid = 1 / (1 + 10^(-K*value))' with respect to d_value:

d_sigmoid/d_value = (K * log(10) * 10^(K * value))/(10^(K * value) + 1)^2

If you understand how this last step works, you can keep going back and compute the derivative of the error with respect to everything, in particular with respect to your evaluation parameters.

jdart · Post by **jdart** » Sun Dec 03, 2017 4:47 pm

If your calculus is rusty, you can use this online tool:

https://www.derivative-calculator.net/

I handle midgame/endgame weighting as follows: each parameter has an entry in an array that is a structure and includes how it should be scaled.

Each time a gradient is updated, I call a scale method that applies the scaling factor for the associated parameter.

See tune.cpp and specifically the scale method here:

https://github.com/jdart1/arasan-chess

--Jon

fierz · Post by **fierz** » Tue Dec 12, 2017 10:17 pm

Is there anyone who has a large epd file for download somewhere with the results of the games?

cheers
Martin

AlvaroBegue · Post by **AlvaroBegue** » Tue Dec 12, 2017 11:47 pm

fierz wrote:Is there anyone who has a large epd file for download somewhere with the results of the games?

Give this a try: https://bitbucket.org/alonamaloh/ruy_tu ... th_results

jdart · Post by **jdart** » Thu Dec 14, 2017 4:09 pm

I have done something similar.

It has occurred to me, though, that the current tuning method would work just the same if each position had a winning probability, not just a 0, 0.5, 1 result. So if you really want to burn a lot of CPU cycles, you could run a match of say, 10 games with each position. The result would be a training set with more accurate labels.

--Jon

AlvaroBegue · Post by **AlvaroBegue** » Thu Dec 14, 2017 4:15 pm

jdart wrote:I have done something similar.

It has occurred to me, though, that the current tuning method would work just the same if each position had a winning probability, not just a 0, 0.5, 1 result. So if you really want to burn a lot of CPU cycles, you could run a match of say, 10 games with each position. The result would be a training set with more accurate labels.

--Jon

I don't have a strong argument for why I think this, but I think if you had 10 times the CPU budget to spare, you would be better off having 10 times as many positions labelled by a single game.

tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed - Poisson?

Re: tuning for the uninformed - Poisson?

Re: tuning for the uninformed - Poisson?

Re: derivatives, scaling

Re: tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed