ZirconiumX wrote: jdart wrote:
It has occurred to me that the correct model here might be Poisson regression (https://en.wikipedia.org/wiki/Poisson_regression). If we ran, for each position, a match of n games, and the probability of a win were some value p, then the game results (x2) would be integers that could be modeled as a Poisson distribution, and the parameters could be tuned using that method to model the outcomes. I don't know if this is completely valid, but it seems plausible. Texel tuning as initially described is the special case of n=1.
This is only mildly related to the topic, but Jon, I think you said that Arasan used a closed-form equation for the derivatives of your eval. Would you mind going over how you calculated those? It confuses me a lot.
For example, even if I take a stupid material-only tapered eval with mean-squared error, I get this:
Let count_p_w, count_n_w etc be the count of pawns, knights etc for white.
Let count_p_b, count_n_b etc be the count of pawns, knights etc for black.
Let phase_p, phase_n etc be the phase weight of pawns, knights, etc for tapered eval.
Let value_p_o, value_n_o etc be the material value of pawns, knights, etc in the opening.
Let value_p_e, value_n_e etc be the material value of pawns, knights, etc in the endgame
value_o = (count_p_w - count_p_b) * value_p_o + (count_n_w - count_n_b) * value_n_o + ...
value_e = (count_p_w - count_p_b) * value_p_e + (count_n_w - count_n_b) * value_n_e + ...
total_phase = 16 * phase_p + 4 * phase_n + ...
phase = (count_p_w + count_p_b) * phase_p + (count_n_w + count_n_b) * phase_n + ...
value = ((phase * value_o) + ((total_phase - phase) * value_e)) / total_phase
sigmoid = 1 / (1 + 10 ** (-K*value))
error = (result - sigmoid) * (result - sigmoid)
And I have no idea how to then differentiate that, because that's a lot of variables.
I'll give you the beginning of an explanation here. If you want to see the whole derivation, look up "backpropagation".
I am going to compute the partial derivative of `error' with respect to each quantity in your formulas, `d_error/d_quantity'. We'll work backwards from the end of the computation.
d_error/d_error = 1
Then we see this:
error = (result - sigmoid)^2
Take the derivative of both sides with respect to `sigmoid' and you get
d_error/d_sigmoid = -2 * (result - sigmoid)
The next one is:
sigmoid = 1 / (1 + 10^(-K*value))
I am interested in computing `d_error/d_value', which by the chain rule is
d_error/d_value = d_error/d_sigmoid * d_sigmoid/d_value
We have already computed `d_error/d_sigmoid' before, and `d_sigmoid/d_value' can be obtained by differentiating `sigmoid = 1 / (1 + 10^(-K*value))' with respect to `value' (here `log' is the natural logarithm):
d_sigmoid/d_value = (K * log(10) * 10^(K * value))/(10^(K * value) + 1)^2
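If you don't trust a derivative like this one, a quick sanity check is to compare it against a finite difference. Here is a minimal Python sketch of that check. The value of K and the function names are my own choices for illustration, not anything from an actual engine. It also shows that the same derivative can be written as K * log(10) * sigmoid * (1 - sigmoid), which is usually more convenient in code.

```python
import math

K = 0.005  # hypothetical scaling constant, picked only for this illustration


def sigmoid(value):
    """sigmoid = 1 / (1 + 10^(-K*value)), as in the formulas above."""
    return 1.0 / (1.0 + 10.0 ** (-K * value))


def d_sigmoid_d_value(value):
    """Closed form: K * log(10) * 10^(K*value) / (10^(K*value) + 1)^2."""
    t = 10.0 ** (K * value)
    return K * math.log(10.0) * t / (t + 1.0) ** 2


def d_sigmoid_d_value_alt(value):
    """Equivalent form written in terms of the sigmoid itself."""
    s = sigmoid(value)
    return K * math.log(10.0) * s * (1.0 - s)


def numeric_derivative(f, x, h=1e-3):
    """Central finite difference, used only to check the formulas."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for value in (-300.0, 0.0, 150.0):
        print(value,
              d_sigmoid_d_value(value),
              d_sigmoid_d_value_alt(value),
              numeric_derivative(sigmoid, value))
```

The three numbers printed on each line should agree to many digits. If they don't, the hand derivation (or the code) has a mistake somewhere.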
If you understand how this last step works, you can keep going back and compute the derivative of the error with respect to everything, in particular with respect to your evaluation parameters.
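To make "keep going back" concrete, here is a rough Python sketch that carries the chain rule all the way to the opening and endgame material values of the material-only tapered eval above. Several things are assumptions of mine, not taken from any engine: the value of K, the phase weights, the dictionary layout, and the function name `error_and_gradient'. I also treat the phase weights as fixed constants rather than tunable parameters, which keeps the derivatives simple.

```python
import math

K = 0.005  # hypothetical scaling constant
PIECES = ("p", "n", "b", "r", "q")
PHASE = {"p": 0, "n": 1, "b": 1, "r": 2, "q": 4}        # hypothetical phase weights
START_COUNT = {"p": 16, "n": 4, "b": 4, "r": 4, "q": 2}  # both sides combined
TOTAL_PHASE = sum(START_COUNT[x] * PHASE[x] for x in PIECES)


def error_and_gradient(white, black, result, value_o, value_e):
    """Return (error, d_error/d_value_o, d_error/d_value_e) for one position."""
    # Forward pass: the same formulas as in the question.
    diff = {x: white[x] - black[x] for x in PIECES}
    phase = sum((white[x] + black[x]) * PHASE[x] for x in PIECES)
    v_o = sum(diff[x] * value_o[x] for x in PIECES)
    v_e = sum(diff[x] * value_e[x] for x in PIECES)
    value = (phase * v_o + (TOTAL_PHASE - phase) * v_e) / TOTAL_PHASE
    sigmoid = 1.0 / (1.0 + 10.0 ** (-K * value))
    error = (result - sigmoid) ** 2

    # Backward pass: chain rule, starting from the end of the computation.
    # The minus sign comes from differentiating (result - sigmoid).
    d_sigmoid = -2.0 * (result - sigmoid)
    d_value = d_sigmoid * K * math.log(10.0) * sigmoid * (1.0 - sigmoid)
    d_v_o = d_value * phase / TOTAL_PHASE
    d_v_e = d_value * (TOTAL_PHASE - phase) / TOTAL_PHASE
    # v_o and v_e are linear in the piece values, so the last step is
    # just a multiplication by the material difference.
    grad_o = {x: d_v_o * diff[x] for x in PIECES}
    grad_e = {x: d_v_e * diff[x] for x in PIECES}
    return error, grad_o, grad_e


if __name__ == "__main__":
    white = {"p": 8, "n": 2, "b": 2, "r": 2, "q": 1}
    black = {"p": 7, "n": 2, "b": 1, "r": 2, "q": 1}
    value_o = {"p": 100.0, "n": 320.0, "b": 330.0, "r": 500.0, "q": 900.0}
    value_e = {"p": 120.0, "n": 300.0, "b": 320.0, "r": 520.0, "q": 950.0}
    print(error_and_gradient(white, black, 0.5, value_o, value_e))
```

Summing these per-position gradients over the whole training set gives the gradient of the total error, which is what a gradient-descent tuner needs. A piece type with equal counts on both sides gets a zero derivative for that position, which matches intuition: that position tells you nothing about that piece's value.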