Troubles with Texel Tuning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Troubles with Texel Tuning

Post by Daniel Shawul »

AlvaroBegue wrote:
Yes. Just download the repository here: https://bitbucket.org/alonamaloh/ruy_tune

The file sample/tune.cpp contains an example that finds the material values. The file sample/evaluation_parameters has the initial guesses, which are all 0.
Awesome! Just tried this

Code: Select all

daniel@daniel-Satellite-C855:~/tune/ruy_tune/sample$ time ./tune
Iteration 1: fx=0.254122 xnorm=225.906 gnorm=0.00107094 step=16849.2
Iteration 2: fx=0.226642 xnorm=308.495 gnorm=0.00129121 step=1
Iteration 3: fx=0.190256 xnorm=305.099 gnorm=0.000439781 step=1
Iteration 4: fx=0.17945 xnorm=335.221 gnorm=0.000254876 step=1
Iteration 5: fx=0.160943 xnorm=430.828 gnorm=0.000239747 step=1
Iteration 6: fx=0.143313 xnorm=577.106 gnorm=0.000139124 step=1
Iteration 7: fx=0.134633 xnorm=758.539 gnorm=0.000118312 step=1
Iteration 8: fx=0.129293 xnorm=924.942 gnorm=0.000109973 step=1
Iteration 9: fx=0.126691 xnorm=1060.07 gnorm=2.79885e-05 step=1
Iteration 10: fx=0.126251 xnorm=1124.05 gnorm=1.54913e-05 step=1
Iteration 11: fx=0.125993 xnorm=1200.32 gnorm=6.48393e-06 step=1
Iteration 12: fx=0.125989 xnorm=1202.11 gnorm=3.96651e-06 step=0.0918372
Iteration 13: fx=0.125976 xnorm=1220.09 gnorm=1.36075e-06 step=1
Iteration 14: fx=0.125975 xnorm=1220.76 gnorm=2.78424e-07 step=1
Iteration 15: fx=0.125975 xnorm=1220.57 gnorm=7.5212e-08 step=0.28461
L-BFGS optimization terminated with status code = 0

real	0m53.938s
user	0m53.788s
sys	0m0.152s
I don't know why I thought it was hopeless the last time I tried to optimize piece values by regression.

Btw, have you considered using radial basis functions (RBF) to reduce the number of objective function calls? Automatic differentiation is fast, but RBF serves the same purpose with a black-box objective that does not provide the gradient.

Daniel
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: Troubles with Texel Tuning

Post by Evert »

AndrewGrant wrote:
I think I implemented SGD properly. All of my piece values make relative sense, i.e. knight ~= bishop, queen > rook, pawn < rest.

But I got rather low values for my pawns: ~60 when tuning only the piece values, and ~40 when tuning piece values + PSQT. Also, my knight in the midgame is worth as much as a rook (by piece value).

How likely is it that these values being too low or too high is just a result of the tuner compensating for my overvaluing certain pawn features? Maybe my rook bonuses are too high in the midgame, so my rook value is lowered?

Either way, I plan on opening almost every param to the tuner tonight, and we shall see.
I had something similar: I think my MG pawn value was 0.70, and I had MG rook ~ minor+pawn for a long time as well. To get end-game values correct, I needed (at least) a value for passed pawns. The MG rook value didn't improve until I added things like mobility and bonuses for piece placement.
One thing to bear in mind is that you never actually get to see the MG rook value: as soon as you get a material imbalance, you blend in some of the EG value, which should be (much) higher than that of a minor piece, so the exchange will actually be worth a bit more than the MG rook/minor difference.
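To make the blending concrete, here is a minimal tapered-eval sketch; the 0-24 phase scale and the names are just illustrative, not taken from any particular engine:

Code: Select all

// Minimal tapered-eval sketch. "phase" tracks remaining material:
// 24 with all pieces on the board, 0 with none, so any material
// imbalance immediately mixes some EG weight into the score.
int tapered_eval(int mg_score, int eg_score, int phase /* 0..24 */)
{
    return (mg_score * phase + eg_score * (24 - phase)) / 24;
}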
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Troubles with Texel Tuning

Post by jdart »

You only need a global optimizer if you believe there are multiple local minima.

I actually do not tune any piece values. They are all constant. PSTs, however, can effectively adjust the values.
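To illustrate why that works, assuming the conventional value-plus-table split (the names here are mine, not Arasan's):

Code: Select all

// With the usual material term
//     score += piece_value[p] + pst[p][sq];
// adding a constant c to all 64 entries of pst[p] has exactly the
// same effect as tuning piece_value[p] += c, so a fixed piece value
// does not constrain the tuned *effective* value.
int material_term(const int piece_value[6], const int pst[6][64],
                  int p, int sq)
{
    return piece_value[p] + pst[p][sq];
}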

--Jon
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Troubles with Texel Tuning

Post by jdart »

Daniel Shawul wrote:
Btw, have you considered using radial basis functions (RBF) to reduce the number of objective function calls? Automatic differentiation is fast, but RBF serves the same purpose with a black-box objective that does not provide the gradient.
RBF is a very good global optimization method, but it is not really designed for high-dimensional problems (30 variables is a lot for RBF). Gradient descent, on the other hand, can handle large numbers of variables, and it can readily be parallelized. Furthermore, if you have a linear function (many eval terms in chess are typically linearly related to the eval result), you can actually compute the gradient and do not have to approximate it.

--Jon
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Troubles with Texel Tuning

Post by AlvaroBegue »

jdart wrote:
Daniel Shawul wrote:
Btw, have you considered using radial basis functions (RBF) to reduce the number of objective function calls? Automatic differentiation is fast, but RBF serves the same purpose with a black-box objective that does not provide the gradient.
RBF is a very good global optimization method, but it is not really designed for high-dimensional problems (30 variables is a lot for RBF). Gradient descent, on the other hand, can handle large numbers of variables, and it can readily be parallelized. Furthermore, if you have a linear function (many eval terms in chess are typically linearly related to the eval result), you can actually compute the gradient and do not have to approximate it.

--Jon
You can [nearly] always compute the gradient. Reverse-mode automatic differentiation is the technique that enables that; it is implemented in RuyTune in the file include/autodiff.hpp. In the context of neural networks it's usually called "backpropagation", but it can be used for much more general functions than neural networks.
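For the curious, here is a toy tape-based version of the idea, far simpler than the real include/autodiff.hpp (all names here are mine): each operation records its operands and local partial derivatives on the way forward, and one backward sweep then yields the derivative of the output with respect to every input.

Code: Select all

#include <cstdio>
#include <vector>

struct Tape {
    struct Node { int p1, p2; double d1, d2; };  // parents + local partials
    std::vector<Node> nodes;
    std::vector<double> val;

    int var(double v) {                          // independent variable
        nodes.push_back({-1, -1, 0, 0});
        val.push_back(v);
        return (int)nodes.size() - 1;
    }
    int add(int a, int b) {                      // d/da = 1, d/db = 1
        nodes.push_back({a, b, 1.0, 1.0});
        val.push_back(val[a] + val[b]);
        return (int)nodes.size() - 1;
    }
    int mul(int a, int b) {                      // d/da = b, d/db = a
        nodes.push_back({a, b, val[b], val[a]});
        val.push_back(val[a] * val[b]);
        return (int)nodes.size() - 1;
    }
    // One sweep from the output back to the inputs accumulates
    // d(output)/d(node) for every node on the tape.
    std::vector<double> backward(int out) {
        std::vector<double> adj(nodes.size(), 0.0);
        adj[out] = 1.0;
        for (int i = out; i >= 0; --i) {
            if (nodes[i].p1 >= 0) adj[nodes[i].p1] += adj[i] * nodes[i].d1;
            if (nodes[i].p2 >= 0) adj[nodes[i].p2] += adj[i] * nodes[i].d2;
        }
        return adj;
    }
};

int main() {
    Tape t;
    int x = t.var(3.0), y = t.var(4.0);
    int f = t.add(t.mul(x, x), t.mul(x, y));     // f = x^2 + x*y
    std::vector<double> g = t.backward(f);
    std::printf("df/dx = %g, df/dy = %g\n", g[x], g[y]);  // 10 and 3
}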
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Troubles with Texel Tuning

Post by Daniel Shawul »

jdart wrote:
Daniel Shawul wrote:
Btw, have you considered using radial basis functions (RBF) to reduce the number of objective function calls? Automatic differentiation is fast, but RBF serves the same purpose with a black-box objective that does not provide the gradient.
RBF is a very good global optimization method, but it is not really designed for high-dimensional problems (30 variables is a lot for RBF). Gradient descent, on the other hand, can handle large numbers of variables, and it can readily be parallelized. Furthermore, if you have a linear function (many eval terms in chess are typically linearly related to the eval result), you can actually compute the gradient and do not have to approximate it.

--Jon
I think you misunderstood me.

An RBF or a neural network is used only to build a surrogate model (a response surface) of the black-box objective. After you build the model, you use it to approximate the costly objective function and, of course, its derivative (which is why I suggested it as an alternative). This is very important in engineering optimization, where the objective function may actually be a costly laboratory experiment! Also, this method can handle missing data points via interpolation on the built response surface.

It can be used with any gradient-descent, BFGS, or DFP method, as well as with global optimization methods. I remember that years ago in class an instructor gave us a black-box objective (a Windows exe) that spits out numbers, and by sampling 50 points with RBF we were able to plot a response surface that was nearly identical to the actual objective function. The optimization methods were also a ton faster when used with the surrogate model (I guess the cost of executing the black-box exe made them significantly slower than using the response surface).
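To make that concrete, here is a toy one-dimensional version (the black-box function, the Gaussian kernel, and the sample spacing are all just illustrative): sample the expensive objective at a handful of points, fit an RBF interpolant through them, then query the cheap interpolant instead of the black box.

Code: Select all

#include <cmath>
#include <cstdio>
#include <vector>

// Stand-in for the costly black box (imagine every call is an MSE
// pass over a million positions, or a lab experiment).
double expensive_objective(double x)
{
    return (x - 1.5) * (x - 1.5) + std::sin(3 * x);
}

double rbf(double a, double b)                   // Gaussian basis
{
    double r = a - b;
    return std::exp(-r * r);
}

int main()
{
    // 1. Sample the black box at a few points.
    std::vector<double> xs, fs;
    for (double x = -2; x <= 4; x += 1.0) {
        xs.push_back(x);
        fs.push_back(expensive_objective(x));
    }
    int n = (int)xs.size();

    // 2. Solve Phi * w = f for the weights, Phi[i][j] = rbf(xs[i], xs[j]).
    // Plain Gaussian elimination is fine at this size; the matrix is
    // symmetric positive definite for distinct sample points.
    std::vector<std::vector<double>> A(n, std::vector<double>(n + 1));
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) A[i][j] = rbf(xs[i], xs[j]);
        A[i][n] = fs[i];
    }
    for (int c = 0; c < n; ++c)
        for (int r = c + 1; r < n; ++r) {
            double m = A[r][c] / A[c][c];
            for (int k = c; k <= n; ++k) A[r][k] -= m * A[c][k];
        }
    std::vector<double> w(n);
    for (int i = n - 1; i >= 0; --i) {
        double s = A[i][n];
        for (int j = i + 1; j < n; ++j) s -= A[i][j] * w[j];
        w[i] = s / A[i][i];
    }

    // 3. The surrogate s(x) = sum_i w[i] * rbf(x, xs[i]) is now nearly
    // free to evaluate; an optimizer queries this instead of the box.
    double x = 1.23, s = 0;
    for (int i = 0; i < n; ++i) s += w[i] * rbf(x, xs[i]);
    std::printf("true f(%.2f) = %g, surrogate = %g\n",
                x, expensive_objective(x), s);
}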

Daniel
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Troubles with Texel Tuning

Post by jdart »

No, I am very familiar with the theory and I am aware you can combine a surrogate model with other methods.

I have actually used such a method for tuning non-eval parameters such as pruning margins.

But for eval tuning, computing the objective is not really expensive. Furthermore, the function does not have to be treated as a black box, because typically most if not all of the terms are linear. So there is no point building a surrogate model when you have access to the real model, both its values and its gradient (and second derivative, if you want that). And it is generally safe to assume the problem is convex, without multiple local minima. In this situation, gradient-based methods are both fast and accurate in finding the minimum.
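To spell out the linear case in my own notation (assuming the usual Texel-style sigmoid that maps an eval to an expected score): if position i has feature counts phi[i], so that its eval is q_i = dot(phi[i], x), then the chain rule gives the gradient of the MSE exactly, in a single pass over the positions and with no finite differences:

Code: Select all

#include <cmath>
#include <vector>

// Texel-style mapping from eval (centipawns) to expected score.
double sigmoid(double q)
{
    return 1.0 / (1.0 + std::pow(10.0, -q / 400.0));
}

// Exact gradient of E(x) = (1/N) * sum_i (R_i - sigmoid(q_i))^2,
// where q_i = dot(phi[i], x) is linear in the parameters x.
// Uses sigmoid'(q) = (ln 10 / 400) * s * (1 - s); no approximation.
std::vector<double> mse_gradient(const std::vector<std::vector<double>>& phi,
                                 const std::vector<double>& R,  // 0, 0.5, 1
                                 const std::vector<double>& x)
{
    const double k = std::log(10.0) / 400.0;
    std::vector<double> g(x.size(), 0.0);
    for (size_t i = 0; i < phi.size(); ++i) {
        double q = 0;
        for (size_t j = 0; j < x.size(); ++j) q += phi[i][j] * x[j];
        double s = sigmoid(q);
        double c = 2.0 * (s - R[i]) * k * s * (1.0 - s) / phi.size();
        for (size_t j = 0; j < x.size(); ++j) g[j] += c * phi[i][j];
    }
    return g;
}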
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Troubles with Texel Tuning

Post by Daniel Shawul »

jdart wrote:
But for eval tuning, computing the objective is not really expensive. Furthermore, the function does not have to be treated as a black box, because typically most if not all of the terms are linear. So there is no point building a surrogate model when you have access to the real model, both its values and its gradient (and second derivative, if you want that).
The surrogate model is for the mean squared error over thousands of positions, not for the result of one evaluation call (which is linear, as you said). The MSE over many positions, on the other hand, is very much non-linear. To compute the gradient numerically, you would have to call f(x) at least twice. With Alvaro's database it takes 5 seconds to compute the MSE (one call), and if I do a 1-ply search on all positions it takes minutes, so it is pretty expensive.
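Schematically, one call to f(x) looks like the loop below (everything here is a placeholder; only the shape of the computation matters). With a million positions this loop is the 5-second call, or minutes with a 1-ply search, and a forward-difference gradient needs n+1 such calls for n parameters, which is exactly the cost a surrogate model avoids.

Code: Select all

#include <cmath>
#include <vector>

struct Position {};                              // stand-in type

// Placeholder for the engine's eval (or a 1-ply search) of one
// position under the current parameter vector x.
double evaluate(const Position&, const std::vector<double>& x)
{
    return x.empty() ? 0.0 : x[0];               // stub
}

double sigmoid(double q)                         // eval -> expected score
{
    return 1.0 / (1.0 + std::pow(10.0, -q / 400.0));
}

// One full evaluation of the objective f(x): the MSE over the
// whole training set.
double mse(const std::vector<Position>& positions,
           const std::vector<double>& results,   // 0, 0.5 or 1
           const std::vector<double>& x)
{
    double e = 0;
    for (size_t i = 0; i < positions.size(); ++i) {
        double d = results[i] - sigmoid(evaluate(positions[i], x));
        e += d * d;
    }
    return e / positions.size();
}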

@Alvaro, it seems about 3% of the positions (about 30,000) in your database are <= mate_in_1. There are even some stalemated/mated positions with no moves to make. I think these should be removed, since they have nothing to do with the material on the board.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Troubles with Texel Tuning

Post by AlvaroBegue »

Daniel Shawul wrote:
@Alvaro, it seems about 3% of the positions (about 30,000) in your database are <= mate_in_1. There are even some stalemated/mated positions with no moves to make. I think these should be removed, since they have nothing to do with the material but are just a lucky placement.
There should be no positions where the king is in check. I'm not sure if the others are a problem: the evaluation function does encounter positions like these, apparently. So I don't know on what grounds they should be removed.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Troubles with Texel Tuning

Post by Daniel Shawul »

AlvaroBegue wrote:
Daniel Shawul wrote:
@Alvaro, it seems about 3% of the positions (about 30,000) in your database are <= mate_in_1. There are even some stalemated/mated positions with no moves to make. I think these should be removed, since they have nothing to do with the material but are just a lucky placement.
There should be no positions where the king is in check. I'm not sure if the others are a problem: the evaluation function does encounter positions like these, apparently. So I don't know on what grounds they should be removed.
Stalemates (a draw = 0.5) with an inferior material count should be bad for the tuner. The first few are:

1. No moves in Position 26472 Fen: 8/8/8/4bk2/5p1p/7P/1p3q2/7K w - -
2. No moves in Position 27385 Fen: 8/7k/5Q1N/8/5PP1/6K1/6P1/8 b - -
3. No moves in Position 51566 Fen: 8/8/5Q2/7k/8/6K1/8/8 b - -
4. No moves in Position 60333 Fen: 8/1p1b2pk/p6p/P6P/7K/8/6r1/8 w - -
5. No moves in Position 66519 Fen: 8/6p1/6k1/7p/5p1P/3p1P1K/4r3/8 w - -
6. No moves in Position 66578 Fen: 8/8/5q2/7K/8/5k2/8/8 w - -
7. No moves in Position 72318 Fen: 8/p5p1/1p1k4/3P4/7p/7P/r5PK/5q2 w - -
8. No moves in Position 73913 Fen: 8/2P5/P2R3p/7k/5Pp1/P5B1/3K3P/8 b - -
9. No moves in Position 76410 Fen: 6R1/8/8/8/4K3/8/8/6Bk b - -


[board]8/8/8/4bk2/5p1p/7P/1p3q2/7K w - -[/board]