I just finished writing two C++ header files that facilitate tuning evaluation parameters by trying to make the score a good predictor of the result of the game. This is similar in nature to Texel's tuning method. It's around 300 lines of code, and the only dependency is libLBFGS.
One of the header files implements reverse-mode automatic differentiation, and the other one uses it to optimize the mean squared error using L-BFGS. You don't need to know what any of that means to use the code: Just look at the example below.
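For anyone curious what the autodiff header does under the hood, here is a minimal sketch of tape-based reverse-mode AD. The names (`Var`, `Tape`, `leaf`, `gradient`) are hypothetical, not the actual RT::Variable interface, and a real implementation supports many more operations; this just shows the core idea: record local derivatives on a tape during the forward pass, then sweep backwards accumulating adjoints.

```cpp
#include <vector>

// One tape entry: indices of up to two parent nodes and the local partial
// derivatives of this node's value with respect to each parent.
struct Entry { int p0, p1; double d0, d1; };

struct Tape {
    std::vector<Entry> entries;
    int push0() { entries.push_back({0, 0, 0.0, 0.0}); return (int)entries.size() - 1; }
    int push2(int a, int b, double da, double db) {
        entries.push_back({a, b, da, db});
        return (int)entries.size() - 1;
    }
};

static Tape tape;  // one global tape, for brevity only

struct Var {
    double v;  // value computed in the forward pass
    int i;     // index of this node on the tape
};

Var leaf(double v) { return {v, tape.push0()}; }

// Each operation records d(out)/d(a) and d(out)/d(b) on the tape.
Var operator+(Var a, Var b) { return {a.v + b.v, tape.push2(a.i, b.i, 1.0, 1.0)}; }
Var operator*(Var a, Var b) { return {a.v * b.v, tape.push2(a.i, b.i, b.v, a.v)}; }

// Reverse sweep: the output's adjoint is 1; each entry distributes its
// adjoint to its parents, weighted by the stored local derivatives.
std::vector<double> gradient(Var out) {
    std::vector<double> adj(tape.entries.size(), 0.0);
    adj[out.i] = 1.0;
    for (int i = (int)tape.entries.size() - 1; i >= 0; --i) {
        const Entry &e = tape.entries[i];
        adj[e.p0] += e.d0 * adj[i];
        adj[e.p1] += e.d1 * adj[i];
    }
    return adj;
}
```

Because the score type flows through the evaluation function as a template parameter, every arithmetic operation on it gets recorded this way, and one backward sweep yields the gradient with respect to all parameters at once.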
I'll be happy to share the code and/or the database of quiescent positions and results with anyone interested. Just let me know.
The user has to provide a function that takes an EPD string and returns a score in centipawns. The catch is that the function needs to be able to use a score type that is a template parameter, so I can provide a funky type (RT::Variable) that allows me to automatically compute derivatives. If your evaluation function is just a linear combination of features and you set it up to learn the coefficients (like in the example below), this is really just doing logistic regression for you. But the code is more general, and it can tune any real-valued parameters in your evaluation function. It should even be able to train a neural network (which is what I ultimately want to do with this), although at some point performance will become an issue.
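To make the logistic-regression connection concrete, here is the shape of the objective being minimized. The exact squashing function and scale constant are my assumptions (base-10 logistic with a hypothetical K = 400, as commonly used with Texel-style tuning); the code's actual constants may differ.

```cpp
#include <cmath>

// Map a centipawn score to an expected game result in [0, 1].
// K controls how many centipawns correspond to one order of magnitude
// in the odds; 400 is a conventional choice, assumed here.
double expected_result(double score_cp, double K = 400.0) {
    return 1.0 / (1.0 + std::pow(10.0, -score_cp / K));
}

// Per-position loss: squared distance between the predicted result
// and the actual result (1, 0, or 0.5). The tuner minimizes the mean
// of this over all training positions.
double squared_error(double score_cp, double actual_result) {
    double d = expected_result(score_cp) - actual_result;
    return d * d;
}
```

With a linear evaluation this is exactly logistic regression with a squared-error loss; with a nonlinear evaluation the same objective still applies, which is why the method generalizes to arbitrary real-valued parameters.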
I also provide a mechanism to read parameters from a key-value file. The result of the optimization is written to the same file.
Here's a sample training program, which learns material values. This is a toy example, which just counts the material directly from the EPD string, but you would normally have to build a board from the EPD string and then call an evaluation function that has been appropriately turned into a template.
Code: Select all
#include "ruy_tune.hpp"
template <typename Score>
Score evaluate(std::string const &epd) {
static Score pawn_value = RT::parameter<Score>("pawn_value");
static Score knight_value = RT::parameter<Score>("knight_value");
static Score bishop_value = RT::parameter<Score>("bishop_value");
static Score rook_value = RT::parameter<Score>("rook_value");
static Score queen_value = RT::parameter<Score>("queen_value");
int count[128] = {0};
for (size_t i = 0; epd[i] != ' '; ++i)
++count[static_cast<int>(epd[i])];
return (count['P']-count['p']) * pawn_value
+ (count['N']-count['n']) * knight_value
+ (count['B']-count['b']) * bishop_value
+ (count['R']-count['r']) * rook_value
+ (count['Q']-count['q']) * queen_value;
}
int main() {
RT::train(evaluate<RT::Variable>, "quiescent_positions_with_results", "evaluation_parameters");
}
The file "quiescent_positions_with_result" looks like this:
Code: Select all
3r4/4k3/8/5p1R/8/1b2PB2/1P6/4K3 b - - 1-0
3nk2r/rp1b2pp/pR3p2/3P4/5Q2/3B1N2/5PPP/5RK1 b k - 1-0
1R6/7p/4k1pB/p1Ppn3/3K3P/8/r7/8 w - - 0-1
3R4/5B1k/2b4p/5p2/1P6/4q3/P4RPP/6K1 b - - 1/2-1/2
8/5kp1/p4n1p/3pK3/1B6/8/8/8 w - - 0-1
3q3k/1br2pp1/1p6/pP1pR1b1/3P4/P2Q2P1/1B5P/5RK1 b - - 1-0
2b1rbk1/1p1n1pp1/3B3p/6q1/2B1P3/2N2P1P/R2Q2P1/6K1 b - - 1/2-1/2
2q3k1/5pp1/p3p2p/1p6/1Q1P4/5PP1/PP2N2P/3R2K1 b - - 1-0
8/7Q/p2p1pp1/4b1k1/6r1/8/P4PP1/3R1RK1 b - - 1-0
rq3rk1/2p2ppp/p2b4/1p1Rp1BQ/4P3/1P5P/1PP2PP1/3R2K1 b - - 1-0
[... 1,336,000 more lines...]
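The trailing token of each line is the game result. It has to be mapped to the target value the squared error is computed against; this sketch shows the natural encoding (my assumption about the internals, with a hypothetical function name):

```cpp
#include <cstddef>
#include <string>

// Extract the last space-separated token of a training line and map the
// result tag to a target in [0, 1]: win for White = 1, win for Black = 0,
// draw = 0.5 (assumed encoding).
double result_value(const std::string &line) {
    std::size_t pos = line.rfind(' ');
    std::string tag = line.substr(pos + 1);
    if (tag == "1-0") return 1.0;
    if (tag == "0-1") return 0.0;
    return 0.5;  // "1/2-1/2"
}
```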
If I set all the initial values to 0, it takes 27 seconds to run to convergence on my [pretty fast] machine. The output to the terminal looks like this:
Code: Select all
Iteration 1: fx=0.383541 xnorm=21 gnorm=0.00130048 step=14292.4
Iteration 2: fx=0.2012 xnorm=271.085 gnorm=0.000403718 step=1
Iteration 3: fx=0.17345 xnorm=384.371 gnorm=0.000388558 step=1
Iteration 4: fx=0.151441 xnorm=505.641 gnorm=0.000204739 step=1
Iteration 5: fx=0.138558 xnorm=643.655 gnorm=8.16831e-05 step=1
Iteration 6: fx=0.131303 xnorm=804.649 gnorm=4.81988e-05 step=1
Iteration 7: fx=0.128105 xnorm=1000.85 gnorm=0.000111498 step=1
Iteration 8: fx=0.126405 xnorm=1109.15 gnorm=3.24418e-05 step=1
Iteration 9: fx=0.126073 xnorm=1176.96 gnorm=1.62823e-05 step=1
Iteration 10: fx=0.125991 xnorm=1214.03 gnorm=4.58934e-06 step=1
Iteration 11: fx=0.125982 xnorm=1229.57 gnorm=3.8447e-06 step=1
Iteration 12: fx=0.125977 xnorm=1227.44 gnorm=2.75394e-06 step=0.478224
Iteration 13: fx=0.125976 xnorm=1224.41 gnorm=1.44869e-06 step=1
Iteration 14: fx=0.125975 xnorm=1220.77 gnorm=3.39401e-07 step=1
Iteration 15: fx=0.125975 xnorm=1220.63 gnorm=1.21549e-07 step=0.474846
L-BFGS optimization terminated with status code = 0
The file "evaluation_parameters" is both an input and an output. After the tuning it looks like this:
Code: Select all
bishop_value 338.668
knight_value 318.914
pawn_value 100.797
queen_value 1001.77
rook_value 509.719
Any interest?