I just finished writing two C++ header files that facilitate tuning evaluation parameters by trying to make the score a good predictor of the result of the game. This is similar in nature to Texel's tuning method. It's around 300 lines of code, and the only dependency is libLBFGS.
One of the header files implements reverse-mode automatic differentiation, and the other one uses it to optimize the mean squared error using L-BFGS. You don't need to know what any of that means to use the code: Just look at the example below.
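For anyone curious what the autodiff header does under the hood, here is a minimal sketch of tape-based reverse-mode AD. The names (`Var`, `Tape`, `leaf`, `gradient`) are hypothetical, not the actual RT::Variable interface, and a real implementation supports many more operations; this just shows the core idea: record local derivatives on a tape during the forward pass, then sweep backwards accumulating adjoints.

```cpp
#include <vector>

// One tape entry: indices of up to two parent nodes and the local partial
// derivatives of this node's value with respect to each parent.
struct Entry { int p0, p1; double d0, d1; };

struct Tape {
    std::vector<Entry> entries;
    int push0() { entries.push_back({0, 0, 0.0, 0.0}); return (int)entries.size() - 1; }
    int push2(int a, int b, double da, double db) {
        entries.push_back({a, b, da, db});
        return (int)entries.size() - 1;
    }
};

static Tape tape;  // one global tape, for brevity only

struct Var {
    double v;  // value computed in the forward pass
    int i;     // index of this node on the tape
};

Var leaf(double v) { return {v, tape.push0()}; }

// Each operation records d(out)/d(a) and d(out)/d(b) on the tape.
Var operator+(Var a, Var b) { return {a.v + b.v, tape.push2(a.i, b.i, 1.0, 1.0)}; }
Var operator*(Var a, Var b) { return {a.v * b.v, tape.push2(a.i, b.i, b.v, a.v)}; }

// Reverse sweep: the output's adjoint is 1; each entry distributes its
// adjoint to its parents, weighted by the stored local derivatives.
std::vector<double> gradient(Var out) {
    std::vector<double> adj(tape.entries.size(), 0.0);
    adj[out.i] = 1.0;
    for (int i = (int)tape.entries.size() - 1; i >= 0; --i) {
        const Entry &e = tape.entries[i];
        adj[e.p0] += e.d0 * adj[i];
        adj[e.p1] += e.d1 * adj[i];
    }
    return adj;
}
```

Because the score type flows through the evaluation function as a template parameter, every arithmetic operation on it gets recorded this way, and one backward sweep yields the gradient with respect to all parameters at once.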
I'll be happy to share the code and/or the database of quiescent positions and results with anyone interested. Just let me know.
The user has to provide a function that takes an EPD string and returns a score in centipawns. The catch is that the function needs to be able to use a score type that is a template parameter, so I can provide a funky type (RT::Variable) that allows me to automatically compute derivatives. If your evaluation function is just a linear combination of features and you set it up to learn the coefficients (like in the example below), this is really just doing logistic regression for you. But the code is more general, and it can tune any real-valued parameters in your evaluation function. It should even be able to train a neural network (which is what I ultimately want to do with this), although at some point performance will become an issue.
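To make the logistic-regression connection concrete, here is the shape of the objective being minimized. The exact squashing function and scale constant are my assumptions (base-10 logistic with a hypothetical K = 400, as commonly used with Texel-style tuning); the code's actual constants may differ.

```cpp
#include <cmath>

// Map a centipawn score to an expected game result in [0, 1].
// K controls how many centipawns correspond to one order of magnitude
// in the odds; 400 is a conventional choice, assumed here.
double expected_result(double score_cp, double K = 400.0) {
    return 1.0 / (1.0 + std::pow(10.0, -score_cp / K));
}

// Per-position loss: squared distance between the predicted result
// and the actual result (1, 0, or 0.5). The tuner minimizes the mean
// of this over all training positions.
double squared_error(double score_cp, double actual_result) {
    double d = expected_result(score_cp) - actual_result;
    return d * d;
}
```

With a linear evaluation this is exactly logistic regression with a squared-error loss; with a nonlinear evaluation the same objective still applies, which is why the method generalizes to arbitrary real-valued parameters.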
I also provide a mechanism to read parameters from a key-value file. The result of the optimization is written to the same file.
Here's a sample training program, which learns material values. This is a toy example, which just counts the material directly from the EPD string, but you would normally have to build a board from the EPD string and then call an evaluation function that has been appropriately turned into a template.
Code: Select all
#include "ruy_tune.hpp"
template <typename Score>
Score evaluate(std::string const &epd) {
static Score pawn_value = RT::parameter<Score>("pawn_value");
static Score knight_value = RT::parameter<Score>("knight_value");
static Score bishop_value = RT::parameter<Score>("bishop_value");
static Score rook_value = RT::parameter<Score>("rook_value");
static Score queen_value = RT::parameter<Score>("queen_value");
int count[128] = {0};
for (size_t i = 0; epd[i] != ' '; ++i)
++count[static_cast<int>(epd[i])];
return (count['P']-count['p']) * pawn_value
+ (count['N']-count['n']) * knight_value
+ (count['B']-count['b']) * bishop_value
+ (count['R']-count['r']) * rook_value
+ (count['Q']-count['q']) * queen_value;
}
int main() {
RT::train(evaluate<RT::Variable>, "quiescent_positions_with_results", "evaluation_parameters");
}
The file "quiescent_positions_with_result" looks like this:
Code: Select all
3r4/4k3/8/5p1R/8/1b2PB2/1P6/4K3 b - - 1-0
3nk2r/rp1b2pp/pR3p2/3P4/5Q2/3B1N2/5PPP/5RK1 b k - 1-0
1R6/7p/4k1pB/p1Ppn3/3K3P/8/r7/8 w - - 0-1
3R4/5B1k/2b4p/5p2/1P6/4q3/P4RPP/6K1 b - - 1/2-1/2
8/5kp1/p4n1p/3pK3/1B6/8/8/8 w - - 0-1
3q3k/1br2pp1/1p6/pP1pR1b1/3P4/P2Q2P1/1B5P/5RK1 b - - 1-0
2b1rbk1/1p1n1pp1/3B3p/6q1/2B1P3/2N2P1P/R2Q2P1/6K1 b - - 1/2-1/2
2q3k1/5pp1/p3p2p/1p6/1Q1P4/5PP1/PP2N2P/3R2K1 b - - 1-0
8/7Q/p2p1pp1/4b1k1/6r1/8/P4PP1/3R1RK1 b - - 1-0
rq3rk1/2p2ppp/p2b4/1p1Rp1BQ/4P3/1P5P/1PP2PP1/3R2K1 b - - 1-0
[... 1,336,000 more lines...]
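The trailing token of each line is the game result. It has to be mapped to the target value the squared error is computed against; this sketch shows the natural encoding (my assumption about the internals, with a hypothetical function name):

```cpp
#include <cstddef>
#include <string>

// Extract the last space-separated token of a training line and map the
// result tag to a target in [0, 1]: win for White = 1, win for Black = 0,
// draw = 0.5 (assumed encoding).
double result_value(const std::string &line) {
    std::size_t pos = line.rfind(' ');
    std::string tag = line.substr(pos + 1);
    if (tag == "1-0") return 1.0;
    if (tag == "0-1") return 0.0;
    return 0.5;  // "1/2-1/2"
}
```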
If I set all the initial values to 0, it takes 27 seconds to run to convergence on my [pretty fast] machine. The output to the terminal looks like this:
Code: Select all
Iteration 1: fx=0.383541 xnorm=21 gnorm=0.00130048 step=14292.4
Iteration 2: fx=0.2012 xnorm=271.085 gnorm=0.000403718 step=1
Iteration 3: fx=0.17345 xnorm=384.371 gnorm=0.000388558 step=1
Iteration 4: fx=0.151441 xnorm=505.641 gnorm=0.000204739 step=1
Iteration 5: fx=0.138558 xnorm=643.655 gnorm=8.16831e-05 step=1
Iteration 6: fx=0.131303 xnorm=804.649 gnorm=4.81988e-05 step=1
Iteration 7: fx=0.128105 xnorm=1000.85 gnorm=0.000111498 step=1
Iteration 8: fx=0.126405 xnorm=1109.15 gnorm=3.24418e-05 step=1
Iteration 9: fx=0.126073 xnorm=1176.96 gnorm=1.62823e-05 step=1
Iteration 10: fx=0.125991 xnorm=1214.03 gnorm=4.58934e-06 step=1
Iteration 11: fx=0.125982 xnorm=1229.57 gnorm=3.8447e-06 step=1
Iteration 12: fx=0.125977 xnorm=1227.44 gnorm=2.75394e-06 step=0.478224
Iteration 13: fx=0.125976 xnorm=1224.41 gnorm=1.44869e-06 step=1
Iteration 14: fx=0.125975 xnorm=1220.77 gnorm=3.39401e-07 step=1
Iteration 15: fx=0.125975 xnorm=1220.63 gnorm=1.21549e-07 step=0.474846
L-BFGS optimization terminated with status code = 0
The file "evaluation_parameters" is both an input and an output. After the tuning it looks like this:
Code: Select all
bishop_value 338.668
knight_value 318.914
pawn_value 100.797
queen_value 1001.77
rook_value 509.719
Any interest?