Evaluation & Tuning in Chess Engines
Moderators: hgm, Dann Corbit, Harvey Williamson
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.

 Posts: 1288
 Joined: Tue Apr 19, 2016 4:08 am
 Location: U.S.A
 Full name: Andrew Grant
 Contact:
Evaluation & Tuning in Chess Engines
For a while now I've been thinking about taking "Texel Tuning" to the next step. For a while I've been using a naive Gradient Decent framework to tune Ethereal's evaluation terms. But this only worked for Linear terms  not King Safety or Complexity, as their derivatives are far more complex to compute. Additionally, I was hand waving an error between adjusting a terms endgame weight and having that be offset by Complexity.
So I finally bit the bullet, did the math, and implemented things. Since doing so I've pushed 3 tuning elo gainers to Ethereal. The first tuned all Linear terms in the evaluation for ~+10.0 elo. The second demonstrated that King Safety could be tuned for elo with a ~+3.4 elo patch. Finally, as a final proof of concept, I tuned all parameters at once  Linear, Safety, & Complexity  and found a gain of ~+2.3 elo.
This document has 4 sections. The first of which outlines how evaluation functions tend to work in modern alphabeta chess engines. The second section explains the original Texel Tuning ideas, and how I have refined the original ideas and dataset generation. The third section goes through and actually finds the formulas needed to compute Gradients for the entire evaluation. The fourth section shares snippets of code from Ethereal's tuner, to give a general idea of how the system is implemented.
It is still a work in progress. Probably lots of typos, missing words, whatever. I do think the maths have been checked however, which is confirmed by the success of the tuning scheme.
TL;DR: AdaGrad implementation which fully computes Gradient's for evaluation functions which are 1) Linear, or 2) Can be described as a set of Linear operations which are later adjusted by a partiallydifferentiable function. This method could be applied to most the open source engines I am familiar with. Also, a new way to build data sets for tuning.
So I finally bit the bullet, did the math, and implemented things. Since doing so I've pushed 3 tuning elo gainers to Ethereal. The first tuned all Linear terms in the evaluation for ~+10.0 elo. The second demonstrated that King Safety could be tuned for elo with a ~+3.4 elo patch. Finally, as a final proof of concept, I tuned all parameters at once  Linear, Safety, & Complexity  and found a gain of ~+2.3 elo.
This document has 4 sections. The first of which outlines how evaluation functions tend to work in modern alphabeta chess engines. The second section explains the original Texel Tuning ideas, and how I have refined the original ideas and dataset generation. The third section goes through and actually finds the formulas needed to compute Gradients for the entire evaluation. The fourth section shares snippets of code from Ethereal's tuner, to give a general idea of how the system is implemented.
It is still a work in progress. Probably lots of typos, missing words, whatever. I do think the maths have been checked however, which is confirmed by the success of the tuning scheme.
TL;DR: AdaGrad implementation which fully computes Gradient's for evaluation functions which are 1) Linear, or 2) Can be described as a set of Linear operations which are later adjusted by a partiallydifferentiable function. This method could be applied to most the open source engines I am familiar with. Also, a new way to build data sets for tuning.
Re: Evaluation & Tuning in Chess Engines
This is very cool, readable, and nicely typeset! Congrats on the progress you've made so far!

 Posts: 2246
 Joined: Wed Mar 08, 2006 7:47 pm
 Location: Hattingen, Germany
Re: Evaluation & Tuning in Chess Engines
Thank you. Another breakthrough in understanding evaluation tuning.
Re: Evaluation & Tuning in Chess Engines
In Minic I'm using a much more lazy approach. I always compute the derivative numerically moving each parameters by a little delta.
This is of course very slow compared to what you do but works for everything.
Another approach can be to use method that does not need any derivatives (like PSO for instance).
This is of course very slow compared to what you do but works for everything.
Another approach can be to use method that does not need any derivatives (like PSO for instance).
 hgm
 Posts: 26436
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
 Contact:
Re: Evaluation & Tuning in Chess Engines
For nonlinear optimization problems I have always used the method of randomly perturbing the best solution I found so far, biased in the direction that gave me successful steps in the recent past. Every succesfull step adds to the bias, and increases the random perturbations around it, every rejected step shrinks both the bias and the size of the perturbations.
Re: Evaluation & Tuning in Chess Engines
Back in 2000, having no initial idea of gradient descent coding, I tried something similar to train a two hidden layer NN Backgammon. Perturb weights, play game, if new weights win keep, else junk. Loop. Had no way of objectively testing against opposition, but it did improve and the 250,000 game version used to win against me and against a web based Backgammon.hgm wrote: ↑Mon Aug 24, 2020 9:53 amFor nonlinear optimization problems I have always used the method of randomly perturbing the best solution I found so far, biased in the direction that gave me successful steps in the recent past. Every succesfull step adds to the bias, and increases the random perturbations around it, every rejected step shrinks both the bias and the size of the perturbations.
I tried something vaguely similar for chess engine search. Maybe you have also tried? I had a list of about 50 search parameters and first shot was to play several tens of thousands of bullet games, each game on a different random parameter set. And built histograms for each parameter of winrate against parameter value. It was difficult to pick out any pattern.
Next stage (too many things to do) was the random perturbation, play game (batch), keep if wins, junk if not. I guess this will require more resources than I have, one needs to play reasonably deep games else the search is not being fully tested, and deeper games require more time. On the plus side, for eval tuning, you need a lot of positions in part because some eval features only figure occasionally, whereas for search tuning, a parameter change is almost certainly going to be game decisive, so maybe fewer games are needed to hone in on a result.
Seems likely others have already tried similar?

 Posts: 570
 Joined: Mon Jul 20, 2015 3:06 pm
 Contact:
Re: Evaluation & Tuning in Chess Engines
Congradulations, and thank you for the PDF on nonlinear tuning. The PST's for the NN methods are using an enlarged database. Has anybody tried to use a larger PST database such as:
PST [piece][square][enemy king position]
This could adapt quite well to a tuning method.
PST [piece][square][enemy king position]
This could adapt quite well to a tuning method.
Re: Evaluation & Tuning in Chess Engines
I did not read a lot of AI and mathematics papers and I have really ~0 knowedlge in both topics, but I must say that this article is the most readable one i encountered so far.
I wish all papers on this topics would be written as simple and clear as this one.
I wish all papers on this topics would be written as simple and clear as this one.
Re: Evaluation & Tuning in Chess Engines
I apologise if this comes across as critical, but I'd like to share a few quick thoughts on chess tuning.
If I denote C as a set of constants, and T as a set of position attributes to tune, and if, for the sake of writing a concise post I may be permitted to oversimplify, then, roughly speaking, the result of the tuning will be:
eval = C1*T1 + C2*T2 + C3*T3 + ... + Cn*Tn
Some issues arise:
* the constants are not going to be constant between position types (my remedy would be a polynomial relationship which would allow powers of the position attributes to multiply with each other. This would give rise to the problem of an infinite number of possible solutions, to which my remedy would be to define some rules for simplicity and then optimise for the simplest polynomial fit)
* most methods of optimising the eval expression are going to find local maxima rather than a global maximum. My remedy would be to use one of these big linear number crunchers like the type of software used to test supercomputer flops in "real world software" to find the global maximum
* it is very likely that there are going to be a large number of bestfit solutions  but common sense would tell us that there should only be one solution
* for it to be even possible to find an optimal solution, it is a mathematical certainty that the number of positions for which an evaluation exists would need to be greater than the number of position attributes to tune
* if exact numbers are used for the sample evaluations, it is very likely to be impossible to make a fit due to two (or more) expressions being incompatible. To make it work, I think the samples would need to use evaluation ranges rather than exact numbers
If I denote C as a set of constants, and T as a set of position attributes to tune, and if, for the sake of writing a concise post I may be permitted to oversimplify, then, roughly speaking, the result of the tuning will be:
eval = C1*T1 + C2*T2 + C3*T3 + ... + Cn*Tn
Some issues arise:
* the constants are not going to be constant between position types (my remedy would be a polynomial relationship which would allow powers of the position attributes to multiply with each other. This would give rise to the problem of an infinite number of possible solutions, to which my remedy would be to define some rules for simplicity and then optimise for the simplest polynomial fit)
* most methods of optimising the eval expression are going to find local maxima rather than a global maximum. My remedy would be to use one of these big linear number crunchers like the type of software used to test supercomputer flops in "real world software" to find the global maximum
* it is very likely that there are going to be a large number of bestfit solutions  but common sense would tell us that there should only be one solution
* for it to be even possible to find an optimal solution, it is a mathematical certainty that the number of positions for which an evaluation exists would need to be greater than the number of position attributes to tune
* if exact numbers are used for the sample evaluations, it is very likely to be impossible to make a fit due to two (or more) expressions being incompatible. To make it work, I think the samples would need to use evaluation ranges rather than exact numbers
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
It's not "how smart you are", it's "how are you smart".

 Posts: 4134
 Joined: Fri Mar 10, 2006 4:23 am
 Location: http://www.arasanchess.org
Re: Evaluation & Tuning in Chess Engines
Thanks, will look at this. Arasan has tuned nonlinear parameters for some time now, especially king safety terms. The tuning code explicitly calculates the gradient for these.
However, I wouldn't really recommend my code as a good model for how to do this. It is complex because the tuning parameters are maintained in a separate structure, and then these are transformed into the actual evaluation parameters read by the eval code during tuning. Then, at the end of tuning the tuner generates a code file (C++) that has the eval parameters written out as constants. There is probably a simpler way to do this and a way that doesn't spread the whole tuning apparatus across and into multiple files.
Btw. I have been recently looking at this algorithm (AdaBound) for optimization: https://www.luolc.com/publications/adabound/ but I haven't gotten it working yet.
Jon
However, I wouldn't really recommend my code as a good model for how to do this. It is complex because the tuning parameters are maintained in a separate structure, and then these are transformed into the actual evaluation parameters read by the eval code during tuning. Then, at the end of tuning the tuner generates a code file (C++) that has the eval parameters written out as constants. There is probably a simpler way to do this and a way that doesn't spread the whole tuning apparatus across and into multiple files.
Btw. I have been recently looking at this algorithm (AdaBound) for optimization: https://www.luolc.com/publications/adabound/ but I haven't gotten it working yet.
Jon