How Do You Automatically Tune Your Evaluation Tables

Tom Likens · Post by **Tom Likens** » Tue Jan 07, 2014 8:09 pm

I have a question, which I'm sure many programmers have wrestled with and likely solved. Basically, I'm curious how you automatically tune the tables in your chess program. I use CLOP as well as some homegrown solutions to tuning individual variables. Of course, CLOP also allows multiple variables to be tuned simultaneously, but the number of games required for convergence grows prodigiously, (4-5 variables usually require 40-50K games for a reliable result). This problem is even worse for a table of 64 values. For some tables, it's possible to take advantage of the table's symmetry so that you only need to tune approximately 16 values, (rotating and mirroring the results works well for certain times of evaluation terms). But for others, you really should CLOP all 64 values. Unfortunately, that will never converge in any reasonable amount of time.

One idea I've tried, with mixed success, is to CLOP only selected squares and then perform a linear or quadratic interpolation of the squares between the selected squares. Essentially, establishing a gradient field. This has kind of worked, but I'm sure there must be a better way. I'm curious, what others have tried.

regards,
--tom

rbarreira · Post by **rbarreira** » Tue Jan 07, 2014 8:12 pm

Tom Likens wrote: One idea I've tried, with mixed success, is to CLOP only selected squares and then perform a linear or quadratic interpolation of the squares between the selected squares. Essentially, establishing a gradient field. This has kind of worked, but I'm sure there must be a better way. I'm curious, what others have tried.

Have you tried doing that followed by optimizing just the squares in between? You could even iteratively optimize half the squares each time after you've done the initial step you just mentioned.

Tom Likens · Post by **Tom Likens** » Tue Jan 07, 2014 9:12 pm

rbarreira wrote:
Tom Likens wrote: One idea I've tried, with mixed success, is to CLOP only selected squares and then perform a linear or quadratic interpolation of the squares between the selected squares. Essentially, establishing a gradient field. This has kind of worked, but I'm sure there must be a better way. I'm curious, what others have tried.
Have you tried doing that followed by optimizing just the squares in between? You could even iteratively optimize half the squares each time after you've done the initial step you just mentioned.

I haven't, and it's probably worth a try. But, I'm still wondering if I'm missing something obvious. I've seen the statement a few times, that Stockfish's tables are autotuned, so they must have a methodology for it.

Daniel Shawul · Post by **Daniel Shawul** » Tue Jan 07, 2014 9:16 pm

I guess that you have to use some kind of function to interpolate over the PSQTs and then tune its parameters instead of all the values. You can either use a predetermined shape (say quadratic) and tune its parameters, or try to build a 'response surface' automatically and then optimize on the model instead of the actual data. In the past I did an optimization on a deterministic black-box function using radial basis functions to build a response surface, which proved to be much faster. IIRC CLOP itself may be using a flavor of that although in a stochastic sense since game outcomes always have some uncertainty. In any case there is no escaping the fact that you will have to reduce the parameter space somehow, which is hopefully low enough for PSQTs. How many 'key points' do you need to tune Knight's PSQT for instance? I guess that you would need the four corners, the sides and the center as well. They are symmetric so you have only 64/8=8 squares to begin with. It will be twice that if you have back-rank penalties. Also if you can build PSQT from simple equations , as Fruit does/used to do, you will be able to tune the parameters directly.

Tom Likens · Post by **Tom Likens** » Tue Jan 07, 2014 11:55 pm

Daniel Shawul wrote:I guess that you have to use some kind of function to interpolate over the PSQTs and then tune its parameters instead of all the values. You can either use a predetermined shape (say quadratic) and tune its parameters, or try to build a 'response surface' automatically and then optimize on the model instead of the actual data. In the past I did an optimization on a deterministic black-box function using radial basis functions to build a response surface, which proved to be much faster. IIRC CLOP itself may be using a flavor of that although in a stochastic sense since game outcomes always have some uncertainty. In any case there is no escaping the fact that you will have to reduce the parameter space somehow, which is hopefully low enough for PSQTs. How many 'key points' do you need to tune Knight's PSQT for instance? I guess that you would need the four corners, the sides and the center as well. They are symmetric so you have only 64/8=8 squares to begin with. It will be twice that if you have back-rank penalties. Also if you can build PSQT from simple equations , as Fruit does/used to do, you will be able to tune the parameters directly.

I've been converging (pardon the pun) to this same conclusion. It's obvious that CLOP or any other optimization technique can't be applied to all the squares independently, so some reduction of the state-space has to occur. It's interesting that Fruit uses an equation to build its piece square tables, I never knew that. I'd bet money that Fabien switched to this after wrestling with this same problem.

lucasart · Post by **lucasart** » Wed Jan 08, 2014 12:11 am

Tom Likens wrote:I have a question, which I'm sure many programmers have wrestled with and likely solved. Basically, I'm curious how you automatically tune the tables in your chess program. I use CLOP as well as some homegrown solutions to tuning individual variables. Of course, CLOP also allows multiple variables to be tuned simultaneously, but the number of games required for convergence grows prodigiously, (4-5 variables usually require 40-50K games for a reliable result). This problem is even worse for a table of 64 values. For some tables, it's possible to take advantage of the table's symmetry so that you only need to tune approximately 16 values, (rotating and mirroring the results works well for certain times of evaluation terms). But for others, you really should CLOP all 64 values. Unfortunately, that will never converge in any reasonable amount of time.

One idea I've tried, with mixed success, is to CLOP only selected squares and then perform a linear or quadratic interpolation of the squares between the selected squares. Essentially, establishing a gradient field. This has kind of worked, but I'm sure there must be a better way. I'm curious, what others have tried.

regards,
--tom

Here's how I did it:
* I started by replicating the Fruit 2.1 PST. I wrote the code myself, because I didn't like the Fruit coding style, but I replicated the functionality almost exactly.
* I hand-tuned and simplified, gradually reducing the number of parameters.
* Then I simplified even more and did some CLOP tuning.

The problem with CLOP is that the nb of games explodes with dimension. I never got any success beyond 3 parameters with CLOP. So I only tuned the most important PST, which are 1/ King 2/ Knight. And I tuned them using super simplified parametric form, with only 2 or 3 parameters.

But CLOP is really not the best solution. It's the best I know how to use, but Joona has a much better auto-tuner, and he's recently got some pretty amazing result tuning SF's King PST.

Perhaps in a theoretical sense CLOP is best, but in reality it's horribly costly, explodes with dimension. Also, CLOP always starts from scratch, it cannot just iteratively improve already good values.

Tom Likens · Post by **Tom Likens** » Wed Jan 08, 2014 3:52 am

lucasart wrote: Here's how I did it:
* I started by replicating the Fruit 2.1 PST. I wrote the code myself, because I didn't like the Fruit coding style, but I replicated the functionality almost exactly.
* I hand-tuned and simplified, gradually reducing the number of parameters.
* Then I simplified even more and did some CLOP tuning.

The problem with CLOP is that the nb of games explodes with dimension. I never got any success beyond 3 parameters with CLOP. So I only tuned the most important PST, which are 1/ King 2/ Knight. And I tuned them using super simplified parametric form, with only 2 or 3 parameters.

But CLOP is really not the best solution. It's the best I know how to use, but Joona has a much better auto-tuner, and he's recently got some pretty amazing result tuning SF's King PST.

Perhaps in a theoretical sense CLOP is best, but in reality it's horribly costly, explodes with dimension. Also, CLOP always starts from scratch, it cannot just iteratively improve already good values.

Hello Lucas,

Yes, I've done some of this myself--starting with a (semi)reasonable guess and then essentially hand tuning, while simultaneously trying to reduce the variables to something manageable. I should probably look into what Joona has done, since I'd like to auto-tune things as much as possible (what's the point of having a computer if we don't let it do the drudge work?).

I tend to agree about CLOP. I think Remi has produced some amazing software, which has revolutionized the way we develop chess programs. Bayeselo taught everyone about statistics and the staggering number of games required to measure real progress, that alone was enough to make his reputation, CLOP was simply an added bonus. I also like the CLOP interface--but, that said, I don't think CLOP is a silver bullet. It takes a *long* time to converge and once you get past a few parameters it's almost hopeless. I don't blame Remi for that, I just think it's a hard problem. One suggestion I'm going to consider is SPSA. It has the potential to do what I want and I might learn something along the way, (always a nice ancillary bonus).

regards,
--tom

Rebel · Post by **Rebel** » Wed Jan 08, 2014 7:02 am

lucasart wrote:Here's how I did it:
* I started by replicating the Fruit 2.1 PST. I wrote the code myself, because I didn't like the Fruit coding style, but I replicated the functionality almost exactly.
* I hand-tuned and simplified, gradually reducing the number of parameters.
* Then I simplified even more and did some CLOP tuning.

Out of curiosity, the Fruit PST's are reasonable so I always wondered once you have reasonable PST's what the maximum elo gain would be if you optimize them. How many elo did you get?

hgm · Post by **hgm** » Wed Jan 08, 2014 10:28 am

Indeed, this seems a legitimate worry. PST are a very course evaluation term, only expressing some very vague ideas, like centralization, advance, and staying away from edges and corners. Of course the range of the terms has to have a reasonable magnitude, so that it won't sacrifice a Rook to move a Knight out of a corner. But beyond that...

I always have the feeling that when I am doing this, I am trying to fit a circle with a square. For any part of the circle that you try to capture inside the square, by enlarging the latter, you will create new area in the square that sticks out of the circle. It seems quite pointless to try to determine the optimal square size that minimizes the non-overlapping area to 3 digits, because you will improve from poor to poor.001. While going to an octagon (= new eval term, cutting off the corners) in stead of a square would immediately do far better in reducing the non-overlapping area even if the square size was completely untuned (e.g. the smallest one that completely covered the circle). In fact starting from a square that minimized the non-overlapping areas would not do particularly well when you made it into an octagon by cutting off the corners.

lucasart · Post by **lucasart** » Wed Jan 08, 2014 10:44 am

Rebel wrote:
lucasart wrote:Here's how I did it:
* I started by replicating the Fruit 2.1 PST. I wrote the code myself, because I didn't like the Fruit coding style, but I replicated the functionality almost exactly.
* I hand-tuned and simplified, gradually reducing the number of parameters.
* Then I simplified even more and did some CLOP tuning.
Out of curiosity, the Fruit PST's are reasonable so I always wondered once you have reasonable PST's what the maximum elo gain would be if you optimize them. How many elo did you get?

I don't know, because this was the result of many patches over the years. Those were of two kinds:
1/ simplification: removing parameters and trying to compensate by making something more general, plus some hand tuning. Those did not give much, but allowed me to simplify to an extreme and have very few parameters left.
2/ CLOP tuning of the most sensitive PSQ. Those are King and Knight. The other ones are not sensitive enough, so tuning them with CLOP is hopeless. That gave most of the elo, but was made possible by preliminary step 1/.

I would have to implement the Fruit PSQ again to be able to measure, and really I can't be bothered. But my guess is that I have gained easily 20 ELO out of tuning PSQ alone.

Fruit's PSQ is (like the rest of Fruit) very well designed. But it is not tuned properly. All features have default weights that are reasonable and within the right ballpark, but that's it. They are not optimized properly. Fabien was not interested in tuning.

How Do You Automatically Tune Your Evaluation Tables

How Do You Automatically Tune Your Evaluation Tables

Re: How Do You Automatically Tune Your Evaluation Tables

Re: How Do You Automatically Tune Your Evaluation Tables

Re: How Do You Automatically Tune Your Evaluation Tables

Re: How Do You Automatically Tune Your Evaluation Tables

Re: How Do You Automatically Tune Your Evaluation Tables

Re: How Do You Automatically Tune Your Evaluation Tables

Re: How Do You Automatically Tune Your Evaluation Tables

Re: How Do You Automatically Tune Your Evaluation Tables

Re: How Do You Automatically Tune Your Evaluation Tables