How Do You Automatically Tune Your Evaluation Tables

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: The texel evaluation function optimization algorithm

Post by mvk »

petero2 wrote:2. Instead of just looping over all parameters repeatedly in the same order, I keep statistics about how often and by how much changing a parameter has improved E. Parameters with a good history are tried more often than other parameters. (I don't know how much this really helps but it seemed to be a reasonable approach.)
I do this also, as follows: First I loop over all parameters and do a hill climb on each one. I don't look for the top, but bail out as soon as there is any improvement, and then continue with the next parameter. After all parameters are done, I sort them by the improvement I found. Then I halve the population, keeping only the most improving parameters, and loop over the reduced population again. Then I halve again, and so on, until the population is 1. Then I go back to iterating over the whole set again, and so on.

This gives volatile parameters more computation time than the stable ones, and it allows correlated parameters to evolve together more quickly.
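
A rough sketch of this schedule in Python (illustrative only, not mvk's actual code; it assumes a black-box objective eval_error(params) to be minimized, such as the error measure E discussed in this thread):

```python
def improve_once(params, i, eval_error, best, step=1):
    """Try +/-step on parameter i and bail out at the first improvement.
    Returns the size of the improvement (0.0 if neither direction helps)."""
    for delta in (step, -step):
        params[i] += delta
        e = eval_error(params)
        if e < best:
            return best - e            # keep the change, report the gain
        params[i] -= delta             # revert and try the other direction
    return 0.0

def tune(params, eval_error, restarts=10):
    best = eval_error(params)
    for _ in range(restarts):          # go back to the whole set again
        population = list(range(len(params)))
        while True:
            gains = []
            for i in population:       # one improving step per parameter
                g = improve_once(params, i, eval_error, best)
                best -= g
                gains.append((g, i))
            if len(population) == 1:
                break                  # reduced to one parameter: restart
            gains.sort(reverse=True)   # most-improving parameters first
            population = [i for _, i in gains[:len(gains) // 2]]
    return params, best
```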
[Account deleted]
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: The texel evaluation function optimization algorithm

Post by jdart »

I think this is quite interesting and am planning to give it a try.

I think though there are some parameters that are probably not going to be effectively tuned by this method. One example is the classic trade down bonus. If you are very far ahead in material, your score is always going to predict a win and you will win a very large percentage of the time. The trade down bonus will scarcely affect that. But it does help you convert the advantage to a win faster.

--Jon
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: The texel evaluation function optimization algorithm

Post by AlvaroBegue »

I don't see why the trade-down bonus would be an exception to this. I believe the trade-down bonus captures the fact that positions with the same material imbalance but fewer pieces on the board are won more often. If that's the case, the type of tuning we have described would work just fine.
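
That claim is straightforward to check against game data. A rough sketch, assuming the python-chess library and a PGN file of games (the file name, piece values, and the rook-up example are placeholders):

```python
# At a fixed material imbalance, are positions with fewer pieces on the
# board won more often? Bucket every position by (imbalance, piece count)
# and compare win rates.
from collections import defaultdict
import chess
import chess.pgn

VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
          chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material_balance(board):
    return sum(VALUES[p.piece_type] * (1 if p.color == chess.WHITE else -1)
               for p in board.piece_map().values())

stats = defaultdict(lambda: [0.0, 0])   # (imbalance, pieces) -> [wins, games]
with open("games.pgn") as f:
    while (game := chess.pgn.read_game(f)) is not None:
        result = {"1-0": 1.0, "0-1": 0.0,
                  "1/2-1/2": 0.5}.get(game.headers.get("Result"))
        if result is None:
            continue
        board = game.board()
        for move in game.mainline_moves():
            board.push(move)
            key = (material_balance(board), len(board.piece_map()))
            stats[key][0] += result
            stats[key][1] += 1

# e.g. win rate when White is a rook up, by number of pieces left:
for (bal, n), (w, g) in sorted(stats.items()):
    if bal == 5 and g >= 100:
        print(f"{n:2d} pieces: {w / g:.2f} over {g} games")
```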
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: How Do You Automatically Tune Your Evaluation Tables

Post by jdart »

To clarify: If you have a queen advantage, for example, you should reduce the number of opponent pieces through trades if possible. But you will probably win with a queen advantage even if you don't do this. So varying the bonus for trades will change the winning percentage very little, and your tuning may just give some random value. But in fact the bonus may help you reduce the distance to a win even if it does not improve the winning percentage.

--Jon
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: The texel evaluation function optimization algorithm

Post by petero2 »

jdart wrote:I think this is quite interesting and am planning to give it a try.

I think though there are some parameters that are probably not going to be effectively tuned by this method. One example is the classic trade down bonus. If you are very far ahead in material, your score is always going to predict a win and you will win a very large percentage of the time. The trade down bonus will scarcely affect that. But it does help you convert the advantage to a win faster.
My program has a piece trade bonus and I did include it in the optimization. After optimization the bonus was adjusted such that if you are one queen ahead and trade a rook, the trade bonus is about 1cp. So the bonus is currently very close to zero. I can see at least three possible reasons for this result:

1. The optimization does not care about how quickly a game is won, only the final result counts. So if the piece trade bonus does not improve winning chances but does improve the average number of moves to mate, the optimization will think the bonus is useless.

2. Maybe the piece trade bonus does not make sense for computers. When I learned to play chess, the motivation I heard for the trade bonus was to simplify the game and give the opponent fewer opportunities to create complications and set traps. It is quite possible that this aspect does not apply to computer programs that are tactically very strong.

3. My program has other evaluation terms that behave similarly to the piece trade bonus and may make the piece trade bonus redundant. For example, I have a term for "redundancy of major pieces", which gives the side with more major pieces a bonus of ~75cp for trading a major piece. Also the bishop pair bonus grows when minor pieces are traded.

An effect similar to the piece trade bonus problem would appear if you tried to tune the evaluation function used to guide the search in KBNK endgames. (My program does have such a function, but I don't include its weights in the optimization.) The optimal evaluation function would return +infinity for all positions in this endgame, except in the rare cases where the defending side can quickly capture one of the pieces. However, an evaluation function that always returned +infinity would of course be completely useless for guiding the search in this endgame.
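
For readers arriving mid-thread: the E being minimized here is the mean squared difference between each game's final result and a sigmoid of the quiescence score of each position. A minimal sketch (qsearch stands in for the engine's quiescence search; the scaling constant K is chosen so that E is minimized for the untuned parameter values):

```python
def win_prob(score_cp, k):
    """Logistic mapping from a centipawn score to an expected score:
    1 / (1 + 10^(-K*s/400))."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

def E(positions, results, qsearch, k):
    """Mean squared error between final game results (1, 0.5, 0) and the
    predicted expected score. Only the final result enters, which is why a
    term that merely shortens wins (like a trade bonus) cannot reduce E."""
    return sum((r - win_prob(qsearch(p), k)) ** 2
               for p, r in zip(positions, results)) / len(positions)
```

This also makes the KBNK point concrete: an evaluation that returned +infinity for every won KBNK position would drive the sigmoid to 1.0 and make its contribution to E essentially zero, while remaining useless for making progress in the search.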
Antonio Torrecillas
Posts: 90
Joined: Sun Nov 02, 2008 4:43 pm
Location: Barcelona

Re: The texel evaluation function optimization algorithm

Post by Antonio Torrecillas »

petero2 wrote:An effect similar to the piece trade bonus problem would appear if you tried to tune the evaluation function used to guide the search in KBNK endgames. (My program does have such a function, but I don't include its weights in the optimization.) The optimal evaluation function would return +infinity for all positions in this endgame, except in the rare cases where the defending side can quickly capture one of the pieces. However, an evaluation function that always returned +infinity would of course be completely useless for guiding the search in this endgame.
You can add some sense of progress by simply setting a linear increase from 0.5 at the starting position up to the final value (1.0/0.5/0.0) at the final position of the game.
I'm running a test to see whether this approach leads to a better result than assigning the final value to all positions.
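
A sketch of that labelling scheme (the function name is illustrative; the result is taken from White's point of view):

```python
def position_targets(num_positions, result):
    """Label one game's positions with a linear ramp from 0.5 at the first
    position to the final result (1.0 / 0.5 / 0.0) at the last, instead of
    assigning every position the final result."""
    if num_positions == 1:
        return [result]
    return [0.5 + (result - 0.5) * i / (num_positions - 1)
            for i in range(num_positions)]

# A 5-position White win is labelled 0.5, 0.625, 0.75, 0.875, 1.0:
print(position_targets(5, 1.0))
```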

I've verified that you can split a test set and tune with each chunk in sequence; this method yields a parameter set just as good as tuning with the whole set (ten sets of one million positions versus a single set of ten million). So what I do is run a small tournament, generate the test set, tune, and repeat until there is no improvement, roughly as in the loop sketched below. This effectively avoids overtuning.
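
One way that outer loop might look, with play_tournament, generate_positions and tune_on passed in as stand-ins for the steps just described (hypothetical names, not a real API):

```python
def tuning_cycle(params, play_tournament, generate_positions, tune_on,
                 chunk_size=1_000_000):
    """Tune on a fresh chunk of positions, keep the new parameter set only
    if it is stronger in a small tournament, stop at the first failure."""
    best_strength = play_tournament(params)
    while True:
        candidate = tune_on(params, generate_positions(params, chunk_size))
        strength = play_tournament(candidate)
        if strength <= best_strength:
            return params              # keep the last improving set
        params, best_strength = candidate, strength
```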

Another thing that seems to work better is to use varied opponents, including stronger (>+100 Elo) and weaker (<-100 Elo) engines. I try to ensure a good mix of balanced and unbalanced positions in the test set. If the parameter set drifts away from reasonable values, the stronger engines quickly punish it and the tuning system relearns the correct values.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: The texel evaluation function optimization algorithm

Post by jdart »

Sounds like Miguel is doing it a different way, using a large fixed set of positions from computer games (I infer; not necessarily his own engine's). I am not sure whether this is better than continually re-generating the test set from computer matches. But it is simpler.

--Jon
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: The texel evaluation function optimization algorithm

Post by jdart »

Antonio Torrecillas wrote:You can add some sense of progress by simply setting a linear increase from 0.5 at the starting position up to the final value (1.0/0.5/0.0) at the final position of the game.
That is an idea, but I think a better approach would be to actually play several games (ideally with different engines) from each position and obtain the winning percentage. The problem is that, with the tuning method as initially outlined, most positions are represented by only one game (unless they are within range of opening books), so you don't know how strongly the goodness of the position is related to the outcome of the game.

Obtaining the win percentage for a large set of positions over all game phases would be computationally expensive, even with fast time control games. But I think you'd wind up with better tuning if you could minimize the difference between the predicted score and the measured winning percentage.
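
A sketch of that sampling step (play_game is a placeholder that plays one fast game from the given FEN between two engines and returns 1, 0.5 or 0 from White's point of view; all names are illustrative):

```python
def empirical_targets(fens, engine_pairs, play_game, games_per_position=16):
    """Replace the single game result per position with an empirical
    expected score measured over several fast games, cycling through
    different engine pairings to reduce engine-specific bias."""
    targets = {}
    for fen in fens:
        total = 0.0
        for g in range(games_per_position):
            white, black = engine_pairs[g % len(engine_pairs)]
            total += play_game(fen, white, black)
        targets[fen] = total / games_per_position
    return targets
```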

--Jon
Antonio Torrecillas
Posts: 90
Joined: Sun Nov 02, 2008 4:43 pm
Location: Barcelona

Re: The texel evaluation function optimization algorithm

Post by Antonio Torrecillas »

jdart wrote:
Antonio Torrecillas wrote:You can add some sense of progress by simply setting a linear increase from 0.5 at the starting position up to the final value (1.0/0.5/0.0) at the final position of the game.
That is an idea, but I think a better approach would be to actually play several games (ideally with different engines) from each position and obtain the winning percentage. The problem is that, with the tuning method as initially outlined, most positions are represented by only one game (unless they are within range of opening books), so you don't know how strongly the goodness of the position is related to the outcome of the game.

Obtaining the win percentage for a large set of positions over all game phases would be computationally expensive, even with fast time control games. But I think you'd wind up with better tuning if you could minimize the difference between the predicted score and the measured winning percentage.

--Jon
The idea of assigning the final result to all positions of a game is not new; for a reference, see chapter three of Buro's GLEM: http://wiki.cs.pdx.edu/wurzburg2009/pap ... o/glem.pdf
The linear approach is not new either; just think of the temporal difference method.

These methods of labelling positions are quick but imprecise. Measuring the winning probability with games is more accurate but far slower.
I currently have 600+ parameters in the evaluation function, so a million positions is barely enough for a rough estimate of the more general parameters.
So I prefer a quick labelling method together with a tuning method that copes with the noise it introduces. With opponents more than 100 Elo apart, the linear assumption is more easily satisfied and the noise somewhat reduced.
I change the tuning set often, to avoid overfitting and to give less common features some chance of being present at some stage of the process. At the same time I measure the strength of some intermediate parameter sets.
Obviously not perfect, but this is the best I've found so far for a large parameter set.
Steve Maughan
Posts: 1221
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: The texel evaluation function optimization algorithm

Post by Steve Maughan »

I've come to this late - all interesting stuff though. And relevant to where I am with Maverick.

A couple of thoughts:

1. Setting the result to win, draw, or lose for all positions seems to be "a stretch". Antonio suggested a linear increase from 0.5 to 1.0. This may also be wrong if the position is basically won but takes a while to play out. How about having a strong engine score all of the positions, then converting each score into a probability using the sigmoid function? (A sketch of this conversion follows below.)

2. Robert Houdart calibrated Houdini such that a 1.00 pawn advantage correlated to a 90% win against an opponent of the same strength. Maybe he used a similar training mechanism?
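
These two thoughts fit together: score each position with a strong engine, then map the score to a probability with the same kind of logistic curve used in the tuning objective, fixing its constant from a calibration point such as the Houdini figure quoted above. A sketch (the 90%-at-one-pawn point is just that quoted figure, not a verified Houdini constant):

```python
import math

def k_from_calibration(score_cp, win_prob):
    """Solve 1 / (1 + 10^(-K*s/400)) = p for the scaling constant K."""
    return -400.0 * math.log10(1.0 / win_prob - 1.0) / score_cp

def score_to_prob(score_cp, k):
    """Convert a centipawn score into a win probability."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

K = k_from_calibration(100, 0.90)   # 90% win at +1.00 pawn  ->  K ~ 3.82
print(score_to_prob(50, K))         # +0.50 pawn  ->  0.75
```

The resulting probabilities could then serve as the tuning targets in place of raw game results.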

Best regards,

Steve
http://www.chessprogramming.net - Maverick Chess Engine