How Do You Automatically Tune Your Evaluation Tables
Moderators: hgm, Dann Corbit, Harvey Williamson
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.

 Posts: 4134
 Joined: Fri Mar 10, 2006 4:23 am
 Location: http://www.arasanchess.org
Re: The texel evaluation function optimization algorithm
Also see:
http://en.wikipedia.org/wiki/NEWUOA and related algorithms. CONDOR (http://www.appliedmathematics.net/ is one implementation derived from these. Another is in the dlib library (http://dlib.net/optimization.html).
These algorithms are for optimization problems where the function to be optimized is possibly expensive to compute and noisy.
Jon
http://en.wikipedia.org/wiki/NEWUOA and related algorithms. CONDOR (http://www.appliedmathematics.net/ is one implementation derived from these. Another is in the dlib library (http://dlib.net/optimization.html).
These algorithms are for optimization problems where the function to be optimized is possibly expensive to compute and noisy.
Jon
Re: The texel evaluation function optimization algorithm
Let me see if I understand correctly what the issue is here.
In a simple case, where weights do not depend on gamephase, the evaluation function should return values such that if E1 > E2, then the probability to win the position corresponding to E1 (say P1) is larger than the position corresponding to E2 (say P2). The absolute scale of the evaluation is arbitrary, but one typically fixes the value of the pawn (say, VP = 100, but this is again arbitrary).
In practice, the evaluation is really a 2vector, for middlegame and endgame scores, which is converted to a scalar by a linear weight depending on the material on the board ("game phase"). We should still have the property that if E1 > E2, then the probability of winning P1 is larger than the probability of winning P2. However, we now have two scale factors: for the middlegame and for the end game. One is still arbitrary, but the other one needs to be fixed somehow, otherwise the interpolation makes no sense.
A simple way to do this is to simply set VP(middlegame) = VP(end game), then tuning will make sure that other terms that become relatively more important in the end game get a higher weight in the end game than terms that are less important.
However, fixing the value of the pawn comes at a price: it is not guaranteed that the probability of winning the game given a +1 evaluation (say) is the same in the middlegame as it is in the end game. In fact, this is very probably not true (and I'm not even thinking about pathological cases where the extra pawn is meaningless; those you should detect and score as nearly equal anyway), but this is a tacit assumption in the tuning method described by Peter.
Instead, one should keep the probability P(win; eval) constant for a given evaluation score across different game phases. But if we also fix the value of the pawn in the end game, the problem is overdetermined (there is no free parameter to play with). Conclusion: the end game value of the pawn has to be allowed to vary along with the other weights in order for this tuning method to work properly.
Does that sound right? It does explain why I got nothing sensible when I tried to apply it to just positional weights (but not piece values).
In a simple case, where weights do not depend on gamephase, the evaluation function should return values such that if E1 > E2, then the probability to win the position corresponding to E1 (say P1) is larger than the position corresponding to E2 (say P2). The absolute scale of the evaluation is arbitrary, but one typically fixes the value of the pawn (say, VP = 100, but this is again arbitrary).
In practice, the evaluation is really a 2vector, for middlegame and endgame scores, which is converted to a scalar by a linear weight depending on the material on the board ("game phase"). We should still have the property that if E1 > E2, then the probability of winning P1 is larger than the probability of winning P2. However, we now have two scale factors: for the middlegame and for the end game. One is still arbitrary, but the other one needs to be fixed somehow, otherwise the interpolation makes no sense.
A simple way to do this is to simply set VP(middlegame) = VP(end game), then tuning will make sure that other terms that become relatively more important in the end game get a higher weight in the end game than terms that are less important.
However, fixing the value of the pawn comes at a price: it is not guaranteed that the probability of winning the game given a +1 evaluation (say) is the same in the middlegame as it is in the end game. In fact, this is very probably not true (and I'm not even thinking about pathological cases where the extra pawn is meaningless; those you should detect and score as nearly equal anyway), but this is a tacit assumption in the tuning method described by Peter.
Instead, one should keep the probability P(win; eval) constant for a given evaluation score across different game phases. But if we also fix the value of the pawn in the end game, the problem is overdetermined (there is no free parameter to play with). Conclusion: the end game value of the pawn has to be allowed to vary along with the other weights in order for this tuning method to work properly.
Does that sound right? It does explain why I got nothing sensible when I tried to apply it to just positional weights (but not piece values).
Re: The texel evaluation function optimization algorithm
Hi Evert,
I guess you're right.
My reasoning was that the other endgame related terms in the evaluation can modify the pawn material value enough if required.
A simple case would be a constant to increase/decrease the PSQ values of the pawn. But as there are more pawn related terms there are multiple ways to do that.
However I'm is not sure whether it can really be compensated. Maybe I rerun the test later.
Thomas...
I guess you're right.
My reasoning was that the other endgame related terms in the evaluation can modify the pawn material value enough if required.
A simple case would be a constant to increase/decrease the PSQ values of the pawn. But as there are more pawn related terms there are multiple ways to do that.
However I'm is not sure whether it can really be compensated. Maybe I rerun the test later.
Thomas...
Re: The texel evaluation function optimization algorithm
That was my initial thought too: the value of the pawn is just an arbitrary constant, so what does it matter? The other terms just scale to get the correct relative score. But then it occurred to me that the mapping of evaluation score to winning chances is not necessarily constant and could be messed up by fixing the value of the pawn.tpetzke wrote: My reasoning was that the other endgame related terms in the evaluation can modify the pawn material value enough if required.
A simple case would be a constant to increase/decrease the PSQ values of the pawn. But as there are more pawn related terms there are multiple ways to do that.
I guess one could test if this is true by plotting the winning rate for a given evaluation advantage as a function of game phase (so a 3D plot). Ideally you'd probably want that to be independent of game phase.
Note that things get messy again when you consider that the communication protocol states that the printed score is supposed to be in "centipawns", which does moreorless imply fixing the (typical) value of the pawn at 100, for both middle and end game. I guess most people ignore this anyway.
 Steve Maughan
 Posts: 1089
 Joined: Wed Mar 08, 2006 7:28 pm
 Location: Florida, USA
 Contact:
Re: The texel evaluation function optimization algorithm
Hi Evert,
This effect may turn out to be cosmetic. But if the pawn's value is significantly different in the opening to the endgame (see Marco Belli's post earlier in the thread http://goo.gl/au9O39) then it could be interesting. I suspect a swing of 100% could easily account for the 20 ELO difference noted by Thomas.
Also, an engine with such a low pawn value in the opening is going to have an "interesting" playing style with lots of pawn sacrifices in return for active play.
Steve
Exactly!Evert wrote:(...) But then it occurred to me that the mapping of evaluation score to winning chances is not necessarily constant and could be messed up by fixing the value of the pawn.(...)
This effect may turn out to be cosmetic. But if the pawn's value is significantly different in the opening to the endgame (see Marco Belli's post earlier in the thread http://goo.gl/au9O39) then it could be interesting. I suspect a swing of 100% could easily account for the 20 ELO difference noted by Thomas.
Also, an engine with such a low pawn value in the opening is going to have an "interesting" playing style with lots of pawn sacrifices in return for active play.
Steve
http://www.chessprogramming.net  Maverick Chess Engine
Re: The texel evaluation function optimization algorithm
I don't think there is any such assumption in my method. My method only fixates K, not the value of any evaluation weight (such as the nominal value of a pawn). Also my method optimizes on the scores returned by the qsearch function, which is not a vectorvalue. The interpolation between MG and EG scores has to happen before search sees the score, otherwise alphabeta would not work.Evert wrote:However, fixing the value of the pawn comes at a price: it is not guaranteed that the probability of winning the game given a +1 evaluation (say) is the same in the middlegame as it is in the end game. In fact, this is very probably not true (and I'm not even thinking about pathological cases where the extra pawn is meaningless; those you should detect and score as nearly equal anyway), but this is a tacit assumption in the tuning method described by Peter.
In fact my method assumes that there is not a perfect match between qsearch scores and win probability. The goal of the optimization is to adjust the weights to make the mismatch smaller.
Also note that if the evaluation function internally uses (MG,EG) pairs for its weights, such weights would correspond to two parameters in the optimization problem. You could also use additional parameters to describe the weighting formula. For example, in texel many evaluation terms are weighted like this:
Code: Select all
S = S_mg if material >= param1
S_eg if material <= param2
S_mg + (S_eg  S_mg) * (param1  material) / (param1  param2) otherwise
From your post I get the impression that you are not really using the method I described, but instead a variation of it. In my method the qsearch score is used to compute E, the set of training positions include nonquiet positions and positions where the evaluation score is very large, and no evaluation weights are fixed to a particular value (such as pawn=100).Evert wrote:Does that sound right? It does explain why I got nothing sensible when I tried to apply it to just positional weights (but not piece values).
It is possible to optimize only a subset of the evaluation parameters. However as I wrote in the description of my method, one big objection to this method is that it assumes that "correlation implies causation". This is also a possible reason you didn't see any improvement. Nevertheless, I have so far increased the strength of texel about 150 elo points in 2.5 months using this method. It is quite likely that the method will stop working at some point though, probably long before texel gets close to stockfish strength.
Re: The texel evaluation function optimization algorithm
Of course; I think you misunderstand what I was trying to say.petero2 wrote: I don't think there is any such assumption in my method. My method only fixates K, not the value of any evaluation weight (such as the nominal value of a pawn). Also my method optimizes on the scores returned by the qsearch function, which is not a vectorvalue. The interpolation between MG and EG scores has to happen before search sees the score, otherwise alphabeta would not work.
Assuming a traditional linear interpolation between midgame and endgame scores (I gather you do something slightly different, which honestly sounds reasonable), the evaluation is a 2vector composed of midgame and endgame which gets projected to a 1D value for the purpose of alphabeta. In principle, the two evaluations can use a very different scale (say, centipawns for the midgame and millipawns for the end game for a slightly stupid example), but they need to be normalised with respect to each other in order to do the interpolation (of course, if you don't do that and get it badly wrong the program plays bad chess). One easy normalisation is just fixing the value of the pawn in both cases (as Thomas was doing). However, when doing that the winning rate given a +0.5 (say) advantage in the end game need not be the same as the winning rate for a +0.5 advantage in the middle game. This would show up as noise if you pass the +1 score through a logistic function and compare that with the actual outcome of the games from which the positions were taken. Contrary to the noise inherent in playing games, this is a systematic error, an inherent spread in the results that you cannot get rid of unless you allow the value of the pawn to change with game phase.
I know I say evaluation rather than qscore, but it comes down to the same thing; the qsearch is only needed to make sure you reach a quiet position that the evaluation can say something about.
Of course. I do know how an optimisation works.In fact my method assumes that there is not a perfect match between qsearch scores and win probability. The goal of the optimization is to adjust the weights to make the mismatch smaller.
However, it should still not matter whether a given score comes from a midgame or an endgame position: the win probability (distribution) for a given score should be independent of gamephase, otherwise you may get swamped by systematic uncertainties. This, it seems to me, is a tacit assumption that is required for the method to work well.
Of course. Unless you filter out nonquiet positions first (say, positions where qscore != eval) you cannot use eval directly, but I don't see why you'd do that (except to throw away useful positions and possibly sabotage your statistics). I do throw away positions that are "close to" mate, since the evaluation is pointless for those anyway.From your post I get the impression that you are not really using the method I described, but instead a variation of it. In my method the qsearch score is used to compute E, the set of training positions include nonquiet positions and positions where the evaluation score is very large,
Well, I was also trying to use it on a very young engine which is very poorly tuned to begin with (also not an orthochess engine, but that hardly matters for the method), where the range of drawish scores is much larger than in chess (a pawnadvantage is worth less than it is in chess). It also doesn't recognise some dead drawn material combinations. This all combines to give a relatively weak correlation between score and win probability, which is probably a contributing factor to why this didn't work. I'll get back to it at some point in the future.However as I wrote in the description of my method, one big objection to this method is that it assumes that "correlation implies causation". This is also a possible reason you didn't see any improvement.
Re: The texel evaluation function optimization algorithm
Wow!Nevertheless, I have so far increased the strength of texel about 150 elo points in 2.5 months using this method.
Re: The texel evaluation function optimization algorithm
A call to eval and a call to qSearch is computational wise a huge difference. If you want nonquiet positions in the set you should replace the nonquiet position with the position from the qSearch the score comes from.Of course. Unless you filter out nonquiet positions first (say, positions where qscore != eval) you cannot use eval directly, but I don't see why you'd do that (except to throw away useful positions and possibly sabotage your statistics). I do throw away positions that are "close to" mate, since the evaluation is pointless for those anyway.
Thomas...

 Posts: 90
 Joined: Sun Nov 02, 2008 3:43 pm
 Location: Barcelona
Re: The texel evaluation function optimization algorithm
In a tactical path , the static or the quiesce values are not trustable.tpetzke wrote:A call to eval and a call to qSearch is computational wise a huge difference. If you want nonquiet positions in the set you should replace the nonquiet position with the position from the qSearch the score comes from.Of course. Unless you filter out nonquiet positions first (say, positions where qscore != eval) you cannot use eval directly, but I don't see why you'd do that (except to throw away useful positions and possibly sabotage your statistics). I do throw away positions that are "close to" mate, since the evaluation is pointless for those anyway.
If you bother with the computational effort of quiesce evaluation. Just consider the positions where selected from a pgn file and a full search has made the move. If, in a given position, a quiesce will return a different value than the static evaluation, there is two possibilities:
The quiesce move was played, then the quiesce value is presumably better.
The quiesce capture was not played so the static and quiesce value are just as good/bad.
This is what I do:When I parse the pgn file I discard every position from where a capture or a check or an evasion or a promote move was made. This is easy and quick to do: you look for a '=','+','x' characters in the pgn move or a '+' in the previous move for an evasion. I discard positions where a tactical shot is developing and I expect the rest is quiet enough.(I do tune with static values).
best regards.
Antonio.