Question on Texel's Tuning Method
Moderators: hgm, Dann Corbit, Harvey Williamson
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Question on Texel's Tuning Method
I have a theoretical question on Texel's tuning method. If I understand it correctly, the function takes a set of positions, and minimizes the mean squared error of the positions. The error in a position is the difference between the (adjusted) evaluation and the result of the game.
Clearly this is an effective technique, but I can't help but think that most of the positions being analyzed don't have a clear winner. In a pathological case, the game's winner may be slightly losing in the middle game. This would be asking the algorithm to tune a winning position to evaluate as a losing position, and vice versa.
Similarly, wouldn't these unclear positions dominate the contribution to mean squares error? That is, highly accurate lategame evaluations will be ignored in favor of slightly inflated early and midgame evaluations.
My thought was, wouldn't it make more sense to instead calculate a position's error by calculating the difference between the current position and the position, say, 5 moves into the future (perhaps an average of several future moves)? You'd need a technique to prevent it from tuning all parameters to 0, but you only need to prevent the final positions of the games from substantially lowering their scores (there are a few techniques that come to mind).
The effect of this should be that the mean squared error is now dominated by positions on the cusp of a large eval swing, and that positions throughout the duration of the game all have similar effects on contributions to mean squared error.
Is my thinking sound, or am I missing something?
Clearly this is an effective technique, but I can't help but think that most of the positions being analyzed don't have a clear winner. In a pathological case, the game's winner may be slightly losing in the middle game. This would be asking the algorithm to tune a winning position to evaluate as a losing position, and vice versa.
Similarly, wouldn't these unclear positions dominate the contribution to mean squares error? That is, highly accurate lategame evaluations will be ignored in favor of slightly inflated early and midgame evaluations.
My thought was, wouldn't it make more sense to instead calculate a position's error by calculating the difference between the current position and the position, say, 5 moves into the future (perhaps an average of several future moves)? You'd need a technique to prevent it from tuning all parameters to 0, but you only need to prevent the final positions of the games from substantially lowering their scores (there are a few techniques that come to mind).
The effect of this should be that the mean squared error is now dominated by positions on the cusp of a large eval swing, and that positions throughout the duration of the game all have similar effects on contributions to mean squared error.
Is my thinking sound, or am I missing something?
 hgm
 Posts: 25842
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
 Contact:
Re: Question on Texel's Tuning Method
I think the normal Texel method is sound enough, because what you in fact do is tune the evaluation to the average score of many very similar positions. And yes, that includes positions that are slightly better, but were still lost. But on average the number of wins from positions that are slightly better would be larger than the number of losses. The better the position, the larger the the difference. That is what you want the evaluation to quantify: your prospects of scoring points.
By artificially making attempts to remove losses from positions that were better, you distort the average, and the positions start to look better than they actually are. This would wreck the tuning.
By artificially making attempts to remove losses from positions that were better, you distort the average, and the positions start to look better than they actually are. This would wreck the tuning.
Re: Question on Texel's Tuning Method
There’s a great deal of power in “averaging”. My own lifechanging (haha, but I learnt something) was a physics experiment from way back, we had to do one a week for two years and write it up, drove me mad with boredom usually. However the worst was being told (everybody did something different each week) to measure a short steel bar with a micrometer screw gauge, highly accurate, and then with a crude wooden ruler with millimetre markings. The idea was to guess at the first significant decimal place (eg 45.7 mm or whatever) put the bar down, write down the guess (well, eyesighted guess) and repeat 100 times. Blow me down, but the average of the guesses, with several decimal places matched the micrometer screw gauge.hgm wrote: ↑Wed Jul 08, 2020 9:17 pmI think the normal Texel method is sound enough, because what you in fact do is tune the evaluation to the average score of many very similar positions. And yes, that includes positions that are slightly better, but were still lost. But on average the number of wins from positions that are slightly better would be larger than the number of losses. The better the position, the larger the the difference. That is what you want the evaluation to quantify: your prospects of scoring points.
By artificially making attempts to remove losses from positions that were better, you distort the average, and the positions start to look better than they actually are. This would wreck the tuning.
The average of the noise approximates well to the actuality.
Re: Question on Texel's Tuning Method
Depends on your training positions. Texel Tuning can also handle draw and loss results aside from win result. If the training position has a draw result, the engine being trained has to show a drawish score or close to 0, if the result is a loss, the engine has to show a losing score.odomobo wrote: ↑Wed Jul 08, 2020 4:32 pmI have a theoretical question on Texel's tuning method. If I understand it correctly, the function takes a set of positions, and minimizes the mean squared error of the positions. The error in a position is the difference between the (adjusted) evaluation and the result of the game.
Clearly this is an effective technique, but I can't help but think that most of the positions being analyzed don't have a clear winner.

 Posts: 4096
 Joined: Fri Mar 10, 2006 4:23 am
 Location: http://www.arasanchess.org
Re: Question on Texel's Tuning Method
I tried to win the “guess the weight of the leg of ham” competition in village event once by going last and trying to do a running average of everybody else’s guesses as they announced them. Failed dismally.
Re: Question on Texel's Tuning Method
I guess you guessed too high.
I would have guessed e^(sum(ln(Wi))/N) where e is 2.71... , ln the natural logarithm and N the number of samples/guesses. Of course you could choose whatever base/log_basepair you want.
The reason behind my formula is that I think one is as likely to guess true_value/x as true_value*x
/Pio
 hgm
 Posts: 25842
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
 Contact:
Re: Question on Texel's Tuning Method
There is a good analogy here with a 'strip detector'. When you want to determine the location of a point source (e.g. a tight light beam) you can make a dector that consists of adjacent lightsensitive strips. But you would never know the location within the strip that detects it. You could of course solve that by making very many very narrow strips. But it is far simpler to just put a 'diffuser' somewhat in front of the strip, which spreads out the light beam to a spot about the width of a strip. By measuring which fraction then falls into one strip, and which in the next one, you can get a pretty precise idea of where the center of the beam was w.r.t. the strips.
In Chess the strips are the win / draw / loss results. The diffuser is the nonperfect play between the position and game end, which causes the same position to not always fall on the same strip at game end. So that the ratio of the various results can be used to precisely measure the quality of the position.
In Chess the strips are the win / draw / loss results. The diffuser is the nonperfect play between the position and game end, which causes the same position to not always fall on the same strip at game end. So that the ratio of the various results can be used to precisely measure the quality of the position.
Re: Question on Texel's Tuning Method
The explanation of average scores of many similar positions does partially answer my question, but there's another aspect that I don't think I expressed well.
When tuning something (an eval in chess, or anything else), your error function will determine what things are most important to your tuning algorithm. For example, let's say we're tuning a simple f(x), with training data t(x), and our error function is abs[t(x)  f(x)]. Then our tuning algo is interested in absolute difference, i.e. 10000 vs 10010 is just as important as 10 vs 20.
But for the error function abs[t(x)  f(x)]/abs[t(x)], then it only cares about the relative difference, i.e. 10000 vs 11000 is just as important as 10 vs 11. In the latter case, smaller numbers must be much more accurate than larger numbers, but in the latter case, all numbers must be as accurate. Depending on the nature of your f(x), the nature of your training data, and the error function, then the tuning algorithm might spend all of its effort on large values, or might spend all of its effort on small values. Another way to put it, large values may end up dominating the mean squared error result, or small values may dominate it.
For texel tuning, here are the cases that it finds important and unimportant (as I understand it). Note that this is assuming all of the evaluations are relatively accurate (let's say at least within 100 cp).
Intuitively, this doesn't seem like the best approach. To me, it seems a better approach would be to value unclear middlegame positions (ending in W/L) more similarly to other positions. But maybe I'm missing something.
When tuning something (an eval in chess, or anything else), your error function will determine what things are most important to your tuning algorithm. For example, let's say we're tuning a simple f(x), with training data t(x), and our error function is abs[t(x)  f(x)]. Then our tuning algo is interested in absolute difference, i.e. 10000 vs 10010 is just as important as 10 vs 20.
But for the error function abs[t(x)  f(x)]/abs[t(x)], then it only cares about the relative difference, i.e. 10000 vs 11000 is just as important as 10 vs 11. In the latter case, smaller numbers must be much more accurate than larger numbers, but in the latter case, all numbers must be as accurate. Depending on the nature of your f(x), the nature of your training data, and the error function, then the tuning algorithm might spend all of its effort on large values, or might spend all of its effort on small values. Another way to put it, large values may end up dominating the mean squared error result, or small values may dominate it.
For texel tuning, here are the cases that it finds important and unimportant (as I understand it). Note that this is assuming all of the evaluations are relatively accurate (let's say at least within 100 cp).
 Decisive positions: mostly unimportant
 Unclear positions (in a game that eventually ends in draw): mostly unimportant
 Unclear positions (in a game that eventually ends in win/loss): very important
Intuitively, this doesn't seem like the best approach. To me, it seems a better approach would be to value unclear middlegame positions (ending in W/L) more similarly to other positions. But maybe I'm missing something.
Re: Question on Texel's Tuning Method
You should first convert your scores to win probabilities first so the big scores won’t dominate. It is the mean squares of the win probabilities you want to minimise. I am not sure however if it is best to minimise the squares. I would probably test to minimise the absolute error which has the advantage of better handling bad labelling.odomobo wrote: ↑Fri Jul 10, 2020 1:32 pmThe explanation of average scores of many similar positions does partially answer my question, but there's another aspect that I don't think I expressed well.
When tuning something (an eval in chess, or anything else), your error function will determine what things are most important to your tuning algorithm. For example, let's say we're tuning a simple f(x), with training data t(x), and our error function is abs[t(x)  f(x)]. Then our tuning algo is interested in absolute difference, i.e. 10000 vs 10010 is just as important as 10 vs 20.
But for the error function abs[t(x)  f(x)]/abs[t(x)], then it only cares about the relative difference, i.e. 10000 vs 11000 is just as important as 10 vs 11. In the latter case, smaller numbers must be much more accurate than larger numbers, but in the latter case, all numbers must be as accurate. Depending on the nature of your f(x), the nature of your training data, and the error function, then the tuning algorithm might spend all of its effort on large values, or might spend all of its effort on small values. Another way to put it, large values may end up dominating the mean squared error result, or small values may dominate it.
For texel tuning, here are the cases that it finds important and unimportant (as I understand it). Note that this is assuming all of the evaluations are relatively accurate (let's say at least within 100 cp).
The reason for this is in similar unclear positions (ending in W/L), we're averaging huge squared error values; we aren't getting the mean of the error values and then squaring them.
 Decisive positions: mostly unimportant
 Unclear positions (in a game that eventually ends in draw): mostly unimportant
 Unclear positions (in a game that eventually ends in win/loss): very important
Intuitively, this doesn't seem like the best approach. To me, it seems a better approach would be to value unclear middlegame positions (ending in W/L) more similarly to other positions. But maybe I'm missing something.
BR
/Pio