A little improvement to the Texel's tuning method

Fabio Gobbato · Post by **Fabio Gobbato** » Sat Oct 11, 2014 7:37 pm

I have used it to tune my engine Pedone with very good results.

I have downloaded about 200000 games from CCRL played by the top engine at 40/40 time control.
From each game I have extracted 7-8 random positions and I have put together in a file (fen.csv) the FEN string of every position, the final result of the game and the number of plies of the entire game.

I have filtered the file deleting same positions with different results to obtain only positions with a single result and I have also deleted opening and checkmated positions.
I have also extract from the fen string the number of plies of the position from the beginning of the game, calculating it from the moves number.

The resulting file is something like this:
FEN string Result N ply Tot ply
3r2k1/p3qppr/3p1b1P/2p1nP2/2P2B2/1P4Q1/P3R1B1/3R3K w - - 0 35, 0.5, 68, 122,
4k3/R2n3p/1r2p1p1/P1p1p1Pn/1pP1P3/7P/P4P1N/5BK1 b - - 0 34, 0, 67, 157,
3k4/R2n3p/P2rp1p1/2p1p1P1/1pP1Pn2/5N1P/P4P2/5BK1 w - - 1 37, 1, 72, 157,
...

Instead of comparing the final result with the evaluation like in Texel's tuning method, I compare the evaluation with the interpoled result:

I = 0.5 + ((final result - 0.5) * n plies of the position) / total plies of the game of this position

With this I calculate the evaluation error like in Texel's tuning method

S = 1 / ( 1 + e ^ ( -eval/K ) )

evaluation error = 1/N Σ ( I - S )^2

I have get better results than comparing the evaluation with the final result.

michiguel · Post by **michiguel** » Sat Oct 11, 2014 9:00 pm

Fabio Gobbato wrote:I have used it to tune my engine Pedone with very good results.

I have downloaded about 200000 games from CCRL played by the top engine at 40/40 time control.
From each game I have extracted 7-8 random positions and I have put together in a file (fen.csv) the FEN string of every position, the final result of the game and the number of plies of the entire game.

I have filtered the file deleting same positions with different results to obtain only positions with a single result and I have also deleted opening and checkmated positions.
I have also extract from the fen string the number of plies of the position from the beginning of the game, calculating it from the moves number.

The resulting file is something like this:
FEN string Result N ply Tot ply
3r2k1/p3qppr/3p1b1P/2p1nP2/2P2B2/1P4Q1/P3R1B1/3R3K w - - 0 35, 0.5, 68, 122,
4k3/R2n3p/1r2p1p1/P1p1p1Pn/1pP1P3/7P/P4P1N/5BK1 b - - 0 34, 0, 67, 157,
3k4/R2n3p/P2rp1p1/2p1p1P1/1pP1Pn2/5N1P/P4P2/5BK1 w - - 1 37, 1, 72, 157,
...

Instead of comparing the final result with the evaluation like in Texel's tuning method, I compare the evaluation with the interpoled result:

I = 0.5 + ((final result - 0.5) * n plies of the position) / total plies of the game of this position

With this I calculate the evaluation error like in Texel's tuning method

S = 1 / ( 1 + e ^ ( -eval/K ) )

evaluation error = 1/N Σ ( I - S )^2

I have get better results than comparing the evaluation with the final result.

This is a very interesting idea. An important effect is that ther parameters that get fit from information extracted from fewer positions won't have big swings because of the smaller samples.

Are you using quiescence search or just the static evaluation? Texel uses the former and Gaviota the latter.

Miguel

mvk · Post by **mvk** » Sat Oct 11, 2014 10:34 pm

Fabio Gobbato wrote: Instead of comparing the final result with the evaluation like in Texel's tuning method, I compare the evaluation with the interpoled result:

I = 0.5 + ((final result - 0.5) * n plies of the position) / total plies of the game of this position

[...]

I have get better results than comparing the evaluation with the final result.

Thank you. Can you in some way already quantify the difference between the two approaches? Is it just faster convergence, or is there a strength difference in the end?

Fabio Gobbato · Post by **Fabio Gobbato** » Sun Oct 12, 2014 2:59 pm

I use Quiescence search.

This method gives a better tuning result I haven't compared it with the Texel's approach but I think there is a big difference.

The idea is that all the games start from a draw position and the result goes to 1, 0 or stay draw. And the evaluation can never match a 1 o 0 result if you bring a position near the beginning of the game because the position should have an evaluation near 0cp.

The most errors with the Texel approach are about 0.4-0.5 and they come from the difference between the final result and the evaluation from a starting position.

AlvaroBegue · Post by **AlvaroBegue** » Sun Oct 12, 2014 4:18 pm

I understand the reasoning behind using this progressive formula, but there are two things about this method that seem wrong to me:

(1) There is no need to remove positions that appear multiple times with different results: The value that minimizes the sum of squares of the errors is the mean of the results, which is what you want your evaluation function to return. (*)

(2) If a position appears 20% into the game and has a true average result of 0.6, your method encourages the evaluation function to assign it a score of 0.52. If the same 0.6 average result appears in a position 80% into the game, the optimal score for it is 0.58. In principle your engine should be indifferent if presented with an opportunity to exchange a whole bunch of material or not, where both options have an expected result of 0.6. But with your scheme, there is an oversized incentive to advance the stage of the game when you are ahead and delay it when you are behind.

I really think the phenomenon described in objection (2) will result in a weaker engine.

(*)- This is not a very special property of the square of the error: Because of Jensen's inequality, any convex function of the error would have the same feature. Another important possibility for measuring error is something like cross-information, entropy or minus the log-likelihood (they are all the same thing).

Fabio Gobbato · Post by **Fabio Gobbato** » Mon Oct 13, 2014 6:54 pm

1) Ok, it's not a real problem to have 2 or more position with different results

2) I cannot find in this way the real score of the position, but I can find a good approximation in a very little time. And the tuning seems much better than comparing the evaluation with the final result.

This is the best tuning approach that I have found. What is the best tuning method in your opinion?

AlvaroBegue · Post by **AlvaroBegue** » Mon Oct 13, 2014 7:22 pm

Fabio Gobbato wrote:This is the best tuning approach that I have found. What is the best tuning method in your opinion?

I have only played a little bit with tuning of chess evaluation functions, so I don't have a strong opinion. I have some experience tuning in other contexts (like financial models), and introducing a bias the way you are doing here seems like the wrong thing to do. But of course if your engine plays better after being tuned with this method, my feelings should be completely ignored.

I like the general method you are following:
* Take a few positions from each game.
* Resolve immediate captures with quiescence search.
* Make your evaluation be a model that predicts the outcome of the game when passed through a sigmoid function.

I would prefer measuring the error using log-likelihood instead of sum of squares, but someone in this forum reported that it made very little difference in practice. You also need to put some thought into how to handle draws in the log-likelihood computation.

Probably the most important thing to have for tuning to work well is enough data.

asanjuan · Post by **asanjuan** » Wed Oct 15, 2014 5:42 pm

Fabio Gobbato wrote:1) Ok, it's not a real problem to have 2 or more position with different results

2) I cannot find in this way the real score of the position, but I can find a good approximation in a very little time. And the tuning seems much better than comparing the evaluation with the final result.

This is the best tuning approach that I have found. What is the best tuning method in your opinion?

Hello Fabio,

As Álvaro said, the most important thing is diversity in the sample, and, (very important if you use a small set of positions) avoid overtunning.

If my reasoning doesn't fail, if you pick 8 random positions from 200000 games you end with 1.6 million positions, wich is a small number of positions in my opinion.

Using the final outcome of the game has a fundamental advantage: it is able to discover the winning strategies.

The first time I used this algorithm I generated a position's file that only contained won and lost postitions. It was a blunder, of course, but I noticed that using only decided games (this is, without drawn games), after the optimization process, it started to play in a ultra-extra-amazing-aggresive style. I could see incredible attacks over the board, but the evaluations were of course very optimistic or pessimistic, and sometimes wasn't able to draw drawn positions. On the other hand, sometimes was able to draw lost positions.

I deduced that I needed a compromise in the won/lost/drawn positions to get a balanced evaluation. This is the important thing.

For example, if you use games played by top programs, the draw ratio of your sample is very high, and the winning strategies are vanished away in the process.
This is why I am using only games generated with self play and adding a strong opponent to discover new knowledge.

Your formula interpolating the final result is a method to get a better fitted evaluation with the current position, or in other words, to avoid the "optimism" of your engine, but the goal of the process is not to say "hey, this position is +0.05 for white", but to detect the winning strategies and evaluate objectively if the position A is better than position B because it led to more wins in the training, and then, make better decisions.

Hope it helps.

Alberto.

tpetzke · Post by **tpetzke** » Wed Oct 22, 2014 8:10 pm

So far I was not able to improve my evaluation with the Texel method, I might give your change a try however.

I was always able to push the evaluation error significantly down compared to my current weight set but the engine played worse. I don't really understand why but some possible cause might be

1. It is difficult to tune weights that are associated with positional elements you find not so often in the PV of a game, but very often in the nodes encountered by eval and then discarded by search. An example might be the triple pawn. How often do you see one in a real game played by top engines. Maybe never. Does it mean you can get rid of it ?

2. The number of drawn games has a high impact on the result. Too many draw games and you eval will be simplified a lot by the tuner. To less is probably also a bad thing.

I have not found the ideal combination, but I keep searching

Thomas...

asanjuan · Post by **asanjuan** » Wed Oct 22, 2014 8:41 pm

To use games played by top programs is useless for this learning method IMO. You need suboptimal, but playable moves to get into positions where you can learn from the mistake. This is how your program can fall into the triple pawn and learn a good weight from it.

Imagine that you introduce a new eval feature. I think that the best way to find a balanced weight for the new eval term is as follows:
1. Set the eval term a value from your own knowledge. Something that makes sense for you.
2. Play this version against a stronger opponent some thousands of games. The new eval term may put the program in positions where the eval term takes effect.
3. Extract all the positions and run the learning method as we all know, tunning EVERY parameter of your evaluation.
4. Run a verification match against the previous version to see if it plays better chess or not.

A little improvement to the Texel's tuning method

A little improvement to the Texel's tuning method

Re: A little improvement to the Texel's tuning method

Re: A little improvement to the Texel's tuning method

Re: A little improvement to the Texel's tuning method

Re: A little improvement to the Texel's tuning method

Re: A little improvement to the Texel's tuning method

Re: A little improvement to the Texel's tuning method

Re: A little improvement to the Texel's tuning method

Re: A little improvement to the Texel's tuning method

Re: A little improvement to the Texel's tuning method