Evaluation Tuning

Post by **tpetzke** » Mon Aug 24, 2015 2:04 pm

Hi Michael,

my two cents after playing a bit with the Texel tuning method.

The outcome is very sensitive to the percentage of positions from drawn games that you include. As an extreme, imagine a set containing only positions from drawn games: the lowest error is then produced by a weight set with all weights at 0. Such a set will not win many games.
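To make the drawn-games extreme concrete, here is a minimal sketch of the kind of error function the Texel method minimizes. It assumes a linear eval over feature differences and the usual sigmoid mapping from centipawns to expected result; the feature layout, the toy positions and the scaling constant `k` are made up for illustration and are not taken from any particular engine.

```python
import math

def win_probability(score_cp, k=1.0):
    """Map a centipawn score to an expected result in [0, 1] (assumed sigmoid form)."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

def evaluate(features, weights):
    """Toy linear eval: weighted sum of feature differences (white minus black)."""
    return sum(w * f for w, f in zip(weights, features))

def tuning_error(positions, weights, k=1.0):
    """Mean squared difference between the game result and the predicted result."""
    total = 0.0
    for features, result in positions:
        predicted = win_probability(evaluate(features, weights), k)
        total += (result - predicted) ** 2
    return total / len(positions)

# Hypothetical data: (pawn diff, knight diff) and the game result from White's view.
# Every position comes from a drawn game, so every result is 0.5.
draws_only = [((1, 0), 0.5), ((-1, 1), 0.5), ((0, -1), 0.5)]

zero_weights = (0, 0)      # eval is always 0, so the prediction is always 0.5
sane_weights = (100, 300)  # plausible pawn and knight values in centipawns

print(tuning_error(draws_only, zero_weights))  # 0.0
print(tuning_error(draws_only, sane_weights))  # greater than 0
```

With only draws in the set, the all-zero weights predict 0.5 everywhere and reach an error of exactly 0, while any sensible material values are penalized.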

A lower evaluation error does not necessarily mean better playing strength. The most common case in all my tests was a new weight set with a lower error that then scored only 48% or so when actually played. You can also easily drive the evaluation error down by exposing more parameters, but that does not improve the engine.

I was tuning MG and EG values at the same time and did not see crazy values like queen 200 and 1400, but I usually ended up with queen values above 1200 (for both MG and EG).

One thing that always troubles me is that the method uses actual positions from games. In most of those positions either both sides have a queen or the queens have already been exchanged, so the material value of the queen cancels out for most positions. Positions with a material imbalance would probably be better, but the sets most likely do not contain enough of them. Most of the positions a chess engine has to deal with in eval never occur in the game itself. So fitting the eval to game positions might not be the best preparation for real life.
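A small sketch of why the queen value is poorly constrained by such sets. It assumes the eval uses the queen count difference (white minus black) as its material feature; the sample of queen counts below is invented purely for illustration.

```python
def queen_feature(white_queens, black_queens):
    """Queen material feature as seen by a linear eval: white count minus black count."""
    return white_queens - black_queens

def informative_fraction(sample):
    """Fraction of positions in which the queen weight affects the eval at all."""
    informative = sum(1 for w, b in sample if queen_feature(w, b) != 0)
    return informative / len(sample)

# Hypothetical (white queens, black queens) counts for eight sampled game positions.
sample = [(1, 1), (1, 1), (0, 0), (1, 1), (0, 0), (1, 0), (1, 1), (0, 0)]

print(informative_fraction(sample))  # 0.125
```

Wherever the feature is 0 the queen weight drops out of the eval, so in this toy sample only one position in eight tells the tuner anything about what a queen is worth.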

Thomas...
