Testing: Optimizing for testing set

Tony · Post by **Tony** » Sun Aug 10, 2008 12:20 pm

New thread for a side subject.

If a lot of starting positions are needed to make a valid test, don't you run the risk of optimizing for these test positions (just as, for example, when testing with a tactical test set)?

And if we are optimizing for the test set, are we improving the engine?

Tony

hgm · Post by **hgm** » Sun Aug 10, 2008 12:55 pm

Yes, we are optimizing for the set. And no, that does not necessarily improve the engine if the set is too small or unrepresentative.

Now the situation is not that bleak, as many improvements that give a better result on one position will also give better results on other positions, simply because they make the engine play better chess. But occasionally you will encounter a test position where the engine is at a 'crossroads': it evaluates two moves that lead to completely different lines of play as only slightly different. One line might lead to a very bad position, the other to a good one, for reasons the engine could not possibly see.

In those cases there is a very big risk that you will start to tune evaluation parameters to extreme values that sway the choice of move in that position towards the line that happens to end well for a completely different reason (e.g. accidental tactics). The general detrimental effect of that tuning is then more than offset by the large improvement it gives in the result on this one position, by changing that first move.
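A back-of-the-envelope way to see the trade-off: if the test score is the average over N starting positions, flipping one 'crossroads' position from lost to won moves the average by only 1/N, while a generally bad tuning costs a little on every position. The Python sketch below uses a deliberately simplified model; the function names and the numbers are hypothetical, chosen only for illustration. Under this model the mis-tuning looks like an improvement exactly when its general cost per position is smaller than the single-position gain divided by N.

```python
def apparent_gain(n_positions: int, flip_gain: float, general_loss: float) -> float:
    """Net change in the average test score caused by one tuning change.

    Simplified, hypothetical model:
      n_positions  -- number of starting positions in the test set
      flip_gain    -- score change on the single 'crossroads' position
                      (1.0 = from always lost to always won)
      general_loss -- average score lost per position because the tuning
                      makes the engine play generally worse
    """
    # The lucky position's gain is diluted over the whole set, while the
    # general loss applies to the average directly.
    return flip_gain / n_positions - general_loss


def break_even_positions(flip_gain: float, general_loss: float) -> float:
    """Set size above which the mis-tuning no longer looks like a gain."""
    return flip_gain / general_loss


if __name__ == "__main__":
    # A made-up tuning that costs 2% per position but rescues one position.
    for n in (20, 50, 200):
        print(n, round(apparent_gain(n, 1.0, 0.02), 4))
    print("break-even set size:", break_even_positions(1.0, 0.02))
```

With these invented numbers, a 20-position set rewards the mis-tuning and a 200-position set punishes it; in practice the general cost of a bad tuning is not known in advance, so the break-even size is not either.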

The best way to guard against this is to have enough positions, so that the result on a single position is very unlikely to outweigh any general detrimental effect of a mis-tuning. Another way would be to sufficiently randomize the play of the engines in any position (e.g. by adding random scores in the root before picking the best move), so that a small change in tuning will never cause a sudden 100%-to-0% switch in the choice of a certain fatal move. That makes it more difficult to abuse the tuning to improve results on any specific position. (Another way of looking at this is that it increases the effective number of initial positions: the engine randomization generates many different positions in the first few moves from each single position we make it start from.)
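The root-randomization idea can be sketched in Python as follows. This is only a sketch under stated assumptions: it pretends every root move has an exact search score (as with a multi-PV search), whereas in a plain alpha-beta search only the best move's score is exact. The function names, the dictionary interface and the centipawn noise width are all hypothetical. With zero noise the choice flips all-or-nothing as soon as one score overtakes the other; with noise, a small change in the score difference shifts the choice frequency only gradually.

```python
import random


def pick_root_move(scores: dict[str, int], noise_cp: int, rng: random.Random) -> str:
    """Pick a root move after adding uniform noise in [-noise_cp, noise_cp].

    scores: hypothetical exact centipawn score for each legal root move.
    On equal noisy values, the move seen first is kept.
    """
    best_move, best_value = None, None
    for move, score in scores.items():
        value = score + rng.randint(-noise_cp, noise_cp)
        if best_value is None or value > best_value:
            best_move, best_value = move, value
    return best_move


def choice_frequency(scores: dict[str, int], noise_cp: int,
                     trials: int = 10_000, seed: int = 1) -> dict[str, float]:
    """Fraction of trials in which each root move gets chosen."""
    rng = random.Random(seed)
    counts = {move: 0 for move in scores}
    for _ in range(trials):
        counts[pick_root_move(scores, noise_cp, rng)] += 1
    return {move: c / trials for move, c in counts.items()}


if __name__ == "__main__":
    # How often the slightly worse move is played as the gap grows.
    for gap in (0, 2, 6, 12):
        freq = choice_frequency({"e4": 10 + gap, "d4": 10}, noise_cp=20)
        print(f"gap {gap:2d} cp: d4 chosen {freq['d4']:.3f}")
```

A tuning change that shifts the gap by a couple of centipawns then only nudges how often the fatal move is played, instead of switching it on or off.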

bob · Post by **bob** » Sun Aug 10, 2008 4:28 pm

Tony wrote:
> New thread for a side subject.
>
> If a lot of starting positions are needed to make a valid test, don't you run the risk of optimizing for these test positions (just as, for example, when testing with a tactical test set)?
>
> And if we are optimizing for the test set, are we improving the engine?
>
> Tony

It depends on what you mean by "test set". If you mean a group of positions where you search them and hopefully find a "key move" for each, then yes, I would not do that at all. If you mean a group of positions that you use as starting positions and play complete games from that point, then it is a different issue. I would think that so long as you cover the kinds of openings you play, or that you would allow your opponent to play, then you are discovering how you are going to do if you reach those positions.

Making the positions "representative" is an issue. I don't know whether the current "ten moves deep, chosen by popularity of play" approach is a good one or not. And I will probably spend some time later on trying to refine this a bit...
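One possible reading of "ten moves deep, chosen by popularity of play" is: take a database of games, cut each game after ten moves by each side (assumed here to mean 20 plies), and keep the most frequent resulting lines. A minimal Python sketch, assuming games are given as lists of move strings; the function name and defaults are hypothetical. Because it counts move sequences, transpositions into the same position are not merged; a real tool would key on the position itself (e.g. a FEN string or hash key).

```python
from collections import Counter


def popular_start_positions(games: list[list[str]], plies: int = 20,
                            count: int = 100) -> list[tuple[str, ...]]:
    """Return up to `count` most frequent opening lines of length `plies`.

    games: each game as a list of moves (assumed format, e.g. SAN strings).
    Games shorter than `plies` are skipped.
    """
    prefixes = Counter(tuple(game[:plies]) for game in games if len(game) >= plies)
    # most_common() sorts by frequency; equal counts keep first-seen order.
    return [line for line, _ in prefixes.most_common(count)]


if __name__ == "__main__":
    db = [["e4", "e5", "Nf3"], ["e4", "e5", "Nf3"], ["d4", "d5", "c4"], ["e4"]]
    print(popular_start_positions(db, plies=3, count=2))
```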
