
Stockfish's tuning method

Posted: Fri Oct 07, 2011 4:32 am
by zamar
As Marco has already told you, my time for computer chess will be very limited in the future. Because I have a new job and three children to look after, it's practically impossible to allocate the necessary resources for this hobby. Before leaving, I want to share the tuning method that was used successfully with SF. This is not a scientific breakthrough, but a simple method that worked nicely for us.

The following text is released into the public domain and may be freely distributed:

Stockfish tuning method

Introduction

The following tuning method was used to significantly improve Stockfish's playing strength (40-70 Elo points).
The method is a practical approach and not mathematically very sound. Because the algorithm is very simple, it was very
likely invented a long time ago. No pseudo- or source code is given, just the idea behind the algorithm.

Step 1. Single variable.

Let's assume that we have a single variable x which we want to tune. The current value for x is 100. We assume that
this value is quite good (because we already have a strong engine), but not perfect. Next we need to choose a delta for x.
Delta must be big enough that engine(x = 100 - delta) and engine(x = 100 + delta) differ by at least 1-3 Elo points.
However, delta must not be too big either, because then asymmetries start to play a big role. One just needs to
use intuition here. Let's choose delta = 20.

Now we match engine(80) against engine(120) [self-play]. If engine(120) wins, we tune x slightly upwards. In our tuning sessions we successfully
used apply_factor = 0.002, so in that case the new value for x would be x = 100 + (120 - 100) * 0.002 = 100.04.

The next match would be engine(80.04) vs. engine(120.04).

In our tuning sessions we used 30000-100000 super-fast games before reading the final result.
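
A minimal sketch of this loop in Python (match() here is a hypothetical placeholder for a super-fast self-play match, not part of the original description):

[code]
import random

x = 100.0            # current value, assumed to be already quite good
delta = 20.0         # big enough that the two sides differ by 1-3 Elo
apply_factor = 0.002

def match(x_low, x_high):
    # Hypothetical stand-in: play engine(x_high) vs. engine(x_low) in
    # self-play; return +1 if the high side wins, -1 if it loses, 0 on a draw.
    return random.choice([1, 0, -1])   # placeholder, not a real engine

for _ in range(30000):                 # 30000-100000 super-fast games
    result = match(x - delta, x + delta)
    x += result * delta * apply_factor # nudge x towards the winning side
[/code]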

Step 2. Varying delta.

Instead of fixing delta, we fixed the standard deviation of a Gaussian distribution and drew a random delta
for each iteration. But again one needs to use intuition to pick a suitable standard deviation.
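
The only change to the sketch above is how delta is produced (sigma = 20 is just an example value, picked by intuition):

[code]
import random

sigma = 20.0                            # hand-picked standard deviation
delta = abs(random.gauss(0.0, sigma))   # new random delta each iteration
[/code]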

Step 3. Multiple variables.

Doing this for only one variable at a time would be way too slow. Instead we tuned 7-35 variables at the same time.
As an example let's say that we have three variables: x = 100, y = 100, z = 100.
We would then draw a random signed delta for each of these variables from a Gaussian distribution.
Let's say we get x_delta = +15, y_delta = -10, z_delta = +12.

We then match engine(x=115, y=90, z=112) vs. engine(x=85, y=110, z=88). If the first engine wins and apply_factor = 0.002 is used,
the new values are: x = 100 + (115 - 100) * 0.002, y = 100 + (90 - 100) * 0.002, z = 100 + (112 - 100) * 0.002.
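
A sketch of one multi-variable iteration in Python (match() is again a hypothetical placeholder for a super-fast self-play match):

[code]
import random

params = {"x": 100.0, "y": 100.0, "z": 100.0}
sigma = 15.0           # standard deviation for the Gaussian deltas
apply_factor = 0.002

def match(high, low):
    # Hypothetical stand-in: engine(**high) vs. engine(**low) in self-play;
    # +1 if the 'high' side wins, -1 if it loses, 0 on a draw.
    return random.choice([1, 0, -1])   # placeholder, not a real engine

def iteration(params):
    # One random signed delta per variable
    deltas = {k: random.gauss(0.0, sigma) for k in params}
    high = {k: v + deltas[k] for k, v in params.items()}
    low = {k: v - deltas[k] for k, v in params.items()}
    result = match(high, low)
    # Nudge every variable towards the winning side's values
    for k in params:
        params[k] += result * deltas[k] * apply_factor

for _ in range(30000):                 # 30000-100000 super-fast games
    iteration(params)
[/code]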

With 30000-100000 super-fast games, we usually got some improvement over the hand-tuned values.

Pros and cons.

What actually happens with multiple variables is that the most important variables dominate and approach their optimal values,
while the less important variables take "a random walk". In the beginning this increases strength. But after a while the important variables stop
approaching their optimal values, the "random walk" takes over, and the strength starts to decrease. So the method doesn't converge, and it needs
to be stopped at a "suitable moment". This is a big problem, as is the manual selection of the deltas (or of their standard deviation).

Most other tuning algorithms start "from scratch". Although we know a value which is very close to optimal, they make no use of it.
They start matching engine(x=0) vs. a pool of other engines and engine(x=200) vs. a pool of other engines, and it takes tens of thousands or
hundreds of thousands of iterations before they can even reach the current level (x=100); only after that can we get some
improvement. The method described here instead starts from that "very good" value and attempts to improve it slightly.

Variable selection.

Variable selection is extremely important. For example, if you tune each piece-square table value separately, you are doomed to fail, because
changing only one value changes the strength of the program only very little. So the tuning ends up doing only a random walk, and
the strength of the program just drifts slightly downhill.

Instead, using ampli-bias knobs for the tables proved to be a very successful approach for us. Each value in a table is adjusted using these two
knobs: new_value = orig_value + var_bias + orig_value * var_amplitude / 100. (Here we optimize the variables var_bias and var_amplitude.)
Every chess engine is full of different kinds of tables, and if we can give even "a slight push" to each of these tables, it results in a considerable gain in the end.
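
For illustration, the knob formula applied to a made-up table fragment (only the two knobs are exposed to the tuner):

[code]
def apply_knobs(table, var_bias, var_amplitude):
    # new_value = orig_value + var_bias + orig_value * var_amplitude / 100
    return [v + var_bias + v * var_amplitude / 100.0 for v in table]

pst_fragment = [-50, -30, -10, 0, 0, -10, -30, -50]   # made-up values
print(apply_knobs(pst_fragment, var_bias=2.0, var_amplitude=5.0))
[/code]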

Re: Stockfish's tuning method

Posted: Fri Oct 07, 2011 6:16 am
by Ferdy
Thanks a lot.

Re: Stockfish's tuning method

Posted: Fri Oct 07, 2011 8:28 am
by Edmund
Thank you very much for these insights.

I would be more than delighted to read Rémi Coulom's comments on this, evaluating this approach from a scientific background and of course drawing parallels to his own testing tool.

Re: Stockfish's tuning method

Posted: Fri Oct 07, 2011 9:43 am
by Steve Maughan
Hi Joona,

Thanks for sharing. It seems like a good, practical, hill-climb optimization approach.

Cheers,

Steve

Re: Stockfish's tuning method

Posted: Fri Oct 07, 2011 4:22 pm
by bob
Steve Maughan wrote: Hi Joona,

Thanks for sharing. It seems like a good, practical, hill-climb optimization approach.

Cheers,

Steve
It's very similar to what I have been doing using our cluster. It has some SERIOUS drawbacks, however. Tuning one value at a time is perfectly workable, and the idea of a sort of "stepwise refinement using variable delta" works pretty well to zero in on the optimal values - say, piece values. But when you change more than one at a time, regardless of how you do it, problems arise. If one value dominates, you can likely get it "real close". But if you have two values that are very close in importance, random walks can take a LONG time to find optimal values. I found many values in Crafty that really didn't seem to have an "upper bound": at some point going larger doesn't do anything, and neither does going smaller. But add in several of these at once, and I am far from convinced the testing approach is any good at all, based on a ton of test games of my own. If you can be certain that the two variables don't somehow cancel each other out when you tweak the values, you can make this work well. But I know of no methodology that will suggest this except testing, when the auto-tuning fails to work.

We've taken the science of tuning a single value to a high level. But tuning multiple values at the same time is beyond problematic, as it requires so many games and so many iterations that you need really big hardware (and I DO NOT like self-testing here; that can really lead you astray). As an example, take tennis. You have a small weakness in your backhand: you have trouble running across the court to your backhand side when your opponent hits the ball cross-court fairly close to the net. Do you REALLY want to tune your game so that you optimize your ability to hit the ball cross-court close to the net to your backhand side? You get eaten up by someone who is left-handed, as you have learned to hit it to their "optimal spot". You get eaten up by someone who is right-handed but who can return that shot easily, as it is their favorite. I dislike self-testing completely: you are learning to exploit your own weaknesses. Instead I want to test against several opponents so that I don't learn to exploit just one of their weaknesses; I learn to exploit something that works against them all pretty equally.

Re: Stockfish's tuning method

Posted: Fri Oct 07, 2011 7:17 pm
by zamar
Created quickly a page in the wiki:

https://chessprogramming.wikispaces.com ... ing+method
Re: Stockfish's tuning method

Posted: Fri Oct 07, 2011 8:39 pm
by Rémi Coulom
Thanks for posting your algorithm.
zamar wrote: The method is a practical approach and not mathematically very sound. Because the algorithm is very simple, it was very
likely invented a long time ago.
It is not so unsound. It is like the SPSA algorithm, except that SPSA does not use self-play. You can read about SPSA here, if you are interested:
http://www.jhuapl.edu/SPSA/

In order to guarantee convergence of SPSA, it is necessary to decay the deltas and the learning rate over time.
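
For reference, Spall's standard formulation shrinks the step size a_k and the perturbation size c_k over the iterations k as:

[code]
a_k = \frac{a}{(A + k + 1)^{\alpha}}, \qquad
c_k = \frac{c}{(k + 1)^{\gamma}},
\qquad \text{typically } \alpha = 0.602, \; \gamma = 0.101
[/code]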

As I mentioned in my paper, SPSA has the potential to be close in performance to CLOP, but its main weakness (as Joona says) is that it is very difficult to choose good values for all its meta-parameters. In my experiments, SPSA with optimal meta-parameters performs like CLOP. But in practice, it is not possible to find the optimal meta-parameters of SPSA, so I'd prefer using CLOP.

I did not understand the part about ampli-bias knobs.

Rémi

Re: Stockfish's tuning method

Posted: Fri Oct 07, 2011 8:47 pm
by mcostalba
Rémi Coulom wrote: I did not understand the part about ampli-bias knobs.
This is what makes the whole thing work: the tuning started to work when, instead of tuning the engine parameters directly, we started tuning them indirectly through some other variables, each one controlling all the parameters at once, and each one in a different way.

Say you want to tune 10 engine parameters at once: we would find a smaller set of, say, 5 control parameters, independent of one another, where each "control" parameter is constructed so that changing its value changes the values of all the original engine parameters. This makes the tuning more sensitive and less subject to noise.
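
For illustration, one hypothetical way to build such a mapping is a fixed linear map from the control parameters to the engine parameters:

[code]
import numpy as np

# Hypothetical example: 5 control parameters drive 10 engine parameters.
# M is fixed (chosen by the developer); only 'controls' is ever tuned.
M = np.random.randn(10, 5)         # each column moves all 10 parameters
base = np.full(10, 100.0)          # current engine parameter values

def engine_params(controls):
    return base + M @ controls     # every knob nudges every parameter

print(engine_params(np.zeros(5)))  # controls at zero: values unchanged
[/code]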

The name "ampli-bias knobs" was chosen as an analogy to old analog TV where you can control the full image view (made of many pixels) using only two knobs: contrast (ampli) and luminance (bias).

Re: Stockfish's tuning method

Posted: Fri Oct 07, 2011 9:02 pm
by Steve Maughan
Another idea has popped into my head; it may be crazy, but here goes:

1. Assume you want to tune 10 parameters. For each game, randomly set each of them to the sensible value +/- delta, recording the parameter values and the result of each game.

2. After the 30,000 games, fit a count model (Poisson or negative binomial) that predicts each game's outcome from whether each optimization parameter was high or low (1 or 0). The z-scores from the model will tell you which parameters are statistically important.

3. Move the parameters with the highest z-scores the most.

4. Rinse and repeat.

I've no idea whether this would work. I think the count model could probably also be probed for a more sophisticated tuning of the mean score, the adjustment, and the size of delta.
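
As a rough sketch of step 2 (substituting a logistic model for the count model; the arrays below are random placeholders, not real game data):

[code]
import numpy as np
import statsmodels.api as sm

n_games, n_params = 30000, 10
# 1 if a parameter was set high (+delta) in that game, 0 if low (-delta)
X = np.random.randint(0, 2, size=(n_games, n_params)).astype(float)
# game outcome from the tuned side's point of view: 1 = win, 0 = otherwise
y = np.random.randint(0, 2, size=n_games)

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
zscores = fit.tvalues[1:]           # z-scores, intercept dropped
step = 0.002 * 20.0 * zscores       # step 3: biggest |z| moves the most
[/code]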

Any comments?

Steve

Re: Stockfish's tuning method

Posted: Fri Oct 07, 2011 9:06 pm
by Gerd Isenberg
zamar wrote: Created quickly a page in the wiki:

https://chessprogramming.wikispaces.com ... ing+method
Oops, sorry, I didn't recognize it was you (I'm not aware of all your confusing alias names) and thought somebody else had copied/pasted the text without quoting the original source - so I deleted the page, but restored it later ;-)

I have edited it slightly. Thank you!

Cheers,
Gerd