Stockfish's tuning method

Rémi Coulom
Posts: 429
Joined: Mon Apr 24, 2006 6:06 pm

Re: Stockfish's tuning method

Post by Rémi Coulom » Fri Oct 07, 2011 6:39 pm

Thanks for posting your algorithm.
zamar wrote: The method is a practical approach and not mathematically very sound. Because the algorithm is very simple, it has very likely been invented already, a long time ago.
It is not so unsound. It is like the SPSA algorithm, except SPSA does not use self-play. You can read about SPSA there, if you are interested:
http://www.jhuapl.edu/SPSA/

In order to guarantee convergence of SPSA, it is necessary to decay the deltas and the learning rate over time.
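The decay can be seen in a minimal SPSA sketch. It uses Spall's usual gain schedules, a_k = a/(k+1+A)^α for the learning rate and c_k = c/(k+1)^γ for the perturbation size, with the commonly recommended α = 0.602 and γ = 0.101. The function name, the toy objective and the values of a, c and A are assumptions for illustration, not settings used by Stockfish or CLOP.

```python
import random


def spsa_minimize(loss, theta, iterations=2000, a=0.1, c=0.1, A=100,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimise a noisy loss with SPSA, decaying both gain sequences over time."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        # The learning rate a_k and the perturbation size c_k both shrink as k
        # grows; the convergence guarantee relies on this decay.
        a_k = a / (k + 1 + A) ** alpha
        c_k = c / (k + 1) ** gamma
        # Perturb every parameter at once with a random +/-1 sign.
        delta = [rng.choice((-1, 1)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus, rng) - loss(minus, rng)
        # Simultaneous-perturbation gradient estimate, then a descent step.
        theta = [t - a_k * diff / (2 * c_k * d) for t, d in zip(theta, delta)]
    return theta


def noisy_quadratic(x, rng):
    # Hypothetical objective: minimum at (3, -2), observed with Gaussian noise.
    return (x[0] - 3) ** 2 + (x[1] + 2) ** 2 + rng.gauss(0, 0.1)


print(spsa_minimize(noisy_quadratic, [0.0, 0.0]))  # should end up near [3, -2]
```

Picking a, c and A well is exactly the meta-parameter problem: if a is too large the iterates can diverge, and if c is too small the gradient estimate is swamped by noise.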

As I mentioned in my paper, SPSA has the potential to be close in performance to CLOP, but its main weakness (as Joona says) is that it is very difficult to choose good values for all its meta-parameters. In my experiments, SPSA with optimal meta-parameters performs like CLOP. But in practice, it is not possible to find the optimal meta-parameters of SPSA, so I'd prefer using CLOP.

I did not understand the part about ampli-bias knobs.

Rémi

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 7:17 pm

Re: Stockfish's tuning method

Post by mcostalba » Fri Oct 07, 2011 6:47 pm

Rémi Coulom wrote: I did not understand the part about ampli-bias knobs.
This is what makes the whole thing work: the tuning started to work when, instead of tuning the values of the engine parameters directly, we started to tune them indirectly through other variables, each one controlling all the parameters at once, and each one in a different way.

Say you want to tune 10 engine parameters at once. We found a smaller set of, say, 5 control parameters, independent of one another, each designed so that changing its value changes the values of all the original engine parameters. This makes the tuning more sensible and less subject to noise.

The name "ampli-bias knobs" was chosen as an analogy to old analog TV where you can control the full image view (made of many pixels) using only two knobs: contrast (ampli) and luminance (bias).
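As a rough illustration of the TV analogy, here is a sketch of two knobs driving a whole table of engine values. The linear form (value = ampli × shape + bias), the function name and the mobility table are hypothetical; the thread does not say how Stockfish mapped its knobs onto engine parameters.

```python
def apply_knobs(shape, ampli, bias):
    """Map two control knobs onto every engine parameter at once.

    Hypothetical mapping: each engine value is the fixed base shape scaled
    by `ampli` (contrast) and shifted by `bias` (brightness).
    """
    return [round(ampli * s + bias) for s in shape]


# Hypothetical 8-entry mobility bonus table used as the fixed "shape".
MOBILITY_SHAPE = [-20, -10, -2, 4, 9, 13, 16, 18]

print(apply_knobs(MOBILITY_SHAPE, 1.0, 0))  # [-20, -10, -2, 4, 9, 13, 16, 18]
print(apply_knobs(MOBILITY_SHAPE, 2.0, 5))  # [-35, -15, 1, 13, 23, 31, 37, 41]
```

A tuner would then search over the two knobs (ampli, bias) instead of the eight table entries, so each test game says something about every entry at once.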

Steve Maughan
Posts: 1061
Joined: Wed Mar 08, 2006 7:28 pm
Location: Florida, USA

Re: Stockfish's tuning method

Post by Steve Maughan » Fri Oct 07, 2011 7:02 pm

Another idea has popped into my head. It may be crazy, but here goes:

1. Assume you want to tune 10 parameters. For each game, randomly set each of them to its sensible value +/- delta, recording the values and the result of every game.

2. After the 30,000 games, fit a count model (Poisson or negative binomial) that models each game's outcome based on whether each parameter was set high or low (1 or 0). The z-scores from the model will tell you which parameters are statistically important.

3. Move the parameters with the highest z-scores the most.

4. Rinse and repeat.
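A rough sketch of these steps on simulated games follows. It swaps the Poisson/negative-binomial count model for a simpler per-parameter two-sample z-test on the mean score (games where the parameter was high versus games where it was low), which is reasonable here only because each parameter is randomised independently. The simulated "ground truth" (only parameters 3 and 7 matter), the effect sizes and the step rule in `adjust` are all hypothetical.

```python
import math
import random


def parameter_z_scores(games):
    """games: list of (settings, score) pairs, where settings[i] is 1 (high)
    or 0 (low) and score is 1, 0.5 or 0. Returns one z-score per parameter."""
    n_params = len(games[0][0])
    z_scores = []
    for i in range(n_params):
        high = [score for settings, score in games if settings[i] == 1]
        low = [score for settings, score in games if settings[i] == 0]
        mean_h = sum(high) / len(high)
        mean_l = sum(low) / len(low)
        var_h = sum((s - mean_h) ** 2 for s in high) / (len(high) - 1)
        var_l = sum((s - mean_l) ** 2 for s in low) / (len(low) - 1)
        # Two-sample z-statistic for "high" minus "low" mean score.
        se = math.sqrt(var_h / len(high) + var_l / len(low))
        z_scores.append((mean_h - mean_l) / se)
    return z_scores


def adjust(values, deltas, z_scores, step=0.5):
    """Step 3: move each parameter in proportion to its z-score, so the most
    significant one moves by step * delta and the others move less."""
    top = max(abs(v) for v in z_scores)
    return [value + step * d * zs / top
            for value, d, zs in zip(values, deltas, z_scores)]


def simulate_game(settings, rng):
    # Hypothetical ground truth: only parameters 3 (helps) and 7 (hurts) matter.
    p_win = 0.35 + 0.04 * settings[3] - 0.02 * settings[7]
    p_draw = 0.30
    r = rng.random()
    if r < p_win:
        return 1.0
    if r < p_win + p_draw:
        return 0.5
    return 0.0


# Step 1: 30,000 games, each parameter randomly high or low in every game.
rng = random.Random(1)
games = []
for _ in range(30_000):
    settings = [rng.randint(0, 1) for _ in range(10)]
    games.append((settings, simulate_game(settings, rng)))

# Step 2: which parameters matter?
z = parameter_z_scores(games)
print([round(v, 1) for v in z])  # parameter 3 should stand out, 7 be negative
```

A real count model would also give interaction terms and a proper likelihood, but for independent +/- settings the simple z-test already ranks parameters by how strongly they move the score.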

I've no idea if this would work or not. I think the count model could probably be probed for a more sophisticated tuning of the mean score, the adjustment and the size of delta.

Any comments?

Steve

Gerd Isenberg
Posts: 2125
Joined: Wed Mar 08, 2006 7:47 pm
Location: Hattingen, Germany

Re: Stockfish's tuning method

Post by Gerd Isenberg » Fri Oct 07, 2011 7:06 pm

zamar wrote: Quickly created a page in the wiki:

https://chessprogramming.wikispaces.com ... ing+method
Oops, sorry, I didn't recognize it was you (I'm not aware of all your confusing alias names) and thought somebody else had copied and pasted the text without quoting the original source, so I deleted the page, but I restored it later ;-)

I have edited it slightly. Thank you!

Cheers,
Gerd

Rémi Coulom
Posts: 429
Joined: Mon Apr 24, 2006 6:06 pm

Re: Stockfish's tuning method

Post by Rémi Coulom » Fri Oct 07, 2011 7:29 pm

mcostalba wrote: The name "ampli-bias knobs" was chosen as an analogy to old analog TV where you can control the full image view (made of many pixels) using only two knobs: contrast (ampli) and luminance (bias).
Thanks. I understand now.

Rémi

jdart
Posts: 3803
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: Stockfish's tuning method

Post by jdart » Sat Oct 08, 2011 2:06 am

How many CPUs, and how much time, do you need for 30,000 games?

mjlef
Posts: 1420
Joined: Thu Mar 30, 2006 12:08 pm

Re: Stockfish's tuning method

Post by mjlef » Sat Oct 08, 2011 2:30 am

And were these fixed-depth or timed matches?

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 7:17 pm

Re: Stockfish's tuning method

Post by mcostalba » Sat Oct 08, 2011 2:50 am

Rémi Coulom wrote: Thanks. I understand now.
My mathematics skills are very rusty these days, but roughly (forgive me if I am not precise): if we consider the N parameters to be tuned in parallel as a vector of dimension N, then we instead tune the coefficients of the matrix for which that vector is an eigenvector.

zamar
Posts: 613
Joined: Sun Jan 18, 2009 6:03 am

Re: Stockfish's tuning method

Post by zamar » Sat Oct 08, 2011 2:12 pm

zamar wrote: The method is a practical approach and not mathematically very sound. Because the algorithm is very simple, it has very likely been invented already, a long time ago.
It is not so unsound. It is like the SPSA algorithm, except SPSA does not use self-play. You can read about SPSA there, if you are interested:
http://www.jhuapl.edu/SPSA/
Thanks for the link, Remi. Yes, it looks like our method is SPSA, except for the self-play.
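The difference can be made concrete with a sketch of SPSA in which a single self-play game between the two perturbed versions, θ + cΔ versus θ − cΔ, replaces SPSA's two noisy loss measurements. The toy strength model, the Elo-based game simulator, the constant gains a and c, and all function names are assumptions for illustration; this is not the actual Stockfish tuning script.

```python
import random


def strength(theta, optimum, k=0.05):
    # Hypothetical toy model: Elo lost grows quadratically with the
    # distance of each parameter from its (unknown) optimum.
    return -k * sum((t - o) ** 2 for t, o in zip(theta, optimum))


def play_game(theta_a, theta_b, optimum, rng):
    """Simulated game: +1 if engine A wins, -1 if engine B wins (no draws)."""
    diff = strength(theta_a, optimum) - strength(theta_b, optimum)
    p_a = 1 / (1 + 10 ** (-diff / 400))  # standard Elo expected score
    return 1 if rng.random() < p_a else -1


def selfplay_spsa(theta, optimum, games=20_000, a=1.0, c=10.0, seed=0):
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(games):
        # Perturb all parameters at once with random +/-1 signs.
        delta = [rng.choice((-1, 1)) for _ in theta]
        plus = [t + c * d for t, d in zip(theta, delta)]
        minus = [t - c * d for t, d in zip(theta, delta)]
        # One self-play game stands in for the two loss measurements.
        result = play_game(plus, minus, optimum, rng)
        # Step towards whichever perturbed version won the game.
        theta = [t + a * result * d / (2 * c) for t, d in zip(theta, delta)]
    return theta


print(selfplay_spsa([60.0, 80.0], optimum=[100.0, 50.0]))  # should end near [100, 50]
```

Because a and c are constant here, the parameters keep jittering around the optimum instead of converging exactly, which is why decaying them is needed for a convergence guarantee.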
As I mentioned in my paper, SPSA has the potential to be close in performance to CLOP, but its main weakness (as Joona says) is that it is very difficult to choose good values for all its meta-parameters. In my experiments, SPSA with optimal meta-parameters performs like CLOP. But in practice, it is not possible to find the optimal meta-parameters of SPSA, so I'd prefer using CLOP.
What makes SPSA attractive for us is that when we already have a very good starting value, it immediately starts to improve it, whereas CLOP (if I've understood it correctly) always starts from scratch. Most of the SF tuning was done on a single quad-core computer, so we could use at most 100,000 games for each set of variables (each set containing 7-30 variables).

But I don't know, it's of course possible that CLOP could have done better...
Joona Kiiski

zamar
Posts: 613
Joined: Sun Jan 18, 2009 6:03 am

Re: Stockfish's tuning method

Post by zamar » Sat Oct 08, 2011 2:17 pm

jdart wrote:How many CPUs, and how much time, do you need for 30,000 games?
We chose time controls that let us play 100,000 games in one week. Only one quad-core computer was used. We ran single-CPU matches in parallel, although this adds some extra noise to the results.
Joona Kiiski
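Those figures give a back-of-the-envelope time budget per game. The sketch assumes four games running at once (one per core), both engines of a game sharing that core, and no overhead between games; the thread does not state the actual time control.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60   # 604,800 s
cores = 4                             # one quad-core machine
games_per_week = 100_000              # figure quoted in the thread

# Four single-CPU games run side by side, so the week's budget is 4x.
core_seconds = cores * SECONDS_PER_WEEK            # 2,419,200 core-seconds
seconds_per_game = core_seconds / games_per_week   # wall time of one game
per_engine = seconds_per_game / 2                  # both engines share a core

# At the same rate, 30,000 games would take:
days_for_30k = 30_000 / games_per_week * 7

print(f"{seconds_per_game:.1f} s per game")          # 24.2 s per game
print(f"{per_engine:.1f} s per engine")              # 12.1 s per engine
print(f"{days_for_30k:.1f} days for 30,000 games")   # 2.1 days for 30,000 games
```

So, under these assumptions, each game had to fit in roughly 24 seconds of wall time, and 30,000 games would take about two days on that machine.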
