Evaluation Tuning: When To Stop?

Posts: 83
Joined: Thu Sep 27, 2012 12:24 am

Evaluation Tuning: When To Stop?

Post by Cheney » Mon May 29, 2017 8:58 pm


I am at the point of manually tuning my very basic evaluation. I have PSQ, added some passed pawn knowledge, and am adding some other pawn structure items, like doubled, isolated, and chained/supported pawns.

In my testing, the basic approach is to first play the current version against the previous version and figure out if there is an improvement. Hopefully, the "guessed" bonuses/penalties are in the right ballpark to get some kind of gain.

I have then adjusted the bonuses/penalties a bit and run the games again. A few times, this led to no gain or a decrease in Elo. This method of testing seems very inefficient, as it appears that manually adjusting, playing, adjusting, and playing again could take weeks or months.

I feel like I am brute forcing this to an end. Without learning some other tuning method like CLOP or Texel, is this really the best way to go about it? And then, when I am comfortable with a gain, do I just move on to the next eval feature?

Thank you :)

Posts: 3645
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: Evaluation Tuning: When To Stop?

Post by Ferdy » Mon May 29, 2017 11:36 pm

It only stops if you stop adding features, including search features that affect evaluation. You can stop tuning once the values are optimal. To determine the optimal values, run actual game tests with as many games and against as many different opponents as possible.

As the number of features increases, it is no longer that easy to find the best combination of feature values manually, so you may try auto-tuning before doing the actual game test.

I have tried these auto-tuning methods before:
1. CLOP
2. Texel
3. Score comparison against a strong engine's score: the error decreases as your score gets closer to the score of the strong engine
4. Move comparison against the best move of the position or the best move chosen by a strong engine
5. Genetic algorithm (GA). This is interesting because feature values that work in certain individuals are transferred to other individuals; then apply [2], [3], or [4].
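To make option 2 more concrete, here is a minimal sketch of the idea behind Texel-style tuning: map each position's evaluation to an expected game result with a sigmoid and adjust the weights to lower the mean squared error against the actual results. Everything here is an assumption for illustration: the names `sigmoid`, `mean_squared_error`, and `local_search` are hypothetical, the evaluation is reduced to a plain weighted sum of features, and the scaling constant `k` is left at 1.0. A real setup would typically use the engine's own (quiescence) score and fit `k` to the data first.

```python
def sigmoid(score_cp, k=1.0):
    """Map a centipawn score to an expected result in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))


def mean_squared_error(positions, weights, k=1.0):
    """positions: list of (features, result); result is 1.0, 0.5 or 0.0
    from White's point of view. The evaluation is assumed to be a plain
    weighted sum of the feature values (a simplification)."""
    total = 0.0
    for features, result in positions:
        score = sum(w * f for w, f in zip(weights, features))
        total += (result - sigmoid(score, k)) ** 2
    return total / len(positions)


def local_search(positions, weights, step=1, k=1.0, max_passes=1000):
    """Nudge each weight by +step or -step and keep any change that
    lowers the error; stop when a full pass brings no improvement."""
    best = list(weights)
    best_err = mean_squared_error(positions, best, k)
    for _ in range(max_passes):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                err = mean_squared_error(positions, trial, k)
                if err < best_err:
                    best, best_err = trial, err
                    improved = True
                    break
        if not improved:
            break
    return best, best_err


if __name__ == "__main__":
    # Toy data: one feature (pawn difference), made-up results.
    data = [([1], 1.0), ([1], 0.5), ([-1], 0.0), ([-1], 0.5)]
    print(local_search(data, [100]))
```

The appeal of this approach is that it needs no games at all during tuning, only a set of positions labelled with game results, so many parameters can be adjusted in one run before confirming the outcome with actual game tests.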

Posts: 2095
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

Re: Evaluation Tuning: When To Stop?

Post by cdani » Tue May 30, 2017 12:48 pm

I also tune Andscacs values by hand.

It is not necessary to tune every value to death, especially with a young engine, where you will add a lot of new stuff that will change the optimal value of most parameters many times.

It is also important to respect the error bars of the tests.
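As a rough illustration of what those error bars look like, the sketch below turns a win/draw/loss count into an Elo difference with an approximate 95% interval. It assumes the standard logistic Elo formula and a normal approximation on the match score; the function names `elo_from_score` and `elo_with_error` are hypothetical, and dedicated rating tools may compute their intervals differently.

```python
import math


def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference."""
    score = min(max(score, 1e-6), 1.0 - 1e-6)  # avoid division by zero
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_error(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result.
    z=1.96 gives an approximate 95% interval (normal approximation)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    std_err = math.sqrt(variance / n)
    return (elo_from_score(score),
            elo_from_score(score - z * std_err),
            elo_from_score(score + z * std_err))


if __name__ == "__main__":
    elo, low, high = elo_with_error(400, 300, 300)
    print(f"{elo:+.1f} Elo, 95% interval {low:+.1f} to {high:+.1f}")
```

If the interval still contains zero, the test has not shown that the change is an improvement, however promising the raw score looks.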

Posts: 83
Joined: Thu Sep 27, 2012 12:24 am

Re: Evaluation Tuning: When To Stop?

Post by Cheney » Fri Jun 02, 2017 3:09 pm

Thank you for the info :)

I do not want to tune the eval excessively, but I would also like to see an improvement, and when I don't, it is concerning.

Adding passed pawns gained approximately 50 Elo over the previous version, and also in a series of games against other engines.

Adding other basics (isolated, chained, doubled) only gained 25 Elo.

I have just added backward pawns. I have tried different definitions of what a backward pawn is (example: penalty only if it is on its own half of the board) and I have tried different mg and eg values, but it consistently shows no gain over a few thousand games... Just a whole lot of the engines jockeying for position :)

My thoughts... Do I leave this as is, remove it, or tune the other pawn structure values to find a balance? I know there is a level of randomness, especially for two very equal engines. This is an example of my dilemma: figuring out when to move on, or whether to stay and spend what looks like weeks or months tuning a single feature.
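One way to put numbers on that randomness is to estimate how many games it takes before a given Elo gain stands out from the noise. The sketch below is only a back-of-the-envelope calculation under stated assumptions: the logistic Elo model, a fixed draw ratio (0.3 here is a made-up figure), and the criterion that the 95% margin of error should be no larger than the expected score gain. The name `games_needed` is hypothetical.

```python
import math


def expected_score(elo_diff):
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, draw_ratio=0.3, z=1.96):
    """Rough number of games before a true gain of elo_diff exceeds
    the z-sigma margin of error. draw_ratio is an assumed constant."""
    p = expected_score(elo_diff)
    gain = p - 0.5
    if gain <= 0.0:
        raise ValueError("elo_diff must be positive")
    if draw_ratio / 2.0 > min(p, 1.0 - p):
        raise ValueError("draw_ratio is too high for this elo_diff")
    # Variance of a single game result (1, 0.5 or 0) with mean p:
    # E[X^2] - p^2 = (p - draw_ratio / 4) - p^2
    variance = p * (1.0 - p) - draw_ratio / 4.0
    return math.ceil((z ** 2) * variance / gain ** 2)


if __name__ == "__main__":
    for diff in (50, 25, 10, 5):
        print(diff, "Elo ->", games_needed(diff), "games")
```

Under these assumptions a gain of around 5 Elo needs on the order of ten thousand games, which would explain why a few thousand games cannot separate two nearly equal versions.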

Posts: 336
Joined: Sat May 05, 2012 12:48 pm
Location: Bergheim

Re: Evaluation Tuning: When To Stop?

Post by BeyondCritics » Fri Jun 02, 2017 5:20 pm

You should definitely consider this script: https://github.com/zamar/spsa
It is known to work, and a version of it is used for Stockfish tuning. That means you can ask them for assistance.
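For readers unfamiliar with SPSA (simultaneous perturbation stochastic approximation), here is a minimal sketch of the general technique on a toy problem. It is not taken from the linked script: the function `spsa_minimize`, the noisy quadratic `noisy_loss`, and all constants are assumptions for illustration. In engine tuning the two loss evaluations would be replaced by a match result between the "plus" and "minus" parameter sets.

```python
import random


def spsa_minimize(loss, theta, iterations=2000, a=0.5, c=0.5, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimise a noisy loss. All parameters are perturbed at once,
    so each iteration needs only two loss evaluations."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma       # decaying perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # Gradient estimate for each parameter uses the same difference.
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta


# Toy objective (assumption): noisy distance to a hidden optimum.
TARGET = [3.0, -2.0]
_noise = random.Random(1)


def noisy_loss(theta):
    exact = sum((t - g) ** 2 for t, g in zip(theta, TARGET))
    return exact + _noise.gauss(0.0, 0.1)


if __name__ == "__main__":
    print(spsa_minimize(noisy_loss, [0.0, 0.0]))
```

The cost per iteration does not grow with the number of parameters, which is what makes the method attractive when every evaluation means playing games.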
