Evaluation Tuning: When To Stop?

Cheney · Post by **Cheney** » Mon May 29, 2017 10:58 pm

Hi!

I am at the point of manually tuning my very basic evaluation. I have piece-square tables (PSQ) and some passed pawn knowledge, and I am adding other pawn structure terms such as doubled, isolated, and chained/supported pawns.
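For readers following along, here is a minimal sketch of how two of these pawn structure terms might be counted. It is not taken from any engine in this thread: the square numbering (a1 = 0, h8 = 63) and the function name `pawn_structure_counts` are assumptions made for illustration, and a real engine would more likely use bitboards.

```python
def pawn_structure_counts(pawn_squares):
    """Count doubled and isolated pawns for one side.

    pawn_squares: iterable of squares 0..63 (assumed layout: a1 = 0, h8 = 63).
    Returns (doubled, isolated).
    """
    per_file = [0] * 8
    for sq in pawn_squares:
        per_file[sq % 8] += 1

    # Every pawn beyond the first on a file counts as one doubled pawn.
    doubled = sum(n - 1 for n in per_file if n > 1)

    # A pawn is isolated if there are no friendly pawns on adjacent files.
    isolated = 0
    for f, n in enumerate(per_file):
        left = per_file[f - 1] if f > 0 else 0
        right = per_file[f + 1] if f < 7 else 0
        if n and left == 0 and right == 0:
            isolated += n
    return doubled, isolated
```

The two counts would then be multiplied by whatever penalties are being tuned, which are exactly the "guessed" numbers this thread is about.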

My basic testing method is to play the current version against the previous version and see whether there is an improvement. Hopefully, the "guessed" bonuses/penalties are close enough to give some kind of gain.

I have then adjusted the bonuses/penalties a bit and run the games again. A few times, this led to no gain or a decrease in Elo. This method of testing seems very inefficient, as manually adjusting, playing, adjusting, and playing again could take weeks or months.

I feel like I am brute forcing this to an end. Short of learning another tuning method like CLOP or Texel, is this really the best way to go about it: tune until I am comfortable with the gain, then move on to the next eval feature?

Thank you

Ferdy · Post by **Ferdy** » Tue May 30, 2017 1:36 am

It only stops if you stop adding features, including search features that affect evaluation. You can stop tuning a value once it reaches its optimum. To determine optimal values, run actual game tests with as many games and as many different opponents as possible.

As the number of features increases, it is no longer easy to find the best combination of feature values manually, so you may want to try auto-tuning before doing the actual game tests.

I have tried these auto-tuning methods before:
1. CLOP
2. Texel
3. Score comparison with a strong engine's score; the error decreases as your score gets closer to the strong engine's score
4. Move comparison against the best move of the position or the best move chosen by a strong engine
5. Genetic algorithm (GA). This is interesting because feature values that work in certain individuals are transferred to other individuals; then apply [2], [3], or [4].
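As a rough illustration of item 2, the Texel method as commonly described maps the engine's score to an expected game result with a sigmoid and minimizes the mean squared error against the actual results, using a simple local search over the parameters. The sketch below assumes a hypothetical `make_eval(params)` factory that returns an evaluation function in centipawns from White's point of view; the scaling constant `k` and the step size are also assumptions, and the usual description scores positions with quiescence search rather than a bare static eval.

```python
def sigmoid(score_cp, k=1.0):
    """Map a centipawn score to an expected result in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

def texel_error(positions, evaluate, k=1.0):
    """Mean squared error between predicted and actual game results.

    positions: list of (position, result) pairs, where result is
    1.0, 0.5 or 0.0 from White's point of view.
    """
    total = 0.0
    for position, result in positions:
        total += (result - sigmoid(evaluate(position), k)) ** 2
    return total / len(positions)

def local_search(params, positions, make_eval, step=1, max_passes=100):
    """Nudge each parameter by +/- step and keep changes that lower the error."""
    best = texel_error(positions, make_eval(params))
    for _ in range(max_passes):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                params[i] += delta
                err = texel_error(positions, make_eval(params))
                if err < best:
                    best = err
                    improved = True
                    break
                params[i] -= delta  # undo a change that did not help
        if not improved:
            break
    return params, best
```

Values found this way are only a starting point; they still need the actual game tests described above.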

cdani · Post by **cdani** » Tue May 30, 2017 2:48 pm

I also tune Andscacs values by hand.

It is not necessary to tune every value to death, especially with a young engine, where you will add a lot of new stuff that will change the optimal value of most parameters many times.

It is also important to respect the error bars of the tests.
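To make the point about error bars concrete, here is a small sketch that turns a win/draw/loss record into an Elo estimate with an approximate 95% interval. It uses the standard logistic Elo formula and a normal approximation to the score; the function name `elo_with_error` and the default `z=1.96` are choices made for this example, not something taken from a particular testing tool.

```python
import math

def elo_with_error(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result.

    Uses a normal approximation of the mean score, so it is only
    reasonable for a fairly large number of games.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n

    # Variance of a single game's result around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)

    def to_elo(p):
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep the logarithm finite
        return -400.0 * math.log10(1.0 / p - 1.0)

    return to_elo(score), to_elo(score - z * se), to_elo(score + z * se)
```

If the interval returned for a test still contains 0, the match has not shown either a gain or a loss yet.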

Cheney · Post by **Cheney** » Fri Jun 02, 2017 5:09 pm

Thank you for the info.

I do not want to tune the eval excessively, but I would also like to see an improvement, and when I don't, it is concerning.

Adding passed pawns gained approximately 50 Elo against the previous version and in a series of games against other engines.

Adding other basics (isolated, chained, doubled) only added 25 Elo.

I have just added backward pawns. I have tried different definitions of what a backward pawn is (example: penalty only if it is on its own half of the board) and different mg and eg values, but it consistently shows no gain over a few thousand games... just a whole lot of the engines jockeying for position.

My thoughts... Do I leave this as is, remove it, or tune the other pawn structure values to find a balance? I know there is a level of randomness, especially between two nearly equal engines. This is an example of my dilemma: figuring out when to move on, or whether to stay and spend what looks like weeks or months tuning a single feature.
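On the randomness between two nearly equal engines, a back-of-the-envelope estimate of how many games it takes to pin an Elo difference down to a given margin can help decide when to move on. The sketch below assumes the two engines are close to equal, a draw ratio of 30% (a made-up default; measure your own), and a normal approximation; `games_needed` is a name invented for this example.

```python
import math

def games_needed(elo_margin, draw_ratio=0.3, z=1.96):
    """Approximate games needed so the error bar is about +/- elo_margin.

    Assumes near-equal engines (mean score close to 0.5).
    """
    # Per-game variance of the result when wins and losses are balanced.
    per_game_var = (1.0 - draw_ratio) / 4.0
    # Slope of the Elo curve at a 50% score: Elo points per unit of score.
    elo_per_score = 1600.0 / math.log(10)
    # Standard error of the mean score that corresponds to the margin.
    se_target = elo_margin / (z * elo_per_score)
    return math.ceil(per_game_var / se_target ** 2)
```

Under these assumptions a margin of about 10 Elo already takes on the order of three thousand games, and halving the margin roughly quadruples that, so a feature worth only a few Elo can easily look like "no gain" after a few thousand games.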

BeyondCritics · Post by **BeyondCritics** » Fri Jun 02, 2017 7:20 pm

You should definitely consider this script: https://github.com/zamar/spsa
It is known to work, and a version of it is used for Stockfish tuning, which means you can ask them for assistance.
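For anyone unfamiliar with SPSA (simultaneous perturbation stochastic approximation), the idea is to perturb all parameters at once in random directions, measure the objective on both sides, and step toward the better side. The sketch below is a generic textbook-style version on a toy function, not the linked script: the function name `spsa_minimize`, the gain constants, and the toy objective are all assumptions for illustration. In engine tuning the two evaluations would be replaced by games between the "plus" and "minus" parameter sets.

```python
import random

def spsa_minimize(loss, theta, iterations=2000, a=1.0, c=0.1,
                  A=50.0, alpha=0.602, gamma=0.101, seed=0):
    """Minimize loss(theta) using two evaluations per iteration."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma       # decaying perturbation size
        # Perturb every parameter simultaneously by +/- c_k.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # The same two measurements give a gradient estimate for every parameter.
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta

# Toy usage: the minimum of this function is at (3, -1).
tuned = spsa_minimize(lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2,
                      [0.0, 0.0])
```

The appeal for evaluation tuning is that the cost per iteration does not grow with the number of parameters, which is what makes it practical once there are many features.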
