Parameter Tuning Algorithm

Desperado · Post by **Desperado** » Thu Jan 21, 2021 5:36 pm

At this point I would like to present and further develop my algorithm.

Due to the last events I thought about which ingredients are necessary to increase the effectiveness of an optimizer. It already starts with the fact that the input for the optimizer in many cases has no relation to the capabilities of the engine. The problem is that the database has a significant influence on the output of the optimizer.

So, many developers follow very individual citeria or processes from which the data is obtained. Still, to some extent, the attributes that are considered important overlap. First, I will try to name the most prominent attributes so far and define the framework in which the algorithm is used.

1. Global context

The optimization algorithm is only part of an overall concept to improve the playing strength of an engine. In turn, the most effective algorithm for this so far is to determine the Elo through games and compare it with the previous version.

The optimizer works without this global context and simply processes a search request.
The assumption that the delivered search result has a useful impact in the global context is implied, does not depend on the effectiveness or efficiency of the search algorithm.

To ensure that the measure that will be found is related to a real Elo increase as often as possible, the goal is to include the data components and engine components in the algorithm.

2. Database

Basically, the approach described below is based on game databases.

First of all, I would like to form two categories of the data at this point.

• The position information for which a score will be determined (K1)
• A corresponding result obtained from a game (K2)

When selecting position information, there are general approaches that are always found in individual solutions. In contrast, however, there are also attributes that have a strong influence on the result, which, according to my observations, were included rather randomly in the training data. I would like to name a few points now. The order is arbitrary and is in no way intended to express any particular priority.

2a. Quiet criterion

The main attributes of a position should not change suddenly. In order to evaluate the impact of the attributes, they should not fluctuate. This means a subsequent state has similar properties to the position being evaluated.
There are different practical approaches to take this feature into account.
Examples:
• Using the quiescent search when selecting data from
• Selection of leaf nodes of a main line
• The material distribution remains identical for n half-moves in the main line

2b. WDL distribution
The only strategy that is practically used and that I could observe was,
use a fixed number of positions from a game. This weights each game to the same extent. So for three games with three results nx1xW + nx1xD + nx1xL. If you don't do this, the number of moves have an influence on the distribution of the game results. I don't know if there is a basic strategy to choose a certain distribution with respect to the games.

2c. Relevance of the data (is there content)
It is self-explanatory that the properties to be optimized are also present in the data in a representative measure. At the same time, it should be considered that the parameter to be optimized is also in a normal relation to other parameters.

2d. Correlation of data and engine (evaluation function)
According to my observations, this feature is becoming more and more important.
A common strategy is to determine/generate the data from games played with the own engine. There are also other approaches that do this implicitly. If the data source is filtered, e.g. for quiet positions, a relationship between the selected data and the evaluation function is also established in advance.

2e. Content quality of the database
Although most developers try to extract information from sources generated by strong entities, I don't currently see the need to follow this pattern, however, just as little not to do it. The points 2d + 2f might be more relevant than the games of engines that are several hundred Elo stronger.

2f. Capacity of the static evaluation function
More complex evaluation functions than those of the own engine, simply cannot be mapped to it. So if the output of the own function has to be adjusted, it will unfortunately be mapped to parameters that improve the fitness output without the parameter actually being the reason for it. I consider this point relevant only in the early stages of developing a scoring function. The issue does not disappear, yet an error in later versions is spread over dozens or hundreds of parameters.

2g. Imbalance
The values to be optimized must not only exist representatively, but must also be unbalanced to the extent that the influence on the measure has a relevant effect.

2h. The result of the game
I see two main aspects that lead to an outcome of a game. Firstly, the theoretical outcome can be considered and secondly, the playing strength of two entities is decisive. Even objectively bad positions are often won by the stronger party. So this point can also be taken into account when selecting training data.

In detail, these points are considered in individual solutions. However, I have not yet seen a concept that attempts to systematically combine the above-mentioned contents or deliberately neglects them.

3. The static evaluation function

The attempt to project the internal scoring system to some form of win probability has probably accompanied chess programming from the beginning. If the engine is not part of the game data, there is no connection between the training data and the scoring function. But this can be forced and is part of the algorithm.

4. Algorithm

4a. The training data is selected by the optimizer.

A freely chosen game database is transferred into the Epd format. This is now, especially in the context of the above, in a raw state. The basic selection mechanism takes into account three quantities.

1. error of the static evaluation (error flat score = efs).
2. error of a "deep" search e.g. depth 1 (error deep score = eds).
3. the result of the game

The data is run through once and the values are determined.
The basic condition now is that "eds" < "efs".
I will explain the reason when I explain the fitness function.

Condition: IF ( EDS < EFS ) THEN USE_DATA (part of the base algorithm).

The further thought, which can optionally be used as a parameter, is to extend the condition to minimize the noise in the data.

Optional: IF ( EDS < EFS && ABS(EDS-EFS) < T)

Thus, at least a relationship between the training data and the own evaluation function is already established. (configurable part of the algorithm)

Optional (idea): use for the operation only the parameters that should be optimized. This way you can strengthen the relationship. (this might be a variant of the algorithm)

This process step already addresses the points 2 → a / c / d / f / g

The following information is stored: Fen / result / score of the search
r1bq1rk1/pp1n1pb1/3p1npp/2pP4/2P2P2/2N4P/PP1BB1PN/R1Q1R1K1 b - - 1 "1-0" ce 90

4b. Optimizer search algorithm

For simplicity, I choose a minimizer such as the one you can find at CPW.
This refers to the logic and not to the settings used by the algorithm. (Part of the basic algorithm: cpw local optimization routine). As a variant I consider when the search logic changes.

4c. The output of the static evaluation function is optimized.
Thus, the performance issue should be sufficiently solved.

4c. The fitness function / loss function

The calculation of the MSE (part of the basic algorithm) represents the basic configuration.
Other loss functions are quite conceivable as selectable settings. (configurable part)

...
deepscore = Epd::getEvaluation(entry); // retrieves the score in the string
staticscore = eval(position) // get current vector output

computeErrror(deepscore,flatscore)
…

double Tuner::computeError(int deepscore, int flatscore)
{
double err = sig(deepscore,400) - sig(flatscore,400)
return err * err;
}

The decisive factor in the selection criterion was that the error of the "deep" search was smaller than the error of the static evaluation. Thus, the value should move towards the stored reference value, as it was ensured that this was closer to the target value. Both values were determined from the same scoring function which leads to a smooth transition. At this point the previous steps are transferred into the actual idea.

I am not an expert in this area, I think this is a form of reinforcement learning that works similar to temporal difference. If you want, you can think of the static function as a search of depth 0 and the reference value as a search of depth 1.

All errors are transferred into the corresponding loss function, here first of all into the MSE.

5. Conclusion

With this procedure I was able to improve one of my engines, yesterday with 2 iterations by about 15 Elo. The engine strength for which I used this algorithm is at the 2950 Elo level compared to today's CCRL references.

How this procedure relates to the listed sub-items of point 2 and what potential there still is, everyone can evaluate for themselves or also bring into a discussion.

Valuable parts of the algorithm for me are that the data source, regardless of individual interpretations and implementations, by definition creates a strong bound between the selected data and the scoring function. The data used for training is very robust compared to the original source. The selection is generated systematically.

The groundwork has been done...

Desperado · Post by **Desperado** » Thu Jan 21, 2021 6:14 pm

If you want to make a shortcut, just begin at 4.Algorithm

Sven · Post by **Sven** » Thu Jan 21, 2021 7:22 pm

4a. [...] The basic condition now is that "eds" < "efs".

Is that the condition for one single position to survive in the test set?
Why eds < efs and not eds <= efs?

4c. [...]
deepscore = Epd::getEvaluation(entry); // retrieves the score in the string
...
double Tuner::computeError(int deepscore, int flatscore)
{
double err = sig(deepscore,400) - sig(flatscore,400)

If "deepscore" is the game result stored in the database, i.e. 1-0, 0-1, 1/2-1/2, then I don't understand why you use "sig(deepscore, 400)" instead of simply "deepscore". sig() transforms an eval score into a winning probability in [0.0 .. 1.0] but "deepscore" does not need to be transformed IMO.

Desperado · Post by **Desperado** » Thu Jan 21, 2021 7:50 pm

Sven wrote: ↑Thu Jan 21, 2021 7:22 pm
4a. [...] The basic condition now is that "eds" < "efs".
Is that the condition for one single position to survive in the test set?
Why eds < efs and not eds <= efs?

4c. [...]
deepscore = Epd::getEvaluation(entry); // retrieves the score in the string
...
double Tuner::computeError(int deepscore, int flatscore)
{
double err = sig(deepscore,400) - sig(flatscore,400)
If "deepscore" is the game result stored in the database, i.e. 1-0, 0-1, 1/2-1/2, then I don't understand why you use "sig(deepscore, 400)" instead of simply "deepscore". sig() transforms an eval score into a winning probability in [0.0 .. 1.0] but "deepscore" does not need to be transformed IMO.

If you want to come closer to a score, then it would be impossible if the scores are already equal.
There should be some room for improvement. That is the idea when "error of deepscore" < "error of flatscore".

The result values "1-0","1/2","0-1" are only used in the pre-selection phase to determine the direction in which the score (staticEval) should be shifted.

The basic idea is to compare two scores. I know that in general a one-ply search gives a better result than the static eval.
I only accept the one-ply search score as reference score (selection criteria) if it is closer to the result compared to the staticEval score.
This assures that if the static eval comes closer to the one-ply search result, the static eval also comes closer to the result at the same time.

Finally i measure if the distance of the staticEval gets smaller to the one-ply search score.
Well, instead of computing the error related to the result 1.0/0.5/0.0 i compute the error related to the distance of sig(staticEval)-sig(one-ply-search).

So deepscore is the result of the one-ply search and flatscore the result of the staticEval. (deepscore != result)

I think it is very smooth, because it is sensitive to what the current evaluation already outputs (represented by the one-ply search),
and the target to come closer to the result.

Desperado · Post by **Desperado** » Thu Jan 21, 2021 8:25 pm

...and yes, the description relates to one position.

And to avoid confusion, but i think it is clear. The one-ply search score keeps to be constant, as it is stored in the epd-string.
When the vector is modified only the staticEval needs to be computed with the goal to be like the one-ply-search-score.

Desperado · Post by **Desperado** » Thu Jan 21, 2021 8:41 pm

4a. The training data is selected by the optimizer.

A freely chosen game database is transferred into the Epd format. This is now, especially in the context of the above, in a raw state. The basic selection mechanism takes into account three quantities.

1. error of the static evaluation (error flat score = efs).
2. error of a "deep" search e.g. depth 1 (error deep score = eds).
3. the result of the game

The data is run through once and the values are determined.
The basic condition now is that "eds" < "efs".
I will explain the reason when I explain the fitness function.

Condition: IF ( EDS < EFS ) THEN USE_DATA (part of the base algorithm).

This is meant for every position entry in the epd file. It should be written

Condition: IF ( ERROR(ONE_PLY_SEARCH) < ERROR(STATICEVAL) ) USE_POSITION

The error relates to the game result.

Desperado · Post by **Desperado** » Fri Jan 22, 2021 11:04 am

The most important points summarized

- The error is determined from two values of the evaluation function (one-ply,static-eval)
◦ one-ply is reference value
◦ The reference value is by definition basically the better value
◦ The selected reference values are always closer to the game result (selection of data)
◦ The error to a fixed output value is dynamic (e.g. e(20,68), e(20,37),e(20,97))
- Training data
◦ Strongly correlate with the evaluation function.
◦ Selection criterion confirms reference value as better value: E(only-ply) < E(static-eval)
◦ Optionally, the error distance ABS( E(one-ply) - E(static-eval)) < THRESHOLD can take into account the level of the error.
Note: this is not the same as the absolute distance of the values, since an equal distance of the values does not mean that the error is also smaller, i.e. the relation to the game result.
◦ generalized process that fits the characteristics of the engine and optimizer, since both components are involved in the selection of the data.
- Game Result
◦ is used to confirm the reference value as correct.
◦ flows indirectly into the tuning (through the selection criterion of the data).
- Evaluation function
1:1 relationship to the evaluation function of the trainer (parameter set / logic).

Parameter Tuning Algorithm

Parameter Tuning Algorithm

Re: Parameter Tuning Algorithm

Re: Parameter Tuning Algorithm

Re: Parameter Tuning Algorithm

Re: Parameter Tuning Algorithm

Re: Parameter Tuning Algorithm

Re: Parameter Tuning Algorithm