Parameter Tuning Algorithm
Posted: Thu Jan 21, 2021 5:36 pm
At this point I would like to present and further develop my algorithm.
Prompted by recent events, I thought about which ingredients are necessary to increase the effectiveness of an optimizer. It already starts with the fact that in many cases the input to the optimizer has no relation to the capabilities of the engine, even though the database has a significant influence on the optimizer's output.
Many developers follow very individual criteria or processes for obtaining the data. Still, the attributes considered important overlap to some extent. First, I will try to name the most prominent attributes so far and define the framework in which the algorithm is used.
1. Global context
The optimization algorithm is only part of an overall concept to improve the playing strength of an engine. In turn, the most effective algorithm for this so far is to determine the Elo through games and compare it with the previous version.
The optimizer works without this global context and simply processes a search request.
The assumption that the delivered search result has a useful impact in the global context is only implied; it does not follow from the effectiveness or efficiency of the search algorithm itself.
To ensure that the measure found corresponds to a real Elo increase as often as possible, the goal is to include both the data components and the engine components in the algorithm.
2. Database
Basically, the approach described below is based on game databases.
First of all, I would like to form two categories of the data at this point.
• The position information for which a score will be determined (K1)
• A corresponding result obtained from a game (K2)
When selecting position information, there are general approaches that are always found in individual solutions. In contrast, however, there are also attributes that have a strong influence on the result, which, according to my observations, were included rather randomly in the training data. I would like to name a few points now. The order is arbitrary and is in no way intended to express any particular priority.
2a. Quiet criterion
The main attributes of a position should not change suddenly. In order to evaluate the impact of the attributes, they should not fluctuate. This means a subsequent state has similar properties to the position being evaluated.
There are different practical approaches to take this feature into account.
Examples:
• Using quiescence search when selecting data
• Selection of leaf nodes of a main line
• The material distribution remains identical for n half-moves in the main line
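The third example above could be sketched as follows. This is a minimal illustration under my own assumptions; `isQuiet` and the representation of the main line as a sequence of centipawn material balances are illustrative choices, not taken from any particular engine:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A position qualifies as "quiet" if the material balance along the
// main line stays constant for the next n half-moves.
// materialLine[0] is the balance (in centipawns) at the position itself,
// materialLine[i] the balance after the i-th half-move of the main line.
bool isQuiet(const std::vector<int>& materialLine, std::size_t n) {
    if (materialLine.size() < n + 1)
        return false;                       // line too short to judge
    for (std::size_t i = 1; i <= n; ++i)
        if (materialLine[i] != materialLine[0])
            return false;                   // a capture/promotion changed material
    return true;
}
```

A position followed by an exchange within the window would be rejected, which is exactly the fluctuation the quiet criterion is meant to exclude.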
2b. WDL distribution
The only strategy I could observe in practical use is to take a fixed number of positions from each game. This weights every game equally: for three games with three different results, n×1×W + n×1×D + n×1×L. Without this, the number of moves per game influences the distribution of the game results. I don't know whether there is an established strategy for choosing a particular distribution with respect to the games.
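The fixed-number-per-game idea can be sketched like this. `samplePositions` is a hypothetical helper of my own; picking evenly spaced indices is just one reasonable way to spread the n samples over a game:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Draw exactly n evenly spaced position indices from a game so that every
// game contributes the same weight to the W/D/L distribution, regardless
// of its length. Games with fewer than n positions are skipped entirely.
std::vector<std::size_t> samplePositions(std::size_t gameLength, std::size_t n) {
    std::vector<std::size_t> picks;
    if (n == 0 || gameLength < n)
        return picks;                        // too short: contribute nothing
    for (std::size_t i = 0; i < n; ++i)
        picks.push_back(i * gameLength / n); // evenly spaced over the game
    return picks;
}
```

With n fixed, a 30-move win and an 80-move win both add exactly n positions labeled "1-0", so game length no longer skews the result distribution.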
2c. Relevance of the data (is there content)
It is self-explanatory that the properties to be optimized must also be present in the data to a representative degree. At the same time, the parameter to be optimized should stand in a normal relation to the other parameters.
2d. Correlation of data and engine (evaluation function)
According to my observations, this feature is becoming more and more important.
A common strategy is to determine/generate the data from games played with one's own engine. Other approaches do this implicitly: if the data source is filtered, e.g. for quiet positions, a relationship between the selected data and the evaluation function is also established in advance.
2e. Content quality of the database
Although most developers try to extract information from sources generated by strong entities, I currently see neither a need to follow this pattern nor a reason to avoid it. Points 2d + 2f might be more relevant than the games of engines that are several hundred Elo stronger.
2f. Capacity of the static evaluation function
An evaluation function more complex than the engine's own simply cannot be mapped onto it. So if the output of the own function has to be adjusted toward it, the adjustment will unfortunately land on parameters that improve the fitness output without actually being the cause of the difference. I consider this point relevant mainly in the early stages of developing a scoring function. The issue does not disappear later, but in later versions the error is spread over dozens or hundreds of parameters.
2g. Imbalance
The values to be optimized must not only be represented in the data, they must also be imbalanced to an extent where their influence on the measure has a relevant effect.
2h. The result of the game
I see two main aspects that lead to an outcome of a game. Firstly, the theoretical outcome can be considered and secondly, the playing strength of two entities is decisive. Even objectively bad positions are often won by the stronger party. So this point can also be taken into account when selecting training data.
In detail, these points are considered in individual solutions. However, I have not yet seen a concept that attempts to systematically combine the above-mentioned contents or deliberately neglects them.
3. The static evaluation function
The attempt to project the internal scoring system to some form of win probability has probably accompanied chess programming from the beginning. If the engine is not part of the game data, there is no connection between the training data and the scoring function. But this can be forced and is part of the algorithm.
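A common form of this projection is a logistic mapping from centipawns to an expected result, as used in Texel-style tuning. The function name and the choice of base 10 with K = 400 below are assumptions on my part (they match the `sig(x, 400)` call that appears later in this post, but other constants are equally possible):

```cpp
#include <cassert>
#include <cmath>

// Map a centipawn score to an expected game result in [0, 1]:
// 0.0 = loss, 0.5 = draw, 1.0 = win. K controls how quickly an
// advantage converts into a near-certain win.
double winProbability(int cp, double k = 400.0) {
    return 1.0 / (1.0 + std::pow(10.0, -cp / k));
}
```

A score of 0 maps to 0.5, and with K = 400 a +400 cp advantage maps to roughly a 91% expected score.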
4. Algorithm
4a. The training data is selected by the optimizer.
A freely chosen game database is converted into EPD format. At this point the data is, especially with respect to the points above, in a raw state. The basic selection mechanism takes three quantities into account.
1. error of the static evaluation (error flat score = efs).
2. error of a "deep" search e.g. depth 1 (error deep score = eds).
3. the result of the game
The data is run through once and the values are determined.
The basic condition now is that "eds" < "efs".
I will explain the reason when I explain the fitness function.
Condition: IF ( EDS < EFS ) THEN USE_DATA (part of the base algorithm).
The further thought, which can optionally be used as a parameter, is to extend the condition to minimize the noise in the data.
Optional: IF ( EDS < EFS && ABS(EDS-EFS) < T)
Thus, at least a relationship between the training data and the own evaluation function is already established. (configurable part of the algorithm)
Optional (idea): use only the parameters that are to be optimized for this operation. This way the relationship can be strengthened. (This might be a variant of the algorithm.)
This process step already addresses the points 2 → a / c / d / f / g
The following information is stored: FEN / result / score of the search
r1bq1rk1/pp1n1pb1/3p1npp/2pP4/2P2P2/2N4P/PP1BB1PN/R1Q1R1K1 b - - 1 "1-0" ce 90
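The 4a selection rule might look like the sketch below. This is my own illustration, not the author's exact code: `sig`, `sqErr`, and `useData` are assumed names, the errors are per-position squared errors against the game result, and the noise threshold T defaults to "off":

```cpp
#include <cassert>
#include <cmath>

// Logistic mapping from centipawns to an expected result in [0, 1].
double sig(int score, double k = 400.0) {
    return 1.0 / (1.0 + std::pow(10.0, -score / k));
}

// Squared error of a score against the game result (1.0 / 0.5 / 0.0).
double sqErr(int score, double result) {
    double d = sig(score) - result;
    return d * d;
}

// Keep a position only if the deep-search score is closer to the game
// result than the static score (EDS < EFS), optionally also requiring
// the gap between the two errors to stay below a noise threshold T.
bool useData(int deepScore, int flatScore, double result, double t = 1e9) {
    double eds = sqErr(deepScore, result);
    double efs = sqErr(flatScore, result);
    return eds < efs && std::fabs(eds - efs) < t;
}
```

For the stored example above (result "1-0", deep score 90), the position is kept only if the static evaluation is further from a win than 90 cp is.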
4b. Optimizer search algorithm
For simplicity, I chose a minimizer such as the one you can find at CPW.
This refers to the logic, not to the settings used by the algorithm. (Part of the basic algorithm: CPW local optimization routine.) I consider it a variant of the algorithm when the search logic changes.
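For reference, the CPW-style local optimization logic amounts to the following sketch (my own rendering of the well-known routine; `localOptimize` and the toy error function in the usage note are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Local search as described on the Chess Programming Wiki: nudge each
// parameter by +1, then -1, keep any change that lowers the error, and
// repeat full passes until no single-step change improves anything.
std::vector<int> localOptimize(std::vector<int> params,
                               const std::function<double(const std::vector<int>&)>& error) {
    double best = error(params);
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t i = 0; i < params.size(); ++i) {
            params[i] += 1;                       // try one step up
            double e = error(params);
            if (e < best) { best = e; improved = true; continue; }
            params[i] -= 2;                       // try one step down
            e = error(params);
            if (e < best) { best = e; improved = true; continue; }
            params[i] += 1;                       // no improvement, restore
        }
    }
    return params;
}
```

On a convex error surface this walks each parameter to its local minimum; in the tuner the error function would be the MSE over the selected training data.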
4c. The output of the static evaluation function is optimized.
Thus, the performance issue should be sufficiently solved.
4d. The fitness function / loss function
The calculation of the MSE (part of the basic algorithm) represents the basic configuration.
Other loss functions are quite conceivable as selectable settings. (configurable part)
...
deepscore = Epd::getEvaluation(entry);  // retrieve the score stored in the EPD string
flatscore = eval(position);             // current output of the static evaluation
computeError(deepscore, flatscore);
…
double Tuner::computeError(int deepscore, int flatscore)
{
    double err = sig(deepscore, 400) - sig(flatscore, 400);
    return err * err;
}
The decisive factor in the selection criterion was that the error of the "deep" search was smaller than the error of the static evaluation. The static value should therefore move towards the stored reference value, since it was ensured that the reference is closer to the target. Both values were determined with the same scoring function, which leads to a smooth transition. At this point the previous steps feed into the actual idea.
I am not an expert in this area, but I think this is a form of reinforcement learning similar to temporal-difference learning. If you want, you can think of the static function as a search of depth 0 and the reference value as a search of depth 1.
All errors are transferred into the corresponding loss function, here first of all into the MSE.
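The final aggregation step is then just the mean over the per-position squared errors; `meanSquaredError` is a name I chose for this sketch:

```cpp
#include <cassert>
#include <vector>

// Average the per-position squared errors (as returned by a
// computeError-style function) into the MSE the optimizer minimizes.
double meanSquaredError(const std::vector<double>& squaredErrors) {
    if (squaredErrors.empty())
        return 0.0;                 // no data, no error
    double sum = 0.0;
    for (double e : squaredErrors)
        sum += e;
    return sum / squaredErrors.size();
}
```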
5. Conclusion
With this procedure I was able to improve one of my engines yesterday by about 15 Elo in 2 iterations. The engine I used this algorithm on plays at the 2950 Elo level relative to today's CCRL references.
How this procedure relates to the listed sub-items of point 2 and what potential there still is, everyone can evaluate for themselves or also bring into a discussion.
Valuable parts of the algorithm, for me, are that the data source, regardless of individual interpretations and implementations, by definition creates a strong bond between the selected data and the scoring function; that the data used for training is very robust compared to the original source; and that the selection is generated systematically.
The groundwork has been done...