I am a little bit tired and have probably not thought things through, but could it be that your set of data is very unbalanced? What I mean is that maybe you have not included 50% games with wins/losses and 50% draws. If you have generated the positions by letting your engine play itself from a variety of balanced positions, you will probably have 90% draws. If that is the case and you haven't divided them into equally big groups, it becomes much more important for the tuner to get the prediction of the 90% draws right than the 10% of wins/losses.

Desperado wrote: ↑Wed Jan 13, 2021 11:44 pm
Well, it is not intended to keep the average constant, that is part of the puzzle.

hgm wrote: ↑Wed Jan 13, 2021 11:09 pm
That much I understood. But doing this should greatly drive up the rms error, agreed? Because keeping the average constant will prevent any effect on the prediction for those positions that have gamePhase = 0.5, but it sure as hell will have an enormous impact on positions with gamePhase = 0 or gamePhase = 1. Because these only depend on the mg or eg values, and do not care the slightest about the other or the average of them.
And in particular the mg values are completely wrong, with a score of 68 cP for a position where you are a Queen up.
So basically you are talking about why the optimizer tries to make the error as large as possible, rather than minimize it...
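Just to pin down the interpolation we are both talking about, this is roughly what the tapered eval does (a minimal sketch with placeholder names, not my engine's exact code):

```cpp
// Minimal sketch of a standard tapered evaluation (placeholder names).
// phase runs from 0 (pure endgame) to 24 (pure middlegame), so positions at
// the extremes depend on only one of the two values, whatever their average.
int taperedScore(int mg, int eg, int phase /* 0..24 */)
{
    return (mg * phase + eg * (24 - phase)) / 24;
}
```

At phase 24 the eg value drops out completely, which is exactly why keeping the average constant does nothing for pure middlegame or pure endgame positions.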
Yes, the mg values are completely wrong, that is why this thread exists. I want to find out why and how that can happen.
The effect can be softened (using qs() instead of eval and using a better scaling factor K), but it does not disappear.
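For reference, the error I am minimizing is the usual Texel objective, roughly like this (a sketch with placeholder names, where K is the scaling factor mentioned above):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the usual Texel tuning objective (placeholder names).
// result is 1.0 / 0.5 / 0.0 for a win / draw / loss from White's point of
// view, score is the (quiescence) evaluation in centipawns, and K is the
// scaling factor that maps centipawns onto the 0..1 result scale.
double sigmoid(double score, double K)
{
    return 1.0 / (1.0 + std::pow(10.0, -K * score / 400.0));
}

double meanSquaredError(const std::vector<double>& results,
                        const std::vector<double>& scores, double K)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < results.size(); ++i) {
        double diff = results[i] - sigmoid(scores[i], K);
        sum += diff * diff;
    }
    return results.empty() ? 0.0 : sum / results.size();
}
```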
The effect does not even appear without a tapered eval, because then the tuner is not able to split a term (for example a knight value) in any form.
As you point out, the phase bounds have a massive impact on that matter; that's why I began to analyse the data and produced a
file where the total error per phase is about equal. Each phase has the same portion of influence on the total error and the mse.
That does not change the result either, but that is the place where I want to continue my analysis.
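Concretely, the per-phase bookkeeping behind that balanced file looks roughly like this (a sketch with placeholder names, reusing the sigmoid from the snippet above):

```cpp
// Sketch of per-phase error bookkeeping (placeholder names). Summing the
// squared errors per phase bucket shows how much each phase contributes to
// the total error; the balanced file is built so these shares come out
// roughly equal.
struct PhaseStats { double errorSum = 0.0; int count = 0; };

PhaseStats stats[25];                        // one bucket per phase 0..24

void accumulate(int phase, double result, double score, double K)
{
    double diff = result - sigmoid(score, K);
    stats[phase].errorSum += diff * diff;
    stats[phase].count++;
}
// A phase's share of the total error is stats[phase].errorSum divided by
// the sum over all phases.
```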
Somehow the tuner keeps reducing the mg values out of all reasonable proportion.
I cannot follow why the tuner would try to make the error as large as possible.
If the frequency of phase 24 (mg) is 15% and the error per position there is larger than the average error, then together these two factors produce the largest contribution to the total error. Apply the same logic to phases 22 and 20, and the three middlegame phases already have a share of 45%.
At the same time, middlegame positions will produce a larger error per position than endgame positions (quiet criterion) when using a static evaluation. Of course, the tuner will get the better result if it accepts the endgame errors and minimizes the middlegame errors.
Unfortunately, the tuner achieves this by generating values that are mathematically useful but nonsensical in terms of content.
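To put made-up numbers on it, just to illustrate the proportions: if phases 24, 22 and 20 each hold 15% of the positions with an average squared error of 0.09, while the remaining 55% of the positions average 0.04, then the middlegame buckets contribute 0.45 * 0.09 = 0.0405 against 0.55 * 0.04 = 0.022, i.e. roughly two thirds of the total mse. Shaving something off the middlegame error then pays far more than anything the tuner could gain on the endgame side.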
This might explain your strange numbers. That the numbers differ between the experiments with step size 5 and step size 1 can then be explained by it being much more important to get a fine-grained eval than a good predictor of wins/losses. You can emulate a much finer eval since you have more degrees of freedom with the MG and EG interpolation, and since the approximation of draws might be of much greater importance for you.
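As a hypothetical example of that extra freedom: with a step size of 5, a single untapered term can only move in 5 cp jumps, but with mg = 80 and eg = 70 a half-phase position already sees 75, and moving only the eg value by 5 shifts the interpolated score by just 2.5, so the mg/eg pair effectively gives a finer grid for the mixed phases.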
I have a couple more ideas on how to improve the Texel algorithm and the data generation, but I can take those up later when we have sorted out the big issues first.
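To make the imbalance point from the top of my post concrete, one simple way to stop the 90% draws from dominating could be to average the error inside each result group first, for example (a rough sketch with placeholder names, reusing the sigmoid from earlier; not tested):

```cpp
#include <cstddef>
#include <vector>

// Rough sketch (placeholder names): give the draw group and the win/loss
// group equal weight by averaging the squared error inside each group first,
// so that 90% draws no longer dominate the objective.
double balancedError(const std::vector<double>& results,
                     const std::vector<double>& scores, double K)
{
    double sum[2] = {0.0, 0.0};              // [0] = draws, [1] = wins/losses
    int    cnt[2] = {0, 0};
    for (std::size_t i = 0; i < results.size(); ++i) {
        int g = (results[i] == 0.5) ? 0 : 1;
        double diff = results[i] - sigmoid(scores[i], K);
        sum[g] += diff * diff;
        cnt[g]++;
    }
    double drawErr = cnt[0] ? sum[0] / cnt[0] : 0.0;
    double decErr  = cnt[1] ? sum[1] / cnt[1] : 0.0;
    return 0.5 * (drawErr + decErr);         // each group counts the same
}
```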