Tapered Evaluation and MSE (Texel Tuning)

Desperado · Post by **Desperado** » Sun Jan 10, 2021 12:46 pm

So far, I have a basic tuning algorithm that minimizes the mean squared error for my scoring function.
The optimization function is good enough to find an optimum within minutes when I have a choice of terms
with fewer than (say) 20 parameters. For example, I tune material, mobility, and a handful of individual parameters.
So this semi-automation gives me good control over what I want to change and introduce, while saving my time resources.

Problem description

I ran into a problem in the algorithm that I'm sure some of you have already solved.
Suppose I match the material weights with (using arbitrary numbers)

What I want to show is that the average value for MG/EG seems to be OK,but mg and eg scores diverge.

Code: Select all

Material:   P,  N,  B,  R,  Q      P,  N,  B,   R,   Q
Start: MG: 80,300,320,500,980 EG:100,300,320, 500, 980
End:   MG:  5, 24, 32, 48, 68 EG:192,595,610,1020,1990

Conclusions:

At first I was looking for a bug, but there is no bug.
I have tried to understand what is going on and have come to the following conclusions.

1. the data contains a natural imbalance between mg and eg scores.
By natural I mean that the positions were randomly selected from pgn games.

2. phase model: (mg * phase + eg *(maxphase - phase)) / maxphase.

The average values can be mg(13),eg(11),max(24) in the data set.
This forces the tuner to bring the mg score to its minimum and maximize the eg score.
The average scores (mg+eg)/2 are fine. Especially when mg-error > eg-error.

This is a correct behavior of the tuner.

3. tapered evaluation

It may sound strange at this point, but I was thinking about what a tapered evaluation actually is.
The obvious thought is that many evaluation functions are mapped to one, but they are weighted differently.

I'd like to look at it more descriptively. A score is a result of a scoring function, so we have different scoring functions.
In most chess engines we have two scoring functions called "mg" and "eg", in one logic,
which could easily be separated by duplicating the code. Generalizing, we can replace two with "N" evaluation functions.

4. 2(N) evaluations and one error

We have managed to produce a single error, but we have two evaluation functions (mg,eg).
The tuner does not know and does not care what the weights are. Consequently, the evaluation function with
the larger average error will be minimized and will be balanced with the smaller average error to get the "optimized" sum.

That is the crux of the matter! There are infinitely many sums that produce the same optimal average.
The function with the larger error is permanently minimized and balanced as long as the algorithm is running.

This may be blurred or not even noticeable when several hundred parameters are optimized
because the algorithm takes too long to get to this stage.

5. one-to-one error calculation (my solution)

The only idea so far that solves the problem is to tune each evaluation function (mg,eg) separately.
So I enforce a one-to-one relationship between error and evaluation function (mg,eg evaluations/points).

Questions:

1. Is there an error in my thinking?
2. How did you guys solved it?
3. Tuning hundreds of parameters will not make the problem visible,
because the algorithm would take too long to reach that point ?!?

I look forward to any feedback. Thanks.

Pio · Post by **Pio** » Sun Jan 10, 2021 3:38 pm

Desperado wrote: ↑Sun Jan 10, 2021 12:46 pm So far, I have a basic tuning algorithm that minimizes the mean squared error for my scoring function.
The optimization function is good enough to find an optimum within minutes when I have a choice of terms
with fewer than (say) 20 parameters. For example, I tune material, mobility, and a handful of individual parameters.
So this semi-automation gives me good control over what I want to change and introduce, while saving my time resources.

Problem description

I ran into a problem in the algorithm that I'm sure some of you have already solved.
Suppose I match the material weights with (using arbitrary numbers)

What I want to show is that the average value for MG/EG seems to be OK,but mg and eg scores diverge.
Code: Select all
Material:   P,  N,  B,  R,  Q      P,  N,  B,   R,   Q
Start: MG: 80,300,320,500,980 EG:100,300,320, 500, 980
End:   MG:  5, 24, 32, 48, 68 EG:192,595,610,1020,1990
Conclusions:

At first I was looking for a bug, but there is no bug.
I have tried to understand what is going on and have come to the following conclusions.

1. the data contains a natural imbalance between mg and eg scores.
By natural I mean that the positions were randomly selected from pgn games.

2. phase model: (mg * phase + eg *(maxphase - phase)) / maxphase.

The average values can be mg(13),eg(11),max(24) in the data set.
This forces the tuner to bring the mg score to its minimum and maximize the eg score.
The average scores (mg+eg)/2 are fine. Especially when mg-error > eg-error.

This is a correct behavior of the tuner.

3. tapered evaluation

It may sound strange at this point, but I was thinking about what a tapered evaluation actually is.
The obvious thought is that many evaluation functions are mapped to one, but they are weighted differently.

I'd like to look at it more descriptively. A score is a result of a scoring function, so we have different scoring functions.
In most chess engines we have two scoring functions called "mg" and "eg", in one logic,
which could easily be separated by duplicating the code. Generalizing, we can replace two with "N" evaluation functions.

4. 2(N) evaluations and one error

We have managed to produce a single error, but we have two evaluation functions (mg,eg).
The tuner does not know and does not care what the weights are. Consequently, the evaluation function with
the larger average error will be minimized and will be balanced with the smaller average error to get the "optimized" sum.

That is the crux of the matter! There are infinitely many sums that produce the same optimal average.
The function with the larger error is permanently minimized and balanced as long as the algorithm is running.

This may be blurred or not even noticeable when several hundred parameters are optimized
because the algorithm takes too long to get to this stage.

5. one-to-one error calculation (my solution)

The only idea so far that solves the problem is to tune each evaluation function (mg,eg) separately.
So I enforce a one-to-one relationship between error and evaluation function (mg,eg evaluations/points).

Questions:

1. Is there an error in my thinking?
2. How did you guys solved it?
3. Tuning hundreds of parameters will not make the problem visible,
because the algorithm would take too long to reach that point ?!?

I look forward to any feedback. Thanks.

You will have to fix one value in your evaluation that cannot change for example you could set MG pawn to 100 cp. Then don’t ever change that value. In that way the tuner will not be able to put whatever values it wants and the tuning will become faster and better. One problem you can get when varying all values in some optimisation algorithms is that they can learn to set all values to zero.

Desperado · Post by **Desperado** » Sun Jan 10, 2021 9:37 pm

Hello, thank you for your reply.

Unfortunately it is not enough to define an anchor.
I have tried this again and again in the past for different reasons.

Nevertheless, based on your answer, I tested it again today in two variations.
One, in which only an anchor for the pawn value (MG) was fixed and a second in which middlegame value
and endgame value were fixed.

The explanation why this doesn't work is, in my opinion, simple.
For a knight, the pair MG(30),EG(570) will still produce a smaller error than the pair MG(300),EG(300).
This will remain the status until the error from the relation of knight to pawn is smaller than the
resulting error of the pair MG(30),EG(570) for the knight.This only helps within the values of the same evaluation,
e.g. middle game. But as long as the error can be reduced more by diverging the average value for a term, it does not help.

By the way, I can tell from my experience,
that fixing a term can have an optimizing influence, but is not necessary.

To cut a long story short, I have tested as described above and unfortunately it does not work.
The core problem is not solved with your solution approach.

The error for example from MG(30),EG(570) is still smaller than MG(300),EG(300) for one term.
As long as I only map a value (a function) to an error, the problem does not exist. The tuner cannot split the average value for a term.
This changes if several input signals (mg,eg) are mapped to one value and finally to one error.
The tuner weights them and divides the average value.

Maybe later I will post the code for minimization, which is pretty simple
It should fall into the category of a beginner's algorithm, since I wrote it myself.

Regards

Pio · Post by **Pio** » Sun Jan 10, 2021 10:22 pm

Desperado wrote: ↑Sun Jan 10, 2021 9:37 pm Hello, thank you for your reply.

Unfortunately it is not enough to define an anchor.
I have tried this again and again in the past for different reasons.

Nevertheless, based on your answer, I tested it again today in two variations.
One, in which only an anchor for the pawn value (MG) was fixed and a second in which middlegame value
and endgame value were fixed.

The explanation why this doesn't work is, in my opinion, simple.
For a knight, the pair MG(30),EG(570) will still produce a smaller error than the pair MG(300),EG(300).
This will remain the status until the error from the relation of knight to pawn is smaller than the
resulting error of the pair MG(30),EG(570) for the knight.This only helps within the values of the same evaluation,
e.g. middle game. But as long as the error can be reduced more by diverging the average value for a term, it does not help.

By the way, I can tell from my experience,
that fixing a term can have an optimizing influence, but is not necessary.

To cut a long story short, I have tested as described above and unfortunately it does not work.
The core problem is not solved with your solution approach.

The error for example from MG(30),EG(570) is still smaller than MG(300),EG(300) for one term.
As long as I only map a value (a function) to an error, the problem does not exist. The tuner cannot split the average value for a term.
This changes if several input signals (mg,eg) are mapped to one value and finally to one error.
The tuner weights them and divides the average value.

Maybe later I will post the code for minimization, which is pretty simple
It should fall into the category of a beginner's algorithm, since I wrote it myself.

Regards

I don’t know exactly how you do your optimisation but you only need one anchor since your parameters most likely are in a connected set, i.e all parameters are directly or indirectly a function of the pawn MG value.

What do you mean by the tuner cannot split the average term for a term? Let’s say you are in 0.7 MG and 0.3 EG. You see each game phase term as a separate term so for the above phase you will weight all the MG values as 0.7 and weight all EG values as 0.3. For this position the minimisation will focus 0.7/0.3 as much on the MG values. In the extreme case for a position with MG = 0, EG = 1 only the endgame parameters will be learnt.

Pio · Post by **Pio** » Sun Jan 10, 2021 10:39 pm

Pio wrote: ↑Sun Jan 10, 2021 10:22 pm
Desperado wrote: ↑Sun Jan 10, 2021 9:37 pm Hello, thank you for your reply.

Unfortunately it is not enough to define an anchor.
I have tried this again and again in the past for different reasons.

Nevertheless, based on your answer, I tested it again today in two variations.
One, in which only an anchor for the pawn value (MG) was fixed and a second in which middlegame value
and endgame value were fixed.

The explanation why this doesn't work is, in my opinion, simple.
For a knight, the pair MG(30),EG(570) will still produce a smaller error than the pair MG(300),EG(300).
This will remain the status until the error from the relation of knight to pawn is smaller than the
resulting error of the pair MG(30),EG(570) for the knight.This only helps within the values of the same evaluation,
e.g. middle game. But as long as the error can be reduced more by diverging the average value for a term, it does not help.

By the way, I can tell from my experience,
that fixing a term can have an optimizing influence, but is not necessary.

To cut a long story short, I have tested as described above and unfortunately it does not work.
The core problem is not solved with your solution approach.

The error for example from MG(30),EG(570) is still smaller than MG(300),EG(300) for one term.
As long as I only map a value (a function) to an error, the problem does not exist. The tuner cannot split the average value for a term.
This changes if several input signals (mg,eg) are mapped to one value and finally to one error.
The tuner weights them and divides the average value.

Maybe later I will post the code for minimization, which is pretty simple
It should fall into the category of a beginner's algorithm, since I wrote it myself.

Regards
I don’t know exactly how you do your optimisation but you only need one anchor since your parameters most likely are in a connected set, i.e all parameters are directly or indirectly a function of the pawn MG value.

What do you mean by the tuner cannot split the average term for a term? Let’s say you are in 0.7 MG and 0.3 EG. You see each game phase term as a separate term so for the above phase you will weight all the MG values as 0.7 and weight all EG values as 0.3. For this position the minimisation will focus 0.7/0.3 as much on the MG values. In the extreme case for a position with MG = 0, EG = 1 only the endgame parameters will be learnt.

If I ever would try Texel’s tuning I would probably try to weigh positions close to the end position more of two reasons. The first reason is that the closer you are to the end result the less likely you are to have made a blunder in between the position and the endgame result so the position should have a greater correlation with endgame result. The second reason is that you mostly evaluate the leaf positions that are many plies from the root so it is more important to get the positions maybe 10 plies closer to the end correct than it is to get the root position correct.

Desperado · Post by **Desperado** » Sun Jan 10, 2021 10:57 pm

Here is my selfmade algorithm, nothing special. I was able to develop it while writing the code.
I have commented the code so that you can follow my confused thoughts better.

MInimization routine:

Code: Select all

// The method performs a loop over all parameters and tries to improve them.
// This routine needs to be repeated serveral times and called by a controller.
// The reason is, when we improved a parameter in the list after having improved
// a previous one, the previous one needs to optimized again because the 
// corresponding parameters did change.

double Tuner::minimize(param_t* param)
{
    double lo, hi, bestFitness;
    int id, offset;

    // shuffle param access, more see next comment
    //for(int i = 0; i < pcount; i++)
    //{
    //    id = rand() % pcount;
    //    tmp = accessId[id];
    //    accessId[id] = accessId[i];
    //    accessId[i] = tmp;
    //}

    bestFitness = mse();

    // the basic approach is to loop over all loaded parameters
    // the accessId includes a shuffled index, so the parameters can be looped in different
    // order. That provides a nice form to pick a random subset of n < i. If "shuffling" is
    // disabled accessId includes the numbers 0...n. pcount is the number of loades parameters.
    for(int i = 0; i < pcount; i++)
    {
        id = accessId[i];

        // defining the change of the score, the offset of: stepsize = 5 range = 2
        // stepsize 5 * rangevalues (-2,-1,0,1,2) = -10,-5,0,5,10
        // This relaxes the term resolution and speed of the algorithm
        // Additionally it is sensitive to the type of term.
        offset = param[id].stepsize * (rand() % (RANGE * 2 + 1) - RANGE);

        // The basic idea is to check wether a parameter improves and
        // in which direction it will improve. Must we substract or add points.
        // Therefore we measure two directions. This is the basic idea of gradient
        // gd algorithm, but i dont know how to implement it, lol.

        // lo - direction
        *param[id].score -= offset;
        lo = mse();
        *param[id].score += offset;

        // hi - direction
        *param[id].score += offset;
        hi = mse();
        *param[id].score -= offset;

        // Now the following situations can happen. The current offset does not
        // improve in any direction, so we continue with the next parameter.
        // More interesting is, when we improve the bestFitness, then we follow
        // the direction which improves more than the other. We keep the direction
        // and the offset as long as possible. As option we can switch to another
        // offset but keeping the direction. This might be an optimazation and should
        // be tested.

        if(lo < hi && lo < bestFitness)
        {
            bestFitness = lo;
            *param[id].score -= offset;

            do
            {
                // copies  current score values. If the result does not improve
                // we will restore the scores.
                updateParameters(BACKUP);

                // modify and compute resulting mse
                *param[id].score -= offset;
                lo = mse();

                // no further improvement, there is nothing more todo for
                // the current parameter. restore the score from the latest
                // modification and continue parameter loop.
                if(lo >= bestFitness)
                {
                    updateParameters(RESET);
                    break;
                }

                bestFitness = lo;
                printParameters(param);
                printf(" %f ", bestFitness);
            } while(1);
        }

        // Now this is code duplication for the other direction.
        // The code should be simplified. A routine with direction parameter
        // would do the job.
        else if(hi < lo && hi < bestFitness)
        {
            bestFitness = hi;
            *param[id].score += offset;

            do
            {
                updateParameters(BACKUP);
                *param[id].score += offset;
                hi = mse();
                if(hi >= bestFitness)
                {
                    updateParameters(RESET);
                    break;
                }
                bestFitness = hi;
                printParameters(param);
                printf(" %f ", bestFitness);
            } while(1);
        }

        printParameters(param);
        printf(" %f ", bestFitness);
    }

    // We looped over the parameter vector. This is defined as a single epoch.
    // To repeat this process, the function nees to be called again. The controller
    // will stop from outside.
    return bestFitness;
}

The Controller:

Code: Select all

// A simple controller for the minimizing fuction
void Tuner::solve()
{
    double bestFitness;

    for(int epoch = 0; epoch < 30; epoch++)
    {
        minimize(param);
        printParameters(param);
        printf(" epoch: %d", epoch);
    }
}

The Helpler functions:

Code: Select all

// The method loops of a batch. Each epd string needs to have a result score.
// It can call a simple static evaluation and it will return a score with white pov.
// The same can be done using a quiesence search also return a score with white pov.
double Tuner::mse()
{
    pos_t pos;
    double errorsum = 0.0;

    for(int i = 0; i < min(epdcount, BATCHSIZE); i++)
    {
        // load position
        Epd::setPosition(&pos, i);

        // compute
        #if USE_QS
        ISearch::qs(&pos);
        #endif
        #if USE_STATIC_EVAL
        ISearch::staticEval(&pos);
        #endif

        // error
        errorsum += computeError(Epd::getResultFromString(i), ISearch::resultScore);
    }

    return errorsum /= min(epdcount, BATCHSIZE);
}

// The result values are 1.0 white wins / 0.5 draw / 0.0 black wins.
// The scalingFactorK is a own topic. Basically you can always use 1.0
// It can be computed, and will express if your evaluation function
// underestimates or overestimates the data. My interpreations is, when
// the scaling factor has been optimized, the eval() fits "perfectly"
// to the existing dataset.
double Tuner::computeError(double result, int score)
{
    double err = result - Math::sigmoid(scalingFactorK * score, 400);
    return err * err;
}

If you find logical error in the code, just let me know.
Otherwise for people having their own ressources and code infrastructure available,
it is a ten minute job to implement it and to try the minimizing algorithm.

Pio · Post by **Pio** » Sun Jan 10, 2021 11:54 pm

Desperado wrote: ↑Sun Jan 10, 2021 10:57 pm Here is my selfmade algorithm, nothing special. I was able to develop it while writing the code.
I have commented the code so that you can follow my confused thoughts better.

MInimization routine:

Code: Select all

// The method performs a loop over all parameters and tries to improve them.
// This routine needs to be repeated serveral times and called by a controller.
// The reason is, when we improved a parameter in the list after having improved
// a previous one, the previous one needs to optimized again because the 
// corresponding parameters did change.

double Tuner::minimize(param_t* param)
{
    double lo, hi, bestFitness;
    int id, offset;

    // shuffle param access, more see next comment
    //for(int i = 0; i < pcount; i++)
    //{
    //    id = rand() % pcount;
    //    tmp = accessId[id];
    //    accessId[id] = accessId[i];
    //    accessId[i] = tmp;
    //}

    bestFitness = mse();

    // the basic approach is to loop over all loaded parameters
    // the accessId includes a shuffled index, so the parameters can be looped in different
    // order. That provides a nice form to pick a random subset of n < i. If "shuffling" is
    // disabled accessId includes the numbers 0...n. pcount is the number of loades parameters.
    for(int i = 0; i < pcount; i++)
    {
        id = accessId[i];

        // defining the change of the score, the offset of: stepsize = 5 range = 2
        // stepsize 5 * rangevalues (-2,-1,0,1,2) = -10,-5,0,5,10
        // This relaxes the term resolution and speed of the algorithm
        // Additionally it is sensitive to the type of term.
        offset = param[id].stepsize * (rand() % (RANGE * 2 + 1) - RANGE);

        // The basic idea is to check wether a parameter improves and
        // in which direction it will improve. Must we substract or add points.
        // Therefore we measure two directions. This is the basic idea of gradient
        // gd algorithm, but i dont know how to implement it, lol.

        // lo - direction
        *param[id].score -= offset;
        lo = mse();
        *param[id].score += offset;

        // hi - direction
        *param[id].score += offset;
        hi = mse();
        *param[id].score -= offset;

        // Now the following situations can happen. The current offset does not
        // improve in any direction, so we continue with the next parameter.
        // More interesting is, when we improve the bestFitness, then we follow
        // the direction which improves more than the other. We keep the direction
        // and the offset as long as possible. As option we can switch to another
        // offset but keeping the direction. This might be an optimazation and should
        // be tested.

        if(lo < hi && lo < bestFitness)
        {
            bestFitness = lo;
            *param[id].score -= offset;

            do
            {
                // copies  current score values. If the result does not improve
                // we will restore the scores.
                updateParameters(BACKUP);

                // modify and compute resulting mse
                *param[id].score -= offset;
                lo = mse();

                // no further improvement, there is nothing more todo for
                // the current parameter. restore the score from the latest
                // modification and continue parameter loop.
                if(lo >= bestFitness)
                {
                    updateParameters(RESET);
                    break;
                }

                bestFitness = lo;
                printParameters(param);
                printf(" %f ", bestFitness);
            } while(1);
        }

        // Now this is code duplication for the other direction.
        // The code should be simplified. A routine with direction parameter
        // would do the job.
        else if(hi < lo && hi < bestFitness)
        {
            bestFitness = hi;
            *param[id].score += offset;

            do
            {
                updateParameters(BACKUP);
                *param[id].score += offset;
                hi = mse();
                if(hi >= bestFitness)
                {
                    updateParameters(RESET);
                    break;
                }
                bestFitness = hi;
                printParameters(param);
                printf(" %f ", bestFitness);
            } while(1);
        }

        printParameters(param);
        printf(" %f ", bestFitness);
    }

    // We looped over the parameter vector. This is defined as a single epoch.
    // To repeat this process, the function nees to be called again. The controller
    // will stop from outside.
    return bestFitness;
}

The Controller:

Code: Select all

// A simple controller for the minimizing fuction
void Tuner::solve()
{
    double bestFitness;

    for(int epoch = 0; epoch < 30; epoch++)
    {
        minimize(param);
        printParameters(param);
        printf(" epoch: %d", epoch);
    }
}

The Helpler functions:

Code: Select all

// The method loops of a batch. Each epd string needs to have a result score.
// It can call a simple static evaluation and it will return a score with white pov.
// The same can be done using a quiesence search also return a score with white pov.
double Tuner::mse()
{
    pos_t pos;
    double errorsum = 0.0;

    for(int i = 0; i < min(epdcount, BATCHSIZE); i++)
    {
        // load position
        Epd::setPosition(&pos, i);

        // compute
        #if USE_QS
        ISearch::qs(&pos);
        #endif
        #if USE_STATIC_EVAL
        ISearch::staticEval(&pos);
        #endif

        // error
        errorsum += computeError(Epd::getResultFromString(i), ISearch::resultScore);
    }

    return errorsum /= min(epdcount, BATCHSIZE);
}

// The result values are 1.0 white wins / 0.5 draw / 0.0 black wins.
// The scalingFactorK is a own topic. Basically you can always use 1.0
// It can be computed, and will express if your evaluation function
// underestimates or overestimates the data. My interpreations is, when
// the scaling factor has been optimized, the eval() fits "perfectly"
// to the existing dataset.
double Tuner::computeError(double result, int score)
{
    double err = result - Math::sigmoid(scalingFactorK * score, 400);
    return err * err;
}

If you find logical error in the code, just let me know.
Otherwise for people having their own ressources and code infrastructure available,
it is a ten minute job to implement it and to try the minimizing algorithm.

Hi Desperado

I think the big error is that you try to find optimal values for one term within each epoch. That will lead to sub optimisation and will also lead to that corrections for one term will propagate very very slowly to other terms. Instead of trying to find very good values for each term by itself try to update all terms by a tiny amount. Then increase the number of epochs or better when the error loss does not get better by much. If you want faster convergence you can increase the delta for every term that has successively managed to improve two times after each other in the same direction and half the delta by two when it does not and change the test direction on that term on the next pass. Your convergence should converge much faster in that way and it will be a lot less sensitive to initial small step sizes.

My optimisation should yield a big speed up for terms are far from optimum. With the normal way of updating by delta = 1 it will take 1000 passes to converge for a term offset by 1000 from optimum but will only take 20 passes with my optimisation since it will only need 10 doublings and 10 halvings to zoom in on the correct value

Desperado · Post by **Desperado** » Mon Jan 11, 2021 12:25 am

Hi Pio,

... What do you mean by the tuner cannot split the average term for a term? ...

Code: Select all

simpleEval(){
  int mat[PID] = {0,100,300,300,500,900};
  score = materialscore();
  return score
}

In the simpleEval() the tuner will produce an average score of 300
for a minor piece. The score for a knight cannot be devided into sth. like
(100+500)/2 which is also 300 for that term (knight value).

Code: Select all

taperedEval() {
  int mgMat[PID] = {0, 50, 60, 70 150, 200};
  int egMat[PID] = {0,150,540,530,850,1600};
  mg = materialMG();
  eg = materialEG();
  score = (mg * phase + eg *(PHASE_MAX-phase)) / PHASE_MAX
  return score;
}

In the tapered eval() the tuner gets the opportunity to divide the term (knight value),
into something like (60+540)/2 which is also 300. The tuner tries to keep the 300 BUT

If you have 10000 positions and the average mg-phase is 13 of 24, the weights are 13(mg):11(eg)
code speaks more than words...

Code: Select all

    const int mg_eg_sum = 200; // pawnvalue (90,110)
    int mg, eg;
    for(int i = 0; i < mg_eg_sum; i++)
    {
        mg = (mg_eg_sum - i);
        eg = i;
        printf("\n[%d %d %.4f] ", mg, eg, (double) ((mg * 13) + (eg * 11)) / 24);
    }
    getchar();

The tuner will bring the mg value to its minum because the phase factor is dominant.

Code: Select all

[200 0 108.3333]
[199 1 108.2500]
[198 2 108.1667]
[197 3 108.0833]
[196 4 108.0000]
[195 5 107.9167]
[194 6 107.8333]
[193 7 107.7500]
[192 8 107.6667]
[191 9 107.5833]
[190 10 107.5000]
...
[1 199  91.7500]

Now, a short answer to your last post. Please look at the routines minimize() and solve().
There is a pretty simple documentation what the functions are doing. One Epoch loops over all parameters,
modifiying each single one by a tiny amount, as long it can be improved. The minimizing code is at most 50 lines
of code (with code duplication). The complete logic can be seen in this routine it is very simple.

I gladly accept suggestions for improvement, especially if the code shown was also read

Thanks anyway for your ideas and comments.

Regards.

Pio · Post by **Pio** » Mon Jan 11, 2021 1:07 am

Desperado wrote: ↑Mon Jan 11, 2021 12:25 am Hi Pio,

... What do you mean by the tuner cannot split the average term for a term? ...
Code: Select all
simpleEval(){
  int mat[PID] = {0,100,300,300,500,900};
  score = materialscore();
  return score
}
In the simpleEval() the tuner will produce an average score of 300
for a minor piece. The score for a knight cannot be devided into sth. like
(100+500)/2 which is also 300 for that term (knight value).
Code: Select all
taperedEval() {
  int mgMat[PID] = {0, 50, 60, 70 150, 200};
  int egMat[PID] = {0,150,540,530,850,1600};
  mg = materialMG();
  eg = materialEG();
  score = (mg * phase + eg *(PHASE_MAX-phase)) / PHASE_MAX
  return score;
}
In the tapered eval() the tuner gets the opportunity to divide the term (knight value),
into something like (60+540)/2 which is also 300. The tuner tries to keep the 300 BUT

If you have 10000 positions and the average mg-phase is 13 of 24, the weights are 13(mg):11(eg)
code speaks more than words...
Code: Select all
    const int mg_eg_sum = 200; // pawnvalue (90,110)
    int mg, eg;
    for(int i = 0; i < mg_eg_sum; i++)
    {
        mg = (mg_eg_sum - i);
        eg = i;
        printf("\n[%d %d %.4f] ", mg, eg, (double) ((mg * 13) + (eg * 11)) / 24);
    }
    getchar();
	
The tuner will bring the mg value to its minum because the phase factor is dominant.
Code: Select all
[200 0 108.3333]
[199 1 108.2500]
[198 2 108.1667]
[197 3 108.0833]
[196 4 108.0000]
[195 5 107.9167]
[194 6 107.8333]
[193 7 107.7500]
[192 8 107.6667]
[191 9 107.5833]
[190 10 107.5000]
...
[1 199  91.7500]
Now, a short answer to your last post. Please look at the routines minimize() and solve().
There is a pretty simple documentation what the functions are doing. One Epoch loops over all parameters,
modifiying each single one by a tiny amount, as long it can be improved. The minimizing code is at most 50 lines
of code (with code duplication). The complete logic can be seen in this routine it is very simple.

I gladly accept suggestions for improvement, especially if the code shown was also read

Thanks anyway for your ideas and comments.

Regards.

I tried my best to read your code but I have only programmed very little in C++ 15 years ago.

I got the impression that you tried to get a very good estimate of every term by itself with the following code.

Code: Select all


offset;

            do
            {
                // copies  current score values. If the result does not improve
                // we will restore the scores.
                updateParameters(BACKUP);

                // modify and compute resulting mse
                *param[id].score -= offset;
                lo = mse();

                // no further improvement, there is nothing more todo for
                // the current parameter. restore the score from the latest
                // modification and continue parameter loop.
                if(lo >= bestFitness)
                {
                    updateParameters(RESET);
                    break;
                }

                bestFitness = lo;
                printParameters(param);
                printf(" %f ", bestFitness);
            } while(1);

I have no idea about what you mean by

The tuner will bring the mg value to its minum because the phase factor is dominant.

The values tuned won’t change their values in a direct way depending on the phase at all (if the phase is not also tuned). Only the sensitivity will go up with the weight (or phase it has) since the term will contribute more to the error.

I have read the code you posted, but I have no idea of how you are handling the tuning of your tapered eval. The tuning of the tapered eval should not be any different than with not having a tapered eval. You can just see the different MG EG terms as completely distinct terms that they are.

I am trying to help but I cannot guess code you haven’t posted. If you don’t want my advice such as speeding up the convergence exponentially I won’t answer any more.

Pio · Post by **Pio** » Mon Jan 11, 2021 1:39 am

I can only guess that you somehow are averaging up phase values even though you absolutely shouldn’t. You should just treat the phase as a different constant for each position and treat the different MG and EG terms as completely different terms. You should drop the thought that MG and EG are something special or that they have any other special dependencies compared with any other term combination. For example will the knight value be very correlated with the bishop value and the rook value will be very correlated with the queen value. The minimisation of the error will take care of everything. I also posted in another thread recently that you could speed up the convergence even faster with orthogonalization with for example PCA but it will be a mess to implement that without floating point arithmetics.

Tapered Evaluation and MSE (Texel Tuning)

Tapered Evaluation and MSE (Texel Tuning)

Re: Tapered Evaluation and MSE (Texel Tuning)

Re: Tapered Evaluation and MSE (Texel Tuning)

Re: Tapered Evaluation and MSE (Texel Tuning)

Re: Tapered Evaluation and MSE (Texel Tuning)

Re: Tapered Evaluation and MSE (Texel Tuning)

Re: Tapered Evaluation and MSE (Texel Tuning)

Re: Tapered Evaluation and MSE (Texel Tuning)

Re: Tapered Evaluation and MSE (Texel Tuning)

Re: Tapered Evaluation and MSE (Texel Tuning)