A.k.a. how to tune high-quality PSTs from scratch (starting with just material values) in 20 seconds.
All previous versions of Leorik have just taken their PST values from MinimalChess. I was pretty happy with the quality, but tuning these tables took a lot of time: literally days of trial and error, with hours of waiting in between.
Now I wanted to revisit the tuning process to get shorter iteration times for my upcoming experiments with a better evaluation.
I'm not sure how familiar everybody here is with tuning, so maybe a short recap/introduction can't hurt. Everyone knows PSTs, I guess. But where do the values come from? It turns out you can write a program to "extract" these values mathematically from a set of annotated positions. That's typically just a text file where each line contains the FEN of a position and the game's result: a win, a loss or a draw. And you have a lot of such lines. Maybe a million.
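Such a line could look like this. The exact annotation format varies from data set to data set; here, purely for illustration, the result is appended in brackets (1.0 = white wins, 0.5 = draw, 0.0 = black wins):
Code:
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 [0.5]
4k3/8/8/8/8/8/8/3QK3 w - - 0 1 [1.0]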
It's easy to imagine that you can use such a data set to assess the predictive quality of your evaluation. You typically use a sigmoid function to map the unbounded evaluation score, given in centipawns, into the range [-1..1], the spectrum between a loss and a win. The difference between the stored outcome and your prediction is the error. Iterate over the complete set of positions and compute the mean squared error of your eval. The goal of tuning is now to find a configuration of PST values that minimizes this error function!
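In code the per-position error boils down to a few lines. Here's a minimal sketch of the SquareError helper that the tuning code further down refers to, using the same sigmoid as the gradient step shown later and assuming the outcomes are stored as -1, 0 or +1:
Code:
//minimal sketch: map the eval into [-1..1] with a sigmoid and square the
//difference to the stored game result (assumed to be -1, 0 or +1)
private static double SquareError(float result, float eval, double scalingCoefficient)
{
    double sigmoid = 2 / (1 + Math.Exp(-(eval / scalingCoefficient))) - 1;
    double error = result - sigmoid;
    return error * error;
}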
Maybe you've heard of Texel's tuning method or used it yourself. That's what I used for MinimalChess, too, because it's really... really... simple. You literally just add +1 to one of your PST values and compare the result of the error function before and after the change. Did it help? If not, you try decreasing the value instead. You do that for all PST values, and you keep doing it until neither increasing nor decreasing any of them minimizes the error function further.
This is pretty slow, because for each change of *one* value you have to run the error function over all positions again. In my case that's 725,000 positions. You can do that only a few times per second, and you need a lot of +1s and -1s until you arrive at good PSTs if you start from just material values.
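In rough code the whole method looks something like this sketch (just an illustration, not Leorik's actual code; it uses the MeanSquareError function shown further down):
Code:
//Texel's method, sketched: nudge one value at a time and keep the change
//only if the mean squared error over all positions improves
bool improved = true;
while (improved)
{
    improved = false;
    double best = MeanSquareError(data, coefficients, scalingCoefficient);
    for (int i = 0; i < N; i++)
    {
        coefficients[i] += 1; //try increasing...
        double error = MeanSquareError(data, coefficients, scalingCoefficient);
        if (error >= best)
        {
            coefficients[i] -= 2; //...then try decreasing
            error = MeanSquareError(data, coefficients, scalingCoefficient);
        }
        if (error < best)
        {
            best = error;
            improved = true;
        }
        else
            coefficients[i] += 1; //neither helped: restore the old value
    }
}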
Over the last year I remember a few posts where people mentioned they were using gradient descent for their tuning. I also remembered a good online course on machine learning that I did a few years ago (and subsequently forgot all details about), so I watched the relevant videos again. Andrew Ng is really a good teacher, but the material is very math-heavy and doesn't seem to fit what I'm doing with my chess engine at all. Lots of Greek symbols and partial derivatives. How does all that relate to my incremental evaluation function? With tapering based on game phase? The vector of coefficients is easy: that's just all the PST values together, a vector with 768 elements. But what's my feature vector?
So I turned the question around: what feature vector, multiplied with a vector of coefficients, would produce the same result as my engine's current evaluation? It turns out you can calculate such a vector for each FEN, and at that point the tuning no longer has much to do with the details of your engine. Now it fits the information you find on Wikipedia or elsewhere.
Code:
public static float[] GetFeatures(BoardState pos, double phase)
{
    float[] result = new float[N];
    //phase is used to interpolate between midgame and endgame score, but we want to incorporate it into the feature vector:
    //score = midgameScore + phase * (endgameScore - midgameScore)
    //score = midgameScore + phase * endgameScore - phase * midgameScore
    //score = phase * endgameScore + (1 - phase) * midgameScore
    float phaseEg = (float)phase;
    float phaseMg = (float)(1 - phase);
    ulong occupied = pos.Black | pos.White;
    for (ulong bits = occupied; bits != 0; bits = Bitboard.ClearLSB(bits))
    {
        int square = Bitboard.LSB(bits);
        Piece piece = pos.GetPiece(square);
        int pieceOffset = ((int)piece >> 2) - 1;
        //mirror the square vertically for white so both colors index the tables from their own perspective
        int squareIndex = (piece & Piece.ColorMask) == Piece.White ? square ^ 56 : square;
        int sign = (piece & Piece.ColorMask) == Piece.White ? 1 : -1;
        //each (piece, square) pair occupies two adjacent slots: midgame and endgame
        int iMg = pieceOffset * 128 + 2 * squareIndex;
        int iEg = iMg + 1;
        result[iMg] += sign * phaseMg;
        result[iEg] += sign * phaseEg;
    }
    return result;
}
Code:
public static double MeanSquareError(List<Data2> data, float[] coefficients, double scalingCoefficient)
{
    double squaredErrorSum = 0;
    foreach (Data2 entry in data)
    {
        float eval = Evaluate(entry.Features, coefficients);
        squaredErrorSum += SquareError(entry.Result, eval, scalingCoefficient);
    }
    double result = squaredErrorSum / data.Count;
    return result;
}

public static float Evaluate(float[] features, float[] coefficients)
{
    //dot product of features vector with coefficients vector
    float result = 0;
    for (int i = 0; i < N; i++)
        result += features[i] * coefficients[i];
    return result;
}
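Put together, reproducing the engine's tapered evaluation during tuning is just one dot product. A usage sketch (pos and phase come from parsing the FEN):
Code:
//extract the features once, then the dot product with the current
//coefficients reproduces the engine's tapered evaluation
float[] features = GetFeatures(pos, phase);
float eval = Evaluate(features, coefficients);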
First of all, gradient descent is already much faster than Texel's tuning method, because a single pass over the training set accumulates information for every coefficient at the same time: you take note of whether each one was, on average, contributing too much or too little to the evaluations. (That information is the gradient.) Since the evaluation is just a dot product, the partial derivative of the error with respect to a coefficient is proportional to the error times that coefficient's feature value; in practice the sigmoid's derivative and all constant factors can be folded into the learning rate. Based on the gradient you can then adjust all 768 PST values at once, each in the right direction. When you start from just material values (e.g. all pawn-table entries set to 100), about 2000 such iterations later you already have pretty decent PSTs. One iteration took less than a second to compute, so it takes maybe half an hour to compute a good set of PSTs from scratch.
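The complete tuning run is then just an outer loop over such iterations. A sketch, where Minimize is the gradient-descent step shown further down and alpha is the learning rate:
Code:
//one epoch = one gradient-descent step computed over the full training set
for (int epoch = 0; epoch < 2000; epoch++)
    Minimize(data, coefficients, scalingCoefficient, alpha);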
But can we make it faster?
My first attempt was to use SIMD instructions.
Code:
public static float EvaluateSIMD(float[] features, float[] coefficients)
{
    //dot product of features vector with coefficients vector,
    //processing Vector<float>.Count elements per step (System.Numerics);
    //assumes N is a multiple of the vector width
    float result = 0;
    int slots = Vector<float>.Count;
    for (int i = 0; i < N; i += slots)
    {
        Vector<float> vF = new Vector<float>(features, i);
        Vector<float> vC = new Vector<float>(coefficients, i);
        result += Vector.Dot(vF, vC);
    }
    return result;
}
If you look at a feature vector, however, you realize that only a few percent of its elements are actually non-zero. The vast majority of the multiplications and additions don't affect the result at all. So I introduced an index buffer for each feature vector, storing the indices of the non-zero values.
Code:
private static float Evaluate(float[] features, short[] indices, float[] coefficients)
{
    //dot product of a selection (indices) of elements from the features vector with the coefficients vector
    float result = 0;
    foreach (short i in indices)
        result += features[i] * coefficients[i];
    return result;
}
But even though the index buffer helps to save pointless computations, all these zeros still take up a lot of space in memory. After loading the 725,000 positions, each encoded as 768 floats, the process uses 2,500 MB of memory. To store mostly zeroes! So I dropped the dense float arrays entirely and kept only the non-zero features, each stored as an index/value pair.
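A sketch of that sparse representation: the Feature struct matches the Index and Value fields used by the tuner below, while the Condense helper is just for illustration.
Code:
//only the non-zero entries of a dense feature vector survive,
//each stored as an index/value pair
public struct Feature
{
    public short Index;
    public float Value;
}

public static Feature[] Condense(float[] dense)
{
    List<Feature> sparse = new List<Feature>();
    for (short i = 0; i < N; i++)
        if (dense[i] != 0)
            sparse.Add(new Feature { Index = i, Value = dense[i] });
    return sparse.ToArray();
}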

The process memory shrank to just 300 MB that way. And now a full iteration of gradient descent takes only 65 ms! This is a great example of how important cache-friendly programming is these days! And it was my first big WHOAAT? moment of the day.
This is already quite fast. Fast enough for all practical purposes, probably, but it's still just running on one thread. In theory this workload should be well suited for parallelization, and I had been waiting for an excuse to try .NET's Task Parallel Library.
It's one of those cases where the documentation leaves you scratching your head for a while. Seriously... just look at the method signatures of the different Parallel.ForEach overloads.
But in actual code it looks quite elegant.
Code:
public static void Minimize(List<Data2> data, float[] coefficients, double scalingCoefficient, float alpha)
{
    float[] accu = new float[N];
    foreach (Data2 entry in data)
    {
        float eval = Evaluate(entry.Features, coefficients);
        //map the eval into [-1..1] and compare it to the stored result
        double sigmoid = 2 / (1 + Math.Exp(-(eval / scalingCoefficient))) - 1;
        float error = (float)(sigmoid - entry.Result);
        //accumulate the gradient: each non-zero feature gets its share of the error
        foreach (Feature f in entry.Features)
            accu[f.Index] += error * f.Value;
    }
    //take a small step (alpha) against the accumulated gradient
    for (int i = 0; i < N; i++)
        coefficients[i] -= alpha * accu[i] / data.Count;
}
Code:
public static void MinimizeParallel(List<Data2> data, float[] coefficients, double scalingCoefficient, float alpha)
{
    //each thread maintains a local accu. After the loop is complete the accus are combined
    Parallel.ForEach(data,
        //initialize the thread-local accu
        () => new float[N],
        //invoked by the loop on each iteration in parallel
        (entry, loop, accu) =>
        {
            float eval = Evaluate(entry.Features, coefficients);
            double sigmoid = 2 / (1 + Math.Exp(-(eval / scalingCoefficient))) - 1;
            float error = (float)(sigmoid - entry.Result);
            foreach (Feature f in entry.Features)
                accu[f.Index] += error * f.Value;
            return accu;
        },
        //executed when each partition has completed
        (accu) =>
        {
            lock (coefficients)
            {
                for (int i = 0; i < N; i++)
                    coefficients[i] -= alpha * accu[i] / data.Count;
            }
        }
    );
}
And how fast is it? If you run the program after this change, all 12 logical cores of my Ryzen 3600 are reported to be equally utilized at 75% load, and the net result is another roughly 7x speedup: now it's 9 ms per iteration of gradient descent. That was the second time my jaw dropped today.
That's 2000 epochs in less than 20 seconds. And that allows you to tune tapered PSTs from scratch, starting with just material values, in the time it takes to... well... I don't know much that could be accomplished in 20 seconds, actually.
