PSQT Tuning questions

Discussion of chess software programming and technical issues.

Moderator: Ras

jmcd
Posts: 58
Joined: Wed Mar 18, 2020 10:00 pm
Full name: Jonathan McDermid

PSQT Tuning questions

Post by jmcd »

I'm diving headfirst into eval tuning right now and am hoping to get some questions answered, or to have a discussion about the problems I'm running into and the things I see other engines doing. Currently I've incrementally tuned everything in my evaluation function except for the pawn PSQT, rook PSQT, king PSQT, and passed pawn bonuses. When I try to tune these values, the engine only becomes weaker. I suspect the reason is either overfitting or a case of correlation != causation.

I've looked at the evaluation functions of some very strong engines and have noticed that they use 8x4 PSQTs, and I assume they horizontally mirror them to create the full bonus table. I like this idea, as it seems like it would be effective for avoiding overfitting. However, isn't this horizontal mirroring potentially ineffective for chess? I have very little knowledge of real chess strategy, but since the king and queen start on opposite sides of the board, I assume that a square may be much weaker or stronger than its horizontal mirror. As far as I know, 1.d4 and 1.e4 are very different moves, and it seems like a huge oversimplification to treat them as the same. Please correct me if I'm completely misunderstanding how these 8x4 PSQT tables work.
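For concreteness, this is roughly what I understand the 8x4 scheme to do (a minimal sketch with made-up names, not taken from any particular engine):

[code]
#include <array>

// An 8x4 half-table holds one value per rank for the queenside files (a-d);
// the kingside files (e-h) reuse the same values via horizontal mirroring.
using HalfTable = std::array<std::array<int, 4>, 8>;
using FullTable = std::array<int, 64>;

FullTable expand_mirrored(const HalfTable& half) {
    FullTable full{};
    for (int sq = 0; sq < 64; ++sq) {
        int rank = sq / 8;
        int file = sq % 8;
        int mirrored_file = (file < 4) ? file : 7 - file;  // e..h map back to d..a
        full[sq] = half[rank][mirrored_file];
    }
    return full;
}
[/code]

So d2 and e2 are forced to share a single tuned parameter, which is exactly the loss of resolution I'm worried about.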

Another thing I've noticed that I'd like to get verification on: isn't it completely pointless to tune your piece values? Since a piece will always be accompanied by a PSQT score, changing the value of a piece is equivalent to shifting its entire PSQT by the same amount. If this is the case, I don't understand why all engines don't just set their piece values to the standard 100, 300, 300, 500, 900.
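To illustrate what I mean (a rough sketch, all names made up), folding the piece value into the table leaves every score unchanged:

[code]
#include <array>

using PSQT = std::array<int, 64>;

// Material term plus square bonus, the usual formulation.
int score_piece(int piece_value, const PSQT& psqt, int sq) {
    return piece_value + psqt[sq];
}

// Adding the piece value to every entry and zeroing the material term
// produces an identical evaluation for every square.
PSQT fold_value_into_psqt(int piece_value, PSQT psqt) {
    for (int& bonus : psqt) bonus += piece_value;
    return psqt;  // score_piece(0, folded, sq) == score_piece(piece_value, original, sq)
}
[/code]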
Clovis GitHub
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: PSQT Tuning questions

Post by algerbrex »

Hi Jonathan, a couple of things:

First, I've experimented with your approach of incrementally tuning different evaluation terms separately, and while it's worked sometimes, I've often found it gives a poorer-quality evaluation overall versus tuning all the values each session. While overfitting can occur and it's important to find a good stopping condition/number of training epochs, the benefit of letting all of the values be tuned together so they can adjust to one another is important. So that might be something to try with regard to not being able to tune certain PSQTs and gain Elo.

Second, trying a different dataset might help. Although plenty of others have, I personally haven't found much success using Andrew Grant's datasets, which, if I remember correctly from your code, is the one you're using. Try another dataset, like the very popular Zurichess dataset (https://github.com/lithander/Leorik/blo ... abeled.epd), and see if that helps with your tuning.

Third, I've also experimented with horizontally mirrored PSQTs like the ones you mentioned, but I found them to be quite a bit weaker than using 8x8 PSQTs. I suspect this is because of many of the points you raised: the two sides of a chess board are not always equivalent, and 8x8 PSQTs give you enough granularity to capture that. But others have found success using them, so it might just be something you have to experiment with yourself. After all, I've found chess programming in many cases to be more of an art than an exact science :)

Fourth, the idea you mention of holding the piece values constant and only tuning the PSQT table values isn't a crazy idea and has been mentioned in various places on this forum, although I can't recall any exact threads or open-source engines that use such an approach at the moment.

I don't believe you're using a tapered evaluation yet in your engine, but I think the above idea might run into difficulties when trying to tune values for both the middlegame and the endgame, since holding the piece values constant and only tuning the PSQT values may not provide enough granularity to capture how much a piece's value can fluctuate between the different phases of the game. But again, that's not a hard and fast rule, just my intuition.
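For context, by tapered evaluation I mean the usual scheme of keeping separate middlegame and endgame scores and blending them by game phase, roughly like this (a simplified sketch, not any particular engine's code):

[code]
// Simplified tapered evaluation: mg and eg scores are interpolated by a
// phase value that runs from 24 (all pieces still on the board) down to
// 0 (bare kings and pawns).
int taper(int mg_score, int eg_score, int phase /* 0..24 */) {
    return (mg_score * phase + eg_score * (24 - phase)) / 24;
}
[/code]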
jmcd
Posts: 58
Joined: Wed Mar 18, 2020 10:00 pm
Full name: Jonathan McDermid

Re: PSQT Tuning questions

Post by jmcd »

algerbrex wrote: Tue Jul 05, 2022 1:45 am Hi Jonathan, a couple of things:

First, I've experimented with your approach of incrementally tuning different evaluation terms separately, and while it's worked sometimes, I've often found it gives a poorer-quality evaluation overall versus tuning all the values each session. While overfitting can occur and it's important to find a good stopping condition/number of training epochs, the benefit of letting all of the values be tuned together so they can adjust to one another is important. So that might be something to try with regard to not being able to tune certain PSQTs and gain Elo.

Second, trying a different dataset might help. Although plenty of others have, I personally haven't found much success using Andrew Grant's datasets, which, if I remember correctly from your code, is the one you're using. Try another dataset, like the very popular Zurichess dataset (https://github.com/lithander/Leorik/blo ... abeled.epd), and see if that helps with your tuning.

Third, I've also experimented with horizontally mirrored PSQTs like the ones you mentioned, but I found them to be quite a bit weaker than using 8x8 PSQTs. I suspect this is because of many of the points you raised: the two sides of a chess board are not always equivalent, and 8x8 PSQTs give you enough granularity to capture that. But others have found success using them, so it might just be something you have to experiment with yourself. After all, I've found chess programming in many cases to be more of an art than an exact science :)

Fourth, the idea you mention of holding the piece values constant and only tuning the PSQT table values isn't a crazy idea and has been mentioned in various places on this forum, although I can't recall any exact threads or open-source engines that use such an approach at the moment.

I don't believe you're using a tapered evaluation yet in your engine, but I think the above idea might run into difficulties when trying to tune values for both the middlegame and the endgame, since holding the piece values constant and only tuning the PSQT values may not provide enough granularity to capture how much a piece's value can fluctuate between the different phases of the game. But again, that's not a hard and fast rule, just my intuition.
Thanks for your response. It's pretty much exactly what I was looking for. And my evaluation is already tapered.

I'll try using the dataset you linked and will see how it performs. Obviously, with six PSQTs still untuned, there's a lot of room for improvement. I'm going to keep the piece values at the standard values, since it always seemed like a strange contradiction to me not to have a pawn equal to 100 centipawns.
Clovis GitHub
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: PSQT Tuning questions

Post by algerbrex »

jmcd wrote: Tue Jul 05, 2022 2:04 am
Thanks for your response. It's pretty much exactly what I was looking for. And my evaluation is already tapered.
Sure thing.
jmcd wrote: Tue Jul 05, 2022 2:04 am I'll try using the dataset you linked and will see how it performs. Obviously, with six PSQTs still untuned, there's a lot of room for improvement. I'm going to keep the piece values at the standard values, since it always seemed like a strange contradiction to me not to have a pawn equal to 100 centipawns.
Hmm, not sure I share the intuition. If you're using a tapered evaluation, then it seems to make sense to me that the value of a pawn is dependent on the phase of the game. In the middlegame, a pawn can often be given away for a big initiative and a lead in development, so it probably isn't worth a full 100 centipawns, especially on certain squares, which is what the PSQTs help indicate. But in the endgame, where promotion is often the only path to victory, pawns become extremely important and can easily be worth a minor piece or more, which is again what a different pawn value would help indicate, along with passed pawn bonuses that depend on rank.

But as I said, definitely experiment and find what works best for you and your engine. And maybe I'll be able to learn something :)
jmcd
Posts: 58
Joined: Wed Mar 18, 2020 10:00 pm
Full name: Jonathan McDermid

Re: PSQT Tuning questions

Post by jmcd »

algerbrex wrote: Tue Jul 05, 2022 2:20 am
Hmm, not sure I share the intuition. If you're using a tapered evaluation, then it seems to make sense to me that the value of a pawn is dependent on the phase of the game. In the middlegame, a pawn can often be given away for a big initiative and a lead in development, so it probably isn't worth a full 100 centipawns, especially on certain squares, which is what the PSQTs help indicate. But in the endgame, where promotion is often the only path to victory, pawns become extremely important and can easily be worth a minor piece or more, which is again what a different pawn value would help indicate, along with passed pawn bonuses that depend on rank.

But as I said, definitely experiment and find what works best for you and your engine. And maybe I'll be able to learn something :)
I get that the value of a piece changes throughout the game. But if I took my current mg pawn value, added it to every element in my mg pawn PSQT, and then set the mg pawn value to zero, the evaluation function would remain the same. The change in a piece's value over the course of a game can be wholly represented by the PSQTs. Because of that, if you want to maintain 1 pawn = 100 centipawns, you can just not tune the pawn value and let the PSQTs sort it out.

If I'm wrong about this I'd definitely like to know.
Clovis GitHub
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: PSQT Tuning questions

Post by algerbrex »

jmcd wrote: Tue Jul 05, 2022 3:07 am I get that the value of a piece changes throughout the game. But if I took my current mg pawn value, added it to every element in my mg pawn PSQT, and then set the mg pawn value to zero, the evaluation function would remain the same. The change in a piece's value over the course of a game can be wholly represented by the PSQTs. Because of that, if you want to maintain 1 pawn = 100 centipawns, you can just not tune the pawn value and let the PSQTs sort it out.

If I'm wrong about this I'd definitely like to know.
Ah ok, I understand you. I was just a little confused about what you were getting at.

I follow your logic, and as I said, I don't think your idea is crazy; it's certainly something worth exploring. My intuition (not based on any hard data) was that tuning both the piece values and the PSQT values would give a more detailed result than holding the piece values constant and only tuning the PSQT values. But I think that was mostly based on recalling having toyed with the idea in the past and not getting very good results. Regardless, in my estimation it's a perfectly fine idea to go with.
jmcd
Posts: 58
Joined: Wed Mar 18, 2020 10:00 pm
Full name: Jonathan McDermid

Re: PSQT Tuning questions

Post by jmcd »

I've done most of the training with the Zurichess dataset now and it looks like a pretty massive Elo gain, and I didn't have to withhold any evaluation parameters. Is this dataset large enough, though? When I used the Ethereal one, I saw a significant improvement in strength when I went from 1M to 7.5M positions. The Zurichess dataset only has 725k positions.
Clovis GitHub
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: PSQT Tuning questions

Post by algerbrex »

jmcd wrote: Tue Jul 05, 2022 7:30 am I've done most of the training with the Zurichess dataset now and it looks like a pretty massive Elo gain, and I didn't have to withhold any evaluation parameters. Is this dataset large enough, though? When I used the Ethereal one, I saw a significant improvement in strength when I went from 1M to 7.5M positions. The Zurichess dataset only has 725k positions.
I've found that when doing the kind of evaluation tuning we're doing, millions of positions aren't needed to see good strength gains, and quality is far more important than quantity. The Zurichess dataset is pretty popular, and the method used to create it makes it quite high quality (not saying Andrew's positions aren't high quality as well). The majority of Blunder's evaluation before 7.4.0, if I recall correctly, was tuned using only 400K positions from the Zurichess dataset. I only upped it to the full 725K once I got a better computer, and then very recently to ~1M, which was made possible efficiency-wise by switching to gradient descent.
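In case it helps, the objective being minimized in this kind of Texel-style tuning is just the mean squared error between each game result and a sigmoid of the static evaluation, roughly like this (a sketch; K stands for the usual scaling constant you fit once before tuning):

[code]
#include <cmath>
#include <vector>

// Texel-style mean squared error: evals are static evaluations in centipawns,
// results are 1.0 / 0.5 / 0.0 for a white win / draw / loss, and K is the
// scaling constant fitted before the evaluation parameters are tuned.
double tuning_error(const std::vector<double>& evals,
                    const std::vector<double>& results, double K) {
    double total = 0.0;
    for (std::size_t i = 0; i < evals.size(); ++i) {
        double sigmoid = 1.0 / (1.0 + std::pow(10.0, -K * evals[i] / 400.0));
        double diff = results[i] - sigmoid;
        total += diff * diff;
    }
    return total / evals.size();
}
[/code]

Gradient descent just takes the derivative of this error with respect to each evaluation parameter instead of probing every parameter by ±1 like the classic local search, which is what makes larger position counts practical.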