Hello, everyone! I recently had a simple idea for improving PST-only evaluations elegantly (without adding a lot of complexity), and I wanted to know whether it seems sound!
My idea is that instead of using a single PST, you'd have a PST for every piece-square combination (a table of PSTs) and use the average (or sum) of all the PSTs that apply to the given board state.
So, say you have one of your rooks on a1: that rook selects one PST. If you also have a rook on e1, that rook selects a (different) PST, and so on for every piece you and your opponent have on the board. You'd then evaluate with each of those PSTs as you normally would and take the average of all of those evaluations.
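For concreteness, here's a rough sketch of the evaluation I have in mind (the table layout and all names are just illustrative):

```cpp
// Hypothetical table pstt[p1][s1][p2][s2]: the PST selected by piece p1
// on square s1, giving the value of piece p2 on square s2.
// 12 piece kinds x 64 squares of contexts, each holding a full 12x64 PST
// (12*64*12*64 = 589,824 int16 entries, about 1.1 MiB).
#include <cstdint>
#include <vector>

struct Placement { int piece; int square; }; // piece in [0,12), square in [0,64)

extern int16_t pstt[12][64][12][64];

int evaluate(const std::vector<Placement>& board) {
    int64_t total = 0;
    for (const Placement& ctx : board) {      // each piece selects one PST...
        int64_t partial = 0;
        for (const Placement& p : board)      // ...which scores the whole board
            partial += pstt[ctx.piece][ctx.square][p.piece][p.square];
        total += partial;
    }
    // Average of all the per-PST evaluations (a plain sum would also work).
    return static_cast<int>(total / static_cast<int64_t>(board.size()));
}
```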
Of course, such a table would be too extensive to be written by hand, so it’d need to be generated automatically. (Which might not be entirely trivial.)
The premise is that this could encode certain patterns more accurately than a regular PST. For example, a rook on d1 might bump the values of queens and rooks on that file, encoding that stacking the heavy pieces there can be useful; likewise for pawns on that file, encoding that pawns protected by a rook are more valuable.
At first, I felt like this could become slow when there are a lot of pieces on the board, but a friend mentioned that it seemed easily optimizable by SIMD/SWAR instructions. Even without those kinds of optimizations, though, I still think this might be a worthwhile approach to investigate.
Any thoughts would be appreciated! Has this been thought of before? If so, what has it been called?
evaluating by “tables of PSTs” (PSTTs) — is this a sensible idea?
Posted by P. M. Zamboni.

Reply from Michal Witanowski (Warsaw, Poland):
Re: evaluating by “tables of PSTs” (PSTTs) — is this a sensible idea?
Yes, I have king-relative PSTs for the non-neural-net evaluation in Caissa, which is a kind of subset of what you described. Basically, there's a separate PST for each king placement (similar to what NNUE does in Stockfish). It gives at least +150 Elo compared to a simple PST, because this kind of PST encodes much more king-safety knowledge.
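In sketch form, the lookup just gets indexed by the king square as well (illustrative names, not Caissa's actual code):

```cpp
// One full piece-square table per own-king square: the king's placement
// selects which PST scores the rest of the pieces.
#include <cstdint>
#include <vector>

struct Placement { int piece; int square; }; // piece in [0,12), square in [0,64)

extern int16_t kingPST[64][12][64]; // [own king square][piece][square]

int evaluate(int ownKingSq, const std::vector<Placement>& pieces) {
    int score = 0;
    for (const Placement& p : pieces)
        score += kingPST[ownKingSq][p.piece][p.square];
    return score;
}
```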
I also tried pawn-pair tables to improve pawn-structure evaluation. The table contained 48*48*2 values (our pawn on any square vs. any other pawn on any square). It gave yet another +100 Elo, but in the end I'm not using it: it's slow to compute, needs to be cached, and it made the engine slower when used alongside NNUE. Here's the trainer code for this: https://github.com/Witek902/Caissa/blob ... r.cpp#L307 It basically adds more input features on top of the PST.
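Roughly, the pair features can be indexed like this (an illustrative sketch, not the exact trainer code linked above):

```cpp
// Illustrative indexing for the pawn-pair table: 48 pawn squares
// (ranks 2-7) x 48 squares x 2 (second pawn is friendly or enemy).
int pawnPairIndex(int ourPawnSq, int otherPawnSq, bool otherIsEnemy) {
    int a = ourPawnSq - 8;   // map squares 8..55 to 0..47
    int b = otherPawnSq - 8;
    return (a * 48 + b) * 2 + (otherIsEnemy ? 1 : 0);
}
```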
I never tried every-piece-vs.-every-piece tables, though I'd guess they could be another ~100 Elo gain. The problem with this solution is performance: it has to be updated incrementally, and even then it requires iterating over all pieces on the board for every piece that is added, moved, or removed. I'm tempted to get back to this idea at some point.
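To illustrate the cost with the table layout sketched in the first post: adding or removing a single piece has to visit every other piece, so each make/unmake is O(n) even with an incrementally maintained score:

```cpp
#include <cstdint>
#include <vector>

struct Placement { int piece; int square; };
extern int16_t pstt[12][64][12][64]; // hypothetical layout from the first post

// Incrementally add one piece's contribution to a running sum over all pairs.
void addPiece(int64_t& acc, std::vector<Placement>& board, Placement np) {
    for (const Placement& p : board) {
        acc += pstt[np.piece][np.square][p.piece][p.square]; // new piece's PST sees p
        acc += pstt[p.piece][p.square][np.piece][np.square]; // p's PST sees new piece
    }
    acc += pstt[np.piece][np.square][np.piece][np.square];   // self-pair term
    board.push_back(np);
}
```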
Author of Caissa Chess Engine: https://github.com/Witek902/Caissa
Reply from Alvin Peng:
Re: evaluating by “tables of PSTs” (PSTTs) — is this a sensible idea?
If I'm understanding you correctly, your idea is similar to (but not exactly like) a feedforward neural network with one hidden layer.
A neural network with 768 input features and 768 hidden neurons is like having 768 different piece-square tables. Each of those 768 piece-square tables (i.e. each hidden neuron in the network) would have a high weight associated with a particular piece being on a particular square, a massive negative bias, and a ReLU activation. This forces each hidden neuron to be active only when its designated piece-square feature is active. IMO, this is vastly inferior to letting the network learn for itself when each hidden neuron should be active.
Also, unlike a one-hidden-layer feedforward network, your idea takes the average of the active neuron outputs after the ReLU activation rather than a weighted sum.
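Concretely, the forward pass of such a network looks like this (a toy sketch; the gating weights and biases would be set by hand rather than learned):

```cpp
#include <algorithm>
#include <vector>

// Hidden neuron i "gates" on input feature i via a large weight w1[i][i]
// plus a big negative bias, so after ReLU it fires only when that
// piece-square feature is set. All weights here are illustrative.
float forward(const std::vector<float>& x,                // 768 0/1 features
              const std::vector<std::vector<float>>& w1,  // 768x768 hidden weights
              const std::vector<float>& b1,               // large negative biases
              const std::vector<float>& w2, float b2)     // output layer
{
    float out = b2;
    for (std::size_t i = 0; i < w1.size(); ++i) {
        float h = b1[i];
        for (std::size_t j = 0; j < x.size(); ++j)
            h += w1[i][j] * x[j];
        out += w2[i] * std::max(0.0f, h);  // ReLU, then a weighted sum
                                           // (the PSTT idea averages instead)
    }
    return out;
}
```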
If you are looking for an elegant PST-only evaluation, I would recommend taking a look at efficiently updatable neural networks (NNUEs). They are basically just a bunch of piece-square tables combined using ReLU and a weighted sum.
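As a sketch of that structure (illustrative sizes and names, not Stockfish's actual implementation), the first layer's output is an accumulator, i.e. a sum of per-feature weight columns that can be updated incrementally as pieces move:

```cpp
#include <algorithm>
#include <cstdint>

constexpr int FEATURES = 768; // 12 piece kinds x 64 squares
constexpr int HIDDEN   = 256; // illustrative width

extern int16_t W1[FEATURES][HIDDEN]; // one "PST column" per input feature
extern int16_t B1[HIDDEN];           // accumulator is initialized to these biases
extern int16_t W2[HIDDEN];           // output weights

// A piece appearing on / leaving a square just adds or subtracts one column.
void addFeature(int16_t acc[HIDDEN], int f) {
    for (int h = 0; h < HIDDEN; ++h) acc[h] += W1[f][h];
}
void removeFeature(int16_t acc[HIDDEN], int f) {
    for (int h = 0; h < HIDDEN; ++h) acc[h] -= W1[f][h];
}

int evaluate(const int16_t acc[HIDDEN]) {
    int out = 0;
    for (int h = 0; h < HIDDEN; ++h)
        out += W2[h] * std::max<int>(0, acc[h]); // ReLU + weighted sum
    return out;
}
```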