Positional quiescence
Posted: Sat Apr 12, 2014 11:10 am
The initial setup of orthodox Chess is all wrong for the King and Rooks: Kings don't belong on the central files, but need to be safely tucked away in a corner. This is why castling was invented, and why having castled should give a hefty positional bonus. As we all know, awarding such a bonus all at once, at the point where the King reaches its safe destination, can cause problems through the horizon effect: when the opponent can castle, the engine will attempt all kinds of silly positional, and perhaps even material, sacrifices to push the castling over the horizon. (The bonus for good King safety is easily larger than a Pawn...) Of course in the long run the opponent will castle anyway; there is no way you could prevent that. But the program does not know this.
There is also a more subtle problem: if both sides can castle within the horizon, both get the bonus, and there is no net advantage. And castling is a quiet move, almost like a null move. So when I castle, it gives the opponent the opportunity to castle as well. It is very common in Chess that the two sides castle on consecutive moves, as if it were infectious. So rather than castling immediately, and allowing its opponent to do so as well, the program will try to push the opponent's castling over the horizon by playing aggressive moves to which the opponent has to react, so that it can castle itself on the last move before the horizon. So instead of castling, it prefers other moves. The larger the castling bonus, the more reluctant it will be to castle!
Of course the conventional solution is to not award the castling bonus all at once, but to distribute it over the preparation stages: for example, giving a bonus for emptying the squares between K and R, and awarding part of the Pawn-shield bonus the King will get after castling already for having the rights to castle. An alternative would be to consider castling a non-quiet move, to be searched in Quiescence Search. That would prevent the trick of castling just before the horizon, as the reply castling would still be searched in QS. (As you can castle only once, and in most of the game not at all, this can never cause search explosion, and is not very expensive.) It would not help against making a sacrificial threat just before the horizon, and standing pat yourself after the opponent deals with it, though.
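To make the 'castling in QS' idea concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the Toy position class, the CASTLING_BONUS value and the castling_is_noisy switch are all made up for this example, and the toy has no captures at all, so castling is the only move QS can ever try.

```python
from dataclasses import dataclass

INF = 10**9
CASTLING_BONUS = 80  # hypothetical bonus, in centipawns


@dataclass(frozen=True)
class Toy:
    """Toy position that only tracks whether each side has castled."""
    castled: tuple = (False, False)  # index 0 = White, 1 = Black
    stm: int = 0                     # side to move

    def evaluate(self):
        """Static eval from the point of view of the side to move."""
        me, opp = self.castled[self.stm], self.castled[1 - self.stm]
        return CASTLING_BONUS * (int(me) - int(opp))

    def noisy_moves(self, castling_is_noisy):
        """Moves tried in QS; the toy has no captures, only castling."""
        if castling_is_noisy and not self.castled[self.stm]:
            yield "castle"

    def make(self, move):
        """Play the (only possible) move: the side to move castles."""
        castled = list(self.castled)
        castled[self.stm] = True
        return Toy(tuple(castled), 1 - self.stm)


def qsearch(pos, alpha, beta, castling_is_noisy):
    """Plain negamax quiescence search with stand-pat."""
    stand_pat = pos.evaluate()
    if stand_pat >= beta:
        return stand_pat
    alpha = max(alpha, stand_pat)
    for move in pos.noisy_moves(castling_is_noisy):
        score = -qsearch(pos.make(move), -beta, -alpha, castling_is_noisy)
        if score >= beta:
            return score
        alpha = max(alpha, score)
    return alpha


# White castled on the last full-width ply; Black is to move at the horizon.
leaf = Toy(castled=(True, False), stm=1)
print(qsearch(leaf, -INF, INF, castling_is_noisy=False))  # -80: White keeps the whole bonus
print(qsearch(leaf, -INF, INF, castling_is_noisy=True))   # 0: Black's reply castling is seen
```

With castling treated as quiet, the side that castles last before the horizon is credited with the full bonus; with castling searched in QS the reply cancels it, so there is nothing to gain from postponing it.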
The reason I am bringing this up is that I am dealing with a Chess variant now (Chu Shogi) in which the initial position is all wrong on a vastly larger scale. All valuable sliders, such as Queen and Super-Rooks, go in front (directly behind the Pawns) of the 4-rank-deep initial army, and the light (Knight-class) pieces are tucked away on the back rank. So before you can safely engage the enemy, you have to swap that around in some massive 'vertical castling' operation, where you have the same problem that the setup is crowded with pieces in between them, and that there are no special moves that make them hop over each other, like castling in orthodox Chess allows the Rook to hop over the King.
So the obvious solution is to encourage advancing the light stepping pieces from the back rank to the 4th and 5th rank by giving an eval bonus for it through the PST. And make sure the bonus outweighs what can be earned from mobility by developing the valuable sliders, so that you won't start deploying them 'naked' in the center. (This is also a rule in orthodox Chess: develop the minors before your Queen and Rooks.) The light pieces have to step forward rank by rank, after creating room for them to do so, however. So there is automatically some distribution of the eventual bonus for being at the front. But even stepping a single rank needs a bonus that exceeds the PST + mobility increase for developing a slider, which usually can be done in a single move. Otherwise it would plan to develop 4 sliders in the 4 moves it can see up to the horizon (assumed to be at 8 ply), rather than move a single stepper up 4 ranks.
So even the distributed bonuses are pretty large, and as a result the horizon effect kicks in: no matter how large I make the bonus, the engine is very reluctant to advance the steppers in its rear guard, as these are quiet moves, and would allow the opponent to do the same. So instead of actually playing them from the root, it starts to make pointless threats with its sliders in the center, too dangerous for the opponent to ignore, so that it can save its own good moves for just before the horizon. (And thus will never play them.) A bit similar to roaming the board with the Queen in Chess openings, making attacks on all kinds of Pawns that can easily be protected by other Pawns, just to push development of the opponent's pieces over the horizon. The larger you make the bonus, the less inclined it will be to play the moves.
I wonder how this very fundamental problem can be solved. For material evaluation QS was invented to prevent saving your best capture for last, counting yourself rich in the illusion that the opponent would not retaliate. But it is not feasible to try every light-piece forward non-capture move in QS, as that would surely explode it. Without such a search for 'positional quiescence' (i.e. search until the opponent runs out of moves with a spectacularly high positional score) you would get strongly alternating scores between odd and even iterations, depending on who gets to make the last move, and ignore the opponent's equally good one that would follow.
Such alternating scores can be ameliorated by a 'tempo bonus' for the side that has the move. But for that to make sense, the value of the bonus should really adapt to the positional gain that is realistically possible to achieve with a move. Giving it the value gained by your best positional move would cause just the inverse alternation (since in fact it would be equivalent to doing one extra ply of search, except that you would not worry whether the move was tactically sound). So you would need to give it only half of that to dampen the oscillations of the score with increasing depth.
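The arithmetic behind 'only half' can be checked with a toy model. This is just a sketch under a strong simplifying assumption: every ply in the line is a quiet move that gains the same amount (GAIN, a made-up 40 centipawns) for the side playing it, and the tempo bonus goes to whoever is to move at the leaf. The function name leaf_score is hypothetical.

```python
def leaf_score(depth, gain, tempo_bonus):
    """Root-side score of a depth-ply line in which every move gains `gain`."""
    root_moves = (depth + 1) // 2   # plies played by the root side
    opp_moves = depth // 2          # plies played by the opponent
    static = gain * (root_moves - opp_moves)
    # The root side is to move at the leaf when the depth is even.
    tempo = tempo_bonus if depth % 2 == 0 else -tempo_bonus
    return static + tempo


GAIN = 40  # hypothetical positional gain per move
for bonus in (0, GAIN, GAIN // 2):
    print(bonus, [leaf_score(d, GAIN, bonus) for d in range(1, 7)])
# 0  [40, 0, 40, 0, 40, 0]       no tempo bonus: odd/even oscillation
# 40 [0, 40, 0, 40, 0, 40]       full gain as bonus: the inverse oscillation
# 20 [20, 20, 20, 20, 20, 20]    half the gain: the oscillation is gone
```

In a real position the gain differs from move to move, which is why a fixed tempo bonus can only dampen the alternation, not remove it.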
This suggests an alternative approach to positional quiescence: if the final ply before QS is met by stand-pat (i.e. returns the evaluation score after the move), you would not take that eval score, but the average of the score before and after the move. Or, in other words, if you stand pat in reply to a non-capture, don't use currentEval, but average it with the previousEval of the parent (which was likely better, from the POV of the side standing pat), to decide if you would still like to stand pat. This would strongly discourage saving good positional moves for the last ply of the branch.
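A minimal sketch of that averaged stand-pat, in Python. The function name and parameters are hypothetical, and it assumes a negamax convention in which each eval is taken from the POV of the side to move in its own node, so the parent's eval has to be negated before averaging.

```python
def stand_pat_score(current_eval, parent_eval, prev_move_was_capture):
    """Stand-pat value for the first QS node after the last full-width ply.

    current_eval: static eval of this node, POV of the side standing pat.
    parent_eval:  static eval of the parent node, POV of the side to move
                  there (i.e. the opponent of the side standing pat).
    """
    if prev_move_was_capture:
        return current_eval            # captures are left to normal QS
    previous_eval = -parent_eval       # flip to the POV of the side standing pat
    # Integer average; note that // rounds towards minus infinity.
    return (current_eval + previous_eval) // 2


# The opponent just advanced a stepper for a (made-up) gain of 40 from an even position:
print(stand_pat_score(-40, 0, prev_move_was_capture=False))  # -20: only half the gain counts
print(stand_pat_score(-40, 0, prev_move_was_capture=True))   # -40: unchanged after a capture
```

A quiet move played on the very last ply is thus credited with only half its positional gain, while the same move played earlier in the branch keeps all of it, which is what should make postponing it unattractive.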