Piece drops and horizon effect
Posted: Thu Sep 12, 2013 3:27 pm
I have little experience with Crazyhouse engines. But I noticed that many of the weaker Shogi engines show the same characteristic foolish behavior:
1) Drop a Pawn at X so it attacks an opponent Pawn at Y
2) Y x X
3) Piece x X
4) P@Y {attacking the Piece}
5) Withdraw Piece {usually to where it came from}
Much of this sequence is forced: if Y would not capture X in (2), then X could capture Y in (3), often with promotion. So you cannot ignore the drop (1) and start wih your own attack, even if you have a profitable one. Using the Pawn to capture, however, compromises your Pawn wall, through which the recapturing piece might enter, capturing something that before was shielded by the Pawn. This forces (4) to close hole.
At the end of the sequence everything is back where it started, but the other side now has the move. So the sequence wastes a tempo for the side that initiated it. If he withdraws the recapturing piece to another square then were it came from, it does not even waste a tempo, but is equivalent to moving the piece there directly (if that was possible). It just inserts 4 extra ply.
Now wasting a tempo is only penalized if it pushes useful moves (with positive score) over the horizon. A 'side-to-move bonus' would not help, as who is to move at the leave node is only dependent on the depth, not on whether you waste a tempo along the branch. If there is trouble coming towards you, the 4 inserted ply are actually a rather cheap way to push it beyond the horizon. This probably explains why the sequence of moves is so popular; it is like you are allowed to play R=4 null moves in the PV: head in the sand, and don't move a muscle while the opponent gradually closes in on you without meeting any resistance.
Even when there is no real threat, but the engine simple doesn't see anything to strive for, it will engage in this practice. This is similar to Chess engines starting to move there Rook back and forth when they don't know what to do anymore.
I wonder how this kind of nonsense can be suppressed. With path-dependent scoring it would be easy enough to recognize the pattern in the path (like moving the same piece multiple times to places where it could have moved at once, or already was), and penalize that. But that interferes badly with hashing. Now hashing is in fact somewhat helpful to suppress the horizon effect that inserting the exra 4 plies causes: The null move in stead of (1) would bring you to the same position as after (5), and would have been searched before you attempted (1), and apparently failed low, or you would not even try (1). So you should get a hash hit on it, grafting the misery you tried to push over the horizon. Except if this was a PV node where you did not search a null move. Or when you were closer than 4 ply to the horizon, so that you can nicely stand pat after (4), ignoring that you now have to rescue your Piece, and at the same time have the trouble that made the original null move fail coming at you.
So I am looking for other ways to discourage this. If I cannot do so at the leave node by making the eval path dependent, I can perhaps affect the propagation of this eval towards the root. For instance, I could subtract a penalty from the score of any Pawn drop in front of another Pawn. That way, the score from a null move in stead of (1) that you retrieve from the TT after (5) would always seem worse than taking the immediate null move. The problem is that you don't want to discourage Pawn drop (1) when it is good, e.g. because (2) is not possible at all, the Pawn at Y being pinned, so that there now is an extra Pawn attacker of it. So you only want to penalize these Pawn drops if they are indeed refuted by capture (2).
Now if (1) fails high at large remaining depth it could not have taken the full path to (5), where a hash hit would await it that fails low, and there wouldn't be any reason to mistrust (1). If (1) already fails low by itself, there is nothing to worry about. When it is PV then it should return a PV, on which we can do a pattern match to see if it is planning a time-wasting sequence of moves (be it the drop-capture sequence or just moving a slider back and forth). Making the score of a move dependent on the path leading from it, rather than to it, does not introduce any inconsistencies with hashing. You could simply subtract a penalty from a PV move that is planning to take a path that you recognize as pointless or harmful, forcing it to take the shorter path in stead, and prove there that the position you are aiming for is any good (at higher depth).
This would also solve a case encountered in normal Chess, when you can give a perpetual when you are ahead (so drawing is not attractive), but are facing a setback in one of the moves. The search will then start to insert some moves of the perpetual wherever it can, (basically every other move), to push the setback beyond the horizon. So after check1 - evade1 - moveA - moveB - check2 - evade2 as leading PV moves, when check2 - evade2 are the reverse moves of check1 - evade1, it should recognize that, and realize that any improvement this PV gives over an immediate moveA - moveB must be illusory.
1) Drop a Pawn at X so it attacks an opponent Pawn at Y
2) Y x X
3) Piece x X
4) P@Y {attacking the Piece}
5) Withdraw Piece {usually to where it came from}
Much of this sequence is forced: if Y would not capture X in (2), then X could capture Y in (3), often with promotion. So you cannot ignore the drop (1) and start wih your own attack, even if you have a profitable one. Using the Pawn to capture, however, compromises your Pawn wall, through which the recapturing piece might enter, capturing something that before was shielded by the Pawn. This forces (4) to close hole.
At the end of the sequence everything is back where it started, but the other side now has the move. So the sequence wastes a tempo for the side that initiated it. If he withdraws the recapturing piece to another square then were it came from, it does not even waste a tempo, but is equivalent to moving the piece there directly (if that was possible). It just inserts 4 extra ply.
Now wasting a tempo is only penalized if it pushes useful moves (with positive score) over the horizon. A 'side-to-move bonus' would not help, as who is to move at the leave node is only dependent on the depth, not on whether you waste a tempo along the branch. If there is trouble coming towards you, the 4 inserted ply are actually a rather cheap way to push it beyond the horizon. This probably explains why the sequence of moves is so popular; it is like you are allowed to play R=4 null moves in the PV: head in the sand, and don't move a muscle while the opponent gradually closes in on you without meeting any resistance.
Even when there is no real threat, but the engine simple doesn't see anything to strive for, it will engage in this practice. This is similar to Chess engines starting to move there Rook back and forth when they don't know what to do anymore.
I wonder how this kind of nonsense can be suppressed. With path-dependent scoring it would be easy enough to recognize the pattern in the path (like moving the same piece multiple times to places where it could have moved at once, or already was), and penalize that. But that interferes badly with hashing. Now hashing is in fact somewhat helpful to suppress the horizon effect that inserting the exra 4 plies causes: The null move in stead of (1) would bring you to the same position as after (5), and would have been searched before you attempted (1), and apparently failed low, or you would not even try (1). So you should get a hash hit on it, grafting the misery you tried to push over the horizon. Except if this was a PV node where you did not search a null move. Or when you were closer than 4 ply to the horizon, so that you can nicely stand pat after (4), ignoring that you now have to rescue your Piece, and at the same time have the trouble that made the original null move fail coming at you.
So I am looking for other ways to discourage this. If I cannot do so at the leave node by making the eval path dependent, I can perhaps affect the propagation of this eval towards the root. For instance, I could subtract a penalty from the score of any Pawn drop in front of another Pawn. That way, the score from a null move in stead of (1) that you retrieve from the TT after (5) would always seem worse than taking the immediate null move. The problem is that you don't want to discourage Pawn drop (1) when it is good, e.g. because (2) is not possible at all, the Pawn at Y being pinned, so that there now is an extra Pawn attacker of it. So you only want to penalize these Pawn drops if they are indeed refuted by capture (2).
Now if (1) fails high at large remaining depth it could not have taken the full path to (5), where a hash hit would await it that fails low, and there wouldn't be any reason to mistrust (1). If (1) already fails low by itself, there is nothing to worry about. When it is PV then it should return a PV, on which we can do a pattern match to see if it is planning a time-wasting sequence of moves (be it the drop-capture sequence or just moving a slider back and forth). Making the score of a move dependent on the path leading from it, rather than to it, does not introduce any inconsistencies with hashing. You could simply subtract a penalty from a PV move that is planning to take a path that you recognize as pointless or harmful, forcing it to take the shorter path in stead, and prove there that the position you are aiming for is any good (at higher depth).
This would also solve a case encountered in normal Chess, when you can give a perpetual when you are ahead (so drawing is not attractive), but are facing a setback in one of the moves. The search will then start to insert some moves of the perpetual wherever it can, (basically every other move), to push the setback beyond the horizon. So after check1 - evade1 - moveA - moveB - check2 - evade2 as leading PV moves, when check2 - evade2 are the reverse moves of check1 - evade1, it should recognize that, and realize that any improvement this PV gives over an immediate moveA - moveB must be illusory.