Salvaging a timed-out iteration

Discussion of chess software programming and technical issues.

Moderator: Ras

Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Salvaging a timed-out iteration

Post by Tord Romstad »

bob wrote:Done right, there is no risk. At the root there are two types of fail-high moves, assuming you are doing aspiration search and normal PVS with null-move.

1. You search a move and it fails high on the aspiration search, which means the score is better than your original estimate. Playing this move without completing the re-search has no risk at all.

2. You search any move but the first and get a fail-high on the null-window search. This move is not safe to play until you re-search it and at least get a fail high on the original beta (aspiration) bound. A move can fail high on the null-window search yet not actually have a better score, and it may well fail low on the re-search. I do not accept a null-window fail-high at the root as a new best move until the re-search confirms it with either a better score or a fail-high on the aspiration window...

And I have had zero problems since I started handling case 2 properly as above. Trusting the null-window fail-high will absolutely wreck you at times, which is troubling but a fact of search.
It seems that you didn't understand my idea. What I proposed was that we could try trusting the null-window fail-high in sufficiently desperate positions. If the previous best move has a clearly losing score, the null-window search of a move later in the move list fails high, and the allocated time runs out before the re-search is finished, just play the fail-high move. There is a risk that it's not good, but in practice this risk isn't particularly important, because the previous best move appears to be losing anyway. I'm not sure it works, but I think it is worth trying.
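A minimal sketch of the heuristic in C (all names and the -300 threshold are invented for illustration, not taken from any engine):

Code:

typedef int Move;

#define LOSING_THRESHOLD (-300)  /* "clearly losing", in centipawns */

/* When time runs out during the re-search of a root fail-high:
   trust the unconfirmed null-window fail high only when the move
   we would otherwise play already appears clearly lost. */
Move pick_move_on_timeout(Move best_move, int best_score,
                          Move fail_high_move, int fail_high_pending)
{
    if (fail_high_pending && best_score < LOSING_THRESHOLD)
        return fail_high_move;  /* desperate: gamble on the fail high */
    return best_move;           /* normal case: keep the proven move */
}

In quiet positions the confirmed move is still returned, so the gamble only ever replaces a move that already looked lost.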

Your case 2 doesn't exist for me, by the way: I don't use aspiration windows. I've tried them many times, but they just don't work at all for me.

Tord
AndrewShort

Re: Salvaging a timed-out iteration

Post by AndrewShort »

Well, I have fixed it in a simple way. If the timed-out iteration has a PV returned (which can only happen if a score got backed up to the root on at least the first move, which is the first move of the previous PV), then I use the first move of that new PV as the move to make. Easy, and I can see in some cases that my engine is choosing a new move.

However, in the vast majority of cases the move doesn't change. That's because a search to depth x+1 does not often yield a different move than a search to depth x (the PV might differ at the tail end, but its first move rarely changes). The salvaged move only differs when the new iteration would have changed the best move anyway and the timeout happened to occur after that new move was found. Rare.
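In C, that fallback is just a few lines (illustrative names; the assumption is that the new PV is only stored when a root move completes with a real backed-up score):

Code:

typedef int Move;

/* Play the first move of the PV from the timed-out iteration if one
   was backed up to the root; otherwise keep the move from the last
   completed iteration. */
Move move_to_play(const Move *new_pv, int new_pv_length, Move previous_best)
{
    if (new_pv_length > 0)
        return new_pv[0];
    return previous_best;
}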
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Salvaging a timed-out iteration

Post by bob »

Tord Romstad wrote:
bob wrote:Done right, there is no risk. At the root there are two types of fail-high moves, assuming you are doing aspiration search and normal PVS with null-move.

1. You search a move and it fails high on the aspiration search, which means the score is better than your original estimate. Playing this move without completing the re-search has no risk at all.

2. You search any move but the first and get a fail-high on the null-window search. This move is not safe to play until you re-search it and at least get a fail high on the original beta (aspiration) bound. A move can fail high on the null-window search yet not actually have a better score, and it may well fail low on the re-search. I do not accept a null-window fail-high at the root as a new best move until the re-search confirms it with either a better score or a fail-high on the aspiration window...

And I have had zero problems since I started handling case 2 properly as above. Trusting the null-window fail-high will absolutely wreck you at times, which is troubling but a fact of search.
It seems that you didn't understand my idea. What I proposed was that we could try trusting the null-window fail-high in sufficiently desperate positions. If the previous best move has a clearly losing score, the null-window search of a move later in the move list fails high, and the allocated time runs out before the re-search is finished, just play the fail-high move. There is a risk that it's not good, but in practice this risk isn't particularly important, because the previous best move appears to be losing anyway. I'm not sure it works, but I think it is worth trying.

Your case 2 doesn't exist for me, by the way: I don't use aspiration windows. I've tried them many times, but they just don't work at all for me.

Tord
I understood that. And there is another alternative...

Proposed by Hans Berliner many years ago. Goes like this:

You search a move for several iterations and get it as best, until iteration N (the last one you have time for), where that move fails low and you switch to a different move. Quite often the new move loses more quickly than the original, but at the depth N you search it to, it looks better. Berliner proposed sticking with the original best move, since it took a very deep search to refute it.
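A rough sketch of that policy (purely illustrative names and threshold; the mode flag matches the idea of a speculation mode that can be switched on or off):

Code:

typedef int Move;

/* Berliner's suggestion: if the long-standing best move only fails
   low on the final iteration, speculate that the refutation is too
   deep for the opponent to find, and keep the original move. */
Move berliner_choice(Move old_best, int iterations_as_best,
                     Move new_best, int speculative_mode)
{
    if (speculative_mode && iterations_as_best >= 3)
        return old_best;  /* hope the deep refutation goes unseen */
    return new_best;      /* objective choice: limit the damage */
}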

This came up in a position where you are apparently doing OK until iteration N, where you discover you are going to lose your queen. Instead, you throw away a knight and save the queen, but you are still dead lost. And all of this assumes your opponent sees the very deep tactics necessary to win the queen. So you have two choices:

(1) hope your opponent doesn't see the winning continuation, you don't lose the queen, and you will probably win the game.

(2) throw away the knight in a way that he can't possibly miss taking it, thereby saving the queen but losing the game, and in a much more obvious way, one that "shows him how to beat you".

(1) makes good sense against humans and we did this in a very late version of Cray Blitz, a mode we could turn on or off at will. We used it all the time in OTB games against humans, and even against computers if we thought they were significantly weaker than CB.

There are lots of ways to speculate, but against a strong computer opponent they will all most likely fail badly...
hgm
Posts: 28355
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Salvaging a timed-out iteration

Post by hgm »

The 'showing him how to beat you' problem is not really related to time control. It is a built-in shortcoming of minimax, which scores the loss of a Queen after 20 ply in exactly the same way as losing it on ply 2. It is also a fault of naive evaluation, which thinks that being a Rook behind is enormously better than being a Queen behind, while in fact both are positions that lose the game with >99.9% probability.

Combined, these two flaws lead to the choice in the example you give. If you think it would be better to play a move that loses a Queen after N ply than one that loses a Knight on ply 2, the score should reflect that. This is easy to achieve by applying a saturating correction to evaluation scores (like arctan(eval / 100 cP)), in combination with a hefty delayed-loss bonus (say 5% of the score).

In that case the loss of a Queen (say -800) would produce a score of -1.446 when it occurs. The loss of a Knight (say -200) would produce a score of -1.107. But the latter score would be softened in propagating from ply 2 to the root only once by 5%, to -1.052. The loss of the Queen, propagating from ply 20, would be softened 10 times by 5%, to -0.866. So it would be perceived as better to lose the Queen on ply 20 than to give up the Knight now.

Having the scores reflect your preference is also safer w.r.t. alpha-beta, because with alpha-beta you might never be sure how bad the alternative with the lower score is. If you find the score of the Q-loss-in-20 first, it is no problem, because at that time it was the best move, and got an exact score and a valid PV. You can then weigh that against the score-wise better Knight sacrifice. But suppose you searched the Knight sacrifice first. You would then never know that there was a preferable option that 'only' loses a Queen on ply 20, because the search of that branch might fail low on the sacrifice of a Bishop on ply 4 to prevent the Queen loss.

Saturating the score and giving bonus for delaying a loss does not suffer from such problems.
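A minimal sketch of the two ingredients in C, with the numbers from the example above (the 5% softening is applied once per own move here, i.e. 10 times over 20 ply):

Code:

#include <math.h>

/* Saturating correction: map centipawns into (-pi/2, +pi/2). */
double saturate(int eval_cp)
{
    return atan(eval_cp / 100.0);       /* arctan(eval / 100 cP) */
}

/* Delayed-loss bonus: soften a score by 5% each time it propagates
   one step toward the root, so a deep loss scores less badly than
   the same loss near the root. */
double soften(double score, int steps)
{
    return score * pow(0.95, steps);
}

/* Worked example from the text:
   saturate(-800) = -1.446; soften(-1.446, 10) = -0.866  (Q lost, ply 20)
   saturate(-200) = -1.107; soften(-1.107,  1) = -1.052  (N lost, ply  2)
   so losing the Queen on ply 20 is preferred over the Knight now. */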
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Salvaging a timed-out iteration

Post by bob »

hgm wrote:The 'showing him how to beat you' problem is not really related to time control. It is a built-in shortcoming of minimax, which scores the loss of a Queen after 20 ply in exactly the same way as losing it on ply 2. It is also a fault of naive evaluation, which thinks that being a Rook behind is enormously better than being a Queen behind, while in fact both are positions that lose the game with >99.9% probability.
Think outside the box. I am _up_ a piece right now. Later I can either lose the queen to a very deep threat, or give back the rook right now. Which do you do? Giving up the rook (which is now just an exchange) is not a guaranteed loss.

This is not quite as simple an evaluation issue as you make it out to be. If you assume "zero origin" then either loses. But if you are up, it might change your mind a bit. If you are down already, then most any risk is acceptable since you are losing anyway unless you pull a rabbit from your hat...



Combined, these two flaws lead to the choice in the example you give. If you think it would be better to play a move that loses a Queen after N ply than one that loses a Knight on ply 2, the score should reflect that. This is easy to achieve by applying a saturating correction to evaluation scores (like arctan(eval / 100 cP)), in combination with a hefty delayed-loss bonus (say 5% of the score).

In that case the loss of a Queen (say -800) would produce a score of -1.446 when it occurs. The loss of a Knight (say -200) would produce a score of -1.107. But the latter score would be softened in propagating from ply 2 to the root only once by 5%, to -1.052. The loss of the Queen, propagating from ply 20, would be softened 10 times by 5%, to -0.866. So it would be perceived as better to lose the Queen on ply 20 than to give up the Knight now.

Having the scores reflect your preference is also safer w.r.t. alpha-beta, because with alpha-beta you might never be sure how bad the alternative with the lower score is. If you find the score of the Q-loss-in-20 first, it is no problem, because at that time it was the best move, and got an exact score and a valid PV. You can then weigh that against the score-wise better Knight sacrifice. But suppose you searched the Knight sacrifice first. You would then never know that there was a preferable option that 'only' loses a Queen on ply 20, because the search of that branch might fail low on the sacrifice of a Bishop on ply 4 to prevent the Queen loss.

Saturating the score and giving bonus for delaying a loss does not suffer from such problems.
I don't see how that can possibly work in today's alpha/beta-with-hashing framework, which makes this idea unusable unless the hash table is thrown out.
hgm
Posts: 28355
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Salvaging a timed-out iteration

Post by hgm »

If you are up a Rook (+500, -> +1.373 after saturation), losing the exchange would leave +300 (-> +1.249). Losing a Queen, OTOH, would leave you with -300 (-> -1.249). So in this case the difference is asymptotically large, and the delayed-loss bonus could never bridge the gap between the two, no matter how much you could delay the loss of the Queen. So again, the scheme works perfectly.
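Those numbers are easy to check with a throwaway program using the same mapping:

Code:

#include <math.h>
#include <stdio.h>

int main(void)
{
    printf("%+.3f\n", atan( 5.0));  /* up a Rook:      +500 -> +1.373 */
    printf("%+.3f\n", atan( 3.0));  /* lose exchange:  +300 -> +1.249 */
    printf("%+.3f\n", atan(-3.0));  /* lose the Queen: -300 -> -1.249 */
    /* A softened negative score stays negative, so no amount of
       delaying the Queen loss can ever beat the +1.249 line. */
    return 0;
}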

As the scores are not path dependent, hashing is fully transparent, as usual.

Changing the evaluation (with the saturating correction) of course has no impact on alpha-beta at all. Neither has the delayed loss bonus, provided it is properly implemented (i.e. pre-adjusting the window limits for the changes these scores will undergo after the search returns them). I have been using that in all my engines since 1980, and it has always worked flawlessly.
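For illustration only (a sketch, not actual engine code): since a returned score s is replaced by 0.95*s at this node, the child must be searched with the limits scaled by 1/0.95 for every cutoff to remain exact.

Code:

typedef struct Position Position;

/* The engine's recursive negamax search; declared here only so the
   fragment is self-contained. */
double search(Position *pos, double alpha, double beta, int depth);

/* Search one child with window limits pre-adjusted for the 5%
   delayed-loss bonus the returned score will receive. */
double search_child(Position *pos, double alpha, double beta, int depth)
{
    double a = alpha / 0.95;                    /* pre-adjusted limits */
    double b = beta  / 0.95;
    double s = -search(pos, -b, -a, depth - 1); /* negamax convention */
    return 0.95 * s;                            /* bonus on the way up */
}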

So what you say is simply not true.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Salvaging a timed-out iteration

Post by bob »

hgm wrote:If you are up a Rook (+500, -> +1.373 after saturation), losing the exchange would leave +300 (-> +1.249). Losing a Queen, OTOH, would leave you with -300 (-> -1.249). So in this case the difference is asymptotically large, and the delayed-loss bonus could never bridge the gap between the two, no matter how much you could delay the loss of the Queen. So again, the scheme works perfectly.

As the scores are not path dependent, hashing is fully transparent, as usual.

Changing the evaluation (with the saturating correction) of course has no impact on alpha-beta at all. Neither has the delayed loss bonus, provided it is properly implemented (i.e. pre-adjusting the window limits for the changes these scores will undergo after the search returns them). I have been using that in all my engines since 1980, and it has always worked flawlessly.

So what you say is simply not true.
That was not my point, which was that losing a piece might or might not be a bad thing. I've many times seen a program think "aha, I am winning a pawn" when doing a shallow search, only to realize on the last iteration that it is not.

There's a difference between choosing to be down a piece or a queen (both of which will likely lose) when the initial score is around zero, versus when the initial score is around +5.00. In the latter case, losing the piece is not tantamount to losing the game, and the speculative point would be different for each case...
hgm
Posts: 28355
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Salvaging a timed-out iteration

Post by hgm »

Well, that is exactly what saturating the evaluation score quantifies: losing a Rook when you are a Queen ahead (or behind) hardly changes the score, because you are in the saturated part of the curve (although losing a Rook will remain ~5 times as expensive as losing a Pawn). But losing it near equality (where arctan(x) ~ x) fully counts.

So what exactly is your point???
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Salvaging a timed-out iteration

Post by bob »

hgm wrote:Well, that is exactly what saturating the evaluation score quantifies: losing a Rook when you are a Queen ahead (or behind) hardly changes the score, because you are in the saturated part of the curve (although losing a Rook will remain ~5 times as expensive as losing a Pawn). But losing it near equality (where arctan(x) ~ x) fully counts.

So what exactly is your point???
Try going back to the beginning of the thread, and discussing "speculation" in how to address something that is discovered at the "last minute".

The two cases are different, yet the numbers are the same. And how they need to be handled is different when discussing some sort of "speculative solution" to something that happens at the last minute.

So what exactly is _your_ point, within the context of the start of this thread? Mine was that it isn't black and white as to how to best handle something seen at the last minute before an iteration has been completed... Nothing more, nothing less...
hgm
Posts: 28355
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Salvaging a timed-out iteration

Post by hgm »

My point was that 'last minute' merely means 'large depth'. Apparently there are ways to postpone the loss to a very late ply, or an earlier iteration would already have seen it.

If you think it is preferable to play the move that leads to such a late but bigger loss, rather than to limit the damage by giving up less (but still lethal) material early on, because the opponent is much more likely to miss a deep tactic than a shallow one, then the score should reflect it.

Imagine you had finished the 20th iteration, with the conclusion that the move that looked good (say +0.03) on the 19th now does -8, and your best move of the 20th does -2.5. Wouldn't it still be better to go for the -8 than for the -2.5? And if you came from +5.03 instead, wouldn't it be better to go for a certain +2.5 than risk the -3? And wouldn't that be exactly the same even if you were able to finish a 21st iteration after that (confirming the scores of the 20th)?

So my point is that these are things that should be reflected by the score, and not by accidental timing effects.