Good Example of Horizon effect in Eval

Ross Boyd · Post by **Ross Boyd** » Mon Feb 04, 2008 7:44 pm

hgm wrote: Yes, it would be nice if you had recognizers so accurate that you could stop the search immediately. But it might be a lot easier to make recognizers that only cut the search after being in the end-game for a few moves. Like KRKR = draw if you have already been in there for 3 moves.

That's also what I'm trying to do. When an interior node is recognized in the search, the depth_remaining is reduced to 3, i.e.:

Code:

    // draw recognition code
    ....
    // 50 move rule
    ....

    mat_probe(pos, mat); // probe material imbalance matrix
    if (mat_recognition(pos, mat, false)) // is it a recognized ending?
        depth = min(depth, 3);
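A minimal sketch of the two variants discussed above, side by side: hgm's "only trust the recognizer after a few moves in the ending" and the depth cap from the snippet. The `NodeInfo` struct, its fields, and the 6-ply threshold (taking "3 moves" to mean 3 full moves) are assumptions made for illustration, not code from either engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node state; not taken from any engine in this thread. */
typedef struct {
    bool recognized_draw;   /* material matches a drawish ending, e.g. KRKR */
    int  plies_in_ending;   /* plies since the material last changed */
} NodeInfo;

enum { SETTLE_PLIES = 6 };  /* assumption: "3 moves" = 3 full moves = 6 plies */

/* hgm's variant: score the node as a draw only once the ending has settled. */
bool can_cut_as_draw(const NodeInfo *n)
{
    assert(n != NULL);
    return n->recognized_draw && n->plies_in_ending >= SETTLE_PLIES;
}

/* Depth-cap variant: keep searching, but never deeper than 3 more plies. */
int clamp_depth(const NodeInfo *n, int depth)
{
    assert(n != NULL);
    if (n->recognized_draw && depth > 3)
        return 3;
    return depth;
}
```

Both variants delay the verdict instead of trusting the recognizer the instant the material matches; the first delays by game history, the second by remaining search depth.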

bob · Post by **bob** » Wed Feb 06, 2008 11:23 pm

Ross Boyd wrote:A few days ago Steve Maughan and Uri discussed a K+P ending which required knowledge of KPvKQ to solve it. Quote below...

Steve Maughan wrote:
Here's a position that's been discussed on Susan Polgar's blog (http://susanpolgar.blogspot.com/2008/01 ... stuff.html):

[D]8/8/7p/2p5/2P1K3/1kP1P3/8/8 w - -

It's a draw. But many of the top engines think that Kf5 wins for white. The engines that seem to have problems are Glaurung 2, Naum 2.2 and Rybka; Shredder and Hiarcs see that it's a draw. It's probably a position where the square of pawns rule breaks down. Anyway, I thought it was of interest.

Uri replied:

The main problem here is that Glaurung does not see that the following is a draw:

[D]4Q3/8/8/8/7K/8/1kp5/8 w - - 0 9

Uri
I agree with Uri, but when I tested Trace 2 I came up against a fairly common problem: evaluation horizon effect.

T2 understands that Uri's example is drawn. However, she avoids capturing the h pawn so that her KPKQ evaluation doesn't kick in and return a draw. In the original position she scores Kf5 as +6.75 at 26 ply depth.

Also, mighty Rybka 2.3.2a gets to 25 ply and scores Kf5 as +7.29

Evaluation horizon effect is a pain in the a$$ to solve. If I write a KPPKQ recognizer the same horizon effect will occur when there is KPPPKQ on the board. ... so I have to write a KPPPPKQ ... and so on...

Turn off EGBB and EGTB and see if your engine can solve it.

Any ideas on how to fix this one?

Ross

This is not that uncommon a theme in chess programs. It is a subset of a larger problem known as an "evaluation discontinuity". Whenever a single move can change things enough to make a large change in the evaluation, that move can either be played, or avoided, at points in the tree where it gives maximal advantage to the side with the option.

In the classic horizon effect, where one side gives up pawns to avoid losing a trapped piece, only to still lose the piece, the program staves off the loss by forcing the opponent to react to intervening threats that do nothing but occupy his time, still leaving the original problem in place. That is what happens here, as you saw. If you know that KQ vs KP, where the P is a bishop pawn on the 7th rank supported by the king, is a draw, then if you have the queen, you will not trade or capture anything, so that you can keep your KQ vs KPR material advantage, which is better than ripping the rook and leaving yourself in a drawn position. Of course it is drawn no matter what, but it is "when" the draw becomes visible that counts. One solution is to score it differently, so that K+bP on 7th is drawish if the opponent can't quickly win the pawn, regardless of what other material the opponent has. If KP vs KQ is drawn, then certainly KPP vs KQ can't be worse for the KPP side than just KP...

Those are the interesting sorts of problems one has to solve, or these enormous tree searches we do will find literally billions of ways to creatively screw up our evaluation intentions...
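A toy illustration of why the capture gets postponed, assuming a made-up evaluation in centipawns from the queen side's point of view: a recognizer scores the bare KQ vs K+bP ending as 0, while anything with an extra defending pawn falls back to plain material. The function names and the 900/100 piece values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy evaluation (centipawns, queen side's view). Hypothetical: the
   recognizer only fires when exactly one defending pawn is left. */
int toy_eval(int defender_pawns, int recognizer_matches)
{
    assert(defender_pawns >= 0);
    if (defender_pawns == 1 && recognizer_matches)
        return 0;                        /* recognized draw */
    return 900 - 100 * defender_pawns;   /* plain material: Q minus pawns */
}

/* One-ply choice for the queen side: capture a spare pawn or leave it. */
int best_choice(int defender_pawns, int recognizer_matches, int *captured)
{
    assert(defender_pawns >= 1 && captured != NULL);
    int keep = toy_eval(defender_pawns, recognizer_matches);
    int take = toy_eval(defender_pawns - 1, recognizer_matches);
    *captured = take > keep;
    return *captured ? take : keep;
}
```

With two defending pawns and the recognizer active, keeping the spare pawn on the board scores 700 while capturing it scores 0, so the capture is avoided; with the recognizer off, the capture scores 800 and is played. The draw is there either way; only the moment it becomes visible changes.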

yoshiharu · Post by **yoshiharu** » Thu Feb 07, 2008 3:13 am

bob wrote:
This is not that uncommon a theme in chess programs. It is a subset of a larger problem known as an "evaluation discontinuity". Whenever a single move can change things enough to make a large change in the evaluation, that move can either be played, or avoided, at points in the tree where it gives maximal advantage to the side with the option.

For this effect to show up, are transposition tables necessary?

Cheers, Mauro

Tony · Post by **Tony** » Thu Feb 07, 2008 9:43 am

bob wrote:
... Of course it is drawn no matter what, but it is "when" the draw becomes visible that counts. One solution is to score it differently, so that K+bP on 7th is drawish if the opponent can't quickly win the pawn, regardless of what other material the opponent has. If KP vs KQ is drawn, then certainly KPP vs KQ can't be worse for the KPP side than just KP...

Those are the interesting sorts of problems one has to solve, or these enormous tree searches we do will find literally billions of ways to creatively screw up our evaluation intentions...

Unfortunately, you found another good way to get into trouble.

KPPKQ can most certainly be worse than KPKQ because a lot of the defences are based on stalemate, which most likely is not possible with an extra pawn.

The same idea applies to KNNKP (which is definitely worse for the defender than KNNK).

Tony

bob · Post by **bob** » Thu Feb 07, 2008 7:27 pm

yoshiharu wrote:
bob wrote:
This is not that uncommon a theme in chess programs. It is a subset of a larger problem known as an "evaluation discontinuity". Whenever a single move can change things enough to make a large change in the evaluation, that move can either be played, or avoided, at points in the tree where it gives maximal advantage to the side with the option.

For this effect to show up, are transposition tables necessary?

Cheers, Mauro

No. Here's a trivial case to illustrate. Suppose your king safety scores can go from 0 to 1.5 pawns. And suppose you turn that off at some point, let's say when you remove the queens. Now the program is faced with the loss of 1.5 pawns (your opponent's king is terribly exposed) if queens are traded, and it is going to try to avoid that as long as possible. Or it will try to do it at a point where it is in its best interest to suddenly turn that off. So you might well wreck your own king safety, which brings up your opponent's king safety score, and now you trade queens because you both lose your "attacking bonus". And all you did was wreck your position to make it worthwhile to trade queens. There are lots of places where a discontinuity in the evaluation can produce results you never imagined in your wildest dreams, because the program will use those discontinuities in very unusual ways thanks to the enormous tree search space it has to play around with.

Ugly.

And yes, I have seen lots of similar cases over the years.
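The queen-trade case can be put in numbers with a small sketch. It assumes a hypothetical attack bonus of 0 to 150 centipawns per side that is switched off outright once queens leave the board; the function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attack bonus in centipawns, range 0..150. */
int attack_term(int exposure_cp, bool queens_on)
{
    assert(exposure_cp >= 0 && exposure_cp <= 150);
    return queens_on ? exposure_cp : 0;   /* hard switch: the discontinuity */
}

/* Net king-safety contribution from my point of view: what I gain from
   your exposed king minus what you gain from mine. */
int king_safety_net(int your_exposure, int my_exposure, bool queens_on)
{
    return attack_term(your_exposure, queens_on)
         - attack_term(my_exposure, queens_on);
}
```

With your king fully exposed and mine safe, trading queens drops the net term from 150 to 0, so the search avoids the trade. Once my own king is equally exposed, the net term is 0 with or without queens, and the trade suddenly looks free, even though all that happened is that my position got worse.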

bob · Post by **bob** » Thu Feb 07, 2008 7:28 pm

Tony wrote:
bob wrote:
... Of course it is drawn no matter what, but it is "when" the draw becomes visible that counts. One solution is to score it differently, so that K+bP on 7th is drawish if the opponent can't quickly win the pawn, regardless of what other material the opponent has. If KP vs KQ is drawn, then certainly KPP vs KQ can't be worse for the KPP side than just KP...

Those are the interesting sorts of problems one has to solve, or these enormous tree searches we do will find literally billions of ways to creatively screw up our evaluation intentions...
Unfortunately, you found another good way to get into trouble.

KPPKQ can most certainly be worse than KPKQ because a lot of the defences are based on stalemate, which most likely is not possible with an extra pawn.

The same idea applies to KNNKP (which is definitely worse for the defender than KNNK).

Tony

I completely agree. But it was just an illustration of the discontinuity problem and how it can affect things in negative and unexpected ways...

Tony · Post by **Tony** » Fri Feb 08, 2008 11:18 am

bob wrote:
yoshiharu wrote:
bob wrote:
This is not that uncommon a theme in chess programs. It is a subset of a larger problem known as an "evaluation discontinuity". Whenever a single move can change things enough to make a large change in the evaluation, that move can either be played, or avoided, at points in the tree where it gives maximal advantage to the side with the option.

For this effect to show up, are transposition tables necessary?

Cheers, Mauro
No. Here's a trivial case to illustrate. Suppose your king safety scores can go from 0 to 1.5 pawns. And suppose you turn that off at some point, let's say when you remove the queens. Now the program is faced with the loss of 1.5 pawns (your opponent's king is terribly exposed) if queens are traded, and it is going to try to avoid that as long as possible. Or it will try to do it at a point where it is in its best interest to suddenly turn that off. So you might well wreck your own king safety, which brings up your opponent's king safety score, and now you trade queens because you both lose your "attacking bonus". And all you did was wreck your position to make it worthwhile to trade queens. There are lots of places where a discontinuity in the evaluation can produce results you never imagined in your wildest dreams, because the program will use those discontinuities in very unusual ways thanks to the enormous tree search space it has to play around with.

Ugly.

And yes, I have seen lots of similar cases over the years.

Nice example. But is this obvious explanation correct?

How do you see the difference between this and something I thought I observed lately?

King safety scoring is non-linear. Suppose the king safety score at this moment is 0.5 pawn (mine is +1, yours +0.5).

Now by wrecking my own king safety, I can continue the attack. My attack index goes up by 30, yours by 50. But because I'm already at a +1 score, the value of my 30 is 0.5 pawn, while your 50 is also worth 0.5 pawn.

So by ruining my king safety and giving you more counterplay, I kept my +0.5 score.

If this conclusion is correct, the king safety scoring rises too steeply (too exponentially).
The only problem is how to tell the difference between discontinuity and non-linearity.

Both seem to show the same behaviour: giving the opponent too much counterplay until it's called a deadly king attack.

Tony
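The arithmetic in this post can be reproduced with a small sketch, assuming each side's attack index is pushed through a convex curve separately and the results are subtracted. The curve below (slope 1 up to index 100, slope 5/3 above it) and the starting indices 100 and 50 are hypothetical, chosen only so the numbers match the +1 / +0.5 example.

```c
#include <assert.h>

/* Hypothetical convex bonus curve in centipawns: slope 1 up to index 100,
   slope 5/3 above it. Values are picked to match the example only. */
int side_bonus(int attack_index)
{
    assert(attack_index >= 0);
    if (attack_index <= 100)
        return attack_index;
    return 100 + (attack_index - 100) * 5 / 3;
}

/* Each side's index goes through the curve separately, then subtract. */
int net_per_side(int my_index, int your_index)
{
    return side_bonus(my_index) - side_bonus(your_index);
}
```

Before the exchange of concessions the net score is `net_per_side(100, 50)`, which is 50. After my index rises by 30 and yours by 50 it is `net_per_side(130, 100)`, again 50: the larger gain for the opponent is hidden because my index sits on the steeper part of the curve.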

Uri Blass · Post by **Uri Blass** » Fri Feb 08, 2008 1:17 pm

Tony wrote:
bob wrote:
yoshiharu wrote:
bob wrote:
This is not that uncommon a theme in chess programs. It is a subset of a larger problem known as an "evaluation discontinuity". Whenever a single move can change things enough to make a large change in the evaluation, that move can either be played, or avoided, at points in the tree where it gives maximal advantage to the side with the option.

For this effect to show up, are transposition tables necessary?

Cheers, Mauro
No. Here's a trivial case to illustrate. Suppose your king safety scores can go from 0 to 1.5 pawns. And suppose you turn that off at some point, let's say when you remove the queens. Now the program is faced with the loss of 1.5 pawns (your opponent's king is terribly exposed) if queens are traded, and it is going to try to avoid that as long as possible. Or it will try to do it at a point where it is in its best interest to suddenly turn that off. So you might well wreck your own king safety, which brings up your opponent's king safety score, and now you trade queens because you both lose your "attacking bonus". And all you did was wreck your position to make it worthwhile to trade queens. There are lots of places where a discontinuity in the evaluation can produce results you never imagined in your wildest dreams, because the program will use those discontinuities in very unusual ways thanks to the enormous tree search space it has to play around with.

Ugly.

And yes, I have seen lots of similar cases over the years.
Nice example. But is this obvious explanation correct?

How do you see the difference between this and something I thought I observed lately?

King safety scoring is non-linear. Suppose the king safety score at this moment is 0.5 pawn (mine is +1, yours +0.5).

Now by wrecking my own king safety, I can continue the attack. My attack index goes up by 30, yours by 50. But because I'm already at a +1 score, the value of my 30 is 0.5 pawn, while your 50 is also worth 0.5 pawn.

So by ruining my king safety and giving you more counterplay, I kept my +0.5 score.

Tony

Your conclusion is not correct because my king safety calculation is not done in this way.

I first calculate one attack index based on the difference between the scores of both kings, and only after that do I look the number up in a table.

From Movei's code:

Code:

    scorelight = calckingsafetylight();
    scoredark = calckingsafetydark();
    score = scoredark - scorelight;
    if (score > 0)
        score = kingbonus[score];
    else
        score = -kingbonus[0 - score];

kingbonus is non-linear, but of course in your example I do not keep the +0.5 score.

Uri
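A minimal sketch of the difference-first scheme, to show why the +0.5 is not kept. `kingbonus_at` is a hypothetical stand-in for the `kingbonus` table (Movei's real table values are not given in the thread); it uses a convex curve with slope 1 up to 100 and slope 5/3 above, and the indices 100/50 and 130/100 are illustrative.

```c
#include <assert.h>

/* Hypothetical stand-in for the kingbonus table: convex, slope 1 up to 100,
   slope 5/3 above. The real table contents are not known here. */
static int kingbonus_at(int diff)
{
    assert(diff >= 0);
    return diff <= 100 ? diff : 100 + (diff - 100) * 5 / 3;
}

/* Difference first, one lookup afterwards, sign restored at the end. */
int net_difference_first(int scoredark, int scorelight)
{
    int score = scoredark - scorelight;
    return score > 0 ? kingbonus_at(score) : -kingbonus_at(-score);
}
```

With indices 100 and 50 the difference is 50 and the result is 50. After the indices rise to 130 and 100 the difference shrinks to 30 and the result falls to 30, so giving the opponent the larger increase is penalized even though the table itself is non-linear.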

bob · Post by **bob** » Fri Feb 08, 2008 6:00 pm

Tony wrote:
bob wrote:
yoshiharu wrote:
bob wrote:
This is not that uncommon a theme in chess programs. It is a subset of a larger problem known as an "evaluation discontinuity". Whenever a single move can change things enough to make a large change in the evaluation, that move can either be played, or avoided, at points in the tree where it gives maximal advantage to the side with the option.

For this effect to show up, are transposition tables necessary?

Cheers, Mauro
No. Here's a trivial case to illustrate. Suppose your king safety scores can go from 0 to 1.5 pawns. And suppose you turn that off at some point, let's say when you remove the queens. Now the program is faced with the loss of 1.5 pawns (your opponent's king is terribly exposed) if queens are traded, and it is going to try to avoid that as long as possible. Or it will try to do it at a point where it is in its best interest to suddenly turn that off. So you might well wreck your own king safety, which brings up your opponent's king safety score, and now you trade queens because you both lose your "attacking bonus". And all you did was wreck your position to make it worthwhile to trade queens. There are lots of places where a discontinuity in the evaluation can produce results you never imagined in your wildest dreams, because the program will use those discontinuities in very unusual ways thanks to the enormous tree search space it has to play around with.

Ugly.

And yes, I have seen lots of similar cases over the years.
Nice example. But is this obvious explanation correct?

How do you see the difference between this and something I thought I observed lately?

King safety scoring is non-linear. Suppose the king safety score at this moment is 0.5 pawn (mine is +1, yours +0.5).

Now by wrecking my own king safety, I can continue the attack. My attack index goes up by 30, yours by 50. But because I'm already at a +1 score, the value of my 30 is 0.5 pawn, while your 50 is also worth 0.5 pawn.

So by ruining my king safety and giving you more counterplay, I kept my +0.5 score.

If this conclusion is correct, the king safety scoring rises too steeply (too exponentially).
The only problem is how to tell the difference between discontinuity and non-linearity.

Both seem to show the same behaviour: giving the opponent too much counterplay until it's called a deadly king attack.

Tony

I didn't say that example would affect everyone. It was just an example. Do I want to give up .3 here to keep +.5? And then on the _next_ move do I want to give up another .3 to keep the +.5? I am now down .1 overall, which is exactly how the old material horizon-effect cases happened, except then you would give up a pawn or two or three to avoid giving up the piece, only to be forced to give up the piece anyway. And you went from a totally equal position to one that is dead lost, thinking you were saving a piece that was unsavable given enough depth.

If you make your "uncastled penalty" big enough, you can see the same thing. You will wreck your own position to keep your opponent from castling, and when he finally does, you realize you are now completely lost...

In essence, anytime something "big" happens at some instantaneous point, you give the search an opportunity to use that in ways completely unexpected... These discontinuities are probably safe when they are 100% correct. For example, a passed pawn runs and promotes. +7 is doable (+8 is not so good, as you might just keep the pawn, since you get the same score with the pawn that can run as with the queen after promotion, unless your opponent forces you to move the pawn to avoid losing it). So if the "boundary condition" is very accurate, it might be safe. But many such conditions are "iffy" at best, and then the discontinuity can be used in ways you did not anticipate.
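The ".3 to keep +.5" bookkeeping from this post, written out as a tiny sketch in centipawns. The function name and parameters are invented for the illustration.

```c
#include <assert.h>

/* Net result in centipawns of paying `price_cp` on each of `moves` moves
   to hold on to a bonus of `bonus_cp`. */
int net_after_stalling(int bonus_cp, int price_cp, int moves)
{
    int kept = bonus_cp;           /* what the search believes it protects */
    int paid = price_cp * moves;   /* concessions made to delay the loss */
    assert(moves >= 0);
    return kept - paid;
}
```

`net_after_stalling(50, 30, 2)` gives -10, the "down .1 overall" above. If the bonus is lost anyway once the search finally sees far enough, the whole 60 centipawns of concessions were paid for nothing.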

hgm · Post by **hgm** » Fri Feb 08, 2008 6:58 pm

Evaluation discontinuities are hard on the search, but if they are a reality of life, the search will have to deal with them (through a smarter extension policy).

If your opponent's King is so exposed that your best estimate is that (considering the Pawn structure) he will lose two Pawns because of double attacks (check+Pawn) by the Queen, then the two Pawn advantage that evaporates on trading Queens before cashing in on the opponent's bad King safety is a real effect. Your Queen is worth 1100cP, and the opponent's Queen only 900cP in this situation.

This is not different from having a Rook against the opponent's Bishop. There it also would be unwise to 'trade' the Rook for the Bishop. If your search cannot handle the 200cP loss that will materialize at the instant of the trade, there is no way you could fix it by fiddling with the King safety. If the opponent can force you to lose the exchange, your search will have to be able to handle it, or you will see classical horizon effect. Sacrificing Pawns for no end, or wrecking King safety, might be manifestations of that.

The castling example (from which, unfortunately, Joker suffers a lot, I must confess) seems more an example of mis-evaluation: the discontinuity is not real there in the first place. But that doesn't prove that discontinuities are bad: a continuous mis-evaluation can be just as bad. In this case the possession of castling rights is apparently under-estimated, so that the engine thinks it has an unrealistically large advantage when the opponent has not castled yet, but is still allowed to do so. It should be solved by more realistic evaluation of the rights.
