Basic endgame tests

hgm · Post by **hgm** » Thu Aug 11, 2022 8:27 pm

likeawizard wrote: ↑Thu Aug 11, 2022 7:29 pm With my basic understanding, evaluation discontinuity is a problem when the transition from middlegame eval to endgame eval is accomplished in an ungraceful manner, where a continuous improvement of the position can at an arbitrary point switch to a different evaluation model and completely change the picture. Wouldn't this result in the search algorithm exploring potentially good positions only to flip back and forth? Isn't there value in variations progressing more gracefully and continuously?

The evaluation function would know whether the condition that dictates the switch between one method of scoring features and another is satisfied or not. So there would be no 'flipping'; it is just that some leaves would be evaluated according to one method, and other leaves according to another method. When that reflects reality, blurring the transition would just lead to misevaluation.

A good example is promotion of a passer; by advancing it 6 steps from its initial position you will in the end gain some 900cP. But virtually all of that comes from pushing it from the 7th to the 8th rank. Awarding the gain gradually 'to avoid a discontinuity' as 150cP per step will lead to a ludicrous over-estimate of the value of a 7th-rank passer.
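To put numbers on the promotion example, here is a minimal sketch comparing the two schedules. The 900cP gain and the 6 steps are the figures from the paragraph above; the function names are made up for illustration, and putting the entire gain on the last push is a simplification of "virtually all".

```python
QUEEN_GAIN_CP = 900  # gain figure used in the post
STEPS = 6            # pushes from the initial square to promotion


def linear_bonus(steps_advanced: int) -> int:
    """Spread the full gain evenly over the pushes: 150 cP per step."""
    return QUEEN_GAIN_CP * steps_advanced // STEPS


def stepped_bonus(steps_advanced: int) -> int:
    """Simplified model: the whole gain arrives with the promoting push."""
    return QUEEN_GAIN_CP if steps_advanced >= STEPS else 0


# Print both schedules side by side for every number of pushes.
for steps in range(STEPS + 1):
    print(steps, linear_bonus(steps), stepped_bonus(steps))
```

Under the linear schedule a passer that has made 5 of its 6 pushes (i.e. one on the 7th rank) is already credited with 750cP, which is the over-estimate described above.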

If almost every aspect of evaluation is highly discontinuous, why would it be a special problem for search if the transition from middle-game to end-game is also discontinuous?

likeawizard · Post by **likeawizard** » Thu Aug 11, 2022 8:50 pm

I am not sure if what I am saying is outright wrong or I am just bad at explaining myself, as the counter examples you give don't really reflect on what I am trying to convey.

I very recently wrote code for exactly the scenario of advancing pawns and gave it quite a lot of thought. Initially I remembered some chess commentators saying that a protected passed pawn on the 7th rank could be valued as high as a rook. But then it dawned on me what consequences that would bring. In such a situation the evaluation function would outright favor trading a knight for the pawn even if the pawn has actually zero tactical potential of ever promoting. So gradually blurring a 100cp pawn on its starting rank into a 900cp queen makes little sense.

My idea was more in the context of figuring out how to deal with King safety versus King activity. Does it make sense to define a hard switch between middlegame and endgame? Or does it make sense to have both of those factors always present, but on a sliding scale depending on how endgame-ish the position is? On one hand it runs the risk of disregarding king safety when one shouldn't; on the other hand one might have an advantageous king position in the endgame proper.

Thanks for the comments, though; they will certainly give me some food for thought.

algerbrex · Post by **algerbrex** » Thu Aug 11, 2022 9:47 pm

likeawizard wrote: ↑Thu Aug 11, 2022 8:50 pm ...

I think what HGM may be getting at (or maybe I misunderstand his point too, in which case, take these as my own thoughts) is that evaluation discontinuity is not inherently bad. It all depends on context.

To take a rather extreme, albeit contrived, example, suppose engine A's evaluation jumped from +0.20 to +7.00 in a single move. Would you know that A's evaluation function has a problem with discontinuity? Well, it would depend. If in that single move A's opponent hung their queen, A's evaluation is perfectly justified in jumping to such a high score, since the position has changed so drastically within a very short sequence of moves. On the other hand, if A's opponent just moved their knight to an outpost square, then A's evaluation function definitely has an issue with discontinuities. But we only know this based on more context than just a score jump.

Now, consider the following position:

[fen]8/5p1k/5p1p/pp6/5P1P/1P2P1P1/P2qQK2/8 b - - 5 49[/fen]

This was from a recent blitz game between two different versions of my engine. Black decided to take on e2 and trade queens. In doing so, it traded into a losing king-pawn endgame.

Now suppose the engine playing white had a special king-pawn evaluation function, which was only called in king-pawn endgames. And suppose that once the queenless version of the above position was reached, that special king-pawn evaluation function made the score shoot up from +1 to +5. That's a huge jump in score, and definitely a discontinuity. But again, when we consider the context, it's not a bad discontinuity: when black decided to trade into the queenless endgame, it drastically changed the nature of the position in a single move, and the game went from being almost equal to losing for black.

Now the reason that tapered evaluation is so helpful and often boosts an engine's strength by 100-200 Elo is that it helps to get rid of unjustified evaluation jumps caused by very rough metrics for deciding when to switch the evaluation between middlegame and endgame "mode". But building on top of this tapered evaluation by having functions that are dispatched depending on the endgame reached is also appropriate, and can make an engine stronger by helping it to recognize when evaluation discontinuities should occur.
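A rough sketch of how the two ideas can sit together, under stated assumptions: the phase weights (minor = 1, rook = 2, queen = 4, so 24 with all pieces on the board) are one commonly used choice rather than anything fixed, and `evaluate`, `pawn_endgame_eval` and the other names are hypothetical, not taken from any particular engine.

```python
from typing import Callable, Dict, Optional

# Assumed phase weights per piece type (both sides counted together).
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # 8 minors * 1 + 4 rooks * 2 + 2 queens * 4


def game_phase(piece_counts: Dict[str, int]) -> int:
    """Return 24 with full material, 0 when only kings and pawns remain."""
    phase = sum(PHASE_WEIGHT.get(piece, 0) * n for piece, n in piece_counts.items())
    return min(phase, MAX_PHASE)


def tapered(mg_score: int, eg_score: int, phase: int) -> int:
    """Blend middlegame and endgame scores in proportion to the phase."""
    return (mg_score * phase + eg_score * (MAX_PHASE - phase)) // MAX_PHASE


def evaluate(
    piece_counts: Dict[str, int],
    mg_score: int,
    eg_score: int,
    pawn_endgame_eval: Optional[Callable[[], int]] = None,
) -> int:
    """Dispatch to a specialized evaluator when one applies, else taper."""
    phase = game_phase(piece_counts)
    if phase == 0 and pawn_endgame_eval is not None:
        # Justified discontinuity: a king-pawn ending gets its own scoring.
        return pawn_endgame_eval()
    return tapered(mg_score, eg_score, phase)
```

For example, `tapered(100, 300, 12)` gives 200, halfway between the two scores, while `evaluate({"P": 11}, 100, 300, lambda: 500)` skips the blend and returns whatever the specialized function says.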

likeawizard · Post by **likeawizard** » Fri Aug 12, 2022 6:55 am

I think in some way I have a general bias from physics, where discontinuities are almost always seen as a bad thing: a sign that the model no longer makes sense.

Does your last paragraph imply that tapered evaluation is only beneficial as a band-aid on badly constructed evaluation functions to smooth out these 'unjustified' jumps? Or does it have some universal utility always?

I have a feeling that, unlike move generation and in many ways the search algorithm of the program, where you can almost always make objective improvements in performance, the eval function is more 'dark arts'. It kind of lives on a slider - fast&dumb vs slow&smart - and the right solution can be anything from one extreme to the other.

But thanks to both you and HGM for your input. I believe that the best way to get something answered and explained is to provide the wrong answer yourself and have someone else correct you. Hope I don't come off as a belligerent idiot in the process.

hgm · Post by **hgm** » Fri Aug 12, 2022 10:19 am

likeawizard wrote: ↑Thu Aug 11, 2022 8:50 pm My idea was more in the context of figuring out how to deal with King safety versus King activity. Does it make sense to define a hard switch between middlegame and endgame? Or does it make sense to have both of those factors always present, but on a sliding scale depending on how endgame-ish the position is? On one hand it runs the risk of disregarding king safety when one shouldn't; on the other hand one might have an advantageous king position in the endgame proper.

Well, the purpose of the heuristic evaluation is to give an indication for how far removed you are from the goal of winning. If that changes abruptly as a function of a certain measure of the position, then you will have to live with that. I mentioned promotion, and trading of the last Pawn, but trading into a Pawn ending is another good example of an event that completely alters the perspective for winning. In all these cases it works best to get a large score change when the transition occurs, but award a small fraction of the gain (say 5-10%) to changes in the measure that lead up to the event. E.g. for promotion you want to award advance of the passer even when it is still far away from promotion, for drawishness you want to award capturing enemy Pawns even when he still has several, and when the Pawn structure would be winning, you would want to encourage trading of pieces even when there still are several. But only a small fraction of the eventual reward.
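As a small illustration of "large change at the event, small fraction for the lead-up", here is a sketch. It assumes the lead-up share (10% here) is spread evenly over the steps before the event, which is only one reading of the paragraph above; `event_score` and its parameters are hypothetical names.

```python
def event_score(
    progress: int,
    steps_to_event: int,
    full_gain_cp: int,
    lead_up_percent: int = 10,
) -> int:
    """Score progress toward a decisive event such as promotion.

    Before the event, only lead_up_percent of the full gain is handed out,
    in equal parts per step; the full gain is credited once the event occurs.
    """
    if progress >= steps_to_event:
        return full_gain_cp
    # Integer arithmetic keeps the result exact in centipawns.
    return full_gain_cp * lead_up_percent * progress // (100 * steps_to_event)
```

With the promotion numbers used earlier in the thread (900cP over 6 steps), each push short of promotion is worth 15cP, so a 7th-rank passer gets 75cP and the remaining jump happens only when it actually promotes. The same shape could be applied to other measures, such as the number of enemy Pawns left or the number of pieces still to be traded.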

It seems to me that it is either safe or suicidal for a King to leave his shelter and move to the front line. In situations where it is still suicidal, it would not be very helpful to penalize that only slightly because you are nearly in a material composition where it would be safe. The engine might then do it anyway in order to prevent a minor loss (such as creation of an opponent passer), and get checkmated as a result. As my math teacher used to say: "Nearly correct... Therefore faulty!"