End-game evaluation

Discussion of chess software programming and technical issues.

hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

End-game evaluation

Post by hgm »

I am getting to the point now to fill Spartacus' material table (which so far only held the B-pair bonus). One of the most important tasks of the material table would be to correct the additive piece-value score for drawish end-games, so that the engine no longer prefers (say) KBK (+325) over KPP (+200).

My question was what end-games need such a correction, and how large that correction should be. Obviously KBK, KNK and KBKB (like B) can be corrected to a hard zero, pruning further search. KNNK is trickier (like KBKN etc., but those don't really need a correction), because there can be checkmates and mate-in-1, and you want the engine to recognize those when it enters KNNK in such a position (e.g. after a forced capture in KNNKQ). So I guess I should evaluate those as zero, but not prune the search (or prune based on the 50-move counter being large enough). I could then apply that same routine to KBKB (unlike), KBKN, KNKN, KRKR (to effect the pruning) if I have it anyway.
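
A minimal sketch of the two-tier treatment described above, in Python. The names (classify, may_prune, HARD_DRAW, SOFT_DRAW) and the min_clock threshold are made up for illustration and are not taken from Spartacus; the KK entry is added for completeness.

```python
HARD_DRAW = "hard_draw"  # no mate position exists: score 0 and cut off the search
SOFT_DRAW = "soft_draw"  # mates exist but cannot be forced: score 0, keep searching
NORMAL = "normal"        # no material-based draw correction


def classify(side_a, side_b, same_colour_bishops=False):
    """Classify a pawnless ending from the non-King pieces of both sides.

    side_a / side_b are strings such as "NN" or "" (Kings are implied).
    same_colour_bishops only matters for KBKB.
    """
    # Order-independent key, so classify("B", "N") == classify("N", "B").
    key = tuple(sorted(["".join(sorted(side_a)), "".join(sorted(side_b))]))
    if key in {("", ""), ("", "B"), ("", "N")}:
        return HARD_DRAW
    if key == ("B", "B"):
        return HARD_DRAW if same_colour_bishops else SOFT_DRAW
    if key in {("", "NN"), ("B", "N"), ("N", "N"), ("R", "R")}:
        return SOFT_DRAW
    return NORMAL


def may_prune(kind, halfmove_clock, min_clock=20):
    """Hard draws are always pruned; soft draws only once the 50-move
    counter is large enough (min_clock is an arbitrary placeholder)."""
    if kind == HARD_DRAW:
        return True
    return kind == SOFT_DRAW and halfmove_clock >= min_clock
```

In both draw classes the evaluation would return 0; the only difference is whether the search is allowed to stop there.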

Other end-games that need special treatment seem to be:
Strong side has no mating potential:
KBKP, KNKP (corrected to slightly negative ~-0.1)
KBKPP, KNKPP (~ -0.4 ?)
KBKPPP, KNKPPP (~ -0.7 ?, but not as much as the natural score when the Pawn side had 4 Pawns)
KNNK... are problematic. They need a huge correction, and KNNKP could actually be a light advantage for the Knights (although I understand that due to the 50-move rule this will almost always be a draw). But what about
KNNKPP
KNNKPPP
KNNKPPPP
...?
Is the Pawn side always better off in those? I guess that with 4 Pawns the Knights might start to face a serious challenge. Furthermore, two Knights is so much that the weak side now also can have a minor, or even a Rook!
KNNKB and KNNKN seem totally drawish (treat like KRKR?)
What if the weak side has 1, 2 or 3 Pawns on top of it? Should this be treated exactly the same as after trading a minor? Or is it slightly better or worse for the Pawn side?

With any other combination of two minors you have mating potential, but only barely so, and if the opponent has non-Pawn material it is very questionable if you will be able to annihilate that without jeopardizing your mating potential.
KBBKN is the only 2 vs 1 minor case that is theoretically won (although also here the 50-move rule makes that very doubtful). How about KBBKNP? I suppose this must be better than KBBKBP, because that would even be a draw if you manage to get the Pawn without trading the minor. Of course the natural scoring would award the B-pair here, but that does not explain the difference between a defending B or N.

In general for Pawnless end-games the rule of thumb could be used that this halves your advantage (e.g. KRBKR, KRNKR = +1.5, just at the draw limit, and KRKB, KRKN would be +1, or preferably even somewhat lower). The problem is always how to evaluate it with extra Pawns.
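
As a rough illustration of the halving rule of thumb above, here is a small Python sketch. The piece values are assumptions (only B = 325 and P = 100 follow from the numbers quoted earlier in the thread), and pawnless_score is a hypothetical helper, not engine code.

```python
# Assumed centipawn values; N, R and Q are placeholders.
VALUE = {"P": 100, "N": 325, "B": 325, "R": 500, "Q": 950}


def material(pieces):
    """Additive material score of a piece string such as 'RB'."""
    return sum(VALUE[p] for p in pieces)


def pawnless_score(white, black):
    """Naive material difference, halved when neither side has a Pawn."""
    diff = material(white) - material(black)
    if "P" not in white + black:
        # int() truncates toward zero, so the score stays symmetric.
        diff = int(diff / 2)
    return diff


print(pawnless_score("RB", "R"))   # KRBKR
print(pawnless_score("R", "B"))    # KRKB
print(pawnless_score("RP", "B"))   # with a Pawn: no halving
```

With these values KRBKR comes out around +1.6 and KRKB a bit under +1, in line with the numbers in the post. The case with extra Pawns is exactly the part the rule does not cover.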
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: End-game evaluation

Post by Evert »

hgm wrote:I am getting to the point now to fill Spartacus' material table (which so far only held the B-pair bonus). One of the most important tasks of the material table would be to correct the additive piece-value score for drawish end-games, so that the engine no longer prefers (say) KBK (+325) over KPP (+200).
Interesting. I'd been wondering what the advantage of having a material table would be (not having one, nor much of an idea of what to fill it with), but this is fairly obvious.
My question was what end-games need such a correction, and how large that correction should be. Obviously KBK, KNK and KBKB (like B) can be corrected to a hard zero, pruning further search. KNNK is trickier (like KBKN etc., but those don't really need a correction), because there can be checkmates and mate-in-1, and you want the engine to recognize those when it enters KNNK in such a position (e.g. after a forced capture in KNNKQ). So I guess I should evaluate those as zero, but not prune the search (or prune based on the 50-move counter being large enough). I could then apply that same routine to KBKB (unlike), KBKN, KNKN, KRKR (to effect the pruning) if I have it anyway.
I think it's useful to distinguish two situations: not enough material remaining to deliver mate (in which case you can claim a draw and terminate the search immediately) and not enough material to force mate against optimal play. The latter is a lot harder to evaluate and almost certainly should not terminate the search. What I do in that case is discard the material evaluation completely (it's irrelevant) and just return PSQ evaluation for the pieces (maybe even only for the kings), resulting in a small draw-ish score.

I'm afraid that's not much help otherwise though...
OliverUwira
Posts: 170
Joined: Mon Sep 13, 2010 9:57 am
Location: Frankfurt am Main

Re: End-game evaluation

Post by OliverUwira »

Hello H.G.,

here's my two cents:
hgm wrote:Other end-games that need special treatment seem to be:
Strong side has no mating potential:
KBKP, KNKP (corrected to slightly negative ~-0.1)
KBKPP, KNKPP (~ -0.4 ?)
KBKPPP, KNKPPP (~ -0.7 ?, but not as much as the natural score when the Pawn side had 4 Pawns)
I'd rather set the stronger side's score to zero and then do a passed pawn eval for the weaker side. The farther the pawns have come, the more dangerous the situation is. A fixed correction penalty doesn't feel right.
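
A sketch of what that could look like, in Python. Since the minor side has no Pawns, every enemy Pawn is passed, so the score is driven by Pawn advancement alone. The bonus table and the function name are invented placeholders, not tuned values.

```python
# Placeholder bonus (centipawns) by number of ranks a Pawn has advanced
# from its start square: index 0 = still on its 2nd rank, 5 = on its 7th.
ADVANCE_BONUS = [0, 5, 10, 25, 60, 120]


def minor_vs_pawns_score(pawn_ranks):
    """Score from the minor-piece side's point of view.

    pawn_ranks holds the relative rank (2..7) of each enemy Pawn, counted
    from that Pawn's own side. The minor itself contributes nothing.
    """
    return -sum(ADVANCE_BONUS[rank - 2] for rank in pawn_ranks)


print(minor_vs_pawns_score([2]))      # single Pawn at home
print(minor_vs_pawns_score([3, 5]))   # two Pawns, one well advanced
```

A real version would presumably also look at King and minor distance to the Pawns, which this sketch ignores.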
hgm wrote:KNNK... are problematic. They need a huge correction, and KNNKP could actually be a light advantage for the Knights (although I understand that due to the 50-move rule this will almost always be a draw).
You might want to add knowledge of the Troitzky line. Together with bonuses for the king's position (the closer to the edge, the better), KNNKP should be easy to evaluate.

http://en.wikipedia.org/wiki/Two_knight ... itzky_line

Also check the section about the second Troitzky line (further down in the article) - this line identifies the endings that are winnable in 50 moves no matter where the kings are.
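
For illustration, a Python sketch of a first-Troitzky-line test for two white Knights against a black Pawn. The squares (a4, b6, c5, d4, e4, f5, g6, h4) are quoted from memory of the usual statement of the line, so please check them against the article; the function name is hypothetical, and the test assumes the Pawn is securely blockaded by a Knight.

```python
# Furthest-advanced rank per file at which a blockaded black Pawn is still
# on the Troitzky line (assumed squares: a4 b6 c5 d4 e4 f5 g6 h4).
TROITZKY_RANK = {"a": 4, "b": 6, "c": 5, "d": 4, "e": 4, "f": 5, "g": 6, "h": 4}


def on_or_behind_troitzky_line(pawn_square):
    """True if a black Pawn on pawn_square (e.g. 'c6') has not advanced
    past the line. Black Pawns move towards rank 1, so 'behind' means a
    higher rank number."""
    file, rank = pawn_square[0], int(pawn_square[1])
    return rank >= TROITZKY_RANK[file]


print(on_or_behind_troitzky_line("a4"))
print(on_or_behind_troitzky_line("a3"))
```

This ignores the 50-move rule; the second line mentioned above would need its own table.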
hgm wrote: But what about
KNNKPP
KNNKPPP
KNNKPPPP
...?
Is the Pawn side always better off in those? I guess that with 4 Pawns the Knights might start to face a serious challenge. Furthermore, two Knights is so much that the weak side now also can have a minor, or even a Rook!
KNNKB and KNNKN seem totally drawish (treat like KRKR?)

What if the weak side has 1, 2 or 3 Pawns on top of it? Should this be treated exactly the same as after trading a minor? Or is it slightly better or worse for the Pawn side?
I tend to share the idea that 4 pawns will pose the knights some problems. Three might be tough if the pawns are spread far apart. Again a fixed penalty doesn't seem to be right.
hgm wrote: With any other combination of two minors you have mating potential, but only barely so, and if the opponent has non-Pawn material it is very questionable if you will be able to annihilate that without jeopardizing your mating potential.
KBBKN is the only 2 vs 1 minor case that is theoretically won (although also here the 50-move rule makes that very doubtful). How about KBBKNP? I suppose this must be better than KBBKBP, because that would even be a draw if you manage to get the Pawn without trading the minor. Of course the natural scoring would award the B-pair here, but that does not explain the difference between a defending B or N.
KBBKNP: I think those endings are reasonably winnable because the knight has serious zugzwang issues when it has to cling onto the pawn. KBBKN has good practical winning chances if you separate the knight from the king (zugzwang again).

KBBKBP: Dead draw if the weaker side's bishop has legal moves.
hgm wrote: In general for Pawnless end-games the rule of thumb could be used that this halves your advantage (e.g. KRBKR, KRNKR = +1.5, just at the draw limit, and KRKB, KRKN would be +1, or preferably even somewhat lower). The problem is always how to evaluate it with extra Pawns.
Maybe one could load the table with some special numbers that indicate some specialized eval function. E.g. penalties range from -x to x, so you could index specializations with x+1, x+2 etc...
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: End-game evaluation

Post by jdart »

You are free to look at how I do it (http://www.arasanchess.org). I have two routines to adjust the raw material score, one for the case of pawnless endgames and one for positions with pawns. I don't claim the values I use are optimal but they are reasonably close I think.

If the position cannot be won for either side (like KB vs KB) I will also detect it as a draw during the search and cut off early.

Re piece vs pawns - it is better to have the piece usually, and this advantage increases in the endgame. Generally pawns are better only if very advanced and your scoring should already take care of that. But if the side with the piece has no pawn I don't give a bonus because he can't mate: and I give less of a bonus with few pawns because there's a risk you will end up with a single piece.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: End-game evaluation

Post by jdart »

Another remark. Some of the cases you mention are rare enough that they probably aren't worth special coding for.
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: End-game evaluation

Post by hgm »

Actually you can put a lot in material tables:

1) Piece interactions (Bishop pair, but also devaluation of Queen-class pieces by presence of opponent minors, devaluation of Pawns when you have a piece minority).
2) Trading gradients (encouraging game-phase progress when you have a winning advantage, say > +2).
3) Flagging special cases (tablebase / bitbase probing, recognizers).

The way I am setting it up now considers light and dark Bishops as different pieces (so the Bishop index runs 0-3 instead of 0-2), so I can also put discounts on advantages with unlike Bishops. The table contains bytes (and this already makes it occupy 2.6MB, as I have to account for up to 10 Pawns and up to 3 'Queens' for Capablanca), and I plan to assign the highest codes (say 240-255) to special evaluation routines (e.g. indexing a table of functions, or used in a switch statement), and the rest (value-120) as an additive score.
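
To make the byte encoding concrete, a small Python sketch of how one table entry could be decoded. The constants follow the numbers given above (240-255 and the offset of 120); the function name and the returned tuple format are assumptions, and the unit of the additive score is left open.

```python
SPECIAL_BASE = 240   # codes 240..255 select a special evaluation routine
SCORE_OFFSET = 120   # all other codes store (score + 120)


def decode_entry(code):
    """Split a material-table byte into its kind and payload.

    Returns ("special", routine_index) for the top 16 codes and
    ("additive", signed_score) for everything else.
    """
    if not 0 <= code <= 255:
        raise ValueError("material-table entries are single bytes")
    if code >= SPECIAL_BASE:
        return ("special", code - SPECIAL_BASE)
    return ("additive", code - SCORE_OFFSET)


print(decode_entry(120))   # neutral entry
print(decode_entry(243))   # fourth special routine
```

The routine index could then pick from a list of functions, which is the table-of-functions variant; the additive range works out to -120..+119.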

Fruit handles this purely by multipliers: each side gets assigned a multiplier (usually 1) based on the material, and if it is ahead, the advantage is multiplied with it. Multipliers < 1 occur when you have no mating potential in your pieces, and no Pawns, or a single Pawn that the opponent can dispose of by sacrificing a piece.
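
Schematically, in Python (a sketch of the idea as described here, not a transcription of Fruit's code; scaled_score and its arguments are hypothetical names):

```python
def scaled_score(naive_score, white_mul, black_mul):
    """Apply the multiplier of whichever side is ahead.

    naive_score is the additive evaluation from White's point of view;
    white_mul / black_mul are the per-side material multipliers
    (1 by default, smaller when that side's winning chances are poor).
    """
    if naive_score > 0:
        return naive_score * white_mul
    if naive_score < 0:
        return naive_score * black_mul
    return 0


print(scaled_score(325, 0, 1))      # White ahead but cannot mate
print(scaled_score(-200, 0, 1))     # Black ahead: only Black's multiplier counts
print(scaled_score(400, 0.25, 1))
```

Note that a side's multiplier only matters when that side is ahead, so equality stays at exactly zero.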

I am trying to see a pattern in this that could be generalized to other variants, e.g. Spartan Chess. Each code could indicate a routine that sets a certain combination of multipliers other than (1, 1); there are not that many. Or it could do some special evaluation (e.g. for KPK, KQKP or KBPK).
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: End-game evaluation

Post by kbhearn »

Many of the pawn cases you mention should hinge most of their advantage on the pawn eval rather than a fixed advantage for the pawns. KNvKPP should probably be viewed as very close to 0 for instance unless the pawns are very fast/far advanced and neither the king nor the knight is nearby to blockade. By 3 pawns it should probably constitute an actual small advantage i would think as 3 pawns can be very difficult for a minor and king to stop even if they're not very far advanced to begin with.

I would think KNNKPP should probably still be slight advantage to the knights (blockading one pawn while winning the other should often be possible). 3 or 4 pawns should probably be 0ish for the knights unless the pawn eval has additional dangers (as saccing one of the knights for 2 pawns shouldn't be too hard - with 3 it's still thinkable that you might be able to blockade one of the pawns and win the other two, so 3 maybe slightly +, 4 slightly -)
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: End-game evaluation

Post by PK »

another set of drawish situations:

KR vs KB/KN(p) (there are some practical winning chances, so only dividing score by a constant is applicable)
KRB/KRN vs KR (as above)
KQB/KQN vs KQ (almost certain draw)

all of them get tricky in presence of pawns. Glass evaluates these configurations, but the values it uses are a bit off (I have seen Glass going into that kind of ending where a simpler defense without sacrificing material was perfectly viable).
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: End-game evaluation

Post by hgm »

I digested the Fruit end-game evaluation, and it seems to be based on the following general principles:

You discount the score of the side that is ahead according to naive additive evaluation through a multiplier < 1, to squash unjustified optimism. This is needed in basically two general cases:
  A) No Pawns, or 1 Pawn with an enlarged danger of losing it
  B) Color-bound pieces of opposite color
Case (A) reflects that Chess basically is a game of Pawns, decided by how efficiently you can guide them to promotion, forcing the opponent to sacrifice a piece for it. In case (B) each side can be so much ahead on its own color that there is no hope to overcome its defence there.

The system of multipliers is a more logical one than giving material-based additive bonuses, because of the high variability of the Pawn value, which can range from 50 to 250 cP. This makes it difficult to decide who is ahead for a given material composition, so that an additive discount might reward the side that is ahead, and shift the point of natural equality to a non-zero value, leading to inaccuracies in the decision to seek/avoid draws (through repetitions etc.).

Cases that need discounts, ranked according to severity:
  1) Without Pawns and no (forced) mating potential in your piece combination, you get of course multiplier 0.
  2) In Pawnless end-games the advantage of a single minor or the exchange is often not enough (e.g. KBNKB, KRKB, KRKN, KRBKR, KRNKR are all general draws). Fruit multiplies these by 1/8 (even if your opponent does have Pawns, provided you are still ahead).
  3) When you are only 1 Pawn away from (1), and the opponent has a piece he can sacrifice for that last Pawn, you get multiplier 1/4.
  4a) When the opponent can convert to (2) by sacrificing a piece for your last Pawn, Fruit discounts by 1/2.
  4b) It also does this with unlike Bishops (the only color-bound piece in orthodox Chess) when you are not more than 2 Pawns ahead (presumably because the Bishop on its own color has the advantage over up to 2 Pawns when it does not have to deal with the other Bishop as well).
In Fruit the picture is complicated by a few refinements particular to orthodox Chess: KNN is not only an exceptionally valuable combination without forced mating potential, but suffers from the complication that KNNKP can be a forced win. So you cannot discount to 0 for NN when the opponent still has Pawns, because you can be very much ahead in power, even against 4 Pawns or minor + Pawn, and it is not inconceivable that the Knights could force their way into KNNKP. So Fruit uses 1/16 in this case.
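
The ranking above, including the 1/16 for KNN against Pawns, can be summarized as a lookup. This is only a Python paraphrase of the list in this post: the case names are invented labels, and taking the most severe multiplier when several cases apply is my assumption, not something verified against Fruit.

```python
from fractions import Fraction

# Multiplier for the side that is ahead, keyed by (hypothetical) case labels.
DISCOUNT = {
    "no_pawns_no_mating_potential": Fraction(0),               # case 1
    "knn_opponent_has_pawns": Fraction(1, 16),                 # KNN refinement
    "pawnless_minor_or_exchange_ahead": Fraction(1, 8),        # case 2
    "last_pawn_sac_leads_to_case_1": Fraction(1, 4),           # case 3
    "last_pawn_sac_leads_to_case_2": Fraction(1, 2),           # case 4a
    "unlike_bishops_max_two_pawns_ahead": Fraction(1, 2),      # case 4b
}


def multiplier(cases):
    """Smallest multiplier among the applicable cases, or 1 if none apply."""
    return min((DISCOUNT[c] for c in cases), default=Fraction(1))


print(multiplier([]))
print(multiplier(["last_pawn_sac_leads_to_case_2",
                  "unlike_bishops_max_two_pawns_ahead"]))
```

Deciding which cases apply to a given material composition is the actual work; the table only records the resulting factors.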

Another refinement is that minor + {Pawn that can be sacrificed away}, which is already bleak, has virtually no prospects at all if the opponent King is blocking the path of the Pawn on a square that a color-bound minor of the attacker cannot cover. Fruit tests for this, and increases the discount from 1/4 to 1/16 in this case. I guess in general this should also be related to how difficult it is to catch the remaining piece of the defender, which he must preserve to avoid the zugzwang that would otherwise force the King to leave his invulnerable defensive position. B and N are way too strong to be cornered by K+B. But it is not inconceivable that other variants have stronger color-bound pieces, capable of hunting down a piece of the defender weaker than N or B. So if in KXPKY the endgame KXKY would be such that you can force baring of the weak side, even though KXK still would be a draw, the favored defensive position in KXPKY might not be worth much.

Finally there are some exceptions for known end-games: KBBKN is an exception to the rule that a single minor ahead is not enough to win without Pawns. (But Fruit still gives it 1/2, probably because the 50-move rule often makes this a draw anyway.) Also the well-known KBPK and KNPK draws with edge Pawns, and KQKP with 7th-rank Pawns have special recognizers that could force a draw score.

An exception that Fruit does not make, but would be logical, is to discount KBPPKNN (where the B side could be ahead with advanced Pawns), because it has only two Pawns, which could be sacrificed away by the two Knights comparatively easily. So this seems to belong in (4a). To me it is also doubtful whether KNNPKB would deserve a discount as bad as KNPKB; it still is true that a B for P sac would reduce the leading side to the useless KNN, but with two Knights it should not be that difficult to screen the Pawn from attack, something that could be hard with only a single minor. The high discount invites premature conversion of KNNPPKBN to KNNPKB by a Knight sac (N+P ahead discounted by 1/4 is +1, thus lower than an undiscounted PP of +2 even without advanced passers). A discount of 1/2 on KNNPPKBN because both Pawns can be sacrificed away with impunity by the BN would also help suppress this tendency somewhat (but probably not enough all by itself).
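
The arithmetic behind that last point, worked out in Python. Piece values are assumptions (P = 100, N = B = 325, ignoring any positional terms), and the helper names are made up.

```python
from fractions import Fraction

VALUE = {"P": 100, "N": 325, "B": 325}  # assumed centipawn values


def lead(strong, weak):
    """Naive material lead of the strong side."""
    return sum(VALUE[p] for p in strong) - sum(VALUE[p] for p in weak)


def discounted(strong, weak, mul):
    """Material lead after applying the strong side's multiplier."""
    return lead(strong, weak) * mul


knnppkbn = discounted("NNPP", "BN", Fraction(1))          # undiscounted: two Pawns up
knnpkb = discounted("NNP", "B", Fraction(1, 4))           # after the Knight sac
knnppkbn_half = discounted("NNPP", "BN", Fraction(1, 2))  # with the proposed 1/2

print(float(knnppkbn), float(knnpkb), float(knnppkbn_half))
```

Without the extra discount, the sac drops the NN side's score from +2 to about +1, so the defender goes for it. With 1/2 on KNNPPKBN the two scores are nearly equal, which removes most of the incentive but leaves little margin.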
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: End-game evaluation

Post by hgm »

The consequences of color-binding in general are actually quite interesting. The strategy to set up defense against a color-bound piece (+Pawns) on the color where it cannot get is generally applicable, and not limited to unlike color-bound pieces. But in orthodox Chess this is not so noticeable, because a Knight is a color-alternator, and thus needs to be on one color to exert influence over the other, and can be harassed there. A Rook can access both colors, but it is easy to stay on the one not accessible to the opponent. Indeed a Rook has not much trouble defending against BPP, just as an unlike B would have, but that is kind of expected, because a Rook is assumed to be worth two Pawns more than a Bishop.

In Spartan Chess the two lightest pieces of the Spartans (Captain and Lieutenant) have both color-changing and color-preserving moves, and both have a value close to that of Knight/Bishop. So in Spartan Chess you can have a situation where the Persians, with BPP, have difficulty overcoming a defence by a single C or L staged purely on the other color than the Bishop, the Captain only using its (2,0) moves, and the Lieutenant only the (1,1) and (2,2) moves, after they switched to the right color. So these end-games could have the same drawishness as that of unlike Bishops in orthodox Chess, easily fencing off a majority of 2 Pawns.

For the Spartans, and in Berolina Chess, which uses the same Pawns that capture straight and move diagonally, another interesting phenomenon occurs. Pawns become effectively color-bound (as long as they don't capture), and you could get into positions where the color-bound Bishop cannot hinder them at all, because they are on the other color. They just march on to promotion. So the concept 'bad Bishop' can take on dramatic dimensions, the Bishop becoming virtually worthless as a defender (other than being a sink of moves to avoid zugzwang).
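
A tiny Python sketch of the color test this implies. It assumes the Pawn never captures (so it only makes its diagonal, color-preserving moves); both function names are hypothetical.

```python
def square_colour(square):
    """0 for a dark square, 1 for a light square (a1 is dark)."""
    file = ord(square[0]) - ord("a")
    rank = int(square[1]) - 1
    return (file + rank) % 2


def bishop_can_interfere(bishop_square, pawn_square):
    """A non-capturing Berolina-type Pawn stays on the colour it starts on,
    so a Bishop on the other colour can never attack it or any square on
    its path."""
    return square_colour(bishop_square) == square_colour(pawn_square)


print(bishop_can_interfere("c1", "d4"))   # dark Bishop, Pawn on dark squares
print(bishop_can_interfere("f1", "d4"))   # light Bishop, Pawn on dark squares
```

An evaluation could use such a test to devalue the defending Bishop when all enemy Pawns are on the other color.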