End-game evaluation

Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: End-game evaluation

Post by Evert »

hgm wrote: In Spartan Chess the two lightest pieces of the Spartans (Captain and Lieutenant) have both color-changing and color-preserving moves, and both have a value close to that of Knight/Bishop. So in Spartan Chess you can have a situation where the Persians, with BPP, have difficulty overcoming a defence by a single C or L staged purely on the other color than the Bishop, the Captain only using its (2,0) moves, and the Lieutenant only the (1,1) and (2,2) moves, after they switched to the right color. So these end-games could have the same drawishness as that of unlike Bishops in orthodox Chess, easily fencing off a majority of 2 Pawns.
Sjaak has some startup-logic to determine whether pieces are colour bound and it has a term for them in the evaluation. I think it mainly uses it to give a pair bonus, but I've certainly thought about a few generalised endgame evaluation terms (mainly, keep the king away from the corner of a colour bound piece, but also considering an edge pawn with a promotion square on the colour of the piece to be slightly better).
I don't remember whether I took the Lieutenant's colour-changing ability into account or not. I think I deliberately didn't, because I thought a pair bonus would be good for them, but of course I didn't have the resources to test this idea.
I'm very interested in the idea of colour-weakness, and I think I could make it an important evaluation term in Sjaak (maybe not Jazz, where I can use much more game-specific evaluation terms), but I don't seem to have found a form for the evaluation term that actually works...
For the Spartans, and in Berolina Chess, which uses the same Pawns that capture straight and move diagonally, another interesting phenomenon occurs. Pawns become effectively color-bound (as long as they don't capture), and you could get into positions where the color-bound Bishop cannot hinder them at all, because they are on the other color. They just march on to promotion. So the concept 'bad Bishop' can take on dramatic dimensions, the Bishop becoming virtually worthless as a defender (other than being a sink of moves to avoid zugzwang).
The Berolina pawns certainly seem to become dangerous passers much more easily. I find them hard to evaluate in an ending.
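The start-up check for colour-bound pieces mentioned above can be sketched roughly as follows. This is only an illustration, not Sjaak's actual code: it assumes pure leapers on an empty 8x8 board, and the helper names (symmetric_leaps, reachable_squares, is_colour_bound) are made up for the example.

```python
from collections import deque

def symmetric_leaps(*steps):
    """Expand (a, b) steps into all rotations and reflections."""
    out = set()
    for a, b in steps:
        for x, y in ((a, b), (b, a)):
            for sx in (1, -1):
                for sy in (1, -1):
                    out.add((x * sx, y * sy))
    return out

def reachable_squares(leaps, start=(0, 0), size=8):
    """Flood-fill the squares a leaper can reach on an empty board."""
    seen = {start}
    queue = deque([start])
    while queue:
        f, r = queue.popleft()
        for df, dr in leaps:
            nf, nr = f + df, r + dr
            if 0 <= nf < size and 0 <= nr < size and (nf, nr) not in seen:
                seen.add((nf, nr))
                queue.append((nf, nr))
    return seen

def is_colour_bound(leaps, size=8):
    """A piece is colour bound if it never reaches both square colours."""
    colours = {(f + r) % 2 for f, r in reachable_squares(leaps, size=size)}
    return len(colours) == 1

ELEPHANT = symmetric_leaps((1, 1), (2, 2))   # the E discussed in this thread
CAPTAIN = symmetric_leaps((1, 0), (2, 0))    # Spartan Captain
CAPTAIN_20_ONLY = symmetric_leaps((2, 0))    # Captain restricted to its (2,0) moves
```

With these definitions the Elephant and the (2,0)-only Captain come out as colour bound, while the full Captain does not, which matches the defensive set-up described in the quote. Move-only or capture-only steps (as for the Lieutenant) would need separate move sets.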
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: End-game evaluation

Post by mjlef »

Two of the classic rules used in many programs are these:

1. If side X has no pawns, it needs to be a rook of material or more ahead to win (example: BN vs B is a likely draw). One exception is RR vs NB, which is often a win for the rook side.

2. If side X has a single pawn, and the other side a minor piece, see if rule 1 applies after the apparent losing exchange (example: NP vs B is usually a draw).

Now had a material imbalance table in it way back in the 1980s. Of course, it was a lot smaller and only started working when each side was down to 1 Q, 1 R, 1 B, 1 N and at most 3 pawns. This was because in a world where you only had 640 kbytes for everything, there was not much room for a bigger table.

Note that NN vs nothing is a draw, so you need to check for this in the above two rules.
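A minimal sketch of the two rules plus the two exceptions named above. The piece values and function names are assumptions for illustration, only R, B and N are handled, and the weaker side's pawns are ignored:

```python
# Assumed material values in pawns; only R, B, N are covered by this sketch.
VALUE = {"R": 5, "B": 3, "N": 3}

def material(pieces):
    """Sum piece values, ignoring pawns."""
    return sum(VALUE[p] for p in pieces if p != "P")

def pawnless_likely_draw(strong, weak):
    """Rule 1: without pawns you need at least a rook of material extra."""
    s, w = sorted(strong), sorted(p for p in weak if p != "P")
    if s == ["N", "N"] and not w:
        return True                      # NN vs nothing cannot force mate
    if s == ["R", "R"] and w == ["B", "N"]:
        return False                     # RR vs NB is often a win
    return material(s) - material(w) < VALUE["R"]

def likely_draw(strong, weak):
    """Apply rule 1, or rule 2 when the stronger side has a single pawn."""
    pawns = strong.count("P")
    if pawns == 0:
        return pawnless_likely_draw(strong, weak)
    if pawns == 1:
        minor = next((p for p in weak if p in "BN"), None)
        if minor is not None:
            # Rule 2: the defender gives up a minor for the last pawn.
            s = [p for p in strong if p != "P"]
            w = list(weak)
            w.remove(minor)
            return pawnless_likely_draw(s, w)
    return False
```

For example, likely_draw("BN", "B") and likely_draw("NP", "B") both report a likely draw, while likely_draw("RR", "BN") does not.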
hgm
Posts: 27793
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: End-game evaluation

Post by hgm »

Yes, there are exceptions. KBBKN is also often won. (Without the 50-move rule it would be a general win.)

But most of the time you need to be a major piece ahead in order to win, i.e. one that can deliver checkmate against a bare King. Such a piece can be quite a bit weaker than a Rook, though:

Of all fully symmetric pieces with 8 move targets the Knight is in practice the strongest. But there are two amongst those that do have mating potential, despite the fact that their middle-game value is less than a Knight: the non-royal King (a.k.a. Man), and the (1,0)+(2,0) leaper ('Captain' in Spartan Chess). I did some tablebases with those, and usually they seem good for a win.

1) KCN-KN -> won
2) KCN-KB -> won
3) KCB-KN -> won
4) KCB-KB -> won
5) KMN-KN -> won
6) KMN-KB -> won
7) KMB-KN -> won
8) KMB-KB -> won
9) KCE-KN -> won
10) KCE-KB -> won
11) KME-KN -> won
12) KME-KB -> won
13) KEE-KN -> draw
14) KME-KD -> won
15) KDE-KD -> won (but takes ~65 moves)
16) KME-KM -> mostly draw
17) KDE-KM -> mostly draw
18) KQ-KM -> mostly won, but...


Here E designates the (1,1)+(2,2) leaper ('Modern Elephant'), the short-range equivalent of the Bishop. We see that in the end-game the E is slightly weaker than B: KEE-KN is generally drawn where KBBKN is a win. Yet in combination with the Captain, B, N and E can all beat both N and B (like as well as unlike, when this arises).

The results 9-12 are relevant to Spartan Chess: the Lieutenant there is an E with a color-changing non-capture added to it. My tablebase generator cannot handle that, because this extra move breaks the 8-fold symmetry (the Lieutenant can only do it sideways). But since it is upward compatible with E, an end-game that is already won with E will certainly be won with L instead. Similarly with the M: Spartan Chess has no M, but it has two Kings, and the Spartans can choose which of them they expose to check, so again this is upward compatible with having K+M. So it seems in Spartan Chess most combinations of light pieces can beat a single B or N.

Apparently having mating potential in the extra piece is the decisive factor here. Not so much that the pieces with mating potential are stronger. E.g. as defenders they are not necessarily better: KDEKD is generally won (albeit slowly), despite the fact that you are only a color-bound E ahead. Subduing KM is harder; M can be an excellent defender. This is evidenced by the fact that KQKM has a substantial number of draws. It seems mostly won, but it turns out that all positions where the K and M connect are in fact draws. It is only because the K and M are so easily forked by Q when they are far apart and undefended, and that it would take many moves to walk them towards each other, that the Q has so many wins. They are basically all tactical wins in 2 or 3 moves. The short-range D and E do not have the tremendous forking power of the Queen, so the defending M has much better opportunity to unite with its King.
hgm
Posts: 27793
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: End-game evaluation

Post by hgm »

Interesting for Spartan Chess is that when I give the Modern Elephant extra (1,0) non-capture moves in all 4 directions, two of them can actually overwhelm a Knight. The Spartan Lieutenant is the exact intermediate between this enhanced E and the basic E. When I take that average in another way, by using one enhanced E plus a basic one, the end-game is only marginally above criticality: eventually it is generally won, but the number of won positions per DTC bin hardly increases from the beginning, and at times falls back to a very low value, only to revive again, until finally, after ~200 moves, it explodes and exhausts all remaining positions.

So it is not clear whether two genuine Spartan Lieutenants will be able to beat the Knight. I guess I should really suppress the diagonal symmetry in the generator, and do the true calculation.
hgm
Posts: 27793
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: End-game evaluation

Post by hgm »

The explanation of course is that when you are a light piece ahead, but none of your remaining pieces has mating potential by itself, you cannot trade into a won end-game. You must win additional material, while trading becomes a weapon for the defense, which you have to dodge. E.g. with 2 vs 1, when you have BB or BN, you really must win the remaining defender without any compensation, and any trade is curtains. But if you have a Captain or Man amongst your two, you merely have to trade the other. A big difference!
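As a rough sketch, the trading argument boils down to a test like the one below. The set of pieces with mating potential follows what is said in this thread (Q and R, plus Captain C and Man M); the function name is made up, and colour-binding and tactics are ignored:

```python
# Pieces that can force mate against a bare King on their own,
# per the discussion above (Q, R, Captain, Man). B, N and E cannot.
MATING_POTENTIAL = {"Q", "R", "C", "M"}

def can_win_by_trading(attackers, defenders):
    """True if one-for-one trades can leave a mating piece on the board.

    attackers/defenders are strings of piece letters without Kings or Pawns.
    """
    if len(attackers) <= len(defenders):
        return False  # nothing left over after trading everything
    return any(p in MATING_POTENTIAL for p in attackers)
```

So can_win_by_trading("BN", "N") is False (any trade is curtains), while can_win_by_trading("CN", "N") is True (trade the Knight, keep the Captain).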

When a Rook-class piece is involved, the advantage is typically the exchange. (In Fruit's heuristic, a Rook is counted as 2 minors, so it also considers the exchange as being one minor ahead.) With R vs B or N you also have the problem that you have nothing to trade, because all your force is in a single piece, and must win the defender for naught. With RB or RN vs R you have B or N to trade, but you must 'trade' it for a superior piece, which in general is quite difficult (or the piece would not be very superior...).

Another remark on BN vs N is that you must win the N, while trading is fatal, which makes your own N useless for attacking the hunted N, as such attacks between equal piece types will be reciprocal. Replacing the N of the attacker by another light piece, e.g. the (1,0)+(2,2) leaper featuring in the Clobberers army of Chess with Different Armies, many more positions are won (though still quite far from nearly all). With BN vs B or BB vs B you suffer the same problem with respect to your B, while with NN you are doomed from the beginning for lack of mating potential. The only orthodox combination of 2 vs 1 minor that does not suffer from any of these problems is thus BB vs N, which indeed is a win. (The B-pair bonus probably also helps to make this possible; if you don't take for granted that the B are on unlike color, of course KBBKN is also only 50% won...)
hgm
Posts: 27793
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: End-game evaluation

Post by hgm »

I found a few other interesting Spartan end-games: Q vs 2 light pieces. In Chess this would be KQKBB, KQKBN and KQKNN. The Spartan equivalents are KQKCC, KQKLC, and KQKLL. Now the statistics of such end-games suggest they are won by the Queen, but this is mostly because the overwhelming majority of positions will have at least one of the light pieces unprotected, after which it will be an easy prey for the Queen through checks. So almost all wins in end-games like this are fast tactics. This eclipses what happens in the small but important minority of positions where all pieces are protected.

In KQKBN, for instance, there is a fortress draw, based on the position

[d]8/8/8/8/K2n4/7Q/1b6/1k6 w - - 0 1

This fortress is impenetrable, as the defending King can safely step between b3-a3-a2-b1-c1-c2, keeping the Bishop defended, while the minors lock out the attacking King at a safe distance. It does not even crumble when the attacker has an Amazon (= QN compound) instead of a Queen.
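For readers who want to check the diagram by program, here is a small sketch that decodes the piece placement of the FEN above and tests whether the defending King stands next to its Bishop. The helper names are made up, and only the placement field of the FEN is used:

```python
def parse_placement(fen):
    """Map square names (e.g. 'b2') to piece letters from a FEN string."""
    board = {}
    ranks = fen.split()[0].split("/")
    for i, row in enumerate(ranks):
        rank = 8 - i          # FEN lists rank 8 first
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)   # run of empty squares
            else:
                board["abcdefgh"[file] + str(rank)] = ch
                file += 1
    return board

def king_defends(board, king, piece):
    """True if the given king letter is adjacent to the given piece letter."""
    ksq = next(sq for sq, p in board.items() if p == king)
    psq = next(sq for sq, p in board.items() if p == piece)
    df = abs(ord(ksq[0]) - ord(psq[0]))
    dr = abs(int(ksq[1]) - int(psq[1]))
    return max(df, dr) == 1

fortress = parse_placement("8/8/8/8/K2n4/7Q/1b6/1k6 w - - 0 1")
```

Here fortress holds the black King on b1 and the Bishop on b2, so king_defends(fortress, "k", "b") is True.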

In Spartan Chess there are such fortresses too; with the usual substitutions L=B and R=C:

[d]8/8/8/8/4K3/3c1c2/7Q/4k3 w - - 0 1
[d]8/8/8/8/4K3/3r1r2/7Q/4k3 w - - 0 1

and

[d]K6Q/8/8/8/8/3l4/3c4/3k4 w - - 0 1
[d]K6Q/8/8/8/8/3b4/3r4/3k4 w - - 0 1

Another end-game is 2 light pieces vs Rook. In Chess this is usually trivially drawn, because all you have to do is sac the Rook for one of the minors, and the remaining minor will not have mating potential. In Spartan Chess both Captain and an extra King do have mating potential. When one of the light pieces is a Lieutenant, which does not have mating potential, the Rook can successfully employ the same strategy, and harass either the King or the light piece with mating potential. But when both the light pieces do have mating potential, it gets tricky. Every Rook sac is now fatal, and the attacking side can afford to ignore attacks on its light pieces, so you can only harass the King, which will be hiding behind its pieces.

Turns out that KRKCC is still a draw, but KRKKC is a win for the Spartans. (KRKKK would also be a loss for the Rook, but in Spartan Chess you cannot have more than two Kings.) Even with predesignated royal and non-royal Kings it would be a win for the Spartans, but the possibility to expose either of them to check when you still have two makes life of course a lot easier. Now the extra K is worth nearly a Rook even in the middle game, because of the 'delocalized royalty' it offers (the tactical middle-game value of a non-royal King would only be ~250cP). So it seems this high value, making it significantly better than other light pieces, is retained all the way to the end-game. (Remember KQKK is also a draw, while KQKL and KQKC are easy wins.)
hgm
Posts: 27793
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: End-game evaluation

Post by hgm »

The Fruit system does have a weak point:

With 1 minor vs 2 Pawns, it is conceivable the minor is ahead. And because it has no mating potential, Fruit gives it multiplier 0, which will result in a 0.00 score. The Pawns won't get a discount (multiplier 1), but that does not help them when they are behind.

Now 1 minor vs 1 Pawn will also suffer from this, as that makes it even more likely the minor is ahead. This is all fine when you are playing the minor, as it correctly tells you there is not a shred of hope to win. But when you are playing the Pawns, it cannot see that KNKP is a tad worse than KNKPP (or even KNKPPP with very poor Pawns), and will start playing purely random moves when a positive score is beyond the horizon. This will likely blunder the Pawns away without any reason, while fighting to keep the Pawns, moving your King to the center, and pushing the Pawns when safe could conceivably have brought positive scores within the horizon, and perhaps even a win. Especially against a fallible opponent...

So I guess a multiplier 0 is a bad idea, even when your chances for winning can be mathematically proven not to exist. Because it completely blinds the side that could still win, and prevents it from even trying.
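A toy calculation shows the blinding effect. The numbers are assumptions (Knight 325 cP, very poor Pawns at 100 cP each) and the scaling function is a simplified stand-in, not Fruit's actual code:

```python
KNIGHT = 325     # assumed value in centipawns
POOR_PAWN = 100  # assumed value of a very poor Pawn

def scaled(raw, mult_minor_side, mult_pawn_side=1.0):
    """Scale a raw score (from the minor's point of view) by the
    multiplier of whichever side is ahead."""
    return raw * (mult_minor_side if raw > 0 else mult_pawn_side)

# Raw scores for KNKP, KNKPP and KNKPPP from the Knight's point of view.
raw_scores = [KNIGHT - n * POOR_PAWN for n in (1, 2, 3)]

with_zero = [scaled(r, 0.0) for r in raw_scores]      # multiplier 0
with_eighth = [scaled(r, 0.125) for r in raw_scores]  # small non-zero multiplier
```

With multiplier 0 all three end-games collapse to the same score, so the Pawn side has no gradient telling it to hang on to its Pawns. With a small non-zero multiplier the ordering KNKP > KNKPP > KNKPPP (seen from the Knight) survives.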
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: End-game evaluation

Post by Desperado »

hgm wrote: The Fruit system does have a weak point:

With 1 minor vs 2 Pawns, it is conceivable the minor is ahead. And because it has no mating potential, Fruit gives it multiplier 0, which will result in a 0.00 score. The Pawns won't get a discount (multiplier 1), but that does not help them when they are behind.

Now 1 minor vs 1 Pawn will also suffer from this, as that makes it even more likely the minor is ahead. This is all fine when you are playing the minor, as it correctly tells you there is not a shred of hope to win. But when you are playing the Pawns, it cannot see that KNKP is a tad worse than KNKPP (or even KNKPPP with very poor Pawns), and will start playing purely random moves when a positive score is beyond the horizon. This will likely blunder the Pawns away without any reason, while fighting to keep the Pawns, moving your King to the center, and pushing the Pawns when safe could conceivably have brought positive scores within the horizon, and perhaps even a win. Especially against a fallible opponent...

So I guess a multiplier 0 is a bad idea, even when your chances for winning can be mathematically proven not to exist. Because it completely blinds the side that could still win, and prevents it from even trying.
Hi HG,

Well, indeed, multipliers aren't a good solution. This topic is one of the next points I want to work on. Instead of using multipliers, it makes more sense to me to use offsets for such cases.

In many cases 2 pawns can be stopped by a minor piece and the side with the minor can draw, but if there is an advantage, it must of course be for the side which has the pawns.

A minor against 3 pawns (without structural damage) can be very hard to defend, especially for a knight, and seems to be a clear advantage for the pawn side.

More than 3 pawns (without structural damage) is a clear advantage and will be won more often than it will be drawn.

One general problem I see (independent of adjustments with multipliers or offsets) is that somehow the search cannot handle abrupt changes in material weights.

As an example:

Let's say I don't have any material scaling and I start with the simple draw configurations like KNk, KBk ...

- A more accurate evaluation can lead to a much more optimistic or pessimistic search.
- It can produce wrong behaviour as in KNkp, where the side with the minor avoids capturing the pawn, and suddenly, through undesired null-move behaviour, the pawn gets promoted...

I think the difficulty is when to scale material values. The scaling should have something like "material continuity", so that selective search in particular can handle it well too. (Think of KNpkpppp vs KNkppp: trading a pair of pawns can cause a score jump from about +1 to -2.5 (scaled) without capturing a piece.) Now how sensitive will the search (pruning techniques) be? Difficult...

My guess is that it is not that important to scale only the special cases. (That will just randomize your engine a little (optimistic or pessimistic play), but will not necessarily affect playing strength.) It seems logical to me that the material configurations leading to the special cases must be scaled in the same direction, so you get a softer change. That's one of my tasks for the next weeks :-).

So, multipliers aren't the best solution, but at least they give a hint in the right direction.

cheers, Michael
hgm
Posts: 27793
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: End-game evaluation

Post by hgm »

Well, I think that a perfect system would need both offsets and multipliers. Originally my material table just implemented offsets. But the problem is the large variability of the Pawn value. In KBK or KNK you could simply cancel the base value of the piece (although that calls for pretty large offsets), and the minor positional bonus is small enough to not cause any problems. (E.g. if you go for an illusory +0.15, instead of a real +0.1 in a still very simple precursor end-game, your prospects for winning must have been pretty close to 0 anyway...)

But in KBKP the Pawn value can range from ~1 to 2.5. If you subtract 2 to make sure the Bishop is not even ahead against a very poor Pawn, the Pawn side would be ahead by +1.5 with a 7th-rank passer. While for the Pawn side this also is pretty much a dead draw, the only Pawn being threatened by a Bishop sac. Definitely not as good as, say KBPKNPP with only a 5th-rank Pawn, which could be close to +1 (for black). So it is OK if KBKP scores in favor of the Pawn, but when the Pawn value increases from 1 to 2.5, the score should more realistically go from 0 to 0.4 (i.e. multiplier 1/4, after an offset +2).

The case KBKPP would need an offset slightly above 1 (to make KBKPP better for black than KBKP, even when the Pawns are pretty poor), but the value of the Pawns could in principle rise to +5, i.e. +3 against the offset Bishop value, which is again rather high for something that can be a hopeless draw. (One unprotected 7th-rank passer being stopped by the Bishop, the other, King-protected 7th-rank passer blocked by the King). So an offset +1.2 combined with multiplier 1/2 seems more realistic here.
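The arithmetic of the last two paragraphs can be written out as a small sketch. It assumes a Bishop value of 300 cP and works from the Pawn side's point of view; the function name is made up:

```python
BISHOP = 300  # assumed base value in centipawns

def pawn_side_score(pawn_value, offset, multiplier, bishop=BISHOP):
    """Score for the Pawn side after taking `offset` off the Bishop
    and then scaling the difference by `multiplier`."""
    return (pawn_value - (bishop - offset)) * multiplier

# KBKP with offset +2: the Pawn value ranges from about 1 to 2.5.
kbkp_offset_only = pawn_side_score(250, 200, 1)     # 150 cP, too optimistic
kbkp_poor = pawn_side_score(100, 200, 0.25)         # 0 cP
kbkp_best = pawn_side_score(250, 200, 0.25)         # 37.5 cP, roughly +0.4

# KBKPP with offset +1.2: the Pawns can be worth up to about 5.
kbkpp_offset_only = pawn_side_score(500, 120, 1)    # 320 cP
kbkpp_best = pawn_side_score(500, 120, 0.5)         # 160 cP
```

The offset alone fixes the score for poor Pawns but leaves it far too high for advanced ones; the multiplier compresses that range.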


For now I solve the problem by never using a multiplier 0, except in the cases of a trivial draw (e.g. KNNKB), where blundering away material of the side that is behind indeed has no consequences whatsoever. That means I can use a pretty simple system, where the material table just has to flag which side(s) have to be discounted (when ahead), and the actual multiplier is then purely determined by the number of Pawns that side has: Pawnless -> x 1/8, 1 Pawn -> x 1/4, more Pawns -> x 1/2. That allows me to flag both sides for a discount in, say, KBPPPPKBPP, with unlike Bishops. Of course for most material combinations with 2 or more Pawns a discount is undesirable, but then I simply do not flag those.
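The flag-plus-Pawn-count scheme could look something like this. It is a sketch with invented names, with scores in centipawns from White's point of view and the two flags standing in for what the material table would supply:

```python
def discount(n_pawns):
    """Multiplier for a flagged side, chosen by its number of Pawns."""
    if n_pawns == 0:
        return 1 / 8
    if n_pawns == 1:
        return 1 / 4
    return 1 / 2

def apply_discount(score, white_pawns, black_pawns, flag_white, flag_black):
    """Discount the score only for a flagged side that is ahead."""
    if score > 0 and flag_white:
        return score * discount(white_pawns)
    if score < 0 and flag_black:
        return score * discount(black_pawns)
    return score
```

For KBPPPPKBPP with unlike Bishops both flags would be set, so apply_discount(200, 4, 2, True, True) gives 100.0 and apply_discount(-80, 4, 2, True, True) gives -40.0.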

The trivial draws are flagged by another code in the material table, because apart from the discounting they can break off the search after the 50-move counter (and ply level!) reaches a value that makes it safe to do so. The number of ply for pruning the node can depend on the material combination (immediate in KK, KNK, 1 in KBKN, 5 in KRKR), and if not pruned, the score is a hard 0.
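One possible reading of this cutoff, as a sketch: the node is pruned once both the 50-move counter and the ply level have reached the threshold stored for the material combination. The table entries are the four examples given above; the dictionary layout and function name are assumptions.

```python
# Ply thresholds for trivial draws, taken from the examples in the post.
PRUNE_PLY = {"KK": 0, "KNK": 0, "KBKN": 1, "KRKR": 5}

def trivial_draw_cutoff(material_key, fifty_counter, ply):
    """Return True if the node can be cut off with a draw score.

    Assumes the cutoff is safe once both the reversible-move counter
    and the search ply have reached the per-material threshold.
    """
    threshold = PRUNE_PLY.get(material_key)
    if threshold is None:
        return False  # not flagged as a trivial draw
    return fifty_counter >= threshold and ply >= threshold
```

When the function returns False for a flagged combination, the search continues but the static evaluation would still be a hard 0, as described above.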

This does away with the subtlety with which Fruit treats the KNNK* combination (1/16 when the opponent has Pawns, 0 otherwise), by grouping them all in the Pawnless 1/8 category. (KNNK, KNNKB and KNNKN of course fall in the trivial-draw category, and other opponent combinations where you still can be ahead all have Pawns, so NN could in theory still win.) In compensation I can do something that Fruit doesn't do, namely discount KBPP or KNPP against KNN or KNNP (with 1/2, because of 2 Pawns), because black can afford to sac both N.
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: End-game evaluation

Post by Desperado »

Interesting stuff :) ...

* Just a word on positional bonuses.

You can also scale the complete evaluation (not only the material value), and store a flag in your material table when you want to do so. If you think of opposite-colored bishops, for example, even with 2 connected passers ahead it will often be a draw (if they can be easily blocked, for example).

There are of course a lot of ideas which can be tried if the material scaling and the positional scaling for a material combination are separated.

* How do you handle the exchange? Especially when you classify "being ahead" (I am thinking of constellations like KRPP - KBNPP). My feeling and my observations are that the side with the rook has the better play in many cases.