draw endgame scaling

brtzsnr · Post by **brtzsnr** » Sun Dec 04, 2016 4:04 pm

Hi!

I'm trying to add some logic to my engine to handle drawish endgames better. For example KNNK is a draw, but zurichess evaluates it at +9. My hope is to teach the engine to exchange down when it can.

I added a table (code below) saying which endgames are drawish (>90% probability based on endgame tablebases). For these endgames I divide the score by 8, so now instead of +9, KNNK evaluates as ~1.

This however did not improve the play. Is there anything I'm missing, or is there a better way to handle drawish endgames?

Code:

// drawishEndgames maps an endgame to a mask
// of Color representing whether it can hold a draw if it is to move.
var drawishEndgames = map[uint64]Color{
        KK:    White | Black,
        KKN:   White | Black,
        KNK:   White | Black,
        KKB:   White | Black,
        KBK:   White | Black,
        KBKB:  White | Black,
        KNKN:  White | Black,
        KBKN:  White | Black,
        KNKB:  White | Black,
        KNNK:  White | Black,
        KKNN:  White | Black,
        KNNKB: White | Black,
        KBKNN: White | Black,
        KNNKN: White | Black,
        KNKNN: White | Black,
        KNNKR: White | Black,
        KRKNN: White | Black,
}
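
For readers following along, here is a minimal sketch of how such a table might be applied to a score. The `Color` bit mask, the three signature constants, and the `scaleDrawish` helper are assumptions made for illustration, not zurichess code; scores are taken to be in centipawns.

```go
package main

import "fmt"

// Color is assumed to be a bit mask so that White|Black can mark both sides.
type Color uint8

const (
	White Color = 1 << iota
	Black
)

// Hypothetical material signatures; a real engine would derive these
// from the piece counts on the board.
const (
	KK uint64 = iota
	KNNK
	KRK
)

// A shortened version of the table from the post.
var drawishEndgames = map[uint64]Color{
	KK:   White | Black,
	KNNK: White | Black,
}

// scaleDrawish divides score by 8 when the signature is listed as drawish
// for the side to move, and returns it unchanged otherwise.
func scaleDrawish(sig uint64, stm Color, score int32) int32 {
	if mask, ok := drawishEndgames[sig]; ok && mask&stm != 0 {
		return score / 8
	}
	return score
}

func main() {
	fmt.Println(scaleDrawish(KNNK, White, 900)) // +9 pawns shrinks to about +1
	fmt.Println(scaleDrawish(KRK, White, 500))  // not listed, so unchanged
}
```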

Joost Buijs · Post by **Joost Buijs** » Sun Dec 04, 2016 5:05 pm

Hi,

I do something similar to what you are doing: I have a very large byte map (1 MB) indexed by the material signature, which can scale the evaluation from 0 to 255%. Usually it sits at 100%, of course.
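
A rough sketch of a byte table of this kind is shown below. The mixed-radix signature (pawns clamped to 8, minors and rooks to 2, queens to 1) and all names are assumptions chosen to keep the example small; the actual layout of the 1 MB table is not described in the post.

```go
package main

import "fmt"

// Counts holds one side's material (hypothetical layout).
type Counts struct{ P, N, B, R, Q int }

// perSide is the number of distinct clamped count combinations per side.
const perSide = 9 * 3 * 3 * 3 * 2

func clamp(v, hi int) int {
	if v > hi {
		return hi
	}
	return v
}

// index packs one side's clamped counts into a mixed-radix number.
func (c Counts) index() int {
	i := clamp(c.P, 8)
	i = i*3 + clamp(c.N, 2)
	i = i*3 + clamp(c.B, 2)
	i = i*3 + clamp(c.R, 2)
	i = i*2 + clamp(c.Q, 1)
	return i
}

// signature combines both sides into a single table index.
func signature(white, black Counts) int {
	return white.index()*perSide + black.index()
}

// scaleTable holds a percentage (0..255) per signature; 100 means unchanged.
var scaleTable = func() []uint8 {
	t := make([]uint8, perSide*perSide)
	for i := range t {
		t[i] = 100
	}
	return t
}()

// scaled applies the table percentage to an evaluation score.
func scaled(score int, white, black Counts) int {
	return score * int(scaleTable[signature(white, black)]) / 100
}

func main() {
	knn, bare := Counts{N: 2}, Counts{}
	scaleTable[signature(knn, bare)] = 0 // hand-tuned entry: KNNK is a draw
	fmt.Println(scaled(600, knn, bare))
	fmt.Println(scaled(500, Counts{R: 1}, bare))
}
```

Because every signature has an entry, the same table can carry hand-tuned or automatically tuned percentages above 100 as well as below it.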

At the moment I have ~100 different signatures which differ from 100%. Unfortunately, I never found much improvement by using this table. Occasionally it prevents a bad exchange, but somehow it doesn't improve Elo by much.

I tuned all the obvious cases by hand; maybe it would be better to use something automated like the Texel tuning method to optimize it.

kbhearn · Post by **kbhearn** » Sun Dec 04, 2016 5:59 pm

A couple of factors come to mind:

1) Most games probably don't get to the point where scaling down a small selection of pawnless endings is relevant: pawnless endings are far rarer than endings with pawns (and many of your listed cases probably score close enough to zero anyway without being scaled down).

2) In the cases where they can be prevented, it may be pointless things like refusing to capture the last enemy pawn, which apart from KNNvKP does not really improve winning chances. The main effect may be stopping one already-too-late step before the pawnless ending that's been scaled.

3) Looking at the endgame cases you treat, I doubt it's the case here (mostly, all that your current cases specify of note is that KNN is not mating material), but it's possible in a selftest that the half-points saved by preventing a bad exchange or by sacrificing for the enemy's last pawn are somewhat cancelled by half-points not won, because the version with scaling won't give the version without it the opportunity to miss the sacrifice for the pawn. (I.e. the only winning chance is to advance the pawn, but version B won't go for it because 'obviously' version A would sac for the pawn(s) and then it'd be drawish. But maybe version A wouldn't, because it doesn't know better, and maybe there is no winning chance without giving it the chance to make that mistake.) You need to strike a balance where the scale-down is smoother, so it doesn't come to a point where the stronger side refuses to make progress because it would then realise the position is drawish. This is one of the big traps with scaling down opposite-bishop endings: they're drawish except when they're not, and refusing to go into one does not necessarily increase winning chances. Sometimes, even though it's drawish, it's the best chance left.

Sven · Post by **Sven** » Sun Dec 04, 2016 10:17 pm

Joost Buijs wrote: At the moment I have ~100 different signatures which differ from 100%. Unfortunately, I never found much improvement by using this table. Occasionally it prevents a bad exchange, but somehow it doesn't improve Elo by much.

I tuned all the obvious cases by hand; maybe it would be better to use something automated like the Texel tuning method to optimize it.

I don't think that it is possible to get any substantial improvement of playing strength by adding knowledge about drawish endgames. Even adding "perfect" endgame knowledge through tablebases does not increase strength remarkably; some people even say it has "zero effect". So how could adding "imperfect", "partial" knowledge be better? Nevertheless I would always prefer to have that knowledge, since it improves the engine's behaviour in analysis.

Necromancer · Post by **Necromancer** » Mon Dec 05, 2016 3:41 pm

In my engine there's a 'bool Evaluation::isDrawMaterial()' function, which I borrowed from the Vice engine (whose author, in turn, borrowed it from the Sjeng engine, if I remember well).

Is that what you're looking for?

Joost Buijs · Post by **Joost Buijs** » Mon Dec 05, 2016 4:54 pm

Sven Schüle wrote:
Joost Buijs wrote: At the moment I have ~100 different signatures which differ from 100%. Unfortunately, I never found much improvement by using this table. Occasionally it prevents a bad exchange, but somehow it doesn't improve Elo by much.

I tuned all the obvious cases by hand; maybe it would be better to use something automated like the Texel tuning method to optimize it.

I don't think that it is possible to get any substantial improvement of playing strength by adding knowledge about drawish endgames. Even adding "perfect" endgame knowledge through tablebases does not increase strength remarkably; some people even say it has "zero effect". So how could adding "imperfect", "partial" knowledge be better? Nevertheless I would always prefer to have that knowledge, since it improves the engine's behaviour in analysis.

I agree that knowledge about drawish endgames doesn't improve strength by much. In my case the table (since it contains all material combinations) can also be used for material imbalance in general. The problem is that it is very difficult to tune without an automated procedure. I added roughly 100 cases by hand and never found the spirit to optimize it any further.

In the case of obviously drawn positions like 'KNNk', I terminate the search immediately with a draw score, so I handle these cases in a different way.

PK · Post by **PK** » Mon Dec 05, 2016 11:57 pm

If you evaluate KB vs K as a draw but KB vs KP as +2, then you introduce an ugly score inconsistency that eats away the gains from having draw evaluation. True, draw evaluation doesn't gain very much. But it doesn't make sense to refrain from scaling down the endgames with pawns on the weaker side of the imbalance.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Thu Dec 08, 2016 8:36 am

brtzsnr wrote:Hi!

I'm trying to add some logic to my engine to handle drawish endgames better. For example KNNK is a draw, but zurichess evaluates it at +9. My hope is to teach the engine to exchange down when it can.

I added a table (code below) saying which endgames are drawish (>90% probability based on endgame tablebases). For these endgames I divide the score by 8, so now instead of +9, KNNK evaluates as ~1.

This however did not improve the play. Is there anything I'm missing, or is there a better way to handle drawish endgames?

Code:

// drawishEndgames maps an endgame to a mask
// of Color representing whether it can hold a draw if it is to move.
var drawishEndgames = map[uint64]Color{
        KK:    White | Black,
        KKN:   White | Black,
        KNK:   White | Black,
        KKB:   White | Black,
        KBK:   White | Black,
        KBKB:  White | Black,
        KNKN:  White | Black,
        KBKN:  White | Black,
        KNKB:  White | Black,
        KNNK:  White | Black,
        KKNN:  White | Black,
        KNNKB: White | Black,
        KBKNN: White | Black,
        KNNKN: White | Black,
        KNKNN: White | Black,
        KNNKR: White | Black,
        KRKNN: White | Black,
}

I guess instead of enumerating all those endgame draws, a much simpler solution is to just assign a draw whenever the number of pawns is zero, piecematerial(stronger side) - piecematerial(weaker side) <= bishop value, and the number of pieces for both sides is less than 7 (as RB vs R is a draw, while RRB vs RR is a win).

That would nicely suit all simple endgames, also BB vs N and Q vs BB or NB, but not if your engine's bishop value is lower than its knight value.
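
The rule above could be sketched roughly as follows. The centipawn values, the `Side` type and `isMaterialDraw` are hypothetical, and "number of pieces" is read here as the total number of men on the board including both kings, which is the reading that makes RB vs R (5 men) a draw and RRB vs RR (7 men) not.

```go
package main

import "fmt"

// Hypothetical centipawn values; the rule assumes Bishop >= Knight.
const (
	Knight = 325
	Bishop = 325
	Rook   = 500
	Queen  = 975
)

// Side holds one side's material, king excluded.
type Side struct{ P, N, B, R, Q int }

// pieceValue sums the non-pawn material.
func (s Side) pieceValue() int {
	return s.N*Knight + s.B*Bishop + s.R*Rook + s.Q*Queen
}

// men counts everything on the board for this side, king included.
func (s Side) men() int {
	return 1 + s.P + s.N + s.B + s.R + s.Q
}

// isMaterialDraw reports a draw when there are no pawns, the piece-material
// difference is at most a bishop, and fewer than 7 men are on the board.
func isMaterialDraw(a, b Side) bool {
	if a.P+b.P != 0 {
		return false
	}
	diff := a.pieceValue() - b.pieceValue()
	if diff < 0 {
		diff = -diff
	}
	return diff <= Bishop && a.men()+b.men() < 7
}

func main() {
	fmt.Println(isMaterialDraw(Side{R: 1, B: 1}, Side{R: 1})) // RB vs R
	fmt.Println(isMaterialDraw(Side{R: 2, B: 1}, Side{R: 2})) // RRB vs RR
	fmt.Println(isMaterialDraw(Side{N: 2}, Side{}))           // NN vs bare king
}
```

With these values, NN vs a bare king is two knights ahead and so falls outside the rule; it would still need separate handling.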

But I guess there is not much gain there, and not much gain in the eg either; most of the gains should be in the mg and, especially, the early opening.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Fri Dec 09, 2016 9:24 am

I guess it is also a good idea to check, as SF and other engines certainly do, whether the king of the weaker side is on the edge (a-file, h-file, 1st rank, 8th rank), as otherwise some (maybe 10-15%) of such endings could still be won.

Or maybe also include side to move in some way; for example, weaker side and side to move instead of weaker side and edge, etc.

Interesting: how often do people include stm when scaling down endgames?

hgm · Post by **hgm** » Fri Dec 09, 2016 9:28 pm

Note that the statement that "EGTs do not add Elo" is based on adding them to engines that do have some of the most elementary knowledge. I once made a version of Fairy-Max that did scale drawish end-games, called Pair-o-Max. That really was a lot stronger (some 30 Elo, IIRC?), and I don't believe it could all be due to evaluating the Bishop pair. (Which was the only other thing it had over Fairy-Max.)

Pawel hits the nail on the head, though: if you do it only for Pawnless end-games, it is completely pointless, as you will be too late. The engine will just get stuck in KBKP, after happily trading into it from KBPPKNP. Fruit 2.1 is a good example of how to do it. I did a slightly simplified version of that in Pair-o-Max:

The scaling is triggered by the leading side having less than two Pawns. Without Pawns it is obviously crucial that your non-Pawn material has winning potential. The rule of thumb is that this is not the case if you are less than 350cP ahead in non-Pawn material (e.g. KBNKN, KRBKR, KRKN). Scaling by a factor 1/16 should be enough in that case. Although KBK and KNK of course deserve a factor 0, 1/16 is enough to make them unattractive compared to all alternatives. Defender Pawns do not matter for this: they are usually doomed. They do count in the score you are going to scale (so you will indeed gobble them up when the engine gets the opportunity, and the defender will not give them up without a fight), but not in the decision by how much you are going to scale. Note, though, that if the defender has too many Pawns, you will not be ahead even in the naive evaluation, and there is nothing to scale. (Scaling combinations like KBKN is also pointless for that reason, as they already have near-zero scores.)

When you are more than 350cP ahead in piece material you usually do have mating potential even without Pawns, but a factor 1/2 then gives a good approximation for how difficult it will be. Winning KRPPKR (+200, naively) is usually quite straightforward compared to winning KQKR, and the factor 1/2 for the latter protects you from too eagerly converting KRPPKR to KQKR. A special case is KNNK, for which I introduced in Pair-o-Max the possibility for marking a minor with a 'deficient pair' flag, indicating that two of those against nothing should be considered a draw even though you are more than 350cP ahead. (A similar exception was made for a single color-bound piece worth more than 350cP.) When the defender has no non-Pawns Pair-o-Max does not scale at all; otherwise it refused to convert to KRK in some cases (e.g. from KRPKB). The justifying assumption is that the engine will not have any difficulty performing the theoretically possible mates against a bare King.

The second major case is when the leading side has a single Pawn, and the opponent still has some non-Pawn material. The assumption is that he can then easily sacrifice his weakest piece for that Pawn, or block advance of that Pawn forever under the threat of making such a sacrifice. In that case the analysis of your winning potential should first delete the weakest defending piece. If that leaves the leading side less than 350 cP ahead (or with a deficient pair), I used a factor 1/4. That is not as bad as the 1/16 that would apply when the sacrifice had already happened, to make the leading side aware he should avoid allowing such a sacrifice. E.g. KBPKB naively would be +100, KBK +300. But after scaling they become +100/4 = +25 and +300/16 = +19, respectively. So the engine would realize KBPKB is better. (Usually the Pawn is evaluated at more than 100, because it is a passer.) OTOH, KBPPKBP would not be scaled at all, and thus stay +100, which would bias the leading engine against trading the Pawn and thereby exposing itself to the threat of a drawing Bishop sacrifice. End-games like KBPKNP are also recognized as quite drawish this way. Note that unlike the Pawnless KBKN, these do not necessarily have draw scores to begin with, because Pawn value is quite variable, and a 7th-rank Passer might be scored as +250cP. So without scaling KBPKNP could be at +150, while usually defending the draw is trivial.

In summary:
1) Ignore Pawns of the weak side
2) Assume the weak side has to sac his weakest piece for your only Pawn
3) Recognize lack of winning potential in the (resulting) non-Pawn material for the leading side
4) Scale 1/2 without Pawns (but not against just King + Pawns), 1/4 without winning potential, and 1/16 with neither winning potential nor Pawns

For recognizing mating potential:
3a) assume present if leading side has more than 2 pieces (e.g. KRRNKRR)
3b) none if less than 350 cP ahead
3c) none if deficient pair (such as NN)
3d) none if single color-bound
3e) none if single piece and opponent has piece marked as 'tough defender'

The latter 2 cases cannot happen in orthodox Chess. Pair-o-Max furthermore uses an exception to 3b, when it has a piece worth less than 350 which can force checkmate against a bare King. (Such pieces are also specially marked.) E.g. a piece moving as a King is worth about a Knight, but can force checkmate. So KKRKR or KKNKN should not be discounted. This rule also means KKBKBN would not be discounted, but this already has a draw score by itself.
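
The summary above might be sketched for orthodox chess as follows. This is an illustration under stated assumptions, not Pair-o-Max code: the piece values (Bishop = 300 is implied by the KBK example, the rest are guesses), the `Side` type and the function names are all hypothetical, and rules 3d, 3e and the exception to 3b are left out because they only arise with fairy pieces.

```go
package main

import "fmt"

// Hypothetical centipawn values; margin is the lead in non-Pawn
// material needed for winning potential (rule 3b).
const (
	knight, bishop, rook, queen = 300, 300, 500, 950
	margin                      = 350
)

// Side holds one side's material, King excluded.
type Side struct{ P, N, B, R, Q int }

func (s Side) pieces() int { return s.N + s.B + s.R + s.Q }
func (s Side) value() int  { return s.N*knight + s.B*bishop + s.R*rook + s.Q*queen }

// dropWeakest removes the cheapest non-Pawn piece (rule 2).
func (s Side) dropWeakest() Side {
	switch {
	case s.N > 0:
		s.N--
	case s.B > 0:
		s.B--
	case s.R > 0:
		s.R--
	case s.Q > 0:
		s.Q--
	}
	return s
}

// hasPotential applies rules 3a-3c; defender Pawns are ignored (rule 1).
func hasPotential(lead, def Side) bool {
	if lead.pieces() > 2 {
		return true // 3a
	}
	if lead.value()-def.value() < margin {
		return false // 3b
	}
	if lead.N == 2 && lead.pieces() == 2 {
		return false // 3c: deficient pair
	}
	return true
}

// scale applies rule 4 to a naive score seen from the leading side.
func scale(score int, lead, def Side) int {
	switch {
	case lead.P == 0 && !hasPotential(lead, def):
		return score / 16
	case lead.P == 0 && def.pieces() > 0:
		return score / 2
	case lead.P == 1 && def.pieces() > 0 && !hasPotential(lead, def.dropWeakest()):
		return score / 4
	}
	return score
}

func main() {
	fmt.Println(scale(100, Side{P: 1, B: 1}, Side{B: 1}))       // KBPKB
	fmt.Println(scale(300, Side{B: 1}, Side{}))                 // KBK
	fmt.Println(scale(100, Side{P: 2, B: 1}, Side{P: 1, B: 1})) // KBPPKBP
}
```

With integer division the KBK score comes out as 18 rather than the rounded 19, but the ordering KBPPKBP > KBPKB > KBK from the example is preserved.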
