Scaling eval with material on the board

Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Scaling eval with material on the board

Post by Lyudmil Tsvetkov »

I think we already talked about this in a thread dedicated to book losses and TCEC, but quite probably no one paid attention there to what was discussed, so here again a quick summary.

My theoretic vision is that the available advantage in score will not stay the same as the game progresses, it either increases or decreases. It is going to increase when there is a lot of material on the board, meaning opportunities to convert are better. It is going to decrease when there is limited material on the board, meaning opportunities to convert are small.

Basically, large material in the opening and score of 50cps advantage and limited material in the endgame with the same score of 50cps edge mean 2 completely different things. Chances are that the 50cps advantage in the opening is only going to increase, while the 50cps edge in the endgame is only going to decrease. So those eval scores are not real ones, but simply wrong and imperfect assessments.
How to correct that?

Well, my suggestion is very simple: scale eval with available material. You could scale the following way:

- multiply eval by 1.4 from total material to 5/6 available material
- multiply eval by 1.3 from 5/6 to 4/6 available material of total material
- multiply eval by 1.2 from 4/6 to 3/6 available material of total material
- multiply eval by 1.1 from 3/6 to 2/6 available material

- multiply eval by 0.9 from 2/6 to 1/6 available material
- multiply eval by 0.8 from 1/6 to the end of the game

In this way, your eval will be much more realistic.
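
The stepped rule above can be sketched in a few lines. This is only an illustration under assumptions the post does not state: material is counted with the conventional 1/3/3/5/9 piece values (kings excluded, both sides together), a position sitting exactly on a boundary falls into the lower band, and the names `TOTAL_MATERIAL`, `multiplier_tenths` and `scaled_eval` are made up for the example.

```python
# Assumed piece values in pawns (kings excluded); the post does not fix these.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}
START_COUNTS = {"P": 8, "N": 2, "B": 2, "R": 2, "Q": 1}
# Material of both sides in the starting position: 2 * 39 = 78.
TOTAL_MATERIAL = 2 * sum(PIECE_VALUES[p] * n for p, n in START_COUNTS.items())

# (band lower bound in sixths of TOTAL_MATERIAL, multiplier in tenths),
# taken from the list in the post. Boundary handling is an assumption.
BANDS = [(5, 14), (4, 13), (3, 12), (2, 11), (1, 9)]
LOWEST_BAND_TENTHS = 8


def multiplier_tenths(material):
    """Return the proposed multiplier (in tenths) for the material on the board."""
    for sixths, tenths in BANDS:
        # Integer form of: material / TOTAL_MATERIAL > sixths / 6
        if 6 * material > sixths * TOTAL_MATERIAL:
            return tenths
    return LOWEST_BAND_TENTHS


def scaled_eval(eval_cp, material):
    """Scale a centipawn score, truncating toward zero so both sides are treated alike."""
    magnitude = abs(eval_cp) * multiplier_tenths(material) // 10
    return magnitude if eval_cp >= 0 else -magnitude
```

With these assumptions, a raw 50cps score on a full board comes out as 70cps, and the same raw score with a single rook and a few pawns left comes out as 40cps.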

You might laugh at me, but I would say you could gain as much as 30 elo and more only by doing that. The potential is enormous, as basically all engine evals seem to be wrong in assessing opportunities to convert.

So opportunities to convert increase from the opening until a stage when less than 1/3 of total material is present, and then they start to decrease as the game nears its end. I know many scale particular endgames, pawnless or otherwise, but that is very far from sufficient. You can scale down much more boldly in the endgame. I am also almost sure that no one scales up the score in the opening and early middlegame, but that is the right approach: with each passing move and a lot of material still on the board, the good moves that would let the weaker side hold the balance at the current score gradually run out, until there are no such moves left and the engine has to compromise on quality and on the score. That is why you should scale eval up in the opening and early middlegame.

As discussed in the book losses thread, a 50cps opening advantage, when it is real, already means that the weaker side has lost the game, as the realistic score for the coming moves will be some 70-80cps and above.

What do you think of this theory?
Anyone doing or intent on doing something similar?

I would be very happy if someone tries the idea and reports what happens.
arjuntemurnikar
Posts: 204
Joined: Tue Oct 15, 2013 10:22 pm
Location: Singapore

Re: Scaling eval with material on the board

Post by arjuntemurnikar »

Lyudmil Tsvetkov wrote: You might laugh at me, but I would say you could gain as much as 30 elo and more only by doing that. The potential is enormous, as basically all engine evals seem to be wrong in assessing opportunities to convert.
LOLZ your jokes get funnier and funnier day by day!
arjuntemurnikar
Posts: 204
Joined: Tue Oct 15, 2013 10:22 pm
Location: Singapore

Re: Scaling eval with material on the board

Post by arjuntemurnikar »

Lyudmil Tsvetkov wrote: Well, my suggestion is very simple: scale eval with available material. You could scale the following way:

- multiply eval by 1.4 from total material to 5/6 available material
- multiply eval by 1.3 from 5/6 to 4/6 available material of total material
- multiply eval by 1.2 from 4/6 to 3/6 available material of total material
- multiply eval by 1.1 from 3/6 to 2/6 available material

- multiply eval by 0.9 from 2/6 to 1/6 available material
- multiply eval by 0.8 from 1/6 to the end of the game

In this way, your eval will be much more realistic.
What you are saying here is that eval in all endgames should be gradually scaled down by about 20%. What would that achieve?

If a drawn rook and pawn ending is shown as 0.30 (30 cp), your "scaled down eval" will show 0.24 (24 cp) instead. Wow, surely this corrects all the "wrong evals" that all the engines show in endgames, right?

Scaling down the score is only useful in positions that are drawish (e.g. opposite-colored bishop endgames), because you want to encourage the engine to avoid them. Not all endgames are drawish, so I think this is highly illogical.
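
A small sketch of the objection: multiplying every leaf score by the same positive factor changes neither the sign of any score nor which line comes out best, so a uniform 0.8 inside one material band cannot by itself change what the engine plays. The leaf scores and the helper names `best_move` and `scale_all` are hypothetical, chosen only for illustration.

```python
def best_move(scores_cp):
    """Index of the highest-scoring line."""
    return max(range(len(scores_cp)), key=lambda i: scores_cp[i])


def scale_all(scores_cp, factor):
    """Apply one uniform multiplier to every score."""
    return [s * factor for s in scores_cp]


# Hypothetical leaf scores (centipawns) that all fall in the same material band.
leaf_scores_cp = [30, 12, -5]
scaled_scores = scale_all(leaf_scores_cp, 0.8)

# Same preferred line before and after scaling.
same_choice = best_move(leaf_scores_cp) == best_move(scaled_scores)
```

The scaling can only affect move choice when the compared lines end in different material bands, or when something else (a draw threshold, for instance) reacts to the absolute value.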
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Scaling eval with material on the board

Post by lkaufman »

Lyudmil Tsvetkov wrote:I think we already talked about this in a thread dedicated to book losses and TCEC, but quite probably no one paid attention there to what was discussed, so here again a quick summary.

My theoretic vision is that the available advantage in score will not stay the same as the game progresses, it either increases or decreases. It is going to increase when there is a lot of material on the board, meaning opportunities to convert are better. It is going to decrease when there is limited material on the board, meaning opportunities to convert are small.

Basically, large material in the opening and score of 50cps advantage and limited material in the endgame with the same score of 50cps edge mean 2 completely different things. Chances are that the 50cps advantage in the opening is only going to increase, while the 50cps edge in the endgame is only going to decrease. So those eval scores are not real ones, but simply wrong and imperfect assessments.
How to correct that?

Well, my suggestion is very simple: scale eval with available material. You could scale the following way:

- multiply eval by 1.4 from total material to 5/6 available material
- multiply eval by 1.3 from 5/6 to 4/6 available material of total material
- multiply eval by 1.2 from 4/6 to 3/6 available material of total material
- multiply eval by 1.1 from 3/6 to 2/6 available material

- multiply eval by 0.9 from 2/6 to 1/6 available material
- multiply eval by 0.8 from 1/6 to the end of the game

In this way, your eval will be much more realistic.

You might laugh at me, but I would say you could gain as much as 30 elo and more only by doing that. The potential is enormous, as basically all engine evals seem to be wrong in assessing opportunities to convert.

So opportunities to convert increase from the opening until a stage when less than 1/3 of total material is present, and then they start to decrease as the game nears its end. I know many scale particular endgames, pawnless or otherwise, but that is very far from sufficient. You can scale down much more boldly in the endgame. I am also almost sure that no one scales up the score in the opening and early middlegame, but that is the right approach: with each passing move and a lot of material still on the board, the good moves that would let the weaker side hold the balance at the current score gradually run out, until there are no such moves left and the engine has to compromise on quality and on the score. That is why you should scale eval up in the opening and early middlegame.

As discussed in the book losses thread, a 50cps opening advantage, when it is real, already means that the weaker side has lost the game, as the realistic score for the coming moves will be some 70-80cps and above.

What do you think of this theory?
Anyone doing or intent on doing something similar?

I would be very happy if someone tries the idea and reports what happens.
My subjective opinion agrees with you totally on this point. However, it works against the principle of trading pieces when you are ahead in material. So most engines have the value of a pawn going up towards the endgame, while many positional terms are lower in the endgame. Attempts to do something similar to what you propose never helped Komodo, presumably because we already scale the scores up and down in a reasonable way. But subjectively I see that you should be right; a score like plus 30 cp is great on a crowded board but drawish in the endgame. I just don't know how to change Komodo to fix this and still get an elo gain. I suppose it would be the same for SF. We need a much more clever idea for this!
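
For readers unfamiliar with the existing scaling mentioned here (pawn value rising toward the endgame, positional terms shrinking), a common way engines do this is a tapered evaluation: each term has a middlegame and an endgame value, blended by a game-phase number. The sketch below is not Komodo's code. The phase weights follow a widely used convention, and the term values are invented for the example.

```python
# Widely used phase weights; MAX_PHASE is the starting position (4N + 4B + 4R + 2Q).
PHASE_WEIGHTS = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24


def game_phase(piece_counts):
    """Phase from the pieces of both sides: MAX_PHASE = opening, 0 = bare endgame."""
    phase = sum(PHASE_WEIGHTS.get(p, 0) * n for p, n in piece_counts.items())
    return min(phase, MAX_PHASE)  # promotions could push the sum above the maximum


def tapered(mg_value, eg_value, phase):
    """Linear blend of middlegame and endgame values (floor division, values >= 0 here)."""
    return (mg_value * phase + eg_value * (MAX_PHASE - phase)) // MAX_PHASE


# Hypothetical term values in centipawns: (middlegame, endgame).
PAWN = (80, 100)      # material term grows toward the endgame
MOBILITY = (20, 10)   # positional term shrinks toward the endgame
```

With these made-up numbers a pawn is worth 80 on a full board, 90 at half phase and 100 with no pieces left, while the mobility term goes from 20 down to 10, which is the opposite direction to the blanket multipliers proposed in the thread.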
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Scaling eval with material on the board

Post by Lyudmil Tsvetkov »

lkaufman wrote:
My subjective opinion agrees with you totally on this point. However, it works against the principle of trading pieces when you are ahead in material. So most engines have the value of a pawn going up towards the endgame, while many positional terms are lower in the endgame. Attempts to do something similar to what you propose never helped Komodo, presumably because we already scale the scores up and down in a reasonable way. But subjectively I see that you should be right; a score like plus 30 cp is great on a crowded board but drawish in the endgame. I just don't know how to change Komodo to fix this and still get an elo gain. I suppose it would be the same for SF. We need a much more clever idea for this!
I do not know of a principle that you should trade pieces when you are ahead. There is no such principle. Humans tend to trade pieces when they are ahead, because they are afraid of complications, and in the process of wrong trading those humans lose a significant part of the advantage they had earlier.

Engines never trade pieces, as they are not afraid of complications, and consequently they very rarely waste part of their accumulated advantage, unless they assess and execute something wrongly. So that really, for me there is no such principle. Actually, I can not think of a reasonable trading rule at all: it all depends on specific features.

I think scaling eval up in the opening and down in endgame is very much similar to reaching more plies in calculations. If you were able to reach deeper in the opening a 50cps advantage at ply 30 would become 70cps at ply 35, if you evaluate correctly and make no mistakes. And in the endgame, the same 50cps advantage at ply 40 would become 30cps at ply 50. So that scaling up and down eval is simply analogous to reaching bigger depth. It is always better to have a more precise evaluation of the positions, as bigger depth would reveal, don't you agree, Larry?

The scaled eval is simply the eval that would naturally appear at a bigger depth.
hgm
Posts: 27791
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Scaling eval with material on the board

Post by hgm »

Lyudmil Tsvetkov wrote:My theoretic vision is that the available advantage in score will not stay the same as the game progresses, it either increases or decreases. It is going to increase when there is a lot of material on the board, meaning opportunities to convert are better. It is going to decrease when there is limited material on the board, meaning opportunities to convert are small.
As usual you have it exactly the wrong way around. The same absolute advantage (like being 2 Pawns ahead) offers a far higher winning percentage with little material on the board than with a lot of material. Because it will be a larger relative advantage. Engines like Rybka more than doubled the advantage in the end-game, IIRC.

This is of course trivial to test, so that there is no reason at all to fantasize about it. Just play 1000 games (self-play) starting from opening positions where you deleted a minor for white, and two Pawns for black. Then play 1000 games starting from KNPPKPPPP or KBPPKPPPP positions, and see where the advantage minor vs. 2P does better.
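
To compare the two 1000-game matches, the win/draw/loss counts can be turned into a score fraction and an Elo difference with the standard logistic formula. The sketch below only does that arithmetic. Running the games is left to whatever match tool you use, and the function names are chosen for the example.

```python
import math


def score_fraction(wins, draws, losses):
    """Score of the side with the material advantage, between 0 and 1."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Elo difference implied by a score fraction (logistic model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)
```

Compute `elo_difference(score_fraction(w, d, l))` once for the full-board start positions and once for the endgame start positions. Whichever number is larger shows where the same material edge is worth more.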

There is a caveat, however: advantages that are not above the draw margin typically get more drawish as the material disappears, and future change of the advantage in any direction (so also in the direction of a win) becomes less likely.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Scaling eval with material on the board

Post by lkaufman »

Lyudmil Tsvetkov wrote:
I do not know of a principle that you should trade pieces when you are ahead. There is no such principle. Humans tend to trade pieces when they are ahead, because they are afraid of complications, and in the process of wrong trading those humans lose a significant part of the advantage they had earlier.

Engines never trade pieces, as they are not afraid of complications, and consequently they very rarely waste part of their accumulated advantage, unless they assess and execute something wrongly. So that really, for me there is no such principle. Actually, I can not think of a reasonable trading rule at all: it all depends on specific features.

I think scaling eval up in the opening and down in endgame is very much similar to reaching more plies in calculations. If you were able to reach deeper in the opening a 50cps advantage at ply 30 would become 70cps at ply 35, if you evaluate correctly and make no mistakes. And in the endgame, the same 50cps advantage at ply 40 would become 30cps at ply 50. So that scaling up and down eval is simply analogous to reaching bigger depth. It is always better to have a more precise evaluation of the positions, as bigger depth would reveal, don't you agree, Larry?

The scaled eval is simply the eval that would naturally appear at a bigger depth.
That sounds true, but the fact is that scaling the way you suggest doesn't work. Clearly, if you are ahead in material by 2 pawns or more, you should trade pieces to reach a winning endgame rather than risk a middlegame decision by attack. But if only one pawn ahead, it's not that simple. Anyway, your rule is too simple, but you are on the right track.
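
The conflict with trading down when ahead can be shown with a toy comparison. Assume, purely for illustration, that the engine is two pawns up (200 cp raw) and must choose between a line that keeps the pieces on (the 1.2 band from the proposed table) and a line that trades into an endgame (the 0.9 band). The helper `scaled` is hypothetical.

```python
def scaled(eval_cp, multiplier_tenths):
    """Apply a multiplier given in tenths to a non-negative centipawn score."""
    return eval_cp * multiplier_tenths // 10


RAW_ADVANTAGE_CP = 200  # hypothetical: two pawns up in both lines

keep_pieces = scaled(RAW_ADVANTAGE_CP, 12)  # stays in the 4/6..3/6 band
trade_down = scaled(RAW_ADVANTAGE_CP, 9)    # lands in the 2/6..1/6 band

# Even a trade that raises the raw score to 250 cp still looks worse after scaling.
better_trade = scaled(250, 9)
```

Under these assumptions the engine sees 240 for keeping pieces on, 180 for the plain trade and 225 for the improved trade, so the multipliers alone push it away from simplifying even when the simplified ending is the safer win.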
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Scaling eval with material on the board

Post by Lyudmil Tsvetkov »

arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote: Well, my suggestion is very simple: scale eval with available material. You could scale the following way:

- multiply eval by 1.4 from total material to 5/6 available material
- multiply eval by 1.3 from 5/6 to 4/6 available material of total material
- multiply eval by 1.2 from 4/6 to 3/6 available material of total material
- multiply eval by 1.1 from 3/6 to 2/6 available material

- multiply eval by 0.9 from 2/6 to 1/6 available material
- multiply eval by 0.8 from 1/6 to the end of the game

In this way, your eval will be much more realistic.
What you are saying here is that eval in all endgames should be gradually scaled down by about 20%. What would that achieve?

If a drawn rook and pawn ending is shown as 0.30 (30 cp), your "scaled down eval" will show 0.24 (24 cp) instead. Wow, surely this corrects all the "wrong evals" that all the engines show in endgames, right?

Scaling down the score is only useful in positions that are drawish (e.g. opposite-colored bishop endgames), because you want to encourage the engine to avoid them. Not all endgames are drawish, so I think this is highly illogical.
The point is that there is a drawing margin in the endgame, let us say 50cps. Scaling is the same as reaching bigger depth, so if your score is sufficiently above the drawing margin, you will win anyhow; however, if your score after scaling drops below the drawing margin, the engine should know it must avoid such lines. Very simple: you just drop all lines below the drawing margin if you are leading in score, and evaluate only those that still keep you above the margin. And vice versa if you are the weaker side.

Scaling down endgames is not illogical at all, as you scale more as material gets very scarce. Actually, many engines scale pawnless endgames with low material, and the approach is sound and confirmed, but this is just the beginning. Don't you see that winning chances decrease as material decreases? Why not estimate more draws in advance? This will help see which lines are really winning and which are drawn.

Scaling is tantamount to reaching bigger depth. Actually, that is how SF, the engine that reaches the biggest depth, behaves: when in the opening its opponent sees only a 20cps SF advantage because of less depth, SF already sees 50cps because it sees further. When in the endgame SF's opponent sees a 50cps SF advantage, SF already sees 20cps, as it has gone deeper into the position. Do you want to negate the way SF evaluates positions? Do you want to negate the reason behind bigger depth?
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Scaling eval with material on the board

Post by Lyudmil Tsvetkov »

lkaufman wrote:
That sounds true, but the fact is that scaling the way you suggest doesn't work. Clearly, if you are ahead in material by 2 pawns or more, you should trade pieces to reach a winning endgame rather than risk a middlegame decision by attack. But if only one pawn ahead, it's not that simple. Anyway, your rule is too simple, but you are on the right track.
But Komodo never behaves like that: whether it leads by 1 or 2 pawns or more, it always chooses lines that would only further increase its advantage, no matter if it should complicate the position or not.
arjuntemurnikar
Posts: 204
Joined: Tue Oct 15, 2013 10:22 pm
Location: Singapore

Re: Scaling eval with material on the board

Post by arjuntemurnikar »

Lyudmil Tsvetkov wrote:I think scaling eval up in the opening and down in endgame is very much similar to reaching more plies in calculations. If you were able to reach deeper in the opening a 50cps advantage at ply 30 would become 70cps at ply 35, if you evaluate correctly and make no mistakes. And in the endgame, the same 50cps advantage at ply 40 would become 30cps at ply 50. So that scaling up and down eval is simply analogous to reaching bigger depth.
I see now the reason you came up with this theory was because of your dubious understanding of how engine evaluations work. I remember you said something similar to this a while back, and I tried to explain to you why this logic is flawed, but you simply ignored all sane explanations as usual.

I will not explain it to you again, but I will just say that this statement: "If you were able to reach deeper in the opening a 50cps advantage at ply 30 would become 70cps at ply 35" is just plain WRONG.

End of story.