Material imbalance evaluation

Alessandro Scotti · Post by **Alessandro Scotti** » Mon Apr 16, 2007 12:06 pm

Hi,
I think everybody knows Kaufman's article on evaluating material, but just in case:

http://mywebpages.comcast.net/danheisma ... alance.htm

So far I haven't had any luck adding those corrections in Hamsters, but I would like to go deeper on the subject. Is there any (possibly free) tool that can help me perform an analysis similar to Kaufman's on a game collection?
For example, I need to be able to select games with more or less equal material but where an imbalance is present for several moves (e.g. N vs. B, NPP vs. R and so on) and then to get win/draw/loss statistics on those games.
This data might be very valuable and these adjustments seem to be one of the "secrets" of Rybka...
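
To make the request concrete, here is a rough sketch of the kind of filter I have in mind. It is not an existing tool: the function names (`material_signature`, `persistent_imbalance`, `wdl_stats`) and the 6-ply threshold are my own assumptions, and it expects each game to be already expanded into a list of FEN strings, one per ply, plus a result string.

```python
from collections import Counter

PIECES = "PNBRQ"  # kings are ignored, both sides always have one


def material_signature(fen):
    """Return white-minus-black piece counts (P, N, B, R, Q) for a FEN string."""
    placement = fen.split()[0]
    counts = Counter(ch for ch in placement if ch.isalpha())
    return tuple(counts[p] - counts[p.lower()] for p in PIECES)


def persistent_imbalance(fens, target, min_plies=6):
    """True if the target signature holds for min_plies consecutive positions."""
    run = 0
    for fen in fens:
        run = run + 1 if material_signature(fen) == target else 0
        if run >= min_plies:
            return True
    return False


def wdl_stats(games, target, min_plies=6):
    """Tally results over games given as (list_of_fens, result) pairs."""
    tally = Counter()
    for fens, result in games:
        if persistent_imbalance(fens, target, min_plies):
            tally[result] += 1
    return tally


# White has a Knight where Black has a Bishop, everything else equal.
N_VS_B = (0, 1, -1, 0, 0)
fen = "2b1k3/8/8/8/8/8/8/4K1N1 w - - 0 1"
games = [([fen] * 6, "1-0"), ([fen] * 3, "0-1")]
tally = wdl_stats(games, N_VS_B)
print(dict(tally))
```

The mirrored case (White has the Bishop) would need the negated signature and flipped results before the two tallies can be merged.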

hgm · Post by **hgm** » Mon Apr 16, 2007 12:26 pm

I would be afraid that such an evaluation is very prone to systematic errors. For example, if you look at the win percentage of games where a Knight was sacrificed for two Pawns, you might conclude that the Pawns have a pretty good chance. But this would most likely be because these two Pawns were far above average (like connected passers): decent players would never sacrifice a Knight for two lousy Pawns, and when faced with an inevitable loss they would opt for abandoning a Pawn rather than engaging in an N vs 2P swap. This would lead you to highly overestimate the value of Pawns compared to pieces.

So it seems you would have to do a fair amount of evaluation of the positions showing the material imbalance to correct for other positional factors, like Pawn structure, good/bad Bishops and Rook positioning, as those factors are unlikely to average out to zero in positions selected from games.
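
A toy simulation shows how large this selection effect can be. Everything in it is an assumption made for illustration: the material deficit of one Pawn for N vs 2P, the positional compensation drawn from a normal distribution with a spread of one Pawn, and players who only make the swap when the net result is positive.

```python
import random


def simulate(n=100_000, seed=1):
    """Compare the average net value of all N-for-2P swaps with those actually played."""
    rng = random.Random(seed)
    material = -1.0  # assumed: giving N (3) for 2P (2) costs one Pawn
    all_net = []
    played_net = []
    for _ in range(n):
        compensation = rng.gauss(0.0, 1.0)  # assumed positional compensation
        net = material + compensation
        all_net.append(net)
        if net > 0:  # decent players only make the swap when it pays
            played_net.append(net)
    overall = sum(all_net) / len(all_net)
    played = sum(played_net) / len(played_net)
    return overall, played


overall, played = simulate()
print(f"all opportunities: {overall:+.2f}  swaps actually played: {played:+.2f}")
```

Statistics taken only from the played swaps come out positive, although the swap loses about a Pawn on average over all positions where it was possible.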

mjlef · Post by **mjlef** » Mon Apr 16, 2007 2:02 pm

Alessandro Scotti wrote:Hi,
I think everybody knows Kaufman's article on evaluating material, but just in case:

http://mywebpages.comcast.net/danheisma ... alance.htm

So far I haven't had any luck adding those corrections in Hamsters, but I would like to go deeper on the subject. Is there any (possibly free) tool that can help me perform an analysis similar to Kaufman's on a game collection?
For example, I need to be able to select games with more or less equal material but where an imbalance is present for several moves (e.g. N vs. B, NPP vs. R and so on) and then to get win/draw/loss statistics on those games.
This data might be very valuable and these adjustments seem to be one of the "secrets" of Rybka...

SCID has these capabilities:

scid.sourceforge.net

You can specify the material and a minimum number of moves, and it will give winning percentages for each side, even grouped by player ratings. It also lets you search on piece position and such... I wish it knew how to determine passed pawns (you can do this with a complex list of all the squares an opposing pawn cannot be on, but it is not as flexible as I want)... and other things, but it is a good, free start.

Mark

bob · Post by **bob** » Mon Apr 16, 2007 6:22 pm

I completely agree. You need a _random_ sample of games played where an N for PP trade occurred. Not just a sample of games humans played where the players were good and they only played the sac when they had a strong positional edge after doing so.

Alessandro Scotti · Post by **Alessandro Scotti** » Mon Apr 16, 2007 8:08 pm

Thanks Mark,
it seems SCID will do the work just fine!

bob wrote:I completely agree. You need a _random_ sample of games played where an N for PP trade occurred. Not just a sample of games humans played where the players were good and they only played the sac when they had a strong positional edge after doing so.

I think if the two players are more or less matched, imbalances are kind of "accepted" by both. It's not like one decides to create an advantage and the other just welcomes it. But even when the imbalance is forced it can be useful to take a look at it, because it might be possible to detect related patterns.
Also, besides NPP vs. R and similar cases that are probably more difficult to evaluate properly, there are many N vs. B situations where evaluation adjustments can be helpful.
IMO this is at least worth trying, especially if there are good tools to extract the statistics, which is the worst part.

bob · Post by **bob** » Tue Apr 17, 2007 8:37 pm

Alessandro Scotti wrote:Thanks Mark,
it seems SCID will do the work just fine!

bob wrote:I completely agree. You need a _random_ sample of games played where an N for PP trade occurred. Not just a sample of games humans played where the players were good and they only played the sac when they had a strong positional edge after doing so.

I think if the two players are more or less matched, imbalances are kind of "accepted" by both. It's not like one decides to create an advantage and the other just welcomes it. But even when the imbalance is forced it can be useful to take a look at it, because it might be possible to detect related patterns.
Also, besides NPP vs. R and similar cases that are probably more difficult to evaluate properly, there are many N vs. B situations where evaluation adjustments can be helpful.
IMO this is at least worth trying, especially if there are good tools to extract the statistics, which is the worst part.

The problem is that the _games_ were played by humans, and most likely, most of the time the "sacs" are good. But a program is going to apply that bit of knowledge all over the tree where most of the sacs are awful.

Classic is N for 2-3 pawns. Generally this loses for the side giving up the knight, but if you look at human games, most work out successfully because they don't do it unless it is pretty solid.

I do just the opposite, which is also bad, in that I consider such trades to always be bad, which is also wrong. But it is less wrong than always making those kinds of trades.

hgm · Post by **hgm** » Tue Apr 17, 2007 10:53 pm

I just did an interesting experiment in uMax concerning piece values. The standard version always used the 'classical' values 1,3,3,5,9. As uMax fails to correct for piece-square points of the captured piece, this makes trades of B vs N, B or N vs 3P, 2B vs R+P completely equal, and therefore often played. In many cases these trades are immediately fatal, however.

I tried to remedy this by taking the values 0.8, 2.8, 3.2, 5.2, 9.6, thinking that reducing the inclination to swap B for N would give an improvement. But in 1000 games of self-play, the result was an insignificant 50.6%.

The problem was, however, that the Rooks were overestimated, as B+N for R+P was still neutral, and 2N for R+P even favored. When I reduced to R=4.8 and Q=9.2, the self-play result against the canonical values jumped to 55.5% over 1000 games.
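
The trade balances under the three value sets can be checked with a few lines of arithmetic. This is only a sketch on the quoted numbers, not code from uMax; the names `VALUE_SETS` and `trade_gain` are made up here, and values are stored in tenths to keep the sums exact.

```python
# Piece values in tenths, as quoted above (P, N, B, R, Q).
VALUE_SETS = {
    "classical": {"P": 10, "N": 30, "B": 30, "R": 50, "Q": 90},
    "first try": {"P": 8, "N": 28, "B": 32, "R": 52, "Q": 96},
    "second try": {"P": 8, "N": 28, "B": 32, "R": 48, "Q": 92},
}

# (pieces given up, pieces obtained)
TRADES = [("B", "N"), ("BN", "RP"), ("NN", "RP"), ("BB", "RP")]


def trade_gain(values, give, get):
    """Material gained (in tenths) by giving up `give` to obtain `get`."""
    return sum(values[p] for p in get) - sum(values[p] for p in give)


for name, values in VALUE_SETS.items():
    gains = ", ".join(
        f"{give}->{get}: {trade_gain(values, give, get) / 10:+.1f}"
        for give, get in TRADES
    )
    print(f"{name:10s} {gains}")
```

With the first adjusted set, B+N for R+P still comes out at 0 and 2N for R+P at +0.4; with R=4.8 the former drops to -0.4 and the latter to 0.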

So the effect of encouraging wrong trades can be really big, as they often result in immediate loss of the game. So when in doubt, better not attempt them. I would consider such an experimental determination of the piece values much more reliable than analysis of game positions.
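
As a rough check on how significant the two self-play results are, one can compare each score with the standard error of a 1000-game match. The sketch below assumes independent games between equal opponents with no draws, which overstates the error when draws occur; `z_score` is just a helper name chosen here.

```python
import math


def z_score(score, games):
    """Deviation of a match score from 50%, in units of its standard error."""
    standard_error = math.sqrt(0.25 / games)  # assumes win/loss only, p = 0.5
    return (score - 0.5) / standard_error


for score in (0.506, 0.555):
    print(f"{score:.1%} over 1000 games: {z_score(score, 1000):.2f} sigma")
```

Under these assumptions 50.6% is well within one standard error, while 55.5% is more than three standard errors from an even score.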

Uri Blass · Post by **Uri Blass** » Tue Apr 17, 2007 11:03 pm

hgm wrote:I just did an interesting experiment in uMax concerning piece values. The standard version always used the 'classical' values 1,3,3,5,9. As uMax fails to correct for piece-square points of the captured piece, this makes trades of B vs N, B or N vs 3P, 2B vs R+P completely equal, and therefore often played. In many cases these trades are immediately fatal, however.

I tried to remedy this by taking the values 0.8, 2.8, 3.2, 5.2, 9.6, thinking that reducing the inclination to swap B for N would give an improvement. But in 1000 games of self-play, the result was an insignificant 50.6%.

The problem was, however, that the Rooks were overestimated, as B+N for R+P was still neutral, and 2N for R+P even favored. When I reduced to R=4.8 and Q=9.2, the self-play result against the canonical values jumped to 55.5% over 1000 games.

So the effect of encouraging wrong trades can be really big, as they often result in immediate loss of the game. So when in doubt, better not attempt them. I would consider such an experimental determination of the piece values much more reliable than analysis of game positions.

I believe that you still overestimate the difference between bishop and knight.

If I understand correctly, you have
0.8, 2.8, 3.2, 4.8, 9.2

This gives a difference of 1/2 pawn between bishop and knight.
I think it may be better to have
0.8, 2.9, 3.1, 4.8, 9.2
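
The gap in pawn units follows from dividing the B-N difference by the pawn value. A small sketch, with values in tenths to avoid rounding noise (`bn_gap_in_pawns` is only a name picked for this example):

```python
def bn_gap_in_pawns(pawn, knight, bishop):
    """Bishop-minus-Knight difference expressed in pawns."""
    return (bishop - knight) / pawn


# Values in tenths: P=0.8 with N=2.8, B=3.2 versus N=2.9, B=3.1.
print(bn_gap_in_pawns(8, 28, 32))  # 0.5
print(bn_gap_in_pawns(8, 29, 31))  # 0.25
```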

Uri

Bill Rogers · Post by **Bill Rogers** » Tue Apr 17, 2007 11:58 pm

Hey HGM
Years ago I did some theoretical testing on piece values.
The first thing I did was to create a table for each man; then I gave one point for each square, thus a pawn gets two points, etc.
Second, I gave an extra point to each man that could attack both colors, with the exception of the bishops and the king. The king because it is not a good attacking piece under most circumstances.
So I arrived at the following values:
Pawn 3 : Knight 9 : Bishop 13 : Rook 15 : Queen 28 : King 7
When all of the above are divided by 3, you get pretty close to what they were predicted to be in the first place, and they resemble your numbers to some extent.
pawn = 1, knight = 3, bishop = 4.3, rook = 5, queen = 9.3, king = 2.3

hgm · Post by **hgm** » Wed Apr 18, 2007 9:45 am

The most advanced theoretical considerations on ab-initio piece-value determination I have seen were by Ralph Betza. Apart from mobility, he also defined concepts such as 'forwardness' that seem to be important for piece strength. E.g. a piece that does only one diagonal step (Ferz) and a piece that does only one orthogonal step (Wazir) both have a mobility of 4. But the Ferz is more handicapped at the board edge, and averaged over all squares the mobility of the Wazir is 3.5, and of the Ferz 3.06. In addition, the Ferz is color-bound, like the Bishop. Yet in games the Ferz turns out to be the stronger piece of the two! (I did not believe this, of course, but I tested it by pitting opponents equipped with 8 Ferzes and 8 Wazirs instead of Pawns against each other, and the Ferzes indeed scored over 60%.) This can be explained from the fact that a Ferz has two moves that go forward, while a Wazir has only one.

Other properties, like the concentration of the moves, can also be very important. This is also expressed in the mating potential of pieces. For this reason your counting method strongly underestimates the King. A King totally dominates a Knight in end-games if you forget about the royal aspect (a non-royal piece moving as King is known as a Commoner or Man). Of course the piece with the better focused set of target squares is hindered less by the board edge (K has 6.56 moves, N only 5.25 on the average), but that effect is not the main reason. KMK is a won end-game even on quite large boards, while KNK is always a draw, and on larger boards the mobility of both approaches 8. Also in combinations with other pieces the Man is usually stronger than the Knight. (Interestingly, KMKM is also won very often for one side, despite the material being even!)
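
The average mobilities can be reproduced by counting, for every square of an empty board, how many steps stay on the board. This is just a counting sketch; the `STEPS` table and function names are chosen for this example, and "forward" is taken to mean a step that increases the rank.

```python
STEPS = {
    "Wazir": [(1, 0), (-1, 0), (0, 1), (0, -1)],
    "Ferz": [(1, 1), (1, -1), (-1, 1), (-1, -1)],
    "Knight": [(1, 2), (2, 1), (-1, 2), (-2, 1),
               (1, -2), (2, -1), (-1, -2), (-2, -1)],
}
STEPS["Man"] = STEPS["Wazir"] + STEPS["Ferz"]  # moves like a King


def average_mobility(steps, size=8):
    """Mean number of on-board target squares over all squares of an empty board."""
    total = 0
    for file in range(size):
        for rank in range(size):
            total += sum(
                0 <= file + df < size and 0 <= rank + dr < size
                for df, dr in steps
            )
    return total / (size * size)


def forward_steps(steps):
    """Number of steps that advance at least one rank."""
    return sum(1 for _, dr in steps if dr > 0)


for name, steps in STEPS.items():
    print(f"{name:6s} 8x8: {average_mobility(steps):.2f}  "
          f"100x100: {average_mobility(steps, 100):.2f}  "
          f"forward steps: {forward_steps(steps)}")
```

On 8x8 this count gives 3.5 for the Wazir, 3.06 for the Ferz, 6.56 for the Man and 5.25 for the Knight, with one forward step for the Wazir against two for the Ferz.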

I think that Uri is correct in that his values for N and B would be even better. The differences are minor, though, as they encourage and discourage the same trades. So only in combination with positional factors would they move the point where the program sees enough compensation for a bad trade (sacrifice). On uMax such subtleties are wasted, as its positional scoring is so primitive that it is hardly better than random, and I went for 2.8 vs 3.2 simply because it is the ratio of 2 single-digit integers (7:8), which saves 2 characters compared to needing 2-digit integers.

One should also realize that uMax cannot recognize the Bishop pair, so the value for the Bishop also includes the averaged bonus for the fact that it might be one of a pair. This should be taken into account when comparing with the piece values in other programs that give a separate bonus for possession of the B-pair.
