Material imbalance evaluation

Alessandro Scotti

Material imbalance evaluation

Post by Alessandro Scotti »

Hi,
I think everybody knows Kaufman's article on evaluating material, but just in case:

http://mywebpages.comcast.net/danheisma ... alance.htm

So far I haven't had any luck adding those corrections in Hamsters, but I would like to go deeper on the subject. Is there any (possibly free) tool that can help me perform an analysis similar to Kaufman's on a game collection?
For example, I need to be able to select games with more or less equal material but where an imbalance is present for several moves (e.g. N vs. B, NPP vs. R and so on) and then to get win/draw/loss statistics on those games.
This data might be very valuable and these adjustments seem to be one of the "secrets" of Rybka... :wink:
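
A minimal sketch of the selection step described above, in Python. It assumes each game is already available as a list of FEN strings (one per position) plus a result string. The names `material_diff`, `has_stable_imbalance` and `tally` are made up for illustration and do not belong to any existing tool.

```python
from collections import Counter

PIECES = "PNBRQ"  # order used for every material tuple below


def material_diff(fen):
    """White-minus-black piece counts (P, N, B, R, Q) taken from a FEN string."""
    placement = fen.split()[0]          # first FEN field is the piece placement
    counts = Counter(placement)         # digits and '/' are counted too, but never looked up
    return tuple(counts[p] - counts[p.lower()] for p in PIECES)


def has_stable_imbalance(fens, target_diff, min_plies=6):
    """True if target_diff holds for at least min_plies consecutive positions."""
    run = 0
    for fen in fens:
        if material_diff(fen) == target_diff:
            run += 1
            if run >= min_plies:
                return True
        else:
            run = 0
    return False


def tally(games, target_diff, min_plies=6):
    """Count results ('1-0', '1/2-1/2', '0-1') over games showing the imbalance.

    games is an iterable of (list_of_fens, result) pairs.
    """
    stats = Counter()
    for fens, result in games:
        if has_stable_imbalance(fens, target_diff, min_plies):
            stats[result] += 1
    return stats


# White N vs. black B is (0, +1, -1, 0, 0); white NPP vs. black R is (+2, +1, 0, -1, 0).
N_VS_B = (0, 1, -1, 0, 0)
```

To catch the same imbalance with colors reversed, one would call `tally` a second time with every entry of the tuple negated and swap the meaning of '1-0' and '0-1'.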
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Material imbalance evaluation

Post by hgm »

I would be afraid that such an evaluation is very prone to systematic errors. For example, if you look at the win percentage of games where a Knight was sacrificed for two Pawns, you might conclude that the Pawns have a pretty good chance. But this would most likely be because those two Pawns were far better than average (like connected passers): decent players would never sacrifice a Knight for two lousy Pawns, and when faced with an inevitable loss they would opt for abandoning a Pawn rather than engaging in an N vs 2P swap. This would lead you to highly overestimate the value of Pawns compared to pieces.

So it seems you would have to do a fair amount of evaluation on the position of material imbalance to correct for other positional factors, like Pawn structure, good/bad Bishops, Rook positioning, as those factors are unlikely to average out to zero in positions selected from games.
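
The selection effect can be illustrated with a toy simulation. This is only a sketch under invented assumptions: the hidden positional compensation is drawn from a normal distribution, the material deficit of N for 2P is set to one pawn, the win expectancy is a logistic curve, and a "human" plays the sacrifice only when the compensation exceeds the deficit. None of these numbers come from real games.

```python
import math
import random


def expected_score(compensation, deficit):
    """Toy logistic win expectancy for the side that gave up the Knight."""
    return 1.0 / (1.0 + math.exp(-(compensation - deficit)))


def simulate(n=20000, deficit=1.0, seed=0):
    """Return (average score over all positions, average score over played sacs)."""
    rng = random.Random(seed)
    all_scores, played_scores = [], []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)          # hidden positional compensation, in pawns
        s = expected_score(c, deficit)
        all_scores.append(s)
        if c > deficit:                  # assumed human rule: only sac when it pays off
            played_scores.append(s)
    return (sum(all_scores) / len(all_scores),
            sum(played_scores) / len(played_scores))


random_rate, human_rate = simulate()
print(f"all positions: {random_rate:.3f}   sacs actually played: {human_rate:.3f}")
```

In this model the sacrifices that were actually played score above 50%, while the same trade averaged over all positions scores below 50%, which is the bias a game database would carry into the statistics.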
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Material imbalance evaluation

Post by mjlef »

SCID has these capabilities:

scid.sourceforge.net

You can specify the material and a minimum number of moves, and it will give winning percentages for each side, even grouped by player ratings. It also lets you search on piece position and such... I wish it knew how to determine passed pawns (you can do this with a complex list of all the squares an opponent pawn cannot be on, but it is not as flexible as I want)... and other things, but it is a good, free start.

Mark
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Material imbalance evaluation

Post by bob »

I completely agree. You need a _random_ sample of games played where an N for PP trade occurred. Not just a sample of games humans played where the players were good and they only played the sac when they had a strong positional edge after doing so.
Alessandro Scotti

Re: Material imbalance evaluation

Post by Alessandro Scotti »

Thanks Mark,
it seems SCID will do the work just fine! :-)
bob wrote:I completely agree. You need a _random_ sample of games played where an N for PP trade occurred. Not just a sample of games humans played where the players were good and they only played the sac when they had a strong positional edge after doing so.
I think if the two players are more or less matched, imbalances are kind of "accepted" by both. It's not like one decides to create an advantage and the other just welcomes it. But even when the imbalance is forced it can be useful to take a look at it, because it might be possible to detect related patterns.
Also, besides NPP vs. R and similar cases that are probably more difficult to evaluate properly, there are many N vs. B situations where evaluation adjustments can be helpful.
IMO this is at least worth trying, especially if there are good tools to extract the statistics, which is the worst part.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Material imbalance evaluation

Post by bob »

The problem is that the _games_ were played by humans, and most likely, most of the time the "sacs" are good. But a program is going to apply that bit of knowledge all over the tree where most of the sacs are awful.

Classic is N for 2-3 pawns. Generally this loses for the side giving up the knight, but if you look at human games, most work out successfully because they don't do it unless it is pretty solid.

I do just the opposite, which is also bad, in that I consider such trades to always be bad, which is also wrong. But it is less wrong than always making those kinds of trades.
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Material imbalance evaluation

Post by hgm »

I just did an interesting experiment in uMax concerning piece values. The standard version always used the 'classical' values 1,3,3,5,9. As uMax fails to correct for piece-square points of the captured piece, this makes trades of B vs N, B or N vs 3P, 2B vs R+P completely equal, and therefore often played. In many cases these trades are immediately fatal, however.

I tried to remedy this by taking the values 0.8, 2.8, 3.2, 5.2, 9.6, thinking that reducing the inclination to swap B for N would give an improvement. But in 1000 games of self-play, the result was an insignificant 50.6%.

The problem was, however, that the Rooks were overestimated, as B+N for R+P was still neutral, and 2N for R+P even favored. When I reduced to R=4.8 and Q=9.2, the self-play result against the canonical values jumped to 55.5% over 1000 games.
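
The trade arithmetic for the three value sets can be checked with a few lines of Python. This is just a worked calculation, not uMax code; values are stored in tenths of a pawn to keep the sums exact, and the labels "first try" and "revised" are made up here.

```python
# Piece values in tenths of a pawn.
VALUE_SETS = {
    "classical": dict(P=10, N=30, B=30, R=50, Q=90),
    "first try": dict(P=8, N=28, B=32, R=52, Q=96),
    "revised":   dict(P=8, N=28, B=32, R=48, Q=92),
}

# (material given up, material received)
TRADES = [("B", "N"), ("N", "PPP"), ("BB", "RP"), ("BN", "RP"), ("NN", "RP")]


def gain(values, given, received):
    """Material gain, in tenths of a pawn, for the side making the trade."""
    return sum(values[p] for p in received) - sum(values[p] for p in given)


def trade_table():
    """Gain of every trade in TRADES under every value set."""
    return {name: [gain(v, g, r) for g, r in TRADES]
            for name, v in VALUE_SETS.items()}


for name, gains in trade_table().items():
    print(f"{name:10s}", gains)
# classical  [0, 0, 0, 0, 0]
# first try  [-4, -4, -4, 0, 4]
# revised    [-4, -4, -8, -4, 0]
```

A zero means the trade looks neutral to the engine and a positive number means it is actively sought. Only the revised set removes every positive entry, and it leaves 2N for R+P as the single neutral trade.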

So the effect of encouraging wrong trades can be really big, as they often result in immediate loss of the game. So when in doubt, better not attempt them. I would trust such an experimental determination of the piece values much more than analysis of game positions.
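
To put the two match results in perspective, here is a rough significance check. It assumes independent games and uses 0.25 as the per-game variance of the score, which is the upper bound reached when there are no draws. Draws only lower the variance, so the numbers below understate the significance somewhat. The function name `z_score` is made up for this sketch.

```python
import math


def z_score(score, games, per_game_variance=0.25):
    """Number of standard errors by which a match score differs from 50%."""
    standard_error = math.sqrt(per_game_variance / games)
    return (score - 0.5) / standard_error


z_first = z_score(0.506, 1000)    # 0.8, 2.8, 3.2, 5.2, 9.6 vs. classical
z_revised = z_score(0.555, 1000)  # with R=4.8 and Q=9.2 vs. classical
print(f"{z_first:.2f} {z_revised:.2f}")
```

The first result is well within one standard error of 50%, while the second is more than three standard errors away.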
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Material imbalance evaluation

Post by Uri Blass »

I believe that you still overestimate the difference between bishop and knight.

If I understand correctly, you now have
0.8, 2.8, 3.2, 4.8, 9.2

This gives a difference of 1/2 pawn between bishop and knight (0.4 against a pawn value of 0.8).
I think it may be better to have
0.8, 2.9, 3.1, 4.8, 9.2

Uri
Bill Rogers
Posts: 3562
Joined: Thu Mar 09, 2006 3:54 am
Location: San Jose, California

Re: Material imbalance evaluation

Post by Bill Rogers »

Hey HGM,
Years ago I did some theoretical testing on piece values.
The first thing I did was to create a table for each man, then I gave one point for each square it can attack, thus a pawn gets two points, etc.
Second, I gave an extra point for each man that could attack both colors, with the exception of the bishops and the king. The king because it is not a good attacking piece under most circumstances.
So I arrived at the following values:
Pawn 3 : Knight 9 : Bishop 13 : Rook 15 : Queen 28 : King 7
When all of the above are divided by 3 you get pretty close to what they were predicted to be in the first place, and they resemble your numbers to some extent:
pawn = 1, knight = 3, bishop = 4.3, rook = 5, queen = 9.3, king = 2.3
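
The normalisation in the last step is just a division by the pawn's raw score. As a quick check in Python, taking the raw points exactly as stated above (the dictionary names are only for illustration):

```python
# Raw points as given in the post; the king figure is taken as stated.
raw = {"pawn": 3, "knight": 9, "bishop": 13, "rook": 15, "queen": 28, "king": 7}

# Express every piece in pawn units, rounded to one decimal.
normalised = {piece: round(points / raw["pawn"], 1) for piece, points in raw.items()}
print(normalised)
```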
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Material imbalance evaluation

Post by hgm »

The most advanced theoretical considerations on ab-initio piece-value determination I have seen were by Ralph Betza. Apart from mobility, he also defined concepts such as 'forwardness' that seem to be important for piece strength. E.g. a piece that does only one diagonal step (Ferz) and a piece that does only one orthogonal step (Wazir) both have a mobility of 4. But the Ferz is more handicapped at the board edge, and averaged over all squares the mobility of the Wazir is 3.5, and of the Ferz 3.06. In addition, the Ferz is color-bound, like the Bishop. Yet in games the Ferz turns out to be the stronger piece of the two! (I did not believe this, of course, but I tested it by pitting opponents equipped with 8 Ferzes and 8 Wazirs instead of Pawns against each other, and the Ferzes indeed scored over 60%.) This can be explained from the fact that a Ferz has two moves that go forward, while a Wazir has only one.

Other properties, like the concentration of the moves, can also be very important. This is also expressed in the mating potential of pieces. For this reason your counting method strongly underestimates the King. A King totally dominates a Knight in end-games if you forget about the royal aspect (a non-royal piece moving as King is known as a Commoner or Man). Of course the piece with the better focused set of target squares is hindered less by the board edge (K has 6.56 moves, N only 5.25 on average), but that effect is not the main reason. KMK is a won end-game even on quite large boards, while KNK is always a draw, and on larger boards the mobility of both approaches 8. Also in combinations with other pieces the Man is usually stronger than the Knight. (Interestingly, KMKM is also won very often for one side, despite the material being even!)
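
The average-mobility figures for these leapers are easy to recompute by counting, for every square of an empty board, how many steps stay on the board. The Python below is a small sketch of that count; the names `average_mobility` and `forward_moves` are made up for illustration.

```python
WAZIR = [(1, 0), (-1, 0), (0, 1), (0, -1)]        # one orthogonal step
FERZ = [(1, 1), (1, -1), (-1, 1), (-1, -1)]       # one diagonal step
KNIGHT = [(1, 2), (2, 1), (-1, 2), (-2, 1),
          (1, -2), (2, -1), (-1, -2), (-2, -1)]
MAN = WAZIR + FERZ                                # moves like a King, but not royal


def average_mobility(steps, size=8):
    """Mean number of on-board target squares over all squares of an empty board."""
    total = 0
    for file in range(size):
        for rank in range(size):
            total += sum(0 <= file + df < size and 0 <= rank + dr < size
                         for df, dr in steps)
    return total / (size * size)


def forward_moves(steps):
    """Number of steps that advance at least one rank."""
    return sum(1 for _, dr in steps if dr > 0)


for name, steps in [("Wazir", WAZIR), ("Ferz", FERZ), ("Man", MAN), ("Knight", KNIGHT)]:
    print(name, average_mobility(steps), forward_moves(steps))
```

On an 8x8 board this gives 3.5 for the Wazir, 3.0625 for the Ferz, 6.5625 for the Man and 5.25 for the Knight, with one forward move for the Wazir and two for the Ferz. Calling `average_mobility(MAN, size=100)` and `average_mobility(KNIGHT, size=100)` shows both values creeping towards 8 as the board grows.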

I think that Uri is correct in that his values for N and B would be even better. The differences are minor, though, as they encourage and discourage the same trades. So only in combination with positional factors would they move the point where the program sees enough compensation for a bad trade (sacrifice). On uMax such subtleties are wasted, as its positional scoring is so primitive that it is hardly better than random, and I went for 2.8 vs 3.2 simply because it is the ratio of 2 single-digit integers (7:8), which saves 2 characters compared to needing 2-digit integers. :wink: One should also realize that uMax cannot recognize the Bishop pair, so the value for the Bishop also includes the averaged bonus for the fact that it might be one of a pair. This should be taken into account when comparing with the piece values in other programs that give a separate bonus for possession of the B-pair.