Pawn Advantage, Win Percentage, and Elo

Adam Hair · Post by **Adam Hair** » Sun Apr 15, 2012 8:29 pm

Approximately 4 years ago, Pradu Kannan and Sune Fischer conducted a study to determine the relationship between pawn advantage, win percentage, and Elo (can be found at the Chess Programming Wiki). As a side product of some data mining I am doing, I can offer some additional confirmation of their findings.

I collected over 900,000 long time control engine vs engine games. Then, using Norm Pollock's utilities, I filtered the games so that each opponent was greater than 2700 Elo (CCRL scale) and, for each engine pairing, the two opponents were within 50 Elo of each other. This produced a database of 343,789 games to study.

Using ChessDB (a branch of Scid by Dr. David Kirby), I determined the winning percentage for each whole integer increment of material advantage. The material advantage of each position was determined using the following piece values: P = 1, N = B = 3, R = 5, Q = 9. Only moves after the 24th ply were considered (to partially avoid influence from opening books), and each material advantage had to exist for at least 6 ply. The winning percentage when the material was even was set to 50%, based on the assumption that the deviation from 50% in that case (White scored 54.7%) is entirely due to White's advantage. For subsequent material advantages, the winning percentages for White and for Black were used in the following formula to produce a centered (my term) winning percentage: winning percentage = W% - avg(W%, B%) + 50%. This formula is used to mitigate the influence of White's advantage. For example, the win percentage when White was ahead 2 pawns worth of material was 79.8%; for Black, the percentage was 25.3%. The resulting percentage used was 79.8% - ((79.8% + 25.3%)/2) + 50% = 77.3%.
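The material count described above can be sketched in a few lines of Python. This is only an illustration, not what ChessDB does internally: the function name `material_advantage` and the use of a FEN string as input are assumptions made for the example, and the 24-ply and 6-ply conditions are not modeled.

```python
# Piece values from the post: P = 1, N = B = 3, R = 5, Q = 9.
# Kings are not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_advantage(fen: str) -> int:
    """Return White's material minus Black's material, in pawn units.

    `fen` is a FEN string; only its first field (piece placement) is read.
    Uppercase letters are White pieces, lowercase letters are Black pieces.
    """
    placement = fen.split()[0]
    balance = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits (empty squares), '/' separators, and kings
        balance += value if ch.isupper() else -value
    return balance


# Starting position with two Black pawns removed: White is ahead by 2.
example = "rnbqkbnr/pppppp2/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_advantage(example))  # 2
```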

*Notes:

1) Material advantage can be thought of as the same as pawn advantage, since the value of a pawn is equal to the basic unit of material advantage.

2) The winning percentages are actually White's winning percentage. Thus, 25.3% was White's score when Black had a 2 pawn advantage.

3) 100% - winning percentage was used as the win percentage for negative material advantages. Thus, when the material advantage was -2 pawns, the winning percentage used was (100% - 77.25%) = 22.75%, or about 22.8% (77.25% being the unrounded centered value for +2 pawns).
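As a check on the arithmetic, here is a small Python sketch of the centering step using the 2-pawn figures quoted above. The function name `centered` is an assumption for the example. Note that the formula simplifies to 50% + (W% - B%)/2, so it keeps half of the gap between the two raw percentages and places it symmetrically around 50%.

```python
def centered(white_ahead: float, black_ahead: float) -> float:
    """Centered White score (in percent) for a material advantage of +n.

    white_ahead: White's score when White is ahead by n.
    black_ahead: White's score when Black is ahead by n.
    """
    return white_ahead - (white_ahead + black_ahead) / 2 + 50.0


plus_two = centered(79.8, 25.3)   # White ahead by 2 pawns
minus_two = 100.0 - plus_two      # note 3: mirror for -2 pawns
print(f"{plus_two:.2f} {minus_two:.2f}")  # 77.25 22.75
```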

I plotted the resulting data and compared it to the following logistic model: Win Percentage = 1 / ( 1 + 10 ^(-Pawn advantage/4))

The adjusted R squared value for this fit is 0.9983 and the root mean square error is 0.0156.
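For readers who want to tabulate the model, here is a minimal Python sketch. It is written with a negative exponent so that a positive pawn advantage gives a score above 50%, which is the form that reproduces the roughly 64% score (14% excess) for one pawn discussed later in the thread. The function name `win_percentage` and the `scale` parameter are assumptions for the example.

```python
def win_percentage(pawn_advantage: float, scale: float = 4.0) -> float:
    """Logistic model: expected score (0..1) for a given pawn advantage."""
    return 1.0 / (1.0 + 10.0 ** (-pawn_advantage / scale))


# Tabulate the model for whole-pawn advantages from -3 to +3.
for p in range(-3, 4):
    print(f"{p:+d} pawns -> {100 * win_percentage(p):.1f}%")
```

The model is symmetric: the score for -n pawns is 100% minus the score for +n pawns, which matches how the negative advantages were handled in the data.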

That model is a very good fit for the data. And when we compare this model to the model used to predict the expected score of a match based on Elo difference ( Expected score = 1 / ( 1 + 10 ^ (-Elo Difference/400)) ), we can interpret each point of Elo difference to be analogous to a centipawn. One should keep in mind that this is an average over all games where an X centipawn material advantage persisted for 6 plies. For a specific position, the true value of a particular material advantage depends on the number and arrangement of the pieces on the board.
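The centipawn analogy follows from setting the two models equal. Writing P for the pawn advantage and Δ for the Elo difference (both symbols are introduced here only for the derivation):

```latex
\[
\frac{1}{1 + 10^{-P/4}} = \frac{1}{1 + 10^{-\Delta/400}}
\quad\Longleftrightarrow\quad
\frac{P}{4} = \frac{\Delta}{400}
\quad\Longleftrightarrow\quad
\Delta = 100\,P .
\]
```

So under this model one pawn corresponds to about 100 Elo, and one centipawn to about one Elo point, on average.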

I should report that many of these games were adjudicated early (winner declared by GUI when score is - X centipawns for Y consecutive moves). Presumably, this would have no different effect than if the losing engine resigned. In reality, given that not all engines use table bases and might make a mistake (thus causing a won game to be drawn), there probably is some effect.

One thing that I have not studied or given thought to yet is exactly why my data, using the relative piece values I gave above, and Pradu's and Sune's data, using P=1, N=4, B=4.1, R=6, Q=12, should agree so well. I will think about this, and welcome anybody else to give a reason.

Ajedrecista · Post by **Ajedrecista** » Sun Apr 15, 2012 9:16 pm

Hi Adam!

This study seems really interesting, although the pawn advantage (I understand it as the eval of an engine, but I can be wrong because I am only a mere aficionado of computer chess) could be a little arbitrary, depending on the engine you choose.

I reply to your post because I remember that there was a similar topic in Rybka Forum long ago:

Rybka eval scores vs winning % (actually Elo)

That topic was started by Vasik Rajlich in August of 2008. I do not know if conclusions (if any) are similar to yours... I am doing mainly a crossposting and nothing more.

Thanks for your study. Also thanks to the guys that programmed those utilities... they must be useful in the appropriate hands (not me)!

Regards from Spain.

Ajedrecista.

hgm · Post by **hgm** » Sun Apr 15, 2012 10:14 pm

Very interesting. What strikes me is that your Pawn advantage is rather small (~12% excess score). When I played Pawn-odds games the advantage was more in the neighborhood of +18%. The latter seemed quite independent of quality of play (achieved by changing TC), although I never got anywhere near a quality of 2700 Elo. The number was consistent, though, with reports on Pawn-odds self-play from Rybka in 1-min games, which gave +22% (but I think that included the white advantage).

I wonder what causes the difference. I can think of several reasons:
* The Pawn-odds results were all self-play, which might magnify the result
* The total eval advantage could systematically be lower than the material advantage in the selected games, because engines do not give up material for free, and are likely to wrestle some compensation out of the loss.
* The games you selected are so high quality that the draw rate is significantly higher

Don · Post by **Don** » Mon Apr 16, 2012 1:58 am

Nice Study Adam,

We have found that Komodo and virtually any other program fits that curve nicely, assuming you scale the score specifically to the program in question. A pawn advantage means something a little different in each program.

I think temporal difference learning also makes use of that formula (or its inverse) in order to propagate win/loss/draw results to actual scores.
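The inverse mentioned here can be written down directly. The sketch below is only an illustration of inverting the logistic curve, not a description of how any particular temporal difference implementation works. The function names and the `scale` parameter (4 in the model from this thread, but engine-specific according to the point about scaling) are assumptions for the example.

```python
import math


def win_percentage(pawn_advantage: float, scale: float = 4.0) -> float:
    """Logistic model: expected score (0..1) for a given pawn advantage."""
    return 1.0 / (1.0 + 10.0 ** (-pawn_advantage / scale))


def pawn_advantage_from_score(score: float, scale: float = 4.0) -> float:
    """Inverse of the logistic model: pawn advantage implied by a score."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return scale * math.log10(score / (1.0 - score))


# Round trip: a 1.5 pawn advantage maps to a score and back again.
score = win_percentage(1.5)
print(round(pawn_advantage_from_score(score), 6))  # 1.5
```

Scores of exactly 0 or 1 have no finite inverse, which is why the sketch rejects them.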


Adam Hair · Post by **Adam Hair** » Mon Apr 16, 2012 3:51 am

Ajedrecista wrote:Hi Adam!

This study seems really interesting, although the pawn advantage (I understand it as the eval of an engine, but I can be wrong because I am only a mere aficionado of computer chess) could be a little arbitrary, depending on the engine you choose.

Hi Jesús. I am fairly certain that what you say is true. Some engines may avoid a material advantage in a certain situation because they recognize it may lead to a drawn/lost situation. Other engines may fall into that trap.

Ajedrecista wrote: I reply to your post because I remember that there was a similar topic in Rybka Forum long ago:

Rybka eval scores vs winning % (actually Elo)

That topic was started by Vasik Rajlich in August of 2008. I do not know if conclusions (if any) are similar to yours... I am doing mainly a crossposting and nothing more.

Thanks for the link. I have seen references to that thread, but I had not read it before now.

Ajedrecista wrote: Thanks for your study. Also thanks to the guys that programmed those utilities... they must be useful in the appropriate hands (not me)!

Regards from Spain.

Ajedrecista.

Those utilities are a great help to a non-programmer such as myself.

Adam Hair · Post by **Adam Hair** » Mon Apr 16, 2012 5:05 am

hgm wrote:Very interesting. What strikes me is that your Pawn advantage is rather small (~12% excess score). When I played Pawn-odds games the advantage was more in the neighborhood of +18%. The latter seemed quite independent of quality of play (achieved by changing TC), although I never got anywhere near a quality of 2700 Elo. The number was consistent, though, with reports on Pawn-odds self-play from Rybka in 1-min games, which gave +22% (but I think that included the white advantage).

I wonder what causes the difference. I can think of several reasons:
* The Pawn-odds results were all self-play, which might magnify the result
* The total eval advantage could systematically be lower than the material advantage in the selected games, because engines do not give up material for free, and are likely to wrestle some compensation out of the loss.
* The games you selected are so high quality that the draw rate is significantly higher

Your first two reasons seem to be likely, especially the second reason.

In response to your third reason, I extracted the same information from the CCRL 40/4 database. The only filter that I applied was to choose games where the opponents were within 50 Elo of each other. Here is a plot of the data with the logistic model from my previous post:

As you can see, the curve does not model this data quite as well as it did the previous data. In addition, as I was accumulating the data, I could see that the results for engines rated 2700 Elo or greater were similar to my previous data, but the results for the lower rated engines (with lower draw rates) were different. Here is a graph of the data with the best fit logistic curve (Win Percentage = 1/(1 + 10^(-(1.158*Pawn Advantage/4))) ):

The prediction from this model for one pawn is ~16.1% excess (14% for the first model), which is more in line with your findings. However, the excess win percentage for a 1 pawn advantage from this data is ~13.5%, as opposed to your data. I believe that this is due to reason 2, by way of influence from the opening books. The opening books attempt to give a variety of even positions for the start of matches. Presumably, a pawn may be given up early in these matches, likely with some positional compensation. Out of ~290,000 games, there were ~65,000 games where there was a 1 pawn advantage (sustained for 6 plies) between moves 13 and 25. The centered winning percentage was 55.4%, much lower than the average for all games with a 1 pawn advantage (sustained for 6 plies). I guess that reason 1 applies also. One would expect that an engine would more likely be able to translate a 1 pawn advantage into a win in self-play.
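The two one-pawn predictions quoted above can be reproduced with a short Python sketch. The function name `win_percentage` and the parameter `k` (the multiplier on the pawn advantage, 1 for the first model and 1.158 for the 40/4 best fit) are assumptions for the example, and the exponent is written with a negative sign so that an advantage gives a score above 50%.

```python
def win_percentage(pawn_advantage: float, k: float = 1.0) -> float:
    """Logistic model with the pawn advantage scaled by k."""
    return 1.0 / (1.0 + 10.0 ** (-k * pawn_advantage / 4.0))


# Excess score over 50% for a one-pawn advantage under each model.
for label, k in (("first model", 1.0), ("40/4 best fit", 1.158)):
    excess = 100.0 * (win_percentage(1.0, k) - 0.5)
    print(f"{label}: {excess:.1f}% excess for one pawn")
# first model: 14.0% excess for one pawn
# 40/4 best fit: 16.1% excess for one pawn
```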

Adam Hair · Post by **Adam Hair** » Mon Apr 16, 2012 5:13 am

Don wrote:Nice Study Adam,

We have found that Komodo and virtually any other program fits that curve nicely, assuming you scale the score specifically to the program in question. A pawn advantage means something a little different in each program.

I think temporal difference learning also makes use of that formula (or its inverse) in order to propagate win/loss/draw results to actual scores.

Thanks, Don.

I believe I have seen evidence of the scaling you refer to, though I have not studied individual engines. Comparing the high-rated, long time control games to all of the CCRL blitz games showed that the pawn advantage must be scaled differently to more accurately model the results from different databases.

Adam Hair · Post by **Adam Hair** » Mon Apr 16, 2012 5:21 am

Ajedrecista wrote:Thanks for your study. Also thanks to the guys that programmed those utilities... they must be useful in the appropriate hands (not me)!

Regards from Spain.

Ajedrecista.

I know that you can do regressions with a calculator. I can too, but I recommend that you look at this site:
http://zunzun.com/

There are a multitude of different regression models and methods available there, along with the ability to make graphs (as you can see in my posts). It allows me to spend much more time analyzing results as opposed to calculating. The older I get, the fewer calculations I wish to do.

Adam
