The importance of contempt

Don · Post by **Don** » Sun Aug 28, 2011 11:06 pm

Komodo has a parameter that lets you set the draw score, often known as the contempt factor. If Komodo has contempt for its opponent, the draw score should be negative. The nomenclature "draw score" is based on a suggestion by Gian Carlo.

How important is it? In the Leiden tournament we were playing a significantly weaker opponent in an early round and nearly accepted a draw. The draw score was fixed at the time and was set at zero, a sensible value that assumes you don't know if your opponent is weaker or stronger. We watched helplessly, since the result was out of our hands, while Komodo seemed to be OK with a draw, feeling its position was just slightly inferior. As things turned out, Komodo's opponent avoided the draw; there were some repetitions, just not 3-fold. A very close call.

As a result of this we immediately added a contempt factor to the program, both to avoid this situation in the future and in anticipation of having to play Rybka and other strong programs in the same tournament, where we could use a negative contempt (or positive draw score).

But my question is "How important is contempt?" My feeling at the time was that it would be unlikely to make more than a couple of Elo difference, but I now think it's more important than I had imagined. In Komodo the contempt factor is applied to repetition and stalemate, but not to other draws.
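A minimal sketch of what applying the draw score only at repetition and stalemate nodes might look like inside a negamax search. This is not Komodo's code; `draw_value`, `Color`, and the parameter names are hypothetical. The one subtlety is that the draw score is stated from the engine's point of view, so it has to be negated at nodes where the opponent is to move.

```c
/* Hypothetical sketch, not Komodo's actual code. */
typedef enum { WHITE = 0, BLACK = 1 } Color;

/*
 * Value to return from a negamax node that is drawn by repetition or
 * stalemate. draw_score_cp is the worth of a draw to the engine itself,
 * in centipawns: negative against a weaker opponent (contempt), positive
 * against a stronger one. Negamax scores are relative to the side to
 * move, so the sign flips when the opponent is on move.
 */
int draw_value(Color side_to_move, Color engine_side, int draw_score_cp)
{
    return (side_to_move == engine_side) ? draw_score_cp : -draw_score_cp;
}
```

With `draw_score_cp = -30`, a repetition is worth -30 to the engine, so it will prefer any continuation it evaluates above -30, such as a slightly inferior -20, over repeating. With a draw score of 0 it would take the draw in that same position.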

In my study I am simulating a weaker opponent by handicapping this "weak" player with fixed node levels. "weak" is about 270 Elo weaker and plays a gauntlet against "strong" Komodo versions using various contempt factors. An opponent who is 270 Elo weaker is still a dangerous opponent who will score almost 20% against you. I will repeat this study with opponents who are closer and further apart in strength to get a sense of how to compute a contempt factor given an Elo difference. I think that can be computed with a straightforward formula of the kind used to convert a chess program's evaluation score to a winning percentage (or Elo number), and it would have to be calibrated for each program.
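For reference, the textbook Elo expectation gives a sense of the numbers involved. This is the standard logistic Elo curve, not a formula calibrated to Komodo, and it covers only the rating-to-score half of the conversion; the mapping from evaluation score to winning percentage would still have to be fitted per program. Here $\Delta$ is the rating advantage of the stronger player and $E$ is its expected score:

$$
E(\Delta) = \frac{1}{1 + 10^{-\Delta/400}}, \qquad E(270) = \frac{1}{1 + 10^{-0.675}} \approx 0.826
$$

So the weaker side is expected to score about 17.4%, in line with the "almost 20%" quoted above.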

The match is still in progress, but the early indications are that this is far more important than I had realized previously.

Has anyone else studied this issue in any serious detail? Here is the current ongoing result of my small study where I'm testing "drawscore" of 0, -10, -20 and -30.

Rank Name              Elo      +      -    games   score   oppo.   draws 
   1 kse-4290.00-30  3030.5   10.8   10.8    6831   85.7%  2733.1   15.9% 
   2 kse-4290.00-20  3016.5   10.5   10.5    6831   84.8%  2733.1   17.4% 
   3 kse-4290.00-10  3013.1   10.4   10.4    6832   84.6%  2733.1   18.1% 
   4 kse-4290.00-0   3000.0   10.2   10.2    6832   83.7%  2733.1   19.8% 
   5 weak            2733.1    5.2    5.2   27326   15.3%  3015.0   17.8%

rbarreira · Post by **rbarreira** » Mon Aug 29, 2011 1:00 am

Very interesting. According to that test the optimal value is probably even smaller than -30.

BubbaTough · Post by **BubbaTough** » Mon Aug 29, 2011 1:10 am

In my tests before releasing Hannibal, it looked like a contempt of 20-30 against stronger opponents was worth 20-30 Elo. I have not tested it since then. It should be noted though that Hannibal has a somewhat richer than average concept of what is "drawish", which interacts with contempt at all stages of the game, not just during a repetition or stalemate or such. I suspect that if all you are doing is changing the value of a draw, contempt becomes less valuable.

-Sam

Don · Post by **Don** » Mon Aug 29, 2011 2:05 am

rbarreira wrote: Very interesting. According to that test the optimal value is probably even smaller than -30.

Update:

Rank Name              Elo      +      -    games   score   oppo.   draws 
   1 kse-4290.00-30  3028.7    9.1    9.1    9528   85.5%  2733.9   16.1% 
   2 kse-4290.00-20  3022.3    9.0    9.0    9523   85.2%  2733.9   17.1% 
   3 kse-4290.00-40  3019.2    9.0    9.0    9523   84.7%  2733.9   16.5% 
   4 kse-4290.00-10  3011.0    8.8    8.8    9526   84.4%  2733.9   18.4% 
   5 kse-4290.00-0   3000.0    8.6    8.6    9525   83.7%  2733.9   19.9% 
   6 weak            2733.9    4.0    4.0   47625   15.3%  3016.2   17.6%

There is a very sharp drop going to -40, so it's possible that -30 already overshot the mark, or that the best value lies between -30 and -40.

Don · Post by **Don** » Mon Aug 29, 2011 2:31 am

BubbaTough wrote: In my tests before releasing Hannibal, it looked like a contempt of 20-30 against stronger opponents was worth 20-30 Elo. I have not tested it since then. It should be noted though that Hannibal has a somewhat richer than average concept of what is "drawish", which interacts with contempt at all stages of the game, not just during a repetition or stalemate or such. I suspect that if all you are doing is changing the value of a draw, contempt becomes less valuable.

-Sam

Do you mean contempt against weaker opponents or stronger opponents? You would use a negative contempt against stronger opponents. I am using the term "drawscore" in Komodo to avoid too much confusion.

I'm pretty sure the correct "drawscore" should be based on the strength of your opponent, and you just mention "stronger opponents" without saying how much stronger. You want to be willing to accept an inferior position to avoid a draw if you believe you are stronger than your opponent, but how inferior the position must be before you are willing to embrace a draw is going to be a function of how much your extra skill can overcome.

This really comes down to a game of "chicken" if you are in a pattern where both sides are repeating and one side must yield to avoid the draw. The player who wants to win the most (and thinks it is worth the risk to try) must yield.

Sometimes a repetition or possibility of it can be viewed as a threat to the other side and thus there is potential leverage - the other player may have to go out of his way to avoid the draw.

A friend of mine was playing an opponent about 300 Elo weaker at a tournament and purposely played an inferior move, for fear of ending up in a very drawish ending. So even though this was not a repetition, it was the threat of a draw that caused him to play differently.

Presumably you are talking about material trades when you talk of more than just repetition and stalemate?

bob · Post by **bob** » Mon Aug 29, 2011 2:55 am

I've had a variable draw score in Crafty for about 15 years now. I first had to add the "rating" command to xboard, and get Tim to make it standard. I then used that for several things. Book learning is an obvious one, as the quality of your opponent dictates what you should "learn" by playing an opening. Draw score is another: increase it against stronger opponents, reduce it against weaker opponents. I do use draw-score for more than just repetitions and 50-move draws; I use it in the eval when I notice drawish positions (like opposite colored bishops, where I drag the score toward draw-score as material comes off, etc.).
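A rough sketch of deriving the draw score from the two players' ratings, in the spirit of "increase it against stronger opponents, reduce it against weaker opponents". This is not Crafty's code: the linear mapping, the function name, and both tuning parameters are assumptions, and the slope would need to be calibrated per engine.

```c
/*
 * Hypothetical sketch, not Crafty's actual code.
 * Returns a draw score in centipawns from the engine's point of view:
 * positive when the opponent is rated higher, negative when lower.
 * cp_per_100_elo (slope) and max_abs_cp (clamp) are assumed tuning knobs.
 */
int draw_score_from_ratings(int own_rating, int opp_rating,
                            int cp_per_100_elo, int max_abs_cp)
{
    int draw_score = (opp_rating - own_rating) * cp_per_100_elo / 100;

    /* Keep the result within a sane range for extreme rating gaps. */
    if (draw_score > max_abs_cp)
        draw_score = max_abs_cp;
    if (draw_score < -max_abs_cp)
        draw_score = -max_abs_cp;
    return draw_score;
}
```

With an assumed slope of 10 centipawns per 100 Elo, a 270 Elo rating advantage gives a draw score of -27, and the same gap in the other direction gives +27.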

BubbaTough · Post by **BubbaTough** » Mon Aug 29, 2011 3:04 am

In my test, I was trying FOR a draw against stronger opposition. It was worth 20-30 Elo to rate a draw as worth 20-30 centipawns.

The game of repetition chicken is relevant, and for engines whose only use of contempt is to change the value of an actual draw (such as draw by repetition), the winner of the game of chicken is about the only effect. I don't know how much Elo that is worth, because it is a small subset of what Hannibal considers "drawish". Trades are in some cases a factor, as you suggest, though there are probably over a dozen things in there that affect how "drawish" I consider something. If it's an area your team plans to actively work on, I don't mind exchanging ideas/results on this offline, but only if you are working on it (as opposed to just curious).

-Sam

Don · Post by **Don** » Mon Aug 29, 2011 5:07 am

bob wrote: I do use draw-score for more than just repetitions and 50-move draws; I use it in the eval when I notice drawish positions (like opposite colored bishops, where I drag the score toward draw-score as material comes off, etc.).

We do a similar thing with opposite color bishops and several other endings: we multiply the score by some constant less than 1.0 for endings we consider drawish. We also assign scores to certain endings and reduce others by constant amounts to better reflect their actual winning chances. We could in principle modify all of these things to account for a higher or lower draw score.

For my specific test case it looks like just changing the draw score to -30 for rep, stalemate (and 50 move draw) gains at least 25 Elo against an opponent 270 Elo weaker or so. I assume that it would have a similar effect if you use +30 against an opponent 270 Elo stronger, but I have not tested that yet. I have already tested against an equal opponent, and even -10 contempt against an equal opponent hurts by 5 or more Elo, so this is more than just academic.

BubbaTough · Post by **BubbaTough** » Mon Aug 29, 2011 5:24 am

Here is what I do to factor all of these values into contempt.

score = ((score * (MAX_DRAW-ei.draw)) + (DrawValue[WHITE] * ei.draw))/MAX_DRAW;

Most of my changes affect a draw % score, though there are a few constant adjustments here and there that do not use contempt factors (those things I think of as adjusting who is winning more than chance of draw). I would guess those things are similar to your constant adjustments.
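For illustration, the one-line blend above can be wrapped in a self-contained function. `MAX_DRAW`, `ei.draw` and `DrawValue` come from that line; the value 100 for `MAX_DRAW`, the function name, and the flattened parameters are assumptions made for this sketch, not Hannibal's actual definitions.

```c
/* ASSUMPTION: Hannibal's real MAX_DRAW is not given; 100 is illustrative. */
#define MAX_DRAW 100

/*
 * Blend an evaluation with the draw value according to drawishness.
 * score      : evaluation in centipawns
 * drawish    : 0 (not drawish) .. MAX_DRAW (dead draw), stands in for ei.draw
 * draw_value : worth of a draw, stands in for DrawValue[WHITE]
 * drawish == 0 returns score unchanged; drawish == MAX_DRAW returns draw_value.
 */
int blend_with_draw(int score, int drawish, int draw_value)
{
    return (score * (MAX_DRAW - drawish) + draw_value * drawish) / MAX_DRAW;
}
```

With a draw value of -30, a half-drawish (`drawish = 50`) evaluation of +80 becomes +25 rather than the +40 that plain scaling toward zero would give, so drawish endings become less attractive as well as outright repetitions.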

For me, it makes a big difference in my tests, though I never remember to turn it on during tournaments and no rating lists support auto-contempt changes, so I guess the only use it has been is for my own satisfaction.

-Sam

Don · Post by **Don** » Mon Aug 29, 2011 5:29 am

The rating lists don't broadcast rating differences to the engines, but it's probably just as well. If BOTH programs use the appropriate contempt factor, I believe the effect would cancel out and you would get similar results to both programs setting it to zero. Setting it to zero handicaps both the stronger and the weaker program, so perhaps it comes out the same. I'm not sure they are handicapped equally, but I'll bet any difference is pretty small.
