Stockfish contempt factor

Laskos · Post by **Laskos** » Tue Mar 10, 2015 6:14 pm

I did not pay much attention to this issue, and I don't know who pushed it or exactly what it does. But it seems quite beneficial for Stockfish in rating lists, provided it stays ahead of the competition, which seems plausible. Maybe official releases would be better off with a positive contempt, even though it does not pass Fishtest?

I played ultra-bullet games to check it, 10,000 games per data point, with a Stockfish build from 2 March. There were three different opponents: Texel 1.05, often at the level of the lowest-rated opponents in rating lists; Komodo 8, usually the highest rated; and Stockfish itself, to see whether it can pass Fishtest and how much it deteriorates in self-play. In the tables, the first column is the contempt value and the second is the Elo difference.

SF vs Texel
Contempt   Elo
     0     388
    20     396
    40     398
    60     408  ---
    80     402
   100     398

SF vs Komodo
Contempt   Elo
     0      56
    20      59  ---
    40      58
    60      56
    80      47

SF vs SF (Contempt=0)
Contempt   Elo
     0       0  ---
    20      -3
    40      -8
    60     -11
    80     -20

I guess that for rating lists not only Contempt=20 is beneficial: Contempt=40 or 60 might even be rated highest. In the optimal cases (marked with "---") the rating difference increases by some 5%. This compares favorably with Houdini 4's contempt scheme:

H4 vs Texel
Contempt   Elo
     0     316
     1     321
     2     326  ---

H4 vs H4 (Contempt=0)
Contempt   Elo
     0       0  ---
     1      -3
     2      -7
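The "some 5%" figure can be checked directly from the tables. The short Python sketch below only transcribes the measured Elo values from this post and reports, for each match, the best setting and its gain over contempt 0; the function names are illustrative. Keep in mind that with 10,000 games per point each value carries an error bar of several Elo, so the "best" setting is indicative only.

```python
# Elo differences transcribed from the tables above: {contempt: Elo}.
RESULTS = {
    "SF vs Texel": {0: 388, 20: 396, 40: 398, 60: 408, 80: 402, 100: 398},
    "SF vs Komodo 8": {0: 56, 20: 59, 40: 58, 60: 56, 80: 47},
    "SF vs SF (Con=0)": {0: 0, 20: -3, 40: -8, 60: -11, 80: -20},
    "H4 vs Texel": {0: 316, 1: 321, 2: 326},
    "H4 vs H4 (Con=0)": {0: 0, 1: -3, 2: -7},
}


def best_setting(table):
    """Return (contempt, elo) for the highest measured Elo difference."""
    return max(table.items(), key=lambda item: item[1])


def summarize(results):
    """Map each match to (best contempt, gain over contempt 0, relative gain)."""
    summary = {}
    for match, table in results.items():
        contempt, elo = best_setting(table)
        gain = elo - table[0]
        # A relative gain is meaningless in self-play, where the baseline is 0.
        rel = gain / table[0] if table[0] else None
        summary[match] = (contempt, gain, rel)
    return summary


if __name__ == "__main__":
    for match, (contempt, gain, rel) in summarize(RESULTS).items():
        rel_txt = f"{rel:.1%}" if rel is not None else "n/a"
        print(f"{match:18s} best={contempt:3d} gain={gain:+d} Elo ({rel_txt})")
```

Against Texel this gives +20 Elo at Contempt=60 (about 5.2%) and against Komodo +3 Elo at Contempt=20 (about 5.4%), compared with +10 Elo (about 3.2%) for Houdini 4 at contempt 2.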

JJJ · Post by **JJJ** » Tue Mar 10, 2015 6:40 pm

And you can also test Komodo with drawscore.

Gurcan Uckardes · Post by **Gurcan Uckardes** » Tue Mar 10, 2015 7:06 pm

The results are close enough to make us wonder what would happen at longer time controls. Maybe the picture changes completely.

bob · Post by **bob** » Tue Mar 10, 2015 7:10 pm

Laskos wrote: I did not pay much attention to this issue, and don't know who pushed it and what it does. But it seems quite beneficial for Stockfish in rating lists if it stays ahead of the competition, which seems plausible. Maybe official releases would better have positive contempt, although it is not passing Fishtest?

As you've noticed, I don't think self-play will work at all here. You are telling one "version" (which is equal to the other) to give up a little to avoid a draw when possible, so that side is always going to be giving away a few evaluation points in every game. And that has to affect things negatively. If you are really better than your opponent, then a positive contempt would be good since you would try a little harder to avoid draws, extend the game, and outplay him.

This just can't work in self-testing, however, because the two programs are equal. It should work equally poorly against two different programs of exactly the same strength, but those are harder to find.

I have experimented here and there with SPRT testing for Crafty, and I am not a big fan. I have seen too many cases where an SPRT with bounds of -1.5, +4.5 Elo fails low and rejects the change, yet a normal gauntlet shows the change as a positive Elo gain.

I now use SPRT for quick sanity tests to see if a major change works or breaks.
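bob's experience with (-1.5, +4.5) bounds, and the (-3.0, +1.0) SPRT mentioned later in the thread, are easier to reason about with a small calculator. This is a minimal Python sketch, assuming the plain logistic Elo scale and a normal approximation of the average match score; Fishtest's and Crafty's actual tooling may use a different Elo scale and formulas, and all function names here are illustrative. The win/draw/loss counts in the usage section are hypothetical, not data from this thread.

```python
import math


def elo_to_score(elo):
    """Expected score for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins, draws, losses, elo0, elo1):
    """Approximate log-likelihood ratio of H1 (Elo = elo1) vs H0 (Elo = elo0).

    Normal approximation: the average score is treated as Gaussian with the
    per-game variance observed in the win/draw/loss sample.
    """
    n = wins + draws + losses
    if n == 0:
        return 0.0
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (each game is worth 1, 0.5 or 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    if var == 0.0:
        return 0.0
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var)


def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's stopping bounds (lower, upper) for the LLR."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def sprt_decision(llr, alpha=0.05, beta=0.05):
    lower, upper = sprt_bounds(alpha, beta)
    if llr <= lower:
        return "accept H0 (reject the change)"
    if llr >= upper:
        return "accept H1 (keep the change)"
    return "continue"


if __name__ == "__main__":
    w, d, l = 2000, 6000, 2000  # hypothetical interim result with an even score
    for elo0, elo1 in [(-1.5, 4.5), (-3.0, 1.0)]:
        llr = sprt_llr(w, d, l, elo0, elo1)
        print(f"bounds ({elo0}, {elo1}): LLR = {llr:.2f} -> {sprt_decision(llr)}")
```

With an even score the LLR is negative under (-1.5, +4.5), because those bounds are centred on +1.5 Elo, and positive under (-3.0, +1.0), which are centred on -1 Elo. A change worth about 0 Elo therefore drifts toward rejection with the first pair and toward acceptance with the second.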

bob · Post by **bob** » Tue Mar 10, 2015 7:12 pm

JJJ wrote: And you can also test Komodo with drawscore.

That should be what "contempt factor" is. If you think you are better than your opponent, set the draw score to -20 or so; you will then accept a -10 position rather than allow that -20 draw, hoping that if you hold on and keep the game going, you will be able to outplay your opponent.
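The trade-off bob describes can be shown with a toy root-move choice. This is a minimal Python sketch, assuming contempt is applied as a fixed centipawn offset to the value of drawn positions (a Komodo-style draw score being its negation, as bob describes it); it is not any engine's actual code, and the move labels and function names are hypothetical.

```python
def draw_value(contempt_cp, engine_to_move=True):
    """Value of a drawn position, in centipawns, for the side to move.

    Hypothetical model: with positive contempt the engine scores a draw as
    -contempt for itself and, symmetrically, +contempt for its opponent.
    A Komodo-style draw score would correspond to -contempt_cp.
    """
    return -contempt_cp if engine_to_move else contempt_cp


def choose_root_move(candidates, contempt_cp):
    """Pick the root move with the best value.

    `candidates` maps a move label to either a centipawn score or the
    string "draw" (for example a line that allows a repetition).
    """
    def value(result):
        return draw_value(contempt_cp) if result == "draw" else result

    return max(candidates, key=lambda move: value(candidates[move]))


if __name__ == "__main__":
    moves = {"allow_repetition": "draw", "play_on": -10}
    print(choose_root_move(moves, contempt_cp=0))   # -> allow_repetition
    print(choose_root_move(moves, contempt_cp=20))  # -> play_on
```

With contempt 0 the draw (0) beats the -10 line; with contempt 20 the draw counts as -20, so the engine keeps the game going at -10.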

Laskos · Post by **Laskos** » Tue Mar 10, 2015 9:28 pm

bob wrote:
I have played around here and there with some SPRT testing with Crafty. Not a big fan. I have seen too many cases where -1.5, +4.5 will fail low and reject the change, yet a normal gauntlet will show the change as a positive Elo gain.

I now use SPRT for quick sanity tests to see if a major change works or breaks.

I was not hoping for a positive Elo gain in self-play, but a loss of around 1 Elo point could pass a (-3.0, +1.0) SPRT. I would use SPRT whenever I can, but, for example, stopping with alpha = beta = 0.05 when the "real" difference is -3 Elo points rather than 0 would take some 20,000-40,000 games, a test too long to perform. Also, the stopping bounds, expressed as an Elo gap, would be poorly chosen here, so I limited myself to fixed 10,000-game matches, although differences of 3 Elo points are still out of reach there and are just for guidance.
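The claim that 3-Elo differences are out of reach at 10,000 games can be checked with the usual normal approximation. This is a sketch under stated assumptions: independent games, a draw rate that you must supply (the thread does not report one), and the logistic Elo model; the function names are illustrative.

```python
import math


def elo_error_95(n_games, draw_rate, elo=0.0):
    """Approximate 95% half-width, in Elo, of a single match result.

    Assumes independent games, the given draw rate, and a true Elo
    difference near `elo`; uses the delta method on the logistic curve.
    """
    score = 1.0 / (1.0 + 10.0 ** (-elo / 400.0))
    win = score - 0.5 * draw_rate
    loss = 1.0 - win - draw_rate
    if win < 0.0 or loss < 0.0:
        raise ValueError("draw rate is incompatible with this Elo difference")
    # Per-game variance of the score (1, 0.5 or 0 points).
    var = win * (1.0 - score) ** 2 + draw_rate * (0.5 - score) ** 2 + loss * score ** 2
    sd_score = math.sqrt(var / n_games)
    # dElo/dScore for Elo = -400 * log10(1/score - 1).
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return 1.96 * slope * sd_score


def difference_error_95(n_games, draw_rate, elo=0.0):
    """Half-width for the difference between two independent data points."""
    return math.sqrt(2.0) * elo_error_95(n_games, draw_rate, elo)


if __name__ == "__main__":
    for d in (0.0, 0.3, 0.5):  # hypothetical draw rates
        print(f"draw rate {d:.1f}: one point +/-{elo_error_95(10_000, d):.1f} Elo, "
              f"difference +/-{difference_error_95(10_000, d):.1f} Elo")
```

Near equal strength this gives roughly +/-6.8 Elo per data point with no draws and about +/-4.8 Elo with a 50% draw rate; comparing two points widens the interval by a factor of about 1.41, which is well above 3 Elo in either case.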

Laskos · Post by **Laskos** » Wed Mar 11, 2015 2:42 pm

JJJ wrote: And you can also test Komodo with drawscore.

Here it is, against Texel and against Stockfish, the strongest opponent:

K8 vs Texel
Draw score   Elo
      0      328
    -20      335
    -40      340
    -60      346  ---
    -80      336
   -100      319

K8 vs SF (Contempt=0)
Draw score   Elo
     40      -52
     20      -46  ---
      0      -57
    -20      -61
    -40      -65

Since it is not the strongest engine, Komodo would find it harder to choose a universal contempt factor: the best draw score is negative against the weaker Texel but positive against the stronger Stockfish.

jefk · Post by **jefk** » Wed Mar 18, 2015 12:42 pm

A few more small ideas; ideally these could be done automatically (e.g. for Stockfish):

- Besides using a higher contempt against a weaker engine (*), it might be useful to reduce contempt when the engine is defending (which, by the way, happens more often with Black).
(*) Recognizing the rating of the opposing engine online should be possible, I guess; if I'm correct, Crafty has (or once had) such a feature.

- Conversely, when the engine is a pawn or so up in score, increasing the contempt slightly might be beneficial.

- Such features could be scaled with an 'aggressiveness' parameter (eagerness to win, or willingness to accept a draw) instead of the original contempt parameter, which would then be adapted automatically.

jef
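jef's three suggestions can be combined into one rule. This is a purely hypothetical Python sketch: the `aggressiveness` knob, the assumption that an estimate of the rating gap is available, and every numeric constant are illustration values, not tuned numbers and not any engine's implementation.

```python
from dataclasses import dataclass

# All constants are arbitrary illustration values, not tuned numbers.
ELO_PER_CONTEMPT_CP = 5.0  # 5 Elo of estimated advantage -> 1 cp of contempt
DEFENDING_AT_CP = -50      # treat the engine as defending at or below this score
AHEAD_FROM_CP = 100        # "about a pawn up"
MAX_CONTEMPT_CP = 100


@dataclass
class ContemptInputs:
    aggressiveness: float  # hypothetical knob: 0 = content with a draw, 1 = eager to win
    rating_gap: float      # estimated own rating minus opponent's rating (Elo)
    eval_cp: int           # current search score from the engine's side (cp)


def adaptive_contempt(inp: ContemptInputs) -> int:
    """Hypothetical contempt rule following jef's three suggestions."""
    # Higher contempt against a (presumably) weaker engine, none against a stronger one.
    contempt = max(0.0, inp.rating_gap) / ELO_PER_CONTEMPT_CP
    # Scale by the user's eagerness to win instead of setting contempt directly.
    contempt *= inp.aggressiveness
    if inp.eval_cp <= DEFENDING_AT_CP:
        contempt *= 0.5   # defending: make a draw more acceptable
    elif inp.eval_cp >= AHEAD_FROM_CP:
        contempt *= 1.25  # about a pawn up: push a little harder
    return round(min(MAX_CONTEMPT_CP, contempt))
```

For example, a 300 Elo advantage at full aggressiveness gives 60 cp in a balanced position, 30 cp when defending and 75 cp when a pawn up; how well any such mapping works would have to be measured the same way as the tables in this thread.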

JJJ · Post by **JJJ** » Wed Mar 18, 2015 2:09 pm

Laskos wrote:
JJJ wrote: And you can also test Komodo with drawscore.
Here it is, against Texel and against Stockfish, the strongest opponent: [...]
Not being the strongest engine, it would be more difficult for Komodo to choose a universal contempt factor.

This is not the first time I have seen Komodo score slightly better against a stronger opponent with the draw score not set to zero. But in your case, it is with a positive draw score.
