Stockfish contempt factor

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Stockfish contempt factor

Post by Laskos »

I did not pay much attention to this issue, and I don't know who pushed it or what exactly it does. But it seems quite beneficial for Stockfish in rating lists as long as it stays ahead of the competition, which seems plausible. Maybe official releases would be better off with positive contempt, even though it does not pass Fishtest?

I played ultra-bullet games to check it, 10,000 games per data point, against three different opponents: Texel 1.05, which is often at the level of the lowest-rated opponent in rating lists; Komodo 8, usually the highest rated; and Stockfish itself, to see whether it can pass Fishtest and how much it deteriorates in self-play. The Stockfish build is from the 2nd of March. The second column is the Elo difference.

SF vs TEXEL 
Con:

  0:  388  
 20:  396
 40:  398 
 60:  408  ---
 80:  402 
100:  398 


SF vs KOM  
Con:

  0:   56  
 20:   59  ---
 40:   58
 60:   56
 80:   47


SF vs SF Con=0
Con:

  0:    0  --- 
 20:   -3
 40:   -8
 60:  -11 
 80:  -20

I guess that for rating lists not only Contempt=20 is beneficial; Contempt=40 or Contempt=60 could end up rated the highest. The rating difference in the optimal cases (marked by "---") increases by some 5%. And it compares favorably to the Houdini 4 contempt scheme:

H4 vs TEXEL
Con:

0:  316
1:  321
2:  326  ---

H4 vs H4 Con=0
Con:

0:    0  ---
1:   -3
2:   -7
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: Stockfish contempt factor

Post by JJJ »

And you can also test Komodo with drawscore.
Gurcan Uckardes
Posts: 196
Joined: Wed Oct 29, 2014 12:42 am

Re: Stockfish contempt factor

Post by Gurcan Uckardes »

The results are close enough that one has to ask what would happen at longer time controls. The picture might change completely.
My blog for Android users: http://chesstroid.blogspot.com
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Stockfish contempt factor

Post by bob »

As you've noticed, I don't think self-play will work at all here. You are telling one "version" (which is equal to the other) to give up a little to avoid a draw when possible, so that side is always going to be giving away a few evaluation points in every game. And that has to affect things negatively. If you are really better than your opponent, then a positive contempt would be good since you would try a little harder to avoid draws, extend the game, and outplay him.

This just can't work in self-test however, because the two programs are equal. This should work equally poorly if you actually find two different programs that are exactly the same strength, but that is harder to do.

I have played around here and there with some SPRT testing with Crafty. Not a big fan. I have seen too many cases where -1.5, +4.5 will fail low and reject the change, yet a normal gauntlet will show the change as a positive Elo gain.

I now use SPRT for quick sanity tests to see if a major change works or breaks.
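
For readers unfamiliar with the test being discussed, here is a minimal sketch of an SPRT on win/draw/loss counts. It assumes the common normal-approximation form of the log-likelihood ratio and the logistic Elo model; the function names and the W/D/L figures in the usage note are illustrative and are not taken from Crafty or Fishtest.

```python
import math


def expected_score(elo: float) -> float:
    """Expected score for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate log-likelihood ratio of H1 (elo1) against H0 (elo0)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins + 0.25 * draws) / n - mean ** 2
    if var <= 0.0:
        return 0.0
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)


def sprt_bounds(alpha: float, beta: float) -> tuple:
    """Lower and upper stopping bounds for the LLR."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def sprt_decision(wins, draws, losses, elo0, elo1, alpha=0.05, beta=0.05) -> str:
    """Return 'accept H0', 'accept H1' or 'continue'."""
    lower, upper = sprt_bounds(alpha, beta)
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    if llr <= lower:
        return "accept H0"
    if llr >= upper:
        return "accept H1"
    return "continue"
```

With bounds such as the -1.5, +4.5 mentioned above, one would call `sprt_decision(w, d, l, -1.5, 4.5)` after each batch of games and stop as soon as it returns something other than `"continue"`.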
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Stockfish contempt factor

Post by bob »

JJJ wrote:And you can also test Komodo with drawscore.
That should be what "contempt factor" is. If you think you are better than your opponent, use -20 or something so that you will accept a -10 before you will allow that -20 draw to occur, hoping that if you hold on and keep the game going, you will be able to outplay your opponent.
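
The -20 versus -10 example can be sketched in a few lines. This is only an illustration of the idea, not how Komodo or Stockfish implement it: `draw_score` and `pick_move` are hypothetical names, scores are in centipawns from the engine's point of view, and `None` stands for a move that leads to a draw.

```python
from typing import Dict, Optional


def draw_score(contempt_cp: int) -> int:
    """Value the engine assigns to a draw: positive contempt makes draws look bad."""
    return -contempt_cp


def pick_move(candidates: Dict[str, Optional[int]], contempt_cp: int) -> str:
    """Pick the best move; a score of None marks a move that leads to a draw."""

    def value(move: str) -> int:
        score = candidates[move]
        return draw_score(contempt_cp) if score is None else score

    return max(candidates, key=value)


# Hypothetical position: repeat moves for a draw, or play on at -10.
moves = {"repeat": None, "play_on": -10}
no_contempt = pick_move(moves, 0)      # draw (0) beats -10
with_contempt = pick_move(moves, 20)   # -10 beats a draw scored as -20
```

With contempt 20 the engine still takes the draw once the alternative drops below -20, which is the "give up a little" trade-off described above.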
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish contempt factor

Post by Laskos »

I was not hoping for a positive Elo gain in self-play, but a small loss of around -1 Elo could still pass an SPRT with bounds of -3.0, +1.0. I would use SPRT whenever I can, but, for example, stopping with alpha = beta = 0.05 when the "real" difference is -3 Elo rather than 0 would need some 20,000-40,000 games, a test too long to perform. Besides, the stopping bounds (the Elo gap) are hard to choose well, so I limited myself to fixed matches of 10,000 games, although differences of 3 Elo are still out of reach at that length and are only for guidance.
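
To put a number on "out of reach", here is a rough sketch of the Elo estimate and its 95% margin for a fixed-length match. It assumes the logistic Elo model and a normal approximation for the mean score; the W/D/L split in the usage note (60% draws) is a made-up example, not data from the matches above.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a mean score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) for a match, using a normal approximation.

    Assumes the score and both interval ends stay strictly inside (0, 1).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins + 0.25 * draws) / n - score ** 2
    std_err = math.sqrt(var / n)
    return (
        elo_from_score(score),
        elo_from_score(score - z * std_err),
        elo_from_score(score + z * std_err),
    )


# Hypothetical even 10,000-game match with 60% draws.
elo, low, high = elo_with_margin(2000, 6000, 2000)
```

Under these assumptions the interval comes out a little wider than +/-4 Elo, so a 3 Elo difference between two neighbouring data points cannot be resolved with 10,000 games.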
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish contempt factor

Post by Laskos »

JJJ wrote:And you can also test Komodo with drawscore.
Here it is, against Texel and against Stockfish, the strongest opponent:

K8 vs TEXEL 
DS:

   0:   328
 -20:   335
 -40:   340
 -60:   346  ---
 -80:   336
-100:   319


K8 vs SF Con=0
DS:

  40:   -52
  20:   -46  ---
   0:   -57
 -20:   -61
 -40:   -65

Since Komodo is not the strongest engine, choosing a universal contempt factor would be more difficult for it.
jefk
Posts: 626
Joined: Sun Jul 25, 2010 10:07 pm
Location: the Netherlands
Full name: Jef Kaan

Re: Stockfish contempt factor

Post by jefk »

A few more small ideas; ideally these could
be applied automatically (e.g. for Stockfish):

- besides having a higher contempt for a weaker engine (*),
it might be useful to reduce contempt when the engine is
defending (which happens more often with Black, by the way)
(*) recognizing the rating of the other computer online
should be possible, I guess; if I'm correct, Crafty has or
once had such a feature

- vice versa, when the computer is a pawn (in score) up or
so, increasing the contempt slightly might be beneficial

- scaling of such features could be done with a parameter
'aggressiveness' (eagerness to win, or willingness to accept a draw)
instead of the original contempt parameter (which then
would be adapted automatically)
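
The three ideas above could be combined roughly as follows. This is a speculative sketch only: `adaptive_contempt`, the 10-Elo-per-centipawn scaling, the eval thresholds and the clamp at +/-100 are all invented for illustration and do not correspond to anything in Stockfish or Crafty.

```python
def adaptive_contempt(rating_diff: float, eval_cp: int, aggressiveness: float = 1.0) -> int:
    """Return a contempt value in centipawns.

    rating_diff: own rating minus opponent rating in Elo (assumed to be known).
    eval_cp: current evaluation from the engine's point of view, in centipawns.
    aggressiveness: user-facing scale; higher means less willing to draw.
    """
    # Higher contempt against weaker opponents (illustrative scaling).
    base = rating_diff / 10.0

    # Be more willing to draw when defending, less when about a pawn up.
    if eval_cp <= -50:
        adjustment = -10.0
    elif eval_cp >= 100:
        adjustment = 5.0
    else:
        adjustment = 0.0

    value = aggressiveness * (base + adjustment)
    # Keep the result in a sane range.
    return int(max(-100, min(100, round(value))))
```

For example, against an opponent rated 200 Elo lower this gives 20 in a level position, 10 when defending and 25 when a pawn up; the user would then only tune `aggressiveness`.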

jef
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: Stockfish contempt factor

Post by JJJ »

This is not the first time I have seen Komodo score slightly better against a stronger opponent with drawscore not set to zero. But in your case, it is with a positive drawscore.