"Contempt" in Komodo broken?

Laskos · Post by **Laskos** » Thu Sep 17, 2015 4:48 pm

I was curious to check the rule of thumb for setting Komodo's "Contempt" as a function of ELO difference. I suspect that "Contempt" should be roughly linear in the ELO difference, so measuring it at a large ELO difference should fix the general rule fairly accurately. First I checked Komodo 9.1 against a very old Stockfish that is about 300 ELO points weaker than Komodo. Komodo 9.1 has a "Drawscore" parameter, which I expected to be optimal against that old Stockfish in the minus 20-40 range. The test with Komodo 9.1 went smoothly, and pretty much as expected. Games were at 2.5''+0.025''; the numbers after "Komodo" show _minus_ "Drawscore":

Komodo 9.1 (minus Drawscore)
Rank Name                          ELO   Games   Score   Draws
   1 Komodo 60                     305    8000     85%      9%
   2 Komodo 40                     305    8000     85%     10%
   3 Komodo 50                     300    8000     85%      9%
   4 Komodo 70                     299    8000     85%      9%
   5 Komodo 80                     295    8000     85%      9%
   6 Komodo 10                     295    8000     85%     12%
   7 Komodo 30                     294    8000     84%     10%
   8 Komodo 0                      292    8000     84%     12%
   9 Komodo 20                     290    8000     84%     11%
  10 Komodo -10                    290    8000     84%     12%
  11 Komodo -20                    290    8000     84%     14%

  12 Stockfish                    -296   88000     15%     11%
Finished match

The optimal "Drawscore" is in the (minus) 40-60 range, and the rule of thumb for the optimal "Drawscore" is about (minus) ELO difference over 6-7.
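
A minimal sketch of that rule of thumb, for anyone who wants to plug in other ELO gaps. The function names and the default divisor of 6.5 are illustrative assumptions, not anything Komodo exposes; the only inputs taken from the test above are the 300 ELO gap and the 6-7 divisor range.

```python
def suggested_contempt(elo_diff: float, divisor: float = 6.5) -> int:
    """Rule-of-thumb contempt for a given ELO advantage over the opponent.

    Positive elo_diff means we are the stronger side. The divisor is the
    empirical 6-7 figure from the Komodo 9.1 test; 6.5 is just the midpoint.
    """
    return round(elo_diff / divisor)


def suggested_drawscore(elo_diff: float, divisor: float = 6.5) -> int:
    """Same rule expressed as a 9.1-style "Drawscore", which has the opposite sign."""
    return -suggested_contempt(elo_diff, divisor)


if __name__ == "__main__":
    for divisor in (6, 7):
        print(divisor, suggested_contempt(300, divisor), suggested_drawscore(300, divisor))
```

For a 300 ELO gap this gives a "Drawscore" of about -50 with a divisor of 6 and about -43 with a divisor of 7, which sits inside the 40-60 range measured above.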

But with Komodo 9.2 I had a nasty surprise. I was expecting a similar behavior, say optimal "Contempt" in positive 40-60 range. The results are very confusing:

Komodo 9.2 (Contempt)
Rank Name                          ELO   Games   Score   Draws
   1 Komodo -20                    301    8000     85%     12%
   2 Komodo -30                    298    8000     85%     12%
   3 Komodo 0                      297    8000     85%     11%
   4 Komodo -40                    295    8000     84%     13%
   5 Komodo -10                    293    8000     84%     11%
   6 Komodo 10                     287    8000     84%     11%
   7 Komodo 20                     283    8000     84%     11%
   8 Komodo 30                     282    8000     84%     11%
   9 Komodo 40                     282    8000     84%     11%
  10 Komodo 60                     273    8000     83%     11%
  11 Komodo 50                     273    8000     83%     10%
  12 Komodo 70                     262    8000     82%     11%
  13 Komodo 80                     259    8000     82%     11%
 
  14 Stockfish                    -283  104000     16%     11%
Finished match

It seems that the best "Contempt" against a 300 ELO points weaker engine is consistent with Contempt=0 or even with slightly _negative_ Contempt. What does this mean? I suspected the time control was too short, but I compared Contempt=0 with Contempt=30 at a longer 10''+0.1'', and "0" won convincingly. Is Komodo 9.2 "Contempt" broken?
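
To judge how much of the spread in these tables is noise, here is a rough sketch that turns a score percentage and draw rate into an ELO difference with an approximate 95% margin. It uses the standard logistic ELO formula and a normal approximation; the rating tool that produced the tables may compute things differently, and the table percentages are rounded, so treat the numbers as ballpark only. All function names are made up for illustration.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic ELO difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin(score: float, draw_ratio: float, games: int, z: float = 1.96) -> float:
    """Approximate half-width of the confidence interval, in ELO.

    Uses the per-game variance of a win/draw/loss result and a normal
    approximation for the mean score (z = 1.96 gives about 95%).
    """
    wins = score - draw_ratio / 2.0
    losses = 1.0 - wins - draw_ratio
    variance = (wins * (1.0 - score) ** 2
                + draw_ratio * (0.5 - score) ** 2
                + losses * score ** 2)
    se = math.sqrt(variance / games)
    low = elo_from_score(score - z * se)
    high = elo_from_score(score + z * se)
    return (high - low) / 2.0


if __name__ == "__main__":
    # One row of the Komodo 9.2 table: 85% score, 11% draws, 8000 games.
    print(round(elo_from_score(0.85)), round(elo_margin(0.85, 0.11, 8000), 1))
```

With 8000 games at an 85% score the margin comes out at roughly ±9 to 10 ELO per setting, so neighbouring rows in the tables are mostly within noise of each other, while the gap between the top and bottom of the 9.2 table (301 vs 259) is well outside it.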

Laskos · Post by **Laskos** » Thu Sep 17, 2015 5:02 pm

Here is the longer 10''+0.1'' gauntlet against Hakkapeliitta, which is about 400 ELO points weaker:

Komodo 9.2 (Contempt)

Rank Name                          ELO   Games   Score   Draws
   1 Komodo 0                      439    5000     93%      7%
   2 Komodo 30                     420    5000     92%      7%
   3 Hakkapeliitta                -430   10000      8%      7%
Finished match

I don't know what's the matter, maybe much longer time controls change things.

bob · Post by **bob** » Thu Sep 17, 2015 5:25 pm

Laskos wrote:Here is the longer 10''+0.1'' gauntlet against Hakkapeliitta, which is about 400 ELO points weaker:
Komodo 9.2 (Contempt)

Rank Name                          ELO   Games   Score   Draws
   1 Komodo 0                      439    5000     93%      7%
   2 Komodo 30                     420    5000     92%      7%
   3 Hakkapeliitta                -430   10000      8%      7%
Finished match
I don't know what's the matter, maybe much longer time controls change things.

I'm not sure this is all that surprising. I have run a number of such tests using Crafty, and am currently working on this again. I, like you, thought a "diff / n" type score should be pretty effective. Turns out "n" actually varies by opponent more than by rating. Seems odd, but these observations come from 30K games per test.

I'll post some data once the current test completes (right now, /10 is best, with /7 and /13 down a bit; I am "filling in the holes" now). But if I take programs that are pretty equal in my tests (equal when playing Crafty, which means equal in "crafty-Elo" terms), things look odd. I have three programs, Arasan 17.0, Glaurung 2.2, and Toga2, all rated between 2460 and 2470 (based on Crafty's roughly 2650 rating in this pool). Yet there is no single denominator that produces the best results for all three of those at once.

I'll post the bayeselo numbers showing draw and winning percentages for the above denominator range. I am not yet adjusting the draw score if the opponent is better, as I wanted to tackle each side of the difference independently so I could compare to the original, which just did a simple divide with a "boost" added in first.

More later.

Laskos · Post by **Laskos** » Thu Sep 17, 2015 7:05 pm

bob wrote:
Laskos wrote:Here is the longer 10''+0.1'' gauntlet against Hakkapeliitta, which is about 400 ELO points weaker:
Komodo 9.2 (Contempt)

Rank Name                          ELO   Games   Score   Draws
   1 Komodo 0                      439    5000     93%      7%
   2 Komodo 30                     420    5000     92%      7%
   3 Hakkapeliitta                -430   10000      8%      7%
Finished match
I don't know what's the matter, maybe much longer time controls change things.
I'm not sure this is all that surprising. I have run a number of such tests using Crafty, and am currently working on this again. I, like you, thought a "diff / n" type score should be pretty effective. Turns out "n" actually varies by opponent more than by rating. Seems odd, but these observations come from 30K games per test.

I'll post some data once the current test completes (right now, /10 is best, with /7 and /13 down a bit; I am "filling in the holes" now). But if I take programs that are pretty equal in my tests (equal when playing Crafty, which means equal in "crafty-Elo" terms), things look odd. I have three programs, Arasan 17.0, Glaurung 2.2, and Toga2, all rated between 2460 and 2470 (based on Crafty's roughly 2650 rating in this pool). Yet there is no single denominator that produces the best results for all three of those at once.

I'll post the bayeselo numbers showing draw and winning percentages for the above denominator range. I am not yet adjusting the draw score if the opponent is better, as I wanted to tackle each side of the difference independently so I could compare to the original, which just did a simple divide with a "boost" added in first.

More later.

Yes, it could vary with opposing engines, but one would expect a positive "Contempt" to be best against weaker engines and vice versa. In my test Komodo 9.2 gets better results against the same weak engine if "Contempt" is 0 or below zero: Komodo seeks draws, gets more of them, and still scores better against a weak engine, which is counterintuitive. Komodo 9.1 with the simple "Drawscore" correctly shows that against a much weaker engine it's better to avoid draws up to a point. If the behavior of K 9.2 is intended, there is no way one can guess what contempt to set, not even the sign of "Contempt".

lkaufman · Post by **lkaufman** » Thu Sep 17, 2015 7:18 pm

Laskos wrote:I was curious to check the rule of thumb for setting Komodo's "Contempt" as a function of ELO difference. I suspect that "Contempt" should be roughly linear in the ELO difference, so measuring it at a large ELO difference should fix the general rule fairly accurately. First I checked Komodo 9.1 against a very old Stockfish that is about 300 ELO points weaker than Komodo. Komodo 9.1 has a "Drawscore" parameter, which I expected to be optimal against that old Stockfish in the minus 20-40 range. The test with Komodo 9.1 went smoothly, and pretty much as expected. Games were at 2.5''+0.025''; the numbers after "Komodo" show _minus_ "Drawscore":
Komodo 9.1 (minus Drawscore)
Rank Name                          ELO   Games   Score   Draws
   1 Komodo 60                     305    8000     85%      9%
   2 Komodo 40                     305    8000     85%     10%
   3 Komodo 50                     300    8000     85%      9%
   4 Komodo 70                     299    8000     85%      9%
   5 Komodo 80                     295    8000     85%      9%
   6 Komodo 10                     295    8000     85%     12%
   7 Komodo 30                     294    8000     84%     10%
   8 Komodo 0                      292    8000     84%     12%
   9 Komodo 20                     290    8000     84%     11%
  10 Komodo -10                    290    8000     84%     12%
  11 Komodo -20                    290    8000     84%     14%

  12 Stockfish                    -296   88000     15%     11%
Finished match
The optimal "Drawscore" is in the (minus) 40-60 range, and the rule of thumb for the optimal "Drawscore" is about (minus) ELO difference over 6-7.

But with Komodo 9.2 I had a nasty surprise. I was expecting a similar behavior, say optimal "Contempt" in positive 40-60 range. The results are very confusing:
Komodo 9.2 (Contempt)
Rank Name                          ELO   Games   Score   Draws
   1 Komodo -20                    301    8000     85%     12%
   2 Komodo -30                    298    8000     85%     12%
   3 Komodo 0                      297    8000     85%     11%
   4 Komodo -40                    295    8000     84%     13%
   5 Komodo -10                    293    8000     84%     11%
   6 Komodo 10                     287    8000     84%     11%
   7 Komodo 20                     283    8000     84%     11%
   8 Komodo 30                     282    8000     84%     11%
   9 Komodo 40                     282    8000     84%     11%
  10 Komodo 60                     273    8000     83%     11%
  11 Komodo 50                     273    8000     83%     10%
  12 Komodo 70                     262    8000     82%     11%
  13 Komodo 80                     259    8000     82%     11%
 
  14 Stockfish                    -283  104000     16%     11%
Finished match
It seems that the best "Contempt" against a 300 ELO points weaker engine is consistent with Contempt=0 or even with slightly _negative_ Contempt. What does this mean? I suspected the time control was too short, but I compared Contempt=0 with Contempt=30 at a longer 10''+0.1'', and "0" won convincingly. Is Komodo 9.2 "Contempt" broken?

Thanks for running this. I'm investigating now. I'm not too concerned about results at your 2.5" level, because when the increment is less than move overhead results depend very much on who runs out of time first (either causing forfeits or moving after just a few ply search), and contempt may easily change that. But results at ten seconds + .1" should be valid. So far, running current Komodo dev. vs. Stockfish 2, I'm getting better results for default contempt (15) than with zero. I'll try contempt 30 next; it's possible that your delta/10 formula is too aggressive and should be more like delta/15 or even delta/20. I'll try to pin this down. We didn't do much testing with contempt levels above 20 or so.

shrapnel · Post by **shrapnel** » Thu Sep 17, 2015 8:35 pm

Laskos wrote:It seems that the best "Contempt" against a 300 ELO points weaker engine is consistent with Contempt=0 or even with slightly _negative_ Contempt. What does this mean? I suspected the time control was too short, but I compared Contempt=0 with Contempt=30 at a longer 10''+0.1'', and "0" won convincingly. Is Komodo 9.2 "Contempt" broken?

Heh Heh..... I knew there was something wrong with the Contempt parameter, but the Komodo Team refused to believe me.
I stand vindicated !

Laskos · Post by **Laskos** » Thu Sep 17, 2015 10:21 pm

shrapnel wrote:
Laskos wrote:It seems that the best "Contempt" against a 300 ELO points weaker engine is consistent with Contempt=0 or even with slightly _negative_ Contempt. What does this mean? I suspected the time control was too short, but I compared Contempt=0 with Contempt=30 at a longer 10''+0.1'', and "0" won convincingly. Is Komodo 9.2 "Contempt" broken?
Heh Heh..... I knew there was something wrong with the Contempt parameter, but the Komodo Team refused to believe me.
I stand vindicated !

Let's see, if these issues are very much time control dependent, very few people can test them over thousands of games at blitz and longer time controls against different engines.

Laskos · Post by **Laskos** » Fri Sep 18, 2015 12:38 am

lkaufman wrote:
Thanks for running this. I'm investigating now. I'm not too concerned about results at your 2.5" level, because when the increment is less than move overhead results depend very much on who runs out of time first (either causing forfeits or moving after just a few ply search), and contempt may easily change that. But results at ten seconds + .1" should be valid. So far, running current Komodo dev. vs. Stockfish 2, I'm getting better results for default contempt (15) than with zero. I'll try contempt 30 next; it's possible that your delta/10 formula is too aggressive and should be more like delta/15 or even delta/20. I'll try to pin this down. We didn't do much testing with contempt levels above 20 or so.

That's why I first tested Komodo 9.1, which has the simple "Drawscore". There everything settled pretty much as expected even at 2.5''+0.025''. But the 9.2 result is a bit weird. There were no forfeits as far as I could tell from following the results.

lkaufman · Post by **lkaufman** » Fri Sep 18, 2015 4:44 am

Laskos wrote:
lkaufman wrote:
Thanks for running this. I'm investigating now. I'm not too concerned about results at your 2.5" level, because when the increment is less than move overhead results depend very much on who runs out of time first (either causing forfeits or moving after just a few ply search), and contempt may easily change that. But results at ten seconds + .1" should be valid. So far, running current Komodo dev. vs. Stockfish 2, I'm getting better results for default contempt (15) than with zero. I'll try contempt 30 next; it's possible that your delta/10 formula is too aggressive and should be more like delta/15 or even delta/20. I'll try to pin this down. We didn't do much testing with contempt levels above 20 or so.
That's why I first tested Komodo 9.1, which has the simple "Drawscore". There everything settled pretty much as expected even at 2.5''+0.025''. But the 9.2 result is a bit weird. There were no forfeits as far as I could tell from following the results.

My point was that the new contempt (unlike drawscore) will have major effects on game length, which could easily be decisive in games where the increment is less than the overhead. Even if there are no forfeits, engines that think they are out of time do super-short searches, so these levels are just too unreliable.

But anyway, I've run a lot of games against SF2 at an average of 10" plus 0.5", and am now running the same against Rybka 3. Against SF2, I got that contempt 0 won by 305, contempt 15 by 320, contempt 30 by 342. So probably the curve peaks somewhere in the mid twenties, say 25. If so the divisor in your formula should be 12 rather than 10. However with Rybka 3 results so far are very mixed, I just need more games. Probably contempt works poorly against engines that are weak in the endgame, because it strives to avoid fairly even endgames. This may be what's going on in your test, I don't know. More in the morning.
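
The divisor arithmetic above can be sketched as a one-liner. The function name is hypothetical, and the 300 ELO gap is an assumption taken from the approximate strength difference in these tests; it is what makes a peak at contempt 25 correspond to a divisor of 12.

```python
def implied_divisor(elo_diff: float, best_contempt: float) -> float:
    """Divisor n in contempt = elo_diff / n, given the contempt that tested best."""
    if best_contempt == 0:
        raise ValueError("a best contempt of 0 implies no finite divisor")
    return elo_diff / best_contempt


if __name__ == "__main__":
    print(implied_divisor(300, 25))  # peak near contempt 25
    print(implied_divisor(300, 30))  # what a divisor-10 rule would predict
```

Note that a measured optimum at or below zero, as in the 2.5''+0.025'' gauntlet, cannot be expressed by any positive divisor, which is why that result does not fit the delta/n picture at all.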

shrapnel · Post by **shrapnel** » Fri Sep 18, 2015 2:59 pm

Laskos wrote:Let's see, if these issues are very much time control dependent

What you say is actually true even from a Gamer's point of view.
I have found that for very STC/Blitz, Contempt 0 is always the best, irrespective of the Engine being used !
A Playchess pal of mine theorized that this is because at very short time controls, the Victor is the Engine that makes the least amount of mistakes (!), NOT the most aggressive/brilliant move finding Engine ! Contempt 0 facilitates this. It is only logical I think.
So, it is only in LTC games that increasing Contempt has a positive impact because the Engine has more time to find brilliant or game-winning moves.
So, obviously, it follows from this, that only LTC games really and truly show which is indeed the superior Engine, other factors like Opening book strength and Hardware being equal.
On another note, this is the main reason why I have always been against equating thousands of short time control games to a few hundred LTC games; a favorite theory of Engine developers.
In my opinion, the two are as different as chalk and cheese and anyone who thinks differently is merely deluding himself.
Would you consider ten thousand pieces of chalk equivalent to even one cheese cube ? Of course NOT !