Komodo run - Ingo list revisited

Ajedrecista · Post by **Ajedrecista** » Fri Nov 08, 2013 8:01 pm

Hello Ingo:

IWB wrote:
Ajedrecista wrote:
... this development version of Komodo has earned around 24 Elo plus/minus uncertainties (around ± 14 Elo taking into account a difference between two normal distributions of 3036 ± 10 and 3060 ± 10, writing from memory) since version 5.1r2 or similar. Am I right?

Ajedrecista.
Not quite:

Excerpt of the full list:
   2 K113300                    3062   11   11  3000   78%  2838   30% 
   3 Komodo 6                   3042   10   10  3300   76%  2837   33% 
   4 Komodo CCT                 3035    9    9  3750   74%  2850   34% 
   5 Komodo 5.1                 3023   11   10  2850   74%  2839   34% 
Of course a winning Komodo over the No. 1 is hammering down the first spot! ...

I see. I was clearly thinking of Komodo CCT. Sorry for the confusion. Thanks for the good job!
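
The "plus/minus" estimate quoted above treats the two ratings as independent normal distributions, so the uncertainty of their difference is the root sum of squares of the two error bars. A minimal Python sketch under that assumption (the function name `elo_gap` is made up for illustration, and error bars produced by a rating tool are generally correlated, so this is only a rough guide):

```python
import math

def elo_gap(elo_a, err_a, elo_b, err_b):
    """Return (difference, combined error), assuming independent normal errors."""
    diff = elo_a - elo_b
    combined = math.sqrt(err_a ** 2 + err_b ** 2)
    return diff, combined

# Figures quoted from memory in the post: 3060 +/- 10 versus 3036 +/- 10.
print(elo_gap(3060, 10, 3036, 10))

# Figures from the excerpt: K113300 (3062 +/- 11) versus Komodo CCT (3035 +/- 9).
print(elo_gap(3062, 11, 3035, 9))
```

With the numbers quoted from memory this gives a gap of 24 Elo with a combined error of about 14 Elo, which matches the estimate in the quoted post.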

Regards from Spain.

Ajedrecista.

Uri Blass · Post by **Uri Blass** » Sat Nov 09, 2013 7:40 am

Milos wrote:
Uri Blass wrote: I think that good results against weak opponents can also be explained by a lack of weaknesses, so that you do not lose against weak opponents.

I am not saying that this is part of the explanation for Houdini's good results; I do not know.
Statistically you are wrong. When playing an opponent that is 400 Elo weaker, the chance of losing is almost negligible.
The effect of contempt is to convert a certain percentage of drawn games into decided ones. The proportion of the decided games will again be much more on the side of the stronger engine.
So, a numerical example.
You played 100 games and the original result is 60+/37=/3-, i.e. 78.5% or 225 Elo.
The winning/losing ratio is 20:1. Then you increased the contempt and got only 30 draws this time. 7 more games are decided, and let's say the weaker engine now improved its chances to 6:1, so the final result is 66+/30=/4-, i.e. 81% or 252 Elo.

The thing is, unless you blow your contempt out of proportion, in the additionally decided games the stronger engine will always win many more games than the weaker one and will benefit from it.

I hope you see the point.
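
The percentages and Elo figures in the quoted example follow from the usual logistic Elo formula, Elo = 400 * log10(p / (1 - p)), with each draw counted as half a point. A short Python sketch for checking the arithmetic (the function names are illustrative only):

```python
import math

def score_fraction(wins, draws, losses):
    """Fraction of points scored, counting each draw as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)

def elo_from_score(p):
    """Elo difference implied by score fraction p under the logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

# Win/draw/loss counts from the example, before and after raising contempt.
for label, wdl in [("before", (60, 37, 3)), ("after", (66, 30, 4))]:
    p = score_fraction(*wdl)
    print(f"{label}: {p:.1%} -> {elo_from_score(p):.0f} Elo")
```

Rounded to the nearest Elo, this reproduces the 225 and 252 Elo figures given in the example.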

When playing against opponents that are 100 Elo weaker (the more common case in rating lists like CCRL), the chances of losing are not so small, and the question of whether a program loses more games against weaker opponents is also interesting.

Note also that drawing more games can also result from a lack of knowledge: programs that do not know about the blind bishop may allow a weaker opponent to draw more often, not because of contempt but because they fall into drawing traps against weak engines.

Don · Post by **Don** » Sun Nov 10, 2013 12:00 pm

Milos wrote: I don't believe it would help H3 on Ingo's list; on the contrary. There are too many weak opponents (300 Elo weaker), so high contempt there brings more points overall (more wins instead of draws) than what H3 loses against SF and Komodo (there it has roughly 6% of the games as losses that would be draws with contempt 0).

Ingo did in fact agree to rerun all of Houdini's results but with contempt factor of zero - and it verifies exactly what you said.

Under these conditions Komodo does not score as well against Houdini directly, showing that Komodo is too strong for Houdini to show contempt for.

In the new rating list we see that Komodo is now the highest rated program, because a contempt of zero definitely hurts Houdini against most if not all other opponents. It's amazing that this one parameter swings the relative difference between these programs by 17 Elo!

Also, where Komodo had defeated Houdini with Houdini's default contempt, in the Contempt = 0 test it was a dead tie, both programs scoring exactly 75 points.

Ingo considers Houdini's default contempt the right one to use - so at blitz time controls Houdini 3 retains a slight edge over Komodo.

H3 Con 0:

Rank Name                      Elo    +    - games score oppo. draws
   1 K113300                  3005   10   10  3000   78%  2785 31%
   2 Houdini 3 Con 0          2994   10   10  3000   77%  2785 35%
   3 Stockfish 4              2957   10    9  3000   73%  2787 38%
   4 Critter 1.4a             2922    9    9  3000   68%  2789 41%
   5 Gull 2.2                 2922    9    9  3000   68%  2789 40%
   6 Deep Rybka 4.1           2896    9    9  3000   65%  2790 42%
   7 Hannibal 1.4a            2814    9    9  3000   52%  2795 44%
   8 Chiron 1.5               2793    9    9  3000   50%  2796 41%
   9 Protector 1.5.0          2785    9    9  3000   49%  2796 44%
  10 Naum 4.2                 2784    9    9  3000   49%  2796 41%
  11 HIARCS 14 WCSC 32b       2764    9    9  3000   46%  2797 41%
  12 Deep Shredder 12         2750    9    9  3000   44%  2798 39%
  13 Jonny 6.00               2749    9    9  3000   43%  2798 38%
  14 Deep Sjeng c't 2010 32b  2732    9    9  3000   41%  2799 40%
  15 Spike 1.4 32b            2727    9    9  3000   40%  2799 42%
  16 spark-1.0                2713    9    9  3000   39%  2800 39%
  17 Deep Junior 13.3         2691   10   10  3000   36%  2801 33%
  18 Booot 5.2.0              2691    9    9  3000   35%  2801 37%
  19 Quazar 0.4               2682    9    9  3000   35%  2801 36%
  20 Zappa Mexico II          2672    9    9  3000   33%  2802 36%
  21 Toga II 3.0 32b          2661    9    9  3000   32%  2802 36%
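
As a rough sanity check on the list, the score column can be compared with the logistic Elo expectation against the average opponent rating. The Python sketch below is only an approximation: it assumes the plain logistic curve and a single average opponent, whereas the rating tool behind the list may fit a different model (for example one that handles draws explicitly), so small deviations are expected. The function name `expected_score` is illustrative.

```python
def expected_score(elo, opp_elo):
    """Logistic expected score against a single opponent rated opp_elo."""
    return 1.0 / (1.0 + 10.0 ** ((opp_elo - elo) / 400.0))

# (name, Elo, average opponent Elo, listed score) copied from the table.
rows = [
    ("K113300", 3005, 2785, 0.78),
    ("Houdini 3 Con 0", 2994, 2785, 0.77),
    ("Stockfish 4", 2957, 2787, 0.73),
    ("Toga II 3.0 32b", 2661, 2802, 0.32),
]

for name, elo, oppo, listed in rows:
    approx = expected_score(elo, oppo)
    print(f"{name:<16} listed {listed:.0%}  logistic {approx:.1%}")
```

For these four rows the approximation stays within about two percentage points of the listed score.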

Don · Post by **Don** » Sun Nov 10, 2013 12:07 pm

The downside of all of this is that it's now possible for fanboys of either program to manipulate the results. For example, if you were a big fan of Komodo you would leave Houdini and Komodo at the default settings, but if you wanted to improve Houdini's performance you could optimize the setting to improve its result against Komodo. I would strongly suggest that testers just leave all settings at the default, except of course for common settings such as hash table size and threads.

Adam Hair · Post by **Adam Hair** » Sun Nov 10, 2013 12:37 pm

Don wrote: The downside of all of this is that it's now possible for fanboys of either program to manipulate the results. For example, if you were a big fan of Komodo you would leave Houdini and Komodo at the default settings, but if you wanted to improve Houdini's performance you could optimize the setting to improve its result against Komodo. I would strongly suggest that testers just leave all settings at the default, except of course for common settings such as hash table size and threads.

Sure you do, fanboy!

This is what most, if not all, of us do. With many engines and limited resources, we typically leave it up to the authors to provide what they consider to be the best settings. It is assumed that the default settings are those best settings.

Don · Post by **Don** » Sun Nov 10, 2013 1:20 pm

I think that is what most testers do. The good testers are meticulous about publishing the exact conditions of the test. That is also good science: you should be able to verify or duplicate a result.

We find it very odd that someone can run a test that shows Komodo way ahead, then someone else runs a test showing Komodo way behind, and both tests seem legitimate and have statistically meaningful samples. But often the conditions are not fully specified so they lose their scientific relevance. I will try to point that out next time I see it, even if the test makes Komodo look good.

Adam Hair · Post by **Adam Hair** » Sun Nov 10, 2013 2:50 pm

I agree. The results are mostly irrelevant if the conditions remain unknown. It is hard to assume that anything other than statistical noise is the cause of the different results.

IWB · Post by **IWB** » Sun Nov 10, 2013 3:41 pm

Adam Hair wrote:... we typically leave it up to the authors to provide what they consider to be the best settings....

There is one special case where this is not the optimal strategy: the Playchess server!

If you want to play there with H3 you should set the contempt to 0, as most engines playing there are VERY close. With Contempt 0 your chances of drawing and winning against Komodo and Stockfish rise ...

One can see it differently: the only case where using Contempt 1 is correct is for a better rating in the lists (today). For everything else, analysis and playing other top engines, contempt 0 would be better ...

Of course it would be best to play a top list only between close contenders (100 Elo) ... but there are nearly none. No one is interested in a list where always only the same 3/4 engines are playing ...

It is the same as for Rybka 4/4.1. As long as you are the sole No. 1, the contempt lets you shine even more. As soon as your engine is passed, you either have to release a new and better one, and no one will test the old one, or your rating will decrease more than should be necessary. In my private list R4.1 is now worse than R4(.0), as R4 never had to play the tough opponents ... (And I know almost for sure that R4.1 and R4 are in reality about equal, with a slight edge for 4.1)

If you are commercial and want/have to earn money, your top engine has to play with contempt, but sooner or later contempt backfires (but then you have the money in your pocket).

Bye
Ingo

lkaufman · Post by **lkaufman** » Sun Nov 10, 2013 4:26 pm

A good solution would be to include all the games just as you do now, but to use a rating formula that automatically gives more weight to close pairings. I'm not at all sure about this, but I suspect that Ordo has that property if compared to BayesElo. A simple way to check this out would be to calculate a rating list including both Houdini Cont.0 and Houdini Cont. 1 on the same list, and run it with both Ordo and BayesElo. My prediction is that the gap will be noticeably smaller using Ordo. Perhaps you could run this calculation to see. If I'm right it would mean that Ordo would automatically reduce the incentive to use high contempt, which would lead to more meaningful ratings if contempt plays less of a role.

IWB · Post by **IWB** » Sun Nov 10, 2013 4:28 pm

lkaufman wrote: A simple way to check this out would be to calculate a rating list including both Houdini Cont. 0 and Houdini Cont. 1 on the same list, and run it with both Ordo and BayesElo.

Which engine would you like to have fixed, and at what rating?

Bye
Ingo
