Komodo run - Ingo list revisited

Don · Post by **Don** » Fri Nov 08, 2013 1:27 pm

Ingo was kind enough to run a development version of Komodo against his standard list.

To be sure, his test is not very favorable to Komodo, which excels at longer time controls, and this is a blitz time control list.

Komodo beat every single program on the list, including Houdini, but falls just short of Houdini in the ratings because Houdini does slightly better against weak programs at this time control.

Here are his results:

Final Result one on one of 113300

Rank Name                      Elo    +    - games score oppo. draws
   1 Houdini 3 STD            3000   10   10  3000   78%  2771 27%
   2 K113300                  2994   10   10  3000   78%  2771 30%
   3 Stockfish 4              2948   10   10  3000   73%  2774 37%
   4 Critter 1.4a             2909    9    9  3000   68%  2776 40%
   5 Gull 2.2                 2908    9    9  3000   68%  2776 40%
   6 Deep Rybka 4.1           2882    9    9  3000   64%  2777 42%
   7 Hannibal 1.4a            2799    9    9  3000   52%  2781 43%
   8 Chiron 1.5               2780    9    9  3000   50%  2782 40%
   9 Protector 1.5.0          2773    9    9  3000   49%  2782 45%
  10 Naum 4.2                 2770    9    9  3000   49%  2783 40%
  11 HIARCS 14 WCSC 32b       2750    9    9  3000   45%  2784 41%
  12 Deep Shredder 12         2734    9    9  3000   43%  2784 38%
  13 Jonny 6.00               2732    9    9  3000   43%  2785 38%
  14 Deep Sjeng c't 2010 32b  2715    9    9  3000   41%  2785 40%
  15 Spike 1.4 32b            2708    9    9  3000   40%  2786 41%
  16 spark-1.0                2698    9    9  3000   38%  2786 39%
  17 Deep Junior 13.3         2678    9    9  3000   36%  2787 33%
  18 Booot 5.2.0              2676    9    9  3000   35%  2787 37%
  19 Quazar 0.4               2667    9    9  3000   34%  2788 35%
  20 Zappa Mexico II          2656   10   10  3000   33%  2788 35%
  21 Toga II 3.0 32b          2646    9    9  3000   31%  2789 36%

Here are the individual gauntlet results against the top few programs. Note that Komodo scores almost 55% against Houdini 3 even at this blitz time control. We are now quite curious about how Houdini beats weaker programs more decisively than we do.

      150.0 ( 82.0 :  68.0) Houdini 3 STD            3000
      150.0 ( 85.0 :  65.0) Stockfish 4              2948
      150.0 ( 95.5 :  54.5) Critter 1.4a             2909
      150.0 ( 94.0 :  56.0) Gull 2.2                 2908
      150.0 (107.0 :  43.0) Deep Rybka 4.1           2882
      150.0 (117.0 :  33.0) Hannibal 1.4a            2799
      150.0 (119.0 :  31.0) Chiron 1.5               2780
      150.0 (113.5 :  36.5) Protector 1.5.0          2773
      150.0 (120.5 :  29.5) Naum 4.2                 2770
      150.0 (121.5 :  28.5) HIARCS 14 WCSC 32b       2750
      150.0 (119.0 :  31.0) Deep Shredder 12         2734
      150.0 (122.5 :  27.5) Jonny 6.00               2732
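
As a rough illustration of what the head-to-head line against Houdini means, the sketch below converts the 82.0 : 68.0 result into a score percentage and the Elo difference that score would imply under the standard logistic Elo model. The function names are hypothetical, and this is not how the list ratings themselves were computed; with only 150 games the figure carries a wide error margin.

```python
import math


def score_fraction(points: float, games: float) -> float:
    """Fraction of available points scored (a draw counts as half a point)."""
    return points / games


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by an expected score under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# Komodo vs. Houdini 3 STD, taken from the gauntlet table: 82.0 of 150 points.
score = score_fraction(82.0, 150.0)
print(f"score: {score:.1%}")
print(f"implied head-to-head Elo difference: {elo_diff_from_score(score):+.1f}")
```

The same two functions can be applied to any other row of the table by swapping in that row's points.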

Werewolf · Post by **Werewolf** » Fri Nov 08, 2013 1:37 pm

Can you estimate how much Elo you've gained since K6, then?

Don · Post by **Don** » Fri Nov 08, 2013 1:39 pm

Werewolf wrote: Can you estimate how much Elo you've gained since K6, then?

On Ingo's list we have gained 20 Elo.

Milos · Post by **Milos** » Fri Nov 08, 2013 1:49 pm

Don wrote: Here are the individual gauntlet results against the top few programs. Note that Komodo scores almost 55% against Houdini 3 even at this blitz time control. We are now quite curious about how Houdini beats weaker programs more decisively than we do.

I don't know if you people are just pretending or like to stress only the facts that work in your favor.
All the tests so far, including short, medium and moderately long TCs, suggest that both Komodo 6 and SFdev have 4% over default H3 in direct matches.
However, H3 runs with contempt 1. Depending on the average rating of the whole field, this contempt brings more overall rating than what H3 loses to Komodo and SF.
If you ran the same matches with contempt 0, you'd see that H3 is 2-3% stronger than both the latest SFdev and K6.
However, you prefer to pretend that your program is the strongest in direct matches with Houdini and then suggest some flaw or whatever in the rating lists' methods, since H3 still has the better rating.

Don · Post by **Don** » Fri Nov 08, 2013 1:53 pm

Milos wrote:
I don't know if you people are just pretending or like to stress only the facts that work in your favor.
All the tests so far, including short, medium and moderately long TCs, suggest that both Komodo 6 and SFdev have 4% over default H3 in direct matches.
However, H3 runs with contempt 1. Depending on the average rating of the whole field, this contempt brings more overall rating than what H3 loses to Komodo and SF.
If you ran the same matches with contempt 0, you'd see that H3 is 2-3% stronger than both the latest SFdev and K6.
However, you prefer to pretend that your program is the strongest in direct matches with Houdini and then suggest some flaw or whatever in the rating lists' methods, since H3 still has the better rating.

Thank you.

Ajedrecista · Post by **Ajedrecista** » Fri Nov 08, 2013 2:01 pm

Hello:

Don wrote: Ingo was kind enough to run a development version of Komodo against his standard list. [...]

I have translated these charts to the IPON standard offset (2800 for Deep Shredder 12). I did the sums mentally, so it is possible that I went wrong somewhere:

Final Result one on one of 113300

Rank Name                      Elo    +    - games score oppo. draws
   1 Houdini 3 STD            3066   10   10  3000   78%  2837 27%
   2 K113300                  3060   10   10  3000   78%  2837 30%
   3 Stockfish 4              3014   10   10  3000   73%  2840 37%
   4 Critter 1.4a             2975    9    9  3000   68%  2842 40%
   5 Gull 2.2                 2974    9    9  3000   68%  2842 40%
   6 Deep Rybka 4.1           2948    9    9  3000   64%  2843 42%
   7 Hannibal 1.4a            2865    9    9  3000   52%  2847 43%
   8 Chiron 1.5               2846    9    9  3000   50%  2848 40%
   9 Protector 1.5.0          2839    9    9  3000   49%  2848 45%
  10 Naum 4.2                 2836    9    9  3000   49%  2849 40%
  11 HIARCS 14 WCSC 32b       2816    9    9  3000   45%  2850 41%
  12 Deep Shredder 12         2800    9    9  3000   43%  2850 38%
  13 Jonny 6.00               2798    9    9  3000   43%  2851 38%
  14 Deep Sjeng c't 2010 32b  2781    9    9  3000   41%  2851 40%
  15 Spike 1.4 32b            2774    9    9  3000   40%  2852 41%
  16 spark-1.0                2764    9    9  3000   38%  2852 39%
  17 Deep Junior 13.3         2744    9    9  3000   36%  2853 33%
  18 Booot 5.2.0              2742    9    9  3000   35%  2853 37%
  19 Quazar 0.4               2733    9    9  3000   34%  2854 35%
  20 Zappa Mexico II          2722   10   10  3000   33%  2854 35%
  21 Toga II 3.0 32b          2712    9    9  3000   31%  2855 36%

150.0 ( 82.0 :  68.0) Houdini 3 STD            3066
150.0 ( 85.0 :  65.0) Stockfish 4              3014
150.0 ( 95.5 :  54.5) Critter 1.4a             2975
150.0 ( 94.0 :  56.0) Gull 2.2                 2974
150.0 (107.0 :  43.0) Deep Rybka 4.1           2948
150.0 (117.0 :  33.0) Hannibal 1.4a            2865
150.0 (119.0 :  31.0) Chiron 1.5               2846
150.0 (113.5 :  36.5) Protector 1.5.0          2839
150.0 (120.5 :  29.5) Naum 4.2                 2836
150.0 (121.5 :  28.5) HIARCS 14 WCSC 32b       2816
150.0 (119.0 :  31.0) Deep Shredder 12         2800
150.0 (122.5 :  27.5) Jonny 6.00               2798
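
The offset translation is a constant shift: every rating moves by the amount needed to put Deep Shredder 12 at 2800. A minimal sketch, using a few rows from Don's table and a hypothetical helper named `rebase`:

```python
def rebase(ratings: dict[str, int], anchor: str, anchor_value: int) -> dict[str, int]:
    """Shift all ratings by one constant so that `anchor` lands on `anchor_value`."""
    offset = anchor_value - ratings[anchor]
    return {name: elo + offset for name, elo in ratings.items()}


# A subset of the original list (ratings as posted by Don).
ratings = {
    "Houdini 3 STD": 3000,
    "K113300": 2994,
    "Stockfish 4": 2948,
    "Deep Shredder 12": 2734,
    "Toga II 3.0 32b": 2646,
}

rebased = rebase(ratings, anchor="Deep Shredder 12", anchor_value=2800)
for name, elo in rebased.items():
    print(f"{name:<20} {elo}")
```

Because the shift is the same for every engine, rating differences between engines are unchanged.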

It looks like Jonny 6.00 and Hannibal 1.4a are now included, if I am not wrong. Disregarding this issue, this development version of Komodo has gained around 24 Elo, give or take the uncertainty (around ± 14 Elo, treating the gain as the difference between two normal distributions of 3036 ± 10 and 3060 ± 10; I am writing from memory), since version 5.1r2 or similar. Am I right?
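
The ± 14 figure follows from adding the two error bars in quadrature. A small sketch of that arithmetic, assuming the two ratings are independent and using the 3036 and 3060 values quoted from memory above (the function name is hypothetical):

```python
import math


def difference_with_error(old: float, old_err: float,
                          new: float, new_err: float) -> tuple[float, float]:
    """Difference of two independent estimates and its combined error bar."""
    return new - old, math.hypot(old_err, new_err)


gain, gain_err = difference_with_error(3036, 10, 3060, 10)
print(f"gain: {gain:.0f} Elo, uncertainty: +/- {gain_err:.1f} Elo")
```

Ratings taken from the same list are not strictly independent, so this is only an approximation.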

Anyway, well done Komodo team! I wish Don a speedy recovery.

Regards from Spain.

Ajedrecista.

Don · Post by **Don** » Fri Nov 08, 2013 2:18 pm

Milos wrote:
I don't know if you people are just pretending or like to stress only the facts that work in your favor.
All the tests so far, including short, medium and moderately long TCs, suggest that both Komodo 6 and SFdev have 4% over default H3 in direct matches.
However, H3 runs with contempt 1. Depending on the average rating of the whole field, this contempt brings more overall rating than what H3 loses to Komodo and SF.
If you ran the same matches with contempt 0, you'd see that H3 is 2-3% stronger than both the latest SFdev and K6.
However, you prefer to pretend that your program is the strongest in direct matches with Houdini and then suggest some flaw or whatever in the rating lists' methods, since H3 still has the better rating.

I think you explained why Houdini does better against weak programs: it's probably the aggressive contempt factor. Komodo probably respects other programs way too much.

And I agree with you that Komodo is way too strong for Houdini to have contempt for it.

I doubt Ingo would run this test again, as it costs him precious electricity, which is expensive where he lives, but if I could convince him to do so, do you believe setting Houdini to contempt zero would increase its overall rating on this list?

Don

Milos · Post by **Milos** » Fri Nov 08, 2013 2:26 pm

Don wrote:
I think you explained why Houdini does better against weak programs: it's probably the aggressive contempt factor. Komodo probably respects other programs way too much.

And I agree with you that Komodo is way too strong for Houdini to have contempt for it.

I doubt Ingo would run this test again, as it costs him precious electricity, which is expensive where he lives, but if I could convince him to do so, do you believe setting Houdini to contempt zero would increase its overall rating on this list?

Don

I don't believe it would help H3 on Ingo's list; on the contrary. There are too many weak opponents (300 Elo weaker), so high contempt there brings more points overall (more wins instead of draws) than what H3 loses against SF and Komodo (there it has roughly 6% of the games as losses that would be draws with contempt 0).
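
This trade-off can be put into rough numbers. The sketch below is illustrative only: it assumes the list's 20 opponents at 150 games each, takes the "roughly 6%" figure at face value, and treats every opponent other than Stockfish and Komodo as weak, which is a simplification. All names are hypothetical.

```python
GAMES_PER_OPPONENT = 150      # from the gauntlet tables in this thread
OPPONENTS = 20                # every engine on the list played 3000 games
STRONG_OPPONENTS = 2          # Stockfish and Komodo (assumed split)
LOSS_INSTEAD_OF_DRAW = 0.06   # "roughly 6%" of games, per the post above


def points_lost_vs_strong() -> float:
    """Points lost when a would-be draw (0.5) becomes a loss (0.0)."""
    games = STRONG_OPPONENTS * GAMES_PER_OPPONENT
    return games * LOSS_INSTEAD_OF_DRAW * 0.5


def break_even_conversion_rate() -> float:
    """Fraction of games against the weaker field that must turn from
    draw (0.5) into win (1.0) to recover the points lost above."""
    weak_games = (OPPONENTS - STRONG_OPPONENTS) * GAMES_PER_OPPONENT
    return points_lost_vs_strong() / (0.5 * weak_games)


print(f"points lost against the strong engines: {points_lost_vs_strong():.1f}")
print(f"break-even draw-to-win rate elsewhere: {break_even_conversion_rate():.2%}")
```

Under these assumptions, contempt pays for itself on this list if it converts even a small share of draws against the rest of the field into wins.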

Don · Post by **Don** » Fri Nov 08, 2013 2:33 pm

Milos wrote:
I don't believe it would help H3 on Ingo's list; on the contrary. There are too many weak opponents (300 Elo weaker), so high contempt there brings more points overall (more wins instead of draws) than what H3 loses against SF and Komodo (there it has roughly 6% of the games as losses that would be draws with contempt 0).

So it's probably the case that Komodo would actually top this list if Houdini's contempt was zero. Houdini is optimized to do well on lists.

I'll see if Ingo is willing to run another test with contempt = 0 for Houdini.

Milos · Post by **Milos** » Fri Nov 08, 2013 2:40 pm

Don wrote:
So it's probably the case that Komodo would actually top this list if Houdini's contempt was zero. Houdini is optimized to do well on lists.

I'll see if Ingo is willing to run another test with contempt = 0 for Houdini.

I'm pretty sure RH has the same kind of setup, with an average opponent rating like that of Ingo's list or CCRL, and after he's satisfied with the contempt 0 strength of the engine, he then optimizes contempt to provide the highest rating and then normalizes that value to 1.
