AH_LTC Match: Komodo TCEC vs Houdini 4

ouachita
Posts: 454
Joined: Tue Jan 15, 2013 4:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: AH_LTC Match: Komodo TCEC vs Houdini 4

Post by ouachita »

lkaufman wrote:So I'm inclined to think that on one core the breakeven level is around an hour for the game. It may not be shorter on 4 cores because I suspect that Houdini and SF have a better MP than Komodo for 4 cores but not for 12 or 16.
I agree, and believe that, when graphed, these Elo lines converge and intersect at approx. 60 min, then diverge at TC > 60 min, until at some unknown point they might even become parallel.
SIM, PhD, MBA, PE
beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: AH_LTC Match: Komodo TCEC vs Houdini 4

Post by beram »

ouachita wrote:
lkaufman wrote:So I'm inclined to think that on one core the breakeven level is around an hour for the game. It may not be shorter on 4 cores because I suspect that Houdini and SF have a better MP than Komodo for 4 cores but not for 12 or 16.
I agree, and believe that, when graphed, these Elo lines converge and intersect at approx. 60 min, then diverge at TC > 60 min, until at some unknown point they might even become parallel.
Perhaps at 120 but not at 60
This picture from CCRL 40/40 test with Houdini 4 (1 core) is telling enough
[Image: picture from the CCRL 40/40 test with Houdini 4 (1 core)]
ouachita
Posts: 454
Joined: Tue Jan 15, 2013 4:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: AH_LTC Match: Komodo TCEC vs Houdini 4

Post by ouachita »

beram wrote: Perhaps at 120 but not at 60
Conceding that H4 is the King of Blitz, 2+2 and similar very short TCs, and short of preparing a dissertation, let's see if we can use some common sense on the question: at which TC/CPU does H4 fall below KTCEC and/or SF DD+? (It will take a while to post the various results.)

Data #1 TCEC - final
120'+30"
16 cores
KTCEC = 3115
SF = 3103
H4 = 3088

Data #2 (in progress as of 12/14)
90'+15"
12 CPUs
KTCEC = 44 pts.
SF = 40.5
H4 = 37.5

Data #3
90'+15"
1 CPU
?

Data #4
AH_LTC Match: Stockfish DD vs Houdini 4
90'+30"
1 core
1 Stockfish DD 3183
2 Houdini 4 3161

Data #5
30'+30"
i7-3960X, 1 core
average game length ~2 h 35 m, 831 games played:
1 Stockfish 111113 64 SSE4.2 : 2960
2 Houdini 4 Pro x64 : 2954

Data #6
AH_LTC Stairway to Heaven Competition
AH_LTC = 90'+30" (?)
1 Stockfish_SZ 13110122 3178
2 Houdini 4 3172
3 Komodo 6 3151

I need a 60'+ result, and I'm sure one is here but can't find it now. Still looks like +/-60+ to me.
SIM, PhD, MBA, PE
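For a sense of scale, the rating gaps listed above can be converted to expected scores with the standard logistic Elo formula. This is only a rough sketch: the numbers come from different lists and rating tools, so the gaps are not strictly comparable, and the name `expected_score` is just an illustrative choice.

```python
def expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# Rating gaps copied from the lists above (list leader minus Houdini 4).
# Data #2 is omitted because it reports points, not ratings.
gaps = {
    "Data #1 KTCEC vs H4, 16 cores": 3115 - 3088,
    "Data #4 SF DD vs H4, 1 core": 3183 - 3161,
    "Data #5 SF 111113 vs H4, 1 core": 2960 - 2954,
    "Data #6 SF_SZ vs H4": 3178 - 3172,
}

for label, gap in gaps.items():
    print(f"{label}: +{gap} Elo, expected score {expected_score(gap):.3f}")
```

Even the largest gap here corresponds to an expected score only a few percentage points above 50%, which is why long matches are needed to separate these engines.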
Aser Huerga
Posts: 812
Joined: Tue Jun 16, 2009 10:09 am
Location: Spain

Re: AH_LTC Match: Komodo TCEC vs Houdini 4

Post by Aser Huerga »

Aser Huerga wrote: Next Test: Made In Heaven class Time Control Comparison
I will provide precise data about your debate at the end of the next week.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: AH_LTC Match: Komodo TCEC vs Houdini 4

Post by lkaufman »

beram wrote:
ouachita wrote:
lkaufman wrote:So I'm inclined to think that on one core the breakeven level is around an hour for the game. It may not be shorter on 4 cores because I suspect that Houdini and SF have a better MP than Komodo for 4 cores but not for 12 or 16.
I agree, and believe that, when graphed, these Elo lines converge and intersect at approx. 60 min, then diverge at TC > 60 min, until at some unknown point they might even become parallel.
Perhaps at 120 but not at 60
This picture from CCRL 40/40 test with Houdini 4 (1 core) is telling enough
[Image: picture from the CCRL 40/40 test with Houdini 4 (1 core)]
By itself, how does this relate to the discussion of the relative strength of Houdini 4 and Komodo TCEC? Are the 40/40 ratings for each of these posted somewhere?
I see they were posted right after I wrote this. Houdini 4 is 8 Elo points above Komodo TCEC, but Komodo has only 37 games, so this is rather meaningless. Still, this is consistent with the crossover point being somewhere around 1 hour plus some increment.
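As a rough check on why 37 games says so little, here is a minimal sketch of the standard error of an Elo estimate. The 60% draw rate is an assumption chosen for illustration (not a figure from CCRL), the helper name is hypothetical, and a rating list pools games against several opponents, so this is only an order-of-magnitude estimate.

```python
import math


def elo_standard_error(games: int, draw_rate: float, score: float = 0.5) -> float:
    """Approximate one-sigma error (in Elo) of a rating estimated from `games` games."""
    win = score - draw_rate / 2.0
    loss = 1.0 - draw_rate - win
    if games <= 0 or win < 0 or loss < 0:
        raise ValueError("inconsistent games/draw_rate/score")
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    var = win * (1 - score) ** 2 + draw_rate * (0.5 - score) ** 2 + loss * score ** 2
    se_score = math.sqrt(var / games)
    # Slope of Elo(s) = 400 * log10(s / (1 - s)) converts score error to Elo error.
    slope = 400.0 / (math.log(10) * score * (1 - score))
    return se_score * slope


# Assumed draw rate of 60%; 37 games as mentioned in the post.
print(round(elo_standard_error(37, draw_rate=0.6), 1))
```

Under these assumptions the one-sigma error at 37 games comes out several times larger than the 8 Elo gap.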
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AH_LTC Match: Komodo TCEC vs Houdini 4

Post by Milos »

ouachita wrote:
lkaufman wrote:Has there been a single match report yet of any Houdini beating Komodo TCEC or any recent Stockfish version in a long match at 90' per side or more?
I am unaware of any test of >=60+ wherein H4 beat KTCEC or >=SF DD in a match or test. My review reveals that their ELO curves intersect around the 60+ mark, but mine is just an unscientific observation
That's simply because you and all the ppl who are basically fans supporting a particular engine (usually SF or Komodo) stubbornly refuse to test Houdini with proper settings for this kind of match, i.e. with contempt 0.
On my machine, at an equivalently 2x longer SMP TC (30'+10'' on 6 cores) than this tournament, SF021113, which is probably not more than 10 Elo weaker than SF DD, has less than a 10 Elo advantage over H3 after 500 matches, meaning that probably even SF DD is within error bars of H3. I can't imagine any TC where SF's advantage over H4 with contempt 0 would be even 1 SD.
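The "within error bars" point can be turned around: how many games would it take before a given Elo edge stands out by a chosen number of standard deviations? The sketch below assumes independent games and a 60% draw rate (both assumptions for illustration, not measurements from this thread); `games_needed` is a hypothetical helper.

```python
import math


def games_needed(elo_gap: float, sigmas: float, draw_rate: float) -> int:
    """Games required for `elo_gap` to exceed `sigmas` standard errors of the score."""
    score = 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))  # logistic Elo model
    edge = score - 0.5                                 # margin over an even score
    win = score - draw_rate / 2.0
    loss = 1.0 - draw_rate - win
    if edge <= 0 or win < 0 or loss < 0:
        raise ValueError("inconsistent elo_gap/draw_rate")
    # Variance of a single game result around the mean score.
    var = win * (1 - score) ** 2 + draw_rate * (0.5 - score) ** 2 + loss * score ** 2
    # Require edge >= sigmas * sqrt(var / n), solved for n.
    return math.ceil(var * (sigmas / edge) ** 2)


for k in (1, 2):
    print(f"10 Elo at {k} SD: about {games_needed(10, k, draw_rate=0.6)} games")
```

With these assumptions, roughly 500 games puts a 10 Elo edge at about one standard deviation, and two standard deviations needs about four times as many.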
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: AH_LTC Match: Komodo TCEC vs Houdini 4

Post by lkaufman »

Milos wrote:
ouachita wrote:
lkaufman wrote:Has there been a single match report yet of any Houdini beating Komodo TCEC or any recent Stockfish version in a long match at 90' per side or more?
I am unaware of any test of >=60+ wherein H4 beat KTCEC or >=SF DD in a match or test. My review reveals that their ELO curves intersect around the 60+ mark, but mine is just an unscientific observation
That's simply because you and all the ppl who are basically fans supporting a particular engine (usually SF or Komodo) stubbornly refuse to test Houdini with proper settings for this kind of match, i.e. with contempt 0.
On my machine, at an equivalently 2x longer SMP TC (30'+10'' on 6 cores) than this tournament, SF021113, which is probably not more than 10 Elo weaker than SF DD, has less than a 10 Elo advantage over H3 after 500 matches, meaning that probably even SF DD is within error bars of H3. I can't imagine any TC where SF's advantage over H4 with contempt 0 would be even 1 SD.
I grant that Houdini plays stronger against near-equals with contempt set to zero. But consider:
1. The same is true for Komodo: drawscore should be zero rather than -7 when playing Houdini and Stockfish. So this is only a fully valid argument when comparing Houdini to Stockfish.
2. TCEC was run with options chosen by the programmer(s). I don't know which contempt setting Houdini used in the round before the finals, but it was whatever Robert thought best. The TCEC ratings are based mostly on games against weaker opponents, so Houdini would probably have a lower rating if contempt zero was used for the whole event.
3. If testers would force more close pairings (as in this thread) and also use Ordo rather than BayesElo (because Ordo weights close pairings more heavily than BayesElo), programmers would choose lower default settings for contempt. Then this wouldn't be a big issue. This does seem to be the trend now.
4. Testers at all levels mostly test default. So the huge decline in Houdini's strength from blitz to slow chess (relative to SF and K) is unrelated to contempt.
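To make the contempt/drawscore point concrete, here is a toy model (not how Houdini or Komodo actually implement it) in which a forced draw is simply scored at the configured draw score in centipawns. The function name, the option labels and the -5 evaluation are all hypothetical.

```python
def choose_option(line_scores_cp: dict, draw_score_cp: int) -> str:
    """Return the best-scoring option, with a forced draw valued at draw_score_cp."""
    scores = dict(line_scores_cp)
    scores["draw"] = draw_score_cp
    return max(scores, key=scores.get)


# Hypothetical position: the only alternative to a repetition is slightly worse (-5 cp).
options = {"play_on": -5}

print(choose_option(options, draw_score_cp=0))   # draw
print(choose_option(options, draw_score_cp=-7))  # play_on
```

With the draw scored at zero the engine takes the repetition; with the draw scored at -7 it avoids the draw and plays on from the slightly worse position. That pays off against weaker opponents but can cost points against near-equals.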
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AH_LTC Match: Komodo TCEC vs Houdini 4

Post by Milos »

lkaufman wrote:I grant that Houdini plays stronger against near-equals with contempt set to zero. But consider:
1. The same is true for Komodo: drawscore should be zero rather than -7 when playing Houdini and Stockfish. So this is only a fully valid argument when comparing Houdini to Stockfish.
2. TCEC was run with options chosen by the programmer(s). I don't know which contempt setting Houdini used in the round before the finals, but it was whatever Robert thought best. The TCEC ratings are based mostly on games against weaker opponents, so Houdini would probably have a lower rating if contempt zero was used for the whole event.
3. If testers would force more close pairings (as in this thread) and also use Ordo rather than BayesElo (because Ordo weights close pairings more heavily than BayesElo), programmers would choose lower default settings for contempt. Then this wouldn't be a big issue. This does seem to be the trend now.
4. Testers at all levels mostly test default. So the huge decline in Houdini's strength from blitz to slow chess (relative to SF and K) is unrelated to contempt.
Larry, you advertise your product and have a strong bias, therefore I think discussing with you is pointless.
You are clearly using a strawman here. You say rating lists test with defaults, but here you are stressing the LTC relative performance of SF, K and H. Relative performance has nothing to do with rating lists; that's apples and oranges. No BayesElo or Ordo (which is btw. total BS program with bogus output that some ppl here support just because they are brown-nosing Miguel) can change this fact, and your mentioning them is nothing but another strawman.
If you want relative performance, then test with the parameters that are best for relative performance. Otherwise the only valid claim is that with a particular parameter (the default) H is weaker in relative performance vs. K and SF in LTC; in LTC rating lists, however, H4 will be ahead of K and SF thanks to its contempt.
So when the results please you, as here, you insist on default parameters. When the results don't please you, as with the rating lists, you either complain or pretend these results don't exist (as we see here when Bram mentioned them).
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AH_LTC Match: Komodo TCEC vs Houdini 4

Post by Milos »

ouachita wrote:Larry should feel good about this result. On the other hand, Robert must feel as outnumbered as the British were by the main Zulu army.
Interesting comparison, I only wonder why British and Zulu, why not General Custer and Crazy Horse? :lol:
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: AH_LTC Match: Komodo TCEC vs Houdini 4

Post by Adam Hair »

Milos wrote:Ordo (which is btw. total BS program with bogus output that some ppl here support just because they are brown-nosing Miguel)
Hi Milos, Mr Brown-noser here. You can always examine the source code to Ordo 0.8 and point out the mistakes to everyone:

https://sites.google.com/site/gaviotachessengine/ordo