CCRL 40/4 lists updated (11th August 2012)

lkaufman · Post by **lkaufman** » Wed Aug 15, 2012 3:47 pm

Modern Times wrote:
Uri Blass wrote:
I do not understand your confidence that 10 elo difference is impossible.

CCRL did not play enough games for Komodo to bring the statistical error below 10 Elo, so the fact that they see no difference between SSE and non-SSE proves nothing.
True, but equally Larry's assertion that there *is* a 10 Elo difference is also impossible to prove, and until he proves it I don't believe it. His conclusions from the CEGT results to back up his 1.3 Elo per percentage point improvement are flawed because of the error margins on that list (and ours).

If you combine six different versions, I think the error margins are pretty small.
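How small the margins get depends on the number of games. A rough sketch of the arithmetic, assuming independent games, a normal approximation to the score, and the standard logistic Elo curve; the function names and the win/draw/loss counts are illustrative and not taken from either rating list. Pooling also assumes every pooled version shares the same underlying effect.

```python
import math

def elo_from_score(score):
    """Convert an expected score (0 < score < 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin_95(wins, draws, losses):
    """Approximate 95% error margin, in Elo, for one match result.

    Only meaningful when the score is not close to 0 or 1.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    half_width = 1.96 * math.sqrt(variance / games)
    low = elo_from_score(score - half_width)
    high = elo_from_score(score + half_width)
    return (high - low) / 2.0

single = elo_margin_95(27, 44, 29)      # hypothetical 100-game sample
pooled = elo_margin_95(162, 264, 174)   # same proportions, six times the games
print(f"100 games: +/- {single:.0f} Elo")
print(f"600 games: +/- {pooled:.0f} Elo")
```

Under these assumptions the margin shrinks roughly with the square root of the game count, so six times the games cuts it by a factor of about 2.4 rather than 6.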

lkaufman · Post by **lkaufman** » Wed Aug 15, 2012 3:49 pm

Modern Times wrote: There are other fatal flaws in this line of reasoning:

- doubling the speed of a 2800 engine say, will yield different gains from doubling the speed of a 2950 engine (law of diminishing returns)

- a 90 Elo increase from doubling the speed may not be a linear progression. It could well be that a 7% speed increase achieves zero Elo, and any improvement comes later.

Your first point is valid, though I don't see much evidence of it in the data. Your second point seems far-fetched, it is totally contrary to all of our experience.

Modern Times · Post by **Modern Times** » Wed Aug 15, 2012 4:50 pm

lkaufman wrote:
Modern Times wrote: There are other fatal flaws in this line of reasoning:

- doubling the speed of a 2800 engine say, will yield different gains from doubling the speed of a 2950 engine (law of diminishing returns)

- a 90 Elo increase from doubling the speed may not be a linear progression. It could well be that a 7% speed increase achieves zero Elo, and any improvement comes later.
Your first point is valid, though I don't see much evidence of it in the data. Your second point seems far-fetched, it is totally contrary to all of our experience.

The second point, in my view, is by far the biggest factor. The increase just won't be linear; a straight line is precisely what it won't be. It is difficult to prove either way, but logically it just won't be.

lkaufman · Post by **lkaufman** » Wed Aug 15, 2012 5:06 pm

Modern Times wrote:
lkaufman wrote:
Modern Times wrote: There are other fatal flaws in this line of reasoning:

- doubling the speed of a 2800 engine say, will yield different gains from doubling the speed of a 2950 engine (law of diminishing returns)

- a 90 Elo increase from doubling the speed may not be a linear progression. It could well be that a 7% speed increase achieves zero Elo, and any improvement comes later.
Your first point is valid, though I don't see much evidence of it in the data. Your second point seems far-fetched, it is totally contrary to all of our experience.
The second point, in my view, is by far the biggest factor. The increase just won't be linear; a straight line is precisely what it won't be. It is difficult to prove either way, but logically it just won't be.

It would be valid if we always did a 20 ply search (for example) at a given level; then a 7% speedup would be worthless. But in actual play the depth reached varies enough from one position to another to smooth this out, so a given percentage speedup always allows an extra ply in roughly the same percentage of positions. This is very clear to us; if we speed up the program by a few percent, we always gain about the expected number of elo points, regardless of the precise time limit or current elo rating.
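The smooth-scaling view can be put into numbers. This is only a sketch of that model, assuming the Elo gain is proportional to the base-2 logarithm of the speedup and using the 90 Elo per doubling figure quoted in this thread; the constant and the function name are illustrative, and the code does not settle whether real engines follow the curve.

```python
import math

# Figure quoted in the thread; treated here as an assumption, not a measurement.
ELO_PER_DOUBLING = 90.0

def expected_elo_gain(speedup_percent, elo_per_doubling=ELO_PER_DOUBLING):
    """Elo gain for a given percentage speedup under a log-linear model."""
    return elo_per_doubling * math.log2(1.0 + speedup_percent / 100.0)

for pct in (1, 7, 100):
    print(f"{pct:>3}% faster -> {expected_elo_gain(pct):.1f} Elo")
```

Under this model a 1% speedup is worth about 1.3 Elo and a 7% speedup just under 9 Elo, so the per-percentage-point figure and the per-doubling figure are two readings of the same curve.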

geots · Post by **geots** » Thu Aug 16, 2012 7:00 am

lkaufman wrote:
geots wrote:
lkaufman wrote: George, I'll ask you: do you have any reason to think that Komodo performs better or worse (against Houdini, Critter, Ivanhoe, and Stockfish) at repeating time controls as compared to increment time controls? We don't test at repeating time controls (except for very fast tests on rare occasions when working on time control) because they are a big waste of time; increment play is clearly superior for testing. But you are right, it is possible that Komodo is weaker at repeating controls. I'm asking if you have reason to believe this is actually the case.

I'll just point out that although our CCRL and CEGT blitz ratings are lower than what we get with our increment testing, the CCRL and CEGT ratings at intermediate levels (40/40 and 40/20) for Komodo seem about right relative to Houdini, and they are also repeating controls.

No, I really don't have any basis in fact to know that Komodo performs better or worse against said engines at incremental controls or repeating controls. Telling Uri that the controls could make more difference than sse vs no-sse I still will stand by. They very well "could", because the sse vs no-sse difference is so minimal anyway. I believe Jean Paul will agree with me on that.

But I am at a disadvantage here- because I don't follow what the point is. I'll go back to what Joe Garagiola said about Ted Williams. "He is a pure hitter, and he doesn't care about the conditions. He could hit the ball at midnight in a wind tunnel." That's the same with the Number 1 engine in the world. If you were playing Houdini for the championship, I doubt Robert would care if it was long control, short control, repeating or incremental. The programmer of the number 1 engine never cares. All he wants is to play.

So personally, I don't think the control used is going to have a thing to do with who is Number 1 and who is Number 2. I am probably the only person who has combined as many repeating controls with incremental controls in testing Komodo against Houdini, probably 50-50. And I have not seen it make a difference one way or the other. And I was a bit surprised to see Houdini do a little better at 40/40 than at 40/4. My guess would be 5 to 8 Elo. I say "guess" because I don't have the games to back that up yet.

I switched once from 4m+2s to 40/3 repeating, because some engines were having a lot of time losses. And generally speaking, I never went back, when just testing for myself. As for beta testing- I just follow orders- which is the way it should be.

But it doesn't matter if it is Vas, Robert, Richard, you and Don, whoever- if it makes a difference to the author what type of controls are used- he needs to go back to the drawing board. The person with a true Number 1 engine doesn't care.

Best,

george

Larry,

Of course it can make a large elo difference which type of time control you use, because one engine may have a seriously bad time algorithm for one or the other. This also implies that one might have the best engine but it might test as worse at some type of time control.
However as things actually stand now I don't think any of the top few engines has a seriously bad time algorithm, and so at the present time you are right to say the best engine should win at any type of time control. But I think this does not mean at any level; very fast time controls like 1' bullet chess may not be representative of normal chess as things stand now. In a couple more years that may no longer be true, but right now bullet chess favors Ippo and all derivatives and relatives of it. CCRL and CEGT blitz levels (40/3' on various hardware, mostly a bit old) are still too close to bullet chess to be good predictors of results at their intermediate levels, I think.

And Larry, I agree with you 100% on what you say about 1' bullet games. But I carry it even further. I think as a tool for telling you anything important about your engine, they are as useless as tits on a bull.

Best,

george

Modern Times · Post by **Modern Times** » Thu Aug 16, 2012 8:56 am

Back to the SSE4 question - CEGT have confirmed that *all* their games were SSE4, so Larry's comment in the CEGT thread about the use of non-SSE hardware, "This is starting to look like the main culprit, in CCRL as well," is not entirely true.

However, CEGT used AMD X4, and it would be interesting to see what percentage of games were from AMD, because I have a suspicion (unproven) that this may be a factor.

Perhaps it is:
CCRL - use of non-SSE hardware
CEGT - use of non-Intel hardware

All speculation. But I do know that in 40/40 testing, Komodo 5 performed poorly in all the initial games, which were my AMD games, and its rating only pulled up after other testers submitted Intel games.

Who knows what the truth is, the error margins are so big that there is no answer. Or maybe that is in fact the answer.

geots · Post by **geots** » Thu Aug 16, 2012 9:38 am

Modern Times wrote: Back to the SSE4 question - CEGT have confirmed that *all* their games were SSE4, so Larry's comment in the CEGT thread about the use of non-SSE hardware, "This is starting to look like the main culprit, in CCRL as well," is not entirely true.

However, CEGT used AMD X4, and it would be interesting to see what percentage of games were from AMD, because I have a suspicion (unproven) that this may be a factor.

Perhaps it is:
CCRL - use of non-SSE hardware
CEGT - use of non-Intel hardware

All speculation. But I do know that in 40/40 testing, Komodo 5 performed poorly in all the initial games, which were my AMD games, and its rating only pulled up after other testers submitted Intel games.

Who knows what the truth is, the error margins are so big that there is no answer. Or maybe that is in fact the answer.

I may not know what the truth is, but 2 things I do know: 1. Larry is wrong about 40/4. It will give you very reliable results- and it is a mistake to mention 40/4 in the same sentence with 1' Bullet- which gives you ABSOLUTELY NOTHING but a waste of time as far as learning anything about your engine.

2. Any differences in ratings that amount to anything worth mentioning CAN HAVE NOTHING to do with sse or pcnt. That is not my observation- that is a FACT. I know it to be true- from people that WOULD know and 2000 games I have under my belt. But I don't plan on hanging around and arguing the point. Doesn't interest me what anyone believes.

ThatsIt · Post by **ThatsIt** » Thu Aug 16, 2012 9:48 am

CEGT 40/4

Komodo 5.0 x64 1CPU (ELO 2991 out of 1800 games until now) vs
001 Houdini 2.0c x64 4CPU     - 3143 100 + 07 = 45 - 48 29.5 % 2991 (Intel i5)
016 Houdini 1.5 x64 1CPU      - 3018 100 + 27 = 44 - 29 49.0 % 3011 (AMD X-4)
019 Critter 1.6 x64 1CPU      - 3012 100 + 25 = 45 - 30 47.5 % 2995 (AMD X-4)
021 Strelka 5.0 x64 1CPU      - 3007 100 + 29 = 45 - 26 51.5 % 3017 (AMD X-4)
035 Rybka 4.0 x64 1CPU        - 2968 100 + 28 = 49 - 23 52.5 % 2985 (AMD X-4)
037 Stockfish 2.2.2 x64 1CPU  - 2967 100 + 29 = 42 - 29 50.0 % 2967 (AMD X-4)
054 Gull II beta2 x64 1CPU    - 2935 100 + 40 = 44 - 16 62.0 % 3019 (AMD X-4)
105 Protector 1.4.0 x64 4CPU  - 2840 100 + 58 = 27 - 15 71.5 % 3000 (AMD X-4)
120 Naum 4.2 x64 1CPU         - 2826 100 + 66 = 24 - 10 78.0 % 3047 (AMD X-4)
138 Fritz 13                  - 2807 100 + 53 = 36 - 11 71.0 % 2963 (AMD X-4)
143 Gull 1.2 x64              - 2804 100 + 60 = 32 - 08 76.0 % 3005 (AMD X-4)
147 Deep Shredder 12 x64 1CPU - 2800 100 + 63 = 29 - 08 77.5 % 3016 (AMD X-4)
153 Hannibal 1.2 x64          - 2793 100 + 57 = 35 - 08 74.5 % 2980 (AMD X-4)
168 Deep Sjeng ct 2010 1CPU   - 2779 100 + 65 = 25 - 10 77.5 % 2995 (AMD X-4)
174 Spike 1.4 1CPU            - 2775 100 + 60 = 31 - 09 75.5 % 2972 (AMD X-4)
178 Hiarcs 13.2 1CPU          - 2774 100 + 67 = 27 - 06 80.5 % 3021 (AMD X-4)
182 Spark 1.0 x64 1CPU        - 2771 100 + 66 = 28 - 06 80.0 % 3012 (AMD X-4)
204 Deep Junior 13.3 x64 1CPU - 2753 100 + 70 = 27 - 03 83.5 % 3033 (Intel i5)
and unpublished yet:
xxx Critter 1.4 x64 1CPU      - 2978 100 + 33 = 55 - 12 51.5 % 2988 (Intel i5)
xxx Strelka 5.0 x64 1CPU      - 3007 100 + 38 = 44 - 18 48.0 % 2993 (Intel i5)

Best wishes,
G.S.
(CEGT member)
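For readers checking the last rating column: it appears to match the usual logistic performance formula applied to each opponent's rating and the score percentage. That is an inference from the numbers, not something stated in the post, and the function name below is illustrative. A minimal sketch:

```python
import math

def performance_rating(opponent_elo, wins, draws, losses):
    """Performance rating against one opponent under the logistic Elo model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        # The formula diverges for a 0% or 100% score.
        raise ValueError("score must be strictly between 0 and 1")
    return opponent_elo + 400.0 * math.log10(score / (1.0 - score))

# Houdini 1.5 row from the table: rated 3018, +27 =44 -29.
print(round(performance_rating(3018, 27, 44, 29)))
```

For the rows tried, the result lands within about a point of the listed value; small differences could come from rounding or from unrounded opponent ratings in the original calculation.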

geots · Post by **geots** » Thu Aug 16, 2012 9:56 am

I forgot one last thing. I am willing to make a bet. We can pick a number of engine-engine matches. But we have to put money on what we believe. So whoever disagrees with me- just get in touch. I will put one thousand dollars on the table that says run each couple at 40/4 and then 40/40. My money says that at least 9 out of 10 that win a 40/4 in a 50 game match will also win at 40/40. We shall have a neutral tester. But before you get sucked in- remember- I never gamble.

Modern Times · Post by **Modern Times** » Thu Aug 16, 2012 10:21 am

I agree, George. As I said before, I would be surprised if SSE4 gives even 5 Elo.
