SMP rating influence

bob · Post by **bob** » Mon Jan 12, 2009 2:23 am

This week (and weekend) I have been chasing a very old parallel search issue. Effect was minor but I had decided it was time for it to go away. And I finally found it. It didn't happen very often so I decided to do a couple of cluster runs, with the only difference being that Crafty would use 2 CPUs while the opponents would use one. I started this run after the bug was fixed, to make sure that after 64K games or so it did not happen again. It is getting close to done, but when looking at the results, I realized that I had some interesting data. I had two 32K runs with one cpu for crafty vs one cpu for opponents. I've published these numbers several times. But now I had the _same_ results for Crafty with 2 cpus against 1 cpu opponents.

What I was able to discover is what kind of a rating difference I found, changing absolutely nothing but giving Crafty an extra CPU against the same opponents and positions, same time control, etc.

Bottom line was +75 Elo. In previous runs, Crafty was about 60 Elo below the latest Glaurung 2 and Toga2, about 50 Elo better than Fruit, and about 110 Elo above Glaurung 1. With 2 cpus, Crafty finished +10 over Glaurung 2, +25 over Toga2, +125 over Fruit 2 and +190 over Glaurung 1.

When I have time I will try to repeat this using 1, 2, 4 and 8 cpus on the other cluster. I will post the Bayeselo output once all the games have finished. Things only run half as fast since I can only run one game per node, not two.

Mark · Post by **Mark** » Mon Jan 12, 2009 2:36 am

bob wrote:This week (and weekend) I have been chasing a very old parallel search issue. Effect was minor but I had decided it was time for it to go away. And I finally found it. It didn't happen very often so I decided to do a couple of cluster runs, with the only difference being that Crafty would use 2 CPUs while the opponents would use one. I started this run after the bug was fixed, to make sure that after 64K games or so it did not happen again. It is getting close to done, but when looking at the results, I realized that I had some interesting data. I had two 32K runs with one cpu for crafty vs one cpu for opponents. I've published these numbers several times. But now I had the _same_ results for Crafty with 2 cpus against 1 cpu opponents.

What I was able to discover is what kind of a rating difference I found, changing absolutely nothing but giving Crafty an extra CPU against the same opponents and positions, same time control, etc.

Bottom line was +75 Elo. In previous runs, Crafty was about 60 Elo below the latest Glaurung 2 and Toga2, about 50 Elo better than Fruit, and about 110 Elo above Glaurung 1. With 2 cpus, Crafty finished +10 over Glaurung 2, +25 over Toga2, +125 over Fruit 2 and +190 over Glaurung 1.

When I have time I will try to repeat this using 1, 2, 4 and 8 cpus on the other cluster. I will post the Bayeselo output once all the games have finished. Things only run half as fast since I can only run one game per node, not two.

Is 75 elo about what you expected? Do you have any guesses as to what elo increase you'll get for 4 and 8 processors? Thanks!

CRoberson · Post by **CRoberson** » Mon Jan 12, 2009 3:37 am

That is good info/work.

Reviewing the info from the SSDF, it seems that 4 CPUs vs 1 CPU is
worth around 120 Elo. So, I've been using an estimate of 70 for
two CPUs.

I predict you will achieve 120 to 150 points for 4 CPUs vs 1 and a sublinear
gain for 8 vs 1. It is sublinear for 4 vs 1 compared to 2 vs 1.
The reason is quite obvious and mathematically provable.

Interestingly, some of the engines in the SSDF list obtained
as little as an 80 pt gain for 4 CPUs.

bob · Post by **bob** » Mon Jan 12, 2009 5:08 am

Rank Name               Elo    +    - games score oppo. draws
   1 Crafty-22.9R02-2  2664    5    4 31128   62%  2575   20% 
   2 Crafty-22.9R02-1  2662    4    4 31128   61%  2575   20% 
   3 Glaurung 2.2      2654    4    4 31128   54%  2625   21% 
   4 Toga2             2638    4    5 31128   52%  2625   22% 
   5 Crafty-22.9-1     2587    4    4 31128   51%  2575   21% 
   6 Crafty-22.9-2     2586    4    5 31128   51%  2575   20% 
   7 Fruit 2.1         2539    5    5 31128   38%  2625   22% 
   8 Glaurung 1.1 SMP  2471    4    4 31128   30%  2625   17%

22.9-1 and 22.9-2 are the latest 22.9, using one cpu. I made 2 runs, which is where the -1 and -2 come from.

22.9R02-1 and 22.9R02-2 are two runs with same code, but using 2 cpus instead of one.
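
As a rough cross-check of the "+75 Elo" figure, the two 2-cpu runs can be averaged and compared against the average of the two 1-cpu runs. This is only a sketch: the ratings are copied from the table above, the variable names are made up for illustration, and plain averaging ignores the error bars Bayeselo reports (about 4 to 5 Elo either way).

```python
# Elo values copied from the Bayeselo table above.
two_cpu_runs = [2664, 2662]  # Crafty-22.9R02-2 and Crafty-22.9R02-1
one_cpu_runs = [2587, 2586]  # Crafty-22.9-1 and Crafty-22.9-2


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def elo_gain(after, before):
    """Difference between the mean ratings of two sets of runs."""
    return mean(after) - mean(before)


gain = elo_gain(two_cpu_runs, one_cpu_runs)
print(f"2-cpu gain over 1 cpu: {gain:+.1f} Elo")
```

With the table's numbers this comes out at 76.5 Elo, in line with the roughly +75 quoted earlier.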

bob · Post by **bob** » Mon Jan 12, 2009 5:09 am

Mark wrote:
bob wrote:This week (and weekend) I have been chasing a very old parallel search issue. Effect was minor but I had decided it was time for it to go away. And I finally found it. It didn't happen very often so I decided to do a couple of cluster runs, with the only difference being that Crafty would use 2 CPUs while the opponents would use one. I started this run after the bug was fixed, to make sure that after 64K games or so it did not happen again. It is getting close to done, but when looking at the results, I realized that I had some interesting data. I had two 32K runs with one cpu for crafty vs one cpu for opponents. I've published these numbers several times. But now I had the _same_ results for Crafty with 2 cpus against 1 cpu opponents.

What I was able to discover is what kind of a rating difference I found, changing absolutely nothing but giving Crafty an extra CPU against the same opponents and positions, same time control, etc.

Bottom line was +75 Elo. In previous runs, Crafty was about 60 Elo below the latest Glaurung 2 and Toga2, about 50 Elo better than Fruit, and about 110 Elo above Glaurung 1. With 2 cpus, Crafty finished +10 over Glaurung 2, +25 over Toga2, +125 over Fruit 2 and +190 over Glaurung 1.

When I have time I will try to repeat this using 1, 2, 4 and 8 cpus on the other cluster. I will post the Bayeselo output once all the games have finished. Things only run half as fast since I can only run one game per node, not two.
Is 75 elo about what you expected? Do you have any guesses as to what elo increase you'll get for 4 and 8 processors? Thanks!

I would expect going from 2 to 4 cpus to give about the same gain, and ditto for 4 to 8, as the performance improvement is fairly linear through 8 processors. I have run a good bit on 16 cores and the speedup stays about the same there as well. Beyond that the data gets more difficult to compare, as the 32 and 64 node systems I have used in the past were NUMA, which changes things some compared to the Intel boxes.

bob · Post by **bob** » Mon Jan 12, 2009 5:11 am

CRoberson wrote:That is good info/work.

Reviewing the info from the SSDF, it seems that 4 CPUs vs 1 CPU is
worth around 120 Elo. So, I've been using an estimate of 70 for
two CPUs.

I predict you will achieve 120 to 150 points for 4 CPUs vs 1 and a sublinear
gain for 8 vs 1. It is sublinear for 4 vs 1 compared to 2 vs 1.
The reason is quite obvious and mathematically provable.

Interestingly, some of the engines in the SSDF list obtained
as little as an 80 pt gain for 4 CPUs.

Here's the question. The speedup is linear from 1-2-4-8-16. The average is close to this:

speedup = 1 + (NCPUS - 1) * 0.7

the question is does the next ply bring about as much as the last ply. So far, it seems to be "yes..."
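
For reference, the speedup approximation above can be tabulated for the cpu counts mentioned in this thread. A minimal sketch: the function name and the `efficiency` parameter are chosen here for illustration, and 0.7 is the average figure quoted above, not a guarantee for any particular position or machine.

```python
def estimated_speedup(ncpus, efficiency=0.7):
    """Approximate parallel speedup: 1 + (NCPUS - 1) * 0.7.

    Each cpu beyond the first is assumed to contribute `efficiency`
    of a full cpu's worth of search speed.
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return 1 + (ncpus - 1) * efficiency


for ncpus in (1, 2, 4, 8, 16):
    print(f"{ncpus:2d} cpus -> {estimated_speedup(ncpus):.1f}x")
```

Under this approximation 2 cpus give about 1.7x, 4 give about 3.1x, 8 give about 5.9x and 16 give about 11.5x.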

CRoberson · Post by **CRoberson** » Mon Jan 12, 2009 7:40 am

bob wrote:
CRoberson wrote:That is good info/work.

Reviewing the info from the SSDF, it seems that 4 CPUs vs 1 CPU is
worth around 120 Elo. So, I've been using an estimate of 70 for
two CPUs.

I predict you will achieve 120 to 150 points for 4 CPUs vs 1 and a sublinear
gain for 8 vs 1. It is sublinear for 4 vs 1 compared to 2 vs 1.
The reason is quite obvious and mathematically provable.

Interestingly, some of the engines in the SSDF list obtained
as little as an 80 pt gain for 4 CPUs.
Here's the question. The speedup is linear from 1-2-4-8-16. The average is close to this:

speedup = 1 + (NCPUS - 1) * 0.7

the question is does the next ply bring about as much as the last ply. So far, it seems to be "yes..."

Yes, your equation is the standard for predicting time to ply
improvement. My mistake - it is obviously linear. It is of the form
y = mx+b.

You are right, the sublinear issue is directly related to the question
"Is the Elo gain for each and every ply gained a linear graph?". In
other words "Does every ply gained produce a consistent Elo gain?".

It is my belief (as I stated several years ago in CCC)
that there exists a ply threshold after which the role of the
position/static eval plays a greater role in Elo performance.

A counter theory to this would be that all programs will perform
the same at a sufficiently deep fixed depth. Of course this theory
is true, if all programs could see to the end of the game. Thus,
there also must exist a depth threshold after which much of the
eval is meaningless.

hgm · Post by **hgm** » Mon Jan 12, 2009 9:47 am

This is equivalent to having a time-odds match. I always use the rule of thumb

EloGain = 100 * ln(searchTime)

As ln(2) ~ 0.7, this means a 70-Elo gain for doubling the speed. The exact number probably depends on the quality of the program; programs with a good effective branching factor are expected to gain more than programs with a poor branching factor (if they lie equally about their depth).

So 75 Elo matches very well with the theoretical expectation. The same parameter can be measured on non-SMP engines by using straightforward time odds.
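
The rule of thumb is easy to evaluate for a few time-odds factors. This is a sketch only: the function name is invented here, the argument is read as the ratio of search times between the two sides, and the scale of 100 is the rule-of-thumb constant from this post, not a measured value.

```python
import math


def elo_gain_from_time_factor(factor, scale=100.0):
    """Rule of thumb: EloGain = 100 * ln(search-time factor)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return scale * math.log(factor)


for factor in (1, 2, 4, 8):
    gain = elo_gain_from_time_factor(factor)
    print(f"{factor}x search time -> {gain:.1f} Elo")
```

Because the rule is logarithmic, every doubling adds the same amount, a little under 70 Elo.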

Don · Post by **Don** » Mon Jan 12, 2009 6:56 pm

hgm wrote:This is equivalent to having a time-odds match. I always use the rule of thumb

EloGain = 100 * ln(searchTime)

As ln(2) ~ 0.7, this means a 70-Elo gain for doubling the speed. The exact number probably depends on the quality of the program; programs with a good effective branching factor are expected to gain more than programs with a poor branching factor (if they lie equally about their depth).

So 75 Elo matches very well with the theoretical expectation. The same parameter can be measured on non-SMP engines by using straightforward time odds.

This formula probably doesn't apply well at depths below 7 or 8 ply or something like that. But that is not what we are talking about here.

Do you think it's relatively reliable for a pretty good range of practical time controls, for instance speed or even bullet chess to 40/2?

bob · Post by **bob** » Mon Jan 12, 2009 8:36 pm

hgm wrote:This is equivalent to having a time-odds match. I always use the rule of thumb

EloGain = 100 * ln(searchTime)

As ln(2) ~ 0.7, this means a 70-Elo gain for doubling the speed. The exact number probably depends on the quality of the program; programs with a good effective branching factor are expected to gain more than programs with a poor branching factor (if they lie equally about their depth).

So 75 Elo matches very well with the theoretical expectation. The same parameter can be measured on non-SMP engines by using straightforward time odds.

The only fly in the ointment is the parallel search efficiency issue. Not all parallel searches are created equal, and if the 2-cpu speedup is 1.3x, then this is not going to hold. Or if the 2cpu speedup is 2.0x, then it will be even better than what I reported...
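
To see how much the parallel efficiency matters, the 100 * ln(x) rule of thumb quoted above can be applied to an effective speedup instead of raw time odds. This is a sketch under stated assumptions: treating a parallel speedup as equivalent to a time factor is an assumption made for illustration, the function name is invented here, and the 1.7x case is taken from the 0.7-per-extra-cpu average mentioned earlier in the thread.

```python
import math


def elo_for_speedup(speedup, scale=100.0):
    """Apply the 100 * ln(x) rule of thumb to an effective speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return scale * math.log(speedup)


# Hypothetical 2-cpu speedups: a poor parallel search, the ~0.7
# efficiency average from earlier in the thread, and a perfect one.
for speedup in (1.3, 1.7, 2.0):
    print(f"{speedup:.1f}x -> {elo_for_speedup(speedup):.1f} Elo")
```

Under these assumptions the predicted 2-cpu gain ranges from roughly 26 Elo at 1.3x to roughly 69 Elo at 2.0x, so the quality of the parallel search moves the expected gain considerably.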
