SMP speed up

michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

SMP speed up

Post by michiguel »

This is a spin-off of another thread.

I never gave it a lot of thought, but the well-known formula for Crafty

speedup = 1 + (NCPUS - 1) * 0.7

may indicate that the inefficiency is not related to Amdahl's law, even if this applies only to a low number of CPUs. What is the cause of the parallel inefficiency? The shape of the tree? Still, it looks like it should either saturate quicker, or the speedup with 2 cores should be higher than 1.7.
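For comparison, here is a minimal Python sketch that puts the linear formula next to Amdahl's law. The function names, and the choice to calibrate Amdahl's serial fraction so that both curves agree at 2 CPUs, are assumptions made purely for illustration; nothing here comes from Crafty's code or from measured data.

```python
def linear_speedup(ncpus, slope=0.7):
    """The rule of thumb quoted above: 1 + (NCPUS - 1) * 0.7."""
    return 1 + (ncpus - 1) * slope


def amdahl_speedup(ncpus, serial_fraction):
    """Amdahl's law for a program with the given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / ncpus)


def serial_fraction_for(target_speedup, ncpus):
    """Serial fraction that makes Amdahl's law hit target_speedup at ncpus.

    Solves 1 / (s + (1 - s) / n) = target for s.
    """
    return (ncpus / target_speedup - 1.0) / (ncpus - 1.0)


# Assumption: calibrate Amdahl so it also predicts 1.7 at 2 CPUs.
s = serial_fraction_for(1.7, 2)

for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"{n:3d} CPUs  linear {linear_speedup(n):5.1f}"
          f"  Amdahl {amdahl_speedup(n, s):5.2f}")
```

With that calibration the Amdahl curve can never exceed 1/s (about 5.7 for s = 3/17), while the linear formula gives 45.1 at 64 CPUs. That gap is the contrast the question above is pointing at.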

Was this investigated?

Miguel
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: SMP speed up

Post by rbarreira »

I think it pays off to simplify the formula for understanding:

1 + (NCPUS - 1) * 0.7 =

0.7 * NCPUS + 0.3

What strikes me as surprising about it is that it converges very quickly to 70% efficiency, which it will never go under, of course. It implies that going from 1 CPU to 2 CPUs results in much more added overhead than going from 2 to 4 CPUs, and so on.
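To make the convergence concrete, here is a small Python sketch that computes the parallel efficiency (speedup divided by CPU count) implied by the formula. The function names are illustrative, and the numbers come from the formula itself, not from any measurement.

```python
def speedup(ncpus, slope=0.7):
    """Linear rule of thumb: 1 + (NCPUS - 1) * slope."""
    return 1 + (ncpus - 1) * slope


def efficiency(ncpus, slope=0.7):
    """Speedup per CPU; algebraically equal to slope + (1 - slope) / ncpus."""
    return speedup(ncpus, slope) / ncpus


for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"{n:3d} CPUs: speedup {speedup(n):5.1f}, "
          f"efficiency {efficiency(n):.3f}")
```

Efficiency is 0.7 + 0.3 / NCPUS, so it drops from 1.0 to 0.85 at 2 CPUs, to 0.775 at 4, and then flattens out just above 0.70. Each added CPU contributes the same 0.7 to the speedup, so the biggest loss in efficiency falls on the first added CPU.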

I think it's a strange formula, but if it has been correctly measured that way, what can I say... (though Robert did say he only measured up to 64 cores)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SMP speed up

Post by bob »

michiguel wrote:This is a spin-off of another thread.

I never gave it a lot of thought, but the well-known formula for Crafty

speedup = 1 + (NCPUS - 1) * 0.7

may indicate that the inefficiency is not related to Amdahl's law, even if this applies only to a low number of CPUs. What is the cause of the parallel inefficiency? The shape of the tree? Still, it looks like it should either saturate quicker, or the speedup with 2 cores should be higher than 1.7.

Was this investigated?

Miguel
I beat it to death for 1-8 cores. About all that can be said is that after the first CPU, every processor added makes the tree grow. If you run the old CB positions, which are all the positions from a single real game that everyone has seen, it is pretty consistent. There is lots of variation until you average over all the moves, and then that 30% extra nodes per CPU starts to settle down. You could drive this down by improving move ordering to get a higher percentage of fail highs on the first move. But I have been stuck at 90-92% for years, which pretty well fixes the overhead since one of every 10 splits (when I split right after the first move) is going to be at a bad point that adds overhead...
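One possible way to read "every processor added makes the tree grow" in numbers is sketched below in Python. It assumes that each CPU after the first contributes 70% useful work and spends the other 30% on nodes a serial search would not have visited, and that raw nodes per second scale perfectly with the CPU count. Both are simplifying assumptions for illustration, and the function names are hypothetical; none of this is taken from Crafty.

```python
def useful_cpu_equivalents(ncpus, useful=0.7):
    """Useful work, in units of one CPU: the speedup from the formula."""
    return 1 + (ncpus - 1) * useful


def wasted_cpu_equivalents(ncpus, useful=0.7):
    """Work spent on extra nodes, in units of one CPU (assumed model)."""
    return (ncpus - 1) * (1 - useful)


def tree_growth(ncpus, useful=0.7):
    """Parallel tree size relative to the serial tree at the same depth.

    Assumes perfect NPS scaling, so all lost speedup is extra nodes.
    """
    return ncpus / useful_cpu_equivalents(ncpus, useful)


for n in (1, 2, 4, 8):
    print(f"{n} CPUs: useful {useful_cpu_equivalents(n):.1f}, "
          f"wasted {wasted_cpu_equivalents(n):.1f}, "
          f"tree x{tree_growth(n):.3f}")
```

Under these assumptions the total tree is about 18% larger with 2 CPUs and approaches, but never reaches, 1 / 0.7 (roughly 43% larger) as CPUs are added.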

There is likely a mathematical model that takes the fail-high percentage (fh %) as computed in Crafty and predicts the speedup, but I have never tried to quantify that at all since it would not be of any benefit. We all know that YBW (Young Brothers Wait) depends on the first move being the one to cause a cutoff, else we think it is an ALL node.

In my dissertation I tackled this by searching perfectly ordered trees and got perfect linearity in the speedup. But I faked the eval to produce a monotonically decreasing value to simulate perfect ordering. I also tackled worst-first, which is pure minimax, and also got perfect speedups. But it was slow with no cutoffs at all. It is the very good trees that cause the problem. :)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SMP speed up

Post by bob »

rbarreira wrote:I think it pays off to simplify the formula for understanding:

1 + (NCPUS - 1) * 0.7 =

0.7 * NCPUS + 0.3

What strikes me as surprising about it is that it converges very quickly to 70% efficiency, which it will never go under, of course. It implies that going from 1 CPU to 2 CPUs results in much more added overhead than going from 2 to 4 CPUs, and so on.
How do you figure that?

There are lots of things at play. With 2 CPUs, only 1 is doing unnecessary work at a split point that was poorly chosen. With 4, that goes up to 3. So although it might look like the overhead is going down, it really is not.

rbarreira wrote:I think it's a strange formula, but if it has been correctly measured that way, what can I say... (though Robert did say he only measured up to 64 cores)
And remember, with 64 cores, you have 7 data points that don't lie on a perfectly straight line. I simply chose a good approximation. Originally that formula worked for 1-2-4. Then we added 8 and it still fit well. And then 16 and 32. Eugene ran it on a 64-core Itanium, which was possibly a tainted result since it was a different architecture, but the 1-2-4-8-16-32-64 numbers still stayed around that line. It is not a perfect fit. But it is a good first approximation, which is all I have ever called it...
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: SMP speed up

Post by Milos »

Just a reply from the previous thread

"More on Bob's formula...

speedup_2=1+0.7=1.7
speedup_32=1+31*0.7=22.7
speedup_64=1+63*0.7=45.1

speedup_64/speedup_32=1.99!!!
speedup_2=1.7

17% more gain when going from 32 to 64 than from 1 to 2 cores."

I'll let people draw their own conclusions ;).
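The arithmetic above can be reproduced with a short Python sketch. The helper names are illustrative; the only input is the formula itself, so this says nothing about whether measured speedups behave the same way.

```python
def speedup(ncpus, slope=0.7):
    """Linear rule of thumb: 1 + (NCPUS - 1) * slope."""
    return 1 + (ncpus - 1) * slope


def doubling_ratio(ncpus, slope=0.7):
    """Gain predicted by the formula when going from ncpus to 2 * ncpus."""
    return speedup(2 * ncpus, slope) / speedup(ncpus, slope)


for n in (1, 2, 4, 8, 16, 32):
    print(f"{n:2d} -> {2 * n:2d} CPUs: x{doubling_ratio(n):.2f}")

# Relative difference between the 32 -> 64 and the 1 -> 2 doubling.
print(f"ratio of ratios: {doubling_ratio(32) / doubling_ratio(1):.2f}")
```

With a slope of 0.7 the doubling ratio is (1.4 * N + 0.3) / (0.7 * N + 0.3), which starts at 1.7 and climbs toward 2 as N grows. That behaviour follows from any straight line with a positive intercept.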
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: SMP speed up

Post by Milos »

bob wrote:And remember, with 64 cores, you have 7 data points that don't lie on a perfectly straight line. I simply chose a good approximation. Originally that formula worked for 1-2-4. Then we added 8 and it still fit well. And then 16 and 32. Eugene ran it on a 64-core Itanium, which was possibly a tainted result since it was a different architecture, but the 1-2-4-8-16-32-64 numbers still stayed around that line. It is not a perfect fit. But it is a good first approximation, which is all I have ever called it...
I'll repeat. I hope you didn't publish a paper with this kind of result, because that would be just a farce...
mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: SMP speed up

Post by mhull »

Milos wrote:Just a reply from the previous thread

"More on Bob's formula...

speedup_2=1+0.7=1.7
speedup_32=1+31*0.7=22.7
speedup_64=1+63*0.7=45.1

speedup_64/speedup_32=1.99!!!
speedup_2=1.7

17% more gain when going from 32 to 64 than from 1 to 2 cores."

I'll let people draw their own conclusions ;).
And your tests confirm Bob is wrong?
Matthew Hull
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SMP speed up

Post by bob »

Milos wrote:
bob wrote:And remember, with 64 cores, you have 7 data points that don't lie on a perfectly straight line. I simply chose a good approximation. Originally that formula worked for 1-2-4. Then we added 8 and it still fit well. And then 16 and 32. Eugene ran it on a 64-core Itanium, which was possibly a tainted result since it was a different architecture, but the 1-2-4-8-16-32-64 numbers still stayed around that line. It is not a perfect fit. But it is a good first approximation, which is all I have ever called it...
I'll repeat. I hope you didn't publish a paper with this kind of result, because that would be just a farce...
I wish you would offer some sort of supporting evidence for your arguments, otherwise you just look foolish.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SMP speed up

Post by bob »

mhull wrote:
Milos wrote:Just a reply from the previous thread

"More on Bob's formula...

speedup_2=1+0.7=1.7
speedup_32=1+31*0.7=22.7
speedup_64=1+63*0.7=45.1

speedup_64/speedup_32=1.99!!!
speedup_2=1.7

17% more gain when going from 32 to 64 than from 1 to 2 cores."

I'll let people draw their own conclusions ;).
And your tests confirm Bob is wrong?
No, his tests confirm he is either dense, an ass, or a troll. Nothing more or less. He is not offering _any_ data or observations of any kind, just boorish nonsense...
mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: SMP speed up

Post by mhull »

bob wrote:
mhull wrote:
Milos wrote:Just a reply from the previous thread

"More on Bob's formula...

speedup_2=1+0.7=1.7
speedup_32=1+31*0.7=22.7
speedup_64=1+63*0.7=45.1

speedup_64/speedup_32=1.99!!!
speedup_2=1.7

17% more gain when going from 32 to 64 than from 1 to 2 cores."

I'll let people draw their own conclusions ;).
And your tests confirm Bob is wrong?
No, his tests confirm he is either dense, an ass, or a troll. Nothing more or less. He is not offering _any_ data or observations of any kind, just boorish nonsense...
It was a dig, since he ran no tests. :)
Matthew Hull