## SMP speed up

### SMP speed up

This is a spin off of another thread.

I never gave it a lot of thought, but the well-known formula for Crafty

speedup = 1 + (NCPUS - 1) * 0.7

may indicate that the inefficiency is not related to Amdahl's law, even if this applies to a low number of CPUs. What is the cause of the parallel inefficiency? The shape of the tree? Still, it looks like it should either saturate more quickly, or the speedup with 2 cores should be higher than 1.7.

Was this investigated?

Miguel
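To make the comparison with Amdahl's law concrete, here is a minimal Python sketch. It is an illustration only, not anything measured: the function names are made up for this example, and the Amdahl curve is calibrated under the assumption that it must match the formula's 1.7 at 2 CPUs.

```python
def crafty_speedup(ncpus, slope=0.7):
    """Linear rule of thumb quoted in the thread: 1 + (NCPUS - 1) * 0.7."""
    return 1 + (ncpus - 1) * slope


def amdahl_speedup(ncpus, parallel_fraction):
    """Amdahl's law: a fixed serial fraction (1 - p) caps the speedup."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / ncpus)


def amdahl_fraction_for(target_speedup, ncpus):
    """Parallel fraction p for which Amdahl's law gives target_speedup on ncpus."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / ncpus)


# Assumption for illustration: calibrate Amdahl so both curves agree at 2 CPUs.
P = amdahl_fraction_for(1.7, 2)

if __name__ == "__main__":
    print(f"parallel fraction matching 1.7x on 2 CPUs: {P:.3f}")
    for n in (1, 2, 4, 8, 16, 32, 64):
        print(f"{n:3d} CPUs  linear {crafty_speedup(n):5.1f}  "
              f"Amdahl {amdahl_speedup(n, P):5.2f}")
```

Under that calibration the Amdahl curve can never exceed 1 / (1 - P), roughly 5.7, while the linear formula keeps growing. That is the mismatch the question is about.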

### Re: SMP speed up

I think it pays off to simplify the formula for understanding:

1 + (NCPUS - 1) * 0.7 =

0.7 * NCPUS + 0.3

What strikes me as surprising about it is that it converges very quickly to 70% efficiency, which it will never go below, of course. It implies that going from 1 CPU to 2 CPUs results in much more added overhead than going from 2 to 4 CPUs, and so on.

I think it's a strange formula, but if it has been correctly measured that way, what can I say... (though Robert did say he only measured up to 64 cores)
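The convergence to 70% can be spelled out in one line. The symbols S(N) for speedup and E(N) for efficiency are notation introduced here for illustration; they are not used elsewhere in the thread.

```latex
S(N) = 1 + 0.7\,(N - 1) = 0.7\,N + 0.3
\qquad\Longrightarrow\qquad
E(N) = \frac{S(N)}{N} = 0.7 + \frac{0.3}{N}
```

So E(1) = 1, E(2) = 0.85, E(4) = 0.775, and E(64) is about 0.705: the efficiency loses 15 points going from 1 to 2 CPUs, but only 7.5 more going from 2 to 4.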

### Re: SMP speed up

michiguel wrote:

> This is a spin off of another thread.
>
> I never gave it a lot of thought, but the well-known formula for Crafty
>
> speedup = 1 + (NCPUS - 1) * 0.7
>
> may indicate that the inefficiency is not related to Amdahl's law, even if this applies to a low number of CPUs. What is the cause of the parallel inefficiency? The shape of the tree? Still, it looks like it should either saturate more quickly, or the speedup with 2 cores should be higher than 1.7.
>
> Was this investigated?
>
> Miguel

I beat it to death for 1-8 cores. About all that can be said is that after the first CPU, every processor added makes the tree grow. If you run the old CB positions, which are all the positions from a single real game that everyone has seen, it is pretty consistent. There is lots of variation until you average over all the moves, and then that 30% extra nodes per CPU starts to settle down. You could drive this down by improving move ordering to get a higher percentage of fail highs on the first move. But I have been stuck at 90-92% for years, which pretty well fixes the overhead since one of every 10 splits (when I split right after the first move) is going to be at a bad point that adds overhead...

There is likely a mathematical model that considers fh % as computed in Crafty, and predicts the speedup, but I have never tried to quantify that at all since it would not be of any benefit. We all know that YBW depends on the first move being the one to cause a cutoff, else we think it is an ALL node.

In my dissertation I tackled this by searching perfectly ordered trees and got perfect linearity in the speedup. But I faked the eval to produce a monotonically decreasing value to simulate perfect ordering. I also tackled worst-first, which is pure minimax, and also got perfect speedups. But it was slow with no cutoffs at all. It is the very good trees that cause the problem.

### Re: SMP speed up

rbarreira wrote:

> I think it pays off to simplify the formula for understanding:
>
> 1 + (NCPUS - 1) * 0.7 =
>
> 0.7 * NCPUS + 0.3
>
> What strikes me as surprising about it is that it converges very quickly to 70% efficiency, which it will never go below, of course. It implies that going from 1 CPU to 2 CPUs results in much more added overhead than going from 2 to 4 CPUs, and so on.

How do you figure that?

There are lots of things at play. With 2 CPUs, only 1 is doing unnecessary work at a split point that was poorly chosen. With 4, that goes up to 3. So although it might look like the overhead is going down, it really is not.

And remember, with 64 cores, you have 7 data points that don't lie on a perfectly straight line. I simply chose a good approximation. Originally that formula worked for 1-2-4. Then we added 8 and it still fit well. And then 16 and 32. Eugene ran it on a 64 core Itanium, which was possibly a tainted result since it was a different architecture, but the 1-2-4-8-16-32-64 numbers still stayed around that line. It is not a perfect fit. But it is a good 1st approximation, which is all I have ever called it...

rbarreira wrote:

> I think it's a strange formula, but if it has been correctly measured that way, what can I say... (though Robert did say he only measured up to 64 cores)

### Re: SMP speed up

Just a reply from the previous thread

"More on Bob's formula...

speedup_2=1+0.7=1.7

speedup_32=1+31*0.7=22.7

speedup_64=1+63*0.7=45.1

speedup_64/speedup_32=1.99!!!

speedup_2=1.7

17% more gain when going from 32 to 64 than from 1 to 2 cores."

I'll let people draw their own conclusions.
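The arithmetic in the quoted note can be checked with a short Python sketch. The helper names `crafty_speedup` and `doubling_gain` are hypothetical, introduced only for this check; the sketch evaluates the formula and says nothing about measured data.

```python
def crafty_speedup(ncpus, slope=0.7):
    """Linear rule of thumb from the thread: 1 + (NCPUS - 1) * 0.7."""
    return 1 + (ncpus - 1) * slope


def doubling_gain(ncpus):
    """Factor by which the predicted speedup grows when ncpus is doubled."""
    return crafty_speedup(2 * ncpus) / crafty_speedup(ncpus)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n:2d} -> {2 * n:2d} CPUs: gain {doubling_gain(n):.3f}")
    # Ratio behind the "17% more gain" remark: (32 -> 64) versus (1 -> 2).
    print(f"relative gain: {doubling_gain(32) / doubling_gain(1):.3f}")
```

Because the formula is linear with a positive intercept, the gain from doubling rises toward 2 as the CPU count grows, which is the property the note objects to.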

### Re: SMP speed up

bob wrote:

> And remember, with 64 cores, you have 7 data points that don't lie on a perfectly straight line. I simply chose a good approximation. Originally that formula worked for 1-2-4. Then we added 8 and it still fit well. And then 16 and 32. Eugene ran it on a 64 core Itanium, which was possibly a tainted result since it was a different architecture, but the 1-2-4-8-16-32-64 numbers still stayed around that line. It is not a perfect fit. But it is a good 1st approximation, which is all I have ever called it...

I'll repeat. I hope you didn't publish a paper with this kind of result, because that would be just a farce...

### Re: SMP speed up

Milos wrote:

> Just a reply from the previous thread
>
> "More on Bob's formula...
>
> speedup_2=1+0.7=1.7
>
> speedup_32=1+31*0.7=22.7
>
> speedup_64=1+63*0.7=45.1
>
> speedup_64/speedup_32=1.99!!!
>
> speedup_2=1.7
>
> 17% more gain when going from 32 to 64 than from 1 to 2 cores."
>
> I'll let people draw their own conclusions.

And your tests confirm Bob is wrong?

Matthew Hull

### Re: SMP speed up

Milos wrote:

> bob wrote:
>
> > And remember, with 64 cores, you have 7 data points that don't lie on a perfectly straight line. I simply chose a good approximation. Originally that formula worked for 1-2-4. Then we added 8 and it still fit well. And then 16 and 32. Eugene ran it on a 64 core Itanium, which was possibly a tainted result since it was a different architecture, but the 1-2-4-8-16-32-64 numbers still stayed around that line. It is not a perfect fit. But it is a good 1st approximation, which is all I have ever called it...
>
> I'll repeat. I hope you didn't publish a paper with this kind of result, because that would be just a farce...

I wish you would offer some sort of supporting evidence for your arguments; otherwise you just look foolish.

### Re: SMP speed up

mhull wrote:

> Milos wrote:
>
> > Just a reply from the previous thread
> >
> > "More on Bob's formula...
> >
> > speedup_2=1+0.7=1.7
> >
> > speedup_32=1+31*0.7=22.7
> >
> > speedup_64=1+63*0.7=45.1
> >
> > speedup_64/speedup_32=1.99!!!
> >
> > speedup_2=1.7
> >
> > 17% more gain when going from 32 to 64 than from 1 to 2 cores."
> >
> > I'll let people draw their own conclusions.
>
> And your tests confirm Bob is wrong?

No, his tests confirm he is either dense, an ass, or a troll. Nothing more or less. He is not offering _any_ data or observations of any kind, just boorish nonsense...

### Re: SMP speed up

bob wrote:

> mhull wrote:
>
> > Milos wrote:
> >
> > > Just a reply from the previous thread
> > >
> > > "More on Bob's formula...
> > >
> > > speedup_2=1+0.7=1.7
> > >
> > > speedup_32=1+31*0.7=22.7
> > >
> > > speedup_64=1+63*0.7=45.1
> > >
> > > speedup_64/speedup_32=1.99!!!
> > >
> > > speedup_2=1.7
> > >
> > > 17% more gain when going from 32 to 64 than from 1 to 2 cores."
> > >
> > > I'll let people draw their own conclusions.
> >
> > And your tests confirm Bob is wrong?
>
> No, his tests confirm he is either dense, an ass, or a troll. Nothing more or less. He is not offering _any_ data or observations of any kind, just boorish nonsense...

It was a dig, since he ran no tests.

Matthew Hull