superlinear speedups

CRoberson · Post by **CRoberson** » Tue Nov 04, 2008 3:23 am

I looked into this in the early 90's for nonchess tree searches.
To achieve superlinear speedups, one has to have a poorly ordered
tree.

Here is a simple example of how it works.
Given an unordered tree to be searched by a branch and bound algorithm,
one can easily achieve superlinear speed up under this
scenario: lets say the optimal search path is the first path in the
second half of the tree. Lets say your split algorithm splits the nodes
at the root of the tree only. The first thread takes the first half of the
tree and the second thread takes the last half. Also, the bound value
is shared between the two threads. Also, lets say that the optimal path
produces serious pruning once found.

So, in the serial form you search quite a bit through the first half of the
tree and get some prunning until you pick up the first path in the second
half of the tree (this is the optimal path), now you get serious prunning.
So, more than half the time of the serial algorithm is spent in the first
half of the tree.

In the parallel version described, the optimal path is found as the first
path by the second thread. The bound is shared then serious pruning
happens for both halves of the tree. Lets say this extra pruning in the
first half of the tree gives a 7x speed up.

So, the second half of the tree continues at the same speed as before,
but the first half gets a 7x speed up. This is superlinear.

I explained this to some of the engineers I knew at Intel and some of
them didn't get it. They kept saying "but you have only two processors.
How can you get a 5x speed up. You are limited by the hardware."

diep · Post by **diep** » Mon Nov 10, 2008 10:40 pm

jswaff wrote:I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?

Actually, I would even settle for a good informal argument. Anybody?

--
James

Hi James,

Big definition problem first to solve: "what is a superlineair speedup".

Historically in computerchess that's getting n speedup out of n nodes,
however in many worlds if you get speedup of

1.95+ out of 2
3.86 out of 4
7.0 out of 8

(to quote diep's speedups)

I was told the above is called in a lot of literature also a superlineair speedup.

So we first must settle upon what a superlineair speedup is, before going into a formal proof.

Vincent

p.s. i didn't write the above to save Bob (if you have read his thesis with claims about getting superlineair speedups)

Zach Wegner · Post by **Zach Wegner** » Mon Nov 10, 2008 11:27 pm

diep wrote:Hi James,

Big definition problem first to solve: "what is a superlineair speedup".

Historically in computerchess that's getting n speedup out of n nodes,
however in many worlds if you get speedup of

1.95+ out of 2
3.86 out of 4
7.0 out of 8

(to quote diep's speedups)

I was told the above is called in a lot of literature also a superlineair speedup.

So we first must settle upon what a superlineair speedup is, before going into a formal proof.

Vincent

p.s. i didn't write the above to save Bob (if you have read his thesis with claims about getting superlineair speedups)

I'm pretty sure he's talking about things like getting a speedup of 5 out of 4 processors, etc. It's funny, I thought Diep had 2.0+ out of 2 processors

Anyways, congratulations on being able to measure speedup to 3 significant figures.

bob · Post by **bob** » Mon Nov 10, 2008 11:33 pm

diep wrote:
jswaff wrote:I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?

Actually, I would even settle for a good informal argument. Anybody?

--
James
Hi James,

Big definition problem first to solve: "what is a superlineair speedup".

Historically in computerchess that's getting n speedup out of n nodes,
however in many worlds if you get speedup of

1.95+ out of 2
3.86 out of 4
7.0 out of 8

(to quote diep's speedups)

I was told the above is called in a lot of literature also a superlineair speedup.

So we first must settle upon what a superlineair speedup is, before going into a formal proof.

Vincent

p.s. i didn't write the above to save Bob (if you have read his thesis with claims about getting superlineair speedups)

Why is there any confusion? A "linear speedup" is considered to be a case where you use N processors to run N times faster. A "super-linear speedup" is a case where you use N processors to run more than N times faster... Something that can be solved by a modification to the original sequential algorithm.

I do not claim to produce a "super-linear speedup" other than on an occasional position here and there. Overall my speedup has _always_ been "sub-linear" and always will be. For every >N speedup you get, you will get at least one that more than offsets that in the other direction. And it happens with some regularity as anyone can verify. In fact, you were the one that used to claim you _always_ got a super-linear speedup with your program until everyone chased you out of here with such nonsense...

jswaff · Post by **jswaff** » Tue Nov 11, 2008 12:23 am

Hi Vincent,

Nothing tricky... just the standard definition: >N speedup using N processors. The numbers you cited would not be superlinear by that definition... just close to linear. If those numbers hold "in general" I'm very, very impressed.

diep wrote:
jswaff wrote:I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?

Actually, I would even settle for a good informal argument. Anybody?

--
James
Hi James,

Big definition problem first to solve: "what is a superlineair speedup".

Historically in computerchess that's getting n speedup out of n nodes,
however in many worlds if you get speedup of

1.95+ out of 2
3.86 out of 4
7.0 out of 8

(to quote diep's speedups)

I was told the above is called in a lot of literature also a superlineair speedup.

So we first must settle upon what a superlineair speedup is, before going into a formal proof.

Vincent

p.s. i didn't write the above to save Bob (if you have read his thesis with claims about getting superlineair speedups)

bob · Post by **bob** » Tue Nov 11, 2008 2:34 am

michiguel wrote:
hgm wrote:I guess it deends on your definition of speedup. If you define it as an average over a certain classofpositions, the fact that an individual position would violate the thing you wanted to prove would not hurt you.

Wouldn't a proof not simply go like this:

I can simulate with one CPU what N CPUs do using time-division multiplexing, with almost no overhead. So this slows me down by a factor N.

Now suppose I would find a parallel search algorithm that would scale super-linearly. Then by making N large enough I could make 1/N times the performance (i.e. the performance my single multi-tasking CPU would have using that algorithm) larger than 1. So this algorithm would make 1 CPU could do more than the best 1 CPU could do, which is impossible. Thus the algorithm does not exist.
First it has to be defined what "CPU" means in this context. For instance, if you double the number of processing units with the cache memory that comes with it, and the algorithm takes advantage of more fast memory, you can easily envision a possible superlinear speedup. Some memory sorting techniques could fall in this category (speed dependent on fast temp. storage too).

Miguel

QED

Of course the proof would not work if the 1-CPU algorithm was lousy, rather than optimal. But of course then the statement would also not be true. E.g. if you use minimax, rather than alpha-beta, and now parallellize it in such a way that one CPU effectively performs beta cutoffs on other CPUs (but not itself) by recalling them, it would be very easy to get super-linear speedup.

I think the definition of CPU is clear. It is analogous to "core" in Intel's mangled terminology. In this context everything else remains constant. Same cache configuration (whether it is shared among multiple cores or not doesn't really matter since whatever is being done is constant. You don't mix and match different processors however... or compare speedups across different types of processors.

bob · Post by **bob** » Tue Nov 11, 2008 2:36 am

Dirt wrote:
michiguel wrote:First it has to be defined what "CPU" means in this context. For instance, if you double the number of processing units with the cache memory that comes with it, and the algorithm takes advantage of more fast memory, you can easily envision a possible superlinear speedup. Some memory sorting techniques could fall in this category (speed dependent on fast temp. storage too).

Miguel
You're suggesting that the performance could be superlinear with regard to memory? Maybe in some very odd cases, but I can't "easily envsion" that in general.

Perhaps the classic case was some old papers on hyper-cube chess programs. If you double the number of nodes, you double the size of main memory, which lets you double your hash table size. But that's artificial and in the SMP world, we don't compare speedups where we double the number of processors _and_ double the size of memory as well...

superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups