superlinear speedups

jswaff · Post by **jswaff** » Sat Nov 01, 2008 4:28 pm

I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?

Actually, I would even settle for a good informal argument. Anybody?

--
James

Carey · Post by **Carey** » Sat Nov 01, 2008 4:44 pm

jswaff wrote:I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?

Actually, I would even settle for a good informal argument. Anybody?

--
James

First, you can't prove that something like that is impossible. All you can prove is that given certain constraints and certain knowledge and certain techniques, it's not possible within those bounds. Even that would probably be pretty difficult.

Second, Hyatt and others have long encountered the occasional position that got superlinear speed ups.

True, those positions are rare. Just random positions that happen to work well for that particular search method etc.

But the fact that even one occured would suggest that you can not prove that superlinear speedups are impossible.

You might be able to prove they are rare.... But since they exist it would be difficult to prove they are impossible.

Of course, I'm not an expert in mathematical proofs or parallel search, so others may have a different opinion.

jswaff · Post by **jswaff** » Sat Nov 01, 2008 4:55 pm

Oh sure, it's possible in some positions, due to imperfect move ordering. So let me clarify: I'm talking about the general case, with well ordered trees. I'd like to hear even an informal argument as to why you won't see superlinear speedups (in general) with good move ordering.

Edit: Back to the proof: I think it should be possible to prove using perfectly ordered trees (in which CUT nodes examine just one child).

--
James

Carey wrote:
jswaff wrote:I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?

Actually, I would even settle for a good informal argument. Anybody?

--
James
First, you can't prove that something like that is impossible. All you can prove is that given certain constraints and certain knowledge and certain techniques, it's not possible within those bounds. Even that would probably be pretty difficult.

Second, Hyatt and others have long encountered the occasional position that got superlinear speed ups.

True, those positions are rare. Just random positions that happen to work well for that particular search method etc.

But the fact that even one occured would suggest that you can not prove that superlinear speedups are impossible.

You might be able to prove they are rare.... But since they exist it would be difficult to prove they are impossible.

Of course, I'm not an expert in mathematical proofs or parallel search, so others may have a different opinion.

hgm · Post by **hgm** » Sat Nov 01, 2008 4:57 pm

I guess it deends on your definition of speedup. If you define it as an average over a certain classofpositions, the fact that an individual position would violate the thing you wanted to prove would not hurt you.

Wouldn't a proof not simply go like this:

I can simulate with one CPU what N CPUs do using time-division multiplexing, with almost no overhead. So this slows me down by a factor N.

Now suppose I would find a parallel search algorithm that would scale super-linearly. Then by making N large enough I could make 1/N times the performance (i.e. the performance my single multi-tasking CPU would have using that algorithm) larger than 1. So this algorithm would make 1 CPU could do more than the best 1 CPU could do, which is impossible. Thus the algorithm does not exist.

QED

Of course the proof would not work if the 1-CPU algorithm was lousy, rather than optimal. But of course then the statement would also not be true. E.g. if you use minimax, rather than alpha-beta, and now parallellize it in such a way that one CPU effectively performs beta cutoffs on other CPUs (but not itself) by recalling them, it would be very easy to get super-linear speedup.

michiguel · Post by **michiguel** » Sat Nov 01, 2008 5:35 pm

jswaff wrote:I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?

Actually, I would even settle for a good informal argument. Anybody?

--
James

There was a huge thread some years ago about this that lead to nowhere.

Miguel

michiguel · Post by **michiguel** » Sat Nov 01, 2008 5:47 pm

hgm wrote:I guess it deends on your definition of speedup. If you define it as an average over a certain classofpositions, the fact that an individual position would violate the thing you wanted to prove would not hurt you.

Wouldn't a proof not simply go like this:

I can simulate with one CPU what N CPUs do using time-division multiplexing, with almost no overhead. So this slows me down by a factor N.

Now suppose I would find a parallel search algorithm that would scale super-linearly. Then by making N large enough I could make 1/N times the performance (i.e. the performance my single multi-tasking CPU would have using that algorithm) larger than 1. So this algorithm would make 1 CPU could do more than the best 1 CPU could do, which is impossible. Thus the algorithm does not exist.

First it has to be defined what "CPU" means in this context. For instance, if you double the number of processing units with the cache memory that comes with it, and the algorithm takes advantage of more fast memory, you can easily envision a possible superlinear speedup. Some memory sorting techniques could fall in this category (speed dependent on fast temp. storage too).

Miguel

QED

Of course the proof would not work if the 1-CPU algorithm was lousy, rather than optimal. But of course then the statement would also not be true. E.g. if you use minimax, rather than alpha-beta, and now parallellize it in such a way that one CPU effectively performs beta cutoffs on other CPUs (but not itself) by recalling them, it would be very easy to get super-linear speedup.

Carey · Post by **Carey** » Sat Nov 01, 2008 6:00 pm

jswaff wrote:Oh sure, it's possible in some positions, due to imperfect move ordering.

All move ordering in chess is imperfect.

If it was perfect, you would be able to instantly pick the best move at each node. If you can do that in the search, then you could do it in the root and you wouldn't need to search...

So let me clarify: I'm talking about the general case, with well ordered trees. I'd like to hear even an informal argument as to why you won't see superlinear speedups (in general) with good move ordering.

Edit: Back to the proof: I think it should be possible to prove using perfectly ordered trees (in which CUT nodes examine just one child).

I seriously doubt it's possible to prove it.

It would depend on so many factors, some of which will be essentially random (which processor accesses what on specific clock cycles, etc. etc.)

And it would probably even depend on the quality of your move ordering. The better it is, the less likely you'd get superlinear. (Because the better it is, the greater the chance that the first move will be best and only one processor will be able to search it, thereby preventing the other processors from doing significantly useful work.)

I'm not saying you can get superlinear speedups on a regular basis. I'm saying that I seriously doubt you can prove that it's impossible.

And if you could, you would probably need so many asumptions, restrictions, pre-conditions etc. that it would no longer apply to reach game searching.

Carey · Post by **Carey** » Sat Nov 01, 2008 6:06 pm

hgm wrote:I guess it deends on your definition of speedup. If you define it as an average over a certain classofpositions, the fact that an individual position would violate the thing you wanted to prove would not hurt you.

Wouldn't a proof not simply go like this:

I can simulate with one CPU what N CPUs do using time-division multiplexing, with almost no overhead. So this slows me down by a factor N.

You can't completely simulate the behavior of multiple cpu's with only one cpu.

There are random interactions involved with multiple cpu's. Their clock's will be slightly different, their accesses to memory will be slightly random, or their message passing timings will be slightly random, etc.

And there's drift in the clocks used by computers. They don't stay exactly the same. On a single phyisical system that's no big deal, but on multiple hardware, it's going to cause random behavior that you can't predict in advance.

Now suppose I would find a parallel search algorithm that would scale super-linearly. Then by making N large enough I could make 1/N times the performance (i.e. the performance my single multi-tasking CPU would have using that algorithm) larger than 1. So this algorithm would make 1 CPU could do more than the best 1 CPU could do, which is impossible. Thus the algorithm does not exist.

QED

Of course the proof would not work if the 1-CPU algorithm was lousy, rather than optimal. But of course then the statement would also not be true. E.g. if you use minimax, rather than alpha-beta, and now parallellize it in such a way that one CPU effectively performs beta cutoffs on other CPUs (but not itself) by recalling them, it would be very easy to get super-linear speedup.

That would be a valid algorithm, though.

As I've said, to try and prove this, you'll have to start adding all sorts of restrictions, asumptions, limitations, etc.

It's unlikely you'll be able to prove this in both a general mathematical way and a real-world way.

hgm · Post by **hgm** » Sat Nov 01, 2008 6:51 pm

In principle you can simate all that randomness as well. It doesn't matter how large a hit you take in making that detailed a simulation. With super-linear speedup you will earn back any factor, for large-enough N.

Obviously you cannot prove what is not true. But I think the proof I gave above does prove this:

"If you have super-linear speedup, your algorithm sucks for any number of processors."

Carey · Post by **Carey** » Sat Nov 01, 2008 7:02 pm

hgm wrote:In principle you can simate all that randomness as well. It doesn't matter how large a hit you take in making that detailed a simulation. With super-linear speedup you will earn back any factor, for large-enough N.

Only if you know about it ahead of time.

Once you start adding all sorts of extra conditions and situations, the proof has to go up greatly in complexity.

And then if you change the hardware that behaves differently, then you have to change your simulation & proof, etc.

Obviously you cannot prove what is not true. But I think the proof I gave above does prove this:

"If you have super-linear speedup, your algorithm sucks for any number of processors."

But it's entirely possible for your algorithm to suck.

That it's a poor algorithm or a poor (or odd) conversion going from single to parallel says nothing about the validity of the results.

I'm not saying superlinear is possible or practical.

Just that proving it would be very difficult and would likely require a significant number of caveats and could end up being very specfic rather than general.

superlinear speedups

superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups