MPI engine parent-child management

nimble_ninja · Post by **nimble_ninja** » Sat Nov 22, 2014 10:50 pm

Could someone explain the basic process of a how splitting and merging an alpha-beta search is done within MPI? I downloaded Cluster Toga as an example, but it would take me a while to read and understand all of that code.

As I understand it, the basic principle is that when splitting, the parent processor (searching the parent node) obtains a child processor (to search the child nodes). That child processor then performs the search (likely by using children processors of its own unless none are available) and returns a result. If, in the meantime, a cutoff has occurred and the result from that child processor can be discarded by the parent processor, the child processor is sent a message to discontinue searching and become available.

Presumably, this could be managed in the abstract by having a queue of available processor ids, and having parents remove processor ids from the queue to obtain a child processor and then having children processors add themselves back in when their search is complete or cancelled. This would be trivial for a shared-memory program, but what is the best way to do this with MPI? Must one process be devoted to managing this queue, or is there some better way to do it?

Edit: I should add that I'm assuming a YBW parallelization.

nimble_ninja · Post by **nimble_ninja** » Tue Nov 25, 2014 12:05 am

I believe I have a better understanding now, mostly based on Scorpio, a cluster engine, discussed here.

Essentially, in all cluster implementations I have found, idle processes randomly query other processes and ask for work (i.e. offer their help). If a process has available work to share, it sends a split message with a node in the tree to search, and when the process who offered the help is done searching, it sends a merge message, returning the result of its search.

The unfortunate thing is that at least one thread on a compute node has to handle probing for messages. It can do other work in between probes while it wants, but since MPI is not interrupt-driven, one can't simply devote a thread to message handling and still run n other threads on the n cpus per node (unless one put thread_sleep() calls in that thread, I suppose). Daniel Shawl mentioned that "some special edition" of MPI is interrupt-driven; does anyone have more information about that? My searches were fruitless. (I imagine this is hardware-dependent, but I would also imagine that modern NICs support such interrupt-driven operations, but I'm just guessing.)

There is another thing about this approach that seems like a downside to me. That is, the split points will essentially be random; processes do not choose when they split, only when they do not split (i.e. split at type 3 nodes (aka All-nodes) and not at type 1-2 nodes (aka PV- and cut-nodes)).

Is it not better to split higher up in the search, so that processes can be given the most amount of work possible? If it is in fact better to do that, does there not exist a means by which one could prioritize splitting at higher nodes in the search, i.e. a non-random load distribution?

bob · Post by **bob** » Tue Nov 25, 2014 1:11 am

nimble_ninja wrote:I believe I have a better understanding now, mostly based on Scorpio, a cluster engine, discussed here.

Essentially, in all cluster implementations I have found, idle processes randomly query other processes and ask for work (i.e. offer their help). If a process has available work to share, it sends a split message with a node in the tree to search, and when the process who offered the help is done searching, it sends a merge message, returning the result of its search.

The unfortunate thing is that at least one thread on a compute node has to handle probing for messages. It can do other work in between probes while it wants, but since MPI is not interrupt-driven, one can't simply devote a thread to message handling and still run n other threads on the n cpus per node (unless one put thread_sleep() calls in that thread, I suppose). Daniel Shawl mentioned that "some special edition" of MPI is interrupt-driven; does anyone have more information about that? My searches were fruitless. (I imagine this is hardware-dependent, but I would also imagine that modern NICs support such interrupt-driven operations, but I'm just guessing.)

There is another thing about this approach that seems like a downside to me. That is, the split points will essentially be random; processes do not choose when they split, only when they do not split (i.e. split at type 3 nodes (aka All-nodes) and not at type 1-2 nodes (aka PV- and cut-nodes)).

Is it not better to split higher up in the search, so that processes can be given the most amount of work possible? If it is in fact better to do that, does there not exist a means by which one could prioritize splitting at higher nodes in the search, i.e. a non-random load distribution?

It is not quite so simple. Yes, a deeper search is better. But you DO have to search at least one move before splitting so that you have a working alpha bound for the helpers to use...

MPI engine parent-child management

MPI engine parent-child management

Re: MPI engine parent-child management

Re: MPI engine parent-child management