DTS Structure

CThinker · Post by **CThinker** » Sat May 29, 2010 6:26 am

My understanding is that DTS is really not that different from YBW, except that instead of "waiting", a thread is made available to search other candidate sub-trees. So, the common recursive search can be easily made to do DTS. Crafty, I believe, is a good example of this.

bob · Post by **bob** » Sat May 29, 2010 7:11 am

CThinker wrote:My understanding is that DTS is really not that different from YBW, except that instead of "waiting", a thread is made available to search other candidate sub-trees. So, the common recursive search can be easily made to do DTS. Crafty, I believe, is a good example of this.

Not so easy. The problem is that the DTS algorithm is based on choosing a split point from all available nodes on _all_ other threads. That is, you choose the most beneficial split point. As an example, in Crafty (as in any YBW algorithm) you will split at a node after one node has been searched without causing a fail-high. In Crafty, about 92% of nodes fail high on the first move, if they fail high. This means that if this is a CUT node, 92% of the time we won't split here since we fail high on the first branch. Unfortunately, 8% of the time we will fail high on the second (or later) move, and doing a parallel split there does unnecessary work.

With DTS, in Cray Blitz, I circumvented this to an extent because rather than having to split at the current node of the current thread that has satisfied YBW, I could look at all nodes on all threads, find the ones that had satisfied YBW (or even better, ones that had searched multiple moves which really reduces the odds of doing a split at a CUT node).

The problem is that recursive search means you can split "here" or not at all, because of the call stack. With DTS there is no call stack, so you can split at a node that is 10 plies back in the tree just as easily as you can split "here". That's the main difference. The stuff about processors not waiting is pretty comparable between Crafty and DTS. As are the data structures and such. But the ability to see all nodes for all threads when a CPU goes idle, rather than having non-idle CPUs ask idle ones to "join in" is a significant efficiency enhancement...

Onno Garms · Post by **Onno Garms** » Sat May 29, 2010 9:33 am

bob wrote: The problem is that recursive search means you can split "here" or not at all, because of the call stack.

ACK, but that is a problem for both DTS and YBWC.

With DTS there is no call stack, so you can split at a node that is 10 plies back in the tree just as easily as you can split "here".

You mean "iterative search" rather than "DTS"?

That's the main difference.

Isnt' the main difference that in DTS the child that finishes last at a split point takes over the parent? This means, we move nodes to another thread. For this reason, an iterative search is essential for DTS (because we cannot move callstacks of recursive search). For YBW an iterative search is just a benefit on top (allowing to split anywhere in the tree).

As are the data structures and such. But the ability to see all nodes for all threads when a CPU goes idle, rather than having non-idle CPUs ask idle ones to "join in" is a significant efficiency enhancement...

ACK

Edmund · Post by **Edmund** » Sat May 29, 2010 9:43 am

CThinker wrote:My understanding is that DTS is really not that different from YBW, except that instead of "waiting", a thread is made available to search other candidate sub-trees. So, the common recursive search can be easily made to do DTS. Crafty, I believe, is a good example of this.

Additionally to Bob's statement that DTS is more flexible in the choice of split points I want to add that DTS also has more flexibility regarding node ownership:

YBW:
Slave takes work on splitpoint (SP)
Master returns to SP, only needs information from Slave
Master takes on other task
Slave finishes search
Master finishes other task
Master propagates the values

DTS:
Slave takes work on splitpoint (SP)
Master returns to SP, only needs information from Slave
Master takes on other task
Slave finishes search
Slave becomes the new owner of the node and propagates the values

This difference is critical if the information gained from propagating the values as soon as they were available would have saved some other threads work.

Onno Garms · Post by **Onno Garms** » Sat May 29, 2010 9:48 am

Zach Wegner wrote: That's quite close to the way it used to be in ZCT (apparently before the first public release).

I think it is more similar to what Edmund wants to implement. The only difference to Eduard's approach is that I do not store the addresses of the jump labels but have a case switch at the the end to translate enums into jumps.

Onno Garms · Post by **Onno Garms** » Sat May 29, 2010 10:09 am

Edmund wrote: YBW:
1. Slave takes work on splitpoint (SP)
2. Master returns to SP, only needs information from Slave
3. Master takes on other task
4. Slave finishes search
5. Master finishes other task
6. Master propagates the values

The order of steps 4 and 5 is broken. To avoid this order, the master only helps slaves in step 3, it must not take a completely different task. Then the master automatically finishes before the slave.

Edmund · Post by **Edmund** » Sat May 29, 2010 10:20 am

Onno Garms wrote:
Edmund wrote: YBW:
1. Slave takes work on splitpoint (SP)
2. Master returns to SP, only needs information from Slave
3. Master takes on other task
4. Slave finishes search
5. Master finishes other task
6. Master propagates the values
The order of steps 4 and 5 is broken. To avoid this order, the master only helps slaves in step 3, it must not take a completely different task. Then the master automatically finishes before the slave.

Ah, I see, that makes sense then,
however this still produces inefficiencies then as the splitpoints turn out deeper and deeper in the search tree. The most effective parallel task would search a very large tree on a node with a very high chance to be an all node, this reduces the communication and synchronization effort to a minimum. But there we are already coming back to the fact that DTS is more flexible in the split point selection.

bob · Post by **bob** » Sat May 29, 2010 6:36 pm

Onno Garms wrote:
bob wrote: The problem is that recursive search means you can split "here" or not at all, because of the call stack.
ACK, but that is a problem for both DTS and YBWC.

No, that is what we are talking about when we mention an iterated search vs a recursive search. In an iterated search, rather than making a recursive call to Search(), you increment "ply" and go back to the top. Rather than doing a return (from a recursive search call) you just decrement ply. So there is no call stack of any kind, and all the data for each ply is available for use or modification, where in a recursive search this is not true.

With DTS there is no call stack, so you can split at a node that is 10 plies back in the tree just as easily as you can split "here".
You mean "iterative search" rather than "DTS"?

DTS _depends_ on an iterated search... That was the primary underlying principle of the idea...

That's the main difference.
Isnt' the main difference that in DTS the child that finishes last at a split point takes over the parent? This means, we move nodes to another thread. For this reason, an iterative search is essential for DTS (because we cannot move callstacks of recursive search). For YBW an iterative search is just a benefit on top (allowing to split anywhere in the tree).

To do that, you can't use a recursive search. How can anyone but the "parent" back up thru a split point, since only the parent has the call stack to do this? That why everyone mentions "iterated search" when talking about DTS... One could do it with a recursive search, but it would be incredibly messy since it would be necessary to "unwind" a call stack here and there, and "artificially load one" at other points. Who'd want to debug that?

As are the data structures and such. But the ability to see all nodes for all threads when a CPU goes idle, rather than having non-idle CPUs ask idle ones to "join in" is a significant efficiency enhancement...
ACK

bob · Post by **bob** » Sat May 29, 2010 6:39 pm

Onno Garms wrote:
Edmund wrote: YBW:
1. Slave takes work on splitpoint (SP)
2. Master returns to SP, only needs information from Slave
3. Master takes on other task
4. Slave finishes search
5. Master finishes other task
6. Master propagates the values
The order of steps 4 and 5 is broken. To avoid this order, the master only helps slaves in step 3, it must not take a completely different task. Then the master automatically finishes before the slave.

First, how is this true? And second, why would you want to restrict where an idle processor can join in? That's one point of DTS, to pick the best possible split point regardless of where it is, while at the same time not allowing idle processors to sit around and wait for something they can work on.

bob · Post by **bob** » Sat May 29, 2010 6:40 pm

Edmund wrote:
Onno Garms wrote:
Edmund wrote: YBW:
1. Slave takes work on splitpoint (SP)
2. Master returns to SP, only needs information from Slave
3. Master takes on other task
4. Slave finishes search
5. Master finishes other task
6. Master propagates the values
The order of steps 4 and 5 is broken. To avoid this order, the master only helps slaves in step 3, it must not take a completely different task. Then the master automatically finishes before the slave.
Ah, I see, that makes sense then,
however this still produces inefficiencies then as the splitpoints turn out deeper and deeper in the search tree. The most effective parallel task would search a very large tree on a node with a very high chance to be an all node, this reduces the communication and synchronization effort to a minimum. But there we are already coming back to the fact that DTS is more flexible in the split point selection.

That is _the_ major point of DTS, you can split anywhere, not just below the last split point in this branch, and you can see all possible split points (since _any_ node is a potential split point in DTS) so that you can choose the one least likely to produce overhead with no useful information.

DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure