DTS Structure

Onno Garms · Post by **Onno Garms** » Sat May 29, 2010 10:09 am

Edmund wrote: YBW:
1. Slave takes work on splitpoint (SP)
2. Master returns to SP, only needs information from Slave
3. Master takes on other task
4. Slave finishes search
5. Master finishes other task
6. Master propagates the values

The order of steps 4 and 5 is broken. To avoid this order, the master only helps slaves in step 3, it must not take a completely different task. Then the master automatically finishes before the slave.

Edmund · Post by **Edmund** » Sat May 29, 2010 10:20 am

Onno Garms wrote:
Edmund wrote: YBW:
1. Slave takes work on splitpoint (SP)
2. Master returns to SP, only needs information from Slave
3. Master takes on other task
4. Slave finishes search
5. Master finishes other task
6. Master propagates the values
The order of steps 4 and 5 is broken. To avoid this order, the master only helps slaves in step 3, it must not take a completely different task. Then the master automatically finishes before the slave.

Ah, I see, that makes sense then,
however this still produces inefficiencies then as the splitpoints turn out deeper and deeper in the search tree. The most effective parallel task would search a very large tree on a node with a very high chance to be an all node, this reduces the communication and synchronization effort to a minimum. But there we are already coming back to the fact that DTS is more flexible in the split point selection.

bob · Post by **bob** » Sat May 29, 2010 6:36 pm

Onno Garms wrote:
bob wrote: The problem is that recursive search means you can split "here" or not at all, because of the call stack.
ACK, but that is a problem for both DTS and YBWC.

No, that is what we are talking about when we mention an iterated search vs a recursive search. In an iterated search, rather than making a recursive call to Search(), you increment "ply" and go back to the top. Rather than doing a return (from a recursive search call) you just decrement ply. So there is no call stack of any kind, and all the data for each ply is available for use or modification, where in a recursive search this is not true.

With DTS there is no call stack, so you can split at a node that is 10 plies back in the tree just as easily as you can split "here".
You mean "iterative search" rather than "DTS"?

DTS _depends_ on an iterated search... That was the primary underlying principle of the idea...

That's the main difference.
Isnt' the main difference that in DTS the child that finishes last at a split point takes over the parent? This means, we move nodes to another thread. For this reason, an iterative search is essential for DTS (because we cannot move callstacks of recursive search). For YBW an iterative search is just a benefit on top (allowing to split anywhere in the tree).

To do that, you can't use a recursive search. How can anyone but the "parent" back up thru a split point, since only the parent has the call stack to do this? That why everyone mentions "iterated search" when talking about DTS... One could do it with a recursive search, but it would be incredibly messy since it would be necessary to "unwind" a call stack here and there, and "artificially load one" at other points. Who'd want to debug that?

As are the data structures and such. But the ability to see all nodes for all threads when a CPU goes idle, rather than having non-idle CPUs ask idle ones to "join in" is a significant efficiency enhancement...
ACK

bob · Post by **bob** » Sat May 29, 2010 6:39 pm

Onno Garms wrote:
Edmund wrote: YBW:
1. Slave takes work on splitpoint (SP)
2. Master returns to SP, only needs information from Slave
3. Master takes on other task
4. Slave finishes search
5. Master finishes other task
6. Master propagates the values
The order of steps 4 and 5 is broken. To avoid this order, the master only helps slaves in step 3, it must not take a completely different task. Then the master automatically finishes before the slave.

First, how is this true? And second, why would you want to restrict where an idle processor can join in? That's one point of DTS, to pick the best possible split point regardless of where it is, while at the same time not allowing idle processors to sit around and wait for something they can work on.

bob · Post by **bob** » Sat May 29, 2010 6:40 pm

Edmund wrote:
Onno Garms wrote:
Edmund wrote: YBW:
1. Slave takes work on splitpoint (SP)
2. Master returns to SP, only needs information from Slave
3. Master takes on other task
4. Slave finishes search
5. Master finishes other task
6. Master propagates the values
The order of steps 4 and 5 is broken. To avoid this order, the master only helps slaves in step 3, it must not take a completely different task. Then the master automatically finishes before the slave.
Ah, I see, that makes sense then,
however this still produces inefficiencies then as the splitpoints turn out deeper and deeper in the search tree. The most effective parallel task would search a very large tree on a node with a very high chance to be an all node, this reduces the communication and synchronization effort to a minimum. But there we are already coming back to the fact that DTS is more flexible in the split point selection.

That is _the_ major point of DTS, you can split anywhere, not just below the last split point in this branch, and you can see all possible split points (since _any_ node is a potential split point in DTS) so that you can choose the one least likely to produce overhead with no useful information.

Onno Garms · Post by **Onno Garms** » Sun May 30, 2010 10:35 am

bob wrote:
Onno Garms wrote:
bob wrote: The problem is that recursive search means you can split "here" or not at all, because of the call stack.
ACK, but that is a problem for both DTS and YBWC.
No, that is what we are talking about when we mention an iterated search vs a recursive search.

Of course. I don't see any contraction between your comments (which I fully acknowledge) and my original comments. So I will go through them to post another reply.

Apart from this one (on "master only helps slaves, does not pick an arbitrary split point"):

First, how is this true?

Of course that is not true automatically, we have to implement that.

And second, why would you want to restrict where an idle processor can join in?

Sure, it is better not to restrict as in DTS. I never doubted that. But restricting is better then accepting the problem that Edmund saw with the YBWC: If we don't restrict, after all slaves at a split point have finished, would have to wait for the owner of the parent to become available again to continue the work.

bob · Post by **bob** » Sun May 30, 2010 6:57 pm

Onno Garms wrote:
bob wrote:
Onno Garms wrote:
bob wrote: The problem is that recursive search means you can split "here" or not at all, because of the call stack.
ACK, but that is a problem for both DTS and YBWC.
No, that is what we are talking about when we mention an iterated search vs a recursive search.
Of course. I don't see any contraction between your comments (which I fully acknowledge) and my original comments. So I will go through them to post another reply.

Apart from this one (on "master only helps slaves, does not pick an arbitrary split point"):

First, how is this true?
Of course that is not true automatically, we have to implement that.

And second, why would you want to restrict where an idle processor can join in?
Sure, it is better not to restrict as in DTS. I never doubted that. But restricting is better then accepting the problem that Edmund saw with the YBWC: If we don't restrict, after all slaves at a split point have finished, would have to wait for the owner of the parent to become available again to continue the work.

This is not a YBW issue, it is an implementation issue. The only specification for YBW is that you never split at any node until you have completed searching at least one branch out of that node to convince yourself that this is not likely going to be a CUT node. I use pure YBW in Crafty, And I don't have the issue you mentioned above at all. Yes the original "owner" has to be the one to back up at that node once it has been completed by either itself or "slaves". But that master can go off and work at other points in the tree also, and not sit around holding things up...

zamar · Post by **zamar** » Sun May 30, 2010 9:08 pm

bob wrote: Yes the original "owner" has to be the one to back up at that node once it has been completed by either itself or "slaves". But that master can go off and work at other points in the tree also, and not sit around holding things up...

And when helpful master concept is used, the time master thread ends up being idle is so small that it is almost irrelevant. The real problem is the fact that sometimes split happens at node which fails high and then a lot of work is simply wasted. This is the only area where DTS can _possibly_ outperform YBW. Anyway the expected gain is so small that I'm not going to try this in Stockfish until most people have at least 16 core systems

bob · Post by **bob** » Sun May 30, 2010 10:28 pm

zamar wrote:
bob wrote: Yes the original "owner" has to be the one to back up at that node once it has been completed by either itself or "slaves". But that master can go off and work at other points in the tree also, and not sit around holding things up...
And when helpful master concept is used, the time master thread ends up being idle is so small that it is almost irrelevant. The real problem is the fact that sometimes split happens at node which fails high and then a lot of work is simply wasted. This is the only area where DTS can _possibly_ outperform YBW. Anyway the expected gain is so small that I'm not going to try this in Stockfish until most people have at least 16 core systems

This is not quite true. With current "crafty-like" YBW algorithms (and it seems everyone doing a real parallel search uses the same idea) a thread that notices idle threads stops and does some work to "invite them in." In DTS the opposite was done, at least in my implementation. The _idle_ threads simply collected state from everyone else and chose which to "join". And it could do the "join" without the "joinee" even realizing this had happened. That is an efficiency improvement since working threads don't do the splitting, the idle threads do.

However, DTS _can_ be pretty good because you don't just have to take the current node as a split point once YBW is satisfied, you can split _anywhere_ in the tree, with any active thread, without regard to the depth or tree order. Which means you have a much better chance of choosing a good split point (ALL node). And you can pick one as close to the root as possible as well, which can further help as the moves there are better ordered.

But it is definitely messy...

Edmund · Post by **Edmund** » Sun May 30, 2010 11:39 pm

bob wrote:
zamar wrote:
bob wrote: Yes the original "owner" has to be the one to back up at that node once it has been completed by either itself or "slaves". But that master can go off and work at other points in the tree also, and not sit around holding things up...
And when helpful master concept is used, the time master thread ends up being idle is so small that it is almost irrelevant. The real problem is the fact that sometimes split happens at node which fails high and then a lot of work is simply wasted. This is the only area where DTS can _possibly_ outperform YBW. Anyway the expected gain is so small that I'm not going to try this in Stockfish until most people have at least 16 core systems
This is not quite true. With current "crafty-like" YBW algorithms (and it seems everyone doing a real parallel search uses the same idea) a thread that notices idle threads stops and does some work to "invite them in." In DTS the opposite was done, at least in my implementation. The _idle_ threads simply collected state from everyone else and chose which to "join". And it could do the "join" without the "joinee" even realizing this had happened. That is an efficiency improvement since working threads don't do the splitting, the idle threads do.

However, DTS _can_ be pretty good because you don't just have to take the current node as a split point once YBW is satisfied, you can split _anywhere_ in the tree, with any active thread, without regard to the depth or tree order. Which means you have a much better chance of choosing a good split point (ALL node). And you can pick one as close to the root as possible as well, which can further help as the moves there are better ordered.

But it is definitely messy...

if you store all the data in a global struct why couldn't you let the slaves do the search for good split points in the YBW framework? What is different to DTS in that respect?

The only difference I really see is that in DTS master and slave can swap places and in YBW they can't. But the selection of the split points should work in a similar manner.

DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure

Re: DTS Structure