That is _the_ major point of DTS, you can split anywhere, not just below the last split point in this branch, and you can see all possible split points (since _any_ node is a potential split point in DTS) so that you can choose the one least likely to produce overhead with no useful information.Edmund wrote:Ah, I see, that makes sense then,Onno Garms wrote:The order of steps 4 and 5 is broken. To avoid this order, the master only helps slaves in step 3, it must not take a completely different task. Then the master automatically finishes before the slave.Edmund wrote: YBW:
1. Slave takes work on splitpoint (SP)
2. Master returns to SP, only needs information from Slave
3. Master takes on other task
4. Slave finishes search
5. Master finishes other task
6. Master propagates the values
however this still produces inefficiencies then as the splitpoints turn out deeper and deeper in the search tree. The most effective parallel task would search a very large tree on a node with a very high chance to be an all node, this reduces the communication and synchronization effort to a minimum. But there we are already coming back to the fact that DTS is more flexible in the split point selection.
DTS Structure
Moderators: hgm, Rebel, chrisw
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: DTS Structure
-
- Posts: 224
- Joined: Mon Mar 12, 2007 7:31 pm
- Location: Bonn, Germany
Re: DTS Structure
Of course. I don't see any contraction between your comments (which I fully acknowledge) and my original comments. So I will go through them to post another reply.bob wrote:No, that is what we are talking about when we mention an iterated search vs a recursive search.Onno Garms wrote:ACK, but that is a problem for both DTS and YBWC.bob wrote: The problem is that recursive search means you can split "here" or not at all, because of the call stack.
Apart from this one (on "master only helps slaves, does not pick an arbitrary split point"):
Of course that is not true automatically, we have to implement that.First, how is this true?
Sure, it is better not to restrict as in DTS. I never doubted that. But restricting is better then accepting the problem that Edmund saw with the YBWC: If we don't restrict, after all slaves at a split point have finished, would have to wait for the owner of the parent to become available again to continue the work.And second, why would you want to restrict where an idle processor can join in?
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: DTS Structure
This is not a YBW issue, it is an implementation issue. The only specification for YBW is that you never split at any node until you have completed searching at least one branch out of that node to convince yourself that this is not likely going to be a CUT node. I use pure YBW in Crafty, And I don't have the issue you mentioned above at all. Yes the original "owner" has to be the one to back up at that node once it has been completed by either itself or "slaves". But that master can go off and work at other points in the tree also, and not sit around holding things up...Onno Garms wrote:Of course. I don't see any contraction between your comments (which I fully acknowledge) and my original comments. So I will go through them to post another reply.bob wrote:No, that is what we are talking about when we mention an iterated search vs a recursive search.Onno Garms wrote:ACK, but that is a problem for both DTS and YBWC.bob wrote: The problem is that recursive search means you can split "here" or not at all, because of the call stack.
Apart from this one (on "master only helps slaves, does not pick an arbitrary split point"):
Of course that is not true automatically, we have to implement that.First, how is this true?
Sure, it is better not to restrict as in DTS. I never doubted that. But restricting is better then accepting the problem that Edmund saw with the YBWC: If we don't restrict, after all slaves at a split point have finished, would have to wait for the owner of the parent to become available again to continue the work.And second, why would you want to restrict where an idle processor can join in?
-
- Posts: 613
- Joined: Sun Jan 18, 2009 7:03 am
Re: DTS Structure
And when helpful master concept is used, the time master thread ends up being idle is so small that it is almost irrelevant. The real problem is the fact that sometimes split happens at node which fails high and then a lot of work is simply wasted. This is the only area where DTS can _possibly_ outperform YBW. Anyway the expected gain is so small that I'm not going to try this in Stockfish until most people have at least 16 core systemsbob wrote: Yes the original "owner" has to be the one to back up at that node once it has been completed by either itself or "slaves". But that master can go off and work at other points in the tree also, and not sit around holding things up...
Joona Kiiski
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: DTS Structure
This is not quite true. With current "crafty-like" YBW algorithms (and it seems everyone doing a real parallel search uses the same idea) a thread that notices idle threads stops and does some work to "invite them in." In DTS the opposite was done, at least in my implementation. The _idle_ threads simply collected state from everyone else and chose which to "join". And it could do the "join" without the "joinee" even realizing this had happened. That is an efficiency improvement since working threads don't do the splitting, the idle threads do.zamar wrote:And when helpful master concept is used, the time master thread ends up being idle is so small that it is almost irrelevant. The real problem is the fact that sometimes split happens at node which fails high and then a lot of work is simply wasted. This is the only area where DTS can _possibly_ outperform YBW. Anyway the expected gain is so small that I'm not going to try this in Stockfish until most people have at least 16 core systemsbob wrote: Yes the original "owner" has to be the one to back up at that node once it has been completed by either itself or "slaves". But that master can go off and work at other points in the tree also, and not sit around holding things up...
However, DTS _can_ be pretty good because you don't just have to take the current node as a split point once YBW is satisfied, you can split _anywhere_ in the tree, with any active thread, without regard to the depth or tree order. Which means you have a much better chance of choosing a good split point (ALL node). And you can pick one as close to the root as possible as well, which can further help as the moves there are better ordered.
But it is definitely messy...
-
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: DTS Structure
if you store all the data in a global struct why couldn't you let the slaves do the search for good split points in the YBW framework? What is different to DTS in that respect?bob wrote:This is not quite true. With current "crafty-like" YBW algorithms (and it seems everyone doing a real parallel search uses the same idea) a thread that notices idle threads stops and does some work to "invite them in." In DTS the opposite was done, at least in my implementation. The _idle_ threads simply collected state from everyone else and chose which to "join". And it could do the "join" without the "joinee" even realizing this had happened. That is an efficiency improvement since working threads don't do the splitting, the idle threads do.zamar wrote:And when helpful master concept is used, the time master thread ends up being idle is so small that it is almost irrelevant. The real problem is the fact that sometimes split happens at node which fails high and then a lot of work is simply wasted. This is the only area where DTS can _possibly_ outperform YBW. Anyway the expected gain is so small that I'm not going to try this in Stockfish until most people have at least 16 core systemsbob wrote: Yes the original "owner" has to be the one to back up at that node once it has been completed by either itself or "slaves". But that master can go off and work at other points in the tree also, and not sit around holding things up...
However, DTS _can_ be pretty good because you don't just have to take the current node as a split point once YBW is satisfied, you can split _anywhere_ in the tree, with any active thread, without regard to the depth or tree order. Which means you have a much better chance of choosing a good split point (ALL node). And you can pick one as close to the root as possible as well, which can further help as the moves there are better ordered.
But it is definitely messy...
The only difference I really see is that in DTS master and slave can swap places and in YBW they can't. But the selection of the split points should work in a similar manner.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: DTS Structure
Edmund wrote:if you store all the data in a global struct why couldn't you let the slaves do the search for good split points in the YBW framework? What is different to DTS in that respect?bob wrote:This is not quite true. With current "crafty-like" YBW algorithms (and it seems everyone doing a real parallel search uses the same idea) a thread that notices idle threads stops and does some work to "invite them in." In DTS the opposite was done, at least in my implementation. The _idle_ threads simply collected state from everyone else and chose which to "join". And it could do the "join" without the "joinee" even realizing this had happened. That is an efficiency improvement since working threads don't do the splitting, the idle threads do.zamar wrote:And when helpful master concept is used, the time master thread ends up being idle is so small that it is almost irrelevant. The real problem is the fact that sometimes split happens at node which fails high and then a lot of work is simply wasted. This is the only area where DTS can _possibly_ outperform YBW. Anyway the expected gain is so small that I'm not going to try this in Stockfish until most people have at least 16 core systemsbob wrote: Yes the original "owner" has to be the one to back up at that node once it has been completed by either itself or "slaves". But that master can go off and work at other points in the tree also, and not sit around holding things up...
However, DTS _can_ be pretty good because you don't just have to take the current node as a split point once YBW is satisfied, you can split _anywhere_ in the tree, with any active thread, without regard to the depth or tree order. Which means you have a much better chance of choosing a good split point (ALL node). And you can pick one as close to the root as possible as well, which can further help as the moves there are better ordered.
But it is definitely messy...
How are you going to handle the "helpful master" approach if you do that? The master has to do the splitting so that it has a back door to help others while retaining the call stack for this split point so that it can return back through it when all work is completed.
Think about how you are going to deal with the call stack that you can't modify, and how you are going to get back to this point if other processors just "jump in" at points in the tree that could be above or below this one. It's an extremely complex approach. Anything _can_ be done of course. But I would not try to undertake something 100x more complex than basic iterated search using DTS.
The only difference I really see is that in DTS master and slave can swap places and in YBW they can't. But the selection of the split points should work in a similar manner.
Here's a sample problem. Thread A recursively calls itself to get down to ply 15. When another thread goes idle, and wants to join this thread at ply=6, how can that thread modify my call stack so that as I return I don't just return thru the split point without any synchronization to verify that all work has been completed? This is a major complexity issue.
-
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: DTS Structure
In my effort to DTS I used the following approach:bob wrote:Edmund wrote:if you store all the data in a global struct why couldn't you let the slaves do the search for good split points in the YBW framework? What is different to DTS in that respect?bob wrote:This is not quite true. With current "crafty-like" YBW algorithms (and it seems everyone doing a real parallel search uses the same idea) a thread that notices idle threads stops and does some work to "invite them in." In DTS the opposite was done, at least in my implementation. The _idle_ threads simply collected state from everyone else and chose which to "join". And it could do the "join" without the "joinee" even realizing this had happened. That is an efficiency improvement since working threads don't do the splitting, the idle threads do.zamar wrote:And when helpful master concept is used, the time master thread ends up being idle is so small that it is almost irrelevant. The real problem is the fact that sometimes split happens at node which fails high and then a lot of work is simply wasted. This is the only area where DTS can _possibly_ outperform YBW. Anyway the expected gain is so small that I'm not going to try this in Stockfish until most people have at least 16 core systemsbob wrote: Yes the original "owner" has to be the one to back up at that node once it has been completed by either itself or "slaves". But that master can go off and work at other points in the tree also, and not sit around holding things up...
However, DTS _can_ be pretty good because you don't just have to take the current node as a split point once YBW is satisfied, you can split _anywhere_ in the tree, with any active thread, without regard to the depth or tree order. Which means you have a much better chance of choosing a good split point (ALL node). And you can pick one as close to the root as possible as well, which can further help as the moves there are better ordered.
But it is definitely messy...
How are you going to handle the "helpful master" approach if you do that? The master has to do the splitting so that it has a back door to help others while retaining the call stack for this split point so that it can return back through it when all work is completed.
Think about how you are going to deal with the call stack that you can't modify, and how you are going to get back to this point if other processors just "jump in" at points in the tree that could be above or below this one. It's an extremely complex approach. Anything _can_ be done of course. But I would not try to undertake something 100x more complex than basic iterated search using DTS.
The only difference I really see is that in DTS master and slave can swap places and in YBW they can't. But the selection of the split points should work in a similar manner.
Here's a sample problem. Thread A recursively calls itself to get down to ply 15. When another thread goes idle, and wants to join this thread at ply=6, how can that thread modify my call stack so that as I return I don't just return thru the split point without any synchronization to verify that all work has been completed? This is a major complexity issue.
I kept my movelists global for anyone to access; each move entry consists of 2 parts: 2byte from/to/promotion, 2byte score. Whenever the score is < 32000 it means the move still has to be searched; 32767 the move has already been searched; 32766-n the move is currently searched with the task number "n". (Note I say task number and not thread number because that task was not statically fixed to a certain thread, anyway in the structure of the task there would be a variable pointing at the thread id of the one currently working on the root node of the task) So an idle thread could walk through the movelists of a working thread and change the score value of a certain move where it wants to split. The master thread which finds itself on a node still waiting on a slave will then immediately know which thread to help, by looking at the score of all moves with a score from 32001 to 32766.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: DTS Structure
How does that solve the problem of returning? With recursive search, the "master" (or owner in DTS-speak) has to return thru the split point as things are backed up, with DTS this is not the case and _anyone_ can back up through a split point, which is really the last thread that is working at the split point.Edmund wrote:In my effort to DTS I used the following approach:bob wrote:Edmund wrote:if you store all the data in a global struct why couldn't you let the slaves do the search for good split points in the YBW framework? What is different to DTS in that respect?bob wrote:This is not quite true. With current "crafty-like" YBW algorithms (and it seems everyone doing a real parallel search uses the same idea) a thread that notices idle threads stops and does some work to "invite them in." In DTS the opposite was done, at least in my implementation. The _idle_ threads simply collected state from everyone else and chose which to "join". And it could do the "join" without the "joinee" even realizing this had happened. That is an efficiency improvement since working threads don't do the splitting, the idle threads do.zamar wrote:And when helpful master concept is used, the time master thread ends up being idle is so small that it is almost irrelevant. The real problem is the fact that sometimes split happens at node which fails high and then a lot of work is simply wasted. This is the only area where DTS can _possibly_ outperform YBW. Anyway the expected gain is so small that I'm not going to try this in Stockfish until most people have at least 16 core systemsbob wrote: Yes the original "owner" has to be the one to back up at that node once it has been completed by either itself or "slaves". But that master can go off and work at other points in the tree also, and not sit around holding things up...
However, DTS _can_ be pretty good because you don't just have to take the current node as a split point once YBW is satisfied, you can split _anywhere_ in the tree, with any active thread, without regard to the depth or tree order. Which means you have a much better chance of choosing a good split point (ALL node). And you can pick one as close to the root as possible as well, which can further help as the moves there are better ordered.
But it is definitely messy...
How are you going to handle the "helpful master" approach if you do that? The master has to do the splitting so that it has a back door to help others while retaining the call stack for this split point so that it can return back through it when all work is completed.
Think about how you are going to deal with the call stack that you can't modify, and how you are going to get back to this point if other processors just "jump in" at points in the tree that could be above or below this one. It's an extremely complex approach. Anything _can_ be done of course. But I would not try to undertake something 100x more complex than basic iterated search using DTS.
The only difference I really see is that in DTS master and slave can swap places and in YBW they can't. But the selection of the split points should work in a similar manner.
Here's a sample problem. Thread A recursively calls itself to get down to ply 15. When another thread goes idle, and wants to join this thread at ply=6, how can that thread modify my call stack so that as I return I don't just return thru the split point without any synchronization to verify that all work has been completed? This is a major complexity issue.
I kept my movelists global for anyone to access; each move entry consists of 2 parts: 2byte from/to/promotion, 2byte score. Whenever the score is < 32000 it means the move still has to be searched; 32767 the move has already been searched; 32766-n the move is currently searched with the task number "n". (Note I say task number and not thread number because that task was not statically fixed to a certain thread, anyway in the structure of the task there would be a variable pointing at the thread id of the one currently working on the root node of the task) So an idle thread could walk through the movelists of a working thread and change the score value of a certain move where it wants to split. The master thread which finds itself on a node still waiting on a slave will then immediately know which thread to help, by looking at the score of all moves with a score from 32001 to 32766.
Using your approach, nobody else can back up still, which is an issue. This really is designed around an iterated search, and was, from the get-go.
-
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: DTS Structure
These are two issues:bob wrote:How does that solve the problem of returning? With recursive search, the "master" (or owner in DTS-speak) has to return thru the split point as things are backed up, with DTS this is not the case and _anyone_ can back up through a split point, which is really the last thread that is working at the split point.Edmund wrote:In my effort to DTS I used the following approach:bob wrote:Edmund wrote:if you store all the data in a global struct why couldn't you let the slaves do the search for good split points in the YBW framework? What is different to DTS in that respect?bob wrote:This is not quite true. With current "crafty-like" YBW algorithms (and it seems everyone doing a real parallel search uses the same idea) a thread that notices idle threads stops and does some work to "invite them in." In DTS the opposite was done, at least in my implementation. The _idle_ threads simply collected state from everyone else and chose which to "join". And it could do the "join" without the "joinee" even realizing this had happened. That is an efficiency improvement since working threads don't do the splitting, the idle threads do.zamar wrote:And when helpful master concept is used, the time master thread ends up being idle is so small that it is almost irrelevant. The real problem is the fact that sometimes split happens at node which fails high and then a lot of work is simply wasted. This is the only area where DTS can _possibly_ outperform YBW. Anyway the expected gain is so small that I'm not going to try this in Stockfish until most people have at least 16 core systemsbob wrote: Yes the original "owner" has to be the one to back up at that node once it has been completed by either itself or "slaves". But that master can go off and work at other points in the tree also, and not sit around holding things up...
However, DTS _can_ be pretty good because you don't just have to take the current node as a split point once YBW is satisfied, you can split _anywhere_ in the tree, with any active thread, without regard to the depth or tree order. Which means you have a much better chance of choosing a good split point (ALL node). And you can pick one as close to the root as possible as well, which can further help as the moves there are better ordered.
But it is definitely messy...
How are you going to handle the "helpful master" approach if you do that? The master has to do the splitting so that it has a back door to help others while retaining the call stack for this split point so that it can return back through it when all work is completed.
Think about how you are going to deal with the call stack that you can't modify, and how you are going to get back to this point if other processors just "jump in" at points in the tree that could be above or below this one. It's an extremely complex approach. Anything _can_ be done of course. But I would not try to undertake something 100x more complex than basic iterated search using DTS.
The only difference I really see is that in DTS master and slave can swap places and in YBW they can't. But the selection of the split points should work in a similar manner.
Here's a sample problem. Thread A recursively calls itself to get down to ply 15. When another thread goes idle, and wants to join this thread at ply=6, how can that thread modify my call stack so that as I return I don't just return thru the split point without any synchronization to verify that all work has been completed? This is a major complexity issue.
I kept my movelists global for anyone to access; each move entry consists of 2 parts: 2byte from/to/promotion, 2byte score. Whenever the score is < 32000 it means the move still has to be searched; 32767 the move has already been searched; 32766-n the move is currently searched with the task number "n". (Note I say task number and not thread number because that task was not statically fixed to a certain thread, anyway in the structure of the task there would be a variable pointing at the thread id of the one currently working on the root node of the task) So an idle thread could walk through the movelists of a working thread and change the score value of a certain move where it wants to split. The master thread which finds itself on a node still waiting on a slave will then immediately know which thread to help, by looking at the score of all moves with a score from 32001 to 32766.
Using your approach, nobody else can back up still, which is an issue. This really is designed around an iterated search, and was, from the get-go.
a) any thread can back up the node
b) split-point on any node possible
DTS: a) and b) are possible.
YBW: only b) is possible - a) isn't
In my last post I just wanted to provide a solution of how to do b) in YBW and not how to solve a)