Smp concepts

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Smp concepts

Post by bob »

Desperado wrote:Hi Martin, i will have a look at it,thx.
But finally i will follow the way which was most intiutive to me if it does not turn out to be unmanageable.

But you give the next keyword i am interested in.

Did anyone used local hash tables instead of a shared ones.
The point is "used". I already followed some ideas to make the hash
tables lockless and so on. But beside some natural ideas why using
a shared hash table is a good idea, i am not sure if local hash tables
might be equally efficient or even more efficient. Trees today are very
deep and branchy, so maybe a shared ressource does more handicap
than help.

Beside, thank you guys so far, giving me some useful ideas, details to think about. ( Unfortunately i can post very rarely during the week )
I personally believe local hash is a bad idea. You lose a lot of information sharing. One can look at distributed chess engines to see how bad an idea this really is, because most serious distributed algorithms have taken great pains to implement a shared hash table, which is painful on a distributed architecture.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Smp concepts

Post by bob »

jdart wrote:I have some small local tables (pawn scoring cache for example). But the main hash table is shared.

Local caches do not necessarily help with NUMA. Your thread can go idle and later when it is awakened its code may be assigned to a different NUMA node. But most OSs do not migrate its memory to the new node. So that thread's code is now doing non-local memory access, even if initially it was not.

--Jon
You almost have to use processor affinity on NUMA boxes to avoid this. And just as importantly, the FIRST thread that touches a memory address needs to be the thread that will use it most frequently, because first touch faults it in to the local node's memory. IE the initial thread should not initialize everything like split blocks and such.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Smp concepts

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:A's call stack is already established to take it back ply by ply using a non-sharable block of memory to maintain the position...
So you don't take it back:
syzygy wrote:My split() just stores alpha, beta, depth, etc. and sets the flag.
My join() locks the splitpoint, copies the move list from root to split node, unlocks. The joining thread then makes these moves on its own board and starts to search.
Then when I return to that point in the tree, I will be using a pointer to the WRONG split block.
Our implementations are clearly different. :-)
I don't have this "master waiting" problem in Crafty. ANY thread can split with ANY thread, no matter where the split is done. At an earlier ply, at a later ply, etc.
So when a master is done searching its own node, but its slaves are still busy, the master can join at arbitrary places in the tree? That means that when its slaves are finished, the master won't be able to immediately continue searching its own tree.

This somehow sounds bad to me, but maybe it is not...
This makes it easy for me to figure out who to "stop" when I discover a fail-high that wasn't expected, the split block "tree" shows me who is helping me directly (all have to be stopped) as well as who is helping anyone that is helping me (who also have to be stopped, recursively).
In my implementation a master waiting for its slaves to finish will be helping one of its slaves, so will still be searching part of its own tree. A fail high can therefore be handled easily:
- a master that fails high informs all its slaves (by walking through a bitmask) of the ply at which it failed high;
- similarly, a slave that fail highs will inform its master and its siblings (by walking through the same bitmaks) of the ply at which it failed high;
- threads that get aborted back up to the signalled ply, while aborting their slaves at each split point they cross.

Each thread has one int as abort flag. If it its value is 0 the thread can continue. If its value is n > 0, it has to back up to ply = n (where ply = 1 is the root, if we are splitting at the root). If its value is -1 then time ran out, all threads back up to the root and the root returns.

Simultaneous aborts are handled by making sure the flag is only set to a new value n != 0 if its current value is either 0 or > n, e.g. using atomic instructions.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Smp concepts

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:A's call stack is already established to take it back ply by ply using a non-sharable block of memory to maintain the position...
So you don't take it back:
syzygy wrote:My split() just stores alpha, beta, depth, etc. and sets the flag.
My join() locks the splitpoint, copies the move list from root to split node, unlocks. The joining thread then makes these moves on its own board and starts to search.
Then when I return to that point in the tree, I will be using a pointer to the WRONG split block.
Our implementations are clearly different. :-)
I don't have this "master waiting" problem in Crafty. ANY thread can split with ANY thread, no matter where the split is done. At an earlier ply, at a later ply, etc.
So when a master is done searching its own node, but its slaves are still busy, the master can join at arbitrary places in the tree? That means that when its slaves are finished, the master won't be able to immediately continue searching its own tree.

This somehow sounds bad to me, but maybe it is not...
So long as searching is being done, does it really matter where? Most of the time the splits occur farther toward the tips for obvious reasons, however.
This makes it easy for me to figure out who to "stop" when I discover a fail-high that wasn't expected, the split block "tree" shows me who is helping me directly (all have to be stopped) as well as who is helping anyone that is helping me (who also have to be stopped, recursively).
In my implementation a master waiting for its slaves to finish will be helping one of its slaves, so will still be searching part of its own tree. A fail high can therefore be handled easily:
- a master that fails high informs all its slaves (by walking through a bitmask) of the ply at which it failed high;
That is, in general, how my tree looks as well. Because NOBODY can back up through a split point until it is done, which means nobody can split at someplace earlier in the tree. But once things get going, it is certainly possible for the master on one branch to be searching on another branch

- similarly, a slave that fail highs will inform its master and its siblings (by walking through the same bitmaks) of the ply at which it failed high;
- threads that get aborted back up to the signalled ply, while aborting their slaves at each split point they cross.

Each thread has one int as abort flag. If it its value is 0 the thread can continue. If its value is n > 0, it has to back up to ply = n (where ply = 1 is the root, if we are splitting at the root). If its value is -1 then time ran out, all threads back up to the root and the root returns.

Simultaneous aborts are handled by making sure the flag is only set to a new value n != 0 if its current value is either 0 or > n, e.g. using atomic instructions.
My split blocks show the exact tree as it is currently set up, the first split comes from split block[0], with links to split blocks for those threads that help. At any split block this is true. It's a simple recursive "walk the tree" to figure out who is searching anywhere below a split point where a fail high occurs...
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Smp concepts

Post by Desperado »

So long as searching is being done, does it really matter where? Most of the time the splits occur farther toward the tips for obvious reasons, however.
This question is less rhetorical than it sounds :)
NOBODY can back up through a split point until it is done, which means nobody can split at someplace earlier in the tree.
Well, not tried it yet ( and probably will never try it ), but this seems not impossible to me, and might be less harder than you think.

Let's say there is a path where master0 splits like

Code: Select all

 mov1 move2 move3 split ...
If a helper thread would now copy the movelist(talking of the path) ( and necessary stack data) , it is

1. able to join the splitpoint
2. the master would be able to go in holidays.
3. or the helper thread might split even on an earlier depth.

To get something like this to work, the necessary communication must be setup. The 2 threads would need to merge at this earlier split point.
And maybe, just to mention it, the helper thread would need to start with the alpha beta window existing at this earlier split.
Beside the fact that a thread can join ( late join ) everywhere it would also be able to split everywhere, not exclusively in the subtree of the the masters splitpoint.

Well, currently i am certainly not interested in building such a solution, because i like it simple. But it does not look like that NOBODY would be able to do such a thing.
At this theoretical point, i don't think of course of benefits and disadvantages, just a thought if it _can_ be done.

So, please consider this statement as what it is, just brainstorming. :wink: :)

Currently my solution is simply the spinning helper threads "polling" the split point stack looking for work.
Everytime able to late join everywhere and additionally able to open their own splitpoint. Then the roundabout turns and turns.
The point is, that i currently don't see the practical profit of such more flexibility, given by passing tree state approaches.

So, now it is time for my first coffee of the day. Good morning everybody 8-)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Smp concepts

Post by bob »

Desperado wrote:
So long as searching is being done, does it really matter where? Most of the time the splits occur farther toward the tips for obvious reasons, however.
This question is less rhetorical than it sounds :)
NOBODY can back up through a split point until it is done, which means nobody can split at someplace earlier in the tree.
Well, not tried it yet ( and probably will never try it ), but this seems not impossible to me, and might be less harder than you think.

Let's say there is a path where master0 splits like

Code: Select all

 mov1 move2 move3 split ...
If a helper thread would now copy the movelist(talking of the path) ( and necessary stack data) , it is

1. able to join the splitpoint
2. the master would be able to go in holidays.
3. or the helper thread might split even on an earlier depth.

To get something like this to work, the necessary communication must be setup. The 2 threads would need to merge at this earlier split point.
And maybe, just to mention it, the helper thread would need to start with the alpha beta window existing at this earlier split.
Beside the fact that a thread can join ( late join ) everywhere it would also be able to split everywhere, not exclusively in the subtree of the the masters split point.
You are thinking about the WRONG data. I am talking about the call stack that contains return addresses for the recursive calls to Search(). Other threads don't have access to that. A thread can only split (in Crafty anyway) at the current ply it is at in the tree. There is no way to split at an earlier ply until you back up to that ply. If that happens to be on the other side of a split point, it can't happen at all because no one can back up through that split point except the original thread that got there, and the original thread can't back up through it (to split at an earlier point) until that split point has been completely searched and retired.



Well, currently i am certainly not interested in building such a solution, because i like it simple. But it does not look like that NOBODY would be able to do such a thing.
At this theoretical point, i don't think of course of benefits and disadvantages, just a thought if it _can_ be done.

So, please consider this statement as what it is, just brainstorming. :wink: :)

Currently my solution is simply the spinning helper threads "polling" the split point stack looking for work.
Everytime able to late join everywhere and additionally able to open their own splitpoint. Then the roundabout turns and turns.
The point is, that i currently don't see the practical profit of such more flexibility, given by passing tree state approaches.

So, now it is time for my first coffee of the day. Good morning everybody 8-)
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Smp concepts

Post by Desperado »

bob wrote:
Desperado wrote:
So long as searching is being done, does it really matter where? Most of the time the splits occur farther toward the tips for obvious reasons, however.
This question is less rhetorical than it sounds :)
NOBODY can back up through a split point until it is done, which means nobody can split at someplace earlier in the tree.
Well, not tried it yet ( and probably will never try it ), but this seems not impossible to me, and might be less harder than you think.

Let's say there is a path where master0 splits like

Code: Select all

 mov1 move2 move3 split ...
If a helper thread would now copy the movelist(talking of the path) ( and necessary stack data) , it is

1. able to join the splitpoint
2. the master would be able to go in holidays.
3. or the helper thread might split even on an earlier depth.

To get something like this to work, the necessary communication must be setup. The 2 threads would need to merge at this earlier split point.
And maybe, just to mention it, the helper thread would need to start with the alpha beta window existing at this earlier split.
Beside the fact that a thread can join ( late join ) everywhere it would also be able to split everywhere, not exclusively in the subtree of the the masters split point.
You are thinking about the WRONG data. I am talking about the call stack that contains return addresses for the recursive calls to Search(). Other threads don't have access to that. A thread can only split (in Crafty anyway) at the current ply it is at in the tree. There is no way to split at an earlier ply until you back up to that ply. If that happens to be on the other side of a split point, it can't happen at all because no one can back up through that split point except the original thread that got there, and the original thread can't back up through it (to split at an earlier point) until that split point has been completely searched and retired.



Well, currently i am certainly not interested in building such a solution, because i like it simple. But it does not look like that NOBODY would be able to do such a thing.
At this theoretical point, i don't think of course of benefits and disadvantages, just a thought if it _can_ be done.

So, please consider this statement as what it is, just brainstorming. :wink: :)

Currently my solution is simply the spinning helper threads "polling" the split point stack looking for work.
Everytime able to late join everywhere and additionally able to open their own splitpoint. Then the roundabout turns and turns.
The point is, that i currently don't see the practical profit of such more flexibility, given by passing tree state approaches.

So, now it is time for my first coffee of the day. Good morning everybody 8-)
Well, maybe we are talking of different things. But what i described is a logic that allows a helper thread to late join anytime and anywhere,
split in a subtree of a master or even split in front of it. It also can simply replace the master. There's no need to hack the stack data you mention.

Each thread will have it's own search stack.

Let's say master did split at ply 4 with the path 1.e4 e5 2. Nf3 Nc6 (Split master).
A helper thread may replay 1.e4 ( copying from the current master following the current search ) and then decides to split at ply 2.
Would it now be a problem to exclude e5 from the search and let the master know that it has to pass the result of this move to the helper thread for this move.
After the result has passed the master can end in smoke if it likes.

I don't see why the thread stack in any way can limit the logic.
I repeat myself, and i don't say this is somehow a good idea, i only argue that it is possible.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Smp concepts

Post by bob »

Desperado wrote:
bob wrote:
Desperado wrote:
So long as searching is being done, does it really matter where? Most of the time the splits occur farther toward the tips for obvious reasons, however.
This question is less rhetorical than it sounds :)
NOBODY can back up through a split point until it is done, which means nobody can split at someplace earlier in the tree.
Well, not tried it yet ( and probably will never try it ), but this seems not impossible to me, and might be less harder than you think.

Let's say there is a path where master0 splits like

Code: Select all

 mov1 move2 move3 split ...
If a helper thread would now copy the movelist(talking of the path) ( and necessary stack data) , it is

1. able to join the splitpoint
2. the master would be able to go in holidays.
3. or the helper thread might split even on an earlier depth.

To get something like this to work, the necessary communication must be setup. The 2 threads would need to merge at this earlier split point.
And maybe, just to mention it, the helper thread would need to start with the alpha beta window existing at this earlier split.
Beside the fact that a thread can join ( late join ) everywhere it would also be able to split everywhere, not exclusively in the subtree of the the masters split point.
You are thinking about the WRONG data. I am talking about the call stack that contains return addresses for the recursive calls to Search(). Other threads don't have access to that. A thread can only split (in Crafty anyway) at the current ply it is at in the tree. There is no way to split at an earlier ply until you back up to that ply. If that happens to be on the other side of a split point, it can't happen at all because no one can back up through that split point except the original thread that got there, and the original thread can't back up through it (to split at an earlier point) until that split point has been completely searched and retired.



Well, currently i am certainly not interested in building such a solution, because i like it simple. But it does not look like that NOBODY would be able to do such a thing.
At this theoretical point, i don't think of course of benefits and disadvantages, just a thought if it _can_ be done.

So, please consider this statement as what it is, just brainstorming. :wink: :)

Currently my solution is simply the spinning helper threads "polling" the split point stack looking for work.
Everytime able to late join everywhere and additionally able to open their own splitpoint. Then the roundabout turns and turns.
The point is, that i currently don't see the practical profit of such more flexibility, given by passing tree state approaches.

So, now it is time for my first coffee of the day. Good morning everybody 8-)
Well, maybe we are talking of different things. But what i described is a logic that allows a helper thread to late join anytime and anywhere,
split in a subtree of a master or even split in front of it. It also can simply replace the master. There's no need to hack the stack data you mention.

Each thread will have it's own search stack.

Let's say master did split at ply 4 with the path 1.e4 e5 2. Nf3 Nc6 (Split master).
A helper thread may replay 1.e4 ( copying from the current master following the current search ) and then decides to split at ply 2.
Would it now be a problem to exclude e5 from the search and let the master know that it has to pass the result of this move to the helper thread for this move.
After the result has passed the master can end in smoke if it likes.

I don't see why the thread stack in any way can limit the logic.
I repeat myself, and i don't say this is somehow a good idea, i only argue that it is possible.
Normally "late join" means joining at an existing split point. A,B and C split. Then later D joins at that split point to avoid the overhead of another split operation...

I don't see how, within my existing framework, anyway, to split per your example above. To split requires that the "master" do some work at that point to allocate a new split block for itself, and for any threads that are going to help. There is no way to do this back up the tree, because when I would get back to that point, I would need to be using the new split block, but the address saved in the call stack would be the original split block...
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Smp concepts

Post by Desperado »

BASIC
======

( skip this, already described what i am doing n times, maybe as quick reference for the "BRANCH" below )

1.

There is a splitpoint ressource. The most simple thing would be to have a global array of splitpoints like

* splitpoint[SPLITPOINTNUMBER]
* alternatively per Thread

2.

The main thread A starts is work, wants to split, and checks if there is a splitpoint unused.
It registers, then owns this splitpoint until it will be freed by thread A.


3. ( at this point i will branch later, but first the classical doing )

A helper thread B scans all splitpoints, gets the information that there is a used splitpoint it
can join ( late join, because it wasn't considered before, to help at this splitpoint ).
It just has to register itself at this splitpoint and copies the data needed like position etc.
Now Thread A+B working on this splitpoint. Thread A and Thread B can reserve other splitpoints
while running the subtree. ( n other threads can do what Thread B did so far )

3a. No more work, Thread B finishs first. It exits. Thread A finishes and can exit immediatelly.
3b. No more work, Thread A finishes before B. Now, instead of waiting Thread A scans the splitpoints,
where it can join as slave ( while "waiting" ) and joins somewhere. If Thread B has done its work,
Thread A will come back sometime after it fullfilled its job as helpful master. The splitpoint can be freed now.
The detail if thread A should join a splitpoint reserved by its slave B, i am just ignoring now, because it
is not important for the general flow.

Notice:
--------

That is the basic algorithm which works for n threads the same way as for A+B. ( By the way works fine )
Further it is important that in this approach, the threads assign work by themselves. There is no communication
between the threads.

BRANCH
=======

Now, the question is what has to be done if Thread B should be able to remove Thread A from the search, and
finish the search by itself. At this point it is important to notice that this requirement only includes the
potential to setup different algorithms, but does not change anything so far.

How can we make Thread B independent of Thread A. As I already said, the idea is pretty simple.
We need to know the current path of Thread A. Keeping the algorithm used so far, Thread B prepares itself
for some work to do, as it setups itself with the splitpoint data provided by Thread A.

Additionally it copies the path to that splitpoint, building its stack by playing out the path and starts searching
from this splitpoint picking the next move at hand. Like done before too. When Thread A returns to this splitpoint
it can be terminated. Thread B will be able to run the remaining search as exclusive thread if you like.

Until now the algorithm does not have changed. All we gained is that the dependency is dissolved between Thread A+B.
But now the higher potential kicks in. Thread B can instruct Thread A where to terminate along the path.

Example1: Split at ply 4
1.e4 e5 2.Nf3 Nc6 ... Thread A
1.e4 e5 2.Nf3 Nf6 ... Thread B -> tell thread A to terminate when it comes back to ply 4.

Example2: Split in front ( for whatever reason )
1.e4 e5 2.Nf3 Nc6 ... Thread A
1.e4 e5 2.Nc3 ... Thread B splits at ply 3. -> tell thread A to terminate when it comes back to ply 3.

The second example doesn't match the algorithm described anymore. It offers the option that Thread B can
join the splitpoint or even can split before, excluding the moves already searched at this ply. I guess that
it is obvious that to terminate Thread A is just a substitution for it can do whatever it wants, like doing
another search job. Finally i don't consider such a split before as serious option.

But:
----

The idea shows, and this can be a serious option, that a helper thread does not need to wait for anything.
Neither to be signaled by another thread nor a splitpoint object that it can join. It can jump in whenever
it likes. If a path already includes a splitpoint it can join, if not it can open a splitpoint itself along the
path. Just because the path makes the available threads autonomous.

Well, i must admit that up to date the "BRANCH" is just an idea, not based on some practical experience.
On the other hand i don't see a serious problem why this is not manageable.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Smp concepts

Post by bob »

Desperado wrote:BASIC
======

( skip this, already described what i am doing n times, maybe as quick reference for the "BRANCH" below )

1.

There is a splitpoint ressource. The most simple thing would be to have a global array of splitpoints like

* splitpoint[SPLITPOINTNUMBER]
* alternatively per Thread

2.

The main thread A starts is work, wants to split, and checks if there is a splitpoint unused.
It registers, then owns this splitpoint until it will be freed by thread A.


3. ( at this point i will branch later, but first the classical doing )

A helper thread B scans all splitpoints, gets the information that there is a used splitpoint it
can join ( late join, because it wasn't considered before, to help at this splitpoint ).
It just has to register itself at this splitpoint and copies the data needed like position etc.
Now Thread A+B working on this splitpoint. Thread A and Thread B can reserve other splitpoints
while running the subtree. ( n other threads can do what Thread B did so far )

3a. No more work, Thread B finishs first. It exits. Thread A finishes and can exit immediatelly.
3b. No more work, Thread A finishes before B. Now, instead of waiting Thread A scans the splitpoints,
where it can join as slave ( while "waiting" ) and joins somewhere. If Thread B has done its work,
Thread A will come back sometime after it fullfilled its job as helpful master. The splitpoint can be freed now.
The detail if thread A should join a splitpoint reserved by its slave B, i am just ignoring now, because it
is not important for the general flow.

So far, that is almost exactly what I do (and have done for years) in Crafty. You are using the term "split point" a little more loosely than I do, because a split point in Crafty does not exist until a split has actually been done when some thread reports in as "idle, waiting for work."


Notice:
--------

That is the basic algorithm which works for n threads the same way as for A+B. ( By the way works fine )
Further it is important that in this approach, the threads assign work by themselves. There is no communication
between the threads.

Which begs this question. A reaches ply 10 and keeps right on going. plies 8, 9, 10, 11, 12, etc all are using the same "split block" which was created somewhere back up in the tree (ply 8 or earlier). How does a thread join at ply 10 when A is now down at 20 or whatever? When A backs up through the tree and gets to ply 10, it still has a pointer to the same split block that it grabbed when it originally split prior to ply 8. How do you sneak in and change the call stack data so that when it gets back to ply 10 it will know (a) that it can't back up any further since this is a split point; and (b) that it should now not be using the original split data but the data that was created when this "sneaky split" was done at ply=10 by a different thread.




BRANCH
=======

Now, the question is what has to be done if Thread B should be able to remove Thread A from the search, and
finish the search by itself. At this point it is important to notice that this requirement only includes the
potential to setup different algorithms, but does not change anything so far.

How can we make Thread B independent of Thread A. As I already said, the idea is pretty simple.
We need to know the current path of Thread A. Keeping the algorithm used so far, Thread B prepares itself
for some work to do, as it setups itself with the splitpoint data provided by Thread A.

Additionally it copies the path to that splitpoint, building its stack by playing out the path and starts searching
from this splitpoint picking the next move at hand. Like done before too. When Thread A returns to this splitpoint
it can be terminated. Thread B will be able to run the remaining search as exclusive thread if you like.
We are using a different definition for this call stack it would seem. My terminology is the stack that is produced when Search() is recursively calling itself. Just walking down a PV recursively to update the data to the current split block is not good enough. The call stack return addresses point to the WRONG places in memory, namely they point to the return address produced by the "PV-walker" as opposed to the actual Search() stack in the original thread...

Until now the algorithm does not have changed. All we gained is that the dependency is dissolved between Thread A+B.
But now the higher potential kicks in. Thread B can instruct Thread A where to terminate along the path.

Example1: Split at ply 4
1.e4 e5 2.Nf3 Nc6 ... Thread A
1.e4 e5 2.Nf3 Nf6 ... Thread B -> tell thread A to terminate when it comes back to ply 4.

Example2: Split in front ( for whatever reason )
1.e4 e5 2.Nf3 Nc6 ... Thread A
1.e4 e5 2.Nc3 ... Thread B splits at ply 3. -> tell thread A to terminate when it comes back to ply 3.

The second example doesn't match the algorithm described anymore. It offers the option that Thread B can
join the splitpoint or even can split before, excluding the moves already searched at this ply. I guess that
it is obvious that to terminate Thread A is just a substitution for it can do whatever it wants, like doing
another search job. Finally i don't consider such a split before as serious option.

But:
----

The idea shows, and this can be a serious option, that a helper thread does not need to wait for anything.
Neither to be signaled by another thread nor a splitpoint object that it can join. It can jump in whenever
it likes. If a path already includes a splitpoint it can join, if not it can open a splitpoint itself along the
path. Just because the path makes the available threads autonomous.

Well, i must admit that up to date the "BRANCH" is just an idea, not based on some practical experience.
On the other hand i don't see a serious problem why this is not manageable.
Here's the problem I have:

Thread A searches ply 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... and is somewhere below that. It is using split block zero (0) where I am assuming we just started a new iteration and are following the left-most edge of the tree down to max depth. No YBW criterion has been met, yet, so no splits.

Now, using your example, how can thread B magically split with A back at ply=10. As A works its way back up the tree, when it gets to ply=10, it still has a pointer to split block 0. If I had actually split at ply=10, thread A would get a new split block (1) and thread B would get split block (2) so that they have private data and don't interfere with each other.

Given my current framework, there is no way to do that split back up the tree without having thread A do the work. A would need to stop, work its way back up the call stack to ply 10 by returning, then do a split, and then work its way back down the tree to the correct ply where it can continue.

At least in the way I do this, your idea seems unworkable. If we get rid of the recursive Search() calls, and go to an iterated search (where Search() is called once, and iterates over ply to run the entire search with that one call, as I did in Cray Blitz) then it is easy, because there is no call stack, just global data and any thread can change the tree data, split wherever it wants, without breaking anything. I just don't want to give up the recursive search at the moment.