Lockfree parallel MCTS - how?

dangi12012 · Post by **dangi12012** » Fri Oct 07, 2022 4:48 pm

MCTS: A leaf node is selected for evaluation. Then this result gets backtracked towards the tree root. All nodes can be updated atomically without locks.

Now when there are multiple threads working on the tree, and they are picking nodes around the same time - they will all end up at the same node, since nothing else is updated yet.
This seems like a complete waste of resources - ideally MCTS should scale to 100s of threads without any inter-threat communication but still picking different nodes.

If two sibling nodes are close enough should randomness decide which path to take towards the MCTS leaf?

Any devs here that thought about this already?

smatovic · Post by **smatovic** » Fri Oct 07, 2022 7:31 pm

If you use MCTS-UCT with UCB1 this will normalize after a certain amount of playouts, and you usually do not need locks for this, dunno about MCTS-PUCT in Lc0/A0.

--
Srdja

smatovic · Post by **smatovic** » Sat Oct 08, 2022 9:46 am

MCTS consists of four strategic phases, repeated as long as there is time left [13] :

In the Selection phase the tree is traversed from the root node until it selects a leaf node that is not added to the tree yet
The Expansion strategy adds the leaf node to the tree
The Simulation strategy plays moves in self-play until the end of the game. The result is either 1, 0 ,-1
The Backpropagation strategy propagates the results through the tree

https://www.chessprogramming.org/Monte- ... ree_Search

If you mean the expansion phase of a leaf node, to add the node to the game tree in memory, then I guess you really need locks for this. In my parallel BestFirst implementation I simply restart the selection phase if a thread encounters a lock for expansion.

In one of my MCTS-UCT implementations I did limit the expansion phase, can not recall what the papers say about this, let's say you perform a 1000 MC playouts on a leaf, and then the node gets expanded and added to the tree, or alike, to limit the memory usage.

--
Srdja

dangi12012 · Post by **dangi12012** » Sat Oct 08, 2022 12:16 pm

smatovic wrote: ↑Sat Oct 08, 2022 9:46 am If you mean the expansion phase of a leaf node, to add the node to the game tree in memory, then I guess you really need locks for this. In my parallel BestFirst implementation I simply restart the selection phase if a thread encounters a lock for expansion.

I mean MCTS with many threads when a node is getting expanded. If other threads select a node via ucb1 around the same time they will all end up at the same node since for each thread no tree value changed in the meantime.

If the implementation is like yours - 1 thread will expand and 63 threads will reach the same node then (via ucb1) and restart until this is done?

smatovic · Post by **smatovic** » Sat Oct 08, 2022 12:42 pm

dangi12012 wrote: ↑Sat Oct 08, 2022 12:16 pm
smatovic wrote: ↑Sat Oct 08, 2022 9:46 am If you mean the expansion phase of a leaf node, to add the node to the game tree in memory, then I guess you really need locks for this. In my parallel BestFirst implementation I simply restart the selection phase if a thread encounters a lock for expansion.
I mean MCTS with many threads when a node is getting expanded. If other threads select a node via ucb1 around the same time they will all end up at the same node since for each thread no tree value changed in the meantime.

You can propagate in this case the lock flag back in the tree, so other threads do not select this path, but if you increase the visit counters for UCB1 the path to select will also change over time.

dangi12012 wrote: ↑Sat Oct 08, 2022 12:16 pm If the implementation is like yours - 1 thread will expand and 63 threads will reach the same node then (via ucb1) and restart until this is done?

Yes, they may reach the same node and then restart the selection phase. There are several ways to enhance UCB1 with more parameters, as you suggested, randomness is one of those.

--
Srdja

dangi12012 · Post by **dangi12012** » Sat Oct 08, 2022 9:08 pm

Ah I see - then I see a solution now.

Each leaf node has a std::atomic<bool> which gets set on entering the expansion or eval phase. This is not a lock but more like a "do not enter" sign.
When a thread encounters a leaf and the bool is false - it will just try the next sibling or backtrack to the parent recursively to get the next node according to ucb1.
This way each thread can do meaningful work and is not reset to root, neither are nodes visited unnecessarily nor is ucb1 calculated too often.

smatovic · Post by **smatovic** » Sun Oct 09, 2022 7:48 am

If you propagate the lock flag (IIRC I used a int with thread id+1) back in the tree you can update the score tree with the locked node excluded, I did this with BestFirst with game tree in memory, dunno if this is feasible with UCB1, to select simply the next sibling might be quicker, but from the score tree point of view the wrong decision, if the node was expaned and added to the tree, you can free the lock and reupdate the score tree again.

--
Srdja

smatovic · Post by **smatovic** » Sun Oct 09, 2022 10:45 am

Okay, maybe to summarize what I did with BestFirst and MCTS-UCT with game tree in memory in different implementations in case of an locked expansion node:

- you can modify UCB1 with more parameters, randomness, to vary selection
- you can just increment UCB1 visit counters and restart selection phase if a thread encounters a lock for expansion
- you can back-propagate the lock through the tree so other threads select a different path
- you can update the score tree back in the tree with locked node excluded so other threads select a different path

I am not aware of any lock-less method in case of an expansion node, and idk how Lc0/A0 handles this with MCTS-PUCT.

--
Srdja

AAce3 · Post by **AAce3** » Sun Oct 09, 2022 5:34 pm

smatovic wrote: ↑Sun Oct 09, 2022 10:45 am I am not aware of any lock-less method in case of an expansion node, and idk how Lc0/A0 handles this with MCTS-PUCT.

I don't know if my interpretation of the code is correct, however, I believe that Lc0 uses locks. The main thing is that the neural network evaluations take so much time that most of the time, when expanding a node, you're going to be sending stuff to GPU, waiting for it to return something, rather than traversing the tree, which takes a short time comparatively.

Feel free to correct me if I'm wrong though! I don't use C++ so my guess may be completely wrong.

Daniel Shawul · Post by **Daniel Shawul** » Sun Oct 09, 2022 5:47 pm

Scorpio mcts is lockless. I use atomic operations and try_lock in places where you usually need locks.
This is important especially because i use upto 640 threads for mcts search, while leela typically launches 2 or 4 threads.

Lockfree parallel MCTS - how?

Lockfree parallel MCTS - how?

Re: Lockfree parallel MCTS - how?

Re: Lockfree parallel MCTS - how?

Re: Lockfree parallel MCTS - how?

Re: Lockfree parallel MCTS - how?

Re: Lockfree parallel MCTS - how?

Re: Lockfree parallel MCTS - how?

Re: Lockfree parallel MCTS - how?

Re: Lockfree parallel MCTS - how?

Re: Lockfree parallel MCTS - how?