Daniel Shawul wrote:
The problem is that my old algorithm was not YBW, it was a home made algorithm that admittedly has many YBW and DTS-like properties, but still is quite different. For one thing, it keeps all potential parallel tasks in a priority queue data structure, and that structure is only accessed under a lock.

I need to find a way to efficiently parallelize this data structure, and I don't want to implement locking primitives using message passing.

A centralized work queue (probably very inefficient) or a work-stealing like algorithm where each processor keeps its own queue ?

It is a centralized data structure. I got it to work reasonably well on SMP/NUMA machines by using various tricks, like:

* Dynamically adjusting the minimum split depth based on measured lock contention.

* Use a hand written heap implementation that is faster than the gcc STL version.

* Use trylock when attempting to insert into the queue. If the lock is busy store the splitpoint locally and then try again in the next search tree node, and perform a batch insert once you get the lock.

Nevertheless it is obvious that a centralized queue is not going to work well for a large cluster (maybe not even for any cluster). I would like to have some data structure that works like a priority queue on a local scale, but does something more scalable on a non-local scale. Some kind of approximate priority queue I guess.

Daniel Shawul wrote:Also the data structure is actually more complicated that a priority queue, because it also has operations to recompute probability estimates and propagate those estimates between parent/child relationships between splitpoints.

A probability estimate for the likelihood of the success of a splitpoint? It seems a complicated algorithm indeed;

Yes, for each move in a splitpoint a probability is computed that a sequential search would have searched that move. The computation is based on measuring how often a fail high occurs for various move numbers and node types.

This probability changes every time a search for a move in the splitpoint completes. For example, in a position the probability that move 4 needs to be searched might be 70% after the search for move 1 has finished, but if the search for move 2 has also finished, the probability that move 4 is needed might be 95%. Therefore every time a search finishes for a move in a splitpoint, the priority queue needs to be updated.

For splitpoints not created by the master thread, there is another complication. The probability that a move in that splitpoint is needed by the master thread is computed from several parts:

The probability computed by fail high statistics for the move in that splitpoint.

The probability that the splitpoint itself is needed by the master thread.

The probability that the parent splitpoint is needed by the master thread.

...

The final probability is obtained by multiplying those probabilities.

This means that when the search finishes for a move at a splitpoint, the probability needs to be updated also for all child splitpoints.

Quite complicated indeed. That's why I hope some simpler algorithm could perform equally well, or even better.

Daniel Shawul wrote:will be looking forward to the results of ABDADA and your algorithm. You should probably aim for inventing algorithms on clusters with thousands of nodes, as it seems SMP and cluster with few nodes is a "dead issue" IMO.

Yes, I am aiming for thousands of nodes, but I will have no way to actually test that in the foreseeable future.