Since my experiments to improve on lazy SMP in texel by using ideas from the
ABDADA algorithm were
quite successful, I decided to write a summary of the algorithm I ended up with.
Differences compared to the original ABDADA algorithm
There are three categories of differences:
* The ABDADA exclusive condition is only approximately enforced, in order to reduce overhead, especially for clusters.
* Ideas from "lazy SMP" type algorithms that were successful in texel.
* Ideas from the texel "lazy cluster" algorithm.
Busy bit
* In transposition table entries, the "nproc" counter has been replaced by a single "busy" bit.
* There is no sequential consistency when updating the transposition table. The busy bit can sometimes be wrong.
* Handling of the busy bit and exclusive searches are only performed if remaining depth >= 7.
* The busy bit is independent of the rest of the information in the transposition table. In the original ABDADA algorithm a transposition table probe cannot return a score for a node currently being searched.
* If a transposition table entry does not already exist for a position, the entry is not created just to be able to store the busy bit.
From lazy SMP
* If a helper thread finishes its search before the master thread, the result from the helper thread is used instead of waiting for the master to finish its search.
* After a helper thread finishes its search, it will immediately restart its current search at a larger depth (1 ply deeper).
* The transposition table is shared between all threads running on an SMP or NUMA computer.
From lazy cluster
* Each node has its own local transposition table. All probe and insert operations act only on the local transposition table.
* A task that runs independently of the tree search algorithm sends interesting transposition table entries to other cluster nodes periodically.
Pseudo code
The part of the algorithm that deals with the recursive search function works as follows:
Code: Select all
int approximateAbdada(Position pos, int alpha, int beta, int depth, bool exclusive) {
// Check for abort
// Repetition check
TTEntry& e = tt.probe(pos);
if (e.cutOff(alpha, beta, depth))
return e.score;
if (depth >= 7) {
if (exclusive && e.busy)
return BUSY;
e.busy = true;
}
// null move, forward pruning, etc
MoveList moves = generateMoves(pos);
bool searched[256];
int nextDepth[256];
for (int pass = 0; pass < 2; pass++) {
for (int i = 0; i < moves.size(); i++) {
if (pass == 0)
nextDepth[i] = depth - 1 + extension - reduction;
else if (searched[i])
continue;
bool nextExclusive = pass == 0 && i > 0;
pos.makeMove(moves[i]);
int score = -approximateAbdada(pos, -beta, -alpha, nextDepth[i], nextExclusive);
pos.unMakeMove(moves[i]);
searched[i] = score != -BUSY;
if (!searched[i])
continue;
// Update alpha and bestScore, break if done
}
}
tt.store(pos, bestScore); // Also sets entry.busy = false;
return bestScore;
}
For the full SMP/cluster algorithm, the approximateAbdada algorithm has to be combined with the
original lazy SMP/cluster algorithm, basically by replacing the negaScout function with the approximateAbdada function.
Advantages
* The algorithm is fully asynchronous. No part of the algorithm requires waiting for results from other threads or other cluster nodes.
* Low latency cluster communication is not essential. Gigabit ethernet is enough.