On 150 positions, with depths chosen for each engine so that the time used on 1 core is about 1-2 minutes per position, I measured the average time-to-depth and the average NPS.
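For reference, the arithmetic behind the two averages, as a minimal sketch with made-up numbers (and assuming the speedups are averaged per position rather than computed from totals): per position, time-to-depth speedup is T(1 thread)/T(N threads) and NPS speedup is NPS(N threads)/NPS(1 thread).
Code:
// Hypothetical data for two positions, only to illustrate the arithmetic.
#include <cstdio>
#include <vector>

struct Run {
    double time1, timeN;   // seconds to reach the fixed depth, 1 vs. N threads
    double nps1, npsN;     // nodes per second, 1 vs. N threads
};

int main() {
    std::vector<Run> runs = { {  90.0, 45.0, 1.0e6, 1.9e6 },
                              { 120.0, 70.0, 1.1e6, 2.0e6 } };
    double ttd = 0.0, nps = 0.0;
    for (const Run& r : runs) {
        ttd += r.time1 / r.timeN;   // time-to-depth speedup for this position
        nps += r.npsN  / r.nps1;    // NPS speedup for this position
    }
    std::printf("average time-to-depth speedup: %.2f\n", ttd / runs.size());
    std::printf("average NPS speedup:           %.2f\n", nps / runs.size());
}
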
Time-to-Depth Speedup:
Code:
Name                 1 thread   2 threads   4 threads   8 threads
Zappa (DTS)              1.00        1.99        3.65        5.62
Stockfish (YBWC)         1.00        1.93        3.04        4.76
Houdini (YBWC)           1.00        1.81        2.78        4.13
Cheng (Lazy SMP)         1.00        1.57        2.13        2.42
Andscacs (Lazy SMP)      1.00        1.29        1.74        1.54
Komodo (Lazy SMP ?)      1.00        1.41        1.54        1.83

The first thing observed: Lazy SMP engines seem to behave differently from the two more conventional parallelization implementations (DTS and YBWC). Komodo seems to fall into the Lazy SMP group on this characteristic.
For Zappa, Stockfish and Houdini, time-to-depth speedup is equal to Effective Speedup, as they don't widen their search. But Lazy SMP engines (and Komodo) widen a lot during parallel search when going from 1 to 8 threads, so their Effective Speedup is larger than the time-to-depth speedup shown here. The widening is shown below by fixed-depth matches run with cutechess-cli:
SPRT bounds: H1 = 40 Elo points, H0 = 0 points, alpha = beta = 0.05.
Score of Cheng 8 threads vs Cheng 1 thread: 91 - 51 - 71 [0.594] 213
ELO difference: 66
SPRT: H1 was accepted
Finished match
Score of Andscacs 8 threads vs Andscacs 1 thread: 16 - 1 - 13 [0.750] 30
ELO difference: 191
SPRT: H1 was accepted
Finished match
Score of Komodo 8 threads vs Komodo 1 thread: 29 - 13 - 63 [0.576] 105
ELO difference: 53
SPRT: H1 was accepted
Finished match
Komodo again seems similar to Lazy SMP engines in widening.
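As a side note, the Elo differences reported above follow from the match scores through the standard logistic relation Elo = -400 * log10(1/score - 1); a quick check reproduces the three numbers:
Code:
#include <cmath>
#include <cstdio>

// Elo difference implied by a match score (fraction of points won).
double eloFromScore(double score) {
    return -400.0 * std::log10(1.0 / score - 1.0);
}

int main() {
    std::printf("Cheng    0.594 -> %+.0f Elo\n", eloFromScore(0.594)); // ~ +66
    std::printf("Andscacs 0.750 -> %+.0f Elo\n", eloFromScore(0.750)); // ~ +191
    std::printf("Komodo   0.576 -> %+.0f Elo\n", eloFromScore(0.576)); // ~ +53
}
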
The NPS speedup then came as a collateral measurement, and it is a useful limiting value for the Effective Speedup.
NPS Speedup:
Code:
Name                 1 thread   2 threads   4 threads   8 threads
Zappa (DTS)              1.00        1.96        3.59        5.60
Stockfish (YBWC)         1.00        1.97        3.75        6.85
Houdini (YBWC)           1.00        1.97        3.79        6.33
Cheng (Lazy SMP)         1.00        1.98        3.92        7.22
Andscacs (Lazy SMP)      1.00        1.93        3.71        7.23
Komodo (Lazy SMP ?)      1.00        1.99        3.93        7.09

Lazy SMP engines seem to scale the best on NPS. For Zappa (DTS) the NPS speedup matched, within error margins, the total Effective Speedup given by time-to-depth. I don't understand why Lazy SMP engines, with their shared hash table (TT), widen so copiously and go deeper only partially.
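For readers unfamiliar with the scheme, here is a minimal sketch of the Lazy SMP idea (not taken from any of the engines above, and with the search body reduced to a placeholder): every thread runs the same iterative deepening on the same root position, sharing nothing but the transposition table, with starting depths staggered in one common variant so the threads diverge. Presumably the partly redundant work the helpers put into lines the main thread would have reduced or pruned is what shows up as widening rather than as pure time-to-depth gain.
Code:
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// One shared transposition-table entry; relaxed atomics keep the sketch
// data-race free without pretending to be a real lockless TT scheme.
struct TTEntry {
    std::atomic<uint64_t> key{0};
    std::atomic<int>      depth{0};
    std::atomic<int>      score{0};
};

constexpr std::size_t TT_SIZE = 1 << 20;
std::vector<TTEntry> tt(TT_SIZE);
std::atomic<bool> stop{false};

// Placeholder for a real alpha-beta search: probe the shared TT, otherwise
// "search" (here: a dummy evaluation) and store the result for other threads.
int search(uint64_t key, int depth) {
    TTEntry& e = tt[key % TT_SIZE];
    if (e.key.load(std::memory_order_relaxed) == key &&
        e.depth.load(std::memory_order_relaxed) >= depth)
        return e.score.load(std::memory_order_relaxed);  // hit written by any thread
    int score = depth;                                   // dummy result
    e.key.store(key, std::memory_order_relaxed);
    e.depth.store(depth, std::memory_order_relaxed);
    e.score.store(score, std::memory_order_relaxed);
    return score;
}

// Every thread iterates on the same root; odd threads start one ply deeper
// so they do not march in lock step. No split points, no work sharing
// beyond the TT: that is the whole "lazy" part.
void lazyWorker(uint64_t rootKey, int threadId, int maxDepth) {
    for (int d = 1 + threadId % 2; d <= maxDepth && !stop; ++d)
        search(rootKey, d);
}

int main() {
    const uint64_t rootKey = 0x123456789abcdefULL;  // hypothetical root hash
    std::vector<std::thread> threads;
    for (int i = 0; i < 8; ++i)
        threads.emplace_back(lazyWorker, rootKey, i, 20);
    for (auto& t : threads)
        t.join();
}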