Assuming an Amdahl-like behavior:Laskos wrote:Adam Hair wrote:
Very interesting, Adam. I have no possibility to check at very shallow depths 1->4 cores speedup, as the granularity of what I get is 1 second, nevertheless I did it for Houdini 3 for depth from 14 to 21, and fitted to a shifted logistic. What you fitted the data with? Simple Amdahl's a*x/(b + x) doesn't work well. The last point (depth 21) I got in 3 hours, 30 positions repeated 5 times.
Are you sure that the limiting value is 4 and not nps(4 cores)/nps(1 core)? Very good observation that TTD is dependent on depth.
S = (R + 1) * n / (R + n) eq.1
S = speed up, n = number of cores and R = Np/Ns, which is the ratio of parallelizable nodes/non-parallelizable (serial) nodes.
Then, Np and Ns can be approximated by
Np = f1 * F^(d-5) (exponential growth of parallelizable nodes from ply 5 an on)
Ns = f2 * G^(d-5) + f3 (exponential growth of serial nodes + fixed number existent at ply five)
where
F is the branching factor for the parallelizable nodes
G is the branching factor for the serial nodes
d is depth, which is substracted 5 since I do not use SMP in the last 4 plies.
f1,f2,f3 are proportionality parameters.
Then the ratio becomes
R = f1 * F^(d-4) / (f3 + f2 * G^(d-4))
dividing by f2
R = f1/f2 * F^(d-4) / (f3/f2 + G^(d-4))
if
a = f1/f2
b = f3/f2
R = a * F^(d-4) / (b + G^(d-4)) eq. 2
So plugging eq. 2 into eq. 1 will give an equation S = f(d).
In the plot Adam showed above the fitting was
a 0.57083499939221
F 2.5903162588137
G 2.14909239883083
b 25.1209749090981
Miguel