Time to depth measuring tool

michiguel · Post by **michiguel** » Wed Jul 31, 2013 12:16 am

Laskos wrote:
Adam Hair wrote:

Very interesting, Adam. I have no possibility to check at very shallow depths 1->4 cores speedup, as the granularity of what I get is 1 second, nevertheless I did it for Houdini 3 for depth from 14 to 21, and fitted to a shifted logistic. What you fitted the data with? Simple Amdahl's a*x/(b + x) doesn't work well. The last point (depth 21) I got in 3 hours, 30 positions repeated 5 times.

Are you sure that the limiting value is 4 and not nps(4 cores)/nps(1 core)? Very good observation that TTD is dependent on depth.

Assuming an Amdahl-like behavior:

S = (R + 1) * n / (R + n) eq.1

S = speed up, n = number of cores and R = Np/Ns, which is the ratio of parallelizable nodes/non-parallelizable (serial) nodes.

Then, Np and Ns can be approximated by

Np = f1 * F^(d-5) (exponential growth of parallelizable nodes from ply 5 an on)
Ns = f2 * G^(d-5) + f3 (exponential growth of serial nodes + fixed number existent at ply five)

where
F is the branching factor for the parallelizable nodes
G is the branching factor for the serial nodes
d is depth, which is substracted 5 since I do not use SMP in the last 4 plies.
f1,f2,f3 are proportionality parameters.

Then the ratio becomes
R = f1 * F^(d-4) / (f3 + f2 * G^(d-4))

dividing by f2
R = f1/f2 * F^(d-4) / (f3/f2 + G^(d-4))

if
a = f1/f2
b = f3/f2

R = a * F^(d-4) / (b + G^(d-4)) eq. 2

So plugging eq. 2 into eq. 1 will give an equation S = f(d).

In the plot Adam showed above the fitting was

a 0.57083499939221
F 2.5903162588137
G 2.14909239883083
b 25.1209749090981

Miguel

Laskos · Post by **Laskos** » Wed Jul 31, 2013 4:07 pm

michiguel wrote:
Assuming an Amdahl-like behavior:

S = (R + 1) * n / (R + n) eq.1

S = speed up, n = number of cores and R = Np/Ns, which is the ratio of parallelizable nodes/non-parallelizable (serial) nodes.

Then, Np and Ns can be approximated by

Np = f1 * F^(d-5) (exponential growth of parallelizable nodes from ply 5 an on)
Ns = f2 * G^(d-5) + f3 (exponential growth of serial nodes + fixed number existent at ply five)

where
F is the branching factor for the parallelizable nodes
G is the branching factor for the serial nodes
d is depth, which is substracted 5 since I do not use SMP in the last 4 plies.
f1,f2,f3 are proportionality parameters.

Then the ratio becomes
R = f1 * F^(d-4) / (f3 + f2 * G^(d-4))

dividing by f2
R = f1/f2 * F^(d-4) / (f3/f2 + G^(d-4))

if
a = f1/f2
b = f3/f2

R = a * F^(d-4) / (b + G^(d-4)) eq. 2

So plugging eq. 2 into eq. 1 will give an equation S = f(d).

In the plot Adam showed above the fitting was

a 0.57083499939221
F 2.5903162588137
G 2.14909239883083
b 25.1209749090981

Miguel

Thanks, the critical point here is that F>G, i.e. Np/Ns explodes with depth, so that limit d->infinity for TTD is number of cores n, and for d=5 TTD is close to 1 if a<<b (or f1<<f3). You mean (d-5) in the expression for R, not (d-4), right?

I fitted that for Houdini 3, but with very few points, and I don't really know how Houdini behaves at shallow depths:
a -> 0.482442, b -> 11.0359, F -> 1.94741, G -> 1.62379

Without a model for Houdini, the shifted logistic with 2 parameters fits well too:

Adam's plot is conclusive that your model works for Gaviota, very interesting.

Laskos · Post by **Laskos** » Wed Jul 31, 2013 11:15 pm

michiguel wrote:
Assuming an Amdahl-like behavior:

S = (R + 1) * n / (R + n) eq.1

S = speed up, n = number of cores and R = Np/Ns, which is the ratio of parallelizable nodes/non-parallelizable (serial) nodes.

Then, Np and Ns can be approximated by

Np = f1 * F^(d-5) (exponential growth of parallelizable nodes from ply 5 an on)
Ns = f2 * G^(d-5) + f3 (exponential growth of serial nodes + fixed number existent at ply five)

where
F is the branching factor for the parallelizable nodes
G is the branching factor for the serial nodes
d is depth, which is substracted 5 since I do not use SMP in the last 4 plies.
f1,f2,f3 are proportionality parameters.

Then the ratio becomes
R = f1 * F^(d-4) / (f3 + f2 * G^(d-4))

dividing by f2
R = f1/f2 * F^(d-4) / (f3/f2 + G^(d-4))

if
a = f1/f2
b = f3/f2

R = a * F^(d-4) / (b + G^(d-4)) eq. 2

So plugging eq. 2 into eq. 1 will give an equation S = f(d).

In the plot Adam showed above the fitting was

a 0.57083499939221
F 2.5903162588137
G 2.14909239883083
b 25.1209749090981

Miguel

Miguel, isn't the TTD limited by NPS(4 cores)/NPS(1 core) as depth->infinity? I managed to get NPS speedup 1->4 cores for different depths with Houdini 3, first with LittleBlitzer for low depths, then with Arena for high depths. The NPS speedup curve is fitted very well with your model
a -> 1.52095, b -> 35.4931, F -> 2.3887, G -> 2.10453

It seems that NPS speedup is always higher than TTD, approaching each other at high and very low depths. Isn't the NPS speedup the limiting envelope of TTD 1->4 cores?

michiguel · Post by **michiguel** » Wed Jul 31, 2013 11:35 pm

Laskos wrote:
michiguel wrote:
Assuming an Amdahl-like behavior:

S = (R + 1) * n / (R + n) eq.1

S = speed up, n = number of cores and R = Np/Ns, which is the ratio of parallelizable nodes/non-parallelizable (serial) nodes.

Then, Np and Ns can be approximated by

Np = f1 * F^(d-5) (exponential growth of parallelizable nodes from ply 5 an on)
Ns = f2 * G^(d-5) + f3 (exponential growth of serial nodes + fixed number existent at ply five)

where
F is the branching factor for the parallelizable nodes
G is the branching factor for the serial nodes
d is depth, which is substracted 5 since I do not use SMP in the last 4 plies.
f1,f2,f3 are proportionality parameters.

Then the ratio becomes
R = f1 * F^(d-4) / (f3 + f2 * G^(d-4))

dividing by f2
R = f1/f2 * F^(d-4) / (f3/f2 + G^(d-4))

if
a = f1/f2
b = f3/f2

R = a * F^(d-4) / (b + G^(d-4)) eq. 2

So plugging eq. 2 into eq. 1 will give an equation S = f(d).

In the plot Adam showed above the fitting was

a 0.57083499939221
F 2.5903162588137
G 2.14909239883083
b 25.1209749090981

Miguel
Miguel, isn't the TTD limited by NPS(4 cores)/NPS(1 core) as depth->infinity? I managed to get NPS speedup 1->4 cores for different depths with Houdini 3, first with LittleBlitzer for low depths, then with Arena for high depths. The NPS speedup curve is fitted very well with your model
a -> 1.52095, b -> 35.4931, F -> 2.3887, G -> 2.10453

It seems that NPS speedup is always higher than TTD, approaching each other at high and very low depths. Isn't the NPS speedup the limiting envelope of TTD 1->4 cores?

Nice fit!

"Miguel, isn't the TTD limited by NPS(4 cores)/NPS(1 core) as depth->infinity?"

I don't know... but there is a chance that the answer is both... At infinity it is possible that NPS(4cores)/NPS(1 core) is actually 4, or most likely near 4 (e.g. 3.8-3.9 ish). There are two major reasons for having this lower than 4. a) some nodes near the PV and near the leaves are always analyzed serially and b) delays based on resource competition. At infinity, "a" will become negligible, but "b" won't be zero.

So, it will be interesting to plot NPS(4)/NPS(1) vs. depth to see if how it grows.

One thing is very clear, it is critical to take into account depth when this experiments are done, particularly if one is going to make comparisons.

Miguel
PS: Yes I meant d-5

EDIT: Now I see that you plot NPS speedup, not speed up. So it means that the speedup curve is strongly influenced by the nps, which makes sense. In my example above, reason "a" seems to be dominant.

Laskos · Post by **Laskos** » Mon Aug 05, 2013 11:43 pm

michiguel wrote:
Nice fit!

"Miguel, isn't the TTD limited by NPS(4 cores)/NPS(1 core) as depth->infinity?"

I don't know... but there is a chance that the answer is both... At infinity it is possible that NPS(4cores)/NPS(1 core) is actually 4, or most likely near 4 (e.g. 3.8-3.9 ish). There are two major reasons for having this lower than 4. a) some nodes near the PV and near the leaves are always analyzed serially and b) delays based on resource competition. At infinity, "a" will become negligible, but "b" won't be zero.

So, it will be interesting to plot NPS(4)/NPS(1) vs. depth to see if how it grows.

One thing is very clear, it is critical to take into account depth when this experiments are done, particularly if one is going to make comparisons.

Miguel
PS: Yes I meant d-5
EDIT: Now I see that you plot NPS speedup, not speed up. So it means that the speedup curve is strongly influenced by the nps, which makes sense. In my example above, reason "a" seems to be dominant.

I plotted on the same graph the TTD speedup and NPS speedup 1->4 cores for Houdini 3. They seem to converge for very large depth or d->infinity. The lesson is that there is no independent of depth definition of neither TTD speedup, nor NPS speedup. So, the tests at ultra-short time controls don't give a picture of what parallelization does.

Time to depth measuring tool

Re: Time to depth measuring tool

Re: Time to depth measuring tool

Re: Time to depth measuring tool

Re: Time to depth measuring tool

Re: Time to depth measuring tool