Time to depth measuring tool

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Time to depth measuring tool

Post by michiguel »

Laskos wrote:
Adam Hair wrote:

Very interesting, Adam. I have no possibility to check at very shallow depths 1->4 cores speedup, as the granularity of what I get is 1 second, nevertheless I did it for Houdini 3 for depth from 14 to 21, and fitted to a shifted logistic. What you fitted the data with? Simple Amdahl's a*x/(b + x) doesn't work well. The last point (depth 21) I got in 3 hours, 30 positions repeated 5 times.
Image

Are you sure that the limiting value is 4 and not nps(4 cores)/nps(1 core)? Very good observation that TTD is dependent on depth.
Assuming an Amdahl-like behavior:

S = (R + 1) * n / (R + n) eq.1

S = speed up, n = number of cores and R = Np/Ns, which is the ratio of parallelizable nodes/non-parallelizable (serial) nodes.

Then, Np and Ns can be approximated by

Np = f1 * F^(d-5) (exponential growth of parallelizable nodes from ply 5 an on)
Ns = f2 * G^(d-5) + f3 (exponential growth of serial nodes + fixed number existent at ply five)

where
F is the branching factor for the parallelizable nodes
G is the branching factor for the serial nodes
d is depth, which is substracted 5 since I do not use SMP in the last 4 plies.
f1,f2,f3 are proportionality parameters.

Then the ratio becomes
R = f1 * F^(d-4) / (f3 + f2 * G^(d-4))

dividing by f2
R = f1/f2 * F^(d-4) / (f3/f2 + G^(d-4))

if
a = f1/f2
b = f3/f2


R = a * F^(d-4) / (b + G^(d-4)) eq. 2

So plugging eq. 2 into eq. 1 will give an equation S = f(d).

In the plot Adam showed above the fitting was

a 0.57083499939221
F 2.5903162588137
G 2.14909239883083
b 25.1209749090981

Miguel
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Time to depth measuring tool

Post by Laskos »

michiguel wrote:
Assuming an Amdahl-like behavior:

S = (R + 1) * n / (R + n) eq.1

S = speed up, n = number of cores and R = Np/Ns, which is the ratio of parallelizable nodes/non-parallelizable (serial) nodes.

Then, Np and Ns can be approximated by

Np = f1 * F^(d-5) (exponential growth of parallelizable nodes from ply 5 an on)
Ns = f2 * G^(d-5) + f3 (exponential growth of serial nodes + fixed number existent at ply five)

where
F is the branching factor for the parallelizable nodes
G is the branching factor for the serial nodes
d is depth, which is substracted 5 since I do not use SMP in the last 4 plies.
f1,f2,f3 are proportionality parameters.

Then the ratio becomes
R = f1 * F^(d-4) / (f3 + f2 * G^(d-4))

dividing by f2
R = f1/f2 * F^(d-4) / (f3/f2 + G^(d-4))

if
a = f1/f2
b = f3/f2


R = a * F^(d-4) / (b + G^(d-4)) eq. 2

So plugging eq. 2 into eq. 1 will give an equation S = f(d).

In the plot Adam showed above the fitting was

a 0.57083499939221
F 2.5903162588137
G 2.14909239883083
b 25.1209749090981

Miguel
Thanks, the critical point here is that F>G, i.e. Np/Ns explodes with depth, so that limit d->infinity for TTD is number of cores n, and for d=5 TTD is close to 1 if a<<b (or f1<<f3). You mean (d-5) in the expression for R, not (d-4), right?

I fitted that for Houdini 3, but with very few points, and I don't really know how Houdini behaves at shallow depths:
a -> 0.482442, b -> 11.0359, F -> 1.94741, G -> 1.62379
Image

Without a model for Houdini, the shifted logistic with 2 parameters fits well too:
Image

Adam's plot is conclusive that your model works for Gaviota, very interesting.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Time to depth measuring tool

Post by Laskos »

michiguel wrote:
Assuming an Amdahl-like behavior:

S = (R + 1) * n / (R + n) eq.1

S = speed up, n = number of cores and R = Np/Ns, which is the ratio of parallelizable nodes/non-parallelizable (serial) nodes.

Then, Np and Ns can be approximated by

Np = f1 * F^(d-5) (exponential growth of parallelizable nodes from ply 5 an on)
Ns = f2 * G^(d-5) + f3 (exponential growth of serial nodes + fixed number existent at ply five)

where
F is the branching factor for the parallelizable nodes
G is the branching factor for the serial nodes
d is depth, which is substracted 5 since I do not use SMP in the last 4 plies.
f1,f2,f3 are proportionality parameters.

Then the ratio becomes
R = f1 * F^(d-4) / (f3 + f2 * G^(d-4))

dividing by f2
R = f1/f2 * F^(d-4) / (f3/f2 + G^(d-4))

if
a = f1/f2
b = f3/f2


R = a * F^(d-4) / (b + G^(d-4)) eq. 2

So plugging eq. 2 into eq. 1 will give an equation S = f(d).

In the plot Adam showed above the fitting was

a 0.57083499939221
F 2.5903162588137
G 2.14909239883083
b 25.1209749090981

Miguel
Miguel, isn't the TTD limited by NPS(4 cores)/NPS(1 core) as depth->infinity? I managed to get NPS speedup 1->4 cores for different depths with Houdini 3, first with LittleBlitzer for low depths, then with Arena for high depths. The NPS speedup curve is fitted very well with your model
a -> 1.52095, b -> 35.4931, F -> 2.3887, G -> 2.10453
Image

It seems that NPS speedup is always higher than TTD, approaching each other at high and very low depths. Isn't the NPS speedup the limiting envelope of TTD 1->4 cores?
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Time to depth measuring tool

Post by michiguel »

Laskos wrote:
michiguel wrote:
Assuming an Amdahl-like behavior:

S = (R + 1) * n / (R + n) eq.1

S = speed up, n = number of cores and R = Np/Ns, which is the ratio of parallelizable nodes/non-parallelizable (serial) nodes.

Then, Np and Ns can be approximated by

Np = f1 * F^(d-5) (exponential growth of parallelizable nodes from ply 5 an on)
Ns = f2 * G^(d-5) + f3 (exponential growth of serial nodes + fixed number existent at ply five)

where
F is the branching factor for the parallelizable nodes
G is the branching factor for the serial nodes
d is depth, which is substracted 5 since I do not use SMP in the last 4 plies.
f1,f2,f3 are proportionality parameters.

Then the ratio becomes
R = f1 * F^(d-4) / (f3 + f2 * G^(d-4))

dividing by f2
R = f1/f2 * F^(d-4) / (f3/f2 + G^(d-4))

if
a = f1/f2
b = f3/f2


R = a * F^(d-4) / (b + G^(d-4)) eq. 2

So plugging eq. 2 into eq. 1 will give an equation S = f(d).

In the plot Adam showed above the fitting was

a 0.57083499939221
F 2.5903162588137
G 2.14909239883083
b 25.1209749090981

Miguel
Miguel, isn't the TTD limited by NPS(4 cores)/NPS(1 core) as depth->infinity? I managed to get NPS speedup 1->4 cores for different depths with Houdini 3, first with LittleBlitzer for low depths, then with Arena for high depths. The NPS speedup curve is fitted very well with your model
a -> 1.52095, b -> 35.4931, F -> 2.3887, G -> 2.10453
Image

It seems that NPS speedup is always higher than TTD, approaching each other at high and very low depths. Isn't the NPS speedup the limiting envelope of TTD 1->4 cores?
Nice fit!

"Miguel, isn't the TTD limited by NPS(4 cores)/NPS(1 core) as depth->infinity?"

I don't know... but there is a chance that the answer is both... At infinity it is possible that NPS(4cores)/NPS(1 core) is actually 4, or most likely near 4 (e.g. 3.8-3.9 ish). There are two major reasons for having this lower than 4. a) some nodes near the PV and near the leaves are always analyzed serially and b) delays based on resource competition. At infinity, "a" will become negligible, but "b" won't be zero.

So, it will be interesting to plot NPS(4)/NPS(1) vs. depth to see if how it grows.

One thing is very clear, it is critical to take into account depth when this experiments are done, particularly if one is going to make comparisons.

Miguel
PS: Yes I meant d-5 :oops:
EDIT: Now I see that you plot NPS speedup, not speed up. So it means that the speedup curve is strongly influenced by the nps, which makes sense. In my example above, reason "a" seems to be dominant.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Time to depth measuring tool

Post by Laskos »

michiguel wrote:
Nice fit!

"Miguel, isn't the TTD limited by NPS(4 cores)/NPS(1 core) as depth->infinity?"

I don't know... but there is a chance that the answer is both... At infinity it is possible that NPS(4cores)/NPS(1 core) is actually 4, or most likely near 4 (e.g. 3.8-3.9 ish). There are two major reasons for having this lower than 4. a) some nodes near the PV and near the leaves are always analyzed serially and b) delays based on resource competition. At infinity, "a" will become negligible, but "b" won't be zero.

So, it will be interesting to plot NPS(4)/NPS(1) vs. depth to see if how it grows.

One thing is very clear, it is critical to take into account depth when this experiments are done, particularly if one is going to make comparisons.

Miguel
PS: Yes I meant d-5 :oops:
EDIT: Now I see that you plot NPS speedup, not speed up. So it means that the speedup curve is strongly influenced by the nps, which makes sense. In my example above, reason "a" seems to be dominant.
I plotted on the same graph the TTD speedup and NPS speedup 1->4 cores for Houdini 3. They seem to converge for very large depth or d->infinity. The lesson is that there is no independent of depth definition of neither TTD speedup, nor NPS speedup. So, the tests at ultra-short time controls don't give a picture of what parallelization does.

Image