
Measure of SMP scalability

Posted: Wed Jul 03, 2013 7:04 am
by Edsel Apostol
There seems to be no agreement yet regarding how to measure SMP scalability in computer chess. I can think of 3 things that could be used to measure scalability.

NPS-wise
This seems to be the simplest and most obvious one. It simply measures the CPU utilization of the algorithm used. The problem with this is that it does not take into account whether the work being measured is simply redundant, as can happen with a shared hash table implementation.

Time to depth
This measures the capability of the SMP algorithm to search efficiently. It is a better measure than NPS alone, but it is only really accurate if the parallel algorithm traverses the same search tree as the serial one. In modern chess engines the SMP search does not traverse the same tree as the serial search: some engines prune more and some prune less when running in SMP, for example.

In some cases, to gain better performance, one can widen the search a bit when using SMP to compensate for excessive pruning/reduction. This improves performance because at higher depths there is little to be gained from searching still deeper; it is better to search wider and not miss critical variations.

Performance/Elo based
This seems to be the most logical measurement to me, since the SMP algorithms in modern chess engines are too complex to be measured by "time to depth" and "NPS-wise" alone.
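For illustration, the first two measures reduce to simple ratios against the single-threaded run; here is a minimal sketch with made-up numbers (not taken from any real engine):

Code:

measurements = {
    # threads: (nodes_per_second, seconds_to_reach_a_fixed_depth) -- made-up numbers
    1: (1_000_000, 120.0),
    2: (1_950_000, 70.0),
    4: (3_800_000, 42.0),
    8: (7_200_000, 27.0),
}

base_nps, base_time = measurements[1]
for n, (nps, t) in sorted(measurements.items()):
    nps_scaling = nps / base_nps      # "NPS-wise": raw speed of the parallel search
    ttd_speedup = base_time / t       # "time to depth": search efficiency
    print(f"{n} threads: NPS scaling {nps_scaling:.2f}x, "
          f"time-to-depth speedup {ttd_speedup:.2f}x")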

Re: Measure of SMP scalability

Posted: Wed Jul 03, 2013 1:10 pm
by mvk
Edsel Apostol wrote:There seems to be no agreement yet regarding how to measure SMP scalability in computer chess. I can think of 3 things that could be used to measure scalability.

NPS-wise
Time to depth
Performance/Elo based
This dilemma doesn't exist from the user's point of view.

Re: Measure of SMP scalability

Posted: Wed Jul 03, 2013 1:56 pm
by Daniel Shawul
NPS scaling - A good measurement to check that you are not losing speed to caching and synchronization problems, etc. Ideally it should be perfect, and in general it can be used as an upper bound on the expected time scaling.

Time scaling - Ratios of the times taken to complete a given amount of work. It is usually assumed that work ~ f(depth), because usually you don't change the search algorithm; otherwise work ~ f(depth, width, ..., stopping_time, tilt_of_the_moon, etc.). You can roughly make sure the same work is being done by stopping once a certain number of nodes has been searched, hence work ~ f(nodes) may be a better measure when the algorithms differ (a sketch of this fixed-node approach follows at the end of this post).

Elo based - A no-brainer.

There is no point in debating this, because I am sure everyone knows how to measure things properly when confronted with a specific situation.
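A minimal sketch of the fixed-node-count idea mentioned above, assuming a UCI engine ("go nodes" is part of the UCI protocol but not every engine honours it; both the engine path and the "Threads" option name are placeholders):

Code:

import subprocess
import time

def time_fixed_nodes(engine_path, threads, nodes):
    """Time a search that stops after a fixed node budget (same work ~ f(nodes))."""
    p = subprocess.Popen([engine_path], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, text=True, bufsize=1)
    def send(cmd):
        p.stdin.write(cmd + "\n")
        p.stdin.flush()
    send("uci")
    send(f"setoption name Threads value {threads}")   # option name varies by engine
    send("position startpos")
    start = time.perf_counter()
    send(f"go nodes {nodes}")
    for line in p.stdout:              # block until the engine reports its move
        if line.startswith("bestmove"):
            break
    elapsed = time.perf_counter() - start
    send("quit")
    p.wait()
    return elapsed

t1 = time_fixed_nodes("./engine", threads=1, nodes=50_000_000)
t4 = time_fixed_nodes("./engine", threads=4, nodes=50_000_000)
print(f"time scaling at an equal node count: {t1 / t4:.2f}x with 4 threads")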

Re: Measure of SMP scalability

Posted: Wed Jul 03, 2013 3:50 pm
by abulmo
Edsel Apostol wrote: Time to depth
This measures the capability of the SMP algorithm to search efficiently. It is a better measure than NPS alone, but it is only really accurate if the parallel algorithm traverses the same search tree as the serial one. In modern chess engines the SMP search does not traverse the same tree as the serial search: some engines prune more and some prune less when running in SMP, for example.
What I like to do here is to use the following formula:

t(n) = W/n + C

where n is the number of threads and t(n) is the time spent. Once you have measured t(n) for several values of n (thread counts), you can compute W and C and get an idea of the scalability of your program. Indeed:

W + C = t(1)
C = t(infinity)
(W + C)/C = theoretical upper limit on the achievable acceleration.
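A minimal sketch of that fit (made-up timings; numpy assumed available): since t(n) = W/n + C is linear in 1/n, a least-squares fit over a few measured points yields W and C directly.

Code:

import numpy as np

# Hypothetical times (seconds) to finish the same search with n threads.
n = np.array([1.0, 2.0, 4.0, 8.0])
t = np.array([120.0, 66.0, 39.0, 26.0])

# t(n) = W/n + C is linear in x = 1/n, so ordinary least squares gives W and C.
A = np.column_stack([1.0 / n, np.ones_like(n)])
(W, C), *_ = np.linalg.lstsq(A, t, rcond=None)

print(f"W = {W:.1f} s, C = {C:.1f} s")
print(f"t(1) = W + C = {W + C:.1f} s")
print(f"t(infinity) = C = {C:.1f} s")
print(f"upper limit on acceleration = (W + C)/C = {(W + C) / C:.1f}x")
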
Edsel Apostol wrote: Performance/Elo based
Elo performance is related to speed increase, so that doubling the speed is about a 50 Elo improvement. When dealing with speed improvement, which is the only thing SMP provides, Elo improvement is not a necessary tool, IMHO.

Re: Measure of SMP scalability

Posted: Wed Jul 03, 2013 4:13 pm
by syzygy
abulmo wrote:When dealing with speed improvement, which is the only thing SMP provides
Edsel's question essentially is: how do we measure increase in "speed" due to SMP.

What is "speed"?

NPS is speed, but it does not tell everything.

Time-to-depth is speed, but it does not tell everything. See the "General Topics" forum where this has been discussed at too much length in two threads.

Time to a certain quality of search (basically Elo) is speed and obviously the most relevant speed (unless one is interested in a particular aspect of the SMP implementation, e.g. synchronisation issues are more directly related to NPS increase).

Re: Measure of SMP scalability

Posted: Wed Jul 03, 2013 4:57 pm
by michiguel
abulmo wrote:
Edsel Apostol wrote: Time to depth
This measures the capability of the SMP algorithm to search efficiently. It is a better measure than NPS alone, but it is only really accurate if the parallel algorithm traverses the same search tree as the serial one. In modern chess engines the SMP search does not traverse the same tree as the serial search: some engines prune more and some prune less when running in SMP, for example.
What I like to do here is to use the following formula:

t(n) = W/n + C

where n is the number of threads and t(n) is the time spent. Once you have measured t(n) for several values of n (thread counts), you can compute W and C and get an idea of the scalability of your program. Indeed:

W + C = t(1)
C = t(infinity)
(W + C)/C = theoretical upper limit on the achievable acceleration.
Unfortunately, it is not that simple, because Amdahl's law (which is what you are implying) does not apply to chess engines. It assumes that the same work is done by the parallel and the single-threaded program, and we do not have that. The parallel program explores different branches, many times increasing the work and adding extra inefficiency, and other times finding "short cuts".

That is the reason why the data does not fit Amdahl's formula well.

Miguel
Edsel Apostol wrote: Performance/Elo based
Elo performance is related to speed increase, so that doubling the speed is about a 50 Elo improvement. When dealing with speed improvement, which is the only thing SMP provides, Elo improvement is not a necessary tool, IMHO.

Re: Measure of SMP scalability

Posted: Wed Jul 03, 2013 5:02 pm
by michiguel
michiguel wrote:
abulmo wrote:
Edsel Apostol wrote: Time to depth
This measures the capability of the SMP algorithm to search efficiently. It is a better measure than NPS alone, but it is only really accurate if the parallel algorithm traverses the same search tree as the serial one. In modern chess engines the SMP search does not traverse the same tree as the serial search: some engines prune more and some prune less when running in SMP, for example.
What I like to do here is to use the following formula:

t(n) = W/n + C

where n is the number of threads and t(n) is the time spent. Once you have measured t(n) for several values of n (thread counts), you can compute W and C and get an idea of the scalability of your program. Indeed:

W + C = t(1)
C = t(infinity)
(W + C)/C = theoretical upper limit on the achievable acceleration.
Unfortunately, it is not that simple, because Amdahl's law (which is what you are implying) does not apply to chess engines. It assumes that the same work is done by the parallel and the single-threaded program, and we do not have that. The parallel program explores different branches, many times increasing the work and adding extra inefficiency, and other times finding "short cuts".

That is the reason why the data does not fit Amdahl's formula well.

Miguel
Edsel Apostol wrote: Performance/Elo based
Elo performance is related to speed increase, so that doubling the speed is about a 50 Elo improvement. When dealing with speed improvement, which is the only thing SMP provides, Elo improvement is not a necessary tool, IMHO.
There was a discussion about it a while ago:
http://www.talkchess.com/forum/viewtopi ... 40&t=36082

There I made an attempt to model the behavior, but we did not follow that up.

Miguel

Re: Measure of SMP scalability

Posted: Wed Jul 03, 2013 5:05 pm
by Sven
Would it help to define something like an "effective SMP scaling factor" as follows:

- Start with the single-CPU version of a candidate engine, play games against a fixed set of opponents at a given time control, and obtain an Elo rating.
- Repeat with N CPUs for the candidate engine but with an unchanged number of CPUs for the opponents, and measure the Elo improvement X.
- Now find out how much faster the 1-CPU version would need to be for the same Elo improvement X. (You would usually give more time to the candidate engine than to its opponents to simulate that.) The speed factor you need to reach X is M.
- Then M/N would be the "effective SMP scaling factor" of the candidate engine for N CPUs (a rough numeric sketch follows below).

Does that make sense to anyone?

Sven
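
A rough numeric sketch of the procedure above, using the "~50 Elo per doubling" figure quoted earlier in the thread as a stand-in for step 3 (all numbers are made up; in practice M would be measured with time-odds games, as described in step 3):

Code:

N = 4                    # CPUs used by the SMP version
X = 70.0                 # measured Elo gain over the 1-CPU version (made-up)
ELO_PER_DOUBLING = 50.0  # rule of thumb mentioned earlier in the thread

# If doubling the speed is worth ~ELO_PER_DOUBLING Elo, a gain of X Elo
# corresponds to an equivalent speed factor M = 2**(X / ELO_PER_DOUBLING).
M = 2.0 ** (X / ELO_PER_DOUBLING)
print(f"equivalent speed-up M = {M:.2f}x")
print(f"effective SMP scaling factor M/N = {M / N:.2f}")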

Re: Measure of SMP scalability

Posted: Wed Jul 03, 2013 5:12 pm
by michiguel
Sven Schüle wrote:Would it help to define something like an "effective SMP scaling factor" as follows:

- Start with the single-CPU version of a candidate engine, play games against a fixed set of opponents at a given time control, and obtain an Elo rating.
- Repeat with N CPUs for the candidate engine but with an unchanged number of CPUs for the opponents, and measure the Elo improvement X.
- Now find out how much faster the 1-CPU version would need to be for the same Elo improvement X. (You would usually give more time to the candidate engine than to its opponents to simulate that.) The speed factor you need to reach X is M.
- Then M/N would be the "effective SMP scaling factor" of the candidate engine.

Does that make sense to anyone?

Sven
Absolutely yes, that is the perfect way to express it. It could also be called the effective/equivalent/apparent speed-up.

The question of how much faster the 1x engine should be must be answered with an extrapolation from at least two points, one faster and one slower than what is required.

Miguel


Re: Measure of SMP scalability

Posted: Wed Jul 03, 2013 5:18 pm
by AlvaroBegue
michiguel wrote: The question of how much faster the 1x engine should be must be answered with an extrapolation from at least two points, one faster and one slower than what is required.
The word you are looking for is "interpolation".
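
A minimal sketch of that interpolation (all numbers made up): measure the 1-CPU engine's Elo gain at two time-odds factors that bracket the SMP gain X, then interpolate in log(speed), since Elo is roughly linear in the logarithm of the speed factor.

Code:

import math

# Hypothetical measurements: Elo gain of the 1-CPU engine at two time-odds factors.
f_lo, elo_lo = 2.0, 48.0   # handicap factor slower than what is required
f_hi, elo_hi = 4.0, 96.0   # handicap factor faster than what is required
X = 70.0                   # Elo gain actually measured with N threads
N = 4

# Interpolate in log2(speed), where Elo is roughly linear.
frac = (X - elo_lo) / (elo_hi - elo_lo)
log2_M = math.log2(f_lo) + frac * (math.log2(f_hi) - math.log2(f_lo))
M = 2.0 ** log2_M

print(f"interpolated equivalent speed-up M = {M:.2f}x")
print(f"effective SMP scaling factor M/N = {M / N:.2f}")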