Reliable speed comparison: some math required

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Reliable speed comparison: some math required

Post by mcostalba »

I don't have the math skills to find a correct answer to this, so I am posting here for help.

The problem is to measure whether a non-functional speed optimization (NEW) is faster than the current code (MASTER).

As everybody knows, running a single speed bench gives very noisy information, able to detect only the grossest differences in speed; when the speeds are similar (as in real cases) it is not reliable.

We can run more than one test and average the results, but how reliable is the averaged result? So the problem is:

Given N noisy measures of speed of NEW and N noisy measures of MASTER, we want to know:

1. Whether NEW is faster than MASTER with 95% reliability.

2. The maximum speed-up of NEW, defined as (Speed NEW - Speed MASTER) / Speed MASTER, that is guaranteed with 95% reliability.

The second point needs a clarification. Suppose NEW is faster than MASTER by 0.5% with a 95% guarantee; then it is also faster by at least 0.4%, and by any smaller margin, with the same guarantee. On the contrary, it will not be faster by 0.8% with 95% reliability, maybe only with 30% reliability. We want to find the maximum speed-up (in this case 0.5%) that holds with 95% reliability.

This is what I have cooked up myself:

Code: Select all

Suppose we have N measures and we sum them:

SUM A = N * SA + sum(i=1..N of na_i)   // SA is the true speed of A, na_i is the noise added to the i-th measure
SUM B = N * SB + sum(i=1..N of nb_i)

SUM A - SUM B = N * (SA - SB) + sum(i=1..N of na_i - nb_i)

Now we want to know when SUM A - SUM B > 0. Assuming the na_i and nb_i are Gaussian noises:

SUM A - SUM B > 0   =>   SA - SB > sum(i=1..N of nb_i - na_i) / N
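
In code, a minimal sketch of this test could look as follows (assuming the noise really is i.i.d. Gaussian as above, and using scipy for the Student-t quantile; the function and variable names are just illustrative):

Code: Select all

import numpy as np
from scipy import stats

def speedup_bound(new, master, confidence=0.95):
    """Lower confidence bound on (Speed NEW - Speed MASTER) / Speed MASTER.

    new, master: N paired speed measures (run i of NEW taken alongside
    run i of MASTER). Assumes i.i.d. Gaussian noise, as in the derivation.
    """
    new = np.asarray(new, dtype=float)
    master = np.asarray(master, dtype=float)
    n = len(new)
    diff = new - master                    # per-run differences: SA - SB + noise
    sem = diff.std(ddof=1) / np.sqrt(n)    # standard error of the mean difference
    t = stats.t.ppf(confidence, n - 1)     # one-sided Student-t quantile
    lower = diff.mean() - t * sem          # lower confidence bound on SA - SB
    return lower / master.mean()           # expressed as a relative speed-up

# 1. NEW is faster than MASTER with 95% reliability iff speedup_bound(...) > 0.
# 2. The returned value is the maximum speed-up guaranteed at that confidence.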
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Reliable speed comparison: some math required

Post by lucasart »

This is already in pyshbench:
https://github.com/lucasart/pyshbench/b ... /pyshbench

But the problem is not the maths. The math is correct, but it relies on a fundamental assumption (behind the central limit theorem used): that the observations are drawn from independent and identically distributed random variables.

This is not the case in reality if you run the benches sequentially (and that is easily proven with proper statistical tests), because external noise from the OS and other processes pollutes the measurements too much. The best you can do is run the benches in parallel: for each run, do "master bench" and "test bench" concurrently, so that external noise (if not absent) impacts both master and test the same. Of course it's never good enough, but it's the best you can do in practice. This is what pyshbench does already.

So the limiting factor is never the statistical precision (which is very high); it's the practical instability of the machine, in other words its mismatch with the theoretical model.
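
For illustration, the concurrent scheme can be sketched like this (a minimal sketch, not the actual pyshbench code; the binary names and the "Nodes/second" output line are assumptions modeled on Stockfish's bench):

Code: Select all

import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

NPS = re.compile(r"Nodes/second\s*:\s*(\d+)")

def bench(binary):
    # run "<binary> bench" and parse the nodes/second figure;
    # assumes the engine prints a "Nodes/second : <n>" line, like Stockfish
    out = subprocess.run([binary, "bench"], capture_output=True, text=True)
    return int(NPS.search(out.stdout + out.stderr).group(1))

def paired_run(master="./master", test="./test"):
    # start both benches at the same moment, so that external noise
    # (OS, other processes) hits master and test alike
    with ThreadPoolExecutor(max_workers=2) as pool:
        fm = pool.submit(bench, master)
        ft = pool.submit(bench, test)
        return fm.result(), ft.result()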
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Reliable speed comparison: some math required

Post by mcostalba »

What about running a longer bench? The same total time, but allocated to fewer, longer runs?
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Reliable speed comparison: some math required

Post by lucasart »

And of course, there's another problem: even with the most perfect and precise measurements, different machines and/or compilers really will give different speed-ups.

I know I'm stating the obvious. But this is really important. At least as important as understanding the math behind the formulas.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Reliable speed comparison: some math required

Post by lucasart »

mcostalba wrote:What about running a longer bench? The same total time, but allocated to fewer, longer runs?
That might help, but it needs to be confirmed with data.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Kotlov
Posts: 266
Joined: Fri Jul 10, 2015 9:23 pm
Location: Russia

Re: Reliable speed comparison: some math required

Post by Kotlov »

I have also noticed that the first test is faster than the next one; it probably depends on the temperature of the processor.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Reliable speed comparison: some math required

Post by lucasart »

Kotlov wrote:I have also noticed that the first test is faster than the next one; it probably depends on the temperature of the processor.
Quite possibly. But this is already neutralized by the concurrent testing technique deployed in pyshbench, which I explained above.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Reliable speed comparison: some math required

Post by mar »

I typically do something very simple: n runs for each version, pick the fastest one for each, then simply compare.
Not very scientific, but it works well for me.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Reliable speed comparison: some math required

Post by syzygy »

Testing in parallel is only more noisy, since the different cores will not be affected equally by OS noise and other processes.

In my view:
- disable cpu scaling if possible.
- kill as many background processes as possible.
- run a single bench at a time with SF bound to a particular core (use taskset on Linux).
- do a number of runs and take the fastest (= least disturbed by noise).
- if most of the runs are not close to the fastest, the system is too noisy and you have to find the background process that causes the trouble.
- there may be problems with cpu throttling on laptops.

If you can't disable cpu scaling, run two benches from a batch file and take the second result.
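
A minimal sketch of the recipe above on Linux (the "Nodes/second" parsing, the 0.5% closeness threshold and the helper names are my assumptions for illustration):

Code: Select all

import re
import subprocess

NPS = re.compile(r"Nodes/second\s*:\s*(\d+)")

def pinned_bench(binary, core=0):
    # one bench run bound to a single core via taskset (Linux);
    # assumes a "Nodes/second : <n>" line in the engine's output
    cmd = ["taskset", "-c", str(core), binary, "bench"]
    out = subprocess.run(cmd, capture_output=True, text=True)
    return int(NPS.search(out.stdout + out.stderr).group(1))

def fastest_of(binary, runs=10, tolerance=0.005):
    # take the fastest run (= least disturbed by noise) and complain
    # when most runs are not within 0.5% of it: the system is too noisy
    speeds = [pinned_bench(binary) for _ in range(runs)]
    best = max(speeds)
    close = sum(s >= best * (1 - tolerance) for s in speeds)
    if close <= runs // 2:
        print("system too noisy: find the background process causing it")
    return best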

The problem remains that speed differs per architecture and per compiler. Also, insignificant code changes can unpredictably affect speed in both directions for no good reason (due to cache alignment).
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Reliable speed comparison: some math required

Post by lucasart »

mar wrote:I typically do something very simple: n runs for each version, pick the fastest one for each, then simply compare.
Not very scientific, but it works well for me.
In theory, with noisy observations, it's best to choose the median. It's a more robust statistic than the max. By the way, that's what I do in my engine (median of 5 runs).
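
For illustration, a minimal sketch contrasting the two statistics (the bench() parsing is an assumption modeled on Stockfish's output, as above):

Code: Select all

import re
import statistics
import subprocess

def bench(binary):
    # assumes a "Nodes/second : <n>" line in the output, like Stockfish's bench
    out = subprocess.run([binary, "bench"], capture_output=True, text=True)
    return int(re.search(r"Nodes/second\s*:\s*(\d+)", out.stdout + out.stderr).group(1))

def fastest_speed(binary, runs=5):
    return max(bench(binary) for _ in range(runs))                # pick the fastest

def median_speed(binary, runs=5):
    return statistics.median(bench(binary) for _ in range(runs))  # more robust to outliers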
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.