Reliable speed comparison: some math required

syzygy · Post by **syzygy** » Tue Feb 27, 2018 10:27 am

Kotlov wrote:I also noticed that the first test will be faster than the next one, it probably depends on the temperature of the processor.

Probably turboboost that only works for a short time.

It can also be exactly the other way around because of CPU frequency scaling (it takes time to ramp up from 1.2 GHz to 4.2 GHz).

lucasart · Post by **lucasart** » Tue Feb 27, 2018 10:29 am

syzygy wrote:Testing in parallel is only more noisy

Did you verify this hypothesis of yours with empirical data? I suggest you try…

Kotlov · Post by **Kotlov** » Tue Feb 27, 2018 10:54 am

I use something like this:

For example, this picture is typical for a slight speed improvement.
(third column)

BeyondCritics · Post by **BeyondCritics** » Tue Feb 27, 2018 3:33 pm

Kotlov wrote:I use something like this:

For example, this picture is typical for a slight speed improvement.
(third column)

This looks extremely sophisticated

Can you explain what exactly you are doing here? This would be of interest to other developers!

BeyondCritics · Post by **BeyondCritics** » Tue Feb 27, 2018 3:55 pm

I think the most efficient setup is to first measure the mean and variance of your chosen test metric very carefully for "MASTER", and then later use a simple z-test for each "NEW" variant.
As already discussed, you need a dedicated machine for that, since otherwise you will be seriously hampered by fluctuating variance.
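A minimal sketch of that z-test in Python, assuming the metric is something like nodes per second (higher is better) and that the per-run values are roughly normally distributed. The function name and the numbers in the usage line are hypothetical, not measurements from this thread.

```python
import math

def z_test(master_mean, master_std, new_samples):
    """Compare the mean of NEW runs against a carefully measured MASTER.

    master_mean and master_std are treated as known (measured beforehand
    on a quiet, dedicated machine). Returns (z, phi), where phi is the
    standard normal CDF evaluated at z; the one-sided p-value for
    "NEW is not faster than MASTER" is 1 - phi.
    """
    n = len(new_samples)
    new_mean = sum(new_samples) / n
    # Standard error of the mean of n runs under the MASTER variance.
    stderr = master_std / math.sqrt(n)
    z = (new_mean - master_mean) / stderr
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return z, phi

# Hypothetical usage: MASTER measured at 100.0 +/- 10.0, four NEW runs.
z, phi = z_test(100.0, 10.0, [105.0, 105.0, 105.0, 105.0])
print(f"z = {z:.2f}, Phi(z) = {phi:.4f}")
```

The sketch only holds if the variance measured for MASTER still applies when NEW is run, which is why a dedicated machine matters.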

syzygy · Post by **syzygy** » Tue Feb 27, 2018 8:02 pm

lucasart wrote:
mar wrote:I typically do something very simple: n runs for each version, pick the fastest one for each, then simply compare.
Not very scientific but works well for me.
In theory, with noisy observations, it's best to choose the median. It's a more robust statistic than the max. By the way, that's what I do in my engine (median of 5 runs).

The only effect noise can have here is to decrease speed; a bench is not going to run faster if the OS interrupts it more often. So max speed corresponds to the least noisy measurement.

AlvaroBegue · Post by **AlvaroBegue** » Tue Feb 27, 2018 8:10 pm

syzygy wrote: The only effect noise can have here is to decrease speed; a bench is not going to run faster if the OS interrupts it more often. So max speed corresponds to the least noisy measurement.

I was about to post that. The only problem with this is Turbo Boost. If you can disable it, that's probably best. If not, you can run your program a few times in a row. The first execution could be benefiting from Turbo Boost, so you can discard its measurement. Then pick the lowest running time of the lot.

That has worked well for me in the past.
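The "discard the first run, then take the lowest time" recipe could be sketched in Python like this. The helper names are made up for illustration, and `./bench` in the usage comment stands in for whatever benchmark command you actually run.

```python
import subprocess
import time

def pick_best(times, discard=1):
    """Drop the first `discard` measurements, return the lowest remaining time."""
    kept = times[discard:]
    if not kept:
        raise ValueError("need more runs than discarded warm-up runs")
    return min(kept)

def time_command(cmd, runs=5):
    """Run `cmd` back to back `runs` times; return wall-clock times in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times

# Hypothetical usage:
#   best = pick_best(time_command(["./bench"], runs=6))
```

Dropping the first run guards against a short Turbo Boost burst making it look faster than the steady state; taking the minimum of the rest follows the argument that noise can only slow a run down.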

syzygy · Post by **syzygy** » Tue Feb 27, 2018 8:42 pm

AlvaroBegue wrote:
syzygy wrote:The only effect noise can have here is to decrease speed; a bench is not going to run faster if the OS interrupts it more often. So max speed corresponds to the least noisy measurement.
I was about to post that. The only problem with this is Turbo Boost. If you can disable it, that's probably best. If not, you can run your program a few times in a row. The first execution could be benefiting from Turbo Boost, so you can discard its measurement. Then pick the lowest running time of the lot.

That has worked well for me in the past.

The lim sup should do

My desktop PC maintains turboboost speed for an indefinite period of time. The main problem is cpu scaling, but this can be overcome on Linux with "cpupower frequency-set -g performance". But running many benches in a row and taking the lim sup should also solve that.

What remains (apart from background processes) is cpu throttling on laptops. If the laptop heats up too much, it will clock down.

zullil · Post by **zullil** » Tue Feb 27, 2018 9:05 pm

syzygy wrote:
My desktop PC maintains turboboost speed for an indefinite period of time. The main problem is cpu scaling, but this can be overcome on Linux with "cpupower frequency-set -g performance".

On Xeon boxes running Linux, the only approach that has proved reliable for me is to enable turboboost in BIOS but also to write 100 in the following file:

/sys/devices/system/cpu/intel_pstate/min_perf_pct

Finding decent documentation on Intel p-states was a challenge, though I haven't tried recently.
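For anyone who prefers to script that setting, a small Python sketch follows. It assumes the intel_pstate driver is active and that the script runs as root; the function name is hypothetical, and the `path` parameter exists only so the function can be tried on an ordinary file.

```python
from pathlib import Path

# sysfs file named in the post above.
MIN_PERF_PCT = Path("/sys/devices/system/cpu/intel_pstate/min_perf_pct")

def set_min_perf_pct(pct, path=MIN_PERF_PCT):
    """Write `pct` (0..100) to the file and return the value read back."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    path.write_text(f"{pct}\n")
    # Read back so the caller can see what was actually stored.
    return int(path.read_text().strip())

# Hypothetical usage (needs root on a machine using intel_pstate):
#   set_min_perf_pct(100)
```

The setting does not survive a reboot, so it has to be reapplied before each benchmarking session.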

syzygy · Post by **syzygy** » Tue Feb 27, 2018 9:41 pm

lucasart wrote:
syzygy wrote:Testing in parallel is only more noisy
Did you verify this hypothesis of yours with empirical data ? I suggest you try…

I took the liberty of creating some noise:

run       base       test     diff
  1    1929540    2652016  +722476
  2    1929540    2657409  +727869
  3    2645305    1925985  -719320
  4    2670988    2639961   -31027
  5    2665540    2669624    +4084
  6    2625376    2666900   +41524
  7    2604446    2604446       +0
  8    2673720    2664181    -9539
  9    2637297    2006573  -630724
 10    1930965    2662824  +731859
 11    2670988    2670988       +0
 12    2672353    2677829    +5476
 13    2668261    2672353    +4092
 14    2660113    2666900    +6787
 15    2670988    2639961   -31027
 16    2444866    2460981   +16115
 17    2574937    2656058   +81121
 18    1926695    2653362  +726667
 19    1921736    2669624  +747888
 20    1926695    2656058  +729363

Result of  20 runs
==================
base (cfish          ) =    2422517  +/- 147234
test (cfish          ) =    2578702  +/- 94082
diff                   =    +156184  +/- 191677

speedup        = +0.0645
P(speedup > 0) =  0.9446

CPU: 6 x Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
Hyperthreading: on
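The script that produced this summary was not posted. The numbers look consistent with a paired normal approximation, so here is a Python sketch under that assumption: the "+/-" is taken to be 1.96 standard errors and P(speedup > 0) to be the normal CDF of mean(diff) / stderr(diff). The function and key names are made up for illustration.

```python
import math
import statistics

# The 20 runs from the table above.
BASE = [1929540, 1929540, 2645305, 2670988, 2665540, 2625376, 2604446,
        2673720, 2637297, 1930965, 2670988, 2672353, 2668261, 2660113,
        2670988, 2444866, 2574937, 1926695, 1921736, 1926695]
TEST = [2652016, 2657409, 1925985, 2639961, 2669624, 2666900, 2604446,
        2664181, 2006573, 2662824, 2670988, 2677829, 2672353, 2666900,
        2639961, 2460981, 2656058, 2653362, 2669624, 2656058]

def summarize(base, test):
    """Paired comparison of two equally long lists of speed measurements."""
    diffs = [t - b for b, t in zip(base, test)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample std, n - 1 divisor).
    stderr = statistics.stdev(diffs) / math.sqrt(n)
    z = mean_diff / stderr
    return {
        "mean_diff": mean_diff,
        "half_width": 1.96 * stderr,  # assumed 95% interval
        "speedup": mean_diff / statistics.mean(base),
        "p_positive": 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))),
    }

result = summarize(BASE, TEST)
for key, value in result.items():
    print(f"{key:>10} = {value:.4f}")
```

On the 20 runs above this gives a mean difference of about +156184 and a speedup of about +0.0645, in line with the reported summary. Note how wide the interval is: a handful of runs hit by the artificial noise dominate the variance, which is the point of the experiment.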
