I think the most efficient setup is to first measure the mean and variance of your chosen test metric very carefully for "MASTER", and then use a simple z-test for each "NEW" variant.
As already discussed, you need a dedicated machine for this; otherwise you will be seriously hampered by fluctuating variance.
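A minimal sketch of that z-test in Python, assuming the MASTER measurements are numerous enough that their mean and standard deviation can be treated as known. The function names and the sample numbers are hypothetical, chosen only for illustration.

```python
import math
import statistics

def z_score(master_samples, new_samples):
    """z statistic for the mean of new_samples against the MASTER baseline.

    Assumption: the MASTER mean and standard deviation are estimated from
    enough runs to be treated as the true population values.
    """
    mu = statistics.mean(master_samples)
    sigma = statistics.stdev(master_samples)
    n = len(new_samples)
    return (statistics.mean(new_samples) - mu) / (sigma / math.sqrt(n))

def two_sided_p(z):
    """Two-sided p-value of z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

if __name__ == "__main__":
    # Hypothetical bench results (e.g. nodes per second), not real data.
    master = [1000.0, 1004.0, 998.0, 1001.0, 997.0, 1002.0]
    new = [1006.0, 1009.0, 1005.0]
    z = z_score(master, new)
    print(f"z = {z:.2f}, p = {two_sided_p(z):.4f}")
```

The z-test is only as good as the baseline: if the variance on the machine drifts between the MASTER and NEW runs, the result will be misleading, which is why the dedicated machine matters.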
mar wrote: I typically do something very simple: n runs for each version, pick the fastest one for each, then simply compare.
Not very scientific but works well for me.
In theory, with noisy observations, it's best to choose the median. It's a more robust statistic than the max. By the way, that's what I do in my engine (median of 5 runs).
The only effect noise can have here is to decrease speed; a bench is not going to run faster if the OS interrupts it more often. So max speed corresponds to the least noisy measurement.
syzygy wrote:
The only effect noise can have here is to decrease speed; a bench is not going to run faster if the OS interrupts it more often. So max speed corresponds to the least noisy measurement.
I was about to post that. The only problem with this is Turbo Boost. If you can disable it, that's probably best. If not, you can run your program a few times in a row. The first execution could be benefiting from Turbo Boost, so you can discard its measurement. Then pick the lowest running time of the lot.
That has worked well for me in the past.
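The "run it a few times, discard the first, pick the lowest time" recipe can be sketched as follows. This is only an illustration: `time_runs` and `summarize` are hypothetical helper names, and the command you pass in is whatever launches your own bench. The median is reported next to the minimum so both suggestions in this thread can be compared.

```python
import statistics
import subprocess
import time

def time_runs(cmd, runs=5, discard_first=True):
    """Run cmd repeatedly and return the wall-clock time of each kept run.

    With discard_first=True one extra warm-up run is executed and its
    timing is thrown away, as suggested for Turbo Boost effects.
    """
    total = runs + (1 if discard_first else 0)
    times = []
    for _ in range(total):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times[1:] if discard_first else times

def summarize(times):
    """Lowest time (least disturbed run) and median (robust alternative)."""
    return {"min": min(times), "median": statistics.median(times)}
```

Usage would look like `summarize(time_runs(["./engine", "bench"]))`, where `./engine bench` stands in for your own bench command.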
The lim sup should do
My desktop PC maintains Turbo Boost speed for an indefinite period of time. The main problem is CPU frequency scaling, but this can be overcome on Linux with "cpupower frequency-set -g performance". But running many benches in a row and taking the lim sup should also solve that.
What remains (apart from background processes) is CPU throttling on laptops. If the laptop heats up too much, it will clock down.
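Before benchmarking, it can help to confirm that the governor really is "performance" on every core. A small sketch, assuming the standard Linux cpufreq sysfs layout (`cpuN/cpufreq/scaling_governor` under `/sys/devices/system/cpu`); the function names are hypothetical, and on systems without cpufreq the result is simply empty.

```python
import glob
import os

def read_governors(base="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its scaling governor."""
    pattern = os.path.join(base, "cpu[0-9]*", "cpufreq", "scaling_governor")
    governors = {}
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_governor
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors

def all_performance(governors):
    """True only if at least one CPU was found and all use 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())

if __name__ == "__main__":
    govs = read_governors()
    print(govs)
    print("ready for benchmarking:", all_performance(govs))
```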
syzygy wrote:
My desktop PC maintains Turbo Boost speed for an indefinite period of time. The main problem is CPU frequency scaling, but this can be overcome on Linux with "cpupower frequency-set -g performance".
On Xeon boxes running Linux, the only approach that has proved reliable for me is to enable Turbo Boost in the BIOS and also to write 100 to the following file:
/sys/devices/system/cpu/intel_pstate/min_perf_pct
Finding decent documentation on Intel p-states was a challenge, though I haven't tried recently.
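For scripting that step, here is a minimal sketch that writes the value and reads it back. The default path is the one quoted above; the helper names are hypothetical, and writing to the real file needs root and a system using the intel_pstate driver.

```python
INTEL_PSTATE_MIN = "/sys/devices/system/cpu/intel_pstate/min_perf_pct"

def set_min_perf_pct(value=100, path=INTEL_PSTATE_MIN):
    """Write a minimum performance percentage (0..100) to the given file."""
    if not 0 <= value <= 100:
        raise ValueError("value must be between 0 and 100")
    with open(path, "w") as f:
        f.write(str(value))

def get_min_perf_pct(path=INTEL_PSTATE_MIN):
    """Read the current minimum performance percentage back as an int."""
    with open(path) as f:
        return int(f.read().strip())

if __name__ == "__main__":
    set_min_perf_pct(100)  # requires root on a real system
    print("min_perf_pct =", get_min_perf_pct())
```

The setting does not survive a reboot, so it has to be reapplied before each benchmarking session.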