Reliable speed comparison: some math required

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Reliable speed comparison: some math required

Post by syzygy »

Kotlov wrote:I also noticed that the first test will be faster than the next one, it probably depends on the temperature of the processor.
Probably Turbo Boost, which only works for a short time.

It can also be exactly the other way around because of CPU frequency scaling (it takes time to ramp up from 1.2 GHz to 4.2 GHz).
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Reliable speed comparison: some math required

Post by lucasart »

syzygy wrote:Testing in parallel is only more noisy
Did you verify this hypothesis of yours with empirical data? I suggest you try…
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Kotlov
Posts: 266
Joined: Fri Jul 10, 2015 9:23 pm
Location: Russia

Re: Reliable speed comparison: some math required

Post by Kotlov »

I use something like this:
Image
For example, this picture is typical for a slight speed improvement.
(third column)
BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: Reliable speed comparison: some math required

Post by BeyondCritics »

Kotlov wrote:I use something like this:
Image
For example, this picture is typical for a slight speed improvement.
(third column)
This looks extremely sophisticated :-)
Can you explain what exactly you are doing here? This would be of interest to other developers!
BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: Reliable speed comparison: some math required

Post by BeyondCritics »

I think the most efficient setup is to first measure the mean and variance of your chosen test metric very carefully for "MASTER", and then use a simple z-test for each "NEW" variant.
As already discussed, you need a dedicated machine for this; otherwise you will be seriously hampered by fluctuating variance.
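
A minimal sketch of that setup in Python, for anyone who wants to try it. The function names are made up for illustration, and it assumes the metric is a speed (higher is better, e.g. nodes per second) and that the noise is roughly normal with the same variance for MASTER and NEW, which may not hold on a busy machine.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def calibrate(master_samples):
    """Estimate mean and standard deviation of the metric for MASTER.

    Hypothetical helper: run MASTER many times and pass the results here.
    """
    return mean(master_samples), stdev(master_samples)


def z_test(master_mean, master_sd, new_samples):
    """One-sided z-test of NEW against the calibrated MASTER statistics.

    Returns (z, p_faster), where p_faster is the normal-approximation
    probability mass below z, i.e. how strongly the data favour NEW
    being faster than MASTER.
    """
    n = len(new_samples)
    z = (mean(new_samples) - master_mean) / (master_sd / sqrt(n))
    return z, NormalDist().cdf(z)
```

With a calibrated mean of 100 and standard deviation of 10, four NEW runs averaging 105 give z = 1, which is weak evidence; more runs or a quieter machine would be needed to call it.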
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Reliable speed comparison: some math required

Post by syzygy »

lucasart wrote:
mar wrote:I typically do something very simple: n runs for each version, pick the fastest one for each, then simply compare.
Not very scientific but works well for me.
In theory, with noisy observations, it's best to choose the median. It's a more robust statistic than the max. By the way, that's what I do in my engine (median of 5 runs).
The only effect noise can have here is to decrease speed; a bench is not going to run faster if the OS interrupts it more often. So max speed corresponds to the least noisy measurement.
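
To see the two summaries side by side, here is a small sketch. The function name and the sample figures are invented for illustration; they are not measurements from this thread.

```python
from statistics import median


def summarize_speeds(nps_runs):
    """Return both candidate summaries for a list of speed measurements.

    'median' is the robust choice discussed above; 'fastest' is the
    maximum speed, i.e. the run that was disturbed the least if noise
    can only slow a run down.
    """
    return {"median": median(nps_runs), "fastest": max(nps_runs)}
```

For hypothetical runs of 100, 90, 98, 60 and 99 (in arbitrary speed units), this gives a median of 98 and a fastest run of 100. The run that dropped to 60 affects neither number.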
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Reliable speed comparison: some math required

Post by AlvaroBegue »

syzygy wrote: The only effect noise can have here is to decrease speed; a bench is not going to run faster if the OS interrupts it more often. So max speed corresponds to the least noisy measurement.
I was about to post that. The only problem with this is Turbo Boost. If you can disable it, that's probably best. If not, you can run your program a few times in a row. The first execution could be benefiting from Turbo Boost, so you can discard its measurement. Then pick the lowest running time of the lot.

That has worked well for me in the past.
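
A rough sketch of that procedure in Python. The helper names and the default of one discarded run are assumptions for illustration, not anything prescribed above.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def best_time(times, warmup=1):
    """Drop the first `warmup` measurements, then return the lowest time."""
    kept = times[warmup:]
    if not kept:
        raise ValueError("need more runs than warmup runs")
    return min(kept)


def bench(cmd, runs=6, warmup=1):
    """Time cmd `runs` times in a row and report the best non-warmup run."""
    return best_time([time_command(cmd) for _ in range(runs)], warmup)
```

Usage would look like `bench(["./engine", "bench"])`, where `./engine bench` stands in for whatever command runs your benchmark.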
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Reliable speed comparison: some math required

Post by syzygy »

AlvaroBegue wrote:
syzygy wrote:The only effect noise can have here is to decrease speed; a bench is not going to run faster if the OS interrupts it more often. So max speed corresponds to the least noisy measurement.
I was about to post that. The only problem with this is Turbo Boost. If you can disable it, that's probably best. If not, you can run your program a few times in a row. The first execution could be benefiting from Turbo Boost, so you can discard its measurement. Then pick the lowest running time of the lot.

That has worked well for me in the past.
The lim sup should do :)

My desktop PC maintains turboboost speed for an indefinite period of time. The main problem is cpu scaling, but this can be overcome on Linux with "cpupower frequency-set -g performance". But running many benches in a row and taking the lim sup should also solve that.

What remains (apart from background processes) is cpu throttling on laptops. If the laptop heats up too much, it will clock down.
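
Before a bench run it can be worth checking that the governor really is set to performance. A small sketch, assuming the usual Linux cpufreq sysfs layout (`cpuN/cpufreq/scaling_governor` under `/sys/devices/system/cpu`); the function names are made up, and systems without cpufreq support will simply return an empty result.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its scaling governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(cpu_root).glob(pattern)):
        # path is .../cpuN/cpufreq/scaling_governor, so two parents up is cpuN
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and all use 'performance'."""
    return bool(governors) and all(
        g == "performance" for g in governors.values()
    )
```

A bench script could call `all_performance(read_governors())` and print a warning if it returns False.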
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Reliable speed comparison: some math required

Post by zullil »

syzygy wrote:
My desktop PC maintains turboboost speed for an indefinite period of time. The main problem is cpu scaling, but this can be overcome on Linux with "cpupower frequency-set -g performance".
On Xeon boxes running Linux, the only approach that has proved reliable for me is to enable turboboost in BIOS but also to write 100 in the following file:

/sys/devices/system/cpu/intel_pstate/min_perf_pct

Finding decent documentation on Intel p-states was a challenge, though I haven't tried recently.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Reliable speed comparison: some math required

Post by syzygy »

lucasart wrote:
syzygy wrote:Testing in parallel is only more noisy
Did you verify this hypothesis of yours with empirical data? I suggest you try…
I took the liberty to create some noise:

Code:

run       base       test     diff
  1    1929540    2652016  +722476
  2    1929540    2657409  +727869
  3    2645305    1925985  -719320
  4    2670988    2639961   -31027
  5    2665540    2669624    +4084
  6    2625376    2666900   +41524
  7    2604446    2604446       +0
  8    2673720    2664181    -9539
  9    2637297    2006573  -630724
 10    1930965    2662824  +731859
 11    2670988    2670988       +0
 12    2672353    2677829    +5476
 13    2668261    2672353    +4092
 14    2660113    2666900    +6787
 15    2670988    2639961   -31027
 16    2444866    2460981   +16115
 17    2574937    2656058   +81121
 18    1926695    2653362  +726667
 19    1921736    2669624  +747888
 20    1926695    2656058  +729363

Result of  20 runs
==================
base (cfish          ) =    2422517  +/- 147234
test (cfish          ) =    2578702  +/- 94082
diff                   =    +156184  +/- 191677

speedup        = +0.0645
P(speedup > 0) =  0.9446

CPU: 6 x Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
Hyperthreading: on
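
For readers who want to recompute the summary lines from the per-run numbers, here is one way it might be done. This is a guess at the method, not the script that produced the output above: it treats the runs as paired, takes the standard error of the per-run differences, and uses a normal approximation for P(speedup > 0).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# (base, test) pairs copied from the 20 runs in the table above.
RUNS = [
    (1929540, 2652016), (1929540, 2657409), (2645305, 1925985),
    (2670988, 2639961), (2665540, 2669624), (2625376, 2666900),
    (2604446, 2604446), (2673720, 2664181), (2637297, 2006573),
    (1930965, 2662824), (2670988, 2670988), (2672353, 2677829),
    (2668261, 2672353), (2660113, 2666900), (2670988, 2639961),
    (2444866, 2460981), (2574937, 2656058), (1926695, 2653362),
    (1921736, 2669624), (1926695, 2656058),
]


def summarize(runs):
    """Return (mean_diff, std_error, speedup, p_positive) for paired runs."""
    base = [b for b, _ in runs]
    diffs = [t - b for b, t in runs]
    mean_diff = mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    std_error = stdev(diffs) / sqrt(len(diffs))
    speedup = mean_diff / mean(base)
    # Normal approximation: probability that the true mean diff is > 0.
    p_positive = NormalDist().cdf(mean_diff / std_error)
    return mean_diff, std_error, speedup, p_positive


mean_diff, std_error, speedup, p_positive = summarize(RUNS)
print(f"diff = {mean_diff:+.0f}  speedup = {speedup:+.4f}  P = {p_positive:.4f}")
```

The mean difference and speedup should match the reported +156184 and +0.0645, and the probability should land close to the reported 0.9446. How the "+/-" margins were derived is not stated in the post, so they are not reproduced here.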