http://www.brendangregg.com/perf.html#CPUstatistics
This could be the final weapon! In particular, look at the table in the above link:
5,649,595,479 cycles
See also https://perf.wiki.kernel.org/index.php/Tutorial
And?

syzygy wrote: I took the liberty to create some noise:

    lucasart wrote: Did you verify this hypothesis of yours with empirical data? I suggest you try…

        syzygy wrote: Testing in parallel is only more noisy

run     base     test     diff
  1  1929540  2652016  +722476
  2  1929540  2657409  +727869
  3  2645305  1925985  -719320
  4  2670988  2639961   -31027
  5  2665540  2669624    +4084
  6  2625376  2666900   +41524
  7  2604446  2604446       +0
  8  2673720  2664181    -9539
  9  2637297  2006573  -630724
 10  1930965  2662824  +731859
 11  2670988  2670988       +0
 12  2672353  2677829    +5476
 13  2668261  2672353    +4092
 14  2660113  2666900    +6787
 15  2670988  2639961   -31027
 16  2444866  2460981   +16115
 17  2574937  2656058   +81121
 18  1926695  2653362  +726667
 19  1921736  2669624  +747888
 20  1926695  2656058  +729363

Result of 20 runs
==================
base (cfish ) = 2422517 +/- 147234
test (cfish ) = 2578702 +/- 94082
diff = +156184 +/- 191677
speedup = +0.0645
P(speedup > 0) = 0.9446

CPU: 6 x Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
Hyperthreading: on
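For readers who want to check the summary lines against the 20 runs, here is a minimal Python sketch. It assumes the summary was computed from paired differences with a normal approximation; the thread does not show the script that produced it, so the function name `summarize` and the choice of sample standard deviation are illustrative assumptions, not the original tool.

```python
import math

# (base, test) nodes/second for the 20 runs quoted in the table.
runs = [
    (1929540, 2652016), (1929540, 2657409), (2645305, 1925985),
    (2670988, 2639961), (2665540, 2669624), (2625376, 2666900),
    (2604446, 2604446), (2673720, 2664181), (2637297, 2006573),
    (1930965, 2662824), (2670988, 2670988), (2672353, 2677829),
    (2668261, 2672353), (2660113, 2666900), (2670988, 2639961),
    (2444866, 2460981), (2574937, 2656058), (1926695, 2653362),
    (1921736, 2669624), (1926695, 2656058),
]

def summarize(pairs):
    """Return (base mean, mean diff, std error of diff, P(diff > 0)).

    Assumption: P(diff > 0) is taken from a normal approximation of the
    mean paired difference; the original script may do this differently.
    """
    n = len(pairs)
    base_mean = sum(b for b, _ in pairs) / n
    diffs = [t - b for b, t in pairs]
    diff_mean = sum(diffs) / n
    # Sample variance of the paired differences (n - 1 in the denominator).
    var = sum((d - diff_mean) ** 2 for d in diffs) / (n - 1)
    std_err = math.sqrt(var / n)
    # Standard normal CDF evaluated at diff_mean / std_err.
    p_faster = 0.5 * (1.0 + math.erf(diff_mean / (std_err * math.sqrt(2.0))))
    return base_mean, diff_mean, std_err, p_faster

base_mean, diff_mean, std_err, p_faster = summarize(runs)
print(f"base mean = {base_mean:.0f}")
print(f"diff mean = {diff_mean:+.0f}")
print(f"speedup   = {diff_mean / base_mean:+.4f}")
print(f"P(speedup > 0) = {p_faster:.4f}")
```

The mean difference and the speedup follow directly from the table. The probability depends on the normal approximation assumed here, so treat any agreement with the quoted 0.9446 as a consistency check rather than proof of how it was computed.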
Same comment as my previous post. This data proves nothing.

syzygy wrote: Without background noise and running just a single bench at a time:

$ taskset -c 4 cfish bench 2>&1 >/dev/null | grep second

Results:

Nodes/second : 2775955
Nodes/second : 2778906
Nodes/second : 2777430
Nodes/second : 2778906
Nodes/second : 2780385
Nodes/second : 2777430
Nodes/second : 2778906
Nodes/second : 2780385
Nodes/second : 2777430
Nodes/second : 2775955
Nodes/second : 2777430
Nodes/second : 2777430

So max speed is 2780385.
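To put a number on how quiet the single-bench runs are, the spread of the twelve results can be worked out directly. This is only an illustrative sketch; `relative_spread` is a hypothetical helper name, not something used in the thread.

```python
# Nodes/second from the twelve pinned, single-process bench runs quoted here.
nps = [2775955, 2778906, 2777430, 2778906, 2780385, 2777430,
       2778906, 2780385, 2777430, 2775955, 2777430, 2777430]

def relative_spread(samples):
    """Return (max - min) / max for a list of speed samples."""
    return (max(samples) - min(samples)) / max(samples)

print("max speed:", max(nps))
print(f"spread: {relative_spread(nps):.4%}")
```

The gap between the fastest and slowest run is 4430 nodes/second, about 0.16% of the maximum, whereas individual runs in the noisy 20-run comparison differ by more than 25%.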
It has been verified MANY times in the past. Even worse, testing in parallel can result in some CPUs throttling back due to heat or power consumption, which adds MORE noise.

lucasart wrote: Did you verify this hypothesis of yours with empirical data? I suggest you try…

    syzygy wrote: Testing in parallel is only more noisy
What's wrong with the simple approach?

mcostalba wrote: Given N noisy measurements of the speed of NEW and N noisy measurements of the speed of MASTER, we want to know:
1. Whether NEW is faster than MASTER with 95% reliability.
2. The maximum speed-up of NEW, defined as (Speed NEW - Speed MASTER) / Speed MASTER, that is guaranteed with 95% reliability.
The second point needs clarification. Suppose NEW is faster than MASTER by 0.5% with a 95% guarantee. Then it is also faster by at least 0.4%, or by any smaller amount, with at least that reliability. Conversely, it will not be faster by 0.8% with 95% reliability, perhaps only with 30% reliability. We want to find the maximum speed-up (in this case 0.5%) that holds with 95% reliability.
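One way to read the two questions is as a one-sided lower confidence bound on the relative speed-up: NEW is faster with 95% reliability if the bound is above zero, and the bound itself is the largest speed-up guaranteed at that level. The Python sketch below assumes the runs are paired (run i of NEW against run i of MASTER) and that the mean of the per-pair ratios is roughly normal. Neither assumption comes from the thread, and the function name and the NEW sample data are made up for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def speedup_lower_bound(new, master, confidence=0.95):
    """One-sided lower bound on the mean of (new - master) / master.

    Assumes paired runs and a normal approximation of the mean ratio.
    """
    if len(new) != len(master) or len(new) < 2:
        raise ValueError("need two equally long samples with at least 2 runs")
    ratios = [(n - m) / m for n, m in zip(new, master)]
    std_err = stdev(ratios) / math.sqrt(len(ratios))
    z = NormalDist().inv_cdf(confidence)  # about 1.645 for 95%
    return mean(ratios) - z * std_err

# MASTER values are taken from the single-bench results in this thread;
# the NEW values are hypothetical, invented only to exercise the function.
master = [2775955, 2778906, 2777430, 2778906, 2780385, 2777430]
new = [2790100, 2792500, 2791000, 2793300, 2794000, 2791800]

bound = speedup_lower_bound(new, master)
print(f"speed-up guaranteed at 95%: {bound:.4%}")
print("NEW faster with 95% reliability:", bound > 0)
```

With few runs, a Student t quantile would give a more conservative bound than the normal quantile used here.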
for (int i = 0; i < 110; i++) {
    v[i] -= v[i + 1];
    if ((i & 1) == 0)
        v[i] = -v[i];
}
sudo perf stat -r 5 -a -B -e cycles:u ./stockfish bench > /dev/null
Performance counter stats for 'system wide' (5 runs):
11.694.771.724 cycles:u ( +- 0,08% )
4,370493245 seconds time elapsed ( +- 0,23% )
I have googled it, but it appears you cannot disable virtual memory in Linux. What kernel did you run?

bob wrote: ...
If you want to go as far as I did, run a lightweight kernel. No virtual memory or anything there, rock-solid repeatability.
Certainly isn't math. It takes a little thought and preparation to get good benchmarking results...

Jouni wrote: I think the problem is not math, but too many things to consider: Linux/Windows, AMD/Intel, popcount, bmi, 64 bit single/multi etc!?