Strange sporadic speed limitation in engine running in Linux on Ryzen


RubiChess
Posts: 584
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: Strange sporadic speed limitation in engine running in Linux on Ryzen

Post by RubiChess »

Deberger wrote: Sun Mar 08, 2020 9:54 pm Apparently I misunderstood the error you are reporting.

A copy of a binary, executed on the same machine, executed in the same way, has consistently differing results?

I would compare the file sizes and file ownerships and file permissions and md5sums.

If everything is the same I would backup any valuable data and check the file system with fsck.
I haven't checked the md5sums, but a reboot cures the problem, so I don't believe the disk or the file is at fault.
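For reference, the checks Deberger suggests can be sketched in a few lines of shell. This is only a minimal sketch: the helper name compare_binaries is made up for illustration, and it assumes GNU coreutils (md5sum) and cmp are installed.

```shell
#!/bin/sh
# Hypothetical helper: compare two copies of a binary by metadata and content.
compare_binaries() {
    # Size, owner and permissions side by side.
    ls -l "$1" "$2"
    # Checksums; identical files give identical sums.
    md5sum "$1" "$2"
    # Byte-for-byte comparison; cmp -s is silent and only sets the exit status.
    if cmp -s "$1" "$2"; then
        echo "identical content"
    else
        echo "content differs"
    fi
}
```

It would be called as compare_binaries slow-binary copy-of-binary. Identical content would point away from the file itself and toward how it is loaded or cached.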
RubiChess
Posts: 584
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: Strange sporadic speed limitation in engine running in Linux on Ryzen

Post by RubiChess »

bob wrote: Sun Mar 08, 2020 9:41 pm Very first thing. Run "top" and keep it active. See if, during the match, the cpu utilization jumps up due to something in your linux distro. I had this happen to me years ago in Suse, which I always considered to be overloaded/bloated anyway. If you don't want to watch top, you might try this:

#!/bin/csh
# Once a minute, append a timestamp and the top of the process list to logfile.
while (1)
    date >> logfile
    ps -r | head -n 10 >> logfile
    sleep 60
end

When you find a strange slowdown, look in the log file for that time-frame and see if something unexpected is going on.
I watched top running interactively in a second terminal once and couldn't see "anything special".
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Strange sporadic speed limitation in engine running in Linux on Ryzen

Post by bob »

Don't know about your processor, but I have had to ALWAYS turn off "turbo-boost" on intel processors. This lets all cores run at the max rated (non-boosted) clock speed so that processors won't change speed during a test. Every high-performance machine we run at UAB has had this disabled.
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Strange sporadic speed limitation in engine running in Linux on Ryzen

Post by mvanthoor »

bob wrote: Tue Mar 10, 2020 6:57 pm Don't know about your processor, but I have had to ALWAYS turn off "turbo-boost" on intel processors. This lets all cores run at the max rated (non-boosted) clock speed so that processors won't change speed during a test. Every high-performance machine we run at UAB has had this disabled.
Intel actually has two speed modes: EIST (Speedstep) and Turbo Boost.

On my CPU, Speedstep changes the multiplier from 8 to 40, which makes the CPU run at anything from 800 MHz up to 4000 MHz. If I disable this, the CPU always runs at 4 GHz.

Turbo Boost, by default, boosts a single core to 4.2 GHz, if the other cores are lower than a certain load.

My mainboard has several options for handling this kind of speed change:

- Off: Never boost anything.
- Default: Boost only one core if there's one thread that requires a lot of performance, and the other cores are only lightly loaded.
- Multi: Boost 1-4 cores up to 4.2 GHz if needed.

When disabling Speedstep and enabling Multi for Turbo Boost, the CPU will run all of its cores at 4.2 GHz.

Even if you disable Turbo Boost, it's not a guarantee that an Intel CPU will never change speeds if you keep EIST/Speedstep enabled.

Personally (under Windows) I never had a problem with this. The mainboard is set to have EIST/Speedstep enabled, and Turbo Boost is set to Multi. If I run a chess engine at 1-4 threads, then 1-4 cores will boost to 4.2 GHz. If I don't run anything, the CPU runs at 800 MHz. To be honest, I've never tested this under Linux, because my Linux usage has always been for embedded systems.
Author of Rustic, an engine written in Rust.
Alayan
Posts: 550
Joined: Tue Nov 19, 2019 8:48 pm
Full name: Alayan Feh

Re: Strange sporadic speed limitation in engine running in Linux on Ryzen

Post by Alayan »

Variable turbo speed means more noise in the results. The increased throughput of turbo isn't worth it compared to a high fixed clock when it comes to engine testing.
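On Linux, whether boost is active can usually be read from sysfs before starting a test run. The following is a rough sketch, assuming the common cpufreq paths; which of these files exist depends on the kernel and the frequency driver, so the helper (the name show_boost_state is made up) simply skips any that are missing.

```shell
#!/bin/sh
# Hypothetical helper: print the boost and governor state if the files exist.
show_boost_state() {
    for f in /sys/devices/system/cpu/intel_pstate/no_turbo \
             /sys/devices/system/cpu/cpufreq/boost \
             /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor \
             /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq; do
        # no_turbo: 1 means turbo is disabled (intel_pstate driver).
        # boost:    0 means boost is disabled (e.g. acpi-cpufreq).
        # scaling_cur_freq is reported in kHz.
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$f")"
        fi
    done
    return 0
}

show_boost_state
```

Changing these values needs root, for example by writing 0 to the boost file where it exists. That part is left out here because the right knob differs between drivers.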
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Strange sporadic speed limitation in engine running in Linux on Ryzen

Post by bob »

Two things here.

(1) if I am testing, I am going to do my best to make the entire test run at the same constant CPU speed. No turbo/speedstep whatsoever.

(2) in a game, I would use whatever provides the best overall average speed. With a chess engine, running multiple cores on a single chip, most likely it is going to settle in to no turbo/speedstep anyway, since all cores will be 100% busy, keeping temps high. But they can and will fiddle up and down, which makes testing include a little random noise.

The biggest problem I have had is trying to measure parallel speedup. Run using 1 core, then 2, then 4... As you ramp up above one, you start to see speed degradation as core speeds are throttled back. Without your knowing it. So maybe your raw NPS scales 360% at 4 cores and you start trying to debug to see what locks or cache invalidation traffic is causing the problem. Answer can be "none of the above".

So this is really much more important when testing speeds and efficiency. Less important when running lots of games in parallel, knowing there is some random noise tossed in. Unimportant if all you do is play games, one at a time, and want optimal performance with lots of random noise included.
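To make the scaling arithmetic concrete, here is a small sketch that turns two NPS figures into a speedup and a per-core efficiency. The helper name and the NPS values are invented for illustration; only the 360% at 4 cores figure comes from the post above.

```shell
#!/bin/sh
# Hypothetical helper: speedup BASE_NPS MULTI_NPS THREADS
# Prints the NPS speedup over the 1-thread run and the efficiency per thread.
speedup() {
    awk -v base="$1" -v nps="$2" -v n="$3" 'BEGIN {
        s = nps / base
        printf "speedup=%.2f efficiency=%.0f%%\n", s, 100 * s / n
    }'
}

# Made-up numbers: 1.0 Mnps on 1 thread, 3.6 Mnps on 4 threads.
speedup 1000000 3600000 4
```

If the 1-thread run was boosted and the 4-thread run was not, part of the missing 10% is simply the clock ratio between the two runs, not locking or cache traffic.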
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Strange sporadic speed limitation in engine running in Linux on Ryzen

Post by syzygy »

RubiChess wrote: Sat Mar 07, 2020 5:26 pm It isn't a bad compilation, because if I copy the slow binary and run the copy, speed is fine. So it seems that "something in the Linux system" slows down the binary. I tried "lsof" to see if there is a process running with a handle on the slow binary but without success.

Last time it happened, a reboot cured the slowness. This time the system is still running and waiting for your ideas to analyse the problem...
Does the problem go away after "echo 3 > /proc/sys/vm/drop_caches" as root?
RubiChess
Posts: 584
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: Strange sporadic speed limitation in engine running in Linux on Ryzen

Post by RubiChess »

syzygy wrote: Thu Mar 12, 2020 9:04 pm
RubiChess wrote: Sat Mar 07, 2020 5:26 pm It isn't a bad compilation, because if I copy the slow binary and run the copy, speed is fine. So it seems that "something in the Linux system" slows down the binary. I tried "lsof" to see if there is a process running with a handle on the slow binary but without success.

Last time it happened, a reboot cured the slowness. This time the system is still running and waiting for your ideas to analyse the problem...
Does the problem go away after "echo 3 > /proc/sys/vm/drop_caches" as root?
Thanks for helping.
I just had another test with a slow binary and indeed this "echo 3 > /proc/sys/vm/drop_caches" seems to (almost) cure the problem:

$ ./5e75296cceb27470d0e986e8d886688a78887064 -bench -depth 23 > /dev/null 

Overall:                                                       30.567137 sec.   62572010 nodes    2047035 nps

$ cp 5e75296cceb27470d0e986e8d886688a78887064 5e-copy
$ ./5e-copy -bench -depth 23 > /dev/null 

Overall:                                                       29.015491 sec.   62572010 nodes    2156503 nps

$ ./5e75296cceb27470d0e986e8d886688a78887064 -bench -depth 23 > /dev/null 

Overall:                                                       30.556252 sec.   62572010 nodes    2047764 nps

$ echo 3 > /proc/sys/vm/drop_caches
$ ./5e75296cceb27470d0e986e8d886688a78887064 -bench -depth 23 > /dev/null 

Overall:                                                       28.989655 sec.   62572010 nodes    2158425 nps

$ ./5e-copy -bench -depth 23 > /dev/null 

Overall:                                                       28.822634 sec.   62572010 nodes    2170933 nps

So the slow original binary went from 2.04 mnps to 2.15 mnps after this "drop_caches", while the fast copy went from 2.15 mnps to an even faster 2.17 mnps (which may be within the normal error margin).

So the next question for the Linux experts is: Is there any global setting that prevents this caching from biasing the speed of a binary?
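The relative size of the effect can be worked out from the nps figures in the bench output above. A minimal sketch, with a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: pct_change OLD_NPS NEW_NPS
# Prints the relative change from OLD to NEW as a signed percentage.
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN {
        printf "%+.1f%%\n", 100 * (new - old) / old
    }'
}

# Original binary, before and after drop_caches (values from the runs above).
pct_change 2047764 2158425
# Copy, before and after drop_caches.
pct_change 2156503 2170933
```

The original binary gains about 5.4%, while the copy moves by about 0.7%, which is the part that may be within run-to-run noise.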
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Strange sporadic speed limitation in engine running in Linux on Ryzen

Post by syzygy »

RubiChess wrote: Wed Mar 18, 2020 1:54 pm So the slow original binary went from 2.04 mnps to 2.15 mnps after this "drop_caches", while the fast copy went from 2.15 mnps to an even faster 2.17 mnps (which may be within the normal error margin).
My explanation for what you are seeing is that the binary is sometimes loaded into memory in a way that gives poor L1 caching behaviour.

This may happen when too many important memory cache lines are at a distance that is a multiple of some power of 2, so that they are all mapped to the same few L1 cache lines.
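A small worked example of that mapping, as a sketch only: it assumes a 32 KiB, 8-way L1 data cache with 64-byte lines, which gives 64 sets. That geometry is an assumption and should be checked against the actual CPU. The helper name and the addresses are invented.

```shell
#!/bin/sh
# Assumed geometry: 32 KiB / (8 ways * 64-byte lines) = 64 sets.
# Hypothetical helper: l1_set ADDRESS prints the set index for that address.
l1_set() {
    echo $(( ($1 / 64) % 64 ))
}

# Addresses 4096 bytes apart (64 sets * 64 bytes) all land in the same set,
# so more than 8 such hot lines cannot stay in L1 at the same time.
for addr in 0x10000 0x11000 0x12000 0x13000; do
    printf '%s -> set %s\n' "$addr" "$(l1_set "$addr")"
done
# A neighbouring line maps to the next set.
printf '%s -> set %s\n' 0x10040 "$(l1_set 0x10040)"
```

With this geometry the four page-spaced addresses all map to set 0, while 0x10040 maps to set 1.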
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: Strange sporadic speed limitation in engine running in Linux on Ryzen

Post by Sesse »

syzygy's explanation makes sense, and it's easy to see if it holds true or not. Run the binary with perf stat -d, and observe the ratio of L1 dcache misses to total accesses. If it's much higher in the slow runs, it's likely that you're seeing an L1 cache aliasing effect (bank conflicts).
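The ratio can also be computed from perf's CSV output instead of reading it off by eye. This is a hedged sketch: it assumes perf stat is run with -x, and the events L1-dcache-loads and L1-dcache-load-misses, and that the CSV puts the count in the first field and the event name in the third. The helper name is made up.

```shell
#!/bin/sh
# Hypothetical helper: reads "perf stat -x," CSV lines on stdin and prints
# the L1 dcache miss ratio. Assumed layout: count,unit,event,...
l1_miss_ratio() {
    awk -F, '
        $3 ~ /^L1-dcache-loads/       { loads = $1 }
        $3 ~ /^L1-dcache-load-misses/ { misses = $1 }
        END {
            if (loads > 0)
                printf "L1 dcache miss ratio: %.2f%%\n", 100 * misses / loads
            else
                print "no L1-dcache-loads count found"
        }'
}

# Intended use (perf writes its statistics to stderr):
#   perf stat -x, -e L1-dcache-loads,L1-dcache-load-misses \
#       ./engine -bench 2>&1 >/dev/null | l1_miss_ratio
```

Running this once against the slow binary and once against the copy gives two percentages to compare directly.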