I don't have my machine yet (I could probably test on the one I bought for my youngest son, but then I would have to set up the whole compiler infrastructure, and it's just a Windows box), but I suspect that most binaries for the new AMD 3xxx CPUs are being compiled badly, though not intentionally.
I guess that most people are compiling with -march=native which is possibly a bad idea for these CPUs. I noticed, for instance, on the Phoronix tests, the Stockfish performance dropped (significantly) when the -march=native switch was chosen. My theory of the problem goes as follows:
The GCC compiler probably does a simple inquiry on the native machine to understand the available instruction set and uses it if possible. In the example of AMD 3xxx, this is probably a bad idea because the AVX BMI compiles are not as fast as the SSE3 compiles. So I suspect that with any third generation AMD CPU, the right choice for architecture is:
-march=znver2
If anyone has one set up already, I would be interested to know if my hunch is correct.
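To make the "simple inquiry" idea concrete, here is a minimal sketch of asking the running CPU which instruction sets it reports. It uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`; the helper name `detect_cpu_features` is made up for illustration, and this is not a claim about how GCC implements -march=native internally. Note that such a query only says whether an instruction set is supported, not whether it is fast.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (feature name, supported?) pairs for a few
// instruction sets relevant to this thread. On non-x86 targets or unknown
// compilers every entry is reported as false.
std::vector<std::pair<std::string, bool>> detect_cpu_features() {
    std::vector<std::pair<std::string, bool>> features;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // initialise the CPU feature data before querying
    // The argument must be a string literal, so each query is spelled out.
    features.emplace_back("sse3",   __builtin_cpu_supports("sse3")   != 0);
    features.emplace_back("popcnt", __builtin_cpu_supports("popcnt") != 0);
    features.emplace_back("avx2",   __builtin_cpu_supports("avx2")   != 0);
    features.emplace_back("bmi2",   __builtin_cpu_supports("bmi2")   != 0);
#else
    features.emplace_back("sse3",   false);
    features.emplace_back("popcnt", false);
    features.emplace_back("avx2",   false);
    features.emplace_back("bmi2",   false);
#endif
    return features;
}
```

Printing the pairs on a given machine shows what a "supported" answer looks like; it carries no information about instruction latency.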
An AMD compiling hunch
Moderators: hgm, Rebel, chrisw
-
- Posts: 12545
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
An AMD compiling hunch
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 433
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: An AMD compiling hunch
You can check what options are activated with the following commands:
Code:
gcc -march=native -Q --help=target
gcc -march=znver2 -Q --help=target
On my computer, (a first generation ryzen cpu), native and znver1 are identical. And the executable is significantly faster than with -march=generic.
Richard Delorme
-
- Posts: 300
- Joined: Mon Apr 30, 2018 11:51 pm
Re: An AMD compiling hunch
It isn't really GCC's fault. Stockfish simply goes and asks the compiler (well, preprocessor) “does the CPU I'm compiling for support BMI2?”, and if so, sends BMI2 code to the compiler. GCC has no say in the matter once it has given a truthful answer. (Not that there's a good way to ask “does the CPU I'm compiling for have _fast_ BMI2?”…)
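The pattern described above can be sketched roughly as follows. This assumes GCC's predefined `__BMI2__` macro as the "does the target support BMI2?" question; Stockfish itself may key off its own build macro, and the function names here are hypothetical. The bit loop is only a stand-in fallback for illustration, not Stockfish's actual software path.

```cpp
#include <cstdint>
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>  // _pext_u64
#endif

// Portable software PEXT: gathers the bits of `value` selected by `mask`
// into the low bits of the result, preserving their order.
inline std::uint64_t pext_software(std::uint64_t value, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        std::uint64_t lowest = mask & (~mask + 1);  // isolate lowest set bit
        if (value & lowest) result |= out_bit;
        mask &= mask - 1;                           // clear lowest set bit
    }
    return result;
}

// The choice is made at compile time from what the target claims to support.
// Nothing here can tell a fast hardware PEXT from a slow one.
inline std::uint64_t pext(std::uint64_t value, std::uint64_t mask) {
#if defined(__BMI2__) && defined(__x86_64__)
    return _pext_u64(value, mask);      // hardware instruction via intrinsic
#else
    return pext_software(value, mask);  // fallback when BMI2 is not enabled
#endif
}
```

With -march=native on a CPU that reports BMI2, the first branch is compiled in regardless of how quickly that CPU executes the instruction, which is the situation the thread is discussing.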
-
- Posts: 4569
- Joined: Sun Mar 12, 2006 2:40 am
Re: An AMD compiling hunch
Isn't compiling with -modern instead of -BMI2 the simplest option, for Stockfish I mean? You avoid the bad PEXT/PDEP instructions that Gian-Carlo mentions in another post here. They are implemented (in Zen I suppose, not in all recent AMD processors?), but Stockfish has a better solution doing it in software, and you don't confuse the compiler. It has worked so far with other AMD processors.
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
-
- Posts: 12545
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: An AMD compiling hunch
Yes, the modern flag makes the fastest code, apparently.
I suspect that the CPU does have other capabilities that native would enable.
Too bad there is a phony-baloney PEXT in the mix.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 61
- Joined: Wed Feb 19, 2014 10:11 pm
Re: An AMD compiling hunch
Are we also going to call instructions like XLAT "phony baloney" because they perform poorly (even on Intel)?
The idea that GCC doesn't have a say is wrong. The goal of the compiler should be to produce the fastest binary given the information it has. If the information spawns from the "native" switch, then it should be doing more than just asking what instruction sets are supported, and the fact that it doesn't is proof that it's not a very good optimizing compiler, which is why asmfish beats it significantly.
-
- Posts: 2564
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: An AMD compiling hunch
DustyMonkey wrote: ↑Thu Dec 12, 2019 5:54 am
The idea that GCC doesn't have a say is wrong. The goal of the compiler should be to produce the fastest binary given the information it has. If the information spawns from the "native" switch, then it should be doing more than just asking what instruction sets are supported, and the fact that it doesn't is proof that it's not a very good optimizing compiler, which is why asmfish beats it significantly.
if you tell the compiler to use pext via an intrinsic, then it will. that's what intrinsics are for
gcc is still the best optimizing compiler out there on average, whether you like it or not
the fact that hand-optimized assembly performs better shouldn't be surprising (how much faster is asmfish really, 30%?),
but the cost comes with a lot of extra effort involved (and always lagging behind latest SF patches)
considering that even the difference among the best optimizing compilers themselves can be around 5-10%, this is not a bad result at all
-
- Posts: 12545
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: An AMD compiling hunch
I expect at some time AMD will put in real PEXT type BMI instructions.
And I can control which things to include or exclude manually for now.
No big deal either way.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 172
- Joined: Thu May 27, 2010 3:32 am
Re: An AMD compiling hunch
Sorry to ask a dumb question, but how could I compile Stockfish with this flag? I only know the typical ways to do -bmi2 and -modern.
-
- Posts: 12545
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: An AMD compiling hunch
-bmi2 uses pext
-modern only requires SSE instructions
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.