
Re: Ryzen 2 and BMI2?

Posted: Tue May 15, 2018 3:27 pm
by Sesse
I wanted to use PEXT for a branchless UTF-8 parser, but unfortunately the instruction is too slow for it to be a win over straight-up code. (I know others have tried and come to the same conclusion.)

Re: Ryzen 2 and BMI2?

Posted: Tue May 15, 2018 4:10 pm
by Gian-Carlo Pascutto
syzygy wrote: Tue May 15, 2018 9:15 am Please make Sjeng use PEXT ;-)
There's no way to have the instruction inferred from pure C code, is there? That would make it annoying to use in a portable benchmark.

Re: Ryzen 2 and BMI2?

Posted: Tue May 15, 2018 5:43 pm
by Sesse
No, you'd have to use an intrinsic or inline assembler. The former is fairly portable across compilers; at least MSVC, GCC, Clang and ICC all tend to support the Intel intrinsic style (_pext_u64 in this case) with some coaxing.
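
A minimal sketch of how the intrinsic route might be wrapped so the same source still builds where BMI2 is not available. The wrapper name `extract_bits` and the choice of guard macros are assumptions for illustration, not something from this thread; `__BMI2__` is what GCC and Clang define under `-mbmi2`, and MSVC would need a different check.

```c
#include <stdint.h>

#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>   /* declares _pext_u64 */
#define HAVE_HW_PEXT 1
#else
#define HAVE_HW_PEXT 0
#endif

/* Hypothetical wrapper: gathers the bits of src selected by mask into
 * the low bits of the result, like the PEXT instruction does. */
static inline uint64_t extract_bits(uint64_t src, uint64_t mask)
{
#if HAVE_HW_PEXT
    return _pext_u64(src, mask);
#else
    /* Plain C fallback: one loop iteration per set bit in mask. */
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set mask bit */
        if (src & lowest)
            result |= bit;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return result;
#endif
}
```

With this, `extract_bits(0x12345678, 0xFF00FF00)` picks out the bytes `0x56` and `0x12` and packs them together as `0x1256`, whichever branch was compiled.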

Re: Ryzen 2 and BMI2?

Posted: Tue May 15, 2018 7:52 pm
by Joost Buijs
Gian-Carlo Pascutto wrote: Tue May 15, 2018 9:04 am Ryzen 1 was really fast at BMI2, it was just slow at a single instruction, i.e. PEXT.

I wouldn't expect this to change. Nothing uses PEXT, aside from some chess engine movegens.
PEXT and its counterpart PDEP are both incredibly slow on AMD Zen hardware because AMD was lazy and implemented these instructions in microcode instead of in logic.

On Intel processors you can make very good use of PEXT in your evaluation function, for instance to index pawn patterns (or any other pattern) very quickly. In the pawn evaluator I'm currently working on I use PEXT throughout; with PEXT it runs about twice as fast as what I can get without it. Unfortunately this doesn't work on AMD processors; on AMD, the old-fashioned way of calculating indices is the only solution.
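
To illustrate the idea of indexing a pattern with PEXT, here is a rough sketch and not the evaluator described above. The square numbering (a1 = bit 0, h8 = bit 63), the six-square mask and the names `pext_soft`, `PATTERN_MASK` and `pattern_index` are all assumptions made up for the example. A software loop stands in for the instruction so the snippet compiles anywhere; on hardware with fast PEXT one would call `_pext_u64` instead.

```c
#include <stdint.h>

/* Software stand-in for PEXT (assumption: used here only so the
 * example builds without BMI2). */
static uint64_t pext_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        if (src & mask & (~mask + 1))   /* test lowest set mask bit */
            result |= bit;
        mask &= mask - 1;               /* drop that mask bit */
    }
    return result;
}

/* Hypothetical pattern: squares c2, d2, e2, c3, d3, e3
 * (bits 10, 11, 12, 18, 19, 20 with a1 = bit 0). */
#define PATTERN_MASK  0x1C1C00ULL
#define PATTERN_COUNT 64            /* 2^6 possible pawn placements */

/* Maps the pawns on the six masked squares to a dense index
 * in the range 0 .. PATTERN_COUNT - 1, ignoring all other squares. */
static unsigned pattern_index(uint64_t pawns)
{
    return (unsigned)pext_soft(pawns, PATTERN_MASK);
}
```

The index can then address a precomputed table of `PATTERN_COUNT` scores directly. Pawns on d2 and e3 give index 34 (bit 1 plus bit 5), and a pawn outside the mask, such as one on a2, does not change it.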

I'm pretty sure that AMD didn't fix this for Zen+ either. Maybe they will fix it next year when Zen 2 arrives, who knows? Until this is fixed I won't consider buying an AMD processor because it is unusable for the things I want to do; I'd rather wait for Intel Cascade Lake, which arrives by the end of the year.

Re: Ryzen 2 and BMI2?

Posted: Tue May 15, 2018 10:29 pm
by Gian-Carlo Pascutto
Sesse wrote: Tue May 15, 2018 5:43 pm No, you'd have to use an intrinsic or inline assembler.
Right, but that's not doable in a benchmark that also has to run on ARM and Power etc. and has to be "fair", i.e. what Roland was referring to.

The versions in SPEC don't even use BSF/LZCNT/POPCNT for the same reasons. Although it wouldn't surprise me if Intel C++ generates them anyway, as long as you use the SPEC sources. :D

Re: Ryzen 2 and BMI2?

Posted: Tue May 15, 2018 11:58 pm
by Sesse
Obviously an Intel-specific instruction will not be applicable to PowerPC, indeed.

FWIW, bsf maps fairly well to the ffs() call in POSIX.
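
A small sketch of that mapping, assuming a POSIX system where `ffs()` is declared in `<strings.h>`. The helper name `lowest_set_bit` is made up for the example. The main differences from BSF are that `ffs()` counts from 1 and returns 0 for a zero argument, and that it takes an `int`, so a 64-bit bitboard would need a wider variant or two calls.

```c
#include <strings.h>   /* POSIX ffs() */

/* Hypothetical helper: zero-based index of the lowest set bit,
 * or -1 when no bit is set. BSF leaves its destination undefined
 * for a zero input, whereas ffs() defines that case. */
static int lowest_set_bit(int x)
{
    return ffs(x) - 1;   /* ffs() is 1-based and returns 0 for x == 0 */
}
```

For example, `lowest_set_bit(0x28)` yields 3, since bit 3 is the lowest bit set in binary 101000.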

Re: Ryzen 2 and BMI2?

Posted: Wed May 16, 2018 1:08 am
by syzygy
Gian-Carlo Pascutto wrote: Tue May 15, 2018 10:29 pm
Sesse wrote: Tue May 15, 2018 5:43 pm No, you'd have to use an intrinsic or inline assembler.
Right, but that's not doable in a benchmark that also has to run on ARM and Power etc. and has to be "fair", i.e. what Roland was referring to.

The versions in SPEC don't even use BSF/LZCNT/POPCNT for the same reasons. Although it wouldn't surprise me if Intel C++ generates them anyway, as long as you use the SPEC sources. :D
Did Crafty as included in SPEC CPU2000 not use any of those on platforms where they were available? (Probably not, I guess...)

Apparently SPEC CPU2017 not only includes Deep Sjeng but also Leela. Nice :)

Re: Ryzen 2 and BMI2?

Posted: Wed May 16, 2018 2:35 pm
by Gian-Carlo Pascutto
syzygy wrote: Wed May 16, 2018 1:08 am Did Crafty as included in SPEC CPU2000 not use any of those on platforms where they were available? (Probably not, I guess...)
You'd have to ask Bob (I don't have a SPEC2000 license), but it's hard to imagine that non-Intel and non-AMD SPEC members wouldn't object to that. Using generic intrinsics like those of GCC (__builtin_ffs or whatever it's called) doesn't work either, because the code needs to be compilable by pretty much every age-old proprietary compiler out there too.
Apparently SPEC CPU2017 not only includes Deep Sjeng but also Leela. Nice :)
Yeah. I regret that SPEC CPU2017 allows parallelism in "speed" benchmarks though (even if only xz uses it). But it's an impossible situation given increasing core counts and how boost speeds influence these benchmarks.

Re: Ryzen 2 and BMI2?

Posted: Mon May 18, 2020 9:59 am
by yurikvelo
Still true for Zen 2.
BMI2 compiles are much slower than POPCNT compiles.

Re: Ryzen 2 and BMI2?

Posted: Mon May 18, 2020 11:27 am
by Joost Buijs
yurikvelo wrote: Mon May 18, 2020 9:59 am Still true for Zen 2.
BMI2 compiles are much slower than POPCNT compiles.
I'm still waiting for the Intel 10980XE that I want to use for a new workstation; in the meantime I bought an AMD 3970X because I didn't want to wait any longer. It's a nice processor as long as you don't overclock it with Precision Boost (otherwise it runs extremely hot), but PEXT and PDEP are unusable, maybe even worse than on Zen 1. I tried to emulate PEXT in software and that runs faster than the native CPU instruction. AVX2 on the AMD is slow too, and it lacks AVX-512.

Maybe scatter-gather is not so important for a chess engine, but there are other applications in which it is very useful.

I will keep the AMD 3970X for bulk applications, but as soon as the 10980XE is readily available I will use that one for a new workstation.