Ryzen 2 and BMI2?

Sesse
Posts: 204
Joined: Mon Apr 30, 2018 9:51 pm

Re: Ryzen 2 and BMI2?

Post by Sesse » Tue May 15, 2018 1:27 pm

I wanted to use PEXT for a branchless UTF-8 parser, but unfortunately the instruction is too slow for it to be a win over straightforward code. (I know others have tried and come to the same conclusion.)

Gian-Carlo Pascutto
Posts: 1196
Joined: Sat Dec 13, 2008 6:00 pm

Re: Ryzen 2 and BMI2?

Post by Gian-Carlo Pascutto » Tue May 15, 2018 2:10 pm

syzygy wrote:
Tue May 15, 2018 7:15 am
Please make Sjeng use PEXT ;-)
There's no way to have the instruction inferred from pure C code, is there? That would make it annoying to use in a portable benchmark.

Sesse
Posts: 204
Joined: Mon Apr 30, 2018 9:51 pm

Re: Ryzen 2 and BMI2?

Post by Sesse » Tue May 15, 2018 3:43 pm

No, you'd have to use an intrinsic or inline assembler. The former is fairly portable across compilers; at least MSVC, GCC, Clang and ICC all tend to support the Intel intrinsic style (_pext_u64 in this case) with some coaxing.
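As a rough illustration of the intrinsic route, the sketch below wraps _pext_u64 behind a compile-time check and falls back to a plain bit loop everywhere else. The wrapper name extract_bits is made up for this example, and the __BMI2__ and __x86_64__ macros are the ones GCC and Clang define; MSVC would likely need a different check.

```c
#include <stdint.h>

#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>   /* declares _pext_u64 */
#endif

/* Hypothetical wrapper name; not taken from any engine. */
static inline uint64_t extract_bits(uint64_t src, uint64_t mask)
{
#if defined(__BMI2__) && defined(__x86_64__)
    return _pext_u64(src, mask);              /* compiles to one PEXT */
#else
    /* Portable fallback: gather the bits of src selected by mask,
       packing them into the low bits of the result, low to high. */
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* lowest set bit of mask */
        if (src & lowest)
            out |= bit;
        mask &= mask - 1;                     /* clear that mask bit */
    }
    return out;
#endif
}
```

Built with -mbmi2 on GCC or Clang this should use the instruction; without the flag, or on ARM and Power, the same source still compiles and takes the loop.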

Joost Buijs
Posts: 1056
Joined: Thu Jul 16, 2009 8:47 am
Location: Almere, The Netherlands

Re: Ryzen 2 and BMI2?

Post by Joost Buijs » Tue May 15, 2018 5:52 pm

Gian-Carlo Pascutto wrote:
Tue May 15, 2018 7:04 am
Ryzen 1 was really fast at BMI2, it was just slow at a single instruction, i.e. PEXT.

I wouldn't expect this to change. Nothing uses PEXT, aside from some chess engine movegens.
PEXT and its counterpart PDEP are both incredibly slow on AMD Zen hardware because AMD was lazy and implemented these instructions in microcode instead of dedicated logic.

On Intel processors you can make very good use of PEXT in your evaluation function, for instance to index pawn patterns (or any other pattern) very quickly. The pawn evaluator I'm currently working on uses PEXT throughout, and it runs about twice as fast as what I can get without it. Unfortunately this doesn't work on AMD processors; there, the old way of calculating indices is the only solution.
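To make the indexing idea concrete, here is a small sketch (not the evaluator described above) that turns six king-shield squares into a 6-bit table index in two ways: with a PEXT-style extraction and with hand-written shifts. The square numbering (a1 = 0 ... h8 = 63), the choice of squares and all function names are assumptions for illustration; pext_ref is a slow stand-in with the same semantics as the instruction.

```c
#include <stdint.h>

/* Assumed square numbering: a1 = 0, b1 = 1, ..., h8 = 63. */
#define SHIELD_MASK 0x0000000000E0E000ULL   /* f2 g2 h2 f3 g3 h3 */

/* Reference implementation of PEXT semantics; the real instruction
   (or _pext_u64) would replace this on hardware where it is fast. */
static uint64_t pext_ref(uint64_t src, uint64_t mask)
{
    uint64_t out = 0;
    int k = 0;                               /* next output bit position */
    for (int sq = 0; sq < 64; sq++) {
        if ((mask >> sq) & 1) {
            out |= ((src >> sq) & 1) << k;
            k++;
        }
    }
    return out;
}

/* PEXT-style index: one extraction, however scattered the mask is. */
static unsigned shield_index_pext(uint64_t pawns)
{
    return (unsigned)pext_ref(pawns, SHIELD_MASK);
}

/* Hand-written index: one shift-and-mask per contiguous run of squares. */
static unsigned shield_index_manual(uint64_t pawns)
{
    unsigned rank2 = (unsigned)(pawns >> 13) & 7u;  /* f2 g2 h2 -> bits 0..2 */
    unsigned rank3 = (unsigned)(pawns >> 21) & 7u;  /* f3 g3 h3 -> bits 3..5 */
    return rank2 | (rank3 << 3);
}
```

Both functions return the same value in 0..63, so either can index a 64-entry pattern table. The manual version needs one extra shift per run of squares, which is where the cost grows for irregular masks.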

I'm pretty sure that AMD didn't fix this for Zen+ either. Maybe they will fix it next year when Zen 2 arrives, who knows? Until it is fixed I won't consider buying an AMD processor, because it is unusable for the things I want to do. I'd rather wait for Intel Cascade Lake, which arrives by the end of the year.

Gian-Carlo Pascutto
Posts: 1196
Joined: Sat Dec 13, 2008 6:00 pm

Re: Ryzen 2 and BMI2?

Post by Gian-Carlo Pascutto » Tue May 15, 2018 8:29 pm

Sesse wrote:
Tue May 15, 2018 3:43 pm
No, you'd have to use an intrinsic or inline assembler.
Right, but that's not doable in a benchmark that also has to run on ARM, Power, etc. and has to be "fair", i.e. what Roland was referring to.

The versions in SPEC don't even use BSF/LZCNT/POPCNT, for the same reasons. Although it wouldn't surprise me if Intel C++ generates them anyway, as long as you use the SPEC sources. :D

Sesse
Posts: 204
Joined: Mon Apr 30, 2018 9:51 pm

Re: Ryzen 2 and BMI2?

Post by Sesse » Tue May 15, 2018 9:58 pm

Obviously an Intel-specific instruction will not be applicable to PowerPC, indeed.

FWIW, bsf maps fairly well to the ffs() call in POSIX.
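A minimal sketch of that mapping, assuming a POSIX system with a 32-bit int. ffs() takes an int and returns a 1-based position (0 for no bits set), whereas BSF gives a 0-based index and leaves the result undefined for a zero input, so a 64-bit bitboard needs two calls and an offset. The helper name lowest_square is made up for the example.

```c
#include <stdint.h>
#include <strings.h>   /* ffs() is declared here on POSIX systems */

/* Hypothetical helper: index (0..63) of the lowest set bit of bb,
   or -1 if bb == 0. Assumes int is 32 bits wide. */
static int lowest_square(uint64_t bb)
{
    /* Scan the low 32 bits first. */
    int pos = ffs((int)(uint32_t)bb);
    if (pos != 0)
        return pos - 1;               /* convert 1-based to 0-based */

    /* Nothing in the low half, so scan the high 32 bits. */
    pos = ffs((int)(uint32_t)(bb >> 32));
    return pos != 0 ? pos + 31 : -1;  /* (pos - 1) + 32, or "empty" */
}
```

Whether a given compiler turns ffs() into a single BSF or TZCNT is up to that compiler; the point is only that the source stays portable.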

syzygy
Posts: 4588
Joined: Tue Feb 28, 2012 10:56 pm

Re: Ryzen 2 and BMI2?

Post by syzygy » Tue May 15, 2018 11:08 pm

Gian-Carlo Pascutto wrote:
Tue May 15, 2018 8:29 pm
Sesse wrote:
Tue May 15, 2018 3:43 pm
No, you'd have to use an intrinsic or inline assembler.
Right, but that's not doable in a benchmark that also has to run on ARM, Power, etc. and has to be "fair", i.e. what Roland was referring to.

The versions in SPEC don't even use BSF/LZCNT/POPCNT, for the same reasons. Although it wouldn't surprise me if Intel C++ generates them anyway, as long as you use the SPEC sources. :D
Did Crafty as included in SPEC CPU2000 not use any of those on platforms where they were available? (Probably not, I guess...)

Apparently SPEC CPU2017 not only includes Deep Sjeng but also Leela. Nice :)

Gian-Carlo Pascutto
Posts: 1196
Joined: Sat Dec 13, 2008 6:00 pm

Re: Ryzen 2 and BMI2?

Post by Gian-Carlo Pascutto » Wed May 16, 2018 12:35 pm

syzygy wrote:
Tue May 15, 2018 11:08 pm
Did Crafty as included in SPEC CPU2000 not use any of those on platforms where they were available? (Probably not, I guess...)
You'd have to ask Bob (I don't have a SPEC2000 license), but it's hard to imagine that non-Intel and non-AMD SPEC members wouldn't object to that. Using generic intrinsics like those of GCC (__builtin_ffs or whatever it's called) doesn't work either, because the code needs to be compilable by pretty much every age-old proprietary compiler out there too.
Apparently SPEC CPU2017 not only includes Deep Sjeng but also Leela. Nice :)
Yeah. I regret that SPEC CPU2017 allows parallelism in "speed" benchmarks though (even if only xz uses it). But it's an impossible situation given increasing core counts and how boost speeds influence these benchmarks.

yurikvelo
Posts: 548
Joined: Sat Dec 06, 2014 12:53 pm

Re: Ryzen 2 and BMI2?

Post by yurikvelo » Mon May 18, 2020 7:59 am

Still true for Zen 2.
BMI2 compiles are much slower than POPCNT compiles.

Joost Buijs
Posts: 1056
Joined: Thu Jul 16, 2009 8:47 am
Location: Almere, The Netherlands

Re: Ryzen 2 and BMI2?

Post by Joost Buijs » Mon May 18, 2020 9:27 am

yurikvelo wrote:
Mon May 18, 2020 7:59 am
Still true for Zen 2.
BMI2 compiles are much slower than POPCNT compiles.
I'm still waiting for the Intel 10980XE that I want to use for a new workstation; in the meantime I bought an AMD 3970X because I didn't want to wait any longer. It's a nice processor as long as you don't overclock it with Precision Boost (otherwise it runs extremely hot). PEXT and PDEP are unusable, maybe even worse than on Zen 1: I tried to emulate PEXT in software, and that runs faster than the native CPU instruction. AVX2 on the AMD is slow too, and it lacks AVX-512.

Maybe scatter-gather is not so important for a chess engine, but there are other applications in which it is very useful.

I will keep the AMD 3970X for bulk applications, but as soon as the 10980XE is readily available I will use that one for a new workstation.
