Ryzen 2 and BMI2?

Steve Maughan
Posts: 1221
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Ryzen 2 and BMI2?

Post by Steve Maughan »

The first generation of Ryzen processors is extremely slow at executing the BMI2 instruction set. Does anyone know whether this has been fixed in the Ryzen 2 chips?

- Steve
http://www.chessprogramming.net - Maverick Chess Engine
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Ryzen 2 and BMI2?

Post by Gian-Carlo Pascutto »

Ryzen 1 was really fast at BMI2, it was just slow at a single instruction, i.e. PEXT.

I wouldn't expect this to change. Nothing uses PEXT, aside from some chess engine movegens.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Ryzen 2 and BMI2?

Post by syzygy »

Gian-Carlo Pascutto wrote: Tue May 15, 2018 9:04 am Ryzen 1 was really fast at BMI2, it was just slow at a single instruction, i.e. PEXT.

I wouldn't expect this to change. Nothing uses PEXT, aside from some chess engine movegens.
Please make Sjeng use PEXT ;-)
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: Ryzen 2 and BMI2?

Post by Sesse »

I wanted to use PEXT for a branchless UTF-8 parser, but unfortunately, the instruction is too slow for it to be a win over straight-up code. (I know others have tried and come to the same conclusion.)
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Ryzen 2 and BMI2?

Post by Gian-Carlo Pascutto »

syzygy wrote: Tue May 15, 2018 9:15 am Please make Sjeng use PEXT ;-)
There's no way to have the instruction inferred from pure C code, is there? That would make it annoying to use in a portable benchmark.
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: Ryzen 2 and BMI2?

Post by Sesse »

No, you'd have to use an intrinsic or inline assembler. The former is fairly portable across compilers; at least MSVC, GCC, Clang and ICC all tend to support the Intel intrinsic style (_pext_u64 in this case) with some coaxing.
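
For illustration, a minimal sketch of what that could look like in C. It assumes GCC or Clang on x86-64 with BMI2 enabled (-mbmi2 or -march=native) and falls back to a plain loop everywhere else. The names pext_soft and pext64 are made up for this example; only _pext_u64 is the real intrinsic.

```c
#include <stdint.h>

/* Assumption: GCC/Clang define __BMI2__ when BMI2 code generation is enabled. */
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_HW_PEXT 1
#endif

/* Portable reference version: gather the bits of src selected by mask
 * into the low bits of the result, lowest mask bit first. */
static uint64_t pext_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t out = 1; mask != 0; out <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set mask bit */
        if (src & lowest)
            result |= out;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return result;
}

/* Hypothetical wrapper: hardware PEXT when compiled in, loop otherwise. */
static uint64_t pext64(uint64_t src, uint64_t mask)
{
#ifdef HAVE_HW_PEXT
    return _pext_u64(src, mask);
#else
    return pext_soft(src, mask);
#endif
}
```

For example, pext64(0x12345678, 0x00FF00FF) yields 0x3478: the two bytes selected by the mask are packed next to each other. Note that the compile-time check only says the instruction is available, not that it is fast on the CPU the binary ends up running on.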
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Ryzen 2 and BMI2?

Post by Joost Buijs »

Gian-Carlo Pascutto wrote: Tue May 15, 2018 9:04 am Ryzen 1 was really fast at BMI2, it was just slow at a single instruction, i.e. PEXT.

I wouldn't expect this to change. Nothing uses PEXT, aside from some chess engine movegens.
PEXT and its counterpart PDEP are both incredibly slow on AMD Zen hardware because AMD was lazy and implemented these instructions in microcode instead of dedicated logic.

On Intel processors you can make very good use of PEXT in your evaluation function, for instance to index pawn patterns (or any other pattern) very quickly. The pawn evaluator I'm currently working on uses PEXT throughout, and it runs about twice as fast as the best I can get without PEXT. Unfortunately this doesn't work on AMD processors; there, the old way of calculating indices is the only solution.

I'm pretty sure that AMD didn't fix this for Zen+ either. Maybe they will fix it next year when Zen2 arrives, who knows? Until this is fixed I won't consider buying an AMD processor, because it is unusable for the things I want to do. I'd rather wait for Intel Cascade Lake, which arrives by the end of the year.
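
To make the pattern-indexing idea concrete, here is a rough sketch. It is not the actual evaluator code. It assumes a bitboard layout with a1 = bit 0 and h8 = bit 63, and a hypothetical six-square king-shelter pattern (f2, g2, h2, f3, g3, h3). SHELTER_MASK, shelter_index_pext and shelter_index_shift are names invented here.

```c
#include <stdint.h>

#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#endif

/* Assumed layout: a1 = bit 0, b1 = bit 1, ..., h8 = bit 63. */
#define SHELTER_MASK 0x0000000000E0E000ULL  /* f2 g2 h2 f3 g3 h3 */

/* Dense 6-bit index (0..63) via PEXT, with a loop fallback when BMI2 is off. */
static unsigned shelter_index_pext(uint64_t pawns)
{
#if defined(__BMI2__) && defined(__x86_64__)
    return (unsigned)_pext_u64(pawns, SHELTER_MASK);
#else
    uint64_t mask = SHELTER_MASK;
    unsigned idx = 0;
    for (unsigned out = 1; mask != 0; out <<= 1, mask &= mask - 1) {
        /* mask & (~mask + 1) isolates the lowest remaining mask bit */
        if (pawns & mask & (~mask + 1))
            idx |= out;
    }
    return idx;
#endif
}

/* The same index without PEXT: one shift-and-mask per group of squares. */
static unsigned shelter_index_shift(uint64_t pawns)
{
    unsigned rank2 = (unsigned)(pawns >> 13) & 7u;  /* f2 g2 h2 -> bits 0..2 */
    unsigned rank3 = (unsigned)(pawns >> 21) & 7u;  /* f3 g3 h3 -> bits 3..5 */
    return rank2 | (rank3 << 3);
}
```

Either function gives an index into a 64-entry score table. For a regular pattern like this one the shift-and-mask version is cheap too; the gain from PEXT shows up when the squares are scattered and every extra group costs another shift and mask.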
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Ryzen 2 and BMI2?

Post by Gian-Carlo Pascutto »

Sesse wrote: Tue May 15, 2018 5:43 pm No, you'd have to use an intrinsic or inline assembler.
Right, but that's not doable in a benchmark that also has to run on ARM, Power, etc. and has to be "fair", i.e. what Roland was referring to.

The versions in SPEC don't even use BSF/LZCNT/POPCNT, for the same reasons. Although it wouldn't surprise me if Intel C++ generates them anyway, as long as you use the SPEC sources. :D
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: Ryzen 2 and BMI2?

Post by Sesse »

Obviously an Intel-specific instruction will not be applicable to PowerPC, indeed.

FWIW, bsf maps fairly well to the ffs() call in POSIX.
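
A small sketch of that mapping, with the caveats spelled out: ffs() takes an int and returns a 1-based position (0 if no bit is set), whereas BSF gives a 0-based index and leaves its result undefined for a zero input. A 64-bit bitboard therefore needs two calls, or ffsll() where the platform provides it. lsb_index is a made-up helper name.

```c
#include <stdint.h>
#include <strings.h>   /* POSIX ffs() */

/* Index (0..63) of the lowest set bit of bb, or -1 if bb == 0.
 * Assumes a 32-bit int, as on the usual POSIX platforms. */
static int lsb_index(uint64_t bb)
{
    /* Low half first; ffs() is 1-based, so subtract 1. */
    int lo = ffs((int)(uint32_t)bb);
    if (lo != 0)
        return lo - 1;

    /* High half: 1-based position plus 32, minus 1. */
    int hi = ffs((int)(uint32_t)(bb >> 32));
    return hi != 0 ? hi + 31 : -1;
}
```

Whether the compiler turns each ffs() call into a single BSF/TZCNT is up to the compiler and flags; the source itself stays portable.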
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Ryzen 2 and BMI2?

Post by syzygy »

Gian-Carlo Pascutto wrote: Tue May 15, 2018 10:29 pm
Sesse wrote: Tue May 15, 2018 5:43 pm No, you'd have to use an intrinsic or inline assembler.
Right, but that's not doable in a benchmark that also has to run on ARM, Power, etc. and has to be "fair", i.e. what Roland was referring to.

The versions in SPEC don't even use BSF/LZCNT/POPCNT, for the same reasons. Although it wouldn't surprise me if Intel C++ generates them anyway, as long as you use the SPEC sources. :D
Did Crafty as included in SPEC CPU2000 not use any of those on platforms where they were available? (Probably not, I guess...)

Apparently SPEC CPU2017 not only includes Deep Sjeng but also Leela. Nice :)