Ryzen 2 and BMI2?

Joost Buijs · Post by **Joost Buijs** » Mon May 18, 2020 11:27 am

yurikvelo wrote: ↑Mon May 18, 2020 9:59 am
Still true for Zen 2.
BMI2 builds are much slower than POPCNT builds.

I'm still waiting for the Intel 10980XE that I want to use for a new workstation; in the meantime I bought an AMD 3970X because I didn't want to wait any longer. It's a nice processor as long as you don't overclock it with Precision Boost (otherwise it runs extremely hot), but PEXT and PDEP are unusable, maybe even worse than on Zen 1. I tried to emulate PEXT in software, and that runs faster than the native CPU instruction. AVX2 on the AMD is slow too, and it lacks AVX-512.
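Since the thread contrasts BMI2 builds with POPCNT-only builds, here is a minimal sketch of how an engine might choose between the native instruction and a software fallback at compile time. It assumes GCC or Clang, which predefine `__BMI2__` when BMI2 code generation is enabled (for example with `-mbmi2` or a `-march` value that includes BMI2). `pext64`, `pext_soft` and the `USE_SOFT_PEXT` switch are hypothetical names, not taken from any engine mentioned here.

```c
#include <stdint.h>

#if defined(__BMI2__)
#include <immintrin.h>   /* declares _pext_u64 */
#endif

/* Portable fallback: scan all 64 positions and pack the bits selected by mask
   into the low bits of the result, preserving their order. */
static inline uint64_t pext_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    unsigned k = 0;                              /* next free bit in result */
    for (unsigned i = 0; i < 64; i++) {
        if ((mask >> i) & 1) {
            result |= ((src >> i) & 1) << k;
            k++;
        }
    }
    return result;
}

/* USE_SOFT_PEXT is a hypothetical build switch for producing a binary that
   avoids the native instruction on CPUs where it is slow (as reported above
   for Zen 2), even when BMI2 code generation is enabled. */
static inline uint64_t pext64(uint64_t src, uint64_t mask)
{
#if defined(__BMI2__) && !defined(USE_SOFT_PEXT)
    return _pext_u64(src, mask);                 /* native BMI2 instruction */
#else
    return pext_soft(src, mask);
#endif
}
```

For example, `pext64(0xB5, 0xF0)` packs bits 4 to 7 of `0xB5` (binary 1011 0101) into the low bits and returns `0xB`, whichever path is compiled.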

Maybe scatter-gather is not so important for a chess engine, but there are other applications in which it is very useful.

I will keep the AMD 3970X for bulk applications, but as soon as the 10980XE is readily available I will use that one for a new workstation.

Gian-Carlo Pascutto · Fri May 29, 2020 10:40 pm

I tried to emulate PEXT in software and that runs faster than the native CPU instruction.

Do you have a fast implementation that you'd want to make public domain?

bob · Post by **bob** » Sat May 30, 2020 12:36 am

A. You are correct. No Intel asm or Intel-specific stuff was allowed.

B. The reason I stopped being in SPECint was stupidity. They decided they wanted to move Crafty to the parallel benchmarks. I told them "bad idea" and explained the non-determinism problem. They said "no problem." I replied, "Node counts will not match, so nobody can verify that their test results are correct." They said "no problem." A month or two later, I received a call: "Crafty doesn't produce node counts that match for each run with the same data." I replied, "Go back and look at all the emails we swapped about this." I got a short "ahhh... that is what you were talking about. sheesh."

Joost Buijs · Post by **Joost Buijs** » Sat May 30, 2020 8:36 am

Gian-Carlo Pascutto wrote: ↑Fri May 29, 2020 10:40 pm
I tried to emulate PEXT in software and that runs faster than the native CPU instruction.
Do you have a fast implementation that you'd want to make public domain?

The software implementation is very basic and unusably slow too; the only thing that strikes me is that it runs somewhat faster than the native CPU implementation (at least for the problem I used it on, a pattern evaluation routine).

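```c
#include <stdint.h>

typedef uint64_t bb_t; // bitboard type; assumed to be a plain 64-bit integer

// PEXT emulation: gather the bits of src selected by mask into the low bits of the result
static inline bb_t PEXT(uint64_t src, uint64_t mask)
{
	uint64_t result = 0;

	for (uint64_t bit = 1; mask != 0; bit += bit)
	{
		// mask & (0 - mask) isolates the lowest mask bit; unsigned negation
		// avoids the signed overflow of -(int64_t)mask when mask == 1ULL << 63
		if (src & mask & (0 - mask))
			result |= bit;

		mask &= mask - 1; // clear the lowest mask bit
	}

	return result;
}

// PDEP emulation: scatter the low bits of src to the positions selected by mask
static inline bb_t PDEP(uint64_t src, uint64_t mask)
{
	uint64_t result = 0;

	for (uint64_t bit = 1; mask != 0; bit += bit)
	{
		if (src & bit)
			result |= mask & (0 - mask); // deposit at the lowest remaining mask bit

		mask &= mask - 1; // clear the lowest mask bit
	}

	return result;
}
```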

Maybe there are ways to do better with AVX2, but that wouldn't be general-purpose either.
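One general-purpose scalar idea, not proposed in the thread and not measured on Zen 2, is to copy each contiguous run of set bits in the mask with a single shift and AND instead of handling one bit per iteration. The sketch assumes GCC or Clang for `__builtin_ctzll`; `pext_runs` is a hypothetical name. It can only help when masks consist of a few long runs (whole ranks, for instance); for scattered single-bit masks it likely offers no advantage over the bit-by-bit loop.

```c
#include <stdint.h>

/* Hypothetical run-based PEXT: each iteration moves one whole run of
   consecutive mask bits. __builtin_ctzll is undefined for 0, so it is only
   called on nonzero arguments below. */
static inline uint64_t pext_runs(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    unsigned out = 0;                                    /* next free bit in result */

    while (mask != 0) {
        unsigned lo = (unsigned)__builtin_ctzll(mask);   /* start of the lowest run */
        uint64_t rest = mask >> lo;                      /* run now starts at bit 0 */
        unsigned len = (~rest == 0) ? 64 - lo            /* mask was all ones */
                                    : (unsigned)__builtin_ctzll(~rest);
        uint64_t run = (len == 64) ? ~(uint64_t)0 : (((uint64_t)1 << len) - 1);

        result |= ((src >> lo) & run) << out;            /* copy the whole run at once */
        out += len;
        mask &= ~(run << lo);                            /* drop this run from the mask */
    }
    return result;
}
```

With `mask = 0xE6` (runs at bits 1 to 2 and 5 to 7) the loop runs twice instead of five times; `pext_runs(0xD6, 0xE6)` returns `0x1B`.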

I really hope that AMD will fix these instructions someday. Otherwise Zen 2 is a nice processor; unfortunately, it has some weaknesses.

Ozymandias · Post by **Ozymandias** » Sat May 30, 2020 9:59 am

Joost Buijs wrote: ↑Sat May 30, 2020 8:36 am
Zen2 is a nice processor, unfortunately it has some weaknesses.

Biggest weak spot so far: price.

Black Friday 2018: Ryzen 7 1700 for 165.99€ at Amazon.

Black Friday 2019: Ryzen 7 2700 for 149.99€ at Amazon.

Black Friday 2020: Ryzen 7 3700X for a similar price? If so, weakness removed.

Gerd Isenberg · Post by **Gerd Isenberg** » Sat May 30, 2020 10:00 am

Gian-Carlo Pascutto wrote: ↑Fri May 29, 2020 10:40 pm
I tried to emulate PEXT in software and that runs faster than the native CPU instruction.
Do you have a fast implementation that you'd want to make public domain?

https://www.chessprogramming.org/BMI2

The serial implementations of PEXT and PDEP there look quite similar.

yurikvelo · Post by **yurikvelo** » Sat May 30, 2020 11:58 am

Ozymandias wrote: ↑Sat May 30, 2020 9:59 am
Ryzen 7 1700 = 166€ = 4.8B transistors

Ryzen 7 2700 = 150€ = 4.9B transistors

Ryzen 7 3700x = ???€ = 19.2B transistors

Joost Buijs · Post by **Joost Buijs** » Sat May 30, 2020 4:53 pm

Gerd Isenberg wrote: ↑Sat May 30, 2020 10:00 am
Gian-Carlo Pascutto wrote: ↑Fri May 29, 2020 10:40 pm
I tried to emulate PEXT in software and that runs faster than the native CPU instruction.
Do you have a fast implementation that you'd want to make public domain?
https://www.chessprogramming.org/BMI2

The serial implementations of PEXT and PDEP there look quite similar.

Quite possible, but I'm sure I didn't get it from CPW because I never look there.

About five years ago I got this specific algorithm from somebody who has no connection with computer chess at all and who claimed to be the original author, so I wonder what its origins are.

I only meant to say that the PEXT implementation of Zen 2 is so bad that even software emulation runs faster. It's a pity, because on the AMD I have to replace PEXT with a series of masks and shifts, which performs clearly worse.
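Which masks the pattern evaluation uses is not stated. As an illustration only, assuming the common little-endian rank-file bitboard layout (a1 = bit 0, h1 = bit 7, h8 = bit 63), two fixed-mask PEXT replacements look like this: a rank needs just a shift and a mask, while a file needs a mask, a multiplication and a shift. `rank_bits`, `file_bits` and `FILE_A_BB` are hypothetical names.

```c
#include <stdint.h>

/* Assumed layout: bit 0 = a1, bit 7 = h1, bit 56 = a8, bit 63 = h8. */
#define FILE_A_BB 0x0101010101010101ULL

/* Same result as PEXT(bb, 0xFFULL << (8 * rank)). */
static inline uint64_t rank_bits(uint64_t bb, unsigned rank)
{
    return (bb >> (8 * rank)) & 0xFF;
}

/* Same result as PEXT(bb, FILE_A_BB << file). After shifting the file onto
   the a-file, the multiplier sends bit 8*i to bit 56 + i; all partial
   products land on distinct bits, so no carries disturb the top byte. */
static inline uint64_t file_bits(uint64_t bb, unsigned file)
{
    return (((bb >> file) & FILE_A_BB) * 0x0102040810204080ULL) >> 56;
}
```

For instance, with pieces on d2 and d5, `file_bits((1ULL << 11) | (1ULL << 35), 3)` returns `0x12` (ranks 2 and 5 map to result bits 1 and 4), the value PEXT gives with mask `FILE_A_BB << 3`. Masks without this kind of regular structure generally need more steps.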

Joost Buijs · Post by **Joost Buijs** » Sat May 30, 2020 4:58 pm

Ozymandias wrote: ↑Sat May 30, 2020 9:59 am
Joost Buijs wrote: ↑Sat May 30, 2020 8:36 am
Zen2 is a nice processor, unfortunately it has some weaknesses.
Biggest weak spot so far: price.

Black Friday 2018: Ryzen 7 1700 for 165.99€ at Amazon.

Black Friday 2019: Ryzen 7 2700 for 149.99€ at Amazon.

Black Friday 2020: Ryzen 7 3700X for a similar price? If so, weakness removed.

Actually I don't care about price!

Gerd Isenberg · Post by **Gerd Isenberg** » Sat May 30, 2020 6:21 pm

Joost Buijs wrote: ↑Sat May 30, 2020 4:53 pm

Quite possible, but I'm sure I didn't get it from CPW because I never look there.

About five years ago I got this specific algorithm from somebody who has no connection with computer chess at all and who claimed to be the original author, so I wonder what its origins are.

I only meant to say that the PEXT implementation of Zen 2 is so bad that even software emulation runs faster. It's a pity, because on the AMD I have to replace PEXT with a series of masks and shifts, which performs clearly worse.

The routines are so obvious to anyone with some bit-twiddling experience that I would not claim ownership.
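The two idioms both serial routines rely on can be checked on a small value. This is a minimal sketch with hypothetical helper names; `0 - x` is used instead of `-(int64_t)x` so the arithmetic stays unsigned and well defined for every input.

```c
#include <stdint.h>

/* Lowest set bit of x (0 if x == 0). Unsigned subtraction wraps modulo 2^64,
   so this is well defined for every x, including 1ULL << 63. */
static inline uint64_t lowest_bit(uint64_t x)
{
    return x & (0 - x);
}

/* x with its lowest set bit cleared (the "mask &= mask - 1" step). */
static inline uint64_t clear_lowest_bit(uint64_t x)
{
    return x & (x - 1);
}
```

With x = 0xB0 (binary 1011 0000), `lowest_bit` returns 0x10 and `clear_lowest_bit` returns 0xA0, so a loop over this mask visits bit 4 first and then bits 5 and 7.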

We had this discussion here in 2013, when the slightly modified pext/pdep routines came up
http://www.talkchess.com/forum3/viewtop ... 20&start=1
and they were mentioned on the Wikispaces CPW in July 2013:
https://web.archive.org/web/20130706111 ... s.com/BMI2

I agree it is a shame that Zen is so slow with pext. AMD has to spend some transistors on a fast one-cycle hardware pext!

Best regards,
Gerd
