Questions about PEXT move generation

mathmoi
Posts: 290
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec
Full name: Mathieu Pagé

Questions about PEXT move generation

Post by mathmoi »

Hello,

Over the past few years, I haven't always kept up to date with chess engine development techniques. In recent weeks, I have returned to the development of a new engine and I have reached the stage of developing the move generator. When I developed my previous engine, the most modern technique was Magic Bitboards, and it still seems to be in use. On the other hand, it seems that a new technique based on the PEXT machine instruction has become a serious alternative.

From what I understand this technique requires a rapid implementation of PEXT, which was not available in AMD processors before the ZEN 3 architecture which was released in late 2020.

Is a move generator based on PEXT generally faster or at least as fast as a generator based on Magic bitboards?

Is it too early in 2024 to abandon Magic Bitboards and assume the users have a processor with a fast PEXT implementation (so an Intel CPU or a recent AMD one)?

If PEXT is indeed faster but it is still too early to drop Magic Bitboards, I guess I could release two versions of my engine. Do other authors do this?

Thanks!
smatovic
Posts: 3220
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Questions about PEXT move generation

Post by smatovic »

Some newer engines by individual authors support, for example, only AVX2 architectures for NNUE inference. Stockfish delivers several binaries with optimized code for all kinds of architectures. According to some TC posts, some people still use CPUs such as the Intel Q6600 from ~2007, without PEXT or AVX.

So, optimized code vs. compatibility? I myself decided to drop legacy GPU support in my OpenCL engine, i.e. support for hardware that is roughly 10 or more years old. Because the vendors drop driver support for such old hardware, the decision was made easier for me.

Windows 11 will make a cut on old hardware too, but maybe the userbase will then move on to Linux or the like.

Regarding PEXT and architectures:
viewtopic.php?f=7&t=83116&p=956763#p956763

--
Srdja
smatovic
Posts: 3220
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Questions about PEXT move generation

Post by smatovic »

For example, from the Caissa README file:

https://github.com/Witek902/Caissa#prov ... e-versions
AVX-512 - Fastest, requires a x64 CPU with AVX-512 instruction set support. May not be supported on consumer-grade CPUs.
BMI2 - Fast, requires a x64 CPU with AVX2 and BMI2 instruction set support. Supported by majority of modern CPUs.
AVX2 - Fast, requires a x64 CPU with AVX2 instruction set support. May be faster than BMI2 on some older CPUs (e.g. Intel Haswell processors).
POPCNT - Slower, requires a x64 CPU with SSE4 and POPCNT instruction set support. For older CPUs.
Legacy - Slowest, requires any x64 CPU. For very old x64 CPUs.
and from the makefile:

SSE2FLAGS     = $(COMMONFLAGS) -DUSE_SSE -DUSE_SSE2
SSE4FLAGS     = $(SSE2FLAGS) -DUSE_SSE4 -DUSE_POPCNT
AVX2FLAGS     = $(SSE4FLAGS) -DUSE_AVX2
BMI2FLAGS     = $(AVX2FLAGS) -DUSE_BMI2
AVX512FLAGS   = $(BMI2FLAGS) -mavx512f -mavx512bw -mavx512dq -DUSE_AVX512
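
As a rough illustration of what a define like -DUSE_BMI2 could switch on inside engine code, the sketch below picks the attack-table index function at compile time. The SliderEntry struct and the slider_index function are hypothetical names and are not taken from Caissa. The sketch also assumes the compiler defines __BMI2__ (GCC and Clang do so when BMI2 code generation is enabled, e.g. with -mbmi2), which the makefile excerpt above does not show.

```cpp
#include <cassert>
#include <cstdint>

#if defined(USE_BMI2) && defined(__BMI2__)
#include <immintrin.h>  // _pext_u64
#endif

// Hypothetical per-square entry; the field names are illustrative only.
struct SliderEntry {
    std::uint64_t mask;   // relevant occupancy squares for this piece and square
    std::uint64_t magic;  // magic multiplier (unused in the PEXT build)
    unsigned shift;       // 64 minus the number of index bits (unused in the PEXT build)
};

// Maps a board occupancy to an index into the square's attack table.
inline std::uint64_t slider_index(std::uint64_t occupancy, const SliderEntry& e) {
    assert(e.shift < 64);  // shifting a 64-bit value by 64 or more is undefined
#if defined(USE_BMI2) && defined(__BMI2__)
    // PEXT build: gather the masked occupancy bits into a dense index.
    return _pext_u64(occupancy, e.mask);
#else
    // Magic build: mask, multiply, and keep the top bits as the index.
    return ((occupancy & e.mask) * e.magic) >> e.shift;
#endif
}
```

The two paths generally produce different indices for the same occupancy, so the attack table has to be filled with the same function that is later used to read it.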
--
Srdja
mathmoi
Posts: 290
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec
Full name: Mathieu Pagé

Re: Questions about PEXT move generation

Post by mathmoi »

Hi,

Thanks for this info. I think I'll provide two (or more if necessary) builds. I'll first implement a magic bitboard generator, make sure it's correct, and then implement the PEXT-based generator.
Modern Times
Posts: 3699
Joined: Thu Jun 07, 2012 11:02 pm

Re: Questions about PEXT move generation

Post by Modern Times »

On BMI2-capable Intel machines, I've never noticed any significant speed difference between AVX2 and BMI2 compiles when I've benchmarked them. I do use BMI2 though if it is provided.
syzygy
Posts: 5690
Joined: Tue Feb 28, 2012 11:56 pm

Re: Questions about PEXT move generation

Post by syzygy »

Modern Times wrote: Fri Jan 12, 2024 8:57 pm On BMI2-capable Intel machines, I've never noticed any significant speed difference between AVX2 and BMI2 compiles when I've benchmarked them. I do use BMI2 though if it is provided.
I think before NNUE the speedup was maybe 1-2%.
With NNUE the gain from BMI2 (i.e. pext/pdep) is probably even smaller because the evaluation no longer calculates sliding piece attacks.
syzygy
Posts: 5690
Joined: Tue Feb 28, 2012 11:56 pm

Re: Questions about PEXT move generation

Post by syzygy »

mathmoi wrote: Thu Jan 11, 2024 7:02 pm From what I understand this technique requires a rapid implementation of PEXT, which was not available in AMD processors before the ZEN 3 architecture which was released in late 2020.

Is a move generator based on PEXT generally faster or at least as fast as a generator based on Magic bitboards?

Is it too early in 2024 to abandon Magic Bitboards and assume the users have a processor with a fast PEXT implementation (so an Intel CPU or a recent AMD one)?
It depends on who your users are, but obviously many people still have an older CPU without a fast pext implementation or without a pext implementation at all.

In addition, ARM processors do not support the pext and pdep instructions, so chess engines aimed at a general public will continue to include magic bitboard implementations. (This remark is irrelevant if your real question is whether you should release one or two x86_64-based binaries.)
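
For reference, a PEXT-style rook lookup can be sketched as below, with a portable loop standing in for the instruction where BMI2 is not available. This is only an illustration under stated assumptions: the square layout (a1 = bit 0, h8 = bit 63) and all names (pext_portable, rook_rays, RookTable, build_rook_table, rook_attacks) are made up for the example. The portable loop runs once per mask bit and is much slower than a multiply and shift, which is why it is not a practical replacement for magic bitboards on such CPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__BMI2__)
#include <immintrin.h>  // _pext_u64
#endif

using Bitboard = std::uint64_t;  // assumed layout: bit 0 = a1, bit 63 = h8

// Portable stand-in for PEXT: gathers the bits of `value` selected by `mask`
// into the low bits of the result. One loop iteration per set mask bit.
inline Bitboard pext_portable(Bitboard value, Bitboard mask) {
    Bitboard result = 0;
    for (Bitboard bit = 1; mask != 0; bit <<= 1) {
        const Bitboard lowest = mask & (~mask + 1);  // isolate lowest set mask bit
        if (value & lowest) result |= bit;
        mask &= mask - 1;                            // clear that mask bit
    }
    return result;
}

inline Bitboard pext(Bitboard value, Bitboard mask) {
#if defined(__BMI2__)
    return _pext_u64(value, mask);
#else
    return pext_portable(value, mask);
#endif
}

// Walks the four rook rays from `sq`, stopping at the first blocker (inclusive).
// With include_edges == false the last square of each ray is skipped, which
// yields the relevant-occupancy mask when called with an empty board.
inline Bitboard rook_rays(int sq, Bitboard occupancy, bool include_edges) {
    assert(sq >= 0 && sq < 64);
    static const int dr[4] = {1, -1, 0, 0};
    static const int df[4] = {0, 0, 1, -1};
    Bitboard attacks = 0;
    for (int d = 0; d < 4; ++d) {
        int r = sq / 8 + dr[d], f = sq % 8 + df[d];
        while (r >= 0 && r < 8 && f >= 0 && f < 8) {
            const int nr = r + dr[d], nf = f + df[d];
            const bool last = nr < 0 || nr > 7 || nf < 0 || nf > 7;
            if (last && !include_edges) break;
            const Bitboard b = Bitboard{1} << (r * 8 + f);
            attacks |= b;
            if (occupancy & b) break;  // blocker reached
            r = nr;
            f = nf;
        }
    }
    return attacks;
}

struct RookTable {
    Bitboard mask;
    std::vector<Bitboard> attacks;  // indexed by pext(occupancy, mask)
};

inline RookTable build_rook_table(int sq) {
    RookTable t;
    t.mask = rook_rays(sq, 0, false);
    // pext(mask, mask) has all index bits set, so +1 gives the table size.
    t.attacks.resize(static_cast<std::size_t>(pext(t.mask, t.mask) + 1));
    Bitboard subset = 0;
    do {  // enumerate every subset of the mask (carry-rippler trick)
        t.attacks[pext(subset, t.mask)] = rook_rays(sq, subset, true);
        subset = (subset - t.mask) & t.mask;
    } while (subset != 0);
    return t;
}

inline Bitboard rook_attacks(const RookTable& t, Bitboard occupancy) {
    return t.attacks[pext(occupancy, t.mask)];
}
```

A full generator would keep one such table per square (and a second set for bishops); the single-square version here is only meant to show where the PEXT call sits in the lookup.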
jdart
Posts: 4397
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Questions about PEXT move generation

Post by jdart »

PEXT gives only a very minor speed gain in my engine.
mathmoi
Posts: 290
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec
Full name: Mathieu Pagé

Re: Questions about PEXT move generation

Post by mathmoi »

jdart wrote: Sun Jan 14, 2024 10:38 pm PEXT gives only a very minor speed gain in my engine.
Is it even worth it? I was under the impression that PEXT (when a fast implementation is available) was definitely faster. Was it wrong to assume that?

In any case I will have to test it myself, but I'd like to know what kind of speedup other authors have found between Magic and PEXT.
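
One way to get a first number is a small timing loop like the sketch below. BenchResult and bench_lookup are hypothetical names, and `lookup` stands for any callable that takes a square and an occupancy and returns a 64-bit attack set. The pseudo-random occupancies are an assumption of the sketch and are not representative of real positions, so the result is only a rough comparison; whole-engine nodes per second on a fixed search is the measurement that matters in the end.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

struct BenchResult {
    std::uint64_t checksum;  // XOR of all results, keeps the loop from being optimized away
    double ns_per_call;      // average wall-clock time per lookup
};

// Times `lookup(square, occupancy)` over a fixed pseudo-random input stream.
template <class Lookup>
BenchResult bench_lookup(Lookup lookup, std::size_t iterations,
                         std::uint64_t seed = 0x9E3779B97F4A7C15ULL) {
    assert(seed != 0);  // a xorshift generator must not start from zero
    std::uint64_t state = seed;
    std::uint64_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        // xorshift64 step: cheap, deterministic input generator
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        const int square = static_cast<int>(state >> 58);  // top 6 bits: 0..63
        checksum ^= lookup(square, state);
    }
    const auto stop = std::chrono::steady_clock::now();
    const double ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return {checksum, iterations ? ns / static_cast<double>(iterations) : 0.0};
}
```

Running it once with the magic lookup and once with the PEXT lookup, on the same seed and iteration count, gives comparable timings. The two checksums should match, which doubles as a quick correctness check between the two generators.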
Jouni
Posts: 3621
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: Questions about PEXT move generation

Post by Jouni »

On Intel processors, the fastest Stockfish compile is still the abrok bmi compile. The GitHub and AVX2 builds are slower. But of course 1-2% differences have no practical meaning.
Jouni