Hello,
Over the past few years, I haven't always kept up with chess engine development techniques. In recent weeks I have returned to developing a new engine, and I have reached the stage of writing the move generator. When I wrote my previous engine, the most modern technique was Magic Bitboards. It still seems to be in use. On the other hand, it seems that a newer technique based on the PEXT machine instruction has become a serious alternative.
From what I understand, this technique requires a fast implementation of PEXT, which was not available on AMD processors before the Zen 3 architecture, released in late 2020.
Is a move generator based on PEXT generally faster than, or at least as fast as, a generator based on Magic Bitboards?
Is it too early in 2024 to abandon Magic Bitboards and assume that users have a processor with a fast PEXT implementation (that is, an Intel CPU or a recent AMD one)?
If the answer to the first question is yes and the answer to the second question is no, I guess I could release two versions of my engine. Do other authors do this?
Thanks!
Questions about PEXT move generation
Moderator: Ras
-
- Posts: 290
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
- Full name: Mathieu Pagé
-
- Posts: 3220
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Questions about PEXT move generation
Some newer engines by individual authors support, for example, only AVX2 architectures for NNUE inference. Stockfish delivers several binaries with optimized code for all kinds of architectures. According to some TC posts, some people still use CPUs such as the Intel Q6600 from ~2007, without PEXT and AVX.
So, optimized code vs. compatibility? I myself decided to drop legacy GPU support in my OpenCL engine, that is, support for hardware roughly 10 or more years old. Since the vendors drop driver support for such old hardware, the decision was made easier for me.
Windows 11 will make a cut on old hardware too, but maybe the user base will then move on to Linux or the like.
Regarding PEXT and architectures:
viewtopic.php?f=7&t=83116&p=956763#p956763
--
Srdja
-
- Posts: 3220
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Questions about PEXT move generation
For example, from the Caissa README file:
https://github.com/Witek902/Caissa#prov ... e-versions
AVX-512 - Fastest, requires a x64 CPU with AVX-512 instruction set support. May not be supported on consumer-grade CPUs.
BMI2 - Fast, requires a x64 CPU with AVX2 and BMI2 instruction set support. Supported by majority of modern CPUs.
AVX2 - Fast, requires a x64 CPU with AVX2 instruction set support. May be faster than BMI2 on some older CPUs (e.g. Intel Haswell processors).
POPCNT - Slower, requires a x64 CPU with SSE4 and POPCNT instruction set support. For older CPUs.
Legacy - Slowest, requires any x64 CPU. For very old x64 CPUs.
and from the makefile:
Code:
SSE2FLAGS = $(COMMONFLAGS) -DUSE_SSE -DUSE_SSE2
SSE4FLAGS = $(SSE2FLAGS) -DUSE_SSE4 -DUSE_POPCNT
AVX2FLAGS = $(SSE4FLAGS) -DUSE_AVX2
BMI2FLAGS = $(AVX2FLAGS) -DUSE_BMI2
AVX512FLAGS = $(BMI2FLAGS) -mavx512f -mavx512bw -mavx512dq -DUSE_AVX512
--
Srdja
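When several binaries are shipped like this, a small launcher or installer could suggest which one to run. The following is only a sketch under assumptions: the function name `suggested_build` is hypothetical, the labels are borrowed from the README tiers quoted above, and the check relies on the GCC/Clang builtin `__builtin_cpu_supports`, so it only does real work on x86 with those compilers. Note that the BMI2 feature bit says nothing about whether PEXT is fast on that CPU (the pre-Zen 3 AMD case mentioned in this thread), so a real tool would probably need an extra vendor/model check or a quick timing test.

```cpp
#include <string>

// Hypothetical helper: returns the label of the most demanding build tier
// the current CPU appears to support. Labels mirror the quoted README tiers.
std::string suggested_build() {
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    const bool avx2 = __builtin_cpu_supports("avx2");
    const bool avx512 = __builtin_cpu_supports("avx512f") &&
                        __builtin_cpu_supports("avx512bw") &&
                        __builtin_cpu_supports("avx512dq");
    if (avx512 && avx2 && __builtin_cpu_supports("bmi2")) return "AVX-512";
    if (avx2 && __builtin_cpu_supports("bmi2")) return "BMI2";
    if (avx2) return "AVX2";
    if (__builtin_cpu_supports("popcnt")) return "POPCNT";
    return "Legacy";
#else
    // Assumption: on other compilers or architectures, fall back to the
    // most compatible tier instead of guessing.
    return "Legacy";
#endif
}
```

The result is only a hint; benchmarking the candidate binaries on the target machine remains the reliable way to choose.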
-
- Posts: 290
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
- Full name: Mathieu Pagé
Re: Questions about PEXT move generation
Hi,
Thanks for this info. I think I'll provide two builds (or more if necessary). I'll first write a magic bitboard implementation, make sure it's correct, and then implement the PEXT-based generator.
Mathieu Pagé
mathieu@mathieupage.com
-
- Posts: 3699
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Questions about PEXT move generation
On BMI2 capable Intel machines, I've never noticed any significant speed difference between AVX2 and BMI2 compiles when I've benchmarked them. I do use BMI2 though if it is provided.
-
- Posts: 5690
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Questions about PEXT move generation
Modern Times wrote: Fri Jan 12, 2024 8:57 pm
On BMI2 capable Intel machines, I've never noticed any significant speed difference between AVX2 and BMI2 compiles when I've benchmarked them. I do use BMI2 though if it is provided.
I think before NNUE the speedup was maybe 1-2%.
With NNUE the gain from BMI2 (i.e. pext/pdep) is probably even smaller because the evaluation no longer calculates sliding piece attacks.
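For readers unfamiliar with what pext buys in a sliding piece lookup, here is a minimal C++ sketch for a rook on a single square. It is an illustration under assumptions, not code from any engine mentioned here: the names `pext_soft`, `rook_mask`, `RookTable` and `build_rook_table` are made up, squares are numbered a1 = 0 to h8 = 63, and the hardware `_pext_u64` intrinsic is used only when the compiler defines `__BMI2__` (otherwise a slow portable loop stands in for it).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

using Bitboard = std::uint64_t;

// Portable stand-in for PEXT: packs the bits of `value` selected by `mask`
// into the low bits of the result, keeping their relative order.
inline Bitboard pext_soft(Bitboard value, Bitboard mask) {
    Bitboard result = 0;
    for (Bitboard bit = 1; mask != 0; bit <<= 1) {
        Bitboard lowest = mask & (0 - mask);  // isolate lowest set bit of mask
        if (value & lowest) result |= bit;
        mask &= mask - 1;                     // clear that bit
    }
    return result;
}

inline Bitboard pext(Bitboard value, Bitboard mask) {
#if defined(__BMI2__)
    return _pext_u64(value, mask);  // single instruction on BMI2 hardware
#else
    return pext_soft(value, mask);
#endif
}

// Slow reference: walk each rook ray until the first blocker (inclusive).
Bitboard rook_attacks_slow(int sq, Bitboard occ) {
    static const int dr[4] = {1, -1, 0, 0};
    static const int df[4] = {0, 0, 1, -1};
    Bitboard attacks = 0;
    for (int d = 0; d < 4; ++d) {
        int r = sq / 8 + dr[d], f = sq % 8 + df[d];
        while (r >= 0 && r < 8 && f >= 0 && f < 8) {
            Bitboard b = Bitboard(1) << (r * 8 + f);
            attacks |= b;
            if (occ & b) break;
            r += dr[d];
            f += df[d];
        }
    }
    return attacks;
}

// Relevant occupancy: ray squares, excluding the last square of each ray,
// because a piece on the final square cannot block anything beyond it.
Bitboard rook_mask(int sq) {
    Bitboard mask = 0;
    const int r0 = sq / 8, f0 = sq % 8;
    for (int r = r0 + 1; r <= 6; ++r) mask |= Bitboard(1) << (r * 8 + f0);
    for (int r = r0 - 1; r >= 1; --r) mask |= Bitboard(1) << (r * 8 + f0);
    for (int f = f0 + 1; f <= 6; ++f) mask |= Bitboard(1) << (r0 * 8 + f);
    for (int f = f0 - 1; f >= 1; --f) mask |= Bitboard(1) << (r0 * 8 + f);
    return mask;
}

struct RookTable {
    Bitboard mask = 0;
    std::vector<Bitboard> attacks;  // indexed by pext(occupancy, mask)
};

RookTable build_rook_table(int sq) {
    RookTable t;
    t.mask = rook_mask(sq);
    int bits = 0;
    for (Bitboard m = t.mask; m != 0; m &= m - 1) ++bits;
    t.attacks.resize(std::size_t(1) << bits);
    Bitboard subset = 0;  // carry-rippler walk over every subset of the mask
    do {
        t.attacks[pext(subset, t.mask)] = rook_attacks_slow(sq, subset);
        subset = (subset - t.mask) & t.mask;
    } while (subset != 0);
    return t;
}

inline Bitboard rook_attacks(const RookTable& t, Bitboard occ) {
    return t.attacks[pext(occ, t.mask)];
}
```

With pext the table index is simply the packed relevant occupancy, so no magic multiplier has to be found or stored. The lookup itself is still one table read either way, which fits the small speed differences reported in this thread.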
-
- Posts: 5690
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Questions about PEXT move generation
mathmoi wrote: Thu Jan 11, 2024 7:02 pm
From what I understand this technique requires a rapid implementation of PEXT, which was not available in AMD processors before the ZEN 3 architecture which was released in late 2020.
Is a move generator based on PEXT generally faster or at least as fast as a generator based on Magic bitboards?
Is it too early in 2024 to abandon Magic Bitboards and assume the users have a processor with a fast PEXT implementation (so an Intel CPU or a recent AMD one)?
It depends on who your users are, but obviously many people still have an older CPU without a fast pext implementation or without a pext implementation at all.
In addition, ARM processors do not support the pext and pdep instructions, so chess engines aimed at a general public will continue to include magic bitboard implementations. (This remark is irrelevant if your real question is whether you should release one or two x86_64-based binaries.)
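One way to keep both code paths in a single source tree is a compile-time switch around the index computation. The sketch below shows this for a bishop on one square, under stated assumptions: the macro name `USE_BMI2` is borrowed from the makefile quoted earlier in the thread (how Caissa actually uses it is not shown here), the names `SliderTable` and `build_bishop_table` are hypothetical, squares are numbered a1 = 0 to h8 = 63, and the magic multiplier is found by a simple trial-and-error search at startup instead of using precomputed constants.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(USE_BMI2) && defined(__BMI2__)
#include <immintrin.h>
#endif

using Bitboard = std::uint64_t;
const Bitboard kEdges = 0xFF818181818181FFULL;  // ranks 1/8 and files a/h

// Slow reference: walk each diagonal until the first blocker (inclusive).
Bitboard bishop_attacks_slow(int sq, Bitboard occ) {
    static const int dr[4] = {1, 1, -1, -1};
    static const int df[4] = {1, -1, 1, -1};
    Bitboard attacks = 0;
    for (int d = 0; d < 4; ++d) {
        int r = sq / 8 + dr[d], f = sq % 8 + df[d];
        while (r >= 0 && r < 8 && f >= 0 && f < 8) {
            Bitboard b = Bitboard(1) << (r * 8 + f);
            attacks |= b;
            if (occ & b) break;
            r += dr[d];
            f += df[d];
        }
    }
    return attacks;
}

struct SliderTable {
    Bitboard mask = 0;
    Bitboard magic = 0;  // unused when the PEXT path is compiled in
    int shift = 0;
    std::vector<Bitboard> attacks;

    // The only place where the two techniques differ.
    std::size_t index(Bitboard occ) const {
#if defined(USE_BMI2) && defined(__BMI2__)
        return static_cast<std::size_t>(_pext_u64(occ, mask));
#else
        return static_cast<std::size_t>(((occ & mask) * magic) >> shift);
#endif
    }
};

SliderTable build_bishop_table(int sq) {
    SliderTable t;
    t.mask = bishop_attacks_slow(sq, 0) & ~kEdges;  // relevant blockers only
    int bits = 0;
    for (Bitboard m = t.mask; m != 0; m &= m - 1) ++bits;
    t.shift = 64 - bits;

    // Every blocker subset of the mask, with its reference attack set.
    std::vector<Bitboard> occs, refs;
    Bitboard sub = 0;
    do {
        occs.push_back(sub);
        refs.push_back(bishop_attacks_slow(sq, sub));
        sub = (sub - t.mask) & t.mask;
    } while (sub != 0);
    t.attacks.assign(occs.size(), 0);

#if defined(USE_BMI2) && defined(__BMI2__)
    for (std::size_t i = 0; i < occs.size(); ++i)
        t.attacks[t.index(occs[i])] = refs[i];
#else
    // Trial-and-error magic search with sparse xorshift64 candidates.
    std::uint64_t state = 0x9E3779B97F4A7C15ULL;
    auto next = [&state]() {
        state ^= state << 13; state ^= state >> 7; state ^= state << 17;
        return state;
    };
    for (;;) {
        t.magic = next() & next() & next();
        // 0 marks an empty slot: a bishop always attacks at least one square.
        std::vector<Bitboard> trial(occs.size(), 0);
        bool ok = true;
        for (std::size_t i = 0; i < occs.size() && ok; ++i) {
            Bitboard& slot = trial[t.index(occs[i])];
            if (slot == 0) slot = refs[i];
            else if (slot != refs[i]) ok = false;  // destructive collision
        }
        if (ok) { t.attacks = trial; break; }
    }
#endif
    return t;
}

inline Bitboard bishop_attacks(const SliderTable& t, Bitboard occ) {
    return t.attacks[t.index(occ)];
}
```

Built without `-DUSE_BMI2 -mbmi2`, or on ARM, this compiles to the magic path, so the same source can produce both kinds of binary. Callers only ever see `bishop_attacks`, which also makes it easy to benchmark the two builds against each other.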
-
- Posts: 4397
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Questions about PEXT move generation
PEXT gives only a very minor speed gain in my engine.
-
- Posts: 290
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
- Full name: Mathieu Pagé
Re: Questions about PEXT move generation
Is it even worth it? I was under the impression that PEXT (when a fast implementation is available) was definitely faster. Was it wrong to assume that?
In any case I will have to test it myself, but I'd like to know what kind of speedup other authors have found between Magic and PEXT.
Mathieu Pagé
mathieu@mathieupage.com
-
- Posts: 3621
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: Questions about PEXT move generation
For Intel processors the fastest Stockfish compile is still the abrok BMI2 compile. The GitHub and AVX2 builds are slower. But of course 1-2% differences have no practical meaning.
Jouni