VPOPCNTDQ and VBMI2
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
VPOPCNTDQ and VBMI2
Has anyone already tried VPOPCNTDQ and/or VBMI2? What is the expected performance improvement?
-
- Posts: 2561
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: VPOPCNTDQ and VBMI2
No idea, but I assume those are vector instructions.
For example, arm64 doesn't have popcnt for GPR registers, so you have to load the value into a vfp register, execute cnt on an 8-bit vector, then accumulate the per-byte results and convert back to a general purpose register.
That doesn't seem to me like it should be a win at all, assuming you want to do popcnt on 64-bit bitboards.
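A minimal sketch of the arm64 sequence described above, for illustration only. The function name `popcount64` is hypothetical, and the non-arm64 branch is a swapped-in SWAR fallback with the same shape (per-byte counts, then a horizontal sum) so the snippet also builds elsewhere.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
/* AArch64: move the 64-bit value into a vector register, count bits per
   byte (CNT), then sum the eight byte counts (ADDV) back into a scalar. */
static inline int popcount64(uint64_t x) {
    uint8x8_t per_byte = vcnt_u8(vcreate_u8(x));
    return (int)vaddv_u8(per_byte);
}
#else
/* Fallback (assumption, not from the thread): SWAR popcount. */
static inline int popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);            /* 2-bit sums  */
    x = (x & 0x3333333333333333ULL)
      + ((x >> 2) & 0x3333333333333333ULL);                /* 4-bit sums  */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;            /* per-byte    */
    return (int)((x * 0x0101010101010101ULL) >> 56);       /* sum of bytes */
}
#endif
```

In practice compilers usually emit this kind of sequence themselves for `__builtin_popcountll` on arm64, so hand-written intrinsics are rarely needed for single bitboards.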
-
- Posts: 1565
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: VPOPCNTDQ and VBMI2
These AVX-512 instructions are only supported by a few Intel architectures, such as the new Rocket Lake i9-11900K. Nowadays everybody seems to buy AMD, so not many people will have access to one of these processors.
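Because so few CPUs have these extensions, code that uses them normally needs a runtime check and a fallback. The sketch below assumes GCC or a recent Clang on x86-64; the names `popcount8`, `popcount8_scalar` and `popcount8_avx512` are hypothetical. It counts the bits of eight 64-bit bitboards at once with `_mm512_popcnt_epi64`, which is what VPOPCNTDQ provides, and falls back to a scalar loop when the CPU lacks it.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Scalar fallback: one bitboard at a time, clearing the lowest set bit. */
static void popcount8_scalar(const uint64_t in[8], uint64_t out[8]) {
    for (size_t i = 0; i < 8; i++) {
        uint64_t x = in[i], n = 0;
        while (x) { x &= x - 1; n++; }
        out[i] = n;
    }
}

#if HAVE_X86_DISPATCH
/* Compiled for AVX-512 regardless of the global -m flags; only called
   after the runtime check below has passed. */
__attribute__((target("avx512f,avx512vpopcntdq")))
static void popcount8_avx512(const uint64_t in[8], uint64_t out[8]) {
    __m512i v = _mm512_loadu_si512((const void *)in);
    _mm512_storeu_si512((void *)out, _mm512_popcnt_epi64(v));
}
#endif

/* Dispatch: use VPOPCNTDQ when the CPU reports it, else the scalar loop. */
void popcount8(const uint64_t in[8], uint64_t out[8]) {
#if HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx512f") &&
        __builtin_cpu_supports("avx512vpopcntdq")) {
        popcount8_avx512(in, out);
        return;
    }
#endif
    popcount8_scalar(in, out);
}
```

Whether the vector path is actually a win depends on having several bitboards to count at the same time; for a single 64-bit popcnt the scalar instruction is already cheap.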
-
- Posts: 2488
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: VPOPCNTDQ and VBMI2
Joost Buijs wrote: ↑Wed May 05, 2021 7:16 pm
"These AVX-512 instructions are only supported by a few Intel architectures"
And if you actually use these instructions, the CPU will throttle down so much that it will be a loss in anything but crafted benchmarks for dubious marketing - which is what these instructions are actually for.
Rasmus Althoff
https://www.ct800.net
-
- Posts: 476
- Joined: Sun Mar 17, 2019 12:00 pm
- Full name: Henk Drost
Re: VPOPCNTDQ and VBMI2
Ras wrote: ↑Wed May 05, 2021 7:21 pm
"And if you actually use these instructions, the CPU will throttle down so much that it will be a loss in anything but crafted benchmarks for dubious marketing - which is what these instructions are actually for."
AVX512 is actually a huge speedup if the majority of the instructions are AVX512.
In mixed loads the throttling can indeed cause slowdowns.
I think vnni256 is fastest for SF, followed by vnni512. But it also depends on the CPU; some throttle less than others.
-
- Posts: 1565
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: VPOPCNTDQ and VBMI2
Raphexon wrote: ↑Thu May 06, 2021 8:18 am
"AVX512 is actually a huge speed up if a majority of the instructions are avx512.
In mixed loads the throttle can indeed cause slow downs.
I think vnni256 is fastest for SF, followed by vnni512. But it also depends on the CPU, some throttle less than others."
Indeed it gives some gain. I never tried it with Stockfish, but with my own engine the gain with AVX-512 is 20 to 25% compared with AVX2. This is on my i9-10980XE. When properly cooled it doesn't throttle much with AVX-512 and still runs at 3800 MHz all-core.
The 8 core i9-11900K is even better in this respect but uses a shitload of power (300W), which is a lot for just 8 cores. An acquaintance of mine has one; it is extremely fast with AVX2 and AVX-512.
-
- Posts: 3554
- Joined: Thu Jun 07, 2012 11:02 pm
Re: VPOPCNTDQ and VBMI2
Joost Buijs wrote: ↑Thu May 06, 2021 2:50 pm
"The 8 core i9-11900K is even better in this respect but uses a shitload of power (300W), and this is a lot for just 8 cores. An acquaintance of mine has one, it is extremely fast with AVX2 and AVX-512."
And yet, apart from the AVX-512 support (or even despite the lack of AVX-512 support), you're probably better off with the 10-core 10900K.
-
- Posts: 1565
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: VPOPCNTDQ and VBMI2
Modern Times wrote: ↑Thu May 06, 2021 3:09 pm
"And yet, apart from the AVX-512 support, (or even despite the lack of AVX-512 support) you're probably better off with the 10-core 10900K."
It depends on what you want to do with it. If you are a programmer and want to experiment with AVX-512, you don't have many options. For running a multi-core chess engine the 10-core 10900K is probably better: it is cheaper and draws less power. Performance-wise they are more or less on par. The 10900K has 2 extra cores and the 11900K has a higher IPC, so there is not much of a difference.
If you don't need AVX-512 you are better off with the AMD 5950X. Multi-core, it has more than twice the speed of the 11900K; it is more expensive, though.
-
- Posts: 3554
- Joined: Thu Jun 07, 2012 11:02 pm
Re: VPOPCNTDQ and VBMI2
Or indeed the 12-core 5900X