AVX-512 and NNUE
Moderators: hgm, Rebel, chrisw
-
- Posts: 1243
- Joined: Sat Dec 13, 2008 7:00 pm
AVX-512 and NNUE
Has anyone benchmarked Stockfish NNUE's performance on an AVX-512 system with an AVX-512 build? How does it compare to a Zen2 AVX2 build?
-
- Posts: 343
- Joined: Sun Aug 25, 2019 8:33 am
- Full name: .
Re: AVX-512 and NNUE
Is there a build available? I could compile but it'd be easier if somebody had it ready.
-
- Posts: 5228
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: AVX-512 and NNUE
Gian-Carlo Pascutto wrote: ↑Tue Sep 08, 2020 8:15 pm Has anyone benchmarked Stockfish NNUE's performance on an AVX-512 system with an AVX-512 build? How does it compare to a Zen2 AVX2 build?
Some fun facts about Intel and reduced Turbo Boost frequency:
https://en.wikipedia.org/wiki/Advanced_ ... wnclocking
Downclocking
Since AVX instructions are wider and generate more heat, Intel processors have provisions to reduce the Turbo Boost frequency limit when such instructions are being executed. The throttling is divided into three levels:[43][44]
L0 (100%): The normal turbo boost limit.
L1 (~85%): The "AVX boost" limit. Soft-triggered by 256-bit "heavy" (floating-point unit: FP math and integer multiplication) instructions. Hard-triggered by "light" (all other) 512-bit instructions.
L2 (~60%): The "AVX-512 boost" limit. Soft-triggered by 512-bit heavy instructions.
The frequency transition can be soft or hard. Hard transition means the frequency is reduced as soon as such an instruction is spotted; soft transition means that the frequency is reduced only after reaching a threshold number of matching instructions. The limit is per-thread.[43]
Downclocking means that using AVX in a mixed workload with an Intel processor can incur a frequency penalty despite it being faster in a "pure" context. Avoiding the use of wide and heavy instructions helps minimize the impact in these cases. AVX-512VL allows for using 256-bit or 128-bit operands in AVX-512, making it a sensible default for mixed loads.[45]
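To put rough numbers on the excerpt above: if a license level caps the clock at a fraction f of the normal turbo limit, the wide code has to do more than 1/f times the work per cycle just to break even. A minimal Python sketch, assuming the approximate 85% and 60% figures quoted above (real ratios differ between CPU models, and the function names are made up for illustration):

```python
# Approximate turbo limits from the excerpt above, as fractions of the L0 limit.
# These are the rounded figures quoted in the thread, not measured values.
LICENSE_LEVELS = {"L0": 1.00, "L1": 0.85, "L2": 0.60}


def break_even_speedup(frequency_fraction):
    """Per-clock speedup needed to offset running at a reduced frequency."""
    return 1.0 / frequency_fraction


def net_speedup(per_clock_speedup, frequency_fraction):
    """Wall-clock speedup once the frequency penalty is applied."""
    return per_clock_speedup * frequency_fraction


for level, fraction in LICENSE_LEVELS.items():
    print(f"{level}: need more than {break_even_speedup(fraction):.2f}x per clock")
```

So at the L2 limit, code that is less than about 1.67x faster per clock ends up slower overall, and that is before counting any non-vector code running on the same core at the reduced clock.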
-
- Posts: 195
- Joined: Sun Apr 12, 2020 1:09 am
- Full name: Marc-O Moisan-Plante
Re: AVX-512 and NNUE
I don't have an appropriate CPU for this, but on the Stockfish Discord a few people do, and they have shared information and binaries. Here is one: https://gofile.io/d/G3mkFy
Navs wrote: Some highly optimised Binaries for Intel Skylake CPU's
Have increased nps by 38% and now running at 70% of SF speed
Otherwise, nodchip's pure NNUE release from 07/19 also has AVX-512 binaries: https://github.com/nodchip/Stockfish/re ... 2020-07-19
Last edited by MMarco on Wed Sep 09, 2020 12:03 am, edited 1 time in total.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: AVX-512 and NNUE
Not sure if it is using AVX-512, but on Navs' stream there are two Stockfish NNUE builds playing, vnni256 and vnni512.
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: AVX-512 and NNUE
Gian-Carlo Pascutto wrote: ↑Tue Sep 08, 2020 8:15 pm Has anyone benchmarked Stockfish NNUE's performance on an AVX-512 system with an AVX-512 build? How does it compare to a Zen2 AVX2 build?
This seems to get close:
https://github.com/official-stockfish/S ... -678426702
-
- Posts: 288
- Joined: Sat Jun 30, 2018 10:58 pm
- Location: Ukraine
- Full name: Volodymyr Shcherbyna
Re: AVX-512 and NNUE
I could produce such a build, but for Igel.
Unfortunately none of my machines is AVX-512 capable, so I did not produce this build. If someone could test it, that would be great.
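For anyone unsure whether their machine can run such a build: on Linux the CPU feature flags are listed in /proc/cpuinfo. A minimal Python sketch, assuming a Linux system; the helper name and the choice of flags to report are mine, and which flags a particular engine build actually requires should be checked against that engine's build files:

```python
# Flags as they are spelled in the Linux /proc/cpuinfo "flags" line.
# Which of these a given engine build needs is an assumption to verify.
WANTED = ("avx2", "bmi2", "avx512f", "avx512bw", "avx512_vnni")


def simd_features(cpuinfo_text):
    """Return {flag: bool} for the first 'flags' line found in the text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = set(line.split(":", 1)[1].split())
            return {name: name in flags for name in WANTED}
    return {name: False for name in WANTED}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print(simd_features(handle.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

A machine that reports False for avx512f cannot run an AVX-512 binary at all, which is one possible explanation when such a build crashes on startup.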
-
- Posts: 343
- Joined: Sun Aug 25, 2019 8:33 am
- Full name: .
Re: AVX-512 and NNUE
MMarco wrote: ↑Tue Sep 08, 2020 11:58 pm I don't have an appropriate cpu for this, but on Stockfish discord a few people have and shared information and binaries. Here is one: https://gofile.io/d/G3mkFy
This build crashes on startup for me.
MMarco wrote: ↑Tue Sep 08, 2020 11:58 pm Otherwise, nodchip pure nnue from 07/19 also has AVX 512 binaries : https://github.com/nodchip/Stockfish/re ... 2020-07-19
Here are the benchmark results for builds with 256x2 for 2 x Xeon Gold 6246, 512 MB cache (24 cores, 48 hyperthreads), average of 3 runs with depth 16:
Ratio of NPS for AVX-512 as compared to AVX2 and BMI2:
Threads  AVX-512/AVX2  AVX-512/BMI2
1        1.011         1.110
2        1.004         1.095
4        0.935         1.045
8        0.946         1.035
16       0.946         1.029
32       0.934         1.025
45       0.963         1.051
Raw NPS:
Threads  AVX2      BMI2      AVX-512
1        1284745   1169957   1298495
2        2600270   2383164   2610592
4        5316033   4752857   4968256
8        10672936  9757955   10096322
16       20227740  18598880  19139049
32       34599384  31519777  32309726
45       39815839  36463132  38324966
So AVX-512 was best at only 1 and 2 threads. The AVX2 build was best otherwise.
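The ratio table can be reproduced from the raw NPS figures. A short Python sketch using only the numbers posted above (the variable and function names are just for illustration):

```python
# Raw NPS from the post above: threads -> (AVX2, BMI2, AVX-512).
RAW_NPS = {
    1: (1284745, 1169957, 1298495),
    2: (2600270, 2383164, 2610592),
    4: (5316033, 4752857, 4968256),
    8: (10672936, 9757955, 10096322),
    16: (20227740, 18598880, 19139049),
    32: (34599384, 31519777, 32309726),
    45: (39815839, 36463132, 38324966),
}


def ratios(avx2, bmi2, avx512):
    """Return (AVX-512/AVX2, AVX-512/BMI2), rounded to three decimals."""
    return round(avx512 / avx2, 3), round(avx512 / bmi2, 3)


def fastest(avx2, bmi2, avx512):
    """Name of the build with the highest NPS for one thread count."""
    builds = {"AVX2": avx2, "BMI2": bmi2, "AVX-512": avx512}
    return max(builds, key=builds.get)


for threads, nps in RAW_NPS.items():
    vs_avx2, vs_bmi2 = ratios(*nps)
    print(f"{threads:>2} threads: {vs_avx2:.3f} {vs_bmi2:.3f} fastest={fastest(*nps)}")
```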
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: AVX-512 and NNUE
mmt wrote: ↑Wed Sep 09, 2020 5:35 am Here are the benchmark results for builds with 256x2 for 2 x Xeon Gold 6246, 512 MB cache (24 cores, 48 hyperthreads), average of 3 runs with depth 16: [...] So AVX-512 was best at only 1 and 2 threads. The AVX2 build was best otherwise.
What puzzles me about these results is that the AVX2 build is faster than the BMI2 build. The BMI2 build should include the AVX2 code (and add pext-based move generation), so it should be faster than AVX2 on Intel (not on AMD/Zen, but you are testing on Intel).
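For readers unfamiliar with the pext instruction that the BMI2 build relies on: it gathers the bits of a value selected by a mask and packs them into the low bits of the result, which engines can use to index attack tables for sliding pieces. A software emulation in Python, purely to illustrate the bit operation; this is not the engine's code, which as far as I know uses the hardware intrinsic:

```python
def pext(value, mask):
    """Software emulation of BMI2 parallel bit extract (PEXT).

    Bits of `value` at positions where `mask` is 1 are packed,
    in order, into the low bits of the result.
    """
    result = 0
    out_bit = 0
    while mask:
        lowest = mask & -mask          # isolate lowest set bit of the mask
        if value & lowest:
            result |= 1 << out_bit     # copy that bit into the next output slot
        out_bit += 1
        mask &= mask - 1               # clear the lowest set bit
    return result


# Upper nibble 1011 of the value is extracted into the low four bits.
print(bin(pext(0b10110010, 0b11110000)))
```

In hardware this is a single instruction on Intel CPUs, which is why the BMI2 build would normally be expected to be at least as fast as the plain AVX2 one there.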
-
- Posts: 343
- Joined: Sun Aug 25, 2019 8:33 am
- Full name: .
Re: AVX-512 and NNUE
syzygy wrote: ↑Wed Sep 09, 2020 1:30 pm What puzzles me about these results is that the AVX2 build is faster than the BMI2 build. The BMI2 build should include the AVX2 code (and add pext-based move generation), so it should be faster than AVX2 on Intel (not on AMD/Zen, but you are testing on Intel).
I was surprised also. But I re-ran the test to double-check and the result is the same, so I don't think I made a mistake.