Namely these instructions:
VGF2P8AFFINEQB - Galois field affine transformation
VGF2P8AFFINEINVQB - Galois field affine transformation inverse
VGF2P8MULB - Galois field multiply bytes
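The sketch below is a slow scalar C++ model of what VGF2P8MULB and the inversion step of VGF2P8AFFINEINVQB compute for one byte. It assumes the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), which is the one these instructions use. gf2p8_mul and gf2p8_inv are made-up names for this sketch, not intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B), as used by VGF2P8MULB.
// Shift-and-add, where "add" is XOR.
uint8_t gf2p8_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;                                      // add the current a * x^k term
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));   // a *= x, reduce modulo 0x11B
        b >>= 1;
    }
    return p;
}

// Multiplicative inverse, the first step of VGF2P8AFFINEINVQB.
// The nonzero elements form a group of order 255, so a^254 == a^-1; 0 maps to 0.
uint8_t gf2p8_inv(uint8_t a) {
    uint8_t r = 1;
    for (int i = 0; i < 254; ++i) r = gf2p8_mul(r, a);  // slow, but obviously correct
    return r;
}
```

gf2p8_mul(0x57, 0x83) gives 0xC1, the worked example from the AES specification, and gf2p8_inv(0x53) gives 0xCA. The vector instructions do this for 16, 32 or 64 bytes at once.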
My favourite chatbot will explain it better than I could:
Q: Explain avx512 Galois field instructions
A: AVX-512 Galois Field instructions (GFNI) are specialized instructions for arithmetic on elements of the Galois Field GF(2^8), a mathematical construct used in error-correcting codes and other applications. GFNI is its own CPU extension with SSE, AVX and AVX-512 encodings. The AVX-512 forms work on 64 bytes at a time. The three instructions do the following:
VGF2P8MULB multiplies bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
VGF2P8AFFINEQB applies an 8x8 bit-matrix affine transformation to every byte.
VGF2P8AFFINEINVQB first replaces each byte by its multiplicative inverse in GF(2^8), then applies the affine transformation.
Division can be built from an inverse and a multiply. Only GF(2^8) is supported directly; there is no GF(2^16) variant. These instructions are used in data storage, communications, and cryptography.
Q: Can they be used to multiply two 8x8 binary matrices represented by 64bit numbers?
A: Yes. Each element of an 8x8 binary matrix is a single bit, so one 64-bit number holds the entire matrix. VGF2P8AFFINEQB treats every 64-bit lane of its second operand as such a matrix. It multiplies that matrix over GF(2), with AND as multiplication and XOR as addition, with each of the 8 bytes in the corresponding lane of the first operand. If those 8 bytes are the rows of a second matrix, a single instruction computes the product of the two matrices. This holds up to a transpose and bit-order convention: result bit i of a byte is computed from matrix byte 7 - i. A 512-bit register holds eight matrices, so one instruction performs eight independent 8x8 products.
It's worth noting that the 512-bit forms require a processor with both GFNI and AVX-512F support, and an operating system that saves the AVX-512 register state. In C and C++ they are available as intrinsics from <immintrin.h>, for example _mm512_gf2p8affine_epi64_epi8 and _mm512_gf2p8mul_epi8. GCC and Clang need the matching target options, such as -mgfni -mavx512f.
Q: Give me an example in C++ using _mm512_gf2p8affine_epi64_epi8
A:
#include <immintrin.h>
#include <cstdint>

// Each array holds eight 8x8 bit matrices, one per 64-bit word (needs GFNI + AVX-512F).
void multiply_matrices(const uint64_t* matrix1_data, const uint64_t* matrix2_data, uint64_t* output_data) {
    // Load the two sets of matrices into __m512i variables
    __m512i matrix1 = _mm512_loadu_si512(matrix1_data);
    __m512i matrix2 = _mm512_loadu_si512(matrix2_data);
    // Perform the matrix multiplication: every byte (row) of matrix1 is multiplied by the
    // 8x8 matrix in the matching 64-bit lane of matrix2; imm8 = 0 adds no constant
    __m512i result = _mm512_gf2p8affine_epi64_epi8(matrix1, matrix2, 0);
    // Store the result in the output array
    _mm512_storeu_si512(output_data, result);
}
Now back to myself. What is very cool is that this has direct applications. It can replace some parts of NNUE, of course, but I was more interested in slider move generation with "bitrotation", meaning reversing all 64 bits, which rotates the board by 180 degrees. This is a strong improvement upon Hyperbola Quintessence, where the byteswap is replaced with a bit reversal. A byteswap cannot mirror a rank, because all eight squares of a rank sit in the same byte. A full bit reversal can, so the same trick also covers rank attacks.
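The portable scalar model below shows that one VGF2P8AFFINEQB really is a matrix product. It covers a single 64-bit lane and follows Intel's documented per-byte pseudocode: result bit i = parity(matrix byte 7 - i AND input byte) XOR imm8 bit i. affine_byte, affine_qword, matmul_gf2 and to_affine_operand are names made up for this sketch. bit_shuffle_const is a hypothetical reconstruction of the helper used in reverse_epi8, whose real definition is not shown. It assumes that argument i names the source bit for result bit i.

```cpp
#include <cassert>
#include <cstdint>

// Parity of a byte: 1 if an odd number of bits are set.
static int parity8(uint8_t v) {
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1;
}

// One byte of VGF2P8AFFINEQB: result bit i = parity(A.byte[7 - i] & x) ^ b.bit[i].
uint8_t affine_byte(uint64_t A, uint8_t x, uint8_t b) {
    uint8_t r = 0;
    for (int i = 0; i < 8; ++i) {
        uint8_t row = (uint8_t)(A >> (8 * (7 - i)));
        r |= (uint8_t)((parity8((uint8_t)(row & x)) ^ ((b >> i) & 1)) << i);
    }
    return r;
}

// One 64-bit lane: every byte of x goes through the same matrix A.
uint64_t affine_qword(uint64_t x, uint64_t A, uint8_t b) {
    uint64_t r = 0;
    for (int j = 0; j < 8; ++j)
        r |= (uint64_t)affine_byte(A, (uint8_t)(x >> (8 * j)), b) << (8 * j);
    return r;
}

// Reference GF(2) product C = X * Y; byte j is row j, bit k of that byte is column k.
uint64_t matmul_gf2(uint64_t X, uint64_t Y) {
    uint64_t C = 0;
    for (int j = 0; j < 8; ++j) {
        uint8_t row = 0;
        for (int k = 0; k < 8; ++k)
            if ((X >> (8 * j + k)) & 1)
                row ^= (uint8_t)(Y >> (8 * k));  // add row k of Y
        C |= (uint64_t)row << (8 * j);
    }
    return C;
}

// Transposes Y and reverses the row order, so that
// affine_qword(X, to_affine_operand(Y), 0) == matmul_gf2(X, Y).
uint64_t to_affine_operand(uint64_t Y) {
    uint64_t A = 0;
    for (int i = 0; i < 8; ++i)      // result bit i
        for (int k = 0; k < 8; ++k)  // input bit k
            if ((Y >> (8 * k + i)) & 1) A |= 1ULL << (8 * (7 - i) + k);
    return A;
}

// Hypothetical bit_shuffle_const: result bit i is copied from input bit b_i.
uint64_t bit_shuffle_const(int b0, int b1, int b2, int b3, int b4, int b5, int b6, int b7) {
    const int src[8] = {b0, b1, b2, b3, b4, b5, b6, b7};
    uint64_t A = 0;
    for (int i = 0; i < 8; ++i)
        A |= (uint64_t)(1u << src[i]) << (8 * (7 - i));  // matrix byte 7 - i drives result bit i
    return A;
}
```

With this model, affine_qword(X, to_affine_operand(Y), 0) equals matmul_gf2(X, Y). bit_shuffle_const(0, 1, ..., 7) yields the identity matrix 0x0102040810204080, and bit_shuffle_const(7, 6, ..., 0) yields the per-byte bit reversal 0x8040201008040201. The transposition in to_affine_operand is the price of the instruction's row order. In practice one would store the second matrix in that layout from the start.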
Reversing the bits within every byte is really just one instruction, and I can't wait to get my hands on Zen 4:
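#include <immintrin.h>
// Reverses the bit order inside every byte of a 512-bit vector (needs GFNI + AVX-512F).
// bit_shuffle_const is a helper whose definition isn't shown; for this argument order it
// has to produce the bit-reversal matrix 0x8040201008040201.
__m512i reverse_epi8(__m512i input) {
    const __m512i select = _mm512_set1_epi64(bit_shuffle_const(7, 6, 5, 4, 3, 2, 1, 0));
    return _mm512_gf2p8affine_epi64_epi8(input, select, 0x00);
}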
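The code below is a portable scalar sketch of the slider idea. It assumes that "bitrotation" means reversing all 64 bits, so square s maps to 63 - s. A full reversal is a per-byte bit reversal followed by a byteswap, and reverse_epi8 does the per-byte part in one GFNI instruction. The sketch plugs the full reversal into Hyperbola Quintessence in place of the byteswap. reverse_bits64, line_attacks and rank_attacks are hypothetical names. Squares are numbered a1 = bit 0 to h8 = bit 63, and line_mask must exclude the slider's own square.

```cpp
#include <cassert>
#include <cstdint>

// Reverse all 64 bits (square s <-> 63 - s, a 180 degree board rotation).
// Steps 1-3 reverse the bits inside each byte (the job reverse_epi8 does with GFNI);
// steps 4-6 are an ordinary byteswap.
uint64_t reverse_bits64(uint64_t x) {
    x = ((x >> 1) & 0x5555555555555555ULL) | ((x & 0x5555555555555555ULL) << 1);
    x = ((x >> 2) & 0x3333333333333333ULL) | ((x & 0x3333333333333333ULL) << 2);
    x = ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL) | ((x & 0x0F0F0F0F0F0F0F0FULL) << 4);
    x = ((x >> 8) & 0x00FF00FF00FF00FFULL) | ((x & 0x00FF00FF00FF00FFULL) << 8);
    x = ((x >> 16) & 0x0000FFFF0000FFFFULL) | ((x & 0x0000FFFF0000FFFFULL) << 16);
    return (x >> 32) | (x << 32);
}

// Hyperbola Quintessence along one line through sq, with the byteswap replaced by a
// full bit reversal. line_mask holds the line's squares EXCLUDING sq.
uint64_t line_attacks(uint64_t occ, int sq, uint64_t line_mask) {
    const uint64_t s = 1ULL << sq;
    uint64_t forward = occ & line_mask;
    uint64_t reverse = reverse_bits64(forward);
    forward -= s;                    // borrow runs up to the first blocker above sq
    reverse -= reverse_bits64(s);    // same thing in the rotated board, i.e. below sq
    forward ^= reverse_bits64(reverse);
    return forward & line_mask;
}

// Rank attacks, which the byteswap version cannot produce.
uint64_t rank_attacks(uint64_t occ, int sq) {
    const uint64_t rank_mask = 0xFFULL << (8 * (sq / 8));
    return line_attacks(occ, sq, rank_mask & ~(1ULL << sq));
}
```

For example, a rook on d1 with blockers on b1 and f1 (occ = 0x2A) gets rank_attacks(0x2A, 3) == 0x36, which is b1, c1, e1 and f1. In an AVX-512 version, steps 1-3 would become reverse_epi8 and steps 4-6 a byte shuffle such as _mm512_shuffle_epi8.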