Comparison of all known Sliding lookup algorithms

Discussion of chess software programming and technical issues.

Moderator: Ras

Mergi
Posts: 127
Joined: Sat Aug 21, 2021 9:55 pm
Full name: Jen

Re: Comparison of all known Sliding lookup algorithms

Post by Mergi »

dangi12012 wrote: Thu Jun 09, 2022 12:14 am
Mergi wrote: Tue Jun 07, 2022 6:16 pm :shock: :shock:
*PextUnsafe 1132.33
Any chance you can share the output of the C++ Comparison?
I can believe its 'only' the 12600 not a 12900k.
Amazing. :mrgreen:
Mergi wrote: Thu Mar 03, 2022 10:04 pm Oops, forgot to post the single threaded results.

Code: Select all

Million Lookups/s Random Squares, Random Occupation/s:
Name                               Performance [MQueens/s]       Tablesize           Dependencies             Template  Author                                       Reference
Binary Neural Network              35.962782                     5852    [45kb]      pdep_u64, AVX2           no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79332
Exploding Bitboards                61.285682                     768     [6kb]       imul64                   no        Harald Lüßen                                  http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup)          45.776734                     0       [0kb]       none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift               198.559123                    0       [0kb]       AVX2                     no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=60
Pext Emulated                      70.406628                     107904  [843kb]     none                     no        Zach Wegner                                  https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill                         76.379072                     0       [0kb]       none                     no        Gunnar Andersson                             https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone                        121.197186                    0       [0kb]       none                     no        Peter M. Kogge, Harold S. Stone              https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards                  39.130035                     1848    [14kb]      none                     no        Robert Hyatt                                 https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine                          181.375461                    0       [0kb]       countr_zero, countl_zero yes       Fabio Gobbato                                https://www.chessprogramming.org/QBBEngine
Classical Bob-Mike                 264.465141                    1024    [8kb]       countr_zero, countl_zero yes       Robert Hyatt and Michael Sherwin             https://www.chessprogramming.org/Classical_Approach
Leorik                             257.820557                    128     [1kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Leorik Inline                      96.737293                     0       [0kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Obstruction Difference             291.149068                    768     [6kb]       countl_zero              no        Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline      91.151886                     0       [0kb]       countl_zero              yes       Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Slide Arithmetic                   274.539860                    256     [2kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline            94.212820                     0       [0kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
SBAMG o^(o-3cbn)                   262.902487                    576     [4kb]       countl_zero, bswap       yes       Syed Fahad                                   http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline                       90.577180                     0       [0kb]       countl_zero, bswap       yes       Syed Fahad and Daniel Inführ                 http://www.talkchess.com/forum3/viewtopic.php?t=59845
Hyperbola Quintessence o^(o-2r)    307.683629                    256     [2kb]       bswap                    no        Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline      87.670208                     0       [0kb]       bswap                    yes       Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Kindergarten                       555.769115                    16640   [130kb]     imul64                   no        Urban Koistinen                              https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards                    370.042586                    180416  [1409kb]    none                     no        Michael Sherwin                              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Fancy Magic BB - Variable shift    559.388402                    93376   [729kb]     imul64                   yes       Pradu Kannan                                 https://www.chessprogramming.org/Magic_Bitboards#Fancy
Plain Magic BB                     612.864016                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       764.010034                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss                 https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr                     1163.049905                   107904  [843kb]     pext_u64                 yes       Zach Wegner                                  https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube                          55.202769                     107680  [841kb]     none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
I've already run that a few weeks ago. Tbh the results for the c++ version are kinda disappointing. I attribute that to the Windows platform though, c# has a big advantage there. In my experience, my c++ engine would usually be roughly 25% faster (in pure perft) on linux compiling with gcc compared to windows + msvc.
Mergi
Posts: 127
Joined: Sat Aug 21, 2021 9:55 pm
Full name: Jen

Re: Comparison of all known Sliding lookup algorithms

Post by Mergi »

Also one interesting thing i've noticed is that big constexpr arrays are not always the best. I used to initialize my magic bitboards using consteval functions during compile time but the perft was noticeably (although not by much) faster when i switched to dynamic initialization on startup. I'm not that knowledgeable about computers to know exactly why it is happening (probably some cache weirdness?), but the perft was clear.
Carbec
Posts: 161
Joined: Thu Jan 20, 2022 9:42 am
Location: France
Full name: Philippe Chevalier

Re: Comparison of all known Sliding lookup algorithms

Post by Carbec »

Hello,

I use the library M42 in my own engine. Its very fast. So when I found this post, I noticed that the SBAM code is based on M42; and I thought it was an improved version. But not, its clearly slower. In fact SBAM code is almost M42 one, except that the rook_attack and bishop_attack are not the same.
I tried to "constexpr" the M42, but I failed with the UINT_64* table[64].
If some one with a better knowledge could look at it ?

Thanks
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Comparison of all known Sliding lookup algorithms

Post by Mike Sherwin »

Code: Select all

AMD Ryzen 9 3950X 16-Core Processor (4 GHz)

Million Lookups/s Random Squares, Random Occupation/s:
Name                               Performance [MQueens/s]       Tablesize           Dependencies             Template  Author                                       Reference
SBAMG o^(o-3cbn)                   186.831494                    576     [4kb]       countl_zero, bswap       yes       Syed Fahad                                   http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline                       119.386829                    0       [0kb]       countl_zero, bswap       yes       Syed Fahad and Daniel Inführ                 http://www.talkchess.com/forum3/viewtopic.php?t=59845
Hyperbola Quintessence o^(o-2r)    232.353704                    256     [2kb]       bswap                    no        Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline      76.086805                     0       [0kb]       bswap                    yes       Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Genetic 8 Ray                      31.405317                     0       [0kb]       bswap                    no        Daniel Inführ (dangi12012)                   Abstract C++ Syntax Tree Sifter (c) Daniel Infuehr
Bitrotation                        28.679729                     0       [0kb]       ReverseBits              no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=20
Binary Neural Network              13.300400                     5852    [45kb]      pdep_u64, AVX2           no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79332
Exploding Bitboards                50.724215                     768     [6kb]       imul64                   no        Harald Lüßen                                  http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup)          26.113615                     0       [0kb]       none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift               104.513503                    0       [0kb]       AVX2                     no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=60
Pext Emulated                      46.124788                     107904  [843kb]     none                     no        Zach Wegner                                  https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill                         49.381192                     0       [0kb]       none                     no        Gunnar Andersson                             https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone                        91.882418                     0       [0kb]       none                     no        Peter M. Kogge, Harold S. Stone              https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards                  35.207745                     1848    [14kb]      none                     no        Robert Hyatt                                 https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine                          152.046547                    0       [0kb]       countr_zero, countl_zero yes       Fabio Gobbato                                https://www.chessprogramming.org/QBBEngine
QBBEngine - Shifted Mask           151.698646                    0       [0kb]       countr_zero, countl_zero no        Fabio Gobbato                                http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=90#p924623
Classical Bob-Mike                 162.570346                    1024    [8kb]       countr_zero, countl_zero yes       Robert Hyatt and Michael Sherwin             https://www.chessprogramming.org/Classical_Approach
Advanced Bob-Mike                  177.775934                    520     [4kb]       countr_zero, countl_zero no        Michael Sherwin and Daniel Inführ            http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=50#p924653
Leorik                             187.798130                    128     [1kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Leorik Inline                      85.806835                     0       [0kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Obstruction Difference             174.196808                    768     [6kb]       countl_zero              no        Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline      73.233829                     0       [0kb]       countl_zero              yes       Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Genetic Obstruction Difference     195.311228                    384     [3kb]       countl_zero              no        Daniel Inführ and Michael Hoffmann           http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Genetic Obstruction Difference V2  232.895296                    768     [6kb]       countl_zero              no        Daniel Inführ                                http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Slide Arithmetic                   191.511569                    256     [2kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline            74.807511                     0       [0kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Kindergarten                       347.962246                    16640   [130kb]     imul64                   no        Urban Koistinen                              https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards                    163.592045                    180416  [1409kb]    none                     no        Michael Sherwin                              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Fancy Magic BB - Variable shift    283.043567                    93376   [729kb]     imul64                   yes       Pradu Kannan                                 https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic       171.580052                    6468    [50kb]      none                     no        Daniel Inführ                                tbd
Plain Magic BB                     413.124981                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       451.545414                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss                 https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr                     37.350946                     107904  [843kb]     pext_u64                 yes       Zach Wegner                                  https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube                          51.706776                     107680  [841kb]     none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Comparison of all known Sliding lookup algorithms

Post by Mike Sherwin »

Carbec wrote: Tue Nov 29, 2022 3:22 pm Hello,

I use the library M42 in my own engine. Its very fast. So when I found this post, I noticed that the SBAM code is based on M42; and I thought it was an improved version. But not, its clearly slower. In fact SBAM code is almost M42 one, except that the rook_attack and bishop_attack are not the same.
I tried to "constexpr" the M42, but I failed with the UINT_64* table[64].
If some one with a better knowledge could look at it ?

Thanks
I don't have better knowledge but I took a look. The original author Syed Fahad created SBAMG as a standalone sliding piece attack getter. That is what Daniel seems to have reproduced and improved upon here. The m42 library is mentioned. But this is only part of the m42 library. In the m42 library SBAMG looks to be used only for initialization of the m42 library. The actual bishop_attacks, rook_attacks and queen_attacks are Fancy Magic attack getters.

Code: Select all

  inline uint64_t rook_attacks(int sq, uint64_t occ) {
    return RAttacks[sq][((occ & RMasks[sq]) * RMagics[sq]) >> RShift[sq]];
  }

  inline uint64_t bishop_attacks(int sq, uint64_t occ) {
    return BAttacks[sq][((occ & BMasks[sq]) * BMagics[sq]) >> BShift[sq]];
  }

  inline uint64_t queen_attacks(int sq, uint64_t occ) {
    return rook_attacks(sq, occ) | bishop_attacks(sq, occ);
  }
Please someone let us know if this is a correct understanding.

P.S. Daniel commented somewhere that Magic bitboards in a real world chess engine may not perform as well as it does on his platform. The reason would be that in a real world chess engine the L2 cache pressure would be much higher because of all the other data that would need to be swapped in and out of the L2 cache, for example the transposition table and other hashes. Until this statement is tested I think Kindergarten bitboards is the way to go.
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Comparison of all known Sliding lookup algorithms

Post by Mike Sherwin »

When SISSY bitboards were being created it was noted by Mar and Geerd that compilers did not understand the C++ code very well. So today I tried changing the code a bit so the compiler could understand it a little better.

Old Code

Code: Select all

    uint64_t Queen(int sq, uint64_t occ) {
        return qss[sq][occ & 255][0]
            & qss[sq][(occ >> 8) & 255][1]
            & qss[sq][(occ >> 16) & 255][2]
            & qss[sq][(occ >> 24) & 255][3]
            & qss[sq][(occ >> 32) & 255][4]
            & qss[sq][(occ >> 40) & 255][5]
            & qss[sq][(occ >> 48) & 255][6]
            & qss[sq][(occ >> 56) & 255][7];
    }
New Code

Code: Select all

    uint64_t Queen(int sq, uint64_t occ) {
      uint64_t bb1, bb2, bb3, bb4;
      bb1 = qss[sq][occ & 255][0] & qss[sq][(occ >> 8) & 255][1];
      bb2 = qss[sq][(occ >> 16) & 255][2] & qss[sq][(occ >> 24) & 255][3];
      bb3 = qss[sq][(occ >> 32) & 255][4] & qss[sq][(occ >> 40) & 255][5];
      bb4 = qss[sq][(occ >> 48) & 255][6] & qss[sq][(occ >> 56) & 255][7];
      bb1 = bb1 & bb2;
      bb3 = bb3 & bb4;
      bb1 = bb1 & bb3;
      return bb1;
    }
Old Result
SISSY Bitboards 165.096416 180416 [1409kb] none no Michael Sherwin http://www.talkchess.com/forum3/viewtop ... =7&t=73083
New Result
SISSY Bitboards 180.710100 180416 [1409kb] none no Michael Sherwin http://www.talkchess.com/forum3/viewtop ... =7&t=73083

For Comparison
QBBEngine 107.648710 0 [0kb] countr_zero, countl_zero yes Fabio Gobbato https://www.chessprogramming.org/QBBEngine

This was just a first try. I think it can be improved even further.
JoAnnP38
Posts: 253
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Comparison of all known Sliding lookup algorithms

Post by JoAnnP38 »

I found this set of collection of algorithms to be very interesting. When I ran the tests on my CPU (an AMD Ryzen 5 4500U) I was surprised to see that the PEXT method ran very slow on my system, only to find out that the PEXT instruction prior to the Zen 3 architecture is not really part of the AMD instruction set. Instead, they emulate it with microcode on their processors which is considerable slower. So PEXT bitboards are not really a good option unless you are executing under a on an Intel CPU or AMD with the Zen3 architecture or later. I wonder what people to that use the PEXT algorithm with their chess engines? Do they just accept that their slider algorithm runs much slower on AMD systems pre-Zen 3? Or do they dynamically switch algorithms?
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Comparison of all known Sliding lookup algorithms

Post by Mike Sherwin »

JoAnnP38 wrote: Tue Dec 20, 2022 1:31 am I found this set of collection of algorithms to be very interesting. When I ran the tests on my CPU (an AMD Ryzen 5 4500U) I was surprised to see that the PEXT method ran very slow on my system, only to find out that the PEXT instruction prior to the Zen 3 architecture is not really part of the AMD instruction set. Instead, they emulate it with microcode on their processors which is considerable slower. So PEXT bitboards are not really a good option unless you are executing under a on an Intel CPU or AMD with the Zen3 architecture or later. I wonder what people to that use the PEXT algorithm with their chess engines? Do they just accept that their slider algorithm runs much slower on AMD systems pre-Zen 3? Or do they dynamically switch algorithms?
As far as I have seen engines for the general population that use PEXT default back to Magic when PEXT is not available or not performant.

And I'm quite angry at AMD because when I heard that the 3950x had PEXT I got one, not knowing the truth of the matter. And what made me even more angry is that I may have given INTEL & AMD the idea for PEXT in the first place.
Early PEXT/PDEP Proposal
In late 2006, Michael Sherwin already proposed a PEXTPDEP instruction, Parallel Bits Extract controlled by a source mask followed by Parallel Bits Deposit controlled by a destination mask [20] [21]. However, it is not known whether his proposal was recognized by Intel engineers and had any influence on the design of the BMI2 PEXT and PDEP instructions.
http://www.open-aurec.com/wbforum/viewt ... f=4&t=5962
JoAnnP38
Posts: 253
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Comparison of all known Sliding lookup algorithms

Post by JoAnnP38 »

Mike Sherwin wrote: Tue Dec 20, 2022 2:21 am ... I'm quite angry at AMD because when I heard that the 3950x had PEXT I got one, not knowing the truth of the matter. And what made me even more angry is that I may have given INTEL & AMD the idea for PEXT in the first place.
Early PEXT/PDEP Proposal
In late 2006, Michael Sherwin already proposed a PEXTPDEP instruction, Parallel Bits Extract controlled by a source mask followed by Parallel Bits Deposit controlled by a destination mask [20] [21]. However, it is not known whether his proposal was recognized by Intel engineers and had any influence on the design of the BMI2 PEXT and PDEP instructions.
http://www.open-aurec.com/wbforum/viewt ... f=4&t=5962
At least you received a mention somewhere. Didn't you work for Intel at one point? For some reason I was reading through the references on the wiki, and I thought I may have seen mention of that. Of course, I could easily be mistaken. When I bought my laptop, I had just retired, and I was thinking I would never write another line of code and that the laptop would just be an Internet/Social Media workstation for me. Now that I'm enjoying the heck out of coding my chess engine and not having my soul drained by corporate politics, I may want to rethink that decision at some point.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

New Algorithm Found: GaloisField Slider Lookup

This one is special since it can solve 2 Queens or 4 Bishops, Rooks in just very few instructions!
It has a 0byte lookup table and has the shortest code per looked up target bit ratio! So very few lines of code.

The key insight is that AVX512F_GFNI allows rotation of bits inside each of the 64bytes in a single instruction. With permutation of bytes one can reliably calculate 16 directions at once. It is the natural evolution of "hyperbola qsc." into "bitrotation" into "GaloisField Slider Lookup".

Hardware support looks promising with AMD and Intel supporting the instructions in the coming generations.
Core algo (which is 8x8 binary matrix multiply) and corresponds to a single instruction: _mm512_gf2p8affine_epi64_epi8

Code: Select all

	static constexpr uint8_t affine_byte(uint8_t qword[8], uint8_t byte) {
		uint8_t res = 0;
		res |= parity_lu[(qword[7 - 0] & byte)] << 0;
		res |= parity_lu[(qword[7 - 1] & byte)] << 1;
		res |= parity_lu[(qword[7 - 2] & byte)] << 2;
		res |= parity_lu[(qword[7 - 3] & byte)] << 3;
		res |= parity_lu[(qword[7 - 4] & byte)] << 4;
		res |= parity_lu[(qword[7 - 5] & byte)] << 5;
		res |= parity_lu[(qword[7 - 6] & byte)] << 6;
		res |= parity_lu[(qword[7 - 7] & byte)] << 7;
		return res;
	}

	static constexpr __m512i _mm512_set1_epi64(uint64_t broadcast) {
		__m512i res;
		for (int i = 0; i < 8; i++) {
			res.qword[i] = { broadcast };
		}
		return res;
	}

	//This is the main part of the algorithm: 8x8 matrix mulitplication via avx512. In this case reversing bits - but a real algeabraic solution might be possible.
	//A solution would be a matrix array that can solve all possible input configurations for a slider just by multiplication.
	static constexpr __m512i _mm512_gf2p8affine_epi64_epi8(__m512i x, __m512i A, uint8_t imm8) {
		__m512i res;
		for (int j = 0; j < Limit; j++) {
			res.qword[j].byte[0] = affine_byte(A.qword[j].byte, x.qword[j].byte[0]) ^ imm8;
			res.qword[j].byte[1] = affine_byte(A.qword[j].byte, x.qword[j].byte[1]) ^ imm8;
			res.qword[j].byte[2] = affine_byte(A.qword[j].byte, x.qword[j].byte[2]) ^ imm8;
			res.qword[j].byte[3] = affine_byte(A.qword[j].byte, x.qword[j].byte[3]) ^ imm8;
			res.qword[j].byte[4] = affine_byte(A.qword[j].byte, x.qword[j].byte[4]) ^ imm8;
			res.qword[j].byte[5] = affine_byte(A.qword[j].byte, x.qword[j].byte[5]) ^ imm8;
			res.qword[j].byte[6] = affine_byte(A.qword[j].byte, x.qword[j].byte[6]) ^ imm8;
			res.qword[j].byte[7] = affine_byte(A.qword[j].byte, x.qword[j].byte[7]) ^ imm8;
		}
		return res;
	}
Notice how the core algorithm essentialy stays the same, but now performs on 8 directions simultaniously which is equal to 16 rays.

Code: Select all

	static constexpr __m512i attack8(uint64_t occ, uint64_t x, __m512i mask) {
		__m512i o = _mm512_set1_epi64(occ) & mask;
		__m512i sq = _mm512_set1_epi64(1ull << x);
		__m512i sq_rev = _mm512_set1_epi64((1ull << (x ^ 63)));

		return ((o - sq) ^ bit_reverse(bit_reverse(o) - sq_rev)) & mask;
	}

I could not test in hardware since I am lacking Zen4 - maybe someone will take up the mantle and just use the functions in immintrin.h directly.
https://github.com/Gigantua/Chess_Moveg ... sField.hpp

Code: Select all

Verify Engines...OK!

AMD Ryzen 9 5950X 16-Core Processor

Million Lookups/s Random Squares, Random Occupation/s:
Name                               Performance [MQueens/s]       Tablesize           Dependencies             Template  Author                                       Reference
SBAMG o^(o-3cbn)                   237.449319                    576     [4kb]       countl_zero, bswap       yes       Syed Fahad                                   http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline                       135.764729                    0       [0kb]       countl_zero, bswap       yes       Syed Fahad and Daniel Inführ                 http://www.talkchess.com/forum3/viewtopic.php?t=59845
GaloisField - AVX512               5.904822                      0       [0kb]       AVX512F_GFNI             no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=81335
Hyperbola Quintessence o^(o-2r)    295.760276                    256     [2kb]       bswap                    no        Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline      96.516712                     0       [0kb]       bswap                    yes       Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Genetic 8 Ray                      44.151004                     0       [0kb]       bswap                    no        Daniel Inführ (dangi12012)                   Abstract C++ Syntax Tree Sifter (c) Daniel Infuehr
Bitrotation                        40.674519                     0       [0kb]       ReverseBits              no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=20
Binary Neural Network              49.450441                     5852    [45kb]      pdep_u64, AVX2           no        Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79332
The hardware support of 8x8 binary matrix multiply also begs the question if a true algebraic solution exists where one just multiplies the masked occupancy with a square dependent matrix and gets out the solution. (My gut feeling says YES)
Which suddenly would consolidate movegen and evaluation into a uniform mathematical framework. So I hope it exists.

Take a look:
https://github.com/Gigantua/Chess_Moveg ... sField.hpp
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer