Comparison of all known Sliding lookup algorithms
Moderator: Ras
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
If somebody with a math background has time:
You can revisit rotated bitboards at any time with the Galois-field instruction, which is an 8x8 bit-matrix multiplication.
GF2P8AFFINEQB can apply this to 8 bitboards at once, each with a different matrix.
Mirroring, 90 degree clockwise rotation, bit reversal and all the other transformations.
To reverse bits we multiply with the identity transposed. Mirroring solved = I^t.
What form do the matrices have to take to get the other transformations? (90° CW+CCW, 180°)
There is a huge batch of algos that only work with "positive rays", and if you look at CPW, rotating BBs is expensive at 11+ instructions per bitboard.
GF2P8AFFINEQB does it in 1 instruction for 8 bitboards at once.
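For reference, here is a minimal scalar sketch (standard CPW-style bit tricks, my own addition, not from this thread) of two of those transformations on a single uint64_t bitboard; composing them gives the 180° rotation, and the 90° rotations additionally need a diagonal flip. This is the per-bitboard work that a single GF2P8AFFINEQB replaces for 8 bitboards at once:
Code: Select all
#include <cstdint>

// Mirror every rank horizontally (reverse the bits inside each byte).
static uint64_t mirror_horizontal_scalar(uint64_t b) {
    b = ((b >> 1) & 0x5555555555555555ULL) | ((b & 0x5555555555555555ULL) << 1);
    b = ((b >> 2) & 0x3333333333333333ULL) | ((b & 0x3333333333333333ULL) << 2);
    b = ((b >> 4) & 0x0F0F0F0F0F0F0F0FULL) | ((b & 0x0F0F0F0F0F0F0F0FULL) << 4);
    return b;
}

// Flip the board vertically (swap rank 1 with rank 8, etc.) - a single byte swap.
static uint64_t flip_vertical_scalar(uint64_t b) {
    return __builtin_bswap64(b);
}

// mirror_horizontal_scalar(flip_vertical_scalar(b)) reverses all 64 bits,
// which is exactly the 180 degree rotation of the board.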
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 29
- Joined: Thu Sep 15, 2022 4:51 pm
- Full name: Elad Yosef Cohen
Re: Comparison of all known Sliding lookup algorithms
Wow, that's huge if it does 700+M with no LUT at all. Plus it wasn't even optimized yet.
-
- Posts: 10
- Joined: Sat Sep 18, 2021 9:36 pm
- Full name: Tony Schwebs
Re: Comparison of all known Sliding lookup algorithms
dangi12012 wrote: ↑Tue Jan 31, 2023 10:09 am
If somebody with a math background has time:
You can revisit rotated bitboards at any time with the Galois-field instruction, which is an 8x8 bit-matrix multiplication.
GF2P8AFFINEQB can apply this to 8 bitboards at once, each with a different matrix.
Mirroring, 90 degree clockwise rotation, bit reversal and all the other transformations.
To reverse bits we multiply with the identity transposed. Mirroring solved = I^t.
What form do the matrices have to take to get the other transformations? (90° CW+CCW, 180°)
There is a huge batch of algos that only work with "positive rays", and if you look at CPW, rotating BBs is expensive at 11+ instructions per bitboard.
GF2P8AFFINEQB does it in 1 instruction for 8 bitboards at once.
Swapping the argument order with identity / identity transposed gets us diagonal mirroring / 90 degree rotation.
Code: Select all
// . . X X X X . . . . X X X X . .
// . X X X X X X . . X X X X X X .
// . X X . . X X . . X X . . X X .
// . X X . X X . . . . X X . X X .
// . X X X X . . . . . . X X X X .
// . X X . X X . . . . X X . X X .
// . X X . . X X . . X X . . X X .
// . X X . . . X X X X . . . X X .
static __m256i mirror_horizontal(__m256i input) {
return _mm256_gf2p8affine_epi64_epi8(input, _mm256_set1_epi64x(0x8040201008040201), 0);
}
// . . X X X X . . . . . . . . . .
// . X X X X X X . X X X X X X X .
// . X X . . X X . X X X X X X X X
// . X X . X X . . . . . X . . X X
// . X X X X . . . . . X X X . X X
// . X X . X X . . . X X . X X X X
// . X X . . X X . X X . . . X X .
// . X X . . . X X X . . . . . . .
static __m256i rotate_90_degrees(__m256i input) {
return _mm256_gf2p8affine_epi64_epi8(_mm256_set1_epi64x(0x8040201008040201), input, 0);
}
// . . X X X X . . X . . . . . . .
// . X X X X X X . X X . . . X X .
// . X X . . X X . . X X . X X X X
// . X X . X X . . . . X X X . X X
// . X X X X . . . . . . X . . X X
// . X X . X X . . X X X X X X X X
// . X X . . X X . X X X X X X X .
// . X X . . . X X . . . . . . . .
static __m256i mirror_diagonal(__m256i input) {
return _mm256_gf2p8affine_epi64_epi8(_mm256_set1_epi64x(0x0102040810204080), input, 0);
}
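A quick way to try these out (my own test scaffolding, not part of the post above; it assumes mirror_horizontal from the block above is in scope): pack four bitboards into a __m256i, run one transform, and unpack. Needs a GFNI-capable target, e.g. -mavx2 -mgfni:
Code: Select all
#include <immintrin.h>
#include <cstdint>
#include <cstdio>

int main() {
    // Four arbitrary bitboards packed into one 256-bit register.
    uint64_t bb[4] = { 0x000000000000FF00ULL,   // rank 2
                       0x0101010101010101ULL,   // file a
                       0x8040201008040201ULL,   // a1-h8 diagonal
                       0x000000001C141C00ULL }; // a small blob in the middle
    __m256i v = _mm256_loadu_si256((const __m256i*)bb);

    __m256i m = mirror_horizontal(v);            // function from the post above

    uint64_t out[4];
    _mm256_storeu_si256((__m256i*)out, m);
    for (int i = 0; i < 4; ++i)
        std::printf("%016llx -> %016llx\n",
                    (unsigned long long)bb[i], (unsigned long long)out[i]);
}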
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
It's 2023 and another two new move-generation algorithms are born!
Namely: Correlation Magic and Spectral Correlation Magic. The main idea is that, after sampling 200GB of magic candidates down to 100GB of magic solutions, you can start to correlate the solutions with each other to find a function that compresses the usual 64-slot magic lookup into something smaller.
It was a big brain bug of mine and I finally found a solution to that problem.
That is, numbers that satisfy the condition (occ | mask) * magicM >> (64 - N), where magicM == magicM-1 >> 1 for long ranges. I went so far as to create a competition initially: https://www.talkchess.com/forum3/viewto ... 9853e38986
The competition didn't have all 100GB of magics to work with, so no perfect solution existed, but with more data the chance of bigger buckets becomes greater. It was also unclear what the function should be; we can find magics that are valid for multiple squares, etc. In the end most good solutions are bit shifts and can be seen like this:
Code: Select all
uint64_t magic[8] {
0b000000000000000000000001110000000000000010000000000001000001000000000000,
0b000000000000000000000000000000011100000000000000100000000000010000000000,
0b000000000000000000000100000000000000000111000000000000001000000000000000,
0b000000000000000000001000000001000000000000000001100000000000001000000000,
0b000000000000000000000010000010000000010000000000000000011000000000000000,
0b000000000000000000010000000000100000100000000100000000000000000100000000,
0b000000000000000000000001000100000000001000001000000001000000000000000000,
0b000000000000000000100000000000010001000000000010000010000000010000000000,
};
I used exactly these tricks to work on all of the data concurrently via memory-mapped IO, so that everything can be accessed at any time without running out of memory. In the end only a GPU approach was used, since 16 CPU cores are around 60x slower than my GPU for this type of integer arithmetic.
So without further ado, let's look at the code itself:
Code: Select all
static constexpr uint64_t bishop_magics[8] = {
281754153844992ull,
1100594872576ull,
281483600331392ull,
2286985260581120ull,
1143509276953600ull,
563501328237568ull,
141289400076288ull,
565157600174592ull
};
static inline uint64_t bishop(int sq, uint64_t occ) {
uint64_t masked_occ = occ | bishop_masks[sq];
uint64_t magic = bishop_magics[sq >> 3] >> (sq & 7);
uint64_t index = (masked_occ * magic) >> 55;
return *(attack_table_bishop + (sq << 9) + index);
};
The algorithm in English is this:
Take only 8 magics and shift each of them up to 8 times; use the usual fancy-magic lookup on top of that and index into a table with a square-dependent offset. This skips two numbers we usually need to look up: the magic (which can be kept inside a ZMM register) and the offset (which is just sq shifted).
Instead of the usual 64 entries we can find plenty of solutions that stay valid when shifted up to 8 times. I made no attempt to keep the overall attack_table_bishop small, as you can see from the default sq << 9 offset. Still, we need much less hot memory, and all the magics actually fit inside a single ZMM register. The masks still have to be looked up or calculated for all 64 squares.
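For completeness, here is a sketch of how such a shifted magic could be verified for one square (my own reconstruction, not the repo's code). It assumes the same convention as bishop() above, i.e. bishop_masks[sq] is the complement of the relevant diagonal squares so that occ | mask pins all irrelevant bits to 1, and the index is ((occ | mask) * magic) >> 55:
Code: Select all
#include <cstdint>
#include <vector>

// Classic ray walk, used only as ground truth to check candidates against.
static uint64_t reference_bishop_attacks(int sq, uint64_t occ) {
    uint64_t attacks = 0;
    const int dr[4] = { 1, 1, -1, -1 }, df[4] = { 1, -1, 1, -1 };
    for (int d = 0; d < 4; ++d) {
        for (int r = sq / 8 + dr[d], f = sq % 8 + df[d];
             r >= 0 && r < 8 && f >= 0 && f < 8; r += dr[d], f += df[d]) {
            uint64_t b = 1ULL << (r * 8 + f);
            attacks |= b;
            if (occ & b) break;       // ray is blocked here
        }
    }
    return attacks;
}

// relevant = the occupancy bits that can change the attack set (the diagonal
// rays without the board edge); mask = ~relevant as assumed above.
static bool magic_is_valid(int sq, uint64_t relevant, uint64_t magic) {
    std::vector<uint64_t> slot(512, ~0ULL);              // >> 55 leaves a 9-bit index
    uint64_t sub = 0;
    do {                                                 // Carry-Rippler subset walk
        uint64_t index = ((sub | ~relevant) * magic) >> 55;
        uint64_t att = reference_bishop_attacks(sq, sub);
        if (slot[index] == ~0ULL) slot[index] = att;     // first use of this slot
        else if (slot[index] != att) return false;       // destructive collision
        sub = (sub - relevant) & relevant;
    } while (sub);
    return true;
}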
Now, what about the second part, the spectral magic?
By restricting the magics to a specific popcount we can get rid of the expensive IMUL64 and multiply via bit shifts instead. This only pays off for SIMD; in scalar code six shifts are slower than one IMUL64.
That matters because AVX2 has no packed 64-bit multiply (VPMULLQ requires AVX-512DQ), so a vectorised lookup cannot lean on IMUL64, while bit shifts, even several of them, do not hinder this compiler optimisation at all.
This is a proof of concept for the algorithm and I already have much better ideas for it. Since 1-, 2- and 3-bit solutions also exist, it is expected that some squares can eventually be looked up by mask and shift alone, which is very fast. Here I picked popcount = 6 for all solutions.
These are the same magics as above, but for clarity I show the literally identical multiplication expressed as shifts. This is essentially a hybrid of spectral shift decomposition and Correlation Magic.
Code: Select all
static constexpr uint8_t bishop_spectralmagic[64][6] = {
{ 8, 16, 22, 32, 38, 48 },
{ 8, 16, 20, 23, 30, 40 },
{ 7, 9, 17, 25, 33, 48 },
{ 8, 14, 20, 30, 45, 51 },
{ 10, 11, 22, 34, 44, 50 },
{ 10, 23, 29, 30, 39, 49 },
{ 11, 15, 23, 31, 39, 47 },
{ 9, 13, 25, 33, 41, 49 } };
uint64_t inline spectral_bishop(int sq, uint64_t occ) {
uint64_t masked_occ = occ | bishop_masks[sq];
uint64_t* attacks = attack_table_bishop + (sq << 9);
const uint8_t* s = bishop_spectralmagic[sq >> 3];
uint64_t index = (
(masked_occ << (s[0] - (sq & 7))) +
(masked_occ << (s[1] - (sq & 7))) +
(masked_occ << (s[2] - (sq & 7))) +
(masked_occ << (s[3] - (sq & 7))) +
(masked_occ << (s[4] - (sq & 7))) +
(masked_occ << (s[5] - (sq & 7)))) >> 55;
return attacks[index];
};
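As a quick sanity check on the idea (not from the repo): multiplying by a constant with popcount 6 is, modulo 2^64, exactly the sum of the six corresponding left shifts, which is all spectral_bishop() exploits. The shift amounts below are just the first row of the table above; the operand is arbitrary:
Code: Select all
#include <cstdint>
#include <cstdio>

int main() {
    uint64_t m = 0x0123456789ABCDEFULL;          // arbitrary operand
    int bits[6] = { 8, 16, 22, 32, 38, 48 };     // set-bit positions of the magic
    uint64_t magic = 0;
    for (int b : bits) magic |= 1ULL << b;

    uint64_t via_mul = m * magic;                // what fancy magic would do
    uint64_t via_shifts = 0;
    for (int b : bits) via_shifts += m << b;     // what spectral_bishop() does

    std::printf("%d\n", via_mul == via_shifts);  // prints 1: identical mod 2^64
}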
The algorithm in English is this:
Take only 8 magics and shift each of them up to 8 times; use bit-shift multiplication on top of that and index into a table with a square-dependent offset.
We can easily find magics with 5 or even 4 bits, but that is left open for the future. Here I chose 6 bits.
That's all nice, Daniel, but what does that mean for performance?
Code: Select all
Name Performance [MQueens/s] Tablesize Dependencies Template Author Reference
Correlation Magic 380.457183 294976 [2304kb] imul64 no Daniel Inführ https://www.talkchess.com/forum3/viewtopic.php?f=7&t=82224#p949688
Spectral Magic 144.279394 294976 [2304kb] none no Daniel Inführ https://www.talkchess.com/forum3/viewtopic.php?f=7&t=82224#p949688
Gigantua Rotation Algorithm 526.645174 768 [6kb] AVX2 or AVX512 + GFNI no Daniel Inführ (dangi12012) https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81707&sid=ec3532ea1ce1e8014d2b579b29fddd34
SBAMG o^(o-3cbn) 235.408406 576 [4kb] countl_zero, bswap yes Syed Fahad http://www.talkchess.com/forum3/viewtopic.php?t=59845
Fancy Magic BB - Variable shift 377.520046 93376 [729kb] imul64 yes Pradu Kannan https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic 211.290669 6468 [50kb] none no Daniel Inführ tbd
Plain Magic BB 478.842733 295168 [2306kb] imul64 no Lasse Hansen https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift 520.237679 88891 [694kb] imul64 no Onno Garms and Volker Annuss https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr 764.414955 107904 [843kb] pext_u64 yes Zach Wegner https://www.chessprogramming.org/BMI2#PEXTBitboards
So Correlation Magic eked out a small win against Fancy Magic BB with variable shift, but not against Black Magic BB with fixed shift. That is to be expected, since there are still many opportunities to improve this. For example, no attempt was made to make the table small, so it sits at 2304kb where condensing it to around 800kb would be easy. Another point is that increasing the rook index from 12 to 14 bits, i.e. shifting by >> (64 - 14) instead of >> (64 - 12), yields more magics. You can see the rook did not let itself be compressed much; I sifted through all 64-bit numbers up to a popcount of 9, which is ncr(64,9) + ncr(64,8) + ..., so a lot of numbers to try.
The nice thing about slot-lookup algorithms in general is that we don't necessarily need to store bitboards at the calculated index. We might as well put another structure there, for example an array of (SqFrom, SqTo) pairs, since these are known. Here I chose to return the bitmap of the attacked squares.
I was finally able to work on this because I replaced random magic sampling with exhaustive search. With modern architectures the amount of VRAM or RAM hardly matters: a fast NVMe SSD with memory-mapped IO is about as fast as a DDR2 stick from 2010 for sequential access, which lets the chess community easily use hundreds of GB of working memory for this and future chess problems.
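To illustrate the memory-mapped IO part, here is a plain POSIX sketch of my own (not the actual tooling used here, and "magics.bin" is just a placeholder name): mapping a huge file of uint64_t candidates lets you scan it like an in-memory array while the OS streams pages in from the SSD:
Code: Select all
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    int fd = open("magics.bin", O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);
    size_t count = (size_t)st.st_size / sizeof(uint64_t);

    // Map the whole file read-only; the kernel pages it in on demand.
    const uint64_t* magics = (const uint64_t*)mmap(nullptr, (size_t)st.st_size,
                                                   PROT_READ, MAP_PRIVATE, fd, 0);
    if (magics == MAP_FAILED) { std::perror("mmap"); return 1; }
    madvise((void*)magics, (size_t)st.st_size, MADV_SEQUENTIAL);

    uint64_t acc = 0;
    for (size_t i = 0; i < count; ++i)
        acc |= magics[i];                 // stand-in for the real per-candidate work

    std::printf("scanned %zu candidates, or-sum %016llx\n",
                count, (unsigned long long)acc);
    munmap((void*)magics, (size_t)st.st_size);
    close(fd);
}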
I personally will stick with my Gigantua Rotation Algorithm: even hamstrung on AVX2 it is still faster than magic lookups and fits completely in L1.
You can see all of the code here: https://github.com/Gigantua/Chess_Moveg ... aa0e47077d
Future movegens and releases of mine will be on a Google-indexed page or blog, but of course I will link them here too. If you have questions, let me know!
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
Busy week and I need to push this idea out there.
When we have fancy-magic solutions we can limit the popcount to, for example, 3. That only yields a few solutions, so the fixed shift has to be increased a lot, from 12 to something much bigger. If we allow ourselves to do that, we can do the 64-bit multiplication with 3 shifts and 2 additions, which is insanely cheap.
Modern processors can do IMUL64 faster, though, so this movegen is applicable to platforms where a hardware IMUL64 does not exist or has worse reciprocal throughput than 3 shifts and 2 adds.
So now I want to publish the spectral version of fancy magic with explicit terms (so everything lives in a switch instead of an array), in addition to the above 2 algorithms:
https://godbolt.org/z/rKMq64bdT
https://github.com/Gigantua/Chess_Moveg ... 16cb753093
IMO this is just beautiful and would have been great before x86-64 became a thing. As mentioned above, it is applicable anywhere the fancy-magic IMUL64 is too expensive, as the underlying lookup table is 100% compatible.
Code: Select all
static inline uint64_t rook(int sq, uint64_t occ) {
uint64_t m = occ | rook_masks[sq];
switch (sq) {
case 0: return (attack_table + 84)[((m << 14) + (m << 39) + (m << 55)) >> 46];
case 1: return (attack_table + 87)[((m << 13) + (m << 36) + (m << 54)) >> 46];
case 2: return (attack_table + 78)[((m << 12) + (m << 37) + (m << 55)) >> 46];
case 3: return (attack_table + 60)[((m << 11) + (m << 36) + (m << 55)) >> 46];
case 4: return (attack_table + 66)[((m << 10) + (m << 35) + (m << 55)) >> 46];
case 5: return (attack_table + 72)[((m << 9) + (m << 34) + (m << 55)) >> 46];
case 6: return (attack_table + 0)[((m << 8) + (m << 33) + (m << 56)) >> 46];
case 7: return (attack_table + 21)[((m << 7) + (m << 32) + (m << 55)) >> 46];
case 8: return (attack_table + 111)[((m << 13) + (m << 30) + (m << 46)) >> 46];
case 9: return (attack_table + 390)[((m << 13) + (m << 31) + (m << 47)) >> 46];
case 10: return (attack_table + 102)[((m << 13) + (m << 28) + (m << 47)) >> 46];
case 11: return (attack_table + 75)[((m << 10) + (m << 27) + (m << 46)) >> 46];
case 12: return (attack_table + 57)[((m << 9) + (m << 26) + (m << 46)) >> 46];
case 13: return (attack_table + 30)[((m << 8) + (m << 25) + (m << 46)) >> 46];
case 14: return (attack_table + 0)[((m << 0) + (m << 25) + (m << 49)) >> 46];
case 15: return (attack_table + 9)[((m << 7) + (m << 30) + (m << 46)) >> 46];
case 16: return (attack_table + 96)[((m << 14) + (m << 39) + (m << 55)) >> 46];
case 17: return (attack_table + 168)[((m << 13) + (m << 38) + (m << 52)) >> 46];
case 18: return (attack_table + 1056)[((m << 4) + (m << 21) + (m << 40)) >> 46];
case 19: return (attack_table + 516)[((m << 3) + (m << 22) + (m << 41)) >> 46];
case 20: return (attack_table + 213)[((m << 10) + (m << 20) + (m << 39)) >> 46];
case 21: return (attack_table + 126)[((m << 2) + (m << 20) + (m << 41)) >> 46];
case 22: return (attack_table + 216)[((m << 0) + (m << 20) + (m << 42)) >> 46];
case 23: return (attack_table + 108)[((m << 0) + (m << 15) + (m << 41)) >> 46];
case 24: return (attack_table + 24)[((m << 13) + (m << 30) + (m << 46)) >> 46];
case 25: return (attack_table + 15)[((m << 11) + (m << 29) + (m << 52)) >> 46];
case 26: return (attack_table + 0)[((m << 12) + (m << 31) + (m << 53)) >> 46];
case 27: return (attack_table + 12)[((m << 11) + (m << 31) + (m << 52)) >> 46];
case 28: return (attack_table + 114)[((m << 10) + (m << 31) + (m << 47)) >> 46];
case 29: return (attack_table + 186)[((m << 9) + (m << 31) + (m << 42)) >> 46];
case 30: return (attack_table + 576)[((m << 8) + (m << 31) + (m << 47)) >> 46];
case 31: return (attack_table + 39)[((m << 8) + (m << 31) + (m << 53)) >> 46];
case 32: return (attack_table + 417)[((m << 7) + (m << 23) + (m << 46)) >> 46];
case 33: return (attack_table + 993)[((m << 5) + (m << 23) + (m << 46)) >> 46];
case 34: return (attack_table + 381)[((m << 5) + (m << 23) + (m << 44)) >> 46];
case 35: return (attack_table + 114)[((m << 3) + (m << 23) + (m << 44)) >> 46];
case 36: return (attack_table + 102)[((m << 2) + (m << 25) + (m << 37)) >> 46];
case 37: return (attack_table + 96)[((m << 3) + (m << 25) + (m << 44)) >> 46];
case 38: return (attack_table + 51)[((m << 2) + (m << 24) + (m << 41)) >> 46];
case 39: return (attack_table + 90)[((m << 6) + (m << 22) + (m << 31)) >> 46];
case 40: return (attack_table + 6)[((m << 15) + (m << 38) + (m << 53)) >> 46];
case 41: return (attack_table + 27)[((m << 14) + (m << 37) + (m << 61)) >> 46];
case 42: return (attack_table + 324)[((m << 13) + (m << 19) + (m << 36)) >> 46];
case 43: return (attack_table + 2805)[((m << 10) + (m << 28) + (m << 35)) >> 46];
case 44: return (attack_table + 633)[((m << 1) + (m << 18) + (m << 38)) >> 46];
case 45: return (attack_table + 5166)[((m << 2) + (m << 17) + (m << 36)) >> 46];
case 46: return (attack_table + 1068)[((m << 3) + (m << 17) + (m << 32)) >> 46];
case 47: return (attack_table + 174)[((m << 0) + (m << 17) + (m << 31)) >> 46];
case 48: return (attack_table + 3)[((m << 6) + (m << 29) + (m << 46)) >> 46];
case 49: return (attack_table + 0)[((m << 6) + (m << 29) + (m << 52)) >> 46];
case 50: return (attack_table + 0)[((m << 6) + (m << 28) + (m << 51)) >> 46];
case 51: return (attack_table + 0)[((m << 6) + (m << 27) + (m << 50)) >> 46];
case 52: return (attack_table + 3)[((m << 6) + (m << 26) + (m << 51)) >> 46];
case 53: return (attack_table + 3)[((m << 7) + (m << 25) + (m << 50)) >> 46];
case 54: return (attack_table + 12)[((m << 8) + (m << 33) + (m << 56)) >> 46];
case 55: return (attack_table + 12)[((m << 7) + (m << 32) + (m << 53)) >> 46];
case 56: return (attack_table + 0)[((m << 0) + (m << 14) + (m << 39)) >> 46];
case 57: return (attack_table + 45)[((m << 0) + (m << 13) + (m << 38)) >> 46];
case 58: return (attack_table + 33)[((m << 0) + (m << 12) + (m << 37)) >> 46];
case 59: return (attack_table + 12)[((m << 0) + (m << 11) + (m << 36)) >> 46];
case 60: return (attack_table + 6)[((m << 0) + (m << 10) + (m << 35)) >> 46];
case 61: return (attack_table + 27)[((m << 0) + (m << 9) + (m << 34)) >> 46];
case 62: return (attack_table + 45)[((m << 0) + (m << 25) + (m << 48)) >> 46];
case 63: return (attack_table + 54)[((m << 0) + (m << 15) + (m << 33)) >> 46];
default: unreachable();
}
}
Note that m << 0 is free. The table is only a tiny bit bigger than the individually expanded magics (think of it as a buffer of 2^18 entries per square, where usually 2^12 would do). In any case the table size is log2(274645) = 18.06 bits, only slightly bigger than single squares.
This is the explicit form; the multiplication itself happens in Init() (check the GitHub), and the results are what you see here.
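A guess at what the Init()-side decomposition might look like (not the repo's actual code): a fixed-shift magic with popcount 3 gets split into the positions of its three set bits, and those positions become the literal shift amounts in the switch above, since (m << a) + (m << b) + (m << c) equals m * magic modulo 2^64:
Code: Select all
#include <cstdint>
#include <bit>
#include <array>

// Return the positions of the three set bits of a popcount-3 magic.
static std::array<int, 3> decompose_popcount3(uint64_t magic) {
    std::array<int, 3> shifts{};
    for (int i = 0; i < 3; ++i) {
        shifts[i] = std::countr_zero(magic); // index of the lowest set bit
        magic &= magic - 1;                  // clear it and move on
    }
    return shifts;
}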
It is also time to publish my performance test of current MSVC (as of 2023) vs Clang for computer-chess move generators:
Here is the raw data of every move generator's performance under MSVC vs Clang. Copy-paste it into Excel if you want to analyse it.
Code: Select all
Name CLANG [MQueens/s] MSVC [MQueens/s] Clang vs MSVC [%]
Correlation Magic 380,235084 383,455187 -0,839759927
Spectral Magic 165,823542 140,443389 18,07144728
Spectral Magic Switched 138,425403 78,232906 76,9401267
SBAMG o^(o-3cbn) 291,481517 207,833811 40,24740036
SBAMG Inline 384,061423 132,008845 190,9361286
GaloisField - AVX512 4,550884 4,942163 -7,917160968
Gigantua Rotation Algorithm 591,667773 458,393529 29,07419838
Hyperbola Quintessence o^(o-2r) 786,918095 234,588145 235,4466591
Hyperbola Quintessence Inline 553,289165 85,048851 550,5545442
Genetic 8 Ray 84,806722 41,185067 105,9161929
Bitrotation 117,713879 38,867366 202,8604485
Binary Neural Network 34,035661 48,590727 -29,95441085
Exploding Bitboards 74,24909 64,411367 15,27327156
Reference (Switch Lookup) 57,78067 35,884307 61,01932803
AVX Branchless Shift 271,728106 194,76915 39,51290849
Pext Emulated 71,338672 64,783951 10,11781606
Dumb7 Fill 97,631418 66,632484 46,52225482
Kogge-Stone 481,000789 111,765852 330,3647137
Rotated Bitboards 46,585472 44,765362 4,065889158
QBBEngine 314,724898 176,489587 78,32491047
QBBEngine - Shifted Mask 318,546483 167,736265 89,90913086
Classical Bob-Mike 309,387596 202,548602 52,74733716
Advanced Bob-Mike 333,170104 212,279071 56,94910592
Leorik 301,119182 220,354676 36,65204999
Leorik Inline 320,125672 113,105453 183,032925
Obstruction Difference 291,878385 242,44305 20,39049377
Obstruction Difference Inline 287,345546 92,558469 210,4476004
Genetic Obstruction Difference 367,765361 272,002263 35,20672841
Genetic Obstruction Difference V2 387,127914 313,839211 23,35230922
Slide Arithmetic 255,623887 249,682903 2,379411617
Slide Arithmetic Inline 112,280204 110,186908 1,899768346
Kindergarten 439,524489 436,557944 0,679530642
SISSY Bitboards 223,645943 204,062476 9,596799659
Kindergarten Super SISSY Bitboards 447,212391 510,440204 -12,38691868
Fancy Magic BB - Variable shift 534,579364 389,204 37,35197069
FoldingHash - 4x fancy magic 312,658241 218,534971 43,07011805
Plain Magic BB 491,756405 504,557415 -2,537076975
Black Magic BB - Fixed shift 638,793243 520,870409 22,63957252
Pext constexpr 866,371859 763,187887 13,52012706
HyperCube 89,265628 69,692085 28,08574747
AVERAGE 306,905654 210,6734077 45,67840211
I don't recommend MSVC (or GCC for that matter) for shipping chess-programming binaries as far as movegens are concerned. A few years back the performance difference was not that big; it seems the LLVM backend finds many good optimisations that MSVC is missing.
Here is the graph. Keep in mind that 100 means clang produces code that is TWICE as fast, and even the run-of-the-mill 30% uplift is non-trivial. The aggregate average over all movegens is a 45,67% uplift for Clang++-16. Compiler flags used: clang++ -std=c++20 -O3 -march=native

If you have Zen4 contact me.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 20
- Joined: Fri Mar 10, 2023 9:33 am
- Full name: Martin Novák
Re: Comparison of all known Sliding lookup algorithms
Hi Daniel,
could you also add Split Pext to your comparison framework? Here is a link to the code: https://gist.github.com/martinnovaak/b9 ... 1001b13fe8.
And here is a link to the thread where the algorithm was discussed: https://www.talkchess.com/forum3/viewto ... 10#p949575
Martin
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
Another algorithm joins the repo: Split Pext.
It technically existed in there for over a year, but the C# to C++ translation had a bug and it was disabled. Now it works, and it even gets an improvement suggested by our friend Mike Sherwin.
Outcome:
Around the same speed as magic bitboards and 20% slower than PEXT, at 1/7th the memory cost. For engine perft with cache pressure you would need to test it there.
Code: Select all
Fancy Magic BB - Variable shift 532.018585 93376 [729kb] imul64 yes Pradu Kannan https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic 318.205335 6468 [50kb] none no Daniel Inführ tbd
Plain Magic BB 478.390423 295168 [2306kb] imul64 no Lasse Hansen https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift 654.241259 88891 [694kb] imul64 no Onno Garms and Volker Annuss https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr 824.616122 107904 [843kb] pext_u64 yes Zach Wegner https://www.chessprogramming.org/BMI2#PEXTBitboards
Split Pext 615.332063 16608 [129kb] pext_u64 no Thomas Jahn https://www.talkchess.com/forum3/viewtopic.php?f=7&t=79049&start=340
HyperCube 92.042865 107680 [841kb] none yes Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
Outcome for this line of code:
Code: Select all
// horizontal pext is probably slower than kindergarten horizontal
// Daniel says: Confirmed and pext will be slower than kindergarten lookup on other platforms as well.
return horizontal_subset[square][_pext_u64(occupancy, horizontal_mask[square])];
Interesting trivia: I have all these algorithms in my head, and I knew other algorithms use a different horizontal getter, which I had assumed was the fastest.
A few years back the fastest horizontal code was known to be this, and it is what hyperbola uses:
Code: Select all
static constexpr uint64_t horizontal_attack(uint64_t pieces, uint32_t x) {
uint32_t file_mask = x & 7;
uint32_t rank_mask = x & 56;
uint64_t o = (pieces >> rank_mask) & 126;
return ((uint64_t)rank_attack[o * 4 + file_mask]) << rank_mask;
}
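For context, the 512-byte rank_attack table that this getter indexes can be filled like this (my reconstruction from the o * 4 + file_mask indexing above, so treat the exact layout as an assumption rather than the repo's code):
Code: Select all
#include <cstdint>

static uint8_t rank_attack[512];

static void init_rank_attack() {
    for (uint32_t o = 0; o < 128; o += 2) {             // inner occupancy: bits 1..6, bit 0 unused
        for (uint32_t file = 0; file < 8; ++file) {
            uint8_t attacks = 0;
            for (int f = (int)file + 1; f < 8; ++f) {   // slide towards the h-file
                attacks |= (uint8_t)(1u << f);
                if (o & (1u << f)) break;               // blocked
            }
            for (int f = (int)file - 1; f >= 0; --f) {  // slide towards the a-file
                attacks |= (uint8_t)(1u << f);
                if (o & (1u << f)) break;
            }
            rank_attack[o * 4 + file] = attacks;
        }
    }
}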
I tested the hyperbola getter, and it is slower than the approach in the repo, which is this:
Code: Select all
horizontal_subset[square][(occupancy >> horizontal_shift_table[square]) & 63];
So in summary:
3x PEXT + 1x hyperbola horizontal getter = 571 MNPS, naive 4x PEXT = 591 MNPS, 3x PEXT + 1x kindergarten = 615 MNPS.
Final summary: Slightly slower than magic and pext but 1/7th the memory.
Wait until I show the Gigantua V2 movegen (or engine, depending on how fast NNUE runs in CUDA).

Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 2
- Joined: Wed Aug 09, 2023 11:20 pm
- Full name: Julien Dehaes
Re: Comparison of all known Sliding lookup algorithms
Hello Daniel, first of all, thanks for your great work. You'll find hereafter the results for an old piece of Intel hardware. I'll post other results from older Intel machines in the near future. BTW, would you also be interested in results for ARM processors or pre-Ryzen AMD processors?
Hope this helps,
Best regards,
Julien
Code: Select all
$ ./movegen_compare
Verify Engines...OK!
Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz
Million Lookups/s Random Squares, Random Occupation, Transposition Table Cache Pressure Simulation:
Name Performance [MQueens/s] Tablesize Dependencies Template Author Reference
Correlation Magic 230.548117 294976 [2304kb] imul64 no Daniel Inführ https://www.talkchess.com/forum3/viewtopic.php?f=7&t=82224#p949688
Spectral Magic 93.106684 294976 [2304kb] none no Daniel Inführ https://www.talkchess.com/forum3/viewtopic.php?f=7&t=82224#p949688
Spectral Magic Switched 55.550977 274773 [2146kb] none no Daniel Inführ https://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=160
SBAMG o^(o-3cbn) 136.710128 576 [4kb] countl_zero, bswap yes Syed Fahad http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline 195.245299 0 [0kb] countl_zero, bswap yes Syed Fahad and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?t=59845
GaloisField - AVX512 3.149912 0 [0kb] AVX512F_GFNI no Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=81335
Hyperbola Quintessence o^(o-2r) 212.713752 256 [2kb] bswap no Ryan Mack https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline 259.804464 0 [0kb] bswap yes Ryan Mack https://www.chessprogramming.org/Hyperbola_Quintessence
Genetic 8 Ray 71.250432 0 [0kb] bswap no Daniel Inführ (dangi12012) Abstract C++ Syntax Tree Sifter (c) Daniel Infuehr
Bitrotation 57.089636 0 [0kb] ReverseBits no Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=20
Binary Neural Network 16.845716 5852 [45kb] pdep_u64, AVX2 no Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79332
Exploding Bitboards 38.731858 768 [6kb] imul64 no Harald Lüßen http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup) 47.126966 0 [0kb] none yes Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift 115.677512 0 [0kb] AVX2 no Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=60
Pext Emulated 41.512009 107904 [843kb] none no Zach Wegner https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill 125.971578 0 [0kb] none no Gunnar Andersson https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone 218.000741 0 [0kb] none no Peter M. Kogge, Harold S. Stone https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards 23.080858 1848 [14kb] none no Robert Hyatt https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine 153.420210 0 [0kb] countr_zero, countl_zero yes Fabio Gobbato https://www.chessprogramming.org/QBBEngine
QBBEngine - Shifted Mask 146.027925 0 [0kb] countr_zero, countl_zero no Fabio Gobbato http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=90#p924623
Classical Bob-Mike 147.223936 1024 [8kb] countr_zero, countl_zero yes Robert Hyatt and Michael Sherwin https://www.chessprogramming.org/Classical_Approach
Advanced Bob-Mike 161.354347 520 [4kb] countr_zero, countl_zero no Michael Sherwin and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=50#p924653
Leorik 149.322507 128 [1kb] countl_zero no Thomas Jahn (lithander) https://github.com/lithander/MinimalChessEngine
Leorik Inline 147.319707 0 [0kb] countl_zero no Thomas Jahn (lithander) https://github.com/lithander/MinimalChessEngine
Obstruction Difference 148.202371 768 [6kb] countl_zero no Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline 141.345824 0 [0kb] countl_zero yes Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?t=29087
Genetic Obstruction Difference 161.427833 384 [3kb] countl_zero no Daniel Inführ and Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Genetic Obstruction Difference V2 179.270111 768 [6kb] countl_zero no Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Slide Arithmetic 146.628112 256 [2kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline 114.738006 0 [0kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Kindergarten 265.159355 16640 [130kb] imul64 no Urban Koistinen https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards 86.191465 180416 [1409kb] none no Michael Sherwin http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Kindergarten Super SISSY Bitboards 272.379785 16672 [130kb] imul64 no Michael Sherwin https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81234&start=30
Fancy Magic BB - Variable shift 262.772526 93376 [729kb] imul64 yes Pradu Kannan https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic 165.183501 6468 [50kb] none no Daniel Inführ tbd
Plain Magic BB 221.416569 295168 [2306kb] imul64 no Lasse Hansen https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift 302.002876 88891 [694kb] imul64 no Onno Garms and Volker Annuss https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr 365.125694 107904 [843kb] pext_u64 yes Zach Wegner https://www.chessprogramming.org/BMI2#PEXTBitboards
Split Pext 361.105356 16608 [129kb] pext_u64 no Group Idea https://www.talkchess.com/forum3/viewtopic.php?f=7&t=79049&start=340
HyperCube 32.622309 107680 [841kb] none yes Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
Code: Select all
$ cat /proc/cpuinfo | tail -n26
vendor_id : GenuineIntel
cpu family : 6
model : 61
model name : Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz
stepping : 4
microcode : 0x22
cpu MHz : 2305.209
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap intel_pt xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds mmio_unknown
bogomips : 4589.04
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
-
- Posts: 2
- Joined: Wed Aug 09, 2023 11:20 pm
- Full name: Julien Dehaes
Re: Comparison of all known Sliding lookup algorithms
Hello, you'll find results for another, older Intel PC. I had to disable the Split Pext algorithm due to 3 errors (one for each _pext_u64) when compiling on that machine, which lacks BMI2 support.
Hope this helps,
Best regards,
Code: Select all
$ ./movegen_compare
Verify Engines...OK!
Intel(R) Atom(TM) CPU N450 @ 1.66GHz
Million Lookups/s Random Squares, Random Occupation, Transposition Table Cache Pressure Simulation:
Name Performance [MQueens/s] Tablesize Dependencies Template Author Reference
Correlation Magic 8.308243 294976 [2304kb] imul64 no Daniel Inführ https://www.talkchess.com/forum3/viewtopic.php?f=7&t=82224#p949688
Spectral Magic 7.434693 294976 [2304kb] none no Daniel Inführ https://www.talkchess.com/forum3/viewtopic.php?f=7&t=82224#p949688
Spectral Magic Switched 5.665985 274773 [2146kb] none no Daniel Inführ https://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=160
SBAMG o^(o-3cbn) 22.926894 576 [4kb] countl_zero, bswap yes Syed Fahad http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline 14.172273 0 [0kb] countl_zero, bswap yes Syed Fahad and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?t=59845
GaloisField - AVX512 0.389438 0 [0kb] AVX512F_GFNI no Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=81335
Hyperbola Quintessence o^(o-2r) 32.793463 256 [2kb] bswap no Ryan Mack https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline 10.564876 0 [0kb] bswap yes Ryan Mack https://www.chessprogramming.org/Hyperbola_Quintessence
Genetic 8 Ray 7.372978 0 [0kb] bswap no Daniel Inführ (dangi12012) Abstract C++ Syntax Tree Sifter (c) Daniel Infuehr
Bitrotation 2.480609 0 [0kb] ReverseBits no Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=20
Binary Neural Network 0.087823 5852 [45kb] pdep_u64, AVX2 no Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79332
Exploding Bitboards 7.576257 768 [6kb] imul64 no Harald Lüßen http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup) 15.936083 0 [0kb] none yes Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift 6.843997 0 [0kb] AVX2 no Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=60
Pext Emulated 6.888965 107904 [843kb] none no Zach Wegner https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill 15.652464 0 [0kb] none no Gunnar Andersson https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone 25.938833 0 [0kb] none no Peter M. Kogge, Harold S. Stone https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards 10.435298 1848 [14kb] none no Robert Hyatt https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine 7.683107 0 [0kb] countr_zero, countl_zero yes Fabio Gobbato https://www.chessprogramming.org/QBBEngine
QBBEngine - Shifted Mask 7.548652 0 [0kb] countr_zero, countl_zero no Fabio Gobbato http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=90#p924623
Classical Bob-Mike 7.945425 1024 [8kb] countr_zero, countl_zero yes Robert Hyatt and Michael Sherwin https://www.chessprogramming.org/Classical_Approach
Advanced Bob-Mike 8.939939 520 [4kb] countr_zero, countl_zero no Michael Sherwin and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=50#p924653
Leorik 14.237092 128 [1kb] countl_zero no Thomas Jahn (lithander) https://github.com/lithander/MinimalChessEngine
Leorik Inline 11.753295 0 [0kb] countl_zero no Thomas Jahn (lithander) https://github.com/lithander/MinimalChessEngine
Obstruction Difference 13.768134 768 [6kb] countl_zero no Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline 10.471520 0 [0kb] countl_zero yes Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?t=29087
Genetic Obstruction Difference 14.465473 384 [3kb] countl_zero no Daniel Inführ and Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Genetic Obstruction Difference V2 14.531934 768 [6kb] countl_zero no Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Slide Arithmetic 14.093006 256 [2kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline 11.584118 0 [0kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inführ http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Kindergarten 14.423311 16640 [130kb] imul64 no Urban Koistinen https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards 2.478371 180416 [1409kb] none no Michael Sherwin http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Kindergarten Super SISSY Bitboards 12.489517 16672 [130kb] imul64 no Michael Sherwin https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81234&start=30
Fancy Magic BB - Variable shift 12.879612 93376 [729kb] imul64 yes Pradu Kannan https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic 17.916294 6468 [50kb] none no Daniel Inführ tbd
Plain Magic BB 5.829206 295168 [2306kb] imul64 no Lasse Hansen https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift 15.015647 88891 [694kb] imul64 no Onno Garms and Volker Annuss https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr 6.892822 107904 [843kb] pext_u64 yes Zach Wegner https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube 6.329301 107680 [841kb] none yes Daniel Inführ (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
Code: Select all
$ cat /proc/cpuinfo | tail -n26
vendor_id : GenuineIntel
cpu family : 6
model : 28
model name : Intel(R) Atom(TM) CPU N450 @ 1.66GHz
stepping : 10
microcode : 0x107
cpu MHz : 1611.148
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm
bugs :
bogomips : 3325.06
clflush size : 64
cache_alignment : 64
address sizes : 32 bits physical, 48 bits virtual
power management: