Comparison of all known Sliding lookup algorithms

Discussion of chess software programming and technical issues.

Moderator: Ras

Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Comparison of all known Sliding lookup algorithms

Post by Mike Sherwin »

Got it done for the bishop.
113,089,586 / 108,580,668 = 4% speedup. Not terrific but 4% is better than <= 0%.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Update!

The current project is to find a new Sliding lookup algorithm by "Genetic Programming" in essence just trying out every valid combination of a list of constants, parameters and binary instructions in an abstract syntax tree- this will be a definitive proof that with a list of some instructions no smaller algorithm can be constructed (an algorithm without lookups that is)

It took some time because its not that easy to generate a mixed valid tree of binary operators and unary operators and functions parameters.
But it is running (with performance issues) and goes from a single constant all the way to the highest possible number of binary ops with the highest possible constant / parameter.
The cool thing about this is that the AST tree can print itself the same way it calculates its own output - simple recursion from root.

This is great because we want to have as few instructions as possible with as light instructions as possible (so no 64bit multiply - but some intrinsics like popcnt or bswap)

As you can see below most of these are totally negating themselves etc - but that is the goal! - Exhaustive search to find the optimal algorithm.
I will add the bswap intrinsic next and with a size of N = 9 it should emit hyperbola quiescence in there somewhere.
For now I am trying to solve just the horizontal attacks for all 64 squares.

Who is to say that there is no better algo? I will report back with definite answers :D


Sample:

Code: Select all

(sq >> (mask ^ mask))
(sq >> (mask ^ 1))
(sq >> (mask ^ 2))
(sq >> (mask ^ 3))
(sq >> (mask ^ 56))
(sq >> (1 ^ sq))
(sq >> (1 ^ occ))
(sq >> (1 ^ mask))
(sq >> (1 ^ 1))
(sq >> (1 ^ 2))
(sq >> (1 ^ 3))
(sq >> (1 ^ 56))
(sq >> (2 ^ sq))
(sq >> (2 ^ occ))
(sq >> (2 ^ mask))
(sq >> (2 ^ 1))
(sq >> (2 ^ 2))
(sq >> (2 ^ 3))
(sq >> (2 ^ 56))
(sq >> (3 ^ sq))
(sq >> (3 ^ occ))
(sq >> (3 ^ mask))
(sq >> (3 ^ 1))
(sq >> (3 ^ 2))
(sq >> (3 ^ 3))
(sq >> (3 ^ 56))
(sq >> (56 ^ sq))
(sq >> (56 ^ occ))
(sq >> (56 ^ mask))
(sq >> (56 ^ 1))
(sq >> (56 ^ 2))
(sq >> (56 ^ 3))
(sq >> (56 ^ 56))
(occ >> (sq ^ sq))
(occ >> (sq ^ occ))
(occ >> (sq ^ mask))
Exhaustive list with 3 tokens:

Code: Select all

sq
occ
mask
1
2
3
56
(sq | sq)
(sq | occ)
(sq | mask)
(sq | 1)
(sq | 2)
(sq | 3)
(sq | 56)
(occ | sq)
(occ | occ)
(occ | mask)
(occ | 1)
(occ | 2)
(occ | 3)
(occ | 56)
(mask | sq)
(mask | occ)
(mask | mask)
(mask | 1)
(mask | 2)
(mask | 3)
(mask | 56)
(1 | sq)
(1 | occ)
(1 | mask)
(1 | 1)
(1 | 2)
(1 | 3)
(1 | 56)
(2 | sq)
(2 | occ)
(2 | mask)
(2 | 1)
(2 | 2)
(2 | 3)
(2 | 56)
(3 | sq)
(3 | occ)
(3 | mask)
(3 | 1)
(3 | 2)
(3 | 3)
(3 | 56)
(56 | sq)
(56 | occ)
(56 | mask)
(56 | 1)
(56 | 2)
(56 | 3)
(56 | 56)
(sq ^ sq)
(sq ^ occ)
(sq ^ mask)
(sq ^ 1)
(sq ^ 2)
(sq ^ 3)
(sq ^ 56)
(occ ^ sq)
(occ ^ occ)
(occ ^ mask)
(occ ^ 1)
(occ ^ 2)
(occ ^ 3)
(occ ^ 56)
(mask ^ sq)
(mask ^ occ)
(mask ^ mask)
(mask ^ 1)
(mask ^ 2)
(mask ^ 3)
(mask ^ 56)
(1 ^ sq)
(1 ^ occ)
(1 ^ mask)
(1 ^ 1)
(1 ^ 2)
(1 ^ 3)
(1 ^ 56)
(2 ^ sq)
(2 ^ occ)
(2 ^ mask)
(2 ^ 1)
(2 ^ 2)
(2 ^ 3)
(2 ^ 56)
(3 ^ sq)
(3 ^ occ)
(3 ^ mask)
(3 ^ 1)
(3 ^ 2)
(3 ^ 3)
(3 ^ 56)
(56 ^ sq)
(56 ^ occ)
(56 ^ mask)
(56 ^ 1)
(56 ^ 2)
(56 ^ 3)
(56 ^ 56)
(sq << sq)
(sq << occ)
(sq << mask)
(sq << 1)
(sq << 2)
(sq << 3)
(sq << 56)
(occ << sq)
(occ << occ)
(occ << mask)
(occ << 1)
(occ << 2)
(occ << 3)
(occ << 56)
(mask << sq)
(mask << occ)
(mask << mask)
(mask << 1)
(mask << 2)
(mask << 3)
(mask << 56)
(1 << sq)
(1 << occ)
(1 << mask)
(1 << 1)
(1 << 2)
(1 << 3)
(1 << 56)
(2 << sq)
(2 << occ)
(2 << mask)
(2 << 1)
(2 << 2)
(2 << 3)
(2 << 56)
(3 << sq)
(3 << occ)
(3 << mask)
(3 << 1)
(3 << 2)
(3 << 3)
(3 << 56)
(56 << sq)
(56 << occ)
(56 << mask)
(56 << 1)
(56 << 2)
(56 << 3)
(56 << 56)
(sq >> sq)
(sq >> occ)
(sq >> mask)
(sq >> 1)
(sq >> 2)
(sq >> 3)
(sq >> 56)
(occ >> sq)
(occ >> occ)
(occ >> mask)
(occ >> 1)
(occ >> 2)
(occ >> 3)
(occ >> 56)
(mask >> sq)
(mask >> occ)
(mask >> mask)
(mask >> 1)
(mask >> 2)
(mask >> 3)
(mask >> 56)
(1 >> sq)
(1 >> occ)
(1 >> mask)
(1 >> 1)
(1 >> 2)
(1 >> 3)
(1 >> 56)
(2 >> sq)
(2 >> occ)
(2 >> mask)
(2 >> 1)
(2 >> 2)
(2 >> 3)
(2 >> 56)
(3 >> sq)
(3 >> occ)
(3 >> mask)
(3 >> 1)
(3 >> 2)
(3 >> 3)
(3 >> 56)
(56 >> sq)
(56 >> occ)
(56 >> mask)
(56 >> 1)
(56 >> 2)
(56 >> 3)
(56 >> 56)
(sq - sq)
(sq - occ)
(sq - mask)
(sq - 1)
(sq - 2)
(sq - 3)
(sq - 56)
(occ - sq)
(occ - occ)
(occ - mask)
(occ - 1)
(occ - 2)
(occ - 3)
(occ - 56)
(mask - sq)
(mask - occ)
(mask - mask)
(mask - 1)
(mask - 2)
(mask - 3)
(mask - 56)
(1 - sq)
(1 - occ)
(1 - mask)
(1 - 1)
(1 - 2)
(1 - 3)
(1 - 56)
(2 - sq)
(2 - occ)
(2 - mask)
(2 - 1)
(2 - 2)
(2 - 3)
(2 - 56)
(3 - sq)
(3 - occ)
(3 - mask)
(3 - 1)
(3 - 2)
(3 - 3)
(3 - 56)
(56 - sq)
(56 - occ)
(56 - mask)
(56 - 1)
(56 - 2)
(56 - 3)
(56 - 56)
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Getting closer and closer... Memory issues fixed - but the number of instructions is too low to find a horizontal solution (yet)
Also optimisation could help a lot.

So I will add Logical AND, popcount, countLZero, countRZero as well as bswap

Code: Select all

Best: 1015744/1048576: (1 << (sq ^ (1 | 1)))
Best: 1015744/1048576: (1 << (1 ^ (sq | sq)))
Best: 1015744/1048576: (2 << (2 ^ (sq | 2)))
Best: 1015744/1048576: (2 << (2 ^ (2 | sq)))
Best: 1015744/1048576: (1 << (sq ^ (2 ^ 3)))
Best: 1015744/1048576: (1 << (sq ^ (3 ^ 2)))
Best: 1015744/1048576: (1 << (2 ^ (sq ^ 3)))
Best: 1015744/1048576: (1 << (2 ^ (3 ^ sq)))
Best: 1015744/1048576: (1 << (3 ^ (sq ^ 2)))
Best: 1015744/1048576: (1 << (3 ^ (2 ^ sq)))
Best: 1015744/1048576: (1 << (sq ^ (2 ^ 3)))
Best: 1015744/1048576: (1 << (sq ^ (3 ^ 2)))
Best: 1015744/1048576: (1 << (2 ^ (sq ^ 3)))
Best: 1015744/1048576: (1 << (2 ^ (3 ^ sq)))
Best: 1015744/1048576: (1 << (3 ^ (sq ^ 2)))
Best: 1015744/1048576: (1 << (3 ^ (2 ^ sq)))
Best: 1023968/1048576: (occ ^ (occ - (2 << sq)))
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
tcusr
Posts: 325
Joined: Tue Aug 31, 2021 10:32 pm
Full name: tcusr

Re: Comparison of all known Sliding lookup algorithms

Post by tcusr »

where is the code for this?
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Small Update:

Optimally the number of trees is the adapted catalan number n = (n + 1) / 2 since we can only have binary trees that are full (every math operator needs a left and right side). (https://oeis.org/A000108) so the number of full binary trees multiplied with M^N where N is the number of unary instructions/parameters slots and M is the number of instructions themselves.

Each vertice in the tree can have a symbol like a
parameter (sq, mask, occupation)
unary instructions (popcount, countlzero, countrzero, bitreverse)
constant (1, 2, 3, 56)

Each branch in the tree can be a function like
<<, >>, |, &, ^, -

And since we want to allow repetition you can see how that will scale with
~11^n * NumberOfFullBinaryTrees(m)

currently I have fixed n + m so that smaller and bigger trees can exist - but the number of tokens always stays constant.
So the scaling to search every tree is:
O(n) ~ M^n * n!
which is possible if we keep M small :D

Of course binary instructions like these could be pruned / skipped:
1 | 1
1 ^ 1
1 - 1

So searching for the optimal sliding piece algorithm really has similarities with chess itself. We are building a search tree and pruning nodes that we know have no value. But in the space of mathematical code tokens (functions, parameters, constants), and not in the space of chess moves.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

New Algorithm - Genetic8Ray!

This one was created by sifting every possible abstract syntax tree that is constructible with 16 tokens or less.
I described it in this thread: http://www.talkchess.com/forum3/viewtop ... =7&t=79701

Check it out if you are interested: https://github.com/Gigantua/Chess_Movegen.git

Now I want to show here in which order the trees are constructed:

Code: Select all

1ull
2ull
56ull
sq
occ
mask
mask
popcount(1ull)
countl_zero(1ull)
countr_zero(1ull)
bit_reverse(1ull)
(1ull | 1ull)
(1ull &~ 1ull)
(1ull ^ 1ull)
(1ull - 1ull)
(1ull << 1ull)
(1ull >> 1ull)
popcount(2ull)
countl_zero(2ull)
countr_zero(2ull)
bit_reverse(2ull)
(2ull | 1ull)
(2ull &~ 1ull)
(2ull ^ 1ull)
(2ull - 1ull)
(2ull << 1ull)
(2ull >> 1ull)
popcount(56ull)
countl_zero(56ull)
countr_zero(56ull)
bit_reverse(56ull)
(56ull | 1ull)
(56ull &~ 1ull)
(56ull ^ 1ull)
(56ull - 1ull)
(56ull << 1ull)
(56ull >> 1ull)
popcount(sq)
.........
popcount(((56ull << 1ull) - 1ull))
countl_zero(((56ull << 1ull) - 1ull))
countr_zero(((56ull << 1ull) - 1ull))
bit_reverse(((56ull << 1ull) - 1ull))
(((56ull << 1ull) - 1ull) | 1ull)
(((56ull << 1ull) - 1ull) &~ 1ull)
As you can see its an exhaustive list of every possibility of legal code starting from just the constants and building up with all unary and binary functions etc. Repetition is still an issue but detecting repetition makes the cost much higher so I live with that the code is only ~30% efficient but nevertheless it can create and execute 16^11 = 1.7592186e+13 trees overnight. This number is non trivial and there is no way to optimize execution since each code is only executed once. (Detecting the subtree 1|1 is more expensive than executing it)

The cool thing is that this really works - and that out of maybe 16^9 trees sometimes only one solves a sliding problem.
The 16 arguments or tokens are totally free to pick and you could have 15 constants and one function etc.

Constants that are not supplied will be generated by the code! So if the algorithm needs a 3 for a solution then it will find the term (2 + 1).
This is a true and beautiful brute force abstract syntax tree searcher that can solve many chess problems optimally.

I think I might share the code via PM to people that are interested.
Feel free to contact me under daniel.infuehr@live.de - I am always interested about chessprogramming discussions.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

New Algorithm - Genetic Obstruction Difference!

With the power of abstract syntax tree sifting - a new algorithm was uncovered! - it is a further improvement of Obstruction Difference.
The file is hosted here among the other algorithms - https://github.com/Gigantua/Chess_Moveg ... onDiff.hpp

Current CLANG-14 performance - single thread

Code: Select all

Million Lookups/s Random Squares, Random Occupation/s:
Name                               Performance [MQueens/s]       Tablesize           Dependencies             Template  Author                                       Reference
SBAMG o^(o-3cbn)                   290.996566                    576     [4kb]       countl_zero, bswap       yes       Syed Fahad                                   http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline                       421.167899                    0       [0kb]       countl_zero, bswap       yes       Syed Fahad and Daniel Inf�hr                 http://www.talkchess.com/forum3/viewtopic.php?t=59845
Hyperbola Quintessence o^(o-2r)    594.571562                    256     [2kb]       bswap                    no        Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline      576.150260                    0       [0kb]       bswap                    yes       Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Genetic 8 Ray                      160.177263                    0       [0kb]       bswap                    no        Daniel Inf�hr (dangi12012)                   Abstract C++ Syntax Tree Sifter (c) Daniel Infuehr
Bitrotation                        153.838067                    0       [0kb]       ReverseBits              no        Daniel Inf�hr (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=20
Binary Neural Network              37.265090                     5852    [45kb]      pdep_u64, AVX2           no        Daniel Inf�hr (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79332
Exploding Bitboards                76.940142                     768     [6kb]       imul64                   no        Harald L��en                                  http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup)          88.202607                     0       [0kb]       none                     yes       Daniel Inf�hr (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift               280.341737                    0       [0kb]       AVX2                     no        Daniel Inf�hr (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=60
Pext Emulated                      68.503395                     107904  [843kb]     none                     no        Zach Wegner                                  https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill                         300.334122                    0       [0kb]       none                     no        Gunnar Andersson                             https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone                        435.573269                    0       [0kb]       none                     no        Peter M. Kogge, Harold S. Stone              https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards                  46.902647                     1848    [14kb]      none                     no        Robert Hyatt                                 https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine                          350.457054                    0       [0kb]       countr_zero, countl_zero yes       Fabio Gobbato                                https://www.chessprogramming.org/QBBEngine
Classical Bob-Mike                 322.506087                    1024    [8kb]       countr_zero, countl_zero yes       Robert Hyatt and Michael Sherwin             https://www.chessprogramming.org/Classical_Approach
Leorik                             334.393174                    128     [1kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Leorik Inline                      352.132308                    0       [0kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Obstruction Difference             315.880913                    768     [6kb]       countl_zero              no        Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline      330.828228                    0       [0kb]       countl_zero              yes       Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Genetic Obstruction Difference     361.254275                    0       [0kb]       countl_zero              yes       Daniel Inf�hr and Michael Hoffmann           http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Slide Arithmetic                   259.544758                    256     [2kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inf�hr              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline            149.784871                    0       [0kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inf�hr              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Kindergarten                       600.249103                    16640   [130kb]     imul64                   no        Urban Koistinen                              https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards                    263.630811                    180416  [1409kb]    none                     no        Michael Sherwin                              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Fancy Magic BB - Variable shift    542.456242                    93376   [729kb]     imul64                   yes       Pradu Kannan                                 https://www.chessprogramming.org/Magic_Bitboards#Fancy
Plain Magic BB                     572.631097                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       647.353941                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss                 https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr                     778.386793                    107904  [843kb]     pext_u64                 yes       Zach Wegner                                  https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube                          50.815956                     107680  [841kb]     none                     yes       Daniel Inf�hr (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
Zooming into the relevant improved algorithm:

Code: Select all

Obstruction Difference             315.880913                    768     [6kb]       countl_zero              no        Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline      330.828228                    0       [0kb]       countl_zero              yes       Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Genetic Obstruction Difference     361.254275                    0       [0kb]       countl_zero              yes       Daniel Inführ and Michael Hoffmann           http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
These performance numbers may not look too much - but keep in mind that at the beginning of 2021 only the first original Obstruction Difference algorithm existed. The Inline - and now the Genetic improvements are new developments.
Also Obstruction difference is one of the 3 algorithms that are perfect for GPU devices (Bitrotation, QBB, ObstructionDiff). These are algorithms that go beyond 60 Billion lookups per second. I think this will stay the smallest algorithm in terms of number of operations starting from the masks.

It will be interesting to see how much this new code improves performance on the GPU. The interesting part is that the 3 fast algorithms are very different to each other - and yet perform the same task.

Next Steps:
I can see a huge improvement for all of chessprogramming when the currently common form of 0..64 for the source square is replaced by 0...(1ull << 64). I can see that lookup free algorithms mostly dont use the square at all anymore but use the single set bit for square. But creating all 4 or 8 ray masks from a single set bit is yet to be solved.

Happy Easter - 2022
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
Fabio Gobbato
Posts: 219
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Comparison of all known Sliding lookup algorithms

Post by Fabio Gobbato »

A small optimization over the QBB algorithm (I have removed the ^1ULL<<sq in the final mask):

Code: Select all

/* return the bitboard with the rook destinations */
    static constexpr uint64_t Rook(uint64_t sq, uint64_t occupation)
    {
        occupation ^= 1ULL << sq; /* remove the selected piece from the occupation */
        uint64_t piecesup = (0x0101010101010101ULL << sq) & (occupation | 0xFF00000000000000ULL); /* find the pieces up */
        uint64_t piecesdo = (0x8080808080808080ULL >> (63 ^ sq)) & (occupation | 0x00000000000000FFULL); /* find the pieces down */
        uint64_t piecesri = (0x00000000000000FFULL << sq) & (occupation | 0x8080808080808080ULL); /* find pieces on the right */
        uint64_t piecesle = (0xFF00000000000000ULL >> (63 ^ sq)) & (occupation | 0x0101010101010101ULL); /* find pieces on the left */
        return (((0x8080808080808080ULL >> LSB(piecesup)) & (0x0101010101010101ULL << MSB(piecesdo))) ^
                ((0xFF00000000000000ULL >> LSB(piecesri)) & (0x00000000000000FFULL << MSB(piecesle))));
        /* From every direction find the first piece and from that piece put a mask in the opposite direction.
           Put togheter all the 4 masks and remove the moving piece */
    }

    /* return the bitboard with the bishops destinations */
    static constexpr uint64_t Bishop(uint64_t sq, uint64_t occupation)
    {  /* it's the same as the rook */
        occupation ^= 1ULL << sq;
        uint64_t piecesup = (0x8040201008040201ULL << sq) & (occupation | 0xFF80808080808080ULL);
        uint64_t piecesdo = (0x8040201008040201ULL >> (63 ^ sq)) & (occupation | 0x01010101010101FFULL);
        uint64_t piecesle = (0x8102040810204081ULL << sq) & (occupation | 0xFF01010101010101ULL);
        uint64_t piecesri = (0x8102040810204081ULL >> (63 ^ sq)) & (occupation | 0x80808080808080FFULL);
        return (((0x8040201008040201ULL >> LSB(piecesup)) & (0x8040201008040201ULL << MSB(piecesdo))) ^
                ((0x8102040810204081ULL >> LSB(piecesle)) & (0x8102040810204081ULL << MSB(piecesri))));
    }
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Fabio Gobbato wrote: Tue Apr 19, 2022 5:21 pm A small optimization over the QBB algorithm (I have removed the ^1ULL<<sq in the final mask):
Interesting - will take a look!

Sliding piece update:
The new raymasks from this Thread - http://www.talkchess.com/forum3/viewtop ... 0&start=40

have found themselves in this file:
https://github.com/Gigantua/Chess_Moveg ... on.hpp#L45

And improved performance on MSVC by 10%. This code is much longer and more verbose but can increase performance and is the first performant bitboard native algorithm in the comparison codebase. (where square is not in 0..64 form - but in 0..(1ull << 64) form)
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Fabio Gobbato wrote: Tue Apr 19, 2022 5:21 pm A small optimization over the QBB algorithm (I have removed the ^1ULL<<sq in the final mask):
Fabio you will like this one. I found alternate forms for the QBB algorithm that shave off a few instructions.
I will preview it now for the rook - but I will take a week break from talkchess.
Chessprogramming is very addictive to me and is starting to interfere with my work.

Anyways here is a sneak peek of what will be hopefully a slight improvement:

QBB Genetic Rook Preview:

Code: Select all

uint64_t lsb_up = LSB((occ | 0xFF00000000000000ULL) & (0x0101010101010101ULL << sq));
uint64_t msb_do = MSB((occ | 0x00000000000000FFULL) & (0x8080808080808080ULL >> sr));
uint64_t lsb_ri = LSB((occ | 0x8080808080808080ULL) & (0x00000000000000FFULL << sq));
uint64_t msb_le = MSB((occ | 0x0101010101010101ULL) & (0xFF00000000000000ULL >> sr));
uint64_t ab = (((0x0101010101010101ULL << msb_do) << lsb_up) >> lsb_up);
uint64_t cd = (((0x00000000000000FFULL << msb_le) << lsb_ri) >> lsb_ri);
return ab ^ cd; //trick from Fabios post - sine both lines overlap this is perfect
The ab, cd terms now each use one constant and instruction less compared to vanilla QBB!
Anyways - see you in a week!
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer