Got it done for the bishop.
113,089,586 / 108,580,668 = 4% speedup. Not terrific but 4% is better than <= 0%.
Comparison of all known Sliding lookup algorithms
Moderator: Ras
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
Update!
The current project is to find a new Sliding lookup algorithm by "Genetic Programming" in essence just trying out every valid combination of a list of constants, parameters and binary instructions in an abstract syntax tree- this will be a definitive proof that with a list of some instructions no smaller algorithm can be constructed (an algorithm without lookups that is)
It took some time because its not that easy to generate a mixed valid tree of binary operators and unary operators and functions parameters.
But it is running (with performance issues) and goes from a single constant all the way to the highest possible number of binary ops with the highest possible constant / parameter.
The cool thing about this is that the AST tree can print itself the same way it calculates its own output - simple recursion from root.
This is great because we want to have as few instructions as possible with as light instructions as possible (so no 64bit multiply - but some intrinsics like popcnt or bswap)
As you can see below most of these are totally negating themselves etc - but that is the goal! - Exhaustive search to find the optimal algorithm.
I will add the bswap intrinsic next and with a size of N = 9 it should emit hyperbola quiescence in there somewhere.
For now I am trying to solve just the horizontal attacks for all 64 squares.
Who is to say that there is no better algo? I will report back with definite answers
Sample:
Exhaustive list with 3 tokens:
The current project is to find a new Sliding lookup algorithm by "Genetic Programming" in essence just trying out every valid combination of a list of constants, parameters and binary instructions in an abstract syntax tree- this will be a definitive proof that with a list of some instructions no smaller algorithm can be constructed (an algorithm without lookups that is)
It took some time because its not that easy to generate a mixed valid tree of binary operators and unary operators and functions parameters.
But it is running (with performance issues) and goes from a single constant all the way to the highest possible number of binary ops with the highest possible constant / parameter.
The cool thing about this is that the AST tree can print itself the same way it calculates its own output - simple recursion from root.
This is great because we want to have as few instructions as possible with as light instructions as possible (so no 64bit multiply - but some intrinsics like popcnt or bswap)
As you can see below most of these are totally negating themselves etc - but that is the goal! - Exhaustive search to find the optimal algorithm.
I will add the bswap intrinsic next and with a size of N = 9 it should emit hyperbola quiescence in there somewhere.
For now I am trying to solve just the horizontal attacks for all 64 squares.
Who is to say that there is no better algo? I will report back with definite answers

Sample:
Code: Select all
(sq >> (mask ^ mask))
(sq >> (mask ^ 1))
(sq >> (mask ^ 2))
(sq >> (mask ^ 3))
(sq >> (mask ^ 56))
(sq >> (1 ^ sq))
(sq >> (1 ^ occ))
(sq >> (1 ^ mask))
(sq >> (1 ^ 1))
(sq >> (1 ^ 2))
(sq >> (1 ^ 3))
(sq >> (1 ^ 56))
(sq >> (2 ^ sq))
(sq >> (2 ^ occ))
(sq >> (2 ^ mask))
(sq >> (2 ^ 1))
(sq >> (2 ^ 2))
(sq >> (2 ^ 3))
(sq >> (2 ^ 56))
(sq >> (3 ^ sq))
(sq >> (3 ^ occ))
(sq >> (3 ^ mask))
(sq >> (3 ^ 1))
(sq >> (3 ^ 2))
(sq >> (3 ^ 3))
(sq >> (3 ^ 56))
(sq >> (56 ^ sq))
(sq >> (56 ^ occ))
(sq >> (56 ^ mask))
(sq >> (56 ^ 1))
(sq >> (56 ^ 2))
(sq >> (56 ^ 3))
(sq >> (56 ^ 56))
(occ >> (sq ^ sq))
(occ >> (sq ^ occ))
(occ >> (sq ^ mask))
Code: Select all
sq
occ
mask
1
2
3
56
(sq | sq)
(sq | occ)
(sq | mask)
(sq | 1)
(sq | 2)
(sq | 3)
(sq | 56)
(occ | sq)
(occ | occ)
(occ | mask)
(occ | 1)
(occ | 2)
(occ | 3)
(occ | 56)
(mask | sq)
(mask | occ)
(mask | mask)
(mask | 1)
(mask | 2)
(mask | 3)
(mask | 56)
(1 | sq)
(1 | occ)
(1 | mask)
(1 | 1)
(1 | 2)
(1 | 3)
(1 | 56)
(2 | sq)
(2 | occ)
(2 | mask)
(2 | 1)
(2 | 2)
(2 | 3)
(2 | 56)
(3 | sq)
(3 | occ)
(3 | mask)
(3 | 1)
(3 | 2)
(3 | 3)
(3 | 56)
(56 | sq)
(56 | occ)
(56 | mask)
(56 | 1)
(56 | 2)
(56 | 3)
(56 | 56)
(sq ^ sq)
(sq ^ occ)
(sq ^ mask)
(sq ^ 1)
(sq ^ 2)
(sq ^ 3)
(sq ^ 56)
(occ ^ sq)
(occ ^ occ)
(occ ^ mask)
(occ ^ 1)
(occ ^ 2)
(occ ^ 3)
(occ ^ 56)
(mask ^ sq)
(mask ^ occ)
(mask ^ mask)
(mask ^ 1)
(mask ^ 2)
(mask ^ 3)
(mask ^ 56)
(1 ^ sq)
(1 ^ occ)
(1 ^ mask)
(1 ^ 1)
(1 ^ 2)
(1 ^ 3)
(1 ^ 56)
(2 ^ sq)
(2 ^ occ)
(2 ^ mask)
(2 ^ 1)
(2 ^ 2)
(2 ^ 3)
(2 ^ 56)
(3 ^ sq)
(3 ^ occ)
(3 ^ mask)
(3 ^ 1)
(3 ^ 2)
(3 ^ 3)
(3 ^ 56)
(56 ^ sq)
(56 ^ occ)
(56 ^ mask)
(56 ^ 1)
(56 ^ 2)
(56 ^ 3)
(56 ^ 56)
(sq << sq)
(sq << occ)
(sq << mask)
(sq << 1)
(sq << 2)
(sq << 3)
(sq << 56)
(occ << sq)
(occ << occ)
(occ << mask)
(occ << 1)
(occ << 2)
(occ << 3)
(occ << 56)
(mask << sq)
(mask << occ)
(mask << mask)
(mask << 1)
(mask << 2)
(mask << 3)
(mask << 56)
(1 << sq)
(1 << occ)
(1 << mask)
(1 << 1)
(1 << 2)
(1 << 3)
(1 << 56)
(2 << sq)
(2 << occ)
(2 << mask)
(2 << 1)
(2 << 2)
(2 << 3)
(2 << 56)
(3 << sq)
(3 << occ)
(3 << mask)
(3 << 1)
(3 << 2)
(3 << 3)
(3 << 56)
(56 << sq)
(56 << occ)
(56 << mask)
(56 << 1)
(56 << 2)
(56 << 3)
(56 << 56)
(sq >> sq)
(sq >> occ)
(sq >> mask)
(sq >> 1)
(sq >> 2)
(sq >> 3)
(sq >> 56)
(occ >> sq)
(occ >> occ)
(occ >> mask)
(occ >> 1)
(occ >> 2)
(occ >> 3)
(occ >> 56)
(mask >> sq)
(mask >> occ)
(mask >> mask)
(mask >> 1)
(mask >> 2)
(mask >> 3)
(mask >> 56)
(1 >> sq)
(1 >> occ)
(1 >> mask)
(1 >> 1)
(1 >> 2)
(1 >> 3)
(1 >> 56)
(2 >> sq)
(2 >> occ)
(2 >> mask)
(2 >> 1)
(2 >> 2)
(2 >> 3)
(2 >> 56)
(3 >> sq)
(3 >> occ)
(3 >> mask)
(3 >> 1)
(3 >> 2)
(3 >> 3)
(3 >> 56)
(56 >> sq)
(56 >> occ)
(56 >> mask)
(56 >> 1)
(56 >> 2)
(56 >> 3)
(56 >> 56)
(sq - sq)
(sq - occ)
(sq - mask)
(sq - 1)
(sq - 2)
(sq - 3)
(sq - 56)
(occ - sq)
(occ - occ)
(occ - mask)
(occ - 1)
(occ - 2)
(occ - 3)
(occ - 56)
(mask - sq)
(mask - occ)
(mask - mask)
(mask - 1)
(mask - 2)
(mask - 3)
(mask - 56)
(1 - sq)
(1 - occ)
(1 - mask)
(1 - 1)
(1 - 2)
(1 - 3)
(1 - 56)
(2 - sq)
(2 - occ)
(2 - mask)
(2 - 1)
(2 - 2)
(2 - 3)
(2 - 56)
(3 - sq)
(3 - occ)
(3 - mask)
(3 - 1)
(3 - 2)
(3 - 3)
(3 - 56)
(56 - sq)
(56 - occ)
(56 - mask)
(56 - 1)
(56 - 2)
(56 - 3)
(56 - 56)
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
Getting closer and closer... Memory issues fixed - but the number of instructions is too low to find a horizontal solution (yet)
Also optimisation could help a lot.
So I will add Logical AND, popcount, countLZero, countRZero as well as bswap
Also optimisation could help a lot.
So I will add Logical AND, popcount, countLZero, countRZero as well as bswap
Code: Select all
Best: 1015744/1048576: (1 << (sq ^ (1 | 1)))
Best: 1015744/1048576: (1 << (1 ^ (sq | sq)))
Best: 1015744/1048576: (2 << (2 ^ (sq | 2)))
Best: 1015744/1048576: (2 << (2 ^ (2 | sq)))
Best: 1015744/1048576: (1 << (sq ^ (2 ^ 3)))
Best: 1015744/1048576: (1 << (sq ^ (3 ^ 2)))
Best: 1015744/1048576: (1 << (2 ^ (sq ^ 3)))
Best: 1015744/1048576: (1 << (2 ^ (3 ^ sq)))
Best: 1015744/1048576: (1 << (3 ^ (sq ^ 2)))
Best: 1015744/1048576: (1 << (3 ^ (2 ^ sq)))
Best: 1015744/1048576: (1 << (sq ^ (2 ^ 3)))
Best: 1015744/1048576: (1 << (sq ^ (3 ^ 2)))
Best: 1015744/1048576: (1 << (2 ^ (sq ^ 3)))
Best: 1015744/1048576: (1 << (2 ^ (3 ^ sq)))
Best: 1015744/1048576: (1 << (3 ^ (sq ^ 2)))
Best: 1015744/1048576: (1 << (3 ^ (2 ^ sq)))
Best: 1023968/1048576: (occ ^ (occ - (2 << sq)))
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 325
- Joined: Tue Aug 31, 2021 10:32 pm
- Full name: tcusr
Re: Comparison of all known Sliding lookup algorithms
where is the code for this?
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
Small Update:
Optimally the number of trees is the adapted catalan number n = (n + 1) / 2 since we can only have binary trees that are full (every math operator needs a left and right side). (https://oeis.org/A000108) so the number of full binary trees multiplied with M^N where N is the number of unary instructions/parameters slots and M is the number of instructions themselves.
Each vertice in the tree can have a symbol like a
parameter (sq, mask, occupation)
unary instructions (popcount, countlzero, countrzero, bitreverse)
constant (1, 2, 3, 56)
Each branch in the tree can be a function like
<<, >>, |, &, ^, -
And since we want to allow repetition you can see how that will scale with
~11^n * NumberOfFullBinaryTrees(m)
currently I have fixed n + m so that smaller and bigger trees can exist - but the number of tokens always stays constant.
So the scaling to search every tree is:
O(n) ~ M^n * n!
which is possible if we keep M small
Of course binary instructions like these could be pruned / skipped:
1 | 1
1 ^ 1
1 - 1
So searching for the optimal sliding piece algorithm really has similarities with chess itself. We are building a search tree and pruning nodes that we know have no value. But in the space of mathematical code tokens (functions, parameters, constants), and not in the space of chess moves.
Optimally the number of trees is the adapted catalan number n = (n + 1) / 2 since we can only have binary trees that are full (every math operator needs a left and right side). (https://oeis.org/A000108) so the number of full binary trees multiplied with M^N where N is the number of unary instructions/parameters slots and M is the number of instructions themselves.
Each vertice in the tree can have a symbol like a
parameter (sq, mask, occupation)
unary instructions (popcount, countlzero, countrzero, bitreverse)
constant (1, 2, 3, 56)
Each branch in the tree can be a function like
<<, >>, |, &, ^, -
And since we want to allow repetition you can see how that will scale with
~11^n * NumberOfFullBinaryTrees(m)
currently I have fixed n + m so that smaller and bigger trees can exist - but the number of tokens always stays constant.
So the scaling to search every tree is:
O(n) ~ M^n * n!
which is possible if we keep M small

Of course binary instructions like these could be pruned / skipped:
1 | 1
1 ^ 1
1 - 1
So searching for the optimal sliding piece algorithm really has similarities with chess itself. We are building a search tree and pruning nodes that we know have no value. But in the space of mathematical code tokens (functions, parameters, constants), and not in the space of chess moves.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
New Algorithm - Genetic8Ray!
This one was created by sifting every possible abstract syntax tree that is constructible with 16 tokens or less.
I described it in this thread: http://www.talkchess.com/forum3/viewtop ... =7&t=79701
Check it out if you are interested: https://github.com/Gigantua/Chess_Movegen.git
Now I want to show here in which order the trees are constructed:
As you can see its an exhaustive list of every possibility of legal code starting from just the constants and building up with all unary and binary functions etc. Repetition is still an issue but detecting repetition makes the cost much higher so I live with that the code is only ~30% efficient but nevertheless it can create and execute 16^11 = 1.7592186e+13 trees overnight. This number is non trivial and there is no way to optimize execution since each code is only executed once. (Detecting the subtree 1|1 is more expensive than executing it)
The cool thing is that this really works - and that out of maybe 16^9 trees sometimes only one solves a sliding problem.
The 16 arguments or tokens are totally free to pick and you could have 15 constants and one function etc.
Constants that are not supplied will be generated by the code! So if the algorithm needs a 3 for a solution then it will find the term (2 + 1).
This is a true and beautiful brute force abstract syntax tree searcher that can solve many chess problems optimally.
I think I might share the code via PM to people that are interested.
Feel free to contact me under daniel.infuehr@live.de - I am always interested about chessprogramming discussions.
This one was created by sifting every possible abstract syntax tree that is constructible with 16 tokens or less.
I described it in this thread: http://www.talkchess.com/forum3/viewtop ... =7&t=79701
Check it out if you are interested: https://github.com/Gigantua/Chess_Movegen.git
Now I want to show here in which order the trees are constructed:
Code: Select all
1ull
2ull
56ull
sq
occ
mask
mask
popcount(1ull)
countl_zero(1ull)
countr_zero(1ull)
bit_reverse(1ull)
(1ull | 1ull)
(1ull &~ 1ull)
(1ull ^ 1ull)
(1ull - 1ull)
(1ull << 1ull)
(1ull >> 1ull)
popcount(2ull)
countl_zero(2ull)
countr_zero(2ull)
bit_reverse(2ull)
(2ull | 1ull)
(2ull &~ 1ull)
(2ull ^ 1ull)
(2ull - 1ull)
(2ull << 1ull)
(2ull >> 1ull)
popcount(56ull)
countl_zero(56ull)
countr_zero(56ull)
bit_reverse(56ull)
(56ull | 1ull)
(56ull &~ 1ull)
(56ull ^ 1ull)
(56ull - 1ull)
(56ull << 1ull)
(56ull >> 1ull)
popcount(sq)
.........
popcount(((56ull << 1ull) - 1ull))
countl_zero(((56ull << 1ull) - 1ull))
countr_zero(((56ull << 1ull) - 1ull))
bit_reverse(((56ull << 1ull) - 1ull))
(((56ull << 1ull) - 1ull) | 1ull)
(((56ull << 1ull) - 1ull) &~ 1ull)
The cool thing is that this really works - and that out of maybe 16^9 trees sometimes only one solves a sliding problem.
The 16 arguments or tokens are totally free to pick and you could have 15 constants and one function etc.
Constants that are not supplied will be generated by the code! So if the algorithm needs a 3 for a solution then it will find the term (2 + 1).
This is a true and beautiful brute force abstract syntax tree searcher that can solve many chess problems optimally.
I think I might share the code via PM to people that are interested.
Feel free to contact me under daniel.infuehr@live.de - I am always interested about chessprogramming discussions.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
New Algorithm - Genetic Obstruction Difference!
With the power of abstract syntax tree sifting - a new algorithm was uncovered! - it is a further improvement of Obstruction Difference.
The file is hosted here among the other algorithms - https://github.com/Gigantua/Chess_Moveg ... onDiff.hpp
Current CLANG-14 performance - single thread
Zooming into the relevant improved algorithm:
These performance numbers may not look too much - but keep in mind that at the beginning of 2021 only the first original Obstruction Difference algorithm existed. The Inline - and now the Genetic improvements are new developments.
Also Obstruction difference is one of the 3 algorithms that are perfect for GPU devices (Bitrotation, QBB, ObstructionDiff). These are algorithms that go beyond 60 Billion lookups per second. I think this will stay the smallest algorithm in terms of number of operations starting from the masks.
It will be interesting to see how much this new code improves performance on the GPU. The interesting part is that the 3 fast algorithms are very different to each other - and yet perform the same task.
Next Steps:
I can see a huge improvement for all of chessprogramming when the currently common form of 0..64 for the source square is replaced by 0...(1ull << 64). I can see that lookup free algorithms mostly dont use the square at all anymore but use the single set bit for square. But creating all 4 or 8 ray masks from a single set bit is yet to be solved.
Happy Easter - 2022
With the power of abstract syntax tree sifting - a new algorithm was uncovered! - it is a further improvement of Obstruction Difference.
The file is hosted here among the other algorithms - https://github.com/Gigantua/Chess_Moveg ... onDiff.hpp
Current CLANG-14 performance - single thread
Code: Select all
Million Lookups/s Random Squares, Random Occupation/s:
Name Performance [MQueens/s] Tablesize Dependencies Template Author Reference
SBAMG o^(o-3cbn) 290.996566 576 [4kb] countl_zero, bswap yes Syed Fahad http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline 421.167899 0 [0kb] countl_zero, bswap yes Syed Fahad and Daniel Inf�hr http://www.talkchess.com/forum3/viewtopic.php?t=59845
Hyperbola Quintessence o^(o-2r) 594.571562 256 [2kb] bswap no Ryan Mack https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline 576.150260 0 [0kb] bswap yes Ryan Mack https://www.chessprogramming.org/Hyperbola_Quintessence
Genetic 8 Ray 160.177263 0 [0kb] bswap no Daniel Inf�hr (dangi12012) Abstract C++ Syntax Tree Sifter (c) Daniel Infuehr
Bitrotation 153.838067 0 [0kb] ReverseBits no Daniel Inf�hr (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078&start=20
Binary Neural Network 37.265090 5852 [45kb] pdep_u64, AVX2 no Daniel Inf�hr (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79332
Exploding Bitboards 76.940142 768 [6kb] imul64 no Harald L��en http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup) 88.202607 0 [0kb] none yes Daniel Inf�hr (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift 280.341737 0 [0kb] AVX2 no Daniel Inf�hr (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79005&start=60
Pext Emulated 68.503395 107904 [843kb] none no Zach Wegner https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill 300.334122 0 [0kb] none no Gunnar Andersson https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone 435.573269 0 [0kb] none no Peter M. Kogge, Harold S. Stone https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards 46.902647 1848 [14kb] none no Robert Hyatt https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine 350.457054 0 [0kb] countr_zero, countl_zero yes Fabio Gobbato https://www.chessprogramming.org/QBBEngine
Classical Bob-Mike 322.506087 1024 [8kb] countr_zero, countl_zero yes Robert Hyatt and Michael Sherwin https://www.chessprogramming.org/Classical_Approach
Leorik 334.393174 128 [1kb] countl_zero no Thomas Jahn (lithander) https://github.com/lithander/MinimalChessEngine
Leorik Inline 352.132308 0 [0kb] countl_zero no Thomas Jahn (lithander) https://github.com/lithander/MinimalChessEngine
Obstruction Difference 315.880913 768 [6kb] countl_zero no Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline 330.828228 0 [0kb] countl_zero yes Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?t=29087
Genetic Obstruction Difference 361.254275 0 [0kb] countl_zero yes Daniel Inf�hr and Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Slide Arithmetic 259.544758 256 [2kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inf�hr http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline 149.784871 0 [0kb] bzhi_u64, blsmsk_u64 no Jakob Progsch and Daniel Inf�hr http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Kindergarten 600.249103 16640 [130kb] imul64 no Urban Koistinen https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards 263.630811 180416 [1409kb] none no Michael Sherwin http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Fancy Magic BB - Variable shift 542.456242 93376 [729kb] imul64 yes Pradu Kannan https://www.chessprogramming.org/Magic_Bitboards#Fancy
Plain Magic BB 572.631097 295168 [2306kb] imul64 no Lasse Hansen https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift 647.353941 88891 [694kb] imul64 no Onno Garms and Volker Annuss https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr 778.386793 107904 [843kb] pext_u64 yes Zach Wegner https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube 50.815956 107680 [841kb] none yes Daniel Inf�hr (dangi12012) http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
Code: Select all
Obstruction Difference 315.880913 768 [6kb] countl_zero no Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline 330.828228 0 [0kb] countl_zero yes Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?t=29087
Genetic Obstruction Difference 361.254275 0 [0kb] countl_zero yes Daniel Inführ and Michael Hoffmann http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79701
Also Obstruction difference is one of the 3 algorithms that are perfect for GPU devices (Bitrotation, QBB, ObstructionDiff). These are algorithms that go beyond 60 Billion lookups per second. I think this will stay the smallest algorithm in terms of number of operations starting from the masks.
It will be interesting to see how much this new code improves performance on the GPU. The interesting part is that the 3 fast algorithms are very different to each other - and yet perform the same task.
Next Steps:
I can see a huge improvement for all of chessprogramming when the currently common form of 0..64 for the source square is replaced by 0...(1ull << 64). I can see that lookup free algorithms mostly dont use the square at all anymore but use the single set bit for square. But creating all 4 or 8 ray masks from a single set bit is yet to be solved.
Happy Easter - 2022
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 219
- Joined: Fri Apr 11, 2014 10:45 am
- Full name: Fabio Gobbato
Re: Comparison of all known Sliding lookup algorithms
A small optimization over the QBB algorithm (I have removed the ^1ULL<<sq in the final mask):
Code: Select all
/* return the bitboard with the rook destinations */
static constexpr uint64_t Rook(uint64_t sq, uint64_t occupation)
{
occupation ^= 1ULL << sq; /* remove the selected piece from the occupation */
uint64_t piecesup = (0x0101010101010101ULL << sq) & (occupation | 0xFF00000000000000ULL); /* find the pieces up */
uint64_t piecesdo = (0x8080808080808080ULL >> (63 ^ sq)) & (occupation | 0x00000000000000FFULL); /* find the pieces down */
uint64_t piecesri = (0x00000000000000FFULL << sq) & (occupation | 0x8080808080808080ULL); /* find pieces on the right */
uint64_t piecesle = (0xFF00000000000000ULL >> (63 ^ sq)) & (occupation | 0x0101010101010101ULL); /* find pieces on the left */
return (((0x8080808080808080ULL >> LSB(piecesup)) & (0x0101010101010101ULL << MSB(piecesdo))) ^
((0xFF00000000000000ULL >> LSB(piecesri)) & (0x00000000000000FFULL << MSB(piecesle))));
/* From every direction find the first piece and from that piece put a mask in the opposite direction.
Put togheter all the 4 masks and remove the moving piece */
}
/* return the bitboard with the bishops destinations */
static constexpr uint64_t Bishop(uint64_t sq, uint64_t occupation)
{ /* it's the same as the rook */
occupation ^= 1ULL << sq;
uint64_t piecesup = (0x8040201008040201ULL << sq) & (occupation | 0xFF80808080808080ULL);
uint64_t piecesdo = (0x8040201008040201ULL >> (63 ^ sq)) & (occupation | 0x01010101010101FFULL);
uint64_t piecesle = (0x8102040810204081ULL << sq) & (occupation | 0xFF01010101010101ULL);
uint64_t piecesri = (0x8102040810204081ULL >> (63 ^ sq)) & (occupation | 0x80808080808080FFULL);
return (((0x8040201008040201ULL >> LSB(piecesup)) & (0x8040201008040201ULL << MSB(piecesdo))) ^
((0x8102040810204081ULL >> LSB(piecesle)) & (0x8102040810204081ULL << MSB(piecesri))));
}
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
Interesting - will take a look!Fabio Gobbato wrote: ↑Tue Apr 19, 2022 5:21 pm A small optimization over the QBB algorithm (I have removed the ^1ULL<<sq in the final mask):
Sliding piece update:
The new raymasks from this Thread - http://www.talkchess.com/forum3/viewtop ... 0&start=40
have found themselves in this file:
https://github.com/Gigantua/Chess_Moveg ... on.hpp#L45
And improved performance on MSVC by 10%. This code is much longer and more verbose but can increase performance and is the first performant bitboard native algorithm in the comparison codebase. (where square is not in 0..64 form - but in 0..(1ull << 64) form)
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Comparison of all known Sliding lookup algorithms
Fabio you will like this one. I found alternate forms for the QBB algorithm that shave off a few instructions.Fabio Gobbato wrote: ↑Tue Apr 19, 2022 5:21 pm A small optimization over the QBB algorithm (I have removed the ^1ULL<<sq in the final mask):
I will preview it now for the rook - but I will take a week break from talkchess.
Chessprogramming is very addictive to me and is starting to interfere with my work.
Anyways here is a sneak peek of what will be hopefully a slight improvement:
QBB Genetic Rook Preview:
Code: Select all
uint64_t lsb_up = LSB((occ | 0xFF00000000000000ULL) & (0x0101010101010101ULL << sq));
uint64_t msb_do = MSB((occ | 0x00000000000000FFULL) & (0x8080808080808080ULL >> sr));
uint64_t lsb_ri = LSB((occ | 0x8080808080808080ULL) & (0x00000000000000FFULL << sq));
uint64_t msb_le = MSB((occ | 0x0101010101010101ULL) & (0xFF00000000000000ULL >> sr));
uint64_t ab = (((0x0101010101010101ULL << msb_do) << lsb_up) >> lsb_up);
uint64_t cd = (((0x00000000000000FFULL << msb_le) << lsb_ri) >> lsb_ri);
return ab ^ cd; //trick from Fabios post - sine both lines overlap this is perfect
Anyways - see you in a week!
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer