dangi12012 wrote: ↑Thu Feb 10, 2022 10:15 pm
Update!
New Algo!
AVX Branchless Shift
I was getting tired of the lack of AVX2 so I sat down and mixed together Kogge Stone - Dumb7 Fill and NO HEADACHES algo to create this branchless version of what normally is a conditional loop:
You can see its dumb7 fill like but does not infer 4 sliders at once. It infers 1 slider at a time - but in 4 directions at once. 

Performance is great because it beats QBBEngine by a small amount! 
 Update2: 
Neural Network speed improved by 3x.
Update2: 
Neural Network speed improved by 3x.
This was made possible by training against the 16 bits of pext and not against the occupancy itself. 
Training code is here If you run it - it should produce most C++ code needed: 
https://github.com/Gigantua/Chess_BinaryNeuralNetwork
This is the performance for truly sampling random positions:
Code: Select all
Million Lookups/s Random Squares, Random Occupation/s:
Name                               Performance [MQueens/s]       Tablesize           Dependencies             Template  Author                                       Reference
Binary Neural Network              52.163979                     5852    [45kb]      pdep_u64, AVX2           no        Daniel Inführ (dangi12012)                   Not released yet
Exploding Bitboards                64.496087                     768     [6kb]       imul64                   no        Harald Lüßen                                  http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4523&start=80
Reference (Switch Lookup)          35.547266                     0       [0kb]       none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78235&p=907362&hilit=espresso#p907362
AVX Branchless Shift               185.208336                    0       [0kb]       none                     no        Daniel Inführ (dangi12012)
Pext Emulated                      66.814698                     107904  [843kb]     none                     no        Zach Wegner                                  https://randombit.net/bitbashing/posts/haswell_bit_permutations.html
Dumb7 Fill                         68.671650                     0       [0kb]       none                     no        Gunnar Andersson                             https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone                        111.041916                    0       [0kb]       none                     no        Peter M. Kogge, Harold S. Stone              https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards                  46.262581                     1848    [14kb]      none                     no        Robert Hyatt                                 https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine                          176.931912                    0       [0kb]       countr_zero, countl_zero yes       Fabio Gobbato                                https://www.chessprogramming.org/QBBEngine
Classical Bob-Mike                 196.873001                    1024    [8kb]       countr_zero, countl_zero yes       Robert Hyatt and Michael Sherwin             https://www.chessprogramming.org/Classical_Approach
Leorik                             216.790810                    128     [1kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Leorik Inline                      103.871642                    0       [0kb]       countl_zero              no        Thomas Jahn (lithander)                      https://github.com/lithander/MinimalChessEngine
Obstruction Difference             241.152710                    768     [6kb]       countl_zero              no        Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline      90.061827                     0       [0kb]       countl_zero              yes       Michael Hoffmann                             http://www.talkchess.com/forum3/viewtopic.php?t=29087
Slide Arithmetic                   242.023315                    256     [2kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
Slide Arithmetic Inline            105.799674                    0       [0kb]       bzhi_u64, blsmsk_u64     no        Jakob Progsch and Daniel Inführ              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=78693&p=914767&hilit=SlideArithm#p914767
SBAMG o^(o-3cbn)                   235.724430                    576     [4kb]       countl_zero, bswap       yes       Syed Fahad                                   http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline                       93.100479                     0       [0kb]       countl_zero, bswap       yes       Syed Fahad and Daniel Inführ                 http://www.talkchess.com/forum3/viewtopic.php?t=59845
Hyperbola Quintessence o^(o-2r)    293.503304                    256     [2kb]       bswap                    no        Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline      93.087840                     0       [0kb]       bswap                    yes       Ryan Mack                                    https://www.chessprogramming.org/Hyperbola_Quintessence
Kindergarten                       446.025357                    16640   [130kb]     imul64                   no        Urban Koistinen                              https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards                    201.621711                    180416  [1409kb]    none                     no        Michael Sherwin                              http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73083
Fancy Magic BB - Variable shift    381.589575                    93376   [729kb]     imul64                   yes       Pradu Kannan                                 https://www.chessprogramming.org/Magic_Bitboards#Fancy
Plain Magic BB                     478.667390                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       522.343232                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss                 https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr                     742.179286                    107904  [843kb]     pext_u64                 yes       Zach Wegner                                  https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube                          69.705720                     107680  [841kb]     none                     yes       Daniel Inführ (dangi12012)                   http://www.talkchess.com/forum3/viewtopic.php?f=7&t=79004&p=916723&hilit=hypercube#p916723
 
Well, in my opinion, the results over the chat look very inconsistent. On the one hand in the ranking of the algorithms, on the other hand the concrete numbers that are presented. Even on my computer, the ranking per measurement is not stable.
Sorry, if I ask a stupid question. But what is random sampling supposed to do here? From my point of view essential components for a comparison of the algorithms are missing. (assuming that I have interpreted the term correctly at this point). 
Basically, you have a board state and an action that leads to a subsequent state. One does not jump through randomly possible states. In the performance context this can even mean that you never get to the next relevant state.
So, it is absolutely necessary to include an action like execute move in the performance comparison. With a uniform implementation, the algorithms remain comparable and the effort for the action is constant and does not affect the ranking.
A significant difference is that the performance of some algorithms changes relatively when the board states are connected by an action.
For example, if the target squares for a sliding piece do not change,
algorithms that do the computation on the fly, can access a cache and use the last computation. This benefits these algorithms more than algorithms such as Magics that only compute an index and perform a memory lookup.This means in practice and in ranking, the results shift in favor of direct computations. It also means that the optimization of the algorithms is far from being as complete as you presented it.
I actually find it interesting to see the comparison of the algorithms.
But so in the big picture of this thread, at least for me personally, the fundamentals for direct comparison are pretty skewed at this point.