Comparison of all known Sliding lookup algorithms

Discussion of chess software programming and technical issues.

Moderator: Ras

dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

hgm wrote: Tue Jan 18, 2022 3:39 pm Still no incrementally updated slider tables orth[64] and diag[64]?
lithander wrote: Tue Jan 18, 2022 4:03 pm
hgm wrote: Tue Jan 18, 2022 3:39 pm Still no incrementally updated slider tables orth[64] and diag[64]?
I don't see how that's going to work in the testing framework Daniel has set up. If an algorithm can't be used in such a way where it accepts a square and an occupation-bitboard as parameters and returns the correct queen targets as a bitboard he's basically out.
Lithander: We have incremental "emulated games" so the comparison code is there already..
Actual incremental algos implemented are: Hypercube, Rotated Bitboards

hgm:
Do you have a url to an existing C or C++ repo?

Both of you:
There are some algorithms that could have huge benefits by implementing a caching adapter (caching attack sets is something that Michael Hoffman pointer me to). With a uniform interface this could be done with a template parameter. Like: auto Lookup = CachingAdapter<PEXT>();
So there is a lot of work to be done. What I am focused on is to keep one interface accross all algorithms and yes incremental algos are supported.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

@HGM

When reading the CPW: https://www.chessprogramming.org/Attack_and_Defend_Maps
it seems to be an acceleration structure like a reduced caching idea (which caches internal states of some algorithms) above which is applicable to any and all algorithms.
So that would not test the bedrock of the application but would be emulated game perf.

This would not change any rankings among the algorithsm since the number of calls to the underlying algorithm would just be reduced - so need to test more to get the same relative result... But nothing would change in the rankings.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
hgm
Posts: 28331
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Comparison of all known Sliding lookup algorithms

Post by hgm »

lithander wrote: Tue Jan 18, 2022 4:03 pm I don't see how that's going to work in the testing framework Daniel has set up. If an algorithm can't be used in such a way where it accepts a square and an occupation-bitboard as parameters and returns the correct queen targets as a bitboard he's basically out.
I understood that the standard algorithms don't return queen targets but rook or bishop targets. Hence orth[] and diag[]. For the Queen targets you just OR those together.

And are the attack getters ever called for board occupancies other than the current position? If not, you can just ignore the occupied parameter, and return orth[sq] or diag[sqr] directly.
User avatar
hgm
Posts: 28331
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Comparison of all known Sliding lookup algorithms

Post by hgm »

dangi12012 wrote: Tue Jan 18, 2022 5:31 pm @HGM

When reading the CPW: https://www.chessprogramming.org/Attack_and_Defend_Maps
it seems to be an acceleration structure like a reduced caching idea (which caches internal states of some algorithms) above which is applicable to any and all algorithms.
So that would not test the bedrock of the application but would be emulated game perf.

This would not change any rankings among the algorithsm since the number of calls to the underlying algorithm would just be reduced - so need to test more to get the same relative result... But nothing would change in the rankings.
I am not sure whether you CPW link describes the same method; the updating mechanism I was referring to would not use an 'underlying algorithm', but would update the table by extending the attack sets from the parent position that included the evacuated square with the discovered ray, which was also in the table (e.g. orth[neighbor] = (orth[neighbor] | orth[fromSqr]) & orthMask[neighbor]; )

I have no link to a repo; I posted the code here last time we discussed this problem.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

hgm wrote: Tue Jan 18, 2022 6:53 pm I have no link to a repo; I posted the code here last time we discussed this problem.
Well if its a working algorithm it should probably get its own entry in CPW then. You are referring to this:
http://www.talkchess.com/forum3/viewtop ... 5D#p911070

If this is implemented in your mailbox trials then do you have a URL where this can be downloaded?

But my question remains: Is this a lookup algorithm or can this work ontop of any sliding lookup?
If its a 4x uint64_t[64] cache that incrementally skips calls to the underlying sliding lookup algo then this does not belong in the comparison. (since it would not change the relative ranking)
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

UPDATE: Leorik Lookup and Template Optimisations added

We have two more algos - Leorik and LeorikIL (0kb version) by lithander which are different from other algos in terms of performance.

One thing that was not tested yet is that some board algorithms do know the square in terms of a constant. So I added a template optimisation where the int square is allowed to be moved as a template. This is the 4th comparison mode and helps performance to another +80% improvement in some algos! So if you were to implement something on your own this could be a good reference to compare against.

So there you have it. Almost all algorithms testable in CLANG, or Microsoft MSVC - Singlethread, Multithread, Emulated Games and Constant Lookup mode for optimal comparison. Makefile included - just clone and run:

Code: Select all

git clone https://github.com/Gigantua/Chess_Movegen.git
Not all algorithms profit from a known square so that corresponds to the last yes/no. Notice how much faster some algos are getting from that:

Template optimisation:

Code: Select all

Exploading:     151.058MOps     6 kB    Optimal perf: imul64 templ:  no
Reference:      155.334MOps     0 kB    Optimal perf: none templ: yes
Pext Emulated:  99.2648MOps     843 kB  Optimal perf: none templ:  no
KoggeStone:     113.804MOps     0 kB    Optimal perf: none templ:  no
RotatedBoard:   69.8078MOps     14 kB   Optimal perf: none templ:  no
QBB Algo:       186.446MOps     0 kB    Optimal perf: countr_zero, countl_zero templ: yes
BobMike:        253.626MOps     8 kB    Optimal perf: countr_zero, countl_zero templ: yes
Leorik:         228.003MOps     1 kB    Optimal perf: countl_zero templ:  no
Leorik Lookup:  202.608MOps     0 kB    Optimal perf: countl_zero templ:  no
Obstr. Diff:    258.002MOps     6 kB    Optimal perf: countl_zero templ:  no
Obstr. Inline:  212.233MOps     0 kB    Optimal perf: countl_zero templ: yes
SlideArithm:    268.605MOps     2 kB    Optimal perf: bzhi_u64, blsmsk_u64 templ:  no
SlideA Inline:  193.436MOps     0 kB    Optimal perf: bzhi_u64, blsmsk_u64 templ:  no
Hyperbola Qsc:  316.284MOps     2 kB    Optimal perf: bswap templ:  no
Hyperb.Inline:  270.871MOps     0 kB    Optimal perf: bswap templ: yes
SISSY BB:       262.717MOps     1409 kB Optimal perf: none templ:  no
Hash Variable:  354.248MOps     729 kB  Optimal perf: imul64 templ: yes
Hash Plain:     1010.3MOps      2306 kB Optimal perf: imul64 templ:  no
Hash Fancy:     1199.83MOps     694 kB  Optimal perf: imul64 templ:  no
Pext  :         1824.13MOps     843 kB  Optimal perf: pext_u64 templ: yes
HyperCube:      329.147MOps     841 kB  Optimal perf: none templ: yes
First commit perf:

Code: Select all

Exploading:     150.89MOps      6 kB    Optimal perf: imul64
Reference:      68.93MOps       8 kB    Optimal perf: none
KoggeStone:     111.98MOps      0 kB    Optimal perf: none
RotatedBoard:   92.37MOps       14 kB   Optimal perf: none
QBB Algo:       171.72MOps      0 kB    Optimal perf: countr_zero, countl_zero
BobMike:        211.32MOps      8 kB    Optimal perf: countr_zero, countl_zero
SlideArithm:    256.04MOps      2 kB    Optimal perf: bzhi_u64, blsmsk_u64
XorRookSub:     297.78MOps      2 kB    Optimal perf: bswap
Hash Variable:  399.36MOps      729 kB  Optimal perf: imul64
Hash Plain:     529.61MOps      2306 kB Optimal perf: imul64
Hash Fancy:     597.36MOps      694 kB  Optimal perf: imul64
Pext  :         925.24MOps      843 kB  Optimal perf: pext_u64
HyperCube:      310.30MOps      841 kB  Optimal perf: none

Can someone recommend a table printing (single header)? (then I can add author and reference URL to the table itself)
Does somebody have Dumb7Fill sourcecode?

If you really want to go above and beyond:
Here is the fastest Compiler known on earth. Clang + facebooks bolt in a package
https://github.com/facebookincubator/BOLT
So you can build your own CLANG 14 with a post link optimiser and compile the makefile with that. Yields another +15% because of the optimal memory layout for the bigger algos.

With all the optimisations discussed in this thread - you get a huge bonus in performance for almost all algorithms.
The performance here is still a WORST CASE because every lookup sets a volatile uint64_t. During normal operation these things get overlapped and hidden with other calculation (20 years of hyperscalar CPU architectures)

Have a nice day -
Daniel
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
lithander
Posts: 913
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Comparison of all known Sliding lookup algorithms

Post by lithander »

Thanks for including Leorik's implementation! ;)
dangi12012 wrote: Sun Jan 23, 2022 10:46 pm

Code: Select all

Leorik:         228.003MOps     1 kB    Optimal perf: countl_zero templ:  no
Leorik Lookup:  202.608MOps     0 kB    Optimal perf: countl_zero templ:  no
I think the naming is a little odd (probably swapped?) as the version with the 1kB table is doing lookups and the other is completely inlined.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
Harald
Posts: 318
Joined: Thu Mar 09, 2006 1:07 am

Re: Comparison of all known Sliding lookup algorithms

Post by Harald »

While you are checking algorithm names please also check the "Exploading".
The correct name is:
Exploding Bitboards
https://www.chessprogramming.org/Exploding_Bitboards
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Harald wrote: Mon Jan 24, 2022 4:02 pm While you are checking algorithm names please also check the "Exploading".
The correct name is:
Exploding Bitboards
https://www.chessprogramming.org/Exploding_Bitboards
Yes any Ideas of how to print a table properly? - didnt change anything because it messes up the format
Thats the real hard task for me... Not all the algos - but printing names correctly :mrgreen:

Probably printf("%-10s", "xyz");?
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

UPDATE: Author attributions added + Dumb7Fill

The links to the original and first reference for all algos (or link to cpw) exists in the sourcecode - printing is coming soon!
I will have this schema: original Author + Last Author that implemented changes in the algorithm to increase performance by at least 20%
If only one author is written thats the Original Author.
This list is incomplete but I finally found a working Dumb7Fill implementation!

AMD Ryzen 9 5950X 16-Core Processor

Code: Select all

Megalooks Known Positions/s:
Exploding Bitboards           : 151.808MOps     6 kB    Optimal perf: imul64 templ:  no         Harald Lüßen
Reference (Switch Lookup)     : 154.608MOps     0 kB    Optimal perf: none templ: yes           Daniel Inführ (dangi12012)
Pext Emulated                 : 97.8782MOps     843 kB  Optimal perf: none templ:  no           Zach Wegner
Dumb7 Fill                    : 109.603MOps     0 kB    Optimal perf: none templ:  no
Kogge-Stone                   : 114.537MOps     0 kB    Optimal perf: none templ:  no
Rotated Bitboards             : 70.1799MOps     14 kB   Optimal perf: none templ:  no           Robert Hyatt
QBBEngine                     : 187.667MOps     0 kB    Optimal perf: countr_zero, countl_zero templ: yes               Fabio Gobbato
Classical Bob-Mike            : 241.987MOps     8 kB    Optimal perf: countr_zero, countl_zero templ: yes               Robert Hyatt and Michael Sherwin
Leorik                        : 228.403MOps     1 kB    Optimal perf: countl_zero templ:  no            Thomas Jahn (lithander)
Leorik Inline                 : 206.439MOps     0 kB    Optimal perf: countl_zero templ:  no            Thomas Jahn (lithander)
Obstruction Difference        : 259.716MOps     6 kB    Optimal perf: countl_zero templ:  no            Michael Hoffmann
Obstruction Difference Inline : 211.447MOps     0 kB    Optimal perf: countl_zero templ: yes            Michael Hoffmann
Slide Arithmetic              : 267.642MOps     2 kB    Optimal perf: bzhi_u64, blsmsk_u64 templ:  no           Jakob Progsch and Daniel Inführ
Slide Arithmetic Inline       : 193.929MOps     0 kB    Optimal perf: bzhi_u64, blsmsk_u64 templ:  no           Jakob Progsch and Daniel Inführ
Hyperbola Quintessence        : 318.205MOps     2 kB    Optimal perf: bswap templ:  no          Ryan Mack
Hyperbola Quintessence Inline : 277.458MOps     0 kB    Optimal perf: bswap templ: yes          Ryan Mack
SISSY Bitboards               : 251.099MOps     1409 kB Optimal perf: none templ:  no           Michael Sherwin
Hash Variable                 : 362.119MOps     729 kB  Optimal perf: imul64 templ: yes
Hash Plain                    : 972.191MOps     2306 kB Optimal perf: imul64 templ:  no
Hash Fancy                    : 1237.89MOps     694 kB  Optimal perf: imul64 templ:  no
Pext constexpr                : 1784.7MOps      843 kB  Optimal perf: pext_u64 templ: yes               Zach Wegner
HyperCube                     : 328.45MOps      841 kB  Optimal perf: none templ: yes           Daniel Inführ (dangi12012)
I can say that Slide Arithmetic by Jakob Progsch - https://talkchess.com/forum3/viewtopic. ... 33#p890486 is missing an entry in CPW - since he really invented a new algo there (by combining existing ones + bringing new ideas)

Having a template for the position improves speed by on average 50%. Having a template is not possible everytime - but I see templates very underutilized in chessprogramming. Proof: Look how close Hyperbola Quintessence got without the table. This was a 1:2 ratio before!

That means that 3 new algos were invented in the last 12 months: Sissy bitboards, hypercube lookup, Slide Arithmetic
I will probably lose my day job because of my obsession for chessprogramming. But everything has a price I guess.
Greetings - Daniel
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer