Comparison of all known Sliding lookup algorithms

Discussion of chess software programming and technical issues.

Moderator: Ras

dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Ok it seems from previous posts the summary for C# would be:
Unsafe FancyMagic seems to be the fastest for C# in any case on x86 (ARM is untested)

Pext Unsafe is ridiculously faster on current Intel 12th gen processors with performance above even C++ Zen3 implementation.
I suspect there are different Jitter codepaths in the dotNet runtime for Intel and AMD - and one clearly emits more optimal assembly.
More importantly DDR5 together with a faster L2 cache will play the biggest role!

Zen3+/Intel -> Pext
Zen2-/Old Intel -> Fancy Magic

In C# it would be no problem to check the cpuID and set the correct algo at startup.
It will be very interesting to see how Zen4 will behave. :D
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
krunch
Posts: 10
Joined: Sat Sep 18, 2021 9:36 pm
Full name: Tony Schwebs

Re: Comparison of all known Sliding lookup algorithms

Post by krunch »

Optimizing the test loop to remove array bounds checking also gives a nice performance boost.

Code: Select all

Intel Core i5-6600K CPU 3.50GHz (Skylake)
Verify Engines...OK
Algorithm                                Million Queen/s

CSharp Native Code
Switch                                   17.59
ObstructionDiff                          91.39
Leorik                                   95.69
HyperbolaQsc                             124.61
FancyMagic                               147.42
FancyMagic Unsafe                        201.49
Pext                                     151.02
Pext Unsafe                              234.04

CSharp Optimized Test Loop
*Switch                                  17.76
*ObstructionDiff                         110.72
*Leorik                                  118.79
*HyperbolaQsc                            151.33
*FancyMagic                              187.24
*FancyMagicUnsafe                        316.44
*Pext                                    243.20
*PextUnsafe                              391.94

Code: Select all

        public interface ISliderAlgorithm { }
        public struct Pext : ISliderAlgorithm { }
        public struct PextUnsafe : ISliderAlgorithm { }
        public struct Switch : ISliderAlgorithm { }
        public struct ObstructionDiff : ISliderAlgorithm { }
        public struct Leorik : ISliderAlgorithm { }
        public struct HyperbolaQsc : ISliderAlgorithm { }
        public struct FancyMagic : ISliderAlgorithm { }
        public struct FancyMagicUnsafe : ISliderAlgorithm { }

        static void Native_Code2(ulong[] occs, int[] squares)
        {
            Console.WriteLine("\nCSharp Optimized Test Loop");
            TestSlider<Switch>(occs, squares);
            TestSlider<ObstructionDiff>(occs, squares);
            TestSlider<Leorik>(occs, squares);
            TestSlider<HyperbolaQsc>(occs, squares);
            TestSlider<FancyMagic>(occs, squares);
            TestSlider<FancyMagicUnsafe>(occs, squares);
            TestSlider<Pext>(occs, squares);
            TestSlider<PextUnsafe>(occs, squares);
        }

        static void TestSlider<T>(ulong[] occs, int[] squares)
            where T : struct, ISliderAlgorithm
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            ref int squaresBase = ref MemoryMarshal.GetArrayDataReference(squares);
            ulong bulk = 0;
            for (int i = 0; i < occs.Length; i++)
            {
                ulong occ = occs[i]; int offset = 12 * i;
                for (int r = 0; r < 12; r++)
                {
                    bulk ^= Slider<T>.Queen(Unsafe.Add(ref squaresBase, offset + r), occ);
                }
            }
            double result = perf_poscount * 12000.0 / (stopwatch.Elapsed.TotalSeconds * 1000000000.0);
            Console.WriteLine($"{$"*{typeof(T).Name}",-40} {result.ToString("0.00"),-10}");
        }

        public class Slider<T> where T : struct, ISliderAlgorithm
        {
            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            public static ulong Queen(int square, ulong occ)
            {
                if (typeof(T) == typeof(Pext))
                {
                    return Movegen.Implementation.Pext.Queen(square, occ);
                }
                else if (typeof(T) == typeof(PextUnsafe))
                {
                    return Movegen.Implementation.Pext_Unsafe.Queen(square, occ);
                }
                else if (typeof(T) == typeof(FancyMagic))
                {
                    return Movegen.Implementation.FancyMagic.Queen(square, occ);
                }
                else if (typeof(T) == typeof(FancyMagicUnsafe))
                {
                    return Movegen.Implementation.FancyMagic_Unsafe.Queen(square, occ);
                }
                else if (typeof(T) == typeof(Switch))
                {
                    return Movegen.Implementation.Switch.Queen(square, occ);
                }
                else if (typeof(T) == typeof(ObstructionDiff))
                {
                    return Movegen.Implementation.ObstructionDiff.Queen(square, occ);
                }
                else if (typeof(T) == typeof(Leorik))
                {
                    return Movegen.Implementation.Leorik.Queen(square, occ);
                }
                else if (typeof(T) == typeof(HyperbolaQsc))
                {
                    return Movegen.Implementation.HyperbolaQsc.Queen(square, occ);
                }
                else
                {
                    return 0;
                }
            }
        }
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

krunch wrote: Tue Jun 07, 2022 12:23 am Optimizing the test loop to remove array bounds checking also gives a nice performance boost.

Code: Select all

Intel Core i5-6600K CPU 3.50GHz (Skylake)
Verify Engines...OK
Algorithm                                Million Queen/s

CSharp Native Code
Switch                                   17.59
ObstructionDiff                          91.39
Leorik                                   95.69
HyperbolaQsc                             124.61
FancyMagic                               147.42
FancyMagic Unsafe                        201.49
Pext                                     151.02
Pext Unsafe                              234.04

CSharp Optimized Test Loop
*Switch                                  17.76
*ObstructionDiff                         110.72
*Leorik                                  118.79
*HyperbolaQsc                            151.33
*FancyMagic                              187.24
*FancyMagicUnsafe                        316.44
*Pext                                    243.20
*PextUnsafe                              391.94

Code: Select all

        public interface ISliderAlgorithm { }
        public struct Pext : ISliderAlgorithm { }
        public struct PextUnsafe : ISliderAlgorithm { }
        public struct Switch : ISliderAlgorithm { }
        public struct ObstructionDiff : ISliderAlgorithm { }
        public struct Leorik : ISliderAlgorithm { }
        public struct HyperbolaQsc : ISliderAlgorithm { }
        public struct FancyMagic : ISliderAlgorithm { }
        public struct FancyMagicUnsafe : ISliderAlgorithm { }

        static void Native_Code2(ulong[] occs, int[] squares)
        {
            Console.WriteLine("\nCSharp Optimized Test Loop");
            TestSlider<Switch>(occs, squares);
            TestSlider<ObstructionDiff>(occs, squares);
            TestSlider<Leorik>(occs, squares);
            TestSlider<HyperbolaQsc>(occs, squares);
            TestSlider<FancyMagic>(occs, squares);
            TestSlider<FancyMagicUnsafe>(occs, squares);
            TestSlider<Pext>(occs, squares);
            TestSlider<PextUnsafe>(occs, squares);
        }

        static void TestSlider<T>(ulong[] occs, int[] squares)
            where T : struct, ISliderAlgorithm
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            ref int squaresBase = ref MemoryMarshal.GetArrayDataReference(squares);
            ulong bulk = 0;
            for (int i = 0; i < occs.Length; i++)
            {
                ulong occ = occs[i]; int offset = 12 * i;
                for (int r = 0; r < 12; r++)
                {
                    bulk ^= Slider<T>.Queen(Unsafe.Add(ref squaresBase, offset + r), occ);
                }
            }
            double result = perf_poscount * 12000.0 / (stopwatch.Elapsed.TotalSeconds * 1000000000.0);
            Console.WriteLine($"{$"*{typeof(T).Name}",-40} {result.ToString("0.00"),-10}");
        }

        public class Slider<T> where T : struct, ISliderAlgorithm
        {
            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            public static ulong Queen(int square, ulong occ)
            {
                if (typeof(T) == typeof(Pext))
                {
                    return Movegen.Implementation.Pext.Queen(square, occ);
                }
                else if (typeof(T) == typeof(PextUnsafe))
                {
                    return Movegen.Implementation.Pext_Unsafe.Queen(square, occ);
                }
                else if (typeof(T) == typeof(FancyMagic))
                {
                    return Movegen.Implementation.FancyMagic.Queen(square, occ);
                }
                else if (typeof(T) == typeof(FancyMagicUnsafe))
                {
                    return Movegen.Implementation.FancyMagic_Unsafe.Queen(square, occ);
                }
                else if (typeof(T) == typeof(Switch))
                {
                    return Movegen.Implementation.Switch.Queen(square, occ);
                }
                else if (typeof(T) == typeof(ObstructionDiff))
                {
                    return Movegen.Implementation.ObstructionDiff.Queen(square, occ);
                }
                else if (typeof(T) == typeof(Leorik))
                {
                    return Movegen.Implementation.Leorik.Queen(square, occ);
                }
                else if (typeof(T) == typeof(HyperbolaQsc))
                {
                    return Movegen.Implementation.HyperbolaQsc.Queen(square, occ);
                }
                else
                {
                    return 0;
                }
            }
        }
Awesome! :D

With all the performance regressions in C# i was unsure if the jitter does remove branches in generic methods - but you proved it does it without loss.
I think instead of

Code: Select all

Unsafe.Add(ref squaresBase, offset + r)
Just pinning with

Code: Select all

GCHandle.Alloc(array, GCHandleType.Pinned);
and doing normal pointer arithmetic and unrolling the 12 loop should be a even faster.
Low overhead testcode is really important or else you cannot measure algo improvements anymore.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Also if switches are so slow in C# i will publish the results for the same code as the reference algo - but I will replace switch by
a simple Func<ulong, int, ulong>[64] - and lets see how that goes.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

krunch wrote: Tue Jun 07, 2022 12:23 am
Update: Thanks to krunch we now have faster C# testing code.
I quite like the idea to use a class as a template and then switching on it. All ideas I had involved Activator.CreateInstance<T> and then you get a class instance - calling a member on that instance involves a virtual call (which is much slower than calling static members).

I also added Unrolled loops and even faster - direct pointer access:
https://github.com/Gigantua/Chess_Moveg ... 20edaff93b

Oh and the Func jumptable in C#: (array of 64x functions)

Code: Select all

public static Func<ulong, ulong>[] GetSliderHCond = new Func<ulong, ulong>[]
		{
            (occupy) => {
				ulong result = 2ul;
				if ((occupy & 2ul) == 0) result |= 4ul;
				if ((occupy & 6ul) == 0) result |= 8ul;
				if ((occupy & 14ul) == 0) result |= 16ul;
				if ((occupy & 30ul) == 0) result |= 32ul;
				if ((occupy & 62ul) == 0) result |= 64ul;
				if ((occupy & 126ul) == 0) result |= 128ul;
				return result;
            },
			....
		};
is not faster than a switch (on my laptop)

Code: Select all

Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
CSharp Optimized Test Loop
*Switch                                  15,21
*Switch_Jumptable                        14,21
*ObstructionDiff                         135,20
*Leorik                                  133,00
*HyperbolaQsc                            167,80
*FancyMagic                              219,29
*FancyMagicUnsafe                        320,36
*Pext                                    326,31
*PextUnsafe                              413,54
For a chess engine in C# I would have the feeling of being hold back - when normal array accessors make any program much slower comapred to pointer array access.
At least native (unsafe) C# can currently be 80% as fast as C++ for these kinds of algorithms which is still very good.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Comparison of all known Sliding lookup algorithms

Post by lithander »

lithander wrote: Mon Jun 06, 2022 1:19 pm

Code: Select all

AMD Ryzen 5 3600 6-Core Processor
Verify Engines...OK
Algorithm                                Million Queen/s

CSharp Native Code
Switch                                   16,55
ObstructionDiff                          142,80
Leorik                                   152,03
HyperbolaQsc                             200,27
FancyMagic                               212,88
FancyMagic Unsafe                        381,78
Pext                                     33,87
Pext Unsafe                              37,89
I ran the benchmark on my work PC from 2019 which is roughly as powerful as my Ryzen at home but using an Intel CPU.

Code: Select all

Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
Verify Engines...OK
Algorithm                                Million Queen/s

CSharp Native Code
Switch                                   21,23
ObstructionDiff                          132,74
Leorik                                   137,07
HyperbolaQsc                             153,37
FancyMagic                               199,84
FancyMagic Unsafe                        339,31
Pext                                     195,05
Pext Unsafe                              413,50
No suprises here. Pext is performing great and FancyMagic is also doing good. But it's amazing how much faster Mergi's current gen Intel is compared to the 9700K.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

lithander wrote: Tue Jun 07, 2022 5:00 pm I ran the benchmark on my work PC from 2019 which is roughly as powerful as my Ryzen at home but using an Intel CPU.
Can you rerun with latest commit?

Now we have made a lot of progress:
With my machine the numbers developed like this:

01.06.2022

Code: Select all

CSharp Native Code
Switch                                   20,17
ObstructionDiff                          156,67
Leorik                                   171,61
HyperbolaQsc                             228,62
Pext Inlined                             232,25
07.06.2022

Code: Select all

CSharp Optimized Test Loop
*Switch                                  18,21
*Switch_Jumptable                        16,52
*ObstructionDiff                         210,79
*Leorik                                  214,89
*HyperbolaQsc                            341,44
*FancyMagic                              290,75
*FancyMagicUnsafe                        513,10
*Pext                                    506,30
*PextUnsafe                              628,62
With 3-4 iterations and some knowledge about the language any algorithm now performs many times faster. Testing code overhead was reduced and language specific code optimized.
With C# it seems that aggressive inlining is mandatory because for these functions because without it - pext and all others drop down down to around 100.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Comparison of all known Sliding lookup algorithms

Post by lithander »

dangi12012 wrote: Tue Jun 07, 2022 6:02 pm Can you rerun with latest commit?

Code: Select all

Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
Algorithm                                Million Queen/s

CSharp Optimized Test Loop
*Switch                                  17,85
*Switch_Jumptable                        16,46
*ObstructionDiff                         158,60
*Leorik                                  157,86
*HyperbolaQsc                            200,63
*FancyMagic                              267,87
*FancyMagicUnsafe                        383,49
*Pext                                    403,05
*PextUnsafe                              522,22
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
Mergi
Posts: 127
Joined: Sat Aug 21, 2021 9:55 pm
Full name: Jen

Re: Comparison of all known Sliding lookup algorithms

Post by Mergi »

:shock: :shock:

Code: Select all

12th Gen Intel(R) Core(TM) i5-12600
Verify Engines...OK
Algorithm                                Million Queen/s

CSharp Optimized Test Loop
*Switch                                  16.65
*Switch_Jumptable                        14.82
*ObstructionDiff                         245.40
*Leorik                                  259.92
*HyperbolaQsc                            316.91
*FancyMagic                              582.38
*FancyMagicUnsafe                        862.10
*Pext                                    829.21
*PextUnsafe                              1132.33

dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Comparison of all known Sliding lookup algorithms

Post by dangi12012 »

Mergi wrote: Tue Jun 07, 2022 6:16 pm :shock: :shock:
*PextUnsafe 1132.33
Any chance you can share the output of the C++ Comparison?
I can believe its 'only' the 12600 not a 12900k.
Amazing. :mrgreen:
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer