Devlog of Leorik

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
emadsen
Posts: 440
Joined: Thu Apr 26, 2012 1:51 am
Location: Oak Park, IL, USA
Full name: Erik Madsen

Re: Devlog of Leorik - Version 1.0 released!

Post by emadsen »

lithander wrote: Sat Feb 19, 2022 10:13 pm Version 1.0 is released:
https://github.com/lithander/Leorik
Congrats Thomas on the bitboard conversion and the release. I glanced through your code. One suggestion I'll make based on personal experience is to remove all color-specific code now, before it becomes much more difficult (tedious, I should say) later. I wrote similar color-specific code in MadChess, as you have in Leorik, and I really regretted it. It's a source of potential bugs (change the white code, forget or bungle the change for black) and it's redundant code.

I removed the color-specific code in September. It was a tedious process of translating color-specific code to color-agnostic code via array lookups indexed by colorless piece. I ran many gauntlet tournaments as I progressed through the refactoring to determine if I had broken or weakened the engine, and if so, where.

The longer you delay making your code color-agnostic, the more daunting the task becomes.

Otherwise, your engine looks like a good start at bitboards. I'm curious though why you didn't use magic bitboards? Is it just a bit too magic for your liking and you prefer a more direct approach for move generation?
Erik Madsen | My C# chess engine: https://www.madchess.net
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik - Version 1.0 released!

Post by lithander »

amanjpro wrote: Sat Feb 19, 2022 11:53 pm Congrats! I am looking forward to see this kid grow up :)
Thanks! The fast progress with Zahak and what you achieved in the end continues to be a big inspiration for me. And... it reminds me that Leorik really needs a logo! :)
algerbrex wrote: Sat Feb 19, 2022 11:57 pm Congratulations Thomas! I'm excited to see Leorik develop, and I hope it'll be a useful and challenging sparring partner for future releases of Blunder (whenever I find the time again to work on releasing another version).
I'm very grateful that you decided to wait for me to catch up. ;) I'll continue to use your released versions as sparring partners while trying to catch up!
Mike Sherwin wrote: Sun Feb 20, 2022 3:27 am A real fight this time! :)
That looks like an exciting game, even to the untrained eye! When Leorik moved away the Bishop that was blocking the Pawn from promoting, I was a little shocked, but it didn't cause the damage I thought it would.
emadsen wrote: Sun Feb 20, 2022 4:37 am One suggestion I'll make based on personal experience is to remove all color-specific code now, before it becomes much more difficult (tedious, I should say) later. I wrote similar color-specific code in MadChess, as you have in Leorik, and I really regretted it. It's a source of potential bugs (change the white code, forget or bungle the change for black) and it's redundant code.
I admit that it irked me while writing but at least the branching based on the color does not seem to be a performance constraint. I'll closely monitor how often I have to write color-specific code in the future. If it doesn't bleed into search and evaluation too much I can live with it. Otherwise I need to reconsider...

But the truth is that all the alternatives for making it color-agnostic that I tried or came up with also annoyed me in one way or another.

The craziest idea is to use BinaryPrimitives.ReverseEndianness on all bitboards whenever I copy the board state. The whole game would then always be played from the same point of view. It's what I would do on a physical board if I played against myself: rotate the board after each move so I don't have to play upside down... and that would probably get rid of most of the code duplication, too.
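
Just to sketch the idea (illustrative only, with made-up names, not actual Leorik code): reversing the byte order of a 64-bit bitboard mirrors the board vertically, so the copy step could flip every bitboard and swap the color boards in one go.

Code: Select all

using System.Buffers.Binary;

public static class BoardMirror
{
    // Reversing the byte order swaps rank 1 with rank 8, rank 2 with rank 7,
    // and so on - a vertical flip of the bitboard.
    public static ulong Flip(ulong bb) => BinaryPrimitives.ReverseEndianness(bb);

    // Sketch of the copy step: flip every bitboard; swapping the White and
    // Black boards (and mirroring castling rights etc.) would happen here too.
    public static void CopyFlipped(ulong[] source, ulong[] target)
    {
        for (int i = 0; i < source.Length; i++)
            target[i] = Flip(source[i]);
    }
}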
emadsen wrote: Sun Feb 20, 2022 4:37 am Otherwise, your engine looks like a good start at bitboards. I'm curious though why you didn't use magic bitboards? Is it just a bit too magic for your liking and you prefer a more direct approach for move generation?
I'm just wary of big look-up tables, especially in C# where every array is a managed object on the heap. My limited understanding of what's going on at the hardware level is that they put a big load on the cache. That may not be a big issue in perft(), where you profile the move generator and make the decision, but later on there are so many more things that also cause cache pressure: the transposition table, history, PSTs and so on. That means that before comparing a lookup-based solution like magics against an arithmetic implementation, I thought I should wait until I have a more realistic set of features and not just perft. And I think even better than magics would be PEXT - but only on hardware that supports it.
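
For reference, the PEXT-based lookup I have in mind would look roughly like this (just a sketch with made-up table names; the tables would be filled elsewhere, and in practice you'd also need a Bmi2.IsSupported check or a fallback):

Code: Select all

using System.Runtime.Intrinsics.X86;

public static class PextSliders
{
    // Precomputed elsewhere: the relevant blocker mask per square, the offset
    // of each square's slice in the shared attack table, and the attack sets.
    static readonly ulong[] RookMask = new ulong[64];
    static readonly int[] RookOffset = new int[64];
    static readonly ulong[] RookAttacks = new ulong[102400];

    public static ulong GetRookTargets(int square, ulong occupation)
    {
        // PEXT packs the occupied bits inside the mask into a dense index,
        // replacing the multiply-and-shift of the magic bitboard approach.
        ulong index = Bmi2.X64.ParallelBitExtract(occupation, RookMask[square]);
        return RookAttacks[RookOffset[square] + (int)index];
    }
}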

By the way... this is a question for all C# programmers: I was informed that Leorik does not work on older hardware due to the use of

Code: Select all

Bmi1.X64.ResetLowestSetBit(bb)
...and I would like to implement a fallback into my engine to ensure compatibility with older hardware. So what I tried today was this:

Code: Select all

[MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static ulong ClearLSB(ulong bb)
        {
            if (Bmi1.IsSupported)
                return Bmi1.X64.ResetLowestSetBit(bb);
            else
                return bb & (bb - 1);
        }
And while I expected this to be slower when using ahead-of-time compilation, I really thought it would be something the JIT compiler would just catch and optimize based on the available hardware.

Sadly, I lose 2-3M nps in perft on my machine, which means that providing backwards compatibility that way would hurt the speed on the 95% of computers that support BMI1 just fine. It actually seems better to always use the legacy implementation instead of the conditional check - then the difference is only -1M nps... but that's still a compromise I'm not willing to make.

So how would you solve it? With a compiler flag?

And then I provide an additional configuration called 'Legacy' that users of old hardware will have to use to make dedicated builds for their hardware?

Code: Select all

dotnet build --configuration Legacy
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
unserializable
Posts: 65
Joined: Sat Oct 24, 2020 6:39 pm
Full name: Taimo Peelo

Re: Devlog of Leorik - Version 1.0 released!

Post by unserializable »

Congratulations, huge welcome to your new undead baby engine :)!
lithander wrote: Sun Feb 20, 2022 2:52 pm ...and I would like to implement a fallback into my engine to ensure compatibility with older hardware. So what I tried today was this:

Code: Select all

[MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static ulong ClearLSB(ulong bb)
        {
            if (Bmi1.IsSupported)
                return Bmi1.X64.ResetLowestSetBit(bb);
            else
                return bb & (bb - 1);
        }
And while I expected this to be slower when using ahead-of-time compilation, I really thought it would be something the JIT compiler would just catch and optimize based on the available hardware.
Is this construction something that remains at runtime, or is it already resolved at compile time? As a non-C# programmer, looking at the compiled output at https://godbolt.org/z/WqsGhGP55, it appears this is compile-time only (seeing machine code is a surprise too; I would have expected to see .NET bytecode).
Monchester 1.0, chess engine playing at scholastic level: https://github.com/unserializable/monchester ("Daddy, it is gonna take your horsie!")
Tickle Monchester at: https://lichess.org/@/monchester
User avatar
emadsen
Posts: 440
Joined: Thu Apr 26, 2012 1:51 am
Location: Oak Park, IL, USA
Full name: Erik Madsen

Re: Devlog of Leorik - Version 1.0 released!

Post by emadsen »

lithander wrote: Sun Feb 20, 2022 2:52 pm I'm just wary of big look-up tables, especially in C# where every array is a managed object on the heap. My limited understanding of what's going on at the hardware level is that they put a big load on the cache. That may not be a big issue in perft(), where you profile the move generator and make the decision, but later on there are so many more things that also cause cache pressure.
That's a reasonable answer. I assume that with the large caches available to modern CPUs, the numerous lookup tables my engine relies on don't cost much in terms of performance. But, as it is with our hobby, we either have to act on anecdotal "received wisdom" or write numerous variations of the code and measure them ourselves, in our own engine. And there's only so much free time available for our hobby. What I'm saying is... I haven't measured the performance difference of magic move generation versus other techniques in my engine.

It's good though, for our hobby, that I made one decision and you another. We've increased the "genetic diversity" of chess engines, even within the narrow C# family of engines :)
lithander wrote: Sun Feb 20, 2022 2:52 pm So how would you solve it? With a compiler flag?

And then I provide an additional configuration called 'Legacy' that users of old hardware will have to use to make dedicated builds for their hardware?

Code: Select all

dotnet build --configuration Legacy
Personally, I think we're past the point of worrying about ancient hardware, especially for free, open source software. BMI1 is what, 10 years old? If someone really wants to run your engine on an old PC they can fork the Github repo. But my opinion doesn't matter. It's up to you to decide whether to not support old CPUs, guard the offending instruction with an if test, find a way to remove the offending instruction entirely, produce separate builds via a configuration switch, etc.
Erik Madsen | My C# chess engine: https://www.madchess.net
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik - Version 1.0 released!

Post by lithander »

unserializable wrote: Sun Feb 20, 2022 8:22 pm Congratulations, huge welcome to your new undead baby engine :)!
Thanks! :)
unserializable wrote: Sun Feb 20, 2022 8:22 pm Is this construction something that remains at runtime, or is it already resolved at compile time? As a non-C# programmer, looking at the compiled output at https://godbolt.org/z/WqsGhGP55, it appears this is compile-time only (seeing machine code is a surprise too; I would have expected to see .NET bytecode).
Traditionally you'd ship your executable compiled to IL bytecode. When this is executed on the target hardware a runtime is required, just like in Java. This runtime includes a JIT compiler (JIT stands for just-in-time) that compiles the IL into machine code. Because this happens on the target hardware, you can write code like the above and trust the JIT compiler to strip the conditional, because Bmi1.IsSupported is a constant by the time the JIT runs. You could argue that this is a pretty cool advantage over something like C/C++.

But... drawback #1 is that, because the compilation happens after the program has started, there is often a "warm-up" period right after start-up where the application runs at limited speed. Only when the JIT compiler has finished its work in the background does the full speed of your engine become available. Drawback #2 is that a compiler constrained to work behind the scenes on a time budget measured in milliseconds can't match the level of optimization of a C/C++ compiler.

Only recently has it also become possible to use AoT (ahead-of-time) compilation: you select your target hardware at compile time and machine code is generated directly. But now something clever like the Bmi1.IsSupported condition turns into a trap. ClearLSB is compiled to something that's really expensive, because at compile time Bmi1.IsSupported wasn't a constant, so the branch on this condition is actually part of the machine code now. You lose several million nodes per second in perft.

And this leaves me with a choice between four options, none of which I really like.

1.) Ignore "old" hardware and just assume everybody has Bmi1 without even checking.
2.) Check for Bmi1 support at runtime and trust that the JIT is smart enough to strip the conditional on BMI-capable hardware. But if an unsuspecting user opts for AoT compilation, they pay a hefty price.
3.) Just don't use ResetLowestSetBit... the alternative is barely any slower. But now every user with a modern CPU gets a slightly slower build.
4.) Use a #define LEGACY compile flag and provide a Legacy configuration (sketched below). Now there's no branching in the machine code, not even with AoT compilation. But you have two configurations and ultimately two different builds, and running the "normal" build on legacy hardware still crashes.
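
Option 4 would look roughly like this (a sketch; the LEGACY symbol would have to be defined by the Legacy configuration in the .csproj, e.g. via DefineConstants):

Code: Select all

using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics.X86;

public static class Bitboard
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static ulong ClearLSB(ulong bb)
    {
#if LEGACY
        // portable fallback, runs on any x64 CPU
        return bb & (bb - 1);
#else
        // fast path, assumes BMI1 is available
        return Bmi1.X64.ResetLowestSetBit(bb);
#endif
    }
}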
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
tcusr
Posts: 325
Joined: Tue Aug 31, 2021 10:32 pm
Full name: Mateo

Re: Devlog of Leorik - Version 1.0 released!

Post by tcusr »

lithander wrote: Mon Feb 21, 2022 2:11 am You could argue that this is a pretty cool advantage over something like C/C++
Just nitpicking, but for compile-time branches in C/C++ you can use preprocessor directives (#if, #ifdef) and, since C++17, "if constexpr".
Anyway, I'd suggest option 4 because that's basically what everybody does in C/C++.
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

Since the release of Version 1.0 everything I touched was eventually reverted. So I'm still at Version 1.0 despite having spent at least 10h on the engine. As all this work will not show up anywhere else (certainly not in the master branch) I thought I would at least make a quick post about it here. ;)

Deadend #1

I've received some criticism regarding my choice of copy/make instead of make/unmake. To make the copy step fast I skip storing redundant information as much as possible. Because the placement of all pieces is encoded in the bitboards, I don't have to maintain a mailbox array. 64 bytes less data to copy! But... the drawback is that the GetPiece(square) function looks like this:

Code: Select all

        public Piece GetPiece(int square)
        {
            /*
            1st Bit = set for every piece (occupancy)
            2nd Bit = White or Black? (the 1st bit is implied to be set, too)
            Black = 1,      //01
            White = 3,      //11

            3rd+ Bits = Type of Piece
            Pawn = 4,       //00100
            Knight = 8,     //01000
            Bishop = 12,    //01100
            Rook = 16,      //10000
            Queen = 20,     //10100
            King = 24,      //11000
            */
            return (Piece)(Bit(Black | White, square, 0) |
                           Bit(White, square, 1) |
                           Bit(Pawns | Bishops | Queens, square, 2) |
                           Bit(Knights | Bishops | Kings, square, 3) |
                           Bit(Kings | Rooks | Queens, square, 4));
        }

        private int Bit(ulong bb, int square, int shift) => (int)((bb >> square) & 1) << shift;
And I tried different alternative implementations to speed this function up. There are a few interesting ideas and more details of what I tried to be found in this thread.

In the end the difference was hard to measure even in time-to-depth, which means no noticeable difference is to be expected in actual play. So I shelved the alternatives for now. Because... let's face it... the best way to fetch a piece by square would be a mailbox array. I'll just defer the decision, and if the reasons to forgo the copy/make approach in favor of make/unmake keep accumulating, that's exactly what I'm going to do at some later point.

Deadend #2

Leorik 1.0 uses just tapered PSTs, and Mike Sherwin showed a great example earlier in the thread of what the drawbacks of that are.

In MinimalChess I went one step further by adding a 13th table to the 12 PSTs. For each piece on the board you'd accumulate small values from this table based on what squares that piece could "see". The table has 13 columns and 6 rows. So for each of the 6 piece types there are 13 values: 1 for empty squares, 6 for friendly pieces and 6 for enemy pieces. These values are auto-tuned. And they can encode a lot of basic chess knowledge:
There's a 2 cp bonus for each empty square a rook or bishop sees; this encourages mobility. There's a 10 cp bonus for a pawn guarding friendly pawns; this improves the pawn structure. Pieces get a bonus for threatening more valuable pieces. Queens and knights are encouraged to go hunt the king.
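
The accumulation itself is simple, roughly like this (a simplified sketch with made-up names and indexing, not the actual MinimalChess code):

Code: Select all

public static class MobilityEval
{
    // 6 piece types x 13 targets: an empty square, 6 friendly and 6 enemy piece
    // types. The values are auto-tuned, e.g. a small bonus per empty square seen
    // by a rook or bishop, a bonus for pawns guarding pawns, etc.
    static readonly int[,] Table = new int[6, 13];

    // For one piece: 'pieceType' is its colorless type (0..5), 'seen' holds the
    // content of every square its pseudo-legal moves target
    // (0 = empty, 1..6 = friendly piece type, 7..12 = enemy piece type).
    public static int Score(int pieceType, int[] seen)
    {
        int score = 0;
        for (int i = 0; i < seen.Length; i++)
            score += Table[pieceType, seen[i]];
        return score;
    }
}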

I ported that feature over to Leorik but sadly it tanks the performance of the engine so much that, in actual self play, the addition isn't beneficial at all. Time to depth increases by 3x!

It was a slow feature in MinimalChess, too, but that was already a slow engine, so it didn't hurt the time to depth as much. And despite the costly computations involved it turned out to be worth over 60 ELO!

MinimalChess used a mailbox representation, though, and that makes this feature easier to implement. Basically, you generate the move targets for *all* pieces and then you need to look up the piece on *each* targeted square, because that determines the index into the 13th table. Implementing that on pure bitboards not only amounted to a complete rewrite of the code, but the new code is also slower due to the lack of a cheap way to look up pieces on squares.

So as it stands, my plans for the next major release, Leorik 2.0, do not include this evaluation feature. I will focus on search features but stick with the same evaluation that Leorik 1.0 used. Does that mean Leorik 2.0 is doomed to remain weaker than the latest MinimalChess version? I hope not; hopefully its speed compensates for the lack of the 13th table. It will be an interesting matchup! Maybe Graham will agree to pit them against each other in the Amateur Series Division 8.

Deadend #3

I'm never really sure whether I should consider qsearch to be part of the evaluation or part of the search. Its main purpose is to eliminate the horizon effect so that the evaluation is actually reliable, which means resolving all open beneficial captures.

But when in check you need to prove that you can escape and can't just use stand-pat. And if I can't escape, does that really mean checkmate is the correct score to return? E.g. should qsearch() on a position where the side to move is not in check be allowed to return a checkmate score? I feel like it shouldn't, but at the same time it seems unavoidable that it does: while playing captures the king may end up in check, and if you can't escape it you have to return checkmate... what else can you do? But the position isn't necessarily a lost one; it just seemed so because you greatly limited the moves allowed during qsearch.
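
To make the structure concrete, here is roughly the shape of my qsearch (a simplified sketch against a hypothetical position interface, not the literal Leorik code):

Code: Select all

using System.Collections.Generic;

public interface IPosition
{
    bool InCheck { get; }                       // is the side to move in check?
    int Evaluate();                             // static eval from the side to move's view
    IEnumerable<IPosition> CaptureChildren();   // positions after each legal capture
    IEnumerable<IPosition> EvasionChildren();   // positions after each legal check evasion
}

public static class Quiescence
{
    const int MateScore = 100000;

    public static int Search(IPosition pos, int ply, int alpha, int beta)
    {
        // stand-pat is only an option when not in check
        if (!pos.InCheck)
        {
            int standPat = pos.Evaluate();
            if (standPat >= beta) return beta;
            if (standPat > alpha) alpha = standPat;
        }

        bool anyMove = false;
        var children = pos.InCheck ? pos.EvasionChildren() : pos.CaptureChildren();
        foreach (var child in children)
        {
            anyMove = true;
            int score = -Search(child, ply + 1, -beta, -alpha);
            if (score >= beta) return beta;
            if (score > alpha) alpha = score;
        }

        // in check with no legal evasion: this line ends in checkmate
        if (pos.InCheck && !anyMove)
            return ply - MateScore;

        return alpha;
    }
}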

A sub-problem of that is whether qsearch should detect and return stalemates. I just tested both variants. It turns out that not considering stalemate is stronger in self-play than the alternative, where you return a draw score if the side to move is not in check but can't play any legal move (quiet or capture).

I gained some knowledge from that test, I guess, but as this was already what Leorik 1.0 was doing, once again there's nothing to commit to master.

---

I feel like it's time to do something next that's actually going to boost the engine's strength. So I'm going to implement null move pruning and look forward to celebrating a 150 ELO gain from it. ;)
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
eligolf
Posts: 114
Joined: Sat Nov 14, 2020 12:49 pm
Full name: Elias Nilsson

Re: Devlog of Leorik

Post by eligolf »

Very interesting problems and hopefully you will find nice solutions to them as well :)

I can only speak to point #1 since I still have little experience with the other ones. Since I started out in Python, there was no way to use a deep-copy approach because the performance took too much of a hit. I find that developing a make/unmake move framework is, compared to implementing other features, very easy. It is also pretty easy to find bugs since you have the perft test, which should catch all the eventual faults you make.

If you want I can send you some code to help. It shouldn't be too much work to implement :)
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

lithander wrote: Fri Feb 25, 2022 8:16 pm But when in check you need to prove that you can escape and can't just use stand-pat. And if I can't escape, does that really mean checkmate is the correct score to return? E.g. should qsearch() on a position where the side to move is not in check be allowed to return a checkmate score? I feel like it shouldn't, but at the same time it seems unavoidable that it does: while playing captures the king may end up in check, and if you can't escape it you have to return checkmate... what else can you do? But the position isn't necessarily a lost one; it just seemed so because you greatly limited the moves allowed during qsearch.
I made further tests on how reliable the checkmate or stalemate results from qsearch are. The answer is: very reliable. I ran a search on my usual 300 test positions, and whenever qsearch returned checkmate or stalemate I made a copy of the board state and searched it right there with a normal search to confirm the result qsearch was giving.

Some interesting observations:
- There were no false positives for positions scored as mate.
- When mate was returned, the position was either already checkmate when entering qsearch (no legal moves & king in check) or the side to move could deliver checkmate immediately through a capture.
- What about longer lines? If there is no check within the first 2 plies of qsearch, deeper checkmates don't have to be accepted thanks to stand-pat.
- And what about forcing sequences? Well, they are not explored, because after escaping check the other side could only give check again through another capture - which apparently is rare!
- Stalemates are generally much, much rarer than checkmates. So even though qsearch is right most of the time when it claims stalemate, that doesn't outweigh the cost of making sure that, when no captures could be played, no quiet moves would have been possible either.
- Unlike with checkmates, though, there were false positives with stalemates.

Here's an example:
[fen]8/2q5/8/1p1Q1pk1/1r6/2P3P1/8/7K w - - 0 1[/fen]


-------

And I also implemented null move reductions, which was almost trivial because I just did again what I had already done in MinimalChess. And like last time it gives a tremendous strength boost:

Code: Select all

10s+0.5s inc
Score of Leorik 1.1 vs Leorik 1.0: 1084 - 208 - 474  [0.748] 1766
Elo difference: 189.0 +/- 15.0, LOS: 100.0 %, DrawRatio: 26.8 %

5s+0.1s inc
Score of Leorik 1.1 vs Leorik 1.0: 1532 - 389 - 709  [0.717] 2630
Elo difference: 161.8 +/- 12.0, LOS: 100.0 %, DrawRatio: 27.0 %
What makes me extra happy: every other feature was hurting the nps, but with null move pruning the nps finally goes up again!
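
For anyone curious, the addition to the search is basically just this (a simplified fragment with illustrative names, not the exact Leorik code):

Code: Select all

// inside the regular negamax search, before the move loop:
const int R = 2; // depth reduction for the null move search

if (!inCheck && allowNullMove && depth >= R + 1)
{
    // give the opponent a free move (only the side to move is switched)
    var nullPosition = position.PlayNullMove();
    // zero-window search at reduced depth - we only care whether it fails high
    int nullScore = -Search(nullPosition, depth - R - 1, -beta, -beta + 1, allowNullMove: false);
    if (nullScore >= beta)
        return beta; // even a free move doesn't help the opponent, so prune this node
}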
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Devlog of Leorik

Post by dangi12012 »

Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer