Devlog of Leorik

Discussion of chess software programming and technical issues.

Moderator: Ras

mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Devlog of Leorik

Post by mvanthoor »

lithander wrote: Sun Jan 09, 2022 3:04 pm Personally I think a C# chess engine should not depend on unsafe code. It's a managed language after all and if you don't want that you should pick C/C++ imo.
Well... there's like two lines of unsafe code in Rustic at the moment. One to prevent it from initializing new move lists with 0's, because right after creating it I'll put moves in it; and one tiny function that swaps two pointers to moves instead of two moves. Because the pointer and move are both 64-bit, I don't know if this is even necessary; it is a remnant from the very beginning where Rustic used a struct to store move information instead of a 64-bit int. I'll have to remove it and run a test.
lithander wrote: Sun Jan 09, 2022 3:04 pm So I'm aware that the last few percent of speed I'll only get when I port Leorik to C++ at some point in the far future! ;) But for now I'd rather attempt to write the strongest C# chess engine than the 100th strongest C/C++ one, hehe.
That does it. I'm moving to Ada right now. There's only one engine in Ada I know of (AdaChess), and the current Rustic dev-version is already 200 Elo stronger. Then I'll release Rustic 4 in that language and I'll have the strongest Ada engine :twisted:
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Devlog of Leorik

Post by algerbrex »

lithander wrote: Tue Jan 11, 2022 12:08 am Why not use mailbox in combination with the bitboard representation, you mean? Well, because you now have redundant information that you need to keep in sync. It makes the board representation much larger, too, so that Copy/Make is probably not an option anymore. And then you have to implement an Unmake method that you also have to keep in sync with the Make method. All in all it strikes me as a bit too complicated to be my first choice.
I've actually never really found it too cumbersome to keep the mailbox and bitboard representations in sync. In Blunder I have two functions, removePiece and putPiece, which handle keeping the mailbox, the bitboards, and the zobrist hash in sync:

Code: Select all

func (pos *Position) putPiece(pieceType, pieceColor, to uint8) {
	pos.PieceBB[pieceColor][pieceType].SetBit(to)
	pos.SideBB[pieceColor].SetBit(to)

	pos.Squares[to].Type = pieceType
	pos.Squares[to].Color = pieceColor
	pos.Hash ^= Zobrist.PieceNumber(pieceType, pieceColor, to)
}

func (pos *Position) removePiece(from uint8) {
	piece := &pos.Squares[from]
	pos.PieceBB[piece.Color][piece.Type].ClearBit(from)
	pos.SideBB[piece.Color].ClearBit(from)

	pos.Hash ^= Zobrist.PieceNumber(piece.Type, piece.Color, from)
	piece.Type = NoType
	piece.Color = NoColor
}
And all I have to do is just call one of these two methods anytime I need to update the board, so it's not actually that complicated for me. All the nitty-gritty update logic is nice and compact.

But your copy/make comment is probably true. Including a mailbox representation does add an extra 128 bytes (although if you encoded each piece in a single byte instead of a struct like I do, it'd be only 64 bytes). But I guess this has never really been a concern for me, since I never considered using anything besides make/unmake.
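Just to illustrate the single-byte idea (a C# sketch to match the other snippets in this thread, not Blunder's actual Go encoding): pack the piece type into the low bits and the color into one flag bit.

Code: Select all

class ByteMailbox
{
    const byte TypeMask = 0b0000_0111;  //Pawn = 1 ... King = 6, 0 = empty square
    const byte ColorBit = 0b0000_1000;  //set = black, clear = white

    //the whole mailbox is then a 64-entry byte array: 64 bytes instead of 128
    public byte[] Squares = new byte[64];

    public static byte MakePiece(byte type, bool black) => (byte)(black ? type | ColorBit : type);
    public static byte TypeOf(byte piece) => (byte)(piece & TypeMask);
    public static bool IsBlack(byte piece) => (piece & ColorBit) != 0;
}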

Although make/unmake is more complicated code-wise than make/copy, I've never really had the sense that it's too much more complicated. I will say that debugging during perft was definitely harder since you had to look through two methods for bugs. But overall as long as I sat down before and made sure to sketch out all the logic to make and then reverse a move, it was definitely doable :)
lithander wrote: Tue Jan 11, 2022 12:08 am Maybe the performance benefits make it all worth it but when in doubt simple is best! (for me)
Honestly not sure. Depending on how much branching one has in their make/unmake methods, copy/make might be quicker. I actually never tested this in Blunder, but I think it'd be fun to try out in the next couple of weeks. Maybe I'll get a speed improvement if I can clean up and optimize things enough.
lithander wrote: Tue Jan 11, 2022 12:08 am But I really don't want to argue against any of your solutions or claim that what I am doing would be superior. It's part of the beauty of this hobby that there seems to be no obviously best solution. And my engine is first and foremost just an expression of my personal preferences or testament to whatever piqued my interests at the time of writing. And to my ignorance, oftentimes, I'm sure.
Indeed. This post is only meant to share my experience, not argue. Chess engines would be boring, and not what it is today, if people weren't willing to try different things and learn from each other.
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Devlog of Leorik

Post by mvanthoor »

mvanthoor wrote: Tue Jan 11, 2022 12:00 pm Well... there's like two lines of unsafe code in Rustic at the moment.
... there is now one line of unsafe code in Rustic. I'm testing the removal of the unsafe code for swap() in the move list. One FIXME is left for the creation of the move list, where an old function is used (apparently, according to the linter). I don't think I'll be able to remove that unsafe code; I'll rather have to update it.

With regard to the mailbox... I could do the same thing you (lithander) are doing, but I don't know if it'll be faster than the mailbox. It would spare some bytes in the game state, though, so that may gain a bit of speed. But that is for a later version. If it gains speed at all, it won't be earth-shattering.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

Mike Sherwin wrote: Sun Jan 09, 2022 7:13 pm My piece types are enum {EMPTY, WP, WN, WB, WRC, WR, WQ, WKC, WK, BP ...}. The types with 'C' can castle. If the WKC (WRC) moves it becomes a WK (WR) and becomes a WKC (WRC) again when the move is retracted. That means that as soon as a WKC becomes a WK no more cycles are wasted on castling.
I actually tried this today in MinimalChess. And the code-base lost about 13 lines of source code, 6 fewer lines of executable code. So for the goal of making the engine "more minimal", defined by that metric, it would be a win.

Where I already had switch-case blocks, like in the move generator, the new code looked fine. The problem was whenever I *didn't* care about the castling rights, for example when converting the pieces into their character representation for text output, or when I needed an index-by-piece into some table (e.g. History or Evaluation), because then I had to remember to zero out the "Castling" bit. For the Zobrist keys, where this information couldn't be discarded, I had previously just cast the Piece to an int and used it as a direct index into a table of Zobrist keys. Because zeroing the bit here would cause the hash of a position to ignore castling rights, I now needed twice the table size (the indices having one additional bit) to keep this direct-access scheme. In the end it felt to me like encoding additional information in an enum of Pieces was making the code less straightforward to read, and readability was always my top goal.
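To make the indexing problem concrete, the encoding looked roughly like this (a sketch with made-up values, not the actual MinimalChess code):

Code: Select all

enum Piece : byte
{
    Empty = 0,
    WhitePawn = 1, WhiteKnight = 2, WhiteBishop = 3, WhiteRook = 4, WhiteQueen = 5, WhiteKing = 6,
    BlackPawn = 7, BlackKnight = 8, BlackBishop = 9, BlackRook = 10, BlackQueen = 11, BlackKing = 12,

    Castling = 16,                      //flag bit: this piece may still castle
    WhiteRookC = WhiteRook | Castling,
    WhiteKingC = WhiteKing | Castling,
    BlackRookC = BlackRook | Castling,
    BlackKingC = BlackKing | Castling
}

static class PieceIndex
{
    //tables that don't care about castling (History, Evaluation, text output)
    //have to mask the flag out...
    public static int Table(Piece piece) => (int)piece & ~(int)Piece.Castling;

    //...while the Zobrist table must keep the flag and therefore needs twice the entries
    public static int Zobrist(Piece piece) => (int)piece;
}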

There seems to be no measurable difference in performance (Score of MinimalChess 0.6.2 vs MinimalChess 0.6.1: 296 - 294 - 410 [0.501] 1000), so I'm probably not going to keep the change. But thanks for making me aware of that idea, Mike! It was a fun exercise, and my reason for rejecting it in the end is a purely subjective one.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Devlog of Leorik

Post by Mike Sherwin »

lithander wrote: Sun Jan 16, 2022 3:34 am
Mike Sherwin wrote: Sun Jan 09, 2022 7:13 pm My piece types are enum {EMPTY, WP, WN, WB, WRC, WR, WQ, WKC, WK, BP ...}. The types with 'C' can castle. If the WKC (WRC) moves it becomes a WK (WR) and becomes a WKC (WRC) again when the move is retracted. That means that as soon as a WKC becomes a WK no more cycles are wasted on castling.
I actually tried this today in MinimalChess. And the code-base lost about 13 lines of source code, 6 fewer lines of executable code. So for the goal of making the engine "more minimal", defined by that metric, it would be a win.

Where I already had switch-case blocks, like in the move generator, the new code looked fine. The problem was whenever I *didn't* care about the castling rights, for example when converting the pieces into their character representation for text output, or when I needed an index-by-piece into some table (e.g. History or Evaluation), because then I had to remember to zero out the "Castling" bit. For the Zobrist keys, where this information couldn't be discarded, I had previously just cast the Piece to an int and used it as a direct index into a table of Zobrist keys. Because zeroing the bit here would cause the hash of a position to ignore castling rights, I now needed twice the table size (the indices having one additional bit) to keep this direct-access scheme. In the end it felt to me like encoding additional information in an enum of Pieces was making the code less straightforward to read, and readability was always my top goal.

There seems to be no measurable difference in performance (Score of MinimalChess 0.6.2 vs MinimalChess 0.6.1: 296 - 294 - 410 [0.501] 1000), so I'm probably not going to keep the change. But thanks for making me aware of that idea, Mike! It was a fun exercise, and my reason for rejecting it in the end is a purely subjective one.
Well thank you also for working with me on this and other things in the past. It is refreshing to not be ignored or just written off. Of course Bricabrac was designed this way from the beginning and for Bric there were no conflicts like in MinimalChess.

Bric in the original position searches 10 million moves per second using only one thread. Having 128 bits of hash signature probably slows it down some. It is sound at its core because it passes all perft tests on the CPW for all depths given. It is a shame that I probably will never be able to finish it because of my age and my health problems. It would be trivial (but not by me!) to add the MinimalChess evaluation as an experiment. And maybe your new sliding piece generator as well. PM me if you are interested otherwise no need to comment on this. It is just a thought.
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

Mike Sherwin wrote: Sun Jan 16, 2022 6:44 am Bric in the original position searches 10 million moves per second using only one thread. It is a shame that I probably will never be able to finish it because of my age and my health problems. It would be trivial (but not by me!) to add the MinimalChess evaluation as an experiment. And maybe your new sliding piece generator as well. PM me if you are interested otherwise no need to comment on this. It is just a thought.
I am sorry to hear that your health situation hasn't improved, yet. :(

The 10 million NPS of Bric sound really great compared to MinimalChess' 1M NPS! I want that, too! :) And it's not only about the X Elo you can gain directly from a faster engine. Every idea I had left to make MinimalChess stronger would have taken much longer than necessary, wasting *my* development time, too. For example, I wanted to base my evaluation on my own set of quiet positions (instead of data from Zurichess), but the time to generate them scales with the engine's speed. I wanted to re-tune my PSTs after changes to the search, but tuning speed depends on the engine's evaluation speed. And of course every change needs hundreds or thousands of games at somewhat meaningful time controls to be verified as an improvement. In the end I figured that a faster engine should be the next thing to do, and I put all other ideas on hold.

Maybe I could, with your help, implement my evaluation in Bric now, but it seems better to wait a few more weeks until it has been ported to a bitboard engine. And I have a few ideas beyond that which don't really change the 12 + 1 PST table idea but add, for example, a king safety term to it. It's not quite done yet. And it's also not 100% original at the moment, because the values are trained on a data-set of quiet positions from another engine. Last but not least, I really promised myself to focus on Leorik and avoid distractions as much as possible. (But focus and time are scarce resources these days for me, too.)

But after Leorik's first release version we can give it a try!
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

--- just a progress update ---

In a previous post I quoted the raw performance of the pseudo-legal perft with no bulk counting as 66M NPS on my set of test-positions and machine.

I have since split the move generation into two parts: captures and quiet moves are generated separately, and for captures I look up the captured piece and store it in the move.
This has cost me 10M NPS, but that information is crucial for doing incremental zobrist-hash updates cheaply. And that's what I did next.

If I update the zobrist hash only when needed (excluding leaf nodes), the overall cost of updating the hash in perft is negligible. To measure and optimize the incremental update speed, however, I temporarily updated the hash for all nodes (which makes no sense in perft), and the speed dropped by another 12M NPS to 43M NPS.
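The incremental update itself is just a handful of XORs, roughly along these lines (a sketch only; names like Zobrist.PieceSquare are placeholders, not the actual Leorik code):

Code: Select all

public void UpdateZobrist(ref Move move)
{
    //XOR the moving piece out of its old square and into the target square
    ZobristHash ^= Zobrist.PieceSquare(move.MovingPiece, move.From);
    ZobristHash ^= Zobrist.PieceSquare(move.MovingPiece, move.To);

    //because the captured piece is stored in the move we can XOR it out
    //without having to probe the board first
    if (move.CapturedPiece != Piece.None)
        ZobristHash ^= Zobrist.PieceSquare(move.CapturedPiece, move.To);

    //the side to move always flips
    ZobristHash ^= Zobrist.SideToMove;

    //castling rights, en passant and special moves (promotion, castling, e.p.)
    //need additional XORs that are omitted here
}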

As a test of the zobrist quality I store and look up the perft values of nodes in a transposition table. My perft reports 150M NPS now. That's cheating of course, because not all of these nodes have truly been visited. But it's a good test to verify that the zobrist keys are effectively unique; otherwise the perft result would be wrong.
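The hashed perft is just as simple (again a sketch with hypothetical helpers like GetChildren, not the actual Leorik code):

Code: Select all

static readonly Dictionary<ulong, (int Depth, long Nodes)> Table = new();

static long Perft(BoardState position, int depth)
{
    if (depth == 0)
        return 1;

    //reuse the stored subtree count if this hash was already seen at the same depth
    if (Table.TryGetValue(position.ZobristHash, out var entry) && entry.Depth == depth)
        return entry.Nodes;

    long nodes = 0;
    foreach (BoardState child in position.GetChildren())
        nodes += Perft(child, depth - 1);

    //a key collision would feed a wrong subtree count into the total, so a correct
    //grand total is good evidence that the zobrist hashing works
    Table[position.ZobristHash] = (depth, nodes);
    return nodes;
}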

So I lost 30% speed compared to the vanilla perft implementation. If you think about that as the price paid just for incremental zobrist, it was a really expensive feature. But in practice it's not that bad: MinimalChess never used the TT in QSearch, and if Leorik doesn't either, the hash won't have to be updated for the bulk of the nodes, so the cost of the incremental zobrist update is going to be practically negligible.
And the changes to the movegen have additional value besides zobrist hashing. In a real search you can oftentimes skip generating the quiets entirely. And the captured piece is critical knowledge for implementing move-ordering and an incremental evaluation, too. Which is the next topic I'm going to focus on.

The plan is to port the PST-based evaluation from MMC over to Leorik which should be trivial if computed from scratch. Then make it incremental again and optimize it for speed. I expect the incremental eval update to have roughly the same performance impact as the incremental zobrist update had.
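Conceptually the incremental eval update will mirror the zobrist update, something like this (sketch only, assuming the tables store black's contributions with flipped sign; not the actual Leorik code):

Code: Select all

public void UpdateEval(ref Move move)
{
    //the moving piece leaves its old square and arrives on the target square
    EvalScore -= PieceSquareTable[(int)move.MovingPiece, (int)move.From];
    EvalScore += PieceSquareTable[(int)move.MovingPiece, (int)move.To];

    //a capture removes the victim's contribution
    if (move.CapturedPiece != Piece.None)
        EvalScore -= PieceSquareTable[(int)move.CapturedPiece, (int)move.To];

    //promotions and castling touch additional pieces/squares and are omitted here
}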
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

--- progress update ---

Since the last progress update I have ported the PST-based evaluation from MinimalChess and made it incremental. The BoardState class now contains not only the bitboards but also a field for the zobrist hash (ulong) and the evaluation state (a struct of four shorts), but whether those get updated (incrementally) is optional: you need to call the UpdateZobrist or UpdateEval methods and pass the move if you want them. There's a small price for separating that functionality from the move playing, and I tried *many* variations to find the right balance between good™ code and fast code. At some point I just decided to move on...

So, with a way to get a score for a position, I'm now ready to implement an actual search. I'm not going to do engine-vs-engine tests until I have feature parity with MinimalChess. Instead I have decided to use the Win at Chess (WAC) collection of 300 chess positions with known best moves to quickly verify that I'm moving in the right direction when adding features to the search.

I have written all the code to read FENs and convert best-move notations, and I have a basic testing setup where, if you run the application, it immediately starts working on the testset and prints the accumulated results at the end. The search at this point couldn't be simpler: I'm doing a full minmax search of each position to a fixed depth.
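In negamax form that search is only a few lines, roughly like this (a sketch; BoardState, GetChildren and Eval.Score are stand-ins, not the actual Leorik code):

Code: Select all

static int NegaMax(BoardState position, int depth)
{
    if (depth == 0)
        return (int)position.SideToMove * position.Eval.Score;

    //no pruning: every child is searched to the full remaining depth
    //(terminal positions without children are glossed over here)
    int best = int.MinValue + 1;
    foreach (BoardState child in position.GetChildren())
        best = Math.Max(best, -NegaMax(child, depth - 1));

    return best;
}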

At depth 4 it already takes 10 minutes and solves only 79 positions correctly.

Code: Select all

22931089K nodes visited. Took 594,398 seconds!
38578K NPS.
Best move found in 79 / 300 positions!
...but at least it does so at very high speeds! I fear in terms of raw speed it will only get worse from here on... ;)

Just for context: a node in my framework is every position that has truly been created by playing a move on a previous position. Each node has also been incrementally evaluated. And 79/300 is really bad... when I'm feature-complete and ready for the first release of Leorik it should solve 295+ of these positions if you give it 10 minutes (2s per position).
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

--- progress update ---

I'm in the process of recreating the bits and pieces that work together in MinimalChess in the new framework one by one.

After pure minmax the next logical step was to add alpha-beta pruning, and the difference is massive. Searching each position of the testset to depth 7 took almost an hour before and less than 2 seconds after. The search tree lost 99.95% of its nodes!? :shock:
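For reference, the pruning amounts to little more than adding a search window to the plain negamax from the last post (again a sketch with placeholder names, not the actual Leorik code):

Code: Select all

static int AlphaBeta(BoardState position, int depth, int alpha, int beta)
{
    if (depth == 0)
        return (int)position.SideToMove * position.Eval.Score;

    foreach (BoardState child in position.GetChildren())
    {
        int score = -AlphaBeta(child, depth - 1, -beta, -alpha);

        //the opponent would never allow this line -> prune the remaining siblings
        if (score >= beta)
            return beta;

        if (score > alpha)
            alpha = score;
    }
    return alpha;
}
At the root it gets called with a fully open window, e.g. AlphaBeta(root, 7, -100000, 100000).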

So, I was excited to implement MVV-LVA move ordering next, expecting another huge step forward... instead the tree size shrank only a little, retaining 75% of the nodes. For example, at depth 9 plain alpha-beta visited 94B nodes and with MVV-LVA it still visited 69B nodes. Apparently the fact that I already generated and played captures before quiets made the move ordering almost obsolete. And the move ordering isn't free, it takes *time*: if you visit 25% fewer nodes but the speed of the search (nodes per second) drops by 25%, you have gained nothing.
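The ordering itself boils down to scoring captures by victim and attacker and lazily picking the best remaining one, roughly like this (a sketch with placeholder names and values such as PieceValue, not the actual implementation):

Code: Select all

static int MvvLvaScore(ref Move move)
{
    //most valuable victim first, least valuable attacker as tie-breaker
    return 10 * PieceValue(move.CapturedPiece) - PieceValue(move.MovingPiece);
}

static void PickBestMove(Move[] moves, int first, int end)
{
    //lazy selection sort: swap the best-scoring capture in [first..end) to the front
    int best = first;
    for (int i = first + 1; i < end; i++)
        if (MvvLvaScore(ref moves[i]) > MvvLvaScore(ref moves[best]))
            best = i;

    (moves[first], moves[best]) = (moves[best], moves[first]);
}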

I thought MVV-LVA was such a crucial component in any chess engine, but my results didn't confirm it. I double- and triple-checked the implementation, but it seems like the moves are played in the right order... it just doesn't matter much. What is going on?

I postponed the question, shelved the code, and moved on to the next thing considered rather crucial: quiescence search. And guess what... adding it completely killed the performance of my search. So much so that again I thought there must be a bug. With qsearch enabled (but without move-ordering of the captures), searching to a fixed depth of 7 took 40x longer. The time a search took *with* quiescence enabled exploded so much that in a runtime- or even node-limited scenario I could solve many more positions by not doing qsearch at all but searching ~2 plies deeper.

So both individual results had me stumped and made me question my ability to code straight. Only when I combined the two and used move-ordering in the qsearch did my results finally start to make *some* sense. By playing something like PxQ first, the qsearch part of the tree loses 90% of its nodes. That's significant. And now the version with quiescence took "only" 5x longer. Still... that's a lot of time spent in qsearch. And it's not only that the number of positions visited increases dramatically; the raw speed (visited nodes per second) also suffers from that kind of deep but narrow search, where you invoke the move generator a lot to generate all the captures, only to find the one capture that refutes the previous "stupid" move while the rest was generated for nothing.

Considering the 60M+ nps I was getting in perft my current performance metrics of 14M nps with q-search enabled are a little depressing.

Maybe I'm doing too much in my quiescence search?

Code: Select all

        private static int EvaluateQuiet(int depth, int alpha, int beta, MoveGen moveGen)
        {
            NodesVisited++;
            BoardState current = Positions[depth];
            BoardState next = Positions[depth + 1];

            bool inCheck = current.IsChecked(current.SideToMove);

            //if inCheck we can't use standPat, need to escape check!
            if (!inCheck)
            {
                int standPatScore = (int)current.SideToMove * current.Eval.Score;

                if (standPatScore >= beta)
                    return beta;

                if (standPatScore > alpha)
                    alpha = standPatScore;
            }

            //play the possible captures in mvv-lva order
            bool movesPlayed = false;
            for (int i = moveGen.CollectCaptures(current); i < moveGen.Next; i++)
            {
                PickBestMove(i, moveGen.Next);
                if (next.PlayAndUpdate(current, ref Moves[i]))
                {
                    movesPlayed = true;
                    int score = -EvaluateQuiet(depth + 1, -beta, -alpha, moveGen);

                    if (score >= beta)
                        return beta;

                    if (score > alpha)
                        alpha = score;
                }
            }

            if(inCheck)
            {
                //because we need to escape check we have to search all the quiet moves
                for (int i = moveGen.CollectQuiets(current); i < moveGen.Next; i++)
                {
                    if (next.PlayAndUpdate(current, ref Moves[i]))
                    {
                        movesPlayed = true;
                        int score = -EvaluateQuiet(depth + 1, -beta, -alpha, moveGen);

                        if (score >= beta)
                            return beta;

                        if (score > alpha)
                            alpha = score;
                    }
                }
                //no moves escape check? guess it's game over
                if (!movesPlayed)
                    return Evaluation.Checkmate(current.SideToMove, depth);
            }


            //stalemate?
            //if (expandedNodes == 0 && !LegalMoves.HasMoves(position))
            //    return 0;

            return alpha;
        }
...or maybe chess puzzles (https://www.chessprogramming.org/Win_at_Chess) are not the most representative?

But anyway, after a lot of confusion and trying to make sense of my measurements, I think there's nothing seriously wrong and everything is more or less as you should expect it. It's just that I really didn't have a very accurate gut feel about it all. I just hope that when I'm all done there's some significant performance advantage left (compared to MinimalChess), or else this whole endeavour would feel rather pointless in retrospect.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Devlog of Leorik

Post by mvanthoor »

lithander wrote: Fri Jan 28, 2022 5:06 pm --- progress update ---
Does your a/b-function already implement staged move generation? If so, I'd remove it.

It could also be that your test-set is biased. That is, if you generate the captures, sort them, and then play the best one, that move could be the solution to the test position most of the time, which would make any other moves and their order irrelevant.

If I were you, I'd implement the simplest a/b-function and a simple linear time control and then have the engine run a test against MMC 0.2. Leorik should perform horribly because it has no qsearch (so big problems with horizon effect) and it has no MVV-LVA (so it tries moves in random order). Then implement qsearch to remove the horizon effect. (This will lower the depths you can reach, because the entire qsearch is done at depth 0; but it will massively boost seldepth, obviously.) Test again; you should now see a clear improvement although Leorik will still be very weak. Then implement MVV-LVA. If you do it right, you should then be at least as strong as MMC 0.3; if not stronger.

If you implement only qsearch, MVV-LVA, a linear time control and an incremental non-tapered PST, you should have feature parity with Rustic Alpha 1 and MMC 0.3, if I remember correctly. Therefore your rating should be somewhere around 1500-1700, depending on the speed of the implementation.
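By a simple linear time control I just mean giving each move an even share of the remaining clock; something along these lines (illustrative sketch only, not Rustic's actual code, and the assumed 30 moves to go is an arbitrary number):

Code: Select all

static TimeSpan TimeBudget(TimeSpan timeLeft, TimeSpan increment, int movesToGo = 30)
{
    //spend an even share of the remaining clock plus the increment on this move
    double ms = timeLeft.TotalMilliseconds / movesToGo + increment.TotalMilliseconds;
    return TimeSpan.FromMilliseconds(ms);
}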
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL