Move generator advice

RedBedHed · Post by **RedBedHed** » Tue Aug 10, 2021 2:57 am

Thomas,

Thank you!

I'm honestly just as surprised as you are lol. I still need to test it further... It was all based on research and experimentation... and I know that there is a *lot* of room for improvement... especially in the apply/retract move algorithms (where I could probably get away with using ^ instead of & and ~).

The generator is mainly fast because it uses magic bitboards. It also uses the DeBruijn bit-scan heavily, and a few bit-twiddling tricks from "Hacker's Delight."

I was originally compiling the code with g++, using the "-g" flag so that I could debug with gdb. The generator was only making 40,000,000 leaf moves per second, so I tried extra hard to optimize what I could.

I hinted "inline" for my small functions as much as possible (I wrote some assembly code last semester and it rubbed off on me lol).

I learned about a few of the uses of the "constexpr" keyword in early July. As a result, "Move" is a literal type, and the generator relies on lots of global constexpr tables to map squares to masks and exploit my laptop's big caches. It also uses pointers to constexpr aggregate-structs with "Alliance" defaults (directions, castling information, etc.).

I liked seeing the pretty hex tables in Leela, so I decided to make some of my own. I think that many of these tables really paid off! I noticed that the Stockfish engineers decided to calculate and verify magics at runtime, while the Leela engineers printed their best magics and stashed them in a table. Leela's magic initialization time is definitely faster than Stockfish's. I feel like the Stockfish authors probably had a reason for their design, and I'm probably just missing it...

I avoided heap allocation (except when building the magic database, however, because it wasn't time-critical, and it seemed natural to make the magic entries immutable).

I took lots of notes on Stockfish and imitated its template use, taking care of as much logic as possible during compile-time. Lots of the functions and methods take enum constants as template parameters and rely on a single conditional in the caller to control flow. I believe templating like this has the same effect as writing a separate function (lots of redundant code) for each new parameter (or new combination), and I think it helps to mitigate the (admittedly rare) possibility of pipeline stalls due to branching.

I also separated pinned pieces from free pieces during generation, as H.G.M. suggested, and I used a path mask to filter moves when the king is in single-check. Additionally, the generator considers only king moves whenever the king is in double-check.

Anyhow, About a week ago, I took a look at the g++ assembly output for the generator and noticed that the compiler wasn't inlining my functions! I did some Pro Googling™ and replaced the "-g" flag with "-O3." I re-made the executable and ran it in bash. The generator gained 140,000,000+ leaf moves per second! It totally blew me away lol. So... The moral to this story is to avoid using the "-g" flag with g++ if you aren't debugging. But most people already know that because they aren't dumb like me lol.

** I am still fairly new to C++, so please correct me if any of this is wrong (or reckless). I don't want to spread misinformation or bad practices, and I'd really like to learn how to do things the right way. **

RedBedHed · Post by **RedBedHed** » Tue Aug 10, 2021 3:11 am

Henk,

That is really interesting... I am very new to writing depth-first chess searches... I've only tried it once. I know that alpha-beta and hashing prune quite a bit of the search tree... Do you think that they prune optimal moves? That makes sense with the hashing optimization because the scores need to be updated often, right? Where can I read more about this?

amanjpro · Post by **amanjpro** » Tue Aug 10, 2021 4:30 am

RedBedHed wrote: ↑Tue Aug 10, 2021 3:11 am Henk,

That is really interesting... I am very new to writing depth-first chess searches... I've only tried it once. I know that alpha-beta and hashing prune quite a bit of the search tree... Do you think that they prune optimal moves? That makes sense with the hashing optimization because the scores need to be updated often, right? Where can I read more about this?

There is a balance to be kept between pruning too much and too little. Depth is not the only important factor, but it is definitely extremely important.

Rustic is a very very fast engine with almost no pruning apart from alpha beta. But it hardly reaches depth 12. Zahak is much slower, with lots of pruning in place, and reaches depth 20 from startpos in 5 seconds. Even though Zahak's NPS is quarter of that of Rustic. Obviously, rustic is still under development, and I'm sure it will have pruning in the future.

Moral of story, speed is important, but it is not the only factor

RedBedHed · Post by **RedBedHed** » Tue Aug 10, 2021 6:50 am

Thank you again for all the test cases, Christian! They just revealed a sneaky en-passant-discovered check bug. I wrongly assumed-- as a casual chess player-- that an en-passant-discovered check could only occur along the same rank as the pawns involved.

I think I fixed it! The generator is passing all the test cases now.

RedBedHed · Post by **RedBedHed** » Tue Aug 10, 2021 7:26 am

Hi, Amanj!

Depth 20 sounds amazing! The first engine that I wrote (Before I started at ASU) could only search to depth eight... and it took up to 30 seconds lol. It used a purely object-oriented mailbox approach.

I think you are right... I've probably spent a little too much time on move generation... I just really messed it up last time, and I wanted to do better.

I totally fell down the rabbit hole this summer. I had a really good time though!

My ultimate goal is to write a search that uses a neural network to evaluate the board. I'm going to base my honors thesis on my findings. I doubt the engine will be able to search deeply or play well... But getting it to play at all would be wonderful. I have about three semesters and one summer left to research searches and neural networks. I'm really glad I found this forum.

algerbrex · Post by **algerbrex** » Tue Aug 10, 2021 8:06 am

RedBedHed wrote: ↑Tue Aug 10, 2021 6:50 am Thank you again for all the test cases, Christian! They just revealed a sneaky en-passant-discovered check bug. I wrongly assumed-- as a casual chess player-- that an en-passant-discovered check could only occur along the same rank as the pawns involved.

I think I fixed it! The generator is passing all the test cases now.

You're very welcome Ellie, that's excellent to hear! And yes, I remember when I was working on Blunder's move generator I encountered the exact same bug since I too assumed that the en-passant-discovered checks could only occur along the rank of the pawns involved

Now, whenever you make a change to the move generator, whether it be optimization or a bug fix, I recommend you run through all the tests again. This seems very cumbersome I know, but even the most innocent of changes to a move generator can introduce very nasty bugs!

But since you've said the move generator is passing the test suite now, you can put aside the move generator for a while and focus on the fun part of the chess engine: The search and evaluation phases. This is really the part where you can start to give your engine its own personality! Depending on how you tune your search and the evaluation parameters, you can create an engine that looks for deep tactical lines or one that grinds you down with strong positional play. Or one that plays very human chess.

And hopefully, once you've implemented many of the search and evaluation features that are possible, you'll start to see your engine not just play chess, but play beautiful chess, which to me, is one of the most satisfying parts of creating an engine.

Henk · Post by **Henk** » Tue Aug 10, 2021 10:15 am

RedBedHed wrote: ↑Tue Aug 10, 2021 7:26 am Hi, Amanj!

Depth 20 sounds amazing! The first engine that I wrote (Before I started at ASU) could only search to depth eight... and it took up to 30 seconds lol. It used a purely object-oriented mailbox approach.

I think you are right... I've probably spent a little too much time on move generation... I just really messed it up last time, and I wanted to do better.

I totally fell down the rabbit hole this summer. I had a really good time though!

My ultimate goal is to write a search that uses a neural network to evaluate the board. I'm going to base my honors thesis on my findings. I doubt the engine will be able to search deeply or play well... But getting it to play at all would be wonderful. I have about three semesters and one summer left to research searches and neural networks. I'm really glad I found this forum.

You can choose any depth you like when inplementing LMR (Late move reductions). Even no problem to get depth 200. But that does not mean that it finds the best moves. Depth only giving an upperbound on maximum search depth.

Henk · Post by **Henk** » Tue Aug 10, 2021 10:49 am

Henk wrote: ↑Tue Aug 10, 2021 10:15 am
RedBedHed wrote: ↑Tue Aug 10, 2021 7:26 am Hi, Amanj!

Depth 20 sounds amazing! The first engine that I wrote (Before I started at ASU) could only search to depth eight... and it took up to 30 seconds lol. It used a purely object-oriented mailbox approach.

I think you are right... I've probably spent a little too much time on move generation... I just really messed it up last time, and I wanted to do better.

I totally fell down the rabbit hole this summer. I had a really good time though!

My ultimate goal is to write a search that uses a neural network to evaluate the board. I'm going to base my honors thesis on my findings. I doubt the engine will be able to search deeply or play well... But getting it to play at all would be wonderful. I have about three semesters and one summer left to research searches and neural networks. I'm really glad I found this forum.
You can choose any depth you like when inplementing LMR (Late move reductions). Even no problem to get depth 200. But that does not mean that it finds the best moves. Depth only giving an upperbound on maximum search depth.

For instance if I set LMR = 12 It may overlook some check mates. So I better assign it a lower value.

j.t. · Post by **j.t.** » Tue Aug 10, 2021 3:41 pm

RedBedHed wrote: ↑Tue Aug 10, 2021 2:57 am [...] compiling the code with g++, using the "-g" flag [...] -O3 [...]

You can use "-g" and "-O3" together. Adding "-g" often only makes a small performance difference. When I compile C/C++ I usually do this for development builds:

Code: Select all

g++ ./*.cpp -Wall -Wextra -Wpedantic -Wshadow -std=c++20 -O3 -fno-omit-frame-pointer -g

and this for release build:

Code: Select all

g++ ./*.cpp -Wall -Wextra -Wpedantic -Wshadow -std=c++20 -O3 -flto -DNDEBUG

"-O3" says to the compiler to try as much as possible to optimize the code.
"-flto" says to use link time optimization. This is optimization that can be done over different .cpp files. Though this may make debugging harder.
"-fno-omit-frame-pointer" apparently helps to debug sometimes. I am not sure if it helped me so far, but it doesn't hurt.
(-fomit-frame-pointer is somtimes on by default and there could be issues with debugging).
"-DNDEBUG" defines the macro "#define NDEBUG" which disables anything you test with "assert".
"-W..." useful warnings

"-flto" is more helpful if you use clang++. It is generally a good idea to try both compilers, clang++ and g++ and see what makes your code faster. For my engine for example clang++ results in 2,000,000 NPS while g++ only results in 1,600,000 NPS.

Advice for using Git: I would add ".idea/", all the cmake files and folder except "CMakeLists.txt", and all Makefiles generated by cmake to a ".gitignore" file. At best, files like these are just unnecessary/distracting and at worst they have some private information (e.g. details about your machine, or an embarrassing folder name, etc.) that you don't want public.

RedBedHed · Post by **RedBedHed** » Tue Aug 10, 2021 11:04 pm

Jost,

Woah! Thank you so much! I used assert() in many of my functions, and I considered the overhead to be necessary. But you are totally right! I don't need to assert at runtime if everything is working!

I tried out the flags you gave and the generator is doing perft(6) from the initial position in 0.547 seconds. That's more than 0.1 seconds faster! (originally 0.656-0.703 seconds)

Can't thank you enough!

edit: Just woke up. Need coffee in order to math lol.

Move generator advice

Re: Move generator advice

Re: Move generator advice

Re: Move generator advice

Re: Move generator advice

Re: Move generator advice

Re: Move generator advice

Re: Move generator advice

Re: Move generator advice

Re: Move generator advice

Re: Move generator advice