Writing the fastest move generator. Up to 4BNodes/s

Discussion of chess software programming and technical issues.

Moderator: Ras

Aleks Peshkov
Posts: 917
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by Aleks Peshkov »

Steve Maughan wrote: Tue Sep 16, 2025 6:12 pm
syzygy wrote: Tue Sep 16, 2025 5:11 pm
Steve Maughan wrote: Mon Sep 15, 2025 5:16 am
chessbit wrote: Sun Sep 14, 2025 5:59 pm<snip>Unfortunately my chess knowledge is too bad to undertake this kind of project. I don't think I would be able to do more than 1500 elo.<snip>
A chess engine with a basic search at this speed (I'm assuming it would slow down to <100 mnps) with Pesto piece-square evaluation would be easily north of 2300 ELO, and more likely closer to 2600 ELO.
Not if you are stuck with a fixed move order because you don't generate lists of moves.
Which number are you skeptical of, the 2300 or the 2600? You could easily split the move generation into captures then quiets. Add a basic piece square table on top of an alpha beta search and I don't think the nps world drop to less than 100 Mnps (I could be wrong). That should be 2300 ELO. Juggernaut 1.01 is basically this with a TT. It does 10 Mnps and has an ELO of 2150.

Steve
100M nodes (not perft numbers) per second is not possible on consumer one thread CPU. 10M x2 sure doable. x3? I doubt.
User avatar
Steve Maughan
Posts: 1295
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by Steve Maughan »

syzygy wrote: Tue Sep 16, 2025 10:18 pm
Steve Maughan wrote: Tue Sep 16, 2025 6:12 pm
syzygy wrote: Tue Sep 16, 2025 5:11 pm
Steve Maughan wrote: Mon Sep 15, 2025 5:16 am
chessbit wrote: Sun Sep 14, 2025 5:59 pm<snip>Unfortunately my chess knowledge is too bad to undertake this kind of project. I don't think I would be able to do more than 1500 elo.<snip>
A chess engine with a basic search at this speed (I'm assuming it would slow down to <100 mnps) with Pesto piece-square evaluation would be easily north of 2300 ELO, and more likely closer to 2600 ELO.
Not if you are stuck with a fixed move order because you don't generate lists of moves.
Which number are you skeptical of, the 2300 or the 2600? You could easily split the move generation into captures then quiets. Add a basic piece square table on top of an alpha beta search and I don't think the nps world drop to less than 100 Mnps (I could be wrong). That should be 2300 ELO. Juggernaut 1.01 is basically this with a TT. It does 10 Mnps and has an ELO of 2150.
I am skeptical of the suitability of the move generator for use in a chess engine.
A program optimised for perft will not generate captures first, for example. And once you have a decent search, most moves should not ever be looked at. There is no relationship between 4BNodes/sec and Elo.
It would be easy to split a move generator into captures and quiet moves. Strength is generally correlated with speed, so I respectfully disagree that there's no relationship. Maybe one day I'll have the time to dig deeper.

— Steve
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
syzygy
Posts: 5714
Joined: Tue Feb 28, 2012 11:56 pm

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by syzygy »

chessbit wrote: Tue Sep 16, 2025 10:05 pm For those interested, I have released the code on my dusty github: https://github.com/thuijbregts/chessbit
There are some notes about the implementation, and also a binary if you want to try it yourself. I would be interested to know how it performs on your machine!
I needed to add

Code: Select all

#define __popcnt64(x) _mm_popcnt_u64(X)
and

Code: Select all

#define __forceinline __attribute__((always_inline))
I also needed to add -march=native to the compilation flag. (I just type "gcc -O3 *.cpp -o main -march=native".)

With a typo in "always_inline" (which made gcc simply ignore the attribute), I got:

Code: Select all

perft 6
Depth:		6
Nodes:		119060324
Time:		78 ms
Average:	1512.49 Mn/s
perft 7
Depth:		7
Nodes:		3195901860
Time:		1898 ms
Average:	1683.04 Mn/s
perft 8
Depth:		8
Nodes:		84998978956
Time:		50675 ms
Average:	1677.33 Mn/s
I then corrected the typo. Compilation failed because of (very many) failures to inline enumMoves().

I will try with clang.
syzygy
Posts: 5714
Joined: Tue Feb 28, 2012 11:56 pm

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by syzygy »

With clang++ compilation works, but it is not that much faster:

Code: Select all

perft 6
Depth:		6
Nodes:		119060324
Time:		81 ms
Average:	1465.79 Mn/s
perft 7
Depth:		7
Nodes:		3195901860
Time:		1511 ms
Average:	2114.58 Mn/s
perft 8
Depth:		8
Nodes:		84998978956
Time:		49892 ms
Average:	1703.65 Mn/s
syzygy
Posts: 5714
Joined: Tue Feb 28, 2012 11:56 pm

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by syzygy »

Steve Maughan wrote: Tue Sep 16, 2025 10:29 pmIt would be easy to split a move generator into captures and quiet moves. Strength is generally correlated with speed, so I respectfully disagree that there's no relationship. Maybe one day I'll have the time to dig deeper.
Why do you think that would be easy?
syzygy
Posts: 5714
Joined: Tue Feb 28, 2012 11:56 pm

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by syzygy »

Aleks Peshkov wrote: Tue Sep 16, 2025 10:19 pm 100M nodes (not perft numbers) per second is not possible on consumer one thread CPU. 10M x2 sure doable. x3? I doubt.
To maximize nps, I recommend pure minimax ;)
Aleks Peshkov
Posts: 917
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by Aleks Peshkov »

chessbit wrote: Tue Sep 16, 2025 10:05 pm For those interested, I have released the code on my dusty github: https://github.com/thuijbregts/chessbit
There are some notes about the implementation, and also a binary if you want to try it yourself. I would be interested to know how it performs on your machine!
Did not found any secret that any other bitboard legal generator on the planet does not do. BTW Stockfish goes fine with pseudolegal generator. But the code looks nice and pleasant to see. Well done!
Aleks Peshkov
Posts: 917
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by Aleks Peshkov »

syzygy wrote: Tue Sep 16, 2025 5:11 pmNot if you are stuck with a fixed move order because you don't generate lists of moves.
Tactical (based on the current position) move by move generation rather then ordered by statistical noise is forgotten path in chess programming. I follow it so far.
abulmo2
Posts: 474
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by abulmo2 »

Steve Maughan wrote: Mon Sep 15, 2025 5:16 am
chessbit wrote: Sun Sep 14, 2025 5:59 pm<snip>Unfortunately my chess knowledge is too bad to undertake this kind of project. I don't think I would be able to do more than 1500 elo.<snip>
A chess engine with a basic search at this speed (I'm assuming it would slow down to <100 mnps) with Pesto piece-square evaluation would be easily north of 2300 ELO, and more likely closer to 2600 ELO.

— Steve
Looking at the code, I do not think the move generator is that much faster than the one usually used in modern engines.
Five years ago, I made a simple perft based on magic move generation: https://github.com/abulmo/MPerft, using the pext instruction if available. On my computer, using bulk counting (as chessbit does), I have got about 800 Mnps from the starting position :

Code: Select all

perft  7 :      3195901860 leaves in      3.979 s    803109301 leaves/s
Without bulk counting, which is closer to what we can get with a chess engine, I have got only about 70 Mnps:

Code: Select all

perft  7 :      3195901860 leaves in     47.741 s     66942888 leaves/s
On the same computer & OS, chessbit, compiled with clang++ with the modifications proposed by Ronald de Man (Syzygy), reaches about 1600 Mnps ;

Code: Select all

perft 7 
Depth:		7
Nodes:		3195901860
Time:		1975 ms
Average:	1617.49 Mn/s
So, on my computer chessbit is only 2x faster than mperft.
The trick that makes chessbit faster is CTFE (compile time function evaluation) abusing metaprogramming with constexpr & templates. The recursive call to perft and the move generation are intricate and the compiler eliminates all the recursive calls up to depth 18 (the upper limit in chessbit) by slowly building a huge executable. On my computer chessbit executable size is 12.8 Mb versus 83kb for mperft.
Mperft is basically what I am using in the chess engine Dumb https://github.com/abulmo/Dumb, changing the language from C to D, and optimising it for a chess engine by using a pseudolegal move generator, no bulk counting, etc. The speed of perft drops to 49 Mnps. Yes, the optimisations that make a chess engine faster slow down perft.
Dumb is (and will stay) a simple engine with a pesto-like evaluation function, and some simple search pruning and move sorting. Its speed is only 7 Mnps but its strength on CCRL is around 2850 Elo. By removing SEE (static exchange evaluation), the speed should increase over 10 Mnps, but the Elo should go down. And, as suggested by Ronald, using only minimax should increase the speed up to 20 Mnps.

So I do not think chessbit is of interest in building a chess engine. And a speed of 100 Mnps on a single core is surely an illusion on current hardware, even for a simple engine.
Richard Delorme
chessbit
Posts: 26
Joined: Fri Dec 29, 2023 4:47 pm
Location: Belgium
Full name: thomas albert

Re: Writing the fastest move generator. Up to 4BNodes/s

Post by chessbit »

abulmo2 wrote: Wed Sep 17, 2025 11:55 am
Steve Maughan wrote: Mon Sep 15, 2025 5:16 am
chessbit wrote: Sun Sep 14, 2025 5:59 pm<snip>Unfortunately my chess knowledge is too bad to undertake this kind of project. I don't think I would be able to do more than 1500 elo.<snip>
A chess engine with a basic search at this speed (I'm assuming it would slow down to <100 mnps) with Pesto piece-square evaluation would be easily north of 2300 ELO, and more likely closer to 2600 ELO.

— Steve
Looking at the code, I do not think the move generator is that much faster than the one usually used in modern engines.
Five years ago, I made a simple perft based on magic move generation: https://github.com/abulmo/MPerft, using the pext instruction if available. On my computer, using bulk counting (as chessbit does), I have got about 800 Mnps from the starting position :

Code: Select all

perft  7 :      3195901860 leaves in      3.979 s    803109301 leaves/s
Without bulk counting, which is closer to what we can get with a chess engine, I have got only about 70 Mnps:

Code: Select all

perft  7 :      3195901860 leaves in     47.741 s     66942888 leaves/s
On the same computer & OS, chessbit, compiled with clang++ with the modifications proposed by Ronald de Man (Syzygy), reaches about 1600 Mnps ;

Code: Select all

perft 7 
Depth:		7
Nodes:		3195901860
Time:		1975 ms
Average:	1617.49 Mn/s
So, on my computer chessbit is only 2x faster than mperft.
The trick that makes chessbit faster is CTFE (compile time function evaluation) abusing metaprogramming with constexpr & templates. The recursive call to perft and the move generation are intricate and the compiler eliminates all the recursive calls up to depth 18 (the upper limit in chessbit) by slowly building a huge executable. On my computer chessbit executable size is 12.8 Mb versus 83kb for mperft.
Mperft is basically what I am using in the chess engine Dumb https://github.com/abulmo/Dumb, changing the language from C to D, and optimising it for a chess engine by using a pseudolegal move generator, no bulk counting, etc. The speed of perft drops to 49 Mnps. Yes, the optimisations that make a chess engine faster slow down perft.
Dumb is (and will stay) a simple engine with a pesto-like evaluation function, and some simple search pruning and move sorting. Its speed is only 7 Mnps but its strength on CCRL is around 2850 Elo. By removing SEE (static exchange evaluation), the speed should increase over 10 Mnps, but the Elo should go down. And, as suggested by Ronald, using only minimax should increase the speed up to 20 Mnps.

So I do not think chessbit is of interest in building a chess engine. And a speed of 100 Mnps on a single core is surely an illusion on current hardware, even for a simple engine.
Could you try the executable available on github? The code was optimized with the Intel compiler from the start, clang and other compilers yield (much) worse results. I think it would be fairer to compare different engines with the best possible compilation results. For instance, I don't know if your engine would be better optimized with the Intel compiler.

The thread is deviating into chess engine discussions, but this was not the intended goal of this project. I only wrote this code with a perft engine in mind. That being said, I think it would not be too difficult to change the existing code for the purpose of a chess engine. I might implement one just to prove it but I need a break :).