In recent years bitboards have been invaluable in chess engines, especially magic bitboards. For example, you can find all the passed pawns in a couple of lines of code. But in the era of NNUE, is this still important? Isn't the speed constraint of an engine now the code that updates the NN? And since there is little need for a fast hand-crafted evaluation, the main advantages of bitboards are now moot. Could it be that good old-fashioned mailbox board representations are as good as or better than bitboards for NNUE engines?
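To make the "couple of lines" claim concrete, here is a sketch of set-wise passed-pawn detection for White. It assumes a1 = bit 0 and h8 = bit 63, and treats a white pawn as passed when no black pawn stands in front of it on the same or an adjacent file. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;

constexpr Bitboard NOT_A = 0xFEFEFEFEFEFEFEFEULL;  // every file except a
constexpr Bitboard NOT_H = 0x7F7F7F7F7F7F7F7FULL;  // every file except h

// Smear every set bit down to rank 1 (a1 = bit 0, so "south" is >> 8).
constexpr Bitboard south_fill(Bitboard b) {
    b |= b >> 8;
    b |= b >> 16;
    b |= b >> 32;
    return b;
}

// White pawns with no black pawn ahead on the same or an adjacent file.
constexpr Bitboard white_passed_pawns(Bitboard white, Bitboard black) {
    Bitboard front = south_fill(black >> 8);                   // squares below each black pawn
    front |= ((front << 1) & NOT_A) | ((front >> 1) & NOT_H);  // widen to adjacent files
    return white & ~front;
}
```

Black's passers follow by mirroring the fill (smear White's pawns northwards). The two file masks matter: without them, a span on the h-file would wrap onto the a-file.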
Thoughts? Has anyone done any experiments?
Steve
Bitboards vs. Mailboxes in the Era of NNUE...
- Location: Florida, USA
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
- Full name: Daniel Infuehr
Re: Bitboards vs. Mailboxes in the Era of NNUE...
Exactly the other way. Bitboards are more important than ever!
I could list 50 algorithms and ideas that are only possible with Bitboards.
The movegen is much faster. For example, you can process all 8 pawns at once.
The memory footprint of a position is actually similar, but only with nibble boards (4 bits per square).
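To put the nibble-board remark in numbers: 64 squares × 4 bits = 256 bits = 32 bytes, the same as four 64-bit words. The sketch below packs a mailbox into 4 bits per square. The piece encoding (0 = empty, other values = piece codes up to 15) and the type name are hypothetical choices for illustration, not taken from any engine in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical nibble board: 64 squares x 4 bits = 32 bytes.
// Square s lives in word s / 16, at bit offset (s % 16) * 4.
struct NibbleBoard {
    std::array<std::uint64_t, 4> words{};  // all squares start empty (code 0)

    unsigned get(int sq) const {
        return static_cast<unsigned>((words[sq >> 4] >> ((sq & 15) * 4)) & 0xF);
    }

    void set(int sq, unsigned piece) {
        const int shift = (sq & 15) * 4;
        std::uint64_t& w = words[sq >> 4];
        w &= ~(std::uint64_t{0xF} << shift);          // clear the old nibble
        w |= (std::uint64_t{piece} & 0xF) << shift;   // write the new piece code
    }
};

static_assert(sizeof(NibbleBoard) == 32, "64 squares * 4 bits = 32 bytes");
```

For comparison, a set of 12 piece bitboards takes 96 bytes, so the nibble board is the mailbox layout that comes closest to a compact bitboard set in size.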
What you are alluding to is that the first layer of NNUE wants a 64×8-bit input. This is a limitation of NNUE, not a strength.
With Galois field instructions (GFNI) you can use a uint64 as 64 bits of input to a NN, skipping this expansion step.
This has also been possible with CUDA and XNOR/popcount networks for some time now.
The key point is that you have been able to go natively from bitboards to a neural network for some time now.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
- Full name: Arturs Priede
Re: Bitboards vs. Mailboxes in the Era of NNUE...
Could you provide a reference for this? I am currently handling each pawn individually. Shifting them all up or down by 8 sounds easy enough. But shifting them by 7 or 9 for captures seems trickier; I guess applying a bitmask that limits these shifts to the next rank would solve it. I guess additional care is also needed to split promotions into 4 separate moves? I guess double pushes could also be solved by applying a 4th- or 5th-rank mask...
- Location: Amsterdam
- Full name: H G Muller
Re: Bitboards vs. Mailboxes in the Era of NNUE...
Well-programmed mailbox has always been faster than bitboards. But with an evaluation as expensive as NNUE, this seems a moot point.
- Full name: Daniel Infuehr
Re: Bitboards vs. Mailboxes in the Era of NNUE...
Https://github.com/Gigantua/Gigantua/bl ... en.hpp#L33
You can copy-paste the pawn BB approach into a C++ benchmark of your choice and compare it with mailbox.
If that were true, my movegen would not be the fastest CPU move generator.
Also let's be careful with over-general claims. Performance depends on target architecture, compiler and overarching use case.
- Full name: Daniel Infuehr
Re: Bitboards vs. Mailboxes in the Era of NNUE...
I generally see 50 upsides (which are enough, believe me).
Move generation is much faster with BB. Yes, you have to loop over the resulting set, but 3 branchless instructions are just faster than 27 conditional checks for a queen. I know you see this in the context of an engine with pseudo-legal moves etc., but it's still faster.
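For the mailbox side of this comparison, here is a sketch of how a 0x88 mailbox generator walks queen rays. Every candidate square costs an off-board test plus an occupancy test, which is where the figure of 27 comes from: a queen on d4 on an otherwise empty board has 27 target squares. The 0x88 layout is one common mailbox scheme, chosen here for brevity; the board encoding (0 = empty, positive = white, negative = black) and the function name are assumptions.

```cpp
#include <array>
#include <cassert>
#include <vector>

// 0x88 mailbox: square = 16 * rank + file; (sq & 0x88) != 0 means off the board.
// Board contents (assumed encoding): 0 = empty, > 0 = white piece, < 0 = black piece.
using Board0x88 = std::array<int, 128>;

constexpr int QUEEN_DIRS[8] = {1, -1, 16, -16, 15, -15, 17, -17};

// Queen target squares for the side whose pieces have sign `side` (+1 or -1).
std::vector<int> queen_targets(const Board0x88& board, int from, int side) {
    std::vector<int> targets;
    for (int dir : QUEEN_DIRS) {
        for (int to = from + dir; (to & 0x88) == 0; to += dir) {  // off-board test
            const int piece = board[to];
            if (piece == 0) {              // empty: record it and keep sliding
                targets.push_back(to);
                continue;
            }
            if (piece * side < 0)          // enemy piece: record the capture
                targets.push_back(to);
            break;                         // any piece ends the ray
        }
    }
    return targets;
}
```

The off-board test also catches negative indices, because any value from -17 to -1 has bit 7 set. On the bitboard side, the same set would come from a sliding-attack lookup (magic or PEXT) ANDed with the complement of the own pieces.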
I see 2 non-trivial differences:
Zobrist (solved, not published)
NNUE (not solved, ideas to solve it are public)
Generally, I suspect a bfloat16 network should be much stronger than the AVX2 code used today. (duh!)
Galois field extensions or CUDA close that gap.
- Full name: Bill Beame
Re: Bitboards vs. Mailboxes in the Era of NNUE...
How about PEXT? It solved every issue I had with magic bitboards, with a nice increase in speed. My old mailbox engine had serious issues and probably wasn't bug-free, so I can't compare against mailboxes. I do, however, think my evaluation function is optimal under PEXT. I always appreciate your comments. Thoughts?
- Location: Amsterdam
- Full name: H G Muller
Re: Bitboards vs. Mailboxes in the Era of NNUE...
dangi12012 wrote (Mon Sep 26, 2022 10:38 pm): "If that were true then my movegen would not be the fastest CPU generator. Also let's be careful with over-general claims. Performance depends on target architecture, compiler and overarching use case."
The problem with that move generator is exactly what it says: it is a move generator. That is nice for perft, but the speed of chess engines is determined by a capture generator, plus how fast you can sort the captures, or at least extract the one with the highest sort key.
Another issue is that move generation in a context where nothing else is done (so it can use all CPU resources) is not really relevant for an engine, where many other tasks are essential too, so you cannot afford to have the game state occupy all registers all the time.
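One way to "extract the one with the highest sort key" without a full sort is an incremental selection step: before searching the next capture, scan the remaining ones and swap the best to the front. This sketch uses MVV/LVA keys (most valuable victim first, cheapest attacker as the tie-break); the piece values, the `Capture` struct and the function names are assumptions for illustration, not a scheme described in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Piece values indexed by type: 0 = pawn, 1 = knight, 2 = bishop,
// 3 = rook, 4 = queen, 5 = king (assumed ordering and values).
constexpr int VALUE[6] = {1, 3, 3, 5, 9, 100};

struct Capture {
    int from, to;
    int attacker, victim;  // piece types 0..5
};

// MVV/LVA: the victim dominates (x128); among equal victims the cheaper attacker wins.
int mvv_lva_key(const Capture& c) {
    return VALUE[c.victim] * 128 - VALUE[c.attacker];
}

// Move the highest-keyed capture among moves[next..] to position `next`.
// Called once per capture actually searched, so a beta cutoff after the
// first capture costs one linear scan instead of a full sort.
void pick_best(std::vector<Capture>& moves, std::size_t next) {
    std::size_t best = next;
    for (std::size_t i = next + 1; i < moves.size(); ++i)
        if (mvv_lva_key(moves[i]) > mvv_lva_key(moves[best])) best = i;
    std::swap(moves[next], moves[best]);
}
```

The selection step works the same whether the captures came from a bitboard or a mailbox generator; only the cost of producing the list differs.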
- Location: Almere, The Netherlands
Re: Bitboards vs. Mailboxes in the Era of NNUE...
hgm wrote (Tue Sep 27, 2022 7:36 am): "The problem with that move generator is exactly what it says: it is a move generator. That is nice for perft, but the speed of chess engines is determined by a capture generator. Plus how fast you can sort the captures, or at least extract the one with the highest sort key."
Is a capture not a move, then?
The speed of a chess engine is mainly determined by the speed of the evaluation function, not by generating and sorting the moves. With NNUE evaluation, it won't matter much whether you go for the superior bitboard representation or for mailboxes.
- Full name: Daniel Infuehr
Re: Bitboards vs. Mailboxes in the Era of NNUE...
hgm wrote (Tue Sep 27, 2022 7:36 am): "The problem with that move generator is exactly what it says: it is a move generator. That is nice for perft, but the speed of chess engines is determined by a capture generator. Plus how fast you can sort the captures, or at least extract the one with the highest sort key. Another issue is that move generation in a context where nothing else is done (so it can use all CPU resources) is not really relevant for an engine, where many other tasks are essential too. So that you cannot afford to have the game state occupy all registers all the time."
That's what I mean: the capture generator is much faster with bitboards. Captures are just Visibility & Enemy (the attacked squares ANDed with the enemy pieces).
With a mailbox you would need an if for every possible target square.
Fewer registers and less memory than 64 slots, together with faster code: everything you mentioned above is actually a point for BB.