Bitboard Tricks for Large Chess Variants

Ed Trice · Post by **Ed Trice** » Sat Nov 01, 2014 3:20 am

After seeing how my old code for a 10x8 variant is doing against some new players in the field, I think it is time for a complete rewrite.

Gothic Vortex (white) lost to Bihasa (black) from this position, which is well on the path to its demise already.

Vortex is a 10x8 port of much older Crafty code, going back to the days of its 45 degree rotated bitboards for Bishop sliding moves. Then I started thinking about my own 64-bit move generator for the game of checkers, which is set up like this:

Those who are familiar with checkers may wonder what good are 64-bits on a game that uses only 32 squares? The answer is: you get a much simpler move generator using 64-bits on the squares show.

Every checker move to the top-right is a shift of 9 bits. Every such shift always results in a legal destination. If there is not bit specified, such as 44 + 9 = 53, then that is "off the board." Likewise, moves to the top-left are upward shifts of 1-bit. If you numbered the squares 1-32 instead, you have to check odd/even ranks, and do different bit masks depending on where you are. It adds branch points to the code, as well as some complexity. I believe my entire move generator was something like 39 lines of code.

Anyway, I was wondering if those of you with 10x8 variant move generators have any pseudo-magic squares for use with your bitboards, so you don't have to resort to rotated bitboards or other contortionist moves? I was just playing around with a Knight Bitboard for 10x8:

This is a 107-bit ugly looking thing, but with operator overloading (making things like >> work seamlessly) your code could look very clean.

Every knight hop up-2-and-over-1 is an upward shift of 13 bits. Likewise, where there is no number specified, the bit falls off of the destination bitboard via a simple stamping mask.

Other offset moves of the Knight are shown.

Do any of you bother to do something like this for larger boards? Or is it a waste of time, and you just use Mailbox offsets in arrays?

Evert · Post by **Evert** » Sat Nov 01, 2014 7:16 am

I have a multi-variant bitboard engine called "Sjaak", which uses 128 bit integers for boards larger than 8x8. The original is in C and duplicates the code path for 64 and 128 bits, which is rather ugly. The development version is largely a re-write that uses C++ templates to hide the details.

There are some disadvantages to 128 bit integers though: they're obviously large, and they're slow unless you have native support for them.

For move generation I use three different standard bitboard techniques. Pawn moves are generated in parallel by shifting the entire bitboard up (down) one rank. For leaper moves I use a lookup table, indexed by square, to identify possible destination squares. Sliders use what the wiki calls "kindergarten" bitboards, which are basically lookup tables indexed by the occupancy bitfield along a ray. I want mine as compact as possible, so I share the rank-attack tables with the diagonal (bishop) tables, but I still use the full occupancy mask (10 or 8 bit, for rank attacks or file attacks) for simplicity. You can knock that down to 8 and 6 bits because the occupancy on the outside squares doesn't matter. Come to think of it, it may be worthwhile for me to do that too.

All of this works just fine, but one thing to bear in mind is this: I use bitboards partially because I like the technical challenge of writing an arbitrary boardsize engine using them. I don't know if they're superior to other approaches in any way, and actually suspect that they probably aren't for large boards.

hgm · Post by **hgm** » Sat Nov 01, 2014 7:57 am

Note that whether you use bitboards or not, and what flavor exactly, will have almost zero impact on "how your engine is doing". Even if you could speed up your engine by a factor 2 (which is very doubtful), it would only buy you 50-70 Elo. You can already test how it would do without doing any work, just play it with time odds where you give it twice as much time as the opponent. Chances are you would still not be very happy about the performance.

Engine strength comes mainly from search selectivity, and only for a very small part from raw speed (nps).

Evert · Post by **Evert** » Sat Nov 01, 2014 8:05 am

hgm wrote:Note that whether you use bitboards or not, and what flavor exactly, will have almost zero impact on "how your engine is doing". Even if you could speed up your engine by a factor 2 (which is very doubtful), it would only buy you 50-70 Elo. You can already test how it would do without doing any work, just play it with time odds where you give it twice as much time as the opponent. Chances are you would still not be very happy about the performance.

Engine strength comes mainly from search selectivity, and only for a very small part from raw speed (nps).

+1

Which is part of the reason I don't worry so much about it anyway. Most of the time in Sjaak is spent in the evaluation function, which is called at every node. Limiting the number of eval calls is the best way to speed it up, which is what a more selective search will accomplish in the end.

EDIT: I should point out that Sjaak takes a much larger hit when it runs in 32 bit mode than other engines because it needs to emulate the 128 bit integers, which is much more expensive. Modern 64 bit machines can do 128 bit integers much faster. Even so, this doesn't set it back more than 100 elo or so, which is clearly not what separates it from the likes of Stockfish.

SMIRF · Post by **SMIRF** » Sat Nov 01, 2014 8:57 am

Being surrounded by a relevant amount of copy cats it seems to be necessary to focus on individually found new interesting approaches. Thus for me it is not that important, that a new patchworked engine reaches the elo tops of racing camels. Instead any genuine approach is worth discussing and appreciation.

Ed Trice · Post by **Ed Trice** » Sat Nov 01, 2014 2:22 pm

hgm wrote:Note that whether you use bitboards or not, and what flavor exactly, will have almost zero impact on "how your engine is doing".

I am aware, branching factor is the real killer. This past summer, one of my co-workers bought me a 4x4x4 Rubik's Revenge cube. I had first solved one back in 1984, and was frustrated that I could no longer remember how to finish off the last layer. So, I wrote a brute force solver for it:

The branching factor is 36, and I was unaware of any pruning tricks when I first started. So, I decided to add "endgame databases" to it by storing every position that was 1-turn from being solved , 2-turns from being solved, etc., up to 6-tfs. This was not a branch pruning mechanism, this was a TRUNK pruning mechanism. Chopping 6-ply from a tree that grows at 36^n is substantial, but still was not good enough to get deep enough searches to solve some positions requiring 15 moves.

Eventually I added some leaf-node pruning based on move sequences that would just "undo" previous ply's moves, or just ended up rotating the cube in place without getting closer to a solution, and things of this nature.

But, back on topic...

I have always had a programming interest in four areas:

1. Making the fastest possible move generator
2. Using bitboards to try and achieve the above
3. Making a move generator with the fewest possible instructions
4. Using endgame databases/tablebases if applicable!

For now, I am starting over with Gothic Vortex, and focusing on item #1.

Rein Halbersma · Post by **Rein Halbersma** » Sat Nov 01, 2014 10:23 pm

Evert wrote:I have a multi-variant bitboard engine called "Sjaak", which uses 128 bit integers for boards larger than 8x8. The original is in C and duplicates the code path for 64 and 128 bits, which is rather ugly. The development version is largely a re-write that uses C++ templates to hide the details.

Yes, with C++ templates and operator overloading, you can have native syntax (&, ^, |, <<, >>) for user-defined types. I have my own modified bitset class that does the same, and which reduces to a class wrapper around a uin64_t variable for small boards. Do you have a link to the development code of Sjaak?

There are some disadvantages to 128 bit integers though: they're obviously large, and they're slow unless you have native support for them.

I looked at the stable Sjaak version and the 128-bit code looks very handy. I adapted it to my own library but unfortunately, Clang/libc++ does not fully support 128-bit integers (missing traits and type_info), at least not for my out-of-date version, and I cannot use gcc/libstdc++ because gcc doesn't support the C++14 constexpr feature that my library uses all over the place.

For move generation I use three different standard bitboard techniques. Pawn moves are generated in parallel by shifting the entire bitboard up (down) one rank. For leaper moves I use a lookup table, indexed by square, to identify possible destination squares. Sliders use what the wiki calls "kindergarten" bitboards, which are basically lookup tables indexed by the occupancy bitfield along a ray. I want mine as compact as possible, so I share the rank-attack tables with the diagonal (bishop) tables, but I still use the full occupancy mask (10 or 8 bit, for rank attacks or file attacks) for simplicity. You can knock that down to 8 and 6 bits because the occupancy on the outside squares doesn't matter. Come to think of it, it may be worthwhile for me to do that too.

For my draughts library (also supporting arbitrary boards), I use very low-level techniques: just plain old shifting of an index along a bitboard (CPW "Dumb7fill") and also the old-fashioned ray-wise table lookup (CPW "Classical approach"), and they are about the same speed. I never tried my hand at magics for anything beyond 64-bit boards, but your 128-bit tricks look interesting.

One disadvantage of lookup tables over plain looping is that for N * N boards, tables scale as O(N^2) and looping only as O(N). For a chessboard, looping on an empty board only takes 3.5 iterations on average, and even less on densely occupied boards. A few iterations without any cache-polluting tables might not be a bad tradeoff (at least it isn't in my program).

All of this works just fine, but one thing to bear in mind is this: I use bitboards partially because I like the technical challenge of writing an arbitrary boardsize engine using them. I don't know if they're superior to other approaches in any way, and actually suspect that they probably aren't for large boards.

I use bitboards for the data-parallelism you get from computing patterns during evaluation. I have also defined some templates that mimic direction-wise iterators over the board that allow me to loop over a bitboard as if it were a regular array. I could even try to map the 12*10 mailbox coordinates to a 120-bit bitboard so that all edge-testing disappears at the cost of doubling the bitboards.

Evert · Post by **Evert** » Sun Nov 02, 2014 9:50 am

Rein Halbersma wrote: Yes, with C++ templates and operator overloading, you can have native syntax (&, ^, |, <<, >>) for user-defined types. I have my own modified bitset class that does the same, and which reduces to a class wrapper around a uin64_t variable for small boards. Do you have a link to the development code of Sjaak?

Not yet, because the repository isn't on a public server.

What I did is define a bitboard template class over an arbitrary type, which just reduces to just plain integer manipulations for simple integer types, but I can specialise it for other types (say an array to represent larger boards than could be handled with any simple integer type), and actually do for 128 bit integers (using the same code as in the last release version of Sjaak).

I looked at the stable Sjaak version and the 128-bit code looks very handy. I adapted it to my own library but unfortunately, Clang/libc++ does not fully support 128-bit integers (missing traits and type_info), at least not for my out-of-date version, and I cannot use gcc/libstdc++ because gcc doesn't support the C++14 constexpr feature that my library uses all over the place.

Interesting. I have Clang 4.1 and it all seems to work here.

One disadvantage of lookup tables over plain looping is that for N * N boards, tables scale as O(N^2) and looping only as O(N). For a chessboard, looping on an empty board only takes 3.5 iterations on average, and even less on densely occupied boards. A few iterations without any cache-polluting tables might not be a bad tradeoff (at least it isn't in my program).

For sure. However, changing my move generator to work that way would not be entirely transparent, so it's not something I can experiment with easily...

Bitboard Tricks for Large Chess Variants

Bitboard Tricks for Large Chess Variants

Re: Bitboard Tricks for Large Chess Variants

Re: Bitboard Tricks for Large Chess Variants

Re: Bitboard Tricks for Large Chess Variants

Re: Bitboard Tricks for Large Chess Variants

Re: Bitboard Tricks for Large Chess Variants

Re: Bitboard Tricks for Large Chess Variants

Re: Bitboard Tricks for Large Chess Variants