compact bitboard move generator

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

compact bitboard move generator

Post by bob »

Last week we discussed the idea of eliminating replicated code completely from the Crafty move generator, and I mentioned that I had plans to test an array of function pointers approach. Here's what I tried:

Code: Select all

BITBOARD KnightMoves(TREE *tree, int from, BITBOARD target) {
  return (knight_attacks[from] & target);
}
BITBOARD BishopMoves(TREE *tree, int from, BITBOARD target) {
  return (AttacksBishop(from, OccupiedSquares) & target);
}
BITBOARD RookMoves(TREE *tree, int from, BITBOARD target) {
  return (AttacksRook(from, OccupiedSquares) & target);
}
BITBOARD QueenMoves(TREE *tree, int from, BITBOARD target) {
  return (AttacksQueen(from, OccupiedSquares) & target);
}
BITBOARD KingMoves(TREE *tree, int from, BITBOARD target) {
  return (king_attacks[from] & target);
}
Those five functions just return a bitboard set of <to> squares for a piece on from, with a requirement that the to squares must be in the set <target> as well.

I then declared this array of function pointers:

BITBOARD (*Generator[7]) (TREE*, int, BITBOARD) =
{ NULL, NULL, KnightMoves, BishopMoves, RookMoves, QueenMoves, KingMoves };

and used this for the compact move generator:

Code: Select all

  target = x;  /* Occupied&#91;btm&#93; to generate captures of opponent pieces for example */
  for &#40;ptype = knight; ptype <= king; ptype++) &#123;
    piecebd = Pieces&#40;wtm, ptype&#41;;
    while &#40;piecebd&#41; &#123;
      from = Advanced&#40;wtm, piecebd&#41;; 
      moves = Generator&#91;ptype&#91;&#40;tree, from, target&#41;;
      temp = from + &#40;ptype << 12&#41;;
      Unpack&#40;wtm, move, moves, temp&#41;;
      Clear&#40;from, piecebd&#41;;
    &#125;
  &#125;
Much shorter and cleaner, but, as I suspected 5-10% slower to boot, because now the 5 functions above can't be inlned because they really are function calls because of the array of pointers to functions used.

I then tried replacing the array of functions by a switch that generated each piece type separately, rather than using the Generator() function pointer, but that was not much better.

So the explicitly unrolled form I am currently using is going to be the norm for a while, it seems, as the performance hit is too much, just to obtain more concise code...

BTW, CCC is slow as the devil today, often taking 2-3-4 minutes to refresh a page...
Allard Siemelink
Posts: 297
Joined: Fri Jun 30, 2006 9:30 pm
Location: Netherlands

Re: compact bitboard move generator

Post by Allard Siemelink »

Your code should go faster if you can get rid of the *tree and target parameters that do not seem to be doing anything useful...
e.g:

Code: Select all

BITBOARD KnightMoves&#40;int from&#41; &#123; 
  return knight_attacks&#91;from&#93;; 
&#125; 

...
moves = Generator&#91;ptype&#93;&#40;from&#41; & target;
...
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: compact bitboard move generator

Post by wgarvin »

Allard Siemelink wrote:Your code should go faster if you can get rid of the *tree and target parameters that do not seem to be doing anything useful...
e.g:

Code: Select all

BITBOARD KnightMoves&#40;int from&#41; &#123; 
  return knight_attacks&#91;from&#93;; 
&#125; 

...
moves = Generator&#91;ptype&#93;&#40;from&#41; & target;
...
The difference probably wouldn't be very big. The main slowdown probably comes from the dynamic dispatch or branch, and not being able to inline that code. Since this is a very tiny function (2-3 instructions?) the prologue and epilogue and pushing parameters and calling it, end up costing much more than the body of the function does, which would still be the case even if you got rid of a parameter or two.

Better to just keep the original code, even if its a bit more verbose.
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: compact bitboard move generator

Post by Zach Wegner »

I use the switch method, as I'd rather sacrifice the few % time for clearer code. Try using occupied & ~pawns rather than looping through piece types. I think this might be a bit faster, because I have it that way in my code now.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: compact bitboard move generator

Post by bob »

Allard Siemelink wrote:Your code should go faster if you can get rid of the *tree and target parameters that do not seem to be doing anything useful...
e.g:

Code: Select all

BITBOARD KnightMoves&#40;int from&#41; &#123; 
  return knight_attacks&#91;from&#93;; 
&#125; 

...
moves = Generator&#91;ptype&#93;&#40;from&#41; & target;
...
they are needed for the others. Things like RookAttacks() need the *tree pointer. The target is needed for all of them... It really doesn't matter whether the target is passed in, or used on the result. I tried it both ways and it made no difference at all, the cost appears to be in the procedure calls, rather than in anything done in the procedures. Remember I removed the procedure calls completely, which helped some, but then the switch() overhead was an issue...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: compact bitboard move generator

Post by bob »

Zach Wegner wrote:I use the switch method, as I'd rather sacrifice the few % time for clearer code. Try using occupied & ~pawns rather than looping through piece types. I think this might be a bit faster, because I have it that way in my code now.
I had actually thought of that already, but the test I ran had pieces of all types on the board and it was still slower than the separate segments for each piece type that I currently have...
rreagan
Posts: 102
Joined: Sun Sep 09, 2007 6:32 am

Re: compact bitboard move generator

Post by rreagan »

This is what I've done (from memory). It's not quite as compact (plus I'm a little more long winded with my names), but it removes the bulk of the repetition. It's C++, but you could probably achieve the same with macros.

Code: Select all

template <class ATTACK_FUNCTION>
void gen_moves &#40;BITBOARD source_pieces, BITBOARD destination_mask, int move_type, MOVE_LIST & move_list&#41; &#123;
  while &#40;source_pieces&#41; &#123;
    source_square = first_one&#40;source_pieces&#41;;
    clear_bit&#40;source_pieces, source_square&#41;;
    moves = ATTACK_FUNCTION&#40;source_square&#41; & destination_mask;
    while &#40;moves&#41; &#123;
      destination_square = first_one&#40;moves&#41;;
      clear_bit&#40;moves, destination_square&#41;;
      move_list.add_move&#40;MOVE&#40;source_square, destination_square, move_type&#41;);
    &#125;
  &#125;
&#125;

void generate_moves &#40;MOVE_LIST & move_list&#41; &#123;
  gen_moves<knight_attacks>&#40;friendly_knights, enemy_pieces, CAPTURE, move_list&#41;;
  gen_moves<knight_attacks>&#40;friendly_knights, empty_squares, NONCAPTURE, move_list&#41;;
  gen_moves<bishop_attacks>&#40;friendly_bishops | friendly_queens, enemy_pieces , move_list&#41;;
// and so on...
&#125;
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: compact bitboard move generator

Post by Aleks Peshkov »

Using magic bitboards method, you need only 3 functions:
1) kings, knights, pawns with an argument pointer to attack table.
2) bishops, rooks with a pointer to uniform magic data structure.
3) queen case, one may unroll it to separate rooks+bishops, but I do not.

3-way switch or indirect call have better prediction rate.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: compact bitboard move generator

Post by bob »

Aleks Peshkov wrote:Using magic bitboards method, you need only 3 functions:
1) kings, knights, pawns with an argument pointer to attack table.
2) bishops, rooks with a pointer to uniform magic data structure.
3) queen case, one may unroll it to separate rooks+bishops, but I do not.

3-way switch or indirect call have better prediction rate.
The prediction rate is not the issue, until you eliminate the branch completely. The main problem is the cache footprint causes different pieces of that generator function to be used/skipped each pass through the thing, which simply increases the cache miss rate unacceptably. The indirect function call was too expensive because of lack of inlining, the switch has two extra branches to test for the proper range of the switch variable, all of which drive up the cost...

I'm still playing with options, but so far the "unrolled" approach I currently use has been the fastest.