compact bitboard move generator

bob · Post by **bob** » Mon Feb 25, 2008 10:26 pm

Last week we discussed the idea of eliminating replicated code completely from the Crafty move generator, and I mentioned that I had plans to test an array of function pointers approach. Here's what I tried:

Code: Select all

BITBOARD KnightMoves&#40;TREE *tree, int from, BITBOARD target&#41; &#123;
  return &#40;knight_attacks&#91;from&#93; & target&#41;;
&#125;
BITBOARD BishopMoves&#40;TREE *tree, int from, BITBOARD target&#41; &#123;
  return &#40;AttacksBishop&#40;from, OccupiedSquares&#41; & target&#41;;
&#125;
BITBOARD RookMoves&#40;TREE *tree, int from, BITBOARD target&#41; &#123;
  return &#40;AttacksRook&#40;from, OccupiedSquares&#41; & target&#41;;
&#125;
BITBOARD QueenMoves&#40;TREE *tree, int from, BITBOARD target&#41; &#123;
  return &#40;AttacksQueen&#40;from, OccupiedSquares&#41; & target&#41;;
&#125;
BITBOARD KingMoves&#40;TREE *tree, int from, BITBOARD target&#41; &#123;
  return &#40;king_attacks&#91;from&#93; & target&#41;;
&#125;

Those five functions just return a bitboard set of <to> squares for a piece on from, with a requirement that the to squares must be in the set <target> as well.

I then declared this array of function pointers:

BITBOARD (*Generator[7]) (TREE*, int, BITBOARD) =
{ NULL, NULL, KnightMoves, BishopMoves, RookMoves, QueenMoves, KingMoves };

and used this for the compact move generator:

Code: Select all

  target = x;  /* Occupied&#91;btm&#93; to generate captures of opponent pieces for example */
  for &#40;ptype = knight; ptype <= king; ptype++) &#123;
    piecebd = Pieces&#40;wtm, ptype&#41;;
    while &#40;piecebd&#41; &#123;
      from = Advanced&#40;wtm, piecebd&#41;; 
      moves = Generator&#91;ptype&#91;&#40;tree, from, target&#41;;
      temp = from + &#40;ptype << 12&#41;;
      Unpack&#40;wtm, move, moves, temp&#41;;
      Clear&#40;from, piecebd&#41;;
    &#125;
  &#125;

Much shorter and cleaner, but, as I suspected 5-10% slower to boot, because now the 5 functions above can't be inlned because they really are function calls because of the array of pointers to functions used.

I then tried replacing the array of functions by a switch that generated each piece type separately, rather than using the Generator() function pointer, but that was not much better.

So the explicitly unrolled form I am currently using is going to be the norm for a while, it seems, as the performance hit is too much, just to obtain more concise code...

BTW, CCC is slow as the devil today, often taking 2-3-4 minutes to refresh a page...

Allard Siemelink · Post by **Allard Siemelink** » Tue Feb 26, 2008 12:46 am

Your code should go faster if you can get rid of the *tree and target parameters that do not seem to be doing anything useful...
e.g:

Code: Select all

BITBOARD KnightMoves&#40;int from&#41; &#123; 
  return knight_attacks&#91;from&#93;; 
&#125; 

...
moves = Generator&#91;ptype&#93;&#40;from&#41; & target;
...

wgarvin · Post by **wgarvin** » Tue Feb 26, 2008 1:05 am

Allard Siemelink wrote:Your code should go faster if you can get rid of the *tree and target parameters that do not seem to be doing anything useful...
e.g:
Code: Select all
BITBOARD KnightMoves&#40;int from&#41; &#123; 
  return knight_attacks&#91;from&#93;; 
&#125; 

...
moves = Generator&#91;ptype&#93;&#40;from&#41; & target;
...

The difference probably wouldn't be very big. The main slowdown probably comes from the dynamic dispatch or branch, and not being able to inline that code. Since this is a very tiny function (2-3 instructions?) the prologue and epilogue and pushing parameters and calling it, end up costing much more than the body of the function does, which would still be the case even if you got rid of a parameter or two.

Better to just keep the original code, even if its a bit more verbose.

Zach Wegner · Post by **Zach Wegner** » Tue Feb 26, 2008 1:24 am

I use the switch method, as I'd rather sacrifice the few % time for clearer code. Try using occupied & ~pawns rather than looping through piece types. I think this might be a bit faster, because I have it that way in my code now.

bob · Post by **bob** » Tue Feb 26, 2008 1:31 am

Allard Siemelink wrote:Your code should go faster if you can get rid of the *tree and target parameters that do not seem to be doing anything useful...
e.g:
Code: Select all
BITBOARD KnightMoves&#40;int from&#41; &#123; 
  return knight_attacks&#91;from&#93;; 
&#125; 

...
moves = Generator&#91;ptype&#93;&#40;from&#41; & target;
...

they are needed for the others. Things like RookAttacks() need the *tree pointer. The target is needed for all of them... It really doesn't matter whether the target is passed in, or used on the result. I tried it both ways and it made no difference at all, the cost appears to be in the procedure calls, rather than in anything done in the procedures. Remember I removed the procedure calls completely, which helped some, but then the switch() overhead was an issue...

bob · Post by **bob** » Tue Feb 26, 2008 1:33 am

Zach Wegner wrote:I use the switch method, as I'd rather sacrifice the few % time for clearer code. Try using occupied & ~pawns rather than looping through piece types. I think this might be a bit faster, because I have it that way in my code now.

I had actually thought of that already, but the test I ran had pieces of all types on the board and it was still slower than the separate segments for each piece type that I currently have...

rreagan · Post by **rreagan** » Tue Feb 26, 2008 3:46 am

This is what I've done (from memory). It's not quite as compact (plus I'm a little more long winded with my names), but it removes the bulk of the repetition. It's C++, but you could probably achieve the same with macros.

Code: Select all

template <class ATTACK_FUNCTION>
void gen_moves &#40;BITBOARD source_pieces, BITBOARD destination_mask, int move_type, MOVE_LIST & move_list&#41; &#123;
  while &#40;source_pieces&#41; &#123;
    source_square = first_one&#40;source_pieces&#41;;
    clear_bit&#40;source_pieces, source_square&#41;;
    moves = ATTACK_FUNCTION&#40;source_square&#41; & destination_mask;
    while &#40;moves&#41; &#123;
      destination_square = first_one&#40;moves&#41;;
      clear_bit&#40;moves, destination_square&#41;;
      move_list.add_move&#40;MOVE&#40;source_square, destination_square, move_type&#41;);
    &#125;
  &#125;
&#125;

void generate_moves &#40;MOVE_LIST & move_list&#41; &#123;
  gen_moves<knight_attacks>&#40;friendly_knights, enemy_pieces, CAPTURE, move_list&#41;;
  gen_moves<knight_attacks>&#40;friendly_knights, empty_squares, NONCAPTURE, move_list&#41;;
  gen_moves<bishop_attacks>&#40;friendly_bishops | friendly_queens, enemy_pieces , move_list&#41;;
// and so on...
&#125;

Aleks Peshkov · Post by **Aleks Peshkov** » Tue Feb 26, 2008 1:07 pm

Using magic bitboards method, you need only 3 functions:
1) kings, knights, pawns with an argument pointer to attack table.
2) bishops, rooks with a pointer to uniform magic data structure.
3) queen case, one may unroll it to separate rooks+bishops, but I do not.

3-way switch or indirect call have better prediction rate.

bob · Post by **bob** » Tue Feb 26, 2008 5:59 pm

Aleks Peshkov wrote:Using magic bitboards method, you need only 3 functions:
1) kings, knights, pawns with an argument pointer to attack table.
2) bishops, rooks with a pointer to uniform magic data structure.
3) queen case, one may unroll it to separate rooks+bishops, but I do not.

3-way switch or indirect call have better prediction rate.

The prediction rate is not the issue, until you eliminate the branch completely. The main problem is the cache footprint causes different pieces of that generator function to be used/skipped each pass through the thing, which simply increases the cache miss rate unacceptably. The indirect function call was too expensive because of lack of inlining, the switch has two extra branches to test for the proper range of the switch variable, all of which drive up the cost...

I'm still playing with options, but so far the "unrolled" approach I currently use has been the fastest.

compact bitboard move generator

compact bitboard move generator

Re: compact bitboard move generator

Re: compact bitboard move generator

Re: compact bitboard move generator

Re: compact bitboard move generator

Re: compact bitboard move generator

Re: compact bitboard move generator

Re: compact bitboard move generator

Re: compact bitboard move generator