Fast lookup of pieces for move generation without bitboards

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

zd3nik
Posts: 193
Joined: Wed Mar 11, 2015 3:34 am
Location: United States

Fast lookup of pieces for move generation without bitboards

Post by zd3nik »

In engines that don't use bitboards, is there a commonly used technique for keeping track of where the pieces of each color are so you don't have to scan the entire board to find them during move generation?

It's very trivial to incrementally update a table of counts for each piece type, but I'm not having luck thinking of a way to incrementally update a fast lookup map of piece positions.

I ask because I'm doing profiling on a new experimental 0x88 engine I'm tinkering on and it's showing really high cost in the top level move generation function, which does nothing but scan the board for pieces of the desired color and then calls individual generators for each piece type. The profiler I'm using (valgrind) does not include the cost of sub calls in the reported cost of a given function so it's just the board scan that's killing it

Here's some pseudo C code:

Code: Select all

  int pieces = pieceCount[color];
  for &#40;int sqr = A1; &#40;pieces > 0&#41; && &#40;sqr <= H8&#41;; ++sqr&#41; &#123;
    if &#40;sqr & 0x88&#41; &#123;
      sqr += 7;
    &#125;
    else switch &#40;board&#91;sqr&#93;) &#123;
      case &#40;color|Pawn&#41;&#58;   --pieces; generatePawnMoves&#40;sqr&#41;; break;
      case &#40;color|Knight&#41;&#58; --pieces; generateKnightMoves&#40;sqr&#41;; break;
      case &#40;color|Bishop&#41;&#58; --pieces; generateBishopMoves&#40;sqr&#41;; break;
      case &#40;color|Rook&#41;&#58;   --pieces; generateRookMoves&#40;sqr&#41;; break;
      case &#40;color|Queen&#41;&#58;  --pieces; generateQueenMoves&#40;sqr&#41;; break;
      case &#40;color|King&#41;&#58;   --pieces; generateKingMoves&#40;sqr&#41;; break;
    &#125;
  &#125;
NOTE: The real routine scans in the opposite direction (H8 down to A1) for black.
JVMerlino
Posts: 1357
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Fast lookup of pieces for move generation without bitboa

Post by JVMerlino »

zd3nik wrote:In engines that don't use bitboards, is there a commonly used technique for keeping track of where the pieces of each color are so you don't have to scan the entire board to find them during move generation?

It's very trivial to incrementally update a table of counts for each piece type, but I'm not having luck thinking of a way to incrementally update a fast lookup map of piece positions.

I ask because I'm doing profiling on a new experimental 0x88 engine I'm tinkering on and it's showing really high cost in the top level move generation function, which does nothing but scan the board for pieces of the desired color and then calls individual generators for each piece type. The profiler I'm using (valgrind) does not include the cost of sub calls in the reported cost of a given function so it's just the board scan that's killing it

Here's some pseudo C code:

Code: Select all

  int pieces = pieceCount&#91;color&#93;;
  for &#40;int sqr = A1; &#40;pieces > 0&#41; && &#40;sqr <= H8&#41;; ++sqr&#41; &#123;
    if &#40;sqr & 0x88&#41; &#123;
      sqr += 7;
    &#125;
    else switch &#40;board&#91;sqr&#93;) &#123;
      case &#40;color|Pawn&#41;&#58;   --pieces; generatePawnMoves&#40;sqr&#41;; break;
      case &#40;color|Knight&#41;&#58; --pieces; generateKnightMoves&#40;sqr&#41;; break;
      case &#40;color|Bishop&#41;&#58; --pieces; generateBishopMoves&#40;sqr&#41;; break;
      case &#40;color|Rook&#41;&#58;   --pieces; generateRookMoves&#40;sqr&#41;; break;
      case &#40;color|Queen&#41;&#58;  --pieces; generateQueenMoves&#40;sqr&#41;; break;
      case &#40;color|King&#41;&#58;   --pieces; generateKingMoves&#40;sqr&#41;; break;
    &#125;
  &#125;
NOTE: The real routine scans in the opposite direction (H8 down to A1) for black.
You're probably already familiar with this and looking for something better, but just in case:

http://chessprogramming.wikispaces.com/Piece-Lists

jm
User avatar
hgm
Posts: 27825
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Fast lookup of pieces for move generation without bitboa

Post by hgm »

The technique is called 'piece list'. It works as follows:

In stead of piece types, you store piece numbers in the board array. Unlike the piece numbers, these are unique. E.g. 16-31 for 16 white pieces (24-31 being Pawns) and 32-47 for the black pieces.

Then you keep a list indexed by piece number (i.e. with 48 elements) that holds the square number where the piece currently is. It can also hold other info on the pieces, e.g. their type, their value, their start element in the PST and Zobrist key table, etc.

Making a move then goes like this:

Code: Select all

piece = board&#91;from&#93;;
victim = board&#91;to&#93;;
board&#91;to&#93; = piece; // actually this should be the promoted form of the piece, which might not have the same number
board&#91;from&#93; = EMPTY; // some recognizably invalid piece number, like 0 
list&#91;piece&#93;.location = to;
list&#91;victim&#93;.location = CAPTURED; // some recognizably invalid square number, like -1;
It is often advantageous to keep the piece numbers in some meaningful order, e.g. have a contiguous Pawn section, a Knights section, a slider section and royal section. When you want to do something with a sub-section of the pieces, e.g. evaluate Pawn structure, you just loop over the relevant section. (But be sure to skip those pieces marked as CAPTURED!) Sliders need other move-generation code than Knights and Kings, so your move generator could use different loops for generating slider, leaper and pawn moves.

Not every piece number has to be in use all the time; in Joker I organized the leaper and slider section as stacks, with sufficient free space above them to add new pieces to them because of promotions. The Pawn that promoted is then temporarily marked as CAPTURED.

I typically assign the numbers such that you can see from the number already that you are dealing with a Pawn, without having to look up the type in the piece list. (I.e. test for (pieceNr&8) instead of (list[pieceNr].type == PAWN).) This speeds up testing whether a given square is attacked by a Pawn (for which you just have to look to two board squares). To test if a given square is attacked by a non-Pawn you can then loop through the leaper and slider sections of the piece list, and (if the piece is not CAPTURED) apply the 0x88 attack test for leapers or sliders to them, respectively. The King is always the first element of the list for each color, so I can always get the King positions by looking in list[side+KING].location.

You can look at the source of qperft for a simple real-life example.
User avatar
hgm
Posts: 27825
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Fast lookup of pieces for move generation without bitboa

Post by hgm »

Btw, I recently realized that there is a reasonbly efficient alternative for this when you keep a view-distance board. Such a board would keep track of the distance to the nearest obstacle in each of the 8 directions (which could be a board edge). Having this you could use the +rank direction to hop from occupied square to occupied square along the rank, (starting in the first ring of boundary guards around the board) and just repeat that for all ranks. This visits all occupied squares, so you still would have to figure out if it is an own piece or an enemy. But at least you would skip the empty squares.

In Shokidoki I do not use a piece list, because pieces can change owner, so that you would need all pieces to be present in the lists of both sides, and mark them as CAPTURED in the list of the person that currently does not have them. This typically causes half of the list for each side to be CAPTURED, and there still would be a lot of overhead for skipping these. The board also never gets empty, and also has a filling fraction of ~50%. So I might as well scan the board, and save me the overhead of updating the piece list.

You can actually make the best of scanning over occupied squares of both colors, by genererating moves for both sides at once (getting mobility, king attack and attacker/protector counts as a cheap side effect), and then use the opponent move list for doing the null-move search before starting searching your own moves.
zd3nik
Posts: 193
Joined: Wed Mar 11, 2015 3:34 am
Location: United States

Re: Fast lookup of pieces for move generation without bitboa

Post by zd3nik »

JVMerlino wrote:You're probably already familiar with this and looking for something better, but just in case:

http://chessprogramming.wikispaces.com/Piece-Lists

jm
I wasn't already familiar with this. It looks very similar to something I had in mind. But there are a couple tricks in there I didn't think about which make it far more optimal than what I was thinking, so I'll give it a try.

Thanks! :-)
zd3nik
Posts: 193
Joined: Wed Mar 11, 2015 3:34 am
Location: United States

Re: Fast lookup of pieces for move generation without bitboa

Post by zd3nik »

hgm wrote:Btw, I recently realized that there is a reasonbly efficient alternative for this when you keep a view-distance board. Such a board would keep track of the distance to the nearest obstacle in each of the 8 directions (which could be a board edge). Having this you could use the +rank direction to hop from occupied square to occupied square along the rank, (starting in the first ring of boundary guards around the board) and just repeat that for all ranks. This visits all occupied squares, so you still would have to figure out if it is an own piece or an enemy. But at least you would skip the empty squares.

In Shokidoki I do not use a piece list, because pieces can change owner, so that you would need all pieces to be present in the lists of both sides, and mark them as CAPTURED in the list of the person that currently does not have them. This typically causes half of the list for each side to be CAPTURED, and there still would be a lot of overhead for skipping these. The board also never gets empty, and also has a filling fraction of ~50%. So I might as well scan the board, and save me the overhead of updating the piece list.

You can actually make the best of scanning over occupied squares of both colors, by genererating moves for both sides at once (getting mobility, king attack and attacker/protector counts as a cheap side effect), and then use the opponent move list for doing the null-move search before starting searching your own moves.
I must admit I don't follow what you're saying about the view-distance boards. It's over my head I guess.

I'm curious about the captured flag idea though. Why don't you just truncate the list as shown in the example on http://chessprogramming.wikispaces.com/Piece-Lists ? Setting the pieces to captured means you'll have to iterate over them. No such time is wasted when you simply truncate the list. Truncating moves things around, but I don't see that as a problem; pieces could still be grouped by type within predictable offset ranges.
User avatar
hgm
Posts: 27825
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Fast lookup of pieces for move generation without bitboa

Post by hgm »

What do you mean by 'truncating'? The captured piece will in general not be the last in the list, it will be somewhere in the middle. Compacting the list to squeeze out the piece, (requiring assigning new numbers to the pieces on the board!), and re-expanding it on UnMake, would be very expensive. In QS you would have to do it all the time, and there might not be enough nodes downstream to earn it back by not having to skip that one piece. In the root I of course repack the lists, so that you start every search with a hole-free list.

If you don;t want any overhead from skipping, you should accompany the piece list by a 'presence' mask. You can then clear bits in that mask that correspond to pieces that are captured. E.g. add in MakeMove():

Code: Select all

present &= ~list&#91;victim&#93;.bit; // bit flag corresponding to piece
and in UnMake you set it again (or restore from the saved value). You could already store the complemented bit in the piece-list field to save a complement operation.

You can then loop over the non-captured pieces by extracting the 1 bits from the word with the usual bit-scan techniques, and skipping the captured ones takes no extra time and no branches. If you want to loop over a section of the piece list you just mask out that section, e.g.

Code: Select all

int todo = present & WHITE & SLIDERS;
while&#40;todo&#41; &#123;
  int next = todo & -todo;
  int piece = BSF&#40;next&#41;;
  ...
  todo -= next;
&#125;
[Edit] Oh, I see that they just re-assign one piece number, remapping the last piece into the created hole. But that would only work if all pieces are the same type, if you want the order to have any meaning. (And I want them in order of value, to automatically generate captures in MVV/LVA order).

Having only one typeof piece in a list means that you need many lists, one for each piece type. This just moves the problem of the holes to the holes between the lists. You would get the overhead of setting up a new loop over a list that contains 0, 1 or 2 pieces many times. It doesn't sound like a competative method at all.

An alternative would be to organize the piece list as a doubly linked list, so that you could remove the captured pieces from the list, and looping through it would automatically skip the captured pieces:

Code: Select all

list&#91;list&#91;victim&#93;.pred&#93;.succ = list&#91;victim&#93;.succ;
list&#91;list&#91;victim&#93;.succ&#93;.pred = list&#91;victim&#93;.pred;
and during UnMake:

Code: Select all

list&#91;list&#91;victim&#93;.pred&#93;.succ = victim;
list&#91;list&#91;victim&#93;.succ&#93;.pred = victim;
I tried that in qperft, but following the links creates a long-latency dependency chain in the (in generally tight) loops, and it was slower.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Fast lookup of pieces for move generation without bitboa

Post by stegemma »

zd3nik wrote:[...]Setting the pieces to captured means you'll have to iterate over them. No such time is wasted when you simply truncate the list. Truncating moves things around, but I don't see that as a problem; pieces could still be grouped by type within predictable offset ranges.
Iterate through captured piece is not so expensive as iterate all over the board. You can use a flag (it's the faster solution) but you can also avoid using a captured flag at all, if you store in the chessboard the index of any piece and in the piece list the index to the related squares. When you capture a piece, you can avoid to update captured flag if you just test the piece index on the square against the index of the piece itself: if don't match then the piece has been captured:

Code: Select all

loop i through piece list
  square = piece&#91;i&#93;
  index = board&#91;square&#93;
  if index not equal to i then captured
This would add a read of the board for any piece but would save the update of the captured flag at any capture move and relative undo.

Another way to avoid captured flag could be to exchange the captured piece (in piece list) with the last one on the piece list of the same color and then diminish the pieces count. Of course you should update the board index for the last piece that you have exchanged:

Code: Select all

idx = index of captured piece
last = index of last piece
exchange piece&#91;idx&#93; with piece&#91;last&#93;
square = piece&#91;idx&#93; &#40;now it is the one who was the last&#41;
board&#91;square&#93; = idx
decrement number of pieces

That way you don't iterate over the captured piece anymore. You can do the same when you undo the capture move.

I've used this on some version of Drago but I don't remember if they were fast than iterate over captured pieces.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
zd3nik
Posts: 193
Joined: Wed Mar 11, 2015 3:34 am
Location: United States

Re: Fast lookup of pieces for move generation without bitboa

Post by zd3nik »

hgm wrote:What do you mean by 'truncating'? The captured piece will in general not be the last in the list, it will be somewhere in the middle. Compacting the list to squeeze out the piece, (requiring assigning new numbers to the pieces on the board!), and re-expanding it on UnMake, would be very expensive. In QS you would have to do it all the time, and there might not be enough nodes downstream to earn it back by not having to skip that one piece. In the root I of course repack the lists, so that you start every search with a hole-free list.

If you don;t want any overhead from skipping, you should accompany the piece list by a 'presence' mask. You can then clear bits in that mask that correspond to pieces that are captured. E.g. add in MakeMove():

Code: Select all

present &= ~list&#91;victim&#93;.bit; // bit flag corresponding to piece
and in UnMake you set it again (or restore from the saved value). You could already store the complemented bit in the piece-list field to save a complement operation.

You can then loop over the non-captured pieces by extracting the 1 bits from the word with the usual bit-scan techniques, and skipping the captured ones takes no extra time and no branches. If you want to loop over a section of the piece list you just mask out that section, e.g.

Code: Select all

int todo = present & WHITE & SLIDERS;
while&#40;todo&#41; &#123;
  int next = todo & -todo;
  int piece = BSF&#40;next&#41;;
  ...
  todo -= next;
&#125;
[Edit] Oh, I see that they just re-assign one piece number, remapping the last piece into the created hole. But that would only work if all pieces are the same type, if you want the order to have any meaning. (And I want them in order of value, to automatically generate captures in MVV/LVA order).

Having only one typeof piece in a list means that you need many lists, one for each piece type. This just moves the problem of the holes to the holes between the lists. You would get the overhead of setting up a new loop over a list that contains 0, 1 or 2 pieces many times. It doesn't sound like a competative method at all.

An alternative would be to organize the piece list as a doubly linked list, so that you could remove the captured pieces from the list, and looping through it would automatically skip the captured pieces:

Code: Select all

list&#91;list&#91;victim&#93;.pred&#93;.succ = list&#91;victim&#93;.succ;
list&#91;list&#91;victim&#93;.succ&#93;.pred = list&#91;victim&#93;.pred;
and during UnMake:

Code: Select all

list&#91;list&#91;victim&#93;.pred&#93;.succ = victim;
list&#91;list&#91;victim&#93;.succ&#93;.pred = victim;
I tried that in qperft, but following the links creates a long-latency dependency chain in the (in generally tight) loops, and it was slower.
I guess I should have called it "compacting" the lists instead of "truncating" the lists. But you caught my meaning in your edit.

I see your point about the holes problem simply getting moved. But I don't think maintaining separate lists for each piece type is such a bad thing. it would only add a tiny bit of extra code to MVV/LVA routines.

Anyway, for my first stab at this I'm trying something very similar to what you've described. There's one piece list per color, but piece types are grouped at specific offset ranges within - effectively making separate "inner" lists per piece type. Having the inner lists makes it so offset values can be used to determine piece type. So far this is the same as what you originally described - I think. But I'm not going to use a captured flag on each piece. I'm going to do the compacting thing - compaction only occurs within each inner list. And I'm grouping all the sliding pieces into one inner list. That will make MVV/LVA logic more complicated, but it solves other problems specific to the engine I'm tinkering with.

Anyway, we'll see how it works, if I can get it to work. I suspect all the extra incremental maintenance and de-referencing is going to slow things down enough that it won't be any faster in the end than what I have now (which is basically just a perft routine at present - no qsearch and no positional eval). The piece lists should help tremendously in qsearch move generation and positional eval though, so hopefully it's worth the extra complexity.
zd3nik
Posts: 193
Joined: Wed Mar 11, 2015 3:34 am
Location: United States

Re: Fast lookup of pieces for move generation without bitboa

Post by zd3nik »

stegemma wrote:Iterate through captured piece is not so expensive as iterate all over the board. You can use a flag (it's the faster solution) but you can also avoid using a captured flag at all, if you store in the chessboard the index of any piece and in the piece list the index to the related squares. When you capture a piece, you can avoid to update captured flag if you just test the piece index on the square against the index of the piece itself: if don't match then the piece has been captured:

Code: Select all

loop i through piece list
  square = piece&#91;i&#93;
  index = board&#91;square&#93;
  if index not equal to i then captured
This would add a read of the board for any piece but would save the update of the captured flag at any capture move and relative undo.
I realize it would be faster than iterating over every square on the board. I was just stating I think it's unnecessary because you can just compact the lists when a piece is removed.
stegemma wrote:Another way to avoid captured flag could be to exchange the captured piece (in piece list) with the last one on the piece list of the same color and then diminish the pieces count. Of course you should update the board index for the last piece that you have exchanged:

Code: Select all

idx = index of captured piece
last = index of last piece
exchange piece&#91;idx&#93; with piece&#91;last&#93;
square = piece&#91;idx&#93; &#40;now it is the one who was the last&#41;
board&#91;square&#93; = idx
decrement number of pieces

That way you don't iterate over the captured piece anymore. You can do the same when you undo the capture move.

I've used this on some version of Drago but I don't remember if they were fast than iterate over captured pieces.
And this is what I mean by compacting the lists (I first called it truncating, but let's stick with compacting).