Incrementally-updated attack map

hgm · Post by **hgm** » Mon Apr 21, 2014 9:13 pm

I am developing an engine especially for Chess variants with huge boards. Incremental techniques are very rewarding there, as the number of pieces typically increases with the board area, while the effects of a move only extend to the rays that pass through the from- and to-squares, and thus scales with the board circumference. Most pieces are totally unaffected by this, and keep the same moves as they had before. The same techniques are of course applicable to orthodox Chess as well. I just don't know if they would be competitive there. Nevertheless, I want to share them here.

The basic data structure I use is a distance table, containing the distance to the nearest piece in each of the eight major directions. Distance=1 means adjacent, and 0 is used when there is no piece at all in that direction, and you are looking directly at the board edge. This distance is tabulated per board square, rather than per piece, so that in case of a replacement capture it remains valid, and the new occupant of the square can 'inherit' it unmodified from the victim. The distance information is only valid for occupied squares.

This makes incremental update of the table rather easy in the (common) case that a capture is played: the only effect is from evacuation of the from-square. This from-square, being formerly occupied, contains valid distances to the nearest obstacles in any direction, which were looking at it, and now have to be extended over the evacuated square. So basically we have to increase the distance in our direction of the 'upstream' obstacle with our own distance to the downstream obstacle, for all 8 directions. On UnMake we assume that the distance information on the evacuated square, although formally not valid after it was evacuated, still has the same values as when we left it. So again these point us directly to the squares that were looking upon us, and we can restore their distance in our direction to the original value by subtracting our own distance in that direction. (In both cases there has to be special treatment of the 0-case, where we were looking at the board edge.)

The more difficult case (but fortunately far less common in the tree) is that of a non-capture, where we land on a formerly empty square. This will in general not contain meaningful information, so we will somehow have to generate it. But to not violate the assumption for UnMake of captures, we have to save the old distance values first, so we can restore it on UnMake. But that is not too much work: the 8 distances can be kept as bytes in a 64-bit word, which can be saved as a single item. Worse is that we have to determine the distance to the nearest piece in each direction, which requires a ray scan over the board. (It might be possible to avoid that ray scan by keeping an occupancy bitmap for each of the four lines that intersect the square. But during most of the game the board will be crowded, and the lines of sight will not be very long. So this is a refinement that does not buy terribly much speedup, compared to what you have to do anyway. And, as said before, this only applies to non-captures, while the tree is mostly captures, because of QS.) The determined distances can be used for the square itself, as well as to set a new distance from the found obstacles to us in the reciprocal direction. For UnMake we will have the valid distances of the to-square available, so we can immediately get at the (upstream) obstacles to add our downstream distance to theirs.

Now this is easy and efficient, but in itself not yet good for anything, It is just auxiliary data that is helpful for incrementally updating an attack map. So in addition to the distance count, we store for each square two words of information (one for each player), which packs bit flags to indicate for each direction if there actually is an attack on that square. (One of the two then obviously flags only virtual attacks, that protect an own piece.)

This makes generating captures, which usually is the Achilles heel of mailbox representations, quite easy: You can run through the opponent's piece list (assumed to be stored in order of piece value), looking for a piece that is on a square that is attacked by us. For each attack flag set in 'our' word of the attack map, we look up the distance to the nearest obstacle in the corresponding direction in the distance map, and that must be the attacker. This generates the attacks in MVV order, so that you can do stages capture generation per victim group of equal value, and only have to sort the group based on attacker. Non-captures are usually generated extremely efficiently with mailbox, by just scanning the rays of moving starting from each piece, as long as you encounter empty squares.

Each time a piece is captured, its captures would have to be removed from the attack map, but as the distances of possible targets from the capture square are given, we immediately know where they are. Similarly, a new piece that appears somewhere, would have to compare the distances it has by its range in the corresponding direction, to see if it should flag an attack there. If a distance changes by occupying or evacuating a square on which it had a view, it would have to be determined if the piece on that square now hits the obstacle at the new distance, (and possibly remove the attack on the piece at the old distance, if this was an interposition rather than an evacuation).

That describes the 'easy part' of stepper and slider moves. Alas, this is not enough for the actual Shogi variants I am implemented, as it would also not be enough for orthodox Chess, due to the presence of Knights. Moves that do not go along orthogonal or diagonal directions (so-called 'oblique' moves) are not handled so far at all. Fortunately oblique moves are exceedingly rare in Shogi, and only very few pieces have them. Most Shogi variants do have Knights, that jump as Chess Knights, but only with the forward-most two moves. What is worse is that the exceedingly powerful Lion and its Kin occurring in all the larger variants amongst (many) others also moves like a Chess Knight. And the largest variants have so-called 'hook movers', which are Rooks or Bishops that can turn one corner during their move, so they can basically reach any square (for the Bishop of the right color), often along two different paths.

But before dealing with that, there is also trouble to be solved for the on-ray moves, as some Shogi pieces (amongst which the infamous Lion) can jump over others there. Fortunately they can make leaps of only 2 squares, and (in the largest variants) very rarely 3 squares. Orthodox Chess doesn't have such moves, but its Persian predecessor 'Shatranj' already had it for the Bishops (which, as we all know, is English for Elephants). This makes the whole thing more cumbersome, as sometimes the nearest piece will be the jumping attacker (if it is at distance 2), but sometimes it is not, when there is a piece at distance 1 over which it jumps. In that case both the jumper and the neighbor can each attack us. This makes it also confusing to have separate attack flags for such jump attacks and the nearest attacker, as sometimes they are the same. So rather than assigning extra flags (which we will see are badly needed for the more exotic stuff), the attack flag for a certain direction is set not only for an attack by the nearest piece, but also when that piece (at distance 1) is not attacking, and the piece directly behind it is. This means we cannot assume that it is the nearest piece attacks us if the flag is set, at least not if the distance is 1. So there will be a fair amount of extra code to untangle the d=1 case.

Another complexity occurs with slider pieces that can jump over arbitrarily many others to capture. Such pieces exist in two of the more exotic Shogi variants. They then come in several 'ranks', each rank only able to jump over pieces of lower rank (where 'normal' pieces have rank 0). The attack map deals with this by also keeping for each direction a variable that lists the highest rank of the piece attacking from that direction. If that is set to a non-zero value, it indicates an attack that does not necessarily come from the nearest piece, but from arbitrarily far behind it. To get at the attacker you can hop from piece to piece using the distance information, examining the rank of what you find there, (which might also attack you, even if its rank is not the listed highest rank), until you reach the piece of the listed rank. When a field is evacuated by a piece ranked equal or higher than the highest-ranked attack on it, that attack would have to be elongated (as it is now unblocked) by hopping downstream. (Which, at the same time, might remove the attacks of the ranked piece itself, if it moved in that direction.)

The hook movers are a real challenge. They have an enormously large number of moves, which is a huge incentive to not generate their captures from scratch, but keep them in the attack table and update them incrementally. Of course it cannot be avoided even then that they all have to be removed, if the hook mover is captured, or otherwise disappears. (Hmm, maybe not.) But only a very small fraction of the pieces are hook movers, so most moves will leave the hook movers in place. Then it would just be a matter of blocking or unblocking their few moves that pass through from- and to-square. To make that possible, each hook mover gets 2 bits assigned to it in the attack map, flagging attacks along the two paths that would be theoretically possible on an empty board. Rank+file vs file+rank for an orthogonal hook mover, and diagonal+anti-diagonal vs anti-diagonal+diagonal for the diagonal hook movers. A set flag implies a particular hook mover, the location of which can be obtained from the piece list. From the relative location compared to the attacked square it then follows from which side along the flagged dimension the attack was coming, and how far it is in that direction to reach the square where the hook move turns a corner. By comparing this to the distance of the square that was occupied, we can determine if this blocked the hook move in its second leg. (For unblocking hook moves, it would just be a matter of elongating the hook attacks on the square that is evacuated to the new down-stream obstacle.) Blocking a hook mover on its first leg would be recognized by having a direct view on that hook mover from the active square, and would lead to a flurry of activity, for updating all downstream targets that could be reached through a hook move from there.

Actual Shogi Knights are so simple (at most two leaping moves!) and so rare (only two in even the largest Shogi variant) that it is competitive to generate their captures from scratch in every node. The Lion-like pieces, that also make Knight jumps (and all 8 of them) are special pieces anyway, which can do multiple captures in one move. This makes the MVV concept a bit fuzzy, making it desirable to generate their captures separately, before starting to generate by MVV, as two lower victims together can be worth more than the highest individual victim. So it makes sense to generate Lion moves independently from the attack map anyway, in a sort of prelude stage to capture generation. This holds even more for the Fire Demons that occur in one of the variants, which 'burn' away all pieces around the square they move to (somewhat similar to Atomic Chess). Other special pieces, such as 'area movers' (rare pieces that can do 3 King steps while changing direction, and thus walk around obstacles) are very difficult to update incrementally, so generating their moves from scratch is also attractive. All these pieces can be handled from scratch in a 'prelude stage' to capture generation, creating a (comparatively small) temporary move list. The (MVV-wise) best capture can then be extracted from this list, and as long as there are higher attacked victims in the piece list/attack map, we will process the attacks on those, but if not, we inject the capture from this temporary move list in the process. So all rarities can easily be handled in this prelude, leaving the overwhelming bulk of the moves of the more normal pieces handled incrementally.

Of course running through the piece list to find attacked pieces is already a significant effort when there are hundred(s) of pieces, most of them not attacked at all. Even that could be sped up by using an incremental technique which puts the attacked pieces in a doubly-linked list (listing the list-distance to the next and previous attacked piece). Then attack-flag words that change into or away from zero would require removal or addition to this linked list, where only addition would require a search through the list for the next attacked piece. (All very similar to what happens in the distance table along a ray, actually.)

hMx · Post by **hMx** » Tue Apr 22, 2014 9:30 pm

Wow! That is quite a large design, resulting in a "fat" board with a lot of redundant information, consequently designed to help a certain purpose, namely certain forms of move generation.

Congratulations, your design looks good. For Chest we had similar design considerations, and some of your reasoning looks familiar to me.

"Fat" boards also have drawbacks. Copying boards now is not really cheap anymore, what may become a factor e.g. at split points for parallel tree search. But maybe, not.

Please tell us more when you have collected experience with the pro and con of your design, I find it very interesting.

Cheers, Heiner

hgm · Post by **hgm** » Tue Apr 22, 2014 9:48 pm

Thank you for your comments, I must admit that I never considered SMP issues in this context. My favorite design for multi-threaded search would not involve copying boards, but sequences of moves that would tell the thread how to reach the position, which it would then make, and update the fat board in the process. Or, preferably,not exchanging any information at all, but has each thread walk the tree independently, searching for nodes marked in the hash table as open split points.

But I must admit that I so far have never been able to produce a multi-threaded program that was faster (in nps) than the corresponding single-threaded program.

Aleks Peshkov · Post by **Aleks Peshkov** » Wed Apr 23, 2014 12:17 am

It is more efficient to use piece byte-vectors, but not piece lists. Orthodox chess need only single XMM register to hold information about each of 16 piece for each side.

My engine holds the vector of squares and the vector of piece type (rooks with castling right and en passant status can be encoded as special piece type), total 2x2x16 bytes. Using branchless SIMD vectors bit twiddling it easy to get information which piece is on any given square. All other data is redundant, including board array.

I also update matrix of attacks (8 files of 16x8-bit rank vectors). It is trivial to get information what pieces attacks each square or a given set of squares.
It is very easy to create a matrix of legal moves from attack matrix (instead of move list). My engine calculate perft as fast as the fastest public perft generators.

hgm · Post by **hgm** » Wed Apr 23, 2014 10:04 am

Aleks Peshkov wrote:It is more efficient to use piece byte-vectors, but not piece lists. Orthodox chess need only single XMM register to hold information about each of 16 piece for each side.

That sounds indeed like a very efficient way to do it, for orthodox Chess. Unfortunately the engine I am designing also aims to play (at least) Tai Shogi, where each side has 177 pieces on a 25x25 board. Where almost every piece can promote to a piece of different value, which makes it necessary to reserve two entries for each piece in order to keep them ordered by value (if you don't want to re-order them when one of them promotes). And then I am not even talking about Taikyoku Shogi on a 36x36 board, with 402 pieces each.

It might still be competitive, though, to keep a bitmap of pieces next to the the piece list that stores things like location, type, promoted entry, value, PST number, Zobrist piece key etc. Then you could set the bits of the attacked pieces, and extract them in order of value, and thus scan for attacked pieces in groups of 64 or 32 at the time. The entire piece bitmap would span several words, but you can easily get the word you have to extract from next from the number N of the piece you just been treating, as

m = (N+1) >> 6;
while((b = pieceMap[m]) == 0);
N = EXTRACT_BIT(b);

That would remove the need to scan the list for neighboring attacked pieces when you want to add a newly attacked one. You just set the corresponding bit.

Aleks Peshkov · Post by **Aleks Peshkov** » Wed Apr 23, 2014 12:25 pm

If the board is square, it is profitable to store rank and file information in separate byte-vectors. I may actually try this approach for Orthodox chess (using single bit inside byte, instead of file/rank number). Branchless parallel bit operations with piece coordinates may be faster then looping through piece units lists or through board squares for attack generation.

I do not shuffle my original data vectors. All empty piece slots are implicitly skipped in loops. (_mm_movemask_epi8() convert byte-vector to bit-vector then bitscan). In rare cases where the order of iterating through piece-set is important (MVV/LVA, what's else?) it is easier to keep separate order-index piece-vector. _mm_shuffle_epi8() is magically powerful.

Gerd Isenberg · Post by **Gerd Isenberg** » Wed Apr 23, 2014 2:11 pm

Aleks Peshkov wrote:It is more efficient to use piece byte-vectors, but not piece lists. Orthodox chess need only single XMM register to hold information about each of 16 piece for each side.

My engine holds the vector of squares and the vector of piece type (rooks with castling right and en passant status can be encoded as special piece type), total 2x2x16 bytes. Using branchless SIMD vectors bit twiddling it easy to get information which piece is on any given square. All other data is redundant, including board array.

I also update matrix of attacks (8 files of 16x8-bit rank vectors). It is trivial to get information what pieces attacks each square or a given set of squares.
It is very easy to create a matrix of legal moves from attack matrix (instead of move list). My engine calculate perft as fast as the fastest public perft generators.

Interesting. So the initial position is represented like this?

Code: Select all

xmm0 piece codes w&#58; | wk | wq | wr | wr | wb | wb | wn | wn | wp | wp | wp | wp | wp | wp | wp | wp |
xmm1 squares white&#58; |  4 |  3 |  7 |  0 |  5 |  2 |  6 |  1 | 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |
xmm2 piece codes b&#58; | bk | bq | br | br | bb | bb | bn | bn | bp | bp | bp | bp | bp | bp | bp | bp |
xmm3 squares black&#58; | 60 | 59 | 63 | 56 | 61 | 54 | 62 | 57 | 55 | 54 | 53 | 52 | 51 | 50 | 49 | 48 |

How exactly do you twiddle the bits to get piece code by square, i.e. square 60?
pcmpeqb with const vector of 16x60, pmovmskb, conditional bitscan if != 0 and SSE4 pextrb, or SSSE3 pshufb?

Thanks,
Gerd

Aleks Peshkov · Post by **Aleks Peshkov** » Wed Apr 23, 2014 3:29 pm

I use SSSE3 intrinsics. Main intermediate data structure is byte-mask vector (0 or 0xFF), so I rarely need to extract index of single populated byte-mask.

My code is completely colorless. I switch sides inside position during make move, so I do not need to explicitly know which side is on move.

My square coordinates are always relative to own side (both kings initially on E1 square), flip (sq ^ 070) if needed to compare different sides squares. Pawns go in one direction. e2e4 and e7e5 moves internally encoded identically.

Piece type element is not an index, but a bitset of properties:
* 6 bits per piece type + special_move bit + pin_candidates bit;
* getting all sliders is easy bitmask operation;
* pin_candidates mean sliders that directed towards enemy king, this flag greatly speed ups move legality checking);
* special_move + rook means castling possible;
* special_move + pawn means either en passant victim pawn or pawn(s) that can capture en passant this time).

Incrementally-updated attack map

Incrementally-updated attack map

Re: Incrementally-updated attack map

Re: Incrementally-updated attack map

Re: Incrementally-updated attack map

Re: Incrementally-updated attack map

Re: Incrementally-updated attack map

Re: Incrementally-updated attack map

Re: Incrementally-updated attack map