Incrementally-updated attack map

hMx · Post by **hMx** » Tue Apr 22, 2014 9:30 pm

Wow! That is quite a large design, resulting in a "fat" board with a lot of redundant information, consequently designed to help a certain purpose, namely certain forms of move generation.

Congratulations, your design looks good. For Chest we had similar design considerations, and some of your reasoning looks familiar to me.

"Fat" boards also have drawbacks. Copying boards now is not really cheap anymore, what may become a factor e.g. at split points for parallel tree search. But maybe, not.

Please tell us more when you have collected experience with the pro and con of your design, I find it very interesting.

Cheers, Heiner

hgm · Post by **hgm** » Tue Apr 22, 2014 9:48 pm

Thank you for your comments, I must admit that I never considered SMP issues in this context. My favorite design for multi-threaded search would not involve copying boards, but sequences of moves that would tell the thread how to reach the position, which it would then make, and update the fat board in the process. Or, preferably,not exchanging any information at all, but has each thread walk the tree independently, searching for nodes marked in the hash table as open split points.

But I must admit that I so far have never been able to produce a multi-threaded program that was faster (in nps) than the corresponding single-threaded program.

Aleks Peshkov · Post by **Aleks Peshkov** » Wed Apr 23, 2014 12:17 am

It is more efficient to use piece byte-vectors, but not piece lists. Orthodox chess need only single XMM register to hold information about each of 16 piece for each side.

My engine holds the vector of squares and the vector of piece type (rooks with castling right and en passant status can be encoded as special piece type), total 2x2x16 bytes. Using branchless SIMD vectors bit twiddling it easy to get information which piece is on any given square. All other data is redundant, including board array.

I also update matrix of attacks (8 files of 16x8-bit rank vectors). It is trivial to get information what pieces attacks each square or a given set of squares.
It is very easy to create a matrix of legal moves from attack matrix (instead of move list). My engine calculate perft as fast as the fastest public perft generators.

hgm · Post by **hgm** » Wed Apr 23, 2014 10:04 am

Aleks Peshkov wrote:It is more efficient to use piece byte-vectors, but not piece lists. Orthodox chess need only single XMM register to hold information about each of 16 piece for each side.

That sounds indeed like a very efficient way to do it, for orthodox Chess. Unfortunately the engine I am designing also aims to play (at least) Tai Shogi, where each side has 177 pieces on a 25x25 board. Where almost every piece can promote to a piece of different value, which makes it necessary to reserve two entries for each piece in order to keep them ordered by value (if you don't want to re-order them when one of them promotes). And then I am not even talking about Taikyoku Shogi on a 36x36 board, with 402 pieces each.

It might still be competitive, though, to keep a bitmap of pieces next to the the piece list that stores things like location, type, promoted entry, value, PST number, Zobrist piece key etc. Then you could set the bits of the attacked pieces, and extract them in order of value, and thus scan for attacked pieces in groups of 64 or 32 at the time. The entire piece bitmap would span several words, but you can easily get the word you have to extract from next from the number N of the piece you just been treating, as

m = (N+1) >> 6;
while((b = pieceMap[m]) == 0);
N = EXTRACT_BIT(b);

That would remove the need to scan the list for neighboring attacked pieces when you want to add a newly attacked one. You just set the corresponding bit.

Aleks Peshkov · Post by **Aleks Peshkov** » Wed Apr 23, 2014 12:25 pm

If the board is square, it is profitable to store rank and file information in separate byte-vectors. I may actually try this approach for Orthodox chess (using single bit inside byte, instead of file/rank number). Branchless parallel bit operations with piece coordinates may be faster then looping through piece units lists or through board squares for attack generation.

I do not shuffle my original data vectors. All empty piece slots are implicitly skipped in loops. (_mm_movemask_epi8() convert byte-vector to bit-vector then bitscan). In rare cases where the order of iterating through piece-set is important (MVV/LVA, what's else?) it is easier to keep separate order-index piece-vector. _mm_shuffle_epi8() is magically powerful.

Gerd Isenberg · Post by **Gerd Isenberg** » Wed Apr 23, 2014 2:11 pm

Aleks Peshkov wrote:It is more efficient to use piece byte-vectors, but not piece lists. Orthodox chess need only single XMM register to hold information about each of 16 piece for each side.

My engine holds the vector of squares and the vector of piece type (rooks with castling right and en passant status can be encoded as special piece type), total 2x2x16 bytes. Using branchless SIMD vectors bit twiddling it easy to get information which piece is on any given square. All other data is redundant, including board array.

I also update matrix of attacks (8 files of 16x8-bit rank vectors). It is trivial to get information what pieces attacks each square or a given set of squares.
It is very easy to create a matrix of legal moves from attack matrix (instead of move list). My engine calculate perft as fast as the fastest public perft generators.

Interesting. So the initial position is represented like this?

Code: Select all

xmm0 piece codes w&#58; | wk | wq | wr | wr | wb | wb | wn | wn | wp | wp | wp | wp | wp | wp | wp | wp |
xmm1 squares white&#58; |  4 |  3 |  7 |  0 |  5 |  2 |  6 |  1 | 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |
xmm2 piece codes b&#58; | bk | bq | br | br | bb | bb | bn | bn | bp | bp | bp | bp | bp | bp | bp | bp |
xmm3 squares black&#58; | 60 | 59 | 63 | 56 | 61 | 54 | 62 | 57 | 55 | 54 | 53 | 52 | 51 | 50 | 49 | 48 |

How exactly do you twiddle the bits to get piece code by square, i.e. square 60?
pcmpeqb with const vector of 16x60, pmovmskb, conditional bitscan if != 0 and SSE4 pextrb, or SSSE3 pshufb?

Thanks,
Gerd

Aleks Peshkov · Post by **Aleks Peshkov** » Wed Apr 23, 2014 3:29 pm

I use SSSE3 intrinsics. Main intermediate data structure is byte-mask vector (0 or 0xFF), so I rarely need to extract index of single populated byte-mask.

My code is completely colorless. I switch sides inside position during make move, so I do not need to explicitly know which side is on move.

My square coordinates are always relative to own side (both kings initially on E1 square), flip (sq ^ 070) if needed to compare different sides squares. Pawns go in one direction. e2e4 and e7e5 moves internally encoded identically.

Piece type element is not an index, but a bitset of properties:
* 6 bits per piece type + special_move bit + pin_candidates bit;
* getting all sliders is easy bitmask operation;
* pin_candidates mean sliders that directed towards enemy king, this flag greatly speed ups move legality checking);
* special_move + rook means castling possible;
* special_move + pawn means either en passant victim pawn or pawn(s) that can capture en passant this time).

Incrementally-updated attack map

Re: Incrementally-updated attack map

Re: Incrementally-updated attack map

Re: Incrementally-updated attack map

Re: Incrementally-updated attack map

Re: Incrementally-updated attack map

Re: Incrementally-updated attack map

Re: Incrementally-updated attack map