Slider attack mask generation without table lookup

vittyvirus · Post by **vittyvirus** » Sun May 24, 2015 4:22 pm

Let's jump straight to the code:

Bitboard rank_attacks&#40;int sq, Bitboard occ&#41; &#123;
	static const Bitboard OuterSquares = 0x8181818181818181ULL;
	assert&#40;!&#40;occ & &#40;1ULL << sq&#41;));	// occ must not mask sq
	occ = &#40;occ | OuterSquares&#41; & RankMask&#91;rank_of&#40;sq&#41;&#93;;
	return (&#40;occ - (&#40;3ULL << msb&#40;occ & (&#40;1ULL << sq&#41; - 1&#41;)) | &#40;1ULL << sq&#41;)) ^ occ&#41; & RankSqMask&#91;sq&#93;;
&#125;
Bitboard file_attacks&#40;int sq, Bitboard occ&#41; &#123;
	static const Bitboard OuterSquares = 0xFF000000000000FFULL;
	assert&#40;!&#40;occ & &#40;1ULL << sq&#41;));	// occ must not mask sq
	occ = &#40;occ | OuterSquares&#41; & FileMask&#91;file_of&#40;sq&#41;&#93;;
	return (&#40;occ - (&#40;3ULL << msb&#40;occ & (&#40;1ULL << sq&#41; - 1&#41;)) | &#40;1ULL << sq&#41;)) ^ occ&#41; & FileSqMask&#91;sq&#93;;
&#125;

What we actually do (very simplified version)
1. Get the index of the first blocker behind this square.
2. Mask this square and the next square for subtraction.
3. Subtract it (and the square on which the piece resides) from the occupancy. This will give you required attacks in both directions.
3. 'Xor' it with occupancy to also target the first blocker (and to so some other stuff too).
4. 'And' it with the file mask... and Bingo!

I 'discovered' this technique months ago, but was not able to post it because the motherboard of my laptop needed to be repaired (repaired it just today!). This works for diagonals and anti diagonals as well. I measured the speed for rook attacks on MSVC++ with BitScanReverse64() and _byteswap_uint64() intrinsics but the results were not too encouraging:
1. It took 8.31 secs for 1000000000 or so rounds. (Writing this from memory, I recorded the timings but not the no. of rounds, and I'm too lazy to redo the timings...)
2. Magic Bitboards took 6.39 secs for the same thing (sometimes less than 6 secs!)
3. Hyperbola Quintessence is faster for file attacks
I feel that there is scope for improvement. Even if it doesn't get faster than Magic Bitboards or HQ, it'd atleast make up a good replacement of Kindergarden Bitboards for Hyperbola Quintessence.

bob · Post by **bob** » Sun May 24, 2015 5:02 pm

vittyvirus wrote:Let's jump straight to the code:
Code: Select all
Bitboard rank_attacks&#40;int sq, Bitboard occ&#41; &#123;
	static const Bitboard OuterSquares = 0x8181818181818181ULL;
	assert&#40;!&#40;occ & &#40;1ULL << sq&#41;));	// occ must not mask sq
	occ = &#40;occ | OuterSquares&#41; & RankMask&#91;rank_of&#40;sq&#41;&#93;;
	return (&#40;occ - (&#40;3ULL << msb&#40;occ & (&#40;1ULL << sq&#41; - 1&#41;)) | &#40;1ULL << sq&#41;)) ^ occ&#41; & RankSqMask&#91;sq&#93;;
&#125;
Bitboard file_attacks&#40;int sq, Bitboard occ&#41; &#123;
	static const Bitboard OuterSquares = 0xFF000000000000FFULL;
	assert&#40;!&#40;occ & &#40;1ULL << sq&#41;));	// occ must not mask sq
	occ = &#40;occ | OuterSquares&#41; & FileMask&#91;file_of&#40;sq&#41;&#93;;
	return (&#40;occ - (&#40;3ULL << msb&#40;occ & (&#40;1ULL << sq&#41; - 1&#41;)) | &#40;1ULL << sq&#41;)) ^ occ&#41; & FileSqMask&#91;sq&#93;;
&#125;
What we actually do (very simplified version)
1. Get the index of the first blocker behind this square.
2. Mask this square and the next square for subtraction.
3. Subtract it (and the square on which the piece resides) from the occupancy. This will give you required attacks in both directions.
3. 'Xor' it with occupancy to also target the first blocker (and to so some other stuff too).
4. 'And' it with the file mask... and Bingo!

I 'discovered' this technique months ago, but was not able to post it because the motherboard of my laptop needed to be repaired (repaired it just today!). This works for diagonals and anti diagonals as well. I measured the speed for rook attacks on MSVC++ with BitScanReverse64() and _byteswap_uint64() intrinsics but the results were not too encouraging:
1. It took 8.31 secs for 1000000000 or so rounds. (Writing this from memory, I recorded the timings but not the no. of rounds, and I'm too lazy to redo the timings...)
2. Magic Bitboards took 6.39 secs for the same thing (sometimes less than 6 secs!)
3. Hyperbola Quintessence is faster for file attacks
I feel that there is scope for improvement. Even if it doesn't get faster than Magic Bitboards or HQ, it'd atleast make up a good replacement of Kindergarden Bitboards for Hyperbola Quintessence.

This approach has been around forever. Commonly called "direct computation". But it is MUCH slower than magic, which gives you ALL attacks (all 4 directions) at once... Just the BSF/BSR is a killer, even with hardware support because there are four of 'em.

Gerd Isenberg · Post by **Gerd Isenberg** » Sun May 24, 2015 7:07 pm

bob wrote:This approach has been around forever. Commonly called "direct computation". But it is MUCH slower than magic, which gives you ALL attacks (all 4 directions) at once... Just the BSF/BSR is a killer, even with hardware support because there are four of 'em.

Nope, Syed's bit-twiddling approach looks new and original to me. It is not the classical ray-wise approach you probably mean with direct calculation and ray-lookups. There is only one bsr per line, that is two for rook/bishop, not four. Otherwise I agree. 2*11-1 operations for rook/bishop is too expensive despite lower memory usage.

Gerd Isenberg · Post by **Gerd Isenberg** » Sun May 24, 2015 7:16 pm

vittyvirus wrote:Let's jump straight to the code:
Code: Select all
Bitboard rank_attacks&#40;int sq, Bitboard occ&#41; &#123;
	static const Bitboard OuterSquares = 0x8181818181818181ULL;
	assert&#40;!&#40;occ & &#40;1ULL << sq&#41;));	// occ must not mask sq
	occ = &#40;occ | OuterSquares&#41; & RankMask&#91;rank_of&#40;sq&#41;&#93;;
	return (&#40;occ - (&#40;3ULL << msb&#40;occ & (&#40;1ULL << sq&#41; - 1&#41;)) | &#40;1ULL << sq&#41;)) ^ occ&#41; & RankSqMask&#91;sq&#93;;
&#125;
Bitboard file_attacks&#40;int sq, Bitboard occ&#41; &#123;
	static const Bitboard OuterSquares = 0xFF000000000000FFULL;
	assert&#40;!&#40;occ & &#40;1ULL << sq&#41;));	// occ must not mask sq
	occ = &#40;occ | OuterSquares&#41; & FileMask&#91;file_of&#40;sq&#41;&#93;;
	return (&#40;occ - (&#40;3ULL << msb&#40;occ & (&#40;1ULL << sq&#41; - 1&#41;)) | &#40;1ULL << sq&#41;)) ^ occ&#41; & FileSqMask&#91;sq&#93;;
&#125;
What we actually do (very simplified version)
1. Get the index of the first blocker behind this square.
2. Mask this square and the next square for subtraction.
3. Subtract it (and the square on which the piece resides) from the occupancy. This will give you required attacks in both directions.
3. 'Xor' it with occupancy to also target the first blocker (and to so some other stuff too).
4. 'And' it with the file mask... and Bingo!

I 'discovered' this technique months ago, but was not able to post it because the motherboard of my laptop needed to be repaired (repaired it just today!). This works for diagonals and anti diagonals as well. I measured the speed for rook attacks on MSVC++ with BitScanReverse64() and _byteswap_uint64() intrinsics but the results were not too encouraging:
1. It took 8.31 secs for 1000000000 or so rounds. (Writing this from memory, I recorded the timings but not the no. of rounds, and I'm too lazy to redo the timings...)
2. Magic Bitboards took 6.39 secs for the same thing (sometimes less than 6 secs!)
3. Hyperbola Quintessence is faster for file attacks
I feel that there is scope for improvement. Even if it doesn't get faster than Magic Bitboards or HQ, it'd atleast make up a good replacement of Kindergarden Bitboards for Hyperbola Quintessence.

Very nice and original bit-twiddling. But 2*11-1 operations for rook/bishop is too expensive despite lower memory size, compared to 13 of classical approach, or 2*9 of OD, not to mention Kindergarten, HQ or magics. I have re-written one routine to better understand it ...

Code: Select all

Bitboard rank_attacks&#40;int sq, Bitboard occ&#41; &#123; //... 00110010 &#91;63..0&#93;  sq = 3
   static const Bitboard OuterSquares = 0x8181818181818181ULL;
   Bitboard s, x;                      //               v sq
   occ = occ | OuterSquares            // occ = ... 10110011
   occ = occ & RankMask&#91;sq&#93;;           // occ = ..0 10110011
   s =  1ULL << sq;                    //   s = ..0 00001000 0001 << 3
   x = s - 1;                          //   x = ..0 00000111 lower mask
   x = occ & x;                        //   x = ..0 00000011, msb = 1
   x = 3ULL << msb&#40;x&#41;;                 //   x = ..0 00000110, 011 << 1
   x = x | s;                          //   x = ..0 00001110
   x = occ - x;                        //   x = ..0 10100101
   x = occ ^ x;                        //   x = ..0 00010110
   x = x & RankSqMask&#91;sq&#93;;
   return x;
&#125;

bob · Post by **bob** » Mon May 25, 2015 2:31 am

Gerd Isenberg wrote:
bob wrote:This approach has been around forever. Commonly called "direct computation". But it is MUCH slower than magic, which gives you ALL attacks (all 4 directions) at once... Just the BSF/BSR is a killer, even with hardware support because there are four of 'em.
Nope, Syed's bit-twiddling approach looks new and original to me. It is not the classical ray-wise approach you probably mean with direct calculation and ray-lookups. There is only one bsr per line, that is two for rook/bishop, not four. Otherwise I agree. 2*11-1 operations for rook/bishop is too expensive despite lower memory usage.

It looked like what Joel Rivat was using in ChessGuru back in the 90's. Find two endpoints, compute bit vector, repeat for other direction. I used a tangential approach for a while. Find one endpoint, look up attack vector from one endpoint toward the other, then use second end point to lop off unnecessary bits.

Gerd Isenberg · Post by **Gerd Isenberg** » Mon May 25, 2015 9:22 am

bob wrote:
Gerd Isenberg wrote:
bob wrote:This approach has been around forever. Commonly called "direct computation". But it is MUCH slower than magic, which gives you ALL attacks (all 4 directions) at once... Just the BSF/BSR is a killer, even with hardware support because there are four of 'em.
Nope, Syed's bit-twiddling approach looks new and original to me. It is not the classical ray-wise approach you probably mean with direct calculation and ray-lookups. There is only one bsr per line, that is two for rook/bishop, not four. Otherwise I agree. 2*11-1 operations for rook/bishop is too expensive despite lower memory usage.
It looked like what Joel Rivat was using in ChessGuru back in the 90's. Find two endpoints, compute bit vector, repeat for other direction. I used a tangential approach for a while. Find one endpoint, look up attack vector from one endpoint toward the other, then use second end point to lop off unnecessary bits.

Ok, no idea what Joël Rivat used in Chess Guru, guess from his rgcc posts he later moved to rotated. Two endpoints are also topic in Hoffmann's Obstruction Difference, while Fahad's method applies the o^(o-2r) trick, but does not subtract the rook but the next blocker in "negative" direction.

Gerd Isenberg · Post by **Gerd Isenberg** » Mon May 25, 2015 11:18 am

I stand corrected with my earlier statement, which is Ok for the general idea, but the code is buggy, at least the sq == 0 case requires some further twiddling ...

vittyvirus · Post by **vittyvirus** » Mon May 25, 2015 12:41 pm

Gerd Isenberg wrote:I stand corrected with my earlier statement, which is Ok for the general idea, but the code is buggy, at least the sq == 0 case requires some further twiddling ...

Well, you are right. This can be avoided by maintaining arrays instead of doing straight calculation, as I originally did:

Code: Select all

inline Bitboard rank_atk&#40;square_t sq, Bitboard occ&#41;
  &#123;
    static const Bitboard OuterSquares = 0x8181818181818181ULL;
    occ = &#40;occ | OuterSquares&#41; & RankMask&#91;rank_of&#40;sq&#41;&#93;;
    return (&#40;occ - &#40;ThisAndNextSquare&#91;msb&#40;occ & PreviousSquares&#91;sq&#93;)&#93; | SquareMask&#91;sq&#93;)) ^ occ&#41; & RankSqMask&#91;sq&#93;;
  &#125;

This also is more efficient for 32-bit processors, I'd conjecture.

stegemma · Post by **stegemma** » Tue May 26, 2015 7:49 pm

bob wrote:[...]
This approach has been around forever. Commonly called "direct computation". But it is MUCH slower than magic, which gives you ALL attacks (all 4 directions) at once... Just the BSF/BSR is a killer, even with hardware support because there are four of 'em.

I agree with you, I've used a similar idea (but simpler than this) in one of the version of Satana and all trying were slowest than previous code.

For sample:

Code: Select all

inline void RookMoves&#40;const uint64_t &boFrom, const uint64_t &boAllPieces, const uint64_t &boMyEnemies, 
                        uint64_t &boChecks, uint64_t &boRookCaptures, uint64_t &boRookMoves&#41; const
  &#123;
    uint64_t mask_E                , mask_N    , mask_S    , mask_X, 
             slide_n   , slide_s   , slide_e   , slide_w   ,
             take_n    , take_s    , take_e    , take_w    ,
                         stop_s    , stop_e    , stop_w    ,
             exclude_n;
    mask_X  = ((&#40;boFrom*BOARD_E&#41;>>56&#41;*BOARD_E&#41;;

    slide_w = (&#40;BOARD_W - boFrom&#41;^BOARD_W&#41;^boFrom;
    slide_e = &#40;boFrom-((&#40;slide_w|boFrom&#41;>>7&#41;&BOARD_E&#41;);

    mask_N = &#40;0 - boFrom&#41;^slide_w^boFrom;
    mask_E = mask_X - BOARD_E;
    mask_S = (~mask_N&#41;>>8;

    slide_n = &#40;boFrom*BOARD_E&#41;^boFrom;
    slide_s = &#40;boFrom*BOARD_E&#41;^mask_X;

    take_e  = slide_e & boAllPieces;
    take_w  = slide_w & boAllPieces;

    stop_e  = MSB_SLIDE&#40;stop_e, take_e, 1, mask_E&#41;;
    stop_w  = FirstBit&#40;take_w&#41;;

    slide_e = slide_e & ~stop_e;
    slide_w = slide_w & &#40;stop_w - 1&#41;;

    take_e &= (&#40;boFrom|slide_e&#41;>>1&#41;;  boChecks  = take_e;  take_e &= boMyEnemies;
    take_w &= (&#40;boFrom|slide_w&#41;<<1&#41;;  boChecks |= take_w;  take_w &= boMyEnemies;

    //-------------------------------------------------N
    take_n  = slide_n & boAllPieces;
    exclude_n = &#40;0-take_n&#41;|take_n;
    slide_n &= ~exclude_n;
    take_n = (&#40;boFrom|slide_n&#41;<<8&#41; & exclude_n;  boChecks |= take_n;  take_n &= boMyEnemies;

    //-------------------------------------------------S
    take_s  = slide_s  & boAllPieces;
    stop_s  = MSB_SLIDE&#40;stop_s, take_s, 8, mask_S&#41;;
    slide_s &= slide_s & ~stop_s;
    take_s &= (&#40;boFrom|slide_s&#41;>>8&#41;;  boChecks |= take_s;  take_s &= boMyEnemies;

    boRookMoves    = slide_n | slide_e | slide_s | slide_w;
    boRookCaptures = take_n  | take_e  | take_s  | take_w;
  &#125;

Slider attack mask generation without table lookup

Slider attack mask generation without table lookup

Re: Slider attack mask generation without table lookup

Re: Slider attack mask generation without table lookup

Re: Slider attack mask generation without table lookup

Re: Slider attack mask generation without table lookup

Re: Slider attack mask generation without table lookup

Re: Slider attack mask generation without table lookup

Re: Slider attack mask generation without table lookup

Re: Slider attack mask generation without table lookup