Slider attack mask generation without table lookup

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
vittyvirus
Posts: 646
Joined: Wed Jun 18, 2014 2:30 pm
Full name: Fahad Syed

Slider attack mask generation without table lookup

Post by vittyvirus »

Let's jump straight to the code:

Code: Select all

Bitboard rank_attacks(int sq, Bitboard occ) {
	static const Bitboard OuterSquares = 0x8181818181818181ULL;
	assert&#40;!&#40;occ & &#40;1ULL << sq&#41;));	// occ must not mask sq
	occ = &#40;occ | OuterSquares&#41; & RankMask&#91;rank_of&#40;sq&#41;&#93;;
	return (&#40;occ - (&#40;3ULL << msb&#40;occ & (&#40;1ULL << sq&#41; - 1&#41;)) | &#40;1ULL << sq&#41;)) ^ occ&#41; & RankSqMask&#91;sq&#93;;
&#125;
Bitboard file_attacks&#40;int sq, Bitboard occ&#41; &#123;
	static const Bitboard OuterSquares = 0xFF000000000000FFULL;
	assert&#40;!&#40;occ & &#40;1ULL << sq&#41;));	// occ must not mask sq
	occ = &#40;occ | OuterSquares&#41; & FileMask&#91;file_of&#40;sq&#41;&#93;;
	return (&#40;occ - (&#40;3ULL << msb&#40;occ & (&#40;1ULL << sq&#41; - 1&#41;)) | &#40;1ULL << sq&#41;)) ^ occ&#41; & FileSqMask&#91;sq&#93;;
&#125;
What we actually do (very simplified version)
1. Get the index of the first blocker behind this square.
2. Mask this square and the next square for subtraction.
3. Subtract it (and the square on which the piece resides) from the occupancy. This will give you required attacks in both directions.
3. 'Xor' it with occupancy to also target the first blocker (and to so some other stuff too).
4. 'And' it with the file mask... and Bingo!

I 'discovered' this technique months ago, but was not able to post it because the motherboard of my laptop needed to be repaired (repaired it just today!). This works for diagonals and anti diagonals as well. I measured the speed for rook attacks on MSVC++ with BitScanReverse64() and _byteswap_uint64() intrinsics but the results were not too encouraging:
1. It took 8.31 secs for 1000000000 or so rounds. (Writing this from memory, I recorded the timings but not the no. of rounds, and I'm too lazy to redo the timings...)
2. Magic Bitboards took 6.39 secs for the same thing (sometimes less than 6 secs!)
3. Hyperbola Quintessence is faster for file attacks
I feel that there is scope for improvement. Even if it doesn't get faster than Magic Bitboards or HQ, it'd atleast make up a good replacement of Kindergarden Bitboards for Hyperbola Quintessence.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Slider attack mask generation without table lookup

Post by bob »

vittyvirus wrote:Let's jump straight to the code:

Code: Select all

Bitboard rank_attacks&#40;int sq, Bitboard occ&#41; &#123;
	static const Bitboard OuterSquares = 0x8181818181818181ULL;
	assert&#40;!&#40;occ & &#40;1ULL << sq&#41;));	// occ must not mask sq
	occ = &#40;occ | OuterSquares&#41; & RankMask&#91;rank_of&#40;sq&#41;&#93;;
	return (&#40;occ - (&#40;3ULL << msb&#40;occ & (&#40;1ULL << sq&#41; - 1&#41;)) | &#40;1ULL << sq&#41;)) ^ occ&#41; & RankSqMask&#91;sq&#93;;
&#125;
Bitboard file_attacks&#40;int sq, Bitboard occ&#41; &#123;
	static const Bitboard OuterSquares = 0xFF000000000000FFULL;
	assert&#40;!&#40;occ & &#40;1ULL << sq&#41;));	// occ must not mask sq
	occ = &#40;occ | OuterSquares&#41; & FileMask&#91;file_of&#40;sq&#41;&#93;;
	return (&#40;occ - (&#40;3ULL << msb&#40;occ & (&#40;1ULL << sq&#41; - 1&#41;)) | &#40;1ULL << sq&#41;)) ^ occ&#41; & FileSqMask&#91;sq&#93;;
&#125;
What we actually do (very simplified version)
1. Get the index of the first blocker behind this square.
2. Mask this square and the next square for subtraction.
3. Subtract it (and the square on which the piece resides) from the occupancy. This will give you required attacks in both directions.
3. 'Xor' it with occupancy to also target the first blocker (and to so some other stuff too).
4. 'And' it with the file mask... and Bingo!

I 'discovered' this technique months ago, but was not able to post it because the motherboard of my laptop needed to be repaired (repaired it just today!). This works for diagonals and anti diagonals as well. I measured the speed for rook attacks on MSVC++ with BitScanReverse64() and _byteswap_uint64() intrinsics but the results were not too encouraging:
1. It took 8.31 secs for 1000000000 or so rounds. (Writing this from memory, I recorded the timings but not the no. of rounds, and I'm too lazy to redo the timings...)
2. Magic Bitboards took 6.39 secs for the same thing (sometimes less than 6 secs!)
3. Hyperbola Quintessence is faster for file attacks
I feel that there is scope for improvement. Even if it doesn't get faster than Magic Bitboards or HQ, it'd atleast make up a good replacement of Kindergarden Bitboards for Hyperbola Quintessence.
This approach has been around forever. Commonly called "direct computation". But it is MUCH slower than magic, which gives you ALL attacks (all 4 directions) at once... Just the BSF/BSR is a killer, even with hardware support because there are four of 'em.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Slider attack mask generation without table lookup

Post by Gerd Isenberg »

bob wrote:This approach has been around forever. Commonly called "direct computation". But it is MUCH slower than magic, which gives you ALL attacks (all 4 directions) at once... Just the BSF/BSR is a killer, even with hardware support because there are four of 'em.
Nope, Syed's bit-twiddling approach looks new and original to me. It is not the classical ray-wise approach you probably mean with direct calculation and ray-lookups. There is only one bsr per line, that is two for rook/bishop, not four. Otherwise I agree. 2*11-1 operations for rook/bishop is too expensive despite lower memory usage.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Slider attack mask generation without table lookup

Post by Gerd Isenberg »

vittyvirus wrote:Let's jump straight to the code:

Code: Select all

Bitboard rank_attacks&#40;int sq, Bitboard occ&#41; &#123;
	static const Bitboard OuterSquares = 0x8181818181818181ULL;
	assert&#40;!&#40;occ & &#40;1ULL << sq&#41;));	// occ must not mask sq
	occ = &#40;occ | OuterSquares&#41; & RankMask&#91;rank_of&#40;sq&#41;&#93;;
	return (&#40;occ - (&#40;3ULL << msb&#40;occ & (&#40;1ULL << sq&#41; - 1&#41;)) | &#40;1ULL << sq&#41;)) ^ occ&#41; & RankSqMask&#91;sq&#93;;
&#125;
Bitboard file_attacks&#40;int sq, Bitboard occ&#41; &#123;
	static const Bitboard OuterSquares = 0xFF000000000000FFULL;
	assert&#40;!&#40;occ & &#40;1ULL << sq&#41;));	// occ must not mask sq
	occ = &#40;occ | OuterSquares&#41; & FileMask&#91;file_of&#40;sq&#41;&#93;;
	return (&#40;occ - (&#40;3ULL << msb&#40;occ & (&#40;1ULL << sq&#41; - 1&#41;)) | &#40;1ULL << sq&#41;)) ^ occ&#41; & FileSqMask&#91;sq&#93;;
&#125;
What we actually do (very simplified version)
1. Get the index of the first blocker behind this square.
2. Mask this square and the next square for subtraction.
3. Subtract it (and the square on which the piece resides) from the occupancy. This will give you required attacks in both directions.
3. 'Xor' it with occupancy to also target the first blocker (and to so some other stuff too).
4. 'And' it with the file mask... and Bingo!

I 'discovered' this technique months ago, but was not able to post it because the motherboard of my laptop needed to be repaired (repaired it just today!). This works for diagonals and anti diagonals as well. I measured the speed for rook attacks on MSVC++ with BitScanReverse64() and _byteswap_uint64() intrinsics but the results were not too encouraging:
1. It took 8.31 secs for 1000000000 or so rounds. (Writing this from memory, I recorded the timings but not the no. of rounds, and I'm too lazy to redo the timings...)
2. Magic Bitboards took 6.39 secs for the same thing (sometimes less than 6 secs!)
3. Hyperbola Quintessence is faster for file attacks
I feel that there is scope for improvement. Even if it doesn't get faster than Magic Bitboards or HQ, it'd atleast make up a good replacement of Kindergarden Bitboards for Hyperbola Quintessence.
Very nice and original bit-twiddling. But 2*11-1 operations for rook/bishop is too expensive despite lower memory size, compared to 13 of classical approach, or 2*9 of OD, not to mention Kindergarten, HQ or magics. I have re-written one routine to better understand it ...

Code: Select all

Bitboard rank_attacks&#40;int sq, Bitboard occ&#41; &#123; //... 00110010 &#91;63..0&#93;  sq = 3
   static const Bitboard OuterSquares = 0x8181818181818181ULL;
   Bitboard s, x;                      //               v sq
   occ = occ | OuterSquares            // occ = ... 10110011
   occ = occ & RankMask&#91;sq&#93;;           // occ = ..0 10110011
   s =  1ULL << sq;                    //   s = ..0 00001000 0001 << 3
   x = s - 1;                          //   x = ..0 00000111 lower mask
   x = occ & x;                        //   x = ..0 00000011, msb = 1
   x = 3ULL << msb&#40;x&#41;;                 //   x = ..0 00000110, 011 << 1
   x = x | s;                          //   x = ..0 00001110
   x = occ - x;                        //   x = ..0 10100101
   x = occ ^ x;                        //   x = ..0 00010110
   x = x & RankSqMask&#91;sq&#93;;
   return x;
&#125; 
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Slider attack mask generation without table lookup

Post by bob »

Gerd Isenberg wrote:
bob wrote:This approach has been around forever. Commonly called "direct computation". But it is MUCH slower than magic, which gives you ALL attacks (all 4 directions) at once... Just the BSF/BSR is a killer, even with hardware support because there are four of 'em.
Nope, Syed's bit-twiddling approach looks new and original to me. It is not the classical ray-wise approach you probably mean with direct calculation and ray-lookups. There is only one bsr per line, that is two for rook/bishop, not four. Otherwise I agree. 2*11-1 operations for rook/bishop is too expensive despite lower memory usage.
It looked like what Joel Rivat was using in ChessGuru back in the 90's. Find two endpoints, compute bit vector, repeat for other direction. I used a tangential approach for a while. Find one endpoint, look up attack vector from one endpoint toward the other, then use second end point to lop off unnecessary bits.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Slider attack mask generation without table lookup

Post by Gerd Isenberg »

bob wrote:
Gerd Isenberg wrote:
bob wrote:This approach has been around forever. Commonly called "direct computation". But it is MUCH slower than magic, which gives you ALL attacks (all 4 directions) at once... Just the BSF/BSR is a killer, even with hardware support because there are four of 'em.
Nope, Syed's bit-twiddling approach looks new and original to me. It is not the classical ray-wise approach you probably mean with direct calculation and ray-lookups. There is only one bsr per line, that is two for rook/bishop, not four. Otherwise I agree. 2*11-1 operations for rook/bishop is too expensive despite lower memory usage.
It looked like what Joel Rivat was using in ChessGuru back in the 90's. Find two endpoints, compute bit vector, repeat for other direction. I used a tangential approach for a while. Find one endpoint, look up attack vector from one endpoint toward the other, then use second end point to lop off unnecessary bits.
Ok, no idea what Joël Rivat used in Chess Guru, guess from his rgcc posts he later moved to rotated. Two endpoints are also topic in Hoffmann's Obstruction Difference, while Fahad's method applies the o^(o-2r) trick, but does not subtract the rook but the next blocker in "negative" direction.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Slider attack mask generation without table lookup

Post by Gerd Isenberg »

I stand corrected with my earlier statement, which is Ok for the general idea, but the code is buggy, at least the sq == 0 case requires some further twiddling ...
User avatar
vittyvirus
Posts: 646
Joined: Wed Jun 18, 2014 2:30 pm
Full name: Fahad Syed

Re: Slider attack mask generation without table lookup

Post by vittyvirus »

Gerd Isenberg wrote:I stand corrected with my earlier statement, which is Ok for the general idea, but the code is buggy, at least the sq == 0 case requires some further twiddling ...
Well, you are right. This can be avoided by maintaining arrays instead of doing straight calculation, as I originally did:

Code: Select all

inline Bitboard rank_atk&#40;square_t sq, Bitboard occ&#41;
  &#123;
    static const Bitboard OuterSquares = 0x8181818181818181ULL;
    occ = &#40;occ | OuterSquares&#41; & RankMask&#91;rank_of&#40;sq&#41;&#93;;
    return (&#40;occ - &#40;ThisAndNextSquare&#91;msb&#40;occ & PreviousSquares&#91;sq&#93;)&#93; | SquareMask&#91;sq&#93;)) ^ occ&#41; & RankSqMask&#91;sq&#93;;
  &#125;
This also is more efficient for 32-bit processors, I'd conjecture.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Slider attack mask generation without table lookup

Post by stegemma »

bob wrote:[...]
This approach has been around forever. Commonly called "direct computation". But it is MUCH slower than magic, which gives you ALL attacks (all 4 directions) at once... Just the BSF/BSR is a killer, even with hardware support because there are four of 'em.
I agree with you, I've used a similar idea (but simpler than this) in one of the version of Satana and all trying were slowest than previous code.

For sample:

Code: Select all

inline void RookMoves&#40;const uint64_t &boFrom, const uint64_t &boAllPieces, const uint64_t &boMyEnemies, 
                        uint64_t &boChecks, uint64_t &boRookCaptures, uint64_t &boRookMoves&#41; const
  &#123;
    uint64_t mask_E                , mask_N    , mask_S    , mask_X, 
             slide_n   , slide_s   , slide_e   , slide_w   ,
             take_n    , take_s    , take_e    , take_w    ,
                         stop_s    , stop_e    , stop_w    ,
             exclude_n;
    mask_X  = ((&#40;boFrom*BOARD_E&#41;>>56&#41;*BOARD_E&#41;;

    slide_w = (&#40;BOARD_W - boFrom&#41;^BOARD_W&#41;^boFrom;
    slide_e = &#40;boFrom-((&#40;slide_w|boFrom&#41;>>7&#41;&BOARD_E&#41;);

    mask_N = &#40;0 - boFrom&#41;^slide_w^boFrom;
    mask_E = mask_X - BOARD_E;
    mask_S = (~mask_N&#41;>>8;

    slide_n = &#40;boFrom*BOARD_E&#41;^boFrom;
    slide_s = &#40;boFrom*BOARD_E&#41;^mask_X;

    take_e  = slide_e & boAllPieces;
    take_w  = slide_w & boAllPieces;

    stop_e  = MSB_SLIDE&#40;stop_e, take_e, 1, mask_E&#41;;
    stop_w  = FirstBit&#40;take_w&#41;;

    slide_e = slide_e & ~stop_e;
    slide_w = slide_w & &#40;stop_w - 1&#41;;

    take_e &= (&#40;boFrom|slide_e&#41;>>1&#41;;  boChecks  = take_e;  take_e &= boMyEnemies;
    take_w &= (&#40;boFrom|slide_w&#41;<<1&#41;;  boChecks |= take_w;  take_w &= boMyEnemies;

    //-------------------------------------------------N
    take_n  = slide_n & boAllPieces;
    exclude_n = &#40;0-take_n&#41;|take_n;
    slide_n &= ~exclude_n;
    take_n = (&#40;boFrom|slide_n&#41;<<8&#41; & exclude_n;  boChecks |= take_n;  take_n &= boMyEnemies;

    //-------------------------------------------------S
    take_s  = slide_s  & boAllPieces;
    stop_s  = MSB_SLIDE&#40;stop_s, take_s, 8, mask_S&#41;;
    slide_s &= slide_s & ~stop_s;
    take_s &= (&#40;boFrom|slide_s&#41;>>8&#41;;  boChecks |= take_s;  take_s &= boMyEnemies;

    boRookMoves    = slide_n | slide_e | slide_s | slide_w;
    boRookCaptures = take_n  | take_e  | take_s  | take_w;
  &#125;