152k rook and bishop attacks using PEXT and PDEP

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Lasse Hansen
Posts: 27
Joined: Wed May 28, 2008 1:07 pm
Location: Porsgrunn, Norway

152k rook and bishop attacks using PEXT and PDEP

Post by Lasse Hansen »

Hi again!

I got fascinated by the new BMI2 instructions PEXT and PDEP, and wanted to see how much it may compress rook and bishop attacks compared to magic bitboards.

No compression at all gives 841k as already noted, and by using PDEP and shift, one easily comes down to 209k+, also already noted by others.

By using the fact that some attacks are *always* set, I got the total size
down to 152k, combined rook and bishop attacks. The smallest set of a rook attacks can be generated by

Code: Select all

RookAttacksMin[sqr] = KingAttacks[sqr] & MaskRook[sqr];
and they dont need to be in the attack table. For a rook at E4 one only need to store 10 bits, instead of 14 bits as magic bitboards do.

I added the bishop attacks after the rook attacks, and since there were more than enough space left, all bishop attacks could be given the same shift, 55, so they all lie in the higher MSBs of the attacks.
Heres usage and initialisation code.

Code: Select all

	U64 AttacksPDEP[152*1024]; // Attacks table rook + bishop
	U64 MaskRookPDEP[64];	   // Masks for expanding attacks by PDEP
	int RookPosPDEP[64];	   // Basic square
	int RookShiftPDEP[64];	   // Shifts within shared attacks

	U64 RookAttacks(TSquare s, U64 Pcs)
	{ 
		U64 Atk = AttacksPDEP[RookPosPDEP[s] + pext(Pcs, MaskRook64[s])];
		Atk = pdep(Atk>>RookShiftPDEP[s], MaskRookPDEP[s]);
		return Atk | (KingAttacks[s] & MaskRook[s]);
	} 
	U64 BishopAttacks(TSquare s, U64 Pcs) 
	{ 
		U64 Atk = AttacksPDEP[BishPosPDEP[s] + pext(Pcs, MaskBishop64[s])]; 
		Atk = pdep(Atk>>55, MaskBishPDEP[s]);
		return Atk | (KingAttacks[s] & MaskBishop[s]);
	}
Initialisation of the attack positions and shifts were done manually.

Code: Select all

	const TSquare SqrQueRook[64] = {
		H8,A8,H1,A1,		// 12 bit ( 4)
		G8,F8,E8,D8,C8,B8,  // 11 bit (24)
		H7,H6,H5,H4,H3,H2,	
		A7,A6,A5,A4,A3,A2,	
		G1,F1,E1,D1,C1,B1, 									  
		G7,F7,E7,D7,C7,B7,	// 10 bit (36)
		G6,F6,E6,D6,C6,B6,
		G5,F5,E5,D5,C5,B5,
		G4,F4,E4,D4,C4,B4,
		G3,F3,E3,D3,C3,B3,
		G2,F2,E2,D2,C2,B2
	};
	pos = 0;
	RookPosPDEP[H8] =      pos; RookShiftPDEP[H8] =    0; // 12 bit
	RookPosPDEP[A8] =      pos; RookShiftPDEP[A8] =   12;
	RookPosPDEP[A1] =      pos; RookShiftPDEP[A1] =   24;
	RookPosPDEP[H1] =      pos; RookShiftPDEP[H1] =   36;
	RookPosPDEP[G8] =      pos; RookShiftPDEP[G8] =   48; // 11 bit
	RookPosPDEP[F8] = 2048+pos; RookShiftPDEP[F8] =   48; // 59-63 free (5)
	pos += 4096;
	RookPosPDEP[E8] =      pos; RookShiftPDEP[E8] =    0; // 11 bit
	RookPosPDEP[D8] =      pos; RookShiftPDEP[D8] =   11;
	RookPosPDEP[C8] =      pos; RookShiftPDEP[C8] =   22;
	RookPosPDEP[B8] =      pos; RookShiftPDEP[B8] =   33;
	RookPosPDEP[H7] =      pos; RookShiftPDEP[H7] =   44; // 55-63 free (9)
	pos += 2048;

	etc....
	
	for &#40;n = 0; n < 64; n++) &#123;
		sqr = SqrQueRook&#91;n&#93;;
		MaskRookPDEP&#91;sqr&#93; = RookAttacks5&#40;sqr, 0&#41; ^ &#40;KingAttacks&#91;sqr&#93; & MaskRook&#91;sqr&#93;);
		for &#40;i=0; i<64; i++)	
		for &#40;j=0; j<64; j++) &#123;	
			Ptn = GenRookOcc&#40;sqr,i,j&#41;;
			Atk = RookAttacks5&#40;sqr, Ptn&#41;; // Normal attack generation
			Atk ^= KingAttacks&#91;sqr&#93; & MaskRook&#91;sqr&#93;; // Remove always set bits
			Atk = pext&#40;Atk, MaskRookPDEP&#91;sqr&#93;) << RookShiftPDEP&#91;sqr&#93;;
			AttacksPDEP&#91;RookPosPDEP&#91;sqr&#93; + pext&#40;Ptn, MaskRook64&#91;sqr&#93;)&#93; |= Atk;
		&#125;
	&#125;	
	const TSquare SqrQueBish&#91;64&#93; = &#123;
		D5,E5,D4,E4, 						// 9 bit ( 4&#41;
		C6,D6,E6,F6,F5,F4,F3,E3,D3,C3,C4,C5,// 7 bit &#40;12&#41;
		A8,H8,A1,H1, 						// 6 bit ( 4&#41;
		B7,C7,D7,E7,F7,G7,G6,G5,G4,G3,G2, 	// 5 bit &#40;44&#41;
		F2,E2,D2,C2,B2,B3,B4,B5,B6,
		B8,C8,D8,E8,F8,G8,H7,H6,H5,H4,H3,H2,
		G1,F1,E1,D1,C1,B1,A2,A3,A4,A5,A6,A7 
	&#125;;
	pos = RookPosPDEP&#91;E8&#93;; // 9 free bits, starting at 55
	BishPosPDEP&#91;D5&#93; =      pos; BishShiftPDEP&#91;D5&#93; = 55; // 9bit
	BishPosPDEP&#91;E5&#93; =  512+pos; BishShiftPDEP&#91;E5&#93; = 55;
	BishPosPDEP&#91;D4&#93; = 1024+pos; BishShiftPDEP&#91;D4&#93; = 55;
	BishPosPDEP&#91;E4&#93; = 1536+pos; BishShiftPDEP&#91;E4&#93; = 55;
	
	pos = RookPosPDEP&#91;H6&#93;; // 9 free bits, starting at 55
	BishPosPDEP&#91;C6&#93; =      pos; BishShiftPDEP&#91;C6&#93; = 55; // 7 bit
	BishPosPDEP&#91;D6&#93; =  128+pos; BishShiftPDEP&#91;D6&#93; = 55;
	BishPosPDEP&#91;E6&#93; =  256+pos; BishShiftPDEP&#91;E6&#93; = 55;
	BishPosPDEP&#91;F6&#93; =  384+pos; BishShiftPDEP&#91;F6&#93; = 55;
	BishPosPDEP&#91;F5&#93; =  512+pos; BishShiftPDEP&#91;F5&#93; = 55;

	etc....
	
	for &#40;n = 0; n < 64; n++) &#123;
		sqr = SqrQueBish&#91;n&#93;;
		MaskBishPDEP&#91;sqr&#93; = BishopAttacks5&#40;sqr, 0&#41; ^ &#40;KingAttacks&#91;sqr&#93; & MaskBishop&#91;sqr&#93;);
		for &#40;i=0; i<64; i++)	
		for &#40;j=0; j<64; j++) &#123;	
			Ptn = GenBishOcc&#40;sqr,i,j&#41;;
			Atk = BishopAttacks5&#40;sqr, Ptn&#41;;	// Normal attack generation
			Atk ^= KingAttacks&#91;sqr&#93; & MaskBishop&#91;sqr&#93;; // Remove always set bits
			Atk = pext&#40;Atk, MaskBishPDEP&#91;sqr&#93;) << BishShiftPDEP&#91;sqr&#93;;
			AttacksPDEP&#91;BishPosPDEP&#91;sqr&#93; + pext&#40;Ptn, MaskBishop64&#91;sqr&#93;)&#93; |= Atk;
		&#125;
	&#125;	
I had to emulate PEXT and PDEP, so I dont know have fast this will be. I got a slowdown of around 70% in my normal search in Sillycon, compared to magic bitboards.

To get an attack table of around 209k instead of 841k, one need to do an additional PDEP and shift (or??). The difference between 209k and 152k is an extra OR in series. The AND between KingAttacks and Mask might be done in parallell. Also, I believe that KingAttacks and Masks are normally present in the cache, since they are much used.

Anyway it's yet another slider attack generator :-)

Regards, Lasse
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: 152k rook and bishop attacks using PEXT and PDEP

Post by syzygy »

Lasse Hansen wrote:To get an attack table of around 209k instead of 841k, one need to do an additional PDEP and shift (or??).
I have no shift:
https://github.com/syzygy1/tb/blob/master/src/bmi2.h

My table is 210.5k, but I didn't bother (yet) to remove some redundancies.

By now there should be lots of people with Haswell machines. Maybe the Stockfish team (for example) could add pext/pdep generation as a compilation option, so that we finally get some numbers?
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: 152k rook and bishop attacks using PEXT and PDEP

Post by Gerd Isenberg »

Hi Lasse,

interesting idea to omit adjacent attack bits in pdep/pext which are independent from occupancy, and to calculate them separately. Considering the huge L3, my wild guess is that a 800+K plain pext approach will be fastest in a real engine ;-)

Cheers,
Gerd
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: 152k rook and bishop attacks using PEXT and PDEP

Post by syzygy »

Gerd Isenberg wrote:Considering the huge L3, my wild guess is that a 800+K plain pext approach will be fastest in a real engine ;-)
For my table generator, my wild guess is that the 200+K pdep/pext approach is fastest. In my chess engine, shiftless magic bitboards with compact tables by Volker Annuss easily beats the "Hyperbola Quintessence" approach, but for my generator they are about equal (possibly Hyperbola is faster on AMD). So memory usage seems more of an issue for my generator.

Maybe pext for bishops, pdep/pext for rooks.

My plan is to wait for Haswell-E...