speed question about strelka

Uri Blass · Post by **Uri Blass** » Sun Jan 27, 2008 3:16 pm

strelka has the following code

const unsigned char PawnPassedFile[8] = {0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80};

My question is if it is not better to write simply (1<<) instead of an array.

Note that strelka is using often code like
(unsigned __int64)1 << sq instead of using an array setmask[sq]

I wonder if there is a speed reason to prefer an array when the array is arrays of chars of size 8 and to prefer 1<< when the array is array of 64 bit numbers of size 64.

Uri

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Jan 28, 2008 8:30 am

Uri Blass wrote:strelka has the following code
Code: Select all
const unsigned char PawnPassedFile[8] = {0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80};
My question is if it is not better to write simply (1<<) instead of an array.

Note that strelka is using often code like
(unsigned __int64)1 << sq instead of using an array setmask[sq]

I wonder if there is a speed reason to prefer an array when the array is arrays of chars of size 8 and to prefer 1<< when the array is array of 64 bit numbers of size 64.

Uri

It is likely negligible, specially if processing pawn-hashed stuff.
You may simply try both variations (or ignore it). It is about a good "balance" between memory lookup and computation for best scheduling and instructions per cycle. If you have high register pressure a lookup of 8 bytes is likely better.

Code: Select all

mov   al, byte ptr [PawnPassedFile + ecx]

or with zero extension to eax

Code: Select all

movzx eax, byte ptr [PawnPassedFile + ecx]

If you have high memory pressure in some context, you may better use computation en-passant while "waiting" for some memory read/write.

Code: Select all

mov al, 1
shl al, cl

Similar for single populated bitboards. In 32-bit mode is seems faster to use lookups:

Code: Select all

; 205  :         if (att & (1 << k)) LineMask[3][i][j] |= (unsigned __int64)1 << sq;
  0029e	b8 01 00 00 00	 mov	 eax, 1
  002a3	8b cf		 mov	 ecx, edi
  002a5	d3 e0		 shl	 eax, cl
  002a7	85 c5		 test	 eax, ebp
  002a9	74 1c		 je	 SHORT $L53879
  002ab	b8 01 00 00 00	 mov	 eax, 1
  002b0	33 d2		 xor	 edx, edx
  002b2	8b cb		 mov	 ecx, ebx
  002b4	e8 00 00 00 00	 call	 __allshl
  002b9	8b 0e		 mov	 ecx, DWORD PTR [esi]
  002bb	0b c8		 or	 ecx, eax
  002bd	8b 46 04	 mov	 eax, DWORD PTR [esi+4]
  002c0	0b c2		 or	 eax, edx
  002c2	89 0e		 mov	 DWORD PTR [esi], ecx
  002c4	89 46 04	 mov	 DWORD PTR [esi+4], eax
$L53879:
  002c7	47		 inc	 edi
  002c8	83 eb 08	 sub	 ebx, 8
  002cb	83 ff 08	 cmp	 edi, 8
  002ce	7c ce		 jl	 SHORT $L53878

Btw. have you understood how the pawn-shield stuff works?

grant · Post by **grant** » Mon Jan 28, 2008 12:55 pm

Hi

Interesting that you can use magic to calculate the pawn shield.

If the white king is on the 1st rank (or 2nd rank as I do), shift the WhitePawns bitboard to the left then AND with 0x0000000000000707. Then use magic to calculate an index into an array[64] containing a value representing the strength of the shield which, is then converted into a score. This is just part of my king safety.
My board is H1 = 0, H8 = 7, A8 = 63.

Code: Select all

square = WhiteKingSq;
		col = COL(square);
		if (col == 0)
			shift = (square > 47) ? square - 16 : 0;
		else if (col == 7)
			shift = (square > 47) ? square - 18 : 0;
		else
			shift = (square > 47) ? square - 17 : 0;
		if (shift) {
			U64 tempa = (WhitePawns >> shift) & 0x0000000000000707;
			U64 tempb = ((tempa & 7) << 8);
			tempa = (tempa >> 8) | tempb;
			index = (unsigned int)(tempa) * KING_SHIELD_MAGIC;
			index = index >> 26;
			lookup = shield[index];
			if (square < 56)
				lookup = lookup >> 1;
			score = shieldValue[lookup];
		}

Grant

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Jan 28, 2008 8:43 pm

grant wrote:Hi

Interesting that you can use magic to calculate the pawn shield.

If the white king is on the 1st rank (or 2nd rank as I do), shift the WhitePawns bitboard to the left then AND with 0x0000000000000707. Then use magic to calculate an index into an array[64] containing a value representing the strength of the shield which, is then converted into a score. This is just part of my king safety.
My board is H1 = 0, H8 = 7, A8 = 63.
Code: Select all
square = WhiteKingSq;
		col = COL(square);
		if (col == 0)
			shift = (square > 47) ? square - 16 : 0;
		else if (col == 7)
			shift = (square > 47) ? square - 18 : 0;
		else
			shift = (square > 47) ? square - 17 : 0;
		if (shift) {
			U64 tempa = (WhitePawns >> shift) & 0x0000000000000707;
			U64 tempb = ((tempa & 7) << 8);
			tempa = (tempa >> 8) | tempb;
			index = (unsigned int)(tempa) * KING_SHIELD_MAGIC;
			index = index >> 26;
			lookup = shield[index];
			if (square < 56)
				lookup = lookup >> 1;
			score = shieldValue[lookup];
		}
Grant

Hi Grant,

Yes, i had similar in mind in the wiki...
http://chessprogramming.wikispaces.com/ ... 20Pattern4
- but it will be updated in the future.

Probably Strelka is a bit more sophisticated. It considers abc-, def- and fgh-files, four ranks each for both sides. Pawn-shields are calculated for center and wings and are member of pawn-hash:

Code: Select all

struct pawn_info_t * pawn_get_info()
{
  ....
  // We estimate pawn covering king on flanks and-or in the center
  mask = Board->mp[BlackPawn];
  mask1=((mask>>23)&0x0E00)|((mask>>18)&0x01C0)|((mask>>13)&0x0038)|((mask>>8)&0x0007);
  mask = Board->mp[WhitePawn];
  mask2=((mask>>23)&0x0E00)|((mask>>18)&0x01C0)|((mask>>13)&0x0038)|((mask>>8)&0x0007);
  entry->sheet_white_king[0] = PawnStruScore1[mask1] + PawnStruScore0[mask2];
  mask = Board->mp[BlackPawn];
  ...
}

Instead four 64-bit shifts, four 32-bit "ands" and three 32-bit ors ...

Code: Select all

mask = 
mask1 = ((mask >> 23) & 0x0E00) 
      | ((mask >> 18) & 0x01C0) 
      | ((mask >> 13) & 0x0038) 
      | ((mask >>  8) & 0x0007);

 . . . . . . . .     . . . . . . . .
 . . . . . . . .     . . . . . . . .
 . . . . . . . .     . . . . . . . .
 j k l . . . . .     . . . . . . . .
 g h i . . . . .     . . . . . . . .
 d e f . . . . .     . . . . . . . .
 a b c . . . . .     i j k l . . . .
 . . . . . . . .     a b c d e f g h

... one may cheaper use pre-mask and 32-bit imul, shift:

Code: Select all

 u32 mask1 = (u32) (mask >> 8);
 mask1 &= 0x07070707;
 mask1 *= 0x00108420;
 mask1 >>= 20;

Since the code only occurs in pawn-hash miss cases, possible savings are negligible of course.

Cheers,
Gerd

Uri Blass · Post by **Uri Blass** » Mon Jan 28, 2008 11:07 pm

Gerd Isenberg wrote:
Uri Blass wrote:strelka has the following code
Code: Select all
const unsigned char PawnPassedFile[8] = {0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80};
My question is if it is not better to write simply (1<<) instead of an array.

Note that strelka is using often code like
(unsigned __int64)1 << sq instead of using an array setmask[sq]

I wonder if there is a speed reason to prefer an array when the array is arrays of chars of size 8 and to prefer 1<< when the array is array of 64 bit numbers of size 64.

Uri
It is likely negligible, specially if processing pawn-hashed stuff.
You may simply try both variations (or ignore it). It is about a good "balance" between memory lookup and computation for best scheduling and instructions per cycle. If you have high register pressure a lookup of 8 bytes is likely better.
Code: Select all
mov   al, byte ptr [PawnPassedFile + ecx]
or with zero extension to eax
Code: Select all
movzx eax, byte ptr [PawnPassedFile + ecx]
If you have high memory pressure in some context, you may better use computation en-passant while "waiting" for some memory read/write.
Code: Select all
mov al, 1
shl al, cl
Similar for single populated bitboards. In 32-bit mode is seems faster to use lookups:
Code: Select all
; 205  :         if (att & (1 << k)) LineMask[3][i][j] |= (unsigned __int64)1 << sq;
  0029e	b8 01 00 00 00	 mov	 eax, 1
  002a3	8b cf		 mov	 ecx, edi
  002a5	d3 e0		 shl	 eax, cl
  002a7	85 c5		 test	 eax, ebp
  002a9	74 1c		 je	 SHORT $L53879
  002ab	b8 01 00 00 00	 mov	 eax, 1
  002b0	33 d2		 xor	 edx, edx
  002b2	8b cb		 mov	 ecx, ebx
  002b4	e8 00 00 00 00	 call	 __allshl
  002b9	8b 0e		 mov	 ecx, DWORD PTR [esi]
  002bb	0b c8		 or	 ecx, eax
  002bd	8b 46 04	 mov	 eax, DWORD PTR [esi+4]
  002c0	0b c2		 or	 eax, edx
  002c2	89 0e		 mov	 DWORD PTR [esi], ecx
  002c4	89 46 04	 mov	 DWORD PTR [esi+4], eax
$L53879:
  002c7	47		 inc	 edi
  002c8	83 eb 08	 sub	 ebx, 8
  002cb	83 ff 08	 cmp	 edi, 8
  002ce	7c ce		 jl	 SHORT $L53878
Btw. have you understood how the pawn-shield stuff works?

I think that I understood the idea.
The idea is simply to have scores for
attacking the king and defending the king by pawns based on pawn structure near the king.

PawnStruScore1 can be called pawn_attack_king and
pawnStruScore0 can be called pawn_defence_king

Code: Select all

 mask1 = ((mask >> 23) & 0x0E00) | ((mask >> 18) & 0x01C0) | ((mask >> 13) & 0x0038) | ((mask >> 8) & 0x0007);

This code simply take a bitboard and translate it to a single number of 12 bits to give information about part of the bits(the bits that represent
A5 B5 C5 A4 B4 C4 A3 B3 C3 A2 B2 C2)

This code is about black pawns so the score are scores for attacking the white king.

There is a similiar score for white pawns to defend the white king.
The same is done also for files D E F and for files F G H.

Uri

bob · Post by **bob** » Tue Jan 29, 2008 9:49 pm

Uri Blass wrote:
Gerd Isenberg wrote:
Uri Blass wrote:strelka has the following code
Code: Select all
const unsigned char PawnPassedFile[8] = {0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80};
My question is if it is not better to write simply (1<<) instead of an array.

Note that strelka is using often code like
(unsigned __int64)1 << sq instead of using an array setmask[sq]

I wonder if there is a speed reason to prefer an array when the array is arrays of chars of size 8 and to prefer 1<< when the array is array of 64 bit numbers of size 64.

Uri
It is likely negligible, specially if processing pawn-hashed stuff.
You may simply try both variations (or ignore it). It is about a good "balance" between memory lookup and computation for best scheduling and instructions per cycle. If you have high register pressure a lookup of 8 bytes is likely better.
Code: Select all
mov   al, byte ptr [PawnPassedFile + ecx]
or with zero extension to eax
Code: Select all
movzx eax, byte ptr [PawnPassedFile + ecx]
If you have high memory pressure in some context, you may better use computation en-passant while "waiting" for some memory read/write.
Code: Select all
mov al, 1
shl al, cl
Similar for single populated bitboards. In 32-bit mode is seems faster to use lookups:
Code: Select all
; 205  :         if (att & (1 << k)) LineMask[3][i][j] |= (unsigned __int64)1 << sq;
  0029e	b8 01 00 00 00	 mov	 eax, 1
  002a3	8b cf		 mov	 ecx, edi
  002a5	d3 e0		 shl	 eax, cl
  002a7	85 c5		 test	 eax, ebp
  002a9	74 1c		 je	 SHORT $L53879
  002ab	b8 01 00 00 00	 mov	 eax, 1
  002b0	33 d2		 xor	 edx, edx
  002b2	8b cb		 mov	 ecx, ebx
  002b4	e8 00 00 00 00	 call	 __allshl
  002b9	8b 0e		 mov	 ecx, DWORD PTR [esi]
  002bb	0b c8		 or	 ecx, eax
  002bd	8b 46 04	 mov	 eax, DWORD PTR [esi+4]
  002c0	0b c2		 or	 eax, edx
  002c2	89 0e		 mov	 DWORD PTR [esi], ecx
  002c4	89 46 04	 mov	 DWORD PTR [esi+4], eax
$L53879:
  002c7	47		 inc	 edi
  002c8	83 eb 08	 sub	 ebx, 8
  002cb	83 ff 08	 cmp	 edi, 8
  002ce	7c ce		 jl	 SHORT $L53878
Btw. have you understood how the pawn-shield stuff works?
I think that I understood the idea.
The idea is simply to have scores for
attacking the king and defending the king by pawns based on pawn structure near the king.

PawnStruScore1 can be called pawn_attack_king and
pawnStruScore0 can be called pawn_defence_king
Code: Select all
 mask1 = ((mask >> 23) & 0x0E00) | ((mask >> 18) & 0x01C0) | ((mask >> 13) & 0x0038) | ((mask >> 8) & 0x0007);
This code simply take a bitboard and translate it to a single number of 12 bits to give information about part of the bits(the bits that represent
A5 B5 C5 A4 B4 C4 A3 B3 C3 A2 B2 C2)

This code is about black pawns so the score are scores for attacking the white king.

There is a similiar score for white pawns to defend the white king.
The same is done also for files D E F and for files F G H.

Uri

The idea is cute, although it has no real benefit for a program that does pawn score hashing. I compute similar scores as above, for four groups of pawns depending on where the king stands. But they are all hashed (I do them all, and then use the right one outside the hashing since it depends on the king position). As a result, while the above would let me compute static defect evaluations and stuff them in a 4096 element array, the saving would be nil since hashing eliminates almost all pawn scoring overhead anyway...

speed question about strelka

speed question about strelka

Re: speed question about strelka

Re: speed question about strelka

Re: speed question about strelka

Re: speed question about strelka

Re: speed question about strelka