speed question about strelka

Discussion of chess software programming and technical issues.

Moderator: Ras

Uri Blass
Posts: 10792
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

speed question about strelka

Post by Uri Blass »

strelka has the following code

Code: Select all

const unsigned char PawnPassedFile[8] = {0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80};
My question is if it is not better to write simply (1<<) instead of an array.

Note that strelka is using often code like
(unsigned __int64)1 << sq instead of using an array setmask[sq]

I wonder if there is a speed reason to prefer an array when the array is arrays of chars of size 8 and to prefer 1<< when the array is array of 64 bit numbers of size 64.

Uri
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: speed question about strelka

Post by Gerd Isenberg »

Uri Blass wrote:strelka has the following code

Code: Select all

const unsigned char PawnPassedFile[8] = {0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80};
My question is if it is not better to write simply (1<<) instead of an array.

Note that strelka is using often code like
(unsigned __int64)1 << sq instead of using an array setmask[sq]

I wonder if there is a speed reason to prefer an array when the array is arrays of chars of size 8 and to prefer 1<< when the array is array of 64 bit numbers of size 64.

Uri
It is likely negligible, specially if processing pawn-hashed stuff.
You may simply try both variations (or ignore it). It is about a good "balance" between memory lookup and computation for best scheduling and instructions per cycle. If you have high register pressure a lookup of 8 bytes is likely better.

Code: Select all

mov   al, byte ptr [PawnPassedFile + ecx]
or with zero extension to eax

Code: Select all

movzx eax, byte ptr [PawnPassedFile + ecx]
If you have high memory pressure in some context, you may better use computation en-passant while "waiting" for some memory read/write.

Code: Select all

mov al, 1
shl al, cl
Similar for single populated bitboards. In 32-bit mode is seems faster to use lookups:

Code: Select all

; 205  :         if (att & (1 << k)) LineMask[3][i][j] |= (unsigned __int64)1 << sq;
  0029e	b8 01 00 00 00	 mov	 eax, 1
  002a3	8b cf		 mov	 ecx, edi
  002a5	d3 e0		 shl	 eax, cl
  002a7	85 c5		 test	 eax, ebp
  002a9	74 1c		 je	 SHORT $L53879
  002ab	b8 01 00 00 00	 mov	 eax, 1
  002b0	33 d2		 xor	 edx, edx
  002b2	8b cb		 mov	 ecx, ebx
  002b4	e8 00 00 00 00	 call	 __allshl
  002b9	8b 0e		 mov	 ecx, DWORD PTR [esi]
  002bb	0b c8		 or	 ecx, eax
  002bd	8b 46 04	 mov	 eax, DWORD PTR [esi+4]
  002c0	0b c2		 or	 eax, edx
  002c2	89 0e		 mov	 DWORD PTR [esi], ecx
  002c4	89 46 04	 mov	 DWORD PTR [esi+4], eax
$L53879:
  002c7	47		 inc	 edi
  002c8	83 eb 08	 sub	 ebx, 8
  002cb	83 ff 08	 cmp	 edi, 8
  002ce	7c ce		 jl	 SHORT $L53878
Btw. have you understood how the pawn-shield stuff works?
grant
Posts: 67
Joined: Mon Aug 06, 2007 4:42 pm
Location: London, England

Re: speed question about strelka

Post by grant »

Hi

Interesting that you can use magic to calculate the pawn shield.

If the white king is on the 1st rank (or 2nd rank as I do), shift the WhitePawns bitboard to the left then AND with 0x0000000000000707. Then use magic to calculate an index into an array[64] containing a value representing the strength of the shield which, is then converted into a score. This is just part of my king safety.
My board is H1 = 0, H8 = 7, A8 = 63.

Code: Select all

square = WhiteKingSq;
		col = COL(square);
		if (col == 0)
			shift = (square > 47) ? square - 16 : 0;
		else if (col == 7)
			shift = (square > 47) ? square - 18 : 0;
		else
			shift = (square > 47) ? square - 17 : 0;
		if (shift) {
			U64 tempa = (WhitePawns >> shift) & 0x0000000000000707;
			U64 tempb = ((tempa & 7) << 8);
			tempa = (tempa >> 8) | tempb;
			index = (unsigned int)(tempa) * KING_SHIELD_MAGIC;
			index = index >> 26;
			lookup = shield[index];
			if (square < 56)
				lookup = lookup >> 1;
			score = shieldValue[lookup];
		}
Grant
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: speed question about strelka

Post by Gerd Isenberg »

grant wrote:Hi

Interesting that you can use magic to calculate the pawn shield.

If the white king is on the 1st rank (or 2nd rank as I do), shift the WhitePawns bitboard to the left then AND with 0x0000000000000707. Then use magic to calculate an index into an array[64] containing a value representing the strength of the shield which, is then converted into a score. This is just part of my king safety.
My board is H1 = 0, H8 = 7, A8 = 63.

Code: Select all

square = WhiteKingSq;
		col = COL(square);
		if (col == 0)
			shift = (square > 47) ? square - 16 : 0;
		else if (col == 7)
			shift = (square > 47) ? square - 18 : 0;
		else
			shift = (square > 47) ? square - 17 : 0;
		if (shift) {
			U64 tempa = (WhitePawns >> shift) & 0x0000000000000707;
			U64 tempb = ((tempa & 7) << 8);
			tempa = (tempa >> 8) | tempb;
			index = (unsigned int)(tempa) * KING_SHIELD_MAGIC;
			index = index >> 26;
			lookup = shield[index];
			if (square < 56)
				lookup = lookup >> 1;
			score = shieldValue[lookup];
		}
Grant
Hi Grant,

Yes, i had similar in mind in the wiki...
http://chessprogramming.wikispaces.com/ ... 20Pattern4
- but it will be updated in the future.

Probably Strelka is a bit more sophisticated. It considers abc-, def- and fgh-files, four ranks each for both sides. Pawn-shields are calculated for center and wings and are member of pawn-hash:

Code: Select all

struct pawn_info_t * pawn_get_info()
{
  ....
  // We estimate pawn covering king on flanks and-or in the center
  mask = Board->mp[BlackPawn];
  mask1=((mask>>23)&0x0E00)|((mask>>18)&0x01C0)|((mask>>13)&0x0038)|((mask>>8)&0x0007);
  mask = Board->mp[WhitePawn];
  mask2=((mask>>23)&0x0E00)|((mask>>18)&0x01C0)|((mask>>13)&0x0038)|((mask>>8)&0x0007);
  entry->sheet_white_king[0] = PawnStruScore1[mask1] + PawnStruScore0[mask2];
  mask = Board->mp[BlackPawn];
  ...
} 
Instead four 64-bit shifts, four 32-bit "ands" and three 32-bit ors ...

Code: Select all

mask = 
mask1 = ((mask >> 23) & 0x0E00) 
      | ((mask >> 18) & 0x01C0) 
      | ((mask >> 13) & 0x0038) 
      | ((mask >>  8) & 0x0007);

 . . . . . . . .     . . . . . . . .
 . . . . . . . .     . . . . . . . .
 . . . . . . . .     . . . . . . . .
 j k l . . . . .     . . . . . . . .
 g h i . . . . .     . . . . . . . .
 d e f . . . . .     . . . . . . . .
 a b c . . . . .     i j k l . . . .
 . . . . . . . .     a b c d e f g h 

... one may cheaper use pre-mask and 32-bit imul, shift:

Code: Select all

 u32 mask1 = (u32) (mask >> 8);
 mask1 &= 0x07070707;
 mask1 *= 0x00108420;
 mask1 >>= 20;
Since the code only occurs in pawn-hash miss cases, possible savings are negligible of course.

Cheers,
Gerd
Uri Blass
Posts: 10792
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: speed question about strelka

Post by Uri Blass »

Gerd Isenberg wrote:
Uri Blass wrote:strelka has the following code

Code: Select all

const unsigned char PawnPassedFile[8] = {0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80};
My question is if it is not better to write simply (1<<) instead of an array.

Note that strelka is using often code like
(unsigned __int64)1 << sq instead of using an array setmask[sq]

I wonder if there is a speed reason to prefer an array when the array is arrays of chars of size 8 and to prefer 1<< when the array is array of 64 bit numbers of size 64.

Uri
It is likely negligible, specially if processing pawn-hashed stuff.
You may simply try both variations (or ignore it). It is about a good "balance" between memory lookup and computation for best scheduling and instructions per cycle. If you have high register pressure a lookup of 8 bytes is likely better.

Code: Select all

mov   al, byte ptr [PawnPassedFile + ecx]
or with zero extension to eax

Code: Select all

movzx eax, byte ptr [PawnPassedFile + ecx]
If you have high memory pressure in some context, you may better use computation en-passant while "waiting" for some memory read/write.

Code: Select all

mov al, 1
shl al, cl
Similar for single populated bitboards. In 32-bit mode is seems faster to use lookups:

Code: Select all

; 205  :         if (att & (1 << k)) LineMask[3][i][j] |= (unsigned __int64)1 << sq;
  0029e	b8 01 00 00 00	 mov	 eax, 1
  002a3	8b cf		 mov	 ecx, edi
  002a5	d3 e0		 shl	 eax, cl
  002a7	85 c5		 test	 eax, ebp
  002a9	74 1c		 je	 SHORT $L53879
  002ab	b8 01 00 00 00	 mov	 eax, 1
  002b0	33 d2		 xor	 edx, edx
  002b2	8b cb		 mov	 ecx, ebx
  002b4	e8 00 00 00 00	 call	 __allshl
  002b9	8b 0e		 mov	 ecx, DWORD PTR [esi]
  002bb	0b c8		 or	 ecx, eax
  002bd	8b 46 04	 mov	 eax, DWORD PTR [esi+4]
  002c0	0b c2		 or	 eax, edx
  002c2	89 0e		 mov	 DWORD PTR [esi], ecx
  002c4	89 46 04	 mov	 DWORD PTR [esi+4], eax
$L53879:
  002c7	47		 inc	 edi
  002c8	83 eb 08	 sub	 ebx, 8
  002cb	83 ff 08	 cmp	 edi, 8
  002ce	7c ce		 jl	 SHORT $L53878
Btw. have you understood how the pawn-shield stuff works?
I think that I understood the idea.
The idea is simply to have scores for
attacking the king and defending the king by pawns based on pawn structure near the king.

PawnStruScore1 can be called pawn_attack_king and
pawnStruScore0 can be called pawn_defence_king



Code: Select all

 mask1 = ((mask >> 23) & 0x0E00) | ((mask >> 18) & 0x01C0) | ((mask >> 13) & 0x0038) | ((mask >> 8) & 0x0007);
This code simply take a bitboard and translate it to a single number of 12 bits to give information about part of the bits(the bits that represent
A5 B5 C5 A4 B4 C4 A3 B3 C3 A2 B2 C2)

This code is about black pawns so the score are scores for attacking the white king.

There is a similiar score for white pawns to defend the white king.
The same is done also for files D E F and for files F G H.

Uri
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: speed question about strelka

Post by bob »

Uri Blass wrote:
Gerd Isenberg wrote:
Uri Blass wrote:strelka has the following code

Code: Select all

const unsigned char PawnPassedFile[8] = {0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80};
My question is if it is not better to write simply (1<<) instead of an array.

Note that strelka is using often code like
(unsigned __int64)1 << sq instead of using an array setmask[sq]

I wonder if there is a speed reason to prefer an array when the array is arrays of chars of size 8 and to prefer 1<< when the array is array of 64 bit numbers of size 64.

Uri
It is likely negligible, specially if processing pawn-hashed stuff.
You may simply try both variations (or ignore it). It is about a good "balance" between memory lookup and computation for best scheduling and instructions per cycle. If you have high register pressure a lookup of 8 bytes is likely better.

Code: Select all

mov   al, byte ptr [PawnPassedFile + ecx]
or with zero extension to eax

Code: Select all

movzx eax, byte ptr [PawnPassedFile + ecx]
If you have high memory pressure in some context, you may better use computation en-passant while "waiting" for some memory read/write.

Code: Select all

mov al, 1
shl al, cl
Similar for single populated bitboards. In 32-bit mode is seems faster to use lookups:

Code: Select all

; 205  :         if (att & (1 << k)) LineMask[3][i][j] |= (unsigned __int64)1 << sq;
  0029e	b8 01 00 00 00	 mov	 eax, 1
  002a3	8b cf		 mov	 ecx, edi
  002a5	d3 e0		 shl	 eax, cl
  002a7	85 c5		 test	 eax, ebp
  002a9	74 1c		 je	 SHORT $L53879
  002ab	b8 01 00 00 00	 mov	 eax, 1
  002b0	33 d2		 xor	 edx, edx
  002b2	8b cb		 mov	 ecx, ebx
  002b4	e8 00 00 00 00	 call	 __allshl
  002b9	8b 0e		 mov	 ecx, DWORD PTR [esi]
  002bb	0b c8		 or	 ecx, eax
  002bd	8b 46 04	 mov	 eax, DWORD PTR [esi+4]
  002c0	0b c2		 or	 eax, edx
  002c2	89 0e		 mov	 DWORD PTR [esi], ecx
  002c4	89 46 04	 mov	 DWORD PTR [esi+4], eax
$L53879:
  002c7	47		 inc	 edi
  002c8	83 eb 08	 sub	 ebx, 8
  002cb	83 ff 08	 cmp	 edi, 8
  002ce	7c ce		 jl	 SHORT $L53878
Btw. have you understood how the pawn-shield stuff works?
I think that I understood the idea.
The idea is simply to have scores for
attacking the king and defending the king by pawns based on pawn structure near the king.

PawnStruScore1 can be called pawn_attack_king and
pawnStruScore0 can be called pawn_defence_king



Code: Select all

 mask1 = ((mask >> 23) & 0x0E00) | ((mask >> 18) & 0x01C0) | ((mask >> 13) & 0x0038) | ((mask >> 8) & 0x0007);
This code simply take a bitboard and translate it to a single number of 12 bits to give information about part of the bits(the bits that represent
A5 B5 C5 A4 B4 C4 A3 B3 C3 A2 B2 C2)

This code is about black pawns so the score are scores for attacking the white king.

There is a similiar score for white pawns to defend the white king.
The same is done also for files D E F and for files F G H.

Uri
The idea is cute, although it has no real benefit for a program that does pawn score hashing. I compute similar scores as above, for four groups of pawns depending on where the king stands. But they are all hashed (I do them all, and then use the right one outside the hashing since it depends on the king position). As a result, while the above would let me compute static defect evaluations and stuff them in a 4096 element array, the saving would be nil since hashing eliminates almost all pawn scoring overhead anyway...