speed question about strelka

Uri Blass · Post by **Uri Blass** » Sat Jan 12, 2008 8:59 pm

Strelkla include the following code

if ((square & 0xFFFFFFF8) == 0x30)

square always get numbers that are 0-63 based on my understanding so
This code is practically equivalent to
if ((square & 56) == 48)

My question is if there is speed reason to use FFFF numbers because before changing the code of strelka to be more simple I would like to know if there is a speed reason for decisions that I do not understand.

Uri

Dirt · Post by **Dirt** » Sat Jan 12, 2008 9:11 pm

Uri Blass wrote:Strelkla include the following code

if ((square & 0xFFFFFFF8) == 0x30)

square always get numbers that are 0-63 based on my understanding so
This code is practically equivalent to
if ((square & 56) == 48)

My question is if there is speed reason to use FFFF numbers because before changing the code of strelka to be more simple I would like to know if there is a speed reason for decisions that I do not understand.

Uri

You find
if ((square & 56) == 48)
simpler? I find myself immediately converting it to hex to try and understand it. I might prefer 0xFFF8, if it really works the same.

Uri Blass · Post by **Uri Blass** » Sat Jan 12, 2008 9:25 pm

Dirt wrote:
Uri Blass wrote:Strelkla include the following code

if ((square & 0xFFFFFFF8) == 0x30)

square always get numbers that are 0-63 based on my understanding so
This code is practically equivalent to
if ((square & 56) == 48)

My question is if there is speed reason to use FFFF numbers because before changing the code of strelka to be more simple I would like to know if there is a speed reason for decisions that I do not understand.

Uri
You find
if ((square & 56) == 48)
simpler? I find myself immediately converting it to hex to try and understand it. I might prefer 0xFFF8, if it really works the same.

I usually use base 10 and not base 16 and I prefer to see numbers in base 10 when they are small(and I consider 48 as small) but this is not the only point.

1)You can use 0x38 or 0xF8 and no reason to use many F's and my question is if there is a speed reason not to do it.

2)It is easy for me to translate 56 to 32+16+8 and translating 0xFFF8 is less intuitive for me(I am not used to divide number in base 16 to sum of power of 2 and only if the numbers are really big it is more simple for me).

I can see that square&56=48 means 7th rank more easily.

My thinking is the following:

square&56=48 means square&32=32 square&16=16 square&8=0
1)0<=square<=63
2)square&32=32
3)square&16=16
4)square&8=0

1,2->
5)63>=square>=32
3,5->
6)63>=square>=48

4,6->
7)56>square>=48 or in other words square is in the 7th rank

Uri

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Jan 12, 2008 10:30 pm

It is a minor issue and might depend on the compiler. Some compiler seem to produce better code with native register width constants - even if encoded as sign extended byte inside the opcode.

0xfffffff8 or ~7 or -8 is used to mask off the three file bits, to look whether the square is on the seventh rank.

Code: Select all

; 110  :     if ((square & 0xFFFFFFF8) == 0x30) {    // 7.th rank 
  004b5	83 e6 f8   and   esi, -8	; fffffff8H
  004b8	83 fe 30   cmp   esi, 48	; 00000030H
  004bb	75 30      jne   SHORT $L54996

If one uses 0x38 or 56, the vc6 compiler uses more constrained register allocation to take "advantage" of partial x86 byte-registers for the compare, which seems microscopic worse.

Code: Select all

; 110  :     if ((square & 0x38) == 0x30) {    // 7.th rank
  004a8 8b d7      mov   edx, edi
  004aa 83 e2 38   and   edx, 56        ; 00000038H
  004ad 80 fa 30   cmp   dl, 48         ; 00000030H
  004b0 75 36      jne   SHORT $L54996

Using 0xfff8, or even 0xf8 for instance has the drawback of 3 byte larger opcode, since they are no longer in the -128...+127 range for shorter x86 opcode.

For clearing least significant bits, 'and' with -2, -4, -8, -16 ... is more common, since they are independent on the width of the register. It works with signed/unsigned char, short, int and long long.

For bitmasks or sets like bitboards hex-constants have the advantage to easier see the bit-pattern, since each hex-digit covers exactly four bits.

Code: Select all

hex binary
0x0 0000B
0x1 0001B
0x2 0010B
0x3 0011B
0x4 0100B
0x5 0101B
0x6 0110B
0x7 0111B
0x8 1000B
0x9 1001B
0xA 1010B
0xB 1011B
0xC 1100B
0xD 1101B
0xE 1110B
0xF 1111B

Two hex-digits cover a byte or rank with bitboards. Thus, 0x0102040810204080 versus 72624976668147840 is probably more intuitive

.

Uri Blass · Post by **Uri Blass** » Sat Jan 12, 2008 11:07 pm

Gerd Isenberg wrote:It is a minor issue and might depend on the compiler. Some compiler seem to produce better code with native register width constants - even if encoded as sign extended byte inside the opcode.

0xfffffff8 or ~7 or -8 is used to mask off the three file bits, to look whether the square is on the seventh rank.
Code: Select all
; 110  :     if ((square & 0xFFFFFFF8) == 0x30) {    // 7.th rank 
  004b5	83 e6 f8   and   esi, -8	; fffffff8H
  004b8	83 fe 30   cmp   esi, 48	; 00000030H
  004bb	75 30      jne   SHORT $L54996
If one uses 0x38 or 56, the vc6 compiler uses more constrained register allocation to take "advantage" of partial x86 byte-registers for the compare, which seems microscopic worse.
Code: Select all
; 110  :     if ((square & 0x38) == 0x30) {    // 7.th rank
  004a8 8b d7      mov   edx, edi
  004aa 83 e2 38   and   edx, 56        ; 00000038H
  004ad 80 fa 30   cmp   dl, 48         ; 00000030H
  004b0 75 36      jne   SHORT $L54996
Using 0xfff8, or even 0xf8 for instance has the drawback of 3 byte larger opcode, since they are no longer in the -128...+127 range for shorter x86 opcode.

For clearing least significant bits, 'and' with -2, -4, -8, -16 ... is more common, since they are independent on the width of the register. It works with signed/unsigned char, short, int and long long.

For bitmasks or sets like bitboards hex-constants have the advantage to easier see the bit-pattern, since each hex-digit covers exactly four bits.
Code: Select all
hex binary
0x0 0000B
0x1 0001B
0x2 0010B
0x3 0011B
0x4 0100B
0x5 0101B
0x6 0110B
0x7 0111B
0x8 1000B
0x9 1001B
0xA 1010B
0xB 1011B
0xC 1100B
0xD 1101B
0xE 1110B
0xF 1111B
Two hex-digits cover a byte or rank with bitboards. Thus, 0x0102040810204080 versus 72624976668147840 is probably more intuitive .

I totally agree that for big numbers 0x is more intuitive.
My conclusion is that it is best to replace the 0xfffffff8 simply by ~7 and get code like
if ((square & ~7) == 48)

It seems strange to me that
(square & ~7) == 48 is slightly faster than (square&56)==48 because without knowing in the first place that 0<=square<=63
I need to check every bit of square to calculate square&~7 when for square&56 I can ignore most bits.

Edit:note that I understood that computers use strange logic when number of bits that they need to calculate is not important for them.

I wonder if this is because of non optimal decision in building hardware and
I also wonder if the compiler has no way to earn speed from the knowledge that 0<=square<=63

Uri

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Jan 12, 2008 11:44 pm

Uri Blass wrote: I totally agree that for big numbers 0x is more intuitive.
My conclusion is that it is best to replace the 0xfffffff8 simply by ~7 and get code like
if ((square & ~7) == 48)

Yes. I somehow faster map -8 to clear three bits - a matter of familiarization.
0xfffffff8 might be used by a program that translates opcode of an executable to C-code

It seems strange to me that
(square & ~7) == 48 is slightly faster than (square&56)==48 because without knowing in the first place that 0<=square<=63
I need to check every bit of square to calculate square&~7 when for square&56 I can ignore most bits.

Weird compiler behavior. In assemby it is the same:

Code: Select all

004b5   83 e6 f8   and   esi, -8  ; fffffff8H
004b8   83 fe 30   cmp   esi, 48  ; 00000030H 

004b5   83 e6 38   and   esi, 56  ; 00000038H
004b8   83 fe 30   cmp   esi, 48  ; 00000030H

I wonder if this is because of non optimal decision in building hardware and
I also wonder if the compiler has no way to earn speed from the knowledge that 0<=square<=63
Uri

Compiler are not aware of reduced value ranges at compile time, that is to understand the semantics of values, that int first_one leaves only a range between 0..63. One may declare first_one as unsigned char, but todays compiler work best with native 32-bit register sizes. And using char for square parameter and variables is likely slower. Using byte-registers has the drawback of possible partial register stalls by hardware of recent processors.

Code: Select all

mov ah, dl
mov al, dh ; stalls on ah since al and ah are part of eax

Gerd Isenberg · Post by **Gerd Isenberg** » Sun Jan 13, 2008 12:04 am

In assembly I would use byte-registers for that stuff, specially shortest opcode with the old 8080 accumulator AL - only six bytes for and, cmp and condional jump. Whether this is the fastest code on recent processors is another question.

Code: Select all

24 38    AND AL, 56
3C 30    CMP AL, 48
75 xx    JNZ short label

bob · Post by **bob** » Sun Jan 13, 2008 4:19 am

Uri Blass wrote:
Dirt wrote:
Uri Blass wrote:Strelkla include the following code

if ((square & 0xFFFFFFF8) == 0x30)

square always get numbers that are 0-63 based on my understanding so
This code is practically equivalent to
if ((square & 56) == 48)

My question is if there is speed reason to use FFFF numbers because before changing the code of strelka to be more simple I would like to know if there is a speed reason for decisions that I do not understand.

Uri
You find
if ((square & 56) == 48)
simpler? I find myself immediately converting it to hex to try and understand it. I might prefer 0xFFF8, if it really works the same.
I usually use base 10 and not base 16 and I prefer to see numbers in base 10 when they are small(and I consider 48 as small) but this is not the only point.

1)You can use 0x38 or 0xF8 and no reason to use many F's and my question is if there is a speed reason not to do it.

2)It is easy for me to translate 56 to 32+16+8 and translating 0xFFF8 is less intuitive for me(I am not used to divide number in base 16 to sum of power of 2 and only if the numbers are really big it is more simple for me).

I can see that square&56=48 means 7th rank more easily.

My thinking is the following:

square&56=48 means square&32=32 square&16=16 square&8=0
1)0<=square<=63
2)square&32=32
3)square&16=16
4)square&8=0

1,2->
5)63>=square>=32
3,5->
6)63>=square>=48

4,6->
7)56>square>=48 or in other words square is in the 7th rank

Uri

I personally like the hex way better. square & fff8 simply extracts the rank part of the square, leaving the file part off. Makes it easy to check for the 7th rank, for example, which is what square & fff8 == 48 is doing...

I find that very easy to read as opposed to base-10 numbers, because the & is a bitwise operator and hex makes it easier to visualize the bits and how they will be paired up for the AND...

speed question about strelka

speed question about strelka

Re: speed question about strelka

Re: speed question about strelka

Re: speed question about strelka

Re: speed question about strelka

Re: speed question about strelka

Re: speed question about strelka

Re: speed question about strelka