Strelkla include the following code
if ((square & 0xFFFFFFF8) == 0x30)
square always get numbers that are 0-63 based on my understanding so
This code is practically equivalent to
if ((square & 56) == 48)
My question is if there is speed reason to use FFFF numbers because before changing the code of strelka to be more simple I would like to know if there is a speed reason for decisions that I do not understand.
Uri
speed question about strelka
Moderator: Ras
-
- Posts: 10790
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
-
- Posts: 2851
- Joined: Wed Mar 08, 2006 10:01 pm
- Location: Irvine, CA, USA
Re: speed question about strelka
You findUri Blass wrote:Strelkla include the following code
if ((square & 0xFFFFFFF8) == 0x30)
square always get numbers that are 0-63 based on my understanding so
This code is practically equivalent to
if ((square & 56) == 48)
My question is if there is speed reason to use FFFF numbers because before changing the code of strelka to be more simple I would like to know if there is a speed reason for decisions that I do not understand.
Uri
if ((square & 56) == 48)
simpler? I find myself immediately converting it to hex to try and understand it. I might prefer 0xFFF8, if it really works the same.
-
- Posts: 10790
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: speed question about strelka
I usually use base 10 and not base 16 and I prefer to see numbers in base 10 when they are small(and I consider 48 as small) but this is not the only point.Dirt wrote:You findUri Blass wrote:Strelkla include the following code
if ((square & 0xFFFFFFF8) == 0x30)
square always get numbers that are 0-63 based on my understanding so
This code is practically equivalent to
if ((square & 56) == 48)
My question is if there is speed reason to use FFFF numbers because before changing the code of strelka to be more simple I would like to know if there is a speed reason for decisions that I do not understand.
Uri
if ((square & 56) == 48)
simpler? I find myself immediately converting it to hex to try and understand it. I might prefer 0xFFF8, if it really works the same.
1)You can use 0x38 or 0xF8 and no reason to use many F's and my question is if there is a speed reason not to do it.
2)It is easy for me to translate 56 to 32+16+8 and translating 0xFFF8 is less intuitive for me(I am not used to divide number in base 16 to sum of power of 2 and only if the numbers are really big it is more simple for me).
I can see that square&56=48 means 7th rank more easily.
My thinking is the following:
square&56=48 means square&32=32 square&16=16 square&8=0
1)0<=square<=63
2)square&32=32
3)square&16=16
4)square&8=0
1,2->
5)63>=square>=32
3,5->
6)63>=square>=48
4,6->
7)56>square>=48 or in other words square is in the 7th rank
Uri
-
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: speed question about strelka
It is a minor issue and might depend on the compiler. Some compiler seem to produce better code with native register width constants - even if encoded as sign extended byte inside the opcode.
0xfffffff8 or ~7 or -8 is used to mask off the three file bits, to look whether the square is on the seventh rank.
If one uses 0x38 or 56, the vc6 compiler uses more constrained register allocation to take "advantage" of partial x86 byte-registers for the compare, which seems microscopic worse.
Using 0xfff8, or even 0xf8 for instance has the drawback of 3 byte larger opcode, since they are no longer in the -128...+127 range for shorter x86 opcode.
For clearing least significant bits, 'and' with -2, -4, -8, -16 ... is more common, since they are independent on the width of the register. It works with signed/unsigned char, short, int and long long.
For bitmasks or sets like bitboards hex-constants have the advantage to easier see the bit-pattern, since each hex-digit covers exactly four bits.
Two hex-digits cover a byte or rank with bitboards. Thus, 0x0102040810204080 versus 72624976668147840 is probably more intuitive
.
0xfffffff8 or ~7 or -8 is used to mask off the three file bits, to look whether the square is on the seventh rank.
Code: Select all
; 110 : if ((square & 0xFFFFFFF8) == 0x30) { // 7.th rank
004b5 83 e6 f8 and esi, -8 ; fffffff8H
004b8 83 fe 30 cmp esi, 48 ; 00000030H
004bb 75 30 jne SHORT $L54996
Code: Select all
; 110 : if ((square & 0x38) == 0x30) { // 7.th rank
004a8 8b d7 mov edx, edi
004aa 83 e2 38 and edx, 56 ; 00000038H
004ad 80 fa 30 cmp dl, 48 ; 00000030H
004b0 75 36 jne SHORT $L54996
For clearing least significant bits, 'and' with -2, -4, -8, -16 ... is more common, since they are independent on the width of the register. It works with signed/unsigned char, short, int and long long.
For bitmasks or sets like bitboards hex-constants have the advantage to easier see the bit-pattern, since each hex-digit covers exactly four bits.
Code: Select all
hex binary
0x0 0000B
0x1 0001B
0x2 0010B
0x3 0011B
0x4 0100B
0x5 0101B
0x6 0110B
0x7 0111B
0x8 1000B
0x9 1001B
0xA 1010B
0xB 1011B
0xC 1100B
0xD 1101B
0xE 1110B
0xF 1111B

-
- Posts: 10790
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: speed question about strelka
I totally agree that for big numbers 0x is more intuitive.Gerd Isenberg wrote:It is a minor issue and might depend on the compiler. Some compiler seem to produce better code with native register width constants - even if encoded as sign extended byte inside the opcode.
0xfffffff8 or ~7 or -8 is used to mask off the three file bits, to look whether the square is on the seventh rank.If one uses 0x38 or 56, the vc6 compiler uses more constrained register allocation to take "advantage" of partial x86 byte-registers for the compare, which seems microscopic worse.Code: Select all
; 110 : if ((square & 0xFFFFFFF8) == 0x30) { // 7.th rank 004b5 83 e6 f8 and esi, -8 ; fffffff8H 004b8 83 fe 30 cmp esi, 48 ; 00000030H 004bb 75 30 jne SHORT $L54996
Using 0xfff8, or even 0xf8 for instance has the drawback of 3 byte larger opcode, since they are no longer in the -128...+127 range for shorter x86 opcode.Code: Select all
; 110 : if ((square & 0x38) == 0x30) { // 7.th rank 004a8 8b d7 mov edx, edi 004aa 83 e2 38 and edx, 56 ; 00000038H 004ad 80 fa 30 cmp dl, 48 ; 00000030H 004b0 75 36 jne SHORT $L54996
For clearing least significant bits, 'and' with -2, -4, -8, -16 ... is more common, since they are independent on the width of the register. It works with signed/unsigned char, short, int and long long.
For bitmasks or sets like bitboards hex-constants have the advantage to easier see the bit-pattern, since each hex-digit covers exactly four bits.Two hex-digits cover a byte or rank with bitboards. Thus, 0x0102040810204080 versus 72624976668147840 is probably more intuitiveCode: Select all
hex binary 0x0 0000B 0x1 0001B 0x2 0010B 0x3 0011B 0x4 0100B 0x5 0101B 0x6 0110B 0x7 0111B 0x8 1000B 0x9 1001B 0xA 1010B 0xB 1011B 0xC 1100B 0xD 1101B 0xE 1110B 0xF 1111B
.
My conclusion is that it is best to replace the 0xfffffff8 simply by ~7 and get code like
if ((square & ~7) == 48)
It seems strange to me that
(square & ~7) == 48 is slightly faster than (square&56)==48 because without knowing in the first place that 0<=square<=63
I need to check every bit of square to calculate square&~7 when for square&56 I can ignore most bits.
Edit:note that I understood that computers use strange logic when number of bits that they need to calculate is not important for them.
I wonder if this is because of non optimal decision in building hardware and
I also wonder if the compiler has no way to earn speed from the knowledge that 0<=square<=63
Uri
-
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: speed question about strelka
Yes. I somehow faster map -8 to clear three bits - a matter of familiarization.Uri Blass wrote: I totally agree that for big numbers 0x is more intuitive.
My conclusion is that it is best to replace the 0xfffffff8 simply by ~7 and get code like
if ((square & ~7) == 48)
0xfffffff8 might be used by a program that translates opcode of an executable to C-code

Weird compiler behavior. In assemby it is the same:It seems strange to me that
(square & ~7) == 48 is slightly faster than (square&56)==48 because without knowing in the first place that 0<=square<=63
I need to check every bit of square to calculate square&~7 when for square&56 I can ignore most bits.
Code: Select all
004b5 83 e6 f8 and esi, -8 ; fffffff8H
004b8 83 fe 30 cmp esi, 48 ; 00000030H
004b5 83 e6 38 and esi, 56 ; 00000038H
004b8 83 fe 30 cmp esi, 48 ; 00000030H
Compiler are not aware of reduced value ranges at compile time, that is to understand the semantics of values, that int first_one leaves only a range between 0..63. One may declare first_one as unsigned char, but todays compiler work best with native 32-bit register sizes. And using char for square parameter and variables is likely slower. Using byte-registers has the drawback of possible partial register stalls by hardware of recent processors.I wonder if this is because of non optimal decision in building hardware and
I also wonder if the compiler has no way to earn speed from the knowledge that 0<=square<=63
Uri
Code: Select all
mov ah, dl
mov al, dh ; stalls on ah since al and ah are part of eax
-
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: speed question about strelka
In assembly I would use byte-registers for that stuff, specially shortest opcode with the old 8080 accumulator AL - only six bytes for and, cmp and condional jump. Whether this is the fastest code on recent processors is another question.
Code: Select all
24 38 AND AL, 56
3C 30 CMP AL, 48
75 xx JNZ short label
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: speed question about strelka
I personally like the hex way better. square & fff8 simply extracts the rank part of the square, leaving the file part off. Makes it easy to check for the 7th rank, for example, which is what square & fff8 == 48 is doing...Uri Blass wrote:I usually use base 10 and not base 16 and I prefer to see numbers in base 10 when they are small(and I consider 48 as small) but this is not the only point.Dirt wrote:You findUri Blass wrote:Strelkla include the following code
if ((square & 0xFFFFFFF8) == 0x30)
square always get numbers that are 0-63 based on my understanding so
This code is practically equivalent to
if ((square & 56) == 48)
My question is if there is speed reason to use FFFF numbers because before changing the code of strelka to be more simple I would like to know if there is a speed reason for decisions that I do not understand.
Uri
if ((square & 56) == 48)
simpler? I find myself immediately converting it to hex to try and understand it. I might prefer 0xFFF8, if it really works the same.
1)You can use 0x38 or 0xF8 and no reason to use many F's and my question is if there is a speed reason not to do it.
2)It is easy for me to translate 56 to 32+16+8 and translating 0xFFF8 is less intuitive for me(I am not used to divide number in base 16 to sum of power of 2 and only if the numbers are really big it is more simple for me).
I can see that square&56=48 means 7th rank more easily.
My thinking is the following:
square&56=48 means square&32=32 square&16=16 square&8=0
1)0<=square<=63
2)square&32=32
3)square&16=16
4)square&8=0
1,2->
5)63>=square>=32
3,5->
6)63>=square>=48
4,6->
7)56>square>=48 or in other words square is in the 7th rank
Uri
I find that very easy to read as opposed to base-10 numbers, because the & is a bitwise operator and hex makes it easier to visualize the bits and how they will be paired up for the AND...