Gerd Isenberg
Joined: 08 Mar 2006 Posts: 1787 Location: Hattingen, Germany
Post subject: Re: Resource for bit twiddlers Posted: Mon May 28, 2007 4:35 pm
Surprisingly, the vc2005 compiler was not aware of the
A ^ (~A & B) == A | B simplification.
| Code: |
?eastAttacks@@YA_K_K0@Z PROC
00000 49 b9 80 80 80
80 80 80 80 80 mov r9, 8080808080808080H
0000a 48 f7 d2 not rdx
0000d 49 b8 7f 7f 7f
7f 7f 7f 7f 7f mov r8, 7f7f7f7f7f7f7f7fH
00017 48 0b d1 or rdx, rcx
0001a 48 8b c2 mov rax, rdx
0001d 48 33 c1 xor rax, rcx
00020 49 23 c8 and rcx, r8
00023 49 0b c1 or rax, r9
00026 48 2b c1 sub rax, rcx
00029 48 8b ca mov rcx, rdx
0002c 48 f7 d1 not rcx
0002f 49 23 c9 and rcx, r9
00032 48 33 c1 xor rax, rcx
00035 48 33 c2 xor rax, rdx
00038 c3 ret 0
|
| Code: |
rooks$ = 8
empty$ = 16
?eastAttacks@@YA_K_K0@Z PROC
00000 48 f7 d2 not rdx
00003 49 b9 80 80 80
80 80 80 80 80 mov r9, 8080808080808080H
0000d 49 b8 7f 7f 7f
7f 7f 7f 7f 7f mov r8, 7f7f7f7f7f7f7f7fH
00017 48 0b d1 or rdx, rcx
0001a 48 8b c2 mov rax, rdx
0001d 49 0b d1 or rdx, r9
00020 48 33 c1 xor rax, rcx
00023 49 23 c8 and rcx, r8
00026 49 0b c1 or rax, r9
00029 48 2b c1 sub rax, rcx
0002c 48 33 c2 xor rax, rdx
0002f c3 ret 0
|
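The simplification quoted at the top of this post, A ^ (~A & B) == A | B, can be checked mechanically. The sketch below is only an illustration: the names `xorForm`, `orForm` and `identityHolds` are hypothetical, and the `u64` typedef is an assumption, since the post does not show it. Judging from the disassembly, the first listing still computes the `~A & B` term with a separate not/and pair, while the second uses a single `or`.

```c
#include <assert.h>

typedef unsigned long long u64; /* assumed 64-bit type; not shown in the post */

/* Unsimplified form, A ^ (~A & B). */
u64 xorForm(u64 a, u64 b) { return a ^ (~a & b); }

/* Simplified form, A | B. */
u64 orForm(u64 a, u64 b) { return a | b; }

/* Bitwise operators treat every bit position independently, so checking
   bit 0 for the four combinations of a and b in {0, 1} covers all cases.
   Returns 1 if both forms agree, 0 otherwise. */
int identityHolds(void)
{
    u64 a, b;
    for (a = 0; a <= 1; ++a)
        for (b = 0; b <= 1; ++b)
            if (((xorForm(a, b) ^ orForm(a, b)) & 1) != 0)
                return 0;
    return 1;
}

/* Optional spot check on full 64-bit words. */
void identitySelfCheck(void)
{
    assert(identityHolds());
    assert(xorForm(0x00ff00ff00ff00ffULL, 0x8080808080808080ULL)
        == orForm(0x00ff00ff00ff00ffULL, 0x8080808080808080ULL));
}
```

The reasoning behind the check: where a bit of A is 1, `~A & B` is 0 and the xor leaves the 1 untouched; where it is 0, the xor simply copies the bit of B.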
The following version, using De Morgan's law and the -i = ~i + 1 transformation, needs one register and a few code bytes less. The additional +1 is folded into the addition by a single lea instruction, so it is done by the AGU instead of the ALU.
| Code: |
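typedef unsigned long long u64; /* assumed 64-bit typedef; not shown in the post */

u64 eastAttacks(u64 rooks, u64 empty)
{
    const u64 H = 0x8080808080808080;       /* bit 7 of every byte */
    u64 occInclRook = rooks | ~empty;       /* occupancy including the rooks */
    u64 occExclRook = rooks ^ occInclRook;  /* occupancy without the rooks */
    /* -(rooks & ~H) written as (~rooks | H) + 1: the subtraction becomes an add */
    u64 attacks = ((occExclRook | H) + (~rooks | H) + 1)
                ^ (occInclRook | H);
    return attacks;
}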
|
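To make the transformation explicit, here is a sketch that puts the subtraction form next to the add form. The subtraction variant is reconstructed from the second assembly listing (`and rcx, r8` followed by `sub rax, rcx`), so its source shape is an assumption; the names `eastAttacksSub`, `eastAttacksAdd`, `formsAgree` and `H_MASK` are hypothetical. The step is -x == ~x + 1 combined with De Morgan: ~(rooks & ~H) == (~rooks | H).

```c
#include <assert.h>

typedef unsigned long long u64; /* assumed 64-bit type; not shown in the post */

#define H_MASK 0x8080808080808080ULL /* bit 7 of every byte */

/* Subtraction form, reconstructed from the second listing (assumption). */
u64 eastAttacksSub(u64 rooks, u64 empty)
{
    u64 occInclRook = rooks | ~empty;
    u64 occExclRook = rooks ^ occInclRook;
    return ((occExclRook | H_MASK) - (rooks & ~H_MASK))
         ^ (occInclRook | H_MASK);
}

/* Add form as given in the post: a - x == a + ~x + 1 (mod 2^64),
   with ~(rooks & ~H) rewritten as (~rooks | H). */
u64 eastAttacksAdd(u64 rooks, u64 empty)
{
    u64 occInclRook = rooks | ~empty;
    u64 occExclRook = rooks ^ occInclRook;
    return ((occExclRook | H_MASK) + (~rooks | H_MASK) + 1)
         ^ (occInclRook | H_MASK);
}

/* Returns 1 when both forms produce the same attack set. */
int formsAgree(u64 rooks, u64 empty)
{
    return eastAttacksSub(rooks, empty) == eastAttacksAdd(rooks, empty);
}

/* Spot check: rook on bit 0, blocker on bit 4 of the same byte. */
void eastAttacksSelfCheck(void)
{
    assert(formsAgree(0x01ULL, ~0x11ULL));
    assert(eastAttacksAdd(0x01ULL, ~0x11ULL) == 0x1EULL);
}
```

For a rook on bit 0 with a single blocker on bit 4 of the same byte, both forms return 0x1E: bits 1 to 4, that is, every square up to and including the blocker.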
| Code: |
rooks$ = 8
empty$ = 16
?eastAttacks@@YA_K_K0@Z PROC
00000 49 b8 80 80 80
80 80 80 80 80 mov r8, 8080808080808080H
0000a 48 f7 d2 not rdx
0000d 48 0b d1 or rdx, rcx
00010 48 8b c2 mov rax, rdx
00013 49 0b d0 or rdx, r8
00016 48 33 c1 xor rax, rcx
00019 48 f7 d1 not rcx
0001c 49 0b c0 or rax, r8
0001f 49 0b c8 or rcx, r8
00022 48 8d 44 08 01 lea rax, QWORD PTR [rax+rcx+1]
00027 48 33 c2 xor rax, rdx
0002a c3 ret 0
|