Reversing bits in a 32bit word
Moderator: Ras
-
- Posts: 363
- Joined: Thu Mar 16, 2006 7:39 pm
- Location: Portugal
- Full name: Alvaro Cardoso
Reversing bits in a 32bit word
Hi,
what is the fastest way to reverse the bits in a 32bit int?
For example bit 0 goes to bit 31, bit 1 goes to 30, bit 12 goes to 19, etc.
Gerd?
Thanks in advance,
Alvaro
what is the fastest way to reverse the bits in a 32bit int?
For example bit 0 goes to bit 31, bit 1 goes to 30, bit 12 goes to 19, etc.
Gerd?
Thanks in advance,
Alvaro
Re: Reversing bits in a 32bit word
http://www.cs.utk.edu/~vose/c-stuff/bithacks.html
unsigned int v; // reverse the bits in this
unsigned int t = v << 1; // t will have the reversed bits of v
int i;
v >>= 1;
for (i = sizeof(v) * 8 - 2; i; i--)
{
t |= v & 1;
t <<= 1;
v >>= 1;
}
t |= v;
Reverse an N-bit quantity in parallel in 5 * lg(N) operations:
unsigned int v; // 32 bit word to reverse bit order
int const S[] = {1, 2, 4, 8, 16}; // Magic Binary Numbers
int const B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};
v = ((v >> S[0]) & B[0]) | ((v << S[0]) & ~B[0]); // swap odd and even bits
v = ((v >> S[1]) & B[1]) | ((v << S[1]) & ~B[1]); // swap consecutive pairs
v = ((v >> S[2]) & B[2]) | ((v << S[2]) & ~B[2]); // swap nibbles ...
v = ((v >> S[3]) & B[3]) | ((v << S[3]) & ~B[3]);
v = ((v >> S[4]) & B[4]) | ((v << S[4]) & ~B[4]);
-Sean Eron Anderson
Helpful?
unsigned int v; // reverse the bits in this
unsigned int t = v << 1; // t will have the reversed bits of v
int i;
v >>= 1;
for (i = sizeof(v) * 8 - 2; i; i--)
{
t |= v & 1;
t <<= 1;
v >>= 1;
}
t |= v;
Reverse an N-bit quantity in parallel in 5 * lg(N) operations:
unsigned int v; // 32 bit word to reverse bit order
int const S[] = {1, 2, 4, 8, 16}; // Magic Binary Numbers
int const B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};
v = ((v >> S[0]) & B[0]) | ((v << S[0]) & ~B[0]); // swap odd and even bits
v = ((v >> S[1]) & B[1]) | ((v << S[1]) & ~B[1]); // swap consecutive pairs
v = ((v >> S[2]) & B[2]) | ((v << S[2]) & ~B[2]); // swap nibbles ...
v = ((v >> S[3]) & B[3]) | ((v << S[3]) & ~B[3]);
v = ((v >> S[4]) & B[4]) | ((v << S[4]) & ~B[4]);
-Sean Eron Anderson
Helpful?
-
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Reversing bits in a 32bit word
Hi Alvaro,Cardoso wrote:Hi,
what is the fastest way to reverse the bits in a 32bit int?
For example bit 0 goes to bit 31, bit 1 goes to 30, bit 12 goes to 19, etc.
Gerd?
Thanks in advance,
Alvaro
Ok, 32 bit is not exactly mentioned in
http://chessprogramming.wikispaces.com/ ... d+Rotating,
but the principle is the same. Five delta swaps, swapping bits, duos, nibbles, bytes and words:
Code: Select all
U32 reverse (U32 x) {
x = ((x >> 1) & 0x55555555) | ((x & 0x55555555) << 1);
x = ((x >> 2) & 0x33333333) | ((x & 0x33333333) << 2);
x = ((x >> 4) & 0x0f0f0f0f) | ((x & 0x0f0f0f0f) << 4);
x = ((x >> 8) & 0x00ff00ff) | ((x & 0x00ff00ff) << 8);
return (x >> 16) | (x << 16);
}
Code: Select all
U32 reverse (U32 x) {
x ^= 0x0f0f0f0f & (x ^ _rotl(x, 8));
x ^= 0x33333333 & (x ^ _rotl(x, 4));
x ^= 0x55555555 & (x ^ _rotl(x, 2));
return _byteswap_ulong(_rotr(x, 7));
}
Code: Select all
x = reverseWord[x>>16] | (reverseWord[x & 0xffff] << 16);
Gerd
-
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Reversing bits in a 32bit word
Probably this is even faster, due to more parallelism and replacing 'or' by 'add', taking advantage of x86 lea for mul 2,4:Gerd Isenberg wrote: With x86 32-bit bswap/rotate intrinsics this one is probably fastest with 14 dependent ops:Code: Select all
U32 reverse (U32 x) { x ^= 0x0f0f0f0f & (x ^ _rotl(x, 8)); x ^= 0x33333333 & (x ^ _rotl(x, 4)); x ^= 0x55555555 & (x ^ _rotl(x, 2)); return _byteswap_ulong(_rotr(x, 7)); }
Code: Select all
U32 reverse (U32 x) {
x = ((x >> 1) & 0x55555555) + 2*(x & 0x55555555);
x = ((x >> 2) & 0x33333333) + 4*(x & 0x33333333);
x = ((x >> 4) & 0x0f0f0f0f) + 16*(x & 0x0f0f0f0f);
return _byteswap_ulong(x);
}
Code: Select all
mov eax, ecx
shr eax, 1
and ecx, 0x55555555
and eax, 0x55555555
lea eax, [eax + 2*ecx]
mov ecx, eax
shr eax, 2
and ecx, 0x33333333
and eax, 0x33333333
lea eax, [eax + 4*ecx]
mov ecx, eax
shr eax, 4
and ecx, 0x0f0f0f0f
and eax, 0x0f0f0f0f
shl ecx, 4
add eax, ecx
bswap eax
-
- Posts: 363
- Joined: Thu Mar 16, 2006 7:39 pm
- Location: Portugal
- Full name: Alvaro Cardoso
-
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Reversing bits in a 32bit word
The const-array declaration is fine. If declared local, as shift, factor and const mask by two-adic fractions of -1 (see macro M(a)).Hart wrote: Reverse an N-bit quantity in parallel in 5 * lg(N) operations:
unsigned int v; // 32 bit word to reverse bit order
int const S[] = {1, 2, 4, 8, 16}; // Magic Binary Numbers
int const B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};
v = ((v >> S[0]) & B[0]) | ((v << S[0]) & ~B[0]); // swap odd and even bits
v = ((v >> S[1]) & B[1]) | ((v << S[1]) & ~B[1]); // swap consecutive pairs
v = ((v >> S[2]) & B[2]) | ((v << S[2]) & ~B[2]); // swap nibbles ...
v = ((v >> S[3]) & B[3]) | ((v << S[3]) & ~B[3]);
v = ((v >> S[4]) & B[4]) | ((v << S[4]) & ~B[4]);
The loop is unrolled and all constants are resolved by the (vc2005) compiler. The generated 64-bit assembly looks optimal. Of course "reverse" can be done faster by mirrorHorizontal and flipvertical aka 64-bit bswap.
Code: Select all
#define S(a) ( C64( 1) << (a) )
#define M(a) ( C64(-1) / ((a)+1) )
U64 mirrorHorizontal(U64 b) {
const int s[]= {S( 0 ), S( 1 ), S( 2 )};
const U64 f[]= {S(s[0]), S(s[1]), S(s[2])};
const U64 k[]= {M(f[0]), M(f[1]), M(f[2])};
for (U32 i = 0; i < sizeof(s)/sizeof(s[0]); i++)
b = ((b >> s[i]) & k[i]) + f[i]*(b & k[i]);
return b;
}
U64 reverse(U64 b) {
const int s[]= {S( 0 ), S( 1 ), S( 2 ), S( 3 ), S( 4 ), S( 5 )};
const U64 f[]= {S(s[0]), S(s[1]), S(s[2]), S(s[3]), S(s[4]), S(s[5])};
const U64 k[]= {M(f[0]), M(f[1]), M(f[2]), M(f[3]), M(f[4]), M(f[5])};
for (U32 i = 0; i < sizeof(s)/sizeof(s[0]); i++)
b = ((b >> s[i]) & k[i]) + f[i]*(b & k[i]);
return b;
}
Code: Select all
#define InitS(3)
=> {S( 0 ), S( 1 ), S( 2 )}
#define InitS(6)
=> {S( 0 ), S( 1 ), S( 2 ), S( 3 ), S( 4 ), S( 5 )}
Code: Select all
b$ = 8
?mirrorHorizontal@@ PROC
00000 48 8b c1 mov rax, rcx
00003 48 ba 55 55 55
55 55 55 55 55 mov rdx, 5555555555555555H
0000d 48 d1 e8 shr rax, 1
00010 48 23 ca and rcx, rdx
00013 48 23 c2 and rax, rdx
00016 48 ba 33 33 33
33 33 33 33 33 mov rdx, 3333333333333333H
00020 48 8d 0c 48 lea rcx, QWORD PTR [rax+rcx*2]
00024 48 8b c1 mov rax, rcx
00027 48 23 ca and rcx, rdx
0002a 48 c1 e8 02 shr rax, 2
0002e 48 23 c2 and rax, rdx
00031 48 8d 14 88 lea rdx, QWORD PTR [rax+rcx*4]
00035 48 b9 0f 0f 0f
0f 0f 0f 0f 0f mov rcx, 0f0f0f0f0f0f0f0fH
0003f 48 8b c2 mov rax, rdx
00042 48 c1 ea 04 shr rdx, 4
00046 48 23 c1 and rax, rcx
00049 48 23 d1 and rdx, rcx
0004c 48 c1 e0 04 shl rax, 4
00050 48 03 c2 add rax, rdx
00053 c3 ret 0
?mirrorHorizontal@@ ENDP
b$ = 8
?reverse@@ PROC
00000 48 8b c1 mov rax, rcx
00003 48 ba 55 55 55
55 55 55 55 55 mov rdx, 5555555555555555H
0000d 48 23 ca and rcx, rdx
00010 48 d1 e8 shr rax, 1
00013 48 23 c2 and rax, rdx
00016 48 ba 33 33 33
33 33 33 33 33 mov rdx, 3333333333333333H
00020 48 8d 0c 48 lea rcx, QWORD PTR [rax+rcx*2]
00024 48 8b c1 mov rax, rcx
00027 48 23 ca and rcx, rdx
0002a 48 c1 e8 02 shr rax, 2
0002e 48 23 c2 and rax, rdx
00031 48 8d 14 88 lea rdx, QWORD PTR [rax+rcx*4]
00035 48 b9 0f 0f 0f
0f 0f 0f 0f 0f mov rcx, 0f0f0f0f0f0f0f0fH
0003f 48 8b c2 mov rax, rdx
00042 48 c1 ea 04 shr rdx, 4
00046 48 23 c1 and rax, rcx
00049 48 23 d1 and rdx, rcx
0004c 48 b9 ff 00 ff
00 ff 00 ff 00 mov rcx, 00ff00ff00ff00ffH
00056 48 c1 e0 04 shl rax, 4
0005a 48 03 c2 add rax, rdx
0005d 48 8b d0 mov rdx, rax
00060 48 c1 e8 08 shr rax, 8
00064 48 23 c1 and rax, rcx
00067 48 23 d1 and rdx, rcx
0006a 48 c1 e2 08 shl rdx, 8
0006e 48 03 d0 add rdx, rax
00071 48 b8 ff ff 00
00 ff ff 00 00 mov rax, 0000ffff0000ffffH
0007b 48 8b ca mov rcx, rdx
0007e 48 23 d0 and rdx, rax
00081 48 c1 e9 10 shr rcx, 16
00085 48 c1 e2 10 shl rdx, 16
00089 48 23 c8 and rcx, rax
0008c 48 03 ca add rcx, rdx
0008f 8b c1 mov eax, ecx
00091 48 c1 e9 20 shr rcx, 32
00095 48 c1 e0 20 shl rax, 32
00099 48 03 c1 add rax, rcx
0009c c3 ret 0
?reverse@@ ENDP
-
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Reversing bits in a 32bit word
Nonsense, no need for ugly macros. All constants are calculated at compile time while unrolling the loops - same assembly as mentioned. One may even pass integer template parameter.Gerd Isenberg wrote: How can we define some clever text-concatination macros to make the above array-initializations dynamic on the passed number of elements?Code: Select all
#define InitS(3) => {S( 0 ), S( 1 ), S( 2 )} #define InitS(6) => {S( 0 ), S( 1 ), S( 2 ), S( 3 ), S( 4 ), S( 5 )}
Code: Select all
U64 mirrorHorizontal(U64 b) {
for (U32 i = 0; i < 3; i++) {
const int s( 1 << i);
const U64 f( C64( 1) << s);
const U64 k( C64(-1) / (f+1) );
b = ((b >> s) & k) + f*(b & k);
}
return b;
}
U64 flipVertical(U64 b) {
for (U32 i = 3; i < 6; i++) {
const int s( 1 << i);
const U64 f( C64( 1) << s);
const U64 k( C64(-1) / (f+1) );
b = ((b >> s) & k) + f*(b & k);
}
return b;
}
U64 reverse(U64 b) {
for (U32 i = 0; i < 6; i++) {
const int s( 1 << i);
const U64 f( C64( 1) << s);
const U64 k( C64(-1) / (f+1) );
b = ((b >> s) & k) + f*(b & k);
}
return b;
}
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Reversing bits in a 32bit word
Do you ever worry that your head is going to explode or something???Gerd Isenberg wrote:Nonsense, no need for ugly macros. All constants are calculated at compile time while unrolling the loops - same assembly as mentioned. One may even pass integer template parameter.Gerd Isenberg wrote: How can we define some clever text-concatination macros to make the above array-initializations dynamic on the passed number of elements?Code: Select all
#define InitS(3) => {S( 0 ), S( 1 ), S( 2 )} #define InitS(6) => {S( 0 ), S( 1 ), S( 2 ), S( 3 ), S( 4 ), S( 5 )}
Code: Select all
U64 mirrorHorizontal(U64 b) { for (U32 i = 0; i < 3; i++) { const int s( 1 << i); const U64 f( C64( 1) << s); const U64 k( C64(-1) / (f+1) ); b = ((b >> s) & k) + f*(b & k); } return b; } U64 flipVertical(U64 b) { for (U32 i = 3; i < 6; i++) { const int s( 1 << i); const U64 f( C64( 1) << s); const U64 k( C64(-1) / (f+1) ); b = ((b >> s) & k) + f*(b & k); } return b; } U64 reverse(U64 b) { for (U32 i = 0; i < 6; i++) { const int s( 1 << i); const U64 f( C64( 1) << s); const U64 k( C64(-1) / (f+1) ); b = ((b >> s) & k) + f*(b & k); } return b; }
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Reversing bits in a 32bit word
If you can spare 128 KB of memory, try a lookup table:
Code: Select all
unsigned short int RevBitsVec[65536]; // Needs initialization
unsigned int Rev32(const unsigned int UnRev32)
{
return
RevBitsVec[UnRev32 & 0xffff] << 16 |
RevBitsVec[(UnRev32 >> 16) & 0xffff];
}
-
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Reversing bits in a 32bit word
No need to worrybob wrote:Do you ever worry that your head is going to explode or something???

Just happy to understand some simple math. The compile time division of -1 by power of two plus one, to get all the masks for a general delta-swap formula, to swap bits, pairs of bits, nibbles, bytes, words and dwords. I was quite amazed by the compiler abilities to unroll the loop to get that optimal assembly to mirror, flip or reverse bitboards (or any integer).
Code: Select all
s[i] := 1<<i
f[i] := 1<<s[i]
k[i] := (unsigned)-1 / (f[i] + 1)
i = 0
s[i] = 1
f[i] = 2
k[i] = 0xffffffffffffffff / 3
k[i] = 0x5555555555555555
i = 1
s[i] = 2
f[i] = 4
k[i] = 0xffffffffffffffff / 5
k[i] = 0x3333333333333333
i = 2
s[i] = 4
f[i] = 16
k[i] = 0xffffffffffffffff / 0x11
k[i] = 0x0f0f0f0f0f0f0f0f
i = 3
s[i] = 8
f[i] = 256
k[i] = 0xffffffffffffffff / 0x101
k[i] = 0x00ff00ff00ff00ff
i = 4
s[i] = 16
f[i] = 65536
k[i] = 0xffffffffffffffff / 0x10001
k[i] = 0x0000ffff0000ffff
i = 5
s[i] = 32
f[i] = 4294967296
k[i] = 0xffffffffffffffff / 0x100000001
k[i] = 0x00000000ffffffff
Code: Select all
U64 mirrorHorizontal(U64 b) {
for (U32 i = 0; i < 3; i++) {
const int s( 1 << i);
const U64 f( C64( 1) << s);
const U64 k( C64(-1) / (f+1) );
b = ((b >> s) & k) + f*(b & k);
}
return b;
}
Code: Select all
U64 mirrorHorizontal(U64 b) {
b = ((b >> 1) & 0x5555555555555555) + 2*(b & 0x5555555555555555);
b = ((b >> 2) & 0x3333333333333333) + 4*(b & 0x3333333333333333);
b = ((b >> 4) & 0x0f0f0f0f0f0f0f0f) + 16*(b & 0x0f0f0f0f0f0f0f0f);
return b;
}
Code: Select all
zobrist[whitepiece][sq] := bswap( zobrist[blackpiece][sq^56].
As an intended side effect, I'll gain more hash-hits in white/black symmetrical positions

I was also thinking to conditionally mirror the board horizontally, to keep the white king always on e-h files. But that seems actually to expensive until future x86 will have a PPERM instruction (introduced by AMDs SSE5 instruction set) to reverse vectors of two or four bitboards in one instruction.