Need help with g++ inline assembly

sje · Post by **sje** » Wed Mar 12, 2008 2:09 am

When compiling using g++ 4.x for a 32 bit x86 target, I use the following code for FindFirstZero:

#if (CTHostCpuX86 && CTArchBits32 && CTAllowAssembly)
/* Index of the lowest clear bit: bsfl returns the lowest set bit of ~theB32.
   The result is undefined when theB32 is all ones (~theB32 == 0). */
static __inline__ unsigned int __ffz32(unsigned int theB32)
{
  __asm__("bsfl %1,%0" : "=r" (theB32) : "r" (~theB32));
  return theB32;
}
#endif

And it works. However, I'm having some difficulty writing a 64 bit version. Any clues for the clueless?

bob · Post by **bob** » Wed Mar 12, 2008 3:25 am


Use "bsfq/bsrq" (the quadword forms)...

The only danger is calling the function with a zero argument. The result is then undefined, and most x86 processors leave the destination register unchanged. The inline64.h file in Crafty has MSB/LSB functions that do exactly this, but they return 64 if no bit is set...
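The zero-argument danger can be handled inside the helper instead of in every caller. This is a minimal sketch, assuming GCC or Clang and an x86-64 target for the assembly path; `lsb64_or_64` is a hypothetical name, and returning 64 for an empty word simply mirrors the convention described above for Crafty. Other targets fall back to GCC's `__builtin_ctzll`, which is also undefined for zero, so the same guard covers both paths.

```c
#include <stdint.h>

/* Hypothetical helper: index (0..63) of the lowest set bit of x,
   or 64 when x is zero. Assumes GCC or Clang. */
static __inline__ unsigned int lsb64_or_64(uint64_t x)
{
  if (x == 0)
    return 64;                  /* bsfq and __builtin_ctzll are undefined for 0 */
#if defined(__x86_64__)
  uint64_t idx;                 /* 64-bit destination for the quadword form */
  __asm__("bsfq %1, %0" : "=r" (idx) : "rm" (x) : "cc");
  return (unsigned int) idx;
#else
  return (unsigned int) __builtin_ctzll(x);
#endif
}
```

Doing the check in C lets the compiler see the branch and, where the caller already knows the word is nonzero, it may be able to drop the test.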

sje · Post by **sje** » Wed Mar 12, 2008 4:22 am


The zero check is made in the caller.

Questions:

1) Doesn't the formal parameter declaration have to be changed to unsigned long long (64 bit)?

2) But we still want to return an unsigned int (32 bit), so doesn't the return statement have to be modified?
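Both questions can be answered in one sketch, assuming GCC or Clang on an x86-64 target: the parameter becomes a 64-bit type, the assembly writes a 64-bit temporary (because `bsfq` needs 64-bit operands), and the result is narrowed to `unsigned int` only at the `return`. The name `ffz64` is hypothetical. As with the 32-bit version, the caller must ensure the argument is not all ones.

```c
/* Hypothetical 64-bit counterpart of __ffz32: index of the lowest clear bit.
   Precondition (checked by the caller): theB64 != ~0ULL. Assumes GCC or Clang. */
static __inline__ unsigned int ffz64(unsigned long long theB64)
{
#if defined(__x86_64__)
  unsigned long long idx;       /* bsfq requires a 64-bit destination register */
  __asm__("bsfq %1, %0" : "=r" (idx) : "r" (~theB64) : "cc");
  return (unsigned int) idx;    /* 0..63 always fits, so the narrowing is safe */
#else
  return (unsigned int) __builtin_ctzll(~theB64);
#endif
}
```

So the parameter type does change, but the return type can stay `unsigned int`; only the assembly operand has to be 64 bits wide.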

bob · Post by **bob** » Wed Mar 12, 2008 6:38 am

sje wrote:
1) Doesn't the formal parameter declaration have to be changed to unsigned long long (64 bit)?

Here's my MSB inline:

/* Index of the highest set bit, or 64 if word is zero: bsrq sets ZF
   for a zero source, so jnz skips the movq for any nonzero input. */
int static __inline__ MSB(long word)
{
  long dummy, dummy2;

  asm("          bsrq    %1, %0" "\n\t"
      "          jnz     1f"     "\n\t"
      "          movq    $64, %0" "\n\t"
      "1:":"=&r"(dummy),
      "=&r" (dummy2)
:    "1"((long) (word))
:    "cc");
  return (dummy);
}
On a 64 bit machine you don't need long long; "long" will do the trick, since the machine has 64 bit words. But since Microsoft decided many years ago that a 16 bit value was a word, and then that a 32 bit value must be a doubleword, we are now left with quadwords on 64 bit machines...
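The "long is enough" rule holds on LP64 systems such as Linux and Mac OS X, but 64-bit Windows uses LLP64, where `long` stays 32 bits. A fixed-width type sidesteps the question. This sketch assumes a C11 compiler for `_Static_assert`; the typedef name `BitBoard64` and the helper `long_is_64_bits` are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical name: exactly 64 bits wherever uint64_t exists. */
typedef uint64_t BitBoard64;

/* Compilation fails if the type is somehow not 64 bits wide. */
_Static_assert(sizeof(BitBoard64) * CHAR_BIT == 64, "BitBoard64 must be 64 bits");

/* Reports whether plain `long` would also have worked:
   1 on LP64 targets (Linux, Mac OS X), 0 on LLP64 (64-bit Windows) and 32-bit targets. */
static int long_is_64_bits(void)
{
  return sizeof(long) * CHAR_BIT == 64;
}
```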

sje wrote:
2) But we still want to return an unsigned int (32 bit), so doesn't the return statement have to be modified?

BTW, why would you want to return a 32 bit value on a 64 bit architecture? It isn't any faster, and the native registers on x86-64 processors are 64 bits wide.

Dann Corbit · Post by **Dann Corbit** » Wed Mar 12, 2008 7:27 am


I guess that once he sees how you did it, he will know what to do.

/*
     AMD Opteron inline functions for MSB(), LSB() and
     PopCnt().  Note that these are 64 bit functions and they use
     64 bit (quad-word) X86-64 instructions.
*/
int static __inline__ MSB(long word)
{
  long dummy, dummy2;

asm("          bsrq    %1, %0" "\n\t"
    "          jnz     1f" "\n\t"
    "          movq    $64, %0" "\n\t"
    "1:":"=&r"(dummy), "=&r"
      (dummy2)
:    "1"((long) (word))
:    "cc");
  return (dummy);
}

int static __inline__ LSB(long word)
{
  long dummy, dummy2;

asm("          bsfq    %1, %0" "\n\t"
    "          jnz     1f" "\n\t"
    "          movq    $64, %0" "\n\t"
    "1:":"=&r"(dummy), "=&r"
      (dummy2)
:    "1"((long) (word))
:    "cc");
  return (dummy);
}

int static __inline__ PopCnt(long word)
{
  long dummy, dummy2, dummy3;

asm("          xorq    %0, %0" "\n\t"
    "          testq   %1, %1" "\n\t"
    /* ... */
      "=&r"
      (dummy3)
:    "1"((long) (word))
:    "cc");
  return (dummy);
}
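The body of `PopCnt` above is cut off, so its exact instruction sequence is not shown here. Purely as a point of comparison, this C sketch counts bits by repeatedly clearing the lowest set bit (`x &= x - 1`), so the loop runs once per set bit. The name `popcnt64` is hypothetical, and it is not claimed to match the truncated assembly; GCC and Clang also offer `__builtin_popcountll` as a drop-in alternative.

```c
#include <stdint.h>

/* Hypothetical helper: number of set bits in x.
   Each iteration clears the lowest set bit, so the loop runs popcount(x) times. */
static __inline__ int popcnt64(uint64_t x)
{
  int count = 0;
  while (x) {
    x &= x - 1;   /* clear the lowest set bit */
    count++;
  }
  return count;
}
```

This approach is fast for sparse words (typical of chess bitboards) and slower for dense ones.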

Dann Corbit · Post by **Dann Corbit** » Wed Mar 12, 2008 7:32 am


Ack. The formatting came out awful.
Let's try again:

/*
     AMD Opteron inline functions for MSB(), LSB() and
     PopCnt().  Note that these are 64 bit functions and they use
     64 bit (quad-word) X86-64 instructions.
*/
int static __inline__ MSB(long word)
{
  long dummy, dummy2;

asm("          bsrq    %1, %0" "\n\t"
    "          jnz     1f" "\n\t"
    "          movq    $64, %0" "\n\t"
    "1:":"=&r"(dummy), "=&r"
      (dummy2)
:    "1"((long) (word))
:    "cc");
  return (dummy);
}
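When checking inline assembly like this, a non-assembly reference helps. This sketch uses the GCC/Clang builtins `__builtin_clzll` and `__builtin_ctzll` (both undefined for zero, hence the explicit check) to reproduce the MSB/LSB convention described in the thread, returning 64 when no bit is set. `msb_ref` and `lsb_ref` are hypothetical names.

```c
/* Hypothetical reference versions of MSB()/LSB() for testing the asm.
   Both return the bit index (0..63), or 64 when word == 0. Assumes GCC or Clang. */
static __inline__ int msb_ref(unsigned long long word)
{
  return word ? 63 - __builtin_clzll(word) : 64;
}

static __inline__ int lsb_ref(unsigned long long word)
{
  return word ? __builtin_ctzll(word) : 64;
}
```

On an x86-64 build, comparing `MSB(x)` against `msb_ref(x)` over a batch of random inputs is a quick sanity check.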

sje · Post by **sje** » Wed Mar 12, 2008 11:52 am

It turned out that there were a couple of bugs elsewhere including a somewhat subtle 32 bit to 64 bit conversion failure that had been hidden by a fault in the compilation environment definitions. The program is now working correctly in full x86-64 mode with inline assembly assistance.
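The post doesn't say what the conversion failure actually was. One common class of such bug, shown here only as a hypothetical illustration, is a shift done in 32-bit arithmetic or a 64-bit value passing through a 32-bit type. The names `bit_at` and `truncated` are invented for this example.

```c
#include <stdint.h>

/* Hypothetical illustration of a 32-to-64 bit pitfall.
   Writing `1 << sq` would shift a 32-bit int (undefined for sq >= 32);
   casting first keeps the whole shift in 64 bits. */
static uint64_t bit_at(unsigned int sq)
{
  return (uint64_t) 1 << sq;
}

/* Returning through a 32-bit type silently drops the upper half. */
static unsigned int truncated(uint64_t bb)
{
  return (unsigned int) bb;
}
```

A bug like this can hide for a long time if the 64-bit build is accidentally configured as 32-bit, which would fit the "fault in the compilation environment definitions" described above.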

I need to do a little more work on cleaning up the preprocessor environment symbol definition chain. It works, but it's not something I like to have to show to others. Anyway, I'm still committed to supporting combinations of:

1) Three different CPU families: Intel, PowerPC, and unknown;

2) Two different scalar type sets: 32 bit and 64 bit math and pointers;

3) Three different host operating systems: OS/X (BSD-derived), Linux, and unknown.

The idea is that all of this is determined at compile time. On the Macintosh, the Xcode IDE builds multiple object versions into the same executable file, and the system loader automatically selects the "right" one when the application starts.
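The compile-time selection described above can be driven entirely by compiler-predefined macros. This is a sketch assuming GCC or Clang, which predefine `__i386__`, `__x86_64__`, `__ppc__` or `__powerpc__`, `__LP64__`, `__APPLE__` and `__linux__` on the relevant targets. Only `CTHostCpuX86`, `CTArchBits32` and `CTAllowAssembly` come from the thread; the other `CT*` names and the exact tests are hypothetical.

```c
/* Hypothetical sketch: derive configuration symbols from predefined macros.
   Only CTHostCpuX86, CTArchBits32 and CTAllowAssembly appear in the thread. */

/* 1) CPU family: Intel (x86), PowerPC, or unknown */
#if defined(__i386__) || defined(__x86_64__)
#  define CTHostCpuX86 1
#  define CTHostCpuPPC 0
#elif defined(__ppc__) || defined(__ppc64__) || defined(__powerpc__)
#  define CTHostCpuX86 0
#  define CTHostCpuPPC 1
#else
#  define CTHostCpuX86 0
#  define CTHostCpuPPC 0
#endif

/* 2) 32 or 64 bit pointers; _WIN64 covers LLP64 (64-bit pointers, 32-bit long) */
#if defined(__LP64__) || defined(_WIN64)
#  define CTArchBits32 0
#  define CTArchBits64 1
#else
#  define CTArchBits32 1
#  define CTArchBits64 0
#endif

/* 3) Host OS: Mac OS X, Linux, or unknown */
#if defined(__APPLE__) && defined(__MACH__)
#  define CTHostOsMacOSX 1
#  define CTHostOsLinux  0
#elif defined(__linux__)
#  define CTHostOsMacOSX 0
#  define CTHostOsLinux  1
#else
#  define CTHostOsMacOSX 0
#  define CTHostOsLinux  0
#endif

/* Allow inline assembly only with a GCC-compatible compiler on a known CPU */
#if defined(__GNUC__) && (CTHostCpuX86 || CTHostCpuPPC)
#  define CTAllowAssembly 1
#else
#  define CTAllowAssembly 0
#endif
```

Because each architecture slice of a universal binary is compiled separately, each slice would see its own values of these symbols.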

I have made one compromise so far: I'm no longer supporting or even testing Mac OS/X versions before 10.4. I had been doing some 10.2 support, but it's not worth it anymore. Every new Mac for some time now has been a 64 bit capable, multiple core Intel box. I still use some older PowerPC hardware, including a 400 MHz G4 desktop that's eight years old and still in use with OS/X 10.5, no less.

Interestingly, nearly all Mac application software still runs in 32 bit mode, as is required for G3/G4 PowerPC chips and for all Intel Core Solo and Core Duo chips (but not the Core 2 Duo, which, like the PowerPC G5, is 64 bit capable). I strongly suspect that when OS/X 10.6 arrives a couple of years from now, Apple will have dropped support for all 32 bit only notebook and desktop machines. The handheld gizmos might be an exception.
