Please help to convert my asm code into Linux asm

Casey · Post by **Casey** » Mon Jan 21, 2008 4:16 am

I have been migrating my chess program from Windows to Linux. I can understand and modify Intel-syntax assembly code written for VC++ (my code was actually taken and modified from someone in this forum). However, it seems to be too hard for me to understand and modify assembly code in AT&T style (for the GCC compiler). I have tried several times, but every attempt crashed.

Could someone help me convert the code below into a version for the GCC compiler? Please write it in a simple, straightforward style so I can learn from it and modify it later.

Many thanks for any help.

Code:

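// Get the index of the first set bit of a 64-bit board and clear that bit.
// 'data' refers to the low 32-bit word; the high word follows it in memory.
// Returns 64 if no bit is set. The result is left in eax (MSVC convention).
__forceinline unsigned int TakeOneA(UINT32 & data)
{
	__asm
	{
		xor		edx, edx		; edx = 0
		mov		eax, edx		; eax = 0 (bit offset of the low word)
		inc		edx				; edx = 1 (seed for the bit mask)

		mov		ebx, [data]		; ebx = address of the low word

		bsf		ecx, [ebx]		; scan the low word, ZF = 1 if it is zero
		jnz		found

		add		ebx, 4			; advance to the high word
		bsf		ecx, [ebx]
		mov		eax, 32			; mov leaves the flags set by bsf intact
		jnz		found

		mov		eax, 64			; no bit set in either word
		jmp		done

	found:
		shl		edx, cl			; edx = 1 << bit index
		add		eax, ecx		; eax = index within the 64-bit board
		xor		[ebx], edx		; clear the bit that was found
	done:
	}

}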

hgm · Post by **hgm** » Mon Jan 21, 2008 8:17 am

I think the easiest way to learn it is to write a simple C routine and compile it with 'gcc -S' to produce an assembler listing in a *.s file. Then you can see exactly what the assembly code should look like.

The most important (and confusing) difference is that you have to swap the order of all sources and destinations. Addressing modes use parentheses instead of brackets. The mnemonics of ALU instructions get a b, w or l suffix depending on the width of the data type (byte, 16-bit or 32-bit). Register names all start with %, and immediate operands have to be preceded by $.

So

mov ebx, 64
mov eax, [ebx]

will become

movl $64, %ebx
movl (%ebx), %eax

Have fun!
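
To see the same rules inside a C source file, here is a minimal sketch using GCC extended asm. The helper name load_plus_64 and the operand names ptr and val are made up for illustration, and the plain C branch is only there so the file still builds on non-x86 targets. One extra rule applies here: once operands are listed after the colons, a literal register name must be written with a doubled percent sign (for example %%eax), while %[name] refers to an operand that GCC places for you.

```c
#include <stdint.h>

/* Hypothetical helper: load a 32-bit value from memory and add 64 to it.
 *
 *   Intel syntax            AT&T syntax
 *   mov eax, [ebx]          movl (%ebx), %eax
 *   add eax, 64             addl $64, %eax
 */
static uint32_t load_plus_64(const uint32_t *p)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t value;
    __asm__ (
        "movl (%[ptr]), %[val]  \n\t"   /* source first, destination last */
        "addl $64, %[val]       \n\t"   /* immediates take a $ prefix */
        : [val] "=r" (value)            /* output: any general register */
        : [ptr] "r" (p)                 /* input: the address, in a register */
        : "cc", "memory");              /* flags change, memory is read */
    return value;
#else
    return *p + 64;                     /* portable fallback for other targets */
#endif
}
```

Letting GCC choose the registers through "r" constraints avoids having to list eax or ebx as clobbered.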

Pradu · Post by **Pradu** » Mon Jan 21, 2008 10:41 am

This is a fine webpage for GCC inline assembly syntax:
http://www.ibiblio.org/gferg/ldp/GCC-In ... HOWTO.html

What I do is use the compiler intrinsics available for MSVC/Intel and implement the same intrinsics for GCC. Sometimes all three compilers already implement an intrinsic, depending on what you want to do (like getting the return address from the stack). Here are the more common implementations of the intrinsics used in a bitboard chess program:

Code:

#ifdef _MSC_VER
	#include <intrin.h>
	#pragma message("MSC compatible compiler detected -- turning off warning 4146")
	#pragma warning( disable : 4146)
	#ifdef _WIN64
		#pragma intrinsic(_BitScanForward64)
		#pragma intrinsic(_BitScanReverse64)
		#define USE_PROCESSOR_INSTRUCTIONS
		#define USING_INTRINSICS
	#endif
#elif defined(__GNUC__) && defined(__LP64__)
	static INLINE unsigned char _BitScanForward64(unsigned int* const Index, const U64 Mask)
	{
		U64 Ret;
		__asm__
		(
			"bsfq %[Mask], %[Ret]"
			:[Ret] "=r" (Ret)
			:[Mask] "mr" (Mask)
		);
		*Index = (unsigned int)Ret;
		return Mask?1:0;
	}
	static INLINE unsigned char _BitScanReverse64(unsigned int* const Index, const U64 Mask)
	{
		U64 Ret;
		__asm__
		(
			"bsrq %[Mask], %[Ret]"
			:[Ret] "=r" (Ret)
			:[Mask] "mr" (Mask)
		);
		*Index = (unsigned int)Ret;
		return Mask?1:0;
	}
	#define USING_INTRINSICS
#endif



#if defined(__GNUC__)
	typedef volatile int SpinLock[1];
	typedef volatile int* const SpinLock_P;
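	static INLINE int LockedExchange(SpinLock_P Target, const int Value)
	{
		int ret = Value;
		/* xchg with a memory operand is implicitly locked. The asm is
		   volatile so it is kept even when the result is ignored, and
		   *Target is an in/out operand because the instruction writes it. */
		__asm__ __volatile__
		(
			"xchgl %[ret], %[Target]"
			: [ret] "+r" (ret), [Target] "+m" (*Target)
			:
			: "memory"
		);
		return ret;
	}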
#elif defined(_MSC_VER)
	typedef volatile long SpinLock[1];
	typedef volatile long* const SpinLock_P;
	#include <intrin.h>
	#pragma intrinsic (_InterlockedExchange) 
	#define LockedExchange(Target,Value) _InterlockedExchange(Target,Value)
#else
	#error Unsupported Compiler
#endif

I guess you still have to be careful when switching compilers. MSVC makes long 32 bits, while GCC on 64-bit Linux makes long 64 bits. I'd use int for 32-bit integers and long long for 64-bit integers, but I suppose you could use the preprocessor to define your own types.
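
One way to sidestep the long problem, sketched here under the assumption that the compiler ships the C99 header <stdint.h> and supports C11 _Static_assert, is to build the engine's types on the fixed-width typedefs and check their sizes at compile time. The names U32, U64 and square_bit are illustrative only.

```c
#include <limits.h>
#include <stdint.h>

typedef uint32_t U32;   /* exactly 32 bits on every conforming compiler */
typedef uint64_t U64;   /* exactly 64 bits, unlike 'long' */

/* Fail the build instead of misbehaving at run time if a type is wrong. */
_Static_assert(sizeof(U32) * CHAR_BIT == 32, "U32 must be 32 bits");
_Static_assert(sizeof(U64) * CHAR_BIT == 64, "U64 must be 64 bits");

/* Hypothetical helper: bitboard with only square 'sq' (0..63) set.
 * The cast matters: a plain 1 is an int, too narrow to shift by 32 or more. */
static U64 square_bit(unsigned int sq)
{
    return (U64)1 << sq;
}
```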

Ron Murawski · Post by **Ron Murawski** » Tue Jan 22, 2008 12:12 am

This link is very helpful for Linux assembly
http://www.ibm.com/developerworks/linux ... zone=linux

Even better would be to drop the assembly language and replace it with C code. The following code will probably execute faster than your assembly code because the compiler will be better able to optimize around all the calls.

Code:

// 'lsz64_tbl' source: Matt Taylor
static const int lsz64_tbl[64] =
{
    0, 31,  4, 33, 60, 15, 12, 34,
   61, 25, 51, 10, 56, 20, 22, 35,
   62, 30,  3, 54, 52, 24, 42, 19,
   57, 29,  2, 44, 47, 28,  1, 36,
   63, 32, 59,  5,  6, 50, 55,  7,
   16, 53, 13, 41,  8, 43, 46, 17,
   26, 58, 49, 14, 11, 40,  9, 45,
   21, 48, 39, 23, 18, 38, 37, 27
};

#ifdef _MSC_VER
#define FORCEINLINE __forceinline
#else
#define FORCEINLINE __inline
#endif


//______________________________________________________________________________
/* FirstPieceAndClear():
 *
 *      Return square number (0 to 63) of the least significant set bit
 *      in bitboard 'bb' and clear that bit from bitboard 'bb'
 *
 *      source: Matt Taylor's "de Bruijn method" implementation
 */
//______________________________________________________________________________
FORCEINLINE int FirstPieceAndClear(BITBOARD *bb)
{
   const BITBOARD  lsb = (*bb & -(s64) *bb) - 1;
   register const u32 foldedLSB = ((u32) lsb) ^ ((u32) (lsb >> 32));

   *bb &= *bb - 1; // clear least significant bit from bb

   return lsz64_tbl[foldedLSB * 0x78291ACF >> 26];
}

My board layout is:
a8 = 0
h8 = 7
a1 = 56
h1 = 63
If your layout is different you'll have to rearrange the lsz64_tbl array entries.

Ron

Casey · Post by **Casey** » Tue Jan 22, 2008 7:00 am

Thank you all for helping me.

Your code is excellent. I can use it directly in my program. Thanks again.

However, I still insist on developing my own code because I plan to modify it later.

I have some basic knowledge of Linux asm, but I have problems declaring the outputs and inputs. Compiling with gcc -S does not help with these problems.

My code below is quite simple and based on what I have just learned. It compiles but does not run (it always crashes). Can someone fix it for me?

Thanks a lot for any help.

Code:

int static __inline__ TakeOneA(UINT32& data)
{
	int ret;
	asm(
		
			"	xorl	%%edx, %%edx"	"\n\t"
			"	movl	%%eax, %%edx"	"\n\t"
			"	incl	%%edx"			"\n\t"
			
			"	movl	(%1), %%ebx"	"\n\t"

			"	bsfl	(%%ebx), %%ecx"	"\n\t"
			"	jnz		found"			"\n\t"
			
			"	addl	$4, %%ebx"		"\n\t"
			"	bsfl	(%%ebx), %%ecx"	"\n\t"
			"	movl	$32, %%eax"		"\n\t"
			"	jnz		found"			"\n\t"
			
			"	movl	$64, %%eax"		"\n\t"
			"	jmp		done"			"\n\t"
			
			
			"found:"					"\n\t"
			"	shl		%%cl, %%edx"	"\n\t"
			"	addl	%%ecx, %%eax"	"\n\t"
			"	xorl	%%edx, (%%ebx)"	"\n\t"
			"	movl	%%eax, %0"	"\n\t"
			"done:"					"\n\t"
			
	:"=r" (ret)
	:"r" (&data)
		
	   );
	return ret;
}
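
The likely reasons the code above crashes: "movl %%eax, %%edx" is still in Intel operand order, so eax is never zeroed. "movl (%1), %%ebx" dereferences the address, but %1 already holds the address, so the board contents end up being used as a pointer. No clobber list tells GCC that eax, ebx, ecx, edx, the flags and memory are modified. The final "movl %%eax, %0" is skipped on the path that jumps to done. And the named labels found and done clash as soon as the function is inlined twice. A possible corrected version follows, as a sketch rather than a definitive implementation. It is written in C, so it takes a pointer instead of a C++ reference, it lets GCC pick the registers (only the shift count is pinned to ecx), and it uses numeric local labels. The plain C branch is an added fallback for non-x86 targets.

```c
#include <stdint.h>

/* Return the index (0..63) of the first set bit in the two consecutive
 * 32-bit words at 'data' (low word first) and clear that bit.
 * Returns 64 if both words are zero. */
static inline unsigned int TakeOneA(uint32_t *data)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int ret;       /* result                                   */
    uint32_t *p = data;     /* current word, advanced to the high word  */
    uint32_t mask = 1;      /* becomes 1 << bit index                   */
    uint32_t idx;           /* bit index in the word, pinned to ecx     */

    __asm__ __volatile__ (
        "xorl   %[ret], %[ret]      \n\t"   /* ret = 0                      */
        "bsfl   (%[p]), %[idx]      \n\t"   /* scan low word, ZF=1 if zero  */
        "jnz    1f                  \n\t"
        "add    $4, %[p]            \n\t"   /* advance to the high word     */
        "bsfl   (%[p]), %[idx]      \n\t"
        "movl   $32, %[ret]         \n\t"   /* mov keeps the flags of bsf   */
        "jnz    1f                  \n\t"
        "movl   $64, %[ret]         \n\t"   /* nothing set in either word   */
        "jmp    2f                  \n\t"
        "1:                         \n\t"
        "shll   %%cl, %[mask]       \n\t"   /* mask = 1 << idx              */
        "addl   %[idx], %[ret]      \n\t"   /* index within the 64 bits     */
        "xorl   %[mask], (%[p])     \n\t"   /* clear the bit                */
        "2:                         \n\t"
        : [ret] "=&r" (ret), [p] "+r" (p), [mask] "+r" (mask), [idx] "=&c" (idx)
        :
        : "cc", "memory");
    return ret;
#else
    /* Portable fallback with the same behavior. */
    for (unsigned int w = 0; w < 2; ++w) {
        if (data[w] != 0) {
            unsigned int idx = 0;
            while (((data[w] >> idx) & 1u) == 0)
                ++idx;
            data[w] ^= (uint32_t)1 << idx;
            return 32 * w + idx;
        }
    }
    return 64;
#endif
}
```

The "c" constraint is what forces idx into ecx, which the shift needs for its count in cl. The labels 1: and 2: are local, so "1f" means "the next label 1 forward" and every inlined copy gets its own.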

Gerd Isenberg · Post by **Gerd Isenberg** » Tue Jan 22, 2008 7:03 am

Ron Murawski wrote:This link is very helpful for Linux assembly
http://www.ibm.com/developerworks/linux ... zone=linux

Even better would be to drop the assembly language and replace it with C code. The following code will probably execute faster than your assembly code because the compiler will be better able to optimize around all the calls.
Code:
// 'lsz64_tbl' source: Matt Taylor
static const int lsz64_tbl[64] =
{
    0, 31,  4, 33, 60, 15, 12, 34,
   61, 25, 51, 10, 56, 20, 22, 35,
   62, 30,  3, 54, 52, 24, 42, 19,
   57, 29,  2, 44, 47, 28,  1, 36,
   63, 32, 59,  5,  6, 50, 55,  7,
   16, 53, 13, 41,  8, 43, 46, 17,
   26, 58, 49, 14, 11, 40,  9, 45,
   21, 48, 39, 23, 18, 38, 37, 27
};

#ifdef _MSC_VER
#define FORCEINLINE __forceinline
#else
#define FORCEINLINE __inline
#endif


//______________________________________________________________________________
/* FirstPieceAndClear():
 *
 *      Return square number (0 to 63) of the least significant set bit
 *      in bitboard 'bb' and clear that bit from bitboard 'bb'
 *
 *      source: Matt Taylor's "de Bruijn method" implementation
 */
//______________________________________________________________________________
FORCEINLINE int FirstPieceAndClear(BITBOARD *bb)
{
   const BITBOARD  lsb = (*bb & -(s64) *bb) - 1;
   register const u32 foldedLSB = ((u32) lsb) ^ ((u32) (lsb >> 32));

   *bb &= *bb - 1; // clear least significant bit from bb

   return lsz64_tbl[foldedLSB * 0x78291ACF >> 26];
}
My board layout is:
a8 = 0
h8 = 7
a1 = 56
h1 = 63
If your layout is different you'll have to rearrange the lsz64_tbl array entries.

Ron

Yes, for 32-bit mode with a fast 3-4 cycle imul, Matt's method is the favorite, especially on K8 and/or P4, where bsf/bsr is very slow, and if you have a square mapping that requires a lookup anyway. The preferred method for Core 2 Duo in 64-bit mode is a single bsf (via the _BitScanForward64 intrinsic or GCC inline asm, as mentioned by Pradu) with little-endian rank-file mapping, where the bit index found already corresponds to the square index:

http://chessprogramming.wikispaces.com/BitScan
http://chessprogramming.wikispaces.com/ ... iderations

K10 has a faster lzcnt (leading zero count) than bsr, though.
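
The relation between the two instructions can be sketched in C: for a nonzero 64-bit value, the index that bsr returns equals 63 minus the leading-zero count. The sketch assumes the GCC/Clang builtin __builtin_clzll; whether it compiles to lzcnt or to bsr depends on the target flags. The function name msb_index is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Index (0..63) of the most significant set bit, i.e. what bsr returns.
 * Precondition: bb != 0, because __builtin_clzll(0) is undefined. */
static inline int msb_index(uint64_t bb)
{
    assert(bb != 0);
    return 63 - __builtin_clzll(bb);
}
```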

Ron Murawski · Post by **Ron Murawski** » Tue Jan 22, 2008 7:50 pm

Sorry, I can't help you with your assembly language problem. I am *not* an expert in this field. I'm sure someone else can find and fix your problem.

I've replaced all of my old assembly routines with C code equivalents and the speed of my program improved! Using assembly routines makes a compiler's job of optimization much more difficult. Even though your routine might be the fastest on the planet, it is possible that in actual practice it runs slower because of the compiler's need to insert push/pop instructions on each side of your code to preserve its own registers.

Casey wrote: I still insist on developing my own code because I plan to modify it later.

For search and eval coding I agree with you. But for bitscans, population counts, and magic sliders -- I regard them as library routines to be used as needed. "Develop my own code" can be carried to extremes where you rewrite all the C library functions too. You don't want to do that!

I recommend that you use one of Gerd's bitscan routines that are written in C.
http://chessprogramming.wikispaces.com/BitScan

Ron
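
For comparison, a library-style C version of "take the first bit and clear it" can be very short. This is not one of the routines from the page linked above; it is a minimal sketch assuming a GCC or Clang compiler, whose builtin __builtin_ctzll counts trailing zeros and typically compiles to a single bit-scan instruction on x86-64. The name pop_lsb is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index (0..63) of the least significant set bit of *bb
 * and clear that bit.
 * Precondition: *bb != 0, because __builtin_ctzll(0) is undefined. */
static inline int pop_lsb(uint64_t *bb)
{
    assert(*bb != 0);
    int sq = __builtin_ctzll(*bb);  /* trailing zeros = index of lowest bit */
    *bb &= *bb - 1;                 /* clear the lowest set bit */
    return sq;
}
```

A typical caller loops with "while (bb) { sq = pop_lsb(&bb); ... }", which also guarantees the precondition. Because this is ordinary C, the compiler stays free to schedule registers around the call.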
