Problem with functions not inlining

Bo Persson
Posts: 243
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: Problem with functions not inlining

Post by Bo Persson »

Rein Halbersma wrote:
Gerd Isenberg wrote:
Rein Halbersma wrote:

Code: Select all

xmm128i& operator&(const xmm128i& left, const xmm128i& right)
{
    return xmm128i(left) ^= right; // should be &=
}
This should be almost guaranteed to be inlined without copy overhead because of the Return Value Optimization mechanism.
I guess you mean a friend? Regardless, is it correct to return a reference (or pointer) to a temporary object, which is no longer valid outside the scope of the function? That would be like the pointer version, which hardly looks correct to me:
Gerd
Sorry for the typo, I meant &= and not ^=. I did mean to have a non-friend operator& though. There is no reason to clutter the class interface by adding friend operators that directly manipulate data members. Stroustrup also defines binary operators in terms of their compound assignment cousins in his TC++PL book.

Return by value would of course always work. But the return by reference should not give any problems (it does not in my code). Can you give an example where it does give a problem?
It is always a problem to return a pointer or reference to a local variable or temporary, because the object is destroyed when leaving the function.

If (big IF) the function is inlined and/or holds the value in an XMM register, you might not always notice that the value formally is gone, because the bits happen to still be there.

Using a value after it goes out of scope is undefined behavior, so anything can happen - including that the code seems to work.
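
To make the safe alternative concrete, here is a minimal sketch of a binary operator& built on operator&= that returns by value. Bits128 is a hypothetical stand-in for the xmm128i wrapper discussed in this thread (two 64-bit halves instead of an XMM register), so this illustrates the pattern only and is not the original class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the thread's xmm128i wrapper:
// two 64-bit halves instead of an __m128i member.
struct Bits128
{
    std::uint64_t lo;
    std::uint64_t hi;

    // Compound assignment may return a reference, because *this
    // outlives the call.
    Bits128& operator&=(Bits128 const& right)
    {
        lo &= right.lo;
        hi &= right.hi;
        return *this;
    }
};

// Non-friend, non-member binary operator defined in terms of &=.
// It returns by VALUE: 'left' is a local copy, so returning a
// reference to it would dangle.
inline Bits128 operator&(Bits128 left, Bits128 const& right)
{
    left &= right;
    return left;
}
```

Because the left operand is taken by value, the copy happens at the call site and the function only modifies and returns its own object. Compilers can usually elide or cheaply move that return, so the by-value form should not cost more than the by-reference one.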
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Problem with functions not inlining

Post by Gerd Isenberg »

Rein Halbersma wrote: Sorry for the typo, I meant &= and not ^=. I did mean to have a non-friend operator& though. There is no reason to clutter the class interface by adding friend operators that directly manipulate data members. Stroustrup also defines binary operators in terms of their compound assignment cousins in his TC++PL book.

Return by value would of course always work. But the return by reference should not give any problems (it does not in my code). Can you give an example where it does give a problem?

Code: Select all

#include <cstdio>  // printf, getchar, NULL

struct XYZ {
    int x;
    int y;
    int z;
    XYZ * parent;
};

XYZ & fooref(XYZ *ptrParent) {
  XYZ xyz;  // automatic variable on stack
  xyz.x = 1;
  xyz.y = 2;
  xyz.z = 3;
  xyz.parent = ptrParent;
  return xyz;  // oops: reference to a local that is destroyed on return
}

XYZ * fooptr(XYZ *ptrParent) {
  XYZ xyz;  // automatic variable on stack
  xyz.x = 1;
  xyz.y = 2;
  xyz.z = 3;
  xyz.parent = ptrParent;
  return &xyz;  // oops: pointer to a local that is destroyed on return
}

int main(int argc, char* argv[])
{
   XYZ & refxyz = fooref (NULL); // refers to garbage area of stack
   // each interrupt and further call will likely overwrite that structure
   printf("x:%10d y:%10d z:%10d\n", refxyz.x, refxyz.y, refxyz.z);
   getchar();
   printf("x:%10d y:%10d z:%10d\n", refxyz.x, refxyz.y, refxyz.z);
   XYZ * ptrxyz = fooptr (NULL); // points to garbage area of stack
   printf("x:%10d y:%10d z:%10d\n", ptrxyz->x, ptrxyz->y, ptrxyz->z);
   // this seems safer, but still a bug
   XYZ xyz = fooref (NULL); 
   printf("x:%10d y:%10d z:%10d\n", xyz.x, xyz.y, xyz.z);
   getchar();
   return refxyz.x;
}
VC6 at least gives some warnings:
warning C4172: returning address of local variable or temporary
warning C4172: returning address of local variable or temporary
Output:
x:1073741929 y:      2048 z:         0
x:         0 y:   4243808 z:         0
x:   1245036 y:   4231296 z:         0
x:         0 y:   4231216 z:         0
Greg Strong
Posts: 388
Joined: Sun Dec 21, 2008 6:57 pm
Location: Washington, DC

Re: Problem with functions not inlining

Post by Greg Strong »

Sven Schüle wrote: Could you show one exact example of how you use that operator& in your code?

My suspicion is that your problem may be related to the way you have defined your operator(s).

Sven
Ok, I've now looked at all the suggestions, and tried this every which way, and still no dice. I've created a new project from scratch with only the essential code and a small test, and looked at the compiler options repeatedly. I just can't believe that this doesn't work.

Here's the code for the classes:

Code: Select all

#include <emmintrin.h>

typedef unsigned char byte;

union xmm
{
	byte AsBytes[16];
	__m128i As128i;
};

class xmm128i
{
  private:
	__m128i m_data;

  public:
	__forceinline xmm128i() { } 

	__forceinline xmm128i( __m128i *source )
	{ m_data = _mm_load_si128( source ); }

	__forceinline xmm128i( __m128i source ): m_data(source) { }

	__forceinline xmm128i( xmm const &source )
	{ _mm_load_si128( &source.As128i ); }

	__forceinline ~xmm128i() { }

	__forceinline operator __m128i()
	{ return m_data; }


	//	logical operations
	__forceinline friend xmm128i operator &( xmm128i const &left, xmm128i const &right )
	{ return xmm128i( _mm_and_si128( left.m_data, right.m_data ) ); }

};
Here's the code for a simple test function:

Code: Select all

xmm128i TEST()
{
	xmm data1;
	xmm data2;

	//	initialize our data objects with random data ...
	data1.AsBytes[0] = 4; data1.AsBytes[1] = 4; data1.AsBytes[2] = 4; data1.AsBytes[3] = 4;
	data1.AsBytes[4] = 4; data1.AsBytes[5] = 4; data1.AsBytes[6] = 4; data1.AsBytes[7] = 4;
	data1.AsBytes[8] = 4; data1.AsBytes[9] = 4; data1.AsBytes[10] = 4; data1.AsBytes[11] = 4;
	data1.AsBytes[12] = 4; data1.AsBytes[13] = 4; data1.AsBytes[14] = 4; data1.AsBytes[15] = 4;

	data2.AsBytes[0] = 0; data2.AsBytes[1] = 4; data2.AsBytes[2] = 0; data2.AsBytes[3] = 4;
	data2.AsBytes[4] = 0; data2.AsBytes[5] = 4; data2.AsBytes[6] = 0; data2.AsBytes[7] = 4;
	data2.AsBytes[8] = 7; data2.AsBytes[9] = 0; data2.AsBytes[10] = 7; data2.AsBytes[11] = 0;
	data2.AsBytes[12] = 7; data2.AsBytes[13] = 0; data2.AsBytes[14] = 7; data2.AsBytes[15] = 0;

	xmm128i A( data1 );
	xmm128i B( data2 );

	//	use our operater & ...
	xmm128i A_and_B = A & B;

	return A_and_B;
}
And here's the assembly output for TEST function:

Code: Select all

?TEST@@YA?AVxmm128i@@XZ PROC				; TEST, COMDAT
; ___$ReturnUdt$ = esi

; 4    : {

	push	ebp
	mov	ebp, esp
	and	esp, -16				; fffffff0H
	sub	esp, 32					; 00000020H
	mov	eax, DWORD PTR ___security_cookie
	xor	eax, esp
	mov	DWORD PTR __$ArrayPad$[esp+32], eax

; 5    : 	xmm data1;
; 6    : 	xmm data2;
; 7    : 
; 8    : 	//	initialize our data objects with random data ...
; 9    : 	data1.AsBytes[0] = 4; data1.AsBytes[1] = 4; data1.AsBytes[2] = 4; data1.AsBytes[3] = 4;
; 10   : 	data1.AsBytes[4] = 4; data1.AsBytes[5] = 4; data1.AsBytes[6] = 4; data1.AsBytes[7] = 4;
; 11   : 	data1.AsBytes[8] = 4; data1.AsBytes[9] = 4; data1.AsBytes[10] = 4; data1.AsBytes[11] = 4;
; 12   : 	data1.AsBytes[12] = 4; data1.AsBytes[13] = 4; data1.AsBytes[14] = 4; data1.AsBytes[15] = 4;
; 13   : 
; 14   : 	data2.AsBytes[0] = 0; data2.AsBytes[1] = 4; data2.AsBytes[2] = 0; data2.AsBytes[3] = 4;
; 15   : 	data2.AsBytes[4] = 0; data2.AsBytes[5] = 4; data2.AsBytes[6] = 0; data2.AsBytes[7] = 4;
; 16   : 	data2.AsBytes[8] = 7; data2.AsBytes[9] = 0; data2.AsBytes[10] = 7; data2.AsBytes[11] = 0;
; 17   : 	data2.AsBytes[12] = 7; data2.AsBytes[13] = 0; data2.AsBytes[14] = 7; data2.AsBytes[15] = 0;
; 18   : 
; 19   : 	xmm128i A( data1 );
; 20   : 	xmm128i B( data2 );
; 21   : 
; 22   : 	//	use our operater & ...
; 23   : 	xmm128i A_and_B = A & B;

	push	esi
	lea	edx, DWORD PTR _B$[esp+36]
	lea	ecx, DWORD PTR _A$[esp+36]
	call	??I@YA?AVxmm128i@@ABV0@0@Z		; operator&

; 24   : 
; 25   : 	return A_and_B;
; 26   : }

	mov	ecx, DWORD PTR __$ArrayPad$[esp+36]
	add	esp, 4
	xor	ecx, esp
	mov	eax, esi
	call	@__security_check_cookie@4
	mov	esp, ebp
	pop	ebp
	ret	0
?TEST@@YA?AVxmm128i@@XZ ENDP				; TEST
END
If anyone wants to PM me their email address, I'll be happy to send the Visual Studio project...
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Problem with functions not inlining

Post by wgarvin »

Greg Strong wrote: And here's the assembly output for TEST function:
An observation: There are no SSE2 instructions at all in that generated code.

Look in the project Properties, under Configuration Properties...

Under C/C++ > Optimization, check that "Enable Intrinsic Functions" is set to Yes (/Oi)

Under C/C++ > Code Generation, check that "Enable Enhanced Instruction Set" is set to "Streaming SIMD Extensions 2" (/arch:SSE2)

Especially that second one, because I believe the default is "Not Set".
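
One way to double-check that /arch:SSE2 really reaches the compiler is to test the predefined macros from inside the code. The sketch below assumes MSVC's _M_IX86_FP macro (2 or higher when /arch:SSE2 is in effect on 32-bit x86) and the GCC/Clang-style __SSE2__ macro; the function name sse2_codegen_status is made up for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: reports which SSE2 code-generation setting this
// translation unit was compiled with, based on predefined macros only.
inline const char* sse2_codegen_status()
{
#if defined(_M_X64) || defined(__x86_64__)
    return "x64 target: SSE2 is always available";
#elif defined(_M_IX86_FP)
  #if _M_IX86_FP >= 2
    return "MSVC x86: /arch:SSE2 or higher is in effect";
  #else
    return "MSVC x86: /arch:SSE2 is NOT in effect";
  #endif
#elif defined(__SSE2__)
    return "SSE2 enabled (GCC/Clang-style __SSE2__ macro)";
#else
    return "SSE2 does not appear to be enabled";
#endif
}
```

Printing this string from the test program shows what the compiler actually saw, independent of what the property pages claim.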
Greg Strong
Posts: 388
Joined: Sun Dec 21, 2008 6:57 pm
Location: Washington, DC

Re: Problem with functions not inlining

Post by Greg Strong »

You're correct about the defaults. But, yes, both those options are set correctly. And if I look at the "command line" section of the configuration properties, it shows that both /Oi and /arch:SSE2 are indeed in the string (supposedly) being passed to the compiler.

Now, in the other thread, Nathan Thom mentions arguments not actually making it to the compiler, but I don't know if they showed correctly in the "command line" section, or how he figured it out. I'm thinking I might well be having the same problem :(
nthom
Posts: 112
Joined: Thu Mar 09, 2006 6:15 am
Location: Australia

Re: Problem with functions not inlining

Post by nthom »

Greg Strong wrote: You're correct about the defaults. But, yes, both those options are set correctly. And if I look at the "command line" section of the configuration properties, it shows that both /Oi and /arch:SSE2 are indeed in the string (supposedly) being passed to the compiler.

Now, in the other thread, Nathan Thom mentions arguments not actually making it to the compiler, but I don't know if they showed correctly in the "command line" section, or how he figured it out. I'm thinking I might well be having the same problem :(
For my problem, I had Maximize Speed (/O2) set in the Optimization screen, but when looking at the Command Line screen it wasn't listed until I effectively changed things, saved, then changed back.
Greg Strong
Posts: 388
Joined: Sun Dec 21, 2008 6:57 pm
Location: Washington, DC

Re: Problem with functions not inlining

Post by Greg Strong »

nthom wrote:
Greg Strong wrote: You're correct about the defaults. But, yes, both those options are set correctly. And if I look at the "command line" section of the configuration properties, it shows that both /Oi and /arch:SSE2 are indeed in the string (supposedly) being passed to the compiler.

Now, in the other thread, Nathan Thom mentions arguments not actually making it to the compiler, but I don't know if they showed correctly in the "command line" section, or how he figured it out. I'm thinking I might well be having the same problem :(
For my problem, I had Maximize Speed (/O2) set in the Optimization screen, but when looking at the Command Line screen it wasn't listed until I effectively changed things, saved, then changed back.
Ah well, the options do show up there correctly for me, so that's not my problem ...
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Problem with functions not inlining

Post by Sven »

That constructor does not work as expected:

Code: Select all

   __forceinline xmm128i( xmm const &source )
   { _mm_load_si128( &source.As128i ); } 
since it does not assign anything to m_data. Please fix and try again. At least, that constructor is essential for your test code to work.
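
For reference, a corrected version could look like the sketch below. It assumes an x86 target with SSE2 enabled, trims the class down to what the test needs, and uses plain inline member functions instead of __forceinline so that it also builds outside MSVC. The store() method and the filled() helper are hypothetical additions for checking the result.

```cpp
#include <emmintrin.h>   // SSE2 intrinsics
#include <cassert>

typedef unsigned char byte;

union xmm
{
    byte    AsBytes[16];
    __m128i As128i;      // gives the union 16-byte alignment
};

class xmm128i
{
    __m128i m_data;

public:
    // The loaded value is now stored in m_data; the posted version
    // called _mm_load_si128 and discarded the result.
    explicit xmm128i(xmm const& source)
        : m_data(_mm_load_si128(&source.As128i)) {}

    explicit xmm128i(__m128i source) : m_data(source) {}

    // Hypothetical helper: write the 16 bytes to a caller-supplied buffer.
    void store(byte* out16) const
    {
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out16), m_data);
    }

    friend xmm128i operator&(xmm128i const& left, xmm128i const& right)
    {
        return xmm128i(_mm_and_si128(left.m_data, right.m_data));
    }
};

// Hypothetical helper: build an xmm with all 16 bytes set to one value.
inline xmm filled(byte value)
{
    xmm out;
    for (int i = 0; i < 16; ++i)
        out.AsBytes[i] = value;
    return out;
}
```

With the member actually initialized, A & B operates on the loaded data rather than on whatever happened to be in m_data.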

Nevertheless, you are now showing modified code which may have a good chance of matching your requirements. But my proposal was to show the part of the *original* code that contained your problem when starting this thread. In particular, I would like to see the few lines where your original operator& was used within an expression, including the surrounding code. The modified code might work or not work for very different reasons, and solving or understanding new issues might or might not be related to the original problem.

This is an important thing to understand: don't keep an unsolved problem open forever by just "modifying until it works", because it will hit you again some day in the future. Instead, try to understand why this single thing did not work, and only then proceed.

Sven
Greg Strong
Posts: 388
Joined: Sun Dec 21, 2008 6:57 pm
Location: Washington, DC

Re: Problem with functions not inlining

Post by Greg Strong »

Good catch! I corrected that function, but still no dice.

I'm now pretty confident that the optimizer is just broken. According to the documentation, __forceinline will force the function to be expanded inline, or, if that's not possible, it will generate a warning. Since neither is happening, it is clearly not working as advertised.
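
A possible explanation for the missing warning, offered as an assumption rather than a diagnosis: as far as I know, the "marked as __forceinline not inlined" warning (C4714) is a level 4 warning, so it may simply not be shown at the default warning level. The sketch below promotes it to level 1 under MSVC and wraps the keyword in a FORCE_INLINE macro so the code also builds with GCC or Clang. The macro and the and_bits function are made-up names for illustration.

```cpp
#include <cassert>

#if defined(_MSC_VER)
  // Assumption: C4714 is not reported at the default warning level,
  // so treat it as a level 1 warning to make it visible.
  #pragma warning(1 : 4714)
  #define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
  #define FORCE_INLINE inline __attribute__((always_inline))
#else
  #define FORCE_INLINE inline   // plain hint on other compilers
#endif

// Hypothetical example function; any small operator would do.
FORCE_INLINE unsigned and_bits(unsigned left, unsigned right)
{
    return left & right;
}
```

If the compiler then reports C4714 for operator&, that at least confirms the request was seen and refused, which narrows the search to whatever setting blocks the inlining.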
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Problem with functions not inlining

Post by wgarvin »

I reproduced this with VC Express 2008 and the test code you pasted (and /arch:SSE2 and intrinsics enabled). With the Release build, I got generated code almost identical to what you pasted.

I'm not sure why, but when I disable exceptions (Configuration Properties > C/C++ > Code Generation > Enable C++ Exceptions) it does use the SSE2 instructions right in the TEST function. So I suggest turning exceptions off unless you actually use them, which is generally a good idea anyway. I've never used C++ exceptions with SSE2 stuff, so I'm not sure what is going on here.

Also, turning off the "Buffer Security Check" on the same page might be worth doing.