Very strange Bug

bob · Post by **bob** » Tue Feb 15, 2011 7:06 pm

Desperado wrote: Hello Bob,

Well, my data is 99.9% standard "int", with the exception of
bitboards and hash numbers, which are ui64_t (MSVC: unsigned __int64).

If that is what you mean, OK. But if you are also suggesting replacing the standard
integers with 64-bit integers, I would have to think about that first.

Of course, I also don't like using an explicit cast or temporary values
where they are not necessary.

Strange thing, this one...

Cheers

Ints are problematic if you use them for array subscripts and such, because on a 64-bit architecture you need 64-bit addresses (or at least 48-bit addresses on today's platforms), so a 32-bit int subscript has to be sign-extended before it can be used in an address calculation...

Using int might be OK, or not. But there is no advantage to using int unless you are dealing with an array of ints vs. an array of int64s, as the former is more cache-friendly than the latter. But there are a lot of hidden performance issues that using int can make you stumble over without ever knowing.
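
A minimal sketch of the cache-footprint side of this argument, using hypothetical 64-entry per-square tables (the table and function names are illustrative only): the 32-bit table is half the size of the 64-bit one, while the subscript itself can still be a 64-bit type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-square tables (64 squares), used only to compare footprints. */
static int32_t table32[64];   /* 4 bytes per entry: 256 bytes in total */
static int64_t table64[64];   /* 8 bytes per entry: 512 bytes in total */

/* The subscript is already 64 bits wide, so no sign extension of a
   32-bit int is needed before the address calculation. */
static int64_t load32(const int32_t *table, int64_t sq) { return table[sq]; }
static int64_t load64(const int64_t *table, int64_t sq) { return table[sq]; }
```

With the typical 64-byte cache line, the 32-bit table spans 4 lines and the 64-bit table 8.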

Desperado · Post by **Desperado** » Tue Feb 15, 2011 8:02 pm

Well, whatever that means, but when I disable compiler optimization,
everything works...

bob · Post by **bob** » Tue Feb 15, 2011 9:04 pm

Desperado wrote: Well, whatever that means, but when I disable compiler optimization,
everything works...

It "means" that you don't want to do that, which is why I said that you really should stick to int64 except where you have a large array. And if you use values from that array as subscripts into another array, you ought to use 64-bit ints there as well. Otherwise you are going to waste a ton of time on this kind of problem.

Gerd Isenberg · Post by **Gerd Isenberg** » Tue Feb 15, 2011 9:04 pm

Desperado wrote: Well, whatever that means, but when I disable compiler optimization,
everything works...

The byte-wise generated assembly is certainly an "optimization" issue. The error is the shl al,cl instead of mov rax,-1; shl rax,cl: for an 8-bit shift the CPU takes the count modulo 32, so shift amounts > 31 wrap around.

While -(ui64_t) 1 looks fine to me, Volker mentions an overflow, which would give the compiler the "freedom" to optimize into garbage, as happened here.

Code: Select all

const ui64_t allBitsSet = 0xffffffffffffffffULL;
ui64_t lo = allBitsSet << bsr64((lmsk->lineLo & occ)|1);

If that still fails, and it still generates the shl al,cl stuff, it is definitely a compiler bug.
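
To make the modulo-32 point concrete, here is a minimal C sketch (the helper names are hypothetical) that emulates the 8-bit shl al, cl, whose count the CPU masks to 5 bits, and compares it with the low byte of the intended 64-bit shift allBitsSet << n for every n from 0 to 63:

```c
#include <assert.h>
#include <stdint.h>

/* Emulates x86 "shl al, cl": for an 8-bit operand the CPU masks the
   shift count to 5 bits (count mod 32) and keeps only the low byte. */
static uint8_t shl8_x86(uint8_t al, unsigned cl)
{
    unsigned count = cl & 31u;
    return (uint8_t)((uint64_t)al << count);
}

/* Low byte of the intended 64-bit shift allBitsSet << n, for n in 0..63. */
static uint8_t lowByteOf64BitShift(unsigned n)
{
    const uint64_t allBitsSet = UINT64_C(0xffffffffffffffff);
    return (uint8_t)(allBitsSet << n);
}

/* Counts the shift amounts n in 0..63 where the byte-wise code differs. */
static int countMismatches(void)
{
    int bad = 0;
    for (unsigned n = 0; n < 64; ++n)
        if (shl8_x86(0xFF, n) != lowByteOf64BitShift(n))
            ++bad;
    return bad;
}
```

Only n = 32..39 disagree: there the masked count 0..7 leaves low bits set that the real 64-bit shift would have cleared.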

adamh · Post by **adamh** » Wed Feb 16, 2011 12:44 am

Volker seems to be 100% right. And the solution should be:

Code: Select all

-( (ui64_t)1 )
// or, "true" C++ style:
-static_cast<ui64_t>( 1 )
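
A minimal C sketch, assuming ui64_t is a typedef for uint64_t, showing that the usual spellings of the all-ones mask are the same value. Unsigned arithmetic in C wraps modulo 2^64, so the negation is well defined; MSVC merely warns about it (C4146):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ui64_t;   /* assumption: stands in for MSVC's unsigned __int64 */

/* Four ways to spell "all 64 bits set". Negating (ui64_t)1 wraps to
   2^64 - 1; MSVC still reports C4146 for this first form. */
static const ui64_t viaNegation   = -((ui64_t)1);
static const ui64_t viaComplement = ~(ui64_t)0;
static const ui64_t viaLiteral    = 0xffffffffffffffffULL;
static const ui64_t viaLimit      = UINT64_MAX;
```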

A more detailed discussion here:
http://www.talkchess.com/forum/viewtopi ... =&start=30

Now, I forget: how much were you betting? :p

Desperado · Post by **Desperado** » Wed Feb 16, 2011 8:20 am

Gerd Isenberg wrote:
Code: Select all
const ui64_t allBitsSet = 0xffffffffffffffffULL;
ui64_t lo = allBitsSet << bsr64((lmsk->lineLo & occ)|1);
If that still fails, and it still generates the shl al,cl stuff, it is definitely a compiler bug.

Good morning everyone, and hello Gerd,

I had already tested "0xffffffffffffffffULL" but did not look at the asm file,
so I just repeated the test.

The error occurs just as in the -(ui64_t) 1 version. Here is the corresponding
assembly:

Code: Select all

	mov	QWORD PTR [rsp+16], rbx
	push	rdi
	sub	rsp, 32					; 00000020H
	test	BYTE PTR [rcx+392], 1
	mov	rdi, rdx
	mov	rbx, rcx
	je	$LN1@genCastleC
	movsxd	rax, DWORD PTR [rcx+404]
	mov	r8, QWORD PTR [rcx+8]
	mov	QWORD PTR [rsp+48], rsi
	or	r8, QWORD PTR [rcx]
	lea	r11, QWORD PTR [rax+rax*2]
	lea	rsi, OFFSET FLAT:lmsk
	mov	rax, QWORD PTR [rsi+r11*8]
	mov	r9, QWORD PTR [rsi+r11*8+8]
	mov	edx, 255				; 000000ffH
	and	rax, r8
	and	r9, r8
	or	rax, 1
	bsr	r10, rax
	mov	rax, QWORD PTR [rsi+r11*8+1536]
	and	rax, r8
	or	rax, 1
	bsr	rcx, rax
	mov	rax, QWORD PTR [rsi+r11*8+1544]
	and	rax, r8
	movzx	r8d, al
	neg	r8b
	and	r8b, al
	mov	eax, edx

	shl	al, cl          ; is this the point you mentioned, Gerd? al (lo file) = -1 byte-wise << bsr64(lo file | 1)?

	add	r8b, r8b
	mov	ecx, r10d
	add	r8b, al
	shl	dl, cl
	movzx	eax, r9b
	and	r8b, BYTE PTR [rsi+r11*8+1552]
	neg	al
	and	al, r9b
	add	al, al
	add	al, dl
	and	al, BYTE PTR [rsi+r11*8+16]
	mov	rsi, QWORD PTR [rsp+48]
	or	r8b, al
	test	r8b, 32					; 00000020H
	je	SHORT $LN1@genCastleC
	mov	rdx, rdi
	mov	rcx, rbx
	call	?generate_oow@@YAXPEAUpos_t@@PEAUmli_t@@@Z ; generate_oow
$LN1@genCastleC:
	mov	rbx, QWORD PTR [rsp+56]
	add	rsp, 32					; 00000020H
	pop	rdi
	ret	0

Michael

Gerd Isenberg · Post by **Gerd Isenberg** » Wed Feb 16, 2011 8:36 am

Desperado wrote:
The error occurs just as in the -(ui64_t) 1 version. Here is the corresponding assembly:
Code: Select all
	mov	edx, 255				; 000000ffH
	shl	al, cl          ; is this the point you mentioned, Gerd? al (lo file) = -1 byte-wise << bsr64(lo file | 1)?
	shl	dl, cl

Yep. Buggy compiler.

Sven · Post by **Sven** » Wed Feb 16, 2011 9:45 am

Gerd Isenberg wrote:
Yep. Buggy compiler.

@Michael: I would also propose to remove this line:

Code: Select all

#pragma warning (disable: 4146)

which, again, relates to the "-(ui64_t) 1".

For this reason I also don't think the compiler is "buggy" here. If you don't suppress warning C4146 ("unary minus operator applied to unsigned type, result still unsigned"), the compiler tells you that something unexpected can happen.
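
For context, the hi & -hi idiom in the attackLine code later in this thread is another unary minus on an unsigned value, so it triggers C4146 too once the pragma is gone. A minimal sketch, assuming uint64_t in place of ui64_t and using hypothetical helper names, of the equivalent spelling x & (~x + 1), which isolates the same least significant bit without unary minus:

```c
#include <assert.h>
#include <stdint.h>

/* Least significant set bit of x, written with unary minus on an
   unsigned value (the form MSVC reports as C4146). */
static uint64_t ls1bNeg(uint64_t x) { return x & -x; }

/* Same value without unary minus: for unsigned values,
   ~x + 1 == -x modulo 2^64. */
static uint64_t ls1bCompl(uint64_t x) { return x & (~x + 1); }
```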

Which data type is returned by bsr64()? If it is "unsigned char", that might be part of the problem, if the compiler truncates the left operand of the << operator in the "optimized" version because of the "unsigned char" type of the right operand.
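
For reference, standard C gives a shift expression the promoted type of its left operand, so the type of the count (unsigned char or unsigned long) does not, by itself, narrow a 64-bit left operand. A minimal sketch with a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* The result of a << b has the promoted type of a; the type of b only
   has to hold a count in range (0..63 for a 64-bit a). */
static uint64_t shiftAllOnes(unsigned char count)
{
    const uint64_t allBitsSet = UINT64_C(0xffffffffffffffff);
    return allBitsSet << count;
}
```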

Sven

Desperado · Post by **Desperado** » Wed Feb 16, 2011 10:47 am

Sven Schüle wrote:
@Michael: I would also propose to remove this line:
Code: Select all
#pragma warning (disable: 4146)
which, again, relates to the "-(ui64_t) 1".

For this reason I also don't think the compiler is "buggy" here. If you don't suppress warning C4146 ("unary minus operator applied to unsigned type, result still unsigned"), the compiler tells you that something unexpected can happen.

Which data type is returned by bsr64()? If it is "unsigned char", that might be part of the problem, if the compiler truncates the left operand of the << operator in the "optimized" version because of the "unsigned char" type of the right operand.

Sven

Hello Sven,

OK, I did so.

Whether we call it a compiler bug or not (since the compiler does warn) doesn't help me anyway.
The more important thing is that now (big thanks, Gerd) I know and understand what the compiler is doing with my statements.

My return value is unsigned long, because the 64-bit release build uses the MSVC _BitScanReverse64() intrinsic.
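
Michael's bsr64() is not shown in the thread; here is a minimal portable sketch of a function with that name and an unsigned long return type, assuming it wraps MSVC's _BitScanReverse64 (which writes the bit index through an unsigned long *) on x64 and falls back to GCC/Clang's __builtin_clzll or a plain loop elsewhere:

```c
#include <assert.h>
#include <stdint.h>
#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#endif

/* Hypothetical bsr64: index (0..63) of the most significant set bit.
   Precondition: x != 0 (attackLine guarantees this by OR-ing in 1). */
static unsigned long bsr64(uint64_t x)
{
    assert(x != 0);
#if defined(_MSC_VER) && defined(_M_X64)
    unsigned long index;
    _BitScanReverse64(&index, x);
    return index;
#elif defined(__GNUC__) || defined(__clang__)
    return 63ul - (unsigned long)__builtin_clzll(x);
#else
    unsigned long index = 0;
    while (x >>= 1)      /* count how often x can be halved */
        ++index;
    return index;
#endif
}
```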

The following post will provide a proper solution that avoids this kind of
unintended compiler behaviour.

Michael

Desperado · Post by **Desperado** » Wed Feb 16, 2011 11:08 am

OK, here is my solution.

Code: Select all

extern __inline ui64_t bitMask(sq_t sq) {return(bMask[sq]);}

Code: Select all


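static __inline ui64_t attackLine(linemask_t *lmsk,ui64_t occ)
{
 /* single bit of the nearest blocker below the slider (bit 0 if none) */
 ui64_t lo = bitMask(bsr64((lmsk->lineLo & occ)|1));
 /* blockers above the slider; hi&-hi isolates the nearest one */
 ui64_t hi = lmsk->lineHi & occ;
 /* bits from the lower blocker up to the upper blocker (or to the top if none), trimmed to the line */
 return(lmsk->lineEx & (2*(hi&-hi)-lo));
}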
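
A self-contained C sketch for checking the new attackLine against the shift-based version it replaces. Everything apart from attackLine's body is an assumption: uint64_t for ui64_t, int for sq_t, the meaning of the linemask_t fields, a portable loop for bsr64, a rank-only rankMask() builder, and attackLineShift(), which is reconstructed from point 2 below (old lo = all bits from the blocker upward, added instead of subtracted).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ui64_t;   /* assumption: stands in for unsigned __int64 */
typedef int      sq_t;     /* assumption: square index, a1 = 0 ... h8 = 63 */

/* Assumed meaning of the linemask_t fields, inferred from attackLine. */
typedef struct {
    ui64_t lineLo;   /* squares of the line below the slider */
    ui64_t lineHi;   /* squares of the line above the slider */
    ui64_t lineEx;   /* the whole line, excluding the slider */
} linemask_t;

static ui64_t bMask[64];   /* single-bit lookup table: bMask[sq] == 1 << sq */

static void initMasks(void)
{
    for (int sq = 0; sq < 64; ++sq)
        bMask[sq] = (ui64_t)1 << sq;
}

/* Portable stand-in for bsr64: index of the highest set bit (x != 0). */
static unsigned long bsr64(ui64_t x)
{
    unsigned long index = 0;
    while (x >>= 1)
        ++index;
    return index;
}

static ui64_t bitMask(sq_t sq) { return bMask[sq]; }

/* Lookup version from the post: subtract the single bit of the lower blocker. */
static ui64_t attackLine(const linemask_t *lmsk, ui64_t occ)
{
    ui64_t lo = bitMask((sq_t)bsr64((lmsk->lineLo & occ) | 1));
    ui64_t hi = lmsk->lineHi & occ;
    return lmsk->lineEx & (2 * (hi & -hi) - lo);
}

/* Reconstructed shift version: add all bits from the lower blocker upward. */
static ui64_t attackLineShift(const linemask_t *lmsk, ui64_t occ)
{
    ui64_t lo = ~(ui64_t)0 << bsr64((lmsk->lineLo & occ) | 1);
    ui64_t hi = lmsk->lineHi & occ;
    return lmsk->lineEx & (2 * (hi & -hi) + lo);
}

/* Hypothetical builder: masks for the slider's rank only. */
static linemask_t rankMask(sq_t sq)
{
    ui64_t rank = (ui64_t)0xFF << (sq & 56);
    ui64_t bit  = (ui64_t)1 << sq;
    linemask_t m;
    m.lineLo = rank & (bit - 1);
    m.lineHi = rank & ~(bit | (bit - 1));
    m.lineEx = rank & ~bit;
    return m;
}

/* 1 if both versions agree for all 256 occupancies of sq's rank. */
static int versionsAgree(sq_t sq)
{
    linemask_t m = rankMask(sq);
    for (unsigned bits = 0; bits < 256; ++bits) {
        ui64_t occ = (ui64_t)bits << (sq & 56);
        if (attackLine(&m, occ) != attackLineShift(&m, occ))
            return 0;
    }
    return 1;
}
```

Call initMasks() once before using attackLine. With these masks, a rook on d1 (square 3) with blockers on b1 and f1 (occ = 0x22) gets attackLine(&m, 0x22) == 0x36, i.e. b1, c1, e1 and f1.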

1: avoiding unintended shift operations
===============================

From now on, single-bit masks are simple lookups (avoiding any shift operations).
Since I used single-bit masks in more than one place,

like: extern __inline ui64_t bit(int id) {return((ui64_t)1<<id);}

I can easily update my project with a lookup approach.
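
One way to build such a lookup table, sketched under the assumption that ui64_t is uint64_t: a read-only bMask[] filled at compile time by macros (the macro names B and ROW are hypothetical), so the table needs no run-time initialization, plus bit() rewritten as a lookup.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ui64_t;   /* assumption: stands in for unsigned __int64 */

/* These shifts are constant expressions with counts 0..63, evaluated
   at compile time, so no run-time shift instruction is generated. */
#define B(sq)   ((ui64_t)1 << (sq))
#define ROW(r)  B(8*(r)+0), B(8*(r)+1), B(8*(r)+2), B(8*(r)+3), \
                B(8*(r)+4), B(8*(r)+5), B(8*(r)+6), B(8*(r)+7)

/* Hypothetical read-only single-bit table: bMask[sq] == 1 << sq. */
static const ui64_t bMask[64] = {
    ROW(0), ROW(1), ROW(2), ROW(3), ROW(4), ROW(5), ROW(6), ROW(7)
};

#undef ROW
#undef B

/* Drop-in replacement for bit(int id) {return((ui64_t)1<<id);} */
static inline ui64_t bit(int id) { return bMask[id]; }
```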

2: change in attackLine
==================

Using a simple bitmask only requires subtracting lo instead of adding it.
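
Why subtracting works, sketched as a short derivation: write b = bsr64((lineLo & occ)|1) and h for the index of the lowest set bit of hi. Modulo 2^64, the old all-bits-from-b mask is exactly the negation of the new single bit:

```latex
\begin{aligned}
\mathrm{lo}_{\text{old}} &= (2^{64}-1)\cdot 2^{b} \bmod 2^{64} = 2^{64}-2^{b} \equiv -2^{b} = -\mathrm{lo}_{\text{new}} \pmod{2^{64}},\\
2\,(hi \mathbin{\&} -hi) + \mathrm{lo}_{\text{old}} &\equiv 2^{h+1} - 2^{b} = \sum_{i=b}^{h} 2^{i} \pmod{2^{64}}, \qquad 0 \le b < h \le 63.
\end{aligned}
```

So both versions select the squares from the lower blocker through the upper blocker before lineEx trims them to the line; if hi is empty, the first term is 0 and the result is every bit from b upward.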

3: compiler/platform
==================

I think other compilers and platforms will also be happy with this
kind of implementation.

4: thx
=====

Thanks to everyone for the annotations, explanations and proposals.
I found many things to think about in this thread;
there is a lot of room to improve my skills.

OK, a big thanks to everyone.

PS: If you find a flaw in my solution, please let me know.

Michael
