Compiler Problem

rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Compiler Problem

Post by rbarreira »

bob wrote: That begins to sound like either a variable that is uninitialized, or a local array subscript that goes out of bounds. If a printf fixes the problem, the only change a printf has is that it will alter the stack since it is a library call. And altering the stack can change a value you might get on a bad array index.
I disagree. Adding printfs (or anything really) can trigger all sorts of conditions on the compiler which may obscure the bug in many different ways.

A few examples of what adding a simple printf can affect:

1- It may change the register allocation behavior.
2- It affects the code size, potentially triggering different heuristic behavior for the inlining of functions.
3- It may cause a spill of a value to memory which fixes the bug.
4- It may affect the instruction ordering. (instruction ordering bugs are quite difficult to reproduce while one is changing the code)

When I find what appears to be a compiler bug, I try to isolate and reduce the bug-causing code as much as possible. Otherwise it's a nightmare to look through the assembly code, and no compiler vendor would take my bug reports seriously if I didn't produce a small test case (since I'm not paying big bucks for intensive product support).
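While reducing the test case, one way to rule out bob's hypothesis (a bad local array index or an uninitialized variable) is to make the suspect accesses checked. What follows is only a minimal sketch under that assumption; `MoveList`, `kMaxMoves`, `push_checked` and `overruns` are hypothetical names and are not taken from the posted code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical fixed-size move buffer standing in for a local array.
constexpr std::size_t kMaxMoves = 4;

struct MoveList {
    std::array<int, kMaxMoves> moves{};  // value-initialized, so no uninitialized reads
    std::size_t count = 0;
};

// Store a move through a checked index. std::array::at throws
// std::out_of_range instead of silently writing past the buffer.
inline void push_checked(MoveList& list, int move) {
    list.moves.at(list.count) = move;
    ++list.count;
}

// Returns true if pushing n moves would overrun the buffer.
inline bool overruns(std::size_t n) {
    MoveList list;
    try {
        for (std::size_t i = 0; i < n; ++i) {
            push_checked(list, static_cast<int>(i));
        }
    } catch (const std::out_of_range&) {
        return true;
    }
    assert(list.count == n);
    return false;
}
```

If the failure persists with every access checked and every local initialized, a stray stack read becomes a much less likely explanation.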
LoopList

Re: Compiler Problem

Post by LoopList »

Mincho Georgiev wrote: If the class sort_c s; was initialized globally, the issue would most likely be gone. I've seen the same behavior with the exact same compiler version and the /O2 flag. In my case it was improper preservation of registers inside a block. Here the case might be different, but at first sight it looks the same. Probably it is a compiler bug after all.
Many inline functions in the code - posted by me - are nested. I noticed the problem within the following position:

r1bq1bnr/ppp1p1pp/2n5/6k1/4pP2/8/PPPP2PP/RNB1KB1R b KQ f3

The move e4f3 is a quite rare check evasion for Black. And the code here is multiply nested, calling several small inline functions. With inlining disabled, everything works well.

I reported the problem to MS with 50 lines of code (the same code I posted here). Perhaps I'll receive an answer. For the moment I'll continue development with VC 2008.

Fritz
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Compiler Problem

Post by wgarvin »

rbarreira wrote:
bob wrote: That begins to sound like either a variable that is uninitialized, or a local array subscript that goes out of bounds. If a printf fixes the problem, the only change a printf has is that it will alter the stack since it is a library call. And altering the stack can change a value you might get on a bad array index.
I disagree. Adding printfs (or anything really) can trigger all sorts of conditions on the compiler which may obscure the bug in many different ways.

A few examples of what adding a simple printf can affect:

1- It may change the register allocation behavior.
2- It affects the code size, potentially triggering different heuristic behavior for the inlining of functions.
3- It may cause a spill of a value to memory which fixes the bug.
4- It may affect the instruction ordering. (instruction ordering bugs are quite difficult to reproduce while one is changing the code)

When I find what appears to be a compiler bug, I try to isolate and reduce the bug-causing code as much as possible. Otherwise it's a nightmare to look through the assembly code, and no compiler vendor would take my bug reports seriously if I didn't produce a small test case (since I'm not paying big bucks for intensive product support).
Adding printf can also change inlining decisions in the surrounding code. Since turning off inlining hides the bug, and the bug is in a recent version of a 64-bit compiler, I think it's at least plausible that it is a compiler bug.

Since he has actually reduced the code to something fairly small already, I hope Fritz will post the assembly that the compiler generates so that we can have a look at it. It might not help, or there might turn out to be an obvious bug in the generated code.

Just last year I saw a compiler bug in an MS compiler where it had an inlined function which was passed a reference to a primitive int and decremented it (it was something to do with reference counting). It's been a while and I forget the details, but based on the surrounding code, the compiler decided that the value was a constant and completely replaced the read-decrement-write logic with an instruction that stored a constant into the variable, causing a slow memory leak in our application *only in our LTCG-optimized builds*. Since the if-expression became a "constant", the code for that condition also got optimized out. Imagine our surprise when we looked at the generated code for an inlined "AddRef/Release" and discovered that the "Release" did nothing but store 1 into the reference count! Another instance of the same bug caused the app to crash by calling the wrong cleanup function on a data structure, because it generated bad code in the function which figured out which cleanup function to apply, and that one occurred in all of our builds except the debug ones. I think we worked around these bugs by putting our "noinline" macro on the methods involved.
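A "noinline" macro of the kind mentioned above might look roughly like this. It is a sketch, not the macro we actually used: the name `MY_NOINLINE` and the `RefCounted` class are made up for illustration, and only the attribute spellings (`__declspec(noinline)` for MSVC, `__attribute__((noinline))` for GCC/Clang) are the real compiler ones.

```cpp
#include <cassert>  // for the checks in the usage note below

// Hypothetical portability macro wrapping the real compiler attributes.
#if defined(_MSC_VER)
#  define MY_NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define MY_NOINLINE __attribute__((noinline))
#else
#  define MY_NOINLINE  // unknown compiler: expand to nothing
#endif

// Hypothetical reference-counted object.
struct RefCounted {
    int refs = 1;
    bool destroyed = false;

    void AddRef() { ++refs; }

    // Asks the compiler to keep this method out of line, so the
    // decrement is not merged into the caller's code by the inliner.
    MY_NOINLINE void Release() {
        if (--refs == 0) {
            destroyed = true;
        }
    }
};
```

Usage would be something like `RefCounted r; r.AddRef(); r.Release(); assert(r.refs == 1);`. This only sidesteps the inliner; it is a workaround and does not fix whatever the optimizer got wrong.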

Anyway... if changing relatively safe compiler options (like inlining) can change the program's behaviour, then there are only three possibilities:

(1) You're relying on undefined behaviour, which is always bad. Breaking aliasing rules or pointer arithmetic/casting rules are common examples. Reading uninitialized variables or trashing your stack due to a buffer overrun would also fall into that category. Technically if you invoke undefined behaviour, the compiler can do anything it wants including printing "PIGEON!" in giant letters and then exiting. Some compilers will even generate code to dereference a constant NULL pointer ("crash now").
(2) You're relying on implementation-defined behaviour, which is not as bad as undefined but is still worth avoiding if you can (but there are many things here which are safe on all reasonable implementations, such as memsetting pointers with zero bytes, or relying on integer arithmetic being 2's complement). Or...
(3) The code is correct and it's a genuine compiler bug! The rarest of the three, but it does occasionally happen.
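To illustrate the difference between (1) and (2), here is a small sketch of reading the bits of a float. The pointer cast in the comment breaks the aliasing rules and is undefined; the memcpy version is well defined, and only the resulting bit pattern is implementation-defined. The function name `float_bits` is made up, and the values in the usage note assume IEEE-754 single-precision floats.

```cpp
#include <cassert>  // for the checks in the usage note below
#include <cstdint>
#include <cstring>

// Undefined behaviour (aliasing violation), shown only as a comment:
//     std::uint32_t bits = *reinterpret_cast<std::uint32_t*>(&f);

// Well-defined alternative: copy the object representation byte by byte.
inline std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an IEEE-754 platform, `assert(float_bits(1.0f) == 0x3F800000u)` should hold. Optimizing compilers typically turn the memcpy into a single register move, so the safe version usually costs nothing.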
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Compiler Problem

Post by wgarvin »

I went back and looked at the code which triggered the compiler bug I was thinking of last year, and it was worse than I remembered. The code pattern was like this:

void SomeClass::ReleaseRef()
{
    int variable = 0;
    someObject.GetReferenceCount(variable);   // (A)  param is an int&
    // do unrelated stuff
    someObject.SetReferenceCount(--variable);   // (B)

    if (variable == 0)
    {
        // release someObject resources here
    }
}
The compiler decided that "variable" had a known constant (zero) as its value on line B. It eliminated the read of the reference count on line A completely. It replaced the decrement and store with a store of a constant value (which I guess was 0xFFFFFFFF).

We had a similar code pattern in lots of places and, as far as we could determine, only 2 of them were miscompiled, and one of those was compiled correctly unless LTCG was on. That one was very much like the code above; I suppose that due to how the code was laid out, GetReferenceCount and SetReferenceCount could not be inlined without the LTCG option. A co-worker used a memory tracking tool to figure out where the leak was occurring, and because he already suspected the compiler, he looked at the generated code in the debugger and was able to find the bad codegen pretty quickly.
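For anyone who wants to turn a pattern like this into a small test case for a bug report, a self-contained version could look roughly as follows. This is a sketch: the `Object` class, its initial count of 2, the `released` flag and `release_ref_behaves` are all invented for illustration, and it is not expected to reproduce the miscompile on any particular compiler.

```cpp
#include <cassert>  // for the check in the usage note below

// Hypothetical stand-in for the real reference-counted object.
struct Object {
    int refCount = 2;
    bool released = false;

    void GetReferenceCount(int& out) const { out = refCount; }  // param is an int&
    void SetReferenceCount(int value) { refCount = value; }
};

struct SomeClass {
    Object someObject;

    void ReleaseRef() {
        int variable = 0;
        someObject.GetReferenceCount(variable);    // (A) must overwrite the 0
        someObject.SetReferenceCount(--variable);  // (B) must store count - 1
        if (variable == 0) {
            someObject.released = true;            // stands in for the cleanup
        }
    }
};

// Returns true when the generated code matches the source semantics.
inline bool release_ref_behaves() {
    SomeClass c;
    c.ReleaseRef();  // 2 -> 1, no cleanup yet
    if (c.someObject.refCount != 1 || c.someObject.released) {
        return false;
    }
    c.ReleaseRef();  // 1 -> 0, cleanup runs
    return c.someObject.refCount == 0 && c.someObject.released;
}
```

Building this with and without inlining (and with and without LTCG) and checking `assert(release_ref_behaves())` in each build is the kind of comparison that narrows a suspected codegen bug down quickly.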

Anyway, the moral of the story is that even solid compilers occasionally contain bugs. Over the last few years, I've encountered about one compiler bug per year from various production compilers.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Compiler Problem

Post by bob »

rbarreira wrote:
bob wrote: That begins to sound like either a variable that is uninitialized, or a local array subscript that goes out of bounds. If a printf fixes the problem, the only change a printf has is that it will alter the stack since it is a library call. And altering the stack can change a value you might get on a bad array index.
I disagree. Adding printfs (or anything really) can trigger all sorts of conditions on the compiler which may obscure the bug in many different ways.

A few examples of what adding a simple printf can affect:

1- It may change the register allocation behavior.
2- It affects the code size, potentially triggering different heuristic behavior for the inlining of functions.
3- It may cause a spill of a value to memory which fixes the bug.
4- It may affect the instruction ordering. (instruction ordering bugs are quite difficult to reproduce while one is changing the code)

When I find what appears to be a compiler bug, I try to isolate and reduce the bug-causing code as much as possible. Otherwise it's a nightmare to look through the assembly code, and no compiler vendor would take my bug reports seriously if I didn't produce a small test case (since I'm not paying big bucks for intensive product support).
While all of that is well and good, I have not seen more than a half-dozen true compiler errors in the last 15 years of working on Crafty. Most of those had to do with long long support when it was fairly uncommon for anyone to use it.

I've not seen register issues at all; because the PC has so few registers, those kinds of issues are rare in my experience.

I'd still suspect local memory accesses as the most likely candidate, particularly for a pretty mature compiler like MSVC.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Compiler Problem

Post by rbarreira »

bob wrote: I'd still suspect local memory accesses as the most likely candidate, particularly for a pretty mature compiler like MSVC.
This year I already found one bug in gcc and one in icc... there are new ones reported every day.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Compiler Problem

Post by bob »

rbarreira wrote:
bob wrote: I'd still suspect local memory accesses as the most likely candidate, particularly for a pretty mature compiler like MSVC.
This year I already found one bug in gcc and one in icc... there are new ones reported every day.
Correct, but only one out of every 1,000 "reported compiler bugs" is a real compiler bug...
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Compiler Problem

Post by rbarreira »

bob wrote:
rbarreira wrote:
bob wrote: I'd still suspect local memory accesses as the most likely candidate, particularly for a pretty mature compiler like MSVC.
This year I already found one bug in gcc and one in icc... there are new ones reported every day.
Correct, but only one out of every 1,000 "reported compiler bugs" is a real compiler bug...
In that case I must have been extremely lucky (unlucky?), because both of them were reproduced. Here's the icc one, which incidentally is an instruction ordering bug which easily disappears when the surrounding code changes:

http://software.intel.com/en-us/forums/ ... hp?t=74776
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Compiler Problem

Post by Gian-Carlo Pascutto »

In that case I must have been extremely lucky (unlucky?)
I'd say yes because I agree with Bob: the majority of these cases are really programmer errors. Particularly with popular compilers on x86, the odds of finding wrong code generation are slim. (The odds of actually crashing the compiler tend to be better)

If you use more offbeat stuff, the chances increase. In the case you posted, you're using a brand new intrinsic, so maybe not that surprising.

That said, the original post here does look like a real compiler problem. Better wait for MSVC2010 SP1, it seems.
LoopList

Re: Compiler Problem

Post by LoopList »

Gian-Carlo Pascutto wrote:
In that case I must have been extremely lucky (unlucky?)
I'd say yes because I agree with Bob: the majority of these cases are really programmer errors. Particularly with popular compilers on x86, the odds of finding wrong code generation are slim. (The odds of actually crashing the compiler tend to be better)

If you use more offbeat stuff, the chances increase. In the case you posted, you're using a brand new intrinsic, so maybe not that surprising.

That said, the original post here does look like a real compiler problem. Better wait for MSVC2010 SP1, it seems.
MS is evaluating the problem and we will be informed within the next two weeks. I hope :)

Fritz