How could a compiler break the lockless hashing method?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27811
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: How could a compiler break the lockless hashing method?

Post by hgm »

Well, what seems to be missing is gradations of desirability. Run-time abortion can be acceptable as a last resort, but it is definitely undesirable as an 'easy way out'.

Specifying that strcpy() with overlapping areas leads to UB was only useful because it frees the strcpy() implementation from the burden of testing for such overlaps, and thus allows it to be fast. Having it test for the overlap anyway, in order to abort, destroys that advantage completely. It hurts everyone, and basically makes the library implementation unusable for those who want efficiency in code that never copies between overlapping areas in the first place.
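The case in question, as a minimal sketch (buffer contents are illustrative):

Code: Select all

#include <string.h>

int main(void) {
    char buf[16] = "abcdefgh";
    /* Source and destination overlap: undefined behavior per the C
       standard, precisely so that strcpy implementations need not
       check for it. */
    strcpy(buf + 2, buf);
    return 0;
}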

The argument on performance benefits is a bit shaky: loops suitable for vectorization typically have simple tests for their upper bounds, and from the increment and test expressions it can very easily be determined whether any overflow could occur. for(i=0; i<1000; i+=2) will obviously never cause integer overflow, and can be optimized and GPU'ed to the heart's content of compiler writers without any dubious assumptions. for(i=1; i>0; i+=i) can equally obviously be seen to always cause overflow. If an optimizer is clever enough to prove by induction that i will always be >= 1 because only values >= 1 are added to it, it can also be clever enough to realize that 'optimizing' the i>0 test away creates an infinite loop, and thus guaranteed integer overflow, since a value >= 1 is added to the loop counter infinitely many times.

I don't understand the desire to 'optimize' the termination condition of a loop away at all. Why would the programmer write an expression there that can be optimized away? This points to a programmer error, not to a good opportunity for optimization. Typically it should lead to warnings like "statement has no effect" or "due to value range of type, comparison is always true". And that is what you would expect when it is 100% certain the construct can be optimized away, rather than merely suspected on the basis of a fictional assumption. If you can do something obviously crazy based on an assumption that might not hold, the proper thing to abandon is the assumption, not common sense.
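Those two loops, as a compilable sketch (the bodies are illustrative):

Code: Select all

void demo(void) {
    int i, sum = 0;

    /* Provably never overflows: i stays within [0, 1000), so this can
       be vectorized without any assumption about signed overflow. */
    for (i = 0; i < 1000; i += 2)
        sum += i;

    /* Provably always overflows: i doubles each iteration, so the only
       way i > 0 can ever become false is by signed wrap-around.
       'Optimizing' the test away turns this into an infinite loop. */
    for (i = 1; i > 0; i += i)
        sum += i;
}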
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

Rein Halbersma wrote:
bob wrote:The optimization about a + 1 > a being optimized into "1" is an unsafe one.
No. The use of a + 1 > a for signed int a is unsafe. The optimization for sure will not help this single line very much, but the optimizer rules in a compiler are written for an entire universe of correct code, not for the hopefully soon extinct subset of non-conforming and unmaintained code.

The whole "the compiler is malicious" line of reasoning is based on a fundamental misunderstanding. It seems that you and HG regard C as a mapping of the hardware to C programming constructs, instead of the other way around. The reason that some program constructs in C lead to undefined behavior is that the mapping of C to hardware is not one-to-one. Some things cannot be defined in a portable way because of the enormous diversity in hardware (memory models, bus-width, cache coherence etc.).

Here's the problem with that explanation. The mapping of C to each different architecture is unique anyway. No two machines have the same instruction set, register set, etc. as another, except for machines within the same family (x86 for example). So the compiler already has to map a single source to many different architectures. Why can't it do as it has always done in the past, and map the C to the best instruction mix possible? For example, the x * 2 / 2 example does NOT overflow on x86 if done right, and it produces x every time, exactly as it should, no clever optimizer needed. So why not do that? The argument seems to be "but then it is non-portable, because it won't work on other architectures that don't have the two-register destination for multiply." In reality MOST architectures do this anyway. But since the C standard says "this is undefined behavior", we reach the absolute mess of today. Rather than making it work where it will work correctly, let's make it fail EVERYWHERE we possibly can, intentionally, by doing unsafe optimizations that have very little effect on real program speed anyway. Crafty with or without -fwrapv shows no speed change whatsoever, so it doesn't bother me at all to omit the speculative optimizations that can break under circumstances that are perfectly valid on the specific platform.
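(For the record, the flag is spelled the same way on gcc and clang; the file name here is illustrative:)

Code: Select all

gcc -O2 -fwrapv crafty.c -o crafty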

I wrote this again last night, but I think it is still important. C was designed to let us use a high-level language but get as close to the hardware as physically possible. The old "register" keyword is an example. But now, even though x86 handles overflow in a perfectly defined way, I can't safely use it. Because someone decided that since this won't work correctly on all architectures (even though it certainly will on anything available today), we are not going to allow it to work correctly on ANY architecture.

If I wanted a C for dummies compiler, I'd buy one. I want a "C for experts" that lets me do what I want, without looking over my shoulder and saying "That's dangerous, I'm breaking it intentionally..."
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

hgm wrote:Well, what seems to be missing is gradations of desirability. Run-time abortion can be acceptable as a last resort, but it is definitely undesirable as an 'easy way out'.

Specifying that strcpy() with overlapping areas leads to UB was only useful because it frees the strcpy() implementation from the burden of testing for such overlaps, and thus allows it to be fast. Having it test for the overlap anyway, in order to abort, destroys that advantage completely. It hurts everyone, and basically makes the library implementation unusable for those who want efficiency in code that never copies between overlapping areas in the first place.

The argument on performance benefits is a bit shaky: loops suitable for vectorization typically have simple tests for their upper bounds, and from the increment and test expressions it can very easily be determined whether any overflow could occur. for(i=0; i<1000; i+=2) will obviously never cause integer overflow, and can be optimized and GPU'ed to the heart's content of compiler writers without any dubious assumptions. for(i=1; i>0; i+=i) can equally obviously be seen to always cause overflow. If an optimizer is clever enough to prove by induction that i will always be >= 1 because only values >= 1 are added to it, it can also be clever enough to realize that 'optimizing' the i>0 test away creates an infinite loop, and thus guaranteed integer overflow, since a value >= 1 is added to the loop counter infinitely many times.

I don't understand the desire to 'optimize' the termination condition of a loop away at all. Why would the programmer write an expression there that can be optimized away? This points to a programmer error, not to a good opportunity for optimization. Typically it should lead to warnings like "statement has no effect" or "due to value range of type, comparison is always true". And that is what you would expect when it is 100% certain the construct can be optimized away, rather than merely suspected on the basis of a fictional assumption. If you can do something obviously crazy based on an assumption that might not hold, the proper thing to abandon is the assumption, not common sense.
I think there is a different "mindset" at work than the one you are thinking of, one that is not completely rational. Rather than asking "will this work in all reasonable cases?" the question seems to be "is there any possible scenario where this can cause a problem? If so, we are free to break it, complain about it, remove it completely, or crash." A + 1 > A will be true in almost every case, and by almost every case I'd bet we are talking about one exception out of trillions of actual loops that are optimized. So let's break some of the ones that would work correctly, plus the one that is problematic, rather than just letting that one wreck itself.

There are a TON of possible ways to screw up in programming. Adding a tiny float to a large float, for example, when you can only keep 7 significant digits: adding .000000001 to 100.0 does exactly nothing, as it should. Yet the compiler doesn't mind letting THAT turn into an infinite loop, since the loop variable is really not getting incremented. If the FP value stays on the x87 register stack, the above will actually work, since those registers are 80 bits wide; if it has to be spilled to memory, it fails, because the value is truncated back to 32-bit IEEE. A huge hole that many programmers fall into, professional AND student, yet the compiler stays out of the way. But not in other cases.
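Here is that float trap as a runnable sketch (assuming IEEE-754 single precision):

Code: Select all

#include <stdio.h>

int main(void) {
    /* float keeps ~7 significant decimal digits, so adding 1e-9 to
       100.0f is rounded away entirely: f never changes. */
    float f = 100.0f;
    f += 0.000000001f;
    printf("%d\n", f == 100.0f);   /* prints 1: the add did nothing */

    /* The same effect makes this loop infinite, since the increment
       is far below half an ulp of 100.0f:
       for (f = 100.0f; f < 200.0f; f += 0.000000001f) ;          */
    return 0;
}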
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: How could a compiler break the lockless hashing method?

Post by rbarreira »

bob wrote:For example, the x * 2 / 2 example does NOT overflow on x86 if done right, and it produces X every time, exactly as it should, no clever optimizer needed. So why not do that.
Because that would mean changing the standard to fuck up the whole type system. If "x" is a 32-bit int then "x * 2" is also an int, not a 64-bit number.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

AlvaroBegue wrote:
bob wrote:But as an example, let's take our good friend x86.

Done the "normal way" x * 2 / 2 will never overflow. It is trivial to prevent.

mov eax, x
mov ecx, 2
imul ecx        ; edx:eax = x * 2 (one-operand imul needs a register)
idiv ecx        ; eax = edx:eax / 2 = x

end up with x, thanks to imul putting the low-order 32 bits of the product in eax and the high-order bits in edx. The idiv takes it right back down to x.

Of course, if they choose to get "cute" and do the sloppier form that doesn't use a register pair to prevent overflow, then they just made a poor choice. By doing the right thing, they get the right answer. If they assume x * 2 / 2 = x, they are exactly right. And all is well.
So you think compilers should minimize surprises for the user, and yet you expect these functions to produce different answers:

Code: Select all

int f1(int x) {
  return x * 2 / 2;
}

int f2(int x) {
  int y = x * 2;
  return y / 2;
}
Or do you expect them to produce the same answer as long as nobody uses `y' for anything else?

Why would those produce different answers? The asm (optimized) should look identical. In the above code, y is local; a good compiler would never even use it, and would produce the same code as above. GCC, for example, simply turns that into "return x"; you can check it for yourself. So you don't get different answers at all.


I think at this point you don't even know what your position is.
It would help if you knew what "state of the art" really is, don't you think? You give an example to make a point, but you make MY point because it does it correctly. But let the compiler see that x=0x40000000 and the compiler will change its behavior. gcc for the second example returns the wrong value:

Here's good:

movl %edi, %eax
ret

Here's bad (0x40000000):

movl $-1073741824, %eax
ret

In the first case, it just copied input to output, optimizing away the *2/2 stuff. In the second, since it actually saw the 0x40000000, it broke the code. Seem reasonable?

You thought the first would not work. It works perfectly, in fact. You'd probably expect the second to work if the first worked. I would. It fails brilliantly. All because the compiler is brain dead. If it took either piece of source and just did "normal" optimization, it would proceed as follows:

1. hmmm... the y variable is local and not used elsewhere, so I can translate

y = x * 2;
return y/2

into

return x * 2 / 2;

immediately.

Now I see the 2/2 and, by a simple constant-folding operation, I can turn that into 1 at compile time, since a constant divided by a constant will always be a constant, and I get

return x;

works just fine.

Unless it INTENTIONALLY breaks the code.
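Here is a self-contained version of the experiment, if anyone wants to reproduce it (file name and driver function are illustrative; compile with something like gcc -O2 -S and read the asm):

Code: Select all

/* With optimization, f2 compiles to "return x" when x is opaque, but
   a visible constant argument lets the compiler fold the overflowing
   multiply instead. Building with -fwrapv should force the wrapped
   result in both cases. */
int f2(int x) {
    int y = x * 2;
    return y / 2;
}

int caller(void) {
    return f2(0x40000000);   /* the constant exposes the overflow */
}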
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

syzygy wrote:
bob wrote:
wgarvin wrote:For some optimizations they don't have to know it occurs. When there's a program construct for which some possible executions have defined behavior and others have UB, they can ignore the UB possibility and just translate it into something that works correctly for the defined cases. If they always had to handle the undefined cases too they would often be forced to generate less efficient code. e.g. for a signed int X, they will translate "X*2 / 2" into "X" even though this will change the value of the expression if integer overflow occurred during the multiply expression X*2.

This was one of a bunch of examples given in this blog post by Chris Lattner that I linked, and which you claimed to have read:
Signed integer overflow: If arithmetic on an 'int' type (for example) overflows, the result is undefined. One example is that "INT_MAX+1" is not guaranteed to be INT_MIN. This behavior enables certain classes of optimizations that are important for some code. For example, knowing that INT_MAX+1 is undefined allows optimizing "X+1 > X" to "true". Knowing the multiplication "cannot" overflow (because doing so would be undefined) allows optimizing "X*2/2" to "X". While these may seem trivial, these sorts of things are commonly exposed by inlining and macro expansion. A more important optimization that this allows is for "<=" loops like this:

for (i = 0; i <= N; ++i) { ... }

In this loop, the compiler can assume that the loop will iterate exactly N+1 times if "i" is undefined on overflow, which allows a broad range of loop optimizations to kick in. On the other hand, if the variable is defined to wrap around on overflow, then the compiler must assume that the loop is possibly infinite (which happens if N is INT_MAX) - which then disables these important loop optimizations. This particularly affects 64-bit platforms since so much code uses "int" as induction variables.

It is worth noting that unsigned overflow is guaranteed to be defined as 2's complement (wrapping) overflow, so you can always use them. The cost to making signed integer overflow defined is that these sorts of optimizations are simply lost (for example, a common symptom is a ton of sign extensions inside of loops on 64-bit targets). Both Clang and GCC accept the "-fwrapv" flag which forces the compiler to treat signed integer overflow as defined (other than divide of INT_MIN by -1).
I did read that. Had read it previously in fact.

But as an example, let's take our good friend x86.

Done the "normal way" x * 2 / 2 will never overflow. It is trivial to prevent.

mov eax, x
mov ecx, 2
imul ecx        ; edx:eax = x * 2 (one-operand imul needs a register)
idiv ecx        ; eax = edx:eax / 2 = x

end up with x, thanks to imul putting the low-order 32 bits of the product in eax and the high-order bits in edx. The idiv takes it right back down to x.

Of course, if they choose to get "cute" and do the sloppier form that doesn't use a register pair to prevent overflow, then they just made a poor choice. By doing the right thing, they get the right answer. If they assume x * 2 / 2 = x, they are exactly right. And all is well.
You miss the point completely.

You have been arguing all the time: signed integer overflow should wrap.

So what happens if signed integer overflow wraps? Let's take 16-bit ints to keep the numbers smaller.
x = 20000
x * 2 = -25536
x * 2 / 2 = -12768

So if you want signed int overflow to wrap, then clearly x * 2 / 2 cannot be optimised to x.

If you want signed int overflow to wrap, then preventing overflow, as you are now suggesting the compiler should do, is simply very clearly dead wrong.
Sorry, but do you know x86 asm?

movw $20000, %ax
movw $2, %cx
imulw %cx        # dx:ax = ax * 2 = 40000
idivw %cx        # ax = dx:ax / 2 = 20000

result = 20000

Exactly what I expected. No idea what machine you are running on, but on x86, multiply two 16-bit ints and you get a 32-bit product in dx:ax; the divide then takes it back down to a 16-bit int, or raises a divide-error fault if the quotient doesn't fit. Either way you get the right answer, or a well-defined one, not an undefined answer.

I have no intention of writing x = a * b / c and hoping that a*b wraps before the divide. I actually do normal math.

But if I do something like a++ enough times, or if a starts off large, I expect it to wrap.

I DO expect that (a * b) / c == a * (b / c), which x86 is capable of doing quite correctly. I don't want the operator functionality changed. I just want the compiler to stay out of it.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: How could a compiler break the lockless hashing method?

Post by rbarreira »

bob wrote:
Sorry, but do you know x86 asm?

movw $20000, %ax
movw $2, %cx
imulw %cx        # dx:ax = ax * 2 = 40000
idivw %cx        # ax = dx:ax / 2 = 20000

result = 20000

Exactly what I expected. No idea what machine you are running on, but on x86, multiply two 16-bit ints and you get a 32-bit product in dx:ax; the divide then takes it back down to a 16-bit int, or raises a divide-error fault if the quotient doesn't fit. Either way you get the right answer, or a well-defined one, not an undefined answer.

I have no intention of writing x = a * b / c and hoping that a*b wraps before the divide. I actually do normal math.

But if I do something like a++ enough times, or if a starts off large, I expect it to wrap.

I DO expect that (a * b) / c == a * (b / c), which x86 is capable of doing quite correctly. I don't want the operator functionality changed. I just want the compiler to stay out of it.
Bob, I'm starting to wonder if your account has been taken over by someone. Now you're displaying ignorance about basic C and integer numbers in a single post.
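To spell it out (values are illustrative): with integer division, (a * b) / c and a * (b / c) already differ, no overflow and no compiler mischief required:

Code: Select all

#include <stdio.h>

int main(void) {
    int a = 7, b = 3, c = 2;
    printf("%d\n", (a * b) / c);   /* 21 / 2 = 10 */
    printf("%d\n", a * (b / c));   /* 7 * 1  = 7  */
    return 0;
}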
Last edited by rbarreira on Wed Dec 11, 2013 2:58 pm, edited 2 times in total.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

Rein Halbersma wrote:
bob wrote: I've been programming since 1968. I've used integer overflow so many times (a linear congruential PRNG being just one example) and it has worked flawlessly. Now some compiler guys have decided that since it is undefined, even though it is an established and working programming paradigm, they can break it in any way they choose: throw the code out, make it abort, write an error warning out, or make demons fly out of your nose. All with something that has been used for so long it is not funny, and something that would work PERFECTLY still if the compiler guys had just left it alone.
And Michael Schumacher has been driving at 200 mph since he got out of his diapers. That doesn't make it OK for him to do that in a residential zone. Maybe in your neck of the woods you get away with it, on account of the sheriff knowing you from Saturday night drag racing. But try and do that in some village in Texas, and Chuck Norris will roundhouse kick you in the nasal passage so that demons will come flying out of said passage... :twisted:
I don't drive 200 in a residential area. But using YOUR logic, I can't drive 200mph ANYWHERE. Not on a race track. Not on a closed course. Not on the Autobahn. Not in the Bonneville salt flats. Nowhere. Yet there are places where it can be done legally and "safely". Going that fast ANYWHERE is dangerous. Do you choose to make it impossible to exceed 70mph even at the drag strip, so that "everybody remains safe" or can drivers knowingly accept the risk and do what they want, where they don't pose a risk to others?
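The PRNG case mentioned above, as a minimal sketch (the constants are the classic Numerical Recipes pair; uint32_t makes the wrap well-defined, where the historical plain-int versions simply relied on it):

Code: Select all

#include <stdint.h>

static uint32_t state = 12345;   /* illustrative seed */

/* Linear congruential PRNG: the update is *supposed* to wrap modulo
   2^32 every few calls. With uint32_t the wrap is defined behavior;
   the old signed-int versions depended on the same hardware wrap
   that every machine provides anyway. */
uint32_t lcg_next(void) {
    state = state * 1664525u + 1013904223u;
    return state;
}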
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

rbarreira wrote:
bob wrote:For example, the x * 2 / 2 example does NOT overflow on x86 if done right, and it produces X every time, exactly as it should, no clever optimizer needed. So why not do that.
Because that would mean changing the standard to f*** up the whole type system. If "x" is a 32-bit int then "x * 2" is also an int, not a 64-bit number.

On every machine I have programmed on in the last 20 years, a 32-bit multiply produces a 64-bit result using two registers. You pick the machine: Vax, Sparc, x86, etc. EVERYONE is aware of the potential problem dealing with the mathematical equality

(a * b) / c == a * (b / c)

so long as you ignore integer truncation.

Feel free to explain how the simple multiply fix everyone uses "f***s up the whole type system". It has been used for 30 years or longer. The compiler does it RIGHT so long as it can't see the actual values and (apparently) say to itself "Aha, integer overflow is possible, I'm going to screw this up on purpose." If the compiler does the instructions normally, it works perfectly. You seem to think it fails most of the time. It does NOT.

In fact, it only fails when the compiler starts mucking around and "recognizes" an overflow that it can intentionally screw up.
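Spelled out in portable C, the fix everyone uses looks like this (a sketch; the widening does exactly what the register pair gives you for free):

Code: Select all

#include <stdint.h>

/* Widen before multiplying so the intermediate product cannot
   overflow, then narrow after the divide; this is the C spelling
   of the edx:eax register-pair multiply/divide sequence. */
int32_t muldiv(int32_t a, int32_t b, int32_t c) {
    int64_t wide = (int64_t)a * b;   /* full 64-bit product */
    return (int32_t)(wide / c);
}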
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

wgarvin wrote:Fine. You win the discussion, cuz I'm tired of going in circles over the same ground, over and over.

It's been a lot of fun to discuss, but I'll probably stop reading these threads so I don't get nerdrage after reading something nonsensical and feel obligated to respond. That has happened to me several times lately, and I've been using entirely too much emphasis in some of those posts, and it just wears me out.

I'll try not to forget that Crafty is officially unsupported on everything except x86/x64. :roll:

Hopefully there will always be a compiler around that can mostly do what you want, even if most of it is not part of the C language.

I do share your wistful dream of a better future where the compiler just faithfully translates anything I write to the underlying machine model AND gives me spectacularly good optimized code performance, for which I should certainly not have to give up my own unsound programming habits. (Long live -fno-strict-aliasing, and they can pry reinterpret_cast<> from my cold, dead fingers! :lol:) I don't know how we get there from here, though.
Crafty works everywhere. That was my point. Integer overflow is not a problem on ANY existing hardware. But the compiler makes it a problem, if it can see that the overflow happens, through the use of a constant. Without a constant, the compiler produces perfect code, and everything works just fine.

All that is necessary to get from here to there, for me, is for the compiler to stay out of the way. Converting the conditional a + 1 > a directly into "1" is bogus, because ALL computers have a fixed word length, and ALL will overflow and wrap somewhere. Let it happen. Quit pretending it doesn't happen. And quit using optimizations that REQUIRE that it not happen. A computer has a certain set of limitations, and word length for integer values is one of them.
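The construct at issue, as a runnable sketch (what you observe depends on compiler, optimization level, and flags):

Code: Select all

#include <limits.h>
#include <stdio.h>

/* Without -fwrapv, an optimizer may fold this to "return 1", since
   signed overflow is undefined. With -fwrapv, INT_MAX + 1 wraps to
   INT_MIN and the comparison is evaluated honestly, returning 0. */
int always_greater(int a) {
    return a + 1 > a;
}

int main(void) {
    printf("%d\n", always_greater(INT_MAX));
    return 0;
}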