How could a compiler break the lockless hashing method?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

Rein Halbersma wrote:
bob wrote:
This difference between signed and unsigned is arbitrary. It serves no good purpose. It ADDS confusion to the programming process. And it is diametrically opposed to what the hardware does for both cases, which is to handle them exactly the same.
It might be a better world if signed integer overflow were implementation-defined rather than undefined; you'd hear no complaints from me if that were the case. But it isn't. So given that state of affairs, can you answer me why

Code: Select all

if (a + 1 > a)
is any better than

Code: Select all

if (a > INT_MAX - 1)

The former is undefined, and the latter is implementation-defined. Given that piece of knowledge (free of charge), why not use that?
I don't use either one. If I am concerned about overflow, I don't do it. If I can deal with wrap, I do. If a+1 is < 0, or if a > INT_MAX-1, what do I do? If it wraps and I am OK with that, I won't do the test. If I do care, I might do the latter. Or I might do the former. Either should work equally well if the hardware is consistent and the compiler is not off in never-never land.

My general opinion, one which for the longest time ALL compiler writers agreed with, was "a program should produce the SAME results regardless of optimization level chosen. It should just produce the result faster with more optimization." That standard seems to have been changed just a bit.

I don't particularly consider that a "good change". I can remember thousands of test cases used to test the compilers I worked on, doing exactly that test. An optimization should NEVER change program output. If it does, it is defective.

We would never accept such nonsense from speculative/out-of-order execution inside the CPU, where if you run on an original pentium (no OOE) you get different results than if you run on a pentium PRO which added OOE. But it is ok for the compiler, somehow???
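The canonical demonstration of that divergence is tiny. A minimal sketch (the helper name is invented for illustration, and whether the two optimization levels actually disagree depends on the compiler and version):

Code: Select all

#include <stdio.h>
#include <limits.h>

/* Undefined when a == INT_MAX: an optimizer is allowed to fold the
   comparison to 1, while unoptimized code typically wraps and yields 0. */
static int next_is_bigger(int a)
{
    return a + 1 > a;
}

int main(void)
{
    printf("%d\n", next_is_bigger(INT_MAX));  /* often 0 at -O0, 1 at -O2 */
    return 0;
}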
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: How could a compiler break the lockless hashing method?

Post by wgarvin »

mvk wrote:
wgarvin wrote:I know. It's like talking to a wall. He just thinks he knows better than everyone else, even when he's repeating the same arguments that have been refuted multiple times already in the same thread.
These discussions always remind me of this.
"The Black Knight always triumphs!"
:lol:

I mean, bob's not the Black Knight. I don't think he does it on purpose; it's just the way his personality is. We all have our quirks. My own ego could probably do with some puncturing.

But man, is it ever frustrating when he says something silly and I point out how it's silly, and for the next few posts we avoid the silly topic, and then a day later we wander right back to it and he says the same thing again. He's been doing that to like 5 different people in these threads, which contain a total of about 600 posts now. I think at least a hundred of those are bob's, and a large fraction of them contain total nonsense. I'm starting to think the time I spent reading and replying to a bunch of them was a complete waste.

I mean, how is it possible that the person who wrote a large, useful C program (Crafty) also believes signed integer overflow should in each case either wrap or be covered up by the compiler, depending of course on "common sense" and the programmer's clear intentions?

He asserts, over and over, that the optimizations enabled by UB aren't very useful. I would argue that if they didn't speed up at least some programs, the compiler guys wouldn't bother to implement them. Maybe it's all about gaming the SPEC benchmarks, since about half of them are reported to contain integer overflows and other UB. I mean, when Chris Lattner, a respected compiler optimization expert and the primary author of LLVM/clang, says that these optimizations are useful, I tend to take that statement at face value.
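For concreteness, the textbook case Lattner cites is loop trip-count analysis. A sketch (function name invented for illustration):

Code: Select all

/* With signed "i", the compiler may assume the loop runs exactly n+1
   times, since overflow of i would be UB.  That enables promoting i to
   a 64-bit register and vectorizing.  If i were required to wrap,
   "i <= n" could loop forever when n == INT_MAX, blocking all of that. */
void scale(float *a, int n)
{
    for (int i = 0; i <= n; i++)
        a[i] *= 2.0f;
}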
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: How could a compiler break the lockless hashing method?

Post by rbarreira »

bob wrote: I thought someone had already pointed out that shifting a negative value left is "undefined behavior"?

But if a shift is dangerous, don't do it. Do the imul which works correctly. Or, of course, you can do a shld. BTW what do you know about (say) SAL and SHL? :)
Left-shifting a negative value is indeed undefined (right-shifting one is merely implementation-defined), but "don't do it" is not an option for the compiler: it emits shifts for multiplication all the time. Just compile the following program and see the assembly code for yourself:

Code: Select all

#include <stdio.h>

int main (void)
{
        int a;

        if (scanf ("%d", &a) != 1)
                return 1;
        printf ("%d\n", a * 8);
        return 0;
}
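(With current gcc or clang at -O2 on x86-64, the a*8 above typically comes back as a single shift or an lea rather than an imul. The assembler-level shl is fully defined on the hardware, and the C-level overflow was undefined anyway, so the shift form costs the compiler nothing.)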
bob wrote:

As to your question, if I do that on the arm, I will get a 32 bit value. So what? If I do it on Intel, I get the right value. How many ARMs are there compared to X86 in terms of numbers? Close enough to zero for a statistician to say "zero"? :)
2 billion CPUs in 3 months is not enough for you?

http://news.softpedia.com/news/ARM-Reco ... 6252.shtml
Last edited by rbarreira on Wed Dec 11, 2013 5:09 pm, edited 2 times in total.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

wgarvin wrote:
rbarreira wrote:Bob, I'm starting to wonder if your account has been taken over by someone. Now you're displaying ignorance about basic C and integer numbers in one single post.
I know. It's like talking to a wall. He just thinks he knows better than everyone else, even when he's repeating the same arguments that have been refuted multiple times already in the same thread.
Sorry you feel that way. I simply have my opinion, developed over several years of compiler development. Answer me this:

If you run an ASM program on an original Intel pentium, and then you run the same code on the pentium PRO which supported OOE, and you get DIFFERENT answers, would you consider that acceptable? Why or why not?

If you compile the same source with -O0 and then with -O3 and you get different answers, would you consider that acceptable? Why or why not?

Finally, do your two answers match? If not, why? If both are yes, we probably can't communicate at all. Semantics are semantics. I don't want the computer hardware changing them on the fly just so it can execute things faster. If both are "no" then we seem to be in perfect agreement.

Because today, OOE is guaranteed to produce the exact same results as if the program is executed in-order. But compilers provably do produce different results with -O0 and -O3 as already discussed. I consider that ridiculous. Even more, I consider it poor compiler development.


He gives examples from x86 code of "how compilers ought to do it" that haven't been current for almost 20 years, like his ridiculous imul / idiv example. Come on, bob! There's more than one form of imul instruction in x86 assembly, and there has been for a long time.
RTFM. Why do you think they gave us the one-operand imul instruction? Because of the possibility of overflow; they give us a solution to prevent it completely between a successive multiply/divide pair. No, you don't have to use 'em. And if you don't, you will get the expected 32-bit wrap whether you do imul or mul.

You can use the one you quoted, but it has severe restrictions on which registers it can use. The one compilers usually use is the 32 * 32 -> 32 bit form, because it can be used with any of the general-purpose registers. And also because they know they only need 32 bits of result, since that's how the C language has been specified, since forever. Compilers also sometimes implement the multiply using shifts and adds, and they definitely implement divide by a constant using other instructions besides idiv, because idiv can be slow. All of this has been true for decades.
You make my point. They don't do it the "right" way according to the hardware; they do it the way that is compatible with hardware that doesn't have that option, even though most existing hardware today is capable of doing so. On most non-x86 boxes that have real registers, if you multiply x times an even-numbered register, you get an even+odd register pair for the product. If you multiply x times an odd-numbered register, you get a single register value that will overflow much sooner than the pair of registers.

But ignoring that, let's take a simple example:

x = 0x40000000 on a 32 bit box, but it is passed in to a function so that the compiler can't see the value.

x * 2 / 2 produces 0x40000000 as expected.

But if you use 0x40000000 * 2 / 2, broken constant folding turns that into 0xc0000000, which is not quite the same. Why does it do them differently? Shouldn't it at LEAST be consistent in mathematical operations? Does "*" imply some different operation when applied to constants vs variables??? If so, why?
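A self-contained version of this experiment, for anyone who wants to run it; the results legitimately vary by compiler and optimization level, which is exactly the dispute (function name invented for illustration):

Code: Select all

#include <stdio.h>

/* The compiler cannot see x here, so it either emits the real
   multiply/divide (which wraps: 0x40000000 * 2 / 2 comes out as
   0xc0000000 at -O0 on x86) or uses the overflow-is-UB licence to
   rewrite the body as plain "x" (giving 0x40000000 at -O2).  The
   constant expression below may be folded differently again. */
static int twice_halved(int x)
{
    return x * 2 / 2;
}

int main(void)
{
    printf("%x\n", twice_halved(0x40000000));
    printf("%x\n", 0x40000000 * 2 / 2);  /* compile-time overflow */
    return 0;
}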


Bob's stubbornness in a debate is pretty amazing. I can only imagine the cognitive dissonance it must cause to argue, and believe, such a mix of contradictory and nonsensical positions.
Feel free to cite my "nonsensical position". If you think consistency is nonsensical, I can see why we don't agree. But that doesn't mean my belief that consistency is reasonable to expect is a flawed concept.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:You miss the point completely.

You have been arguing all the time: signed integer overflow should wrap.

So what happens if signed integer overflow wraps? Let's take 16-bit ints to keep the numbers smaller.
x = 20000
x * 2 = -25536
x * 2 / 2 = -12768

So if you want signed int overflow to wrap, then clearly x * 2 / 2 cannot be optimised to x.

If you want signed int overflow to wrap, then preventing overflow, as you are now suggesting the compiler should do, is simply very clearly dead wrong.
Sorry, but do you know x86 asm?

movw $2, %bx
movw $20000, %ax
imulw %bx        # one-operand form: DX:AX = AX * BX = 40000
idivw %bx        # divides DX:AX by BX: AX = 20000

result = 20000

Exactly what I expected. No idea what machine you are running on, but on x86, multiply two 16-bit ints and you get a 32-bit int; then the divide hopefully reduces it back to a 16-bit int, otherwise the overflow flag is set. Either way, you get the right answer, or a wrapped answer, not an undefined answer.

I have no intention of writing x = a * b / c and hoping that a*b wraps before the divide. I actually do normal math.

But if I do something like a++, I expect a to wrap if I do it enough times, or if a starts off large.

I DO expect that (a * b) / c == a * (b / c) which x86 is capable of doing quite correctly. I don't want the operator functionality changed. I just want the compiler to stay out of it.
Do you have any clue what you are asking from the compiler when you write x * 2 / 2?

If x is a signed int, and you want signed overflow to wrap, which so far you have always wanted, then what you are asking is to get -12768 when x = 20000 and ints are 16-bit.

If you do not understand this, you should not touch a compiler.
What I am asking of the compiler is "get out of the way and let the hardware do what it will do." Don't try to make assumptions.

x * 2 / 2 is done correctly. The compiler can replace it with x if it wants, or it can do the multiply/divide which STILL produces x. But the compiler comes up with a different answer when IT chooses what should happen. That I do not like.

Most of the time I do NOT want overflow to occur, and I tend to take care of the numbers that I use so that I am sure it won't. But if it does, I would like ONE model to have to deal with. Not (a) what the hardware reference says and (b) what the compiler says. PARTICULARLY when "what the compiler says" is inconsistent, as in this case: a constant, or a variable with the same value, produce two DIFFERENT answers. That doesn't seem so hot, regardless of the definition of undefined behavior.

I worked on multiple compiler projects over the years. Probably the biggest was with Texas Instruments back in the 70's with their supercomputer, the ASC. Our final regression test suite was automated, and compiled several thousand different programs, with optimization disabled and with it enabled, and we had to produce the same results regardless. Apparently that standard of compiler writing is a lost art, since we certainly don't get that today...
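For reference, the single wrap model does exist as a switch: gcc and clang accept -fwrapv, which defines signed overflow as two's-complement wrap at every optimization level. And the 16-bit arithmetic from the example above can be reproduced portably with int16_t:

Code: Select all

#include <stdio.h>
#include <stdint.h>

/* Reproduces the 16-bit example: 20000 * 2 wraps to -25536 (the
   int-to-int16_t conversion is implementation-defined, but wraps on
   mainstream compilers), and -25536 / 2 = -12768. */
int main(void)
{
    int16_t x = 20000;
    int16_t doubled = (int16_t)(x * 2);  /* -25536 */
    printf("%d\n", doubled / 2);         /* prints -12768 */
    return 0;
}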
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: How could a compiler break the lockless hashing method?

Post by Rein Halbersma »

bob wrote:
Rein Halbersma wrote:
bob wrote:
This difference between signed and unsigned is arbitrary. It serves no good purpose. It ADDS confusion to the programming process. And it is diametrically opposed to what the hardware does for both cases, which is to handle them exactly the same.
It might be a better world if signed integer overflow were implementation-defined rather than undefined; you'd hear no complaints from me if that were the case. But it isn't. So given that state of affairs, can you answer me why

Code: Select all

if (a + 1 > a)
is any better than

Code: Select all

if (a > INT_MAX - 1)

The former is undefined, and the latter is implementation-defined. Given that piece of knowledge (free of charge), why not use that?
I don't use either one. If I am concerned about overflow, I don't do it. If I can deal with wrap, I do. If a+1 is < 0, or if a > INT_MAX-1, what do I do? If it wraps and I am OK with that, I won't do the test. If I do care, I might do the latter. Or I might do the former. Either should work equally well if the hardware is consistent and the compiler is not off in never-never land.
How do you know if a is close to INT_MAX? By doing a test. You do "a + 1 > a", which is undefined. You could either pass -fwrapv to your compiler or use "a > INT_MAX - 1", either of which makes it implementation-defined, and since you know your hardware, you're fine.

Doing the former test when a safe and equivalent test exists is a clear sign of stubbornness. Good luck and sweet dreams.
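For completeness, the safe pattern is a one-liner; INT_MAX comes from <limits.h>, and nothing in the test itself can overflow (helper name invented for illustration):

Code: Select all

#include <limits.h>

/* Overflow-safe: the comparison cannot overflow, so no UB is executed
   while deciding whether the increment would. */
static int safe_increment(int a)
{
    if (a > INT_MAX - 1)   /* i.e. a == INT_MAX */
        return a;          /* saturate instead of overflowing */
    return a + 1;
}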
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: How could a compiler break the lockless hashing method?

Post by Rein Halbersma »

bob wrote: Sorry you feel that way. I simply have my opinion, developed over several years of compiler development.
Objection, facts not in evidence. Please show us which conforming C compiler you actually developed? Or any other compiler that was used by anyone except yourself, your students or coworkers.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

hgm wrote:
syzygy wrote:You want to use undefined behavior? You get undefined behavior...
Wrong! What I want is for the behavior not to be undefined at all, when the hardware perfectly defines it. If there is to be any UB from an unambiguous instruction telling the hardware what to do, it should be that the compiler writers get roasted over a low fire!

And indeed, I think the compiler writers are more to blame than the standard committee, although they definitely share blame. The compiler writers are malicious, while the standard committee was just naive, in not realizing they had to make the standard fool-proof to malicious interpretation and abuse.

The argument that this helps to optimize X*2/2 to X is a non-argument. No one would write that if they wanted it to mean X. If I ever would write anything like that, it would be to exploit the hardware's response to overflow (such as forcing a trap), and replacing it by X would be the last thing I wanted.

As it turns out, the standard is not adequate in a world where compiler writers are malicious. It should be repaired by removing any UB according to the current definition and replacing it with 'hardware-defined behavior'.
The people here that are involved are simply not up to grasping that requirement. All they can do is repeat "undefined behavior" over and over and over, as if undefined behavior gives the compiler free rein to do anything it wants. I believe it is supposed to precisely follow my semantics, without inserting or removing extra crap. I believe that a source compiled with -O0 should produce EXACTLY the same results as a source compiled with -O3; otherwise the optimizer is broken and that is unacceptable. That we have people arguing against that, yet they DEMAND that OOE processors produce the same results as if the program were executed in the order it is written, just boggles the mind.

I am SURE Intel would have liked to remove a few constraints to improve IPC in current processors. Say "OK, the source won't be written to for N cycles, so I will go ahead and issue this instruction now, ignoring the RAW hazard. So what if it produces the wrong answer, it will produce it faster. And much of the time it might not hurt either..."

Who would scream first? These "undefined behavior" supporters.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: How could a compiler break the lockless hashing method?

Post by rbarreira »

bob wrote: What I am asking of the compiler is "get out of the way and let the hardware do what it will do." Don't try to make assumptions.
That is not a reasonable request, because not only is there different hardware out there, but even on x86 there are many different ways to calculate "x * 2":

- the compiler can use imul
- the compiler can use add
- the compiler can use shl
- the compiler can use lea
- the compiler could conceivably use SIMD (SSE) instructions, if it's multiplying several values in a loop.

Why do you assume only one of those is a valid response by the compiler?
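This is easy to see for yourself: feed something like the following to a few compilers at a few -O levels and compare the assembly (function names invented for illustration). The same source routinely comes back as add, shl, lea or imul depending on compiler, version, and surrounding code:

Code: Select all

/* Inspect with: gcc -O2 -S mult.c (or clang, icc, ...) */
int twice(int x)     { return x * 2;  }  /* often add or lea    */
int eightfold(int x) { return x * 8;  }  /* often shl or lea    */
int tenfold(int x)   { return x * 10; }  /* often lea + add/shl */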
Last edited by rbarreira on Wed Dec 11, 2013 5:30 pm, edited 2 times in total.
User avatar
hgm
Posts: 27814
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: How could a compiler break the lockless hashing method?

Post by hgm »

wgarvin wrote:He asserts over and over, that the optimizations enabled by UB aren't very useful. I would argue that if they didn't speed up at least some programs, the compiler guys wouldn't bother to implement them.
Well, this remains to be seen. Although it was really the topic of another thread, I completely fail to see who would benefit from translating a strcpy() with overlapping source/destination into an abort(). It seems to hurt everyone. It does not only break backward compatibility with programs written for older standards, but it also seems to seriously hurt everyone that religiously sticks to the new standard, as it requires wasting time on a non-trivial overlap test, requiring run-time determination of the length of the string, before it can start copying it.

The sole motivation for making overlapping copy areas UB was actually to avoid the necessity for such tests. Given that the tests will now always be done anyway, it would have been much better to specify that strcpy should work no matter what. People would get more for the same performance. The Apple implementation is intentionally malicious towards everyone.

So the Apple strcpy case demonstrates that the current standard can be and will be too easily abused to inflict global and universal suffering, and needs to be rephrased to make such practices impossible.
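To make the overlap case concrete: the abort being discussed reportedly fires on calls like the first function below, which the standard leaves undefined; memmove is the primitive the standard does guarantee for overlapping regions (buffer and function names invented for illustration):

Code: Select all

#include <string.h>

static char buf[16] = "abcdef";

void shift_ub(void)
{
    strcpy(buf + 1, buf);                    /* UB: regions overlap */
}

void shift_ok(void)
{
    memmove(buf + 1, buf, strlen(buf) + 1);  /* well-defined */
}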