wgarvin wrote:
bob wrote:
Of course, if they choose to get "cute" and do the sloppier form that doesn't use a register pair to prevent overflow, then they just made a poor choice. By doing the right thing, they get the right answer. If they assume x * 2 / 2 = x, they are exactly right. And all is well.
Their assumption might break on a machine that doesn't do the two-register product, but most machines over the years have had that as an option. VAX used even/odd registers like that if you wanted. Etc. But again, if they do the right thing, we all will get what we expect.
See, here we have the problem. None of the things you're talking about have anything to do with the C language.
In C, the product of a signed int with a signed int is again a signed int. See, with your 32-bit-int compiler you have there, if the result doesn't fit in that 32-bit signed integer, that's called integer overflow, and it's undefined behavior, which means you are never allowed to do it if you want your C program to have any predictable meaning according to the language standard. So if you happen to "get what you expect" from that construct, either your C compiler has nicely extended the language semantics to give it to you, or you were just lucky that it happened to do what you wanted. There's nothing in the C language requiring it to do the same thing next time.
You're a professor of computer science, you're supposed to know this stuff. You keep claiming that you do, and then you go on to ramble about how some ancient piece of hardware worked or how a compiler should just "do the right thing" when you feed it code with undefined semantics. I'm just shaking my head in confusion and dismay over here.
X86 is ancient hardware? X86_64 is ancient hardware? I miss exactly what you are talking about. X86/X86_64 is about 99.99999% of the computer processors on planet earth. They can do this flawlessly. So why can't C? I don't care about the underlying "abstract machine". At SOME point the compiler actually has to produce code for a specific machine. Why would it not produce code that would make that "undefined result" go away at least for multiplies using the double-reg trick, or for add/sub using normal x86 instructions? Given the choice of a more restrictive instruction that both loses part of the power of the actual processor and which also causes the "undefined behavior" for the multiply example, why not emit instructions that eliminate a few of those? Just because they want to chant "undefined behavior, must have undefined behavior" over and over???
I understand the issues. I understand the architecture. And I understand the "right thing" to do when converting a C to asm, just as surely as when I hand-code something in asm I use whatever the hardware offers to make the results as accurate/fast as possible. The current compiler guys COULD do more of that. But they take "undefined behavior" as an "unlimited license to mangle programs" in an effort to inflict an optimization that is barely worthwhile, while breaking programs that are written by programmers that know what they are doing.
I don't see a THING wrong with writing a C program that is targeted specifically to X86_64. I don't care if there are things on X86_64 that won't work on x86 (64 bit adds and such). I don't care if there are things on X86_64 that won't work on a Sun SPARC, or a MIPS, or a PowerPC, or whatever. Portability is not a necessity for a program, unless the author wants that as a goal. The original Cray Blitz would not come close to running on a PC. No "vector merge" instruction, no "gather/scatter" indirect array references. We didn't care, we wrote Cray Blitz for the Cray architecture and it worked perfectly. Crafty is targeted for the X86. Nice that it works on others but that is not MY requirement.
So just compile the source to do as closely as possible what I ask for, and forget all the cute unsafe optimizations that run afoul of the "undefined behavior" ghost. Certainly integer overflow has no undefined behavior on the X86 target architecture. Yet because it is a problem on an old Univac that used a bizarro binary representation, they insist that it won't work on the PC either. To this "old computer scientist", that's certainly the exact opposite of what I want. My compilers would map the source syntax into the best/most efficient/most accurate target machine instructions possible.
I get the purist point of view with the "abstract machine" and "crappy C specification" and such. But that does not FORCE a compiler writer to break programs written by programmers who know how to use overflow on their target architecture. Yet they certainly act like it does. As far as your notes below, what does the X86 architecture say about signed integer overflow in the processor reference manual? It works perfectly, using the same behavior as unsigned (wrapping). But the compiler guys pretend the hardware can't do that and make it misbehave intentionally with unsafe optimizations. Do that for machines that actually can't handle overflow. But if the machine can deal with it, let it. Don't break it INTENTIONALLY. Yet that is what is being done.
CERT has a nice page explaining some of the risks with integer overflow.
Signed integer overflow is undefined behavior 36. Consequently, implementations have considerable latitude in how they deal with signed integer overflow (see MSC15-C. Do not depend on undefined behavior). An implementation that defines signed integer types as being modulo, for example, need not detect integer overflow. Implementations may also trap on signed arithmetic overflows, or simply assume that overflows will never happen and generate object code accordingly. It is also possible for the same conforming implementation to emit code that exhibits different behavior in different contexts. For example, an implementation may determine that a signed integer loop control variable declared in a local scope cannot overflow and may emit efficient code on the basis of that determination, while the same implementation may determine that a global variable used in as [sic] similar context will wrap.
Edit: I also like their page
MSC15-C. Do not depend on undefined behavior.
It says all the same things Rein, Ronald, I, and others have been saying for the past two weeks. It rates the severity of this issue as 'High'.
Choose a decent context. "programmer that understands the underlying architecture, programmer that is proficient in C." NOW what is the danger of signed overflow? Unsigned numbers wrap and produce wrong (but predictable / modulo) answers. But signed overflow is a severe bug? Only if you don't know what you are doing...
I chose to learn C a LONG time ago, because it did not try to be a Pascal-like, strongly typed, highly restrictive language. It seems to be losing a bit of that "edge" in recent compilers. It was never intended to be a baby-programmable language. We even used the "register" modifier to specifically request that certain variables be allocated in a register if possible. That was its claim to fame: close to the hardware, but high-level. Not so close to the hardware any longer, it would seem. I can certainly deal with overflows perfectly in asm on X86. But the compiler guys are beginning to get in the way in some cases, doing optimizations that don't help speed enough to measure yet drive a large wedge between the programmer and the underlying target machine...
I never wanted to see "C for dummies". That's what Pascal and Ada (among others) are for.