strcpy() revisited

mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: strcpy() revisited

Post by mvk »

mcostalba wrote:
bob wrote: I'm going to repeat my original story, with some early blanks filled in.
I don't want to get into this discussion because it is useless. I'd just like to give you a piece of good practical advice: run a valgrind session on your engine.

This tool can find (and warn on) the overlapping addresses and many other dubious usages. I had an overlap with memcpy() and valgrind was able to spot it. And of course I quickly fixed it in the proper way, without posting on talkchess...but this is another story.
Not just that. OSX comes with an awesome development environment, XCode. If you run the project from it, it will show you the stack trace in your face, with the offending line highlighted. Very hard to miss where it is going wrong. There is absolutely no need to debug for a week for this if you are familiar with developing on the Mac.

And besides, why would a week of debugging be problematic, but two weeks of complaining be no problem at all?
[Account deleted]
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: strcpy() revisited

Post by wgarvin »

bob wrote:I DO understand what it means (which, by the way, does not match much of what I have read here). This "demons" nonsense is just that. Unless you REALLY believe that a strcpy(a, a+1) is allowed to format your hard disk or any other equally damaging act.
You clearly don't understand it then. It is definitely allowed to format your hard disk. Nobody would ever make a compiler that does that on purpose, but how could they possibly guarantee it won't happen by accident? The standard makes no promises whatsoever about what will happen if you try to execute undefined behavior in a program. It could fail before, after or during the undefined operation. It could abort with an error message, it could silently do something mysterious and surprising, it could silently do what you (for some reason) expected it to do, or it could silently format your hard drive.
bob wrote:I don't buy that, and would call that an outright stupid opinion. The generally accepted definition of undefined behavior is that anything can happen, but within the scope of the operation being performed, ONLY.
No, wrong!! There is no such "scope of the operation being performed". By invoking undefined behavior, you left the realm of C programs and entered a bizarre netherworld, where anything at all might happen. And there's a specific reason for this, too. It's so the compiler writers and library authors don't have to care about what happens when your program tries to do one of these illegal, undefined things. The compiler can optimize your program assuming any undefined behavior can't happen. In practice that can mean anything from replacing sensible-looking-but-undefined comparisons with "1", to deleting entire blocks of code. (Chris Lattner explains that LLVM used to just silently delete the code, but now they usually try to put a trapping instruction in there instead... even though it makes the code size bigger, apparently programmers found it too confusing when the program just fell out of the end of a function that had invoked undefined behavior and started executing whatever function came next in the compiled code...)
bob wrote:That is, an overlapping string might overwrite a buffer, or it might not, but it won't do anything INTENTIONALLY worse.
Well, it probably won't do anything intentionally malicious... but after inlining, macro substitution, and other transformations by the optimizer, even non-malicious optimization can have very surprising effects on your program's behavior. And this happens all the time to buggy programs, and the programmers sometimes even report those as bugs against the compiler even though the actual bug is in their program. :lol:
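To make the strcpy(a, a+1) case concrete, here is a minimal sketch of the well-defined alternative. It is not taken from anyone's engine; the function name drop_first_char is made up for illustration. The only assumption is that the goal of the overlapping copy was to shift a string left by one character, which memmove() is specified to do safely even when source and destination overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove the first character of a NUL-terminated
 * string in place.
 *
 * strcpy(s, s + 1) would be undefined here because the source and
 * destination overlap. memmove() is defined for overlapping regions. */
void drop_first_char(char *s)
{
    assert(s != NULL);

    size_t len = strlen(s);
    if (len == 0)
        return;                 /* empty string: nothing to drop */

    /* Copy the remaining len - 1 characters plus the terminating NUL,
     * which is len bytes in total. */
    memmove(s, s + 1, len);
}
```

Calling drop_first_char() on a buffer holding "hello" leaves "ello" in it, with no dependence on the direction in which the library happens to copy.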
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: strcpy() revisited

Post by hgm »

All I can say is that I would not want to use compilers that use the very liberal definition of 'undefined behavior' in a standard as an excuse to maliciously sabotage my program when such behavior might occur, any more than I would hire personnel that thinks an unclear order I give them can mean I want them to shoot me in the back.

A crime remains a crime, even when you commit it because your host told you "make yourself at home"...
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: strcpy() revisited

Post by wgarvin »

hgm wrote:All I can say is that I would not want to use compilers that use the very liberal definition of 'undefined behavior' in a standard as an excuse to maliciously sabotage my program when such behavior might occur, any more than I would hire personnel that thinks an unclear order I give them can mean I want them to shoot me in the back.

A crime remains a crime, even when you commit it because your host told you "make yourself at home"...
Sure, of course. But all optimizing compilers take advantage of undefined behavior to generate better code. If you read the links I gave from Lattner and Regehr, they describe several examples of how the compiler is able to generate better code for common, valid programs because it knows that the undefined behaviors "aren't allowed" and it is free to ignore them.

The vast majority of C and C++ programmers don't understand this stuff very well, and yet they write lots of working code anyways. Most programmers know that some things are not defined and avoid using them. Trouble occurs though when they forget about signed overflow, unsafe pointer arithmetic, etc. or just accidentally rely on them in their programs. There are many many examples of this in real programs in the wild -- for example, Regehr recently surveyed open-source crypto libraries and found that most of them contain instances of undefined behavior. Serious bugs and exploitable security vulnerabilities have been traced back to undefined behavior.

So that's why I think every programmer ought to know enough about this stuff to avoid getting screwed by it. Which basically means "avoid undefined behavior like the plague". Even if your program works today, if it relies on UB somehow then it might fail unexpectedly 2 years from now or 5 years from now. As a professional programmer, I have an obligation to write robust, future-proof code that won't suddenly fail one day after I have moved on to something else.
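As one illustration of what avoiding it can look like for signed overflow, here is a small sketch in plain standard C. The name checked_add is hypothetical, not a library function. The idea is to test against INT_MAX and INT_MIN before adding, so the overflowing addition is never executed at all.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: add two ints without ever performing an
 * overflowing signed addition.
 *
 * Returns true and stores the sum in *out when a + b is representable.
 * Returns false and leaves *out untouched when it is not. */
bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);

    /* The range tests use only subtractions that cannot overflow:
     * INT_MAX - b is safe when b > 0, INT_MIN - b is safe when b < 0. */
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;

    *out = a + b;
    return true;
}
```

A check written after the fact, such as testing whether the sum came out smaller than an operand, already relies on the overflow having happened, which is exactly the kind of accidental dependence described above.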
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: strcpy() revisited

Post by hgm »

wgarvin wrote:Sure, of course. But all optimizing compilers take advantage of undefined behavior to generate better code. If you read the links I gave from Lattner and Regehr, they describe several examples of how the compiler is able to generate better code for common, valid programs because it knows that the undefined behaviors "aren't allowed" and it is free to ignore them.
Apparently there is some confusion, then, on what 'better' means. In my book this would go under the heading 'fatally flawed code'. It is like hiring someone who firmly believes that everyone would be better off by meeting his maker as soon as possible, and is just waiting for a feeble excuse to do me in.

I would NEVER want to use a compiler that thinks the undefinedness of integer overflow would be any other than how the hardware it compiles for defines it. That different hardware might define it in a different way is one thing. To use that as an excuse to do something completely different, which no hardware would ever do and which is certainly not what the programmer intends is quite another. These are not at the same level. Like filing something in a wrong drawer because I wasn't clear about where to file is not at the same legal level as shooting me in the back because I wasn't clear where to file.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: strcpy() revisited

Post by syzygy »

hgm wrote:I would NEVER want to use a compiler that thinks the undefinedness of integer overflow would be any other than how the hardware it compiles for defines it.
Then you should use a compiler that makes a promise that it will deal with integer overflow in a particular way. For gcc, just get used to invoking it with -fwrapv. I'm sure you won't mind the loss in efficiency, especially on loops using an int as loop variable.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: strcpy() revisited

Post by Rein Halbersma »

hgm wrote:
I would NEVER want to use a compiler that thinks the undefinedness of integer overflow would be any other than how the hardware it compiles for defines it. That different hardware might define it in a different way is one thing. To use that as an excuse to do something completely different, which no hardware would ever do and which is certainly not what the programmer intends is quite another. These are not at the same level. Like filing something in a wrong drawer because I wasn't clear about where to file is not at the same legal level as shooting me in the back because I wasn't clear where to file.
No sane compiler will emit completely bogus machine instructions, but it can optimize away your instructions because it will think that no sane programmer will intend to rely on undefined behavior. For integer overflow, gcc has both a warning to let you know it is making these optimizations and a flag to force the behavior you intend to use on your machine. But because there are so many (~200 in the C Standard) different constructs that lead to undefined behavior, most compilers won't warn about them all.
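A small sketch of the kind of comparison at stake, with made-up function names. The undefined form is shown only in a comment, so the code below never executes it. As far as I know the gcc options in question are -Wstrict-overflow for the warning and -fwrapv for the flag; how often the warning actually fires depends on the gcc version and optimization level.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* This form is undefined for x == INT_MAX, so without -fwrapv an
 * optimizer is allowed to treat it as always true:
 *
 *     bool has_successor(int x) { return x + 1 > x; }
 *
 * The version below asks the same question without overflowing. */
bool has_successor(int x)
{
    return x < INT_MAX;
}

/* Hypothetical helper: increment that stops at INT_MAX instead of
 * relying on wraparound. */
int saturating_increment(int x)
{
    int next = has_successor(x) ? x + 1 : INT_MAX;
    assert(next >= x);          /* holds by construction, no UB involved */
    return next;
}
```

Written this way, the result is the same with or without -fwrapv, so nothing depends on which optimizations the compiler chooses to apply.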
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

syzygy wrote:
bob wrote:So? There are hardware-specific libraries all over the planet, already. Some hand-coded asm in the glibc stuff. That has no effect on atom processors at all because it won't run on 'em.
Huh? Atom processors certainly can run glibc, even the x86-compiler version. Atom processors implement the x86 architecture.
They don't. They intentionally select the atom-optimized versions of whatever is available.
BTW strcpy() does NOT have an atom-specific version. Copy is STILL left-to-right.
Note the word STILL.
Only thing that changed was that it crashes on overlap. Sort of shoots this point dead in the water because there is zero gain for ANYONE, and significant harm for MANY. That is bad software development. Fix it or ignore it, but don't intentionally break it.
You have fixed the bug (I assume). There is the gain.
The code was not failing BEFORE the lib change either. This was a loss.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: strcpy() revisited

Post by wgarvin »

Nevertheless, there is something very troubling here. Your program's behavior is undefined -- you have no way of knowing what will happen...That means compilers may generate code to do whatever they like: reformat your disk, send suggestive email to your boss, fax source code to your competitors, whatever.
-- Scott Meyers, "Effective C++"
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: strcpy() revisited

Post by wgarvin »

Academic Attention for Undefined Behavior at Regehr's blog
Undefined behaviors are like blind spots in a programming language; they are areas where the specification imposes no requirements. In other words, if you write code that executes an operation whose behavior is undefined, the language implementation can do anything it likes. In practice, a few specific undefined behaviors in C and C++ (buffer overflows and integer overflows, mainly) have caused, and are continuing to cause, a large amount of economic damage in the form of exploitable vulnerabilities. On the other hand, undefined behaviors have advantages: they simplify compiler implementations and permit more efficient code to be generated. Although the stakes are high, no solid understanding of the trade-offs exists because, for reasons I don’t understand, the academic programming languages community has basically ignored the issue. This may be starting to change, and recently I’ve learned about two new papers about undefined behavior, one from UIUC and the other (not yet publicly available, but hopefully soon) from MIT will appear in the “Correctness” session at APSYS 2012 later this month. Just to be clear: plenty has been written about avoiding specific undefined behaviors, generally by enforcing memory safety or similar. But prior to these two papers, nothing has been written about undefined behavior in general.
(emphasis mine)