strcpy() revisited

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

strcpy() revisited

Post by bob »

Since the other thread has gone so far off the deep end, with accusations of "you advocate undefined behavior" or "you don't understand undefined behavior" I'm going to repeat my original story, with some early blanks filled in.

Back in the 70's Bert Gower and I were thinking about opening books, and what to do with ours. Most of us back then organized a book as a sort of tree/linked-list, which failed to detect transpositions unless the transpositions were hard-coded into the book input. One day the "use the hash signature" idea popped up in my head and I wrote the code and was quite happy. One thing it did was eliminate any sort of hand-coding with links, meaning the book input (and resulting file) could be as large as we wanted.

Bert and I immediately began to hand-type the most recent MCO book (MCO 10) line by line, page by page. We needed some way to keep up with where moves came from so we started to use a "pseudo-PGN" format that looked like this:

[MCO10 Page xx Col yy Footnote zz]

1. P-K4 P-K4 2. N-Kb3 N-Qb3 ... etc (yes, that book was English Descriptive).

Cray Blitz was in fortran, with no strings or supporting functions of any kind, so I wrote a quick parser that included a "shift()" procedure that does what the overlapped strcpy() call did in Crafty, namely shift the input string left N elements (it was an array of course) to eliminate the just-parsed move. All worked well.

In 1994 when I started the Crafty project, I copied some of the CB stuff and simply converted it to C as quickly as practical. That "shift" was apparently turned into the equivalent strcpy(). Don't remember any of it specifically, just remember doing the quick conversion on a few pieces of Cray Blitz to get started (current seaboard command, for example, as we had the "Forsythe notation" to encode a position long before the "Forsythe-Edwards specification was written)..

So, overlapping strcpy() used, whether it was known that it used undefined behavior or not I have absolutely no idea. We are talking almost 20 years ago for this conversion, late 1994.

Flash forward to today. It has worked since late 1994, not one single problem, with crafty running on everything from a Cray, to early 64 bit processors (alpha, SPARC, MIPS, even Itanium) all the way through my macbook with the 3rd generation i7. Worked with every compiler ever tried, with only a few incompatibilities found (IBM AIX C compiler defaulted chars to unsigned, most everyone else defaults to signed, as an example). the code which was eventually modified into the 1998 edition of ReadPGN() has never failed. Not once. Not on any machine. Not on any O/S. Not on any compiler. That was the state of affairs until 2-3 weeks ago.

I released a new version of Crafty to fix a couple of quirks I had found dealing with EGTB hits (which I don't use). Within a week, I received a bug report that said "Can't build a book with crafty, it crashes..."

First thought was an obvious one... I broke something. I diff'ed everything and looked at each change. Nothing even remotely related to book creating, PGN parsing, move translation, file I/O (I do my own file I/O to avoid endian issues). Nothing.

I asked for more information and for the PGN that caused the problem. Received it (a fairly long file) and tried it. Worked perfectly. Unfortunately most of this communication and testing happened while I was at my office so I was doing the testing on my office linux box. Could not make it fail.

I asked "what system?" He responded "Apple Macbook." A lead. Since I had canned clang and had installed gcc, I looked up a co-worker that still had a macbook that was unchanged. Unfortunately it was running Mountain Lion. But not knowing that was the problem, I tested. It worked. Another request. Exactly which OS X version and which compiler. The reply was "Mavericks clang".

Now I am at an impasse, because I have Mavericks, but I dumped clang for gcc 4.7.3... My co-worker mentioned he was fixing to upgrade, if I wanted I could do it for him and then test. I did, and it crashed. Aha, I thought. clang again, since the clang on my macbook (with 10.7) was broken and Crafty would not work correctly at all. I compiled on my mac and crash. Just the mysterious "abort" printed, no explanation or anything. Since the POSIX standard says "diagnostic output should go to stderr", and since I was not seeing anything other than "abort" I set about debugging. The debugger was helpless here, it could not tell me anything at all, not even which source file was crashing or anything. So many printf()'s later, I track it down to the strcpy().

Now, "why does that suddenly fail?" A quick google, after wasting more than a week, and I discover many others have encountered the same thing and they were just as unhappy as I was.

This is where the thread I started came from.

My complaint: Something that worked for a LONG time suddenly failed with no explanation. I dig up the library change and look at it, and think "why did they slow this down to check for overlapping?" I then thought "if they wasted the time to check for overlapping, why did they not just call memmove() rather than aborting?"

And that's where we are. To be clear:

I am NOT a proponent of using undefined behavior.

I DO understand what it means (which, by the way, does not match much of what I have read here). This "demons" nonsense is just that. Unless you REALLY believe that a strcpy(a, a+1) is allowed to format your hard disk or any other equally damaging act. I don't buy that, and would call that an outright stupid opinion. The generally accepted definition of undefined behavior is that anything can happen, but within the scope of the operation being performed, ONLY. That is, an overlapping string might overwrite a buffer, or it might not, but it won't do anything INTENTIONALLY worse.

The Apple example is an example of poor behavior, IMHO. Why? because (1) there is no reason to intentionally break something that has worked for a long time (precedence). (2) changing something to make it faster, smaller, more accurate, whatever is perfectly acceptable, and if that changes the behavior, fine. But don't change it just because you can.

That is ALL I have said. Repeatedly. I come from the perspective of actually having written multiple compilers over the years. The writer's goal is to do what the programmer's code says do, no more, no less. If you add two positive integers and get a negative integer, fine. That is not "random undefined behavior". Ditto for memory race conditions.

Some here seem to believe that by using "undefined" in the standard, the compiler writers have carte blanche to do whatever they want in such cases. I, on the other hand, believe the compiler writers have the responsibility to do what I say do, to the best of their ability. Period.

If you want to argue "Apple did this to prevent future problems." I would argue "nonsense."

1. They COULD have fixed the problem. Check for overlap. If detected, directly call memmove() instead. Problem fixed. No code broken. Future problems eliminated as well.

2. They COULD have left it alone. Legacy code is just that. There's no sense in breaking something just because you can.

3. They did capriciously break a BUNCH of software products, with no warning. Not even an error in the place where POSIX says it should be displayed.

So please get off of the "Bob believe undefined behavior is acceptable and he teaches his students to write code like that." Pure nonsense. I've not defended my code in any way other than to say "it worked for a long time, it was unintentional, the existing library source files show that the overlap I use will work correctly. Apple should not have changed it without any warning."

That's all I have said. If there are any reasonable comments, feel free to leave 'em here. I don't see any point in the endless debate about what undefined means. Doesn't matter at all to the original point of this discussion. Still doesn't.
simon
Posts: 50
Joined: Sun Mar 20, 2011 6:42 pm

Re: strcpy() revisited

Post by simon »

bob wrote:I DO understand what it means (which, by the way, does not match much of what I have read here). This "demons" nonsense is just that. Unless you REALLY believe that a strcpy(a, a+1) is allowed to format your hard disk or any other equally damaging act. I don't buy that, and would call that an outright stupid opinion. The generally accepted definition of undefined behavior is that anything can happen, but within the scope of the operation being performed, ONLY. That is, an overlapping string might overwrite a buffer, or it might not, but it won't do anything INTENTIONALLY worse.
The generally accepted definition of undefined behaviour is the one given in the C standard.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: strcpy() revisited

Post by Rein Halbersma »

bob wrote: I DO understand what it means (which, by the way, does not match much of what I have read here). This "demons" nonsense is just that. Unless you REALLY believe that a strcpy(a, a+1) is allowed to format your hard disk or any other equally damaging act. I don't buy that, and would call that an outright stupid opinion. The generally accepted definition of undefined behavior is that anything can happen, but within the scope of the operation being performed, ONLY. That is, an overlapping string might overwrite a buffer, or it might not, but it won't do anything INTENTIONALLY worse.
This is why lawyers don't let defendants testify, because they generally make things worse. You not only don't agree with, but you also DON'T understand undefined behavior as defined in the C Standard.

BTW, you have a point that any reasonable compiler will not go hunting for UB and intentionally emit hard drive erasing or nasal demons launching machine instructions. That would be plain malicious.

However, if you read any of the papers on UB, or in particular the paper by Boehm on "benign" data races, you would realize that modern, aggressively optimizing compilers routinely apply all kinds of very intricate code reshuffling that can accidentally have very strange and potententially damaging effects if those optimizations are applied under the assumptions that a conforming program exhibits no undefined behavior.

Apple didn't go out to punish you, it simply incorporated an optimization that turns out to break programs with UB (like yours, who used strcpy() on overlapping ranges). It was within its rights to do so, and not doing it would have hurt other users (e.g. Atom processors where the other way of copying is apparently faster).
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

simon wrote:
bob wrote:I DO understand what it means (which, by the way, does not match much of what I have read here). This "demons" nonsense is just that. Unless you REALLY believe that a strcpy(a, a+1) is allowed to format your hard disk or any other equally damaging act. I don't buy that, and would call that an outright stupid opinion. The generally accepted definition of undefined behavior is that anything can happen, but within the scope of the operation being performed, ONLY. That is, an overlapping string might overwrite a buffer, or it might not, but it won't do anything INTENTIONALLY worse.
The generally accepted definition of undefined behaviour is the one given in the C standard.
The definition is NOT "a definition".

A simple question. When you write the following line of code, assuming a, b and C are signed ints, what do you expect to happen?

a = b + c;

Do you expect it to abort? get a compiler error? crash? Format your hard drive? Or do you expect it to add b and c and store the answer in a, whether b and c or positive, negative, or if they produce a value too large / small to store.

This is a key. Can the compiler prove that overflows? Nope. No idea what the values are at compile time unless they are initialized locally. SO what does it do? It just adds the two numbers. And you get what you get, which is not only the expected behavior, it is the RIGHT (as opposed to correct) behavior.

Or do you disagree? The compiler has absolutely no control over this. And it can't even detect it unless it decides to use the hardware overflow flag (x86). But if it does that, the operation is not undefined at all, it is precisely defined by the x86 hardware. So you can't say "but signed overflow is undefined in the standard, but the compiler can use a defined overflow within the hardware to produce undefined behavior in the executable." That simply makes no sense.

Ditto for race conditions which can not be discovered by the compiler when pointers are allowed. Forget C++. Not talking c++. Not talking objects and pseudo-pointers. I am talking plain ANSI C.

So what does undefined REALLY mean? 99% of the time the compiler does the right thing because it can't even detect that overflow or races will occur at compile time, so it produces code that follows the intent of the programmer as closely as possible. Which I happen to agree with completely.

But what if the compiler can prove that overflow will happen. Or in some simple case it can prove that a race can happen. Does it now have the right to behave completely differently? Or should it STILL do what we asked, as closely as possible? Or can different parts of our program operate differently? And is it acceptable for a compiler that has accepted a = b + c; for years to suddenly cause the program to crash because this MIGHT produce undefined behavior???
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

Rein Halbersma wrote:
bob wrote: I DO understand what it means (which, by the way, does not match much of what I have read here). This "demons" nonsense is just that. Unless you REALLY believe that a strcpy(a, a+1) is allowed to format your hard disk or any other equally damaging act. I don't buy that, and would call that an outright stupid opinion. The generally accepted definition of undefined behavior is that anything can happen, but within the scope of the operation being performed, ONLY. That is, an overlapping string might overwrite a buffer, or it might not, but it won't do anything INTENTIONALLY worse.
This is why lawyers don't let defendants testify, because they generally make things worse. You not only don't agree with, but you also DON'T understand undefined behavior as defined in the C Standard.

BTW, you have a point that any reasonable compiler will not go hunting for UB and intentionally emit hard drive erasing or nasal demons launching machine instructions. That would be plain malicious.
Sort of like making a program crash for no reason whatsoever? You apparently begin to see my point. Apple COULD have fixed this so that undefined behavior is completely removed. Instead they just chose to abort the execution and cause a bunch of programmers to track down the change.

However, if you read any of the papers on UB, or in particular the paper by Boehm on "benign" data races, you would realize that modern, aggressively optimizing compilers routinely apply all kinds of very intricate code reshuffling that can accidentally have very strange and potententially damaging effects if those optimizations are applied under the assumptions that a conforming program exhibits no undefined behavior.

Apple didn't go out to punish you, it simply incorporated an optimization that turns out to break programs with UB (like yours, who used strcpy() on overlapping ranges). It was within its rights to do so, and not doing it would have hurt other users (e.g. Atom processors where the other way of copying is apparently faster).
So? There are hardware-specific libraries all over the planet, already. Some hand-coded asm in the glibc stuff. That has no effect on atom processors at all because it won't run on 'em. BTW strcpy() does NOT have an atom-specific version. Copy is STILL left-to-right. Only thing that changed was that it crashes on overlap. Sort of shoots this point dead in the water because there is zero gain for ANYONE, and significant harm for MANY. That is bad software development. Fix it or ignore it, but don't intentionally break it.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: strcpy() revisited

Post by syzygy »

bob wrote:So, overlapping strcpy() used, whether it was known that it used undefined behavior or not I have absolutely no idea. We are talking almost 20 years ago for this conversion, late 1994.
1994, so let's have a look at the C89 standard.
http://port70.net/~nsz/c/c89/c89-draft.html#4.11.2.3
The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1 . If copying takes place between objects that overlap, the behavior is undefined.
So... this is like driving without a driver's license and finally getting caught after 20 years.
I am NOT a proponent of using undefined behavior.

I DO understand what it means (which, by the way, does not match much of what I have read here). This "demons" nonsense is just that.
You definitely do not understand it.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: strcpy() revisited

Post by mvk »

syzygy wrote:So... this is like driving without a driver's license and finally getting caught after 20 years.
And then arguing that there were no accidents (forgetting the driver just hit the police car from behind) and, there was no valid reason to ask for the license on this occasion in the first place because the police car's insurance was overdue.

Seriously, it is time for a strcpy-subforum.
[Account deleted]
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: strcpy() revisited

Post by syzygy »

bob wrote:So? There are hardware-specific libraries all over the planet, already. Some hand-coded asm in the glibc stuff. That has no effect on atom processors at all because it won't run on 'em.
Huh? Atom processors certainly can run glibc, even the x86-compiler version. Atom processors implement the x86 architecture.
BTW strcpy() does NOT have an atom-specific version. Copy is STILL left-to-right.
Note the word STILL.
Only thing that changed was that it crashes on overlap. Sort of shoots this point dead in the water because there is zero gain for ANYONE, and significant harm for MANY. That is bad software development. Fix it or ignore it, but don't intentionally break it.
You have fixed the bug (I assume). There is the gain.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: strcpy() revisited

Post by wgarvin »

bob, I humbly suggest you take an hour and read each of these links with an open mind. I know it seems like a waste of time because you think you already know how it works. But just for an hour, forget what you know and read what these two experts have to say about the topic of undefined behavior:

(1) Chris Lattner, primary author of LLVM and the clang optimizing compiler. Chris is a bona-fide expert on compiler optimization and knows more about undefined behavior than any ten average programmers combined.
* LLVM Project BLog: What Every C Programmer Should Know About Undefined Behavior, Part 1 / 3
* LLVM Project BLog: What Every C Programmer Should Know About Undefined Behavior, Part 2 / 3
* LLVM Project BLog: What Every C Programmer Should Know About Undefined Behavior, Part 3 / 3

(2) John Regehr, associate professor of CS at University of Utah. He does research on software correctness, including things like compiler fuzzing, static analysis of source programs to find undefined behavior, etc. Using fuzzing and test-case-reducing tools he and his students wrote, Regehr has reported many hundreds of bugs in most of the widely-used optimizing compilers (GCC, clang, etc.) and helped popularize that style of compiler testing.
* A Guide to Undefined Behavior in C and C++, Part 1 of 3
* A Guide to Undefined Behavior in C and C++, Part 2 of 3
* A Guide to Undefined Behavior in C and C++, Part 3 of 3
* Integer Overflow Paper
* Winners of a contest to find the most surprising code snippet affected by undefined behavior

If you read these pages with an open mind, you may come away with a new respect for (and perhaps a healthy fear of) undefined behavior.


I think most programmers who use the C and C++ languages don't really understand what they are messing with when they stray into 'undefined' territory. They think the C language closely maps to what the underlying hardware does, so anything you write in C will somehow be mapped in a sensible way. This is only true for the parts of the language that are NOT declared as 'off limits' by labelling them as 'undefined behavior'. If you stray into UB territory, nine times out of ten you'll get away with it and the tenth time you'll get ripped apart by a rabid bear. Or something equally unpleasant.

Most programmers just don't know about this. If something happens to work on their compiler, they assume it is OK. They are blissfully unaware of how close they came to disaster (and how their code is a time-bomb that might someday fail in a surprising way). Chris even explains how LLVM takes pains to try and do the 'sensible' thing for some cases of undefined behavior, because in those cases if the compiler actually exploited it to the fullest, too many existing programs would not run correctly.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: strcpy() revisited

Post by mcostalba »

bob wrote: I'm going to repeat my original story, with some early blanks filled in.
I don't want to get into this discussions because it is useless. I'd just want to give you a good practical advice: run a valgrind session on your engine.

This tool can find (and warn on) the overlapping addresses and many other dubious usages. I had an overlapping with memcpy() and valgrind was able to spot it. And of course I have quickly fixed it in the proper way, without posting on talkchess...but this is another story.