strcpy() revisited

bob · Post by **bob** » Fri Dec 13, 2013 4:37 am

bnemias wrote:
bob wrote:
bnemias wrote:
bob wrote:It takes one more add and one more compare. How hard is that? I think it is doing something wrong by detecting the overlap in the first place.
You do realize that to implement the solution you've been advocating, covering up the bug by invoking memmove(), that it is necessary to actually detect the overlap?
You do realize that I said EXACTLY THAT? Since they chose to detect the overlap, why just abort rather than actually fix it? Simple enough now? I have only written that a dozen times. Seems silly to waste the CPU cycles to catch it, when it is not done in most programs, and then get NOTHING from those wasted cycles... If you'd join a thread by reading from the top, this wasted post would not be necessary.
I was just pointing out the absurdity of that particular quoted statement. It may be wasted to you, but I've read the thread and consider it rather amusing and therefore not wasted.

What is absurd about it? It is a simple statement of fact. Apple HAS already computed the length of the source to check for overlap. It now has ALL it needs to pass to memmove() and make it work properly. No overhead beyond what it has already wasted to check for the overflow. So after investing that wasted time, why not get SOMETHING useful in return? Or is that just "absurd" in your book?
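To make the argument above concrete: once the source length is known, it is exactly what memmove() needs. A minimal sketch, assuming a hypothetical helper named `overlap_safe_strcpy` (this is not Apple's implementation, which the thread does not show):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for illustration only; not Apple's library code.
 * Copies the NUL-terminated string src to dst, and stays well-defined
 * even when the two regions overlap. */
static char *overlap_safe_strcpy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    /* The same length an overlap check would need; +1 for the terminator. */
    size_t n = strlen(src) + 1;

    /* memmove() is specified to handle overlapping regions. */
    return memmove(dst, src, n);
}
```

The cost is the extra pass over the source to find its length, which is the overhead being argued about in this thread. The caller still has to make sure dst has room for n bytes.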

bob · Post by **bob** » Fri Dec 13, 2013 4:40 am

mvk wrote:
bob wrote:What does it cost to walk down the string once? Twice? They have to walk it once to get the length, then again to copy. Close enough to 2x to use it as an approximation of the cost.
I have reason to believe they execute the check after the copy. See my screenshot of XCode. If that's confirmed, the cost is near zero (branch prediction etc).

Not in the code I saw posted. But now it is my turn to ask "they check AFTER doing something that has undefined behavior?" Something that could overrun the destination? Something that could cause who knows what? And we NEVER see their message?

You have to pick one side of this thinking and stay there. They can't be sure their (broken) test will even be performed if the strings overlap.

And what they did does not exactly "work". They only catch overlap in one direction, and NOT the worst one at that...
So far that was only the case in your toy example. And that was explained: a gcc optimisation touched strcpy before apple got there. On non-toy examples, I get buffer errors in both directions.

Could be. Really doesn't matter, however. It is broken all the same. The compiler and lib ARE part of a single "package" that is used to produce an executable.
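On the "one direction" complaint: a test that catches overlap in both directions takes only a couple of comparisons. A rough sketch, assuming a hypothetical helper `ranges_overlap` (again, not the code Apple ships). The pointers are converted to `uintptr_t` because relational comparison of pointers into different objects is not defined by the C standard; the conversion itself is implementation-defined, and the sketch assumes a flat address space where the sums do not wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only.
 * Returns true if [a, a + a_len) and [b, b + b_len) share at least one byte. */
static bool ranges_overlap(const void *a, size_t a_len,
                           const void *b, size_t b_len)
{
    assert(a != NULL && b != NULL);

    /* An empty range cannot share a byte with anything. */
    if (a_len == 0 || b_len == 0)
        return false;

    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;

    /* Two half-open ranges overlap iff each one starts before the other ends.
     * This is symmetric, so it covers dst-before-src and src-before-dst. */
    return pa < pb + b_len && pb < pa + a_len;
}
```

For a string copy, both lengths would be strlen(src) + 1.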

bob · Post by **bob** » Fri Dec 13, 2013 4:53 am

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:
bnemias wrote:
bob wrote:Seems perfectly reasonable to me. Somewhere. Some alternate universe. Where insanity reigns supreme... ... when they COULD have fixed both with a simple call to memmove().
Yes, let's cover up bugs instead of expose them.... that is insanity.
So, to recap the position you must believe, "In order to compile more efficiently and be able to use tricky optimizations, it is perfectly OK to slow down strcpy() by a factor of 2."??? I wonder if those "wonderful optimizations" will offset that factor of 2? That is, I wonder if this is one of those "useful optimizations" that slows the code down rather than speeding it up?
No he said it is insane to cover up bugs instead of exposing them.

There are two things here:
1) Apple pushing people to fix their bugs;
2) the good sides of strcpy() UB.

Ad 1)
The "ethics" of what Apple did may be debatable, but what they did does work. The extra check certainly does not slow down strcpy() by a factor of 2, that's just nonsense.
What does it cost to walk down the string once? Twice? They have to walk it once to get the length, then again to copy. Close enough to 2x to use it as an approximation of the cost.
I guess you haven't heard of caches. Plus, just reading is considerably faster than reading and writing.

Anyway, if you compile with -D_FORTIFY_SOURCE, then you choose to have these checks.

I did not, of course...

And what they did does not exactly "work". They only catch overlap in one direction, and NOT the worst one at that...
As far as I understand, it only does not work on some constant strings that are not copied using the library implementation.

You were going to install Linux.
Once I finish grading, yes. Does that help anyone but me, however?
You obviously don't care about anyone else that would like to compile and run your code with highly optimising compilers that do not respect "your intentions" for UB or that are not x86_64.

Works everywhere on the planet tried to date, except for mavericks. This has been fixed so it works there too. But I do not choose to accept that kind of development nonsense. There I most certainly DO have a choice. There are enough other quirks in mavericks (poor processor scheduling, poor handling of hyper threading) that I had already started looking into what was necessary. Turns out to be simple enough...

Here's an example. Type "w" or whatever you want on an os x box to get the load averages. Here is my macbook, where NOTHING is running:

scrappy% w
21:43 up 20 days, 4:30, 3 users, load averages: 0.77 0.82 0.82

Here is my office linux box, same conditions:

crafty% w
21:44:48 up 86 days, 8:57, 3 users, load average: 0.00, 0.01, 0.07

And no, it is not spotlight. Disabled. I have had zero luck getting this down to even 0.5, much less the 0.00/0.01 I am used to seeing on ANY unused linux box. Not acceptable. Not interested in wasting the time to track this nonsense down.

bnemias · Post by **bnemias** » Fri Dec 13, 2013 4:59 am

bob wrote:
bnemias wrote:
bob wrote:
bnemias wrote:
bob wrote:It takes one more add and one more compare. How hard is that? I think it is doing something wrong by detecting the overlap in the first place.
You do realize that to implement the solution you've been advocating, covering up the bug by invoking memmove(), that it is necessary to actually detect the overlap?
You do realize that I said EXACTLY THAT? Since they chose to detect the overlap, why just abort rather than actually fix it? Simple enough now? I have only written that a dozen times. Seems silly to waste the CPU cycles to catch it, when it is not done in most programs, and then get NOTHING from those wasted cycles... If you'd join a thread by reading from the top, this wasted post would not be necessary.
I was just pointing out the absurdity of that particular quoted statement. It may be wasted to you, but I've read the thread and consider it rather amusing and therefore not wasted.
What is absurd about it? It is a simple statement of fact. Apple HAS already computed the length of the source to check for overlap. It now has ALL it needs to pass to memmove() and make it work properly. No overhead beyond what it has already wasted to check for the overflow. So after investing that wasted time, why not get SOMETHING useful in return? Or is that just "absurd" in your book?

You think they're doing something wrong detecting the overlap. You also think they should invoke memmove() instead of doing strcpy(). Hint, it's funny.

But you're right, it's a wasted post if you can't laugh at yourself.

wgarvin · Post by **wgarvin** » Fri Dec 13, 2013 5:22 am

bob wrote:
syzygy wrote:
bob wrote:My belief is still the same, the compiler should do "the right thing". I'm not sure what you mean by "assign semantics to my program." I supply the semantics, specifically, in the form of C source.
The C you supplied does not have semantics.

If you mean it is "hyatt C" and not C, then you should use a "hyatt C" compiler.
Do you know the definition of "semantics"? Apparently not.

You must be using the word differently from us. "Semantics" are the meaning of your program. Since it is allegedly a C program, that means it is written using C syntax and its semantics are specified in the C language standard. And if that program invokes undefined behavior, the C language does not assign to it any semantics at all. It's not even guaranteed to execute the other parts correctly up to the undefined behavior; the entire execution is undefined.

I know you aren't happy that the standard completely punts as soon as any UB is introduced into the execution, and thus it's a crappy standard etc., but that's the language we've got. And large and useful applications are still written in it, and most of them work fine despite these annoying shortcomings of the language standard.

Sure, it could probably be improved, but even with 191+ flavors of undefined behavior in it, C is still one of the most useful programming languages around.

bob · Post by **bob** » Fri Dec 13, 2013 5:31 am

bnemias wrote:
bob wrote:
bnemias wrote:
bob wrote:
bnemias wrote:
bob wrote:It takes one more add and one more compare. How hard is that? I think it is doing something wrong by detecting the overlap in the first place.
You do realize that to implement the solution you've been advocating, covering up the bug by invoking memmove(), that it is necessary to actually detect the overlap?
You do realize that I said EXACTLY THAT? Since they chose to detect the overlap, why just abort rather than actually fix it? Simple enough now? I have only written that a dozen times. Seems silly to waste the CPU cycles to catch it, when it is not done in most programs, and then get NOTHING from those wasted cycles... If you'd join a thread by reading from the top, this wasted post would not be necessary.
I was just pointing out the absurdity of that particular quoted statement. It may be wasted to you, but I've read the thread and consider it rather amusing and therefore not wasted.
What is absurd about it? It is a simple statement of fact. Apple HAS already computed the length of the source to check for overlap. It now has ALL it needs to pass to memmove() and make it work properly. No overhead beyond what it has already wasted to check for the overflow. So after investing that wasted time, why not get SOMETHING useful in return? Or is that just "absurd" in your book?
You think they're doing something wrong detecting the overlap. You also think they should invoke memmove() instead of doing strcpy(). Hint, it's funny.

But you're right, it's a wasted post if you can't laugh at yourself.

How about stopping with trying to put words in my mouth. My EXACT statement, reduced to C-like syntax

IF (they are going to check for overlap) {
    check and call memmove() if detected;
} ELSE {
    just let strcpy() do its normal thing.
}

I did NOT suggest that they check for overlap and call memmove(). I suggested that they leave it alone, but since they HAD checked, they ought to at least use the result and improve things.

Nothing funny whatsoever IMHO.

mvk · Post by **mvk** » Fri Dec 13, 2013 8:00 am

bob wrote:But now it is my turn to ask "they check AFTER doing something that has undefined behavior?" Something that could overrun the destination? Something that could cause who knows what? And we NEVER see their message? You have to pick one side of this thinking and stay there. They can't be sure their (broken) test will even be performed if the strings overlap.

Wylie and Ronald have explained this several times by now: What the compiler does behind the door is its own business.

Could be. Really doesn't matter, however. It is broken all the same. The compiler and lib ARE part of a single "package" that is used to produce an executable.

Would be good to file a bug report if you want it repaired for your example as well. I doubt they read this forum.

bob · Post by **bob** » Fri Dec 13, 2013 8:02 pm

wgarvin wrote:
bob wrote:
syzygy wrote:
bob wrote:My belief is still the same, the compiler should do "the right thing". I'm not sure what you mean by "assign semantics to my program." I supply the semantics, specifically, in the form of C source.
The C you supplied does not have semantics.

If you mean it is "hyatt C" and not C, then you should use a "hyatt C" compiler.
Do you know the definition of "semantics"? Apparently not.
You must be using the word differently from us. "Semantics" are the meaning of your program.

What, EXACTLY, did I say. First read your statement above. "Semantics are the meaning of your program." I stated "I supply the semantics in the form of a C source."

I have learned ONE thing in this long discussion. The next time I find something odd that a specific compiler is doing, something that might affect a chess programmer and cause them to waste time looking for it, I will NOT be posting anything here. Let 'em waste the time just as I did. As opposed to my getting dragged into absolutely ridiculous conversations with "A professor supports doing things that cause undefined behavior" and such nonsense. Not again...

Since it is allegedly a C program, that means it is written using C syntax and its semantics are specified in the C language standard. And if that program invokes undefined behavior, the C language does not assign to it any semantics at all. It's not even guaranteed to execute the other parts correctly up to the undefined behavior; the entire execution is undefined.

And that is a bogus concept. Here's why.

Given a simple C statement, "i++; where i is a signed int"

A compiler could do any of the following:

1. just add 1 to i and move on.

2. add 1 to i and check the overflow flag and abort if set.

3. add 1 to i and check the overflow flag and do something completely unexpected if set, such as reverting i to its original state, setting it to zero, or anything else.

But in reality, it will ALWAYS do 1. Because it doesn't know whether it will overflow or not at compile time, and since it can't see the value of i at compile time, it just adds 1 and lets the hardware do its thing. Which is EXACTLY what I believe it should do. Just do what I asked.

It will ALWAYS do 1, UNLESS it can see the value of i, and realize that adding one will overflow. Because at compile time it might be able to do a sort of reverse constant propagation to see that i = a specific value when it gets here. NOW it behaves differently. It can just omit the operation completely. It can do something completely wrong in the case of if (a+1 > a).

Inconsistent. Even if someone knows the above possibilities, they might overlook the trickiness of the compiler and allow it to discern the value of i and wreck the code.

That is the part I disagree with. Since everyone constantly adds and such with signed integers, it is always possible that some of those adds will overflow. Yet the compiler just does the right thing. While if you happen to use a constant, it can wreck the code completely.

If you like that behavior, fine. I simply do not. As an old-school compiler person, that is NOT what we would want to produce from a compiler.
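The `if (a+1 > a)` case mentioned above can be sketched as follows. This is an illustrative example with hypothetical function names, not code from Crafty or from any compiler. The first function depends on the result of a signed overflow, which the C standard leaves undefined, so an optimizer is allowed to treat it as always true; the second tests the precondition instead and is well-defined for every int.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Depends on overflow: for a == INT_MAX the addition is undefined,
 * so a compiler may legally fold this whole function to "return true". */
static bool incremented_is_larger(int a)
{
    return a + 1 > a;
}

/* Well-defined for all inputs: checks before adding, never overflows. */
static bool can_increment(int a)
{
    return a < INT_MAX;
}

/* Adds 1 only after the precondition has been established. */
static int checked_increment(int a)
{
    assert(can_increment(a));
    return a + 1;
}
```

Whether a given compiler actually folds the first function depends on the compiler and optimization level; the point is only that the standard permits it.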

I know you aren't happy that the standard completely punts as soon as any UB is introduced into the execution, and thus it's a crappy standard etc., but that's the language we've got. And large and useful applications are still written in it, and most of them work fine despite these annoying shortcomings of the language standard.

Sure, it could probably be improved, but even with 191+ flavors of undefined behavior in it, C is still one of the most useful programming languages around.

C is certainly useful, but this kind of nonsense is making it less useful. Because the compiler guys are taking great liberties with the term "undefined behavior". They catch some of it, ignore some of it, and completely break some of it. Consistency would be nice.

I don't want to get into a game of wits with the compiler...

bob · Post by **bob** » Fri Dec 13, 2013 8:04 pm

mvk wrote:
bob wrote:But now it is my turn to ask "they check AFTER doing something that has undefined behavior?" Something that could overrun the destination? Something that could cause who knows what? And we NEVER see their message? You have to pick one side of this thinking and stay there. They can't be sure their (broken) test will even be performed if the strings overlap.
Wylie and Ronald have explained this several times by now: What the compiler does behind the door is its own business.

Could be. Really doesn't matter, however. It is broken all the same. The compiler and lib ARE part of a single "package" that is used to produce an executable.
Would be good to file a bug report if you want it repaired for your example as well. I doubt they read this forum.

I didn't post it here for Apple's developers. I posted it here because it represented a change in behavior that had not changed in 20 years, it was only on one system (Mavericks), and it took a good bit of time to track down since it was not obvious at first that it was a Mavericks issue. I thought it might benefit others who saw something similar, so they could save some time. I won't make that mistake again, however.

wgarvin · Post by **wgarvin** » Fri Dec 13, 2013 8:27 pm

bob wrote:
wgarvin wrote:
bob wrote:
syzygy wrote:The C you supplied does not have semantics.

If you mean it is "hyatt C" and not C, then you should use a "hyatt C" compiler.
Do you know the definition of "semantics"? Apparently not.
You must be using the word differently from us. "Semantics" are the meaning of your program.
What, EXACTLY, did I say. First read your statement above. "Semantics are the meaning of your program." I stated "I supply the semantics in the form of a C source."

Yes, my statement was correct, but yours is not. You supply the C source. The C language standard supplies the semantics, and the C implementation (compiler, libraries, pthreads etc) may add some extensions of its own. Your C code means exactly what the compiler thinks it means. It doesn't mean what YOU think it means. Whenever what you think it means and what the language standard says it means disagree, what YOU think is WRONG.

I don't know how I can possibly say this any clearer than I have. You are a professor of CS who teaches C programming to others, and it's just incomprehensible to me that you don't accept or apparently even understand this. I'm sorry to be the bearer of bad news here.

Note that it doesn't mean every C programmer has to memorize the standard, or even have read it before (although that might be worthwhile if they intend to work on anything serious written in C). Most C programmers probably haven't, and yet they are able to write correct or nearly-correct programs that work reasonably well in practice. But it does mean that if they labor under a misconception of how a language feature works, they will sometimes get burned. The compiler doesn't care what YOU think the code means, its "mind reading skills" are even worse than mine. It only cares about the semantics spelled out in the standard.

bob wrote:
wgarvin wrote: Since it is allegedly a C program, that means it is written using C syntax and its semantics are specified in the C language standard. And if that program invokes undefined behavior, the C language does not assign to it any semantics at all. It's not even guaranteed to execute the other parts correctly up to the undefined behavior; the entire execution is undefined.

And that is a bogus concept. Here's why.

[--some stuff snipped--]

Inconsistent. Even if someone knows the above possibilities, they might overlook the trickiness of the compiler and allow it to discern the value of i and wreck the code.

That is the part I disagree with. Since everyone constantly adds and such with signed integers, it is always possible that some of those adds will overflow. Yet the compiler just does the right thing. While if you happen to use a constant, it can wreck the code completely.

If you like that behavior, fine. I simply do not. As an old-school compiler person, that is NOT what we would want to produce from a compiler.

I know you aren't happy that the standard completely punts as soon as any UB is introduced into the execution, and thus it's a crappy standard etc., but that's the language we've got. And large and useful applications are still written in it, and most of them work fine despite these annoying shortcomings of the language standard.

Sure, it could probably be improved, but even with 191+ flavors of undefined behavior in it, C is still one of the most useful programming languages around.
C is certainly useful, but this kind of nonsense is making it less useful. Because the compiler guys are taking great liberties with the term "undefined behavior". They catch some of it, ignore some of it, and completely break some of it. Consistency would be nice.

I don't want to get into a game of wits with the compiler...

Too late! The moment you allowed UB to slip undetected into your program, you were at its mercy.

But yeah, if you don't want to get into a "game of wits" with the compiler, then there's only one main thing you need to do, which is to be aware of the common types of UB and avoid them like the plague. Then you will have a legal C program with the semantics guaranteed by the standard, and everything will be fine. Crafty probably is such a program today, or nearly so.
