strcpy() revisited

bnemias
Posts: 373
Joined: Thu Aug 14, 2008 3:21 am
Location: Albuquerque, NM

Re: strcpy() revisited

Post by bnemias »

bob wrote:I didn't post it here for Apple's developers. I posted it here because it represented a change in behavior that has not been changed in 20 years, it was only on one system (Mavericks), it took a good bit of time to track down since it was not obvious that it was a mavericks issue at first. I thought it might benefit others that saw something similar and they could save some time.
Fair enough. I'd say you accomplished that just fine with one post in the first thread.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

wgarvin wrote:
bob wrote:
wgarvin wrote:
bob wrote:
syzygy wrote:The C you supplied does not have semantics.

If you mean it is "hyatt C" and not C, then you should use a "hyatt C" compiler.
Do you know the definition of "semantics"? Apparently not.
You must be using the word differently from us. "Semantics" are the meaning of your program.
What, EXACTLY, did I say? First read your statement above. "Semantics are the meaning of your program." I stated "I supply the semantics in the form of a C source."
Yes, my statement was correct, but yours is not. You supply the C source. The C language standard supplies the semantics, and the C implementation (compiler, libraries, pthreads etc) may add some extensions of its own. Your C code means exactly what the compiler thinks it means. It doesn't mean what YOU think it means. Whenever what you think it means and what the language standard says it means disagree, what YOU think is WRONG.

I don't know how I can possibly say this any more clearly than I have. You are a professor of CS who teaches C programming to others, and it's just incomprehensible to me that you don't accept or apparently even understand this. I'm sorry to be the bearer of bad news here.
Just to clarify my thoughts, which have been repeatedly twisted, I initially reported a problem I had found that related to UB in a strcpy(). I did not tell anyone to use that sort of UB. I simply wrote "it has been working, and it was changed for absolutely no reason, and all the change did was cause the program to silently abort." I don't think that is reasonable compiler or library development. Never have I suggested that anyone USE UB. I pointed out that everyone tolerates it due to signed arithmetic that can overflow without the compiler noticing. Which is exactly what I want it to do. Do what I wrote... do the best thing. Until the compiler steps over the line and says "I can't do anything about the possible overflow over THERE because I can't confirm it happens at compile time, but over HERE I can peek at a value and notice that it will overflow, and I can break that to get away with a cute but not very important optimization."

That has been my position in all of this the whole time. Hasn't changed one bit. If something is on the ragged edge of the language spec, and I use it whether by accident or intentionally, and I get burned due to a change, fine.

For example, I have, more than once, misused one or more of these by accident, writing code in a hurry:

volatile int *p;
int * volatile p;
volatile int * volatile p;

In threads, they are often interchangeable, since they force the compiler's hand in optimizing, and either of the first two might give it enough trouble that it has to do the right thing and your code works. Until the compiler guys get more registers (as in x86_64), when suddenly a timing hole opens. That happened to me quite a few years back. I can deal with that. An accidental bug caused by me, exposed by a better compiler, OK. Except that Apple didn't make anything any better. In fact, it is quite a mess.
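As a rough illustration of what each of the three declarations above actually qualifies, here is a minimal sketch. All names (counter, shared_ptr, shared_both and the three functions) are made up for this example and are not from Crafty. Note also that volatile only constrains how the compiler treats the accesses; it is not a thread-synchronization primitive, so this shows the declaration semantics only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value that some other thread might modify. */
int counter = 5;

/* 1. volatile int *p: the POINTEE is volatile.
 *    Every *p must be a real load; p itself may live in a register. */
int read_pointee_twice(volatile int *p)
{
    assert(p != NULL);
    int first = *p;    /* real load */
    int second = *p;   /* second real load, not merged with the first */
    return first + second;
}

/* 2. int * volatile p: the POINTER is volatile.
 *    The pointer value is reloaded on each use, but the int it points to
 *    is an ordinary object whose value the compiler may cache. */
int * volatile shared_ptr = &counter;

int read_via_volatile_pointer(void)
{
    int *p = shared_ptr;   /* real load of the pointer itself */
    return *p;             /* ordinary (cacheable) load of the int */
}

/* 3. volatile int * volatile p: both the pointer and the pointee are volatile. */
volatile int * volatile shared_both = &counter;

int read_fully_volatile(void)
{
    return *shared_both;   /* real load of the pointer, then real load of the int */
}
```

Mixing up forms 1 and 2 compiles without complaint, which is why the mistake can stay hidden until a compiler with more registers starts caching the unqualified part.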

Just look at the library vs built-in mess that was exposed this past week. One overlap is caught by the library, the other is optimized away by the compiler's built-in strcpy() stuff. We got N different sorts of behavior depending on who did what to whom. That's not so hot.
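For anyone who runs into the same overlap problem, a minimal sketch of the usual workaround, assuming the goal is to shift a string left inside its own buffer. strcpy() with overlapping source and destination is undefined, while memmove() is specified to handle overlap. The helper name shift_left is hypothetical and is not taken from Crafty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove the first n characters of the NUL-terminated string in buf,
 * moving the tail down in place. Source and destination overlap, so
 * memmove() is used instead of strcpy(buf, buf + n). */
void shift_left(char *buf, size_t n)
{
    assert(buf != NULL);
    size_t len = strlen(buf);
    if (n > len)
        n = len;                          /* clamp: result is the empty string */
    memmove(buf, buf + n, len - n + 1);   /* +1 copies the terminating NUL */
}
```

Called as shift_left(line, 6) on a buffer holding "hello world", this leaves "world" in the buffer, and it behaves the same whether the library or a compiler built-in ends up doing the copy.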





Note that it doesn't mean every C programmer has to memorize the standard, or even have read it before (although that might be worthwhile if they intend to work on anything serious written in C). Most C programmers probably haven't, and yet they are able to write correct or nearly-correct programs that work reasonably well in practice. But it does mean that if they labor under a misconception of how a language feature works, they will sometimes get burned. The compiler doesn't care what YOU think the code means; its "mind reading skills" are even worse than mine. It only cares about the semantics spelled out in the standard.
bob wrote:
wgarvin wrote: Since it is allegedly a C program, that means it is written using C syntax and its semantics are specified in the C language standard. And if that program invokes undefined behavior, the C language does not assign to it any semantics at all. It's not even guaranteed to execute the other parts correctly up to the undefined behavior; the entire execution is undefined.

And that is a bogus concept. Here's why.

[--some stuff snipped--]

Inconsistent. Even if someone knows the above possibilities, they might overlook the trickiness of the compiler and allow it to discern the value of i and wreck the code.

That is the part I disagree with. Since everyone constantly adds and such with signed integers, it is always possible that some of those adds will overflow. Yet the compiler just does the right thing. While if you happen to use a constant, it can wreck the code completely.

If you like that behavior, fine. I simply do not. As an old-school compiler person, that is NOT what we would want to produce from a compiler.


I know you aren't happy that the standard completely punts as soon as any UB is introduced into the execution, and thus it's a crappy standard etc., but that's the language we've got. And large and useful applications are still written in it, and most of them work fine despite these annoying shortcomings of the language standard.
My point was/is, however, that C has not always behaved quite like that. It has generally "done what I wrote" as opposed to introducing some optimizations that I would call unsafe at best. Sort of like simple constant-folding in addition. If I write this:

x += constant + constant

it can be turned into

x += constant2, where constant2 is equivalent to constant + constant. That can introduce an unintended side effect that most would probably not think about. For example, suppose x = -2000000000, right on the edge of wrapping. If I code

x += 0x40000000 + 0x40000000;

The compiler can wreck that, when the final result, using two additions, would be perfectly valid. But if I use two variables, where the compiler can't see the values at compile time, it would work just as I intended. Math to us is not math to the computer, because we don't have any overflow or wraparound. The compiler guys are trying to pretend that the wraparound doesn't exist, when it absolutely must in a fixed-length word. And they break some and let some work. That makes programming "interesting"... "Let's see what the compiler did to us today to cause this failure..."
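One detail worth spelling out for this example: in C, x += a + b is defined as x = x + (a + b), so the sum of the two right-hand operands is computed first whether or not the compiler folds it. To get the two-step result, the additions have to be written in that order. The sketch below assumes a 32-bit int; the function names add_would_overflow and add_in_steps are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if a + b would overflow int, 0 otherwise.
 * The check itself never performs an overflowing addition. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return 0;
}

/* Computes (x + a) + b in exactly that order, asserting that each
 * intermediate result stays in range, so no UB is ever executed. */
int add_in_steps(int x, int a, int b)
{
    assert(!add_would_overflow(x, a));
    x += a;
    assert(!add_would_overflow(x, b));
    x += b;
    return x;
}
```

With x = -2000000000 and both addends equal to 0x40000000, add_in_steps() stays in range at every step, whereas add_would_overflow(0x40000000, 0x40000000) reports that summing the two addends first does not fit in a 32-bit int.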




Sure, it could probably be improved, but even with 191+ flavors of undefined behavior in it, C is still one of the most useful programming languages around. :lol:
C is certainly useful, but this kind of nonsense is making it less useful. Because the compiler guys are taking great liberties with the term "undefined behavior". They catch some of it, ignore some of it, and completely break some of it. Consistency would be nice.

I don't want to get into a game of wits with the compiler...
Too late! The moment you allowed UB to slip undetected into your program, you were at its mercy. :lol:

But yeah, if you don't want to get into a "game of wits" with the compiler, then there's only one main thing you need to do, which is to be aware of the common types of UB and avoid them like the plague. Then you will have a legal C program with the semantics guaranteed by the standard, and everything will be fine. Crafty probably is such a program today, or nearly so.
So far as I know, Crafty is perfectly compliant, ignoring the potential overflows possible in any arithmetic operation. But since the compiler can't stick its nose into most of those, all is well. Doesn't mean there is no other UB left. It was never added intentionally, but anything can sneak in by accident. I always enable all warnings, although it seems that even gcc and clang don't quite agree on what is OK and what is not. Big surprise...
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: strcpy() revisited

Post by wgarvin »

bob wrote:Just to clarify my thoughts, which have been repeatedly twisted, I initially reported a problem I had found that related to UB in a strcpy(). I did not tell anyone to use that sort of UB. I simply wrote "it has been working, and it was changed for absolutely no reason, and all the change did was cause the program to silently abort." I don't think that is reasonable compiler or library development. Never have I suggested that anyone USE UB. I pointed out that everyone tolerates it due to signed arithmetic that can overflow without the compiler noticing. Which is exactly what I want it to do. Do what I wrote... do the best thing. Until the compiler steps over the line and says "I can't do anything about the possible overflow over THERE because I can't confirm it happens at compile time, but over HERE I can peek at a value and notice that it will overflow, and I can break that to get away with a cute but not very important optimization."

That has been my position in all of this the whole time. Hasn't changed one bit. If something is on the ragged edge of the language spec, and I use it whether by accident or intentionally, and I get burned due to a change, fine.
I admit that everything in this post seems entirely reasonable to me. I'm happy to let this thread die now because I think plenty of interesting ground was covered, but everyone's tired of it and it's time to move on. I feel like I might have kept prolonging it with some of my posts. When a lot of imprecise statements are thrown around, it's pretty easy to find fault with some of them.

Bob, if you feel I've twisted your responses around, or been rude at any point during these discussions, then I apologise for that. I do have lots of respect for your work with Crafty and your contributions to the community.