strcpy() revisited

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: strcpy() revisited

Post by hgm »

wgarvin wrote:
hgm wrote:Except that the programs that really suffered buffer overflow because of this already had been fixed long before. After all, doing a strcpy(a, b) as while(*a++ = *b++); can only end in two ways: either it works perfectly, or a lies within the sting b, and it will lead to an infinite repetitive string, certainly causing a segfault.

No buffer overflow would ever be caught by this measure that would not have segfaulted by itself.

It only makes a difference for the harmless cases, that worked absolutely correctly.
This post by Dann Corbit in the openchess thread of 2 weeks ago seems to me to be evidence that overlapping calls to strcpy can malfunction without necessarily causing a segfault.

Code: Select all

GCC gave me this:

dcorbit@dcorbit /q/cc
$ cat bozo.c
#include <string.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
char b[32];
strcpy(b, "123456789012345");
strcpy(b + 1, b);
printf("[%s]\n", b);
return 0;
}

dcorbit@dcorbit /q/cc
$ gcc -Wall -ansi -pedantic bozo.c

dcorbit@dcorbit /q/cc
$ ./a
[1123456788012345]

Look at it carefully, is it what you expected?
Well, this then obviously must be a version of strcpy(a,b) that does NOT use the while(*a++=*b++);implementation my statement clearly referred to. The output posted by Bob is even weirder, btw, as the later printf does print strings that do not match with their strlen().

Still there is nothing to weaken my criticism. No matter how they implemented their strcpy(), it must test for overlap to exhibit the abort behavior in case there is overlap. Given that it does make this test, it could without any extra performance loss have made it work for any overlap, simply reversing copy direction when the overlap was detected that otherwise would cause a buffer overrun. If they felt that a safeguard was needed because of their unusual implementation (locating the end and copying back to front) might cause crashes for the opposite case as people are used to, it is still malicious to let it abort in stead. Arguments that this is preferable over buffer overruns are just beside the point: there would not have to be any buffer overruns ever, without any performance loss. They could have made it such that strcpy would make a faithful copy for any kind of overlap, for free. In stead they decided to make the program abort on cases that on most other compilers would work, and thus would be very likely to be around in abundance. They found a loophole that allowed them to inflict great damage on their customers and software suppliers while hiding behind the standard, and jumped to the occasion.

I think it would be pretty naive to think that they would have to do this only for a transition period, until 'everyone has fixed his code', and then switch back to a faster implementation that again allows the buffer overruns. As soon as they would stop enforcing this, things would of course degenerate into chaos again very quickly.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: strcpy() revisited

Post by mvk »

bob wrote:
wgarvin wrote:
hgm wrote:Except that the programs that really suffered buffer overflow because of this already had been fixed long before. After all, doing a strcpy(a, b) as while(*a++ = *b++); can only end in two ways: either it works perfectly, or a lies within the sting b, and it will lead to an infinite repetitive string, certainly causing a segfault.

No buffer overflow would ever be caught by this measure that would not have segfaulted by itself.

It only makes a difference for the harmless cases, that worked absolutely correctly.
This post by Dann Corbit in the openchess thread of 2 weeks ago seems to me to be evidence that overlapping calls to strcpy can malfunction without necessarily causing a segfault.

Code: Select all

GCC gave me this:

dcorbit@dcorbit /q/cc
$ cat bozo.c
#include <string.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
char b[32];
strcpy(b, "123456789012345");
strcpy(b + 1, b);
printf("[%s]\n", b);
return 0;
}

dcorbit@dcorbit /q/cc
$ gcc -Wall -ansi -pedantic bozo.c

dcorbit@dcorbit /q/cc
$ ./a
[1123456788012345]

Look at it carefully, is it what you expected?
That's because of their hand-coded/optimized code that does different things depending on how many bytes you copy. It is quirky. And you want to know what is REALLY funny? this damned mavericks library did NOT detect that overlap. :)

Now isn't that absolutely-frickin' amazing? Here is the code:

#include <string.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
char b[64];
int i;

printf("overlap pass 1\n");
for (i=1;i<20;i++) {
strcpy(b, "123456789012345");
strcpy(b + i, b);
printf("(%d) [%s] strlen=%d\n", i, b, strlen(b));
}

printf("overlap pass 2\n");
for (i=1;i<20;i++) {
strcpy(b, "123456789012345");
strcpy(b, b + i);
printf("(%d) [%s] strlen=%d\n", i, b, strlen(b));
}
return 0;
}

Here is the output on mavericks:

scrappy% ./tst2
overlap pass 1
(1) [1123456788012345] strlen=15
(2) [12123456787812345] strlen=15
(3) [123123456786782345] strlen=15
(4) [1234123456785678345] strlen=15
(5) [12345123456784567845] strlen=15
(6) [123456123456783456785] strlen=15
(7) [1234567123456782345678] strlen=15
(8) [123456781234567812345678] strlen=15
(9) [1234567891234567891234567] strlen=15
(10) [12345678901234567890123456] strlen=15
(11) [123456789011234567890112345] strlen=15
(12) [1234567890121234567890121234] strlen=15
(13) [12345678901231234567890123123] strlen=15
(14) [123456789012341234567890123412] strlen=15
(15) [1234567890123451234567890123451] strlen=15
(16) [123456789012345] strlen=15
(17) [123456789012345] strlen=15
(18) [123456789012345] strlen=15
(19) [123456789012345] strlen=15
overlap pass 2
Abort


So isn't that absolutely-frickin' wonderful? They abort on the overlap that is perfectly safe, they ignore the one that causes the problems. What a WONDERFUL group of library folks, wouldn't you agree? They couldn't even break the most dangerous case.

wow..
The checks depend on the -O level.

No -O --> pass 1 aborts. I didn't expect that because -D_FORTIFY_SOURCE requires -O or higher to kick in. Well, apparently not.

-O --> pass 1 runs, pass 2 aborts. inconsistent. -S shows that in the first loop there is no strcpy call produced, and in the second there is.

-O2 --> pass 1 runs with wrong output(!). pass 2 aborts

Have you noted the reported 'strlen' in your output stays at 15 for the first iteration? Some strcpy-induced UB is kicking in again in the form of an optimisation (strlen call got replaced by constant 15 under -O2).

Looking at the assembly, the whole strcpy is never present in the first loop. The reason is very funny: it recognizes that your toy string, including \0, is 16 bytes long. It then replaces the whole thing with quad-word move instructions. If you replace your string with "01234567890123456", so one char extra, it aborts with every optimisation level.

(Let's not start about formatting size_t with %d.)
[Account deleted]
User avatar
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: strcpy() revisited

Post by hgm »

Interesting. It means it makes an assumption on what strcpy does. While in principle the library it linked with could have contained anything. strcpy is not a reserved C keyword. Is this a result of 'link-time optimization'? Or do the inculded header files tell it somehow that strcpy can only be the thing in clib, and that clib can only be the Apple implementation of it?
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: strcpy() revisited

Post by mvk »

hgm wrote:Interesting. It means it makes an assumption on what strcpy does. While in principle the library it linked with could have contained anything. strcpy is not a reserved C keyword. Is this a result of 'link-time optimization'? Or do the inculded header files tell it somehow that strcpy can only be the thing in clib, and that clib can only be the Apple implementation of it?
Once you say "#include <string.h>" the compiler can know the semantics of strcpy and strlen in the code that follows, because <string.h> is standardised. There is no obligation to implement <string.h> with a file for example. (if you want that, you have to say #include "string.h").
Last edited by mvk on Wed Dec 11, 2013 11:55 pm, edited 1 time in total.
[Account deleted]
Rein Halbersma
Posts: 751
Joined: Tue May 22, 2007 11:13 am

Re: strcpy() revisited

Post by Rein Halbersma »

hgm wrote:Interesting. It means it makes an assumption on what strcpy does. While in principle the library it linked with could have contained anything. strcpy is not a reserved C keyword. Is this a result of 'link-time optimization'? Or do the inculded header files tell it somehow that strcpy can only be the thing in clib, and that clib can only be the Apple implementation of it?
do you really think the compiler has special case source code to detect Apple versions of strcpy? maybe if you ask gcc nicely, they'll add some of that special casing based on your preferred userID as well...
Rein Halbersma
Posts: 751
Joined: Tue May 22, 2007 11:13 am

Re: strcpy() revisited

Post by Rein Halbersma »

mvk wrote:
hgm wrote:Interesting. It means it makes an assumption on what strcpy does. While in principle the library it linked with could have contained anything. strcpy is not a reserved C keyword. Is this a result of 'link-time optimization'? Or do the inculded header files tell it somehow that strcpy can only be the thing in clib, and that clib can only be the Apple implementation of it?
Once you say "#include <string.h>" the compiler can know the semantics of strcpy and strlen in the code that follows, because <string.h> is standardised.
I think they will parse and compile <string.h> just like any other C file, with the exception that some -Wextra warnings can be suppressed since not all C library files compile cleanly at the warning.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

Rein Halbersma wrote:
hgm wrote:Interesting. It means it makes an assumption on what strcpy does. While in principle the library it linked with could have contained anything. strcpy is not a reserved C keyword. Is this a result of 'link-time optimization'? Or do the inculded header files tell it somehow that strcpy can only be the thing in clib, and that clib can only be the Apple implementation of it?
do you really think the compiler has special case source code to detect Apple versions of strcpy? maybe if you ask gcc nicely, they'll add some of that special casing based on your preferred userID as well...
Has to be somewhere outside gcc, because it doesn't fail on 4.7.3 on my office box. since string.h (/usr/include/string.h) only has the prototype for strcpy() it is apparently using the Apple library or shared library, I did not try to figure out which. My bin tools (which included objdump) disappeared with mavericks, I'll reinstall somewhere along the way, unless I get Linux installed first.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

hgm wrote:
wgarvin wrote:
hgm wrote:Except that the programs that really suffered buffer overflow because of this already had been fixed long before. After all, doing a strcpy(a, b) as while(*a++ = *b++); can only end in two ways: either it works perfectly, or a lies within the sting b, and it will lead to an infinite repetitive string, certainly causing a segfault.

No buffer overflow would ever be caught by this measure that would not have segfaulted by itself.

It only makes a difference for the harmless cases, that worked absolutely correctly.
This post by Dann Corbit in the openchess thread of 2 weeks ago seems to me to be evidence that overlapping calls to strcpy can malfunction without necessarily causing a segfault.

Code: Select all

GCC gave me this:

dcorbit@dcorbit /q/cc
$ cat bozo.c
#include <string.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
char b[32];
strcpy(b, "123456789012345");
strcpy(b + 1, b);
printf("[%s]\n", b);
return 0;
}

dcorbit@dcorbit /q/cc
$ gcc -Wall -ansi -pedantic bozo.c

dcorbit@dcorbit /q/cc
$ ./a
[1123456788012345]

Look at it carefully, is it what you expected?
Well, this then obviously must be a version of strcpy(a,b) that does NOT use the while(*a++=*b++);implementation my statement clearly referred to. The output posted by Bob is even weirder, btw, as the later printf does print strings that do not match with their strlen().

Still there is nothing to weaken my criticism. No matter how they implemented their strcpy(), it must test for overlap to exhibit the abort behavior in case there is overlap. Given that it does make this test, it could without any extra performance loss have made it work for any overlap, simply reversing copy direction when the overlap was detected that otherwise would cause a buffer overrun. If they felt that a safeguard was needed because of their unusual implementation (locating the end and copying back to front) might cause crashes for the opposite case as people are used to, it is still malicious to let it abort in stead. Arguments that this is preferable over buffer overruns are just beside the point: there would not have to be any buffer overruns ever, without any performance loss. They could have made it such that strcpy would make a faithful copy for any kind of overlap, for free. In stead they decided to make the program abort on cases that on most other compilers would work, and thus would be very likely to be around in abundance. They found a loophole that allowed them to inflict great damage on their customers and software suppliers while hiding behind the standard, and jumped to the occasion.

I think it would be pretty naive to think that they would have to do this only for a transition period, until 'everyone has fixed his code', and then switch back to a faster implementation that again allows the buffer overruns. As soon as they would stop enforcing this, things would of course degenerate into chaos again very quickly.
I think you are making an enormous mistake here. Assuming ANY rational thought of any kind is clearly completely unsupported by any evidence given in the test code from Corbitt that shows that Mavericks is simply broken in so many ways regarding strcpy().

BTW my gcc -S does show strcpy() calls, but not for the initialization, it uses MOVQ with 8 byte constants. It also ends up calling __strcpy_chk() rather than __strcpy(). Have to dig that one up...
syzygy
Posts: 5719
Joined: Tue Feb 28, 2012 11:56 pm

Re: strcpy() revisited

Post by syzygy »

bob wrote:
Rein Halbersma wrote:
hgm wrote:Interesting. It means it makes an assumption on what strcpy does. While in principle the library it linked with could have contained anything. strcpy is not a reserved C keyword. Is this a result of 'link-time optimization'? Or do the inculded header files tell it somehow that strcpy can only be the thing in clib, and that clib can only be the Apple implementation of it?
do you really think the compiler has special case source code to detect Apple versions of strcpy? maybe if you ask gcc nicely, they'll add some of that special casing based on your preferred userID as well...
Has to be somewhere outside gcc, because it doesn't fail on 4.7.3 on my office box. since string.h (/usr/include/string.h) only has the prototype for strcpy() it is apparently using the Apple library or shared library, I did not try to figure out which. My bin tools (which included objdump) disappeared with mavericks, I'll reinstall somewhere along the way, unless I get Linux installed first.
http://en.wikibooks.org/wiki/GNU_C_Comp ... chitecture
GCC built-in functions are the functions that are evaluated at compile time. For example, if the size argument of a strcpy() function is a constant then GCC replaces the function call with the required number of assignments. The compiler replaces standard library calls with built-in functions and then evaluates them once the function's AST is constructed. In case of strcpy(), the compiler checks the size argument and uses the optimized built-in version of strcpy() if the argument is constant.

Code: Select all

#include <stdio.h>

int main(int argc, char **argv)
{
  char *a = argv[1];
  char b[256];
  __builtin_strcpy(b, a);
  printf("strlen("%s") = %d\n", b, __builtin_strlen(a));
  return 0;
}
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: strcpy() revisited

Post by bob »

syzygy wrote:
bob wrote:
Rein Halbersma wrote:
hgm wrote:Interesting. It means it makes an assumption on what strcpy does. While in principle the library it linked with could have contained anything. strcpy is not a reserved C keyword. Is this a result of 'link-time optimization'? Or do the inculded header files tell it somehow that strcpy can only be the thing in clib, and that clib can only be the Apple implementation of it?
do you really think the compiler has special case source code to detect Apple versions of strcpy? maybe if you ask gcc nicely, they'll add some of that special casing based on your preferred userID as well...
Has to be somewhere outside gcc, because it doesn't fail on 4.7.3 on my office box. since string.h (/usr/include/string.h) only has the prototype for strcpy() it is apparently using the Apple library or shared library, I did not try to figure out which. My bin tools (which included objdump) disappeared with mavericks, I'll reinstall somewhere along the way, unless I get Linux installed first.
http://en.wikibooks.org/wiki/GNU_C_Comp ... chitecture
GCC built-in functions are the functions that are evaluated at compile time. For example, if the size argument of a strcpy() function is a constant then GCC replaces the function call with the required number of assignments. The compiler replaces standard library calls with built-in functions and then evaluates them once the function's AST is constructed. In case of strcpy(), the compiler checks the size argument and uses the optimized built-in version of strcpy() if the argument is constant.

Code: Select all

#include <stdio.h>

int main(int argc, char **argv)
{
  char *a = argv[1];
  char b[256];
  __builtin_strcpy(b, a);
  printf("strlen("%s") = %d\n", b, __builtin_strlen(a));
  return 0;
}
No idea what they are talking about there. There is no size argument to strcpy(). strcpy(dest, src) is all there is. The builtin strcpy() doesn't seem to be any problem, because this didn't change with the version of gcc, it changed with the version of the library distributed with Mavericks, and the Apple developers owned up to changing the library and even posted the strcpy() code somewhere in all the stuff I had read.