Compiler Problem

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Compiler Problem

Post by bob »

Dann Corbit wrote:
Gian-Carlo Pascutto wrote:
Bo Persson wrote:C and C++ run fine, because those languages don't require a specific format.
Don't they require float and double to be IEEE compliant?
No.
Consider, for instance, that some Cray supercomputers do not use IEEE float, and older systems like the CDC Cyber do not either. If the language required IEEE, it would not be possible to write an efficient C compiler for such machines.
Or the DEC VAX. Or early IBM mainframes (I suspect they eventually went IEEE, but I have not run on one since the /360 days). There used to be a ton of incompatible FP formats; no idea why everyone couldn't agree back then...
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Compiler Problem

Post by wgarvin »

Bo Persson wrote:
wgarvin wrote:On the other hand, sometimes it's the language that's the problem, not the compiler.

Some things that programmers like to rely on (such as 2's complement integer overflow) are actually in undefined-behaviour territory, and optimized versions of the code might not do what you expect.
If the language doesn't even require 2's complement, how could it define the result of it overflowing? :-)
I think they actually did define what happens when unsigned integers overflow or underflow. Signed ones, though, are undefined, which means optimizing compilers are allowed to assume "this can never happen", and thus they are allowed to generate code that does all sorts of weird things when it does happen. It gets worse than that, too... those slides I linked are entertaining if you've never seen them before. They highlight the problem that as optimizations get more aggressive, it gets harder and harder for a programmer (or auditor) to verify what the program actually does just by looking at its source code. To write secure code, you might need to limit yourself to -O2 and/or use various arcane compiler options to mitigate the weirdness.
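
To make that concrete, here's a minimal sketch (the function name is mine) of the kind of test an optimizer is entitled to throw away:

Code:

#include <limits.h>
#include <stdio.h>

/* Looks like an overflow check, but x + 1 > x is provably true
   whenever signed overflow doesn't happen, and the compiler is
   allowed to assume it never happens. */
static int add_one_is_safe(int x)
{
    return x + 1 > x;   /* undefined behaviour when x == INT_MAX */
}

int main(void)
{
    /* An optimizing build may print 1 here; an unoptimized build on
       typical 2's complement hardware prints 0 because the add wraps. */
    printf("%d\n", add_one_is_safe(INT_MAX));
    return 0;
}
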
Bo Persson wrote:There is a basic idea of implementability behind most of C's and C++'s undefined behavior. The implementation can use whatever the hardware does.

Other languages with a very well-defined behavior - like Java - are actually less portable because of that. :-)
That was the original idea, yes. However, compiler writers have perverted it into "if you step into undefined territory, my compiler can do whatever the hell it wants". So now you have compilers optimizing out safety checks for buffer overruns, because the compiler can prove that the check could only fail if you had already invoked undefined behaviour, and therefore it is allowed to treat the safety check as dead code and discard it completely!
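
Here's a sketch of the pattern (the names are mine, but this is the kind of thing those slides show):

Code:

#include <stddef.h>

/* A bounds check written with pointer arithmetic.  If buf + len
   wraps around the address space, undefined behaviour has already
   occurred, so the optimizer may assume it cannot, conclude the
   first test is always false, and silently delete it. */
int request_fits(char *buf, size_t len, char *buf_end)
{
    if (buf + len < buf)        /* intended overflow check... */
        return 0;               /* ...which may be optimized away */
    return buf + len <= buf_end;
}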

Even if this flexibility makes the generated code 1.3% faster on some benchmark or other, it also introduces new kinds of problems that never existed before, which programmers then have to understand and avoid. Maybe it's worth it, but it has a little tinge of inmates-running-the-asylum to it.

And for most types of software, it's probably not worth it.
Bo Persson wrote:For example, you need extra hardware to run Java efficiently on an IBM mainframe, because the floating point format used there since the 1960s didn't match the language spec. C and C++ run fine, because those languages don't require a specific format.
Sure, but maybe it will help convince the last few hardware vendors to get with the program and implement IEEE floating point like everybody else. Then we can all have languages that rely on that behaviour to guarantee us useful properties about our programs.

One of the most annoying things about undefined behaviour in C/C++ is that if you looked at modern machines (say, all those made in the past 20 years) you could define a lot of the things that are undefined, and it would still be a highly efficient language on all of those machines. You could require 2's complement arithmetic, specify 2's complement wrapping of signed ints (no trapping on ints), require char to be 8 or 16 bits and require all integral and floating point types to be a multiple of 8 bits, specify one of two behaviours for shift of more than N bits, and so on.

These are things that nearly everybody assumes anyway, when they're just trying to get some code to work. Even "portable" programs often contain assumptions of this sort. People memset arrays of pointers with zero, they assume that all pointers to 32-bit types are aligned and store extra flags in the bottom bits of the pointers, they perform integer math on pointers followed by invalid casts for type punning, they perform boundary tests that involve an actual overflow in the computation, they read from an aligned volatile int variable and assume the read was atomic, and probably lots of other stuff I've forgotten about.

Anyway, even C is a language with some sharp edges to it. C++ is like a whole kitchen full of sharp objects. Use with care!
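
To spell one of those out, here's the pointer-tagging trick as I'd sketch it (helper names are mine; nothing here is guaranteed by the standard, it just works on ordinary hardware):

Code:

#include <assert.h>
#include <stdint.h>

/* Stash a one-bit flag in the low bit of an int pointer.  This
   relies on int pointers being at least 2-byte aligned and on
   uintptr_t round-tripping the pointer's bits; both hold on
   ordinary hardware, neither is promised by the C standard. */
static inline int *ptr_with_flag(int *p, int flag)
{
    assert(((uintptr_t)p & 1u) == 0);
    return (int *)((uintptr_t)p | (uintptr_t)(flag & 1));
}

static inline int *ptr_without_flag(int *p)
{
    return (int *)((uintptr_t)p & ~(uintptr_t)1);
}

static inline int ptr_flag(int *p)
{
    return (int)((uintptr_t)p & 1u);
}
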
User avatar
Bo Persson
Posts: 243
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: Compiler Problem

Post by Bo Persson »

wgarvin wrote:
Bo Persson wrote:For example, you need extra hardware to run Java efficiently on an IBM mainframe, because the floating point format used there since the 1960s didn't match the language spec. C and C++ run fine, because those languages don't require a specific format.
Sure, but maybe it will help convince the last few hardware vendors to get with the program and implement IEEE floating point like everybody else. Then we can all have languages that rely on that behaviour to guarantee us useful properties about our programs.

One of the most annoying things about undefined behaviour in C/C++ is that if you looked at modern machines (say, all those made in the past 20 years) you could define a lot of the things that are undefined, and it would still be a highly efficient language on all of those machines. You could require 2's complement arithmetic, specify 2's complement wrapping of signed ints (no trapping on ints), require char to be 8 or 16 bits and require all integral and floating point types to be a multiple of 8 bits, specify one of two behaviours for shift of more than N bits, and so on.

These are things that nearly everybody assumes anyway, when they're just trying to get some code to work. Even "portable" programs often contain assumptions of this sort. People memset arrays of pointers with zero, they assume that all pointers to 32-bit types are aligned and store extra flags in the bottom bits of the pointers, they perform integer math on pointers followed by invalid casts for type punning, they perform boundary tests that involve an actual overflow in the computation, they read from an aligned volatile int variable and assume the read was atomic, and probably lots of other stuff I've forgotten about.

Anyway, even C is a language with some sharp edges to it. C++ is like a whole kitchen full of sharp objects. Use with care!
How would it convince IBM to abandon the format they and their customers have been using for 50 years? Or Unisys, which even makes 36-bit 1's complement hardware with 72-bit floating point! And 9-bit bytes. :-)

http://ecommunity.unisys.com/ecommunity ... 20Products

C and C++ are designed so that they can run natively on these machines; Java cannot!

If you want to write your programs for a certain set of hardware, you are free to do so, but why would we limit the languages to that set?

Shifting by more bits than the width of an int actually changed behavior between different generations of x86 hardware. That means identical binary code will get different results on different hardware. Some languages leave the result undefined, so the hardware's behavior shows through; others require a specific behavior, which limits the number of possible targets.
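
As a small illustration, assuming a 32-bit int (what you get depends on the compiler and the chip, which is exactly the point):

Code:

#include <stdio.h>

int main(void)
{
    volatile unsigned count = 32;  /* volatile keeps the compiler from folding it */
    /* Undefined in C: the shift count equals the width of unsigned int.
       The 8086 shifted by the full count and produced 0; the 80286 and
       later mask the count to 5 bits, which tends to give 1. */
    printf("%u\n", 1u << count);
    return 0;
}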

Which do you prefer? What if C had specified the behavior of the then-current hardware, and been incompatible with the present?

IBM solves the Java problem by selling hardware add-ons for their mainframes. Only $100k a pop!

http://www-03.ibm.com/systems/z/hardwar ... index.html

I like my tools to be sharp. You would not take the nail gun away from the carpenter just because his apprentice could hurt himself. Having dull knives in the kitchen doesn't help either.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Compiler Problem

Post by wgarvin »

Ok, I'll bite. =)
Bo Persson wrote: How would it convince IBM to abandon the format they and their customers have been using for 50 years? Or Unisys, which even makes 36-bit 1's complement hardware with 72-bit floating point! And 9-bit bytes. :-)

http://ecommunity.unisys.com/ecommunity ... 20Products

C and C++ are designed so that they can run natively on these machines; Java cannot!

If you want to write your programs for a certain set of hardware, you are free to do so, but why would we limit the languages to that set?
Obviously IBM would not break compatibility with their old hardware, but they also design new hardware: hardware that uses IEEE floats like everyone else, and on which Java can be compiled efficiently. Anyway, most of the software that gets written does not need the, um, flexibility offered by all of the undefined behaviour in the C/C++ standards. Most software will only ever run on commodity hardware (desktops or cheap racks of servers), and nearly all commodity hardware shares features that C/C++ decline to assume: 8-bit bytes, power-of-two register sizes, IEEE floating point, 2- and 4-way SIMD instruction sets, etc.

C and C++ are the one-size-fits-all languages, which is fine. But narrowing them so they don't have to fit *every* weird bit of hardware out there would make them more useful for the 95% case. That goes against their philosophy, though.

What bugs me is that most of the code written in C or C++ will NOT ever run on a weird IBM mainframe that doesn't support IEEE floating point. It will NOT run on some little DSP where a char is 32 bits. It will NOT run in some weird virtualized environment where the NULL pointer has a binary representation that isn't all zeros. And the programmers writing that code have to remember that the "de facto" standard features of all of their target platforms are not "standard" at all according to the spec, and that if they try to use those features, their compiler might do anything it wants.
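
To make the NULL example concrete (a minimal sketch):

Code:

#include <string.h>

/* Clearing a table of pointers with memset assumes a null pointer is
   represented as all-zero bits.  That holds on every platform this
   code will plausibly run on, but the standard allows a null pointer
   whose representation is not all zeros, in which case this fills
   the table with garbage rather than NULLs. */
void clear_table(void *table[], size_t n)
{
    memset(table, 0, n * sizeof table[0]);
}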
Shifting by more bits than the width of an int actually changed behavior between different generations of x86 hardware. That means identical binary code will get different results on different hardware. Some languages leave the result undefined, so the hardware's behavior shows through; others require a specific behavior, which limits the number of possible targets.
IIRC that change happened more than 20 years ago. It gets used in code samples that detect the type of processor you're running on; at least it used to, back when people still bothered to check for those old CPUs.
Bo Persson wrote:Which do you prefer? What if C had specified the behavior of the then-current hardware, and been incompatible with the present?
Then things would have been different. Hardware vendors would have standardized on the behaviour that C specified, using 100 extra transistors somewhere. /shrug. Instead, the standard left it undefined, and everyone who wrote code that relied on it wrote programs whose behaviour was undefined. Fortunately, it took compiler vendors a long time to get around to exploiting that, or all those programs would have mysteriously stopped working at high optimization levels.

Anyway, they didn't specify the behaviour back then, so who cares? I'm talking about *now*, and about how useful it would be to have a subset, or a new language, or an annex to the spec that addresses the common commodity hardware platforms that 95% of the software written in C or C++ actually runs on. I don't mean to get into flamewar territory, but I really hope to someday see a smaller, simpler compiled language a la C-with-classes that specifies most of the things left undefined by C and C++. I would like to be able to write a line of code that relies on 2's complement integer overflow without casting to unsigned and back.
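
For the record, the cast dance I mean looks like this (a sketch using the exact-width types from <stdint.h>):

Code:

#include <stdint.h>

/* Today's portable spelling of a wrapping add: do the arithmetic in
   unsigned, where wraparound is fully defined, then convert back.
   Converting an out-of-range value to a signed type is
   implementation-defined rather than undefined, and every mainstream
   compiler does the obvious 2's complement thing. */
static inline int32_t wrapping_add(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}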
Bo Persson wrote:I like my tools to be sharp. You would not take the nail gun away from the carpenter just because his apprentice could hurt himself. Having dull knives in the kitchen doesn't help either.
I like my tools to be useful and easy to use. C and C++ can get the job done, but I would not say they are easy to use for large projects. It's too bad there's been so little evolution of natively-compiled languages in the past 20 years, except maybe for the D language, and it hasn't really gone mainstream.

Okay, full disclosure time... What I wish I had is not actually some sort of official "C subset", because C still contains too much baggage for my taste (and C++ is about 90% baggage; C++'s big mistake at the beginning was trying to be source-compatible with C). We all use our own unofficial subset anyway. I guess there's nothing wrong with baggage, except that most programmers end up not using those features and not being that familiar with them. Nobody uses all of C++; everybody uses a different subset of it. Then they have to debug and maintain each other's code, and all that baggage gets in the way. From my point of view, exceptions, RTTI and virtual base classes are all 100% baggage; nobody in the game industry uses them. On some of our target platforms, the compilers we are required to use don't even support these things properly. I know somebody somewhere uses them for something, but that doesn't make them worth having. The "try to please everybody" stance of the committee has made the C++ language into a giant mess. C99 is not as bad, but it still contains some bloat (e.g. variable-length arrays).

What I really want is to see a new natively-compiled language emerge that can be an alternative to C and C++ for mainstream medium-to-large codebases. A language which doesn't try to please everybody, or chase after application domains where Java or Python or Ruby is already a better choice (as D does). A language with simple and clean semantics, a minimal amount of mandatory runtime library code, no mandatory garbage collector, efficient tool support (fast incremental compilation and code indexing, hot code replace, etc.), better metaprogramming facilities, and so on. I want it specifically for writing game engines, but the language I wish for would be general enough to use in a lot of other domains too. It would not be a better choice for business software than, say, Java. But it might be a better choice than C or C++ in some niches where they are currently the only usable choice.
Ron Murawski
Posts: 397
Joined: Sun Oct 29, 2006 4:38 am
Location: Schenectady, NY

Re: Compiler Problem

Post by Ron Murawski »

wgarvin wrote:
What I really want is to see a new natively-compiled language emerge that can be an alternative to C and C++ for mainstream medium-to-large codebases. A language which doesn't try to please everybody, or chase after application domains where Java or Python or Ruby is already a better choice (as D does). A language with simple and clean semantics, a minimal amount of mandatory runtime library code, no mandatory garbage collector, efficient tool support (fast incremental compilation and code indexing, hot code replace, etc.), better metaprogramming facilities, and so on. I want it specifically for writing game engines, but the language I wish for would be general enough to use in a lot of other domains too. It would not be a better choice for business software than, say, Java. But it might be a better choice than C or C++ in some niches where they are currently the only usable choice.
Take a look at the Vala/Genie programming language. It doesn't fulfill *all*
your wishes, but it satisfies most of them. Maybe 90%.

The two languages are in lock-step with each other and share a common compiler.
Vala is for those who prefer C#/Java syntax; Genie is for those who like
Python/Boo/D syntax. The Vala/Genie compiler 'valac' is, at the moment, a
front-end: Vala/Genie code is translated into C and then compiled by gcc.

The Vala/Genie project is intimately tied to the Gnome C libraries. That might
sound limiting, but those libraries have been ported to most platforms, and the
development system does *not* require the Gnome desktop; most programs need
only the GLib library. There is a tool to create Vala bindings for other C
libraries, plus facilities to make native calls to the operating system for
things like time-of-day, uptime, and system timers. You can also generate C
files from your Vala/Genie code and integrate them with other C code.

The language documentation is a bit sparse, but the mailing list is active and
helpful. I'm very impressed with the language so far: the error-checking is
quite a bit better than C's, and there's an arsenal of high-level constructs
that makes programming easier.

Vala: http://live.gnome.org/Vala
http://en.wikipedia.org/wiki/Vala_%28pr ... anguage%29

Genie: http://live.gnome.org/Genie
http://en.wikipedia.org/wiki/Genie_%28p ... anguage%29

I'm evaluating Genie as a possible chess-programming language. I spent the
weekend porting Pradu's magic bitboards from C to Genie. The code seems to be
working just fine with equivalent performance.


Ron
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Compiler Problem

Post by wgarvin »

Ron Murawski wrote: Take a look at the Vala/Genie programming language. It doesn't fulfill *all*
your wishes, but it satisfies most of them. Maybe 90%.

The two languages are in lock-step with each other and share a common compiler.
Vala is for those who prefer C#/Java syntax; Genie is for those who like
Python/Boo/D syntax. The Vala/Genie compiler 'valac' is, at the moment, a
front-end: Vala/Genie code is translated into C and then compiled by gcc.
Thanks Ron! At a glance, it doesn't look like it would be very suitable for writing game engines, but I will definitely read more about it.