Questions about getting ready for multicore programming.

Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Questions about getting ready for multicore programming.

Post by Tord Romstad »

I never thought I would find myself defending C++, a language I detest. In this case, however, I still have to disagree with Carey.
Carey wrote:As for additional C++ features... although C++ can certainly allow you a certain amount of organization improvements and programmer productivity, it's not really capable of providing any higher performance than what C can.

In fact, due to the extra difficulty of optimizing C++ code, it's likely to be somewhat worse. Maybe not a lot, but at least a little. At best, no faster than C.
I think you are wrong. Different programming languages are optimized for different purposes. Some languages make it easy to write fast programs, but difficult and time-consuming to write 100% correct programs. Other languages make it easy to write correct and bug-free code, but more laborious to produce very fast programs. C occupies the extreme end of the easy-to-make-it-fast-but-hard-to-make-it-correct side of the spectrum. C++ is also close to this end of the spectrum, but is not quite as extreme.

C++ is not inherently slower than C, it just requires some more effort, knowledge and familiarity with the compiler to produce the fastest possible code with C++. With sufficient expertise, it should sometimes even be possible to achieve faster code with C++, because templates make it possible to do more computations and code generation at compile-time, and because stronger typing probably makes some types of optimization easier for the compiler.
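A minimal sketch of what compile-time code generation with templates can look like in a chess setting. The names (`Color`, `pawn_push`) and the 0..63 square numbering are assumptions made up for illustration, and `constexpr`/`static_assert` need a C++11 compiler:

```cpp
#include <cassert>

// Hypothetical names for illustration; not taken from any real engine.
enum Color { WHITE, BLACK };

// Assumption of this sketch: squares are numbered 0..63 with a1 = 0, h8 = 63.
// The side to move is a template parameter, so the compiler generates one
// specialised function per colour and the test on C is a compile-time constant.
template <Color C>
constexpr int pawn_push(int square) {
    return C == WHITE ? square + 8 : square - 8;
}

// Evaluated entirely by the compiler.
static_assert(pawn_push<WHITE>(12) == 20, "e2 -> e3");
static_assert(pawn_push<BLACK>(52) == 44, "e7 -> e6");

// The same template also accepts run-time arguments.
inline void check_pawn_push(int square) {
    assert(pawn_push<WHITE>(square) - pawn_push<BLACK>(square) == 16);
}
```

In plain C the usual alternatives would be a run-time `if` on the colour or two hand-written copies of the function.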
It's also not a trivial task to separate good C++ classes and features out of a chess program.

A chess program just doesn't seem to want to be organized in good OOP style.
Perhaps not, but there is no reason to use OOP just because you use C++. At least to me, OOP is not among the most compelling advantages of C++ compared to C. Stronger typing, exception handling and real strings are far more important.

Tord
Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Re: Questions about getting ready for multicore programming.

Post by Carey »

Tord Romstad wrote:I never thought I would find myself defending C++, a language I detest. In this case, however, I still have to disagree with Carey.
Carey wrote:As for additional C++ features... although C++ can certainly allow you a certain amount of organization improvements and programmer productivity, it's not really capable of providing any higher performance than what C can.

In fact, due to the extra difficulty of optimizing C++ code, it's likely to be somewhat worse. Maybe not a lot, but at least a little. At best, no faster than C.
I think you are wrong. Different programming languages are optimized for different purposes. Some languages make it easy to write fast programs, but difficult and time-consuming to write 100% correct programs. Other languages make it easy to write correct and bug-free code, but more laborious to produce very fast programs. C occupies the extreme end of the easy-to-make-it-fast-but-hard-to-make-it-correct side of the spectrum.
Certainly no disagreement there.

I'd have to categorize C as a language that should never have become popular.

C was definitely a hacker's language. A system programming language.

Not something meant for the general public.
C++ is also close to this end of the spectrum, but is not quite as extreme.

C++ is not inherently slower than C, it just requires some more effort, knowledge and familiarity with the compiler to produce the fastest possible code with C++.
It's not so much the language, as the compilers having considerable difficulty optimizing it.

A lot of compiler writers objected pretty loudly to the inventive nature of the C++ standardization process, but the people doing the inventing were too excited about adding new features to be concerned about how much effort it would take to implement them efficiently. Or even if they could be implemented efficiently.


With sufficient expertise, it should sometimes even be possible to achieve faster code with C++, because templates make it possible to do more computations and code generation at compile-time, and because stronger typing probably makes some types of optimization easier for the compiler.
I have to disagree about both of those cases.

There is absolutely nothing in C++ that can't be achieved (with effort!) in C.

C++ cannot be any more efficient than C can be. It can only offer different styles of programming, etc.

The different style may end up resulting in better efficiency, but that is due to the style and not the language.

And with the compiler probably implemented mostly in C, and sharing the back-end with the C++ compiler, it's kind of hard to argue that C++ can be inherently faster than C.



As for 'stronger typing' in C++... Hah! What a joke.... You are talking to an ex-Pascal programmer. Compared to Pascal, C++ has no type checking at all.

You have no idea how much I wish that C & C++ had decent type checking. Not quite as much as ISO Pascal, but at least half way there.

I looked at alternative languages for chess programs several times in the past few years. FreePascal, GNU Pascal, Modula-3... Either they have been abandoned, are near death, or the code produced is so bad I'd be almost better off with interpretive BASIC. (But all this is a side rant...)

It's also not a trivial task to separate good C++ classes and features out of a chess program.

A chess program just doesn't seem to want to be organized in good OOP style.
Perhaps not, but there is no reason to use OOP just because you use C++. At least to me, OOP is not among the most compelling advantages of C++ compared to C. Stronger typing, exception handling and real strings are far more important.

Tord
I guess some of that comes from some discussions I've had with other people not here in the forum. In some of those talks, the goal was to try and break chess up into reasonable OOP chunks, instead of simply doing a C style chess program in C++.

We were never able to really break it up into reasonable chunks and keep things isolated.


I don't think C++'s slightly improved type checking is any significant benefit. Just not strong enough to really stop a programmer from making mistakes.

I don't think exception handling is all that useful for chess, either. And it doesn't come free. (Not expensive, but not free.) There are some places it could be handy, but not really enough to be worth the extra cost.

As for the strings.... I'm not sure there are any C programmers that like C's version of string handling. Not a major aspect of a chess program, but...
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Questions about getting ready for multicore programming.

Post by wgarvin »

Wow, it's Friday night and I'm in the mood for a rant. Here goes...

C became popular because it was low-level and powerful, and required minimal run-time resources (which is one reason it is still popular for embedded programming, device drivers, etc). But for programs as large as you often find today, C is an error-prone and tedious language to write them in.

C++ code is theoretically as efficient as equivalent C code with a good optimizing compiler. In practice, C++ code depends heavily on a good optimizing compiler in order to get decent performance (much more heavily than C does). But those compilers do exist. As long as you are careful with templates, multiple inheritance, etc. and disable things like RTTI and exceptions, you can write C++ code that performs as well as or better than the equivalent C code, while still taking advantage of encapsulation, constructors, templates, and the STL.

However: The C++ language itself really is a bloated pig. It is way too complicated, includes too many "nice" features that don't play well together (e.g. constructors and destructors? cool. When mixed with exceptions? bogus). It is slow to compile, and modern projects stress the old C linking model to the breaking point (our current project can't use whole-program optimization for example, because the linker runs out of memory and crashes. Even without it, the link process takes more than 1GB of memory, and more than 5 minutes to run on a very modern machine). The STL contains elegant, efficient implementations of container classes and strings, and even some useful algorithms... however, it requires a lot of expertise to use it correctly because the API stinks and its design always chose performance rather than ease-of-use (much like C and C++ themselves).

It's easy to see how C++ got the way it is. It carried forward everything from C (including utterly bogus things like the C preprocessor with its textual includes), and several layers of new features have been added to it over the years. Templates, for example, are a powerful feature that deserves to occupy a central role in the type system and have a clean syntax. Instead, they came late to the party and so they have ugly syntax and subtly different rules for things like overload signature matching.

Templates are actually a good microcosm of everything that is wrong with C++, so I will pick on them a bit further. C++ templates are quite powerful, and they can be used to write functional programs at compile time (poster child: the Boost library)--they are actually a Turing-complete functional programming language that is interpreted using the compiler's type system! Yuuuuuuck. Even non-metaprogramming uses of templates can often produce an entire screenful of gibberish as the error message from 1 incorrect line of code, so you can imagine how horrendous the error messages from a template metaprogram can get. Even the source code of a template metaprogram is pretty difficult to read, and usually almost impossible to debug. And yet, the fact that people *go to such lengths to write metaprograms with templates* shows that they want the compile-time metaprogramming capability. They are desperate for it.
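For readers who have not seen one, here is the classic textbook illustration of a template metaprogram, given only as a sketch (it is not taken from Boost, and `check_factorial` is a made-up helper). Recursion is expressed as template instantiation and the terminating branch as a specialisation:

```cpp
#include <cassert>

// Recursive case: the "function call" is a template instantiation.
template <unsigned N>
struct Factorial {
    static const unsigned long value = N * Factorial<N - 1>::value;
};

// Base case: a full specialisation plays the role of the terminating branch.
template <>
struct Factorial<0> {
    static const unsigned long value = 1;
};

// The compiler has already computed the result before the program runs.
static_assert(Factorial<5>::value == 120, "5! evaluated at compile time");

// Made-up helper showing that the value is an ordinary constant at run time.
inline void check_factorial() {
    assert(Factorial<10>::value == 3628800UL);
}
```

Note that there is no loop and no mutable variable anywhere: everything is pattern matching on types, which is why it reads like a functional program.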

Another complaint: because C++ is so complex, it took 10+ years to develop decent optimizing compilers for it, and all the development tools (IDEs, debuggers, etc) suck. Microsoft Visual Studio is the closest you will get to a non-sucky IDE for C++, and it still sucks in a lot of ways, but not for lack of trying! It is a direct result of C++ being too complicated and difficult to manipulate and process with tools. I highly doubt anyone will be able to make C++ tooling much better than what Microsoft has produced, but these state-of-the-art IDEs for C++ seem rather pitiful when you compare them to (for example) the Java tools that come with Eclipse.

It's too bad C++ is so entrenched now in millions of legacy programs, because I would love to see a new compiled language emerge with the low-level performance characteristics of C/C++, but none of the language bloat. In the games industry, nearly everyone uses C++ because they need its low-level power and performance, and because it compiles to machine code and does not require a garbage collector.

I suspect that a new language could offer 95% of the useful power of C++ while (initially) having about 10% of the complexity. The three main things I would want out of this language are:

(1) Be designed for efficient incremental compilation and linking, and efficient manipulation by tools/IDEs (in other words: no more preprocessor!).

(2) Strong compile-time metaprogramming support built into the language. Let me write *imperative* metaprograms which look just like my imperative C++ code! They should be able to inspect declarations, types, sizes and alignments as known to the compiler, and generate new statements or declarations or types on-the-fly. In short, all of the power of LISP macros combined with all the nice syntactic sugar of curly-brace languages. Also, there should be a phase which evaluates all the metaprograms, templates, compile-time conditionals etc. and outputs the results *as human-readable source code*, with the option to switch between this generated source and the original source while debugging. The language features would all have to be designed so that this generated source would stay comprehensible by humans (in contrast to, for example, the generated type names of C++ templates, which can easily run to thousands of characters and be nigh-unparseable by a human).

(3) Must be suitable for embedded real-time programming, must provide full control over memory allocation (i.e. it can't require a garbage collector), and must be able to link directly to compiled C code or libraries on the target machine.

The only language I know of that's close to what I want is D by Walter Bright. Unfortunately, that language is aimed at large-scale application programming, so it more or less requires a garbage collector, which effectively means we can't use it for console games. D has not achieved widespread adoption (at least not yet). I don't seriously believe that any new language with the properties I want will be developed and achieve enough market penetration for useful compilers, debuggers, etc. to appear for the necessary platforms. Which is unfortunate, but oh well...!

Anyway, sorry for my rant. I actually typed all of this while waiting for my C++ build to finish. It takes around 20 minutes, even using a custom in-house tool that triples our build speeds. My belief is that a properly-designed language could be compiled around 5 times faster than C++ code, and (more importantly) be compiled incrementally. If I had that language today, I would never have had time to write this! =)

[Edit: Chess engines are complex and have complex behaviour, but the program itself is not very large, so C++ does an adequate job. But for large projects, C++ can be painful to work with. For example, the debug info for our *release* executables is still hundreds of megabytes. (I don't even try to build debug ones anymore, they take forever to link.)]
Bo Persson
Posts: 257
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: Questions about getting ready for multicore programming.

Post by Bo Persson »

Carey wrote:
Bo Persson wrote:
Carey wrote:
Bo Persson wrote:
Carey wrote: I tried MSVC 2008 Express. The performance penalty was 3%.
I believe that just dumping everything in a class is not really fair. If you actually transform the code from C to "proper" C++, I believe you can regain more than these 3%.

As for additional C++ features... although C++ can certainly allow you a certain amount of organization improvements and programmer productivity, it's not really capable of providing any higher performance than what C can.

In fact, due to the extra difficulty of optimizing C++ code, it's likely to be somewhat worse. Maybe not a lot, but at least a little. At best, no faster than C.

I'm certainly not opposed to C++. I'm just saying that it's not capable of being any faster than C. Any differences would be attributable more to programmer style than the language.
I bet you haven't seen this paper by Bjarne Stroustrup, where he shows a case of the C++ standard library being inherently faster than the C library. It uses C++ templates to do things a C compiler isn't able to do.

"Learning Standard C++ as a New Language"

http://www.research.att.com/~bs/new_learning.pdf

The idea is that you can do some things differently in C++, and the language lets the compiler optimize the code better. You do need a good compiler, but in some cases a C++ compiler can do things a C compiler can not.
I just glanced through it and from what little I saw, it's much more sleight of hand. Like comparing a watermelon to an apple and saying the apple is better because it's smaller and is a pretty red color.

Comparing black-box library routines is pretty much in the same category. They were written with different requirements, specifications and interfaces, and the comparison says nothing about your code.

What he's really comparing isn't C & C++ but the interfaces to their libraries. That really says very little about your code.


Write your own library to suit your own programming style and you'd probably get comparable performance to what Bjarne is alleging that C can't do.


You should have read it more carefully! :-)

The claim is that the C library isn't designed by chance, and that qsort is just about the best way you can write a general sort routine in C. It is also very hard to find a better way of optimizing this piece of library code.

In the example, the C++ std::sort runs up to 8 times faster for some data. It does so because you *can* choose a different library API. In C you cannot!
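To make the API difference concrete, here is a small sketch that sorts the same integers both ways. The helper names are made up for illustration; only `std::qsort` and `std::sort` are real library calls. It shows the two interfaces, not a benchmark:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C style: qsort sees the elements as untyped bytes and calls the
// comparison through a function pointer for every pair it compares.
inline int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

inline void sort_with_qsort(std::vector<int>& v) {
    if (!v.empty())
        std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
}

// C++ style: std::sort is a template, so the element type and the
// comparison are known where it is instantiated and can be inlined.
inline void sort_with_std_sort(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
}

// Both routes must produce the same order.
inline void check_sorts() {
    std::vector<int> a;
    a.push_back(3); a.push_back(1); a.push_back(2);
    std::vector<int> b = a;
    sort_with_qsort(a);
    sort_with_std_sort(b);
    assert(a == b);
}
```

The `void*` and function-pointer signature is what the C interface forces on every caller; the template version lets the compiler see the actual `int` comparison.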


You are also right that, if needed, you can write some special hand tuned C code that fits the specific requirements. With C++ perhaps you don't have to.

If you find a really fast way of doing something in C, you can do it exactly the same way in C++ as well.



So, C code can be just as fast as C++, but no faster. :-)
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Questions about getting ready for multicore programming.

Post by Zach Wegner »

Bo Persson wrote:So, C code can be just as fast as C++, but no faster. :-)
What's interesting is that, whenever C++ is as fast as C, it is C.
Guetti

Re: Questions about getting ready for multicore programming.

Post by Guetti »

Hm, it has been stated several times that it is harder for C++ compilers to do good optimization. While this may be true in practice, I'm a bit baffled about this. I would have thought that the C++ language makes it easier for the compiler to detect parts that can be optimized, like vars defined in loops, class member functions that can be inlined etc.

I also would like to point out that glaurung2 is a beautiful piece of software written in C++, despite Tord's remark that he detests C++. It is a great help for me to look from time to time at glaurung's code, which gives me ideas on how to implement some things in a better way in C++. And apart from that, it is much easier to read than any C program like Crafty or C+(+) programs like Fruit.
Actually glaurung is the C++ chess engine I always wanted to write myself. Unfortunately Tord is much better and was faster than me.
Bo Persson
Posts: 257
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: Questions about getting ready for multicore programming.

Post by Bo Persson »

Zach Wegner wrote:
Bo Persson wrote:So, C code can be just as fast as C++, but no faster. :-)
What's interesting is that, whenever C++ is as fast as C, it is C.
No it's not. That's just what Bjarne's paper was to show. The part where the C++ code was especially much faster was here:

Code:

   vector<string> buf;
   fstream fin(file, ios::in);
   string d;

   while (getline(fin, d))
      buf.push_back(d);

   sort(buf.begin(), buf.end());
Not much C there! Just lots of supposedly slow and bloated templates. Happens to run 2-4 times faster than the equivalent C code with fopen, fscanf, realloc, and qsort :-)
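For anyone who wants to try the fragment, here is a self-contained sketch with the headers it needs. Reading from any `std::istream` instead of opening `file`, and the names `read_sorted_lines` and `check_read_sorted_lines`, are assumptions made here to keep it easy to test; they are not from the paper:

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every line from `in` and returns the lines in sorted order.
inline std::vector<std::string> read_sorted_lines(std::istream& in) {
    std::vector<std::string> buf;
    std::string line;
    while (std::getline(in, line))
        buf.push_back(line);
    std::sort(buf.begin(), buf.end());
    return buf;
}

// Small check using an in-memory stream instead of a file.
inline void check_read_sorted_lines() {
    std::istringstream in("pawn\nknight\nbishop\n");
    const std::vector<std::string> lines = read_sorted_lines(in);
    assert(lines.size() == 3);
    assert(lines.front() == "bishop");
    assert(lines.back() == "pawn");
}
```

Passing a `std::ifstream` opened on a real file works the same way, since it is also a `std::istream`.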
Bo Persson
Posts: 257
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: Questions about getting ready for multicore programming.

Post by Bo Persson »

Guetti wrote:Hm, it has been stated several times that it is harder for C++ compilers to do good optimization. While this may be true in practice, I'm a bit baffled about this. I would have thought that the C++ language makes it easier for the compiler to detect parts that can be optimized, like vars defined in loops, class member functions that can be inlined etc.
What is true is that it is a lot harder to write a C++ compiler compared to a C compiler. :-)

On the other hand, C has other problems that C++ does not. The C99 language has added the restrict keyword to aid the compiler in optimizing around an aliasing problem in pointers to potentially overlapping arrays.

In C++ you would use references to class objects, which cannot overlap in that way. Easier for the compiler!
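A minimal sketch of the aliasing problem itself, with made-up function names. Because the two pointers may refer to the same object, the compiler has to reload `*src` after every store through `dst`; the second call shows that the overlapping case really does give a different result, so the reload cannot simply be skipped:

```cpp
#include <cassert>

// The compiler must assume that `dst` and `src` may point at the same int,
// so it has to re-read *src after the first store to *dst.
inline void add_twice(int* dst, const int* src) {
    *dst += *src;
    *dst += *src;
}

inline void check_aliasing() {
    int a = 1, b = 1;
    add_twice(&a, &b);   // no overlap: 1 + 1 + 1
    assert(a == 3);

    int c = 1;
    add_twice(&c, &c);   // overlap: 1 + 1 = 2, then 2 + 2
    assert(c == 4);
}
```

In C99, declaring both parameters `restrict` is a promise that the overlapping call never happens, which allows `*src` to be loaded once. Standard C++ has no such keyword, although several compilers accept `__restrict` as an extension.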

Guetti wrote:I also would like to point out that glaurung2 is a beautiful piece of software written in C++, despite Tord's remark that he detests C++. It is a great help for me to look from time to time at glaurung's code, which gives me ideas on how to implement some things in a better way in C++. And apart from that, it is much easier to read than any C program like Crafty or C+(+) programs like Fruit.
Actually glaurung is the C++ chess engine I always wanted to write myself. Unfortunately Tord is much better and was faster than me.
I haven't looked that closely at it. Perhaps I should! :-)
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Questions about getting ready for multicore programming.

Post by Zach Wegner »

Bo Persson wrote:
Zach Wegner wrote:
Bo Persson wrote:So, C code can be just as fast as C++, but no faster. :-)
What's interesting is that, whenever C++ is as fast as C, it is C.
No it's not. That's just what Bjarne's paper was to show. The part where the C++ code was especially much faster was here:

Code:

   vector<string> buf;
   fstream fin(file, ios::in);
   string d;

   while (getline(fin, d))
      buf.push_back(d);

   sort(buf.begin(), buf.end());
Not much C there! Just lots of supposedly slow and bloated templates. Happens to run 2-4 times faster than the equivalent C code with fopen, fscanf, realloc, and qsort :-)
The templates are "slow and bloated" when compared to _real_ C. C is not a language that is designed to support general code like that (a fair complaint against it, of course), it is designed to give you access to the machine and get as much out of it as you can in the simplest way possible. There isn't really a reason to have a qsort in the standard library, and I wouldn't consider the functions in it to be representative of the language as a whole. So when someone writing _real_ C code sits down to write an efficient piece of code that sorts strings, it is going to be faster than that C++. And when a C++ programmer sits down and does the same thing, he's going to come up with the same program as the C programmer. If you want to compare C's qsort to C++'s sort, fine. But IMO you are comparing the standard libraries, not the languages. You might as well compare C to Java and conclude that you can't do anything in C because there isn't a standard library function for a hashtable. The real answer is, you actually need to learn what a hashtable is to program one in C.

I will note that there can be advantages like the "restrict" example given.

What the paper seems to be arguing is that _new_ programmers can come in and write code that is similar in efficiency to optimized C without learning all of the low-level details. I don't have a problem with this, but the argument doesn't apply to me. I am happy using C. It is fast, it is beautiful, and it gives me complete control.

Of course, this is all IMO. Let's not start another language war...
dzhao

Re: Consistent performance penalty for C++ classes

Post by dzhao »

Gerd Isenberg wrote: I have the impression that accessing global variables in 64-bit mode becomes more expensive. There is no compact mode with 32-bit addresses. There is a rip-relative addressing mode, but assembly generated by vc2005 indicates a pointer is needed to access globals all the time. Globals like static class members as well as statics inside the local scope of a function.

Code:

 lea r10, base of some data_segment
 mov rax, [r10 + offset global var]
Thus passing a board-, search- or equivalently a this-pointer around - even in a recursive search - might be faster than accessing globals. It might even make sense to keep all the constant data inside a one-time initialized, embedded non-static "const" member.
Gerd
I noticed this when I played with x64 for the first time. I think the reason the compiler uses a register instead of an explicit constant to access a global is to reduce the code size. A 64-bit constant is 8 bytes, which is a large operand and no good for fast instruction decoding.

I don't think you need to put constants on the heap if you do multithreading and use a pointer to address a search tree or board.
In such a case r10 is already known (or initialized): it is the board (or tree) pointer for a thread. The first load only executes once.
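A rough sketch of the "pass a pointer around" layout discussed here. `SearchContext`, its fields and the toy two-child recursion are all hypothetical stand-ins, not code from any engine, and whether this beats globals depends on the compiler and target:

```cpp
#include <cassert>

// Hypothetical per-thread search state; the field names are made up.
struct SearchContext {
    long nodes;      // nodes visited by this thread
    int  max_depth;  // depth limit for this search
};

// Every recursive call receives the context pointer, so state is reached as
// [pointer + offset] instead of through a separately loaded global address.
inline void search(SearchContext* ctx, int depth) {
    ++ctx->nodes;
    if (depth >= ctx->max_depth)
        return;
    search(ctx, depth + 1);   // stand-in for a real move loop:
    search(ctx, depth + 1);   // two "moves" at every node
}

// One context per thread means no shared mutable globals at all.
inline long count_nodes(int max_depth) {
    SearchContext ctx = {0, max_depth};
    search(&ctx, 0);
    return ctx.nodes;
}

inline void check_count_nodes() {
    assert(count_nodes(0) == 1);
    assert(count_nodes(3) == 15);   // 1 + 2 + 4 + 8
}
```

A side benefit for multithreading is that each thread can own its own `SearchContext`, so nothing in the recursion touches shared state.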