lucasart wrote: So problem solved, but still: the GCC 4.8.1 compile is 5% slower than the 4.7.3 compile. This is a really big performance regression: same code, same compile flags, 5% slower.
Lucas,
If you still want to use gcc 4.7.x, it's pretty simple to download and compile the source code from the GNU mirrors. I went the opposite way. I'm sticking with Linux Mint 13, which defaults to gcc 4.6.3, and installing 4.7.2 and 4.8.1 separately. I've made the decision to only upgrade to the LTS releases. I'm well past the point where I enjoy debugging OS problems. Once I have a stable system, I basically stick with the motto,
Michel wrote:
It may well be that this whole thing hides a bug in your code. Compiler errors are extremely rare.
In this case, you were right. But I used to (a long time ago) work for a compiler company, and I assure you, compilers are full of bugs. Everything from syntax parsing errors (front-end) to bad code generation (back-end). As well as performance regressions. Some are very hard to fix and may linger around in the codebase for years.
--Jon
That may be true, but unless an application programmer can show a reproducible code example where the compiler goes against the language standard specification, my money would be on a programmer's error, not on a compiler bug.
Rein Halbersma wrote:
BTW, I got an error with Clang SVN on your code even with std=gnu++11 because Clang will only allow VLA on POD (Plain Old Data) types and your type move_t is not a POD. The rules for POD classes changed in C++11: they need to be both trivial (essentially no user-defined overrides of compiler generated default, copy and move constructors, assignment operators and destructors) and standard-layout (no virtuals, only one access level). To fix it, you need to change the default constructor to
Anything else (zero initialization, or even providing move_t() {} yourself) makes move_t a non-trivial class. Fortunately, you are already in the good habit of initializing variables at the point of first use, rather than at the top of their scope (Stockfish authors, I am talking to you!) so it shouldn't change the meaning of your code.
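To make the trivial vs. non-trivial distinction concrete, here is a minimal sketch. The actual replacement constructor is missing from the post above; the defaulted form `= default` is my assumption of what was meant. The struct names and the `from`/`to` members are hypothetical stand-ins, since the real move_t is not shown in this thread.

```cpp
#include <type_traits>

// Hypothetical stand-ins for move_t: the real members are not shown in the
// thread, so 'from' and 'to' are made up for illustration.
struct move_defaulted {
    move_defaulted() = default;          // compiler-generated, stays trivial
    int from, to;
};

struct move_empty_body {
    move_empty_body() {}                 // user-provided, even though it is empty
    int from, to;
};

struct move_zeroing {
    move_zeroing() : from(0), to(0) {}   // zero-initializing constructor
    int from, to;
};

// POD in the C++11 sense: trivial and standard-layout.
template <typename T>
constexpr bool is_pod_like() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

static_assert(is_pod_like<move_defaulted>(), "defaulted constructor keeps the type POD");
static_assert(!is_pod_like<move_empty_body>(), "an empty user-provided constructor does not");
static_assert(!is_pod_like<move_zeroing>(), "neither does a zeroing constructor");
```

All three structs are standard-layout; only the kind of default constructor decides whether they are trivial.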
I removed the useless constructor, and got back all the lost performance and more! I really can't believe that these useless zero initializations can cost that much, but sometimes the relation between code and performance is chaotic: remove a line of code, and suddenly (by butterfly effect?) a whole optimization takes place that didn't take place before. From 5% slower I got to 4% faster than with GCC 4.7. That being said, I don't have GCC 4.7 anymore, and it's possible that the same code modification also improves performance on GCC 4.7.
The only reason I had this constructor (a zero initializer) was to silence some spurious compiler warnings. The compiler thought I was using an uninitialized variable: that's not the case, but the data flow is too complicated for the compiler to understand. So, in order to compile cleanly, I have no other choice than to silence the warning for using uninitialized variables. I don't really like to do this, because it has often caught errors in the past.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
I have a similar piece of code that generates warnings because the compiler does not understand the data flow. It isn't worth adding code to eliminate the warnings IMO.
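For what it's worth, GCC can also silence a warning for a single function instead of the whole build, using its diagnostic pragmas. A minimal sketch follows; the function `first_positive` is a hypothetical example (not taken from anyone's engine), and whether a given GCC version actually warns on it is not guaranteed. The pragmas are guarded so that other compilers ignore them.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#endif

// Hypothetical function: 'best' is written before 'found' becomes true and is
// only read when 'found' is true, but a compiler may fail to prove that.
int first_positive(const int* values, std::size_t count) {
    int best;                 // deliberately left uninitialized
    bool found = false;
    for (std::size_t i = 0; i < count; ++i) {
        if (values[i] > 0) {
            best = values[i];
            found = true;
            break;
        }
    }
    return found ? best : -1;
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic pop
#endif

// Small usage check.
inline void first_positive_demo() {
    const int values[] = {-4, 0, 7, 9};
    assert(first_positive(values, 4) == 7);
    assert(first_positive(values, 2) == -1);
}
```

The warning stays enabled everywhere else, so it can still catch real mistakes in the rest of the code.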
Isn't alloca available in Clang?
Alloca is even better than VLAs because you can use restrict (or whatever C++ equivalent your compiler implements) with the pointer.
My experience with gcc 4.8 is that it produces faster binaries than 4.7, but I use MinGW for Windows, so other things might have changed there.
What optimization options do you use? I think I tried all of them in many combinations, so maybe I'll notice something missing.
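A minimal sketch of the alloca-plus-restrict idea, assuming GCC or Clang on a glibc system (`<alloca.h>` and the `__restrict__` spelling are compiler/platform extensions, not standard C++). The function `sum_of_squares` is a hypothetical example.

```cpp
#include <alloca.h>   // glibc header; alloca is not part of standard C++
#include <cassert>
#include <cstddef>

// Hypothetical example: square the inputs into a stack scratch buffer, then sum.
int sum_of_squares(const int* __restrict__ input, std::size_t n) {
    // Memory from alloca is released automatically when this function returns.
    // __restrict__ promises the compiler that 'scratch' does not alias 'input'.
    int* __restrict__ scratch = static_cast<int*>(alloca(n * sizeof(int)));
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = input[i] * input[i];
    }
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += scratch[i];
    }
    return total;
}

// Small usage check.
inline void sum_of_squares_demo() {
    const int values[] = {1, 2, 3};
    assert(sum_of_squares(values, 3) == 14);
}
```

Same caveat as with VLAs: the size is not checked, so a large n will overflow the stack.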
For C++11, my preferred option is using a std::vector with a flexible and quite well-performing arena allocator. This will allocate from a stack-based fixed-size buffer, and go to the heap when that runs out. Overhead per allocation is a conditional (check if pointer is inside buffer) and a pointer increment.
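A rough sketch of that kind of allocator, to show the moving parts. This is not the code referred to in the post: `arena`, `arena_allocator` and `arena_demo` are hypothetical names, and this version rounds every request up to `alignof(std::max_align_t)`, which is my own choice.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <vector>

// Fixed-size buffer with a bump pointer; falls back to the heap when full.
template <std::size_t N>
class arena {
    alignas(std::max_align_t) char buf_[N];
    char* ptr_;

    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

public:
    arena() : ptr_(buf_) {}
    arena(const arena&) = delete;
    arena& operator=(const arena&) = delete;

    // The conditional mentioned above: does p point into the buffer?
    bool owns(const void* p) const {
        const char* c = static_cast<const char*>(p);
        std::less_equal<const char*> le;
        return le(buf_, c) && le(c, buf_ + N);
    }

    std::size_t used() const { return static_cast<std::size_t>(ptr_ - buf_); }

    void* allocate(std::size_t bytes) {
        bytes = round_up(bytes);
        if (bytes <= static_cast<std::size_t>(buf_ + N - ptr_)) {
            char* result = ptr_;
            ptr_ += bytes;                 // the pointer increment
            return result;
        }
        return ::operator new(bytes);      // buffer exhausted: go to the heap
    }

    void deallocate(void* p, std::size_t bytes) {
        if (!owns(p)) { ::operator delete(p); return; }
        char* c = static_cast<char*>(p);
        if (c + round_up(bytes) == ptr_) ptr_ = c;  // only the newest block is reclaimed
    }
};

// Minimal C++11 allocator that forwards to an arena it does not own.
template <typename T, std::size_t N>
class arena_allocator {
    static_assert(alignof(T) <= alignof(std::max_align_t), "over-aligned types are not handled");
    arena<N>* arena_;

public:
    typedef T value_type;
    template <typename U> struct rebind { typedef arena_allocator<U, N> other; };

    explicit arena_allocator(arena<N>& a) : arena_(&a) {}
    template <typename U>
    arena_allocator(const arena_allocator<U, N>& other) : arena_(other.source()) {}

    arena<N>* source() const { return arena_; }
    T* allocate(std::size_t n) { return static_cast<T*>(arena_->allocate(n * sizeof(T))); }
    void deallocate(T* p, std::size_t n) { arena_->deallocate(p, n * sizeof(T)); }
};

template <typename T, typename U, std::size_t N>
bool operator==(const arena_allocator<T, N>& x, const arena_allocator<U, N>& y) {
    return x.source() == y.source();
}
template <typename T, typename U, std::size_t N>
bool operator!=(const arena_allocator<T, N>& x, const arena_allocator<U, N>& y) {
    return !(x == y);
}

// Usage: a vector whose first 256 bytes of storage come from the stack.
inline std::size_t arena_demo() {
    typedef arena_allocator<int, 256> alloc_t;
    arena<256> a;                          // must outlive the vector
    alloc_t alloc(a);
    std::vector<int, alloc_t> v(alloc);
    v.reserve(16);
    for (int i = 0; i < 16; ++i) v.push_back(i);
    assert(a.owns(v.data()));
    return a.used();
}
```

The arena has to be declared before the vector so that it is destroyed after it; the allocator only holds a pointer to it.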
In code where going from heap to stack allocation makes a difference, a conditional might be significant as well. I don't really know though, just guessing, I don't do C++. I would be shocked though if any major compiler didn't provide a way to allocate stuff on the stack. Even Visual Studio has _alloca and _malloca, GCC has both alloca and VLAs, and Clang has alloca for C (at least that's what I've heard), so there should be one available for C++ as well.
Rein Halbersma wrote: For C++11, my preferred option is using a std::vector with a flexible and quite well-performing arena allocator. This will allocate from a stack-based fixed-size buffer, and go to the heap when that runs out. Overhead per allocation is a conditional (check if pointer is inside buffer) and a pointer increment.
Why do things in a simple and efficient way when there is always an ugly bloated solution? I would kill for that.
Do you realize that the code is actually broken?
It doesn't align the allocs.
So if you use the same "arena" to allocate 3 bytes then one 32-bit int on ARM, you will get an exception once you try to access it.
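The failure mode comes down to offset arithmetic, which a short sketch can show. `align_up` and `bump` are hypothetical helpers, not part of the allocator being criticized; they work on offsets only and assume the buffer itself starts at a suitably aligned address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round 'offset' up to the next multiple of 'alignment' (must be a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical bump allocation expressed on offsets only: returns the start
// offset of the new block and advances 'cursor' past it.
inline std::size_t bump(std::size_t& cursor, std::size_t bytes, std::size_t alignment) {
    const std::size_t start = align_up(cursor, alignment);
    cursor = start + bytes;
    return start;
}

// The scenario from the post: 3 bytes first, then one 32-bit int.
inline void bump_demo() {
    std::size_t cursor = 0;
    const std::size_t chars = bump(cursor, 3, alignof(char));
    const std::size_t word = bump(cursor, sizeof(std::int32_t), alignof(std::int32_t));
    assert(chars == 0);
    assert(word % alignof(std::int32_t) == 0);  // a plain cursor += 3 would put it at offset 3
    assert(cursor == word + sizeof(std::int32_t));
}
```

A bump allocator that skips the `align_up` step hands out offset 3 for the int, which is exactly the misaligned access described above.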