mcostalba wrote: diep wrote: mcostalba wrote:
diep wrote: Every comparison here is with GCC. It's a junk compiler I use daily, so I know everything about the junk it produces.
GCC has improved a bit since the 90s. Today this produces the fastest binary on Windows:
Why do you claim this nonsense?
When, in the 90s, did you stop testing and start going "by memory"?
For your information, this is the compiler Jim used to produce the fastest x86-64 SSE42 Windows binary released for SF 2.2.2 (a few months ago). Until last year he was using the Intel compiler, but he found this one faster.
I tested the latest GCC snapshot a few weeks ago, and it's still light-years behind Intel C++ and even Visual Studio.
Simply because it gets hardly any speedup from PGO.
To avoid a bug in GCC's PGO, I'm doing the profile run single-threaded with Diep. Even then it gives just a 3% speedup.
Starting in 2007, I've posted extensive examples all over the net of how GCC messes up.
Latest snapshot still didn't have that fixed.
So it already STARTS with a disadvantage of 25% or so against other compilers. Such bad PGO performance is of course a joke.
Note that around 2004-2005 some snapshots did pretty OK at PGO, then suddenly BOOM and it no longer worked at all, for Diep that is.
Default pgo gives 0.5% in GCC. Bug after bug and 7 years later it still hasn't been fixed.
One of the big screw-ups in GCC, which hits a lot of software hard, is the rewrite to the end of the function: it grabs your code and, instead of generating a simple CMOV, moves the code to the end of the function. Execution then sometimes jumps there, runs 2 instructions, and jumps back to where it was.
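A minimal sketch in C of the kind of code this is about. The function names are hypothetical and nothing here is taken from Diep. The first function is a plain conditional select that a compiler can turn into a single CMOV on x86-64; whether a given compiler actually does so, or emits a jump instead, depends on its version, flags and profile data. The second function computes the same result with a bit mask, so the source itself contains no branch to begin with.

```c
/* Plain conditional select: a candidate for one CMOV instruction,
 * but the compiler is free to emit a conditional jump instead. */
static int pick_max(int a, int b)
{
    return a > b ? a : b;
}

/* Same result written without any branch in the source.
 * (a > b) evaluates to 1 or 0, so mask is all ones or all zeros. */
static int pick_max_mask(int a, int b)
{
    int mask = -(a > b);
    return (a & mask) | (b & ~mask);
}
```

To see what a particular compiler makes of the first function, compile with `-O2 -S` and look for `cmov` versus a jump in the assembly output.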
To quote Linus: "there is no excuse to not generate CMOV's"
A Polish guy then posted back in 2007, replying to Linus: "but then it is slower at my P4".
Only around the end of 2011 did they start moving. We're some months later now, but a snapshot from a few weeks ago was still TURTLE slow, with the same bugs and bottlenecks.
Of course I am compiling for 64 bits, yet Diep's code would be faster in 32 bits; I just want efficient code that doesn't mess up the branch prediction.
I want a normal PGO just like other compilers have it!
They aren't capable of producing that, and they're overruling Linus on their way refusing to generate effectively shorter code for *many* years.
Now that they have some competition from other compilers that are 'on the production line' to overtake GCC, it wouldn't amaze me if they 'magically' suddenly improve a lot. They need a kick in their butt man.
The GCC team showed the middle finger to dozens of very important and influential guys such as Linus for many years.
I'm amazed they know how to produce SSE 4.2 with SF, as they still didn't figure out how to efficiently produce code for branches. The entire fall through model of intel simply hasn't been implemented in GCC.
When did intel introduce this?
Oh 1994 or so?
The difference between GCC 4.0 and the latest snapshots I tried is just a few percent for Diep; meanwhile Visual Studio and Intel C++ got tens of percent faster on the modern hardware I have here, Opteron (Barcelona core) and Core 2 Xeons.
"x86-64 and IA-64 will prove to be the ultimate disaster for GCC"
Marc Lehmann, in a private email to me