The released 4.7.0 is indeed a lot faster than the initial snapshots I tested.

Jim Ablett wrote (quoting the exchange below, oldest first):

diep wrote: Every compare here is with GCC. A junk compiler I use daily, so I know everything about the junk it produces.

mcostalba wrote: GCC has improved a bit since the 90s. Today this produces the fastest binary on Windows:
http://www.equation.com/servlet/equation.cmd?fa=fortran
For your information, this is the compiler Jim used to produce the fastest x86-64 SSE4.2 Windows binary released for SF 2.2.2 a few months ago. Until last year he was using the Intel compiler, but found this one faster.

diep wrote: Why do you claim this nonsense?

mcostalba wrote: When, in the 90s, did you stop testing and start going "by memory"?

diep wrote: The latest GCC snapshot I tested a few weeks ago was still light-years behind Intel C++ and even Visual Studio.
Just because it hardly gets any speedup from PGO.
To avoid a bug in GCC's PGO I do the profile run single-threaded with Diep. Even then it gives only a 3% speedup.
I've posted extensive examples all over the net of how GCC messes up, starting in 2007.
The latest snapshot still hasn't fixed that.
So it already STARTS with a disadvantage over other compilers of 25% or so. Such bad PGO performance is of course a joke.
Note that around 2004-2005 some snapshots did pretty OK at PGO; then suddenly, boom, and it no longer worked at all, for Diep at least.
Default PGO gives 0.5% in GCC. Bug after bug, and seven years later it still hasn't been fixed.
One of the big screw-ups in GCC, which hits a lot of software hard, is the rewrite to the end of the function: instead of generating a simple CMOV, it grabs your code, moves it to the end of the function, sometimes jumps there, executes two instructions, and then jumps back to where it was.
That *hurts*.
To quote Linus: "there is no excuse to not generate CMOVs".
A Polish guy replied to Linus back in 2007: "but then it is slower on my P4".
Only around the end of 2011 did they start moving. We're some months later now, but a snapshot from a few weeks ago was still turtle slow, with the same bugs and bottlenecks.
Of course I am compiling for 64 bits, even though Diep's code would be faster in 32 bits; I just want efficient code that doesn't mess with branch prediction.
I want normal PGO, just like other compilers have it!
They aren't capable of producing that, and in overruling Linus they have been refusing to generate effectively shorter code for *many* years.
Now that they have some competition from other compilers that are on track to overtake GCC, it wouldn't amaze me if they 'magically' improve a lot all of a sudden. They need a kick in the butt, man.
For many years the GCC team showed the middle finger to dozens of very important and influential guys such as Linus.
I'm amazed they know how to produce SSE 4.2 code for SF, as they still haven't figured out how to generate efficient code for branches. Intel's fall-through model simply hasn't been implemented in GCC.
When did Intel introduce this?
Oh, 1994 or so?
The difference between GCC 4.0 and the latest snapshots I tried is just a few percent for Diep, while Visual Studio and Intel C++ got dozens of percent faster on the modern hardware I have here: an Opteron (Barcelona core) and Core2 Xeons.
Vincent
"x86-64 and IA-64 will prove to be the ultimate disaster for GCC"
Marc Lehmann, in a private email to me
Try latest link-time optimizations.
Jim.

Code:
-Ofast -flto -fwhole-program -fprofile-generate / -fprofile-use
By the way, -flto just slows Diep down, by quite a bit.
Even when using the flag in the linker as well, it still slows things down 2%.
The problem is probably that GCC, as usual, produces too many instructions to get things done, causing more L1 instruction-cache misses.
Note I'm using it in 32 bits; in 64 bits this effect would be worse.
But where the snapshots are hardly faster than the old GCCs on AMD hardware, on the Core2 hardware here default -O2 makes it 10% faster in 32 bits. Not sure about 64 bits; the effect could be more limited there.
What they seem to have fixed compared to the 4.6+ snapshots is PGO. On Intel, at least, it gives around an 8% speedup without needing to modify Diep to run single-threaded on a single core; I could run it multithreaded (that means one search thread and one I/O-to-user thread). Not sure about AMD yet. So at first sight GCC is overall 18% faster on Core2 than it used to be.
Note that Visual Studio speeds up some 22% from PGO; Intel C++ way more than that. On every "normal 21st-century compiler feature", if I may call it that, the commercial compilers still totally hammer GCC, as they profit more from those features than GCC does.
But there is progress there for the first time in seven years!
All together, what I see from 4.7.0 is not bad. One of these days I'll compare it with Intel C++, and on AMD as well; the previous 4.6 snapshots I had tried on the AMD Barcelona core. On the Intel Core2 Xeons I have here this compiler is a lot faster. That's good news!
Maybe the speed difference has halved now, making Intel C++ some 20% faster than GCC. Exact measurements are still needed, though that isn't easy as Intel nowadays charges big cash for its compiler!
The -flto is a big bummer though. In 32 bits, where instruction sizes are a lot smaller, it's 7% slower by default; when the linker also uses it, it's 2% slower.