Re: Stockfish haswell optimized build

Posted: Sun Apr 06, 2014 10:42 pm
by syzygy
phenri wrote:Why, in the makefile, does POPCNT come with the flag -msse3 instead of -msse4.2, when POPCNT is present only on architectures with at least SSE4.2?
It is not entirely true that POPCNT needs SSE4.2. On AMD native popcnt came with SSE4a.

I tried compiling with -msse4.2. It did not improve the speed.
And why is the flag -mpopcnt not included?

Code:

### 3.9 popcnt
ifeq ($(popcnt),yes)
	CXXFLAGS += -msse3 -DUSE_POPCNT
endif
Because stockfish uses inline assembly for native popcount, so the compiler does not need to be told it can generate the popcnt instruction. It would be different if stockfish used the __builtin_popcountll() compiler intrinsic.
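To make the distinction concrete, here is a minimal sketch of the three ways to count bits. The function names are made up for illustration and this is not Stockfish's actual source. Only the intrinsic version depends on what the compiler has been told about the target. The inline-assembly version is handed to the assembler as written, so it builds without -mpopcnt, but the CPU must still support popcnt at run time.

```cpp
#include <cassert>
#include <cstdint>

// Portable software fallback: clear the lowest set bit until none remain.
inline int popcount_soft(std::uint64_t b) {
    int n = 0;
    while (b) { b &= b - 1; ++n; }
    return n;
}

// Compiler intrinsic (GCC/Clang). The compiler emits the popcnt instruction
// only if told the target has it (e.g. -mpopcnt); otherwise it falls back to
// a generic implementation.
inline int popcount_builtin(std::uint64_t b) {
    return __builtin_popcountll(b);
}

#if defined(USE_POPCNT) && defined(__x86_64__)
// Inline assembly: the instruction is written out by hand, so no -mpopcnt
// flag is needed to build it. Running it still requires a CPU with popcnt.
inline int popcount_asm(std::uint64_t b) {
    std::uint64_t r;
    __asm__("popcnt %1, %0" : "=r"(r) : "r"(b) : "cc");
    return static_cast<int>(r);
}
#endif

// Sanity check: every path that was compiled in must agree.
inline void check_popcount(std::uint64_t b) {
    assert(popcount_builtin(b) == popcount_soft(b));
#if defined(USE_POPCNT) && defined(__x86_64__)
    assert(popcount_asm(b) == popcount_soft(b));
#endif
}
```

Building with -DUSE_POPCNT on x86-64 also compiles the assembly path. Without it, only the two portable versions are built.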
Same for prefetch: why is the flag so low?

Code:

### 3.7 prefetch
I did not check, but I suppose the prefetch instruction is available on systems with sse. So no need to generate executables that don't work on systems that do not have sse2 or higher.
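For illustration only, a prefetch hint in code can look like the sketch below. It assumes GCC or Clang and uses the __builtin_prefetch builtin. The helper names and the look-ahead distance of 8 are invented for the example and are not taken from Stockfish. A prefetch is only a hint, so the result is the same whether or not the hardware acts on it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: hint that the cache line holding addr will be read soon.
inline void prefetch_hint(const void* addr) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(addr);
#else
    (void)addr;  // no-op where the builtin is unavailable
#endif
}

// Sums an array while hinting a few elements ahead of the current position.
inline long sum_with_prefetch(const long* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    const std::size_t ahead = 8;  // illustrative look-ahead distance
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + ahead < n)
            prefetch_hint(&data[i + ahead]);
        total += data[i];
    }
    return total;
}
```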

Re: Stockfish haswell optimized build

Posted: Sun Apr 06, 2014 10:43 pm
by syzygy
j_romang wrote:I didn't try :wink:
See the suggested code changes I posted above. They try.

Re: Stockfish haswell optimized build

Posted: Sun Apr 06, 2014 11:02 pm
by AdminX
j_romang wrote:Hello,
I just tried to make a haswell optimized build : https://www.dropbox.com/s/ghbs1vw18q6q4 ... 8_bmi2.zip
Thanks to Ronald de Man's code, I implemented BMI2 instructions in Stockfish. This build also supports his Syzygy tablebases.
Please tell me if it works; you should see a ~4% speedup compared with the corresponding abrok.eu version. Of course, you need a Haswell processor to run it!
Please note that this is NOT an official build, but just an experiment :wink:
About to take it out for a spin. :)

Thanks

Re: Stockfish haswell optimized build

Posted: Sun Apr 06, 2014 11:55 pm
by phenri
Thanks, Ronald.

The makefile should be upgraded to SSE4.2.

Re: Stockfish haswell optimized build

Posted: Mon Apr 07, 2014 1:01 am
by syzygy
phenri wrote:The makefile should be upgraded to SSE4.2.
Not a good idea, unless it would gain speed. It does not for me.

Compiling with -msse4.2 means it won't work on machines that have sse3 but not sse4.2.
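As a rough sketch of what is at stake: a binary built with -msse4.2 may contain instructions that an older CPU cannot execute. A program can ask the CPU at run time what it supports. The code below assumes GCC or Clang on x86 and uses the __builtin_cpu_supports builtin. The struct and function names are hypothetical, and nothing in this thread suggests Stockfish does this.

```cpp
#include <cassert>

// Hypothetical summary of the features discussed in this thread.
struct CpuFeatures {
    bool sse3;
    bool sse42;
    bool popcnt;
};

// Queries the running CPU. On non-x86 targets or other compilers
// everything is reported as absent.
inline CpuFeatures detect_cpu() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return { __builtin_cpu_supports("sse3") != 0,
             __builtin_cpu_supports("sse4.2") != 0,
             __builtin_cpu_supports("popcnt") != 0 };
#else
    return { false, false, false };
#endif
}

// Hypothetical startup guard for a binary that was built with -msse4.2.
inline bool can_run_sse42_build(const CpuFeatures& f) {
    assert(!f.sse42 || f.sse3);  // SSE4.2 hardware also has SSE3
    return f.sse42;
}
```

On a machine that has sse3 but not sse4.2, can_run_sse42_build(detect_cpu()) returns false, while a build made with -msse3 would still run there.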

Re: Stockfish haswell optimized build

Posted: Mon Apr 07, 2014 1:35 am
by phenri
syzygy wrote:
phenri wrote:The makefile should be upgraded to SSE4.2.
Not a good idea, unless it would gain speed. It does not for me.

Compiling with -msse4.2 means it won't work on machines that have sse3 but not sse4.2.
But sse3 is relatively old. Why does it come with the modern profile?

Re: Stockfish haswell optimized build

Posted: Wed Apr 09, 2014 2:13 am
by phenri
Hi,
I have another question: for Stockfish to support a CPU with AVX, should we add additional code or simply use the flag -mavx?

Re: Stockfish haswell optimized build

Posted: Wed Apr 09, 2014 7:55 am
by syzygy
phenri wrote:
syzygy wrote:
phenri wrote:The makefile should be upgraded to SSE4.2.
Not a good idea, unless it would gain speed. It does not for me.

Compiling with -msse4.2 means it won't work on machines that have sse3 but not sse4.2.
But sse3 is relatively old. Why does it come with the modern profile?
Ok, third time.

Why do you want to make life difficult for people with older hardware?
If it gains you anything, I can understand. If it gains you nothing, I do not understand.

I have measured. It gains you nothing.
So you just want to make life difficult for people with older hardware?

Re: Stockfish haswell optimized build

Posted: Wed Apr 09, 2014 9:15 am
by BBauer
It's up to anybody to change the Makefile.
I use
PGOBENCH = ./$(EXE) bench 1024 1 4 default time
which results in a longer compile time.
CXXFLAGS += -msse4a -DUSE_POPCNT
and have deleted -msse.
CXXFLAGS += -Ofast -march=native
I have the feeling that native may give a small speed up.

So do what you like, but for the general distribution we should think of others' needs too.
Kind regards
Bernhard

Re: Stockfish haswell optimized build

Posted: Wed Apr 09, 2014 11:41 am
by zullil
BBauer wrote:It's up to anybody to change the Makefile.
I use
PGOBENCH = ./$(EXE) bench 1024 1 4 default time
which results in a longer compile time.
CXXFLAGS += -msse4a -DUSE_POPCNT
and have deleted -msse.
CXXFLAGS += -Ofast -march=native
I have the feeling that native may give a small speed up.

So do what you like, but for the general distribution we should think of others' needs too.
Kind regards
Bernhard
No need for -msse4a since you have -march=native.

Try -O3 -fno-tree-pre instead of -Ofast.
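One way to see what -march=native switches on is to look at the macros the compiler predefines for each enabled instruction set. The sketch below assumes GCC or Clang. The constant names are made up, while the __SSE3__, __SSE4_2__, __SSE4A__ and __POPCNT__ macros are the compilers' own. On a CPU with SSE4a, a build with -march=native should already report sse4a as enabled without an explicit -msse4a.

```cpp
// Compile-time view of which instruction sets this build targets.
#if defined(__SSE3__)
constexpr bool kBuiltWithSse3 = true;
#else
constexpr bool kBuiltWithSse3 = false;
#endif

#if defined(__SSE4_2__)
constexpr bool kBuiltWithSse42 = true;
#else
constexpr bool kBuiltWithSse42 = false;
#endif

#if defined(__SSE4A__)
constexpr bool kBuiltWithSse4a = true;
#else
constexpr bool kBuiltWithSse4a = false;
#endif

#if defined(__POPCNT__)
constexpr bool kBuiltWithPopcnt = true;
#else
constexpr bool kBuiltWithPopcnt = false;
#endif

// Enabling SSE4.2 also enables the older SSE levels beneath it.
static_assert(!kBuiltWithSse42 || kBuiltWithSse3,
              "a build targeting sse4.2 also targets sse3");
```

Printing these constants from builds made with different flags (for example -msse3, -msse4.2, -march=native) shows which sets each build assumes the target CPU has.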