phenri wrote:Why does POPCNT come with the flag -msse3 instead of -msse4.2 in the makefile, when POPCNT is only present on architectures with at least SSE4.2?
It is not entirely true that POPCNT needs SSE4.2. On AMD native popcnt came with SSE4a.
I tried compiling with -msse4.2. It did not improve the speed.
The reason is that Stockfish uses inline assembly for native popcount, so the compiler does not need to be told that it may generate the popcnt instruction. It would be different if Stockfish used the __builtin_popcountll() compiler intrinsic.
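To make the difference concrete, here is a minimal sketch of the two approaches. It is not the actual Stockfish source; the function names are made up for illustration, and it assumes GCC or Clang (on non-x86-64 targets the first variant simply falls back to the builtin).

```cpp
#include <cassert>
#include <cstdint>

// Variant 1: inline assembly. The popcnt opcode is written out by hand, so the
// assembler emits it whether or not -msse4.2 or -mpopcnt was passed.
// Assumption: GCC or Clang on x86-64; other targets fall back to the builtin.
inline int popcount_asm(std::uint64_t b) {
#if defined(__x86_64__) && defined(__GNUC__)
    std::uint64_t r;
    __asm__("popcnt %1, %0" : "=r"(r) : "r"(b) : "cc");
    return static_cast<int>(r);
#else
    return __builtin_popcountll(b);
#endif
}

// Variant 2: compiler intrinsic. Here the compiler chooses the code, and it
// only emits a single popcnt instruction if the target flags allow it.
inline int popcount_builtin(std::uint64_t b) {
    return __builtin_popcountll(b);
}

// Hypothetical helper: both variants must agree on the bit count.
inline void check_popcount(std::uint64_t b) {
    assert(popcount_asm(b) == popcount_builtin(b));
}
```

With the first variant the binary needs a CPU that has popcnt, but the -m flag plays no role in generating it. With the second, the flags decide whether you get the instruction or a software fallback.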
### 3.7 prefetch
I did not check, but I suppose the prefetch instruction is available on any system with SSE. So there is no need to generate executables that fail on systems that lack SSE2 or higher.
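For reference, a prefetch is only a hint to the cache and never changes the result of a computation. A small sketch using the GCC/Clang builtin __builtin_prefetch follows; the function name and the prefetch distance are made up for illustration and are not taken from the Stockfish source. As far as I know, on x86 the builtin turns into one of the prefetch instructions when SSE is enabled.

```cpp
#include <cstddef>
#include <cstdint>

// Sums n entries while hinting the cache a few entries ahead.
// Assumption: GCC or Clang, where __builtin_prefetch is available.
inline std::uint64_t sum_with_prefetch(const std::uint64_t* data, std::size_t n) {
    const std::size_t ahead = 8;  // hypothetical prefetch distance
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + ahead < n)
            __builtin_prefetch(&data[i + ahead]);  // a hint only, results are unchanged
        total += data[i];
    }
    return total;
}
```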
j_romang wrote:Hello,
I just tried to make a Haswell-optimized build: https://www.dropbox.com/s/ghbs1vw18q6q4 ... 8_bmi2.zip
Thanks to Ronald de Man's code, I implemented BMI2 instructions in Stockfish. This build also supports his Syzygy tablebases.
Please tell me if it works. You should see a ~4% speedup over the corresponding abrok.eu version. Of course, you need a Haswell processor to run it!
Please note that this is NOT an official build, just an experiment.
About to take it out for a spin.
Thanks
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
phenri wrote:The makefile should be upgraded to SSE4.2
Not a good idea, unless it gains speed. It does not for me.
Compiling with -msse4.2 means it won't work on machines that have sse3 but not sse4.2.
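The point is that -msse4.2 lets the compiler use SSE4.2 instructions anywhere in the binary, not only in the popcount code. One quick way to see what a given set of flags enables is to look at the macros the compiler predefines. The sketch below assumes GCC or Clang; the function name is made up for illustration.

```cpp
#include <string>

// Lists the instruction-set macros the compiler predefined for this build.
// Assumption: GCC or Clang, which define __SSE3__, __SSE4_2__ and __POPCNT__
// according to the -m flags (or -march) in effect.
inline std::string enabled_isa() {
    std::string s;
#ifdef __SSE3__
    s += "sse3 ";
#endif
#ifdef __SSE4_2__
    s += "sse4.2 ";
#endif
#ifdef __POPCNT__
    s += "popcnt ";
#endif
    return s.empty() ? "baseline" : s;
}
```

Any feature that shows up here may be used by the compiler in generated code, so the resulting executable should only be run on CPUs that have it.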
But SSE3 is relatively old, so why does it come with the modern profile?
Ok, third time.
Why do you want to make life difficult for people with older hardware?
If it gains you anything, I can understand. If it gains you nothing, I do not understand.
I have measured. It gains you nothing.
So you just want to make life difficult for people with older hardware?
It's up to anybody to change the Makefile.
I use
PGOBENCH = ./$(EXE) bench 1024 1 4 default time
which results in a longer compile time.
CXXFLAGS += -msse4a -DUSE_POPCNT
and have deleted -msse.
CXXFLAGS += -Ofast -march=native
I have the feeling that native may give a small speed up.
So do what you like, but for the general distribution we should think of others' needs too.
Kind regards
Bernhard
BBauer wrote:It's up to anybody to change the Makefile.
I use
PGOBENCH = ./$(EXE) bench 1024 1 4 default time
which results in a longer compile time.
CXXFLAGS += -msse4a -DUSE_POPCNT
and have deleted -msse.
CXXFLAGS += -Ofast -march=native
I have the feeling that native may give a small speed up.
So do what you like, but for the general distribution we should think of others' needs too.
Kind regards
Bernhard