Stockfish haswell optimized build
Posted: Sun Apr 06, 2014 7:18 pm
by j_romang
Hello,
I just tried to make a Haswell-optimized build:
https://www.dropbox.com/s/ghbs1vw18q6q4 ... 8_bmi2.zip
Thanks to Ronald de Man's code, I implemented BMI2 instructions in Stockfish. This build also supports his Syzygy tablebases.
Please tell me if it works. You should see a ~4% speedup over the corresponding abrok.eu version. Of course, you need a Haswell processor to run it!
Please note that this is NOT an official build, just an experiment.
Re: Stockfish haswell optimized build
Posted: Sun Apr 06, 2014 8:55 pm
by Gerd Isenberg
j_romang wrote: Hello,
I just tried to make a Haswell-optimized build:
https://www.dropbox.com/s/ghbs1vw18q6q4 ... 8_bmi2.zip
Thanks to Ronald de Man's code, I implemented BMI2 instructions in Stockfish. This build also supports his Syzygy tablebases.
Please tell me if it works. You should see a ~4% speedup over the corresponding abrok.eu version. Of course, you need a Haswell processor to run it!
Please note that this is NOT an official build, just an experiment.
Is it this attack code instead of magics?
https://github.com/syzygy1/tb/blob/master/src/bmi2.h
Or is there something better in the meantime, or are there more BMI2 changes elsewhere?
Thanks,
Gerd
Re: Stockfish haswell optimized build
Posted: Sun Apr 06, 2014 8:57 pm
by j_romang
Re: Stockfish haswell optimized build
Posted: Sun Apr 06, 2014 9:13 pm
by Gerd Isenberg
Thanks, yes, Ronald's PDEP/PEXT 210.5k lookup approach. Wow, 4% is a huge speedup considering that only the attack getters changed, although of course this also affects memory and cache behaviour in other areas of the program. Assuming you also tried the PEXT-only approach with four times larger tables, Ronald had the right sense. Congratulations!
http://www.talkchess.com/forum/viewtopi ... 11&start=3
Cheers,
Gerd
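The table sizes being compared here can be reproduced with a little arithmetic. The sketch below is only an illustration: the helper names `relevant_bits` and `table_entries` are made up and do not come from Stockfish or from Ronald's code. It counts one table entry per subset of each square's relevant occupancy mask, which gives 102400 entries for rooks and 5248 for bishops, 107648 in total (the array size used elsewhere in this thread).

```cpp
#include <cassert>
#include <cstddef>

// Number of relevant occupancy bits for a slider on square sq (0 = a1, 63 = h8).
// The last square of each ray is excluded: a blocker there hides nothing.
int relevant_bits(int sq, bool rook) {
    assert(sq >= 0 && sq < 64);
    static const int dr[8] = {1, -1, 0, 0, 1, 1, -1, -1};
    static const int df[8] = {0, 0, 1, -1, 1, -1, 1, -1};
    int bits = 0;
    // Directions 0..3 are rook rays, 4..7 are bishop rays.
    for (int d = rook ? 0 : 4; d < (rook ? 4 : 8); ++d) {
        int r = sq / 8 + dr[d], f = sq % 8 + df[d];
        // Count a square only while the square behind it is still on the board.
        while (r + dr[d] >= 0 && r + dr[d] < 8 && f + df[d] >= 0 && f + df[d] < 8) {
            ++bits;
            r += dr[d];
            f += df[d];
        }
    }
    return bits;
}

// Total table entries for one piece type: one entry per subset of each mask.
std::size_t table_entries(bool rook) {
    std::size_t total = 0;
    for (int sq = 0; sq < 64; ++sq)
        total += std::size_t(1) << relevant_bits(sq, rook);
    return total;
}
```

With 16-bit entries the combined table takes 107648 * 2 = 215296 bytes; with 64-bit entries it takes 107648 * 8 = 861184 bytes, which is the factor of four mentioned above.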
Re: Stockfish haswell optimized build
Posted: Sun Apr 06, 2014 9:20 pm
by phenri
I probably won't have any use for this patch for some months, or even some years.
But thank you for the future.
I have a question that has nothing to do with BMI2.
I'm asking you because I do not really expect a response from Marco.
I am probably wrong, but I want to understand.
Why, in the makefile, does POPCNT come with the flag -msse3 instead of -msse4.2, when POPCNT is only present on architectures with at least SSE4.2?
And why is the flag -mpopcnt not included?
Code:
### 3.9 popcnt
ifeq ($(popcnt),yes)
CXXFLAGS += -msse3 -DUSE_POPCNT
endif
Same for prefetch: why is the flag so low?
Code:
### 3.7 prefetch
ifeq ($(prefetch),yes)
  ifeq ($(sse),yes)
    CXXFLAGS += -msse
    DEPENDFLAGS += -msse
  endif
else
  CXXFLAGS += -DNO_PREFETCH
endif
Regards,
Paul
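Whether `-mpopcnt` is needed depends on how the program issues the instruction, so a short illustration may help. This is only a sketch under stated assumptions, not Stockfish's actual code, and `popcount64` and `popcount_soft` are hypothetical names. GCC and Clang define the `__POPCNT__` macro when `-mpopcnt` is in effect, and `__builtin_popcountll` can then compile to the POPCNT instruction; without the flag the same builtin still works but uses a software routine. POPCNT also has its own CPUID feature bit, so it is not strictly tied to SSE4.2.

```cpp
#include <cassert>
#include <cstdint>

// Portable bit count (Kernighan's loop): clears the lowest set bit each pass.
inline int popcount_soft(uint64_t b) {
    int n = 0;
    for (; b; b &= b - 1)
        ++n;
    return n;
}

// Hypothetical wrapper: uses the compiler builtin only when POPCNT support
// was enabled at compile time (e.g. with -mpopcnt), else the portable loop.
inline int popcount64(uint64_t b) {
#if defined(__POPCNT__) && (defined(__GNUC__) || defined(__clang__))
    int n = __builtin_popcountll(b);
    assert(n == popcount_soft(b));  // cross-check, active in debug builds only
    return n;
#else
    return popcount_soft(b);
#endif
}
```

Compiling this once with and once without `-mpopcnt` and comparing the generated assembly shows which path was taken.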
Re: Stockfish haswell optimized build
Posted: Sun Apr 06, 2014 9:35 pm
by syzygy
It might be interesting to try this.
In bitboard.h change
Code:
struct BMI2Info {
  unsigned short *data;
  uint64_t mask1;
  uint64_t mask2;
};
extern unsigned short attack_table[107648];
extern struct BMI2Info bishop_bmi2[64];
extern struct BMI2Info rook_bmi2[64];
template<PieceType Pt>
inline Bitboard attacks_bb(Square s, Bitboard occ) {
  struct BMI2Info *info = (Pt == ROOK ? &rook_bmi2[s] : &bishop_bmi2[s]);
  return _pdep_u64(info->data[_pext_u64(occ, info->mask1)], info->mask2);
}
into
Code:
struct BMI2Info {
  uint64_t *data;
  uint64_t mask;
};
extern struct BMI2Info bishop_bmi2[64];
extern struct BMI2Info rook_bmi2[64];
template<PieceType Pt>
inline Bitboard attacks_bb(Square s, Bitboard occ) {
  struct BMI2Info *info = (Pt == ROOK ? &rook_bmi2[s] : &bishop_bmi2[s]);
  return info->data[_pext_u64(occ, info->mask)];
}
In bitboard.cpp change the lines
Code:
static unsigned short attacks_table[107648];
...
info[sq].mask1 = bb;
...
if (i == 0)
  info[sq].mask2 = bb2;
attacks_table[idx++] = _pext_u64(bb2, info[sq].mask2);
into
Code:
static uint64_t attacks_table[107648];
...
info[sq].mask = bb;
...
attacks_table[idx++] = bb2;
I did not test this, so maybe something is wrong or missing.
This might be faster and this might be slower, but it would be interesting to know.
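To make the two layouts concrete, here is a self-contained sketch that builds both tables and looks attacks up either way. It is not the code from the build or from bmi2.h: `pext` and `pdep` are portable software stand-ins for `_pext_u64` and `_pdep_u64` so that it runs without BMI2, and `slide`, `Tables`, `build`, `attacks_pext_pdep` and `attacks_pext_only` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bitboard = uint64_t;

// Portable stand-ins for _pext_u64 / _pdep_u64 so the sketch runs without BMI2.
Bitboard pext(Bitboard src, Bitboard mask) {
    Bitboard res = 0;
    for (Bitboard bit = 1; mask; bit <<= 1, mask &= mask - 1)
        if (src & mask & (~mask + 1)) res |= bit;  // test lowest set bit of mask
    return res;
}
Bitboard pdep(Bitboard src, Bitboard mask) {
    Bitboard res = 0;
    for (Bitboard bit = 1; mask; bit <<= 1, mask &= mask - 1)
        if (src & bit) res |= mask & (~mask + 1);  // deposit into lowest set bit
    return res;
}

// Slow reference: walk each ray from sq, stopping at the first blocker (included).
// With drop_last, the final square of every ray is left out (relevance mask).
Bitboard slide(int sq, Bitboard occ, bool rook, bool drop_last) {
    static const int dr[8] = {1, -1, 0, 0, 1, 1, -1, -1};
    static const int df[8] = {0, 0, 1, -1, 1, -1, 1, -1};
    Bitboard bb = 0;
    for (int d = rook ? 0 : 4; d < (rook ? 4 : 8); ++d) {
        int r = sq / 8 + dr[d], f = sq % 8 + df[d];
        while (r >= 0 && r < 8 && f >= 0 && f < 8) {
            int nr = r + dr[d], nf = f + df[d];
            bool last = nr < 0 || nr > 7 || nf < 0 || nf > 7;
            Bitboard b = Bitboard(1) << (r * 8 + f);
            if (!(drop_last && last)) bb |= b;
            if (occ & b) break;
            r = nr;
            f = nf;
        }
    }
    return bb;
}

struct Tables {
    std::vector<uint16_t> small;  // PEXT + PDEP layout: 2 bytes per entry
    std::vector<Bitboard> big;    // PEXT-only layout: 8 bytes per entry
    Bitboard mask1[64], mask2[64];
    std::size_t offset[64];
};

Tables build(bool rook) {
    Tables t;
    for (int sq = 0; sq < 64; ++sq) {
        t.mask1[sq] = slide(sq, 0, rook, true);   // relevant occupancy
        t.mask2[sq] = slide(sq, 0, rook, false);  // attacks on an empty board
        t.offset[sq] = t.big.size();
        Bitboard n = pext(~Bitboard(0), t.mask1[sq]) + 1;  // 2^popcount(mask1)
        for (Bitboard idx = 0; idx < n; ++idx) {
            Bitboard att = slide(sq, pdep(idx, t.mask1[sq]), rook, false);
            t.big.push_back(att);                                // full bitboard
            t.small.push_back(uint16_t(pext(att, t.mask2[sq])));  // compressed
        }
    }
    return t;
}

// 16-bit entry, expanded back to a bitboard with PDEP.
Bitboard attacks_pext_pdep(const Tables& t, int sq, Bitboard occ) {
    assert(sq >= 0 && sq < 64);
    return pdep(t.small[t.offset[sq] + pext(occ, t.mask1[sq])], t.mask2[sq]);
}

// 64-bit entry, returned directly.
Bitboard attacks_pext_only(const Tables& t, int sq, Bitboard occ) {
    assert(sq >= 0 && sq < 64);
    return t.big[t.offset[sq] + pext(occ, t.mask1[sq])];
}
```

Both lookups return the same bitboard for any occupancy. The trade-off is one extra PDEP per lookup against a table four times smaller, and which of the two wins in practice is exactly what would need to be measured.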
Re: Stockfish haswell optimized build
Posted: Sun Apr 06, 2014 9:39 pm
by syzygy
Gerd Isenberg wrote: Assuming you also tried the PEXT only approach with 4 times greater tables, Ronald had the right sense, congratulations!
I don't think he tried, but we can now find out.
Re: Stockfish haswell optimized build
Posted: Sun Apr 06, 2014 10:18 pm
by Gerd Isenberg
syzygy wrote: Gerd Isenberg wrote: Assuming you also tried the PEXT only approach with 4 times greater tables, Ronald had the right sense, congratulations!
I don't think he tried, but we can now find out.
It would be nice if Jean-Francois or you could try it and report: U64 versus pdep_u64(U16, mask2).
Re: Stockfish haswell optimized build
Posted: Sun Apr 06, 2014 10:26 pm
by j_romang
I didn't try.
Re: Stockfish haswell optimized build
Posted: Sun Apr 06, 2014 10:30 pm
by j_romang
Gerd Isenberg wrote:
Wow, 4% is a huge speedup considering only attack-getters are changed, but of course affecting other memory and cache issues in other areas of the program.
According to my profiling experiments, Stockfish spends about 5-6% of its time computing attack bitboards; that's why I wanted to give the PEXT solution a try.
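As a rough sanity check on these figures, Amdahl's law can be inverted to estimate how much faster the attack code itself would have to be. This is an assumption-laden sketch: it reads the 4% as an overall speed ratio of 1.04, attributes all of it to the attack getters, and `implied_local_speedup` is a made-up helper name.

```cpp
#include <cassert>

// Amdahl's law: overall = 1 / ((1 - p) + p / local), where p is the fraction
// of run time spent in the optimized part. Solved here for `local`.
double implied_local_speedup(double p, double overall) {
    assert(p > 0.0 && p < 1.0 && overall > 1.0);
    // Share of the original run time still spent in the optimized part.
    double remaining = 1.0 / overall - (1.0 - p);
    assert(remaining > 0.0);  // otherwise p alone cannot explain the speedup
    return p / remaining;
}
```

With p = 0.05 and an overall ratio of 1.04 this gives a factor of about 4.3 for the attack code; with p = 0.06 it gives about 2.8. Any cache effects elsewhere in the program would change that picture.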