The new instruction specs for Haswell have been released: http://software.intel.com/file/36945
The instructions, particularly the bit manipulation instructions, look pretty awesome for chess. Check out PEXT/PDEP: instead of magic bitboards, you can just get the attack mask, extract out the relevant bits with PEXT, do a table lookup of a 2-byte value (just a compressed attack bitboard), and use PDEP to decompress back to an attack bitboard. Too bad there's not a vector version too
ANDN/BLSI/BLSMSK/BLSR/BZHI/SH*X should speed up many common bitboard operations, too.
Haswell New Instructions
Moderators: hgm, Rebel, chrisw
-
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
-
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: Haswell New Instructions
Very interesting.
The BLSR/BLSI instruction reset/extract the lowest set bit. I'm not aware of compiler intrinsics for those operations, so I hope that compilers will detect idioms like "x & (x-1)" and "x & (-x)" and convert them to those instructions.
The BLSR/BLSI instruction reset/extract the lowest set bit. I'm not aware of compiler intrinsics for those operations, so I hope that compilers will detect idioms like "x & (x-1)" and "x & (-x)" and convert them to those instructions.
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Haswell New Instructions
Wonderful!Zach Wegner wrote:The new instruction specs for Haswell have been released: http://software.intel.com/file/36945
The instructions, particularly the bit manipulation instructions, look pretty awesome for chess. Check out PEXT/PDEP: instead of magic bitboards, you can just get the attack mask, extract out the relevant bits with PEXT, do a table lookup of a 2-byte value (just a compressed attack bitboard), and use PDEP to decompress back to an attack bitboard. Too bad there's not a vector version too
ANDN/BLSI/BLSMSK/BLSR/BZHI/SH*X should speed up many common bitboard operations, too.
Time to write some new stuff in CPW
Looking for latencies ...
Intrinsics are already listed in the paper but possibly not yet available.
Code: Select all
PEXT unsigned __int64 _pext_u64(unsigned __int64 src, unsigned __int32 mask);
PDEP unsigned __int64 _pdep_u64(unsigned __int64 src, unsigned __int32 mask);
BLSR unsigned __int64 _blsr_u64(unsigned __int64 src);
Gerd
-
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: Haswell New Instructions
There's a bug in their intrinsic definitions there: the 64-bit versions should take a 64-bit mask, of course.Gerd Isenberg wrote:Wonderful!Zach Wegner wrote:The new instruction specs for Haswell have been released: http://software.intel.com/file/36945
The instructions, particularly the bit manipulation instructions, look pretty awesome for chess. Check out PEXT/PDEP: instead of magic bitboards, you can just get the attack mask, extract out the relevant bits with PEXT, do a table lookup of a 2-byte value (just a compressed attack bitboard), and use PDEP to decompress back to an attack bitboard. Too bad there's not a vector version too
ANDN/BLSI/BLSMSK/BLSR/BZHI/SH*X should speed up many common bitboard operations, too.
Time to write some new stuff in CPW
Looking for latencies ...
Intrinsics are already listed in the paper but possibly not yet available.
Cheers,Code: Select all
PEXT unsigned __int64 _pext_u64(unsigned __int64 src, unsigned __int32 mask); PDEP unsigned __int64 _pdep_u64(unsigned __int64 src, unsigned __int32 mask); BLSR unsigned __int64 _blsr_u64(unsigned __int64 src);
Gerd
Can't wait to get my hand on some silicon...
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Haswell New Instructions
Using PEXT for indices would give away the advantage of magic multiply where most squares have magics that give you a 1-bit-smaller index. Maybe it will be faster than a multiply, and still worth it?Zach Wegner wrote:The new instruction specs for Haswell have been released: http://software.intel.com/file/36945
The instructions, particularly the bit manipulation instructions, look pretty awesome for chess. Check out PEXT/PDEP: instead of magic bitboards, you can just get the attack mask, extract out the relevant bits with PEXT, do a table lookup of a 2-byte value (just a compressed attack bitboard), and use PDEP to decompress back to an attack bitboard. Too bad there's not a vector version too
ANDN/BLSI/BLSMSK/BLSR/BZHI/SH*X should speed up many common bitboard operations, too.
I'm sure several of these instructions will find niche uses though for chessprogramming. Compressing attack bitboards and using PDEP to uncompress them does sound like an easy win.
-
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: Haswell New Instructions
Right... Unfortunately they're probably icc only, not gcc. But if gcc adds intrinsics for it (and I'm guessing they will), icc will probably add the same ones since it's always gcc compatible. So it's probably all good.Gerd Isenberg wrote: Intrinsics are already listed in the paper but possibly not yet available.
Cheers,Code: Select all
PEXT unsigned __int64 _pext_u64(unsigned __int64 src, unsigned __int32 mask); PDEP unsigned __int64 _pdep_u64(unsigned __int64 src, unsigned __int32 mask); BLSR unsigned __int64 _blsr_u64(unsigned __int64 src);
Gerd
-
- Posts: 778
- Joined: Sat Jul 01, 2006 7:11 am
Re: Haswell New Instructions
By not extracting the bits on the edges of the board and the bit of the moving piece, wouldn't you come up with 10 bits for rook attacks and 9 bits for bishop attacks, thus needing a 1K by 64 qword table for rooks and 512 by 64 qword table for bishops.wgarvin wrote:Using PEXT for indices would give away the advantage of magic multiply where most squares have magics that give you a 1-bit-smaller index. Maybe it will be faster than a multiply, and still worth it?Zach Wegner wrote:The new instruction specs for Haswell have been released: http://software.intel.com/file/36945
The instructions, particularly the bit manipulation instructions, look pretty awesome for chess. Check out PEXT/PDEP: instead of magic bitboards, you can just get the attack mask, extract out the relevant bits with PEXT, do a table lookup of a 2-byte value (just a compressed attack bitboard), and use PDEP to decompress back to an attack bitboard. Too bad there's not a vector version too
ANDN/BLSI/BLSMSK/BLSR/BZHI/SH*X should speed up many common bitboard operations, too.
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Haswell New Instructions
You'd extract exactly the same bits that are used as input to a variable-shift magic multiply. However, the index formed by magic mul, shift usually has 1 bit less. So the tables would approximately double in size.jwes wrote:By not extracting the bits on the edges of the board and the bit of the moving piece, wouldn't you come up with 10 bits for rook attacks and 9 bits for bishop attacks, thus needing a 1K by 64 qword table for rooks and 512 by 64 qword table for bishops.wgarvin wrote:Using PEXT for indices would give away the advantage of magic multiply where most squares have magics that give you a 1-bit-smaller index. Maybe it will be faster than a multiply, and still worth it?Zach Wegner wrote:The new instruction specs for Haswell have been released: http://software.intel.com/file/36945
The instructions, particularly the bit manipulation instructions, look pretty awesome for chess. Check out PEXT/PDEP: instead of magic bitboards, you can just get the attack mask, extract out the relevant bits with PEXT, do a table lookup of a 2-byte value (just a compressed attack bitboard), and use PDEP to decompress back to an attack bitboard. Too bad there's not a vector version too
ANDN/BLSI/BLSMSK/BLSR/BZHI/SH*X should speed up many common bitboard operations, too.
[edit: it might still be worth it though, since the PEXT replaces both the multiply and the variable shift... in effect you get the variable index sizes "for free" with PEXT, which is pretty nice]
However, the PDEP at the end shrinks table size by a factor of four and requires no extra memory indirection, so that seems like a pure win.
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Haswell New Instructions
I thought of another idea... If there wasn't already a dedicated POPCNT instruction, it could easily be implemented with a PEXT, TZCNT sequence [edit: or PEXT, NOT, LZCNT might be better, I guess]
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Haswell New Instructions
If pext/pdep pairs well, minimal Kindergarten BBs will get a new kick as well, ignoring redundancy of the outer squares for same extract and deposit masks.
Code: Select all
rankMask dq 64 dup (?) ; include squares
fileMask dq 64 dup (?) ; include squares
firstRankLookup db 256 dup (?)
; input: rcx square index 0..63, not changed
; rdx occupancy, not changed
; output: rax rook attacks
pext rax, rdx, quad word ptr [rankMask + 8*rcx]
pext r08, rdx, quad word ptr [fileMask + 8*rcx]
movzx rax, byte ptr [firstRankLookup + rax]
movzx r08, byte ptr [firstRankLookup + r08]
pdep rax, rax, quad word ptr [rankMask + 8*rcx]
pdep r08, r08, quad word ptr [fileMask + 8*rcx]
or rax, r08