Haswell New Instructions

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Sep 12, 2011 8:33 pm

Gerd Isenberg wrote:If pext/pdep pairs well, minimal Kindergarten BBs will get a new kick as well, ignoring redundancy of the outer squares for same extract and deposit masks.

rankMask dq 64 dup (?) ; include squares
fileMask dq 64 dup (?) ; include squares
firstRankLookup db 256 dup (?)

; input:  rcx square index 0..63, not changed
;         rdx occupancy, not changed
; output: rax rook attacks

  pext    rax, rdx, quad word ptr [rankMask + 8*rcx]
  pext    r08, rdx, quad word ptr [fileMask + 8*rcx]
  movzx   rax, byte ptr [firstRankLookup + rax]
  movzx   r08, byte ptr [firstRankLookup + r08]
  pdep    rax, rax, quad word ptr [rankMask + 8*rcx]
  pdep    r08, r08, quad word ptr [fileMask + 8*rcx]
  or      rax, r08

Oops, nonsense! One has to index firstRankLookup by the file or rank index as well, of course. So a little more effort.

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Sep 12, 2011 8:57 pm

Gerd Isenberg wrote:If pext/pdep pairs well, minimal Kindergarten BBs will get a new kick as well, ignoring redundancy of the outer squares for same extract and deposit masks.

Oops, nonsense! One has to index firstRankLookup by the file or rank index as well, of course. So a little more effort.

Something like this, taking the outer squares into account:

rankMaskEx dq 8 dup (?) ; exclude outer squares
fileMaskEx dq 8 dup (?) ; exclude outer squares
rankMask   dq 8 dup (?) ; include all squares
fileMask   dq 8 dup (?) ; include all squares
firstRankLookup db 512 dup (?) ;  firstRankLookup[occ 0:63][file index 0:7]

; input:  rcx square index 0..63
;         rdx occupancy
; output: rax rook attacks

  mov     r08, rcx
  and     rcx, 7 ; file index
  shr     r08, 3 ; rank index
  pext    rax, rdx, quad word ptr [rankMaskEx + 8*r08]  ; inner six bits
  pext    rdx, rdx, quad word ptr [fileMaskEx + 8*rcx]  ; inner six bits
  movzx   rax, byte ptr [firstRankLookup + rcx + 8*rax]
  movzx   rdx, byte ptr [firstRankLookup + r08 + 8*rdx]
  pdep    rax, rax, quad word ptr [rankMask + 8*r08]
  pdep    rdx, rdx, quad word ptr [fileMask + 8*rcx]
  or      rax, rdx
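
To make the indexing concrete, here is a rough Python model of the routine above. It is only a sketch: `pext` and `pdep` are slow software stand-ins for the BMI2 instructions, `line_attacks` and `rook_attacks_slow` are hypothetical helpers added for illustration and cross-checking, and the layout square = 8*rank + file with a1 = 0 is assumed from the `and rcx, 7` / `shr r08, 3` split.

```python
# Software models of BMI2 PEXT/PDEP; real code would use the instructions.
def pext(x, mask):
    """Gather the bits of x selected by mask into the low bits of the result."""
    out, k = 0, 0
    while mask:
        low = mask & -mask            # lowest set bit of mask
        if x & low:
            out |= 1 << k
        k += 1
        mask ^= low
    return out

def pdep(x, mask):
    """Scatter the low bits of x to the bit positions set in mask."""
    out, k = 0, 0
    while mask:
        low = mask & -mask
        if (x >> k) & 1:
            out |= low
        k += 1
        mask ^= low
    return out

# Masks indexed by rank or file index, mirroring the arrays in the post.
RANK_MASK    = [0xFF << (8 * r) for r in range(8)]
RANK_MASK_EX = [0x7E << (8 * r) for r in range(8)]             # inner six squares
FILE_MASK    = [0x0101010101010101 << f for f in range(8)]
FILE_MASK_EX = [0x0001010101010100 << f for f in range(8)]     # inner six squares

def line_attacks(pos, occ6):
    """Attacks on one 8-square line for a slider at pos, given the inner six occupancy bits."""
    occ = occ6 << 1                   # inner bits sit at line positions 1..6
    att = 0
    for step in (1, -1):
        i = pos + step
        while 0 <= i < 8:
            att |= 1 << i
            if (occ >> i) & 1:        # blocker: include it, then stop
                break
            i += step
    return att

# firstRankLookup[occ 0..63][line index 0..7], flattened as 8*occ + index (512 bytes).
FIRST_RANK_LOOKUP = bytes(line_attacks(i & 7, i >> 3) for i in range(512))

def rook_attacks(sq, occ):
    f, r = sq & 7, sq >> 3                          # file index, rank index
    rank_occ = pext(occ, RANK_MASK_EX[r])           # inner six bits of the rank
    file_occ = pext(occ, FILE_MASK_EX[f])           # inner six bits of the file
    rank_att = FIRST_RANK_LOOKUP[f + 8 * rank_occ]  # slider position on a rank = file index
    file_att = FIRST_RANK_LOOKUP[r + 8 * file_occ]  # slider position on a file = rank index
    return pdep(rank_att, RANK_MASK[r]) | pdep(file_att, FILE_MASK[f])

def rook_attacks_slow(sq, occ):
    """Ray-walking reference, used only to cross-check the table version."""
    att = 0
    r0, f0 = divmod(sq, 8)
    for dr, df in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, f = r0 + dr, f0 + df
        while 0 <= r < 8 and 0 <= f < 8:
            bit = 1 << (8 * r + f)
            att |= bit
            if occ & bit:
                break
            r, f = r + dr, f + df
    return att
```

The same 512-byte table serves both directions because PEXT on a file mask delivers the file occupancy in rank order, so a file looks like a rank to the lookup.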

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Sep 12, 2011 10:20 pm

Ok, the magic alternative with ~200 KB for rooks (roughly 10 KB for bishops), preserving the input parameters, looks really promising:

; input:  rcx square index 0..63
;         rdx occupancy
; output: rax rook attacks

  pext    rax, rdx, quad word ptr [rookMaskEx + 8*rcx]  ; 10,11,12 bits
  mov     r10, [rookSquarePointer + 8*rcx] ; pointer to square array
  movzx   rax, word ptr [r10 + 2*rax]
  pdep    rax, rax, quad word ptr [rookMask + 8*rcx]
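
A rough Python model of this pext, 16-bit lookup, pdep sequence may help to see how the tables would be filled. It is a sketch under assumptions: `pext` and `pdep` are software stand-ins for the instructions, `rook_mask_ex` and `rook_mask` play the roles of `rookMaskEx` and `rookMask`, `rook_table(sq)` stands in for the array behind `rookSquarePointer`, and the remaining helpers are hypothetical.

```python
from array import array
from functools import lru_cache

def pext(x, mask):
    """Software model of PEXT: gather the masked bits of x into the low bits."""
    out, k = 0, 0
    while mask:
        low = mask & -mask
        if x & low:
            out |= 1 << k
        k += 1
        mask ^= low
    return out

def pdep(x, mask):
    """Software model of PDEP: scatter the low bits of x to the mask positions."""
    out, k = 0, 0
    while mask:
        low = mask & -mask
        if (x >> k) & 1:
            out |= low
        k += 1
        mask ^= low
    return out

ROOK_DIRS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def rook_rays(sq):
    """Yield, per direction, the squares from sq to the board edge (a1 = 0 assumed)."""
    r0, f0 = divmod(sq, 8)
    for dr, df in ROOK_DIRS:
        ray, r, f = [], r0 + dr, f0 + df
        while 0 <= r < 8 and 0 <= f < 8:
            ray.append(8 * r + f)
            r, f = r + dr, f + df
        yield ray

def rook_attacks_slow(sq, occ):
    """Ray-walking reference used to fill and to check the tables."""
    att = 0
    for ray in rook_rays(sq):
        for s in ray:
            att |= 1 << s
            if (occ >> s) & 1:
                break
    return att

def rook_mask(sq):
    """All squares a rook attacks on an empty board (at most 14 bits)."""
    return rook_attacks_slow(sq, 0)

def rook_mask_ex(sq):
    """Relevant occupancy: every ray square except the last one before the edge."""
    return sum(1 << s for ray in rook_rays(sq) for s in ray[:-1])

@lru_cache(maxsize=None)
def rook_table(sq):
    """Per-square array of 16-bit compressed attack sets, indexed by pext(occ, mask_ex)."""
    mask, mask_ex = rook_mask(sq), rook_mask_ex(sq)
    size = 1 << bin(mask_ex).count("1")            # 2^10, 2^11 or 2^12 entries
    return array("H", (pext(rook_attacks_slow(sq, pdep(i, mask_ex)), mask)
                       for i in range(size)))

def rook_attacks(sq, occ):
    idx = pext(occ, rook_mask_ex(sq))                # 10, 11 or 12 index bits
    return pdep(rook_table(sq)[idx], rook_mask(sq))  # expand 16 bits to a bitboard
```

Each entry fits into 16 bits because the empty-board attack set of a rook has at most 14 squares, which is what makes the 2-byte compression possible.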

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Sep 12, 2011 10:43 pm

Or pext only, with the conventional magic array sizes. pext saves the "and", mul and shift, not to mention the lookups of the magic factor and the variable shift amount:

; input:  rcx square index 0..63
;         rdx occupancy
; output: rax rook attacks

  pext    rax, rdx, quad word ptr [rookMaskEx + 8*rcx]  ; 10,11,12 bits
  mov     r10, [rookSquarePointer + 8*rcx] ; pointer to square array
  mov     rax, quad word ptr [r10 + 8*rax] ; full 64-bit attack bitboard
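
For comparison, the table sizes of the two variants can be worked out with a few lines of Python. This is a back-of-the-envelope sketch that assumes plain, unshared per-square tables with one entry per relevant-occupancy pattern; `relevant_bits` and `table_entries` are hypothetical helper names.

```python
ROOK_DIRS = ((1, 0), (-1, 0), (0, 1), (0, -1))
BISHOP_DIRS = ((1, 1), (1, -1), (-1, 1), (-1, -1))

def relevant_bits(sq, dirs):
    """Number of occupancy bits that matter for a slider on sq (a1 = 0 assumed)."""
    r0, f0 = divmod(sq, 8)
    n = 0
    for dr, df in dirs:
        r, f = r0 + dr, f0 + df
        # Count a ray square only if another square follows it,
        # i.e. skip the last square before the board edge.
        while 0 <= r + dr < 8 and 0 <= f + df < 8:
            n += 1
            r, f = r + dr, f + df
    return n

def table_entries(dirs):
    """Total entries over all 64 squares, one per occupancy pattern."""
    return sum(1 << relevant_bits(sq, dirs) for sq in range(64))

for name, dirs in (("rook", ROOK_DIRS), ("bishop", BISHOP_DIRS)):
    n = table_entries(dirs)
    print(f"{name}: {n} entries, {2 * n} bytes as 16-bit, {8 * n} bytes as 64-bit")
```

Under these assumptions the rook tables have 102400 entries (204800 bytes with 2-byte entries, 819200 bytes with full bitboards) and the bishop tables 5248 entries (10496 and 41984 bytes).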

ibid · Post by **ibid** » Tue Sep 13, 2011 11:27 am

Zach Wegner wrote:The instructions, particularly the bit manipulation instructions, look pretty awesome for chess. Check out PEXT/PDEP: instead of magic bitboards, you can just get the attack mask, extract out the relevant bits with PEXT, do a table lookup of a 2-byte value (just a compressed attack bitboard), and use PDEP to decompress back to an attack bitboard. Too bad there's not a vector version too :)

Excellent. I've been waiting for those two instructions for many years.
Always seemed like an "obvious" way to do it.

You can save a little memory on the 2-byte tables too. For example,
the rook tables for h1 and a8 would be identical. Doesn't seem to work
for many tables though.
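
The h1/a8 observation can be checked with a small Python sketch. It assumes the usual a1 = 0 square numbering and tables built as "PEXT index to PEXT-compressed attack set"; `pext` and `pdep` are software stand-ins for the instructions, and `compressed_rook_table` is a hypothetical helper, not code from any engine.

```python
def pext(x, mask):
    """Software model of PEXT: gather the masked bits of x into the low bits."""
    out, k = 0, 0
    while mask:
        low = mask & -mask
        if x & low:
            out |= 1 << k
        k += 1
        mask ^= low
    return out

def pdep(x, mask):
    """Software model of PDEP: scatter the low bits of x to the mask positions."""
    out, k = 0, 0
    while mask:
        low = mask & -mask
        if (x >> k) & 1:
            out |= low
        k += 1
        mask ^= low
    return out

def rook_rays(sq):
    """Yield, per direction, the squares from sq to the board edge."""
    r0, f0 = divmod(sq, 8)
    for dr, df in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ray, r, f = [], r0 + dr, f0 + df
        while 0 <= r < 8 and 0 <= f < 8:
            ray.append(8 * r + f)
            r, f = r + dr, f + df
        yield ray

def rook_attacks_slow(sq, occ):
    """Ray-walking rook attacks, stopping at the first blocker in each direction."""
    att = 0
    for ray in rook_rays(sq):
        for s in ray:
            att |= 1 << s
            if (occ >> s) & 1:
                break
    return att

def compressed_rook_table(sq):
    """Tuple of 2-byte values: entry i is the compressed attack set for PEXT index i."""
    mask = rook_attacks_slow(sq, 0)
    mask_ex = sum(1 << s for ray in rook_rays(sq) for s in ray[:-1])
    size = 1 << bin(mask_ex).count("1")
    return tuple(pext(rook_attacks_slow(sq, pdep(i, mask_ex)), mask)
                 for i in range(size))

H1, A8 = 7, 56
print(compressed_rook_table(H1) == compressed_rook_table(A8))  # True
```

The two tables coincide because mirroring the board along the a1-h8 diagonal maps h1 to a8 and keeps the bit order of both the extract and the deposit mask. Most other symmetries reverse the bit order of at least one ray, which would explain why only a few tables can be shared.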
