Oups, nonsense! One has to index firstRankLookup by file or rank indices of course. So a little more effort.Gerd Isenberg wrote:If pext/pdep pairs well, minimal Kindergarten BBs will get a new kick as well, ignoring redundancy of the outer squares for same extract and deposit masks.
Code: Select all
rankMask dq 64 dup (?) ; include squares fileMask dq 64 dup (?) ; include squares firstRankLookup db 256 dup (?) ; input: rcx square index 0..63, not changed ; rdx occupancy, not changed ; output: rax rook attacks pext rax, rdx, quad word ptr [rankMask + 8*rcx] pext r08, rdx, quad word ptr [fileMask + 8*rcx] movzx rax, byte ptr [firstRankLookup + rax] movzx r08, byte ptr [firstRankLookup + r08] pdep rax, rax, quad word ptr [rankMask + 8*rcx] pdep r08, r08, quad word ptr [fileMask + 8*rcx] or rax, r08
Haswell New Instructions
Moderators: hgm, Rebel, chrisw
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Haswell New Instructions
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Haswell New Instructions
something like this considering outer squares ...Gerd Isenberg wrote:If pext/pdep pairs well, minimal Kindergarten BBs will get a new kick as well, ignoring redundancy of the outer squares for same extract and deposit masks.
Oups, nonsense! One has to index firstRankLookup by file or rank indices of course. So a little more effort.
Code: Select all
rankMaskEx dq 8 dup (?) ; exclude outer squares
fileMaskEx dq 8 dup (?) ; exclude outer squares
rankMask dq 8 dup (?) ; include all squares
fileMask dq 8 dup (?) ; include all squares
firstRankLookup db 512 dup (?) ; firstRankLookup[occ 0:63][file index 0:7]
; input: rcx square index 0..63
; rdx occupancy
; output: rax rook attacks
mov r08, rcx
and rcx, 7 ; file index
shr r08, 3 ; rank index
pext rax, rdx, quad word ptr [rankMaskEx + 8*r08] ; inner six bits
pext rdx, rdx, quad word ptr [fileMaskEx + 8*rcx] ; inner six bits
movzx rax, byte ptr [firstRankLookup + rcx + 8*rax]
movzx rdx, byte ptr [firstRankLookup + r08 + 8*rdx]
pdep rax, rax, quad word ptr [rankMask + 8*r08]
pdep rdx, rdx, quad word ptr [fileMask + 8*rcx]
or rax, rdx
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Haswell New Instructions
Ok, the magic alternative with ~200 KB (< 10K for Bishops), preserving the input parameters, looks really promising:
Code: Select all
; input: rcx square index 0..63
; rdx occupancy
; output: rax rook attacks
pext rax, rdx, quad word ptr [rookMaskEx + 8*rcx] ; 10,11,12 bits
mov r10, [rookSquarePointer + 8*rcx] ; pointer to square array
movzx rax, word ptr [r10 + 2*rax]
pdep rax, rax, quad word ptr [rookMask + 8*rcx]
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Haswell New Instructions
Or pext only with the conventional magic array sizes, pext saves "and", mul, shift, not to mention the lookup of the factor and variable shift:
Code: Select all
; input: rcx square index 0..63
; rdx occupancy
; output: rax rook attacks
pext rax, rdx, quad word ptr [rookMaskEx + 8*rcx] ; 10,11,12 bits
mov r10, [rookSquarePointer + 8*rcx] ; pointer to square array
mov rax, word ptr [r10 + 8*rax]
-
- Posts: 89
- Joined: Mon Jun 13, 2011 12:09 pm
Re: Haswell New Instructions
Excellent. I've been waiting for those two instructions for many years.Zach Wegner wrote:The instructions, particularly the bit manipulation instructions, look pretty awesome for chess. Check out PEXT/PDEP: instead of magic bitboards, you can just get the attack mask, extract out the relevant bits with PEXT, do a table lookup of a 2-byte value (just a compressed attack bitboard), and use PDEP to decompress back to an attack bitboard. Too bad there's not a vector version too :)
Always seemed like an "obvious" way to do it.
You can save a little memory on the 2-byte tables too. For example,
the rook tables for h1 and a8 would be identical. Doesn't seem to work
for many tables though.