Gerd Isenberg wrote:
OK, that looks not that optimal. Is that GCC?
All memory->xmm transfer goes via general purpose reg, may be a matter of 16 byte alignment of the data. Also the final transfer from xmm0 low qword to rax goes via stack. Also the function should be inlined.

Yes, gcc-4.2. Presumably compilers know to align __m128i data correctly. I showed the out-of-line code, but the function is indeed inlined.
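On the alignment point: movaps needs its memory operand on a 16-byte boundary, so the tables have to be aligned for the compiler to use it. A minimal sketch of how that could be checked in plain C11 follows. `Vec128`, `table` and `is_aligned16` are made-up names standing in for __m128i and the real lookup tables, not code from this thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector type such as __m128i:
   two 64-bit lanes, forced onto a 16-byte boundary. */
typedef struct {
    _Alignas(16) uint64_t lane[2];
} Vec128;

/* Hypothetical per-square table, one 128-bit entry per square. */
static Vec128 table[64];

/* Returns 1 if p sits on a 16-byte boundary, which is what an
   aligned load such as movaps requires of its memory operand. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}
```

Since every entry is 16 bytes wide and the type carries the alignment, each element of the array is aligned, not just the first one.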
llvm-gcc-4.2 gives this:
Code:
_bishopAttacks:
	pushq	%rbp
	movq	%rsp, %rbp
	movslq	%esi, %rax
	shlq	$4, %rax
	movq	_singleBitsXMM@GOTPCREL(%rip), %rcx
	movaps	(%rcx,%rax), %xmm0
	movq	_diaAntiMaskXMM@GOTPCREL(%rip), %rcx
	movaps	(%rcx,%rax), %xmm1
	movd	%rdi, %xmm2
	movlhps	%xmm2, %xmm2
	andps	%xmm1, %xmm2
	movaps	%xmm2, %xmm3
	psubq	%xmm0, %xmm3
	movq	_swapMaskXMM@GOTPCREL(%rip), %rax
	movaps	(%rax), %xmm4
	pshufb	%xmm4, %xmm0
	pshufb	%xmm4, %xmm2
	psubq	%xmm0, %xmm2
	pshufb	%xmm4, %xmm2
	xorps	%xmm3, %xmm2
	andps	%xmm1, %xmm2
	movaps	%xmm2, %xmm0
	shufpd	$3, %xmm0, %xmm0
	paddq	%xmm2, %xmm0
	movd	%xmm0, %rax
	popq	%rbp
	ret
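
Reading the listing back into C: each 64-bit half of the xmm registers appears to run the byte-swap subtraction trick on one of the two lines through the square, and the final shufpd/paddq adds the two halves together. Below is a scalar sketch of that reading, one lane at a time. It is not the actual source. It assumes squares are numbered rank*8 + file and that the masks exclude the bishop's own square. `bswap64`, `line_mask`, `line_attacks` and `bishop_attacks` are made-up names, and real code would load the masks from tables like diaAntiMaskXMM rather than compute them.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 64-bit word (the listing uses pshufb per lane). */
static uint64_t bswap64(uint64_t x) {
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* Hypothetical helper: all squares on the line through sq with step
   (df, dr) and its opposite, excluding sq itself. */
static uint64_t line_mask(int sq, int df, int dr) {
    uint64_t m = 0;
    for (int sign = -1; sign <= 1; sign += 2) {
        int f = (sq & 7) + sign * df;
        int r = (sq >> 3) + sign * dr;
        while (f >= 0 && f < 8 && r >= 0 && r < 8) {
            m |= 1ULL << (r * 8 + f);
            f += sign * df;
            r += sign * dr;
        }
    }
    return m;
}

/* One 64-bit lane of the listing. */
static uint64_t line_attacks(uint64_t occ, uint64_t bit, uint64_t mask) {
    uint64_t fwd = occ & mask;           /* andps: blockers on this line   */
    uint64_t rev = bswap64(fwd);         /* pshufb: flip the rank order    */
    fwd -= bit;                          /* psubq: borrow runs upward      */
    rev -= bswap64(bit);                 /* psubq on the flipped board     */
    return (fwd ^ bswap64(rev)) & mask;  /* pshufb, xorps, andps           */
}

/* Both lanes combined. The two lines share no square once sq is
   excluded, so the add (paddq in the listing) acts like an OR. */
uint64_t bishop_attacks(uint64_t occ, int sq) {
    assert(sq >= 0 && sq < 64);
    uint64_t bit = 1ULL << sq;
    uint64_t dia  = line_attacks(occ, bit, line_mask(sq, 1, 1));
    uint64_t anti = line_attacks(occ, bit, line_mask(sq, 1, -1));
    return dia + anti;
}
```

Under this numbering d4 is square 27, and on an empty board `bishop_attacks(0, 27)` comes back with 13 bits set: 7 on the a1-h8 diagonal and 6 on the a7-g1 anti-diagonal.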
Robert P.