I was thinking: usually the problem with magics is that multiplications cannot shift bits downwards. So it is the squares near the upper end of the baord that cause most problems. Also, with pure shifts, you need different instructions for shifting left and right.
But i386/x64 architecture also has a rotate instruction, which doesn't suffer from that limitation. As rotating N bits right is the same as rotating 64-N bits left. So why not use the rotate instruction first to get the entire King neighborhood in a fixed set of bits, and apply a magic multiplyer on that?
In particular, for a King it seems a good thing to rotate the move set right by the square number, to get the square at the King location itsef into the LSB. That would give you the word
e00000fg h0000000 .... 000000ab c00000d0
The important thing is that the rotate would move squares e-h (which would be off board if the King really was on a1) to the upper end of the word. Now we could bring the squares a-d there as well by multiplying wth 0x2010000000000009ull:
Code: Select all
e00000fg h0000000 .... 000000ab c00000d0 (<< 0)
000fgh00 00000000 .... 000abc00 000d0000 (<< 3)
00abc000 00d00000 .... 00000000 00000000 (<< 52)
0d000000 00000000 .... 00000000 00000000 (<< 61)
___________________________________________ +
ed???hfg h0d00000 ...
We see that the upper part edabc is contaminated by adding fg, (which was needed to get h 'in view'), but since g and h are uncontaminated in the two lowest bits of the upper byte, it would be possible to correct for this contamination, proving uniqueness. The prescription would thus be:
x = kingMoves[sq] & ~friends;
x = x << sq | x >> 64-sq; // this is a single rotate instruction
x *= 0x2010000000000009ull; // magic multiplier
x >>= 56; // bring to lower byte
Note that the magic imultiplier s now the same for every square, so that it doesn't have to be looked up. Presumably the elimination of this load operation would compensate the extra rotate instruction. It also reduces cache pressure by eliminating the magics table. Also note that the mapping between index values and move patterns relative to the King is now also always the same.
It would be nice if C was extended with an operator >< for 'rotate'.
