Compact attack map

hgm · Post by **hgm** » Tue May 17, 2022 12:29 pm

In The Mailbox Trials I used an attack map of 32 uint64_t (one for every piece): to make it easy to merge the attackers sets of four minors such that they would extract in LVA order, the attacker bits were spaced out as one per nibble. A second bit in each nibble could be used to record the protectors. But that would still leave half the bits unused, and thus the attack map twice as large as needed. Which makes the copy step on the copy-make update in MakeMove() twice as expensive as needed.

So I was wondering if it wouldn't be better to use 32-bit integers for the attack-map elements, and expand these to 64 bits when they are used for capture extraction. This is possible with a single extra multiply, if in the 32-bit format the Pawns are interlieved with the other pieces:

Code: Select all

                                pPkKpPqQpPrRpPrRpPbBpPbBpPnNpPnN  a = (uint64_t) attackers[victim];
KpPqQpPrRpPrRpPbBpPbBpPnNpPnN00pPkKpPqQpPrRpPrRpPbBpPbBpPnNpPnN0  a *= (8LL<<32) + 2;
K000Q000R000R000B000B000N000N000P000P000P000P000P000P000P000P000  a &= 0x8888888888888888LL;

The AND operation would be needed anyway to select the attackers of the opponent color, so the only extra operation is a multiply by a constant. You would have to do that for all possible victims, though, i.e. 15 times. (We will never need to extract captures on the King!)

The 64-bit attack map could be coppied with 32 load and 32 store instructions, the 32-bit attack map with 16 load and 16 store instructions (all transferring 64-bit words). So we would save 17 instructions. What might even be more important is that we reduce pressure on the cache bandwith, which is likely the bottleneck in the attack-map upddate.

I guess you could also expand only after merging the attackers sets on two victims:

Code: Select all

                                pPkKpPqQpPrRpPrRpPbBpPbBpPnNpPnN  a = (uint64_t) attackers[victim1];
                                pPkKpPqQpPrRpPrRpPbBpPbBpPnNpPnN  b = (uint64_t) attackers[victim2];
                                0P0K0P0Q0P0R0P0R0P0B0P0B0P0N0P0N  a &= 0x55555555;
                                0P0K0P0Q0P0R0P0R0P0B0P0B0P0N0P0N  b &= 0x55555555;
                                PPKKPPQQPPRRPPRRPPBBPPBBPPNNPPNN  a += 2*b;
KKPPQQPPRRPPRRPPBBPPBBPPNNPPNN00PPKKPPQQPPRRPPRRPPBBPPBBPPNNPPNN  a *= (4LL<<32) + 1;
KK00QQ00RR00RR00BB00BB00NN00NN00PP00PP00PP00PP00PP00PP00PP00PP00  a &= 0xC0C0C0C0C0C0C0C0LL;

This requires 7 instructions to merge and expand two attackers sets. (The a += 2*b can be done with a LEA instruction.) Expanding the sets individually before merge would require 2 x 3 (as above), plus a shift and OR for the merge, a total of 8 instructions.

I guess one can load the two attackers sets of two victims in the same value group with a single 64-bit load, effectively concatenating them. Then you could mask out the protector bits with a single AND, and use multiply + shift to interleave them:

Code: Select all

pPkKpPqQpPrRpPrRpPbBpPbBpPnNpPnNpPkKpPqQpPrRpPrRpPbBpPbBpPnNpPnN  a = *(unit64_t*)(attackers [victim1]);
0P0K0P0Q0P0R0P0R0P0B0P0B0P0N0P0N0P0K0P0Q0P0R0P0R0P0B0P0B0P0N0P0N  a &= 0x5555555555555555LL;
PPKKPPQQPPRRPPRRPPBBPPBBPPNNPPNN0P0K0P0Q0P0R0P0R0P0B0P0B0P0N0P0N  a *= (2<<32) + 1;
                                PPKKPPQQPPRRPPRRPPBBPPBBPPNNPPNN  a >>= 32;

That takes only 4 instructions for the initial merge, and 2 more for expanding (as above), 6 in total.

Merging two attackers sets in the 64-bit format would have taken only 4 instructions (two loads, a LEA for a += 4*b, and an AND with 0x5555555555555555LL). So the unpacking takes 2 extra instructions per pair. For 8 pairs that is still 16 instructions. So there is not much to gain here.