Code: Select all
// generation:
moveList[nrOfMoves++] = { fromSqr, toSqr, SORTKEY(piece, victim) };
// searching:
for(k=0; k<nrOfMoves; k++) {
move = moveList[k];
for(i=k+1; i<nrOfMoves; i++) if(moveList[i].key > move.key) SWAP(move, moveList[i]); // extract next move
moveList[k] = move; // in case we will make a second pass through the move list, e.g. for IID
// search the move
MakeMove(move.from, move.to);
...
}
This can be improved upon a lot if the sort key is a simple MVV/LVA. The idea is to store the generated captures in an attack map rather than a move list. In such a map each capture is represented by a single bit. The captures on the same victim are packed in the same word, and the bits within such a word (representing different attackers) are assigned such that they would extract in LVA order. The loop over moves can then simply extract the next bit from the attackers of the next victim to know what it has to search next, without additional sorting. So
Code: Select all
// generation:
attackers[victim] |= attackBit[piece];
// searching
for(victim=QUEEN; victim >= LASTPAWN; victim--) { // run through opponent pieces top to bottom
toSqr = location[victim];
h = attackers[victim];
while(h) { // extract attackers low to high
int n = BSF(h);
piece = fromTable[n];
fromSqr = location[piece];
// search the corresponding move
MakeMove(fromSqr, toSqr);
...
h &= h - 1;
}
}
If running through all potential victims is inefficient (e.g. because few pieces are left), we could maintain a separate bitmap 'attacked' to indicate which pieces are attacked (the bits assigned so they extract in MVV order), and would get
Code: Select all
// generation:
attackers[victim] |= attackBit[piece]; attacked |= victimBit[victim];
// searching
v = attacked;
while(v) { // run through opponent pieces that are actually attacked, top to bottom
int n = BSF(v);
victim = victimTable[n];
toSqr = location[victim];
h = attackers[victim];
do { // extract attackers low to high
int n = BSF(h);
piece = fromTable[n];
fromSqr = location[piece];
// search the corresponding move
MakeMove(fromSqr, toSqr);
...
h &= h - 1;
} while(h);
}
A remaining problem is that captures on equal-valued victims are still strictly grouped per victim: e.g. every capture on one Bishop, even QxB, would be searched before any capture on the other Bishop or on a Knight, while pure MVV/LVA would treat all minors as equivalent victims and order their captures by attacker only. This can be solved by making the attackers[] elements overly large, and spreading out the bits in them, so that there is enough spacing between them to slip in the corresponding bits of another piece of the same value. There are only 16 possible attackers, but if we make attackers[] an array of uint64_t, we can keep 3 unused bits between the different attackers, like:
MSB 000K000Q000R000R000B000B000N000N000P000P000P000P000P000P000P000P LSB
When we get to searching captures of minors, we can then first merge attackers[B1] + 2*attackers[B2] + 4*attackers[N1] + 8*attackers[N2], before going into an extraction loop as above, to search all captures on the minors in perfect MVV/LVA order. For captures on the Rooks we would merge two attackers[] elements, while captures of the Queen would require no merging.
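As an illustration (a sketch of my own, not literal code from the post), the merged extraction for the minors could look like this; minorVictim[] is a hypothetical table mapping the two low bits of the extracted bit number back to the victim, while fromTable[] is reused for the attacker:
Code: Select all
// all captures on the four minors, searched in one pass in LVA order
static const int minorVictim[4] = { B1, B2, N1, N2 };   // piece numbers of the minors

uint64_t m = attackers[B1] + 2*attackers[B2] + 4*attackers[N1] + 8*attackers[N2];
while(m) {                        // extract low to high: least valuable attacker first
    int n = BSF(m);
    piece  = fromTable[n & ~3];   // the nibble identifies the attacker, as before
    victim = minorVictim[n & 3];  // the offset within the nibble identifies the victim
    MakeMove(location[piece], location[victim]);
    ...
    m &= m - 1;
}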
The captures on the Pawns are still problematic, though, because there can be 8 Pawns, and all potential captures on those will not fit in a uint64_t. We can at most merge the captures on 4 Pawns, splitting the Pawn value group into two. But then the problem returns that QxP on a Pawn of the first value group would be searched before NxP on a Pawn in the second. We can at least make sure all PxP captures are searched before any others, by concatenating the P sections of the two merged attacker sets. After merging the attackers of 4 Pawns the merged set looks like
MSB KKKKQQQQRRRRRRRRBBBBBBBBNNNNNNNNPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP LSB
By taking (mergeP1 & 0xFFFFFFFFull) | mergeP2 << 32 we get all PxP attacks in one set, which can go into the usual attacker extraction loop (except that it has to use a different fromTable[]). Captures of Pawns by other pieces can be merged in another uint64_t (from which we extract captures after the PxP), albeit in a more complex way. If we calculate (mergeP1 >> 16 & ~0xFFFFull) | (mergeP2 >> 32 & 0xFFFF), masking away the PxP bits of group 1 that were already extracted, we would get
MSB 0000000000000000KKKKQQQQRRRRRRRRBBBBBBBBNNNNNNNNBBBBBBBBNNNNNNNN LSB
where the two low-order sections (the N and B attackers in bits 0-15) come from mergeP2. Rather than trying to fudge in the K, Q and R of mergeP2 in the right place, we could just mask away the K and Q attackers from P1 and run it through an extraction loop (which also does the RxP captures on P-group 1). The remaining captures can be collected in a third merge, as (mergeP2 >> 48 & 0xFFF) | (mergeP1 >> 44 & 0xFF000) | (mergeP2 & (0xFull << 60)), where the three terms in parentheses are
0000000000000000000000000000000000000000000000000000QQQQRRRRRRRR
00000000000000000000000000000000000000000000KKKKQQQQ000000000000
KKKK000000000000000000000000000000000000000000000000000000000000
which OR together to
KKKK0000000000000000000000000000000000000000KKKKQQQQQQQQRRRRRRRR
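Under the same assumptions, the whole bulk sort for captures on the Pawns could be sketched as below (my own code; PAWN1..PAWN8 are hypothetical piece numbers for the two Pawn value groups, and the middle set folds in the K/Q masking described above):
Code: Select all
uint64_t mergeP1 = attackers[PAWN1] + 2*attackers[PAWN2] + 4*attackers[PAWN3] + 8*attackers[PAWN4];
uint64_t mergeP2 = attackers[PAWN5] + 2*attackers[PAWN6] + 4*attackers[PAWN7] + 8*attackers[PAWN8];

uint64_t pxp    = (mergeP1 & 0xFFFFFFFFull) | mergeP2 << 32;   // all PxP captures
uint64_t middle = (mergeP1 >> 16 & 0xFFFFFF0000ull)            // NxP, BxP, RxP on group 1
                | (mergeP2 >> 32 & 0xFFFF);                    // NxP, BxP on group 2
uint64_t rest   = (mergeP2 >> 48 & 0xFFF)                      // RxP, QxP on group 2
                | (mergeP1 >> 44 & 0xFF000)                    // QxP, KxP on group 1
                | (mergeP2 & (0xFull << 60));                  // KxP on group 2
// each of pxp, middle and rest then goes through an extraction loop as before,
// with its own fromTable[] to decode attacker and victim from the bit number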
Note, however, that this juggling of merge sets (which is a kind of bulk sorting) is only needed for captures of Pawns, i.e. after all other captures have already been searched and have had a chance to cause a cutoff. In addition, very often capturing a Pawn will be futile, so that you would have no need to search those captures at all, and would not care about their order. Finally, capturing a Pawn with a non-Pawn is very likely to be a bad capture unless the Pawn is unprotected, and in that case the LVA ordering no longer has any justification; ordering should be purely based on SEE for these.
Finally some thoughts about the overhead of clearing all attackers[] elements before the move generation. Obviously this overhead will be worse if we make the attacker sets larger; zeroing 16x8 bytes is more work than zeroing 16x2 bytes. There is a trade-off possible between the work of clearing the attackers[] array and the work we do when using its elements: we could scatter the bits over the word to create space for merging only afterwards, so that initially we can use smaller words to collect the attackers. E.g. when we assign bits initially in a 16-bit int as
MSB PPBKPPBQPPNRPPNR LSB
we could take h = attackers[victim] * 0x8000400020001ull & 0x8888888888888888ull; to space out the bits in a way suitable for merging. Now using 16-bit integers usually causes so much overhead that this will never be worth it, but the same idea could be used when storing the attacker bits with a single spacer bit in a 32-bit word, in the order P0K0P0Q0P0R0P0R0P0B0P0B0P0N0P0N0, and multiplying by 0x400000001ull. And we could do that after merging a pair (e.g. attackers[R1] + 2*attackers[R2]), to reduce the overhead of unpacking by a factor of 2.
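For concreteness, here is a minimal sketch of that spreading step (my own code, not from the post), assuming the two compact layouts given above; the function names are invented, and the 32-bit version assumes the same 0x8888... mask as the 16-bit one:
Code: Select all
uint64_t Spread16(uint16_t a)   // a packed as PPBKPPBQPPNRPPNR (MSB...LSB)
{   // each source bit has exactly one shifted copy that survives the mask
    return (uint64_t)a * 0x8000400020001ull & 0x8888888888888888ull;
}

uint64_t Spread32(uint32_t a)   // a packed as P0K0P0Q0P0R0P0R0P0B0P0B0P0N0P0N0
{
    return (uint64_t)a * 0x400000001ull & 0x8888888888888888ull;
}
// In both cases the attackers end up one per nibble, in the order
// K Q R R B B N N P P P P P P P P from MSB to LSB (each bit in the top
// position of its nibble), so the extraction loop only needs the nibble index.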
It would also be possible to directly use the words in the attackers[] array to contain the attackers on a pair of pieces already during collection. That would drive up the work during generation a bit, because we would have to record the attacks as
attackers[victim >> 1] |= attackBit[piece] << (victim & 1);
That would save you the work of initializing 32 bytes to zero, and the pair merging afterwards. But whether that would earn you back the extra cost of recording the captures would depend on how many captures there actually are. On a very sparsely populated board it probably would. But of course there are other methods to make things more efficient on sparsely populated boards. (Like not assigning piece numbers to pieces or piece types that are no longer present.)
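For what it's worth, a hypothetical extraction loop for such a paired element could look as follows, assuming attackBit[] puts each attacker in the low bit of its own 2-bit slot (so the shift by (victim & 1) stays inside the slot), the slots are assigned in LVA order, and pairFromTable[] is a hypothetical table indexed by slot; 'pair' is the index into the halved attackers[] array, i.e. victim >> 1 above:
Code: Select all
int firstVictim = 2*pair;           // even-numbered victim of this pair
uint32_t h = attackers[pair];
while(h) {                          // extract low to high: least valuable attacker first
    int n = BSF(h);
    piece  = pairFromTable[n >> 1]; // the 2-bit slot identifies the attacker
    victim = firstVictim + (n & 1); // the bit within the slot identifies the victim
    MakeMove(location[piece], location[victim]);
    ...
    h &= h - 1;
}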