http://www.talkchess.com/forum3/viewtop ... =7&t=80149
For the fastest perft or move generation, all moves should be generated at once. This is already solved for sliders, and also for pawns.
Moving forward all pawns in a few instructions:
Code:
uint64_t mov = move::atk_forward<c>(forward_pawns) & empty & valid;
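For readers who want to try the one-liner above, here is a minimal self-contained sketch. It assumes the usual bit 0 = a1, bit 63 = h8 square mapping; `push_forward` and `single_pushes` are hypothetical stand-ins for `move::atk_forward<c>` and the surrounding expression, not the actual implementation.

```cpp
#include <cstdint>

// Assumed board layout: bit 0 = a1, bit 7 = h1, bit 63 = h8.
enum class Color { White, Black };

// Hypothetical stand-in for move::atk_forward<c>: shift every pawn in the
// set one rank toward the opponent with a single shift instruction.
template <Color c>
constexpr std::uint64_t push_forward(std::uint64_t pawns) {
    return c == Color::White ? pawns << 8 : pawns >> 8;
}

// All single-push targets at once. `empty` is the set of unoccupied squares;
// `valid` is assumed to carry any pin / check-evasion restriction.
template <Color c>
constexpr std::uint64_t single_pushes(std::uint64_t pawns,
                                      std::uint64_t empty,
                                      std::uint64_t valid) {
    return push_forward<c>(pawns) & empty & valid;
}

// Eight white pawns on rank 2, nothing else on the board: all reach rank 3.
constexpr std::uint64_t kRank2 = 0x000000000000FF00ULL;
static_assert(single_pushes<Color::White>(kRank2, ~kRank2, ~0ULL)
                  == 0x0000000000FF0000ULL,
              "all eight pawns advance one rank");
```

The point is that the cost does not depend on how many pawns are on the board: one shift and two ANDs cover all of them.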
Promotion needs extra code:
Code:
uint64_t mov_norm = mov & ~move::edges;
uint64_t mov_prom = mov & move::edges;
Code:
// Each iteration handles one set bit of the target bitboard.
Bitloop(mov_norm) {
    make_move(from, src(from));
}
Bitloop(mov_prom) {
    // Four calls, one per promotion piece.
    make_promotion(from, src(from), 0);
    make_promotion(from, src(from), 1);
    make_promotion(from, src(from), 2);
    make_promotion(from, src(from), 3);
}
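The loop version above can be written out as a compilable sketch. Everything named here is an assumption made for illustration: the `Move` struct, the `kPromotionRanks` mask (standing in for `move::edges`, which I take to be ranks 1 and 8), and the portable `lsb_index` helper. It handles white single pushes only, so the origin square is simply the target minus 8.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical move encoding: promo is -1 for a normal move, 0..3 otherwise.
struct Move { int from; int to; int promo; };

// Assumed meaning of move::edges: the first and last rank.
constexpr std::uint64_t kPromotionRanks = 0xFF000000000000FFULL;

// Portable index of the lowest set bit (a real engine would use a
// hardware bit-scan instruction instead).
inline int lsb_index(std::uint64_t bb) {
    assert(bb != 0);
    int i = 0;
    while (!(bb & 1)) { bb >>= 1; ++i; }
    return i;
}

// Expand a bitboard of white single-push targets into individual moves.
inline std::vector<Move> expand_white_pushes(std::uint64_t mov) {
    std::vector<Move> out;
    std::uint64_t mov_norm = mov & ~kPromotionRanks;
    std::uint64_t mov_prom = mov & kPromotionRanks;

    // `x &= x - 1` clears the lowest set bit, so each loop runs once per target.
    for (; mov_norm; mov_norm &= mov_norm - 1) {
        int to = lsb_index(mov_norm);
        out.push_back({to - 8, to, -1});
    }
    for (; mov_prom; mov_prom &= mov_prom - 1) {
        int to = lsb_index(mov_prom);
        for (int piece = 0; piece < 4; ++piece)
            out.push_back({to - 8, to, piece});
    }
    return out;
}
```

This is the baseline being compared against: one loop iteration, and its loop branch, per generated move.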
The whole process above can be done in bulk without any branches or loops, similar to how PEXT and magic bitboards resolve all of a slider's moves in one lookup: the result is an index into a precomputed move list, where both the move count and the moves themselves are already known.
I cannot share the code (because of ongoing research for a bigger project), but the idea is as described above.
Doing this on a single thread yields these results (in billions of pawn moves per second):
Code:
Loop: 0.56G/s
Cube: 13.5G/s
Prepare, at compile time, a move list for every possible combination of moves that can emanate from a square. Then have your algorithm return not the bitset of attacked squares (after taking pins and check evasions into account) but the index of the matching move list. Generating moves this way is in fact faster than popcount, which is an instruction that can execute four times per clock!
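To make the "index instead of bitset" idea concrete, here is a deliberately tiny sketch for a single white pawn. It is not the unpublished bulk scheme, only an illustration under my own assumptions: a pawn has at most four targets (capture left +7, push +8, capture right +9, double push +16), so there are 16 possible combinations, and a table built at compile time holds the finished move list for each. `MoveList`, `PawnTable` and `pattern_index` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// One precomputed list: how many moves, and each target as an offset from
// the origin square.
struct MoveList {
    std::uint8_t count;
    std::int8_t  delta[4];
};

// Table of all 16 combinations, filled in by a constexpr constructor.
struct PawnTable {
    MoveList lists[16];
    constexpr PawnTable() : lists{} {
        const int deltas[4] = {7, 8, 9, 16};
        for (int pattern = 0; pattern < 16; ++pattern) {
            int n = 0;
            for (int bit = 0; bit < 4; ++bit)
                if (pattern & (1 << bit))
                    lists[pattern].delta[n++] =
                        static_cast<std::int8_t>(deltas[bit]);
            lists[pattern].count = static_cast<std::uint8_t>(n);
        }
    }
};

constexpr PawnTable kPawnTable{};
static_assert(kPawnTable.lists[0xF].count == 4, "all four moves available");

// Turn the legal-target bitboard into a table index with two shifts.
// Squares from+7, from+8, from+9 are adjacent bits; from+16 becomes bit 3.
// Assumes `legal_targets` is already masked for pins, checks and file
// wrap-around, and that the pawn is not on its promotion rank.
inline unsigned pattern_index(int from, std::uint64_t legal_targets) {
    assert(from >= 8 && from <= 47);
    return static_cast<unsigned>(
        ((legal_targets >> (from + 7)) & 0x7) |
        (((legal_targets >> (from + 16)) & 0x1) << 3));
}
```

With a pawn on e2 (square 12) and e3/e4 legal, `pattern_index` selects the list whose `count` is 2 and whose offsets are 8 and 16. The count is read straight from the table, so no popcount and no per-move loop is needed to know how many moves there are.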