Optimal solution for Pawn moves

dangi12012 · Post by **dangi12012** » Mon Oct 24, 2022 8:40 pm

This topic inspired me to optimize what is maybe already very fast. The pawn moves.
http://www.talkchess.com/forum3/viewtop ... =7&t=80149

For optimal perft or movegeneration it is optimal to generate all moves at once. This is already solved for sliders, and also for pawns.
Moving forward all pawns in a few instructions:

Code: Select all

uint64_t mov = move::atk_forward<c>(forward_pawns) & empty & valid;

which solves for pins, check evasions.
Promotion needs extra code:

Code: Select all

uint64_t mov_norm = mov & ~move::edges;
uint64_t mov_prom = mov & move::edges;

Finally iteration is needed:

Code: Select all

Bitloop(mov_norm) {
	make_move(from, src(from))
}

Bitloop(mov_prom) {
	make_promotion(from, src(from), 0);
	make_promotion(from, src(from), 1);
	make_promotion(from, src(from), 2);
	make_promotion(from, src(from), 3);
}

This was the old way it was done.

All of the above process can be done in bulk without any branches or loops, similar to how PEXT and Magic Bitboards can solve all moves - and index into a preprepared movelist (where popcount and moves are already known)

I cannot share the code, (because of ongoing research into a big project) but the idea is described above.
Doing this on a single thread yields these results (Billion pawnmoves per second):

Code: Select all

Loop: 0.56G/s
Cube: 13.5G/s

The main idea is applicable to all moves:
Having a movelist in the prepared for all possible permutation of moves (at compiletime) that can eminate from a square. Have your algorithm not return the bitset of attacked squares (while taking pins and check evasions into account), but the index into the correct movelist. Doing moves this way is infact faster than popcount which is a 4x per clock instruction!

Chessnut1071 · Post by **Chessnut1071** » Tue Oct 25, 2022 3:00 am

dangi12012 wrote: ↑Mon Oct 24, 2022 8:40 pm This topic inspired me to optimize what is maybe already very fast. The pawn moves.
http://www.talkchess.com/forum3/viewtop ... =7&t=80149

For optimal perft or movegeneration it is optimal to generate all moves at once. This is already solved for sliders, and also for pawns.
Moving forward all pawns in a few instructions:
Code: Select all
uint64_t mov = move::atk_forward<c>(forward_pawns) & empty & valid;
which solves for pins, check evasions.
Promotion needs extra code:
Code: Select all
uint64_t mov_norm = mov & ~move::edges;
uint64_t mov_prom = mov & move::edges;
Finally iteration is needed:
Code: Select all
Bitloop(mov_norm) {
	make_move(from, src(from))
}

Bitloop(mov_prom) {
	make_promotion(from, src(from), 0);
	make_promotion(from, src(from), 1);
	make_promotion(from, src(from), 2);
	make_promotion(from, src(from), 3);
}
This was the old way it was done.

All of the above process can be done in bulk without any branches or loops, similar to how PEXT and Magic Bitboards can solve all moves - and index into a preprepared movelist (where popcount and moves are already known)

I cannot share the code, (because of ongoing research into a big project) but the idea is described above.
Doing this on a single thread yields these results (Billion pawnmoves per second):
Code: Select all
Loop: 0.56G/s
Cube: 13.5G/s
The main idea is applicable to all moves:
Having a movelist in the prepared for all possible permutation of moves (at compiletime) that can eminate from a square. Have your algorithm not return the bitset of attacked squares (while taking pins and check evasions into account), but the index into the correct movelist. Doing moves this way is infact faster than popcount which is a 4x per clock instruction!

I saw material on moving sliders and pawns simultaneously and was impressed by the speed. I'm using PEXT for move generation; however, when I generate a move I also need 14 pieces of information for that u64 move, including the following:
1. piece 2. move 3. capture 4. direction 5. check 6. line of check from 7. line of check to 8. line of check direction 9 skip 10. pawn promotion type 11. ent passant 12. castle king 13. castle queen 14 15-bit evaluation

the time saved simultaneously generating multiple moves then has to content with the lookups and tables for the 13 non-evaluation variables. Are you really saving time by moving all the sliders and pawns at once if you have to take the additional information? To be a bit clearer, I'm using bit scan forward and bit scan backward to generate the moves asynchronously.

dangi12012 · Post by **dangi12012** » Wed Oct 26, 2022 6:41 pm

Updates: brought the algorithm to the gpu!
On a RTX 3080 this algorithm can solve 10E6 random chess positions in 54200ns for its pawn moves.

Code: Select all

Loop: 0.679248G/s
Cube: 16.8909G/s
cu_Loop: 1.329G/s
cu_Cube: 837.078G/s

These results are validated against each other and a reference pawnmove generator.
I cannot share the code yet - but I will keep you posted.

Walking the tree of chess will be much slower but the new way to solve all pawn moves is a breakthrough.

In the meantime I can point you towards the sliders:
https://github.com/Gigantua/Chess_Movegen_GPU

Optimal solution for Pawn moves

Optimal solution for Pawn moves

Re: Optimal solution for Pawn moves

Re: Optimal solution for Pawn moves