Tricky Optimization Question

Discussion of chess software programming and technical issues.

Moderator: Ras

Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Tricky Optimization Question

Post by Mike Sherwin »

For move generation of a white pawn on the second rank.

Code: Select all

u64 empty = ~occ;

u64 bb = wPawnMoves[fs] & empty;
// What is the fastest way (branchless) to remove the bit if there is one
// for fs + 16 if there is a piece on fs + 8
User avatar
Ras
Posts: 2696
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Tricky Optimization Question

Post by Ras »

Mike Sherwin wrote: Sun Mar 13, 2022 12:39 am// What is the fastest way (branchless) to remove the bit if there is one for fs + 16 if there is a piece on fs + 8
You could just OR the existing "occ" mask with the "occ" mask shifted up by one board line. Basically pretending that the piece blocking the double step isn't right before the pawn, but instead on the target square of the double step.
Rasmus Althoff
https://www.ct800.net
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Tricky Optimization Question

Post by dangi12012 »

BB move = pawns & (occ << 8)
BB push = pawns & FirstRank & move & (occ << 16)

This selects the pawns which can move 1 or 2 squares forward as 2 bitboards. Taking and promotion look very similar.

Notice how that solves 16 pawnmoves in 6 instructions. Impossible with mailbox.

When you also want to do check evasion and pin handling check my signature.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Tricky Optimization Question

Post by Mike Sherwin »

Ras wrote: Sun Mar 13, 2022 2:34 am
Mike Sherwin wrote: Sun Mar 13, 2022 12:39 am// What is the fastest way (branchless) to remove the bit if there is one for fs + 16 if there is a piece on fs + 8
You could just OR the existing "occ" mask with the "occ" mask shifted up by one board line. Basically pretending that the piece blocking the double step isn't right before the pawn, but instead on the target square of the double step.
You left a small detail out.
mask = occ | ((occ ^ (1ull << fs)) << 8);

00 | 00 = 00 correct :)
01 | 10 = 11 correct :)
10 | 00 = 10 correct :)
11 | 10 = 11 correct :D

bb = 11 ^ 00; = 11 correct
bb = 11 ^ 11; = 00 correct
bb = 11 ^ 10; = 01 correct
bb = 11 ^ 11; = 00 correct

It took all day and half of the morning to figure out what was wrong. :(
You could have saved me many hours of work if only you had negated the fs bit. :x :P
But at least I got it done. :D
So thank you!
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Tricky Optimization Question

Post by Mike Sherwin »

dangi12012 wrote: Sun Mar 13, 2022 3:28 am BB move = pawns & (occ << 8)
BB push = pawns & FirstRank & move & (occ << 16)

This selects the pawns which can move 1 or 2 squares forward as 2 bitboards. Taking and promotion look very similar.

Notice how that solves 16 pawnmoves in 6 instructions. Impossible with mailbox.

When you also want to do check evasion and pin handling check my signature.
I'm storing the ts bits in a bit table (u64 genbb[ply][fs]) and not spinning them into a move list.
so that means I'd have to have two while loops to store all the bits. That is why it is done one pawn at a time.

genbb[ply][fs] = wPawnMoves[fs] ^ (occ | ((occ ^ (1ull << fs)) << 8));

So I'm not sure which way is faster? :?:
User avatar
Ras
Posts: 2696
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Tricky Optimization Question

Post by Ras »

Mike Sherwin wrote: Sun Mar 13, 2022 12:35 pmYou left a small detail out.
Yeah, I don't even know your board geometry, and that determines whether to shift left or right, and whether by 1 or by 8 bits. So I figured just posting the general idea would do the trick because you'd know your implementation best, and it would also be most helpful for people who later find that via the forum search function. :-)
Rasmus Althoff
https://www.ct800.net
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Tricky Optimization Question

Post by dangi12012 »

Mike Sherwin wrote: Sun Mar 13, 2022 12:48 pm
dangi12012 wrote: Sun Mar 13, 2022 3:28 am BB move = pawns & (occ << 8)
BB push = pawns & FirstRank & move & (occ << 16)

This selects the pawns which can move 1 or 2 squares forward as 2 bitboards. Taking and promotion look very similar.

Notice how that solves 16 pawnmoves in 6 instructions. Impossible with mailbox.

When you also want to do check evasion and pin handling check my signature.
I'm storing the ts bits in a bit table (u64 genbb[ply][fs]) and not spinning them into a move list.
so that means I'd have to have two while loops to store all the bits. That is why it is done one pawn at a time.

genbb[ply][fs] = wPawnMoves[fs] ^ (occ | ((occ ^ (1ull << fs)) << 8));

So I'm not sure which way is faster? :?:
Transforming a BB into a list always takes a loop.
I found the fastest loop was this (instead of _blsr_u64 you can use (x-1) & x)

Code: Select all

#define Bitloop(X) for(;X; X = _blsr_u64(X))
So I would recommend to you this:

Code: Select all

BB pmove = pawns & (occ << 8);
BB ppush = pawns & FirstRank & pmove & (occ << 16);
BitLoop(pmove)
{
	//*movelist++ = ...
	
	//Normally you dont need to go back to integer representation like this from BB (you only loose speed) but i included it here:
	//int squareFrom = SquareOf(pmove);
	//int squareTo = squareFrom +8;
}
BitLoop(ppush)
{
	//*movelist++ = ...
}
That is as efficient as I ever found it. If you have an improvement let me know! :)
Its faster to solve all pawns and loop over solution bits instead of looping over all pawns and then over all bits of each pawn.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Tricky Optimization Question

Post by Mike Sherwin »

dangi12012 wrote: Sun Mar 13, 2022 4:35 pm
Mike Sherwin wrote: Sun Mar 13, 2022 12:48 pm
dangi12012 wrote: Sun Mar 13, 2022 3:28 am BB move = pawns & (occ << 8)
BB push = pawns & FirstRank & move & (occ << 16)

This selects the pawns which can move 1 or 2 squares forward as 2 bitboards. Taking and promotion look very similar.

Notice how that solves 16 pawnmoves in 6 instructions. Impossible with mailbox.

When you also want to do check evasion and pin handling check my signature.
I'm storing the ts bits in a bit table (u64 genbb[ply][fs]) and not spinning them into a move list.
so that means I'd have to have two while loops to store all the bits. That is why it is done one pawn at a time.

genbb[ply][fs] = wPawnMoves[fs] ^ (occ | ((occ ^ (1ull << fs)) << 8));

So I'm not sure which way is faster? :?:
Transforming a BB into a list always takes a loop.
I found the fastest loop was this (instead of _blsr_u64 you can use (x-1) & x)

Code: Select all

#define Bitloop(X) for(;X; X = _blsr_u64(X))
So I would recommend to you this:

Code: Select all

BB pmove = pawns & (occ << 8);
BB ppush = pawns & FirstRank & pmove & (occ << 16);
BitLoop(pmove)
{
	//*movelist++ = ...
	
	//Normally you dont need to go back to integer representation like this from BB (you only loose speed) but i included it here:
	//int squareFrom = SquareOf(pmove);
	//int squareTo = squareFrom +8;
}
BitLoop(ppush)
{
	//*movelist++ = ...
}
That is as efficient as I ever found it. If you have an improvement let me know! :)
Its faster to solve all pawns and loop over solution bits instead of looping over all pawns and then over all bits of each pawn.
But but but I do not spin the bits into a bulk move list. I store the raw bits into genbb[ply][fs] for later use that still won't make a bulk move list. No move list.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Tricky Optimization Question

Post by dangi12012 »

Mike Sherwin wrote: Sun Mar 13, 2022 6:16 pm But but but I do not spin the bits into a bulk move list. I store the raw bits into genbb[ply][fs] for later use that still won't make a bulk move list. No move list.
Later you will have to differentiate between a move and a push because that marks a pawn as Enpassant. If you store both in a single bitboard how do you store that information?

Also there will have to be promotion - how do you store that too?
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Tricky Optimization Question

Post by Mike Sherwin »

Let me explain some more. The ts bits are more valuable than moves.
1) If there is a tt move
..a) If there a bit that matches in genbb[ply][fs] the ts of the tt move it is not a collision
....i) Make the tt move
....ii) If no beta cut clear the bit in genbb[ply][fs] and continue
....iii) If beta cut return beta
2) If in atkbb[ply] there are attacks on the last/first rank look for promotions remove bit and play move
3) If in atkbb[ply] there are winning attacks by piece value remove bit in genbb[ply] and play move
4) If in genbb[ply][fs] there is a bit for a killer play move, on return clear bit
5) For quiets just traverse the remaining bits, etc

Having the bits to work with instead of a move list is far more efficient! :idea:
Bob

'Never do now what might not need to be done later.'

Last edited by Mike Sherwin on Sun Mar 13, 2022 7:22 pm, edited 2 times in total.