Hey all,
I've been using "__builtin_ctzll()" to bitscanforward and serialize bitboards for purposes of move generation. However, this has led to an extreme disparity between generation times for white and for black. Black move generation takes around 8-9x as much time as white move generation. I tried using "__builtin_clzll()" (bsr) for black, but this didn't really change anything. I've poured over CPW and other parts of the web for hours now and can't seem to find anything that works. What do you guys recommend?
Thank you in advance!
Serialization symmetry between black and white
Moderators: hgm, Rebel, chrisw
-
- Posts: 7
- Joined: Wed Mar 10, 2021 6:47 pm
- Full name: Glen Cahilly
-
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Serialization symmetry between black and white
I don't observe any such discrepancy, at least not in perft.
are you sure your measurements are correct?
are you sure your measurements are correct?
Martin Sedlak
-
- Posts: 7
- Joined: Wed Mar 10, 2021 6:47 pm
- Full name: Glen Cahilly
Re: Serialization symmetry between black and white
You're absolutely correct, my bad. My tests were incorrect.
That said, I have a few questions on move generation efficiency.
1) should I simply be happy and move on with my ~2 million positions per second (in Perft functions)?
2) is this process efficient for generating, say, knight moves: serialize knight BB, then find the possible moves using a precomputed mask, then calculate (mask & ~squaresOccupiedBySameColor) to prevent it from capturing friendly pieces, then finally serialize this end result and store the indices in a vector? Or is all of this serialization inefficient and unnecessary?
3) last one: is this following code the most efficient way to serialize a BB?
Code: Select all
vector<int> serialize(U64 bitboard) {
vector<int> list;
while (bitboard) {
list.push_back(__builtin_ctzll(bitboard)); // determine bit index of least significant one bit
bitboard &= bitboard-1; // reset LS1B
}
return list;
}
-
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Serialization symmetry between black and white
ouch, don't do this. don't use std::vector to hold moves!gcahilly wrote: ↑Wed Mar 17, 2021 4:40 amCode: Select all
vector<int> serialize(U64 bitboard) { vector<int> list; while (bitboard) { list.push_back(__builtin_ctzll(bitboard)); // determine bit index of least significant one bit bitboard &= bitboard-1; // reset LS1B } return list; }
why:
- allocated on heap (unless you use a custom allocator which you don't) => implies extra heap deallocation when your done with the vector as temporary buffer
- push_back grows capacity exponentially but you still get a reallocation/copy a couple of times (unless the compiler is super-smart)
so I bet the slowest part of your serialization loop is actually push_back
for regular chess, maximum # of moves per position is 218 I think (256 a safe bet), std::array is your friend, allocated on the stack somewhere
EDIT: I see now that you use this to extract non-zero bits, not to generate moves yet. in that case 64 is obviously the maximum the buffer needs to hold
Martin Sedlak
-
- Posts: 7
- Joined: Wed Mar 10, 2021 6:47 pm
- Full name: Glen Cahilly
Re: Serialization symmetry between black and white
mar wrote: ↑Wed Mar 17, 2021 6:25 amouch, don't do this. don't use std::vector to hold moves!gcahilly wrote: ↑Wed Mar 17, 2021 4:40 amCode: Select all
vector<int> serialize(U64 bitboard) { vector<int> list; while (bitboard) { list.push_back(__builtin_ctzll(bitboard)); // determine bit index of least significant one bit bitboard &= bitboard-1; // reset LS1B } return list; }
why:
- allocated on heap (unless you use a custom allocator which you don't) => implies extra heap deallocation when your done with the vector as temporary buffer
- push_back grows capacity exponentially but you still get a reallocation/copy a couple of times (unless the compiler is super-smart)
so I bet the slowest part of your serialization loop is actually push_back
for regular chess, maximum # of moves per position is 218 I think (256 a safe bet), std::array is your friend, allocated on the stack somewhere
EDIT: I see now that you use this to extract non-zero bits, not to generate moves yet. in that case 64 is obviously the maximum the buffer needs to hold
Thank you so much, mar! I did as you said and changed to std::array. My move generation is way faster now!
I don't want to bother you too much, but I have been having issues with Quiescence search.
Here is my minimax code. I know it's really ugly that I'm using tuples for storing moves, I plan on changing it soon (prototype now, optimize later):
Code: Select all
int minimax(Board b, int depth, int alpha, int beta, bool white) {
array<tuple<int,int,char,bool,bool,bool>, 256> listOfPseudoMoves = generatePseudoMoves(b, white);
array<tuple<int,int,char,bool,bool,bool>, 256> listOfLegalMoves = generateLegalMoves(b, listOfPseudoMoves, white);
if ((get<0>(listOfLegalMoves[0])<0) or (get<1>(listOfLegalMoves[0])<0)) {
//no moves
if (isInCheck(b, white)) {
//checkmate
if (white) {
return -1000000;
} else {
return 1000000;
}
} else {
return 0;
}
}
if (depth == 0) {
return quiesceEval(b, alpha, beta, white);
}
if (white) {
int maxEval = -1000000;
for (int i=0; i<256; i++) {
int startSq = get<0>(listOfLegalMoves[i]);
int endSq = get<1>(listOfLegalMoves[i]);
if (startSq>=0 and endSq>=0) {
char movingPiece = get<2>(listOfLegalMoves[i]);
bool promotion = get<3>(listOfLegalMoves[i]);
bool castling = get<4>(listOfLegalMoves[i]);
bool enPassant = get<5>(listOfLegalMoves[i]);
Board c = b;
c.makeMove(startSq, endSq, movingPiece, promotion, castling, enPassant);
int eval;
U64 TTIndex = c.hash%TTSize;
int TTDepth = transpositionTable[TTIndex].depth;
int TTEval = transpositionTable[TTIndex].eval;
if (TTDepth>=depth-1 and TTDepth!=-1) {
//use TT info
eval = TTEval;
numTranspositions++;
} else {
eval=minimax(c, depth-1, alpha, beta, !white);
//overwrite transposition table entry
transpositionTable[TTIndex].depth = depth-1;
transpositionTable[TTIndex].eval = eval;
}
alpha = max(alpha, eval);
maxEval = max(maxEval, eval);
if (beta <= alpha) {
i=256; //BREAK
}
} else {
i=256; //BREAK
}
}
return maxEval;
}
else {
int minEval = 1000000;
for (int i=0; i<256; i++) {
int startSq = get<0>(listOfLegalMoves[i]);
int endSq = get<1>(listOfLegalMoves[i]);
if (startSq>=0 and endSq>=0) {
char movingPiece = get<2>(listOfLegalMoves[i]);
bool promotion = get<3>(listOfLegalMoves[i]);
bool castling = get<4>(listOfLegalMoves[i]);
bool enPassant = get<5>(listOfLegalMoves[i]);
Board c = b;
c.makeMove(startSq, endSq, movingPiece, promotion, castling, enPassant);
int eval;
U64 TTIndex = c.hash%TTSize;
int TTDepth = transpositionTable[TTIndex].depth;
int TTEval = transpositionTable[TTIndex].eval;
if (TTDepth>=depth-1 and TTDepth!=-1) {
//use TT info
eval = TTEval;
numTranspositions++;
} else {
eval=minimax(c, depth-1, alpha, beta, !white);
//overwrite transposition table entry
transpositionTable[TTIndex].depth = depth-1;
transpositionTable[TTIndex].eval = eval;
}
beta = min(beta, eval);
minEval = min(minEval, eval);
if (beta <= alpha) {
i=256;
}
} else {
i=256;
}
}
return minEval;
}
}
But now that I changed that depth=0 instruction to Quiesce, it doesn't work.
I've looked around quite a bit on the web and have spent a lot of time tinkering with my own code to try to fix it, but nothing has worked.
Here is my Quiesce code:
Code: Select all
int quiesceEval(Board b, int alpha, int beta, bool white) {
int score = evaluation(b, white);
if (score >= beta) return score;
array<tuple<int,int,char,bool,bool,bool>, 256> pseudoCaptures = generatePseudoCaptures(b, white);
array<tuple<int,int,char,bool,bool,bool>, 256> legalCaptures = generateLegalMoves(b, pseudoCaptures, white);
for (int i=0; i<256; i++) {
Board c = b;
int startSq = get<0>(legalCaptures[i]);
int endSq = get<1>(legalCaptures[i]);
if (startSq>=0 and endSq>=0) {
char movingPiece = get<2>(legalCaptures[i]);
bool promotion = get<3>(legalCaptures[i]);
bool castling = get<4>(legalCaptures[i]);
bool enPassant = get<5>(legalCaptures[i]);
c.makeMove(startSq, endSq, movingPiece, promotion, castling, enPassant);
score = -quiesceEval(c, -beta, -alpha, !white);
if (score >= alpha) {
alpha = score;
if (score >= beta) {
i=256;
}
}
} else {
i=256;
}
}
return score;
}
-Glen
-
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Serialization symmetry between black and white
You're pushing your luck. You should debug your code, not waste other people's time. Is this some form of sophisticated trolling?
Your quiescence uses negamax, but your search doesn't - I suggest you rewrite what you call search (minimax) to use negamax as well => you'll save those extra ifs and reduce
the probability of introducing new bugs in the future, because you won't have to duplicate large chunks of code that only differ in alpha/beta logic. just do negamax where the logic is the same => you always improve alpha from stm's point of view.
EDIT: so, if you want to "fix" the call to quiesce, you should probably add extra if and call negamaxed version of quiesce if stm is black
Martin Sedlak
-
- Posts: 7
- Joined: Wed Mar 10, 2021 6:47 pm
- Full name: Glen Cahilly
Re: Serialization symmetry between black and white
Really sorry. I spent many hours debugging and couldn't figure it out. Sorry for wasting your time. I really appreciate your help and will try extra hard in the future to debug before looking for help.mar wrote: ↑Sat Mar 20, 2021 11:10 amYou're pushing your luck. You should debug your code, not waste other people's time. Is this some form of sophisticated trolling?
Your quiescence uses negamax, but your search doesn't - I suggest you rewrite what you call search (minimax) to use negamax as well => you'll save those extra ifs and reduce
the probability of introducing new bugs in the future, because you won't have to duplicate large chunks of code that only differ in alpha/beta logic. just do negamax where the logic is the same => you always improve alpha from stm's point of view.
EDIT: so, if you want to "fix" the call to quiesce, you should probably add extra if and call negamaxed version of quiesce if stm is black
-
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: Serialization symmetry between black and white
Your QS function returns "score" which is a random score (of the last subtree visited, unless there was a beta cutoff). You need to return the best score which is either "alpha" or a "bestScore" you track separately. And that is already my second remark: your QS uses a mixture of fail-hard and fail-soft alpha-beta. In a consistent fail-hard implementation, for instance, you would return beta (not score) in case of a cutoff, and otherwise alpha.
EDIT: And as already suggested, your QS uses negamax so it returns a score from the moving side's viewpoint while your minimax search always returns a score from the viewpoint of White. So in minimax the return value of QS needs to be negated when it is Black's turn.
EDIT: And as already suggested, your QS uses negamax so it returns a score from the moving side's viewpoint while your minimax search always returns a score from the viewpoint of White. So in minimax the return value of QS needs to be negated when it is Black's turn.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)