How can I improve my engine’s speed?
Moderator: Ras
-
- Posts: 26
- Joined: Wed Feb 16, 2022 6:21 am
- Full name: P. M. Zamboni
How can I improve my engine’s speed?
Hello, everyone! A few months ago, I posted about my chess bot (“engine”): https://talkchess.com/forum3/viewtopic.php?f=7&t=79379
Since then, I implemented bitboards, and it severely helped increase the bot’s analysis speed without needing to use an external native program for it! https://github.com/zamfofex/dummyette/c ... b961f69356
However, it’s still very slow. Don’t get me wrong, it performs well enough for me to be happy about it, but I still think there is a lot of room for improvement.
I have been reading a lot about fast chess move generation algorithms lately, but I didn’t find anything directly applicable to me. You see, my approach is to only ever generate pseudo‐moves, while also entirely disconsidering e.p. and castling. The king may be captured, and is worth many points, so that is naturally avoided by the bot during the search. This is a sufficient enough heuristic for my purposes. Other optimizations I found (ones that I don’t already implement) were mostly related to discarding illegal moves from what I was able to understand.
What I don’t understand is why the move generation and search is so slow in my bot. It’s very frustrating for me, because although there *clearly* is room for improvement, I can’t seem to be able to find it. It is not a case of the bot being slower than other chess programs by being half or even a fifth of the speed, it is a case of the bot being slower by *multiple magnitudes*, which is really worrying! The search it currently performs, besides inherently simplistic, is also really shallow, which prevents it from finding simple tactics and checkmating patterns in end games.
Here is a direct URL to the bot’s analysis algorithm: https://github.com/zamfofex/dummyette/b ... nalysis.js
What am I missing?
Since then, I implemented bitboards, and it severely helped increase the bot’s analysis speed without needing to use an external native program for it! https://github.com/zamfofex/dummyette/c ... b961f69356
However, it’s still very slow. Don’t get me wrong, it performs well enough for me to be happy about it, but I still think there is a lot of room for improvement.
I have been reading a lot about fast chess move generation algorithms lately, but I didn’t find anything directly applicable to me. You see, my approach is to only ever generate pseudo‐moves, while also entirely disconsidering e.p. and castling. The king may be captured, and is worth many points, so that is naturally avoided by the bot during the search. This is a sufficient enough heuristic for my purposes. Other optimizations I found (ones that I don’t already implement) were mostly related to discarding illegal moves from what I was able to understand.
What I don’t understand is why the move generation and search is so slow in my bot. It’s very frustrating for me, because although there *clearly* is room for improvement, I can’t seem to be able to find it. It is not a case of the bot being slower than other chess programs by being half or even a fifth of the speed, it is a case of the bot being slower by *multiple magnitudes*, which is really worrying! The search it currently performs, besides inherently simplistic, is also really shallow, which prevents it from finding simple tactics and checkmating patterns in end games.
Here is a direct URL to the bot’s analysis algorithm: https://github.com/zamfofex/dummyette/b ... nalysis.js
What am I missing?
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: How can I improve my engine’s speed?
Algorithmic:
You are not leveraging bitboards - you just use your bitboard like an array - which is not the way to go:
With a BB engine you dont have sturctures like this at all:
for (let y = 7 ; y >= 0 ; y--)
for (let x = 7 ; x >= 0 ; x--)
Also your sliders are also using 8 for loops which is not the way with Bitboards
I recommend you take a look at this:
https://github.com/Gigantua/Gigantua/bl ... velist.hpp
https://www.codeproject.com/Articles/53 ... egenerator
https://github.com/Gigantua/Chess_Movegen
And generally reading everything under bitboards in the cpw.
https://www.chessprogramming.org/Bitboards
At the end you should have no loops over the squares, and no switch over the piecetype.
Language:
Write this in C++ and use EMScripten to use it like your javascript.
You are not leveraging bitboards - you just use your bitboard like an array - which is not the way to go:
With a BB engine you dont have sturctures like this at all:
for (let y = 7 ; y >= 0 ; y--)
for (let x = 7 ; x >= 0 ; x--)
Also your sliders are also using 8 for loops which is not the way with Bitboards
I recommend you take a look at this:
https://github.com/Gigantua/Gigantua/bl ... velist.hpp
https://www.codeproject.com/Articles/53 ... egenerator
https://github.com/Gigantua/Chess_Movegen
And generally reading everything under bitboards in the cpw.
https://www.chessprogramming.org/Bitboards
At the end you should have no loops over the squares, and no switch over the piecetype.
Language:
Write this in C++ and use EMScripten to use it like your javascript.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 17
- Joined: Thu Jun 30, 2022 12:46 pm
- Full name: Kabir Kumar
Re: How can I improve my engine’s speed?
Let me share first that I am a newbie in chess programming as well, so my suggestions may be helpful at all.
I have not checked your code, but this is what I think which could help you.
1. Profile the application, which should provide you with some hotpath which takes up most time. Try to optimize this. Share it here and people can help as well.
2. Check if the you have implemented alpha-beta pruning, if yes then further implement something like MVV/LVA, killers and history heuristics and order the moves (try captures first and in captures try the good ones first, MVV/LVA should help).
These two points should help you in my view.
Thanks.
I have not checked your code, but this is what I think which could help you.
1. Profile the application, which should provide you with some hotpath which takes up most time. Try to optimize this. Share it here and people can help as well.
2. Check if the you have implemented alpha-beta pruning, if yes then further implement something like MVV/LVA, killers and history heuristics and order the moves (try captures first and in captures try the good ones first, MVV/LVA should help).
These two points should help you in my view.
Thanks.
-
- Posts: 26
- Joined: Wed Feb 16, 2022 6:21 am
- Full name: P. M. Zamboni
Re: How can I improve my engine’s speed?
dangi12012 wrote: ↑Thu Jul 21, 2022 10:43 am Algorithmic:
You are not leveraging bitboards - you just use your bitboard like an array - which is not the way to go:
With a BB engine you dont have sturctures like this at all:
for (let y = 7 ; y >= 0 ; y--)
for (let x = 7 ; x >= 0 ; x--)
I will note: These loops you found only run once per analysed position, and they are used to convert the board representation from their array form to bitboard form. (Because everywhere else in the program, where I don’t need good performance, I use a complete valid‐move‐only array implementation.) When operating on the bitboards themselves, loops of this form are not used.dangi12012 wrote: ↑Thu Jul 21, 2022 10:43 am At the end you should have no loops over the squares, and no switch over the piecetype.
Oh? I thought the idea of using bitboards was to optimize moving pieces across squares (by using shifting instead of reading/writing memory in an array). I don’t understand how I could move rooks/bishops/queens without loops, even while using bitboards.dangi12012 wrote: ↑Thu Jul 21, 2022 10:43 am Also your sliders are also using 8 for loops which is not the way with Bitboards
I acknowledge JavaScript is not the usually the fastest language, but I don’t believe it is the biggest bottleneck of my program currently. The goal of my project is to write a chess bot *in JavaScript*, and I don’t want to abandon that. I know I could write it in C (or C++ as you suggest) to make it faster (then either use FFI or WebAssembly to interact with it from JavaScript), but that is not my goal.dangi12012 wrote: ↑Thu Jul 21, 2022 10:43 am Language:
Write this in C++ and use EMScripten to use it like your javascript.
I remember implementing alpha-beta pruning, and also a capture‐first heuristic at some point (you might be able to find the latter by traversing the commit history), but it was unsuccessful at improving speed significantly, if at all.
That actually doesn’t sound like a bad idea. I’ll try to profile it and share the results here once I do so!
Thank you both for your thoughts and suggestions, by the way! I really appreciate it.
-
- Posts: 26
- Joined: Wed Feb 16, 2022 6:21 am
- Full name: P. M. Zamboni
Re: How can I improve my engine’s speed?
I have written the following program, then profiled one of its threads. (Profiling results are shown below.)

Interestingly, a surprisingly amount of time is spent on the ‘play’ function! Perhaps less unexpectedly, a significant amount of time is also spent on the ‘load’ function. (Considering only self time.)
Code: Select all
import {AsyncAnalyser} from "./dummyette.js"
import {standardBoard} from "./chess.js"
let board = standardBoard
let analyser = AsyncAnalyser()
while (true)
{
console.log(board.toASCII())
console.log("---------------")
let moves = await analyser.analyse(board)
let move = moves[0]
if (!move) break
board = move.play()
}
Deno.exit()

Interestingly, a surprisingly amount of time is spent on the ‘play’ function! Perhaps less unexpectedly, a significant amount of time is also spent on the ‘load’ function. (Considering only self time.)
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: How can I improve my engine’s speed?
When you have the square (int) and the blockers (uint64_t) like that pext:
Code: Select all
(SliderPext + RookOffset_Pext[square])[_pext_u64(occupy, rmask[square])];
Code: Select all
return m.attacks[((occ | m.mask) * m.hash) >> (64 - 12)];
Code: Select all
static constexpr uint64_t attack(uint64_t pieces, uint32_t x, uint64_t mask) {
uint64_t o = pieces & mask;
return ((o - (1ull << x)) ^ bit_bswap(bit_bswap(o) - (0x8000000000000000ull >> x))) & mask; //Daniel 28.04.2022 - Faster shift. Replaces (1ull << (s ^ 56))
}
Code: Select all
static constexpr uint64_t Queen(int sq, uint64_t occ) {
occ |= 0x8000000000000001;
uint64_t bb = (ray[std::countr_zero(ray[sq].rwsNW & occ)].rayNW
| ray[std::countr_zero(ray[sq].rwsNN & occ)].rayNN
| ray[std::countr_zero(ray[sq].rwsNE & occ)].rayNE
| ray[std::countr_zero(ray[sq].rwsEE & occ)].rayEE
| ray[63 - std::countl_zero(ray[sq].rwsSE & occ)].raySE
| ray[63 - std::countl_zero(ray[sq].rwsSS & occ)].raySS
| ray[63 - std::countl_zero(ray[sq].rwsSW & occ)].raySW
| ray[63 - std::countl_zero(ray[sq].rwsWW & occ)].rayWW)
^ queens[sq];
return bb;// Rook(sq, occ) | Bishop(sq, occ);
}
Code: Select all
static uint64_t Vector32_Unrolled(int sq, uint64_t occ, uint64_t gather, uint64_t scatter, uint32_t count, const uint8_t* weights) {
const __m256i input = _mm256_set1_epi16((short)_pext_u64(occ, gather));
uint32_t result = 0;
//One network infers one output bit - 14 Networks with this configuration: 16 x 32 x 1
//These networks have learned how to transform the _pext_u64 occupancy into the input for _pdep_u64 so that the output is exactly the correct sliding piece
//Normally a table of 512kb would be needed for this task. The networks can infer this with 28k of memory!
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 0 * 32)))))) > 16) << 0;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 1 * 32)))))) > 16) << 1;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 2 * 32)))))) > 16) << 2;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 3 * 32)))))) > 16) << 3;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 4 * 32)))))) > 16) << 4;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 5 * 32)))))) > 16) << 5;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 6 * 32)))))) > 16) << 6;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 7 * 32)))))) > 16) << 7;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 8 * 32)))))) > 16) << 8;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 9 * 32)))))) > 16) << 9;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 10 * 32)))))) > 16) << 10;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 11 * 32)))))) > 16) << 11;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 12 * 32)))))) > 16) << 12;
result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 13 * 32)))))) > 16) << 13;
return _pdep_u64(result, scatter);
}
https://github.com/Gigantua/Chess_Moveg ... XShift.hpp
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 26
- Joined: Wed Feb 16, 2022 6:21 am
- Full name: P. M. Zamboni
Re: How can I improve my engine’s speed?
I will say that JavaScript provides no way of performing certain bit manipulation operations such as ‘pext’ and ‘popcount’, but even then, I don’t understand how what you are proposing is meant to be understood nor how it is meant to help me. Note that I use the loops to find moves to search further. From what I was able to understand (which I will admit is limited), your advice finds the *longest* slider moves for each piece, whereas I think I still need to iterate through them to be able to actually form positions to search further.
I will also say that, considering my profiling results, although a considerable portion of time is spent on the ‘playLineMove’ function, it is not anywhere nearly as much as the time spent on the ‘play’ and ‘load’ functions, so I think that, by far, the biggest opportunity for improvement is not what you are proposing. Though it could definitely be useful for further optimizations later on! (If I am able to eventually understand what you mean more thoroughly, of course.)
I will also say that, considering my profiling results, although a considerable portion of time is spent on the ‘playLineMove’ function, it is not anywhere nearly as much as the time spent on the ‘play’ and ‘load’ functions, so I think that, by far, the biggest opportunity for improvement is not what you are proposing. Though it could definitely be useful for further optimizations later on! (If I am able to eventually understand what you mean more thoroughly, of course.)
-
- Posts: 26
- Joined: Wed Feb 16, 2022 6:21 am
- Full name: P. M. Zamboni
Re: How can I improve my engine’s speed?
A bit of progress: On my local repository, I was able to improve the speed noticiably by removing the ‘save’ and ‘load’ functions and incorporating them into the ‘play’ and ‘playPromotion’ functions directly. However, this is still not nearly enough to increase the depth at all without sacrificing significant speed.
The most expensive function is still the ‘play’ function (considering self time). I’m not sure about what part of the ‘play’ function is causing for it to be so slow, except for acknowledging it is one of the most frequently‐called functions in the analysis algorithm.
Ideally, to me, I’d be able to get the depth to be around four to six plies at least, as opposed to the current three‐ply depth. I am currently able to bump the depth by one (from three to four plies) to change the analysis speed from ~1s to ~15s on my computer.
The most expensive function is still the ‘play’ function (considering self time). I’m not sure about what part of the ‘play’ function is causing for it to be so slow, except for acknowledging it is one of the most frequently‐called functions in the analysis algorithm.
Ideally, to me, I’d be able to get the depth to be around four to six plies at least, as opposed to the current three‐ply depth. I am currently able to bump the depth by one (from three to four plies) to change the analysis speed from ~1s to ~15s on my computer.
-
- Posts: 26
- Joined: Wed Feb 16, 2022 6:21 am
- Full name: P. M. Zamboni
Re: How can I improve my engine’s speed?
A couple more notes about my endeavors:
- I decided to port the entire ‘analysis.js’ module to C, just to verify how much better it would perform. It seems to be somewhere between two to five times faster!
- I decided to verify the number of nodes per second of the C program, and it is about 40 million n/s on one of my computers. (Pseudo‐move generation.)
- Bumping the depth by two plies does seem to make the bot better, but it is still not fast enough, even with the C program.
- Perhaps I need to indeed figure out a better analysis algorithm.
- Thoughts and suggestions would be appreciated!
Code: Select all
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
int32_t dummyette_analyse(uint8_t turn, char *array, int32_t score, uint32_t white_score, uint32_t black_score, uint32_t *countdown)
{
uint64_t file_a = 0x0101010101010101;
uint64_t rank_1 = 0b11111111;
uint64_t file_h = file_a << 7;
uint64_t rank_8 = rank_1 << (7 * 8);
uint64_t corner_a1 = file_a | rank_1;
uint64_t corner_a8 = file_a | rank_8;
uint64_t corner_h1 = file_h | rank_1;
uint64_t corner_h8 = file_h | rank_8;
uint64_t knight_a = file_a | (file_a << 1);
uint64_t knight_h = file_h | (file_h >> 1);
uint64_t knight_1 = rank_1 | (rank_1 << 8);
uint64_t knight_8 = rank_8 | (rank_8 >> 8);
enum { infinity = 1234567890 };
enum { pawn_value = 100, knight_value = 320, bishop_value = 330, rook_value = 500, queen_value = 900, king_value = 599900 };
struct bitboard
{
uint64_t self, *color, *other, *group;
void (*move)(struct bitboard *bitboard);
};
enum { depth = 12 };
int32_t results[depth] = {-infinity};
uint32_t indices[depth] = {0};
bool white_turn = turn == 0;
int initial_turn = 1;
if (!white_turn) initial_turn = -1;
uint8_t i = 0;
uint64_t any = 0;
uint64_t pawns = 0;
uint64_t knights = 0;
uint64_t bishops = 0;
uint64_t rooks = 0;
uint64_t queens = 0;
uint64_t kings = 0;
uint64_t white = 0;
uint64_t black = 0;
struct bitboard bitboards[32];
uint8_t bitboard_count = 0;
auto void play(struct bitboard *bitboard, uint64_t after);
auto void play_promotion(struct bitboard *bitboard, uint64_t after);
auto void play_line_move(struct bitboard *bitboard, uint64_t limit, int n);
auto void play_line_move_2(struct bitboard *bitboard, uint64_t limit, int n);
auto void move_rook(struct bitboard *bitboard);
auto void move_bishop(struct bitboard *bitboard);
auto void move_queen(struct bitboard *bitboard);
auto void move_knight(struct bitboard *bitboard);
auto void move_king(struct bitboard *bitboard);
auto void move_white_pawn(struct bitboard *bitboard);
auto void move_black_pawn(struct bitboard *bitboard);
auto void move_initial_white_pawn(struct bitboard *bitboard);
auto void move_initial_black_pawn(struct bitboard *bitboard);
auto void next(void);
void play(struct bitboard *bitboard, uint64_t after)
{
int score2 = score;
uint64_t any2 = any;
uint64_t color = *bitboard->color;
uint64_t group = *bitboard->group;
uint64_t other = *bitboard->other;
uint64_t self = bitboard->self;
if (after & any & other)
{
int n = (bitboard->color == &white) / black_score - (bitboard->color == &black) / white_score;
if (after & pawns) score += n * pawn_value;
if (after & knights) score += n * knight_value;
if (after & bishops) score += n * bishop_value;
if (after & rooks) score += n * rook_value;
if (after & queens) score += n * queen_value;
if (after & kings) score += n * king_value;
}
any &= ~bitboard->self;
any |= after;
*bitboard->other &= ~after;
*bitboard->group |= after;
*bitboard->group &= ~bitboard->self;
*bitboard->color |= after;
*bitboard->color &= ~bitboard->self;
bitboard->self = after;
next();
score = score2;
any = any2;
*bitboard->color = color;
*bitboard->group = group;
*bitboard->other = other;
bitboard->self = self;
}
void play_promotion(struct bitboard *bitboard, uint64_t after)
{
uint64_t queens2 = queens;
uint64_t group = *bitboard->group;
int score2 = score;
int n = (bitboard->color == &white) / white_score - (bitboard->color == &black) / black_score;
score += n * (queen_value - pawn_value);
*bitboard->group &= ~bitboard->self;
bitboard->group = &queens;
*bitboard->group |= bitboard->self;
void (*move)(struct bitboard *bitboard) = bitboard->move;
bitboard->move = &move_queen;
play(bitboard, after);
bitboard->move = move;
bitboard->group = &pawns;
queens = queens2;
*bitboard->group = group;
score = score2;
}
void play_line_move(struct bitboard *bitboard, uint64_t limit, int n)
{
uint64_t before = bitboard->self;
uint64_t after = before;
for (;;)
{
if (after & limit) break;
after >>= n;
if (after & any)
{
if (after & *bitboard->other) play(bitboard, after);
break;
}
play(bitboard, after);
}
}
void play_line_move_2(struct bitboard *bitboard, uint64_t limit, int n)
{
uint64_t before = bitboard->self;
uint64_t after = before;
for (;;)
{
if (after & limit) break;
after <<= n;
if (after & any)
{
if (after & *bitboard->other) play(bitboard, after);
break;
}
play(bitboard, after);
}
}
void move_rook(struct bitboard *bitboard)
{
play_line_move(bitboard, rank_1, 8);
play_line_move_2(bitboard, rank_8, 8);
play_line_move(bitboard, file_a, 1);
play_line_move_2(bitboard, file_h, 1);
}
void move_bishop(struct bitboard *bitboard)
{
play_line_move(bitboard, corner_a1, 9);
play_line_move_2(bitboard, corner_a8, 7);
play_line_move(bitboard, corner_h1, 7);
play_line_move_2(bitboard, corner_h8, 9);
}
void move_queen(struct bitboard *bitboard)
{
move_rook(bitboard);
move_bishop(bitboard);
}
void move_knight(struct bitboard *bitboard)
{
uint64_t before = bitboard->self;
if (!(before & knight_a) && !(before & rank_1))
{
uint64_t after = before >> 10;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
if (!(before & file_a) && !(before & knight_1))
{
uint64_t after = before >> 17;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
if (!(before & knight_a) && !(before & rank_8))
{
uint64_t after = before << 6;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
if (!(before & file_a) && !(before & knight_8))
{
uint64_t after = before << 15;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
if (!(before & knight_h) && !(before & rank_1))
{
uint64_t after = before >> 6;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
if (!(before & file_h) && !(before & knight_1))
{
uint64_t after = before >> 15;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
if (!(before & knight_h) && !(before & rank_8))
{
uint64_t after = before << 10;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
if (!(before & file_h) && !(before & knight_8))
{
uint64_t after = before << 17;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
}
void move_king(struct bitboard *bitboard)
{
uint64_t before = bitboard->self;
if (!(before & file_a))
{
uint64_t after = before >> 1;
if (!(after & any & *bitboard->color))
play(bitboard, after);
if (!(before & rank_1))
{
uint64_t after = before >> 9;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
if (!(before & rank_8))
{
uint64_t after = before << 7;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
}
if (!(before & file_h))
{
uint64_t after = before << 1;
if (!(after & any & *bitboard->color))
play(bitboard, after);
if (!(before & rank_1))
{
uint64_t after = before >> 7;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
if (!(before & rank_8))
{
uint64_t after = before << 9;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
}
if (!(before & rank_1))
{
uint64_t after = before >> 8;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
if (!(before & rank_8))
{
uint64_t after = before << 8;
if (!(after & any & *bitboard->color))
play(bitboard, after);
}
}
void move_white_pawn(struct bitboard *bitboard)
{
uint64_t before = bitboard->self;
uint64_t after;
after = before << 7;
if (!(before & file_a) && (after & any & black))
{
if (after & rank_8)
play_promotion(bitboard, after);
else
play(bitboard, after);
}
after = before << 9;
if (!(before & file_h) && (after & any & black))
{
if (after & rank_8)
play_promotion(bitboard, after);
else
play(bitboard, after);
}
after = before << 8;
if (!(after & any))
{
if (after & rank_8)
play_promotion(bitboard, after);
else
play(bitboard, after);
}
}
void move_black_pawn(struct bitboard *bitboard)
{
uint64_t before = bitboard->self;
uint64_t after;
after = before >> 7;
if (!(before & file_h) && (after & any & white))
{
if (after & rank_1)
play_promotion(bitboard, after);
else
play(bitboard, after);
}
after = before >> 9;
if (!(before & file_a) && (after & any & white))
{
if (after & rank_1)
play_promotion(bitboard, after);
else
play(bitboard, after);
}
after = before >> 8;
if (!(after & any))
{
if (after & rank_1)
play_promotion(bitboard, after);
else
play(bitboard, after);
}
}
void move_initial_white_pawn(struct bitboard *bitboard)
{
bitboard->move = &move_white_pawn;
move_white_pawn(bitboard);
uint64_t before = bitboard->self;
uint64_t after = before << 16;
if (!(after & any)) play(bitboard, after);
bitboard->move = &move_initial_white_pawn;
}
void move_initial_black_pawn(struct bitboard *bitboard)
{
bitboard->move = &move_black_pawn;
move_black_pawn(bitboard);
uint64_t before = bitboard->self;
uint64_t after = before >> 16;
if (!(after & any)) play(bitboard, after);
bitboard->move = &move_initial_black_pawn;
}
void next(void)
{
if (!(kings & any & white))
{
indices[i] = i;
results[i] = infinity * initial_turn;;
return;
}
if (!(kings & any & black))
{
indices[i] = i;
results[i] = -infinity * initial_turn;
return;
}
if (i > 4)
{
int32_t result = -score * initial_turn;
if (results[i] > result) results[i] = result;
return;
}
white_turn = !white_turn;
i++;
if (i % 2 != 0)
{
indices[i] = i;
results[i] = infinity;
for (int j = 0 ; j < bitboard_count ; j++)
{
struct bitboard *bitboard = bitboards + j;
if (results[i] < results[i - 1]) break;
if ((bitboard->color == &white) == white_turn) continue;
if (!(bitboard->self & *bitboard->color & any)) continue;
(*bitboard->move)(bitboard);
}
if (results[i - 1] < results[i])
results[i - 1] = results[i],
indices[i - 1] = indices[i];
}
else
{
indices[i] = i;
results[i] = -infinity;
for (int j = 0 ; j < bitboard_count ; j++)
{
struct bitboard *bitboard = bitboards + j;
if (results[i] > results[i - 1]) break;
if ((bitboard->color == &white) == white_turn) continue;
if (!(bitboard->self & *bitboard->color & any)) continue;
(*bitboard->move)(bitboard);
}
if (results[i - 1] > results[i])
results[i - 1] = results[i],
indices[i - 1] = indices[i];
}
white_turn = !white_turn;
i--;
}
for (int y = 0 ; y < 8 ; y++)
for (int x = 7 ; x >= 0 ; x--)
{
any <<= 1;
pawns <<= 1;
knights <<= 1;
bishops <<= 1;
rooks <<= 1;
queens <<= 1;
kings <<= 1;
white <<= 1;
black <<= 1;
char piece = array[y * 8 + x];
switch (piece)
{
default:
return 1;
case ' ':
continue;
case 'R':
case 'r':
rooks |= 1;
break;
case 'N':
case 'n':
knights |= 1;
break;
case 'B':
case 'b':
bishops |= 1;
break;
case 'Q':
case 'q':
queens |= 1;
break;
case 'K':
case 'k':
kings |= 1;
break;
case 'P':
case 'p':
pawns |= 1;
break;
}
any |= 1;
if (piece < 'a')
white |= 1;
else
black |= 1;
}
for (int i = 0 ; i < 64 ; i++)
{
if (!((any>>i) & 1)) continue;
struct bitboard *bitboard = bitboards + bitboard_count;
bitboard_count++;
bitboard->self = ((uint64_t) 1) << i;
if ((pawns >> i) & 1)
{
bitboard->group = &pawns;
if ((white >> i) & 1) bitboard->move = &move_initial_white_pawn;
if ((black >> i) & 1) bitboard->move = &move_initial_black_pawn;
}
if ((knights >> i) & 1) bitboard->group = &knights, bitboard->move = &move_knight;
if ((bishops >> i) & 1) bitboard->group = &bishops, bitboard->move = &move_bishop;
if ((rooks >> i) & 1) bitboard->group = &rooks, bitboard->move = &move_rook;
if ((queens >> i) & 1) bitboard->group = &queens, bitboard->move = &move_queen;
if ((kings >> i) & 1) bitboard->group = &kings, bitboard->move = &move_king;
if ((white >> i) & 1) bitboard->color = &white, bitboard->other = &black;
if ((black >> i) & 1) bitboard->color = &black, bitboard->other = &white;
}
next();
if (countdown != NULL) *countdown = indices[i];
return results[0];
}
-
- Posts: 84
- Joined: Wed Aug 04, 2021 12:42 am
- Full name: Ellie Moore
Re: How can I improve my engine’s speed?
I don't know if anyone has already mentioned it, but there are so many great bitwise hacks in the book Hacker's Delight. These hacks will help you get the most out of your bitboard implementation!
https://github.com/lancetw/ebook-1/blob ... dition.pdf
The most critical operations used in bitboard programming are:
& : intersection
| : union
~ : complement
- : two's complement
^ : XOR to replace slower stuff like & ~ when possible. Also great for scrambling bits and making keys.
>> or << : right shift, left shift. Can be slow, but great for generating tables of bitboards and doing necessary stuff like pushing pawns!
Iteration over high bits is accomplished like so:
All permutations of a masked section of a bitboard is accomplished with the ripple-carry
Some twiddling tricks!
Find the number of high bits in a bitboard!
Isolate the most significant bit!
There are lots, lots more in the book. I also recommend looking at the hacks
on the chessprogramming website.
Also look into bit scan forward implementations and instructions for your hardware.
Also look into magic bitboards and PEXT instructions for your hardware.
Good luck and happy hacking!
https://github.com/lancetw/ebook-1/blob ... dition.pdf
The most critical operations used in bitboard programming are:
& : intersection
| : union
~ : complement
- : two's complement
^ : XOR to replace slower stuff like & ~ when possible. Also great for scrambling bits and making keys.
>> or << : right shift, left shift. Can be slow, but great for generating tables of bitboards and doing necessary stuff like pushing pawns!
Iteration over high bits is accomplished like so:
Code: Select all
for(u64 x = bb; x; x &= x - 1)
{
u8 square = bitScanForward(x);
// Do something with square !
}
Code: Select all
// cycle through all permutations of the masked bits in a 64-bit integer, x.
u64 x = 0;
do
{
// Do stuff with x
x = (x - mask) & mask;
// x = ((x | ~mask) + 1) & mask;
} while (x);
Code: Select all
// Isolate the least significant set bit -> 1011 -> 0001
x & -x; // Note the two's complement
// Pop the least significant set bit -> 1011 -> 1010
x & x - 1;
// Pop the least significant string of set bits -> 1011 -> 1000
x & x + 1;
// absolute value of x
x - ((x << 1U) & (x >> 31U));
// If x is a power of 2... n % x
n & (x - 1);
// is x a power of two?
x && !(x & (x - 1));
Code: Select all
// How many set bits are in the 64 bit integer x?
// Use the formula high bit count =
// x - (SUM from n=0 to n=inf (x >> n))!
// Count bits in each 4-bit section.
u64 n =
(x >> 1U) & 0x7777777777777777UL;
x -= n;
n = (n >> 1U) & 0x7777777777777777UL;
x -= n;
n = (n >> 1U) & 0x7777777777777777UL;
x -= n;
// add 4-bit sums into byte sums.
x = (x + (x >> 4U)) & 0x0F0F0F0F0F0F0F0FUL;
// Add the byte sums.
x *= 0x0101010101010101UL;
return (u8)(x >> 56U);
Code: Select all
// isolate the most significant bit of an unsigned
// 64-bit integer, x, without branching.
x |= x >> 1U;
x |= x >> 2U;
x |= x >> 4U;
x |= x >> 8U;
x |= x >> 16U;
x |= x >> 32U;
x = (x >> 1U) + 1;
// isolate msb of x with branching...
// n holds the result. Ideal for small
// integers or integers with sparse
// number of bits.
u64 n(0);
for(u64 x = bb; x; n = x, x &= x - 1);
return n;
on the chessprogramming website.
Also look into bit scan forward implementations and instructions for your hardware.
Also look into magic bitboards and PEXT instructions for your hardware.
Good luck and happy hacking!