How can I improve my engine’s speed?

zamfofex · Post by **zamfofex** » Thu Jul 21, 2022 10:22 am

Hello, everyone! A few months ago, I posted about my chess bot (“engine”): https://talkchess.com/forum3/viewtopic.php?f=7&t=79379

Since then, I implemented bitboards, and it severely helped increase the bot’s analysis speed without needing to use an external native program for it! https://github.com/zamfofex/dummyette/c ... b961f69356

However, it’s still very slow. Don’t get me wrong, it performs well enough for me to be happy about it, but I still think there is a lot of room for improvement.

I have been reading a lot about fast chess move generation algorithms lately, but I didn’t find anything directly applicable to me. You see, my approach is to only ever generate pseudo‐moves, while also entirely disconsidering e.p. and castling. The king may be captured, and is worth many points, so that is naturally avoided by the bot during the search. This is a sufficient enough heuristic for my purposes. Other optimizations I found (ones that I don’t already implement) were mostly related to discarding illegal moves from what I was able to understand.

What I don’t understand is why the move generation and search is so slow in my bot. It’s very frustrating for me, because although there *clearly* is room for improvement, I can’t seem to be able to find it. It is not a case of the bot being slower than other chess programs by being half or even a fifth of the speed, it is a case of the bot being slower by *multiple magnitudes*, which is really worrying! The search it currently performs, besides inherently simplistic, is also really shallow, which prevents it from finding simple tactics and checkmating patterns in end games.

Here is a direct URL to the bot’s analysis algorithm: https://github.com/zamfofex/dummyette/b ... nalysis.js

What am I missing?

dangi12012 · Post by **dangi12012** » Thu Jul 21, 2022 10:43 am

Algorithmic:
You are not leveraging bitboards - you just use your bitboard like an array - which is not the way to go:
With a BB engine you dont have sturctures like this at all:
for (let y = 7 ; y >= 0 ; y--)
for (let x = 7 ; x >= 0 ; x--)

Also your sliders are also using 8 for loops which is not the way with Bitboards

I recommend you take a look at this:
https://github.com/Gigantua/Gigantua/bl ... velist.hpp
https://www.codeproject.com/Articles/53 ... egenerator
https://github.com/Gigantua/Chess_Movegen

And generally reading everything under bitboards in the cpw.
https://www.chessprogramming.org/Bitboards
At the end you should have no loops over the squares, and no switch over the piecetype.

Language:
Write this in C++ and use EMScripten to use it like your javascript.

okidoki · Post by **okidoki** » Thu Jul 21, 2022 10:56 am

Let me share first that I am a newbie in chess programming as well, so my suggestions may be helpful at all.
I have not checked your code, but this is what I think which could help you.

1. Profile the application, which should provide you with some hotpath which takes up most time. Try to optimize this. Share it here and people can help as well.
2. Check if the you have implemented alpha-beta pruning, if yes then further implement something like MVV/LVA, killers and history heuristics and order the moves (try captures first and in captures try the good ones first, MVV/LVA should help).

These two points should help you in my view.

Thanks.

zamfofex · Post by **zamfofex** » Thu Jul 21, 2022 11:09 am

dangi12012 wrote: ↑Thu Jul 21, 2022 10:43 am Algorithmic:
You are not leveraging bitboards - you just use your bitboard like an array - which is not the way to go:
With a BB engine you dont have sturctures like this at all:
for (let y = 7 ; y >= 0 ; y--)
for (let x = 7 ; x >= 0 ; x--)

dangi12012 wrote: ↑Thu Jul 21, 2022 10:43 am At the end you should have no loops over the squares, and no switch over the piecetype.

I will note: These loops you found only run once per analysed position, and they are used to convert the board representation from their array form to bitboard form. (Because everywhere else in the program, where I don’t need good performance, I use a complete valid‐move‐only array implementation.) When operating on the bitboards themselves, loops of this form are not used.

dangi12012 wrote: ↑Thu Jul 21, 2022 10:43 am Also your sliders are also using 8 for loops which is not the way with Bitboards

Oh? I thought the idea of using bitboards was to optimize moving pieces across squares (by using shifting instead of reading/writing memory in an array). I don’t understand how I could move rooks/bishops/queens without loops, even while using bitboards.

dangi12012 wrote: ↑Thu Jul 21, 2022 10:43 am Language:
Write this in C++ and use EMScripten to use it like your javascript.

I acknowledge JavaScript is not the usually the fastest language, but I don’t believe it is the biggest bottleneck of my program currently. The goal of my project is to write a chess bot *in JavaScript*, and I don’t want to abandon that. I know I could write it in C (or C++ as you suggest) to make it faster (then either use FFI or WebAssembly to interact with it from JavaScript), but that is not my goal.

okidoki wrote: ↑Thu Jul 21, 2022 10:56 am 2. Check if the you have implemented alpha-beta pruning, if yes then further implement something like MVV/LVA, killers and history heuristics and order the moves (try captures first and in captures try the good ones first, MVV/LVA should help).

I remember implementing alpha-beta pruning, and also a capture‐first heuristic at some point (you might be able to find the latter by traversing the commit history), but it was unsuccessful at improving speed significantly, if at all.

okidoki wrote: ↑Thu Jul 21, 2022 10:56 am 1. Profile the application, which should provide you with some hotpath which takes up most time. Try to optimize this. Share it here and people can help as well.

That actually doesn’t sound like a bad idea. I’ll try to profile it and share the results here once I do so!

Thank you both for your thoughts and suggestions, by the way! I really appreciate it.

zamfofex · Post by **zamfofex** » Thu Jul 21, 2022 11:47 am

I have written the following program, then profiled one of its threads. (Profiling results are shown below.)

Code: Select all

import {AsyncAnalyser} from "./dummyette.js"
import {standardBoard} from "./chess.js"

let board = standardBoard
let analyser = AsyncAnalyser()

while (true)
{
	console.log(board.toASCII())
	console.log("---------------")
	let moves = await analyser.analyse(board)
	let move = moves[0]
	if (!move) break
	board = move.play()
}

Deno.exit()

Interestingly, a surprisingly amount of time is spent on the ‘play’ function! Perhaps less unexpectedly, a significant amount of time is also spent on the ‘load’ function. (Considering only self time.)

dangi12012 · Post by **dangi12012** » Thu Jul 21, 2022 12:31 pm

zamfofex wrote: ↑Thu Jul 21, 2022 11:09 am Oh? I thought the idea of using bitboards was to optimize moving pieces across squares (by using shifting instead of reading/writing memory in an array). I don’t understand how I could move rooks/bishops/queens without loops, even while using bitboards.

When you have the square (int) and the blockers (uint64_t) like that pext:

Code: Select all

(SliderPext + RookOffset_Pext[square])[_pext_u64(occupy, rmask[square])];

Or like that (fancy magic):

Code: Select all

return m.attacks[((occ | m.mask) * m.hash) >> (64 - 12)];

Or like that (per direction)

Code: Select all

	static constexpr uint64_t attack(uint64_t pieces, uint32_t x, uint64_t mask) {
		uint64_t o = pieces & mask;
		return ((o - (1ull << x)) ^ bit_bswap(bit_bswap(o) - (0x8000000000000000ull >> x))) & mask; //Daniel 28.04.2022 - Faster shift. Replaces (1ull << (s ^ 56))
	}

Or like that:

Code: Select all

	static constexpr uint64_t Queen(int sq, uint64_t occ) {
		occ |= 0x8000000000000001;
		uint64_t bb = (ray[std::countr_zero(ray[sq].rwsNW & occ)].rayNW
			| ray[std::countr_zero(ray[sq].rwsNN & occ)].rayNN
			| ray[std::countr_zero(ray[sq].rwsNE & occ)].rayNE
			| ray[std::countr_zero(ray[sq].rwsEE & occ)].rayEE
			| ray[63 - std::countl_zero(ray[sq].rwsSE & occ)].raySE
			| ray[63 - std::countl_zero(ray[sq].rwsSS & occ)].raySS
			| ray[63 - std::countl_zero(ray[sq].rwsSW & occ)].raySW
			| ray[63 - std::countl_zero(ray[sq].rwsWW & occ)].rayWW)
			^ queens[sq];

		return bb;// Rook(sq, occ) | Bishop(sq, occ);
	}

Or as research like that: (I authored this)

Code: Select all

	static uint64_t Vector32_Unrolled(int sq, uint64_t occ, uint64_t gather, uint64_t scatter, uint32_t count, const uint8_t* weights) {
		const __m256i input = _mm256_set1_epi16((short)_pext_u64(occ, gather));
		uint32_t result = 0;

		//One network infers one output bit - 14 Networks with this configuration: 16 x 32 x 1
		//These networks have learned how to transform the _pext_u64 occupancy into the input for _pdep_u64 so that the output is exactly the correct sliding piece
		//Normally a table of 512kb would be needed for this task. The networks can infer this with 28k of memory!

		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 0 * 32)))))) > 16) << 0;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 1 * 32)))))) > 16) << 1;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 2 * 32)))))) > 16) << 2;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 3 * 32)))))) > 16) << 3;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 4 * 32)))))) > 16) << 4;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 5 * 32)))))) > 16) << 5;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 6 * 32)))))) > 16) << 6;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 7 * 32)))))) > 16) << 7;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 8 * 32)))))) > 16) << 8;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 9 * 32)))))) > 16) << 9;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 10 * 32)))))) > 16) << 10;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 11 * 32)))))) > 16) << 11;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 12 * 32)))))) > 16) << 12;
		result |= (std::popcount<uint32_t>(_mm256_movemask_epi8(Chess_Lookup::BNNInternal::popcount8x32_SmallerThan4(_mm256_xor_si256(input, _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + 13 * 32)))))) > 16) << 13;

		return _pdep_u64(result, scatter);
	}

Also a loop can be optimized and parallelized as well (I authored this as well)
https://github.com/Gigantua/Chess_Moveg ... XShift.hpp

zamfofex · Post by **zamfofex** » Fri Jul 22, 2022 4:41 am

I will say that JavaScript provides no way of performing certain bit manipulation operations such as ‘pext’ and ‘popcount’, but even then, I don’t understand how what you are proposing is meant to be understood nor how it is meant to help me. Note that I use the loops to find moves to search further. From what I was able to understand (which I will admit is limited), your advice finds the *longest* slider moves for each piece, whereas I think I still need to iterate through them to be able to actually form positions to search further.

I will also say that, considering my profiling results, although a considerable portion of time is spent on the ‘playLineMove’ function, it is not anywhere nearly as much as the time spent on the ‘play’ and ‘load’ functions, so I think that, by far, the biggest opportunity for improvement is not what you are proposing. Though it could definitely be useful for further optimizations later on! (If I am able to eventually understand what you mean more thoroughly, of course.)

zamfofex · Post by **zamfofex** » Fri Jul 22, 2022 6:44 am

A bit of progress: On my local repository, I was able to improve the speed noticiably by removing the ‘save’ and ‘load’ functions and incorporating them into the ‘play’ and ‘playPromotion’ functions directly. However, this is still not nearly enough to increase the depth at all without sacrificing significant speed.

The most expensive function is still the ‘play’ function (considering self time). I’m not sure about what part of the ‘play’ function is causing for it to be so slow, except for acknowledging it is one of the most frequently‐called functions in the analysis algorithm.

Ideally, to me, I’d be able to get the depth to be around four to six plies at least, as opposed to the current three‐ply depth. I am currently able to bump the depth by one (from three to four plies) to change the analysis speed from ~1s to ~15s on my computer.

zamfofex · Post by **zamfofex** » Sat Jul 23, 2022 2:39 am

A couple more notes about my endeavors:

I decided to port the entire ‘analysis.js’ module to C, just to verify how much better it would perform. It seems to be somewhere between two to five times faster!
I decided to verify the number of nodes per second of the C program, and it is about 40 million n/s on one of my computers. (Pseudo‐move generation.)
Bumping the depth by two plies does seem to make the bot better, but it is still not fast enough, even with the C program.
Perhaps I need to indeed figure out a better analysis algorithm.
Thoughts and suggestions would be appreciated!

The C program (it uses GNU extensions that are not implemented by Clang, namely, inline functions):

Code: Select all

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

int32_t dummyette_analyse(uint8_t turn, char *array, int32_t score, uint32_t white_score, uint32_t black_score, uint32_t *countdown)
{
	uint64_t file_a = 0x0101010101010101;
	uint64_t rank_1 = 0b11111111;
	
	uint64_t file_h = file_a << 7;
	uint64_t rank_8 = rank_1 << (7 * 8);
	
	uint64_t corner_a1 = file_a | rank_1;
	uint64_t corner_a8 = file_a | rank_8;
	uint64_t corner_h1 = file_h | rank_1;
	uint64_t corner_h8 = file_h | rank_8;
	
	uint64_t knight_a = file_a | (file_a << 1);
	uint64_t knight_h = file_h | (file_h >> 1);
	uint64_t knight_1 = rank_1 | (rank_1 << 8);
	uint64_t knight_8 = rank_8 | (rank_8 >> 8);
	
	enum { infinity = 1234567890 };
	
	enum { pawn_value = 100, knight_value = 320, bishop_value = 330, rook_value = 500, queen_value = 900, king_value = 599900 };
	
	struct bitboard
	{
		uint64_t self, *color, *other, *group;
		void (*move)(struct bitboard *bitboard);
	};
	
	enum { depth = 12 };
	int32_t results[depth] = {-infinity};
	uint32_t indices[depth] = {0};
	
	bool white_turn = turn == 0;
	int initial_turn = 1;
	if (!white_turn) initial_turn = -1;
	
	uint8_t i = 0;
	
	uint64_t any = 0;
	uint64_t pawns = 0;
	uint64_t knights = 0;
	uint64_t bishops = 0;
	uint64_t rooks = 0;
	uint64_t queens = 0;
	uint64_t kings = 0;
	uint64_t white = 0;
	uint64_t black = 0;
	
	struct bitboard bitboards[32];
	uint8_t bitboard_count = 0;
	
	auto void play(struct bitboard *bitboard, uint64_t after);
	auto void play_promotion(struct bitboard *bitboard, uint64_t after);
	auto void play_line_move(struct bitboard *bitboard, uint64_t limit, int n);
	auto void play_line_move_2(struct bitboard *bitboard, uint64_t limit, int n);
	auto void move_rook(struct bitboard *bitboard);
	auto void move_bishop(struct bitboard *bitboard);
	auto void move_queen(struct bitboard *bitboard);
	auto void move_knight(struct bitboard *bitboard);
	auto void move_king(struct bitboard *bitboard);
	auto void move_white_pawn(struct bitboard *bitboard);
	auto void move_black_pawn(struct bitboard *bitboard);
	auto void move_initial_white_pawn(struct bitboard *bitboard);
	auto void move_initial_black_pawn(struct bitboard *bitboard);
	auto void next(void);
	
	void play(struct bitboard *bitboard, uint64_t after)
	{
		int score2 = score;
		uint64_t any2 = any;
		uint64_t color = *bitboard->color;
		uint64_t group = *bitboard->group;
		uint64_t other = *bitboard->other;
		uint64_t self = bitboard->self;
		
		if (after & any & other)
		{
			int n = (bitboard->color == &white) / black_score - (bitboard->color == &black) / white_score;
			
			if (after & pawns) score += n * pawn_value;
			if (after & knights) score += n * knight_value;
			if (after & bishops) score += n * bishop_value;
			if (after & rooks) score += n * rook_value;
			if (after & queens) score += n * queen_value;
			if (after & kings) score += n * king_value;
		}
		
		any &= ~bitboard->self;
		any |= after;
		
		*bitboard->other &= ~after;
		*bitboard->group |= after;
		*bitboard->group &= ~bitboard->self;
		*bitboard->color |= after;
		*bitboard->color &= ~bitboard->self;
		bitboard->self = after;
		
		next();
		
		score = score2;
		any = any2;
		*bitboard->color = color;
		*bitboard->group = group;
		*bitboard->other = other;
		bitboard->self = self;
	}
	
	void play_promotion(struct bitboard *bitboard, uint64_t after)
	{
		uint64_t queens2 = queens;
		uint64_t group = *bitboard->group;
		int score2 = score;
		
		int n = (bitboard->color == &white) / white_score - (bitboard->color == &black) / black_score;
		score += n * (queen_value - pawn_value);
		
		*bitboard->group &= ~bitboard->self;
		bitboard->group = &queens;
		*bitboard->group |= bitboard->self;
		
		void (*move)(struct bitboard *bitboard) = bitboard->move;
		bitboard->move = &move_queen;
		
		play(bitboard, after);
		
		bitboard->move = move;
		bitboard->group = &pawns;
		
		queens = queens2;
		*bitboard->group = group;
		score = score2;
	}
	
	void play_line_move(struct bitboard *bitboard, uint64_t limit, int n)
	{
		uint64_t before = bitboard->self;
		uint64_t after = before;
		
		for (;;)
		{
			if (after & limit) break;
			after >>= n;
			if (after & any)
			{
				if (after & *bitboard->other) play(bitboard, after);
				break;
			}
			play(bitboard, after);
		}
	}
	
	void play_line_move_2(struct bitboard *bitboard, uint64_t limit, int n)
	{
		uint64_t before = bitboard->self;
		uint64_t after = before;
		
		for (;;)
		{
			if (after & limit) break;
			after <<= n;
			if (after & any)
			{
				if (after & *bitboard->other) play(bitboard, after);
				break;
			}
			play(bitboard, after);
		}
	}
	
	void move_rook(struct bitboard *bitboard)
	{
		play_line_move(bitboard, rank_1, 8);
		play_line_move_2(bitboard, rank_8, 8);
		play_line_move(bitboard, file_a, 1);
		play_line_move_2(bitboard, file_h, 1);
	}
	
	void move_bishop(struct bitboard *bitboard)
	{
		play_line_move(bitboard, corner_a1, 9);
		play_line_move_2(bitboard, corner_a8, 7);
		play_line_move(bitboard, corner_h1, 7);
		play_line_move_2(bitboard, corner_h8, 9);
	}
	
	void move_queen(struct bitboard *bitboard)
	{
		move_rook(bitboard);
		move_bishop(bitboard);
	}
	
	void move_knight(struct bitboard *bitboard)
	{
		uint64_t before = bitboard->self;
		
		if (!(before & knight_a) && !(before & rank_1))
		{
			uint64_t after = before >> 10;
			if (!(after & any & *bitboard->color))
				play(bitboard, after);
		}
		if (!(before & file_a) && !(before & knight_1))
		{
			uint64_t after = before >> 17;
			if (!(after & any & *bitboard->color))
				play(bitboard, after);
		}
		if (!(before & knight_a) && !(before & rank_8))
		{
			uint64_t after = before << 6;
			if (!(after & any & *bitboard->color))
				play(bitboard, after);
		}
		if (!(before & file_a) && !(before & knight_8))
		{
			uint64_t after = before << 15;
			if (!(after & any & *bitboard->color))
				play(bitboard, after);
		}
		if (!(before & knight_h) && !(before & rank_1))
		{
			uint64_t after = before >> 6;
			if (!(after & any & *bitboard->color))
				play(bitboard, after);
		}
		if (!(before & file_h) && !(before & knight_1))
		{
			uint64_t after = before >> 15;
			if (!(after & any & *bitboard->color))
				play(bitboard, after);
		}
		if (!(before & knight_h) && !(before & rank_8))
		{
			uint64_t after = before << 10;
			if (!(after & any & *bitboard->color))
				play(bitboard, after);
		}
		if (!(before & file_h) && !(before & knight_8))
		{
			uint64_t after = before << 17;
			if (!(after & any & *bitboard->color))
				play(bitboard, after);
		}
	}
	
	void move_king(struct bitboard *bitboard)
	{
		uint64_t before = bitboard->self;
		if (!(before & file_a))
		{
			uint64_t after = before >> 1;
			if (!(after & any & *bitboard->color))
				play(bitboard, after);
			
			if (!(before & rank_1))
			{
				uint64_t after = before >> 9;
				if (!(after & any & *bitboard->color))
					play(bitboard, after);
			}
			if (!(before & rank_8))
			{
				uint64_t after = before << 7;
				if (!(after & any & *bitboard->color))
					play(bitboard, after);
			}
		}
		if (!(before & file_h))
		{
			uint64_t after = before << 1;
			if (!(after & any & *bitboard->color))
				play(bitboard, after);
			
			if (!(before & rank_1))
			{
				uint64_t after = before >> 7;
				if (!(after & any & *bitboard->color))
					play(bitboard, after);
			}
			if (!(before & rank_8))
			{
				uint64_t after = before << 9;
				if (!(after & any & *bitboard->color))
					play(bitboard, after);
			}
		}
		if (!(before & rank_1))
		{
			uint64_t after = before >> 8;
			if (!(after & any & *bitboard->color))
				play(bitboard, after);
		}
		if (!(before & rank_8))
		{
			uint64_t after = before << 8;
			if (!(after & any & *bitboard->color))
				play(bitboard, after);
		}
	}
	
	void move_white_pawn(struct bitboard *bitboard)
	{
		uint64_t before = bitboard->self;
		uint64_t after;
		
		after = before << 7;
		if (!(before & file_a) && (after & any & black))
		{
			if (after & rank_8)
				play_promotion(bitboard, after);
			else
				play(bitboard, after);
		}
		
		after = before << 9;
		if (!(before & file_h) && (after & any & black))
		{
			if (after & rank_8)
				play_promotion(bitboard, after);
			else
				play(bitboard, after);
		}
		
		after = before << 8;
		if (!(after & any))
		{
			if (after & rank_8)
				play_promotion(bitboard, after);
			else
				play(bitboard, after);
		}
	}
	
	void move_black_pawn(struct bitboard *bitboard)
	{
		uint64_t before = bitboard->self;
		uint64_t after;
		
		after = before >> 7;
		if (!(before & file_h) && (after & any & white))
		{
			if (after & rank_1)
				play_promotion(bitboard, after);
			else
				play(bitboard, after);
		}
		
		after = before >> 9;
		if (!(before & file_a) && (after & any & white))
		{
			if (after & rank_1)
				play_promotion(bitboard, after);
			else
				play(bitboard, after);
		}
		
		after = before >> 8;
		if (!(after & any))
		{
			if (after & rank_1)
				play_promotion(bitboard, after);
			else
				play(bitboard, after);
		}
	}
	
	void move_initial_white_pawn(struct bitboard *bitboard)
	{
		bitboard->move = &move_white_pawn;
		
		move_white_pawn(bitboard);
		uint64_t before = bitboard->self;
		uint64_t after = before << 16;
		if (!(after & any)) play(bitboard, after);
		
		bitboard->move = &move_initial_white_pawn;
	}
	
	void move_initial_black_pawn(struct bitboard *bitboard)
	{
		bitboard->move = &move_black_pawn;
		
		move_black_pawn(bitboard);
		uint64_t before = bitboard->self;
		uint64_t after = before >> 16;
		if (!(after & any)) play(bitboard, after);
		
		bitboard->move = &move_initial_black_pawn;
	}
	
	void next(void)
	{
		if (!(kings & any & white))
		{
			indices[i] = i;
			results[i] = infinity * initial_turn;;
			return;
		}
		if (!(kings & any & black))
		{
			indices[i] = i;
			results[i] = -infinity * initial_turn;
			return;
		}
		
		if (i > 4)
		{
			int32_t result = -score * initial_turn;
			if (results[i] > result) results[i] = result;
			return;
		}
		
		white_turn = !white_turn;
		i++;
		
		if (i % 2 != 0)
		{
			indices[i] = i;
			results[i] = infinity;
			for (int j = 0 ; j < bitboard_count ; j++)
			{
				struct bitboard *bitboard = bitboards + j;
				if (results[i] < results[i - 1]) break;
				if ((bitboard->color == &white) == white_turn) continue;
				if (!(bitboard->self & *bitboard->color & any)) continue;
				(*bitboard->move)(bitboard);
			}
			
			if (results[i - 1] < results[i])
				results[i - 1] = results[i],
				indices[i - 1] = indices[i];
		}
		else
		{
			indices[i] = i;
			results[i] = -infinity;
			for (int j = 0 ; j < bitboard_count ; j++)
			{
				struct bitboard *bitboard = bitboards + j;
				if (results[i] > results[i - 1]) break;
				if ((bitboard->color == &white) == white_turn) continue;
				if (!(bitboard->self & *bitboard->color & any)) continue;
				(*bitboard->move)(bitboard);
			}
			
			if (results[i - 1] > results[i])
				results[i - 1] = results[i],
				indices[i - 1] = indices[i];
		}
		
		white_turn = !white_turn;
		i--;
	}
	
	for (int y = 0 ; y < 8 ; y++)
	for (int x = 7 ; x >= 0 ; x--)
	{
		any <<= 1;
		pawns <<= 1;
		knights <<= 1;
		bishops <<= 1;
		rooks <<= 1;
		queens <<= 1;
		kings <<= 1;
		white <<= 1;
		black <<= 1;
		
		char piece = array[y * 8 + x];
		switch (piece)
		{
			default:
				return 1;
			case ' ':
				continue;
			case 'R':
			case 'r':
				rooks |= 1;
				break;
			case 'N':
			case 'n':
				knights |= 1;
				break;
			case 'B':
			case 'b':
				bishops |= 1;
				break;
			case 'Q':
			case 'q':
				queens |= 1;
				break;
			case 'K':
			case 'k':
				kings |= 1;
				break;
			case 'P':
			case 'p':
				pawns |= 1;
				break;
		}
		
		any |= 1;
		if (piece < 'a')
			white |= 1;
		else
			black |= 1;
	}
	
	for (int i = 0 ; i < 64 ; i++)
	{
		if (!((any>>i) & 1)) continue;
		
		struct bitboard *bitboard = bitboards + bitboard_count;
		bitboard_count++;
		bitboard->self = ((uint64_t) 1) << i;
		if ((pawns >> i) & 1)
		{
			bitboard->group = &pawns;
			if ((white >> i) & 1) bitboard->move = &move_initial_white_pawn;
			if ((black >> i) & 1) bitboard->move = &move_initial_black_pawn;
		}
		if ((knights >> i) & 1) bitboard->group = &knights, bitboard->move = &move_knight;
		if ((bishops >> i) & 1) bitboard->group = &bishops, bitboard->move = &move_bishop;
		if ((rooks >> i) & 1) bitboard->group = &rooks, bitboard->move = &move_rook;
		if ((queens >> i) & 1) bitboard->group = &queens, bitboard->move = &move_queen;
		if ((kings >> i) & 1) bitboard->group = &kings, bitboard->move = &move_king;
		if ((white >> i) & 1) bitboard->color = &white, bitboard->other = &black;
		if ((black >> i) & 1) bitboard->color = &black, bitboard->other = &white;
	}
	
	next();
	
	if (countdown != NULL) *countdown = indices[i];
	return results[0];
}

RedBedHed · Post by **RedBedHed** » Wed Jul 27, 2022 9:50 pm

I don't know if anyone has already mentioned it, but there are so many great bitwise hacks in the book Hacker's Delight. These hacks will help you get the most out of your bitboard implementation!

https://github.com/lancetw/ebook-1/blob ... dition.pdf

The most critical operations used in bitboard programming are:

& : intersection
| : union
~ : complement
- : two's complement
^ : XOR to replace slower stuff like & ~ when possible. Also great for scrambling bits and making keys.
>> or << : right shift, left shift. Can be slow, but great for generating tables of bitboards and doing necessary stuff like pushing pawns!

Iteration over high bits is accomplished like so:

Code: Select all

for(u64 x = bb; x; x &= x - 1) 
{
u8 square = bitScanForward(x);
// Do something with square !
}

All permutations of a masked section of a bitboard is accomplished with the ripple-carry

Code: Select all

// cycle through all permutations of the masked bits in a 64-bit integer, x.
u64 x = 0;
do
{
 // Do stuff with x
  x = (x - mask) & mask;
 // x = ((x | ~mask) + 1) & mask;
} while (x);

Some twiddling tricks!

Code: Select all

// Isolate the least significant set bit -> 1011 -> 0001
x & -x; // Note the two's complement
// Pop the least significant set bit -> 1011 -> 1010
x & x - 1;
// Pop the least significant string of set bits -> 1011 -> 1000
x & x + 1;
// absolute value of x
x - ((x << 1U) & (x >> 31U));
// If x is a power of 2... n % x
n & (x - 1);
// is x a power of two?
x && !(x & (x - 1));

Find the number of high bits in a bitboard!

Code: Select all

// How many set bits are in the 64 bit integer x?
// Use the formula high bit count = 
// x - (SUM from n=0 to n=inf (x >> n))!
// Count bits in each 4-bit section.
u64 n =
   (x >> 1U) & 0x7777777777777777UL;
x -= n;
n = (n >> 1U) & 0x7777777777777777UL;
x -= n;
n = (n >> 1U) & 0x7777777777777777UL;
x -= n;
// add 4-bit sums into byte sums.
x = (x + (x >> 4U)) & 0x0F0F0F0F0F0F0F0FUL;
// Add the byte sums.
x *= 0x0101010101010101UL;
return (u8)(x >> 56U);

Isolate the most significant bit!

Code: Select all

// isolate the most significant bit of an unsigned 
// 64-bit integer, x, without branching.
  x |= x >> 1U;
  x |= x >> 2U;
  x |= x >> 4U;
  x |= x >> 8U;
  x |= x >> 16U;
  x |= x >> 32U;
  x = (x >> 1U) + 1;
// isolate msb of x with branching... 
// n holds the result. Ideal for small 
// integers or integers with sparse 
// number of bits.
u64 n(0);
for(u64 x = bb; x; n = x, x &= x - 1);
return n;

There are lots, lots more in the book. I also recommend looking at the hacks
on the chessprogramming website.

Also look into bit scan forward implementations and instructions for your hardware.

Also look into magic bitboards and PEXT instructions for your hardware.

Good luck and happy hacking!

How can I improve my engine’s speed?

How can I improve my engine’s speed?

Re: How can I improve my engine’s speed?

Re: How can I improve my engine’s speed?

Re: How can I improve my engine’s speed?

Re: How can I improve my engine’s speed?

Re: How can I improve my engine’s speed?

Re: How can I improve my engine’s speed?

Re: How can I improve my engine’s speed?

Re: How can I improve my engine’s speed?

Re: How can I improve my engine’s speed?