lucasart wrote:Gusev wrote:
Code: Select all
#include <bitset>
typedef std::bitset<NUM_SQUARES> Bitboard;
That's intersting. Is there really zero cost in using std::bitset<64> instead of using uint64_t ? I'm always very suspicious about these obfuscated template libraries, but perhaps I'm wrong.
In
this other thread, I wrote about a
class template wrapper around a 64-bit integer that allows STL style iteration (begin(), end(), ++, *, etc.). The
assembly code has 0% overhead compared to raw 64-bit integer code.
The operations that are critical on a bitboard are:
- get/set/test the i-th bit. I already see that bitset::set() is a variable parameter function, so I'm more than worried about the performance compared to a simple b |= (1ULL << i);
- all bitwise operations (logical ones, but also shifts, and multiplication for magics). How can a general purpose bitset template library do the >> and << efficiently ? (these operations don't make any sense from a set point of view)
For almost every high-quality implementation of the STL, the general std::bitset<N> template is implemented in terms of a basic_bitset<Nwords, N> class template, where Nwords is the number of machine words that is used as the underlying raw data. Using partial template specialization, the case Nwords=1 is defined separately. In that case, all the get/set/test and <<, >> member functions are one-liners that are guaranteed to be inlined by any modern compiler. You should expect native performance for them. However, I don't know if every STL implementation has a specialization for 64-bit builds so that you get the same performance as for native 64-bit integers.
- lsb(), msb(), popcnt(): those must use portable assembly (compiler intrinsics) to achieve any decent performance. I really doubt that a general purpose STL could do that.
Has anyone used bitset in a chess engine, and looked into performance ?
The original SGI implementation (pre 1998) of the STL
std::bitset had no compiler intrinsics but lookup tables for each byte. The modern
gcc/libstdc++ and
Clang/libc++ implementations use __builtin_ctzl / __builtin_popcountl and memset to get native performance for bit-twiddling.
The major inconvenience for std::bitset is that their are no iterators. The SGI and gcc implementations have non-standard find_first() and find_next() member functions (with intrinsic bitscan for gcc) that allow efficient looping over all 1-bits of a bitset. But for MSVC++ and Clang/libc++ you are out of luck. Even so, I think my class template wrapper with special iterators begin()/end() that can be passed to any non-modifying STL algorithm (std::copy, std::is_sorted, std::for_each etc.) allow you a real boost in convenience and abstraction with virtually no performance penalty.