Thought bitboards was faster :-)

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Anulliero
Posts: 759
Joined: Fri Jan 04, 2013 4:55 pm
Location: Nice

Thought bitboards was faster :-)

Post by Daniel Anulliero »

Hi all

Some month ago , I rewrote my engine Isa with a bitboard move generator (classical approach , no magics)
I wrote it from scratch , for a bitboard learning purpose (as I'm a newbie in the strange bitboards world)
My original Isa started from firstchess.c .

After some aspirin ( :-) ) I've finaly got a working version .

The movegen is good , validated by perft , so no problem.

But the problem is :
It is slower than my classical array based engine Isa
Look at the perft 6 results from the starting position :

Code: Select all

Isa Bitboards:

Perft 1	nodes = 20    0 ms
Perft 2	nodes = 400    0 ms
Perft 3	nodes = 8902    0 ms
Perft 4	nodes = 197281    78 ms
Perft 5	nodes = 4865609    1685 ms
Perft 6	nodes = 119060324    33119 ms

Code: Select all

Isa normal :

Perft 1	nodes = 20    0 ms
Perft 2	nodes = 400    0 ms
Perft 3	nodes = 8902    0 ms
Perft 4	nodes = 197281    50 ms
Perft 5	nodes = 4865609    1327 ms
Perft 6	nodes = 119060324    28533 ms

I've created a modified version of my classical Isa with the same feature as Isa bitboards
Tried to start to play at 5 seconds per moves
There is not much diference , but again the Isa bitboards is slower :

Code: Select all

Isa bitboards : 

d  score time  nodes          PV 
1  50     0       20         b1c3
2  0      0       158       b1c3 b8c6
3  50     0       710       b1c3 b8c6 g1f3
4  0       0       1488     b1c3 b8c6 g1f3 g8f6
5  35     0       10202    g1f3 b8c6 b1c3 g8f6 d2d4
6  0       0       39670    b1c3 b8c6 d2d4 g8f6 g1f3 d7d5
7  24     0       480679   d2d4 g8f6 g1f3 d7d5 b1c3 b8c6
8  0      4        2151544  b1c3 b8c6 g1f3 d7d5 d2d4 g8f6 c1f4 c8f5

Code: Select all

Isa normal : 

d  score time  nodes           PV 
1  50     0       20         b1c3
2  0       0       87         b1c3 b8c6
3  50     0       589        b1c3 b8c6 g1f3
4  0       0       2114      b1c3 b8c6 g1f3 g8f6
5  35     0       13279     b1c3 b8c6 g1f3 g8f6 d2d4
6  0       0       36612     b1c3 b8c6 g1f3 g8f6 d2d4 d7d5
7  26     0       237742    e2e4 d7d5 e4e5 b8c6 g1f3 d5d4 b1a3
8  2      2        806140    e2e4 b8c6 g1f3 g8f6 e4e5 f6g4 d2d4 d7d5

My code is here :

https://github.com/Dany1962/Isa-bitboards

As I wrote previously , I'm a newbie in bitboards , so , it would be kind if some bitboards experts could look at my code

Thanks in advance
Isa download :
User avatar
Bo Persson
Posts: 243
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: Thought bitboards was faster :-)

Post by Bo Persson »

Bitboards are not automagically faster, but most often open up some possiblities for further optimizations.

For example, your function

Code: Select all

void set_bit(U64 *pbitboard, int y, int type)  
{
	U64 x = pbitboard[type];
		
	x = (BIT[y] | x);
	
	pbitboard[type] = x;	
}
could be simplified into

Code: Select all

   pbitboard[type] |= (U64)1 << y;
which would then benefit *hughley* from being inlined. :)

Also, in one place you have noticed that a queen has things in common with rooks and bishops:

Code: Select all

//fou
U64 mask_bishop[64];
U64 mask_b_no[64];
U64 mask_b_ne[64];
U64 mask_b_so[64];
U64 mask_b_se[64];
//tour
U64 mask_rook[64];
U64 mask_r_n[64];
U64 mask_r_s[64];
U64 mask_r_e[64];
U64 mask_r_o[64];
//dame = tour + fou
There are many other symmetries that can be used to reduce the number of bitbords stored. For example, if you have black pieces and white pieces stored, it is very easy to compute occupied or empty squares on demand.

By pushing this, I have reduced your 17 bitmaps into just 7:

Code: Select all

      // By piece-type

      BitBoard   Pieces[5];

      static constexpr auto Pawns = 0;
      static constexpr auto Kings = 1;
      static constexpr auto Knights = 2;
      static constexpr auto DiagSweepers = 3;
      static constexpr auto AngularSweepers = 4;

      // By color

      BitBoard   AllPieces[2];

      static constexpr auto Dark = 0;
      static constexpr auto Lite = 1;

and then use little helper functions like

Code: Select all

      // Positional board info

      inline BitBoard LitePieces() 
      { return AllPieces[Lite]; }

      inline BitBoard DarkPieces() 
      { return AllPieces[Dark]; }

      inline BitBoard Occupied() 
      { return LitePieces() | DarkPieces(); }

      inline BitBoard EmptySquares() 
      { return ~Occupied(); }

      // Pawns

      inline BitBoard LitePawns() 
      { return Pieces[Pawns] & LitePieces(); }

      inline BitBoard DarkPawns() 
      { return Pieces[Pawns] & DarkPieces(); }

jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Thought bitboards was faster :-)

Post by jdart »

I'd recommend magic bitboards for movegen if possible.

Btw. intrinsics such as __builtin_ctzll(x) are not portable (this assumes a gcc or compatible compiler). C++20's <bit> header is one way to be more portable.
asanjuan
Posts: 214
Joined: Thu Sep 01, 2011 5:38 pm
Location: Seville, Spain

Re: Thought bitboards was faster :-)

Post by asanjuan »

jdart wrote: Thu Feb 11, 2021 10:29 am I'd recommend magic bitboards for movegen if possible.

Btw. intrinsics such as __builtin_ctzll(x) are not portable (this assumes a gcc or compatible compiler). C++20's <bit> header is one way to be more portable.
I agree. I think that the most difficult part is generating sliding pieces moves, and magic bitboards are much faster telling you where the pieces can go with a simple multiplication.
Still learning how to play chess...
knigths move in "L" shape ¿right?
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Thought bitboards was faster :-)

Post by Henk »

Nothing magic about it. Only using a perfect hash table.
Why call perfect hashing magic? I already forgot how it worked by the way.
Uses current occupancy to compute the perfect hashkey.
Use the hashkey to look up the precomputed moves for a square.

To compute the hashkey it uses magic array with a precomputed magic number per square.
So that is where magic comes from. These magic number are created by trial and error I remenber.
Map an ulong into [0 ..63]


I remember I had to implement perfect hashing at school in first year or so.
But that's too long ago. So should be basic knowledge.
Last edited by Henk on Thu Feb 11, 2021 3:09 pm, edited 15 times in total.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Thought bitboards was faster :-)

Post by Joost Buijs »

Magic bitboards are somewhat faster than the old Hyatt style rotated bitboards, but the difference is not very big.

I don't know on what kind of hardware you ran perft(), but the results look slow indeed. For instance my magic bitboard implementation runs perft 6 in 0.77 seconds and perft 7 in 19,8 seconds, this is with bulk-counting. Without bulk-counting it is: 1.76 seconds and 46.6 seconds, tested on a 3800 MHz. i9-10908XE.

It is not so easy to tell by having a short look at the source what causes the slowness, but there certainly must be room for improvement.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Thought bitboards was faster :-)

Post by Henk »

Henk wrote: Thu Feb 11, 2021 2:40 pm
Map an ulong into [0 ..63]
O wait that is not correct. As I said I don't remember how magic bitboards work.
Probably map current occupancy into moves per square.
User avatar
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Thought bitboards was faster :-)

Post by lithander »

Joost Buijs wrote: Thu Feb 11, 2021 2:45 pm I don't know on what kind of hardware you ran perft(), but the results look slow indeed. For instance my magic bitboard implementation runs perft 6 in 0.77 seconds and perft 7 in 19,8 seconds, this is with bulk-counting. Without bulk-counting it is: 1.76 seconds and 46.6 seconds, tested on a 3800 MHz. i9-10908XE.
That's interesting to compare to. I'm not using bitboards but an 8x8 array of pieces my perft 6 takes 16 seconds on a 4.2Ghz Ryzen 3600. Good to know that there's a lot of room for improvement with a less basic forward move generator.

Just to make sure: That's how fast you can generate legal moves for tree search, too? It's not just spitting out a perft count but actually does generate all the 119,060,324 moves?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
Patrice Duhamel
Posts: 193
Joined: Sat May 25, 2013 11:17 am
Location: France
Full name: Patrice Duhamel

Re: Thought bitboards was faster :-)

Post by Patrice Duhamel »

Outside move generation Bitboards are also usefull (ex: mobility in evaluation function).
jdart wrote: Thu Feb 11, 2021 10:29 am Btw. intrinsics such as __builtin_ctzll(x) are not portable (this assumes a gcc or compatible compiler). C++20's <bit> header is one way to be more portable.
Using compiler explorer (I don't have latest MSVC installed), it seems MSVC add a test for popcount, and other <bit> functions.

Code: Select all

int bitcount(uint64_t b)
{
    return std::popcount<uint64_t>(b);
}

Code: Select all

int bitcount(unsigned __int64) PROC
cmp     DWORD PTR __isa_available, 2
jl      int std::_Popcount_fallback<unsigned __int64>(unsigned __int64)
popcnt  rax, rcx
ret     0
int bitcount(unsigned __int64) ENDP 
Anything that can go wrong will go wrong.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Thought bitboards was faster :-)

Post by Joost Buijs »

lithander wrote: Thu Feb 11, 2021 5:17 pm
Joost Buijs wrote: Thu Feb 11, 2021 2:45 pm I don't know on what kind of hardware you ran perft(), but the results look slow indeed. For instance my magic bitboard implementation runs perft 6 in 0.77 seconds and perft 7 in 19,8 seconds, this is with bulk-counting. Without bulk-counting it is: 1.76 seconds and 46.6 seconds, tested on a 3800 MHz. i9-10908XE.
That's interesting to compare to. I'm not using bitboards but an 8x8 array of pieces my perft 6 takes 16 seconds on a 4.2Ghz Ryzen 3600. Good to know that there's a lot of room for improvement with a less basic forward move generator.

Just to make sure: That's how fast you can generate legal moves for tree search, too? It's not just spitting out a perft count but actually does generate all the 119,060,324 moves?
Well that is what my Perft without bulk-counting does, it generates al the the moves checks legality plays the moves on the board and takes the moves back. With bulk-counting it only checks legality and counts the moves. Perft is a small function, it fits in the cache that's why it is fast.

You probably mean how fast the move-picker in my tree search is, it's been a while since I timed this, if I remember it well it does around 25 mnps. The whole three search is much slower though, probing the TT, SEE and evaluation take a lot of time, with my HCE the engine runs at 2.5 to 3.5 mnps on a single core, with NN evaluation it is around 1.3 to 1.7 mnps.