An approach to precomputed move generation bitboards

Michael Sherwin · Post by **Michael Sherwin** » Tue Dec 01, 2015 2:58 am

Using the following data setup ...

#define s32   signed __int32
#define u64 unsigned __int64 

typedef struct &#123;
  s32 type;
  s32 i;
  u64 moves    &#91;200&#93;;
  u64 capts    &#91;200&#93;;
&#125; piece_s;

typedef struct &#123;
  s32       wtm;
  s32       board&#91;64&#93;;
  piece_s   piece&#91;33&#93;;
  history_s history&#91;400&#93;;
&#125; thread_s;

s32 iBoard&#91;64&#93; = &#123;13, 9,11,15,16,12,10,14,
                   1, 2, 3, 4, 5, 6, 7, 8,
                   0, 0, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 0,
                  17,18,19,20,21,22,23,24,
                  29,25,27,31,32,28,26,30&#125;;

... it would work like this. Using a board of piece indexes a stack of bitboards for moves and one for captures can be accessed using the stack pointer 'i'. The bitboards for each piece are precomputed and placed in their stack. Whenever a piece is moved that bit is zeroed, the stack pointer is incremented and its bitboards are generated. Pieces that are affected also haft to have this done. A bit_list of affected pieces can be maintained so this work can be done when needed avoiding unnecessary work at the leaves.

I'm curious if this approach is different than other precomputed move ideas. And what is thought about its efficiency. Also do my data structures look correct for an SMP implementation? Thanks

Michael Sherwin · Post by **Michael Sherwin** » Tue Dec 01, 2015 4:17 am

A similar approach that I am toying with that really is quite different. Is bitboard on demand using classical bitboard rays. It would require a way to lookup which ray a move is on.

Code: Select all

typedef struct &#123;
  s32 type;
  s32 i;
  u32 direction&#91;200&#93;; // 00000000000000000000000011111111, For Queen 
  u64 moves    &#91;200&#93;;
 &#125; piece_s;

Say you want to try the hash move first. Check in direction if that direction has been generated. If not generate that ray and OR it into moves verify move and remove bit. For generating all moves when moves has no more bits set then check direction for next direction and repeat. If an early cutoff is found at a leaf node you can save alot of time?

Which approach is better?

stegemma · Post by **stegemma** » Tue Dec 01, 2015 9:37 am

Michael Sherwin wrote:Using the following data setup ...
Code: Select all
#define s32   signed __int32
#define u64 unsigned __int64 

typedef struct &#123;
  s32 type;
  s32 i;
  u64 moves    &#91;200&#93;;
  u64 capts    &#91;200&#93;;
&#125; piece_s;

typedef struct &#123;
  s32       wtm;
  s32       board&#91;64&#93;;
  piece_s   piece&#91;33&#93;;
  history_s history&#91;400&#93;;
&#125; thread_s;

s32 iBoard&#91;64&#93; = &#123;13, 9,11,15,16,12,10,14,
                   1, 2, 3, 4, 5, 6, 7, 8,
                   0, 0, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 0,
                  17,18,19,20,21,22,23,24,
                  29,25,27,31,32,28,26,30&#125;;
... it would work like this. Using a board of piece indexes a stack of bitboards for moves and one for captures can be accessed using the stack pointer 'i'. The bitboards for each piece are precomputed and placed in their stack. Whenever a piece is moved that bit is zeroed, the stack pointer is incremented and its bitboards are generated. Pieces that are affected also haft to have this done. A bit_list of affected pieces can be maintained so this work can be done when needed avoiding unnecessary work at the leaves.

I'm curious if this approach is different than other precomputed move ideas. And what is thought about its efficiency. Also do my data structures look correct for an SMP implementation? Thanks

I've done something similar in my differential move generator, in one of the past versions of Satana. I've used a single stack of moves and a pointer for any piece, to the moves itself. Anytime the piece was moved or another piece "intercept" one of its moves rays, the engine updates the pointer to moves and generates new ones. At the end, it was not faster than generating all the moves from scratch.

Aleks Peshkov · Post by **Aleks Peshkov** » Tue Dec 01, 2015 10:08 am

The chess pieces need 1) attacked squares bitboard and 2) legal moves squares bitboard.
The latter can be generated from the former easily for all pieces except pawns.

I prefer to keep all data for each chess side in separate struct. My bitboards for each side are relative to its own home rank, so pawns move generation and all other functions and data tables are colorless. There is no need keep the color of the side to move information anywhere inside search.

Karlo Bala · Post by **Karlo Bala** » Tue Dec 01, 2015 3:24 pm

Michael Sherwin wrote:Using the following data setup ...
Code: Select all
#define s32   signed __int32
#define u64 unsigned __int64 

typedef struct &#123;
  s32 type;
  s32 i;
  u64 moves    &#91;200&#93;;
  u64 capts    &#91;200&#93;;
&#125; piece_s;

typedef struct &#123;
  s32       wtm;
  s32       board&#91;64&#93;;
  piece_s   piece&#91;33&#93;;
  history_s history&#91;400&#93;;
&#125; thread_s;

s32 iBoard&#91;64&#93; = &#123;13, 9,11,15,16,12,10,14,
                   1, 2, 3, 4, 5, 6, 7, 8,
                   0, 0, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 0,
                  17,18,19,20,21,22,23,24,
                  29,25,27,31,32,28,26,30&#125;;
... it would work like this. Using a board of piece indexes a stack of bitboards for moves and one for captures can be accessed using the stack pointer 'i'. The bitboards for each piece are precomputed and placed in their stack. Whenever a piece is moved that bit is zeroed, the stack pointer is incremented and its bitboards are generated. Pieces that are affected also haft to have this done. A bit_list of affected pieces can be maintained so this work can be done when needed avoiding unnecessary work at the leaves.

I'm curious if this approach is different than other precomputed move ideas. And what is thought about its efficiency. Also do my data structures look correct for an SMP implementation? Thanks

Here is one simple and nice approach:
http://talkchess.com/forum/viewtopic.ph ... 84&t=30542

Desperado · Post by **Desperado** » Tue Dec 01, 2015 9:28 pm

Hello, Michael,

currently i try to implement bitboard attacks in an incremental way too.
so, if we ignore the implementation details for the moment, we can focus on some more general questions in this context (imo).

For simplicity let's say we talk about attacks for the current piece locations.

1. Let's assume further we have a pretty fast (like magic bitboards) AttackGetter at hand.

1a.

Magic bitboards are finally an index computation with a lookup (parallel in context to lines).
A simplified view on this makes clear that storing different bitboards on some stack won't improve
the basic computation time just by having another lookup approach.
(with some overhead to update the bitboards on the stack after making a move and not to forget to revert the attacks again)

1b.

Anyway, if we manage to get some speed gain within the "node activities", that means that the incremental overhead must be
equalized and our incremental lookups must be faster to some degree. The better the utilization of the once computed set of attacks is,
the higher the speed gain will be.

1c.

By default there are two places where the attacks are used. First, the move generation part, which as you already pointed out,
can be disabled at leave nodes, to save some time. But you won't disalbe the "incremental" attack computation because of the second
component, the evaluation. Each leave node will be evaluated, so this is the place where you really might need your attack sets.
Finally, you save some time in a perft frame, but the real engine needs the attack computation for evaluation too.

2. Where might the speed gain come from (thinking of implementation contents) ?

2a.

Non sliders aren't affected as long the piece location does not change. Sliders may be affected
by move types. The general algorithm might look like this to determine necessary attack updates:

* reset the attack information for the source square of a move
* compute the attack for the destination square of a move

* mask the diagonals for the source square and check for bishops and queens
* mask the straight lines for the source square and check for rooks and queens
* mask the diagonals for the destination square and check for bishops and queens
* mask the straight lines for the source square and check for rooks and queens.

* compute/update the attacks for the checked bishops/rooks/queens.

En passent/Castling/Promotion moves need some minimal refinements for the masking part.

Well, at this point the question arises and it needs to be checked, if a simple computation for all pieces is really slower on average,
than to determine which pieces needs to be updated. But i still talk of a one time computation after a makeMove operation.

2b

Depending on the implementation, this update needs to be reverted which requires an analog computation like 2a or a restore of some data.
If you restore "Attackmaps" you will shift many (a lot more) bytes over the search and the copy operation will take a lot of time too.

2c

The best case is, that sliders aren't affected and you simply reset the attacks for the source and compute one attack for the destination square,
while having all attacks at hand. This is a minimal computation but finally you still look it up (as you do with magics for example)

2d

Finally i don't see a good reason to use incremental attack computation for "attack from squares"(simple piece locations).
I you use a less optimal AttackGetter like Magic Bitboards, there a different other ways like "caching attack sets" which improve the speed,
to get a better performance.

3. But anyway, there is potential from another view point imho.

There are different kinds of attacks you might be interested in. These are about, from which location/piece the current square is attacked,
how many times it is attacked and so on. The use is the bool squareIsAttacked() function (incheck) or pin computations or attack counts for a square,
these more complex computations can be solved in one run too and can be retrieved at any time at the current node. The base to compute and
store these attacks are the attackFrom sets of course. The idea starts with:

Having a new node:
1. Compute all piece attacks(attackFrom) (optional: store the attacks to look them up, maybe this is not necessary if your AttackGetter is magic like)
2. Compute the "complex" attacks out of it one time and retrieve the information on demand.

(At this point i solved my own problem LOL, thx

)

So, if the option is to use direct computations (AttackGetter functions) without storing them you just need to think foward in the tree.
Find a representation to keep the "complex" attack sets. If then the use is stack based, there is no need for copy/revert operations.
I guess to handle it with make/unmake would be to expensive then.

Just my 2 cents.

Michael Sherwin · Post by **Michael Sherwin** » Wed Dec 02, 2015 7:14 am

(At this point i solved my own problem LOL, thx Wink )

Your welcome

Yes, I did not consider needing to generate moves/attacks in/for the evaluation. So basically if all that move generation work has to be done at the leaves then basically the savings lower in the tree will not have a large impact. I've often wondered if positional evaluation could be done a few nodes from the leaves and then just do a material balance search to the leaves. Then the incremental approach combined with time savings could work together super synergistically.

Thanks for your detailed reply!

Michael Sherwin · Post by **Michael Sherwin** » Wed Dec 02, 2015 7:20 am

stegemma wrote: At the end, it was not faster than generating all the moves from scratch.

Seems to be a recurring observation. Thanks

Aleks Peshkov · Post by **Aleks Peshkov** » Wed Dec 02, 2015 1:17 pm

I have a matrix PieceIndex(16) x SquareBitsOnTheSingleChessRank.
So I have to split attacks bitboard into 8 bytes when updating my attack_matrix.

1) But on the other end: AttacksTo() function is simple reading the 16-byte vector (fits one XMM register) and a few very simple bit operations.

2) To gather all attacked squares by all pieces of one side into one bitboard is also very simple function.

Cardoso · Post by **Cardoso** » Wed Dec 02, 2015 1:57 pm

Hi,
assuming one has a reasonable move generator, that doesn't take a huge ammount of time to compute, then going from that to a move generator that would take zero time to compute wouldn't make any ELO difference IMO.
The evaluation function usually dominates the search time as you already know.

best regards,
Alvaro

An approach to precomputed move generation bitboards

An approach to precomputed move generation bitboards

Re: An approach to precomputed move generation bitboards

Re: An approach to precomputed move generation bitboards

Re: An approach to precomputed move generation bitboards

Re: An approach to precomputed move generation bitboards

Incremental bitboard attack computations

Re: Incremental bitboard attack computations

Re: An approach to precomputed move generation bitboards

Re: An approach to precomputed move generation bitboards

Re: Incremental bitboard attack computations