Using bitboards to store move lists

matthewlai · Post by **matthewlai** » Fri Aug 22, 2014 9:06 pm

elcabesa wrote:if I understood correctly:

staged generator:

generate moves of stage N and put them in a list.
order & iterate the moves
go to stage n+1

your generator

fast generate all the moves and store thme in bitboard

get the move of a stage from bitboard
order & iterate them
go to stage n+1

so you don't do the serialization of the moves in case you don't need them

I don't think it will be faster of a staged move gen.
I don't think it will be easier than a stage move enerator if you serialize the move you need before ordering them

but it's an interesting try.

I solved the difficult of writing a staged movegen writing a full move list generator and then transforming it in a staged one simply using function TEMPLATE.
you can look at the code of you.

That is a correct description.

It can potentially be faster than staged gen if the move generator do some free ordering (promotions before captures before non-captures, etc), but I am not convinced it will actually make a significant difference. The aim is simplicity, and still be as fast as a staged generator.

One thing that is easier with this approach is taking out hash moves, killers, and history moves.

With a staged generator, those moves must be carefully excluded from the remaining moves if they don't fail high. Searching those nodes twice probably won't really have a performance impact since they would still be in TT, but it makes things like LMR more complicated.

With my approach, we can just look for those moves in the list, and if we find them, we know the move is pseudo-valid, and we can turn off the appropriate bit for the piece then.

This way there would still be stages, but the stages are fewer. Hash move, killers, and histories no longer need their own stages, since all moves are "generated" in the beginning. They can just be taken out of the stages they would otherwise be in.

The search also doesn't need a separate promotions stage, because it can be implicit in the ordering of moves generated.

In the boring moves stage (which isn't usually sorted), if a move fails high, rest of the moves in that stage don't need to be serialized. This is where it can potentially perform better than staged generation, but I'm guessing this doesn't happen all that often.

The only "stage" that really requires extracting all moves and sorting is the captures stage (and in this stage it's as fast as a staged generator). Every other stage can be extracted 1 move at a time.

mcostalba · Post by **mcostalba** » Sat Aug 23, 2014 9:09 am

matthewlai wrote: What do you think?

I thought about this a couple of years ago, and I discarded the idea because of problems in move sorting and ordering (yes, I read your replies....what I've said still stands).

The only good thing that you can get from here is to filter out killers and friends at move generation instead of move validation. For instance quiets moves in SF are validated in this way:

Code: Select all

      case QUIETS_1_S1&#58; case QUIETS_2_S1&#58;
          move = &#40;cur++)->move;
          if (   move != ttMove
              && move != killers&#91;0&#93;.move
              && move != killers&#91;1&#93;.move
              && move != killers&#91;2&#93;.move
              && move != killers&#91;3&#93;.move
              && move != killers&#91;4&#93;.move
              && move != killers&#91;5&#93;.move&#41;
              return move;
          break;

Insead of doing this for each move you can setup a single bitboard of all killers moves and pass to quiet move generator, like:

Code: Select all

endQuiets = end = generate<QUIETS>&#40;pos, moves, killersBB&#41;;

With this is possible to filter out all the forbidden moves in one go at bitboard stage, so very fast, and to avoid the above multiple checks, so perhaps if you are lucky you get a speed up.

I can try to write something along this line if I find the motivation.

ZirconiumX · Post by **ZirconiumX** » Sat Aug 23, 2014 11:57 am

I suppose this could be made even faster by using direction-wise move generation, although working out the original piece would be fun.

DirGolem would be a good template, although you'd probably be looking at pseudo-legality rather than full legality, as checking full legality can be delayed.

Matthew:out

matthewlai · Post by **matthewlai** » Sat Aug 23, 2014 9:24 pm

mcostalba wrote:
matthewlai wrote: What do you think?
I thought about this a couple of years ago, and I discarded the idea because of problems in move sorting and ordering (yes, I read your replies....what I've said still stands).

The only good thing that you can get from here is to filter out killers and friends at move generation instead of move validation. For instance quiets moves in SF are validated in this way:
Code: Select all
      case QUIETS_1_S1&#58; case QUIETS_2_S1&#58;
          move = &#40;cur++)->move;
          if (   move != ttMove
              && move != killers&#91;0&#93;.move
              && move != killers&#91;1&#93;.move
              && move != killers&#91;2&#93;.move
              && move != killers&#91;3&#93;.move
              && move != killers&#91;4&#93;.move
              && move != killers&#91;5&#93;.move&#41;
              return move;
          break;
Insead of doing this for each move you can setup a single bitboard of all killers moves and pass to quiet move generator, like:
Code: Select all
endQuiets = end = generate<QUIETS>&#40;pos, moves, killersBB&#41;;
With this is possible to filter out all the forbidden moves in one go at bitboard stage, so very fast, and to avoid the above multiple checks, so perhaps if you are lucky you get a speed up.

I can try to write something along this line if I find the motivation.

That's good to know.

Do you know approximately how much time (percentage) is spent on move generation in SF?

I am just starting a rewrite of my engine, so I am making all the design decisions now, and I am not sure if it's worthwhile to really optimize move generation. For example, how much actual gain is there in using pseudo-legal vs legal, and staged vs single stage?

The bitboards for killers idea, does it work? Don't both source and destination squares have to match? If you used 2 bitboards, wouldn't it become all combinations of sources and destinations?

ZirconiumX · Post by **ZirconiumX** » Sat Aug 23, 2014 10:07 pm

matthewlai wrote:
mcostalba wrote:
matthewlai wrote: What do you think?
I thought about this a couple of years ago, and I discarded the idea because of problems in move sorting and ordering (yes, I read your replies....what I've said still stands).

The only good thing that you can get from here is to filter out killers and friends at move generation instead of move validation. For instance quiets moves in SF are validated in this way:
Code: Select all
      case QUIETS_1_S1&#58; case QUIETS_2_S1&#58;
          move = &#40;cur++)->move;
          if (   move != ttMove
              && move != killers&#91;0&#93;.move
              && move != killers&#91;1&#93;.move
              && move != killers&#91;2&#93;.move
              && move != killers&#91;3&#93;.move
              && move != killers&#91;4&#93;.move
              && move != killers&#91;5&#93;.move&#41;
              return move;
          break;
Insead of doing this for each move you can setup a single bitboard of all killers moves and pass to quiet move generator, like:
Code: Select all
endQuiets = end = generate<QUIETS>&#40;pos, moves, killersBB&#41;;
With this is possible to filter out all the forbidden moves in one go at bitboard stage, so very fast, and to avoid the above multiple checks, so perhaps if you are lucky you get a speed up.

I can try to write something along this line if I find the motivation.
That's good to know.

Do you know approximately how much time (percentage) is spent on move generation in SF?

2% during search. The biggest hit is the evaluation function at 27%.

For example, how much actual gain is there in using pseudo-legal vs legal

A reasonable amount, since legality checking is only performed on moves actually made.

and staged vs single stage?

From my experience, this depends on your board representation. Bitboard programs gain a lot more from this than square-centric programs like Fruit, since bitboard programs can simply mask out quiets, whereas square-centric programs end up iterating across quiet moves anyway.

The bitboards for killers idea, does it work?

I suppose we'll find out.

Don't both source and destination squares have to match?

Yes.

If you used 2 bitboards, wouldn't it become all combinations of sources and destinations?

If you had enough moves to be excluded, yes.

Matthew:out

mcostalba · Post by **mcostalba** » Sun Aug 24, 2014 11:51 am

mcostalba wrote: I can try to write something along this line if I find the motivation.

Ok, the good news is that I have found a way to test this 'killers bitboards' idea and it kicks in as expected: in 97% of cases we are able to quickly check the quiet move and only in 3% of cases we fall back on the full original validation.

Branch quiet_fast_validate is here:

https://github.com/mcostalba/Stockfish/ ... d6a22d56e5

The bad news is that it does not speed up anything. Actually it seems even a slow down of almost 2%!

This was unexpected, hard to say what's going on, perhaps the CPU branch predictor is able to do a good job on the original code so that the impact of all that conditions is very low, anyhow difficult to say without a deep profiling.

Michael Sherwin · Post by **Michael Sherwin** » Wed Sep 17, 2014 8:49 am

matthewlai wrote:Just had a funny idea for pseudo-legal bitboard engines.

Most of the move generation time is spent in extracting bits out of bitboards, and pushing them onto a list.

That effort is wasted if an early move causes a cutoff.

Staged move generation deals with this, but is complicated, and sometimes results in redundant work (depending on implementation).

What if instead of the move generator returning a list of moves, it returns a list of pairs of pieces, and possible destinations in a bitboard?

That way, most of the work (extract the bit) is done only if the move is used, and most of the simplicity of single-stage generation is retained, and there is almost no redundant work at all.

Downside is sorting becomes harder (besides selection sort).

What do you think?

RomiChess uses this method. However, only for regular search and not for qsearch. There seemed no way to do it for qsearch. Probably because of immediate sorting in qsearch. Too long ago for me to remember. In full search it was possible to extract certain moves out before having to spin the rest into a list for sorting. So if one of the extracted moves caused a cut then no spinning and sorting needed.

Can doing it this way really speed up a search? I suppose that qsearch speed matters magnitudes more? Well engines that use their hash moves in qsearch might get a huge speed up. Just thought of that.

Using bitboards to store move lists

Re: Using bitboards to store move lists

Re: Using bitboards to store move lists

Re: Using bitboards to store move lists

Re: Using bitboards to store move lists

Re: Using bitboards to store move lists

Re: Using bitboards to store move lists

Re: Using bitboards to store move lists