Caching generated moves list in recursive searches

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Caching generated moves list in recursive searches

Post by Rein Halbersma »

mcostalba wrote:
Rein Halbersma wrote:Are there any engines that cache the generated move list? What I mean by that is that during internal iterative deepening, multi-probe cut, PVS and LMR re-searches, the position that is being re-searched is the same one at the current ply level inside a recursive search() function.
In SF the move list is allocated on the stack, so you can't really do this without copying the moves to permanent storage.

Maybe you could use global storage as a backup to generate the moves, but this seems quite complex to me, also considering the SMP case and the fact that you have to allocate MAX_MOVES * MAX_PLY storage in advance.
Actually, I was thinking of simply storing two ExtMove pointers inside the StateInfo class, and letting the move generator check these pointers against nullptr before generating moves. If the moves have been generated before, return a pair of pointers to the beginning and end of that storage. If they haven't, generate them and return two pointers to the beginning and end of that range.

Something like this (untested)

Code: Select all

struct StateInfo {
    ExtMove* first;
    ExtMove* last;
    
    // as before
};

void Position::do_cache_movelist(ExtMove* first, ExtMove* last) {
    st->first = first;
    st->last = last;
}

void Position::do_move(Move m, StateInfo& newSt, const CheckInfo& ci, bool moveIsCheck) {
  // as before
  
  // this is also already there, but after this statement...
  st = &newSt; 

   // ... you can reset the status to "uncached" by putting in null pointers
  do_cache_movelist(nullptr, nullptr); // undo_move will restore to previous StateInfo with the cached movelist
  
  // as before
}

template<>
std::pair<ExtMove*, ExtMove*> cached_generate<LEGAL>(Position& pos, ExtMove* first) {
  if (pos.st->last == nullptr) // haven't generated moves for this position before
      pos.do_cache_movelist(first, generate<LEGAL>(pos, first));
  return std::make_pair(pos.st->first, pos.st->last);
}
You can also easily adapt your MoveList struct to this type of caching (you have to extract the first/second data members of the std::pair). The rest of the code (search in particular) should remain untouched. There is also no locking or other SMP interference. Just two extra pointers to keep track of, and the rest remains the same.
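A rough sketch of such a wrapper (again untested; the per-ply buffer argument is my assumption on top of current Stockfish, not existing code, and it relies on cached_generate<> from above):

Code: Select all

template<GenType T>
struct CachedMoveList {
    // 'buf' must be per-ply storage that outlives re-searches at this ply
    // (e.g. an ExtMove[MAX_MOVES] array hanging off the search stack entry),
    // so that the pointers cached in StateInfo stay valid.
    CachedMoveList(Position& pos, ExtMove* buf) {
        std::pair<ExtMove*, ExtMove*> range = cached_generate<T>(pos, buf);
        first = range.first;
        last  = range.second;
    }
    const ExtMove* begin() const { return first; }
    const ExtMove* end()   const { return last;  }
    size_t size() const { return size_t(last - first); }

private:
    ExtMove *first, *last;
};
Apart from passing the per-ply buffer instead of a local array, the call sites would look the same as with the current MoveList.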
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Caching generated moves list in recursive searches

Post by hgm »

Doesn't one usually pass a 'node type' argument anyway?

If you are worried about the cost of passing arguments in the recursive call, it would be better to go to an iterative implementation, where you don't have to pass anything at all, and every node can dig directly into the stack.
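Schematically (just to illustrate the layout, not code from any particular engine):

Code: Select all

// One pre-allocated node per ply; nothing is passed between plies.
struct Node {
    ExtMove moves[MAX_MOVES];
    int     moveCount;
    int     current;             // index of the move currently being searched
    int     alpha, beta, depth;
};

Node nodeStack[MAX_PLY];

// A re-search (PVS, LMR, IID iteration) at ply p just resets
// nodeStack[p].current and the window, and reuses nodeStack[p].moves
// without regenerating or passing anything around.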
Last edited by hgm on Sun May 10, 2015 9:00 pm, edited 1 time in total.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Caching generated moves list in recursive searches

Post by Rein Halbersma »

syzygy wrote:How many percent of the nodes are we talking about anyway?

Saving a bit of work at no cost is great, but carrying around extra arguments between function calls for saving a minor bit of work in 1% or maybe 0.1% of all nodes will not help.
Well, any node where you do recursive IID, or where probCut fails low (the search continues at the same ply level), or where you have a PVS null-window fail-high & re-search, or where you have an LMR fail-high & re-search. In those cases, you'd like to re-use the generated move list. If that's < 1%, then it's a minor thing.
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: Caching generated moves list in recursive searches

Post by syzygy »

hgm wrote:Doesn't one usually pass a 'node type' argument anyway?

If you are worried about the cost of passing arguments in the recursive call, it would be better to go to an iterative implementation, where you don't have to pass anything at all, and every node can dig directly into the stack.
This is only about speed. If the "fix" costs more than it brings, it is not a fix.
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Caching generated moves list in recursive searches

Post by hgm »

But if you pass the node-type parameter anyway, there is no cost.
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: Caching generated moves list in recursive searches

Post by syzygy »

Rein Halbersma wrote:
syzygy wrote:How many percent of the nodes are we talking about anyway?

Saving a bit of work at no cost is great, but carrying around extra arguments between function calls for saving a minor bit of work in 1% or maybe 0.1% of all nodes will not help.
Well, any node where you do recursive IID, or where probCut fails low (the search continues at the same ply level), or where you have a PVS null-window fail-high & re-search, or where you have an LMR fail-high & re-search. In those cases, you'd like to re-use the generated move list. If that's < 1%, then it's a minor thing.
I may be wrong, but more than 1% seems unlikely. And you have to make sure that you actually generated all moves (e.g. the LMR search might have jumped directly into the qsearch, or there might have been a hash hit), unless you want to keep track of even more details.

Some engines might destruct the move list, for example overwriting hash moves and killer moves that were already tried by moves from the end of the list (before sorting the history moves). When the position is re-searched, the hash move and killer moves might have changed and you have a problem.

So to me it seems a bit of a hassle that could restrict implementation of move generation in undesirable ways. But maybe I'm too pessimistic.
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: Caching generated moves list in recursive searches

Post by Evert »

In SjaakII the move list is allocated on the heap rather than the stack (at startup, essentially indexed as movelist[ply][move]). I do this because I don't know how many moves I would need to store (this being very variant-dependent), so I dynamically enlarge the buffer if needed. The resizing is slow of course, but after the first few times it happens rarely.
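Schematically it looks like this (simplified sketch, not the actual SjaakII code):

Code: Select all

#include <vector>

// One growable move buffer per ply, allocated once at startup and only
// enlarged when a variant produces more moves than the current capacity.
std::vector<Move> movelist[MAX_PLY];   // movelist[ply][move]

void ensure_capacity(int ply, size_t needed) {
    if (movelist[ply].capacity() < needed)
        movelist[ply].reserve(2 * needed);   // slow, but rare after the first few times
}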

Anyway, because it isn't stored on the stack, I figured I would try to reuse the movelist from the reduced deepening search. Surprisingly, this turned out to perform worse than the original. I don't understand why though, since it should be a simple speed optimisation. It may change move ordering and therefore node counts, but I would expect it to be Elo neutral at worst. It's always possible I messed this up somehow though, so I may give it another try.
zd3nik
Posts: 193
Joined: Wed Mar 11, 2015 3:34 am
Location: United States

Re: Caching generated moves list in recursive searches

Post by zd3nik »

Rein Halbersma wrote:Are there any engines that cache the generated move list? What I mean by that is that during internal iterative deepening, multi-probe cut, PVS and LMR re-searches, the position that is being re-searched is the same one at the current ply level inside a recursive search() function.

For iterative searchers (I think HGM once wrote that he does not use recursion for IID) there is a direct saving, but I wonder if there are any engines that pass a move list to their search() function. Stockfish e.g. does not use this kind of optimization (except to pass around an excluded move inside its Stack* parameter). The easiest way to implement this would probably be to give search() a pair of pointers to generated moves (or a pointer and a count). I wonder if anyone has ever tried this.
Probably about half of the engines I've written use a move generation cache. I always use a pre-allocated node stack and each node in the stack always has a pre-allocated move buffer. All that is needed to turn those move buffers into "caches" is the addition of a movegen key at each node. The movegen key is the position hash key plus a movegen type key (type = all moves or quiescence moves, etc.), plus the movegen stage if the engine uses a multi-stage move generator. I'm oversimplifying the staged movegen case, but hopefully this is enough info to get the idea across.

When movegen is called on a node, if the requested movegen key matches the key of the last completed movegen, then the last moves generated can be re-used. This is very easy to implement and the added overhead is insignificant if your engine uses a pre-allocated node stack with dedicated move buffers. And depending on your search framework, the scores applied to those cached moves can be very useful! I think movegen caching saves more in terms of move scoring and sorting than it saves in move generation.
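In pseudo-C++ the idea boils down to something like this (names are made up, and generate_moves() stands in for whatever generator your engine uses):

Code: Select all

enum GenKind { GEN_ALL, GEN_CAPS, GEN_CHECKS };   // movegen "type key"

struct SearchNode {
    ExtMove  moves[MAX_MOVES];     // dedicated, pre-allocated move buffer
    int      moveCount = 0;
    uint64_t cachedKey = 0;        // key of the last completed movegen, 0 = none

    ExtMove* get_moves(const Position& pos, GenKind kind, int& count) {
        uint64_t key = pos.key() ^ (uint64_t(kind) + 1);   // position key + movegen type
        if (key != cachedKey) {                            // miss: (re)generate
            moveCount = generate_moves(pos, kind, moves);
            cachedKey = key;
        }
        count = moveCount;                                 // hit: reuse buffer and scores
        return moves;
    }
};

SearchNode nodeStack[MAX_PLY];     // pre-allocated node stack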

But beware! Movegen caching can go horribly wrong if done incorrectly and is VERY difficult to debug. Perft is your friend! Use it often and on as many test positions as you can get your hands on.

All that being said, movegen caching of this kind generally only kicks in when a parent node does a re-search, so the benefit is likely pretty small since re-searches don't occur very often. I have not done extensive comparisons to determine whether the approach provides enough benefit to be worth the effort. But I have done enough verification testing (years ago) to know it doesn't slow things down.

I've also tried H.G. Muller's idea of having the re-search logic moved down a node to eliminate the need for caching in order to avoid regenerating moves when a re-search occurs. But that made the code too complex for my taste.

I hope this was informative. And good luck with your engine(s)! :)

STC
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Caching generated moves list in recursive searches

Post by Rein Halbersma »

syzygy wrote:
Rein Halbersma wrote:
syzygy wrote:How many percent of the nodes are we talking about anyway?

Saving a bit of work at no cost is great, but carrying around extra arguments between function calls for saving a minor bit of work in 1% or maybe 0.1% of all nodes will not help.
Well, any node where you do recursive IID, or where probCut fails low (the search continues at the same ply level), or where you have a PVS null-window fail-high & re-search, or where you have an LMR fail-high & re-search. In those cases, you'd like to re-use the generated move list. If that's < 1%, then it's a minor thing.
I may be wrong, but more than 1% seems unlikely. And you have to make sure that you actually generated all moves (e.g. the LMR search might have jumped directly into the qsearch, or there might have been a hash hit), unless you want to keep track of even more details.

Some engines might destruct the move list, for example overwriting hash moves and killer moves that were already tried by moves from the end of the list (before sorting the history moves). When the position is re-searched, the hash move and killer moves might have changed and you have a problem.

So to me it seems a bit of a hassle that could restrict implementation of move generation in undesirable ways. But maybe I'm too pessimistic.
I just did a quick instrumentation of the Stockfish move generator. In the StateInfo I store two ExtMove* and the GenType (this is a template parameter of the Stockfish move generator). I then measure the number of times the cached_movegen finds that the GenType matches the already stored value. I set an extra STALE enum value after each do_move(). Note that I do not skip the actual move generation yet; this is just a profiling run. I test this on the 37 positions from the "bench" pseudo-UCI command. The result is that 1.4% of all positions could benefit from cached move generation:

Code: Select all

===========================
Total time (ms) : 5481
Nodes searched  : 8918745
Nodes/second    : 1627211

movelist cache queries  : 7120440
movelist cache hits     : 106155
hit rate (per 1000)     : 14
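The counting hook itself is roughly like this (names are just for illustration, not actual Stockfish code; the real generation still happens every time since this is profiling only):

Code: Select all

uint64_t cacheQueries = 0, cacheHits = 0;

template<GenType T>
ExtMove* Position::count_cached_generate(ExtMove* moveList) {
  ++cacheQueries;
  if (st->cachedType == T)        // same position, same GenType seen before at this ply
      ++cacheHits;
  st->cachedType = T;             // do_move() resets this to STALE
  return generate<T>(*this, moveList);   // always generate: profiling only
}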
But you are right, actually caching the generated moves is considerably involved, because Stockfish uses a pretty intricate MovePicker object that has pointers to regular moves and to quiescence moves, and the next() member will also reorder (through sorting, partitioning and swapping) these moves. I don't know whether this is done in a stable manner, in the sense that multiple passes over these moves will give the same answer. And the scoring will also depend on the History table, and caching these scores as well will change the benchmark number, so then it's not just a matter of a simple speed test, but the functional change will have to go through the usual Elo testing.
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: Caching generated moves list in recursive searches

Post by syzygy »

Rein Halbersma wrote:But you are right, actually caching the generated moves is considerably involved, because Stockfish uses a pretty intricate MovePicker object that has pointers to regular moves and to quiescence moves, and the next() member will also reorder (through sorting, partitioning and swapping) these moves. I don't know whether this is done in a stable manner, in the sense that multiple passes over these moves will give the same answer. And the scoring will also depend on the History table, and caching these scores as well will change the benchmark number, so then it's not just a matter of a simple speed test, but the functional change will have to go through the usual Elo testing.
If SF at least leaves the complete list of moves intact as it goes through the moves (which e.g. my engine does not), it should be possible to create a separate MovePicker state that simply walks through the old list. But then you won't profit from improved move ordering, which is the whole point, at least of IID.
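The replay part would be trivial, something like this (a sketch, not actual SF code; it assumes the old range really is left intact and its storage is still alive):

Code: Select all

struct CachedListPicker {
  const ExtMove *cur, *end;

  CachedListPicker(const ExtMove* first, const ExtMove* last)
      : cur(first), end(last) {}

  Move next_move() {              // just replays the old order, no rescoring
      return cur != end ? (cur++)->move : MOVE_NONE;
  }
};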