Skipping duplicate moves

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Skipping duplicate moves

Post by hgm »

My engines often search hash moves or killers before move generation has been done. That leads to the annoying complication that duplicates of these moves will be generated later. To prevent those from being searched a second time, you would have to compare each move against the hash move and two killers before you search it, or when you generate it (with the risk that you never get to it because of a cutoff).

In micro-Max I take the lazy approach, and rely on the duplicates causing a hash hit on their initial search result. That leads to a redundant make/unmake and hash probe for one move (micro-Max doesn't use killers), but make/unmake is very fast (mailbox without piece list), and the hash entry should still be cached from the previous search of the move. So you won't lose much compared to doing a simpler test (than a hash probe) on all the moves.

But this starts to fail under large hash pressure. And it would of course not work at all without hash tables. (Where this is still an issue, because the move of a previous IID iteration will serve as 'hash move' in the next iteration, and you still have to deal with killers.)

Another problem is that I often use an IID scheme that first verifies whether a beta cutoff in a non-final iteration will hold all the way up to the requested depth, before stepping up the iteration depth. (So that when the move fails low after all, the search for an alternative can continue at the lower depth, preserving the advantages of IID.) This creates another situation where a move has already been sufficiently searched. E.g. in the d=2 iteration of a d=6 node, where the move still seemed to cut at d=5, but then failed low at d=6. The IID then resumes the d=2 iteration, and after that finishes without a cutoff, it will revisit the move for d=6 (or perhaps d=4), while redoing searches at those depths is pointless, as the d=6 result was already obtained. Again, you can hope for the hash hit on the 'over-deep' result to save the day.

But now I am programming for a machine where hash tables are unaffordable (32KB RAM). So I cannot ignore the problem anymore. Move lists are still affordable, though. So the IID problem could be solved by keeping a depth and score with each move in the move list, to hold the deepest result obtained for it so far. This acts as a sort of private cache for the hash entries of all daughters, which cannot be overwritten, and the parent could cheaply probe that table before embarking on a search of the move. It would not help against moves that are in the list twice. And of course it adds some overhead to initialize the extra info in the move list.

This consideration, plus the fact that the machine I am writing for has difficulty crossing (256-byte) memory page boundaries and handling data types longer than 8 bits, made me look for a more compact idea than a move list. Finally I came up with the following:

In a given position, each move can be encoded in a single byte. This can be done reasonably efficiently, if from-square, to-square and the piece number (the index in the piece table) of the moved piece are all known. (As they must be at the start of MakeMove.) It just requires two lookups in small tables:

moveCode = stepCode[fromSqr - toSqr] + pieceCode[pieceNr];

A move along (say) a file needs only 7 codes to be unique, although the distance moved can range from -7 to +7, because for any location at most 7 of these 14 possible targets can be on the board. So a Queen needs 4*7 = 28 codes to unambiguously identify her move, and a Rook or Bishop only 14. Pawns need 4 (because of the double push), and Knights and King 8. The numbering of the step vectors can be done such that all the codes for Knight jumps, as well as white and black Pawn moves, are contiguous, and the King moves only have a single hole (so they span 9 codes). Rooks and Bishops are perfectly complementary, so no matter how you assign the stepCodes, a Rook and Bishop with the same pieceCode will never have any of their moves collide; the pair is just encoded like a Queen, and depending on whether the described move is orthogonal or diagonal you know whether the Rook or the Bishop was meant. A full FIDE army needs only 141 codes, and even allowing an extra Queen gives only 169.
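
To make the packing concrete, here is a minimal sketch (assuming a 0x88-style board, so that fromSqr - toSqr identifies the geometric step; the table sizes and the pointer-into-the-middle trick are just an illustration, not code from an actual engine):

Code:

/* Sketch of one-byte move packing, assuming a 0x88-style board so that
   fromSqr - toSqr uniquely identifies the geometric step.  stepCode points
   into the middle of its table, so negative differences index it directly,
   matching the formula above.  Filling the tables so that all codes for a
   FIDE army (plus one extra Queen) stay below 169 is not shown here.      */
unsigned char stepTab[255];                   /* holds steps -127 .. +127   */
unsigned char *stepCode = stepTab + 127;
unsigned char pieceCode[32];                  /* first code of each piece   */

unsigned char Pack(int fromSqr, int toSqr, int pieceNr)
{
    return stepCode[fromSqr - toSqr] + pieceCode[pieceNr];
}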

So a 256-byte page could hold this without problem. But memory is not that abundant, and I would need to hold at least a two-byte score and a depth. And usually there would be only 30-40 moves in the position, and usually only a small fraction of those would have been searched deeper than the current depth iteration requested. (And at most 3 would be hash or killer.) So it seems better to map the move code into a 128-entry table. Only 13 entries then could suffer a collision, and by picking the codes in a clever way it can be arranged that Pawn moves collide with distant (4-step) slider moves. Pawns only rarely have their capture moves available, and on a board full of Pawns sliders usually do not move very far.

Still, there must not be any ambiguity, so the collisions must be resolvable by a signature in the table, for which we could use the from-square. Perhaps we can even afford to use a 64-entry table that way. This would typically map 2 (and rarely 3) moves to the same entry. One memory page could then hold 64 entries consisting of from-square, depth and a 2-byte score.
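
To illustrate, a sketch of what such an entry and its probe could look like (the layout, the folding by masking, and the names are my assumptions here):

Code:

/* Sketch of the 64-entry per-ply cache: from-square as signature, plus the
   deepest depth searched and the score obtained there (4 bytes per entry,
   so 64 entries fill one 256-byte page).                                  */
typedef struct {
    unsigned char fromSqr;   /* signature, resolves code collisions        */
    unsigned char depth;     /* deepest depth this move was searched to    */
    short         score;     /* score obtained at that depth               */
} CacheEntry;

CacheEntry cache[64];

int Probe(unsigned char moveCode, int fromSqr, short *score, int *depth)
{
    CacheEntry *e = &cache[moveCode & 63];  /* fold the 169 codes onto 64 slots */
    if (e->fromSqr != fromSqr) return 0;    /* different move hashed here: miss */
    *score = e->score;
    *depth = e->depth;
    return 1;                               /* hit: caller checks if depth suffices */
}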

Such a table could have another interesting application, namely storing counter moves. When we immediately deepen cut moves until they fail low or reach final depth, the first IID iteration will either fail high (to full depth), in which case we are done, or fail low after searching all moves (some possibly to higher depth). In the latter case each move would have been refuted by a cut move in the respective daughter. We could make the daughter return that cut move, and store it in the table. And then, in the next IID iteration, pass it to the daughter as hash move.

This would require us to add a fifth byte to the table entries, though. (The packed move encoded in the daughter position could be stored there.) This is a bit awkward. But each ply level would need some local variables anyway, in addition to this private hash cache, and there should be 64 spare bytes in that second page. Then each ply level would use two 256-byte memory pages, which is affordable.
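
In terms of memory layout, one ply level could then be pictured like this (a purely illustrative sketch; MAXPLY and the room reserved for locals are assumptions, and the cache is shown as parallel arrays so the page boundary stays visible):

Code:

/* Sketch of the two 256-byte pages one ply level would occupy. */
#define MAXPLY 20                       /* assumed maximum search depth     */

typedef struct {
    /* page 1: the private hash cache, 64 entries of 4 bytes */
    unsigned char fromSqr[64];          /* signature per entry              */
    unsigned char depth[64];            /* deepest depth searched           */
    short         score[64];            /* score at that depth              */
    /* page 2: the 'fifth byte' plus the node's local variables */
    unsigned char counterMove[64];      /* packed cut move of each daughter */
    unsigned char locals[192];          /* whatever locals the node needs   */
} PlyState;

PlyState plyState[MAXPLY];              /* two pages per ply level in total */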

Note that on entering the node, it would find the table as another node at the same ply level would have left it. We'd rather not spend time on clearing the table. This seems to be possible through the following code:

Code:

int Search(int alpha, int beta, int depth, int hashMove)
{
  int8 moveDepth[256]; int16 moveScore[256];
  int mCode, hCode;
  int iterDepth, keyMove, d, score, depthReached, oldDepth;
  iterDepth = 0;
  keyMove = 64;
  if(hashMove) {
    hCode = Pack(hashMove);
    moveDepth[hCode] = 0; // clear any stale entry we might have inherited there
  }
  while(iterDepth++ < depth) { // IID
    if(hashMove) { // we have a 'hash move' (from previous iter, or PV)
      MakeMove(hashMove);
      d = iterDepth-1;
      while(d++ < depth) {
        score = -Search(-beta, -alpha, d-1, NOMOVE);
        depthReached = d;
        if(score < beta) break;
      }
      UnMake();
      mCode = hCode = Pack(hashMove); // pack before the move can be discarded below
      if(score > alpha) {
        alpha = score;
        if(score >= beta) return beta;
      } else hashMove = NOMOVE; // it failed low, don't keep it
      moveDepth[mCode] = depthReached + keyMove; // label as hash move; depthReached could be > iterDepth
      moveScore[mCode] = score;
    }
    for(ALL_MOVES) {
      mCode = Pack(move);
      oldDepth = moveDepth[mCode];
      if(oldDepth >= iterDepth + keyMove) { // in the first iter keyMove = 64, so only the hash-move marker qualifies; later only entries already searched deep enough
        moveDepth[mCode] &= ~64; // clear the hash marker, but leave its depth
        score = moveScore[mCode]; // reuse the stored result for the duplicate or previously over-deep searched move
      } else { // in the first iter, all moves other than the hash move go here
        MakeMove(move);
        d = iterDepth-1;
        while(d++ < depth) {
          score = -Search(-beta, -alpha, d-1, NOMOVE);
          depthReached = d;
          if(score < beta) break;
        }
        UnMake();
        if(depthReached > iterDepth) { // move has had an extra-deep 'preview'
          moveDepth[mCode] = depthReached; // remember that for the next iter
          moveScore[mCode] = score;
        } else
          moveDepth[mCode] = 0; // this makes sure we overwrite any uninitialized stuff
      }
      if(score > alpha) {
        alpha = score;
        if(score >= beta) return beta; // stop iterating after a cutoff
        hashMove = move; // found new best move
      }
    } // iteration failed low
    keyMove = 0; // all data in the table is now from this node; allow its use when deep enough
  }
  return alpha; // all iterations completed without a cutoff
}
This has very little overhead; calculating the mCode is no more work than comparing with hash and killers to see if we have a duplicate (or removing those from the move list). If mCode is hashed to a smaller index, there is some extra work to vet a hit, but I care little how much it costs to treat a hit, as it saves an enormous amount of work by skipping a search. What worries me is the overhead on misses, which you incur on every move with no payback.

Re: Skipping duplicate moves

Post by hgm »

The bare-bones idea underlying all this is of course to have a table indexed by move (e.g. like the history table), for each ply level, to be used for marking moves that have already been searched in that node. Like

Code:

char alreadyDone[MAXPLY][64*64];

When you enter Search, the table can be assumed to be empty (i.e. all zeros). When searching hash or killer moves before move generation, you would then flag them through code like

Code:

alreadyDone[ply][hashMove] = 1;
The move loop would then contain the code

Code:

if(alreadyDone[ply][move]) { alreadyDone[ply][move] = 0; continue; }

before the MakeMove(). You would just have to make sure to always clean up before leaving the node, by clearing the flags you set for hash and killers. This is because you might never get to the duplicate (which would also clear it), due to a beta cutoff through a move encountered earlier.

This allows testing a number of moves for duplication with just one simple test of the alreadyDone flags, which can be done at the latest possible moment (maximizing the chance that it doesn't have to be done at all). That is a lot more efficient than running through the move list in advance, to hunt for the duplicate and remove it.
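
Putting the three fragments together, the whole pattern could look like the sketch below (SearchMove, GenerateMoves and NextMove are hypothetical placeholders for the engine's own routines, moves are assumed to index the table directly, and the killers are left out for brevity):

Code:

/* Sketch of the bare-bones scheme.  Everything called here stands in for
   the engine's own routines; only the flagging/clearing logic matters.   */
char alreadyDone[MAXPLY][64*64];

int SearchNode(int ply, int alpha, int beta, int depth, int hashMove)
{
    int move, score;
    if (hashMove != NOMOVE) {
        alreadyDone[ply][hashMove] = 1;          /* flag before move generation  */
        score = SearchMove(ply, alpha, beta, depth, hashMove);
        if (score >= beta) {
            alreadyDone[ply][hashMove] = 0;      /* clean up before early return */
            return beta;
        }
        if (score > alpha) alpha = score;
    }
    GenerateMoves(ply);
    while ((move = NextMove(ply)) != NOMOVE) {
        if (alreadyDone[ply][move]) {            /* duplicate of the hash move   */
            alreadyDone[ply][move] = 0;          /* clear the flag, skip the search */
            continue;
        }
        score = SearchMove(ply, alpha, beta, depth, move);
        if (score >= beta) {
            if (hashMove != NOMOVE) alreadyDone[ply][hashMove] = 0; /* may already be 0 */
            return beta;
        }
        if (score > alpha) alpha = score;
    }
    if (hashMove != NOMOVE) alreadyDone[ply][hashMove] = 0; /* in case it was never generated */
    return alpha;
}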

The rest of the story is just dressing it up, to reduce the memory requirement by packing the move that is used as table index, and to also use the table as a poor-man's substitute for a hash table.
Steve Maughan
Posts: 1221
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: Skipping duplicate moves

Post by Steve Maughan »

Hi H.G.

Is it really a significant saving to search the hash and killer moves before generating the move list? I don't do it in Maverick. I simply generate all moves and then order them. When I last looked, the move generation routine didn't take much time at all. I decided to do it this way for simplicity.

Steve
http://www.chessprogramming.net - Maverick Chess Engine

Re: Skipping duplicate moves

Post by hgm »

The problem with that line of reasoning is that something which is not a significant saving now can become very significant after the time taken by all those other things has been reduced to nearly zero (each through savings that were not significant by themselves either). If 100 tasks each take 1% of the time, speeding up one of them 10-fold will only give me a 0.9% speed increase. But after I have addressed all 100, I would still be 10 times faster. You have got to start somewhere...

When you say your move generation does not take much time, does that include the sorting? Sorting in general is a slow process, even if you manage to do it as an O(1) process (which I think no one manages, BTW). Looping through some 40 moves to hunt for a hash move and killers does seem a non-negligible effort.

The saving I am aiming for is not having to do any sorting, and in fact not even having a move list. Also, doing d=4, 5 and 6-ply searches (say) that were not needed, because the move was already searched to 6 ply in the d=3 iteration, would be quite expensive.

Re: Skipping duplicate moves

Post by Steve Maughan »

It's all done as part of the "move_ordering". Here's the relevant code:

Code:

void order_moves(struct t_board *board, struct t_move_list *move_list, int ply) {

    struct t_move_record *hash_move = move_list->hash_move;
    struct t_move_record *move;

    struct t_move_record *killer1 = board->pv_data[ply].killer1;
    struct t_move_record *killer2 = board->pv_data[ply].killer2;
    struct t_move_record *killer3 = NULL;
    struct t_move_record *killer4 = NULL;

    t_chess_color color = board->to_move;

    struct t_move_record *refutation = NULL;

    if (ply > 1) {
        killer3 = board->pv_data[ply - 2].killer1;
        killer4 = board->pv_data[ply - 2].killer2;
        board->pv_data[ply].killer3 = killer3;
        board->pv_data[ply].killer4 = killer4;
    }
    else {
        board->pv_data[ply].killer3 = NULL;
        board->pv_data[ply].killer4 = NULL;
    }

    for (int i = move_list->count - 1; i >= 0; i--)
    {
        move = move_list->move[i];
        assert(move);
        if (move == hash_move) {
            move_list->value[i] = MOVE_ORDER_HASH;
        }
        else if (move->captured) {
            move_list->value[i] = move->mvvlva + MOVE_ORDER_CAPTURE - COLOR_RANK(color, move->to_square);
        }
        else if (move == killer1) {
            move_list->value[i] = MOVE_ORDER_KILLER1;
        }
        else if (move == refutation) {
            move_list->value[i] = MOVE_ORDER_REFUTATION;
        }
        else if (move == killer2) {
            move_list->value[i] = MOVE_ORDER_KILLER2;
        }
        else if (move == killer3) {
            move_list->value[i] = MOVE_ORDER_KILLER3;
        }
        else if (move == killer4) {
            move_list->value[i] = MOVE_ORDER_KILLER4;
        }
        else {
            move_list->value[i] = move->history;
        }
    }
}
http://www.chessprogramming.net - Maverick Chess Engine

Re: Skipping duplicate moves

Post by hgm »

And this takes an insignificant time? It seems a significant amount of work has to be done here, with 40 moves in an average position. And that is just the tip of it; the next thing would be to extract the hash move from the move list, for which you would have to run through the list again. Only then do you get the first opportunity for a cutoff. And if there is no cutoff, you would have to run through the list again and again, to extract the next move in line.

Note that there are two ways to interpret the observation that a certain task in an engine "only takes insignificant time". It could also mean the rest of the engine is excruciatingly slow...

What are the most time-consuming tasks in Maverick?
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Skipping duplicate moves

Post by AlvaroBegue »

In RuyDos I just fish out the hash move after generating captures (if the hash move was a capture) or after generating non-captures (if the hash move was not a capture).

For killers, I actually generate the non-captures first, then find them in the list and move them to the front.

I believe this scheme is fast enough. Obsessing over speed is probably a bad idea.

Re: Skipping duplicate moves

Post by hgm »

Well, anything you don't have to do at all is a gain both in terms of speed and simplicity. And when the projected speed is around 300 nodes/sec, speed is reasonably important.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Skipping duplicate moves

Post by Sven »

Considering that the hash move already produces a cutoff in many cases (say, at 90% of all full-width nodes), which means that there will be no move generation at all with that "try hash move before generating moves" approach, the time spent to remove the hash move from the move list is really negligible. It is like doing 20 comparisons on average at about 10% of all full-width nodes (or of all nodes including qsearch, if you use the TT in qsearch as well), so something like 2 extra comparisons per node on average. That *is* something you can ignore safely, since you will have a hard time finding any faster alternative with the same functionality.

Re: Skipping duplicate moves

Post by hgm »

OK, that is a good point. All-nodes will in general have no hash move, and the problem would not arise. So only the cut-nodes have to be considered. The calculation hinges on postponing the hash-move removal until after the hash move has been searched, though.

This suggests the best approach would be to explicitly test whether the next move in line happens to be a duplicate of the hash move, and skip it if it is. Then the test overhead is maximally delayed, and every cutoff occurring before it relieves you of making the test for the remaining moves. The only downside is that you would then also make this test on all moves in all-nodes (where it always fails, because the hash move was invalid), unless you use different search routines for all-nodes and cut-nodes (or actually for nodes with a hash move and without). You would not have that problem if you located it as a separate loop through the move list in the part of the code that searches the hash move, after you have established that it did not cut. (And have a call to move generation from there.)
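
In code, the 'latest possible moment' variant would amount to no more than this (a schematic fragment; the move-list representation and names are assumptions):

Code:

/* Sketch: postpone the duplicate test until just before MakeMove(), so that
   any cutoff by an earlier move spares us the remaining comparisons.       */
for (i = 0; i < moveCount; i++) {
    move = moveList[i];
    if (move == hashMove) continue;      /* the duplicate: already searched  */
    MakeMove(move);
    score = -Search(-beta, -alpha, depth - 1, NOMOVE);
    UnMake();
    if (score >= beta) return beta;      /* later moves never reach the test */
    if (score > alpha) { alpha = score; bestMove = move; }
}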

This still would not solve the problem with the IID, however. If during one of the IID iterations a particular move was deepened beyond the iteration depth, because it seemed to be a cut move, but in the end did not live up to the promise, it would still rely on the hash hit to prevent duplicate searches of that move in later iterations. Which is usually fine if you have a hash table, but rather costly if you have none (or a too heavily loaded one). If you had a method for solving that, it would solve the duplicate hash and killer problem as a free side effect.

So I still like the idea of 'caching' all daughter hash entries in the local variables of the node, where they can never be overwritten. Keeping this info can be very affordable even in situations where memory is too cramped for a general TT. It only takes something like 256-512 bytes per ply. It seems important to keep at least the cut-move for all daughters, because it is useful even when it was obtained at a lower depth than the now requested one, for use as hash move. So even if one IID iteration in an all-node did not search any of the moves deeper than necessary to establish their fail low, the cut-moves would still be used in the next iteration.

In fact you could get even more mileage out of this: very often, during the search of late moves in an all-node at ply N, the situation will occur that these moves have similar refutations (e.g. through the same killer at ply N+1). This means that the all-node granddaughters at ply N+2 will be searched immediately after each other, at that level, and that the second one would thus inherit the cached info on the daughters of its cousin. So counting from the node at ply N, there would be a counter-move table for the move sequence

pointless - killer - X - anti-X

and we are now in a position

pointless' - killer - X

It seems there is an appreciable chance that the move anti-X would do a good job at refuting X here as well. I am not sure it deserves to be tried as hash move, but it should definitely be eligible as killer. This does seem to depend on the move at ply N+1 being the same. So it seems a good idea to store with the counter-move table which move was used to reach the position that created it. If a cousin then inherits that table, and finds it was reached by the same move as the one for which the table was made, it could try to use the moves in that table, passing the relevant counter move for each move it tries to its daughter. This is a sort of 'hyper-killer' scheme that inherits killers not from siblings but from grand-cousins (is that the proper term?), in a way that keeps the two preceding plies equal (and only the ply before that different). This could be useful even in engines that do have a hash table, as normal hash probing would not provide such information. (Of course 'analogy probing', where you keep probing the tree of a sibling late move in addition to your own, could provide it, even for deeper plies.)
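
A sketch of how the inheritance test could be kept cheap: tag each ply's counter-move table with the move that was played one ply up when the table was filled (all the names here are mine, just to illustrate the idea):

Code:

/* Sketch: a per-ply counter-move table that remembers for which move at the
   previous ply it was built.  A 'grand-cousin' entering the same ply may
   reuse the stored refutations as killers only if it was reached through
   the same move, so that the two preceding plies are identical.            */
typedef struct {
    unsigned char counterMove[64];  /* packed refutation per daughter move      */
    int           builtForMove;     /* move at ply-1 when the table was filled  */
} CounterTable;

CounterTable counterTable[MAXPLY];

int MayInherit(int ply, int prevMove)
{
    return counterTable[ply].builtForMove == prevMove;
}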