Repetition check

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Repetition check

Post by hgm »

A fast way to detect repetitions is to store the hash keys in a 256-entry hash table. With the 50-move rule that table will be at least half empty, so that on average you will find an empty entry in it after 2 tries. (You could do linear rehash in case of a collision.)

This works fine in Chess, but in Shogi it poses some problems. Simply stacking the keys and doing a linear search on the topmost elements (upto the last reversible move) also poses problem, though:

*) In games with piece drops, like Shogi and Crazyhouse, all moves are reversible. A captured piece gets in the hand, can be dropped, captured by the opponent, and dropped by him to re-appear on the original square. Pawns can be captured, and dropped again closer to the back rank. So you would always have to search back through the repeat stack to the very beginning of the game. This could be a significant load in QS. (You could still declare null moves in search as irreversible, though, to discard null-move-spanning repeats.)

*) In Shogi, the outcome of a repeat depends on the entire repeat cycle, not just on the repeated position. E.g. if one side is checking in every position along the cycle, that side loses. This makes it necessary to store at least some information on the positions on the stack, and identify the previous occurrence of the current position in the stack, so that you can loop through the top elements to determine the outcome.

*) There are pseudo-repeats, where the position on the board is the same, but the pieces in hand are different. It pays to recognize those, to prevent a side that is in trouble to use a material-burning pseudo-loop to push the trouble over the horizon.

Upto now my Shogi engine ignored these problems, and was still resetting the 50-move counter on captures, like the (Chinese) Chess program it was derived of. It then just stacked the keys and looped over them, as in search the 50-move counter never gets very high, (and in QS stays 0), while it needs to be at least 4 before you have to check anything. This would make it overlook repeat cycles involving captures and drops, and occasionally lost games because it started perpetually checking. Not very frequently, as such a loop takes at least 8 ply, but it did happen. Pseudo-repeats were not recognized at all, leading to bad horizon problems.

The solution I use now zeros the (32-bit) signature part of the Zobrist keys for the hand, which is also the part I store on the repeat stack. In the main hash table positions with different hand are still considered different, due to the holdings contributing to the address part of the key, but in the repeat stack the holdings do not contribute, so that pseudo-repeats will be flagged. In addition to the signatures I also stack the (incrementally updated) material evaluation. In case of a key match this evaluation will be the same if we have a true repeat, but different when the pieces not on the board are distributed in a different manner over the hand. If the difference is significant (like 2 Pawns in hand, where 1 Pawn was transferred to the opponent's hand), the side that is apparently burning material for no reason can be scored as losing.

The 50-move counter is not reset on captures (or you would never detect any pseudo-repeats at all), and the problem of having to search through the repeat stack to the beginning of the game is alleviated by only searching for keys that might be on the stack. To this end I keep a hash table of 256 byte-sized elements, indexed by the lowest 8 bits of the repeat key, storing the next-higher 8 bits. The entry to which the newly pushed key maps will be remembered on MakeMove(), and restored in UnMake(), but overwritten with the new key (part) in MakeMove. By comparing the old value in that table entry to the new key, you can see if the key could be in the stack, and not bother to search through the stack if it isn't.

There is a risk here, because you remember only the last key that mapped to the same entry. So there is a 1 in 256 chance (the hash-table size) that a position in the repeat loop happens to map to the same entry, and mask the repeated position at the start of the loop. You run that rsik for every intermediate position, and even in the shortest possible loop of 4, you run the risk 3 times. Too high for comfort, but I don't want to make the table much bigger, as I like it to fit in the L1 cache. There also is a 1 in 256 chance on a 'false positive', when there is already a key in the table that happens to have the same stored bits. Even in games that already last more than 256 moves, so they have a completely filled table, this has only a 1-in-256 chance (as we store 8 bits), and the worst that happens is that we do a linear search through the repeat stack in vain. This does not represent a significant average load.

To prevent keys mapping to the same hash-table entry masking each other, I divided the table in a btm and wtm part. That means that the chance one key will mask another will go up to 1:128, but in a loop of 4 there will only be one intermediate position with the same side to move, so the odds that the repeated position would be masked there is 1:128, rather than 3:256. A (too) small gain. So in stead of keeping only the (mini-)signature of last key that mapped to the entry, I keep the last two. That reduces the table to 64 pairs of entries ('buckets') per color, so that the chances on a collision are now 1:64, but only two collisions will cause masking. This makes masking in a loop of 4 totally impossible, while for a loop of 6 the chances that both intermediate positions (with same stm) will map to the same hash bucket as the repeating one will be only 1:4096. That seems small enough, especially if you take into account that repeats longer than 4 are comparatively rare.

So I only take the 6 lowest bits of the 32-bit signature key, used as repeat key, plus the stm bit, to identify a bucket in the repeat hash table, and store 8 other bits of that signature key there. The repeat test then consists of comparing the latter 8 bits to both stored values. In most cases this will indicate the key is not on the repeat stack, in which case we are done. Only 1:128 of the cases this will produce a false positive, and we will do a linear search through the repeat stack without finding the key. If there really is a repeat, almost always one of the two entries will match, and the linear search through the stack will find the key, or run into a null move (i.e. exhaust the 50-move counter before finding the matching key), after which we can examine the loop to decide on the outcome. The overhead consist not only of comparing with the two stored bytes, but also of calculating the table index, remembering one entry and moving up the other, storing the new one in MakeMove (and the reverse in UnMake). This all seems bearable, and is made up for by saving work on wasteful searches through the repeat stack that would not find anything.

Close to the root, where efficiency is of no concern, you could ignore the hash table, and do the full linear search, to have absolute accuracy.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Repetition check

Post by Sven »

That sounds very reasonable, although quite sophisticated. For me the key point you mentioned is that there are several games where most or all moves are reversible so that a linear backward search through the hash keys has no obvious speed advantage.

I understood that you currently use two hash key tables for wtm and btm. What would happen if you use more hash key lookup tables, e.g. via something like "(total piece count white+black) modulo N", with N being a reasonably small number like N=4 or N=8? That might help to reduce the effort of scanning the whole hash key table, especially during QS where the total piece counts change frequently, while it certainly has an additional cost to manage N tables of different dynamic length. Maybe some N can be found that performs better than N=1. Whether you combine that with still maintaining separate wtm+btm tables or not would have to be tested.

Sven
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Repetition check

Post by hgm »

Well, it would force you to keep track of the number of pieces, for one. Currently I am focusing on mini-Shogi, and most games are decided within 20 (full-)moves, so the linear search will not get too expensive. For normal Shogi games can easily be 100 moves, and there further improvements might be meaningful.

But I would be more inclined in that case to simply store the index of the occurrence of a key with that (mini-)signature in the repeat hash table, which would add another 256 bytes to it. (Assuming the repeat stack will not grow over 256 half-moves; or actually over 512 half moves, as the stm is implied, so you don't need to store the bit of the index that tells you if it is at an even or odd entry in the repeat stack.

So you would check two entries of the repeat hash for that stm to see if the key might be in the table, and if it is, you would fetch the index, and look at 2*index+stm (assuming stm = 0 or 1) in the repeat stack to find the full key. If that indeed is the 32-bit key you were looking for you found the (pseudo-)repeat. If it is another key that matched the mini-signature and hash entry number (8 + 6 bits), but differs in other bits, you know it was a false positive, and simply ignore it. If you find a key that did not match the 8+6 bits in the given position, you know that the repeat-stack index must have overflowed when you clipped it to 8 bits, and you can now step through the repeat stack in steps of 512 to hunt for an earlier occurrence of the repeat candidate. (Or you could assume it is a miss, as it seems rather unlikely that you would have repeat loops of more than 512 half-moves. But I guess you should really start looking at the most recent occurrence, i.e. at 2*index+stm+(repeatStackPointer & ~511) in that case.)