Hashed repetition table

jwes · Post by **jwes** » Sun Sep 04, 2016 3:12 am

I am implementing a repetition table as hashed and chained and wonder if anyone else has tried this?
My layout is HASHKEY[255], where the first 128 entries are hash buckets and the other 127 are used for chained entries. I use the low 7 bits for chaining, 0 for no chain or 1-127 as array indices for array 128-254.
I reset the table on irreversible moves at the root. Since a repetition table is like a stack, I simply save the old value and index and rewrite it on undo.
The advantages are that it nearly always it only takes one comparison to determine if there is a repetition and I don't need to (can't) worry about irreversible moves in the search.
The disadvantage is that the code is more complex and slower per comparison.

bob · Post by **bob** » Sun Sep 04, 2016 5:48 am

jwes wrote:I am implementing a repetition table as hashed and chained and wonder if anyone else has tried this?
My layout is HASHKEY[255], where the first 128 entries are hash buckets and the other 127 are used for chained entries. I use the low 7 bits for chaining, 0 for no chain or 1-127 as array indices for array 128-254.
I reset the table on irreversible moves at the root. Since a repetition table is like a stack, I simply save the old value and index and rewrite it on undo.
The advantages are that it nearly always it only takes one comparison to determine if there is a repetition and I don't need to (can't) worry about irreversible moves in the search.
The disadvantage is that the code is more complex and slower per comparison.

Bruce Moreland (Ferret) is at least one I know that used a small hash table for repetitions. The only down-side is that this approach is a bit problematic when it comes to a parallel search.

hgm · Post by **hgm** » Sun Sep 04, 2016 12:21 pm

I often use a hash table with linear rehash for this. Linear rehash is not the most efficient method, but it requires very little modification of the code compared to the usual running through a stack. The only difference is that you start not at the top but at an index derived from the hash key. (And wrap at the end of the table,although you can also simply provide a few overflow entries there.) You run through the table until you encounter a match or an empty entry, and in the latter case you know where to store (and remember that for the UnMake). Efficiency of rehashing is basically a non-issue if the table is not nearly completely filled, and it is not very expensive to make the table at least twice as large as what could be reasonably needed. Add-the-hash rehash would not be nearly as simple to implement, however.

What I store in the table (apart from a signature) is the ply level at which the position occured. I can then compare the current ply level minus the stored level with the reversible move counter. Then I only have to reset the latter on an irreversible (or null) move, while the information in the table stays valid.

Note that you can revisit positions if you don't consider null-move-spanning repetitions a draw. So when encountering a match from before the last null move you would have to save the old level so you can UnMake.

thomasahle · Post by **thomasahle** » Sun Sep 04, 2016 11:51 pm

You could also use a slightly larger table, say 1,000,000 entries, but of booleans so the size would still be small. Given that you never have more than 100 elements in the table, the chance of ever having a collision is 0.5%. This way you don't have to do the comparison, but can just check for a hash collision.

mar · Post by **mar** » Sun Sep 04, 2016 11:56 pm

thomasahle wrote:You could also use a slightly larger table, say 1,000,000 entries, but of booleans so the size would still be small. Given that you never have more than 100 elements in the table, the chance of ever having a collision is 0.5%. This way you don't have to do the comparison, but can just check for a hash collision.

In the past I used to have such hash table for repetitions => results were disastrous in the case of a collision.
Iterating a small list linearly can be faster and most importantly robust; similar thing happens with sorting: if n is small then a simple insertion sort outperforms quicksort or similar theoretically ~n log n algorithms.

thomasahle · Post by **thomasahle** » Mon Sep 05, 2016 12:23 am

mar wrote:In the past I used to have such hash table for repetitions => results were disastrous in the case of a collision.

Why would it be worse than a collision in the TT? Neither are checked and both can give very wrong scores. If you make the collision probability as small as for the TT how can it be an issue?

mar wrote:Iterating a small list linearly can be faster

I don't disagree in general, but here the comparison was apparently a performance issue.

Cardoso · Post by **Cardoso** » Mon Sep 05, 2016 4:42 pm

thomasahle wrote:
mar wrote:In the past I used to have such hash table for repetitions => results were disastrous in the case of a collision.
Why would it be worse than a collision in the TT? Neither are checked and both can give very wrong scores. If you make the collision probability as small as for the TT how can it be an issue?

mar wrote:Iterating a small list linearly can be faster
I don't disagree in general, but here the comparison was apparently a performance issue.

I've not measured anything but as Martin said iterating a small list linearly is fast. The cpu cache really works well in this case.
This iteration is of litle significance in terms of time usage compared with other things like the evaluation function.

Alvaro

hgm · Post by **hgm** » Mon Sep 05, 2016 4:59 pm

The time usage can become significant, however, if you have to iterate all the way to the start of the game everywhere.

It also depends on how you want to treat null-move-spanning repeats. If you just ignore those you can stop at the nearest null move, even in the absence of irreversible moves. If you want to do something special there, you would still be interested to know whether it is a repeat, even if you don't intend to declare it a draw.

Cardoso · Post by **Cardoso** » Mon Sep 05, 2016 10:41 pm

hgm wrote:The time usage can become significant, however, if you have to iterate all the way to the start of the game everywhere.

It also depends on how you want to treat null-move-spanning repeats. If you just ignore those you can stop at the nearest null move, even in the absence of irreversible moves. If you want to do something special there, you would still be interested to know whether it is a repeat, even if you don't intend to declare it a draw.

Agreed, just let's hope those pathological cases aren't the norm.
Anyway while there are still pawn moves that possibility is low, and when there are few pawns left on the board the game is usually decided.
But then again in some near endgame or endgame that can happen.

Alvaro

flok · Post by **flok** » Mon Sep 05, 2016 11:55 pm

What about an std::unordered_set<uint64_t> where you keep track of the hashes?
An unordered_set is O(1) afaik.

Hashed repetition table

Hashed repetition table

Re: Hashed repetition table

Re: Hashed repetition table

Re: Hashed repetition table

Re: Hashed repetition table

Re: Hashed repetition table

Re: Hashed repetition table

Re: Hashed repetition table

Re: Hashed repetition table

Re: Hashed repetition table