most similar hashes of two positions

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: most similar hashes of two positions

Post by mar »

Zenmastur wrote:About a year ago I went through the posts looking for hash collision info. I thought I did a reasonably good search, but I don't remember seeing HGM giving an expose on the subject. Can you point out the discussion?
I'm pretty sure you could find it somewhere in the topics.
Certain people there knew exactly what they were talking about (while I made a fool of myself :)
I had a "clever idea" about Hamming distance, and others (hgm, Evert, Alvaro) showed that linear independence is much more important.
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: most similar hashes of two positions

Post by hgm »

bob wrote:Only issue I would see is alignment. I might give it a try to see what it does to speed.
Indeed, and on some architectures it would not be allowed at all for that reason. But on modern Intel architectures I discovered that there is no penalty at all, unless the word straddles a cache-line boundary.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: most similar hashes of two positions

Post by wgarvin »

hgm wrote:
bob wrote:Only issue I would see is alignment. I might give it a try to see what it does to speed.
Indeed, and on some architectures it would not be allowed at all for that reason. But on modern Intel architectures I discovered that there is no penalty at all, unless the word straddles a cache-line boundary.
I remember some threads about that sort of thing, a few years ago.

For example:
If you have piece type P in [1..12] and square X in [0..63], and you read an 8-byte Zobrist key from byte offset (32*X + 2*P) in the table, then your read will never cross a 32-byte boundary, and you won't pay any misalignment penalty on any of the popular Intel or AMD CPUs. That reduces the table size from 6 KB to 2 KB, perhaps reducing the number of L1 cache lines those table entries occupy.

As long as the bits are reasonably random, the overlapping keys shouldn't be any worse than independent random keys (but I don't know if anybody has actually verified it). You can even support big-endian builds that use the same key values for the cost of one negate/xor at the end, and for target hardware where misaligned accesses are expensive, just expand it out to a "full-size" 6 KB table and you're no worse off than most engines are today.
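A minimal C sketch of that layout (illustrative only; the table name and its random fill are assumptions, not from the post):

Code: Select all

#include <stdint.h>
#include <string.h>

/* Hypothetical 2 KB pool of random bytes, filled at startup. */
static uint8_t zobristBytes[2048];

/* Key for piece type p in [1..12] on square x in [0..63].
   The 8-byte load starts at 32*x + 2*p and ends no later than
   32*x + 24 + 8 = 32*(x+1), so it never crosses a 32-byte (and
   hence never a 64-byte cache-line) boundary. */
static inline uint64_t zobristKey(int p, int x)
{
    uint64_t key;
    memcpy(&key, zobristBytes + 32*x + 2*p, sizeof key);
    return key;
}

The memcpy stays well-defined on platforms that forbid misaligned pointer dereferences, while compiling to a single 8-byte load on x86.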
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: most similar hashes of two positions

Post by wgarvin »

Of course the L1 cachelines are 64 bytes on these architectures, not 32 bytes, but every 64-byte boundary is also a 32-byte boundary.

You might be able to overlap the keys every 8 bits instead of every 16 bits, but it's pretty difficult to get a table smaller than 2 KB with no 64-byte boundary crossings without spending extra cycles on the index computation. 2 KB is probably small enough. 8-)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: most similar hashes of two positions

Post by bob »

hgm wrote:
bob wrote:Only issue I would see is alignment. I might give it a try to see what it does to speed.
Indeed, and on some architectures it would not be allowed at all for that reason. But on modern Intel architectures I discovered that there is no penalty at all, unless the word straddles a cache-line boundary.
Or unless you use xmm instructions and forget to use the unaligned opcode (which is slower of course)...
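For concreteness (an illustration, not Bob's code): with SSE2 intrinsics the two opcodes correspond to distinct intrinsics, and using the aligned one on a misaligned pointer faults:

Code: Select all

#include <emmintrin.h> /* SSE2 */

/* _mm_load_si128 emits movdqa and requires 16-byte alignment;
   _mm_loadu_si128 emits movdqu and accepts any address, at some
   cost on older CPUs. */
__m128i loadTwoKeys(const uint64_t *p)
{
    return _mm_loadu_si128((const __m128i *)p);
}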
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: most similar hashes of two positions

Post by bob »

wgarvin wrote:Of course the L1 cachelines are 64 bytes on these architectures, not 32 bytes, but every 64-byte boundary is also a 32-byte boundary.

You might be able to overlap the keys every 8 bits instead of every 16 bits, but it's pretty difficult to get a table smaller than 2 KB with no 64-byte boundary crossings without spending extra cycles on the index computation. 2 KB is probably small enough. 8-)
You, sir, are one strange individual.

:)
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: most similar hashes of two positions

Post by Aleks Peshkov »

hgm wrote:This saved a factor 8 in cache load.
Just six 64-bit keys are sufficient to encode all chess pieces on the board.
A 7th key is needed to encode FRC castling rights and en-passant status without special tricks.

1) Each key is the code for one piece type on square 0.
2) Each other square uses a circular bit-shift of the original key by the square number.
3) An opposite-side piece uses the bit-inverse of the original key.

An example set of keys:

Code: Select all

        0x0218a392cd5d3dbf,
        0x024530decb9f8ead,
        0x02b91efc4b53a1b3,
        0x02dc61d5ecfc9a51,
        0x031faf09dcda2ca9,
        0x0352138afdd1e65b,
        0x03ac4dfb48546797,
The last key is used as a super-piece, replacing a Rook with an active castling right on the 1st rank, or a Pawn with active en-passant status on the 4th rank.
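A minimal C sketch of rules 1-3 with the quoted keys (the rotation direction is an assumption; the post does not specify it):

Code: Select all

#include <stdint.h>

static const uint64_t baseKey[7] = {
    0x0218a392cd5d3dbf, 0x024530decb9f8ead, 0x02b91efc4b53a1b3,
    0x02dc61d5ecfc9a51, 0x031faf09dcda2ca9, 0x0352138afdd1e65b,
    0x03ac4dfb48546797  /* 7th key: the "super-piece" */
};

/* Circular left shift; the (-s & 63) mask keeps s == 0 well-defined. */
static inline uint64_t rotl64(uint64_t k, int s)
{
    return (k << (s & 63)) | (k >> (-s & 63));
}

/* Key for piece type t in [0..6] on square sq in [0..63]: rotate the
   base key by the square number (rule 2) and bit-invert it for the
   opposite side (rule 3). */
static inline uint64_t pieceKey(int t, int black, int sq)
{
    uint64_t k = rotl64(baseKey[t], sq);
    return black ? ~k : k;
}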
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: most similar hashes of two positions

Post by hgm »

I tried the circular bit shift once, but it gave me a far higher collision rate than with independent keys, while circularly shifting by 8 bits gave no change in the collision rate at all (compared to independent keys). I use the 8-bit circular shift in XBoard (or even 4-bit shifts in the latest version) on the Polyglot keys to generate extra keys for all the unorthodox piece types (XBoard supports 22 piece types) and larger boards (up to 16x16).
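As a sketch of that derivation (illustrative; XBoard's actual indexing may differ):

Code: Select all

#include <stdint.h>

/* Key for the n-th extra piece type: circularly rotate an existing
   Polyglot key by 8*n bits (or 4*n for the 4-bit variant). */
static inline uint64_t extraKey(uint64_t polyglotKey, int n)
{
    int s = (8 * n) & 63;
    return (polyglotKey << s) | (polyglotKey >> (-s & 63));
}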

BTW, it occurred to me that with the very large hash sizes people can afford today, you gain very little by storing a 64-bit signature instead of a 32-bit one, as most of the extra bits are implied by the index. But you can get around that by XOR-ing the key with the incremental evaluation before using it as signature (and using the unmodified key for the index)!
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: most similar hashes of two positions

Post by Aleks Peshkov »

hgm wrote:I tried the circular bit shift once, but it gave me a far higher collision rate than with independent keys.
I did not discover any 64-bit collisions in my numerous perft tests, and I observed the same collision rate when testing samples of short key subsequences against a random set.

Keys meant for bit-shifting should not be random. De Bruijn numbers are better than purely random ones, as they contain every 6-bit sequence exactly once, so at least they are guaranteed to have no duplicates among their own rotations.

Out of the 2^32 64-bit De Bruijn numbers it is possible to select about 2^12 "best" ones (by Hamming distance, or by the largest subsequence length without collisions). From those I pick a set of 7 decent keys by brute-force testing. Of course, combinations with several pieces located on the same square are skipped in the test.
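For reference, a minimal check of the De Bruijn property itself (a sketch; the "best" selection criteria are the ones described above):

Code: Select all

#include <stdint.h>
#include <stdbool.h>

/* A 64-bit De Bruijn number B(2,6) contains each of the 64 possible
   6-bit values exactly once among its 64 circular windows; it is
   enough to verify that all windows are distinct. */
static bool isDeBruijn64(uint64_t k)
{
    uint64_t seen = 0;
    for (int i = 0; i < 64; i++) {
        uint64_t rot = (k << i) | (k >> (-i & 63));
        int window = (int)(rot >> 58); /* top 6 bits */
        if (seen & (1ULL << window))
            return false;
        seen |= 1ULL << window;
    }
    return true;
}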
But you can get around that by XOR-ing the key with the incremental evaluation before using it as signature (and use the unmodified key for the index)!
I do not understand what you do here and what the gain is.
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: most similar hashes of two positions

Post by hgm »

The enhanced collision rate I observed might indeed very well be a property of the PRNG I was using. I did not really get to the bottom of this.

What I do is

Code: Select all

#include <stdint.h>

typedef uint64_t Key;
typedef struct {
    Key lock;
    ...
} HashEntry;

HashEntry hashTable[1<<24]; // 16M entries
int hashMask = (1<<24) - 1;
Key hashKey;            // incrementally updated Zobrist key
int incrementalPstEval; // incrementally updated PST evaluation

int index = hashKey & hashMask;
Key signature = hashKey + incrementalPstEval; // fold the eval into the signature
if(hashTable[index].lock == signature) { // hash hit
  ...
}
What is achieved is that the signature is now truly 64 bits. If the plain hashKey were used as the signature, its lowest 24 bits would automatically match: had they been different, the entry would not have been stored in that location, as the index was derived from them. So there would really be only 40 effective bits, and the probability of a key collision would be 1/2^40 per probe (or a multiple of that if you probe multiple entries). Now keys that have the same upper 40 bits only cause a collision if the PST evaluation of the positions they represent is also exactly the same.

The cost is just a single addition of a cached variable.