lockless hashing

lucasart · Post by **lucasart** » Wed May 11, 2016 1:10 am

I tested it, using the most hasic tt implementation: always replace.

I even skewed the testing conditions as far as I could to make the effect mesurable (8 threads, 1mb hash). But could not measure any elo gain.

Maybe my machine somehow loads and stores my 16 bytes entries atomically already. Even if not guaranteed by the manual.

Has anyone ever managed to get anything out of lockless hashing ? I'm only interested in something statistically mesurable, handwaving or theoretical arguments aside.

syzygy · Post by **syzygy** » Wed May 11, 2016 1:24 am

lucasart wrote:I tested it, using the most hasic tt implementation: always replace.

I even skewed the testing conditions as far as I could to make the effect mesurable (8 threads, 1mb hash). But could not measure any elo gain.

The number of hash errors due to one thread reading an entry at the same time as another thread is updating that same entry with a new position is so small (especially with a large hashtable, which in the end is the only thing that really matters) that you will not be able to measure the effect.

In particular if you are testing with Stockfish, which already has to deal with thousands of hash errors per second.

lucasart · Post by **lucasart** » Wed May 11, 2016 2:57 am

syzygy wrote:
lucasart wrote:I tested it, using the most hasic tt implementation: always replace.

I even skewed the testing conditions as far as I could to make the effect mesurable (8 threads, 1mb hash). But could not measure any elo gain.
The number of hash errors due to one thread reading an entry at the same time as another thread is updating that same entry with a new position is so small (especially with a large hashtable, which in the end is the only thing that really matters) that you will not be able to measure the effect.

In particular if you are testing with Stockfish, which already has to deal with thousands of hash errors per second.

I'm testing with my own engine: https://github.com/lucasart/Demolito

As you can see, the tt implementation is as simple as it gets. And I use TT everywhere, including qsearch (should maximize the effect of collusions).

hgm · Post by **hgm** » Wed May 11, 2016 7:40 am

The point is that hash probe errors (e.g. because of key collisions) have virtually zero effect on Elo anyway, unless the invalidity of the hash move can crash the machine (e.g. by accidentally capturing a King, leading to an off-board access in a later check test). Lockless hashing is just a method to remove one source of probe errors.

Best way to test the effect is (in a single-threaded search) to intentionally once every N moves spoil the hash entry you are storing by replacing the second 8 bytes with those from a randomly picked other hash entry already in the table. And then see how low you can make N before you observe any effect on Elo. That way you can easily get millions as many wrong stores as with multi-threading. And you probably still won't see any effect on Elo, if N > 100.

lucasart · Post by **lucasart** » Wed May 11, 2016 1:56 pm

hgm wrote:The point is that hash probe errors (e.g. because of key collisions) have virtually zero effect on Elo anyway, unless the invalidity of the hash move can crash the machine (e.g. by accidentally capturing a King, leading to an off-board access in a later check test). Lockless hashing is just a method to remove one source of probe errors.

Best way to test the effect is (in a single-threaded search) to intentionally once every N moves spoil the hash entry you are storing by replacing the second 8 bytes with those from a randomly picked other hash entry already in the table. And then see how low you can make N before you observe any effect on Elo. That way you can easily get millions as many wrong stores as with multi-threading. And you probably still won't see any effect on Elo, if N > 100.

Interesting.

In short, lockless hashing is an elegant solution to a non-problem

hgm · Post by **hgm** » Wed May 11, 2016 2:30 pm

Indeed, it seems like that. Unless your engine could crash on a wrong hash move, and the move is not stored in the same atomically loaded part of the entry as the signature. Then you would have to do something to make it crash-resistent. That could involve checking the hash move for any wrecking activity (like moving empty squares, capturing Kings, or castling on top of interposed pieces). But that might be more expensive than the lockless hashing. A larger signature would not get the probe-error rate below that caused by SMP store collisions.

A 4-byte signature would produce one probe error per 4G probes, or at 1Mnps one probe error every 4M sec ~1000 hours. At 10Mnps that would become 100 hours. Of course not every probe error will cause a crash; positions in the tree tend to resemble each other, and most of the time a move valid in one would still be valid in the other. So crash rate might be 10 times smaller. Still, it is just a little bit too large for comfort. Storing the full 64-bit key as signature seems like over-doing it, and thus waste space. An alternative solution is to store a 1-byte 'signature extension' in the other half of the hash entry, with bits not used to derive the index. That would give you an extra check in case of what otherwise would have been considered a hit, which is is probably cheaper than validating the hash move. It would reduce the intrinsic key-collision problem by another factor 256, making it bearable even at long TC, (crash every 250,000 hours), and at the same time catch 99.6% of the SMP store corruption.

Karlo Bala · Post by **Karlo Bala** » Wed May 11, 2016 5:36 pm

hgm wrote:The point is that hash probe errors (e.g. because of key collisions) have virtually zero effect on Elo anyway, unless the invalidity of the hash move can crash the machine (e.g. by accidentally capturing a King, leading to an off-board access in a later check test). Lockless hashing is just a method to remove one source of probe errors.

Best way to test the effect is (in a single-threaded search) to intentionally once every N moves spoil the hash entry you are storing by replacing the second 8 bytes with those from a randomly picked other hash entry already in the table. And then see how low you can make N before you observe any effect on Elo. That way you can easily get millions as many wrong stores as with multi-threading. And you probably still won't see any effect on Elo, if N > 100.

I thought that the point of lockless hashing is to speed up the search by avoiding lock/unlock when too many threads try to access shared memory.

Evert · Post by **Evert** » Wed May 11, 2016 6:50 pm

Karlo Bala wrote:
hgm wrote:The point is that hash probe errors (e.g. because of key collisions) have virtually zero effect on Elo anyway, unless the invalidity of the hash move can crash the machine (e.g. by accidentally capturing a King, leading to an off-board access in a later check test). Lockless hashing is just a method to remove one source of probe errors.

Best way to test the effect is (in a single-threaded search) to intentionally once every N moves spoil the hash entry you are storing by replacing the second 8 bytes with those from a randomly picked other hash entry already in the table. And then see how low you can make N before you observe any effect on Elo. That way you can easily get millions as many wrong stores as with multi-threading. And you probably still won't see any effect on Elo, if N > 100.
I thought that the point of lockless hashing is to speed up the search by avoiding lock/unlock when too many threads try to access shared memory.

It is, but you only need to lock/unlock if you care about scrambled hash table entries to begin with. If accepting the occasional scrambled table entry is perfectly fine, then you don't need lockless hashing.

It might be worthwhile to dig out the testing conditions for the lockless scheme, which might be quite different from what we see today.

hgm · Post by **hgm** » Wed May 11, 2016 6:53 pm

Karlo Bala wrote:I thought that the point of lockless hashing is to speed up the search by avoiding lock/unlock when too many threads try to access shared memory.

But we already take it for granted that no lock/unlock is done. Just don't lock, without worrying about anything.

So 'lockless hashing' as invented by Bob is a bit of a misnomer. Sure, it doesn't use locking, but it isn't the only method that doesn't use locking, and it isn't the obvious method that doesn't use locking. A better description would be "self-checking non-atomic hashing", with the emphasis on "self-checking".

bob · Post by **bob** » Wed May 11, 2016 7:18 pm

lucasart wrote:I tested it, using the most hasic tt implementation: always replace.

I even skewed the testing conditions as far as I could to make the effect mesurable (8 threads, 1mb hash). But could not measure any elo gain.

Maybe my machine somehow loads and stores my 16 bytes entries atomically already. Even if not guaranteed by the manual.

Has anyone ever managed to get anything out of lockless hashing ? I'm only interested in something statistically mesurable, handwaving or theoretical arguments aside.

Lockless hashing is NOT about "Elo". It is about correctness in a threaded search, where a bad move (or other information) could cause a program to crash. If your program is immune to crashes caused by a bogus hash table best move, this won't provide any measurable gain whatsoever. Same applies to lockless pawn hashing as well.

lockless hashing

lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing

Re: lockless hashing