hash collisions

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: hash collisions

Post by Ovyron »

chrisw wrote: Mon Feb 17, 2020 11:19 am If there's no list (it was asserted a few posts ago that there is a huge list)
I linked you to the list. What happens is that they don't differentiate between code changes to improve the engine, to change the engine (i.e. simplifications or other "non-functional changes") and to fix bugs. For that, you'd need to check every single comment of each commit to see if it was a bug fix, and I'm sorry, but nobody is going to compile a list of "just the bug fixes" just for you.

This becomes moot if you change the definition of "bug", though. Suppose there was a bug: a useless line of code in the engine that was checked every cycle, and removing the line makes the executable 1% faster. The code was there when Stockfish 10 was shipped, so it wasn't bug-free.

But it doesn't make the program hang, crash, or play a different/losing move, and no user could really tell the speedup, because some other code may actually slow it down (i.e. code that makes the engine search fewer nodes but play stronger), so the bug fix is barely noticeable. Now you'd say this wasn't a bug? So you could do something like this for every bug on the list and claim the program was actually bug-free?

Stockfish bugs are being fixed as fast as humanly possible, but once one appears, it means it's been there for a while, and you can't get a list of remaining bugs because they haven't been discovered yet. All this means nothing if we can't agree to the definition of "bug."
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: hash collisions

Post by hgm »

mar wrote: Mon Feb 17, 2020 9:40 pm
hgm wrote: Mon Feb 17, 2020 8:46 pm I don't think this works. You can also get zero during squashing when all key extensions differ from the sought one, but all in different bits. So you would get a lot of false potential hits.

I often regret that there doesn't exist something like a 'logical multiply', which does a binary multiplication using OR on the partial products instead of adding them. A logical multiply of the signature-key XOR difference by 0xFF would then leave a 0 in the 0x8080808080808080ull bits only where the corresponding byte was all 0 (i.e. a match). It could be emulated by

Code: Select all

bhash ^= xunp;
bhash |= bhash << 1;
bhash |= bhash << 2;
bhash |= bhash << 4;
return (bhash & 0x8080808080808080ull) != 0x8080808080808080ull;
You're right, I just did a quick test and got about 970k false positives out of 1M random probes that were expected to fail. That's a disaster.
I tried what you propose instead, but it seemed to suffer from lots of false negatives, unless I implemented it incorrectly.
EDIT: correction: your version indeed works, I messed up the test itself
The method I proposed above uses 9 instructions to collect the difference bits in one place (3 x (MOVE, SHIFT, OR)). It can be done with fewer instructions, and without shifts (which can be a bottleneck, as there is usually only a single shift unit per core, as opposed to 3 ALUs), making use of carry propagation:

Code: Select all

bhash ^= xunp;
bhash |= (bhash | 0x8080808080808080ull) - 0x0101010101010101ull;
return (bhash & 0x8080808080808080ull) != 0x8080808080808080ull;
The subtraction only clears the (artificially set) uppermost bit of each extension when all the lower bits in the difference were zero, after which we OR it with the original difference itself. Only if that was also zero does a zero remain. This requires a MOVE, a SUB and 2x OR: only 4 instructions and no shifts. It might be better to just return (bhash & 0x8080808080808080ull) rather than the result of the comparison, so that the caller can test it for 0 by XORing it with 0x8080808080808080ull, and when it isn't, continue with extraction of 1 bits to identify the potential hits.

This would work just as well with 9-bit signature extensions (where it would take more shifts + ORs to collect the 9 bits that way). Or in fact for signature extensions of any size; for hash entries that really cannot be made to fit in 8 bytes you could move some less critical info like depth out of the atomic part, and pack it with the similar info of other entries in another word of the buckets, i.e. have 1 word with 6 10-bit signature extensions, one word with 6 depths, and 6 8-byte entries.
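As a standalone check, both match tests discussed above can be sketched side by side. Function and variable names here are mine, not from the posts; `packed` plays the role of the bucket's packed signature extensions and `probe` is the sought extension replicated into every byte:

```c
#include <assert.h>
#include <stdint.h>

/* Variant 1: OR-squash all 8 bits of each byte into that byte's msb
   (3 x (SHIFT, OR) after the XOR). A matching byte XORs to 0, so its
   msb stays clear. */
static int any_byte_matches_shift(uint64_t packed, uint64_t probe)
{
    uint64_t d = packed ^ probe;   /* a matching byte becomes 0 */
    d |= d << 1;
    d |= d << 2;
    d |= d << 4;                   /* msb of each byte = OR of its 8 bits */
    return (d & 0x8080808080808080ull) != 0x8080808080808080ull;
}

/* Variant 2: carry propagation, no shifts. The subtraction only clears
   the artificially set msb of a byte when its lower bits were all zero. */
static int any_byte_matches_carry(uint64_t packed, uint64_t probe)
{
    uint64_t d = packed ^ probe;
    d |= (d | 0x8080808080808080ull) - 0x0101010101010101ull;
    return (d & 0x8080808080808080ull) != 0x8080808080808080ull;
}

/* Replicate a one-byte signature extension into all 8 byte lanes. */
static uint64_t replicate(uint8_t sig)
{
    return sig * 0x0101010101010101ull;
}
```

Both variants are exact zero-byte detectors on byte lanes, so they agree on every input; the carry version just needs fewer instructions.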
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: hash collisions

Post by chrisw »

bob wrote: Tue Feb 18, 2020 3:37 am So, as usual, you want to just hand-wave the discussion away.
I prefer it to Bob's dense-paragraph waving. My rule for those has been, for a long time now: the longer the post and the more examples involving cars it includes, the more likely the one central point (usually lost by then anyway) is wrong.

This is a major problem. The entire field of software engineering sprang up to make progress. And today, the best SE can do is REDUCE the number of bugs that slip by. It is impossible to eliminate them completely.
Your argument is circular. You assume there are always more bugs, therefore you can never get to none. It's like the maths proof about infinity and the largest number: if I say N, you say N+1. If I say N+2, you say N+3, and so on.

There is not an infinite number of bugs in a developing chess engine, obviously. What you are trying to tell us is just wrong: you are telling us that we can't sit down and methodically test a finite engine of relatively low complexity in a bounded 64-square environment, finding and removing bugs one by one until we have found them all. All N of them.

I say it is possible. Signed contracts to do it. Created the infrastructure to do it. Done it. Seen others do it. Got product through the most destructive possible testing environments. Many times.

If we could, that would be the holy grail of computer science.
misuse of concept. You just mean it would be a good thing if stuff worked.

Do some reading. Learn some facts. Then you will begin to understand the problems.
Here we go again. You just can’t stop yourself with the NPD put downs, can you?
Is there a technical argument anywhere that exists where you are not personally insulting the person who technically argues with you?

Having some foreign hot-shot come and yell is NOT going to eliminate all bugs.
actually, it did, and it provided the model by which we did it ourselves: Japanese-style quality control. Which created Value, proven by Market.
Only way to do that is to eliminate all programmers, which would then eliminate all programs, which is the only way to eliminate all bugs.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: hash collisions

Post by chrisw »

Ovyron wrote: Tue Feb 18, 2020 8:53 am
chrisw wrote: Mon Feb 17, 2020 11:19 am If there's no list (it was asserted a few posts ago that there is a huge list)
I linked you to the list. What happens is that they don't differentiate between code changes to improve the engine, to change the engine (i.e. simplifications or other "non-functional changes") and to fix bugs. For that, you'd need to check every single comment of each commit to see if it was a bug fix, and I'm sorry, but nobody is going to compile a list of "just the bug fixes" just for you.
Just one would be enough. But we didn’t see it from you, only the assertion of lots of bugs in SF plus a huge list of changes where changes are not bugs.
One actual bug from SF10 release to SF11 release .....?

This becomes moot if you change the definition of "bug", though. Suppose there was a bug: a useless line of code in the engine that was checked every cycle, and removing the line makes the executable 1% faster. The code was there when Stockfish 10 was shipped, so it wasn't bug-free.

But it doesn't make the program hang, crash, or play a different/losing move, and no user could really tell the speedup, because some other code may actually slow it down (i.e. code that makes the engine search fewer nodes but play stronger), so the bug fix is barely noticeable. Now you'd say this wasn't a bug? So you could do something like this for every bug on the list and claim the program was actually bug-free?

Stockfish bugs are being fixed as fast as humanly possible, but once one appears, it means it's been there for a while, and you can't get a list of remaining bugs because they haven't been discovered yet. All this means nothing if we can't agree to the definition of "bug."
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: hash collisions

Post by mar »

hgm wrote: Tue Feb 18, 2020 10:33 am The method I proposed above uses 9 instructions to collect the difference bits in one place (3 x (MOVE, SHIFT, OR)). It can be done with fewer instructions, and without shifts (which can be a bottleneck, as there is usually only a single shift unit per core, as opposed to 3 ALUs), making use of carry propagation:

Code: Select all

bhash ^= xunp;
bhash |= (bhash | 0x8080808080808080ull) - 0x0101010101010101ull;
return (bhash & 0x8080808080808080ull) != 0x8080808080808080ull;
The subtraction only clears the (artificially set) uppermost bit of each extension when all the lower bits in the difference were zero, after which we OR it with the original difference itself. Only if that was also zero does a zero remain. This requires a MOVE, a SUB and 2x OR: only 4 instructions and no shifts. It might be better to just return (bhash & 0x8080808080808080ull) rather than the result of the comparison, so that the caller can test it for 0 by XORing it with 0x8080808080808080ull, and when it isn't, continue with extraction of 1 bits to identify the potential hits.

This would work just as well with 9-bit signature extensions (where it would take more shifts + ORs to collect the 9 bits that way). Or in fact for signature extensions of any size; for hash entries that really cannot be made to fit in 8 bytes you could move some less critical info like depth out of the atomic part, and pack it with the similar info of other entries in another word of the buckets, i.e. have 1 word with 6 10-bit signature extensions, one word with 6 depths, and 6 8-byte entries.
That's amazing! I understood your previous idea, where you squash into msbit, but I need to wrap my head around this one first (I can confirm it works).

Yes, one could use the 9-bit extended signature with this one, but:

The question is how many bits are really needed for age and how to store it; I find it tricky to pack tightly into a 64-bit dense entry.

Assuming we use an 8-bit extended signature:
we store a 24-bit part of the hash + 2 bits bound + 14 bits move + 8 bits depth (I could probably use 7 bits for depth) + 16 bits value (score), so there's not much space left for age (I don't want to cut from the hash).
We still have 7 bits left in ext_sig (1 bit wasted, but that's fine); assuming I'd only store a 7-bit depth, that'd leave me with only 2 bits for age.
Maybe I could cut one extra bit from the score.
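For illustration only, the layout described above (24 + 2 + 14 + 8 + 16 = 64 bits, so the entry is exactly full) could be packed like this. Field positions and names are my assumption, not mar's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing of the 64-bit entry described above:
   bits  0-23  24-bit hash fragment
   bits 24-25  bound type
   bits 26-39  14-bit move
   bits 40-47  8-bit depth
   bits 48-63  16-bit score (int16_t, two's complement) */
static uint64_t tt_pack(uint32_t hash24, unsigned bound, unsigned move,
                        unsigned depth, int16_t score)
{
    return (uint64_t)(hash24 & 0xFFFFFFu)
         | ((uint64_t)(bound & 0x3u)    << 24)
         | ((uint64_t)(move  & 0x3FFFu) << 26)
         | ((uint64_t)(depth & 0xFFu)   << 40)
         | ((uint64_t)(uint16_t)score   << 48);
}

static uint32_t tt_hash24(uint64_t e) { return (uint32_t)(e & 0xFFFFFFu); }
static unsigned tt_bound(uint64_t e)  { return (unsigned)((e >> 24) & 0x3u); }
static unsigned tt_move(uint64_t e)   { return (unsigned)((e >> 26) & 0x3FFFu); }
static unsigned tt_depth(uint64_t e)  { return (unsigned)((e >> 40) & 0xFFu); }
static int16_t  tt_score(uint64_t e)  { return (int16_t)(e >> 48); }
```

Putting the score in the top bits lets it round-trip through a plain cast, including negative values.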

You actually already proposed a solution, but I don't want to give up on a 7-entry bucket in favor of a 6-entry one.
Martin Sedlak
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: hash collisions

Post by hgm »

This isn't really the fault of the bucket layout. A cache line has only 512 bits, so if you want 7 entries per bucket there are only 73 bits available for each entry (plus one bit spare), no matter what. You have to economize on something to make that fit. With 32 key bits, 7 depth, 2 flags, 16 score, 14 move you already use 71.

Do you really need 16 bits for score? Fairy-Max uses INFINITY = 8000 (centiPawn), and it has never caused a problem.

In principle 13 bits should be enough for the move: one bit to indicate promotion, and in that case use two of the rank bits of the to-square to indicate the promotion choice. (In fact even 12 bits would be enough, as all moves of a 7th rank Pawn will always be promotions, so you do not need an extra flag bit to indicate it.)
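A sketch of that 13-bit encoding, with square = rank*8 + file and names of my own invention. For a promotion the to-rank is implied by the side to move, so two of the to-square's rank bits can carry the promotion choice instead:

```c
#include <assert.h>
#include <stdint.h>

/* 13-bit move: bits 0-5 from-square, bits 6-11 to-square,
   bit 12 promotion flag. On promotion, bits 9-10 (two rank bits of
   the to-square) are reused for the promotion choice, e.g. 0=N..3=Q. */
static unsigned move_pack(unsigned from, unsigned to, int promo, unsigned choice)
{
    unsigned m = from | (to << 6);
    if (promo) {
        m &= ~(0x3u << 9);         /* free two rank bits of the to-square */
        m |= (choice & 0x3u) << 9; /* store the promotion choice there */
        m |= 1u << 12;             /* promotion flag */
    }
    return m;
}

static void move_unpack(unsigned m, int white,
                        unsigned *from, unsigned *to, unsigned *choice)
{
    *from = m & 0x3Fu;
    if (m & (1u << 12)) {                /* promotion */
        unsigned file = (m >> 6) & 0x7u;
        *choice = (m >> 9) & 0x3u;
        *to = (white ? 56u : 0u) + file; /* rank implied by side to move */
    } else {
        *choice = 0;
        *to = (m >> 6) & 0x3Fu;
    }
}
```

The decoder needs to know the side to move, which the caller always has; dropping the flag bit entirely (the 12-bit version) would additionally require knowing the moving piece and its rank.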

My preferred replacement scheme does not consider all entries equivalent, but has dedicated entries for depth-preferred, always-replace and undercut treatment. Only the depth-preferred entries need aging bits; always-replace and undercut are 'self-purging'. My preferred probe scheme is to probe entries 1, 2, 4, 6 or entries 1, 3, 5, 7 depending on an otherwise unused key bit (which effectively lowers the chances for a collision, except in entry 1, which is the always-replace entry and will always contain very low depth, where collisions are especially harmless). Only 4-7 would need aging bits; 2 and 3 would be undercut entries.

I discovered the latter to be very important in long analysis sessions: the depth-preferred entries fill up, because the entire session is considered one search (or aging would be useless in the first place), so that aging doesn't work. After some time the search then has to run entirely in the always-replace entries, which makes it very sluggish, as the valuable deep entries are flushed out by QS nodes very quickly. The undercut entries hang on to deeper data much longer.
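The split probe scheme above can be sketched as a table lookup. Indices here are 0-based (the post's entries 1..7 become slots 0..6), and which key bit is "otherwise unused" is my assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Probe orders: the post's entries 1,2,4,6 and 1,3,5,7 become slots
   0,1,3,5 and 0,2,4,6. Slot 0 is the shared always-replace entry. */
static const int probe_order[2][4] = {
    { 0, 1, 3, 5 },
    { 0, 2, 4, 6 },
};

/* Select the probe order from an otherwise unused key bit
   (bit 40 here is just an assumed example). */
static const int *probe_slots(uint64_t key)
{
    return probe_order[(key >> 40) & 1];
}
```

Since positions with different values of that key bit never meet in the same slots (except slot 0), the split effectively adds one bit of signature for free.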
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: hash collisions

Post by Rebel »

bob wrote: Mon Feb 17, 2020 7:08 pm
Rebel wrote: Mon Feb 17, 2020 11:32 am Definition of a bug - A bug is when the engine doesn't follow the intention of the programmer.

When a programmer can't stand losing a game he will make the engine crash after a losing score.

It's NOT a bug.

[/troll]
No,
Yes!

[d]8/8/k7/1pK5/p1p5/B1P5/1P6/4n3 b - - 33 115
This is Komodo 10 vs Stockfish 8 during a cutechess match, the PGN states:

Code: Select all

115... Nc2 +9.92/29
We are used to calling that a bug, but is it?

The idiotic score can't be reproduced, and it's quite possible this particular position fits the subject line of this thread: a notorious case of a hash collision. And if so, then it is NOT a bug, because every programmer is well aware that one day this might happen, and so my definition [A bug is when the engine doesn't follow the intention of the programmer.] still stands.
90% of coding is debugging, the other 10% is writing bugs.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: hash collisions

Post by mar »

I've just realized that my code (I haven't touched it for the past 5 years except for minor bugfixes - no pun intended) relies on move flags (validation), so I'll probably stick with 4-entry buckets, bob's xor trick and 64-bit keys.
I still like your idea a lot - certainly worth trying, but maybe for a different project in the future.

(score could be reduced by a couple of bits for sure)
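For reference, the "xor trick" mentioned above (Hyatt's lockless hashing) can be sketched as follows; struct layout and names are mine. The stored key is XORed with the data, so a torn SMP write (key and data from different stores) is caught on probe because the recombined key no longer matches:

```c
#include <assert.h>
#include <stdint.h>

/* Two-word entry: the first word stores key ^ data instead of the key. */
typedef struct {
    uint64_t key_xor_data;
    uint64_t data;
} TTEntry;

static void tt_store(TTEntry *e, uint64_t key, uint64_t data)
{
    e->key_xor_data = key ^ data;
    e->data = data;
}

/* Returns 1 and fills *data on a hit; 0 on a mismatched or torn entry. */
static int tt_probe(const TTEntry *e, uint64_t key, uint64_t *data)
{
    if ((e->key_xor_data ^ e->data) != key)
        return 0;
    *data = e->data;
    return 1;
}
```

The trick costs one XOR per store and probe and needs no locks, at the price of not distinguishing a torn write from an ordinary signature mismatch.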

I need to search for what exactly you mean by undercut replacement. So far I found one thread here where you proposed it as a replacement scheme several years ago.
As for my replacement scheme, I simply compute a replacement score such that entries with mismatching age are the hottest candidates, then depth, and finally I penalize replacing PV nodes based on bound.
Since I treat all entries equally, there's certainly a lot of room for experiments and improvement, especially because I hash qsearch too.
I could certainly try to fit your replacement ideas to the 4-bucket TT to see how it does;
this is very tempting (especially considering that my default TT size is 4MB [bytes, not entries]).
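That replacement scoring could be sketched like this. The weights, bound enum and names are my assumptions, not mar's actual code; only the priority order (stale age first, then depth, with a PV penalty) comes from the description above:

```c
#include <assert.h>

enum { BOUND_NONE, BOUND_UPPER, BOUND_LOWER, BOUND_EXACT };

/* Higher score = better candidate for replacement. */
static int replace_score(unsigned entry_age, unsigned current_age,
                         int depth, int bound)
{
    int score = 0;
    if (entry_age != current_age)
        score += 1000;        /* stale entries are the hottest candidates */
    score -= depth;           /* prefer replacing shallow entries */
    if (bound == BOUND_EXACT)
        score -= 100;         /* assumed penalty for replacing PV nodes */
    return score;
}
```

The replacer would evaluate all entries in the bucket and overwrite the one with the highest score.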
Martin Sedlak
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: hash collisions

Post by Rebel »

chrisw wrote: Tue Feb 18, 2020 12:48 pm
bob wrote: Tue Feb 18, 2020 3:37 am Do some reading. Learn some facts. Then you will begin to understand the problems.
Here we go again. You just can’t stop yourself with the NPD put downs, can you?
Is there a technical argument anywhere that exists where you are not personally insulting the person who technically argues with you?
I am sure he uses a compiler that creates bug-free code.
90% of coding is debugging, the other 10% is writing bugs.