hash collisions

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: hash collisions

Post by hgm »

chrisw wrote: Tue Feb 11, 2020 3:06 pm
Terje wrote: Tue Feb 11, 2020 12:50 pm
mar wrote: Tue Feb 11, 2020 9:31 am
Terje wrote: Tue Feb 11, 2020 3:19 am Is this really better performance-wise than a (pseudo-)legality test on hash moves?
You need a full legality test for hashmoves. pseudo-legal move needs valid context (position), in the case of a hash collision, you get random junk unrelated to the position you're probing.

a pseudo-legal move comes from pseudo-legal move generator and doesn't need full validation, the only thing you need is to make sure you don't leave your own king in check

back on topic, to sum up there are several possibilities:
1) do nothing and bet that your hashkey is big enough that you get no collision and the engine doesn't crash
(xor trick may help reducing the probability since it acts as kind-of a checksum)
2) check your hashmove for legality (if you use a staged move generator)
3) use full legal movegen and simply order the hashmove first (this is what Lucas does in Demolito)
4) postpone validation the way hgm does (=make sure your engine doesn't crash on invalid hashmove)

Not sure how much time staged move generators really saves as full validation gets complicated so perhaps the best price-complexity would be #3
(plus you can easily detect single-reply for free, not that it would matter elo-wise)
No, you don't need a full legality test on hash moves. MakeMove takes care of legality testing pseudo-legal moves in the position already, assuming you use pseudo-legal movegen, thus a pseudo-legality test of the hash move is sufficient.
that would assume you already built a pseudo legal move list to check against. It’s way cheaper to carry out a validation on the assumption your hash move could be any old junk, rather than perform a genmove and then a list check. One of the advantages of generating recommendations out of history (apart from move order) is that you may be able to get away with never doing a genmove at all.

My question was, however, is 4) actually faster? Remembering pieces captured by castling, which piece promoted in case of non-pawn, searching impossible positions, etc. doesn't sound like it would be faster to me, but I might be wrong. I assume hgm has tried both, and I'd love to hear the result of such testing.

For reference, here is the pseudo-legality test in my engine. Captured pieces are encoded in the move and extracted with capturing(move), and will be empty for non-captures and en passants.
He doesn't refer to a list check. What he says is that MakeMove() would already test pseudo-legal moves for legality, for the purpose of checking the moves that come out of your normal move generator, once you get to running that. So if you use that same MakeMove() for making the hash move, you only have to test the latter for pseudo-legality to bring it on par with the moves that would be generated.

My point was that you don't even have to do that, and you only have to check the move for 'absurdities' (such as moving opponent pieces or empty squares), without having to care much where it lands. (It would be useful in this discussion to have a term for a move that is not pseudo-legal, but limits itself to relocating an own piece to an empty square or capturing an opponent with it; I propose to call this a 'regular move'.)
Terje
Posts: 347
Joined: Tue Nov 19, 2019 4:34 am
Location: https://github.com/TerjeKir/weiss
Full name: Terje Kirstihagen

Re: hash collisions

Post by Terje »

hgm wrote: Tue Feb 11, 2020 2:02 pm
Terje wrote: Tue Feb 11, 2020 12:50 pmNo, you don't need a full legality test on hash moves. MakeMove takes care of legality testing pseudo-legal moves in the position already, assuming you use pseudo-legal movegen, thus a pseudo-legality test of the hash move is sufficient.
It is especially the pseudo-legality that is mostly irrelevant. It is hard to imagine code that would break when you move a Knight as a Rook, or have a Bishop jump over 3 pieces. What could be trouble is moving a piece of the wrong color (checking yourself unexpectedly and thus undetected!). Or perhaps capturing a piece of your own (although a careful implementation of MakeMove can usually avoid that without any overhead).
My question was, however, is 4) actually faster? Remembering pieces captured by castling, which piece promoted in case of non-pawn, searching impossible positions, etc. doesn't sound like it would be faster to me, but I might be wrong. I assume hgm has tried both, and I'd love to hear the result of such testing.
Castling and promotion are rare in the tree, (in fact castling might not occur in the entire game, as it was already done in the book line!) so even doing very complex things that only apply to these special cases has no measurable impact on the speed at all. And the required actions to make the code robust are very cheap. My MakeMove() would normally record the type of the moved piece anyway, in order to put it back. So it is just a matter of making sure you put back that piece, rather than the one on the to-square or a hard-coded Pawn. That doesn't really involve any extra work.

Searching 'impossible positions' (or, more accurately, positions that should not have occurred in the tree at that point) would only be done in the case of a hash corruption, which should be even rarer. Like an extra QS every few minutes. You don't want your engine to crash every few minutes, but a few microseconds slowdown per minute is entirely acceptable.

So in practice, what you save is what the the legality-testing of the hash move took.

P.S.

Why do you distinguish the cases KNIGHT, BISHOP, ROOK and QUEEN when you are doing the same thing in all of those?

You don't seem to purely test pseudo-legality here, as having a different capture victim doesn't affect pseudo-legality of the move.
Thanks for the response, that is interesting. I might give it a shot in the future.

The AttackBB function is a fairly recent addition, previously they weren't as similar. Good call, I will see if I can consolidate them into one case :)

And yes, it tests that the move actually matches the position a bit better, by checking that the intended capture is indeed what would be captured if it were played now. TakeMove places back the piece the move says was captured (move stored as history and used to determine how to take it back), so this is required given the current implementation.
Terje
Posts: 347
Joined: Tue Nov 19, 2019 4:34 am
Location: https://github.com/TerjeKir/weiss
Full name: Terje Kirstihagen

Re: hash collisions

Post by Terje »

mar wrote: Tue Feb 11, 2020 2:57 pm
Terje wrote: Tue Feb 11, 2020 12:50 pm No, you don't need a full legality test on hash moves. MakeMove takes care of legality testing pseudo-legal moves in the position already, assuming you use pseudo-legal movegen, thus a pseudo-legality test of the hash move is sufficient.
I certainly do hashmove/killer legality check inside my staged movegen (before next move is selected), but that doesn't really change anything.
(killers use a slightly more relaxed validation)

you need to consider a "move" that comes from a hash collision as a random sequence of bits,
it's not what I'd call pseudo-legal in a given position (random != pseudo-legal),
so for me a pseudo-legal move has a valid source/target, ep cap, promotion, castling etc. => no need to check anything except that it doesn't leave own king in check, which is much easier to do than a full validation (obviously)
How would a move stored in your TT be random bits?
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: hash collisions

Post by chrisw »

Terje wrote: Tue Feb 11, 2020 4:27 pm
mar wrote: Tue Feb 11, 2020 2:57 pm
Terje wrote: Tue Feb 11, 2020 12:50 pm No, you don't need a full legality test on hash moves. MakeMove takes care of legality testing pseudo-legal moves in the position already, assuming you use pseudo-legal movegen, thus a pseudo-legality test of the hash move is sufficient.
I certainly do hashmove/killer legality check inside my staged movegen (before next move is selected), but that doesn't really change anything.
(killers use a slightly more relaxed validation)

you need to consider a "move" that comes from a hash collision as a random sequence of bits,
it's not what I'd call pseudo-legal in a given position (random != pseudo-legal),
so for me a pseudo-legal move has a valid source/target, ep cap, promotion, castling etc. => no need to check anything except that it doesn't leave own king in check, which is much easier to do than a full validation (obviously)
How would a move stored in your TT be random bits?
It could be squares separated by a knight offset that just happened to have been there when the origins square contains a not-knight, for example
It’s safest to assume “any old junk” could be there (what did the initialiser put there in first place, eg)
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: hash collisions

Post by hgm »

That is not 'random bits'. So you move a Rook (say) as a Knight. Totally harmless.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: hash collisions

Post by chrisw »

hgm wrote: Tue Feb 11, 2020 5:44 pm That is not 'random bits'. So you move a Rook (say) as a Knight. Totally harmless.
I'm assuming origin, destination are in integer range 0-63, "random bits" would give any of 4096 possible 'moves'. Unless you're arguing to build a search tree which contains, say g3 to h7, as a move, thus with unreachable lines, then at some stage those 'to' squares are going to have to be checked off against possible destination squares for the piece type. Since all this stuff can be cheaply done at verification in line with checking for square occupancy and so on, there's really no good reason to leave valid destination square offset (and possible in-between bit occupation) to a later test (if you propose to do that test at all, which is by no means clear from your posts). Not crashing the engine is one thing, not producing impossible search lines is another (imho).
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: hash collisions

Post by chrisw »

hgm wrote: Tue Feb 11, 2020 3:27 pm
chrisw wrote: Tue Feb 11, 2020 3:06 pm
Terje wrote: Tue Feb 11, 2020 12:50 pm
mar wrote: Tue Feb 11, 2020 9:31 am
Terje wrote: Tue Feb 11, 2020 3:19 am Is this really better performance-wise than a (pseudo-)legality test on hash moves?
You need a full legality test for hashmoves. pseudo-legal move needs valid context (position), in the case of a hash collision, you get random junk unrelated to the position you're probing.

a pseudo-legal move comes from pseudo-legal move generator and doesn't need full validation, the only thing you need is to make sure you don't leave your own king in check

back on topic, to sum up there are several possibilities:
1) do nothing and bet that your hashkey is big enough that you get no collision and the engine doesn't crash
(xor trick may help reducing the probability since it acts as kind-of a checksum)
2) check your hashmove for legality (if you use a staged move generator)
3) use full legal movegen and simply order the hashmove first (this is what Lucas does in Demolito)
4) postpone validation the way hgm does (=make sure your engine doesn't crash on invalid hashmove)

Not sure how much time staged move generators really saves as full validation gets complicated so perhaps the best price-complexity would be #3
(plus you can easily detect single-reply for free, not that it would matter elo-wise)
No, you don't need a full legality test on hash moves. MakeMove takes care of legality testing pseudo-legal moves in the position already, assuming you use pseudo-legal movegen, thus a pseudo-legality test of the hash move is sufficient.
that would assume you already built a pseudo legal move list to check against. It’s way cheaper to carry out a validation on the assumption your hash move could be any old junk, rather than perform a genmove and then a list check. One of the advantages of generating recommendations out of history (apart from move order) is that you may be able to get away with never doing a genmove at all.

My question was, however, is 4) actually faster? Remembering pieces captured by castling, which piece promoted in case of non-pawn, searching impossible positions, etc. doesn't sound like it would be faster to me, but I might be wrong. I assume hgm has tried both, and I'd love to hear the result of such testing.

For reference, here is the pseudo-legality test in my engine. Captured pieces are encoded in the move and extracted with capturing(move), and will be empty for non-captures and en passants.
He doesn't refer to a list check. What he says is that MakeMove() would already test pseudo-legal moves for legality, for the purpose of checking the moves that come out of your normal move generator, once you get to running that. So if you use that same MakeMove() for making the hash move, you only have to test the latter for pseudo-legality to bring it on par with the moves that would be generated.
Yes.

My point was that you don't even have to do that, and you only have to check the move for 'absurdities' (such as moving opponent pieces or empty squares), without having to care much where it lands. (It would be useful in this discussion to have a term for a move that is not pseudo-legal, but limits itself to relocating an own piece to an empty square or capturing an opponent with it; I propose to call this a 'regular move'.)
Unless your MakeMove is testing for origin-destination bad offset (and blocked in-between squares), feeding it potential rules-of-chess-impossible moves is going to produce impossible lines (even though engine doesn't crash).
While if MakeMove is testing for bad offset, then many moves it receives from the pseudo-legal list will be unnecessarily tested.

Still not getting why not program the engine to the conceptually clear maxim of Validate-History-Moves at the point a proposed History-Move is selected for forward processing and not in bits, here, there, and somewhere else later. It feels like asking for trouble.
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: hash collisions

Post by hgm »

Well, so once every so many million (billion?) nodes you search an impossible line, without crashing the engine, and the result will be discarded long before it reaches the root with 99.9999% probability. So what? Even testing only every hash move for a valid offset will take thousands times more time.

If searching the occasional impossible line would produce more Elo, there are many people here that will prefer it.

This whole invalid-hash-move business is only an issue because it can crash the engine.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: hash collisions

Post by chrisw »

hgm wrote: Tue Feb 11, 2020 7:34 pm Well, so once every so many million (billion?) nodes you search an impossible line, without crashing the engine, and the result will be discarded long before it reaches the root with 99.9999% probability. So what? Even testing only every hash move for a valid offset will take thousands times more time.

If searching the occasional impossible line would produce more Elo, there are many people here that will prefer it.

This whole invalid-hash-move business is only an issue because it can crash the engine.
Ultimately, this is an example of commercial/academic split on the ethics of not bothering to remove possible bugs/faults. I learnt to work to 100% no bugs and 100% no undefined behaviours. Because it matters. You think rare cases don’t matter and speculate that it will all be okay anyway. Well, it’s a philosophy that is not going to wash in a quality control environment intolerant of faults (all successful companies), especially those that don’t need to be there. In the real world where design matters, you’ld just get told it’s sloppy and if you insist on not fixing then bye-bye, product not accepted, cheque not sent.
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: hash collisions

Post by Dann Corbit »

I complained to the Stockfish group that badly formed EPD will cause Stockfish to crash and core dump heinously, eventually filling my disk drive with core files.
Their response was: "Don't send badly formed EPD to Stockfish then."
I guess we should all expect that the millions of EPD positions we find on the internet are all properly formed. After all, a badly formed EPD record would mean that someone made a mistake in forming the EPD record.

In the commercial software world, this would be a criminal act (knowingly releasing something with a severe defect). It would involve incompetence and negligence at the very least. And if told of the problem and the problem was not addressed, possibly even malice and/or breach of contract.

So I had to come up with my own EPD purifier. I recently wrote an automatic Castle bug corrector (removes all impossible castle flags) because it is much better to correct than detect when possible {KQkq on EPD records with no possible castle operations is very common.}. About 97% of all e.p. flags are spurious. No matter how absurd a condition might seem (too many kings on the board, pawns on first and last rank, etc.) they occur quite often in real life. This is especially true of test EPD files and problem solving site EPD files because they are often created manually.

But you get what you pay for, and I must admit the price is right when it comes to Stockfish. I have work-arounds for most of the problems I have discovered. I stopped reporting them because it is clearly a waste of time.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.