hash collisions

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Terje
Posts: 347
Joined: Tue Nov 19, 2019 4:34 am
Location: https://github.com/TerjeKir/weiss
Full name: Terje Kirstihagen

Re: hash collisions

Post by Terje »

mar wrote: Wed Feb 12, 2020 2:52 am I thought there's no need to explain this.
The only thing you had to explain was your unintuitive (wrong) usage of "random bits".
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: hash collisions

Post by chrisw »

Dann Corbit wrote: Wed Feb 12, 2020 12:05 am I complained to the Stockfish group that badly formed EPD will cause Stockfish to crash and core dump heinously, eventually filling my disk drive with core files.
Their response was: "Don't send badly formed EPD to Stockfish then."
I guess we should all expect that the millions of EPD positions we find on the internet are all properly formed. After all, a badly formed EPD record would mean that someone made a mistake in forming the EPD record.

In the commercial software world, this would be a criminal act (knowingly releasing something with a severe defect). It would involve incompetence and negligence at the very least. And if told of the problem and the problem was not addressed, possibly even malice and/or breach of contract.

So I had to come up with my own EPD purifier. I recently wrote an automatic castle-flag corrector (it removes all impossible castling flags), because it is much better to correct than merely detect when possible (KQkq on EPD records where no castling is actually possible is very common). About 97% of all e.p. flags are spurious. No matter how absurd a condition might seem (too many kings on the board, pawns on the first and last ranks, etc.), they all occur quite often in real life. This is especially true of test EPD files and problem-solving-site EPD files, because they are often created manually.
Yup. There are an awful lot of ways EPDs can be broken or internally inconsistent, including ways one never thought of and only finds when everything goes wrong. There are also a lot of different 'standards' for extra data tagged onto EPDs, scoring information and so on, lurking there to break things.
In the case of SF, frankly it would be better, recognising that "out there" contains EPD suites with problematical records, if someone wrote some thorough EPD-loading source and submitted it for inclusion. Then everything would work on a plug-and-play basis, as a matter of design philosophy. But, sure, as you write, SF is free, so.

But you get what you pay for, and I must admit the price is right when it comes to Stockfish. I have work-arounds for most of the problems I have discovered. I stopped reporting them because it is clearly a waste of time.
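For illustration, a minimal sketch of the kind of castling-flag correction Dann describes (this is not his actual tool; the board layout and piece codes are assumptions): a castling right is kept only if the matching king and rook still stand on their original squares.

```c
#include <string.h>

/* Illustrative only: an 8x8 mailbox board, sq = 8*rank + file
 * (a1 = 0, e1 = 4, h1 = 7, e8 = 60), with hypothetical piece codes. */
enum { EMPTY, WK, WR, BK, BR /* other pieces omitted */ };

/* Keep a castling flag only if the matching king and rook still
 * stand on their home squares; silently drop everything else. */
void sanitize_castling(const int board[64], char rights[5])
{
    char fixed[5] = "";   /* worst case "KQkq" + NUL */

    if (strchr(rights, 'K') && board[4] == WK && board[7] == WR)
        strcat(fixed, "K");
    if (strchr(rights, 'Q') && board[4] == WK && board[0] == WR)
        strcat(fixed, "Q");
    if (strchr(rights, 'k') && board[60] == BK && board[63] == BR)
        strcat(fixed, "k");
    if (strchr(rights, 'q') && board[60] == BK && board[56] == BR)
        strcat(fixed, "q");

    strcpy(rights, *fixed ? fixed : "-");
}
```

The same correct-rather-than-detect idea extends to the spurious e.p. flags mentioned above: keep the e.p. square only when a pawn that could actually capture en passant is present.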
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: hash collisions

Post by chrisw »

hgm wrote: Wed Feb 12, 2020 9:23 am
mar wrote: Wed Feb 12, 2020 2:52 am
Terje wrote: Tue Feb 11, 2020 4:27 pm How would a move stored in your TT be random bits?
Of course it is, if you get a hash collision, which this thread is about.
You don't have to go this far of course, what about killers? But hey at least you know they're quiet :)

The common-sense approach is of course to check that the move is pseudo-legal first (complicated), then to make sure the pseudo-legal move is legal (easy).
I thought there's no need to explain this.

So yeah, making a random move is harmless until it isn't, assuming it's vastly more likely to happen deep down the tree, where the search will compensate for it.
I think the issue here comes from a different interpretation of the word 'random'. The move there must have been a perfectly legal move when it was stored, i.e. in some other position. So you will never encounter moves like f1g4 or e4e5q.
That would depend entirely on assumptions about what the hash table memory was initialised with, if at all. It would be good practice to assume it could be filled with anything at all, including entirely random bits of junk. Assuming otherwise and getting away with it is just storing up future problems for when something else gets done differently by some apparently disconnected new or changed code.

Just filling the move field with a random bit pattern would give you such moves.

This distinction can be especially important when your move consists of more than just a from-square and to-square, but also contains a field to indicate promotion type or e.p. victim. Depending on the implementation, invalid combinations of fields could have unpredictable, unexpected and undesired effects.
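To make that concrete, here is one common style of packed move (the exact layout is an assumption, not any particular engine's format); random bits can decode into field combinations that no move generator would ever produce:

```c
#include <stdint.h>

typedef uint16_t Move;   /* hypothetical 16-bit packed move */

#define FROM(m)    ((m) & 0x3F)          /* bits 0-5  : from square */
#define TO(m)      (((m) >> 6) & 0x3F)   /* bits 6-11 : to square   */
#define PROMO(m)   (((m) >> 12) & 0x3)   /* bits 12-13: promo piece */
#define SPECIAL(m) (((m) >> 14) & 0x3)   /* bits 14-15: 0 = normal,
                                            1 = promotion, 2 = e.p.,
                                            3 = castle               */

/* A corrupted entry can decode as, e.g., "e.p. capture with a
 * promotion piece set" or "promotion starting from the 3rd rank":
 * combinations a generator never emits, so make/unmake code may
 * never have been written to survive them. */
```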
mar wrote: Wed Feb 12, 2020 3:36 am
hgm wrote: Tue Feb 11, 2020 7:34 pm Even testing only every hash move for a valid offset will take thousands of times more time.
Any data to support this claim? You still need to do some basic validation even if you postpone the check.
A simple calculation from counting the number of instructions used in the test, the number of cycles their execution takes, the nodes per second of the engine and the frequency of hash collisions / corruptions.
If searching the occasional impossible line would produce more Elo, there are many people here who would prefer it.
Only if you're lucky, which is very unlikely (unlikely implies no Elo gain, or even an Elo loss). If this produced Elo, then why even bother with move ordering?
Searching unnecessary lines (be it impossible lines or lines that could have been cut off) wastes time. This hurts more when you do it in every node (likely multiple times) than when you do it only once every billion nodes. So the amount of work you can afford to spend preventing it, before the cure gets worse than the disease, will be several billion times larger in the case of move ordering. It turns out that ordering the moves as we typically do pays off. Now this move scoring and sorting is expensive, but it takes nowhere near a billion CPU cycles (per node). So it shouldn't be a surprise that spending even a single cycle on a billion-times-smaller problem doesn't pay off.
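To attach illustrative numbers to that argument (the figures are assumptions for the sake of the arithmetic, not measurements): an engine searching 10^7 nodes per second on a 3 GHz core has about 300 cycles per node, so a 3-cycle validity test run at every node costs around 1% of the total search, while the corruption it guards against disturbs perhaps one node in a billion. Move ordering, by contrast, saves work at essentially every node, which is why it repays a vastly larger per-node investment.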
That's not true. This is not about the absolute CPU time used to fix the problem; it's about the proportion of program time spent set against the risk. Any code that defends against risk is worth including if its effect on total program time is small. A few cycles spent on testing the move offset (plus slider blockers) is well worth the removal of a correctable fault such as chess-nonsense lines in the tree.

Else you are saying to users: "I know my program can generate idiot unlawful trees, and I could fix it at negligible cost, but I don't care." Faced with a choice between employing a programmer with that attitude and one with an attitude of writing software with internal integrity testing, any corporate, or socially responsible organisation, is going to fire the former and use the latter, rightly so. It's also a bad message about sloppy programming to be giving out to new programmers here. One thing I don't quite get: since you're going to have to integrity-check all the other history-generated suggestions anyway, else you'll be generating a lot of bad tree, why not simply check everything properly, hash moves included, and have done with it?
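For concreteness, a minimal sketch of the integrity test being argued for here (the board representation, piece codes and helper names are assumptions, not anyone's actual engine): reject a hash move unless one of our pieces stands on the from-square, the to-square does not hold one of our pieces, the offset is geometrically possible for the mover, and, for sliders, the intervening squares are empty.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mailbox board: board[sq] > 0 our pieces, < 0 theirs,
 * 0 empty; sq = 8*rank + file. Piece codes are illustrative. */
enum { P = 1, N, B, R, Q, K };

static bool path_clear(const int board[64], int from, int to, int step)
{
    for (int sq = from + step; sq != to; sq += step)
        if (board[sq] != 0)
            return false;
    return true;
}

/* Cheap sanity test for a move pulled out of the hash table.
 * Not full legality: king safety is confirmed later, the easy part. */
bool tt_move_plausible(const int board[64], int from, int to)
{
    if (from == to || board[from] <= 0 || board[to] > 0)
        return false;               /* also rejects an all-zero entry */

    int dr = to / 8 - from / 8, df = to % 8 - from % 8;

    switch (board[from]) {
    case N: return (abs(dr) == 2 && abs(df) == 1) ||
                   (abs(dr) == 1 && abs(df) == 2);
    case K: return abs(dr) <= 1 && abs(df) <= 1;
    case B: if (abs(dr) != abs(df)) return false; break;
    case R: if (dr != 0 && df != 0) return false; break;
    case Q: if (dr != 0 && df != 0 && abs(dr) != abs(df)) return false;
            break;
    case P: /* pawn geometry (pushes, captures, e.p.) omitted here */
            return true;
    default: return false;
    }
    /* slider: the "blocker" test mentioned above */
    int step = (dr > 0 ? 8 : dr < 0 ? -8 : 0)
             + (df > 0 ? 1 : df < 0 ? -1 : 0);
    return path_clear(board, from, to, step);
}
```

Full legality (not leaving one's own king in check) would then be confirmed the same way as for generated moves.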
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: hash collisions

Post by chrisw »

hgm wrote: Wed Feb 12, 2020 9:03 am
chrisw wrote: Tue Feb 11, 2020 11:50 pm Ultimately, this is an example of the commercial/academic split on the ethics of not bothering to remove possible bugs/faults. I learnt to work to 100% no bugs and 100% no undefined behaviour. Because it matters. You think rare cases don't matter and speculate that it will all be okay anyway. Well, it's a philosophy that is not going to wash in a quality-control environment intolerant of faults (all successful companies), especially of faults that don't need to be there. In the real world where design matters, you'd just get told it's sloppy, and if you insist on not fixing it then bye-bye, product not accepted, cheque not sent.
Then writing chess programs is the wrong business for you.
yeah, well, history would suggest otherwise.

All existing programs play less than perfect chess, and thus play faulty moves frequently. To the user it doesn't matter at all whether an engine blunders because it has an impossible line somewhere in its search tree, or because the tree cut away the relevant node to make time for executing the code to prevent impossible lines.
Misuse of the term "fault". Well-written AB algorithm code does the best it can to find a good chess move, given that it lacks the time/space to search the entire tree. That the move is not certifiably "perfect" is NOT a fault, nor is the code "faulty".
Code that knowingly creates pieces of chess-unlawful tree, with unpredictable chessic results, when those unlawful branches could be avoided by simple integrity testing, does contain "faults".

A good example of a generally accepted error-prone technique is ignoring the game history when hashing, and only hash on the current position. This tends to mask repetitions, which in some positions is a major fault.
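A sketch of the masking effect described here (names and layout are assumptions): because the table is keyed on the position alone, a probe can return a score searched along a path on which the position had not repeated, so engines usually consult the path history before trusting the entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* keys[i] holds the Zobrist key of the position at ply i of the
 * current game + search path (hypothetical layout). */
bool is_repetition(const uint64_t keys[], int ply)
{
    /* same side to move every 2 plies; a real engine would also
     * stop at the last irreversible (pawn or capture) move */
    for (int i = ply - 4; i >= 0; i -= 2)
        if (keys[i] == keys[ply])
            return true;
    return false;
}

/* A TT probe keyed only on keys[ply] cannot see this: the stored
 * score may come from a line where the position was no repetition. */
```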

All incandescent light bulbs that have ever been sold were faulty, and burnt out after some time. Yet the companies that sold them have been very successful, because there was a huge demand even for such faulty light bulbs. Most people did not whine about the ethics of this, but just accepted the flaw as a fact of life.
Terje
Posts: 347
Joined: Tue Nov 19, 2019 4:34 am
Location: https://github.com/TerjeKir/weiss
Full name: Terje Kirstihagen

Re: hash collisions

Post by Terje »

chrisw wrote: Wed Feb 12, 2020 3:59 pm
hgm wrote: Wed Feb 12, 2020 9:23 am
mar wrote: Wed Feb 12, 2020 2:52 am
Terje wrote: Tue Feb 11, 2020 4:27 pm How would a move stored in your TT be random bits?
Of course it is, if you get a hash collision, which this thread is about.
You don't have to go this far of course, what about killers? But hey at least you know they're quiet :)

The common-sense approach is of course to check that the move is pseudo-legal first (complicated), then to make sure the pseudo-legal move is legal (easy).
I thought there's no need to explain this.

So yeah, making a random move is harmless until it isn't, assuming it's vastly more likely to happen deep down the tree, where the search will compensate for it.
I think the issue here comes from a different interpretation of the word 'random'. The move there must have been a perfectly legal move when it was stored, i.e. in some other position. So you will never encounter moves like f1g4 or e4e5q.
That would depend entirely on assumptions about what the hash table memory was initialised with, if at all. It would be good practice to assume it could be filled with anything at all, including entirely random bits of junk. Assuming otherwise and getting away with it is just storing up future problems for when something else gets done differently by some apparently disconnected new or changed code.
Is it not common to explicitly initialize the hash table to all zeroes?
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: hash collisions

Post by hgm »

chrisw wrote: Wed Feb 12, 2020 3:59 pm Else you are saying to users: "I know my program can generate idiot unlawful trees, and I could fix it at negligible cost, but I don't care."
Not at all. I would be saying: "This could be easily fixed, but the cure would be worse than the disease, so I opted for giving you the best possible product, Elo-wise".
chrisw wrote: Wed Feb 12, 2020 4:09 pm Misuse of the term "fault". Well-written AB algorithm code does the best it can to find a good chess move, given that it lacks the time/space to search the entire tree. That the move is not certifiably "perfect" is NOT a fault, nor is the code "faulty".
Code that knowingly creates pieces of chess-unlawful tree, with unpredictable chessic results, when those unlawful branches could be avoided by simple integrity testing, does contain "faults".
They would NOT do the best they can if they spent time on curing something that cannot even be called a cosmetic flaw (as it would almost certainly never become visible to the user even once in his lifetime), time taken away from expanding the search tree.

That YOU don't like what is under the hood of a program is also not a 'fault'. (Of the program, that is...)

Chess programs do occasionally blunder. It is of no concern to the user whether the cause of the blunder is searching an impossible line, or searching fewer nodes than you could have searched. The bottom line is to minimize the frequency of blunders.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: hash collisions

Post by Ovyron »

Indeed, we have the Rybka case: it had a major known bug where the engine would slow down to a crawl once the score passed a certain point, and the advice was to switch to another engine once that happened.

Instead of fixing it and providing a patch, as Vas was doing in the times of Rybka 2.3.2a (such fixes were what gave it this funny version name), he just focused on providing more Elo for the engine.

In other words, there was a big defect that made the engine useless once *any side* had a big advantage, to the point that you'd rather use the free Rybka 1.0 Beta in those positions. And nobody cared; the engine still sold well, because the distance from the second best at the time was ridiculous, and people were willing to live with any such defect.
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: hash collisions

Post by hgm »

The difference, though, is that this was something that users would really experience. I wonder how Vas would have reacted if someone had complained that Rybka was written in C, insisted that assembler would have been a better choice, and demanded that Vas cure this 'flaw'.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: hash collisions

Post by chrisw »

Terje wrote: Wed Feb 12, 2020 5:19 pm
chrisw wrote: Wed Feb 12, 2020 3:59 pm
hgm wrote: Wed Feb 12, 2020 9:23 am
mar wrote: Wed Feb 12, 2020 2:52 am
Terje wrote: Tue Feb 11, 2020 4:27 pm How would a move stored in your TT be random bits?
Of course it is, if you get a hash collision, which this thread is about.
You don't have to go this far of course, what about killers? But hey at least you know they're quiet :)

The common-sense approach is of course to check that the move is pseudo-legal first (complicated), then to make sure the pseudo-legal move is legal (easy).
I thought there's no need to explain this.

So yeah, making a random move is harmless until it isn't, assuming it's vastly more likely to happen deep down the tree, where the search will compensate for it.
I think the issue here comes from a different interpretation of the word 'random'. The move there must have been a perfectly legal move when it was stored, i.e. in some other position. So you will never encounter moves like f1g4 or e4e5q.
That would depend entirely on assumptions about what the hash table memory was initialised with, if at all. It would be good practice to assume it could be filled with anything at all, including entirely random bits of junk. Assuming otherwise and getting away with it is just storing up future problems for when something else gets done differently by some apparently disconnected new or changed code.
Is it not common to explicitly initialize the hash table to all zeroes?
In which case hgm's program would be finding moves like a1 to a1, where it would move his rook onto his own rook. Does he test for same-square moves, or does he rely on the side effect that his validity check against capturing one's own piece catches it, which a1 to a1 would be? If he doesn't, and he allows capturing his own piece with the idea of putting it back later, he may run into crash difficulties replacing the captured piece with the piece that is already there, and so on. Don't know.

Instead of even discussing all this, just perform integrity-test validation on any move that comes from history (hash included). It's pretty straightforward: sensible programmers, developers and programming teams write code which integrity-checks itself, is robust, can be put away as a black box, and is future-proof. Programmers who never really have to check their work, aren't under quality control, and say things like "the bugs are very rare", "all code is by definition buggy", "this is not a fault, it's a feature" and "generally it won't happen", along with all the other lazy, disorganised excuses, produce shambolic code that creates chaos into the future, and they spend their lives (or, more likely, the lives of the other mugs who have to use their code) fixing stuff that should never have been broken in the first place. Meanwhile, we'll send hgm off on a sightseeing trip to Mars, guided by a computer programmed to mostly work except on rare occasions.

Actually, whether to zero out the hash table at all is a question. Why do it, if you validate hash moves the same way as other history moves? And why do it only at program start; why not at every new game or every new EPD? I guess there is also the question of what value gets defaulted into the depth field.
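For what it's worth, a sketch of the interaction being discussed (the entry layout and names are assumptions): in a zeroed table the move field decodes to a1a1 and must be rejected by the same validation as any other history move, while a zero depth and a zero bound field make the entry lose every useful comparison anyway.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t key;    /* 0 will practically never match a real position */
    uint16_t move;   /* 0 decodes as from = a1, to = a1                */
    int16_t  score;
    uint8_t  depth;  /* 0 loses every depth-preference comparison      */
    uint8_t  bound;  /* 0 = "no bound", so the probe result is unused  */
} TTEntry;

/* Clearing at every new game (or new EPD) rather than only at program
 * start also removes stale entries from the previous position set. */
void tt_clear(TTEntry *table, size_t entries)
{
    memset(table, 0, entries * sizeof *table);
}
```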
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: hash collisions

Post by chrisw »

hgm wrote: Wed Feb 12, 2020 5:46 pm
chrisw wrote: Wed Feb 12, 2020 3:59 pm Else you are saying to users: "I know my program can generate idiot unlawful trees, and I could fix it at negligible cost, but I don't care."
Not at all. I would be saying: "This could be easily fixed, but the cure would be worse than the disease, so I opted for giving you the best possible product, Elo-wise".
chrisw wrote: Wed Feb 12, 2020 4:09 pm Misuse of the term "fault". Well-written AB algorithm code does the best it can to find a good chess move, given that it lacks the time/space to search the entire tree. That the move is not certifiably "perfect" is NOT a fault, nor is the code "faulty".
Code that knowingly creates pieces of chess-unlawful tree, with unpredictable chessic results, when those unlawful branches could be avoided by simple integrity testing, does contain "faults".
They would NOT do the best they can if they spent time on curing something that cannot even be called a cosmetic flaw (as it would almost certainly never become visible to the user even once in his lifetime),

What can I say? I've heard this and every other excuse for programmers not having to write robust code, leaving errors, faults, "features", bugs and whatever else unfixed, to last a lifetime.

Ultimately, I would fire you. Sorry. Door, P45.

time taken away from expanding the search tree.

That YOU don't like what is under the hood of a program is also not a 'fault'. (Of the program, that is...)
Not the point. Programmers are employed to produce stuff that works to the specifications and quality control of the employer and the quality controllers. Disaster number one: allowing programmers to be their own quality control.

Chess programs do occasionally blunder. It is of no concern to the user
You don't know that. You are not the "Universal User". Testers, quality controllers and customer-facing staff know better. You also can't predict what some customers may demand.

whether the cause of the blunder is searching an impossible line, or searching fewer nodes than you could have searched. The bottom line is to minimize the frequency of blunders.
The bottom line is to produce code that is 100% bug-free under all circumstances, including a bunch of circumstances you won't have predicted. Try the School of Hard Knocks for the appropriate lesson.