hash collisions

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: hash collisions

Post by Ras »

Relying on malloc (as opposed to calloc) yielding anything other than random garbage is basically undefined behaviour, as are all other instances of using uninitialised memory. It's also how programs suddenly break after decades and programmers complain about the compiler "breaking their code" where the fact is that it has been broken all along.

True, the OS will either zero it to avoid cross-process data leaks, or give you what you already had. In the latter case, that could be as well some memory that you had other than hash tables, especially if using anything like printf, or in the case of C++, new/delete at any point.

I use memset with a non-zero value on all of the hash tables after allocation, then sync_synchronize(), then memset to 0, then sync-synchronize() again. This forces the OS to actually blend in the pages instead of doing that during search where this would go on the engine's thinking time.

As for the move validation, something like 0.7% performance penalty would result in about 0.6 Elo (assuming 60 Elo per doubling). You couldn't even measure that. Of course, there might be more impact for other engines, but my benchmark is the only one so far, the rest is talk without data.

In absence of further benchmarks, the current question is whether projected 0.6 Elo are worth the off-chance that there might be a hash table hiccup with an even smaller chance that it might have any influence on the calculated reply move. If the result were an engine crash, I cannot agree with such coding habits. If it were just an illegal line that might be calculated, but the engine keeps working and always replies with a legal move (if there is one), then it's a matter of design priorities if the programmer decides that the projected Elo loss is less than 0.6 Elo.
Rasmus Althoff
https://www.ct800.net
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: hash collisions

Post by hgm »

Well, I actually do use calloc(), so this is probably why I never had any problems. In many engines I zero the hash at the start of a new game anyway, because they are multi-variant engines, and in other variants pieces that use the same Zobrist keys might move differently. So I want to make sure I get no hits on anything from the previous game, e.g. for the initial position. And Fairy-Max also uses the hash table for storing the game history, so many positions from early in the game would survive there forever.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: hash collisions

Post by bob »

I'm going to add a couple of things that serve as responses to a few posts here.

1. I have always used a "pseudo legal move" test as far back as I can remember. Because bad moves would cause me difficulties (IE castling when impossible could add an extra king or rook, making an impossible EP capture could spawn a new pawn, unmaking an illegal capture could spawn a new piece, etc.). That was (to me) a simple thing to do.

2. Pseudo-legal move generation has long been a standard in computer chess. The only thing you have to be careful of is that in a set of moves M, a of them are perfectly legal and b of them are not. For the moves in set b, you simply need to know that making/unmaking them will not cause an issue. The best example of this is replies to check where most moves are illegal since they leave the king in check. These moves do not break anything, other than if the game could continue beyond them, bogus stuff would be backed up. Most programs avoid a problem here by noting, at the next ply, a king was captured meaning the move at the previous ply left the king in check. Return something brutal (+ infinity for the capturing side, for example) which will immediately cause that move to be rejected as outside of any possible alpha/beta window. Done this forever too and it works.

3. There is a place where pseudo-legal move generation can be replaced by a special "get out of check" move generator that only generates perfectly legal moves. I do this in Crafty. It does two things. (1) trivial to see what is available (IE want to do a one single reply to check extension? You have that information at hand right now.) (2) Avoid going through all the make - oh crap, captured a king unmake overhead. For me, in Crafty, this was a measurable performance improvement. The cost of doing the legal move generation (obviously higher than normal pseudo-legal move generation) was way more than offset by avoiding that make-recognize-unmake overhead, which was incurred on most of the pseudo-legal moves (when in check, most moves are illegal).

So for me, pseudo-legality checking (applies to killer moves and hash moves since Crafty has used a staged move generator almost as long as it has existed) makes sense. Test just enough to be sure the move won't break anything. IE simple things like "does source square on real board contain the moving piece from the encoded move? Does the destination square contain either zero or the captured piece from the move? If castling, is it actually legal (IE king and rook on right squares, castling status allow this? I even throw in capturing a king as another illegal move since it is never legal anyway.)

Years ago I used to spit out a "illegal move from hash table" if that happened. It was a pretty useful debugging tool since it indicated something had gone wrong. In current Crafty this can be done, but is turned off by default, since I only use it when testing as it indicates some sort of problem (It COULD be a hash collision but that is a very low probability, most likely it indicates that something is badly broken internally and a rash of such errors points out something has been broken in the current changes being tested.)

Seems to me this discussion has gone from a technical discussion about possible ramifications concerning hash collisions into some sort of debate on program quality and such. Whether a program searches illegal stuff or not is irrelevant. Only thing that is important is that (a) it doesn't crash due to any possible collisions and (b) searching such stuff won't affect scores or moves actually chosen. All the other stuff about correctness and such are way out there.

One final thought. I do not believe there is a single program (chess program or other application) of any size that has ZERO bugs. And the larger the program, the more of such errors are included. So getting all pedantic about the debate going on over the last ⅔ of this thread seems pretty pointless, since we live with bugs we don't know exist already. No way to drive 'em out of our programs, we can only fix one when it shows up (maybe since something might fail once, but not be reproducible, which is another reason this discussion is pretty pointless.)

My conclusions are as follows:

(1) hash collisions are not a problem from a quality of search point of view. This was convincingly shown in the collisions paper Cozzie and I wrote.

(2) bogus collisions, where you have key from position A and data/move from position B thanks to out-of-order memory writes, are slightly more probable, but again, their effect on quality of search is zero. These can be eliminated with lockless hashing and this only applies to SMP searches anyway.

(3) the only issue that remains is "can the bogus entry crash the engine?" If you can make the answer to this question "no" then all is well. There is no chance the program will crash or corrupt anything, and there is almost zero percent chance the bogus entry can affect the move played in the real game. And almost zero is not something like .1%. We are talking about probabilities that are immeasurably small. For example, if one collision every million nodes won't affect the root move chosen, then this number is incredibly small since we don't expect real collisions to be within many orders of magnitude of that already small probability. It isn't a problem so long as it won't crash the engine.

Bottom line: no crash, then no problem, nothing to worry about.
Terje
Posts: 347
Joined: Tue Nov 19, 2019 4:34 am
Location: https://github.com/TerjeKir/weiss
Full name: Terje Kirstihagen

Re: hash collisions

Post by Terje »

bob wrote: Fri Feb 14, 2020 5:09 am I'm going to add a couple of things that serve as responses to a few posts here.

...
Good summary, except the outstanding issue (to me anyway) is which approach - pseudo legal test or handle making/unmaking illegal moves stably - is best. Not in the sense that it's "correct coding practice" or whatever, but best for speed / strength of the engine. Maybe I'll try the latter approach some day to compare, but if someone does in the meantime I'd love to hear the results :)
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: hash collisions

Post by Rebel »

bob wrote: Fri Feb 14, 2020 5:09 am Bottom line: no crash, then no problem, nothing to worry about.
On average, how many collisions did you and Anthony get per one million look up's ?
90% of coding is debugging, the other 10% is writing bugs.
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: hash collisions

Post by brianr »

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: hash collisions

Post by bob »

Rebel wrote: Fri Feb 14, 2020 8:22 pm
bob wrote: Fri Feb 14, 2020 5:09 am Bottom line: no crash, then no problem, nothing to worry about.
On average, how many collisions did you and Anthony get per one million look up's ?

You need to look up the paper, if you can't I can send you a version. I think it is in html and used to be on Mike Byrne's "crafty chess" web page.

The idea was that we forced a collision every N nodes. For N we tried 10, 100, 1000, 10000, 100000, etc. Small N values did not wreck the search as expected. The smaller it got, the more interesting things got. By the time we got to 1 collision every million nodes, you could hardly detect anything had happened. Ditto for 1 every 100K nodes...
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: hash collisions

Post by jdart »

Collisions are rare even with modern big hardware. If you check moves for validity (not strict legality), most illegal moves will be weeded out. So the probability of getting an illegal move through the validity check is a small probability (of collision) times a small probability (of passing validity check). That is already small enough for practical purposes. But if you really want zero probability of an illegal move and the possible resultant problems from that you can in addition apply a full legality check, taking into account edge cases like the one I mentioned where you are evading check.

Btw. I agree with Bob's observations about no software being bug free. I have worked in commercial software development for over 30 years. Most of the systems I worked on never had less than 1000 open bugs. These are defects known to exist but not fixed yet. Some would never be fixed. That is of course not counting the bugs you don't know about. There was also a study I remember of a large codebase (Microsoft's, IIRC) that indicated that in mature, production software, there was at least one bug for every page of code.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: hash collisions

Post by chrisw »

jdart wrote: Sat Feb 15, 2020 3:33 am Collisions are rare even with modern big hardware. If you check moves for validity (not strict legality), most illegal moves will be weeded out. So the probability of getting an illegal move through the validity check is a small probability (of collision) times a small probability (of passing validity check). That is already small enough for practical purposes. But if you really want zero probability of an illegal move and the possible resultant problems from that you can in addition apply a full legality check, taking into account edge cases like the one I mentioned where you are evading check.

Btw. I agree with Bob's observations about no software being bug free. I have worked in commercial software development for over 30 years. Most of the systems I worked on never had less than 1000 open bugs. These are defects known to exist but not fixed yet. Some would never be fixed. That is of course not counting the bugs you don't know about. There was also a study I remember of a large codebase (Microsoft's, IIRC) that indicated that in mature, production software, there was at least one bug for every page of code.
That’s what all programmers say, and it is not true. A chess engine should be 100% bug free, no excuses for anything else.

The kind of “bugs” you are referring to with large codebases are systemic, institutional, organisational problems, where management and quality control have not got on top of the programmers. There’s nothing more catastrophic as letting the programmers (who are no more than skilled engineers) take control of the software production systems. Once that happens the system begins to be run FOR the programmers and to their benefit, a state almost certainly at odds with 100% error free robust output and satisfying final customer.
abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: hash collisions

Post by abulmo2 »

chrisw wrote: Sat Feb 15, 2020 1:08 pm That’s what all programmers say, and it is not true. A chess engine should be 100% bug free, no excuses for anything else.
What do you call a bug on a chess engine?
  1. Is an engine not returning the best move on a given position buggy?
  2. Is an engine with hash collisions buggy?
  3. Is an engine crashing on illegal input buggy?
  4. Or an engine crashing while playing a normal game with legal input?
(1) is an unreachable goal but it is on what most programmers are working on and each new engine version is expected to play better.
(2) at a small rate seems to not affect the playing strength of a program.
On illegal input it cannot handle (3) , a program should give an error message and eventually quit. But for many users such a behaviour is a bug. A program should not quit but guess the right input from the wrong one. So to crash or to stop working after displaying an error message is considered identical by many, and thus some programmers does not care about illegal input. Moreover, on normal legal inputs such crashes never happen.
(2) may be annoying, but a crash can just be considered a loss in game playing and so is not that much different from (1). If crashes are rare enough they will not even affect the overal strength of an engine.
Richard Delorme