hash collisions

hgm · Post by **hgm** » Wed Feb 12, 2020 10:42 pm

chrisw wrote: ↑Wed Feb 12, 2020 8:15 pmUltimately, I would fire you. Sorry. Door, P45.

Well, so I would start my own company and run you out of business.

Not the point. Programmers are employed to produce stuff that works to the specifications and quality control of the employer and the quality controllers. Disaster one: allow programmers to be their own quality control.

Well, in any case I would be the one dictating the specifications. I have not much interest in being a programmer ordered to make stupid programs...

You don't know. You are not the "Universal User". Tester, quality controllers and customer facing staff know better. You also can't predict what some customers may demand.

Simple logic. What they cannot see in any way will not matter to them.

You seem to describe a parallel universe different from the one we live in. Already many decades ago dedicated chess computers were succesfully sold with the explicit instruction that using the hash table, without which it would play hundreds of Elo weaker, could occasionally lead to faulty moves or lines, and that to be really sure (e.g. when solving puzzles), you'd better run them without using hash.

The bottom line is to produce code that is 100% bug free under all circumstances, including a bunch of circumstances you won't have predicted. Try the School of Hard Knocks for the appropriate lesson.

So according to you chess programmers cannot use hash tables, as these would occasionally mask repetition draws. Dream on...

hgm · Post by **hgm** » Wed Feb 12, 2020 11:01 pm

chrisw wrote: ↑Wed Feb 12, 2020 8:04 pmIn which case hgm program would be finding moves like a1 to a1, where he would move his rook to his rook. Does he test for same square moves, or possibly he relies on the side effect that he does a validity check on capturing own piece, which a1 to a1 would be. If he doesn't, and allows capture own piece with the idea of replacing it back later, he may run into crash difficulties replacing the captured piece with the piece that is already there and so on.

I usually take care to do things in a way / order that would make it resistant to such things. E.g. put the piece that was on the from-square in MakeMove back, rather than the one on the to-square at the time of UnMake. (The latter would be wrong anyway, in the case of a promotion.) Put back the piece on the from-square after clearing the to-square (to make sure the piece would not disappear if these are the same, as they can for instance be for Lion captures in Chu Shogi).

The case you mention is actually solved by interpreting 0 as an invalid hash move. Not every hash entry will have a hash move (it could have resulted from a fail-low). Since I don't use separate entries for fail-low and fail-high results, all entries will have a move field. So I need some code that can be put in the hash field to indicate "don't try this as a hash move", and code that obeys this instruction. By using 0 for that, hash moves from unused entries that were initialized to 0 (as in my experience they always are after malloc(); I think the standard demands that, and in any case OS security demands some sort of initialization for memory allocated by a program, and using something different from 0 for that would be plainly malicious) will never be tried. So it is really a non-problem.

Terje · Post by **Terje** » Thu Feb 13, 2020 4:50 am

chrisw wrote: ↑Wed Feb 12, 2020 8:04 pm
Terje wrote: ↑Wed Feb 12, 2020 5:19 pm Is it not common to explicitly initialize the hash table to all zeroes?
Actually zeroing out the hash table is a question. Why do it if you validate hash moves in same way as other history moves? And why do it at program start, why not every newgame or every new epd? I guess there is the question of what value is defaulted into the depth field.

Even ignoring the problems of not zeroing the non-move fields, zeroing the hash allows you to assume moves found in the table are either real moves, or 0. This simplifies the validation immensely.

Doing it after allocating and on ucinewgame is the norm afaik.

chrisw · Post by **chrisw** » Thu Feb 13, 2020 6:42 am

Terje wrote: ↑Thu Feb 13, 2020 4:50 am
chrisw wrote: ↑Wed Feb 12, 2020 8:04 pm
Terje wrote: ↑Wed Feb 12, 2020 5:19 pm Is it not common to explicitly initialize the hash table to all zeroes?
Actually zeroing out the hash table is a question. Why do it if you validate hash moves in same way as other history moves? And why do it at program start, why not every newgame or every new epd? I guess there is the question of what value is defaulted into the depth field.
Even ignoring the problems of not zeroing the non-move fields, zeroing the hash allows you to assume moves found in the table are either real moves, or 0.

except real move for knight != real move for rook and so on, thus you’re still faced with needing to make sure the move offset matches the piece type on the origin square.

I don’t get the difficulty with the validation checker.
If not ourpieces & s1_bb return false
If ourpieces & s2_bb return false
If not movemask[s1][s1_type][colour] & s2_bb return false
If inbetween[s1][s2] return false

Now you have a valid move, valid offset, just remains to check off special cases, capture, ep and castles.
The test for the move offset above is two lines and two precomputed tables (both of which you’ll be needing for other code anyway).

And in the above code an initialisation with zeros, or any S1=s2 values, will save you some cycles, until the hash or more or less full up, in the case of a clash. But the save is infrequent and small, and doesn’t mean you can skip the following offset matching test.

This simplifies the validation immensely.

Doing it after allocating and on ucinewgame is the norm afaik.

chrisw · Post by **chrisw** » Thu Feb 13, 2020 6:43 am

hgm wrote: ↑Wed Feb 12, 2020 11:01 pm
chrisw wrote: ↑Wed Feb 12, 2020 8:04 pmIn which case hgm program would be finding moves like a1 to a1, where he would move his rook to his rook. Does he test for same square moves, or possibly he relies on the side effect that he does a validity check on capturing own piece, which a1 to a1 would be. If he doesn't, and allows capture own piece with the idea of replacing it back later, he may run into crash difficulties replacing the captured piece with the piece that is already there and so on.
I usually take care to do things in a way / order that would make it resistant to such things. E.g. put the piece that was on the from-square in MakeMove back, rather than the one on the to-square at the time of UnMake. (The latter would be wrong anyway, in the case of a promotion.) Put back the piece on the from-square after clearing the to-square (to make sure the piece would not disappear if these are the same, as they can for instance be for Lion captures in Chu Shogi).

The case you mention is actually solved by interpreting 0 as an invalid hash move. Not every hash entry will have a hash move (it could have resulted from a fail-low). Since I don't use separate entries for fail-low and fail-high results, all entries will have a move field. So I need some code that can be put in the hash field to indicate "don't try this as a hash move", and code that obeys this instruction. By using 0 for that, hash moves from unused entries that were initialized to 0 (as in my experience they always are after malloc(); I think the standard demands that, and in any case OS security demands some sort of initialization for memory allocated by a program, and using something different from 0 for that would be plainly malicious) will never be tried. So it is really a non-problem.

Wing and a prayer programming

chrisw · Post by **chrisw** » Thu Feb 13, 2020 6:56 am

hgm wrote: ↑Wed Feb 12, 2020 10:42 pm
chrisw wrote: ↑Wed Feb 12, 2020 8:15 pmUltimately, I would fire you. Sorry. Door, P45.
Well, so I would start my own company and run you out of business.

Not the point. Programmers are employed to produce stuff that works to the specifications and quality control of the employer and the quality controllers. Disaster one: allow programmers to be their own quality control.
Well, in any case I would be the one dictating the specifications. I have not much interest in being a programmer ordered to make stupid programs...

well, I can only repeat, if you are your own quality controller, then quality of your program depends on your attitude combined with the amount of skin in the game you have in for ensuring quality. From commentary I’m not optimistic.

You don't know. You are not the "Universal User". Tester, quality controllers and customer facing staff know better. You also can't predict what some customers may demand.
Simple logic. What they cannot see in any way will not matter to them.

well, what you do for your own code is one thing, but you’re advocating really bad practice here which is not good for a computer science academic.

You seem to describe a parallel universe different from the one we live in.

silly. who is “we” here?

Already many decades ago dedicated chess computers were succesfully sold with the explicit instruction that using the hash table, without which it would play hundreds of Elo weaker, could occasionally lead to faulty moves or lines, and that to be really sure (e.g. when solving puzzles), you'd better run them without using hash.

meaningless data point rejected.

The bottom line is to produce code that is 100% bug free under all circumstances, including a bunch of circumstances you won't have predicted. Try the School of Hard Knocks for the appropriate lesson.
So according to you chess programmers cannot use hash tables, as these would occasionally mask repetition draws. Dream on...

well, that’s a piece of made up nonsense.
Bottom line is you’re advocating bad practice.

hgm · Post by **hgm** » Thu Feb 13, 2020 8:22 am

More like you are advocating poor performance for no other reason than paranoia...

Hard facts are that searching an occasional (like once in a million nodes) impossible line will or can be made completely invisible to any user, and as Bob's research has shown will have no measurable impact on move quality. 'Specifications' that stipulate it exist only in your mind; no one here writes chess programs for a boss who meddles with how they should work internally. Every strong program in fact does it: null move is illegal and creates an impossible line. Users can and thus will not complain about what they cannot know.

Your whole argument is based on the delusion that 'quality' is a measure of how well something conforms to your preferences.

mar · Post by **mar** » Thu Feb 13, 2020 11:28 am

I was wrong about the "random bits", but my move validator is immune to any random combination regardless (I verified this by brute-force) and I'm not going to change it.
What I meant was that a pseudo-legal (even legal) move in position A is not even pseudo-legal in position B (it may be but the probability is low).

When it comes to zeroing memory, any sane OS will clear the physical memory once you touch the page (for security reasons, so that you don't get to see any pages that were used by another process) - or maybe it'd clear the page on free, that would make more sense as it would be safer,
but I very much doubt malloc clears memory by default (that would be slow because since we have paging, you only get physical memory once you touch it) - the same applies to stack which grows on demand until a per thread hard limit (iOS has different defaults for main thread and for user-defined threads AFAIK).

If I wanted a "safe" heap, then I'd clear on free (just in case there was a plaintext password string stored there (for example, applies to any kind of sensitive data) and someone would be able to read your process' memory) and do nothing for alloc/realloc.
You still need worry about the stack though, I've read a fun story where a compiler optimized away a memset on a stack buffer.

Anyway, I always clear the hashtable on resize (and I assume everyone else does) so I don't see any real problem here.
In theory you could rehash (if you store the full key in TT), but you would have to make sure the reallocation will start at the same pointer (otherwise you'd waste memory) - but I don't think users change TT size often once the engine is loaded.

hgm · Post by **hgm** » Thu Feb 13, 2020 11:40 am

You are right, malloc() should not do this, but the OS. It is just that my engines only malloc and free memory for the TT, so even reallocating a TT can give only entries that contained a possible chess move or no move at all. The problem after reallocation is not different from what you have after starting a new game with the same hash table.

mar · Post by **mar** » Thu Feb 13, 2020 12:24 pm

Hmm, I think you should not assume anything about the data you get from malloc. calloc or malloc + explicit zeroing shouldn't be a big deal.
I don't know about what glibc does, but microsoft C runtime uses special fillers in debug mode,
so a block after malloc gets filled with 0xcd, after free it's filled with something else (0xfeee?).
This is useful because you can immediately spot problems like dangling pointers when debugging.

hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions