How could a compiler break the lockless hashing method?

wgarvin · Post by **wgarvin** » Thu Dec 12, 2013 1:28 am

bob wrote:

bob wrote:BTW, for the divides the compiler in unoptimized mode produces the undefined behavior right shift. Go figure. It is undefined if I do it, but apparently not if the compiler does it. How exactly does that work, again???
Can you REALLY not answer this yourself?
No I really can't figure it out. Actually I already knew the answer. This one was actually handled slightly better by the standards committee. Shifting a negative value right is not undefined behavior. It is implementation dependent. Seems we get way too many claims of UB.

Uhhh.. I hope you were going for humor here, right? Otherwise this is a textbook-perfect example of one of your "nonsense" posts that I complained about before.

Ronald knows the actual answer of course, but in case anyone is still reading along who doesn't know, I will answer your question. (note: I'm not really answering it for bob's benefit, since its abundantly clear by now that the part of his brain where he stores this kind of stuff is set to 'read only'.)

bob wrote:It is undefined if I do it, but apparently not if the compiler does it. How exactly does that work, again???

You are not the compiler. Your obligations under the standard and the compiler's obligations under the standard are two completely different things. The compiler's job is to generate x86 machine code that has the standard-described semantics for all of the things that the standard defines (and can do whatever the compiler vendor wants in the cases that the standard says are undefined). The compiler really is writing x86 machine code and can--actually, must--rely on the x86 machine semantics you are so fond of, to meet its obligations according to the standard. YOU, on the other hand, are a mere mortal human C programmer writing C code. This is NOT the "high-level assembler" language you seem to think it is, but a language whose semantics are clearly spelled out in the standard. So YOU don't get to rely on x86 machine semantics for your C code. All the semantics you get is what the language standard promises to give you, or what the compiler vendors decide to add to it themselves. If this isn't enough for you, better switch back to FORTRAN or write the next version of Crafty in assembler so you can take direct advantage of x86 signed integer overflow semantics!

See the difference? The compiler generates machine code. YOU write source code in a language that you think is C. Other programmers like you do the same, with the added bonus that some of them actually understand how to write programs that are actual legal C programs. But you seem to be laboring under the bizarre delusion that your C program's semantics will somehow automatically match the semantics of the x86 machine underneath. A language-spec-defined subset of them does match, but in general, they are two completely different animals that just happen to partly overlap. And they ALWAYS have been, even if you managed to program in C for over 30 years without ever noticing (an impressive feat, by the way!

)

rbarreira · Post by **rbarreira** » Thu Dec 12, 2013 8:36 am

bob wrote:
rbarreira wrote:
bob wrote:
rbarreira wrote:
wgarvin wrote: I don't believe the compiler writers would bother to do this stuff if it didn't actually speed up some actual programs out there. Their job is to generate the best possible code while complying with the spec. Our job is to write programs, but if we want those programs to work reliably, we also have to comply with the spec.
It's actually trivial to give examples of where not using Bob's favorite instruction (1-operand imul) results in better performance. The easiest would be a situation where the compiler is using the edx register and doesn't want it clobbered. A slightly more complicated one would be auto-vectorization of code, where the pmullud SSE instruction would be used.
Read my post AGAIN. If you do a divide, you ARE going to use edx. Period. Therefore a single-operand mul makes perfect sense since you need a quadword to do a 32 bit idiv anyway. If you don't do that form of imul, you get to move the number to eax, then follow that with a cdq (another instruction that extends eax into edx, then do the idiv.

So with a divide, you use edx or else don't do the divide. Unless you decide to go to SSE or whatever. Which is yet another can of worms.
But if I write "x * y / 2" then idiv is not needed. So that's just wrong.
What are you going to do? Shift a signed integer to the right? And get undefined behavior? or possibly try to optimize out the idiv and use a shrd instruction? Either one works for me. the shrd would be faster, but hey, aren't you advocating "no undefined behavior"???

How are you going to divide by 2?

Use the shr / sar instruction, obviously. It's not undefined in x86 assembly, as Wylie just explained.

bob · Post by **bob** » Thu Dec 12, 2013 7:12 pm

rbarreira wrote:
bob wrote:
rbarreira wrote:
bob wrote:
rbarreira wrote:
wgarvin wrote: I don't believe the compiler writers would bother to do this stuff if it didn't actually speed up some actual programs out there. Their job is to generate the best possible code while complying with the spec. Our job is to write programs, but if we want those programs to work reliably, we also have to comply with the spec.
It's actually trivial to give examples of where not using Bob's favorite instruction (1-operand imul) results in better performance. The easiest would be a situation where the compiler is using the edx register and doesn't want it clobbered. A slightly more complicated one would be auto-vectorization of code, where the pmullud SSE instruction would be used.
Read my post AGAIN. If you do a divide, you ARE going to use edx. Period. Therefore a single-operand mul makes perfect sense since you need a quadword to do a 32 bit idiv anyway. If you don't do that form of imul, you get to move the number to eax, then follow that with a cdq (another instruction that extends eax into edx, then do the idiv.

So with a divide, you use edx or else don't do the divide. Unless you decide to go to SSE or whatever. Which is yet another can of worms.
But if I write "x * y / 2" then idiv is not needed. So that's just wrong.
What are you going to do? Shift a signed integer to the right? And get undefined behavior? or possibly try to optimize out the idiv and use a shrd instruction? Either one works for me. the shrd would be faster, but hey, aren't you advocating "no undefined behavior"???

How are you going to divide by 2?
Use the shr / sar instruction, obviously. It's not undefined in x86 assembly, as Wylie just explained.

I was quoting the "undefined behavior" comment from someone else. I had already pointed out it is not undefined, it is "implementation dependent".

And better NOT use shr, btw. That will certainly not come close to working. Not if the number is negative.

bob · Post by **bob** » Thu Dec 12, 2013 7:17 pm

wgarvin wrote:
bob wrote:

bob wrote:BTW, for the divides the compiler in unoptimized mode produces the undefined behavior right shift. Go figure. It is undefined if I do it, but apparently not if the compiler does it. How exactly does that work, again???
Can you REALLY not answer this yourself?
No I really can't figure it out. Actually I already knew the answer. This one was actually handled slightly better by the standards committee. Shifting a negative value right is not undefined behavior. It is implementation dependent. Seems we get way too many claims of UB.
Uhhh.. I hope you were going for humor here, right? Otherwise this is a textbook-perfect example of one of your "nonsense" posts that I complained about before.

Ronald knows the actual answer of course, but in case anyone is still reading along who doesn't know, I will answer your question. (note: I'm not really answering it for bob's benefit, since its abundantly clear by now that the part of his brain where he stores this kind of stuff is set to 'read only'.)

bob wrote:It is undefined if I do it, but apparently not if the compiler does it. How exactly does that work, again???
You are not the compiler. Your obligations under the standard and the compiler's obligations under the standard are two completely different things. The compiler's job is to generate x86 machine code that has the standard-described semantics for all of the things that the standard defines (and can do whatever the compiler vendor wants in the cases that the standard says are undefined). The compiler really is writing x86 machine code and can--actually, must--rely on the x86 machine semantics you are so fond of, to meet its obligations according to the standard. YOU, on the other hand, are a mere mortal human C programmer writing C code. This is NOT the "high-level assembler" language you seem to think it is, but a language whose semantics are clearly spelled out in the standard. So YOU don't get to rely on x86 machine semantics for your C code. All the semantics you get is what the language standard promises to give you, or what the compiler vendors decide to add to it themselves. If this isn't enough for you, better switch back to FORTRAN or write the next version of Crafty in assembler so you can take direct advantage of x86 signed integer overflow semantics!

See the difference? The compiler generates machine code. YOU write source code in a language that you think is C. Other programmers like you do the same, with the added bonus that some of them actually understand how to write programs that are actual legal C programs. But you seem to be laboring under the bizarre delusion that your C program's semantics will somehow automatically match the semantics of the x86 machine underneath. A language-spec-defined subset of them does match, but in general, they are two completely different animals that just happen to partly overlap. And they ALWAYS have been, even if you managed to program in C for over 30 years without ever noticing (an impressive feat, by the way! )

Somewhere in there or in another post I had pointed out this was not "undefined behavior" as someone else had claimed. It is "implementation dependent". Much of it was obviously "in jest".

BTW the semantics are anything BUT "clearly spelled out in the standards." I don't consider "undefined behavior" to be "clearly spelled out." Particularly when things like signed integer overflow are clearly defined in the x86 specs.

syzygy · Post by **syzygy** » Thu Dec 12, 2013 7:24 pm

bob wrote:BTW the semantics are anything BUT "clearly spelled out in the standards." I don't consider "undefined behavior" to be "clearly spelled out."

The whole point of UB is that the semantics of a C program is undefined once it exhibits any kind of UB.

Particularly when things like signed integer overflow are clearly defined in the x86 specs.

Very relevant for the semantics of an x86 assembly program, but completely irrelevant for the semantics of a C program.

This distinction is important. Maybe I should repeat it.

Very relevant for the semantics of an x86 assembly program, but completely irrelevant for the semantics of a C program.

There.

wgarvin · Post by **wgarvin** » Thu Dec 12, 2013 7:32 pm

bob wrote:Somewhere in there or in another post I had pointed out this was not "undefined behavior" as someone else had claimed. It is "implementation dependent". Much of it was obviously "in jest".

Ok, fair enough. I think I was mostly reacting to this part:

bob wrote: It is undefined if I do it, but apparently not if the compiler does it. How exactly does that work, again???

and it seemed like you were implying those two scenarios were the same or even remotely similar. But since that is obviously absurd, you would never imply that. Right???

jdart · Post by **jdart** » Thu Dec 12, 2013 8:25 pm

If that happens, what is the result? Nothing worse than the original hash collision problem we all tolerate when we elect to use 64 bit signatures for a position that can not be uniquely represented using 64 bits. The paper Cozzie and I wrote show that even one false match every 1000 nodes has a minimal effect on a program, where this is a false match every N billion nodes where N is pretty big itself.

How bad a failure is depends on what you use the hash entry for. One reason to query the hashtable is to get a PV move to search first. Obtaining and executing an illegal move for the current position can be catastrophic. My program also stores the in check status for a position and getting this wrong can be very serious, since you don't want to execute the evasion move generator when not in check. The validity of this info in the table does not depend on draft.

What I have found recently is that it is apparently unsafe to use this info while relying on lockless hashing alone to guarantee consistent retrieval. The move info may be usable if you do some checks on it for validity (I actually do this but an experiment removing it caused crashes). The check status may need to be computed w/o using the cached value.

--Jon

abulmo · Post by **abulmo** » Thu Dec 12, 2013 9:14 pm

syzygy wrote:
bob wrote:BTW the semantics are anything BUT "clearly spelled out in the standards." I don't consider "undefined behavior" to be "clearly spelled out."
The whole point of UB is that the semantics of a C program is undefined once it exhibits any kind of UB.

Particularly when things like signed integer overflow are clearly defined in the x86 specs.
Very relevant for the semantics of an x86 assembly program, but completely irrelevant for the semantics of a C program.

This distinction is important. Maybe I should repeat it.

Very relevant for the semantics of an x86 assembly program, but completely irrelevant for the semantics of a C progrwam.

There.

It looks like the misunderstanding comes from the important source of undefined behaviour being the binary representation of negative numbers. Actually the problem is the possibility offered by the C standard of trap representations. An integer overflow can lead to a trap representation and abort the program, for example. Or anything else allowed by undefined behaviour. gcc invoked the -ftrapv option is an example of such a compiler.

bob · Post by **bob** » Thu Dec 12, 2013 9:16 pm

jdart wrote:
If that happens, what is the result? Nothing worse than the original hash collision problem we all tolerate when we elect to use 64 bit signatures for a position that can not be uniquely represented using 64 bits. The paper Cozzie and I wrote show that even one false match every 1000 nodes has a minimal effect on a program, where this is a false match every N billion nodes where N is pretty big itself.
How bad a failure is depends on what you use the hash entry for. One reason to query the hashtable is to get a PV move to search first. Obtaining and executing an illegal move for the current position can be catastrophic. My program also stores the in check status for a position and getting this wrong can be very serious, since you don't want to execute the evasion move generator when not in check. The validity of this info in the table does not depend on draft.

What I have found recently is that it is apparently unsafe to use this info while relying on lockless hashing alone to guarantee consistent retrieval. The move info may be usable if you do some checks on it for validity (I actually do this but an experiment removing it caused crashes). The check status may need to be computed w/o using the cached value.

--Jon

I've run literally zillions of nodes testing lockless hashing on big boxes, and have not found any failures at all. I have not successfully created a hash collision error, in fact, a case where the signatures match but the positions do not. Obviously that can happen.

However, if you have data that you can not safely validity check, and where an error is catastrophic, I would presume that ANY form of hashing would therefore be considered unusable, because one can prove that hash collisions will occur, just not their frequency, since hash signatures are not particularly uniformly distributed across the entire 64 bit space...

I've always had to validity-check moves, because an illegal move will wreck the chessboard in an unrepairable way, leaving the program with broken information that will completely wreck the search. Without lockless hashing, using 12 cores, I get a large number of bad hash move errors. With just one thread, searching 12x longer, I get zero. With lockless, I may see one or two per game. I'll run that test again to confirm.

bob · Post by **bob** » Thu Dec 12, 2013 9:21 pm

syzygy wrote:
bob wrote:BTW the semantics are anything BUT "clearly spelled out in the standards." I don't consider "undefined behavior" to be "clearly spelled out."
The whole point of UB is that the semantics of a C program is undefined once it exhibits any kind of UB.

Particularly when things like signed integer overflow are clearly defined in the x86 specs.
Very relevant for the semantics of an x86 assembly program, but completely irrelevant for the semantics of a C program.

This distinction is important. Maybe I should repeat it.

Very relevant for the semantics of an x86 assembly program, but completely irrelevant for the semantics of a C program.

There.

Fine. Yet I am writing C on a compiler written for the X86, where I expect the program to run on x86, so I don't expect oddball things to break that work perfectly ON x86. And the compiler COULD do exactly that. It SHOULD do exactly that. But a few keep repeating the "but undefined behavior means anything can happen" which is actually a false statement much of the time. Certainly integer overflow on any hardware from the last 30 years, excepting for Univac which has been dead for about that long, is not "undefined at all. It is very clearly specified. And I see no reason for a compiler to not revert to "OK, on this hardware this is what happens, so let's do that, as opposed to doing something whacko, just because we can.

I realize that there are those of us in software development that have reasonable interpretations of how things SHOULD be done, and there are those that lather up when they see things that mention "undefined behavior" in the spec. I consider that to be poor. Why hide capabilities of the hardware I paid for, just because the spec is vague, when the compiler COULD "do the right thing." I know what your answer will be, the robotic "undefined behavior means..." You can accept that nonsense if you like, I simply do not like it, and do not agree with it...

How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?