How could a compiler break the lockless hashing method?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: How could a compiler break the lockless hashing method?

Post by wgarvin »

syzygy wrote:
wgarvin wrote:The least they ought to have done, was define some semantics for it with unsigned integer types, providing that it might return ANY bit pattern, but would not trap or otherwise blow up. I have a hard time imagining a C-capable platform that could not meet that requirement.
If you allow signed overflow to return ANY bit pattern, it seems difficult to imagine any program (written by a programmer well aware that signed overflow can return ANY bit pattern) that on purpose lets integers overflow.
I was not talking about integer overflow there. I was talking about shadow spin locking (spinning on a regular read instead of a locked cmpxchg) and how I suspect it is UB under C11.

If C11 had specified that a "data race" on an unsigned int resulted in undefined values instead of UB, it would be possible to implement such things without entering UB territory, and I can't really imagine an optim that would be possible because of that UB, so there wouldn't be much downside that I can see to defining it.

For integer overflow, the costs/benefits would be different, and its clear at least that the UB of integer overflow is valuable to optimizers.
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: How could a compiler break the lockless hashing method?

Post by syzygy »

mvk wrote:
syzygy wrote:If you allow signed overflow to return ANY bit pattern, it seems difficult to imagine any program (written by a programmer well aware that signed overflow can return ANY bit pattern) that on purpose lets integers overflow. In other words, any overflow would be a bug and can easily lead to crashes, buffer overflows and any other kind of unexpected behavior. There does not seem to be much to gain here.
I found a case of signed overflow in my program where this would still be ok. (...)
So in your program overflow only occurs in cases where the results are of no interest. Ok, that appears to be a valid counterexample.

Allowing any bit pattern would probably leave most if not all optimisations intact. The expression "a + 1 > a" may still be optimised to "1". The "i += i" loop may still be optimised to "while (1)".
Mind that this case has no uninitialized variables, (...)
Now that you mention them, I'm sure there are also programs that in a very similar way read out uninitialized variables without actually using the results.
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: How could a compiler break the lockless hashing method?

Post by syzygy »

wgarvin wrote:
syzygy wrote:
wgarvin wrote:The least they ought to have done, was define some semantics for it with unsigned integer types, providing that it might return ANY bit pattern, but would not trap or otherwise blow up. I have a hard time imagining a C-capable platform that could not meet that requirement.
If you allow signed overflow to return ANY bit pattern, it seems difficult to imagine any program (written by a programmer well aware that signed overflow can return ANY bit pattern) that on purpose lets integers overflow.
I was not talking about integer overflow there. I was talking about shadow spin locking (spinning on a regular read instead of a locked cmpxchg) and how I suspect it is UB under C11.
Oops, I did not read well (at all).

Yes, a regular read would give UB if some other thread writes to the same location in a manner that is "neither before nor after" the read (taking into account C11 synchronisation primitives). I don't think C11 has an equivalent for "lock cmpxchg", so the "instead" I don't completely follow. C11 also imposes no requirements on programs using inline assembly :-)

Once you are doing such "dangerous things" you are effectively outside C11 and just have to rely on the compiler and runtime environment.
If C11 had specified that a "data race" on an unsigned int resulted in undefined values instead of UB, it would be possible to implement such things without entering UB territory, and I can't really imagine an optim that would be possible because of that UB, so there wouldn't be much downside that I can see to defining it.
Not declaring it as UB would mean you can't allow the compiled code to reload the value when that would be convenient. So you do restrict the compiler at least somewhat. And since the compiler usually will not know which data races might exist, it will be prevented from reloading values from any memory location potentially known to other threads.