How could a compiler break the lockless hashing method?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: How could a compiler break the lockless hashing method?

Post by Rein Halbersma »

bob wrote: OK, code given and explained. Now on to the biggie. How can the compiler detect the race condition?

1. Can it prove that two different threads store at exactly the same memory address? Absolutely not. It can't even predict which table entry will be overwritten since that is a run-time decision made by HashStore().

2. Can it prove that two different threads will have the same pointer value when they use different pointer variables? How? Again, not possible at compile time.
The compiler does not have to detect the race condition. It only has to assume there is no race condition, let Sufficiently Smart Optimizations kick in, and if your program happens to have a race condition (and it does), then the undefined behavior will bite you. Please read the Boehm paper I posted in the top message of this thread. It's full of such examples that were tested on real-world compilers.
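For readers following along, the lockless XOR method under discussion can be sketched roughly like this (illustrative names and layout, not Crafty's actual code). Each entry stores the signature XORed with the data, so a torn (half-updated) entry fails validation instead of being used:

```c
#include <stdint.h>

typedef struct {
    uint64_t word1;   /* hash signature XORed with the data word */
    uint64_t word2;   /* the data word itself */
} HashEntry;

void hash_store(HashEntry *e, uint64_t key, uint64_t data) {
    e->word1 = key ^ data;   /* store the XOR "checksum" ... */
    e->word2 = data;         /* ... and the data, without any lock */
}

/* Returns 1 and fills *data only if the entry validates against key. */
int hash_probe(const HashEntry *e, uint64_t key, uint64_t *data) {
    uint64_t w1 = e->word1, w2 = e->word2;
    if ((w1 ^ w2) == key) {  /* consistent only if both words belong together */
        *data = w2;
        return 1;
    }
    return 0;
}
```

Rein's point is that nothing in this code (no atomics, no locks) tells the compiler these accesses can race, so it may assume they never do and transform them accordingly.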
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

mar wrote:
syzygy wrote: No, that would have way too much impact on optimisation possibilities. And in any case "same output" is not well-defined. Do you mean the same output as the previous version of the compiler at the same optimisation level? Or the same output as the same version of the compiler at a lower optimisation level?

Compilers may do whatever they want as long as they stick to the standards they are advertised as supporting (e.g. C99, pthreads) and as long as they honour any other explicit guarantees that the compiler writers made.
I meant that a function that writes to memory should write the same values after optimization.
Consider the following: let's say you have an application that uses a simple symmetric cipher to encrypt a memory buffer
and at some other point it decrypts the same buffer using the same cipher.
The compiler may detect that and optimize it away.
Let's say the application does this only to protect some sensitive data in memory.
I wouldn't like it at all if the compiler optimized that away.
I think that one problem is that some here seem to think the compilers have almost god-like skills in optimizing. In reality, a good human ASM programmer can whip any optimizing compiler known or likely to be known.

When I was doing some disassembly on a well-known "project" a while back, I even asked Richard Vida about a really stupid block of asm code. Neither of us could explain it. Nor could we get any compiler to reproduce it. Yet we had it right there in front of us. They are good. But "just good". No more than that.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

Rein Halbersma wrote:
bob wrote: OK, code given and explained. Now on to the biggie. How can the compiler detect the race condition?

1. Can it prove that two different threads store at exactly the same memory address? Absolutely not. It can't even predict which table entry will be overwritten since that is a run-time decision made by HashStore().

2. Can it prove that two different threads will have the same pointer value when they use different pointer variables? How? Again, not possible at compile time.
The compiler does not have to detect the race condition. It only has to assume there is no race condition, let Sufficiently Smart Optimizations kick in, and if your program happens to have a race condition (and it does), then the undefined behavior will bite you. Please read the Boehm paper I posted in the top message of this thread. It's full of such examples that were tested on real-world compilers.
That's all fine. We were, at one point, discussing "undefined behavior", which a race condition is. And the repeated conclusion was "the compiler can make demons fly out of your nose if it wants." To do so, it has to RECOGNIZE that race condition.

I have all the tools I need to keep the compiler under control. For critical race conditions, volatile solves it, since the compiler loses the ability to move stores around, optimize stores or loads away, etc. At a cost in speed. For the rest, I let it optimize until its tongue hangs out, because it is not going to cause any "undefined behavior" that I can't handle, until we reach the point where it actually says "Hey, I can prove this is a race, so I am going to do something REALLY unexpected like zero a big block of memory or whatever." So long as it tries to do the right thing, all is well. And the "undefined behavior" doesn't represent a problem, IF one thinks...
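What volatile buys here can be sketched like this (types are illustrative; note that volatile only forbids the compiler from caching, merging, or deleting the accesses, it does NOT make them atomic):

```c
#include <stdint.h>

/* A volatile-qualified entry forces the compiler to perform every
   load and store exactly as written: it may not keep word1 in a
   register across statements, fuse repeated loads, or drop a store
   it considers "dead". */
typedef struct {
    volatile uint64_t word1;
    volatile uint64_t word2;
} VolatileEntry;

uint64_t read_twice(VolatileEntry *e) {
    /* With volatile, BOTH loads of word1 must be emitted; without it,
       the compiler could legally fold them into a single load. */
    uint64_t a = e->word1;
    uint64_t b = e->word1;
    return a ^ b;   /* can be nonzero if another thread wrote in between */
}
```

Single-threaded, the function always returns 0; the point is only that the generated code must actually go to memory twice.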
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: How could a compiler break the lockless hashing method?

Post by syzygy »

bob wrote:I think that one problem is that some here seem to think the compilers have almost god-like skills in optimizing. In reality, a good human ASM programmer can whip any optimizing compiler known or likely to be known.
No, your problem is that we are able to comprehend and accept the meaning and implications of the C standard, and that you seem to be plagued by demons and other types of UB.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: How could a compiler break the lockless hashing method?

Post by Rein Halbersma »

bob wrote: You can only optimize if you can absolutely prove with a data dependency graph that there is no intervening stuff between the first XOR and the second (and every other use of course).
No. Absent language level synchronization (atomics, locks), the compiler is allowed to assume that all hash reads/writes it sees in Search() are all being done without a race condition. In principle it can optimize away the XORs (although current compilers might not be able to do it, especially with a lot of code in between).
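The kind of transformation being described can be sketched as follows (a hypothetical store-then-probe pair, not actual engine code):

```c
#include <stdint.h>

/* Under the assumption of no data races, a compiler that sees both
   the store and a subsequent probe may forward the stored values
   instead of reloading memory, algebraically collapsing
   (key ^ data) ^ data back to key.  The XOR "checksum" then silently
   stops guarding against torn entries written by another thread. */
uint64_t store_then_probe(uint64_t *word1, uint64_t *word2,
                          uint64_t key, uint64_t data) {
    *word1 = key ^ data;     /* lockless store ... */
    *word2 = data;
    /* ... and an immediate probe.  In the race-free abstract machine
       this reads back exactly what was just written, so the compiler
       may rewrite the next line as simply `return key;`. */
    return *word1 ^ *word2;
}
```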
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

syzygy wrote:
bob wrote:I think that one problem is that some here seem to think the compilers have almost god-like skills in optimizing. In reality, a good human ASM programmer can whip any optimizing compiler known or likely to be known.
No, your problem is that we are able to comprehend and accept the meaning and implications of the C standard, and that you seem to be plagued by demons and other types of UB.
As I said... we are talking about the C standard as written. The standard (the new one) mentions race conditions, something a compiler can't even detect. Seems like a really good standard to describe the behavior of something it can't even recognize???
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

Rein Halbersma wrote:
bob wrote: You can only optimize if you can absolutely prove with a data dependency graph that there is no intervening stuff between the first XOR and the second (and every other use of course).
No. Absent language level synchronization (atomics, locks), the compiler is allowed to assume that all hash reads/writes it sees in Search() are all being done without a race condition. In principle it can optimize away the XORs (although current compilers might not be able to do it, especially with a lot of code in between).
I compile search.c, hash.c and other files SEPARATELY. How can the compiler be certain there are no locks around HashStore() when it is called from search (which it can't see when compiling hash.c), and around HashProbe() when it is called?

I gave one simple example in Crafty's search. The normal search calls NextMove() normally, passing it a pointer to a non-shared split block. But at the split ply, SearchSMP() calls NextMove() AFTER acquiring a lock, and at the same time passes it a pointer to a shared split block. How can the compiler, when just compiling next.c, determine that? The answer is, it obviously cannot. There is no rule whatsoever about where locks have to be acquired or released, other than before and after the potential race condition. Nothing about how FAR before or after. So the compiler can't assume a blasted thing if there is no lock present.

In fact, how can it TELL there is a lock? There is nothing in C for that; that's part of the POSIX threads library. Oops, I don't use the pthread_mutex stuff at all, I have my own hand-coded asm lock. Will the compiler recognize every possible hand-coded asm lock? That's asking way too much.
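A hand-rolled lock of the kind described might look roughly like this; here GCC's legacy __sync builtins stand in for the inline asm (an assumption for the sketch, not Crafty's actual code):

```c
/* Minimal test-and-set spinlock.  To the optimizer this is just an
   ordinary pair of function calls; nothing in the C language marks
   it as "a lock". */
typedef volatile int spinlock_t;

static void spin_lock(spinlock_t *l) {
    /* atomically set *l to 1; keep trying while the old value was 1 */
    while (__sync_lock_test_and_set(l, 1))
        while (*l)            /* spin on a plain read to reduce bus traffic */
            ;
}

static void spin_unlock(spinlock_t *l) {
    __sync_lock_release(l);   /* store 0 with release semantics */
}
```

One caveat worth noting: the __sync builtins do carry acquire/release barrier semantics the compiler understands; a raw inline-asm lock gives the compiler even less to go on unless the asm is marked as clobbering memory.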

If you go to an OO-type language where you force the programmer into a box with how he expresses parallelism, sure, this could work. But in C I can spin off a thread with clone() in Linux, or with fork(), which starts a brand new process, not necessarily even running the same binary if I so choose. Or I can physically start two different processes a la MPI and let any of them communicate through an mmap()'ed block of shared memory, a shmat()'ed block of shared memory, or just plain old local memory, which is always shared between threads. The compiler is clueless.
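The fork()-plus-shared-memory scenario can be sketched like this (POSIX assumed; names are illustrative). Nothing in the C source tells the compiler that `table` is shared between two processes:

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on some systems */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Memory mapped MAP_SHARED | MAP_ANONYMOUS before fork() is visible
   to both parent and child. */
uint64_t *make_shared_table(size_t entries) {
    void *p = mmap(NULL, entries * sizeof(uint64_t),
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

int demo(void) {
    uint64_t *table = make_shared_table(1);
    if (!table) return -1;
    table[0] = 0;
    pid_t pid = fork();
    if (pid == 0) {            /* child writes into the shared page */
        table[0] = 42;
        _exit(0);
    }
    waitpid(pid, NULL, 0);     /* parent then sees the child's store */
    return (int)table[0];
}
```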
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: How could a compiler break the lockless hashing method?

Post by syzygy »

mar wrote:
syzygy wrote:No, that would have way too much impact on optimisation possibilities. And in any case "same output" is not well-defined. Do you mean the same output as the previous version of the compiler at the same optimisation level? Or the same output as the same version of the compiler at a lower optimisation level?

Compilers may do whatever they want as long as they stick to the standards they are advertised as supporting (e.g. C99, pthreads) and as long as they honour any other explicit guarantees that the compiler writers made.
I meant that a function that writes to memory should write the same values after optimization.
I'm afraid you are still imprecise.

What you probably mean is that in your view the compiled code, independent of optimisation level, should eventually still write the same values to memory as the abstract machine defined by the C standard would do.

The C standard in fact agrees with you:
5.1.2.3 ad 2 wrote:Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.)
So I don't think eliminating the XOR operations is allowed by the C standard.
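mar's cipher scenario, reduced to a minimal sketch (an XOR stream stands in for the "simple symmetric cipher"; names are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* XOR-"encrypt" a buffer in place; applying it twice with the same
   key is the identity. */
void xor_buf(uint8_t *buf, size_t n, uint8_t key) {
    for (size_t i = 0; i < n; i++)
        buf[i] ^= key;
}

void round_trip(uint8_t *buf, size_t n, uint8_t key) {
    xor_buf(buf, n, key);   /* "encrypt" */
    /* ... if the compiler can prove nothing observes the encrypted
       state in between, the net effect on memory is a no-op, which
       is exactly the elimination mar worries about and syzygy argues
       5.1.2.3 forbids ... */
    xor_buf(buf, n, key);   /* "decrypt" restores the plaintext */
}
```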
User avatar
hgm
Posts: 27836
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: How could a compiler break the lockless hashing method?

Post by hgm »

I don't think it can optimize anything away if Search() is in a separate file search.c. The hash table is a global variable, and it has no idea whether after the search some other routine will start reading the hash table. So it must leave the values there that the program specifies.

If the hash table is accessed through a pointer (as is usual for allocated memory), it cannot even know whether the pointer points to memory that is shared with other processes. Therefore it can never mess with the stores, or store anything other than the exact values the program specified.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: How could a compiler break the lockless hashing method?

Post by Rein Halbersma »

bob wrote:
syzygy wrote:
bob wrote:I think that one problem is that some here seem to think the compilers have almost god-like skills in optimizing. In reality, a good human ASM programmer can whip any optimizing compiler known or likely to be known.
No, your problem is that we are able to comprehend and accept the meaning and implications of the C standard, and that you seem to be plagued by demons and other types of UB.
As I said... we are talking about the C standard as written. The standard (the new one) mentions race conditions, something a compiler can't even detect. Seems like a really good standard to describe the behavior of something it can't even recognize???
You are confusing cause and effect here. The Standard defines behavior in general as "external appearance or action", and undefined behavior as "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements." So the cause is the nonportable program, and the effect is undefined behavior.

Nowhere does the Standard require that the nonportable or erroneous program construct has to be detected at compile time. But since no requirements are imposed, a compiler can assume that the program does not contain such constructs (signed overflow, data races, overlapping ranges in strcpy(), what have you). That assumption, combined with aggressive optimizations, can lead to surprising output. Such surprises are known as undefined behavior. It is not just one out of a preconceived set of reasonable outcomes; it can be any outcome.