Gian-Carlo Pascutto wrote:
bob wrote:
It is a volatile function which is a different issue that says "everything done prior to this function has to be completed before this function is executed."
Yes, so it's a barrier. Note that the volatile in there is purely a GCC convention and doesn't have anything to do with volatile variables.
I think Bob misspoke when he called it a "volatile function", but I think he's correct about the issue. The compiler doesn't understand what semantics your asm block might have, so it is constrained in how much optimization it can do to the surrounding code. In this case, it happens to make the compiler act like there is a barrier there. But you can't rely on that in general.
Barriers are non-portable. If you need one, then you have to put one explicitly. It might be a hardware instruction (telling the hardware to wait for the earlier instructions to complete before doing the later ones), a compiler intrinsic (telling the compiler not to rearrange the order of the loads or stores across the barrier) or both (i.e. it might be a compiler intrinsic that also emits an actual hardware barrier/fence instruction).
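To make the distinction concrete, here is a rough sketch of the two kinds of barrier, assuming GCC or Clang and a C11 compiler. The names compiler_barrier, full_barrier, publish, payload and ready are made up for illustration. The empty asm with a "memory" clobber is the usual GCC idiom for a compiler-only barrier; atomic_thread_fence is the C11 way to ask for both the compiler and the hardware ordering.

```c
#include <assert.h>
#include <stdatomic.h>

/* Compiler-only barrier (GCC/Clang extension). It emits no instruction;
   the "memory" clobber tells the compiler not to move loads or stores
   across this point or keep memory values cached in registers. */
static inline void compiler_barrier(void) {
    __asm__ __volatile__("" ::: "memory");
}

/* Compiler + hardware barrier: a C11 sequentially consistent fence.
   Whether it becomes an actual fence instruction depends on the target. */
static inline void full_barrier(void) {
    atomic_thread_fence(memory_order_seq_cst);
}

static int payload;   /* hypothetical shared data */
static int ready;     /* hypothetical "data is valid" flag */

/* Single-threaded demo that only shows where each barrier would go. */
int publish(int value) {
    payload = value;
    compiler_barrier();   /* compiler may not sink the payload store below here */
    ready = 1;
    full_barrier();       /* earlier stores are ordered before anything later */
    assert(ready == 1);
    return payload;
}
```

Calling publish(42) returns 42. This only illustrates placement; for real cross-thread communication the flag itself should be an atomic object, since plain int accesses racing between threads are undefined in C11.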
Also, whether or not you need a barrier at all, and exactly what type of barrier you need in a certain situation, is going to depend on what guarantees your hardware architecture offers about its memory ordering. In general, "volatile" is not enough to meet any of the ordering requirements for synchronization between two threads (unless your compiler adds extra semantics to "volatile" accesses; I think Microsoft does this, but some other x86 compilers do not).
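For comparison, this is roughly what a portable version looks like with C11 atomics instead of "volatile". It is a minimal sketch, assuming a C11 compiler with <stdatomic.h>; producer, consumer, payload and ready are hypothetical names, and in a real program the two functions would run in different threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;                 /* plain data, published through the flag */
static atomic_bool ready = false;   /* atomic flag carrying the ordering */

/* Writer side: the release store guarantees the payload write is visible
   to any thread that later observes ready == true with an acquire load. */
void producer(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: returns true and fills *out once the payload is published. */
bool consumer(int *out) {
    assert(out);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;               /* nothing published yet */
    *out = payload;                 /* safe: ordered after the acquire load */
    return true;
}
```

The release/acquire pair tells both the compiler and the hardware what ordering is needed, so the same source works on x86 and on more weakly ordered architectures without hand-placed fences.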
bob wrote:
volatile is a compiler directive, and has nothing to do with the architecture. It simply says "this value can change spontaneously". This instructs the compiler to re-read the value from memory each time it is accessed, rather than trying to optimize and keep the value in a register. This is an issue in two common places. One is where you have a threaded/parallel application and another thread can change a value you are occasionally checking. The compiler doesn't know about the concept of threading, so it looks at your source and sees a load here from X, then a little further it sees another load from X, and concludes "OK, X is not modified between the first and second load, so I can use the first value." Not good for parallel algorithms.
Hope that helps...
You forgot to mention the other common place.
I guess that it's programming for embedded devices (or device drivers for hardware in general) where memory addresses might be mapped to a hardware port instead of RAM. That's actually what volatile was invented for in the first place -- C is a popular language for that kind of low-level programming.
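A rough sketch of that kind of use, with an ordinary variable standing in for the hardware port so it can run anywhere. The names fake_status_reg, STATUS_REG, STATUS_READY and wait_until_ready are invented for the example; on a real device the macro would point at a fixed address from the datasheet instead.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a memory-mapped status register (an assumption for the demo). */
static uint32_t fake_status_reg;

/* Access through a volatile lvalue so every read really goes to memory. */
#define STATUS_REG   (*(volatile uint32_t *)&fake_status_reg)
#define STATUS_READY 0x1u

/* Poll the register up to max_polls times; returns 1 if the ready bit
   was seen, 0 otherwise. Without volatile the compiler could hoist the
   load out of the loop and test a single stale value. */
int wait_until_ready(unsigned max_polls) {
    assert(max_polls > 0);
    for (unsigned i = 0; i < max_polls; i++) {
        if (STATUS_REG & STATUS_READY)
            return 1;
    }
    return 0;
}
```

With the fake register at zero, wait_until_ready(10) returns 0; after setting the ready bit it returns 1. Note that this only forces the reads to happen, it says nothing about ordering against other memory accesses.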