mar wrote: matthewlai wrote: Volatile is not designed to be a synchronization primitive, though. It's supposed to be used for things like memory-mapped IO. Though it does seem like the Microsoft compiler inserts a fence for volatile accesses, probably because too many programmers assume that.
Or because it makes sense 

Btw, I'm more concerned about reordering on the CPU (compiler barrier != CPU barrier).
 
Well, the end results of CPU reordering and compiler reordering are the same - your memory accesses are reordered. When you issue a fence, it's the compiler's job to make sure the CPU doesn't reorder as well (usually by issuing a memory fence instruction).
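To make the compiler barrier vs. CPU barrier point concrete, here is a minimal C++11 sketch. The names `payload`, `ready`, `producer`, `consumer` and `run_fence_demo` are made up for illustration. `std::atomic_thread_fence` constrains both the compiler and the CPU, whereas `std::atomic_signal_fence` would only constrain the compiler. What instruction (if any) gets emitted depends on the target: on x86 an acquire or release fence typically needs no instruction at all, while on ARM it typically becomes a barrier instruction.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                   // plain, non-atomic data (hypothetical name)
std::atomic<bool> ready{false};    // flag that publishes the payload

void producer() {
    payload = 42;
    // Release fence: the payload write may not be reordered past the flag
    // store below, neither by the compiler nor by the CPU.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the flag is set
    }
    // Acquire fence: pairs with the release fence in producer().
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

int run_fence_demo() {
    std::thread t(producer);
    int seen = consumer();
    t.join();
    assert(seen == 42);            // guaranteed by the fence pairing
    return seen;
}
```

Replacing the two fences with `std::atomic_signal_fence` would still stop the compiler from reordering, but would no longer be enough on a weakly ordered CPU.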
It doesn't make sense for people using volatile as intended - to access memory-mapped IO devices. In those cases, a memory barrier introduces extra latency for no reason. If you need to stream a large buffer of data into a hardware address (through a volatile pointer), and don't have DMA, having each access trigger a memory barrier would be detrimental to performance.
The standard also doesn't say the compiler needs to do that.
Microsoft probably doesn't care about that because their compiler can't target embedded devices anyway (deeply embedded, not phones etc., which are really more like PCs), but other compilers do.
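A rough sketch of the streaming case, assuming a hypothetical 32-bit transmit register. On real hardware `reg` would point at a fixed device address; here a `static volatile` variable stands in for it so the snippet can run anywhere. Standard `volatile` only promises that each access is actually performed and that volatile accesses are not reordered against each other by the compiler. It does not ask for a CPU fence per store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Write `count` words to a device register, one store per word.
// `reg` is volatile so the compiler must emit every store, in order,
// instead of collapsing the loop into a single final write.
void stream_to_register(volatile std::uint32_t* reg,
                        const std::uint32_t* data,
                        std::size_t count) {
    assert(reg != nullptr && data != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        *reg = data[i];
    }
}

std::uint32_t demo_stream() {
    // Stand-in for a memory-mapped register (assumption for this sketch).
    static volatile std::uint32_t fake_tx_register = 0;
    const std::uint32_t buffer[] = {1, 2, 3, 4};
    stream_to_register(&fake_tx_register, buffer, 4);
    return fake_tx_register;       // last word written
}
```

If the compiler added a full barrier to every one of those stores, the loop would pay for it on each iteration, which is the latency concern above.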
Regarding atomics, what implementation of atomics are you using? For most implementations, if an access of the size you want is already atomic on the architecture in question, the compiler will optimize it to be just a plain old variable access. There should be zero overhead at run time.
Only on architectures where accesses of the size you want are not atomic should there be an overhead.
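As a sketch of what that looks like with C++11 `std::atomic` (the names `shared_value`, `publish_value` and `read_value` are invented here): `is_lock_free()` reports whether the type maps onto native atomic accesses, and with relaxed ordering the load and store typically compile to ordinary move instructions on mainstream targets. One caveat worth knowing: the default ordering is sequentially consistent, and a seq_cst store on x86 is usually emitted as an `xchg` or a store followed by `mfence`, so the "free" case assumes you ask for a weaker ordering.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> shared_value{0};  // hypothetical shared variable

// Plain atomic store: no read-modify-write involved.
void publish_value(int v) {
    shared_value.store(v, std::memory_order_relaxed);
}

// Plain atomic load.
int read_value() {
    return shared_value.load(std::memory_order_relaxed);
}

// True when the implementation uses native instructions rather than a lock.
bool int_atomics_are_native() {
    return shared_value.is_lock_free();
}
```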
I'm talking about RMW atomics, which make more sense to me, like inc/dec and CAS. IIRC the overhead for increment was 8x on x86 (lock xadd), but I don't remember exactly
(this is one of the reasons why atomic refcounting is slower than non-atomic, assuming you change the counter often).
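For reference, the two RMW operations mentioned look roughly like this in C++11. All names (`refcount`, `bump`, `atomic_max`, `run_rmw_demo`) are illustrative. On x86, `fetch_add` is typically emitted as a lock-prefixed instruction such as `lock xadd`, and that is where the extra cost over a plain increment comes from. The sketch does not try to measure the overhead.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> refcount{0};      // hypothetical shared counter

// Atomic increment: a read-modify-write, safe from several threads at once.
void bump(int times) {
    for (int i = 0; i < times; ++i) {
        refcount.fetch_add(1, std::memory_order_relaxed);
    }
}

// CAS loop: atomically raise `target` to at least `value`.
void atomic_max(std::atomic<int>& target, int value) {
    int current = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads `current`, so we just retry.
    while (current < value &&
           !target.compare_exchange_weak(current, value,
                                         std::memory_order_relaxed)) {
    }
}

int run_rmw_demo() {
    refcount.store(0);
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back(bump, 10000);
    }
    for (auto& t : threads) {
        t.join();
    }
    int total = refcount.load();
    assert(total == 40000);        // no increments lost
    return total;
}
```

With a non-atomic `int` and `++`, the same four threads would race and could lose increments.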
But why do you need that for a flag? Assignments aren't RMW, and should be atomic on most/all architectures.
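Something like the following is what I have in mind for a flag, as a sketch with made-up names (`stop_requested`, `worker`, `run_flag_demo`). Setting and polling the flag are a plain store and a plain load, with no `lock`-prefixed RMW instruction involved. Using `std::atomic<bool>` rather than a bare `bool` keeps it well-defined under the C++11 memory model.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> stop_requested{false};   // hypothetical stop flag
long worker_iterations = 0;                // only touched by the worker

void worker() {
    // Polling is just an atomic load per iteration.
    while (!stop_requested.load(std::memory_order_acquire)) {
        ++worker_iterations;
    }
}

long run_flag_demo() {
    stop_requested.store(false);
    worker_iterations = 0;
    std::thread t(worker);
    // Setting the flag is a single atomic store, not a read-modify-write.
    stop_requested.store(true, std::memory_order_release);
    t.join();                              // join makes the count safe to read
    assert(stop_requested.load());
    return worker_iterations;
}
```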
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.