volatile?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

bob wrote:
syzygy wrote:
hgm wrote:As to the #include of the code, this still puzzles me. I can of course see that this helps the compiler to see what the routines do, and thus which global variables run the risk of being changed, and which are safe. But when I #include a file that really defines a routine in more than one of my source files, I usually get a 'multiply-defined symbol' linker error. How is this prevented, in this case?
If you mean #include <pthread.h>, you do not have to worry what happens below the level of the source code you have typed. If your system complies with POSIX, and you stick to the rules (i.e. #include and link in the proper way and not copy & paste from the library source), then there is no need to make variables volatile in order to prevent optimisations from introducing bugs.

In the meantime I have understood better why "volatile" and concurrency are completely orthogonal concepts. "volatile" forces the compiler to reload values from memory, but gives no guarantee whatsoever (at the C/C++ standard level) that what you read is the value that has been written by another thread. It would be perfectly fine if the value returned is the local value in the processor's cache. volatile does not enforce cache coherency.

On a POSIX-compliant system, there is a guarantee that certain primitives synchronise memory across threads (or at least work "as if" memory is synchronised at these points).
You DO realize that when I copied the pthread_mutex_lock() code I STILL had to #include <pthread.h>, correct?
Do I have to spell out everything?

POSIX TELLS YOU TO INCLUDE pthread.h AND TO LINK AGAINST THE PTHREAD LIBRARY.
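
To make that concrete, here is a minimal sketch of the pattern under discussion (the two-thread setup and the names are illustrative, not taken from any engine): a shared counter that is never declared volatile, yet is safe to share because every access is bracketed by pthread_mutex_lock()/pthread_mutex_unlock(), which POSIX defines as memory-synchronising operations.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                 /* shared, deliberately NOT volatile */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        /* pthread_mutex_lock/unlock are memory-synchronising per POSIX:
           the compiler may not cache 'counter' in a register across the
           calls, and the library/hardware make the update visible to the
           other thread. */
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  /* always 200000 */
    return 0;
}

Built with cc -pthread, this always prints 200000; neither volatile nor any explicit barrier appears in the source, because the memory synchronisation is hidden inside the mutex calls.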
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

syzygy wrote:POSIX TELLS YOU TO INCLUDE pthread.h AND TO LINK AGAINST THE PTHREAD LIBRARY.
The point was this:
syzygy wrote:For multithreaded programs using volatile is not necessary if you are properly using the synchronisation primitives of a thread library, e.g. pthreads or C++11 threads.
syzygy wrote:If "lock" is based on phtreads primitives or a similar library, which is the case I am talking about, then "lock" acts as a memory barrier. The compiler will reload "variable". No need for making it volatile.
If you don't use pthreads properly, you're on your own. Understand?

No you won't understand. This is beyond you. Go ahead and troll on.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: volatile?

Post by bob »

syzygy wrote:
hgm wrote:
syzygy wrote:"volatile" forces the compiler to reload values from memory, but gives no guarantee whatsoever (at the C/C++ standard level) that what you read is the value that has been written by another thread. It would be perfectly fine if the value returned is the local value in the processor's cache. volatile does not enforce cache coherency.
Hardware automatically forces cache coherency. There is nothing the compiler has to or can do about that.
Maybe your hardware does that, but in general it does not. Certainly the various standards do not require any automatic enforcement of cache coherency.
Eh? Please identify ONE cpu, actually used in a multiple-cpu box, where cache coherency is not handled by the cache. That's about as nonsensical an idea as I have seen you post. Coherency works on every CPU I currently have, which includes X86, Itanium, SPARC, MIPS, gone-but-not-forgotten alpha, etc. This has always been a hardware issue. There is nothing you can do in software if the hardware doesn't provide coherency; you are hopelessly lost trying to program anything on such a box.

I believe the x86 architecture has a memory model that is so software friendly (and hardware unfriendly) that the pthreads library does not have to do anything special. On other architectures this is certainly different, e.g. memory writes by one thread may be observed by other threads out of order. The pthreads primitives (or those of a comparable library) on those platforms will take care of this and the programmer will not notice anything, provided he sticks to the rules.
Hence compiler intrinsics such as on the alpha to impose a memory barrier that requires all writes to be completed before the barrier can be crossed.
Returning the local value in the core's private cache is always fine. If that wasn't the currently valid value (because it was changed in DRAM, a shared higher cache level or some other core's private cache), it would no longer be in your cache.
Already on x86 there is no general guarantee that the value read from cache is identical to the value stored in DRAM by another thread, especially if you consider multi-socket systems. What is guaranteed (on x86, not on other architectures) is that if CPU1 writes to A and then to B, and (some small time later) CPU2 reads B and then A, it will retrieve the new value from A if it retrieved the new value from B. But it is OK if it retrieves old (cached) values for both A and B or if it retrieves the old value for B and the new value for A.
It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
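
For readers following along: the barrier intrinsic mentioned above and the cache-coherency argument are different mechanisms, and it may help to see what each kind of barrier looks like in source. A sketch using GCC-style builtins (the macro names and the publish() example are mine):

/* Compiler barrier: forbids the optimiser from caching or reordering
   memory accesses across this point, but emits no machine instruction.
   It does nothing about hardware write buffers or reordering. */
#define COMPILER_BARRIER()  __asm__ __volatile__("" ::: "memory")

/* Full hardware barrier: GCC's __sync_synchronize() emits the target's
   fence instruction (mfence on x86, mb on alpha, sync on PowerPC) and
   also acts as a compiler barrier. */
#define HARDWARE_BARRIER()  __sync_synchronize()

int data;
int flag;

void publish(void)
{
    data = 42;
    HARDWARE_BARRIER();   /* on weakly ordered machines (alpha, POWER) this
                             is what keeps the write to data visible before
                             the write to flag; x86 already orders the two
                             stores, but the barrier still stops the
                             compiler from reordering them */
    flag = 1;
}

A conforming pthreads implementation hides exactly this kind of barrier inside pthread_mutex_lock()/pthread_mutex_unlock(), which is why code that sticks to the mutex calls never has to write one explicitly.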
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: volatile?

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
hgm wrote:As to the #include of the code, this still puzzles me. I can of course see that this helps the compiler to see what the routines do, and thus which global variables run the risk of being changed, and which are safe. But when I #include a file that really defines a routine in more than one of my source files, I usually get a 'multiply-defined symbol' linker error. How is this prevented, in this case?
If you mean #include <pthread.h>, you do not have to worry what happens below the level of the source code you have typed. If your system complies with POSIX, and you stick to the rules (i.e. #include and link in the proper way and not copy & paste from the library source), then there is no need to make variables volatile in order to prevent optimisations from introducing bugs.

In the meantime I have understood better why "volatile" and concurrency are completely orthogonal concepts. "volatile" forces the compiler to reload values from memory, but gives no guarantee whatsoever (at the C/C++ standard level) that what you read is the value that has been written by another thread. It would be perfectly fine if the value returned is the local value in the processor's cache. volatile does not enforce cache coherency.

On a POSIX-compliant system, there is a guarantee that certain primitives synchronise memory across threads (or at least work "as if" memory is synchronised at these points).
You DO realize that when I copied the pthread_mutex_lock() code I STILL had to #include <pthread.h>, correct?
Do I have to spell out everything?

POSIX TELLS YOU TO INCLUDE pthread.h AND TO LINK AGAINST THE PTHREAD LIBRARY.
Actually it does NOT say "link against the pthread library". Absolutely NOTHING says you can't directly include the pthread library source in your program. In fact, the library specifications demand that this be true. That's the purpose of the library, to avoid having to include the source, not PREVENT you from including the source. BTW HOW does the compiler know what you are linking against? The compiler has long since finished its job by the time that step is done. Jeez the amount of disinformation that is spread here...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: volatile?

Post by bob »

syzygy wrote:
syzygy wrote:POSIX TELLS YOU TO INCLUDE pthread.h AND TO LINK AGAINST THE PTHREAD LIBRARY.
The point was this:
syzygy wrote:For multithreaded programs using volatile is not necessary if you are properly using the synchronisation primitives of a thread library, e.g. pthreads or C++11 threads.
syzygy wrote:If "lock" is based on phtreads primitives or a similar library, which is the case I am talking about, then "lock" acts as a memory barrier. The compiler will reload "variable". No need for making it volatile.
If you don't use pthreads properly, you're on your own. Understand?

No you won't understand. This is beyond you. Go ahead and troll on.
What I wrote is what is correct. The compiler will reload ANY memory variable after a procedure is called where it cannot see the procedure's source code to determine which, if any, of the global variables are modified. Nothing to do with pthread.h, nothing to do with the pthread library, just a pure C requirement dealing with global variables...
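
That much is easy to illustrate. A sketch (the names are made up; it assumes ordinary separate compilation, i.e. no link-time optimisation that would let the compiler see into the other translation unit):

/* wait.c -- compiled on its own */
int shared_flag;                /* global, not volatile */

void do_other_work(void);       /* defined in some other .c file */

int wait_for_flag(void)
{
    while (!shared_flag) {
        /* The compiler cannot see the body of do_other_work(), so it must
           assume the call may modify any global variable, including
           shared_flag.  It therefore reloads shared_flag from memory on
           every iteration -- no volatile needed for that effect. */
        do_other_work();
    }
    return shared_flag;
}

Whether the freshly issued load then observes another thread's write in a well-defined way is the separate, hardware- and POSIX-level question the rest of this thread is arguing about.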
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

bob wrote:
syzygy wrote:
hgm wrote:
syzygy wrote:"volatile" forces the compiler to reload values from memory, but gives no guarantee whatsoever (at the C/C++ standard level) that what you read is the value that has been written by another thread. It would be perfectly fine if the value returned is the local value in the processor's cache. volatile does not enforce cache coherency.
Hardware automatically forces cache coherency. There is nothing the compiler has to or can do about that.
Maybe your hardware does that, but in general it does not. Certainly the various standards do not require any automatic enforcement of cache coherency.
Eh? Please identify ONE cpu, actually used in a multiple-cpu box, where cache coherency is not handled by the cache. That's about as nonsensical an idea as I have seen you post.
You have no facility for abstraction, so there is no way I can explain this to you any better than what I already wrote:
Certainly the various standards do not require any automatic enforcement of cache coherency.
Other people might find this interesting:
Q: Is there any guarantee by any commonly followed standard (ISO C or C++, or any of the POSIX/SUS specifications) that a variable (perhaps marked volatile), not guarded by a mutex, that is being accessed by multiple threads will become eventually consistent if it is assigned to?

A: It's going to depend on your architecture. While it is unusual to require an explicit cache flush or memory sync to ensure memory writes are visible to other threads, nothing precludes it, and I've certainly encountered platforms (including the PowerPC-based device I am currently developing for) where explicit instructions have to be executed to ensure state is flushed.

Note that thread synchronisation primitives like mutexes will perform the necessary work as required, but you don't typically actually need a thread synchronisation primitive if all you want is to ensure the state is visible without caring about consistency - just the sync / flush instruction will suffice.

EDIT: To anyone still in confusion about the volatile keyword - volatile guarantees the compiler will not generate code that explicitly caches data in registers, but this is NOT the same thing as dealing with hardware that transparently caches / reorders reads and writes. Read e.g. this or this, or this Dr. Dobb's article, or the answer to this SO question, or just pick your favourite compiler that targets a weakly consistent memory architecture like Cell, write some test code and compare what the compiler generates to what you'd need in order to ensure writes are visible to other processes.
bob wrote:Hence compiler intrinsics such as on the alpha to impose a memory barrier that requires all writes to be completed before the barrier can be crossed.
Yes, and a pthreads implementation on alpha will insert the required barriers around mutex_lock() etc.
Already on x86 there is no general guarantee that the value read from cache is identical to the value stored in DRAM by another thread, especially if you consider multi-socket systems. What is guaranteed (on x86, not on other architectures) is that if CPU1 writes to A and then to B, and (some small time later) CPU2 reads B and then A, it will retrieve the new value from A if it retrieved the new value from B. But it is OK if it retrieves old (cached) values for both A and B or if it retrieves the old value for B and the new value for A.
It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
That's the illusion guaranteed by x86. My example does not break this illusion.
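
The "you don't necessarily need a mutex, a sync/flush instruction will suffice" remark in the quoted answer corresponds, in present-day C, to C11 atomics with release/acquire ordering. A minimal sketch (C11 <stdatomic.h>; the names are mine):

#include <stdatomic.h>

int        payload;        /* ordinary, non-atomic data */
atomic_int ready;          /* flag that publishes it, initially 0 */

void producer(void)
{
    payload = 42;
    /* Release store: everything written before this store is guaranteed to
       be visible to any thread that performs an acquire load of 'ready'
       and sees the value 1.  The compiler emits whatever fence the target
       needs (a plain store on x86, lwsync before the store on PowerPC). */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

int consumer(void)
{
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                  /* spin until published */
    return payload;        /* guaranteed to read 42 */
}

Protecting both variables with a pthread mutex achieves the same visibility; the atomic flag just does it without taking a lock.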
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

bob wrote:
syzygy wrote:
syzygy wrote:POSIX TELLS YOU TO INCLUDE pthread.h AND TO LINK AGAINST THE PTHREAD LIBRARY.
The point was this:
syzygy wrote:For multithreaded programs using volatile is not necessary if you are properly using the synchronisation primitives of a thread library, e.g. pthreads or C++11 threads.
syzygy wrote:If "lock" is based on phtreads primitives or a similar library, which is the case I am talking about, then "lock" acts as a memory barrier. The compiler will reload "variable". No need for making it volatile.
If you don't use pthreads properly, you're on your own. Understand?

No you won't understand. This is beyond you. Go ahead and troll on.
What I wrote is what is correct. The compiler will reload ANY memory variable after a procedure is called where it cannot see the procedure's source code to determine which, if any, of the global variables are modified. Nothing to do with pthread.h, nothing to do with the pthread library, just a pure C requirement dealing with global variables...
I don't care what you wrote. The point is that what I wrote is correct. You can jump up and down, but that's not going to change it.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: volatile?

Post by Rein Halbersma »

bob wrote: It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
Just read the Intel blog "What could possibly go wrong?"

Guarantees by Intel hardware are worthless when the C++ compiler is not obliged to map C++ source code containing a data race onto the hardware instructions you have in mind.
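
A concrete illustration of that last sentence (hypothetical code; any optimising compiler is allowed to treat it this way):

int stop;                        /* plain int, set to 1 by another thread */
long iterations;

void worker(void)
{
    /* Nothing in this loop is opaque to the compiler: no volatile, no
       atomics, no call into another translation unit.  It may therefore
       load 'stop' once, keep the value in a register, and spin forever if
       it read 0.  Intel's cache coherency never gets a chance to help,
       because the load is simply never issued again. */
    while (!stop)
        iterations++;
}

Reading the flag under a pthread mutex, or making it _Atomic, forces the load to be issued again on every iteration and, unlike volatile, also gives the access defined cross-thread semantics.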
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

bob wrote:It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
I was wrong to say that Intel guarantees the illusion that a read will always give the last value written by any other CPU. Not even this illusion holds true in its full generality.

Suppose memory locations x and y are initialised to 0.

Now CPU1 performs a write and a read:
movl $1, x
movl y, %eax

At roughly the same time, CPU2 also performs a write and a read:
movl $1, y
movl x, %eax

Now %eax for both CPU1 and CPU2 may be 0.

How can that be? If CPU1 reads 0 from [y], it must have executed the read before CPU2 executed the write, right? So CPU1 must have executed the write even earlier, and CPU2 must have executed the read even later. That means that CPU2 can only have read 1. But in reality, it may read a 0.
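
For anyone who wants to try this on real hardware, here is a runnable version of the litmus test (a sketch; the thread setup is mine, and the relaxed C11 atomics are there only so that gcc/clang emit exactly the plain mov stores and loads shown above instead of optimising them away). The 0/0 outcome is possible because each core's store can still be sitting in its private store buffer when the other core's load executes:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

atomic_int x, y;
int r1, r2;

void *cpu1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);   /* movl $1, x   */
    r1 = atomic_load_explicit(&y, memory_order_relaxed);  /* movl y, %eax */
    return NULL;
}

void *cpu2(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);   /* movl $1, y   */
    r2 = atomic_load_explicit(&x, memory_order_relaxed);  /* movl x, %eax */
    return NULL;
}

int main(void)
{
    for (int i = 0; i < 100000; i++) {
        atomic_store_explicit(&x, 0, memory_order_relaxed);
        atomic_store_explicit(&y, 0, memory_order_relaxed);
        pthread_t t1, t2;
        pthread_create(&t1, NULL, cpu1, NULL);
        pthread_create(&t2, NULL, cpu2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (r1 == 0 && r2 == 0)
            printf("iteration %d: both CPUs read 0\n", i);
    }
    return 0;
}

Inserting atomic_thread_fence(memory_order_seq_cst) between each store and the following load (gcc compiles it to mfence on x86) rules the 0/0 result out.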
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: volatile?

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
hgm wrote:
syzygy wrote:"volatile" forces the compiler to reload values from memory, but gives no guarantee whatsoever (at the C/C++ standard level) that what you read is the value that has been written by another thread. It would be perfectly fine if the value returned is the local value in the processor's cache. volatile does not enforce cache coherency.
Hardware automatically forces cache coherency. There is nothing the compiler has to or can do about that.
Maybe your hardware does that, but in general it does not. Certainly the various standards do not require any automatic enforcement of cache coherency.
Eh? Please identify ONE cpu, actually used in a multiple-cpu box, where cache coherency is not handled by the cache. That's about as nonsensical an idea as I have seen you post.
You have no facility for abstraction, so there is no way I can explain this to you any better than what I already wrote:
Certainly the various standards do not require any automatic enforcement of cache coherency.
Other people might find this interesting:
Q: Is there any guarantee by any commonly followed standard (ISO C or C++, or any of the POSIX/SUS specifications) that a variable (perhaps marked volatile), not guarded by a mutex, that is being accessed by multiple threads will become eventually consistent if it is assigned to?

A: It's going to depend on your architecture. While it is unusual to require an explicit cache flush or memory sync to ensure memory writes are visible to other threads, nothing precludes it, and I've certainly encountered platforms (including the PowerPC-based device I am currently developing for) where explicit instructions have to be executed to ensure state is flushed.

Note that thread synchronisation primitives like mutexes will perform the necessary work as required, but you don't typically actually need a thread synchronisation primitive if all you want is to ensure the state is visible without caring about consistency - just the sync / flush instruction will suffice.

EDIT: To anyone still in confusion about the volatile keyword - volatile guarantees the compiler will not generate code that explicitly caches data in registers, but this is NOT the same thing as dealing with hardware that transparently caches / reorders reads and writes. Read e.g. this or this, or this Dr. Dobb's article, or the answer to this SO question, or just pick your favourite compiler that targets a weakly consistent memory architecture like Cell, write some test code and compare what the compiler generates to what you'd need in order to ensure writes are visible to other processes.
bob wrote:Hence compiler intrinsics such as on the alpha to impose a memory barrier that requires all writes to be completed before the barrier can be crossed.
Yes, and a pthreads implementation on alpha will insert the required barriers around mutex_lock() etc.

It absolutely does not. Tim Mann spent some time debugging that and had to insert the barrier himself when he ported Crafty to the alpha.

You expect FAR too much of the compiler. You think it knows FAR more than it really does, particularly when dealing with libraries. But let your imagination continue to run wild if you want.

Or actually TEST the stuff you write first. There are plenty of compilers around to test your various hypotheses for correctness. I posted a concrete example with Intel, produced by the latest gcc.


Already on x86 there is no general guarantee that the value read from cache is identical to the value stored in DRAM by another thread, especially if you consider multi-socket systems. What is guaranteed (on x86, not on other architectures) is that if CPU1 writes to A and then to B, and (some small time later) CPU2 reads B and then A, it will retrieve the new value from A if it retrieved the new value from B. But it is OK if it retrieves old (cached) values for both A and B or if it retrieves the old value for B and the new value for A.
It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
That's the illusion guaranteed by x86. My example does not break this illusion.