volatile?

Discussion of chess software programming and technical issues.

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: volatile?

Post by bob »

Rein Halbersma wrote:
bob wrote: It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
Just read the Intel blog "What could possibly go wrong?"

Guarantees by Intel hardware are worthless when the C++ compiler is not obliged to map C++ source code with a race to the hardware instructions that you have in mind.
Hardware simply guarantees that when ANY write is done, any read occurring ANYWHERE after that write will get the new value. Period. That's all I count on, and it is 100% reliable and unbreakable. This is the case on ANY MP system that claims to provide cache coherency, which includes everything being made today that is actually capable of having more than one CPU. Picking some non-SMP compatible CPU and plugging two of 'em into a hand-blown system is not going to work since the caches are not aware of the potential problems. But who cares about home-grown systems? Intel, AMD, Alpha, PowerPC, MIPS, SPARC, et al. all do this perfectly, as they should, when the multiple-CPU versions of the chips/caches are used.

I don't care about the oddball boxes one can build that are broken.
bob

Re: volatile?

Post by bob »

syzygy wrote:
bob wrote:It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
I was wrong to say that Intel guarantees the illusion that a read will always give the last value written by any other CPU. Not even this illusion holds true in its full generality.

Suppose memory locations x and y are initialised to 0.

Now CPU1 performs a write and a read:
mov $1, [x]
mov [y], %eax

At roughly the same time, CPU2 also performs a write and a read:
mov $1, [y]
mov [x], %eax

Now %eax for both CPU1 and CPU2 may be 0.

How can that be? If CPU1 reads 0 from [y], it must have executed the read before CPU2 executed the write, right? So CPU1 must have executed the write even earlier, and CPU2 must have executed the read even later. That means that CPU2 can only have read 1. But in reality, it may read a 0.
That is a trivial example that is well known. Has absolutely nothing to do with current discussion which would only be about ONE variable. You will NEVER get an old value with Intel.

The basis for the discussion has been optimization of memory loads across procedures, and a bunch of testing has me 100% convinced that:

(1) if the compiler can not see EVERYTHING resulting from a procedure call, such as library code that is invisible since it has already been compiled without the compiler having access to the original source, then no memory values will be carried across the procedure call. Every last one will be re-loaded, if they sit in a global memory area where they could be modified by something below this point in the call tree;

(2) if the compiler can see everything, and it verifies that a value is not modified in the call tree, it will continue to use it if it was able to preserve it in a register.

(3) other than the above two, nothing else matters. There is no special handling for library functions like pthread_mutex_lock() and friends; they simply cause reloads in every case because the compiler can not see whether they modify global variables or not.

The above is quite easy to test. I explained exactly what I did to test it. I tried a dozen different tricks and could not fool the compiler into keeping global values across procedure calls unless I let the compiler see everything in the call tree.

Yes, you can write code without volatile. Yes, you can write code without atomic locks. And yes, both can work correctly. But NOT on all architectures. Which, to me, is a major problem. I don't want to debug something every time someone offers me a new platform to run on. Crafty works, as is, on every platform on planet earth so far. That was my goal.
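
Points (1) and (2) above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the test that was actually run: all names (shared_flag, opaque_call, visible_call, bump_flag) are hypothetical, and to keep everything in one file the "library call the compiler can not see into" is imitated by a call through a function pointer with external linkage. Compiling with something like gcc -O2 -S and reading the assembly should typically show two loads of shared_flag in the first function and only one in the second, though the exact output depends on compiler and flags.

```c
/* Hypothetical global that some other code might modify. */
int shared_flag = 0;

/* Stand-in for separately compiled library code. */
static void bump_flag(void)
{
    shared_flag++;
}

/* Externally visible, non-const function pointer: without whole-program
   analysis the compiler can not know what a call through it will do. */
void (*opaque_call)(void) = bump_flag;

/* Body is visible and provably does not touch shared_flag. */
static void visible_call(void)
{
}

int sum_across_opaque(void)
{
    int before = shared_flag;
    opaque_call();               /* may have changed shared_flag ...      */
    return before + shared_flag; /* ... so the value must be loaded again */
}

int sum_across_visible(void)
{
    int before = shared_flag;
    visible_call();              /* compiler can see nothing changes      */
    return before + shared_flag; /* value may simply stay in a register   */
}
```

Note that nothing here is specific to pthreads: the reload in sum_across_opaque() comes purely from the compiler being unable to see the callee.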
bob

Re: volatile?

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
syzygy wrote:POSIX TELLS YOU TO INCLUDE pthread.h AND TO LINK AGAINST THE PTHREAD LIBRARY.
The point was this:
syzygy wrote:For multithreaded programs using volatile is not necessary if you are properly using the synchronisation primitives of a thread library, e.g. pthreads or C++11 threads.
syzygy wrote:If "lock" is based on pthreads primitives or a similar library, which is the case I am talking about, then "lock" acts as a memory barrier. The compiler will reload "variable". No need for making it volatile.
If you don't use pthreads properly, you're on your own. Understand?

No you won't understand. This is beyond you. Go ahead and troll on.
What I wrote is what is correct. The compiler will reload ANY memory variable after a procedure is called where it can not see the procedure's source code to determine which, if any, of the global variables are modified. Nothing to do with pthread.h, nothing to do with pthread library, just a pure C requirement dealing with global variables...
I don't care what you wrote. The point is that what I wrote is correct. You can jump up and down, but that's not going to change it.
I'll repeat what I wrote. Compiler does NOT treat pthread_mutex_lock() any differently than any OTHER procedure call. Period.
bob

Re: volatile?

Post by bob »

Tom Likens wrote:
lucasart wrote: The reason I ask, is because I was wondering what this obscure "volatile" keyword really means. I read this short article:
https://www.kernel.org/doc/Documentatio ... armful.txt
Essentially they make the point that
In properly-written code, volatile can only serve to slow things down
They say that volatile has nothing to do with concurrency, it's almost never correct to use it, and the lock is enough.

On the other hand, Stockfish also declares all shared variables as volatile. And I know that Marco is much more knowledgeable than I am in C++, especially when it comes to multi-threading. So I can't help wondering if there isn't indeed a good reason for all this volatile stuff.
It's for hardware access. When you have program memory mapped in the hardware space (i.e. memory that can be accessed by hardware registers) outside the program, the volatile keyword lets you know that the value of the register could change out from under you.

We use it all the time on PCB (printed circuit board) projects where we have external hardware registers that can program shared hardware/software registers and memory locations. I've never used it for a simple program only situation and see no real use for it if external hardware isn't involved.

regards,
--tom
Here is the simplest example of all to understand. Parallel search. You want some method to tell the search "Stop searching, time is up."

Simplest approach is a variable declared like this:

volatile int stop;

Initialize it to zero.

At the top of search, for each new node,

if (stop) { /* do any cleanup, then return to abort the current search */ }

If you don't make stop volatile, the compiler is free to load the value, test it once, and then never test it again, since that code never modifies the value. With volatile, you are saying "another thread or process can modify this, so load it from memory every time I reference this stop variable," and it works flawlessly. I don't want to take a lock to access it: done once per node, that would cost time and gain absolutely nothing in correctness.

Volatile has its place, used correctly. Like any programming tool, it can be misused and cause problems...
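
A minimal sketch of the stop-flag pattern described above, using pthreads. Only "volatile int stop" comes from the post; the function names search() and run_search_demo() and the node counter are hypothetical stand-ins for a real search. One caveat worth stating: ISO C11 does not define volatile as a thread synchronization mechanism, so this relies on how compilers and cache-coherent hardware behave in practice; the standard's own tool for such a flag is an atomic_int from <stdatomic.h>.

```c
#include <pthread.h>
#include <stddef.h>

/* Set to 1 by the controlling thread when time is up. */
volatile int stop = 0;

/* Hypothetical stand-in for the search: "visit nodes" until told to stop. */
static void *search(void *arg)
{
    unsigned long *nodes = arg;
    for (;;) {
        if (stop)        /* volatile: loaded from memory on every pass */
            return NULL; /* abort the current search                   */
        ++*nodes;        /* pretend to search one node                 */
    }
}

/* Starts a search thread, tells it to stop, and waits for it to finish.
   Returns the number of nodes visited before the flag was seen. */
unsigned long run_search_demo(void)
{
    pthread_t tid;
    unsigned long nodes = 0;

    stop = 0;
    if (pthread_create(&tid, NULL, search, &nodes) != 0)
        return 0;
    stop = 1;                /* "Stop searching, time is up."        */
    pthread_join(tid, NULL); /* returns only once the flag was seen  */
    return nodes;
}
```

The node count returned will vary from run to run; the point is only that the loop terminates without any lock being taken per node.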
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

bob wrote:
syzygy wrote:Yes, and a pthreads implementation on alpha will insert the required barriers around mutex_lock() etc.
Absolutely does not. Tim Mann spent some time debugging that and had to do the barrier himself when he ported Crafty to the alpha.
If that is true, and if Crafty actually properly used pthreads, then the alpha environment did not comply with POSIX. That is something I cannot exclude.
You expect FAR too much of the compiler. You think it knows FAR more than it really does, particularly when dealing with libraries. But let your imagination continue to run wild if you want.
The only thing I expect is that my system is POSIX compliant.
POSIX guarantees that pthread primitives provide all the compiler and hardware memory barriers that are needed.

How this is achieved is none of my concern when I am working at the POSIX abstraction level.
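
As an illustration of working at that abstraction level, here is a small sketch in which the shared variable is a plain long, not volatile, and every access happens between pthread_mutex_lock() and pthread_mutex_unlock(). The names (counter, worker, run_counter_demo, INCREMENTS) are made up for this example and are not taken from Crafty or any other engine.

```c
#include <pthread.h>
#include <stddef.h>

#define INCREMENTS 100000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0; /* shared, deliberately NOT volatile */

static void *worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&lock);   /* synchronizes memory per POSIX */
        counter++;                   /* only touched while locked     */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Runs two workers and returns the final count, or -1 on thread
   creation failure. */
long run_counter_demo(void)
{
    pthread_t a, b;

    counter = 0;
    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter; /* 2 * INCREMENTS when every access was locked */
}
```

On a POSIX-compliant system the result should always be 2 * INCREMENTS, whatever mechanism the implementation uses internally to provide the barriers.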
Or actually TEST the stuff you write first. There are plenty of compilers around to test your various hypotheses for correctness. I posted a concrete example with Intel, produced by latest gcc.
Your program was not POSIX compliant. Get it? No, you don't.
syzygy

Re: volatile?

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
I was wrong to say that Intel guarantees the illusion that a read will always give the last value written by any other CPU. Not even this illusion holds true in its full generality.

Suppose memory locations x and y are initialised to 0.

Now CPU1 performs a write and a read:
mov $1, [x]
mov [y], %eax

At roughly the same time, CPU2 also performs a write and a read:
mov $1, [y]
mov [x], %eax

Now %eax for both CPU1 and CPU2 may be 0.

How can that be? If CPU1 reads 0 from [y], it must have executed the read before CPU2 executed the write, right? So CPU1 must have executed the write even earlier, and CPU2 must have executed the read even later. That means that CPU2 can only have read 1. But in reality, it may read a 0.
That is a trivial example that is well known. Has absolutely nothing to do with current discussion which would only be about ONE variable. You will NEVER get an old value with Intel.
This is what you wrote:
bob wrote:On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
It is wrong.
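
For anyone who wants to try the x/y example, here is a rough C11 sketch of it. Assumptions: the names cpu1, cpu2, count_both_zero and use_fence are invented for this illustration, and relaxed atomics are used so that the C code itself has no data race while still compiling to plain moves on x86. Creating fresh threads for every trial makes the both-zero outcome rare, so a real litmus test would keep the threads alive and align them much more tightly; seeing a count of zero in the unfenced case proves nothing.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r1, r2;    /* each written by one thread, read after join   */
static int use_fence; /* set before the threads start, then read-only  */

static void *cpu1(void *unused)
{
    (void)unused;
    atomic_store_explicit(&x, 1, memory_order_relaxed);   /* mov $1, [x] */
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst);        /* full fence  */
    r1 = atomic_load_explicit(&y, memory_order_relaxed);  /* mov [y], r  */
    return NULL;
}

static void *cpu2(void *unused)
{
    (void)unused;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the experiment `trials` times and returns how often both reads
   returned 0, or -1 if a thread could not be created. */
int count_both_zero(int fenced, int trials)
{
    int hits = 0;

    use_fence = fenced;
    for (int i = 0; i < trials; i++) {
        pthread_t t1, t2;

        atomic_store(&x, 0);
        atomic_store(&y, 0);
        if (pthread_create(&t1, NULL, cpu1, NULL) != 0)
            return -1;
        if (pthread_create(&t2, NULL, cpu2, NULL) != 0) {
            pthread_join(t1, NULL);
            return -1;
        }
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (r1 == 0 && r2 == 0)
            hits++;
    }
    return hits;
}
```

With fenced set to 1, a sequentially consistent fence sits between the write and the read in both threads, and the both-zero outcome is then ruled out by the C11 memory model. With fenced set to 0 it is permitted, both by the language and by the store buffers on x86.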
syzygy

Re: volatile?

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:
syzygy wrote:
syzygy wrote:POSIX TELLS YOU TO INCLUDE pthread.h AND TO LINK AGAINST THE PTHREAD LIBRARY.
The point was this:
syzygy wrote:For multithreaded programs using volatile is not necessary if you are properly using the synchronisation primitives of a thread library, e.g. pthreads or C++11 threads.
syzygy wrote:If "lock" is based on pthreads primitives or a similar library, which is the case I am talking about, then "lock" acts as a memory barrier. The compiler will reload "variable". No need for making it volatile.
If you don't use pthreads properly, you're on your own. Understand?

No you won't understand. This is beyond you. Go ahead and troll on.
What I wrote is what is correct. The compiler will reload ANY memory variable after a procedure is called where it can not see the procedure's source code to determine which, if any, of the global variables are modified. Nothing to do with pthread.h, nothing to do with pthread library, just a pure C requirement dealing with global variables...
I don't care what you wrote. The point is that what I wrote is correct. You can jump up and down, but that's not going to change it.
I'll repeat what I wrote. Compiler does NOT treat pthread_mutex_lock() any differently than any OTHER procedure call. Period.
Who cares!

Look at my two (identical) statements.
syzygy wrote:For multithreaded programs using volatile is not necessary if you are properly using the synchronisation primitives of a thread library, e.g. pthreads or C++11 threads.
syzygy wrote:If "lock" is based on pthreads primitives or a similar library, which is the case I am talking about, then "lock" acts as a memory barrier. The compiler will reload "variable". No need for making it volatile.
They are correct.

Agreed?
bob

Re: volatile?

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
I was wrong to say that Intel guarantees the illusion that a read will always give the last value written by any other CPU. Not even this illusion holds true in its full generality.

Suppose memory locations x and y are initialised to 0.

Now CPU1 performs a write and a read:
mov $1, [x]
mov [y], %eax

At roughly the same time, CPU2 also performs a write and a read:
mov $1, [y]
mov [x], %eax

Now %eax for both CPU1 and CPU2 may be 0.

How can that be? If CPU1 reads 0 from [y], it must have executed the read before CPU2 executed the write, right? So CPU1 must have executed the write even earlier, and CPU2 must have executed the read even later. That means that CPU2 can only have read 1. But in reality, it may read a 0.
That is a trivial example that is well known. Has absolutely nothing to do with current discussion which would only be about ONE variable. You will NEVER get an old value with Intel.
This is what you wrote:
bob wrote:On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
It is wrong.
Sorry, it is absolutely correct. Just look up their MESIF cache coherency protocol; it will explain EXACTLY why it is guaranteed to be true. That is the very definition of "cache coherent NUMA".
bob

Re: volatile?

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:
syzygy wrote:
syzygy wrote:POSIX TELLS YOU TO INCLUDE pthread.h AND TO LINK AGAINST THE PTHREAD LIBRARY.
The point was this:
syzygy wrote:For multithreaded programs using volatile is not necessary if you are properly using the synchronisation primitives of a thread library, e.g. pthreads or C++11 threads.
syzygy wrote:If "lock" is based on pthreads primitives or a similar library, which is the case I am talking about, then "lock" acts as a memory barrier. The compiler will reload "variable". No need for making it volatile.
If you don't use pthreads properly, you're on your own. Understand?

No you won't understand. This is beyond you. Go ahead and troll on.
What I wrote is what is correct. The compiler will reload ANY memory variable after a procedure is called where it can not see the procedure's source code to determine which, if any, of the global variables are modified. Nothing to do with pthread.h, nothing to do with pthread library, just a pure C requirement dealing with global variables...
I don't care what you wrote. The point is that what I wrote is correct. You can jump up and down, but that's not going to change it.
I'll repeat what I wrote. Compiler does NOT treat pthread_mutex_lock() any differently than any OTHER procedure call. Period.
Who cares!

Look at my two (identical) statements.
syzygy wrote:For multithreaded programs using volatile is not necessary if you are properly using the synchronisation primitives of a thread library, e.g. pthreads or C++11 threads.
syzygy wrote:If "lock" is based on pthreads primitives or a similar library, which is the case I am talking about, then "lock" acts as a memory barrier. The compiler will reload "variable". No need for making it volatile.
They are correct.

Agreed?
No. If lock is based on ANY external procedure call, the compiler will reload any global variables referenced prior to the call. Even if no lock is involved, the compiler will reload ANY global variable referenced prior to a procedure call, UNLESS the compiler has access to the source code for the procedures being called so that it can directly determine which global variables (if any) might be modified.

I've already given a reason why locks are not bearable in some places, such as when asking "do I have work to do". Volatile solves that cleanly, correctly, and with minimal overhead. No way to do it in a minimally invasive way without volatile. Locks would simply be too much overhead and produce way too much cache traffic acquiring and releasing the lock from cache to cache.
bob

Re: volatile?

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:Yes, and a pthreads implementation on alpha will insert the required barriers around mutex_lock() etc.
Absolutely does not. Tim Mann spent some time debugging that and had to do the barrier himself when he ported Crafty to the alpha.
If that is true, and if Crafty actually properly used pthreads, then the alpha environment did not comply with POSIX. That is something I cannot exclude.
You expect FAR too much of the compiler. You think it knows FAR more than it really does, particularly when dealing with libraries. But let your imagination continue to run wild if you want.
The only thing I expect is that my system is POSIX compliant.
POSIX guarantees that pthread primitives provide all the compiler and hardware memory barriers that are needed.

How this is achieved is none of my concern when I am working at the POSIX abstraction level.
Or actually TEST the stuff you write first. There are plenty of compilers around to test your various hypotheses for correctness. I posted a concrete example with Intel, produced by latest gcc.
Your program was not POSIX compliant. Get it? No, you don't.
Let me see, my program used pthread_create(), and for alpha Tim started with pthread_mutex_lock()/pthread_mutex_unlock(), which was too much overhead (and which STILL had the barrier problem). He then went to the InterlockedExchange intrinsic on alpha, and added a barrier/fence (I do not remember specifics, could probably find it if necessary) to solve the out of order store problem...

Program was PERFECTLY POSIX compliant at that point when he was using nothing but pthreads library stuff.