volatile?

AlvaroBegue · Post by **AlvaroBegue** » Thu Mar 20, 2014 9:35 pm

hgm wrote:
syzygy wrote:After pthread_mutex_lock() the compiler knows it must reload all values from memory.
So the compiler recognizes this specific function call, and does not treat it like any other?

That is correct. Since you didn't follow the advice of looking it up, here it is: http://pubs.opengroup.org/onlinepubs/96 ... #tag_04_11

Oh, and this is rather long, but it is very very informative: http://channel9.msdn.com/Shows/Going+De ... ons-1-of-2

syzygy · Post by **syzygy** » Thu Mar 20, 2014 9:56 pm

hgm wrote:
Rein Halbersma wrote: Contention for locks hurts performance, not the locks themselves.
Well, the last time I looked, machine instructions typically used for implementing locks, like XCHG reg, mem, or instructions using an explicit LOCK prefix, were incredibly expensive (like 48 clocks). I don't know if recent CPUs now have made that problem completely go away.

On modern CPUs the lock prefix almost does not cost anything, provided the cache line is uncontended.

syzygy wrote:After pthread_mutex_lock() the compiler knows it must reload all values from memory.
So the compiler recognizes this specific function call, and does not treat it like any other?

Correct. (In fact, it probably treats it as a function call about which it knows absolutely nothing, which means anything can have happened to global memory when it returns.)

lucasart · Post by **lucasart** » Fri Mar 21, 2014 12:29 am

mcostalba wrote:
lucasart wrote: On the other hand, Stockfish also declares all shared variables as volatile. And I know that Marco is much more knowledgeable than I am in C++, especially when it comes to multi-threading. So I can't help wondering if there isn't indeed a good reason for all this volatile stuff
A variable accessed under lock protection does not require to be defined 'volatile'.

Instead if accessed by many threads outside lock protection is better to define volatile, although of course this doesn't give you any protection against races and you really need to know what you are doing.

But races are an intrinsic part of a SMP chess engine, for instance TT table is intrinsically racy for performance reasons, because to protect with lock it would be very slow...this is not a problem per-se, as long as you know it and you deal with it.

I see. It's a calculated risk.

Senpai follows the conservative approach to use locks more systematically. As Fabien said, he wanted to get it right first, before optimizing (and risking to spend time debugging tricky SMP races).

When you say accesses "under lock protection", you mean write and read? For example, if I have a function that reads from a shared variable at the beginning and the end of the function, I would have to lock before the first read, and unlock after the last read? So the whole function would be under lock protection? Otherwise the compiler (in the absence of volatile) may assume the variable hasn't changed and not actually go and read it in memory. In that context, volatile seems to be a good compromise to write racy code that is at least safe from dangerous compiler optimizations (although not safe from races, which are dealt with, and assumed to be rare enough that we don't care).

Also, what about atomicity? If one thread modifies a shared variable and another one modifies it at the same time, is it possible that the bytes in memory composing that variable end up all scrambled?

syzygy · Post by **syzygy** » Fri Mar 21, 2014 12:56 am

lucasart wrote:Also, what about atomicity? If one thread modifies a shared variable and another one modifies it at the same time, is it possible that the bytes in memory composing that variable end up all scrambled?

If both threads write a 32-bit int to the same 4 bytes of memory within a single cacheline, then this write is guaranteed to be atomic. In other words, the end result is one of the two 32-bit ints and not a mixture of the two.

On x86-64 the same applies to 64-bit ints.

Obviously the same holds for 16-bit and 8-bit ints.
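A minimal sketch of the "one value or the other, never a mixture" guarantee, written with C++11 `std::atomic` so that the program is also race-free as far as the language standard is concerned. The function name `race_two_writers` and the two bit patterns are made up for illustration. On x86 a relaxed store like this compiles to an ordinary `mov`, so it should cost nothing extra over a plain write.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical demo: two threads store different 32-bit patterns
// into the same word without any lock.
std::uint32_t race_two_writers() {
    std::atomic<std::uint32_t> shared{0};

    std::thread a([&] { shared.store(0xAAAAAAAAu, std::memory_order_relaxed); });
    std::thread b([&] { shared.store(0x55555555u, std::memory_order_relaxed); });
    a.join();
    b.join();

    // Whichever store landed last wins, but the result is never a blend
    // of bytes from both patterns.
    return shared.load(std::memory_order_relaxed);
}
```

Which of the two values survives is not determined; only the absence of tearing is.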

syzygy · Post by **syzygy** » Fri Mar 21, 2014 1:14 am

lucasart wrote:When you say accesses "under lock protection", you mean write and read? For example, if I have a function that reads from a shared variable at the beginning and the end of the function, I would have to lock before the first read, and unlock after the last read? So the whole function would be under lock protection? Otherwise the compiler (in the absence of volatile) may assume the variable hasn't changed and not actually go and read it in memory. In that context, volatile seems to be a good compromise to write racy code that is at least safe from dangerous compiler optimizations (although not safe from races, which are dealt with, and assumed to be rare enough that we don't care).

Strictly speaking, once there is a race (i.e. the theoretical possibility of a race), the program has UB and the standard (C11/C++11 or pthreads+C/C++) makes no guarantee whatsoever. See the UB threads.

In practice, with the compilers of today, it will probably work if you have a volatile shared variable that is written under lock protection and read without. You lose optimisation possibilities, but in return you don't have to take a lock. At least I don't currently see how it could go wrong. But I may be overlooking something. (edit: I would not be surprised if it can fail on architectures with fewer memory ordering guarantees than x86.)

Btw, senpai does not seem to synchronise access to hashtable entries. Hash entries have a "lock" field, but that is just part of the hash key. So with more than one thread it exhibits UB.

syzygy · Post by **syzygy** » Fri Mar 21, 2014 1:38 am

An explanation of why volatile is not (or just seldom?) useful to solve concurrency problems can be found here.

Basically volatile ensures that the read access actually happens, but this does not tell you when it happens. In principle the compiler is free to reorder the read access. You need a memory fence to prevent this. But once you have a memory fence, you don't need the volatile keyword anymore.
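To illustrate the fence point with a sketch: in C++11 the ordering can be attached to an atomic flag, and the data it guards stays an ordinary non-volatile variable. The names `payload`, `ready` and `publish_and_consume` are invented for this example; it assumes a C++11 compiler and is not taken from any engine.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // plain data, deliberately not volatile
std::atomic<bool> ready{false};   // flag that publishes the data

// Hypothetical demo: one thread publishes a value, another consumes it.
int publish_and_consume() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread consumer([&] {
        // Acquire: nothing after this load may be moved before it.
        while (!ready.load(std::memory_order_acquire)) { }
        seen = payload;
    });
    std::thread producer([] {
        payload = 42;
        // Release: the write to payload may not be moved after this store.
        ready.store(true, std::memory_order_release);
    });

    producer.join();
    consumer.join();
    return seen;   // 42: the release/acquire pair orders the payload write before the read
}
```

The release store and the acquire load play the role of the fence; declaring `payload` volatile would add nothing here.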

The comp.programming.threads FAQ also has something to say (but this is more for Bob):


Q56: Why don't I need to declare shared variables VOLATILE?


> I'm concerned, however, about cases where both the compiler and the
> threads library fulfill their respective specifications.  A conforming
> C compiler can globally allocate some shared (nonvolatile) variable to
> a register that gets saved and restored as the CPU gets passed from
> thread to thread.  Each thread will have its own private value for
> this shared variable, which is not what we want from a shared
> variable.

In some sense this is true, if the compiler knows enough about the
respective scopes of the variable and the pthread_cond_wait (or
pthread_mutex_lock) functions. In practice, most compilers will not try
to keep register copies of global data across a call to an external
function, because it's too hard to know whether the routine might
somehow have access to the address of the data.

So yes, it's true that a compiler that conforms strictly (but very
aggressively) to ANSI C might not work with multiple threads without
volatile. But someone had better fix it. Because any SYSTEM (that is,
pragmatically, a combination of kernel, libraries, and C compiler) that
does not provide the POSIX memory coherency guarantees does not CONFORM
to the POSIX standard. Period. The system CANNOT require you to use
volatile on shared variables for correct behavior, because POSIX
requires only that the POSIX synchronization functions are necessary.

So if your program breaks because you didn't use volatile, that's a BUG.
It may not be a bug in C, or a bug in the threads library, or a bug in
the kernel. But it's a SYSTEM bug, and one or more of those components
will have to work to fix it.

You don't want to use volatile, because, on any system where it makes
any difference, it will be vastly more expensive than a proper
nonvolatile variable. (ANSI C requires "sequence points" for volatile
variables at each expression, whereas POSIX requires them only at
synchronization operations -- a compute-intensive threaded application
will see substantially more memory activity using volatile, and, after
all, it's the memory activity that really slows you down.)

/---[ Dave Butenhof ]-----------------------[ butenhof@zko.dec.com ]---\
| Digital Equipment Corporation           110 Spit Brook Rd ZKO2-3/Q18 |
| 603.881.2218, FAX 603.881.0120                  Nashua NH 03062-2698 |
\-----------------[ Better Living Through Concurrency ]----------------/

So: POSIX says you do not need volatile.

rbarreira · Post by **rbarreira** » Fri Mar 21, 2014 1:40 am

bob wrote:
syzygy wrote:
bob wrote:The code looks ugly to me and is probably not safe.
If you had just read:
syzygy wrote:It seems splitPoint->moveCount is always accessed under lock protection (except for an assert), so it seems it could be made non-volatile.
Please spare us your confused "contributions" if you can't take the time to read first.
Please read what I wrote. Locks do NOT avoid the need for volatile. Not now, not ever. I might not be able to acquire the lock if someone else has it, but I can certainly keep a cached value of the variable for a long time, which can certainly be a problem...

You are wrong.

When a compiler sees a pthread_mutex_lock call (or many other kinds of API calls for that matter) it has to assume "anything can happen here, so don't make optimizations relying on things staying in the same state before and after the call".

To be more precise, at least it has to behave as if it did that (which does not allow it to mandate the use of volatile in code).
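A small sketch of what this means in practice: a shared counter that is only ever touched between `pthread_mutex_lock()` and `pthread_mutex_unlock()`, declared without volatile. The names `counter`, `counter_lock`, `worker` and `run_counter_demo` are invented for the example.

```cpp
#include <cassert>
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;   // shared between threads, deliberately NOT volatile

// Each worker increments the counter 100000 times, always under the lock.
static void* worker(void*) {
    for (int i = 0; i < 100000; ++i) {
        pthread_mutex_lock(&counter_lock);
        ++counter;   // re-read from memory after every lock call
        pthread_mutex_unlock(&counter_lock);
    }
    return nullptr;
}

// Hypothetical driver: runs four workers and returns the final count.
long run_counter_demo() {
    counter = 0;
    pthread_t threads[4];
    for (auto& t : threads) {
        int rc = pthread_create(&t, nullptr, worker, nullptr);
        assert(rc == 0);
        (void)rc;
    }
    for (auto& t : threads) pthread_join(t, nullptr);
    return counter;   // 400000 on a POSIX-conforming system
}
```

The compiler cannot keep `counter` in a register across the lock and unlock calls, so no increment is lost even though the variable is an ordinary `long`.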

syzygy · Post by **syzygy** » Fri Mar 21, 2014 1:56 am

Another interesting FAQ entry:
Q162: Cache Architectures, Word Tearing, and VOLATILE

mcostalba · Post by **mcostalba** » Fri Mar 21, 2014 3:40 am

lucasart wrote:
mcostalba wrote:
lucasart wrote: On the other hand, Stockfish also declares all shared variables as volatile. And I know that Marco is much more knowledgeable than I am in C++, especially when it comes to multi-threading. So I can't help wondering if there isn't indeed a good reason for all this volatile stuff
A variable accessed under lock protection does not require to be defined 'volatile'.

Instead if accessed by many threads outside lock protection is better to define volatile, although of course this doesn't give you any protection against races and you really need to know what you are doing.

But races are an intrinsic part of a SMP chess engine, for instance TT table is intrinsically racy for performance reasons, because to protect with lock it would be very slow...this is not a problem per-se, as long as you know it and you deal with it.
I see. It's a calculated risk.

Senpai follows the conservative approach to use locks more systematically. As Fabien said, he wanted to get it right first, before optimizing (and risking to spend time debugging tricky SMP races).

When you say accesses "under lock protection", you mean write and read? For example, if I have a function that reads from a shared variable at the beginning and the end of the function, I would have to lock before the first read, and unlock after the last read? So the whole function would be under lock protection? Otherwise the compiler (in the absence of volatile) may assume the variable hasn't changed and not actually go and read it in memory. In that context, volatile seems to be a good compromise to write racy code that is at least safe from dangerous compiler optimizations (although not safe from races, which are dealt with, and assumed to be rare enough that we don't care).

Also, what about atomicity? If one thread modifies a shared variable and another one modifies it at the same time, is it possible that the bytes in memory composing that variable end up all scrambled?

No, the risk is zero, because anything read out of the TT is validated before it is used. In particular, the TT move is tested for legality.

"Under lock" means both read and write. In your example the code is probably badly designed: you would want the thread that writes the variable to do so under lock protection as well, so that it is blocked while the first thread reads the value. The value could then even be cached after the first read, and volatile is not needed.
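One way to picture "validate whatever comes out of the TT" is the XOR check often called lockless hashing. This is a sketch of that general technique and not a claim about what Stockfish or Senpai actually do; `TTEntry`, `tt_store` and `tt_probe` are hypothetical names, and a real engine would still test the stored move for legality on top of this.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical two-word entry: the first word is key XOR data, so a
// mismatched pair of words is detectable on probe.
struct TTEntry {
    std::atomic<std::uint64_t> check{0};
    std::atomic<std::uint64_t> data{0};
};

// Store without taking any lock; the two words are written separately.
void tt_store(TTEntry& e, std::uint64_t key, std::uint64_t data) {
    e.check.store(key ^ data, std::memory_order_relaxed);
    e.data.store(data, std::memory_order_relaxed);
}

// Probe: accept the entry only if both words belong to the same store.
bool tt_probe(const TTEntry& e, std::uint64_t key, std::uint64_t& out) {
    std::uint64_t d = e.data.load(std::memory_order_relaxed);
    std::uint64_t c = e.check.load(std::memory_order_relaxed);
    if ((c ^ d) != key) return false;   // mixed or foreign entry: treat as a miss
    out = d;
    return true;
}
```

If two threads interleave their stores to the same slot, the check word and the data word no longer match and the probe simply reports a miss instead of returning scrambled data.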

bob · Post by **bob** » Fri Mar 21, 2014 4:11 am

syzygy wrote:
hgm wrote:
Rein Halbersma wrote: Contention for locks hurts performance, not the locks themselves.
Well, the last time I looked, machine instructions typically used for implementing locks, like XCHG reg, mem, or instructions using an explicit LOCK prefix, were incredibly expensive (like 48 clocks). I don't know if recent CPUs now have made that problem completely go away.
On modern CPUs the lock prefix almost does not cost anything, provided the cache line is uncontended.

syzygy wrote:After pthread_mutex_lock() the compiler knows it must reload all values from memory.
So the compiler recognizes this specific function call, and does not treat it like any other?
Correct. (In fact, it probably treats it as a function call about which it knows absolutely nothing, which means anything can have happened to global memory when it returns.)

Actually, for current gcc this is dead wrong. In fact, gcc reloads ALL global variable memory values across ANY procedure call. Which makes sense because it has no idea what was changed by the procedure being called.

The case you are overlooking, which I brought up the LAST time this discussion about volatile erupted, is that I do NOT have to call pthread_mutex_lock right at the point you are looking at. In fact, I can call it from a wrapper function using a different name, one that the compiler does NOT see when compiling this code. So it does not know that pthread_mutex_lock is being called and it has to do something cute. Used to not do that. So this discussion is off-target. It has nothing to do with pthread_mutex_lock() whatsoever. It has everything to do with the compiler not knowing what global variables might be changed by any procedure call that might be made.

Volatile and locks are not interoperable. For example, in Crafty, I have a spin loop where idle threads continually check a pointer to see if it is non-null. When a thread is given work to do, the pointer is set to a split block that the thread uses to do the requested search. A lock there would be really inefficient and smoke cache as the lock gets passed from cache to cache. A simple store to a volatile variable works perfectly since the compiler can't optimize the reference to the pointer away thinking no one can change it since no procedures are being called within the idle loop.
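For comparison, here is a sketch of the same kind of idle loop written with a C++11 atomic pointer instead of a volatile one. This is not Crafty's code: `SplitBlock`, `pending_work` and `idle_loop_demo` are stand-in names. The atomic load cannot be hoisted out of the loop, just like a volatile read, and the acquire/release pair additionally guarantees that the helper sees the contents of the block, not only the pointer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct SplitBlock { int depth; };   // hypothetical stand-in for a split block

std::atomic<SplitBlock*> pending_work{nullptr};

// Hypothetical demo: a helper thread spins until it is handed a block.
int idle_loop_demo() {
    pending_work.store(nullptr, std::memory_order_relaxed);
    SplitBlock block{0};
    int result = -1;

    std::thread helper([&] {
        SplitBlock* sb;
        // Spin without a lock until the pointer becomes non-null.
        while ((sb = pending_work.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        result = sb->depth;
    });

    block.depth = 12;                                        // fill in the work first
    pending_work.store(&block, std::memory_order_release);   // then publish it
    helper.join();
    return result;   // 12
}
```

On x86 the acquire load and release store compile to plain loads and stores, so the spin loop should be no more expensive than the volatile version.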

Turns out this discussion is really about global variables, not memory barriers/fences, which don't even belong in the discussion here. The general usage of the word fence/barrier is a point beyond which execution won't continue until all current memory transactions are completed, not something the compiler knows much about. There are semantics within C to use a fence/barrier, depending on the platform being used, of course... We had to use this on the Alpha since it does out of order writes, something Intel guarantees will not happen. You have to set up a barrier right before clearing a lock to be sure all the writes inside the critical section are completed before the lock is cleared allowing other threads into the critical section.
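The "barrier right before clearing the lock" rule can be sketched in portable C++11 as a spinlock whose unlock is a release operation. `SpinLock` and `spinlock_demo` are illustrative names, not code from any engine. On x86 the release store needs no extra instruction, while on a weakly ordered machine such as the Alpha the compiler emits the required barrier.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    // Acquire: reads and writes in the critical section cannot move above this.
    void lock()   { while (flag.test_and_set(std::memory_order_acquire)) { } }
    // Release: all writes in the critical section complete before the lock clears.
    void unlock() { flag.clear(std::memory_order_release); }
};

// Hypothetical demo: two threads bump a plain counter under the spinlock.
long spinlock_demo() {
    SpinLock sl;
    long total = 0;   // ordinary variable, protected only by the spinlock
    auto bump = [&] {
        for (int i = 0; i < 50000; ++i) {
            sl.lock();
            ++total;
            sl.unlock();
        }
    };
    std::thread a(bump), b(bump);
    a.join();
    b.join();
    return total;   // 100000
}
```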

Interestingly when I compile a simple test including the SOURCE for pthread_mutex_lock() and pthread_mutex_unlock() the compiler will maintain variable values right across the lock calls as I had originally thought, only because it can see none of the global variables are changed in the lock code, making it safe. Again, this has nothing to do with pthread_mutex_lock() itself, it is an artifact of all procedure calls, something I should have instantly realized. I'll blame it on this cough-medicine my doc has me on for flu-like symptoms...
