volatile?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: volatile?

Post by bob »

rbarreira wrote:
bob wrote:
syzygy wrote:
bob wrote:The code looks ugly to me and is probably not safe.
If you had just read:
syzygy wrote:It seems splitPoint->moveCount is always accessed under lock protection (except for an assert), so it seems it could be made non-volatile.
Please spare us your confused "contributions" if you can't take the time to read first.
Please read what I wrote. Locks do NOT avoid the need for volatile. Not now, not ever. I might not be able to acquire the lock if someone else has it, but I can certainly keep a cached value of the variable for a long time, which can certainly be a problem...
You are wrong.

When a compiler sees a pthread_mutex_lock call (or many other kinds of API calls for that matter) it has to assume "anything can happen here, so don't make optimizations relying on things staying in the same state before and after the call".

To be more precise, at least it has to behave as if it did that (which does not allow it to mandate the use of volatile in code).
Actually this is wrong. The compiler has to assume that ANY global variable can be changed by any procedure call, including pthread_mutex_lock(). It is not doing anything specific for pthread_mutex_lock() at all. You can confirm this by downloading the library source, and sticking those two source files into your code. Now that the compiler can see what pthread_mutex_lock()/unlock() does, it notices none of your global variables are changed, and it will keep them right across the call as one would expect...
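For illustration, a minimal sketch of this behavior, assuming a typical pthreads build (the variable and function names are illustrative, not taken from Crafty):

Code: Select all

#include <pthread.h>

int shared_counter;                  // global, deliberately NOT volatile
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

int read_twice() {
    int a = shared_counter;          // first load from memory
    pthread_mutex_lock(&m);          // opaque call: the compiler must assume
                                     // any global may have changed inside it
    int b = shared_counter;          // so this value is re-loaded from memory
    pthread_mutex_unlock(&m);
    return a + b;
}

If instead the bodies of pthread_mutex_lock()/unlock() are compiled in the same translation unit (or become visible through link-time optimization) and visibly never touch shared_counter, the compiler is free to keep the first load in a register and reuse it for b, which is the effect described above.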
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: volatile?

Post by bob »

lucasart wrote:
mcostalba wrote:
lucasart wrote: On the other hand, Stockfish also declares all shared variables as volatile. And I know that Marco is much more knowledgeable than I am in C++, especially when it comes to multi-threading. So I can't help wondering if there isn't indeed a good reason for all this volatile stuff :?
A variable accessed under lock protection does not need to be declared 'volatile'.

If instead it is accessed by many threads outside lock protection, it is better to declare it volatile, although of course this doesn't give you any protection against races and you really need to know what you are doing.

But races are an intrinsic part of an SMP chess engine. For instance, the TT is intrinsically racy for performance reasons, because protecting it with a lock would be very slow... this is not a problem per se, as long as you know about it and deal with it.
I see. It's a calculated risk.

Senpai follows the conservative approach of using locks more systematically. As Fabien said, he wanted to get it right first, before optimizing (and risking spending time debugging tricky SMP races).

When you say accesses "under lock protection", do you mean both writes and reads? For example, if I have a function that reads from a shared variable at the beginning and at the end of the function, would I have to lock before the first read and unlock after the last read, so that the whole function is under lock protection? Otherwise the compiler (in the absence of volatile) may assume the variable hasn't changed and not actually go and read it from memory. In that context, volatile seems to be a good compromise for writing racy code that is at least safe from dangerous compiler optimizations (although not safe from races, which are dealt with, and assumed to be rare enough that we don't care).

Also, what about atomicity? If one thread modifies a shared variable and another one modifies it at the same time, is it possible that the bytes in memory composing that variable end up all scrambled?
I don't think it is a calculated risk. This concept is still wrong, as I have pointed out. The reason it works in this specific case is that the compiler can NOT see inside the pthread_mutex_lock() function; it cannot tell which variables might be changed by that code, so it has to assume ANY global variable can be altered by the call, and therefore it has to re-load from memory. It doesn't matter what routine you call, this is true unless the compiler can see the code. If you insert the pthread_mutex_lock() code into your source directly, where the compiler can see it, it will optimize just as you might expect.

All of this disinformation about volatile is just that, disinformation. If a variable can be changed by something outside the current instruction stream (typically another thread, but also a memory-mapped device controller, etc.), then it should be declared volatile or problems may well show up. Sometimes you can get away with not using volatile, but is it worth the risk and the potentially painful debugging? My vote, after writing parallel code since the late 70s, is "no". It is silly to write code that just happens to work on architecture A but fails miserably on architecture B. You never know when you might want to use B, and you don't want to get bitten when you do.
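For illustration, a hedged sketch of the failure mode being warned about here, using an invented stop flag rather than any engine's actual code:

Code: Select all

// Shared flag set by another thread to stop the search (illustrative name).
int stop_search = 0;

void idle_loop_fragile() {
    // Without volatile (or an atomic), the compiler may load stop_search once,
    // hoist the test out of the loop, and spin forever on a stale register copy.
    while (!stop_search) { /* wait */ }
}

volatile int stop_search_v = 0;

void idle_loop_with_volatile() {
    // volatile forces a fresh load from memory on every iteration.
    while (!stop_search_v) { /* wait */ }
}

Note that volatile only forces the loads here; it says nothing about ordering or atomicity, which is the point raised further down in the thread.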
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: volatile?

Post by bob »

syzygy wrote:An explanation of why volatile is not (or just seldom?) useful to solve concurrency problems can be found here.

Basically volatile ensures that the read access actually happens, but this does not tell you when it happens. In principle the compiler is free to reorder the read access. You need a memory fence to prevent this. But once you have a memory fence, you don't need the volatile keyword anymore.
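A minimal C++11 sketch of that fence-based approach (the names are illustrative):

Code: Select all

#include <atomic>

std::atomic<bool> ready{false};   // the synchronization variable
int payload;                      // ordinary, non-volatile shared data

void producer() {
    payload = 42;                                   // plain store
    ready.store(true, std::memory_order_release);   // release: the store to payload
                                                    // cannot be reordered past this
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) { }  // acquire pairs with the release
    int x = payload;   // guaranteed to observe 42, with no volatile anywhere
    (void)x;
}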

The comp.programming.threads FAQ also has something to say (but this is more for Bob):

Code: Select all

 Q56: Why don't I need to declare shared variables VOLATILE?  


> I'm concerned, however, about cases where both the compiler and the
> threads library fulfill their respective specifications.  A conforming
> C compiler can globally allocate some shared (nonvolatile) variable to
> a register that gets saved and restored as the CPU gets passed from
> thread to thread.  Each thread will have it's own private value for
> this shared variable, which is not what we want from a shared
> variable.

In some sense this is true, if the compiler knows enough about the
respective scopes of the variable and the pthread_cond_wait (or
pthread_mutex_lock) functions. In practice, most compilers will not try
to keep register copies of global data across a call to an external
function, because it's too hard to know whether the routine might
somehow have access to the address of the data.

So yes, it's true that a compiler that conforms strictly (but very
aggressively) to ANSI C might not work with multiple threads without
volatile. But someone had better fix it. Because any SYSTEM (that is,
pragmatically, a combination of kernel, libraries, and C compiler) that
does not provide the POSIX memory coherency guarantees does not CONFORM
to the POSIX standard. Period. The system CANNOT require you to use
volatile on shared variables for correct behavior, because POSIX
requires only that the POSIX synchronization functions are necessary.

So if your program breaks because you didn't use volatile, that's a BUG.
It may not be a bug in C, or a bug in the threads library, or a bug in
the kernel. But it's a SYSTEM bug, and one or more of those components
will have to work to fix it.

You don't want to use volatile, because, on any system where it makes
any difference, it will be vastly more expensive than a proper
nonvolatile variable. (ANSI C requires "sequence points" for volatile
variables at each expression, whereas POSIX requires them only at
synchronization operations -- a compute-intensive threaded application
will see substantially more memory activity using volatile, and, after
all, it's the memory activity that really slows you down.)

/---[ Dave Butenhof ]-----------------------[ butenhof@zko.dec.com ]---\
| Digital Equipment Corporation           110 Spit Brook Rd ZKO2-3/Q18 |
| 603.881.2218, FAX 603.881.0120                  Nashua NH 03062-2698 |
\-----------------[ Better Living Through Concurrency ]----------------/
So: POSIX says you do not need volatile.
So? There are ways to lock on x86 without using atomic instructions (the LOCK prefix, XCHG, etc.). Is that advised? Absolutely not.

And should that "POSIX" writer ever write a device driver for a device that uses memory-mapped I/O, he might have a minor problem getting it to work, since two consecutive reads from the same address can intentionally produce two different values. The compiler won't understand that without volatile.
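A hedged sketch of that memory-mapped I/O case (the address and register name are invented purely for illustration):

Code: Select all

#include <cstdint>

// Device status register at a made-up physical address.
volatile std::uint32_t* const STATUS_REG =
    reinterpret_cast<volatile std::uint32_t*>(0xFFFF0000);

std::uint32_t poll_status() {
    std::uint32_t first  = *STATUS_REG;   // each read really goes out to the device
    std::uint32_t second = *STATUS_REG;   // and may legitimately return a different value
    return first ^ second;                // without volatile the compiler could fold
                                          // the two reads into one
}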
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: volatile?

Post by lucasart »

syzygy wrote:
lucasart wrote:Also, what about atomicity? If one thread modifies a shared variable and another one modifies it at the same time, is it possible that the bytes in memory composing that variable end up all scrambled?
If both threads write a 32-bit int to the same 4 bytes of memory within a single cache line, then the write is guaranteed to be atomic. In other words, the end result is one of the two 32-bit ints and not a mixture of the two.

On x86-64 the same applies to 64-bit ints.

Obviously the same holds for 16-bit and 8-bit ints.
OK, that means that for built-in types read/write operations are atomic. This is because built-in types have a size that divides the cache line size. Hence, in the absence of unaligned memory access (which you would really have to provoke on purpose with some ugly C-style reinterpretation of pointers), you are guaranteed that they don't cross a cache line.
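A small sketch of that alignment argument, assuming 64-byte cache lines (typical on x86, but not guaranteed by any standard):

Code: Select all

#include <cstddef>
#include <cstdint>

constexpr std::size_t kLineSize = 64;   // assumption: 64-byte cache lines

// True if an object of 'size' bytes starting at 'p' crosses a cache-line boundary.
bool straddles_cache_line(const void* p, std::size_t size) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    return (addr / kLineSize) != ((addr + size - 1) / kLineSize);
}

// For a naturally aligned int (addr % 4 == 0, size 4), addr / 64 and
// (addr + 3) / 64 are always equal, so the store never spans two lines.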

For example, I'm wondering if in this line of code:
https://github.com/lucasart/Sensei/blob ... i.cc#L5861
I can remove the lock protection.

If I can assume that 'p_workers++' is an atomic operation, I should be able to remove the lock. I can think of two ways the compiler could translate this into assembly:
1/ incrementing the variable directly in memory (a single INC op-code)
2/ moving it into a register, incrementing the register, and moving it back to memory. That three-step approach wouldn't be atomic, leading to racy code without the lock protection.

Is there anything in the C++ standard that forbids 2/ and guarantees that the increment will be atomic (hence allowing removal of the lock)? Should I declare the variable p_workers as std::atomic<int> in order to get this guarantee?
Last edited by lucasart on Fri Mar 21, 2014 4:27 am, edited 1 time in total.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: volatile?

Post by BeyondCritics »

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: volatile?

Post by bob »

AlvaroBegue wrote:
hgm wrote:
syzygy wrote:After pthread_mutex_lock() the compiler knows it must reload all values from memory.
So the compiler recognizes this specific function call, and does not treat it like any other?
That is correct. Since you didn't follow the advice of looking it up, here it is: http://pubs.opengroup.org/onlinepubs/96 ... #tag_04_11

Oh, and this is rather long, but it is very very informative: http://channel9.msdn.com/Shows/Going+De ... ons-1-of-2
Just for the record, I compiled a simple test program, but manually included the pthread source for pthread_mutex_lock() and pthread_mutex_unlock(). The compiler then nicely optimized ACROSS the function calls, loading a shared value before the lock call and then using that same copy after the lock call, unless volatile was used. I don't think this has anything to do with any specific pthread lib call; it has to do with the fact that the compiler cannot see the lib source, and has no idea whether or not those library functions modify any important global data it might try to carry across the procedure call. So it simply reloads ANY global (but not local) values once it returns from the lock() call, not because the lock call is treated in a special way, but because it doesn't know what is going on inside the lock call unless you actually include the source. Then it certainly does not treat it in a special way...
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

bob wrote:Interestingly when I compile a simple test including the SOURCE for pthread_mutex_lock() and pthread_mutex_unlock() the compiler will maintain variable values right across the lock calls as I had originally thought, only because it can see none of the global variables are changed in the lock code, making it safe. Again, this has nothing to do with pthread_mutex_lock() itself, it is an artifact of all procedure calls, something I should have instantly realized. I'll blame it on this cough-medicine my doc has me on for flu-like symptoms...
Congratulations on using non-POSIX compliant locking primitives. POSIX tells you to #include the appropriate system files.

I propose we all just ignore Bob.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: volatile?

Post by Rein Halbersma »

lucasart wrote:
For example, I'm wondering if in this line of code:
https://github.com/lucasart/Sensei/blob ... i.cc#L5861
I can remove the lock protection.

If I can assume that 'p_workers++' is an atomic operation, I should be able to remove the lock. I can think of two ways the compiler could translate this into assembly:
1/ incrementing the variable directly in memory (a single INC op-code)
2/ moving it into a register, incrementing the register, and moving it back to memory. That three-step approach wouldn't be atomic, leading to racy code without the lock protection.

Is there anything in the C++ standard that forbids 2/ and guarantees that the increment will be atomic (hence allowing removal of the lock)? Should I declare the variable p_workers as std::atomic<int> in order to get this guarantee?
The answers for C++11 and beyond are indeed: no, there is nothing that prevents 2/, and yes, you need a synchronization primitive such as std::atomic<int>.
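A minimal sketch of that alternative, reusing lucasart's variable name p_workers but otherwise illustrative:

Code: Select all

#include <atomic>

std::atomic<int> p_workers{0};

void on_worker_start() {
    // Guaranteed atomic by the standard; on x86-64 this typically compiles
    // to a LOCK-prefixed read-modify-write instruction.
    p_workers.fetch_add(1, std::memory_order_relaxed);
}

void on_worker_stop() {
    p_workers.fetch_sub(1, std::memory_order_relaxed);
}

memory_order_relaxed is enough if the counter is only a counter; if other shared data must become visible together with the update, stronger ordering (or the original lock) is still needed.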
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: volatile?

Post by hgm »

Note that even a single INC mem instruction is not atomic on i386/x64. It still involves reading and then writing back the data in separate steps of the micro-architecture, and other cores could read or write that same memory location in between. Only with a LOCK prefix will memory accesses by other cores be blocked between the read and the write.
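For illustration, the two flavors side by side (GCC/Clang x86-64 inline assembly, not taken from any engine's code):

Code: Select all

// Plain INC: a read-modify-write that other cores can interleave with.
inline void inc_plain(int* p)  { asm volatile("incl %0"      : "+m"(*p)); }

// LOCK INC: the cache line is held for the whole read-modify-write,
// so the increment is atomic with respect to other cores.
inline void inc_locked(int* p) { asm volatile("lock incl %0" : "+m"(*p)); }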

As to the #include of the code, this still puzzles me. I can of course see that this helps the compiler to see what the routines do, and thus which global variables run the risk of being changed and which are safe. But when I #include a file that really defines a routine in more than one of my source files, I usually get a 'multiply-defined symbol' linker error. How is this prevented in this case?
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: volatile?

Post by rbarreira »

bob wrote:
rbarreira wrote:
bob wrote:
syzygy wrote:
bob wrote:The code looks ugly to me and is probably not safe.
If you had just read:
syzygy wrote:It seems splitPoint->moveCount is always accessed under lock protection (except for an assert), so it seems it could be made non-volatile.
Please spare us your confused "contributions" if you can't take the time to read first.
Please read what I wrote. Locks do NOT avoid the need for volatile. Not now, not ever. I might not be able to acquire the lock if someone else has it, but I can certainly keep a cached value of the variable for a long time, which can certainly be a problem...
You are wrong.

When a compiler sees a pthread_mutex_lock call (or many other kinds of API calls for that matter) it has to assume "anything can happen here, so don't make optimizations relying on things staying in the same state before and after the call".

To be more precise, at least it has to behave as if it did that (which does not allow it to mandate the use of volatile in code).
Actually this is wrong. The compiler has to assume that ANY global variable can be changed by any procedure call, including pthread_mutex_lock(). It is not doing anything specific for pthread_mutex_lock() at all. You can confirm this by downloading the library source, and sticking those two source files into your code. Now that the compiler can see what pthread_mutex_lock()/unlock() does, it notices none of your global variables are changed, and it will keep them right across the call as one would expect...
You are so wrong it isn't even funny. Your knowledge of compilers is stuck in the early 90s.

On top of that you're also contradicting yourself. First you said volatile was needed even with locks, now you say it's not.

Compilers can certainly make optimizations based on knowledge of what some function calls do.