volatile?
Moderators: hgm, Rebel, chrisw
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: volatile?
I don't think it is a calculated risk. This concept is still wrong, as I have pointed out. The reason it works in this specific case is that the compiler can NOT see inside the pthread_mutex_lock() function; it cannot tell which variables might be changed by that code, so it has to assume ANY global variable can be altered by the call, and therefore it has to re-load them from memory. It doesn't matter what routine you call, this is true, unless the compiler can see the code. If you insert the pthread_mutex_lock() code into your source directly, where the compiler can see it, it will optimize just as you might expect.

lucasart wrote: I see. It's a calculated risk.

mcostalba wrote: A variable accessed under lock protection does not need to be declared 'volatile'.
Instead, if it is accessed by many threads outside lock protection, it is better to declare it volatile, although of course this doesn't give you any protection against races, and you really need to know what you are doing.
But races are an intrinsic part of an SMP chess engine. For instance, the TT is intrinsically racy for performance reasons, because protecting it with a lock would be very slow. This is not a problem per se, as long as you know it and deal with it.
Senpai follows the conservative approach of using locks more systematically. As Fabien said, he wanted to get it right first, before optimizing (and risking spending time debugging tricky SMP races).

lucasart wrote: On the other hand, Stockfish also declares all shared variables as volatile. And I know that Marco is much more knowledgeable than I am in C++, especially when it comes to multi-threading. So I can't help wondering if there isn't indeed a good reason for all this volatile stuff.
When you say accesses "under lock protection", you mean write and read? For example, if I have a function that reads from a shared variable at the beginning and the end of the function, I would have to lock before the first read, and unlock after the last read? So the whole function would be under lock protection? Otherwise the compiler (in the absence of volatile) may assume the variable hasn't changed and not actually go and read it in memory. In that context, volatile seems to be a good compromise to write racy code that is at least safe from dangerous compiler optimizations (although not safe from races, which are dealt with, and assumed to be rare enough that we don't care).
Also, what about atomicity? If one thread modifies a shared variable and another one modifies it at the same time, is it possible that the bytes in memory composing that variable end up all scrambled?
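A rough C++ sketch of what "the whole function under lock protection" would look like (all names here are made up for illustration, not taken from any engine):

```cpp
#include <mutex>

// Hypothetical shared state, for illustration only.
static std::mutex g_mutex;
static int g_shared = 0;

// Both reads happen inside one critical section, so no other thread can
// change g_shared between them; the compiler may even cache it in a
// register here, which is fine because the lock is held throughout.
int read_twice_under_lock() {
    std::lock_guard<std::mutex> guard(g_mutex);
    int first = g_shared;   // read at the beginning of the function
    // ... work that does not touch g_shared ...
    int last = g_shared;    // read at the end, still under the same lock
    return first + last;
}

void write_under_lock(int v) {
    std::lock_guard<std::mutex> guard(g_mutex);
    g_shared = v;
}
```

With this discipline the variable needs no volatile: every access is bracketed by lock/unlock, which act as the synchronization points the compiler must respect.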
All of this disinformation about volatile is just that, disinformation. If a variable can be changed by something outside the current instruction stream (another thread typically, but also by a memory-mapped device controller, etc) then it should be declared volatile or problems might well show up. Sometimes you can get away with not using volatile, but is it worth the risk and potential painful debugging? My vote, after writing parallel code since the late 70's, is "no". It is silly to write code that just happens to work on architecture A, but which fails miserably on architecture B. You never know when you might want to use B, and you don't want to get bit when you do.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: volatile?
So? There are ways to lock on x86 without using an atomic lock (LOCK prefix, XCHG, etc.). Is that advised? Absolutely not.

syzygy wrote: An explanation of why volatile is not (or just seldom?) useful to solve concurrency problems can be found here.
Basically, volatile ensures that the read access actually happens, but this does not tell you when it happens. In principle the compiler is free to reorder the read access. You need a memory fence to prevent this. But once you have a memory fence, you don't need the volatile keyword anymore.
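The fence point can be illustrated with C++11 atomics, where an acquire/release pair supplies the ordering and no volatile appears anywhere (the names below are invented for the example):

```cpp
#include <atomic>

// Illustrative publish/subscribe pair; nothing here is volatile.
static std::atomic<bool> g_ready{false};
static int g_payload = 0;   // plain int: the ordering comes from the atomic flag

void producer() {
    g_payload = 42;                                  // ordinary store
    g_ready.store(true, std::memory_order_release);  // release: payload is visible before the flag
}

int consumer() {
    while (!g_ready.load(std::memory_order_acquire)) // acquire: pairs with the release above
        ;                                            // spin until the flag is published
    return g_payload;                                // guaranteed to read 42
}
```

The release store forbids the payload write from moving after the flag, and the acquire load forbids the payload read from moving before it; that is exactly the fencing volatile does not provide.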
The comp.programming.threads FAQ also has something to say (but this is more for Bob):

Q56: Why don't I need to declare shared variables VOLATILE?

> I'm concerned, however, about cases where both the compiler and the
> threads library fulfill their respective specifications. A conforming
> C compiler can globally allocate some shared (nonvolatile) variable to
> a register that gets saved and restored as the CPU gets passed from
> thread to thread. Each thread will have its own private value for
> this shared variable, which is not what we want from a shared
> variable.

In some sense this is true, if the compiler knows enough about the respective scopes of the variable and the pthread_cond_wait (or pthread_mutex_lock) functions. In practice, most compilers will not try to keep register copies of global data across a call to an external function, because it's too hard to know whether the routine might somehow have access to the address of the data.

So yes, it's true that a compiler that conforms strictly (but very aggressively) to ANSI C might not work with multiple threads without volatile. But someone had better fix it. Because any SYSTEM (that is, pragmatically, a combination of kernel, libraries, and C compiler) that does not provide the POSIX memory coherency guarantees does not CONFORM to the POSIX standard. Period. The system CANNOT require you to use volatile on shared variables for correct behavior, because POSIX requires only that the POSIX synchronization functions are necessary.

So if your program breaks because you didn't use volatile, that's a BUG. It may not be a bug in C, or a bug in the threads library, or a bug in the kernel. But it's a SYSTEM bug, and one or more of those components will have to work to fix it.

You don't want to use volatile, because, on any system where it makes any difference, it will be vastly more expensive than a proper nonvolatile variable. (ANSI C requires "sequence points" for volatile variables at each expression, whereas POSIX requires them only at synchronization operations -- a compute-intensive threaded application will see substantially more memory activity using volatile, and, after all, it's the memory activity that really slows you down.)

-- Dave Butenhof, Digital Equipment Corporation

So: POSIX says you do not need volatile.
And should that "POSIX" writer ever write a device driver for a device that uses memory-mapped I/O, he might have a minor problem getting it to work, since two consecutive reads from the same address will intentionally produce two different values. The compiler won't understand that without volatile.
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: volatile?
OK, that means that for built-in types, read/write operations are atomic. This is because built-in types have a size that divides the cache line size. Hence, in the absence of unaligned memory access (which you would really have to provoke on purpose with some ugly C-style reinterpretation of pointers), you are guaranteed that they don't cross a cache line.

syzygy wrote: If both threads write a 32-bit int to the same 4 bytes of memory within a single cache line, then this write is guaranteed to be atomic. In other words, the end result is one of the two 32-bit ints and not a mixture of the two.

lucasart wrote: Also, what about atomicity? If one thread modifies a shared variable and another one modifies it at the same time, is it possible that the bytes in memory composing that variable end up all scrambled?

On x86-64 the same applies to 64-bit ints.
Obviously the same holds for 16-bit and 8-bit ints.
For example, I'm wondering if in this line of code:
https://github.com/lucasart/Sensei/blob ... i.cc#L5861
I can remove the lock protection.
If I can assume that 'p_workers++' is an atomic operation, I should be able to remove the lock. I can think of two ways the compiler would translate this into assembly:
1/ incrementing the variable directly in memory (a single INC opcode)
2/ moving it into a register, incrementing the register, and moving it back to memory. That three-step approach wouldn't be atomic, leading to racy code without the lock protection.
Is there anything in the C++ standard that forbids 2/ and guarantees that the increment will be atomic (hence allowing removal of the lock)? Should I declare the variable p_workers as std::atomic<int> in order to get this guarantee?
Last edited by lucasart on Fri Mar 21, 2014 4:27 am, edited 1 time in total.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 396
- Joined: Sat May 05, 2012 2:48 pm
- Full name: Oliver Roese
Re: volatile?
I recommend
http://channel9.msdn.com/Shows/Going+De ... of-2#embed
and
http://channel9.msdn.com/Shows/Going+De ... ons-2-of-2
It is really not that difficult.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: volatile?
Just for the record: I compiled a simple test program, but manually included the pthread source for pthread_mutex_lock() and pthread_mutex_unlock(). The compiler then nicely optimized ACROSS the function calls, loading a shared value before the lock call and then using that same copy after the lock call, unless volatile was used. I don't think this has anything to do with any specific pthread lib call; it has to do with the fact that the compiler cannot see the lib source, and has no idea whether or not those library functions modify any important global data it might try to carry across the procedure call. So it simply reloads ANY global (but not local) values once it returns from the lock() call, not because the lock call is treated in a special way, but because the compiler doesn't know what is going on inside the lock call unless you actually include the source. Then it certainly does not treat it in a special way...

AlvaroBegue wrote: That is correct. Since you didn't follow the advice of looking it up, here it is: http://pubs.opengroup.org/onlinepubs/96 ... #tag_04_11

hgm wrote: So the compiler recognizes this specific function call, and does not treat it like any other?

syzygy wrote: After pthread_mutex_lock() the compiler knows it must reload all values from memory.
Oh, and this is rather long, but it is very very informative: http://channel9.msdn.com/Shows/Going+De ... ons-1-of-2
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: volatile?
Congratulations on using non-POSIX-compliant locking primitives. POSIX tells you to #include the appropriate system files.

bob wrote: Interestingly, when I compile a simple test including the SOURCE for pthread_mutex_lock() and pthread_mutex_unlock(), the compiler will maintain variable values right across the lock calls, as I had originally thought, only because it can see that none of the global variables are changed in the lock code, making it safe. Again, this has nothing to do with pthread_mutex_lock() itself; it is an artifact of all procedure calls, something I should have instantly realized. I'll blame it on this cough medicine my doc has me on for flu-like symptoms...
I propose we all just ignore Bob.
-
- Posts: 741
- Joined: Tue May 22, 2007 11:13 am
Re: volatile?
The answers for C++11 and beyond are indeed: no, there is nothing that prevents 2/, and yes, you need a synchronization primitive such as std::atomic<int>.

lucasart wrote:
For example, I'm wondering if in this line of code:
https://github.com/lucasart/Sensei/blob ... i.cc#L5861
I can remove the lock protection.
If I can assume that 'p_workers++' is an atomic operation, I should be able to remove the lock. I can think of two ways the compiler could translate this into assembly:
1/ incrementing the variable directly in memory (a single INC opcode)
2/ moving it into a register, incrementing the register, and moving it back to memory. That three-step approach wouldn't be atomic, leading to racy code without the lock protection.
Is there anything in the C++ standard that forbids 2/ and guarantees that the increment will be atomic (hence allowing removal of the lock)? Should I declare the variable p_workers as std::atomic<int> in order to get this guarantee?
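For what it's worth, here is a hedged sketch of the std::atomic<int> route (the counter name and the thread/iteration counts are illustrative, not the real p_workers):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Illustrative stand-in for a worker counter; incremented without any lock.
static std::atomic<int> g_workers{0};

int count_with_atomic(int threads, int iters) {
    g_workers.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([iters] {
            for (int i = 0; i < iters; ++i)
                g_workers.fetch_add(1, std::memory_order_relaxed); // atomic ++, no lock needed
        });
    for (auto& th : pool)
        th.join();
    return g_workers.load();  // no lost updates: always threads * iters
}
```

operator++ on std::atomic<int> is defined as an atomic read-modify-write, so the compiler is no longer free to pick option 2/.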
-
- Posts: 27811
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: volatile?
Note that even a single INC mem instruction is not atomic on i386/x64. It still involves reading and then writing back the data in separate steps of the micro-architecture, and other cores could read or write that same memory location in between. Only with a LOCK prefix is access to that memory by other cores blocked between the read and the write.
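A small C++ experiment (illustrative names) makes the read/write split visible: one counter is bumped via separate load and store steps, mimicking an increment without a LOCK prefix, while the other uses fetch_add, a single locked read-modify-write:

```cpp
#include <atomic>
#include <thread>

// 'g_split' mimics INC without a LOCK prefix: the read and the write-back are
// separate steps, so another core can update the value in between and that
// update is then overwritten (lost). 'g_locked' uses one atomic RMW.
static std::atomic<int> g_split{0};
static std::atomic<int> g_locked{0};

void hammer(int iters) {
    auto split_inc = [iters] {
        for (int i = 0; i < iters; ++i) {
            int v = g_split.load();   // read...
            g_split.store(v + 1);     // ...write back: increments can be lost here
        }
    };
    auto locked_inc = [iters] {
        for (int i = 0; i < iters; ++i)
            g_locked.fetch_add(1);    // single indivisible read-modify-write
    };
    std::thread a(split_inc), b(split_inc), c(locked_inc), d(locked_inc);
    a.join(); b.join(); c.join(); d.join();
}
```

On typical hardware g_split usually ends up below 2 * iters because of lost updates, while g_locked is always exactly 2 * iters.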
As to the #include of the code, this still puzzles me. I can of course see that this helps the compiler see what the routines do, and thus which global variables run the risk of being changed and which are safe. But when I #include a file that actually defines a routine in more than one of my source files, I usually get a 'multiply-defined symbol' linker error. How is this prevented in this case?
-
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: volatile?
You are so wrong it isn't even funny. Your knowledge of compilers is stuck in the early 90s.

bob wrote: Actually this is wrong. The compiler has to assume that ANY global variable can be changed by any procedure call, including pthread_mutex_lock(). It is not doing anything specific for pthread_mutex_lock() at all. You can confirm this by downloading the library source, and sticking those two source files into your code. Now that the compiler can see what pthread_mutex_lock()/unlock() does, it notices none of your global variables are changed, and it will keep them right across the call as one would expect...

rbarreira wrote: You are wrong.

bob wrote: Please read what I wrote. Locks do NOT avoid the need for volatile. Not now, not ever. I might not be able to acquire the lock if someone else has it, but I can certainly keep a cached value of the variable for a long time, which can certainly be a problem...

syzygy wrote: If you had just read:

bob wrote: The code looks ugly to me and is probably not safe.

Please spare us your confused "contributions" if you can't take the time to read first.

syzygy wrote: It seems splitPoint->moveCount is always accessed under lock protection (except for an assert), so it seems it could be made non-volatile.
When a compiler sees a pthread_mutex_lock call (or many other kinds of API calls for that matter) it has to assume "anything can happen here, so don't make optimizations relying on things staying in the same state before and after the call".
To be more precise, at least it has to behave as if it did that (which does not allow it to mandate the use of volatile in code).
On top of that you're also contradicting yourself. First you said volatile was needed even with locks, now you say it's not.
Compilers can certainly make optimizations based on knowledge of what some function calls do.