syzygy wrote: hgm wrote: Rein Halbersma wrote: Contention for locks hurts performance, not the locks themselves.
Well, the last time I looked, the machine instructions typically used to implement locks, like XCHG reg, mem, or instructions with an explicit LOCK prefix, were incredibly expensive (like 48 clocks). I don't know whether recent CPUs have made that problem go away completely.
On modern CPUs the LOCK prefix costs almost nothing, provided the cache line is uncontended.
syzygy wrote: After pthread_mutex_lock() the compiler knows it must reload all values from memory.
So the compiler recognizes this specific function call, and does not treat it like any other?
Correct. (In fact, it probably treats it as a function call about which it knows absolutely nothing, which means anything can have happened to global memory when it returns.)
Actually, for current gcc this is dead wrong. In fact, gcc reloads ALL global variable values from memory across ANY procedure call, which makes sense because it has no idea what the procedure being called might have changed.
The case you are overlooking, which I brought up the LAST time this discussion about volatile erupted, is that I do NOT have to call pthread_mutex_lock right at the point you are looking at. I can call it from a wrapper function with a different name, one the compiler does NOT see when compiling this code. So the compiler does not know that pthread_mutex_lock is being called, and it has no chance to do anything cute with it. It used not to do that. So this discussion is off-target. It has nothing to do with pthread_mutex_lock() whatsoever; it has everything to do with the compiler not knowing which global variables any procedure call might change.
Volatile and locks are not interchangeable. For example, in Crafty I have a spin loop where idle threads continually check a pointer to see if it is non-NULL. When a thread is given work to do, the pointer is set to a split block that the thread uses to do the requested search. A lock there would be really inefficient and would thrash the caches as the lock's cache line got passed from cache to cache. A simple store to a volatile pointer works perfectly: because the pointer is volatile, the compiler cannot optimize the load away on the assumption that no one can change it, even though no procedures are called within the idle loop.
Turns out this discussion is really about global variables, not memory barriers/fences, which don't even belong in the discussion here. The general usage of the word fence/barrier is a point beyond which execution won't continue until all outstanding memory transactions have completed, not something the compiler knows much about. There are ways in C to use a fence/barrier, depending on the platform being used, of course. We had to use one on the Alpha, since it can perform writes out of order, something Intel guarantees will not happen. You have to set up a barrier right before clearing a lock to be sure all the writes inside the critical section have completed before the lock is cleared and other threads are allowed into the critical section.
Interestingly, when I compile a simple test that includes the SOURCE for pthread_mutex_lock() and pthread_mutex_unlock(), the compiler keeps variable values in registers right across the lock calls, as I had originally thought, but only because it can see that the lock code changes none of the global variables, which makes that safe. Again, this has nothing to do with pthread_mutex_lock() itself; it is an artifact of all procedure calls, something I should have realized instantly. I'll blame it on the cough medicine my doc has me on for flu-like symptoms...