syzygy wrote:
bob wrote:
It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
I was wrong to say that Intel guarantees the illusion that a read will always give the last value written by any other CPU. Not even this illusion holds true in its full generality.
Suppose memory locations x and y are initialised to 0.
Now CPU1 performs a write and a read:
movl $1, x        # write: x = 1
movl y, %eax      # read:  %eax = y
At roughly the same time, CPU2 also performs a write and a read:
movl $1, y        # write: y = 1
movl x, %eax      # read:  %eax = x
After both pairs have run, %eax may be 0 on both CPU1 and CPU2.
How can that be? If CPU1 reads 0 from [y], it must have executed the read before CPU2 executed the write, right? So CPU1 must have executed the write even earlier, and CPU2 must have executed the read even later. That means that CPU2 can only have read 1. But in reality, it may read a 0.
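For anyone who wants to try this pattern from C rather than assembly, here is a minimal sketch using C11 `<stdatomic.h>` and POSIX threads; the names `x`, `y`, `r1`, `r2`, `cpu1`, `cpu2` and `count_both_zero` are illustrative, not from any existing test suite. The usual explanation for the 0/0 outcome is that on x86 each CPU's store can still be sitting in its store buffer when its later load from the other location executes, so the load effectively moves ahead of the store. Making every access `memory_order_seq_cst` forbids that outcome (compilers typically emit an `mfence` or an `xchg` for the store on x86), so the count below should always be zero. Switching to `memory_order_relaxed` would permit 0/0, though a harness that creates fresh threads each trial may rarely catch it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* The two shared locations from the example, reset to 0 before each trial. */
static atomic_int x, y;

/* What each "CPU" read. Plain ints are fine here because they are only
   read after pthread_join, which synchronizes with the thread's exit. */
static int r1, r2;

/* CPU1: write x, then read y. */
static void *cpu1(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    r1 = atomic_load_explicit(&y, memory_order_seq_cst);
    return NULL;
}

/* CPU2: write y, then read x. */
static void *cpu2(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_seq_cst);
    return NULL;
}

/* Runs the test `trials` times and returns how often both reads saw 0.
   With seq_cst on every access, the C11 memory model forbids that outcome. */
static long count_both_zero(long trials) {
    long both_zero = 0;
    for (long i = 0; i < trials; i++) {
        pthread_t t1, t2;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        int rc1 = pthread_create(&t1, NULL, cpu1, NULL);
        int rc2 = pthread_create(&t2, NULL, cpu2, NULL);
        assert(rc1 == 0 && rc2 == 0);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (r1 == 0 && r2 == 0)
            both_zero++;
    }
    return both_zero;
}
```

Calling `count_both_zero(10000)` from `main` is expected to return 0 with the orderings shown.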
That is a trivial, well-known example, and it has absolutely nothing to do with the current discussion, which is only about ONE variable. You will NEVER get an old value with Intel.
The basis for the discussion has been the optimization of memory loads across procedure calls, and a bunch of testing has me 100% convinced that:
(1) if the compiler cannot see EVERYTHING resulting from a procedure call (for example, library code that was compiled earlier and whose source the compiler never sees), then no memory values will be carried across the call. Every last one will be reloaded if it sits in global memory where it could be modified by something below this point in the call tree;
(2) if the compiler can see everything, and it verifies that a value is not modified in the call tree, it will continue to use it if it was able to preserve it in a register.
(3) other than the above two, nothing else matters. There is no special handling for library functions like pthread_mutex_lock() and friends; calls to them simply force reloads in every case, because the compiler cannot see whether they modify global variables or not.
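A small illustration of point (3), assuming a POSIX threads implementation: the global `counter` below is an ordinary `long`, neither `volatile` nor atomic, and it is only touched between `pthread_mutex_lock()` and `pthread_mutex_unlock()`. Besides the compiler being unable to see inside those calls, POSIX itself lists them among the functions that synchronize memory, so the final total should come out exact. The names `counter`, `worker`, `run_counter` and the thread and iteration counts are made up for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define NITERS 100000L

/* Ordinary global: no volatile, no atomics. */
static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread adds NITERS to the shared counter, one locked step at a time. */
static void *worker(void *arg) {
    (void)arg;
    for (long i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;  /* reloaded after the lock call, stored before the unlock call */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts NTHREADS workers, waits for all of them, and returns the final count. */
static long run_counter(void) {
    pthread_t threads[NTHREADS];
    counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

If the lock and unlock calls are removed, the increments race and the total will usually fall short of `NTHREADS * NITERS`.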
The above is quite easy to test, and I explained exactly how I tested it. I tried a dozen different tricks and could not fool the compiler into keeping global values in registers across procedure calls unless I let it see everything in the call tree.
Yes, you can write code without volatile. Yes, you can write code without atomic locks. And yes, both can work correctly. But NOT on all architectures, and that, to me, is a major problem. I don't want to debug something every time someone offers me a new platform to run on. Crafty works, as is, on every platform on planet earth so far. That was my goal.
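On the portability point, one alternative to `volatile` plus hand-written locks, assuming a C11 compiler with `<stdatomic.h>`, is to let the standard atomics emit whatever barriers each architecture needs. This is a swapped-in technique, not what Crafty does; the sketch is a plain message-passing handoff, and `payload`, `ready`, `producer` and `consumer_read` are hypothetical names. A thread that sees `ready == true` through an acquire load is guaranteed to also see the write to `payload` made before the matching release store, on x86 and on weakly ordered CPUs such as ARM or POWER alike.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Ordinary data, published through the atomic flag below. */
static int payload;
static atomic_bool ready;

/* Writer: fill in the data, then publish it with a release store. */
static void *producer(void *arg) {
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, true, memory_order_release);
    return NULL;
}

/* Reader: spin on an acquire load; once it sees true, payload is visible. */
static int consumer_read(void) {
    pthread_t t;
    atomic_store(&ready, false);
    payload = 0;
    int rc = pthread_create(&t, NULL, producer, NULL);
    assert(rc == 0);
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;  /* busy-wait; a real program might yield or use a condition variable */
    int value = payload;  /* release/acquire guarantees this reads 42 */
    pthread_join(t, NULL);
    return value;
}
```

On x86 the release store and acquire load typically compile to ordinary `mov` instructions; on weaker architectures the compiler inserts the needed barriers, so the same source stays correct without per-platform debugging.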