Michel wrote:
(1) The compiler can produce code that in single-threaded mode is completely equivalent to the naive translation, but in multithreaded mode may produce r1=42, r2=42.
(2) This is an extreme example. But the issue seems to be that it is tricky (a) to make this example illegal and (b) to still allow common compiler optimizations (speculative stores).
If the compiled code for Thread 1 looks like:
Code: Select all
tmp = y;                 // save the old value of y
y = 42;                  // speculative store, visible to other threads
r1 = x;
if (r1 != 42) y = tmp;   // speculation failed: restore the old value
then this would work fine in single-threaded mode, but it would violate the requirements of C++11, because in some executions it includes (visible) stores to global variables that the abstract machine does not perform.
To "correct" this violation, the store would have to be made invisible retroactively.
Such things happen in transactional memory systems with eager versioning. A speculative/transactional store to a shared variable by one transaction is visible to a second transaction (i.e. to another thread running in transactional mode). If the first transaction aborts, then the second transaction must also be aborted. The transactional memory system takes care of this.
So for Thread 1 we get:
Code: Select all
transaction.start();
y = 42;                  // speculative store
r1 = x;
if (r1 == 42)
    transaction.commit();
else
    transaction.abort();
transaction.start() tells the system that all stores to global variables may have to be rolled back. In an eager-versioning system the old values will typically be recorded in an undo log. transaction.abort() will roll back the transaction. transaction.commit() will check whether any conflicts occurred that require a rollback; if not, the undo log is cleared and the stores are final. If the transaction is dependent on other transactions, transaction.commit() may have to wait for the fate of those other transactions to become known.
What needs to be added to this for the example to work is that two transactions that are dependent on each other in the sense that both read a value speculatively written by the other are allowed to proceed and commit (necessarily simultaneously). This is probably a bad idea, because it allows these weird results, but there is no technical obstacle to it.
(Btw, what happens if y is read non-transactionally by a third thread? Answer: either the transactional system simply does not allow this, in which case the compiler "optimisation" is illegal, or the transactional systems detects it and aborts the transaction being executed by Thread 1. There are probably many more of these small issues that can be resolved by a little bit of thinking and have been resolved in actual systems.)
This is similar to concurrent database access.
Transactional memory has its roots in database transactions, but it is quite different. Its main purpose is to make multithreaded programming easier. Currently the "easy" way to do multithreaded programming is to have one big lock for locking all shared data. This can be terribly inefficient if most accesses do not "touch" each other. The solution is to refine the locks, but this makes life much more complicated.
With transactional memory, the programmer can again pretend that all shared data is behind one big lock, usually by encapsulating all accesses like this:
Code: Select all
atomic {
    // access shared data
}
This could be implemented as
Code: Select all
singleglobalmutex.lock();
// access shared data
singleglobalmutex.unlock();
but that would be inefficient. With transactional memory the system does not lock anything. Instead, it lets every thread that enters an atomic block open a transaction. In transactional mode, it monitors all accesses to see if there is a conflict (usually on a cacheline basis). Usually there will not be a conflict and the transactions can commit.
In chess this could be used to access the shared transposition table (TT) in a completely safe manner without locking overhead. On Haswell processors with TSX this can already be done today.
One way to do this on Haswell is by simply putting TT accesses behind a lock:
Code: Select all
ttmutex.lock();
// access TT
ttmutex.unlock();
and link to a recent glibc. Glibc will use TSX to "elide" the lock by replacing it with a transaction (and then take the lock if the transaction fails). (I am not sure whether glibc already does this in any official releases.)
It can also be done with self-made spinlocks by inserting XACQUIRE and XRELEASE instructions into the assembly. Older processors will ignore these; processors with TSX will recognise them and "elide" the lock.
TSX implements a "lazy versioning" form of transactional memory: in contrast to the eager versioning described above, stores are buffered and only become visible after the transaction has committed.