syzygy wrote:
hgm wrote:
syzygy wrote:What "my" system requires is that speculative dependencies between transactions are detected (such as reads of speculative values), and that cyclic dependencies between threads are permitted but must result in these threads being committed all together or not at all.
The problem is that it cannot know that. Your 42 example hinges on the fact that when there is a test on speculatively transferred data that happens to evaluate to true, you know that committing the data will clear all dependencies in the cycle and eventually propagate back to you to confirm the original speculation. But you cannot know that. Part of that dependency cycle is in the other CPU, and that other CPU could have been speculating on something that is completely independent of whether you will commit the store or not. The only thing you know is that you obtained a value (42) which at any time in the future can be withdrawn. Committing your own results and sending the missiles flying, based on the blind gamble that doing so will prevent the source of the speculative data from retracting it, will in general not conform to the standard.
There is nothing "cannot know" about it. As I wrote several times before:
syzygy wrote:Just to be clear, "my" system commits both threads because:
- both conditions evaluate to true (yes, based on forwarded speculative values);
- assuming the speculative stores are valid, everything else is valid as well.
How would this go in practice?
Code:
transaction.begin();
y.store( 42, memory_order_relaxed ); // speculative store
r1 = x.load( memory_order_relaxed );
if (r1 == 42)
    transaction.commit();
else
    transaction.abort();
What might not be clear to you is that transaction.commit() may fail.
The "r1 == 42" check verifies whether the speculative store was valid (yes, based on forwarded speculative values).
In the next step, transaction.commit() verifies whether "everything else" is valid. This includes verifying that dependent transactions are ready to commit as well.
So both threads have to reach transaction.commit(). If one of them does not, the other will abort as well.
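To make that concrete, here is the mirror-image transaction the other thread would run, using the same hypothetical transaction.begin/commit/abort API (a sketch, not a real library; x and y are the shared atomics from the original example, both initially 0):

Code:
transaction.begin();
x.store( 42, memory_order_relaxed ); // speculative store, mirroring thread 1
r2 = y.load( memory_order_relaxed ); // may observe thread 1's speculative 42
if (r2 == 42)
    transaction.commit(); // succeeds only if thread 1 also commits
else
    transaction.abort();  // retracts the speculative store to x

Both commits go through only when each thread has seen the other's speculative 42 and reached its own commit point, exactly the "all together or not at all" rule quoted above.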
Just to pre-empt, I repeat that the deadlock argument is a non-issue.
Either the compiler has proven that each transaction finishes in finite time, or the system times them out.
So now the compiler executes the program BEFORE it compiles it, to prove the above? Please. This is NOT a compiler issue; it is a user programming issue. As in normal transaction processing, you can do whatever you want and commit once you have finished. And the commit can fail, and YOU have to deal with that in your programming. The compiler doesn't help. The hardware doesn't help.
In the case of transactional memory, there is simply some hardware support to help with the commit and the commit-fail or commit-succeed issues, to make it more efficient. But it does NOT solve any speculation problems for you; you get to solve those all by yourself. The TM model simply has some hardware support to help deal with atomicity.
Just to pre-empt, I also repeat that transactions won't irrecoverably start nuclear wars. Either the compiler has proven that they don't, or the system prevents them from doing it.
Just to pre-empt, I note that systems already exist where transactions at commit() time have to wait for dependent transactions to commit() (and if those other transactions abort(), then the commit() turns into an abort(), obviously). These systems detect at runtime the dependencies that are created at runtime.
Just to pre-empt, I repeat that a thread T1 becomes dependent on a thread T2 if it has been forwarded a speculatively stored value from T2. (There are more ways to get dependencies, but this is the only one occurring here; see the sketch below.)
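As an illustration only, here is a minimal single-process sketch of that kind of runtime dependency tracking. The Txn struct, forward_speculative and try_commit_group are names invented for this sketch, not any existing TM system:

Code:
#include <cstdio>
#include <set>

// Hypothetical transaction record: it tracks which other transactions
// have forwarded a speculative value to it.
struct Txn {
    bool ready = false;    // has this transaction reached its commit point?
    std::set<Txn*> deps;   // transactions whose speculative stores it read
};

// Reading a speculative value stored by src makes dst dependent on src.
void forward_speculative(Txn& src, Txn& dst) { dst.deps.insert(&src); }

// A group of mutually dependent transactions commits only if every member
// has reached its commit point and depends on nothing outside the group;
// otherwise all of them abort together.
bool try_commit_group(const std::set<Txn*>& group) {
    for (Txn* t : group) {
        if (!t->ready)
            return false;              // one member not ready -> all abort
        for (Txn* d : t->deps)
            if (!group.count(d))
                return false;          // dependency outside the group
    }
    return true;                       // commit all together
}

int main() {
    Txn t1, t2;
    forward_speculative(t1, t2);       // t2 read t1's speculative 42
    forward_speculative(t2, t1);       // t1 read t2's speculative 42: a cycle
    t1.ready = t2.ready = true;        // both conditions evaluated to true
    std::printf("commit: %s\n", try_commit_group({&t1, &t2}) ? "yes" : "no");
}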
Please read carefully. And maybe try to be open to the idea that this just might work.
Just to pre-empt: I am NOT advocating that systems be built that lead to nonsensical results as in the example that triggered this thread. I do not WANT this.
If we could just leave the hardware issue out of this, I would have no problem with the discussion. As I have repeatedly stated, transaction processing has been around forever. This transactional memory approach just gives us a more efficient way of doing the commits, without needing a host of locks and complex programming. But it does not automate the process.

In the original example given, there was NO transactional memory mentioned, nor was there any transaction model discussed that had something like a commit operation. It was just a simple program where it was pointed out that the C standard allows a result that is actually impossible unless the compiler has gone too far. Doing a speculative store (a meaningless concept in terms of hardware, since a store is NEVER speculated to its ultimate conclusion where L1 is updated) is unacceptable in the context of threads, because stores can affect other threads and hence cannot be done even if they would produce correct results in a sequential program.
About the only reason I can think of to do the store BEFORE the conditional test would be to push the conditional test a cycle forward in time, freeing up a cycle while waiting for part of the tested expression. Stores are done outside the normal CPU arithmetic pipelines, so doing them first really offers nothing, and I am not even sure it would save a clock cycle, since the instruction has long since been fetched and is held in the reorder buffer waiting for dependencies to resolve. I can see practically no possible benefit to doing the store first and then having to undo it if the condition is false, unless the processor is relying on old-school static branch prediction. For example, one model used in the past was: if a branch goes backward in memory, it is likely a loop, so assume it will be taken; if a branch goes forward in memory, assume it will not be taken.
Given this:
if (c)
    x = y;
I'll rewrite it more like asm:

if (!c) goto l1;
x = y;
l1:
If c is false most of the time, !c will be true, and the branch is taken most of the time. A static forward-not-taken scheme will mis-predict it frequently, resulting in pipeline restarts. If you change that to
t = x;
x = y;
if (!c) goto l1;
l2:
and further forward in the program:
l1: x = t;
    goto l2;
We put it further forward so the conditional branch is predicted "not taken", since it is in the forward direction. If c is true most of the time, then the path goes t=x, x=y, conditional branch not taken (the forward prediction is therefore correct), and we have done the assignment with no mis-prediction. We introduced a bit of overhead with t=x, but we have to get x into L1 anyway, so we only added one L1 cache hit penalty, which is generally very small.
But that is COMPLETELY unsafe in a parallel program if x is shared, because we are changing the value of x two times when it should not be changed at all. In standard C that would be perfectly acceptable, since x should be protected by a lock if it is shared. But in C++11, where we are using a class that provides atomicity automatically, the above is NFG as an optimization.
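Here is a minimal, self-contained sketch of why that transformation is observable, assuming x is a std::atomic<int> (the variable names and the "condition false" scenario are just illustrative):

Code:
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{1};   // shared; the source program never writes it when c is false
int y = 2;
bool c = false;          // the condition is false: x should stay 1 throughout

void transformed_writer() {
    // The "store first, undo later" transformation from above, written out:
    int t = x.load(std::memory_order_relaxed);   // t = x
    x.store(y, std::memory_order_relaxed);       // speculative: x = y
    if (!c)
        x.store(t, std::memory_order_relaxed);   // undo: x = t
}

void observer() {
    // Another thread can legitimately load x between the two stores and
    // see the transient 2, a value the source program never assigns.
    if (x.load(std::memory_order_relaxed) == 2)
        std::printf("observed the speculative store!\n");
}

int main() {
    std::thread t1(transformed_writer), t2(observer);
    t1.join();
    t2.join();
}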
And, to make matters worse, modern branch prediction doesn't use a static mechanism. Everyone uses branch history, either local, global, or both, which means that branch will be correctly predicted most of the time, whether it is taken or not taken, because the prediction is based on dynamic observation rather than a static guess.
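To illustrate the dynamic idea in its very simplest form, here is a 2-bit saturating-counter predictor; the table size and PC indexing are arbitrary choices for this sketch, not any particular CPU's design:

Code:
#include <cstdint>
#include <cstdio>

// 2-bit saturating counters: 0/1 predict not-taken, 2/3 predict taken.
struct TwoBitPredictor {
    uint8_t table[1024] = {};            // indexed by low bits of the branch PC

    bool predict(uint32_t pc) const { return table[pc & 1023] >= 2; }

    void update(uint32_t pc, bool taken) {
        uint8_t& ctr = table[pc & 1023];
        if (taken) { if (ctr < 3) ++ctr; }
        else       { if (ctr > 0) --ctr; }
    }
};

int main() {
    TwoBitPredictor p;
    const uint32_t pc = 0x400123;        // address of our hypothetical branch
    int correct = 0;
    for (int i = 0; i < 100; ++i) {
        bool taken = (i % 10 != 0);      // branch is taken 90% of the time
        correct += (p.predict(pc) == taken);
        p.update(pc, taken);
    }
    // The branch's direction in memory never enters into it: the counter
    // simply learns the observed bias after a few iterations.
    std::printf("correct predictions: %d/100\n", correct);
}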
So we are back to the 2x42 model. Hardware won't ever do that. Ever. Compilers can do it easily. The guy writing that explanation specifically mentioned that compilers should NOT do such nonsense, knowing that the hardware never will.
I hope I did not screw up the above example; the c and !c get hard to read after a few minutes.