bob wrote:
diep wrote:
syzygy wrote:
diep wrote:
syzygy wrote:
diep wrote:
If you do a normal store of the entire entry, you will simply not run into trouble except for 1 case.
But you can't write 16 bytes atomically using TWO write instructions. That the 16 bytes are in the same cacheline does not make it different.
The only way to write 16 bytes atomically on x86-64 is using "lock cmpxchg16b".
It can only happen in between cachelines, as the core simply hasn't written the other cacheline yet while it's still busy in this one.
The first write by the first core makes the cache line dirty. The first write by the second core can force the first core to write the dirty cache line to memory (or maybe to a higher level cache) before the second write by the first core has taken place.
The execution of two consecutive instructions is NOT atomic. Even the execution of most single instructions is not atomic. Single aligned 8-byte writes within one cacheline are atomic, though.
I edited my post; read it.
The Kiiski scenario of AbCD cannot occur.
What I wrote still stands.
The short story is this: yes, cache lines essentially are read from and written to main memory as a whole. But no, the programmer has no control on when they are written back. They can be written back between two instructions writing to that same cache line.
Ok we are 1 step further now. You are no longer denying that the core hammers 64+ bytes at once to the memory controller.
Progress!
You're still not getting the idea though, are you?
AbCD as Joona writes it cannot occur.
Suppose we do not have an interrupt, as interrupts stop our process; call this process P0.
It's busy writing to its local store buffer.
The store buffer has A, and it hasn't written B yet, which is in the same cacheline. The thread gets stopped for 1 millisecond.
Our 'A' is still in the storebuffer. It's not written to the memory controller yet.
Then after a while our thread P0 resumes. It writes a B. Still nothing happens. It's still local.
Then it writes a C, which is in the next cacheline. Our core says to itself: "heh, what the hell, I'm referencing a 2nd cacheline now, so I can write the first 64 bytes to the memory controller".
This is globally what happens.
Here is what happens.
Two different threads want to do this at the same time:
mov hash[rax], rbx
mov hash+8[rax], rcx
which simply writes two 8-byte pieces of a hash entry back to memory.
The first core writes the first entry, which fetches that 64-byte line into cache, and then updates 8 bytes somewhere. Before that CPU executes the second mov, a context switch (for an interrupt or whatever) occurs. A second core then requests that same block of memory. It gets forwarded from the CPU that currently has the partially modified block. The second CPU writes both words to the same area, and invalidates the copy at the first.

When the first tries to write the 2nd 64 bits, it requests it from memory, but the second cache forwards the current data. The first then writes the second 64-bit chunk, and we are now left with the first 64-bit chunk from the 2nd CPU, and the 2nd 64-bit chunk from the first, which is obviously wrong. And it happens often enough to be a pain in the ass, although it might not cause any crashes or blunders...
Bob, attach odds to each of these occasions.
First of all, you assume simple code here. If you vectorize this, it's already a single SIMD instruction. Say SSE2 or newer. In fact your Core 2 Xeons have SSSE3, which is a lot newer than SSE2.
So then it can't happen.
Secondly, some OSes can stop a thread at any point, others cannot.
So in those you are safe as well.
Now let's ignore the above realities. Let's focus upon odds.
What are the odds that one of the 8 threads you run gets interrupted?
Very small of course. It's only stopped for a few milliseconds, during which the core executes a couple of thousand instructions of another program, and then it restarts again.
Right?
So we speak of small interrupts here. If it's 1 millisecond it's a lot.
Say that happens every 20 milliseconds.
So we speak of 50 occasions a second where one of the threads stops there.
However, the odds that it stops specifically at *this spot* are really tiny.
Say 1 in a million.
How many nps does Crafty get at 8 cores? A million, or 20 million?
Of those, say 2 million writes a second to the hashtable?
So that's 250k stores a second per thread.
You've got 7 other threads.
How many stores in 1 millisecond can they do?
250 * 7 = 1750 stores.
Your hashtable has 500 million entries or so?
That's what I have here.
The odds of one of those 1750 stores by accident hammering onto that one slot out of 500 million are: 1750 / 500 million = 1.75 / 0.5 million = 3.5 / 1 million = 0.0000035.
Also close to 1 in a million.
So we look at something having odds 1 in 10^12 or so.
That's even rarer than the 1 in 10^10 claim I made, Bob.
And this is for our theoretical worst-case scenario. It just won't happen a lot, and definitely not a dozen times a minute.
You can easily measure write errors, Bob. Why not do it?
Just add an 8-bit CRC to each hashtable entry and check it first when scanning the hashtable. Report exactly which entry it was and what it had been filled with, so that you don't count the same write error a 1000 times :)
You'll see that under some conditions you will never get a single one, if your settings prevent this write error from happening.
Yet a write-error chance of once in 10^10 stores is what I stated, and that's what I measured with Diep.
You've got ECC.
Note that my original measurements were on a supercomputer with 200 CPUs and on a dual K7 at home.
For today's hardware the odds are way smaller than on those machines.
This 1 in 10^10 is for a single write error: say Abcd, ABCd or ABcd.
Note that for AbCD you need 2 of these occasions to happen at the same slot, plus a few more constraints, so the probability is even smaller than the square of the probability of a simple write error.
So it's easy to see that the odds are beyond 1 in 10^20; in short, it will never happen in our lifetime.