Well Bob, suppose you were a CPU designer.
bob wrote:
Michael is not thinking of what I am talking about. If the CPU stores only A1, then pauses for any reason, from an interrupt to a hyper-threading context swap, the other core can store B1 and B2, and then the first later comes back to the current process and stores A2.
diep wrote:
More explanation on how the memory controller works:
Vincent says: (4:26:57 AM) we write nonstop with all cores to RAM
Vincent says: (4:27:13 AM) now suppose 2 cores write at same time to the same address in RAM
Vincent says: (4:27:34 AM) one entry is 16 bytes
Vincent says: (4:27:50 AM) core1 writes (a1,a2)
Vincent says: (4:27:59 AM) core2 writes (b1,b2)
Vincent says: (4:28:15 AM) can we get in ram stored (a1,b2) or (b1,a2) ?
Vincent says: (4:28:22 AM) as that would mean big problem
Michael says: (4:28:51 AM) no. that's not really how it works
Michael says: (4:29:37 AM) all cores share the same two memory controllers via crossbar
Michael says: (4:30:11 AM) and that is essentially the same thing as if they only had one memory controller / channel
Michael says: (4:30:25 AM) so they cannot write "simultaneously" to memory
Anyway, I already knew this back in 2002 or so. I just asked to confirm it.
Note that on some HPC processors, such as the R14000, you can get a write error, as they work in a more complex way than PC processors.
Thanks,
Vincent
But for hardware, I still believe he is wrong, because of the "write-combine" buffers Intel uses. A1 could be at the tail end of one of those buffers and get written out _before_ A2 gets written out. Other cores can then overwrite A1 and A2 before A2 gets written, and we are right back to this problem. The x86 actually has load/store fence operations you can use to help with this, if you are writing in asm. But even those do not solve the problem that two consecutive stores, mov [mem], rax and mov [mem+8], rbx, do not necessarily get executed close together, even though they appear consecutively in the program being run. And without that guarantee, we have to take evasive action ourselves...
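To make the scenario concrete, here is a small Python model of it. It is only a sketch and says nothing about what real hardware does: it simply assumes the two 8-byte stores of each core are not atomic as a pair, then enumerates every ordering that keeps each core's program order. The values, the function names, and the XOR check used as one possible "evasive action" are all illustrative assumptions, not something taken from the posts above.

```python
from itertools import combinations

# Hypothetical values standing in for the two 8-byte halves of each entry.
A1, A2 = 0x1111, 0xAAAA
B1, B2 = 0x2222, 0xBBBB


def interleavings(first, second):
    """Yield every merge of two store sequences that keeps each core's own order."""
    n, m = len(first), len(second)
    for slots in combinations(range(n + m), n):
        order, i, j = [], 0, 0
        for pos in range(n + m):
            if pos in slots:
                order.append(first[i])
                i += 1
            else:
                order.append(second[j])
                j += 1
        yield order


def final_entries(entry_a, entry_b):
    """Return the set of (word0, word1) pairs that can end up in the shared slot."""
    core1 = [(0, entry_a[0]), (1, entry_a[1])]  # core 1 stores word 0, then word 1
    core2 = [(0, entry_b[0]), (1, entry_b[1])]  # core 2 does the same
    results = set()
    for order in interleavings(core1, core2):
        mem = [None, None]
        for slot, value in order:
            mem[slot] = value  # a later store to the same word wins
        results.add(tuple(mem))
    return results


def pack(key, data):
    """Assumed evasive action: store key XOR data in word 0 and data in word 1."""
    return (key ^ data, data)


def is_consistent(entry, key):
    """A probe accepts the entry only if the two words still belong together."""
    word0, word1 = entry
    return (word0 ^ word1) == key


if __name__ == "__main__":
    for entry in sorted(final_entries((A1, A2), (B1, B2))):
        print(tuple(hex(word) for word in entry))
```

Under this model the mixed pairs (a1,b2) and (b1,a2) show up next to the two intact entries. With the XOR packing, a mixed pair fails the check for either key unless the values happen to collide, so a reader can discard it instead of trusting it.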
Hyperthreading doesn't matter in this context, of course.
Let's first look at a quad-core CPU.
Suppose that cores could simultaneously write to the same cache line and pollute their own L2 like that, instead of each writing an ENTIRE cache line from its cache to RAM.
Just SUPPOSE.
Then it is suddenly totally impossible to sell that processor,
because not a single mechanism works correctly anymore with respect to RAM.
So the only case where it can happen is when there is more implemented between the memory controller and the CPU.
That is the case, for example, with the R14000. There is a hub in between, so if an interrupt happens, that hub could pollute it.
Note that the bandwidth of those CPUs to RAM is really ugly because of this.
We're speaking of something that has 1 GB/s bandwidth.
Today's Nehalem is 32 GB/s on paper. In some benchmarks I saw, it gets 19 GB/s in practice, and there have even been some claims of 21 to 22 GB/s.
You don't have the bandwidth to implement latency-sensitive stuff in that case.
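For a feel of the numbers, a back-of-the-envelope calculation in Python follows. The channel configuration (three DDR3-1333 channels, each 8 bytes wide) is an assumption about where a paper figure of about 32 GB/s could come from; it is not stated in this thread. The 19 GB/s and 1 GB/s figures and the 16-byte entry size are the ones quoted above, and the entry rates are streaming upper bounds, not latency-bound probe rates.

```python
# Assumed configuration (not from the thread): 3 channels of DDR3-1333, 8 bytes wide each.
CHANNELS = 3
BYTES_PER_TRANSFER = 8
TRANSFERS_PER_SECOND = 1333e6

ENTRY_BYTES = 16  # size of one entry, as in the chat log


def peak_bandwidth_gb_s(channels, bytes_per_transfer, transfers_per_second):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * bytes_per_transfer * transfers_per_second / 1e9


def efficiency(measured_gb_s, peak_gb_s):
    """Fraction of the paper bandwidth that a benchmark actually reaches."""
    return measured_gb_s / peak_gb_s


def entries_per_second(bandwidth_gb_s, entry_bytes=ENTRY_BYTES):
    """Upper bound on entries streamed per second at a given bandwidth."""
    return bandwidth_gb_s * 1e9 / entry_bytes


if __name__ == "__main__":
    peak = peak_bandwidth_gb_s(CHANNELS, BYTES_PER_TRANSFER, TRANSFERS_PER_SECOND)
    print(f"paper peak:          {peak:.1f} GB/s")
    print(f"19 GB/s vs 32 GB/s:  {efficiency(19.0, 32.0):.0%}")
    print(f"entries/s @ 19 GB/s: {entries_per_second(19.0):.3g}")
    print(f"entries/s @ 1 GB/s:  {entries_per_second(1.0):.3g}")
```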
Vincent