hgm wrote: ↑Mon Jun 09, 2025 10:07 pm
    syzygy wrote: ↑Mon Jun 09, 2025 7:23 pm
    To write to a location in RAM, a CPU core first needs to issue a "request for ownership" on the location's cache line. An RFO fetches the cache line from RAM to cache.
Well, I doubt it. The technique is known as 'write combining', and I get lots of hits on it from Google. E.g. https://stackoverflow.com/questions/772 ... ack-memory .

Writes can be combined, but it still starts with a "read for ownership", which fetches the cacheline even if you end up overwriting it completely. A CPU core cannot write to a cacheline before it has exclusive ownership of it. There are some exceptions, such as rep stosb, which on modern CPUs is implemented in microcode that avoids reading the cacheline.
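For reference, here is a minimal sketch of how rep stosb can be issued from C. The helper name fill_rep_stosb is made up for illustration, and it assumes GCC or Clang inline assembly on x86-64 (it falls back to memset elsewhere). The code only shows how to emit the instruction; whether the fetch of the cacheline is actually skipped depends on the microarchitecture and cannot be seen from the program itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill n bytes at dst with value using rep stosb.
   Assumes GCC/Clang inline asm on x86-64; otherwise falls back to memset. */
static void fill_rep_stosb(void *dst, unsigned char value, size_t n)
{
    assert(dst != NULL || n == 0);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* RDI = destination, RCX = byte count, AL = fill byte.
       The direction flag is clear on function entry per the ABI,
       so the store pointer advances upwards. */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(value)
                     : "memory");
#else
    memset(dst, value, n);
#endif
}
```

Calling fill_rep_stosb(buf, 0, sizeof buf) zeroes a buffer. In practice a libc memset may already choose rep stosb for large sizes on CPUs where it is fast.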
It seems Intel has a patent on doing an RFO_NODATA, which acquires ownership without fetching the cacheline's content from RAM. But it seems this was intended for implementing special instructions which perform "non-temporal" writes, such as MOVNTDQ (MOVNTDQA, linked below, is the matching non-temporal load).
https://www.felixcloutier.com/x86/movntdqa
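To make the non-temporal writes concrete, here is a small sketch using the SSE2 intrinsic _mm_stream_si128, which compiles to MOVNTDQ. The helper name fill_nontemporal and its alignment and size requirements are assumptions made for this example. It falls back to memset when SSE2 is not available. Whether the hardware uses something like RFO_NODATA underneath is not visible from the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_set1_epi8, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical helper: fill n bytes at dst with value using non-temporal
   stores. Assumes dst is 16-byte aligned and n is a multiple of 16. */
static void fill_nontemporal(void *dst, unsigned char value, size_t n)
{
    assert(((uintptr_t)dst % 16) == 0);
    assert(n % 16 == 0);
#if defined(__SSE2__)
    __m128i v = _mm_set1_epi8((char)value);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < n / 16; ++i)
        _mm_stream_si128(p + i, v);  /* MOVNTDQ: store that bypasses the cache */
    /* Non-temporal stores are weakly ordered; the fence orders them
       before any later stores. */
    _mm_sfence();
#else
    memset(dst, value, n);
#endif
}
```

Four consecutive 16-byte stores cover one 64-byte cacheline, which is the case where the core never needs the old contents of the line.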
I think that the optimization you propose does not play well with the strongly ordered memory model of x86/x86-64, unless the CPU (or the compiler) can predict that the full cache line will be written, so that it knows in advance that an RFO_NODATA suffices. But even then there might be complications.
("Write-combining memory" is yet another concept. It is used for non-cached video memory and is only weakly ordered.)