syzygy wrote: So Komodo9 seems to have a few cache lines that are accessed relatively often by different threads.
Actually, it is a single cache line that accounts for 95% of the shared cache line hits.
This does not necessarily mean there is false sharing. It could be a single variable that is intended to be shared, such as a lock or a counter. (But if it's a counter, then it should be relatively easy to replace it with per-thread counters.)
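To illustrate the per-thread counter idea, here is a minimal C++ sketch. It is not Komodo's code: the names PaddedCounter and count_with_per_thread_counters are made up for the example, the 64-byte cache line size is an assumption (typical for x86-64), and it assumes C++17 so that std::vector honours the over-alignment. Each thread increments only its own padded counter, and the total is summed once after the threads have finished.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache-line size; 64 bytes is typical on x86-64 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// One counter per thread, aligned and padded to a full cache line so that
// no two counters can end up on the same line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each counter should occupy exactly one cache line");

// Hypothetical stand-in for a search loop that counts nodes.
// Every thread writes only to its own counter; the sum is taken at the end.
std::uint64_t count_with_per_thread_counters(unsigned num_threads,
                                             std::uint64_t iters) {
    assert(num_threads > 0);
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, iters] {
            for (std::uint64_t i = 0; i < iters; ++i)
                ++counters[t].value;  // no other thread touches this line
        });
    }
    for (auto& w : workers)
        w.join();  // join() makes the workers' writes visible here

    std::uint64_t total = 0;
    for (const auto& c : counters)
        total += c.value;
    return total;
}
```

A single shared std::atomic counter would give the same total, but every increment would then bounce one cache line between cores, which is the kind of traffic "perf c2c" reports.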
It would be interesting to know if "perf c2c" reports a lot more sharing for Komodo compiled with LTO than for Komodo compiled without LTO.
It seems "perf c2c report -N" also reports node info. If there is sharing between threads on the same node but not between threads on different nodes, then cache-line sharing might not be the reason for the observed slowdown. (Normally this could only be the case if Komodo does NUMA-specific things, and I don't know whether it does.)