An alternative would be to have a global struct that holds all the tables, with one cache line of padding before and after it.

TomKerrigan wrote:
Okay, that was my understanding as well.

bob wrote:
Sharing things that are read-only is fine. They will end up in all caches, and they won't be swapped around. The problem hits when you have one cache block that gets modified in multiple threads. That results in a LOT of cache traffic, with all the forwarding and invalidate requests being shipped around.

TomKerrigan wrote:
My engine shares several arrays/matrices between threads.
I assumed this would be fine because the arrays are never modified.
Something must be wrong with my understanding of memory/cache coherency systems because having multiple copies of these arrays (one per thread) sped up my engine from 70% to 90% of what I should be able to achieve with 16 cores.
(Ideally I want the same performance as running 16 separate processes, which is surprisingly and impressively just as fast as running 1 process. I checked.)
Not sure how I'm going to attack that last 10% but maybe there are some more global variables that are gumming up the works...
I think Matthew was on to something, maybe these arrays that I duplicated were laid out in memory next to something the threads WERE modifying.
Probably the best thing to do is to put as much data as possible into the engine class to avoid this sort of possibility. Back when I was writing most of this code in 1997-1998, one of my goals was to conserve memory, but now that we have computers with many gigabytes of RAM, duplicating some small tables seems like a complete non-issue.
Code:
struct {
    uint8_t padding_before[64];  // one cache line of padding
    // table 1
    // table 2
    // ...
    uint8_t padding_after[64];   // one cache line of padding
} globals;