The Prefetch() command works something like this:
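As a rough illustration only: the sketch below assumes the GCC/Clang builtin `__builtin_prefetch` and a made-up transposition table. The names `TTEntry`, `tt_slot`, `tt_prefetch`, `tt_store` and `tt_probe` are hypothetical and not taken from any particular engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transposition-table entry; the layout is an assumption. */
typedef struct {
    uint64_t key;    /* full hash key, used to verify a hit */
    int16_t  score;
    uint8_t  depth;
} TTEntry;

#define TT_SIZE 1024u            /* number of entries, a power of two */
static TTEntry tt[TT_SIZE];

/* Map a hash key to its slot in the table. */
static TTEntry *tt_slot(uint64_t key) {
    return &tt[key & (TT_SIZE - 1u)];
}

/* Ask the CPU to start loading the entry's cache line early.
   This is only a hint: it does not fault and changes no program state. */
static void tt_prefetch(uint64_t key) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(tt_slot(key));
#else
    (void)key;                   /* no-op on other compilers */
#endif
}

static void tt_store(uint64_t key, int16_t score, uint8_t depth) {
    TTEntry *e;
    assert(key != 0);            /* key 0 marks an empty slot in this sketch */
    e = tt_slot(key);
    e->key = key;
    e->score = score;
    e->depth = depth;
}

/* Returns the entry on a key match, NULL otherwise. */
static const TTEntry *tt_probe(uint64_t key) {
    const TTEntry *e = tt_slot(key);
    return e->key == key ? e : NULL;
}
```

The usual idea is to call `tt_prefetch(key)` as soon as the key of the next position is known, do some unrelated work (move generation, for example), and only then call `tt_probe(key)`, so that the memory fetch overlaps with that work. Note that in this form the hint is an ordinary instruction executed by the searching thread itself; whether it helps depends on how much work sits between the hint and the probe.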
This next report indicates that hyper-threading should be turned off when using prefetch in a multi-threaded environment. That is, enable either hyper-threading or prefetch in the system BIOS, but not both.
"Inter-core prefetching uses one compute thread and one or more prefetching threads. The prefetching threads execute on cores that would otherwise be idle, prefetching the data that the compute thread will need." [Inter-core Prefetching for Multi-core Processors Using Migrating Helper Threads; Md Kamruzzaman, Steven Swanson, Dean M. Tullsen]
https://www.ixsystems.com/community/thr ... etch.1226/
Supposedly, multiple prefetch buffers can be allocated in the L2 cache. Does each CPU have its own L2 cache prefetch buffer, or is there only one prefetch thread for the main bus transfer? For a shared hash table search design, the prefetch thread should apparently be attached to the hash table and not to the search thread. The results might be different for engines where each thread has its own hash table.
Assuming the prefetch buffer is allocated per CPU, and not per thread running on that CPU, multiple threads using prefetch on the same CPU would overwrite each other's buffers. Even if the prefetch thread used locks, efficiency would be expected to deteriorate. Adding code to detect multi-threaded conditions looks unwise, since it would quickly defeat the latency reduction that prefetch is supposed to provide. It looks as if prefetch can be an inefficient operation.
So the question is: is prefetch effective on anything other than single-CPU, single-thread searches?