Here's the problem. If the CPU needs something from memory, it takes a variable number of clock cycles. Say 4 clocks for an L1 hit. Maybe 20 for an L2 hit. More for an L3 hit. And hundreds of clock cycles if we have to go to real memory. That tends to stall a core, since it can't proceed if all pending instructions depend on data values coming in from memory. During those "pauses" the other logical core can use that core's resources to work on a second instruction stream.
Werewolf wrote:
OK, I think I get it. But surely HT can't just _magically_ increase the performance of a core. If chess demands the core's full attention, and assuming the thread it uses is not blocked (which I'm assuming is the case), then trying to get a 2nd thread to do something on the same core would surely be like riding a bike and trying to play the piano at the same time.
bob wrote:
No. The biggest HT gain comes from memory accesses. When you get an L1 cache miss and have to wait for 20 or so cycles or whatever for L2, or longer for L3, or MUCH longer for main memory, the other logical "core" can use the resources to continue since the first thread is "blocked" (much like what happens in a multiprogramming operating system when a process does I/O and others run while it is blocked).
c) Chess demands 100% processing power of each core
d) Therefore HT will simply decrease performance for chess by 30%, for reasons stated above.
If that statement is wrong I've misunderstood things!
Very much as a modern operating system "interleaves" the execution of two processes as they block waiting on I/O. The more such "blocking" happens, the better that second logical core looks. If your working set fits entirely in L1 cache, you will barely see any HT speedup. If you don't fit in L1, and depend more on L2 or L3 or even main memory, then HT can help more...
Even though you "think" a core is 100% busy, it spends a significant amount of time waiting on data from cache and/or main memory...
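
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is a toy model, not a measurement: the hit rates, the L3 and main-memory latencies, and the "10 cycles of useful work per memory access" figure are all made-up assumptions, and the function names are just for illustration. It also assumes L1 hits are fully hidden by the pipeline and that a second thread can fill every stall cycle, so the result is an upper bound at best.

```python
# Toy model of why HT helps more when a program misses L1.
# Every number passed in below is an illustrative assumption, not a measurement.

def avg_latency(levels, memory_latency):
    """Expected cycles per memory access.

    levels: (hit_rate, latency_cycles) for each cache level in order, where
    hit_rate is the fraction of accesses reaching that level that hit there.
    """
    expected = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for hit_rate, latency in levels:
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    return expected + reach * memory_latency


def busy_fraction(work_cycles, latency, l1_latency=4):
    """Fraction of cycles one thread keeps the core doing useful work.

    Assumes L1 hits are fully overlapped, so only latency beyond an
    L1 hit counts as a stall.
    """
    stall = max(0.0, latency - l1_latency)
    return work_cycles / (work_cycles + stall)


def ideal_ht_speedup(busy):
    """Best-case throughput gain if a second thread fills every stall cycle."""
    return min(1.0, 2.0 * busy) / busy


# Program that lives entirely in L1 (assumed 4-cycle hit).
in_l1 = avg_latency([(1.0, 4)], memory_latency=300)

# Program that spills out of L1 (assumed hit rates; 20-cycle L2 as in the
# post, 50-cycle L3 and 300-cycle memory are guesses).
spills = avg_latency([(0.90, 4), (0.80, 20), (0.90, 50)], memory_latency=300)

for name, latency in (("fits in L1", in_l1), ("spills past L1", spills)):
    busy = busy_fraction(work_cycles=10, latency=latency)
    print(f"{name}: busy {busy:.2f}, ideal HT speedup {ideal_ht_speedup(busy):.2f}")
```

With these assumed inputs the L1-resident case shows no gain at all, while the case that spills into L2, L3 and memory leaves roughly a fifth of the core's cycles idle for a second thread to pick up. Real hardware overlaps some of that latency and the two threads also compete for cache, so actual gains will differ.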