jhellis3 wrote: You are wrong, as several have already told you. NPS scaling tells you what percentage of the hardware you are able to use. Your SMP speedup is bound by the NPS speedup. If you search 1M nodes per second on one CPU, and only 8M on 16 cores, you are wasting 1/2 of the hardware and you will NEVER get an SMP speedup > 8x. And in reality it will be less.
Well, if one wants to be pedantic.... This statement is wrong. If you search 16x as many nodes in a given unit of time T with terrible efficiency (1/2 are redundant), then the effective speedup is only 8x. On the other hand, searching 12x as many nodes in T with, say, 90% efficiency is an effective speedup of 10.8x. I would take the 10.8x over the 8x, but that's just me.
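For anyone who wants to check the arithmetic in that comparison, here is a minimal Python sketch. It assumes the simple model implied above, where effective speedup is the node-count ratio multiplied by the fraction of nodes that are not redundant. The function name and parameters are made up for illustration and do not come from any engine.

```python
def effective_speedup(node_ratio, useful_fraction):
    """Effective speedup under a simple model (an assumption, not a measurement):
    nodes searched relative to one CPU, times the fraction of those nodes
    that are not redundant."""
    if not 0.0 <= useful_fraction <= 1.0:
        raise ValueError("useful_fraction must be between 0 and 1")
    return node_ratio * useful_fraction


if __name__ == "__main__":
    # 16x the nodes, half of them redundant
    print(round(effective_speedup(16, 0.5), 2))
    # 12x the nodes, 90% of them useful
    print(round(effective_speedup(12, 0.9), 2))
```

Note that in both cases the result stays below the node ratio itself, since the useful fraction can never exceed 1.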
At the end of the day though, Lucas is right in that it doesn't really matter what one does as long as the end result is higher Elo.
Please read what I wrote.
1. NPS gives an absolute upper bound on speedup. If you search 10M nps with one cpu, and only 10M with 16 cpus, you will NEVER get any speedup, because you are doing no "extra work" during that search time. NPS is an important number because it provides this upper bound on performance.
2. Once you run a real SMP test, time to depth, and compute the speedup, you can ask yourself, "OK, how well am I doing here?"
Here's a sample:
Your NPS speedup is only 10x on 16 cpus. Your parallel search speedup is 6x. 6x out of 16 sounds bad. But it is not 6x out of 16, it is 6x out of a max of 10x, which is not as bad as it originally sounded. What to do here? You can try to improve the parallel search, which will NEVER get to 10x due to search overhead, so you struggle to get back part of that 4x you are losing. Or you can improve the underlying code to try to get back that 6x of NPS speedup that is missing. If you get half of that back, you get half of that lost parallel search performance back as well.
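The numbers in that sample can be laid out in a short Python sketch. This is only an illustration of the bookkeeping, assuming the three inputs are already measured (cpu count, NPS speedup, time-to-depth speedup). The function and key names are hypothetical.

```python
def smp_breakdown(cpus, nps_speedup, search_speedup):
    """Split the gap between ideal and measured speedup into two parts.

    cpus           -- number of cpus used
    nps_speedup    -- NPS with all cpus divided by NPS with one cpu
    search_speedup -- measured time-to-depth speedup
    """
    if search_speedup > nps_speedup or nps_speedup > cpus:
        raise ValueError("expected search_speedup <= nps_speedup <= cpus")
    return {
        # share of the hardware the program can actually keep busy
        "hardware_used": nps_speedup / cpus,
        # how close the parallel search gets to its NPS upper bound
        "search_vs_bound": search_speedup / nps_speedup,
        # speedup lost before the search even starts (cache, memory, locks)
        "lost_to_nps": cpus - nps_speedup,
        # speedup lost to search overhead (waiting, extra nodes)
        "lost_to_search": nps_speedup - search_speedup,
    }


if __name__ == "__main__":
    for name, value in smp_breakdown(16, 10, 6).items():
        print(name, value)
```

With the sample figures (16 cpus, 10x NPS, 6x search), the two "lost" entries are the 6x and 4x discussed above, and the search is measured against the 10x bound rather than against 16.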
NPS doesn't measure speedup, but it absolutely measures the upper bound on the speedup that is possible. Even assuming a perfect parallel search with zero overhead, you cannot exceed the NPS speedup, ever.
So both numbers are important. NPS gives information about cache traffic, memory traffic, potential lock interference, and such. Parallel speedup gives information about things like thread waiting times, extra nodes searched due to poor splitting, etc. They are related, but not the same thing. BOTH are critical numbers.