./stockfish bench 16384 16 300000 default time
===========================
Total time (ms) : 11100075
Nodes searched : 233436761771
Nodes/second : 21030196
./stockfish bench 16384 8 300000 default time
===========================
Total time (ms) : 11100001
Nodes searched : 160664514528
Nodes/second : 14474279
21030196/14474279 = 1.45...
Seems like this still suggests poor scaling from 8 to 16 cores. I realize this is a small amount of testing. Suggestions, criticisms, and explanations welcome.
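For reference, the scaling ratio above can be computed directly from the bench output. A minimal sketch (the parsing assumes the `Nodes/second : N` line format shown above; the strings are the figures from this post):

```python
import re

def nps(bench_output: str) -> int:
    """Extract the Nodes/second figure from Stockfish bench output."""
    m = re.search(r"Nodes/second\s*:\s*(\d+)", bench_output)
    if m is None:
        raise ValueError("no Nodes/second line found")
    return int(m.group(1))

out_16 = "Total time (ms) : 11100075\nNodes searched : 233436761771\nNodes/second : 21030196"
out_8  = "Total time (ms) : 11100001\nNodes searched : 160664514528\nNodes/second : 14474279"

scaling = nps(out_16) / nps(out_8)
print(f"8 -> 16 core NPS scaling: {scaling:.2f}x")  # ~1.45x, vs an ideal 2.00x
```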
As I explained already, NPS is the wrong measure. Searching more nodes is not a goal in itself; winning more games is. It's Elo we care about. Nothing else.
In all likelihood, what happens is that SF does not search faster (in NPS), but the nodes it calculates are less often wasted.
Besides, SMP is non-deterministic, so you can't conclude anything about NPS scaling from a single bench run.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
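Since an SMP search is non-deterministic, a single bench run's NPS is just one sample; the usual remedy is to repeat the run and look at the spread. A minimal sketch (the sample values are made up for illustration, not measured):

```python
import statistics

# Hypothetical NPS readings from repeated `./stockfish bench ... time` runs
samples_16 = [21030196, 20850000, 21200000, 20990000]
samples_8  = [14474279, 14500000, 14420000, 14530000]

def summarize(samples):
    """Mean and sample standard deviation of a list of NPS readings."""
    return statistics.mean(samples), statistics.stdev(samples)

mean16, sd16 = summarize(samples_16)
mean8, sd8 = summarize(samples_8)
print(f"16 cores: {mean16:.0f} +/- {sd16:.0f} NPS")
print(f" 8 cores: {mean8:.0f} +/- {sd8:.0f} NPS")
print(f"scaling: {mean16 / mean8:.2f}x")
```

If the run-to-run spread is comparable to the difference you are trying to measure, one run tells you very little.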
Interesting. But how does this compare to the version before Joona's patch? Is this an improvement, worse, or similar to before?
Btw, I think searching 1 minute per position would be sufficient, too. I don't think NPS will change significantly after that time.
Here are the data for Stockfish-f8f5dcbb682830a66a37f68f3c192bbbfc84a33a, from just before Joona's patch, this time with 1 minute per position. I will now retest the latest SF at 1 minute per position.
What hardware?
One issue with AMD is that most AMD BIOSes give you two choices for memory setup: (a) NUMA, (b) SMP.
NUMA is the traditional NUMA approach: if you have two chips, as you do, with say 16 GB of DRAM, chip 0 will have addresses 0-8 GB and chip 1 will have addresses 8-16 GB.
SMP interleaves pages between the two chips, so that chip 0 gets addresses 0-4K, chip 1 gets 4K-8K, chip 0 gets 8K-12K, etc. If a program understands NUMA, using the SMP setting will break it badly. If it doesn't understand NUMA, the SMP setting will help avoid memory hotspots, but it does introduce delays.
If it is Intel, I don't believe they have done this, at least not on any machines I have run on, and I have used a bunch of 'em over time.
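The two address layouts described above can be sketched as functions from physical address to owning chip (4 KiB pages and 16 GB split across two chips, as in the example; page size is an assumption taken from the 4K steps above):

```python
PAGE = 4 * 1024        # 4 KiB pages, per the 0-4K / 4K-8K example
TOTAL = 16 * 1024**3   # 16 GB of DRAM across two chips
HALF = TOTAL // 2

def numa_owner(addr: int) -> int:
    """NUMA layout: chip 0 owns the low 8 GB, chip 1 the high 8 GB."""
    return 0 if addr < HALF else 1

def interleaved_owner(addr: int) -> int:
    """'SMP' layout: consecutive 4 KiB pages alternate between chips."""
    return (addr // PAGE) % 2

# The first few pages land on alternating chips under interleaving
for a in (0, 4096, 8192, 12288):
    print(a, numa_owner(a), interleaved_owner(a))
```

This is why the two settings interact so differently with software: a NUMA-aware program places each thread's data on its local chip, and interleaving silently scatters half of it to the remote one.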
lucasart wrote: As I explained already, NPS is the wrong measure. Searching more nodes is not a goal in itself. Winning more games is. It's Elo we care about. Nothing else.
In all likelihood, what happens is that SF does not search faster (in NPS), but the nodes it calculates are less often wasted.
Besides, SMP is non-deterministic, so you can't conclude anything about NPS scaling from a single bench run.
You are wrong, as several have already told you. NPS scaling tells you what percentage of the hardware you are able to use. Your SMP speedup is bounded by the NPS speedup: if you search 1M nodes per second on one CPU and only 8M on 16 cores, you are wasting half of the hardware, and you will NEVER get an SMP speedup greater than 8x. In reality it will be less.
NPS is just an upper bound, but it is an important number, because it gives you a bound you can never exceed. Just because a program gets a 15x-16x NPS speedup does not mean it searches 15x-16x faster; the SMP overhead is still there.
Try to understand a topic before making absolute statements.
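The bound argument above is just multiplication: effective speedup = NPS scaling x search efficiency, with efficiency never exceeding 1. A worked sketch using the figures from the post (the 0.8 efficiency is an illustrative assumption, not a measured value):

```python
def max_smp_speedup(nps_parallel: float, nps_single: float) -> float:
    """NPS scaling is an upper bound on the real (time-to-depth) speedup."""
    return nps_parallel / nps_single

def effective_speedup(nps_scaling: float, search_efficiency: float) -> float:
    """Real speedup: NPS scaling discounted by the fraction of searched
    nodes that are not wasted on redundant work (efficiency <= 1)."""
    return nps_scaling * search_efficiency

bound = max_smp_speedup(8e6, 1e6)  # 8x: half of the 16 cores' work is lost
print(effective_speedup(bound, 0.8))  # assuming 20% search overhead: 6.4x
```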