It's interesting to me that going from 8 threads to 16 yields a speed increase of almost 5 million nodes per second but no depth increase. I wonder why that is. I would have expected a small increase, but you get an identical depth of 16.7 for every thread count from 8 upward, even though the nodes per second are clearly increasing.
regards,
--tom
The search widens with more threads, so a depth of 16.7 with 16 threads is stronger than a depth of 16.7 with 8 threads. According to this data, 16 threads is the best setting strength-wise.
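To make the idea of "widening" a little more concrete, here is a toy sketch of the lazy SMP scheme discussed in this thread: every thread runs its own iterative-deepening alpha-beta search from the same root, and the threads share nothing but a transposition table (TT). The helper threads here differ only in move ordering, so they tend to explore different subtrees first and leave results the others can reuse. Everything in it is hypothetical: the 4-way toy tree, `leaf_eval`, and the `order_shift` parameter are illustrative assumptions, not Wasp's or Stockfish's code. Real engines also vary depth between helpers and accept TT entries from deeper searches; this sketch keys the table on (node, remaining depth) so its score stays identical to a single-threaded search, and under CPython's GIL the threads give no real speedup.

```python
import threading

BRANCHING = 4          # hypothetical toy game: every node has 4 children
INF = 10**9
EXACT, LOWER, UPPER = 0, 1, 2


def leaf_eval(node: int) -> int:
    # Deterministic pseudo-random score in [-100, 100]; stand-in for a real evaluation.
    return (node * 2654435761) % 201 - 100


def negamax(node, depth, alpha, beta, tt, order_shift):
    # Probe the shared TT; entries written by any thread are valid bounds.
    entry = tt.get((node, depth))
    if entry is not None:
        value, flag = entry
        if flag == EXACT:
            return value
        if flag == LOWER and value >= beta:
            return value
        if flag == UPPER and value <= alpha:
            return value
    if depth == 0:
        return leaf_eval(node)
    alpha_orig = alpha
    best = -INF
    for i in range(BRANCHING):
        move = (i + order_shift) % BRANCHING   # each thread tries moves in its own order
        child = node * BRANCHING + move + 1    # heap-style child numbering
        score = -negamax(child, depth - 1, -beta, -alpha, tt, order_shift)
        best = max(best, score)
        alpha = max(alpha, best)
        if alpha >= beta:
            break                              # beta cutoff
    # Store the result as an exact score or as a bound (fail-soft).
    if best <= alpha_orig:
        flag = UPPER
    elif best >= beta:
        flag = LOWER
    else:
        flag = EXACT
    tt[(node, depth)] = (best, flag)
    return best


def lazy_smp(root: int, max_depth: int, num_threads: int) -> int:
    tt = {}        # the only state shared between threads
    results = {}

    def worker(tid):
        score = 0
        for depth in range(1, max_depth + 1):  # iterative deepening
            score = negamax(root, depth, -INF, INF, tt, order_shift=tid)
        results[tid] = score

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results[0]  # the main thread's score is the one reported
```

In a real engine the useful part is the side effect: helpers fill the TT with results for moves the main thread would otherwise have examined later or pruned, which is roughly what "the search widens" refers to.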
But why?
If you increase the number of cores from 1 to 2, from 2 to 4, or from 4 to 8, not only the width but also the depth of the search grows, presuming the cores are physical cores. But if you use logical cores (via HT/SMT modes), you can experience the strange phenomenon described above.
I think this issue is caused by the behavior of the CPU, and maybe the operating system too.
Hi Tom,
With "lazy SMP" search, the number of nodes required to reach a given depth increases as the number of threads increases. In the case of Wasp, it appears that the additional nodes and the additional speed from cores beyond 8 just balance out, so the depth stays constant. But as Kai mentioned, the search "widens" with more threads, which means that different branches are examined and the chance of finding a better move increases. So perhaps Kai is right that 16 threads is best for Wasp. Maybe I'll run a match between 16 threads and 8 threads to see what happens, but it will take quite a while to get enough games....
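John's "just balance" remark can be put as rough arithmetic. Assume (a simplification, not measured Wasp behavior) that the nodes needed to reach depth d grow like k·b^d, where b is the effective branching factor and k is the lazy SMP node inflation for a given thread count. In a fixed time, the reachable depth then changes by log_b(nps ratio / inflation ratio) plies, which is zero exactly when the extra speed is eaten up by the extra nodes. The function name and all numbers below are hypothetical.

```python
import math


def depth_gain(ebf: float, nps_ratio: float, overhead_ratio: float) -> float:
    """Extra plies reached in the same time after changing the thread count.

    ebf:            effective branching factor, assumed constant across depths
    nps_ratio:      nps(new thread count) / nps(old thread count)
    overhead_ratio: nodes-to-depth(new) / nodes-to-depth(old), i.e. the
                    lazy SMP node inflation
    """
    return math.log(nps_ratio / overhead_ratio, ebf)
```

For example, with a hypothetical b = 1.8, a 30% nps gain would be worth about 0.45 ply if it came with no extra nodes (`depth_gain(1.8, 1.3, 1.0)`), and nothing at all if lazy SMP also needed 30% more nodes to reach the same depth (`depth_gain(1.8, 1.3, 1.3)`). The model says nothing about the widening effect, which is why equal depth need not mean equal strength.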
John
What I said is valid for Stockfish with Lazy SMP; I forgot that these numbers are for a different engine.
I played a couple of very short matches between my latest development version of Wasp using 8 threads and the same version using 16 threads and 12 threads. Time control was Game/10s + 167ms. Here are the results:
Wasp(8 threads) vs Wasp(16 threads): 97-104-308
Wasp(8 threads) vs Wasp(12 threads): 125-126-347
So it appears that my program is not making good use of the hyperthreads. I will start fiddling around with lazy SMP to see if I can improve the average search depth for the arasan19 test suite when using more than 8 threads.
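A quick way to judge how much such short matches can show is to turn each result into an Elo difference with an error bar. The sketch below assumes each triple is wins-losses-draws from the 8-thread version's point of view (if the order is reversed, only the sign flips) and uses the standard logistic Elo formula with a normal approximation of the score; it ignores any correlation between games, and the function names are illustrative.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly inside (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo, low, high); low/high bound an approximate 95% interval."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    margin = z * math.sqrt(var / n)
    # Assumes score +/- margin stays inside (0, 1), true for near-even matches.
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))
```

For 97-104-308 this gives about -4.8 Elo with an interval of roughly -24 to +14 Elo, and 125-126-347 is similarly indistinguishable from zero, so at this game count the matches cannot separate 8, 12, and 16 threads.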
I am very happy with the Ryzen processor and the inexpensive ASUS B350M motherboard: the same nodes/second for a given clock speed as my i5, and good value for 8 cores & 16 threads. The stock cooler seems sufficient for 3.7 GHz with Vcore = 1.26 V (temps are around 60 °C when running 10 simultaneous matches), and it's perfectly stable.
If you want to run correct tests, you need TWO PCs with Ryzen 7 processors, and you ought to play computer-vs-computer matches instead of engine-vs-engine matches on the same PC. Moreover, on the PC running the engine with 8 or fewer threads, you have to switch OFF (=DISABLED) the SMT mode.
The reason is that when SMT=AUTO, the system BIOS pairs the physical and logical (SMT/HT) cores. This means that an engine using, e.g., 8 cores is given 4 physical cores and 4 logical (SMT/HT) cores. Naturally, 4 physical cores + 4 logical cores give less CPU power than 8 physical cores.
When I first ran my Ryzen, I saw the problem you mention: Windows 10 was putting 4 of the 8 threads on hyperthreads instead of on the 4 idle cores. With SMT turned on, the nodes/sec for 8 threads was much worse than 8 times the nodes/sec for 1 thread. I updated the BIOS and then switched to the Windows 10 high-performance power plan, and this fixed the problem. Now I get the same nps for 8 cores whether SMT is turned on or off.
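Another way to keep an 8-thread engine off SMT siblings, instead of relying on the OS scheduler, is to restrict it to one logical CPU per physical core. The sketch below is Linux-only and is not what John did (his fix was the BIOS update plus the power plan); it assumes the kernel exposes `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, and the helper names are hypothetical. On Windows the equivalent would go through the Win32 thread-affinity calls, which are not shown.

```python
import glob
import os


def parse_cpu_list(text: str) -> list[int]:
    """Parse a sysfs CPU list such as '0,8', '0-1' or '0-3,8-11'."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def one_cpu_per_core(sibling_lists: list[str]) -> set[int]:
    """Pick the lowest-numbered logical CPU of each distinct physical core."""
    cores = {tuple(sorted(parse_cpu_list(s))) for s in sibling_lists}
    return {core[0] for core in cores}


def read_sibling_lists() -> list[str]:
    """Read every logical CPU's sibling list (empty list if sysfs is absent)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    return lists


def pin_to_physical_cores() -> set[int]:
    """Linux only: restrict the caller to one logical CPU per physical core."""
    physical = one_cpu_per_core(read_sibling_lists())
    if physical and hasattr(os, "sched_setaffinity"):
        # pid 0 means the caller; threads started afterwards inherit this mask.
        os.sched_setaffinity(0, physical)
    return physical
```

Calling `pin_to_physical_cores()` before an engine starts its search threads would, on an 8-core/16-thread Ryzen, leave 8 usable CPUs, one per core, whatever the SMT setting.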
Hi Robert
Interesting. So is it your contention that, contrary to popular perception, hyperthreading, even on Ryzen processors, is not really good for computer chess?