SPCC: Testrun of SF nnue gk200627 finished

Ron Langeveld · Post by **Ron Langeveld** » Fri Jul 24, 2020 8:39 am

pohl4711 wrote: ↑Thu Jul 23, 2020 4:46 pm
Ron Langeveld wrote: ↑Thu Jul 23, 2020 10:18 am You should not run 20 games at the same time on a 12-core (24-thread) laptop. The conditions won't be the same (unreliable results).
That is not true. All engines run slower if more than 12 threads are used; that's clear. But this does not mean a distortion, only a slowdown.
I opened 20 instances of the Stockfish 11 engine in console mode in Windows and started all of them with "go infinite". All of them ran smoothly and at a stable speed. As long as at least one thread is not in use (free for Windows operations), there is no distortion. And I keep 4 threads unused.

Every physical core has a second virtual core/thread that is a lot slower in nps. Your test is seriously flawed because you assume that each of the 20 Stockfish instances stays fixed on the thread it started on, which is absolutely not true. Each instance gets to use many threads over time, so over the period you run it the nps of the session averages out and looks the same. In a test game, where a move is calculated in a much shorter time frame like 1 second, there will have been considerably less thread switching, so a given move is much more likely to have been calculated mostly at low nps or mostly at high nps, resulting in inconsistent move quality. If you don't believe me, pick any pool of engines with a known difference in Elo and try to establish these differences with specific error margins in a test with 11 games in parallel and in a test with 20 games in parallel. You will notice that the latter test needs more games to reach the same level of certainty.
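Ron's averaging argument can be illustrated with a toy simulation. This is a hypothetical model, not a measurement: it assumes each scheduler time slice independently runs either at full speed (a physical core to itself) or at a reduced speed (sharing its core with a hyperthread), and the speeds and probability below are made-up parameters. A move's effective speed is the mean over the slices it spans, so a short move averages over few slices and a long run over many.

```python
import random
import statistics


def simulate_move_speeds(slices_per_move: int, moves: int = 1000,
                         fast_nps: float = 1.0, slow_nps: float = 0.6,
                         p_slow: float = 0.5, seed: int = 1) -> list[float]:
    """Hypothetical model: each time slice runs at fast_nps (unshared
    physical core) or slow_nps (core shared with a hyperthread), chosen
    independently. A move's effective speed is the mean over its slices."""
    rng = random.Random(seed)
    speeds = []
    for _ in range(moves):
        slices = [slow_nps if rng.random() < p_slow else fast_nps
                  for _ in range(slices_per_move)]
        speeds.append(sum(slices) / slices_per_move)
    return speeds


short_moves = simulate_move_speeds(slices_per_move=10)   # e.g. a ~1 s move
long_runs = simulate_move_speeds(slices_per_move=1000)   # e.g. "go infinite"

# Spread of effective speed across moves (relative units, not real nps).
print(statistics.pstdev(short_moves), statistics.pstdev(long_runs))
```

With these parameters the spread of per-move speed shrinks roughly with the square root of the number of slices, which is the effect Ron describes. Whether real schedulers behave like this, and whether the spread matters in Elo terms, is exactly what the thread disputes; the model does not settle it.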

pohl4711 · Post by **pohl4711** » Fri Jul 24, 2020 1:22 pm

Ron Langeveld wrote: ↑Fri Jul 24, 2020 8:39 am
Every physical core has a second virtual core/thread that is a lot slower in nps. Your test is seriously flawed because you assume that each of the 20 Stockfish instances stays fixed on the thread it started on, which is absolutely not true. Each instance gets to use many threads over time, so over the period you run it the nps of the session averages out and looks the same. In a test game, where a move is calculated in a much shorter time frame like 1 second, there will have been considerably less thread switching, so a given move is much more likely to have been calculated mostly at low nps or mostly at high nps, resulting in inconsistent move quality. If you don't believe me, pick any pool of engines with a known difference in Elo and try to establish these differences with specific error margins in a test with 11 games in parallel and in a test with 20 games in parallel. You will notice that the latter test needs more games to reach the same level of certainty.

All I can say is: my results are valid. Look at the latest: Stockfish 200717 is +30 Elo better than Stockfish 11 in my rating list:
https://www.sp-cc.de
And look at the regression-test page of Stockfish:
https://github.com/glinscott/fishtest/w ... sion-Tests
Progress of Stockfish 200717 (single) to Stockfish 11: +30.7 Elo.

Ron Langeveld · Post by **Ron Langeveld** » Fri Jul 24, 2020 2:18 pm

Of course your results can be valid; I never said they weren't. I was addressing another issue, which basically boils down to your tests suffering from "noise" due to a significant percentage of weak moves caused by low nps. This means you have to run many more games to reach the same accuracy in your results: if you measure a 30-point Elo difference, you could have gotten there with fewer games and the same error margins if you had used just 11 physical cores.
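Ron's statistical point rests on how error margins scale with the number of games. A minimal sketch, assuming a plain normal approximation on the per-game score (win = 1, draw = 0.5, loss = 0) and the logistic Elo formula; dedicated rating tools use more elaborate models, and the function names here are illustrative only.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, lower, upper) from a normal approximation on the mean
    per-game score. Assumes the bounds stay inside (0, 1)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score around its observed mean.
    var = (wins * (1.0 - score) ** 2 + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(score),
            elo_from_score(score - z * se),
            elo_from_score(score + z * se))


def games_needed(per_game_sd: float, score_margin: float, z: float = 1.96) -> int:
    """Smallest n with z * per_game_sd / sqrt(n) <= score_margin."""
    return math.ceil((z * per_game_sd / score_margin) ** 2)
```

The margin on the mean score shrinks with the square root of the number of games, so `games_needed` grows with the square of the per-game standard deviation. If extra noise does raise that spread, as Ron argues, the same margin costs more games; the sketch shows only that relationship, not whether the noise exists.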

Rebel · Post by **Rebel** » Fri Jul 24, 2020 4:44 pm

Ron Langeveld wrote: ↑Fri Jul 24, 2020 2:18 pm Of course your results can be valid; I never said they weren't. I was addressing another issue, which basically boils down to your tests suffering from "noise" due to a significant percentage of weak moves caused by low nps. This means you have to run many more games to reach the same accuracy in your results: if you measure a 30-point Elo difference, you could have gotten there with fewer games and the same error margins if you had used just 11 physical cores.

Assumptions without evidence. Show me one case.

pohl4711 · Post by **pohl4711** » Sat Jul 25, 2020 6:03 am

Rebel wrote: ↑Fri Jul 24, 2020 4:44 pm
Assumptions without evidence. Show me one case.

I made an experiment on my quad-core notebook (8 hyperthreading "cores"): I ran 5 instances of Stockfish and then started 2 more simultaneously. As I expected, the 2 new instances ran at exactly the same speed, even though 5 other Stockfish instances were running and I had 7 Stockfish instances in total running on a quad-core.
QED
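One way to put numbers on this kind of experiment is to capture each instance's console output after the same wall-clock time and compare the last reported nps. A sketch, assuming each instance's output has already been saved as text (how to capture it is not shown) and that the engine prints standard UCI `info ... nps N` lines. As far as I know, Stockfish reports nps as total nodes divided by elapsed search time, so this measures the long-run average speed rather than short-term fluctuations.

```python
import statistics
from typing import Optional


def last_nps(uci_output: str) -> Optional[int]:
    """Return the nps value from the last UCI 'info' line that reports one."""
    nps = None
    for line in uci_output.splitlines():
        tokens = line.split()
        if tokens[:1] == ["info"] and "nps" in tokens:
            i = tokens.index("nps")
            # Guard against a trailing or non-numeric token after "nps".
            if i + 1 < len(tokens) and tokens[i + 1].isdigit():
                nps = int(tokens[i + 1])
    return nps


def speed_spread(nps_values: list[int]) -> float:
    """Coefficient of variation (stdev / mean) of nps across instances."""
    return statistics.pstdev(nps_values) / statistics.mean(nps_values)
```

A spread close to zero across the instances would support pohl's "same speed" observation for long searches; it would not, by itself, address Ron's concern about per-move variation at short time controls.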

Rebel · Post by **Rebel** » Sat Jul 25, 2020 8:35 am

pohl4711 wrote: ↑Sat Jul 25, 2020 6:03 am
I made an experiment on my quad-core notebook (8 hyperthreading "cores"): I ran 5 instances of Stockfish and then started 2 more simultaneously. As I expected, the 2 new instances ran at exactly the same speed, even though 5 other Stockfish instances were running and I had 7 Stockfish instances in total running on a quad-core.
QED

I have done many tests using all the threads available and never noticed any problem. It's an important subject, the basis for measuring possible Elo improvements, so I ran several tests playing the exact same engines against each other at full speed and afterwards ran a tool to inspect the output; it looked perfect every time.

I don't pretend to know the truth of the matter, and while I understand the philosophical assumption behind the "cores-1" rule, I haven't seen any proof of it, and it's quite possible that it's a myth.
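Rebel's self-play check can be made quantitative with a simple significance test: two identical engines should score 50%, so the observed score can be compared with 0.5. A minimal sketch, assuming a normal approximation on the per-game score; this is not necessarily what Rebel's tool does.

```python
import math


def selfplay_z(wins: int, draws: int, losses: int) -> float:
    """z-score of the observed mean score against the 0.5 expected when two
    identical engines play each other (normal approximation)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score around its observed mean.
    var = (wins * (1.0 - score) ** 2 + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    if var == 0.0:
        # All games had the same result: either all draws or one-sided.
        return 0.0 if score == 0.5 else math.copysign(math.inf, score - 0.5)
    return (score - 0.5) / math.sqrt(var / n)
```

A |z| well above 2 would suggest the two sides were not playing under equal conditions. This test only detects an imbalance between the two sides; noise that hits both engines equally would not shift the score away from 50%.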

Modern Times · Post by **Modern Times** » Sat Jul 25, 2020 8:57 am

Someone here commented a while ago that hyperthreading, and the AMD equivalent, has probably come a long way since it was first introduced by Intel all those years ago. Stefan knows what he is doing, so I'm inclined to think all is OK.

pohl4711 · Post by **pohl4711** » Sat Jul 25, 2020 10:33 am

Rebel wrote: ↑Sat Jul 25, 2020 8:35 am
I don't pretend to know the truth of the matter, and while I understand the philosophical assumption behind the "cores-1" rule, I haven't seen any proof of it, and it's quite possible that it's a myth.

I do not believe that using all threads will distort the results by running some engines faster or slower than others. But when Windows itself uses the hardware more or less heavily, the running engines could run a little faster or slower. That is no big deal, but when running notebooks 24/7, it is better not to use 100% of the hardware... So I use 20 of 24 threads, and all is fine.
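The "leave some threads free" policy is easy to express in a test script. A sketch, assuming single-threaded engines without pondering, so each running game keeps roughly one logical CPU busy; `concurrent_games` is a hypothetical helper name. `os.cpu_count()` returns logical CPUs (hyperthreads included), so 24 logical CPUs with 4 reserved gives pohl's 20 games.

```python
import os
from typing import Optional


def concurrent_games(reserved_threads: int, logical_cpus: Optional[int] = None) -> int:
    """Number of games to run in parallel while leaving `reserved_threads`
    logical CPUs free for the operating system (at least one game)."""
    if logical_cpus is None:
        # os.cpu_count() may return None if the count cannot be determined.
        logical_cpus = os.cpu_count() or 1
    return max(1, logical_cpus - reserved_threads)


print(concurrent_games(4, logical_cpus=24))
```

A "physical cores minus one" policy, like the 11 physical cores Ron suggests, would need the physical core count instead, which the standard library does not provide; the third-party psutil package offers `psutil.cpu_count(logical=False)` for that.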

Rebel · Post by **Rebel** » Sat Jul 25, 2020 12:12 pm

pohl4711 wrote: ↑Sat Jul 25, 2020 10:33 am
I do not believe that using all threads will distort the results by running some engines faster or slower than others. But when Windows itself uses the hardware more or less heavily, the running engines could run a little faster or slower. That is no big deal, but when running notebooks 24/7, it is better not to use 100% of the hardware... So I use 20 of 24 threads, and all is fine.

If a PC has an internet connection and/or anti-virus software installed, it's a wise thing to do; on a clean PC (IMO) there is no need.
