SPCC: Testrun of SF nnue gk200627 finished
-
- Posts: 140
- Joined: Tue Jan 05, 2010 8:02 pm
Re: SPCC: Testrun of SF nnue gk200627 finished
pohl4711 wrote (Thu Jul 23, 2020 4:46 pm):
> Ron Langeveld wrote (Thu Jul 23, 2020 10:18 am):
>> You should not run 20 games at the same time on a 12-core (24-thread) laptop. The conditions won't be the same (unreliable results).
> That is not true. All engines run slower if more than 12 threads are used. That's clear. But this is not a distortion, only a slowdown.
> I opened 20 Stockfish 11 engines in console mode in Windows and started all of them with "go infinite". All of them ran smoothly and at a stable speed. As long as at least one thread is not in use (free for Windows operations), there is no distortion. And I keep 4 threads unused.
Every physical core has a second virtual core (thread) that is a lot slower in nps. Your test is seriously flawed because you assume that each of the 20 Stockfish instances stays fixed on the thread it started on, which is absolutely not true. Each instance gets to use many threads over time, and in the end the nps of the session averages out and looks the same for the period you run it. In a test game, where a move is calculated in a much shorter time frame such as 1 second, there is considerably less thread switching per move, so the chance that a move is computed at low rather than high nps increases considerably, with inconsistent move quality as a result. If you don't believe me, pick any pool of engines with known Elo differences and try to establish those differences with specific error margins, once in a test with 11 games in parallel and once in a test with 20 games in parallel. You will notice that the latter test needs more games to reach the same level of certainty.
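The averaging argument above can be illustrated with a toy simulation. This is a sketch under loudly stated assumptions, not a model of how Windows actually schedules threads: each scheduler time slice puts an engine either on a "fast" thread (a physical core to itself) or a "slow" thread (sharing a core with its hyperthread sibling), drawn independently per slice. `FAST_NPS`, `SLOW_NPS`, `SLICE_MS` and `p_fast` are all hypothetical values chosen only for illustration.

```python
import random
import statistics

# Hypothetical numbers for illustration only (not measured on any machine).
FAST_NPS = 1_000_000   # engine alone on a physical core
SLOW_NPS = 600_000     # engine sharing a core with its hyperthread sibling
SLICE_MS = 15          # assumed length of one scheduler time slice


def per_move_nps(move_time_ms: int, p_fast: float, rng: random.Random) -> float:
    """Average nps over one move, drawing fast/slow independently per slice."""
    slices = max(1, move_time_ms // SLICE_MS)
    total = sum(FAST_NPS if rng.random() < p_fast else SLOW_NPS for _ in range(slices))
    return total / slices


def nps_variation(move_time_ms: int, moves: int = 2000, p_fast: float = 0.5, seed: int = 1) -> float:
    """Coefficient of variation (stdev / mean) of per-move nps over many moves."""
    rng = random.Random(seed)
    samples = [per_move_nps(move_time_ms, p_fast, rng) for _ in range(moves)]
    return statistics.stdev(samples) / statistics.mean(samples)


if __name__ == "__main__":
    for ms in (100, 1_000, 10_000):
        print(f"{ms:>6} ms per move: nps variation {nps_variation(ms):.3f}")
```

Under the independence assumption the per-move variation shrinks roughly with one over the square root of the number of slices, so short moves see more spread than long sessions. Note that the toy model says nothing about whether that spread changes game results, which is exactly what is disputed in this thread.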
-
- Posts: 2434
- Joined: Sat Sep 03, 2011 7:25 am
- Location: Berlin, Germany
- Full name: Stefan Pohl
Re: SPCC: Testrun of SF nnue gk200627 finished
Ron Langeveld wrote (Fri Jul 24, 2020 8:39 am):
> pohl4711 wrote (Thu Jul 23, 2020 4:46 pm):
>> Ron Langeveld wrote (Thu Jul 23, 2020 10:18 am):
>>> You should not run 20 games at the same time on a 12-core (24-thread) laptop. The conditions won't be the same (unreliable results).
>> That is not true. All engines run slower if more than 12 threads are used. That's clear. But this is not a distortion, only a slowdown.
>> I opened 20 Stockfish 11 engines in console mode in Windows and started all of them with "go infinite". All of them ran smoothly and at a stable speed. As long as at least one thread is not in use (free for Windows operations), there is no distortion. And I keep 4 threads unused.
> Every physical core has a second virtual core (thread) that is a lot slower in nps. Your test is seriously flawed because you assume that each of the 20 Stockfish instances stays fixed on the thread it started on, which is absolutely not true. Each instance gets to use many threads over time, and in the end the nps of the session averages out and looks the same for the period you run it. In a test game, where a move is calculated in a much shorter time frame such as 1 second, there is considerably less thread switching per move, so the chance that a move is computed at low rather than high nps increases considerably, with inconsistent move quality as a result. If you don't believe me, pick any pool of engines with known Elo differences and try to establish those differences with specific error margins, once in a test with 11 games in parallel and once in a test with 20 games in parallel. You will notice that the latter test needs more games to reach the same level of certainty.
All I can say is: my results are valid. Look at the latest one: Stockfish 200717 is +30 Elo better than Stockfish 11 in my rating list:
https://www.sp-cc.de
And look at the regression-test page of Stockfish:
https://github.com/glinscott/fishtest/w ... sion-Tests
Progress from Stockfish 11 to Stockfish 200717 (single): +30.7 Elo.
-
- Posts: 140
- Joined: Tue Jan 05, 2010 8:02 pm
Re: SPCC: Testrun of SF nnue gk200627 finished
Of course your results can be valid; I never said they weren't. I was addressing another issue, which basically boils down to your tests suffering from "noise" due to a significant percentage of weak moves caused by low nps. This means you will have to run many more games to reach the same accuracy in your results. In other words, if you measure a 30-point Elo difference, you could have reached it with fewer games, at the same error margins, by using just 11 physical cores.
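To make "error margins" and "more games for the same certainty" concrete, here is a minimal sketch of the usual normal approximation: the match score is converted to Elo with the logistic formula, and the 95% interval comes from the per-game variance of the result (1, 0.5 or 0 points). This is a simplified textbook model chosen for illustration, not the method SPCC, fishtest or any rating tool actually uses (those treat game pairs and priors differently).

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score, using the logistic model."""
    score = min(max(score, 1e-9), 1 - 1e-9)  # keep log10 finite at 0% / 100%
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (elo, lower, upper) using a normal approximation on the match score."""
    n = wins + draws + losses
    w, d = wins / n, draws / n
    score = w + 0.5 * d
    # Per-game variance of a result worth 1, 0.5 or 0 points.
    per_game_var = w + 0.25 * d - score ** 2
    se = math.sqrt(per_game_var / n)
    return (elo_from_score(score),
            elo_from_score(score - z * se),
            elo_from_score(score + z * se))


def games_needed(per_game_var: float, target_se: float) -> int:
    """Games required for the score's standard error to fall to target_se."""
    return math.ceil(per_game_var / target_se ** 2)
```

For example, `elo_with_margin(300, 400, 300)` gives 0 Elo with a margin of roughly ±17 Elo. In this model the number of games needed scales linearly with the per-game variance, so if extra noise did raise that variance (the claim being debated here), the game count for a given margin would rise in proportion.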
-
- Posts: 6991
- Joined: Thu Aug 18, 2011 12:04 pm
Re: SPCC: Testrun of SF nnue gk200627 finished
Ron Langeveld wrote (Fri Jul 24, 2020 2:18 pm):
> Of course your results can be valid; I never said they weren't. I was addressing another issue, which basically boils down to your tests suffering from "noise" due to a significant percentage of weak moves caused by low nps. This means you will have to run many more games to reach the same accuracy in your results. In other words, if you measure a 30-point Elo difference, you could have reached it with fewer games, at the same error margins, by using just 11 physical cores.
Assumptions without evidence. Show me one case.
90% of coding is debugging, the other 10% is writing bugs.
-
- Posts: 2434
- Joined: Sat Sep 03, 2011 7:25 am
- Location: Berlin, Germany
- Full name: Stefan Pohl
Re: SPCC: Testrun of SF nnue gk200627 finished
Rebel wrote (Fri Jul 24, 2020 4:44 pm):
> Ron Langeveld wrote (Fri Jul 24, 2020 2:18 pm):
>> Of course your results can be valid; I never said they weren't. I was addressing another issue, which basically boils down to your tests suffering from "noise" due to a significant percentage of weak moves caused by low nps. This means you will have to run many more games to reach the same accuracy in your results. In other words, if you measure a 30-point Elo difference, you could have reached it with fewer games, at the same error margins, by using just 11 physical cores.
> Assumptions without evidence. Show me one case.
I ran an experiment on my quad-core notebook (8 hyperthreading "cores"): I ran 5 Stockfish instances and then started 2 more simultaneously. As I expected, the 2 new instances ran at exactly the same speed, even though 5 other Stockfish instances were running and I had 7 Stockfish instances in total running on a quad-core.
QED
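One way to put a number on "ran at exactly the same speed" is to capture each instance's console output and compare the last nps it reports. UCI engines report speed as an `nps` token in their `info` lines; how the output is captured (for example, redirecting each instance to its own file) is an assumption left open here, and the function names below are hypothetical helpers, not part of any tool mentioned in the thread.

```python
import statistics
from typing import Optional


def last_nps(uci_output: str) -> Optional[int]:
    """Return the nps from the last UCI 'info' line that reports one."""
    nps = None
    for line in uci_output.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "info" or "nps" not in tokens:
            continue
        i = tokens.index("nps")
        # Only accept a plain integer right after the 'nps' keyword.
        if i + 1 < len(tokens) and tokens[i + 1].isdigit():
            nps = int(tokens[i + 1])
    return nps


def speed_spread(nps_values: list[int]) -> float:
    """Relative spread (max - min) / mean of nps across concurrent instances."""
    return (max(nps_values) - min(nps_values)) / statistics.mean(nps_values)
```

Feeding the captured output of all concurrently running instances through `last_nps` and then `speed_spread` gives a single figure that can be compared between, say, 11 and 20 instances. Note that this measures the average over the whole run, which is the part both sides agree on; it does not show the per-move variation Ron is concerned about.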
-
- Posts: 6991
- Joined: Thu Aug 18, 2011 12:04 pm
Re: SPCC: Testrun of SF nnue gk200627 finished
pohl4711 wrote (Sat Jul 25, 2020 6:03 am):
> Rebel wrote (Fri Jul 24, 2020 4:44 pm):
>> Ron Langeveld wrote (Fri Jul 24, 2020 2:18 pm):
>>> Of course your results can be valid; I never said they weren't. I was addressing another issue, which basically boils down to your tests suffering from "noise" due to a significant percentage of weak moves caused by low nps. This means you will have to run many more games to reach the same accuracy in your results. In other words, if you measure a 30-point Elo difference, you could have reached it with fewer games, at the same error margins, by using just 11 physical cores.
>> Assumptions without evidence. Show me one case.
> I ran an experiment on my quad-core notebook (8 hyperthreading "cores"): I ran 5 Stockfish instances and then started 2 more simultaneously. As I expected, the 2 new instances ran at exactly the same speed, even though 5 other Stockfish instances were running and I had 7 Stockfish instances in total running on a quad-core.
> QED
I have done many tests using all available threads and never noticed any problem. It's an important subject, since it is the basis for measuring possible Elo improvements, so I ran several tests playing exactly the same engines against each other at full speed and, after each match, ran a tool inspecting the output. It looked perfect every time.
I don't claim to know the truth of the matter. While I understand the reasoning behind the "cores minus 1" rule, I haven't seen any proof of it, and it's quite possible it's an invented myth.
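The post doesn't say what the inspection tool checks. One plausible check for a match between identical engines (an assumption, not a description of that tool) is whether the score stays close to the 50% expected when neither side has an advantage. A minimal sketch using a z-score on the match score:

```python
import math


def self_play_z(wins: int, draws: int, losses: int) -> float:
    """z-score of the match score against 50%, the expectation for identical engines."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    per_game_var = (wins + 0.25 * draws) / n - score ** 2
    if per_game_var <= 0:  # e.g. every game drawn: no spread to test against
        return 0.0 if score == 0.5 else math.copysign(math.inf, score - 0.5)
    return (score - 0.5) / math.sqrt(per_game_var / n)


def looks_balanced(wins: int, draws: int, losses: int, z_limit: float = 2.0) -> bool:
    """True if the result is within z_limit standard errors of an even score."""
    return abs(self_play_z(wins, draws, losses)) <= z_limit
```

For example, `looks_balanced(310, 400, 290)` is True while `looks_balanced(350, 400, 250)` is False. Such a check catches a systematic bias (one side consistently running faster), but by design it cannot detect extra symmetric noise, since noise that hits both identical engines equally still averages to 50%.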
-
- Posts: 3546
- Joined: Thu Jun 07, 2012 11:02 pm
Re: SPCC: Testrun of SF nnue gk200627 finished
Someone here commented a while ago that hyperthreading, and the AMD equivalent, has probably come a long way since Intel first introduced it all those years ago. Stefan knows what he is doing, so I'm inclined to think all is OK.
-
- Posts: 2434
- Joined: Sat Sep 03, 2011 7:25 am
- Location: Berlin, Germany
- Full name: Stefan Pohl
Re: SPCC: Testrun of SF nnue gk200627 finished
I do not believe that using all threads will distort the results by making some engines run faster or slower than others. But when Windows itself uses the hardware more or less heavily, the running engines could run a little faster or slower. That is no big deal, but when running notebooks 24/7, it is better not to use 100% of the hardware. So I use 20 of 24 threads and all is fine.
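The two setups discussed in this thread can be written down as a small configuration rule: one game per physical core with one core kept free (Ron's 11 games on 12 cores), or one game per hardware thread with some threads kept free (Stefan's 20 of 24 threads). This hypothetical helper assumes single-threaded engines with pondering off, so that each running game keeps roughly one thread busy.

```python
def games_in_parallel(logical_threads: int, physical_cores: int, policy: str, reserve: int = 1) -> int:
    """Number of concurrent games under a given policy.

    Assumes single-threaded engines and no pondering, so each game keeps
    about one hardware thread busy at a time.

    "physical": one game per physical core, keeping `reserve` cores free.
    "logical":  one game per hardware thread, keeping `reserve` threads free.
    """
    if policy == "physical":
        available = physical_cores
    elif policy == "logical":
        available = logical_threads
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return max(1, available - reserve)
```

On the 12-core, 24-thread laptop from this thread, `games_in_parallel(24, 12, "physical")` gives 11 and `games_in_parallel(24, 12, "logical", reserve=4)` gives 20. Python's standard library only reports the logical count (`os.cpu_count()`), so the physical core count is passed in explicitly here.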
-
- Posts: 6991
- Joined: Thu Aug 18, 2011 12:04 pm
Re: SPCC: Testrun of SF nnue gk200627 finished
pohl4711 wrote (Sat Jul 25, 2020 10:33 am):
> I do not believe that using all threads will distort the results by making some engines run faster or slower than others. But when Windows itself uses the hardware more or less heavily, the running engines could run a little faster or slower. That is no big deal, but when running notebooks 24/7, it is better not to use 100% of the hardware. So I use 20 of 24 threads and all is fine.
If a PC has internet access and/or anti-virus software installed, it's a wise thing to do; on a clean PC there is (IMO) no need.