marsell wrote: ↑Fri Aug 14, 2020 1:28 pm
-mwyoung- You're doing the right thing. Draws are part of the game; if anyone doesn't like that, they should not play the game. Especially when testing, draws are essential.
Yes. And anyone who does not like the results needs to complain about CCRL and other testing sites, as this is a pretty standard setup for testing chess engines.
The issue for some here is the rapid falloff of the Stockfish+NNUE rating as time controls get longer. And this only gets worse for Stockfish+NNUE with more time and threads. They want to hype Stockfish+NNUE to the point of ridiculousness, claiming 4x CPU power.....
I have run two tests so far on this. First I ran SFNNUE Aug 13 on one thread vs. final pre-NNUE SF (July 31) on four threads at 30" + .5" (at 5 GHz), and got a 56 to 54 win for NNUE. Then I quadrupled the time limit to 2' + 2" and got 53 to 51 for NNUE. So there doesn't appear to be a falling-off with more time, and at least based on one thread the 4x CPU claim appears to be true. Now it must be repeated with more threads, like 32 vs 8 or even 64 vs 16, to see if it is really generally valid. Perhaps you'd like to run this; if not, I can.
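To put match scores like these in perspective, here is a minimal Python sketch that converts a result into an Elo difference and a rough error margin. It assumes the quoted numbers (e.g. 56 to 54) are total points, with draws counted as half a point each, which the post does not state explicitly. The function names are made up for illustration. Since the draw counts are not given, the margin is only an upper bound (the no-draws case), and it ignores any correlation between games played from the same opening.

```python
import math


def elo_diff(points_a: float, points_b: float) -> float:
    """Elo difference implied by a match score, using the logistic model.

    Assumes points are wins plus half-points for draws, and that both
    sides scored more than zero.
    """
    return 400.0 * math.log10(points_a / points_b)


def score_margin_upper_bound(points_a: float, points_b: float, z: float = 1.96) -> float:
    """Upper bound on the ~95% margin of the score fraction.

    Per-game results lie in {0, 0.5, 1}, so their variance is at most
    s * (1 - s), reached when there are no draws. Real draw rates make
    the true margin smaller than this bound.
    """
    games = points_a + points_b
    s = points_a / games
    return z * math.sqrt(s * (1.0 - s) / games)


if __name__ == "__main__":
    # Scores taken from the post above: (NNUE points, SF points).
    for a, b in [(56, 54), (53, 51)]:
        print(f"{a}-{b}: {elo_diff(a, b):+.1f} Elo, "
              f"score margin at most +/-{score_margin_upper_bound(a, b):.3f}")
```

For 56 to 54 this comes out to roughly +6 Elo, with a score margin of up to about 0.09 either way, so samples of this size can show that the two setups are in the same ballpark but cannot separate small differences.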
Laskos wrote: ↑Fri Aug 14, 2020 10:36 am
-mwyoung- is known to not change his mind at any cost, even in the face of overwhelming evidence that he might do much better. Leave him alone. I never follow his tests; he used to use some 36 threads on a 16-core machine, and I don't know what other silly things he is doing.
I do silly crazy things like test chess engines at much longer than 1 minute per game, and test chess engines using more than 1 thread. Then I get even crazier by posting my settings and live streaming the engine test.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
That is why we need to test with more than one thread and longer time controls. And yes, I have been running these tests and will continue to test NNUE. Unlike some, I do not average 23 seconds per game, test at 1 thread, call it good, and say +100 Elo. The truth is more down to earth....
Perhaps you'd like to run this, if not I can.
You should run your own testing. But just a reminder: if you are running 1 thread vs 2, 3, 4... threads, or some kind of combination of x4 cores, you need to go into the BIOS and lock all the cores to the same speed. Otherwise the one-core setup will be running much faster. Have you done this in the testing so far?
This is why I always run all testing with the speed locked on all cores, because throttling is a bitch when testing chess engines!
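A quick way to see whether the cores are actually running at the same speed is to sample their reported frequencies while a test is in progress. The sketch below is only an illustration and assumes a Linux machine that exposes per-core frequencies through the cpufreq sysfs files (`scaling_cur_freq`); on other systems it simply finds nothing. The function names and the 2% threshold are made up for the example, and this only observes the clocks, it does not lock them.

```python
import glob
import re

# Linux cpufreq sysfs location (an assumption about the test machine).
SYSFS_PATTERN = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"


def read_core_freqs_khz(pattern: str = SYSFS_PATTERN) -> dict:
    """Return {core_index: current_frequency_in_kHz} for readable cores."""
    freqs = {}
    for path in glob.glob(pattern):
        match = re.search(r"cpu(\d+)/cpufreq", path)
        if match is None:
            continue
        try:
            with open(path) as handle:
                freqs[int(match.group(1))] = int(handle.read().strip())
        except (OSError, ValueError):
            # Skip cores whose file is unreadable or malformed.
            continue
    return freqs


def relative_spread(freqs_khz: dict) -> float:
    """Gap between fastest and slowest core as a fraction of the fastest."""
    if not freqs_khz:
        return 0.0
    values = list(freqs_khz.values())
    return (max(values) - min(values)) / max(values)


if __name__ == "__main__":
    freqs = read_core_freqs_khz()
    if not freqs:
        print("No cpufreq data found (not Linux, or cpufreq not exposed).")
    else:
        spread = relative_spread(freqs)
        print(f"{len(freqs)} cores, spread {spread:.1%}")
        if spread > 0.02:  # illustrative threshold, not a standard value
            print("Cores are not running at one speed; results may be skewed.")
```

Running it a few times during a one-thread game and again during a multi-thread game gives a rough idea of how much faster the lightly loaded setup is clocked.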
No, I have never done that. I do test with hyperthreading off, which probably reduces the problem, but I suppose it is still an issue. I got an 89 to 81 score for NNUE vs final SF with 8 threads vs 2 at 30" + .5", better than my four vs one thread results, but I'll leave it to you to follow up with more threads if you wish, in view of the throttling issue. Maybe it will turn out that quadruple CPU power is overstated; anyway, it wasn't my statement.
Laskos wrote: ↑Fri Aug 14, 2020 10:36 am
-mwyoung- is known to not change his mind at any cost, even in the face of overwhelming evidence that he might do much better. Leave him alone. I never follow his tests; he used to use some 36 threads on a 16-core machine, and I don't know what other silly things he is doing.
I do silly crazy things like test chess engines at much longer than 1 minute per game, and test chess engines using more than 1 thread. Then I get even crazier by posting my settings and live streaming the engine test.
The only thing I noticed with your "tests" is that you get a higher draw rate than a typical correspondence chess match of today.
That really makes them super uninteresting for anything.
Testing very strong chess engines is like watching paint dry, unless you love it. I suggest you stick to TCEC.
Jouni wrote: ↑Wed Aug 12, 2020 9:36 pm
Yes, SF NNUE is equal to quadrupling your CPU cores for free. Incredible.
I actually got a result that SFNNUE (a couple days ago) on one thread beat Stockfish 11 on seven threads, at 2' + 1", by 90 to 80! So you may be understating it!