mwyoung wrote: ↑Wed Dec 30, 2020 10:19 pm
Let me get this right. I posted data, you chimed in and gave a B.S. formula that NNUE clearly ignored in my posted data. And saying I was being deceitful.
I post independent data from CCRL. Again showing NNUE ignores your scaling formula. And then you also trash CCRL testing.
And now you want me to gather all the data for you.....
It would have been much wiser to do that first.
Not at all. I am running games now to produce an argument that NNUE scales the same as other engines. I am asking you to provide a complete, all-in-one-place layout of the datapoints you believe give weight to your view, and for you to showcase those specific datapoints and explain why you believe they are the way they are.
I am doing the opposite of what you allege. I want to know exactly what your argument is, so that I can understand your position completely. If I know your position, and what fuels it, perhaps I will agree. Perhaps I won't be able to refute it. Who knows? Not me, at least not until my games finish playing and you present an argument.
Please copy paste the data that you are looking at here, and write up a brief summary of what is being shown.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra "Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
mwyoung wrote: ↑Wed Dec 30, 2020 8:21 pm
Interesting that you ignore the error bars and suggest that SF NNUE scales better than SF11 as you add more cores.
I never said anything other than SF-NNUE scales just fine, and I highlighted that the error bars are indeed quite large.
Now, it is possible that SF-NNUE actually does scale better than SF-11 because its "eval" (the NNUE net) is vastly better than the HCE (hand-crafted eval) and SF is an A/B engine that does extreme pruning, so slightly better move ordering would enable more accurate deeper searches. I certainly don't have the h/w resources nor the inclination to bother testing that. In any case, I see no evidence that it scales significantly worse than SF-11.
I also saw no evidence. I would add that, to claim better scaling, you need to start from equal playing strength and show that the classical engine gets the bigger improvement.
If you claim that this holds only at relatively long time controls, then you can start with a relatively long time control.
Start with time controls at which the NNUE engine scores 50% against the classical engine.
For example, give the NNUE engine 10 minutes for the whole game + 10 seconds per move (10+10) and the classical engine 200+200, with pondering off. If that is not enough, also give the classical engine more cores.
Then increase the time controls to 50+50 for the NNUE engine and 1000+1000 for the classical engine.
If the classical engine scores significantly better than 50% at the new time controls, then you have proved your point.
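The last step of the procedure above needs a rule for "significantly better than 50%". Here is a minimal sketch of one way to check it, assuming a normal approximation on the per-game results. The function name, the `z=1.96` default (roughly 95% confidence), and the win/draw/loss counts in the usage lines are hypothetical illustrations, not results from this thread.

```python
import math


def score_significantly_above_half(wins, draws, losses, z=1.96):
    """Return (score, lower_bound, significant) for one engine's results.

    A draw counts as half a point. lower_bound is the lower end of a
    normal-approximation confidence interval on the score.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    stderr = math.sqrt(variance / games)
    lower_bound = score - z * stderr
    return score, lower_bound, lower_bound > 0.5


# Hypothetical counts for the classical engine at the longer handicap TC.
score, lower, significant = score_significantly_above_half(30, 60, 10)
print(f"score={score:.3f} lower bound={lower:.3f} significant={significant}")
```

With a high draw rate the per-game variance is small, so fewer games are needed than a plain win/loss count would suggest, but a match of a few dozen games will still rarely clear the bar.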
We are discussing SMP scaling, you are totally off topic here.
If the claim is only about SMP, then you can still use the same idea with unequal time controls.
Suppose 10+10 with 1 core for the NNUE engine gives the same playing strength as 200+200 with 1 core for some classical engine.
Use the same unequal time controls but with more cores, and show that the classical engine scores more than 50%.
In other words, show that the classical engine at 200+200 with 32 cores beats the NNUE engine at 10+10 with 32 cores.
Yeah, everyone knows what needs to be done, but no one ever does it because it is extremely impractical. Can you tell me exactly 2 TCs at which SF-NNUE and SF-classical are equal on a single core? Can anyone?
And then just setting up and playing a tournament with asymmetrical TCs is a pain in the ass.
8 CPUs
   # PLAYER  : RATING  ERROR  POINTS  PLAYED   (%)    W    D    L  D(%)  CFS(%)
   1 SF-NNUE :    118     89    21.0      33  63.6   13   16    4  48.5     100
   2 SF11    :      0   ----    12.0      33  36.4    4   16   13  48.5     ---

4 CPUs
   # PLAYER  : RATING  ERROR  POINTS  PLAYED   (%)    W    D    L  D(%)  CFS(%)
   1 SF-NNUE :     58     44    58.0     100  58.0   27   62   11  62.0     100
   2 SF11    :      0   ----    42.0     100  42.0   11   62   27  62.0     ---

1 CPU
   # PLAYER  : RATING  ERROR  POINTS  PLAYED   (%)    W    D    L  D(%)  CFS(%)
   1 SF-NNUE :     45     19   281.5     500  56.3  128  307   65  61.4     100
   2 SF11    :      0   ----   218.5     500  43.7   65  307  128  61.4     ---
Pretty high error bars, but still 100% CFS, FWIW
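To see how wide those error bars are, here is a rough sketch that recomputes an Elo difference and an approximate 95% interval from the W/D/L counts in the three tables. It assumes the plain logistic Elo formula and a normal approximation on the score. The rating tool that produced the tables may fit ratings differently, so these numbers will not match its RATING and ERROR columns exactly. The function names are my own.

```python
import math


def elo_from_score(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_interval(wins, draws, losses, z=1.96):
    """Return (elo, low, high) from one side's wins, draws and losses."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * score ** 2
    ) / games
    stderr = math.sqrt(variance / games)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), low, high


# W/D/L for SF-NNUE, copied from the tables above, keyed by CPU count.
results = {8: (13, 16, 4), 4: (27, 62, 11), 1: (128, 307, 65)}

for cpus, wdl in results.items():
    elo, low, high = elo_with_interval(*wdl)
    print(f"{cpus} CPU(s): {elo:+.0f} Elo, interval [{low:+.0f}, {high:+.0f}]")
```

Under this approximation the three intervals overlap, so these matches alone cannot tell whether the gap grows, shrinks, or stays flat as cores are added.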
I started testing and, as I expected from your data and from the amount of testing I have done on this, we are seeing a serious disconnect!
You show SF NNUE on 1 core beating SF11 on 1 core by only 45 Elo.
Is this correct? If so, you have something wrong, or everyone's testing to this point is bunk.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
I got a TC from you, nice. But there is no way your results can be correct if this was SF NNUE.