CCC has serious hardware update!

AndrewGrant · Post by **AndrewGrant** » Wed Dec 30, 2020 10:23 pm

mwyoung wrote: ↑Wed Dec 30, 2020 10:19 pm Let me get this right. I posted data, you chimed in and gave a B.S. formula that NNUE clearly ignored in my posted data. And saying I was being deceitful.

I post independent data from CCRL. Again showing NNUE ignores your scaling formula. And then you also trash CCRL testing.

And now you want me to gather all the data for you.....

That would have been much wiser to do that first

Not at all. I am running games now to produce an argument that NNUE scales the same as other engines. I am asking you to provide a complete, all-in-one-place layout of the datapoints you believe give weight to your view, and for you to showcase those specific datapoints and explain why you believe they are the way they are.

I am doing the opposite of what you allege. I want to know exactly what your argument is, so that I can understand your position completely. If I know your position, and what fuels it, perhaps I will agree. Perhaps I won't be able to refute it. Who knows? Not me, at least not until my games finish playing and you present an argument.

Please copy and paste the data that you are looking at here, and write up a brief summary of what it shows.

brianr · Post by **brianr** » Wed Dec 30, 2020 10:34 pm

mwyoung wrote: ↑Wed Dec 30, 2020 8:21 pm Interesting, ignoring the error bars, that you are suggesting SF NNUE scales better than SF11 as you add more cores.

I never said anything other than SF-NNUE scales just fine, and I highlighted that the error bars are indeed quite large.

Now, it is possible that SF-NNUE actually does scale better than SF-11 because its "eval" (the NNUE net) is vastly better than the HCE (hand-crafted eval) and SF is an A/B engine that does extreme pruning, so slightly better move ordering would enable more accurate deeper searches. I certainly don't have the h/w resources or the inclination to bother testing that. In any case, I see no evidence that it scales significantly worse than SF-11.

Uri Blass · Post by **Uri Blass** » Thu Dec 31, 2020 12:38 am

brianr wrote: ↑Wed Dec 30, 2020 10:34 pm
mwyoung wrote: ↑Wed Dec 30, 2020 8:21 pm Interesting, ignoring the error bars, that you are suggesting SF NNUE scales better than SF11 as you add more cores.
I never said anything other than SF-NNUE scales just fine, and I highlighted that the error bars are indeed quite large.

Now, it is possible that SF-NNUE actually does scale better than SF-11 because its "eval" (the NNUE net) is vastly better than the HCE (hand-crafted eval) and SF is an A/B engine that does extreme pruning, so slightly better move ordering would enable more accurate deeper searches. I certainly don't have the h/w resources or the inclination to bother testing that. In any case, I see no evidence that it scales significantly worse than SF-11.

I also saw no evidence, and I can add that to claim better scaling you need to start from equal playing strength and show that the classical engine gets the bigger improvement.

If you claim that it holds only at relatively long time controls, then you can start with a relatively long time control.

Start with time controls at which the NNUE engine scores 50% against the classical engine.
If you need it, you can give the NNUE engine 10 minutes for the whole game plus 10 seconds per move (10+10) and the classical engine 200+200 (with pondering off); if that is not enough, also give the classical engine more cores.

Then increase the time controls to 50+50 for NNUE and 1000+1000 for the classical engine.

If the classical engine scores significantly better than 50% at the new time controls, then you have proved your point.

Milos · Post by **Milos** » Thu Dec 31, 2020 12:41 am

Uri Blass wrote: ↑Thu Dec 31, 2020 12:38 am
brianr wrote: ↑Wed Dec 30, 2020 10:34 pm
mwyoung wrote: ↑Wed Dec 30, 2020 8:21 pm Interesting, ignoring the error bars, that you are suggesting SF NNUE scales better than SF11 as you add more cores.
I never said anything other than SF-NNUE scales just fine, and I highlighted that the error bars are indeed quite large.

Now, it is possible that SF-NNUE actually does scale better than SF-11 because its "eval" (the NNUE net) is vastly better than the HCE (hand-crafted eval) and SF is an A/B engine that does extreme pruning, so slightly better move ordering would enable more accurate deeper searches. I certainly don't have the h/w resources or the inclination to bother testing that. In any case, I see no evidence that it scales significantly worse than SF-11.
I also saw no evidence, and I can add that to claim better scaling you need to start from equal playing strength and show that the classical engine gets the bigger improvement.

If you claim that it holds only at relatively long time controls, then you can start with a relatively long time control.

Start with time controls at which the NNUE engine scores 50% against the classical engine.
If you need it, you can give the NNUE engine 10 minutes for the whole game plus 10 seconds per move (10+10) and the classical engine 200+200 (with pondering off); if that is not enough, also give the classical engine more cores.

Then increase the time controls to 50+50 for NNUE and 1000+1000 for the classical engine.

If the classical engine scores significantly better than 50% at the new time controls, then you have proved your point.

We are discussing SMP scaling, you are totally off topic here.

Uri Blass · Post by **Uri Blass** » Thu Dec 31, 2020 12:50 am

Milos wrote: ↑Thu Dec 31, 2020 12:41 am
Uri Blass wrote: ↑Thu Dec 31, 2020 12:38 am
brianr wrote: ↑Wed Dec 30, 2020 10:34 pm
mwyoung wrote: ↑Wed Dec 30, 2020 8:21 pm Interesting, ignoring the error bars, that you are suggesting SF NNUE scales better than SF11 as you add more cores.
I never said anything other than SF-NNUE scales just fine, and I highlighted that the error bars are indeed quite large.

Now, it is possible that SF-NNUE actually does scale better than SF-11 because its "eval" (the NNUE net) is vastly better than the HCE (hand-crafted eval) and SF is an A/B engine that does extreme pruning, so slightly better move ordering would enable more accurate deeper searches. I certainly don't have the h/w resources or the inclination to bother testing that. In any case, I see no evidence that it scales significantly worse than SF-11.
I also saw no evidence, and I can add that to claim better scaling you need to start from equal playing strength and show that the classical engine gets the bigger improvement.

If you claim that it holds only at relatively long time controls, then you can start with a relatively long time control.

Start with time controls at which the NNUE engine scores 50% against the classical engine.
If you need it, you can give the NNUE engine 10 minutes for the whole game plus 10 seconds per move (10+10) and the classical engine 200+200 (with pondering off); if that is not enough, also give the classical engine more cores.

Then increase the time controls to 50+50 for NNUE and 1000+1000 for the classical engine.

If the classical engine scores significantly better than 50% at the new time controls, then you have proved your point.
We are discussing SMP scaling, you are totally off topic here.

If the claim is only about SMP, then you can still use the same idea with unequal time controls.

Suppose 10+10 with 1 core for NNUE gives the same playing strength as 200+200 with 1 core for some classical engine.
Use the same unequal time controls but with more cores, and show that the classical engine gets more than 50%.

In other words, show that the classical engine at 200+200 with 32 cores beats the NNUE engine at 10+10 with 32 cores.
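
The handicap setup described above could be sketched as a cutechess-cli invocation with a per-engine time control. This is only a sketch: the `conf=` names are borrowed from brianr's match command elsewhere in this thread, the 10+10 and 200+200 figures are the illustrative ones from this post (read as minutes plus seconds of increment), the game count is arbitrary, and it assumes cutechess-cli's per-engine `tc=` and `option.Threads=` settings. The helper name `handicap_match_args` is made up for illustration.

```python
def handicap_match_args(nnue_tc, classical_tc, threads, games=200):
    """Build a cutechess-cli argument list for an unequal-time-control match.

    Both engines get the same thread count; only the time control differs.
    """
    return [
        "cutechess-cli",
        "-engine", "conf=SF-NNUE", f"tc={nnue_tc}", f"option.Threads={threads}",
        "-engine", "conf=SF11", f"tc={classical_tc}", f"option.Threads={threads}",
        "-games", str(games),
        "-repeat",
        # One game at a time, so concurrent games never compete for cores.
        "-concurrency", "1",
    ]

# Step 1: calibrate on 1 core until the match score is close to 50%.
baseline = handicap_match_args("10:00+10", "200:00+200", threads=1)

# Step 2: keep the same time odds and raise the core count for both sides.
# A classical score clearly above 50% here would suggest that it gains
# more from the extra cores than the NNUE engine does.
scaled = handicap_match_args("10:00+10", "200:00+200", threads=32)
print(" ".join(scaled))
```

The calibration step is the expensive part: the time odds have to be re-tuned until the 1-core match is actually level before the multi-core result means anything.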

Milos · Post by **Milos** » Thu Dec 31, 2020 12:59 am

Uri Blass wrote: ↑Thu Dec 31, 2020 12:50 am
Milos wrote: ↑Thu Dec 31, 2020 12:41 am
Uri Blass wrote: ↑Thu Dec 31, 2020 12:38 am
brianr wrote: ↑Wed Dec 30, 2020 10:34 pm
mwyoung wrote: ↑Wed Dec 30, 2020 8:21 pm Interesting, ignoring the error bars, that you are suggesting SF NNUE scales better than SF11 as you add more cores.
I never said anything other than SF-NNUE scales just fine, and I highlighted that the error bars are indeed quite large.

Now, it is possible that SF-NNUE actually does scale better than SF-11 because its "eval" (the NNUE net) is vastly better than the HCE (hand-crafted eval) and SF is an A/B engine that does extreme pruning, so slightly better move ordering would enable more accurate deeper searches. I certainly don't have the h/w resources or the inclination to bother testing that. In any case, I see no evidence that it scales significantly worse than SF-11.
I also saw no evidence, and I can add that to claim better scaling you need to start from equal playing strength and show that the classical engine gets the bigger improvement.

If you claim that it holds only at relatively long time controls, then you can start with a relatively long time control.

Start with time controls at which the NNUE engine scores 50% against the classical engine.
If you need it, you can give the NNUE engine 10 minutes for the whole game plus 10 seconds per move (10+10) and the classical engine 200+200 (with pondering off); if that is not enough, also give the classical engine more cores.

Then increase the time controls to 50+50 for NNUE and 1000+1000 for the classical engine.

If the classical engine scores significantly better than 50% at the new time controls, then you have proved your point.
We are discussing SMP scaling, you are totally off topic here.
If the claim is only about SMP, then you can still use the same idea with unequal time controls.

Suppose 10+10 with 1 core for NNUE gives the same playing strength as 200+200 with 1 core for some classical engine.
Use the same unequal time controls but with more cores, and show that the classical engine gets more than 50%.

In other words, show that the classical engine at 200+200 with 32 cores beats the NNUE engine at 10+10 with 32 cores.

Yeah, everyone knows what needs to be done, but no one ever does it because it is extremely impractical. Can you tell me exactly 2 TCs at which SF-NNUE and SF-classical are equal on a single core? Can anyone?
And then just setting up and playing a tournament with asymmetrical TCs is a pain in the ass.

mwyoung · Post by **mwyoung** » Thu Dec 31, 2020 1:12 am

brianr wrote: ↑Wed Dec 30, 2020 8:02 pm it was SF-NNUE v SF-11

8 CPUs
# PLAYER :  RATING  ERROR  POINTS  PLAYED   (%)    W    D    L  D(%)  CFS(%)
1 SF-NNUE:     118     89    21.0      33  63.6   13   16    4  48.5     100
2 SF11   :       0   ----    12.0      33  36.4    4   16   13  48.5     ---

4 CPUs
# PLAYER :  RATING  ERROR  POINTS  PLAYED   (%)    W    D    L  D(%)  CFS(%)
1 SF-NNUE:      58     44    58.0     100  58.0   27   62   11  62.0     100
2 SF11   :       0   ----    42.0     100  42.0   11   62   27  62.0     ---

1 CPU
# PLAYER :  RATING  ERROR  POINTS  PLAYED   (%)    W    D    L  D(%)  CFS(%)
1 SF-NNUE:      45     19   281.5     500  56.3  128  307   65  61.4     100
2 SF11   :       0   ----   218.5     500  43.7   65  307  128  61.4     ---

Pretty high error bars, but still 100% CFS, FWIW
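
The ratings and error bars in the tables above can be roughly sanity-checked from the W/D/L counts alone. What follows is a minimal sketch, assuming the plain logistic Elo model and a normal approximation for the match score; the rating tool that produced the tables may fit things differently, so its numbers will not be reproduced exactly. The function names are made up for illustration.

```python
import math

def score_to_elo(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_from_wdl(wins, draws, losses, z=1.96):
    """Return (elo, elo_low, elo_high) for one side of a head-to-head match."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    # Clamp so the logistic conversion stays defined for lopsided results.
    low = max(score - margin, 1e-6)
    high = min(score + margin, 1.0 - 1e-6)
    return score_to_elo(score), score_to_elo(low), score_to_elo(high)

# W/D/L for SF-NNUE, taken from the three tables above.
results = {
    "1 CPU": (128, 307, 65),
    "4 CPUs": (27, 62, 11),
    "8 CPUs": (13, 16, 4),
}
for label, wdl in results.items():
    elo, low, high = elo_from_wdl(*wdl)
    print(f"{label}: {elo:+.0f} Elo (95% interval {low:+.0f} to {high:+.0f})")
```

Under these assumptions the three intervals overlap, so these samples alone cannot separate "scales better" from "scales the same".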

I started testing and, as expected from your data (I have done much testing on this), we are seeing a serious disconnect!

You show SF NNUE on 1 core beating SF11 on 1 core by only 45 Elo.

Is this correct? If so, you have something wrong, or everyone's testing to this point is bunk.

brianr · Post by **brianr** » Thu Dec 31, 2020 1:50 am

This was the match command used for the 1 cpu games (IIRC):

cutechess-cli -each tc=0:20+1.0 restart=on -engine conf=SF11 -engine conf=SF-NNUE -games 12000 -wait 1000 -ratinginterval 10 -tb E:/syzygy -tbpieces 6 -openings file=D:/Cutechess-cli/book_3moves_cp1-24_10944pos.pgn start=1 -repeat -concurrency 12 -pgnout book_3moves_cp1-24_10944pos-SF11_v_SF-NNUE-gekkehenker2020-06-27.pgn

mwyoung · Post by **mwyoung** » Thu Dec 31, 2020 1:55 am

brianr wrote: ↑Thu Dec 31, 2020 1:50 am This was the match command used for the 1 cpu games (IIRC):

cutechess-cli -each tc=0:20+1.0 restart=on -engine conf=SF11 -engine conf=SF-NNUE -games 12000 -wait 1000 -ratinginterval 10 -tb E:/syzygy -tbpieces 6 -openings file=D:/Cutechess-cli/book_3moves_cp1-24_10944pos.pgn start=1 -repeat -concurrency 12 -pgnout book_3moves_cp1-24_10944pos-SF11_v_SF-NNUE-gekkehenker2020-06-27.pgn

I got a TC from you, nice. But there is no way your results can be correct if this was SF NNUE.

brianr · Post by **brianr** » Thu Dec 31, 2020 1:58 am

Engine was stockfish.avx2.halfkp_256x2-32-32.profile-nnue.2020-07-19.exe
Net was gekkehenker2020-06-27.bin
