Dispelling the Myth of NNUE with LazySMP: An Analysis

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Dispelling the Myth of NNUE with LazySMP: An Analysis

Post by mwyoung »

AndrewGrant wrote: Thu Dec 31, 2020 3:58 am
mwyoung wrote: Thu Dec 31, 2020 3:15 am Stockfish 11 with a classical evaluation obeys Lasko's Law. But assuming that Stockfish 12, a hybrid with the new NN evaluation, would also obey Stockfish's classical pattern was an error. Stockfish 12 does not obey Lasko's Law.
Okay. You posted everything finally :)

So can you explain to me why Ethereal NNUE does not exhibit the same scaling change? My non-NNUE version follows what you've called Grant's Law. But so does my NNUE version. What gives? It's the same evaluation function as Stockfish, in essence.

If your answer is that it's self-play: then I will run Ethereal (non-NNUE) vs Ethereal (NNUE), and scale core counts for the NNUE version. If this returns similar results... what say you?
No, I cannot explain it, other than that it exists.

All I know is to listen to what the data is telling us. And your data may give us an answer, since we tested the same way to show our results.

NNUE has its own way of behaving. And this is not just with SF NNUE, but also Dragon NNUE. What do they have in common that is not in your engine?

And that is why we should not make assumptions about people's honest testing.
Last edited by mwyoung on Thu Dec 31, 2020 4:13 am, edited 1 time in total.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
AndrewGrant
Posts: 1754
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Dispelling the Myth of NNUE with LazySMP: An Analysis

Post by AndrewGrant »

mwyoung wrote: Thu Dec 31, 2020 4:01 am
AndrewGrant wrote: Thu Dec 31, 2020 3:58 am
mwyoung wrote: Thu Dec 31, 2020 3:15 am Stockfish 11 with a classical evaluation obeys Lasko's Law. But assuming that Stockfish 12, a hybrid with the new NN evaluation, would also obey Stockfish's classical pattern was an error. Stockfish 12 does not obey Lasko's Law.
Okay. You posted everything finally :)

So can you explain to me why Ethereal NNUE does not exhibit the same scaling change? My non-NNUE version follows what you've called Grant's Law. But so does my NNUE version. What gives? It's the same evaluation function as Stockfish, in essence.

If your answer is that it's self-play: then I will run Ethereal (non-NNUE) vs Ethereal (NNUE), and scale core counts for the NNUE version. If this returns similar results... what say you?
What are you talking about? You were in the thread this data came from. It was posted on Sun Oct 11, 2020 3:40 pm. :lol:
Having a comment in a thread is not the same thing as following said thread. Can you respond to my questions?
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Dispelling the Myth of NNUE with LazySMP: An Analysis

Post by mwyoung »

AndrewGrant wrote: Thu Dec 31, 2020 4:10 am
mwyoung wrote: Thu Dec 31, 2020 4:01 am
AndrewGrant wrote: Thu Dec 31, 2020 3:58 am
mwyoung wrote: Thu Dec 31, 2020 3:15 am Stockfish 11 with a classical evaluation obeys Lasko's Law. But assuming that Stockfish 12, a hybrid with the new NN evaluation, would also obey Stockfish's classical pattern was an error. Stockfish 12 does not obey Lasko's Law.
Okay. You posted everything finally :)

So can you explain to me why Ethereal NNUE does not exhibit the same scaling change? My non-NNUE version follows what you've called Grant's Law. But so does my NNUE version. What gives? It's the same evaluation function as Stockfish, in essence.

If your answer is that it's self-play: then I will run Ethereal (non-NNUE) vs Ethereal (NNUE), and scale core counts for the NNUE version. If this returns similar results... what say you?
What are you talking about? You were in the thread this data came from. It was posted on Sun Oct 11, 2020 3:40 pm. :lol:
Having a comment in a thread is not the same thing as following said thread. Can you respond to my questions?
I have no answers. I will leave it to you. You are the programmer!

NNUE has its own way of behaving. And this is not just with SF NNUE, but also Dragon NNUE. What do they have in common that is not in your engine?
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
AndrewGrant
Posts: 1754
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Dispelling the Myth of NNUE with LazySMP: An Analysis

Post by AndrewGrant »

mwyoung wrote: Thu Dec 31, 2020 4:18 am I have no answers. I will leave it to you. You are the programmer! NNUE has its own way of behaving. And this is not just with SF NNUE, but also Dragon NNUE. What do they have in common that is not in your engine?
If you have no explanation, how can you attribute what you see to NNUE?
Maybe it's about the Elo ceiling, like I proposed. Maybe it's about something else?
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Dispelling the Myth of NNUE with LazySMP: An Analysis

Post by mwyoung »

AndrewGrant wrote: Thu Dec 31, 2020 4:24 am
mwyoung wrote: Thu Dec 31, 2020 4:18 am I have no answers. I will leave it to you. You are the programmer! NNUE has its own way of behaving. And this is not just with SF NNUE, but also Dragon NNUE. What do they have in common that is not in your engine?
If you have no explanation, how can you attribute what you see to NNUE?
Maybe it's about the Elo ceiling, like I proposed. Maybe it's about something else?
That is what caused the change in the behavior, so this is not hard to deduce.
What changed, what is different? NNUE was the common element.

Unlike most testers, I do not test at micro-bullet and really fast time controls.
And I watch many of the games.

And I also test at many time controls.

I guess this is why I found this issue first.

"I beleive that what other Users are calling a scaling issue in NNUE is actually a live example of the diminishing returns of superior software as one reaches the elo ceiling."

No, I do not think this is the answer, because this cannot be true with a Type B search. And I have shown the mistakes that SF NNUE makes against perfect play.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
AndrewGrant
Posts: 1754
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Dispelling the Myth of NNUE with LazySMP: An Analysis

Post by AndrewGrant »

mwyoung wrote: Thu Dec 31, 2020 4:41 am That is what caused the change in the behavior, so this is not hard to deduce.
What changed, what is different? NNUE was the common element.
Well, there is something else that changed: the Elo of the engines. NNUE is far stronger than non-NNUE. NNUE does not exist in a vacuum, and I don't see how you can assign blame to NNUE for what you perceive as a scaling loss, versus the natural tendency of Elo to compress at the upper levels. By ignoring this, you've essentially discarded all other possible explanations for the phenomenon, which is far from scientific.

Let's assume for a second that you are entirely right that Stockfish does exhibit reduced scaling now that NNUE has been implemented. I have now provided you with a second program, which uses the same SMP algorithm and evaluation method, and this second program (Ethereal) does not exhibit reduced scaling.

If I take your original argument, which is that NNUE is the "common element", then I should arrive at the conclusion that Ethereal should not be scaling as well. But that is a contradiction: Ethereal is scaling well. How do you reconcile your theory with the data I have shared?
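For anyone comparing core-count runs like the ones discussed above, here is a minimal sketch (not code from this thread) of the standard logistic Elo model used to turn a match score into an Elo difference; the win/draw/loss counts in it are hypothetical placeholders, not measured data.

Code:

// elo_from_score.cpp -- convert a match result into an Elo difference using the
// standard logistic model: E = 1 / (1 + 10^(-D/400))  =>  D = 400*log10(E/(1-E)).
// Build: g++ -O2 -o elo_from_score elo_from_score.cpp
#include <cmath>
#include <cstdio>

// Elo difference implied by a win/draw/loss record.
// Not defined for a 0% or 100% score, where the logistic model diverges.
static double elo_from_wdl(double wins, double draws, double losses) {
    const double games = wins + draws + losses;
    const double score = (wins + 0.5 * draws) / games;  // points per game
    return 400.0 * std::log10(score / (1.0 - score));
}

int main() {
    // Hypothetical records: the same engine pair replayed at two thread counts,
    // so the measured Elo gaps can be compared.
    std::printf("1 thread : %+.1f Elo\n", elo_from_wdl(300, 450, 250));
    std::printf("8 threads: %+.1f Elo\n", elo_from_wdl(280, 520, 200));
    return 0;
}

Note that a draw-heavy match pins the score near 0.5, and by this formula a score near 0.5 always converts to a small Elo gap; that is one way the compression between very strong opponents shows up in test results.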
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Dispelling the Myth of NNUE with LazySMP: An Analysis

Post by mwyoung »

AndrewGrant wrote: Thu Dec 31, 2020 5:10 am
mwyoung wrote: Thu Dec 31, 2020 4:41 am That is what caused the change in the behavior, so this is not hard to deduce.
What changed, what is different? NNUE was the common element.
Well, there is something else that changed: the Elo of the engines. NNUE is far stronger than non-NNUE. NNUE does not exist in a vacuum, and I don't see how you can assign blame to NNUE for what you perceive as a scaling loss, versus the natural tendency of Elo to compress at the upper levels. By ignoring this, you've essentially discarded all other possible explanations for the phenomenon, which is far from scientific.

Let's assume for a second that you are entirely right that Stockfish does exhibit reduced scaling now that NNUE has been implemented. I have now provided you with a second program, which uses the same SMP algorithm and evaluation method, and this second program (Ethereal) does not exhibit reduced scaling.

If I take your original argument, which is that NNUE is the "common element", then I should arrive at the conclusion that Ethereal should not be scaling as well. But that is a contradiction: Ethereal is scaling well. How do you reconcile your theory with the data I have shared?
I have no answers! But the results are the results. And NNUE is still the reason, even if we have reached perfect play.

It was you who said +70 Elo even with NNUE, and that I was being deceitful. :lol:

I guess we are beyond that now....

And as I have said to my subs: NNUE is a great thing. You can now have minimal hardware, like a phone, and still have the best.

I love NNUE! Yes, SF and Dragon NNUE scale like crap, but so what? Nothing can touch it!

And remember, I do not have your NNUE program. When it comes out, or if it has already, we will see how your scaling NNUE stacks up.

And I test all the time, and in doing so I see that SF NNUE is stronger today than SF 12. So again, your ceiling theory really does not hold much water.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 5:06 pm

Re: Dispelling the Myth of NNUE with LazySMP: An Analysis

Post by D Sceviour »

AndrewGrant wrote: Wed Dec 30, 2020 11:02 pm Stockfish gained an inordinate amount of strength with the introduction of NNUE, as did all engines which have followed in Stockfish's footsteps.
Not Schooner, which attempted the CFish NNUE port. Schooner suffered a large 40-Elo loss. Work on NNUE has been abandoned for the moment, but I may look at it again in the future.
AndrewGrant
Posts: 1754
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Dispelling the Myth of NNUE with LazySMP: An Analysis

Post by AndrewGrant »

D Sceviour wrote: Thu Dec 31, 2020 5:42 am
AndrewGrant wrote: Wed Dec 30, 2020 11:02 pm Stockfish gained an inordinate amount of strength with the introduction of NNUE, as did all engines which have followed in Stockfish's footsteps.
Not Schooner, which attempted the CFish NNUE port. Schooner suffered a large 40-Elo loss. Work on NNUE has been abandoned for the moment, but I may look at it again in the future.
If you lose Elo with SF's NNUE, surely you must have done something wrong? Bad conditions to run it? No attempts to scale it? Not using AVX(/2) builds? Not using official SF networks?
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 5:06 pm

Re: Dispelling the Myth of NNUE with LazySMP: An Analysis

Post by D Sceviour »

AndrewGrant wrote: Thu Dec 31, 2020 5:43 am
D Sceviour wrote: Thu Dec 31, 2020 5:42 am
AndrewGrant wrote: Wed Dec 30, 2020 11:02 pm Stockfish gained an inordinate amount of strength with the introduction of NNUE, as did all engines which have followed in Stockfish's footsteps.
Not Schooner, which attempted the CFish NNUE port. Schooner suffered a large 40-Elo loss. Work on NNUE has been abandoned for the moment, but I may look at it again in the future.
If you lose Elo with SF's NNUE, surely you must have done something wrong? Bad conditions to run it? No attempts to scale it? Not using AVX(/2) builds? Not using official SF networks?
Official SF nets might be the problem. Since I have not finished my own tuning efforts (who can finish?), I tried:

nn-03744f8d56d8.nnue
nn-82215d0fd0df.nnue
nn-eba324f53044.nnue

AVX is on the agenda to look at. My hardware is supposed to handle it, but does it make that much difference?
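On the AVX question, a minimal sketch (assuming GCC or Clang on x86-64, not a command taken from this thread) that reports whether the host CPU exposes the vector extensions the faster NNUE builds target:

Code:

// avx_check.cpp -- report which SIMD feature sets this CPU supports.
// Build: g++ -O2 -o avx_check avx_check.cpp   (GCC or Clang, x86-64)
#include <cstdio>

int main() {
    __builtin_cpu_init();  // initialise the compiler's CPU feature detection
    std::printf("SSE4.1: %s\n", __builtin_cpu_supports("sse4.1") ? "yes" : "no");
    std::printf("AVX   : %s\n", __builtin_cpu_supports("avx")    ? "yes" : "no");
    std::printf("AVX2  : %s\n", __builtin_cpu_supports("avx2")   ? "yes" : "no");
    return 0;
}

If AVX2 is reported here but the engine binary was built without it, NNUE inference falls back to slower code paths, which can cost a substantial fraction of nodes per second and therefore strength.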