NNUE Research Project

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

NNUE Research Project

Post by Rebel »

NNUE Research Project
March 10, 2021

It's generally known by now that similarity testing on moves does not work with NNUE nets. On this page we will investigate whether it is possible using other methods. One method is to calculate the Root-mean-square deviation (RMS) of the scores instead of the moves; after all, an NNUE net is a set of scores. We will present data and the source code for discussion.
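As a rough illustration of the idea (a minimal sketch, not the project's actual source code; the function and parameter names are mine), the RMS deviation between two engines' centipawn scores over a shared set of positions could be computed like this, skipping extreme scores as discussed later in the thread:

```python
import math

def rms_deviation(scores_a, scores_b, cutoff=999):
    """RMS deviation between two engines' evaluations (in centipawns)
    over the same positions. Positions where either score exceeds the
    cutoff in absolute value are skipped, so near-mate scores do not
    dominate the result."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)
             if abs(a) <= cutoff and abs(b) <= cutoff]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

Two identical evals give an RMS of 0; the larger the RMS, the less similar the two sets of scores.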

Let's start at the beginning: the summer of 2020, the starting point of the NNUE revolution, when the Stockfish team implemented the Sergio nets. Our first goal is to measure the stability of the RMS of Stockfish NNUE nets. From the Sergio nets we calculate the RMS of the very first 3 nets (July) and the last 3 (September) and compare the RMS with the final SF12 net, see table one. In table two the nets between SF12 and SF13 are compared, plus 5 nets after the release of SF13.

....

http://rebel13.nl/home/nnue.html
90% of coding is debugging, the other 10% is writing bugs.
dkappe
Posts: 1631
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: NNUE Research Project

Post by dkappe »

Some thoughts:

1) would it make sense to include other nets trained on other data?
2) I’ve used RMS and other measures to scale some of my nets against sf master. Eliminating eval abs(score) > 1500 yielded better results (with pure nnue).
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: NNUE Research Project

Post by Rebel »

dkappe wrote: Wed Mar 10, 2021 10:21 pm Some thoughts:

1) would it make sense to include other nets trained on other data?
Sure, one false positive and the idea can go down the toilet.
2) I’ve used RMS and other measures to scale some of my nets against sf master. Eliminating eval abs(score) > 1500 yielded better results (with pure nnue).
As you can see from the source code, I skip scores with abs(score) > 999.
AndrewGrant
Posts: 1750
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: NNUE Research Project

Post by AndrewGrant »

Would be interesting to take the following 5 things:

1. Stockfish Master (NNUE off)
2. Stockfish Master (pick net X)
3. "Etherlito" (pick the same net as above)
4. I send you a latest Ethereal with the TCEC Net
5. Ethereal Master (No NNUE)

Presumably, you see high sims with 1 & 2, as well as 3 & 4 & 5 (obviously).
And we can test to compare 2 & 3 (Same Net), vs 2 & 4 (Different Net)
Establish a baseline with 1 & 5 (No Networks)
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
connor_mcmonigle
Posts: 530
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: NNUE Research Project

Post by connor_mcmonigle »

Interesting results. If you're just directly applying RMSE to the raw low-depth evaluations, you're likely picking up mostly on the scale of the evaluation rather than the uniqueness of the evaluations (which could explain the high similarity of the FF2 net, assuming ChessBase's claims about its origins are true). A number of engines scale the output of the network to best match the evaluation scale expected by their search (static null move pruning conditions, etc.). To resolve this, I'd suggest first standardizing (subtract the mean and divide by the standard deviation) the evaluation distributions to Normal(0, 1) before taking the RMSE. This metric should better correspond to what we mean by a unique evaluation function.
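The suggestion above can be sketched as follows (an illustrative snippet, not code from the project; the names are mine): standardize each engine's evaluation list before computing the RMSE, so that a pure rescaling of the eval no longer registers as a difference:

```python
import statistics

def standardized_rmse(scores_a, scores_b):
    # Map each evaluation list to mean 0, standard deviation 1, so the
    # metric reflects the shape of the eval rather than its scale.
    def standardize(xs):
        mu = statistics.fmean(xs)
        sd = statistics.pstdev(xs)
        return [(x - mu) / sd for x in xs]
    za, zb = standardize(scores_a), standardize(scores_b)
    return (sum((a - b) ** 2 for a, b in zip(za, zb)) / len(za)) ** 0.5
```

Note that on standardized data the RMSE and the Pearson correlation carry the same information (RMSE² = 2(1 − r)), which connects this suggestion to the Pearson r approach mentioned below in the thread.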
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: NNUE Research Project

Post by chrisw »

connor_mcmonigle wrote: Thu Mar 11, 2021 1:19 am Interesting results. If you're just directly applying RMSE to the raw low-depth evaluations, you're likely picking up mostly on the scale of the evaluation rather than the uniqueness of the evaluations (which could explain the high similarity of the FF2 net, assuming ChessBase's claims about its origins are true). A number of engines scale the output of the network to best match the evaluation scale expected by their search (static null move pruning conditions, etc.). To resolve this, I'd suggest first standardizing (subtract the mean and divide by the standard deviation) the evaluation distributions to Normal(0, 1) before taking the RMSE. This metric should better correspond to what we mean by a unique evaluation function.
Yup, quite so. When we ran the results, I also programmed in the pearsonr, which I think basically does a scale and translate (slope and intercept) on the data and then measures the deviations. It wasn't presented on Ed's result page because the last time I tried presenting pearsonr data it just got grunted at, but actually it's probably a rather fine similarity measure for CC evals.
For example, two nets where one uses (in effect) 1, 3, 3, 5, 9 for material and the other 2, 6, 6, 10, 18 are basically identical, but RMS won't tell you. Pearsonr will.
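To illustrate the point (a self-contained sketch; the eval values are hypothetical), Pearson r is invariant to scaling and shifting, so an eval and a doubled copy of it correlate perfectly even though their raw RMS deviation is large:

```python
from statistics import fmean

def pearson_r(xs, ys):
    # Pearson correlation: covariance normalized by both standard
    # deviations, hence invariant to scaling or shifting either input.
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical centipawn evals; engine B scores every position twice as high.
evals_a = [100, -50, 300, 0, 520]
evals_b = [2 * x for x in evals_a]   # pearson_r(evals_a, evals_b) is 1.0
```

This is presumably the same computation as SciPy's `scipy.stats.pearsonr`, which returns the coefficient together with a p-value.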

I have only a subset of Ed's data, but here's an inverse sorted list of engine pairs with pearsonr.
0.94, 0.95 and above looks to be the value where quite strong net similarity shows up.

Code:

Pearson r, RMS, engine pair
======================
0.838 108.6 SF12-Igel-280 v SF12-nexus
0.854 105.8 SF12-Igel-270 v SF12-nexus
0.856 104.6 SF12-Igel-280 v SF12-nn-516f5b95189a
0.858 80.5 SF12-nn-803c91ad5c v _SF13
0.858 108.6 SF12-Igel-280 v SF12
0.859 104.7 SF12-Igel-280 v SF12-Minic
0.86 79.5 SF12-nn-803c91ad5c v SF13
0.864 93.7 SF12-nn-803c91ad5c v SF12
0.865 111.1 SF12-Igel-280 v _SF12
0.868 95.5 SF12-nn-803c91ad5c v _SF12
0.868 100.1 SF12-Igel-280 v SF12-nn-dd0c4c630f7e
0.87 101.2 SF12-Igel-280 v SF12-nn-0c6fc5ef48e1
0.871 89.1 SF12-Minic v SF12-nn-803c91ad5c
0.872 105.9 SF12-Igel-270 v SF12
0.873 101.9 SF12-Igel-270 v SF12-Minic
0.874 87.6 SF12-Igel-280 v SF13
0.875 88.5 SF12-Igel-280 v _SF13
0.875 108.8 SF12-Igel-270 v _SF12
0.875 88.1 SF12-nn-516f5b95189a v SF12-nn-803c91ad5c
0.875 101.3 SF12-Igel-270 v SF12-nn-516f5b95189a
0.876 60.7 SF12-nn-803c91ad5c v _FF2
0.881 93.0 SF12-Minic v SF12-nascent
0.883 96.0 SF12-nascent v SF12
0.884 97.3 SF12-Igel-270 v SF12-nn-dd0c4c630f7e
0.885 79.0 SF12-nascent v _SF13
0.885 84.6 SF12-nn-0c6fc5ef48e1 v SF12-nn-803c91ad5c
0.885 83.4 SF12-nn-803c91ad5c v SF12-nn-dd0c4c630f7e
0.886 78.0 SF12-nascent v SF13
0.886 98.3 SF12-Igel-270 v SF12-nn-0c6fc5ef48e1
0.886 85.0 SF12-Igel-270 v SF13
0.886 98.4 SF12-nascent v _SF12
0.887 86.0 SF12-Igel-270 v _SF13
0.89 91.0 SF12-nascent v SF12-nn-516f5b95189a
0.891 85.5 SF12-nexus v SF12-nn-803c91ad5c
0.892 76.2 SF12-nexus v _SF13
0.892 76.5 SF12-nexus v SF13
0.894 91.7 SF12-nascent v SF12-nexus
0.895 62.7 SF12-Igel-280 v _FF2
0.895 88.7 SF12-nascent v SF12-nn-0c6fc5ef48e1
0.895 53.6 SF12-Igel-280 v SF12-nn-803c91ad5c
0.898 78.9 SF12-nexus v _FF2
0.9 51.9 SF12-Igel-270 v SF12-nn-803c91ad5c
0.901 56.1 SF12-nascent v _FF2
0.901 86.3 SF12-nascent v SF12-nn-dd0c4c630f7e
0.902 60.6 SF12-Igel-270 v _FF2
0.905 75.4 SF12-nexus v _SF12
0.905 74.6 SF12-nexus v SF12
0.915 46.1 SF12-nascent v SF12-nn-803c91ad5c
0.916 69.3 SF12-Minic v SF12-nexus
0.922 66.1 SF12-nexus v SF12-nn-dd0c4c630f7e
0.923 65.8 SF12-nexus v SF12-nn-0c6fc5ef48e1
0.923 39.3 SF12-Igel-280 v SF12-nascent
0.925 38.6 SF12-Igel-270 v SF12-nascent
0.925 65.7 SF12-nexus v SF12-nn-516f5b95189a
0.928 69.6 SF12-Minic v _FF2
0.929 72.2 SF12 v _FF2
0.934 73.4 _FF2 v _SF12
0.934 60.6 SF12-Minic v SF13
0.934 67.4 SF12-nn-516f5b95189a v _FF2
0.935 59.6 SF12-Minic v _SF13
0.935 64.8 SF12-nn-dd0c4c630f7e v _FF2
0.938 65.0 SF12-nn-0c6fc5ef48e1 v _FF2
0.939 56.9 SF12-nn-dd0c4c630f7e v SF13
0.94 59.9 SF12 v SF13
0.941 61.0 SF13 v _SF12
0.941 58.8 SF12 v _SF13
0.942 55.5 SF12-nn-dd0c4c630f7e v _SF13
0.943 56.4 SF12-nn-516f5b95189a v SF13
0.944 55.7 SF12-nn-516f5b95189a v _SF13
0.945 58.9 _SF12 v _SF13
0.945 54.7 SF12-nn-0c6fc5ef48e1 v SF13
0.947 55.1 SF12-Minic v SF12-nn-516f5b95189a
0.947 54.8 SF12-Minic v SF12-nn-0c6fc5ef48e1
0.947 51.8 SF13 v _FF2
0.947 54.2 SF12-Minic v SF12-nn-dd0c4c630f7e
0.948 53.2 SF12-nn-0c6fc5ef48e1 v _SF13
0.948 55.0 SF12-nn-516f5b95189a v SF12
0.949 55.3 SF12-nn-516f5b95189a v _SF12
0.951 53.6 SF12-nn-0c6fc5ef48e1 v SF12
0.951 50.9 _FF2 v _SF13
0.952 52.8 SF12-nn-dd0c4c630f7e v SF12
0.954 26.1 SF12-Igel-270 v SF12-Igel-280
0.954 52.7 SF12-nn-dd0c4c630f7e v _SF12
0.954 52.6 SF12-Minic v _SF12
0.955 51.3 SF12-Minic v SF12
0.955 52.2 SF12-nn-0c6fc5ef48e1 v _SF12
0.956 49.7 SF12-nn-516f5b95189a v SF12-nn-dd0c4c630f7e
0.959 47.8 SF12-nn-0c6fc5ef48e1 v SF12-nn-516f5b95189a
0.96 47.0 SF12-nn-0c6fc5ef48e1 v SF12-nn-dd0c4c630f7e
0.975 33.9 SF13 v _SF13
0.977 37.4 SF12 v _SF12

xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: NNUE Research Project

Post by xr_a_y »

What is called "Minic" here is a net, if I understand correctly. But is it "Napping Nexus" or "Nascent Nutrient"?
The first one is based on SF data, the second on Minic (the engine) data.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: NNUE Research Project

Post by chrisw »

xr_a_y wrote: Thu Mar 11, 2021 12:09 pm What is called "Minic" here is a net if I understand well. But is it "Napping Nexus" or "Nascent Nutrient" ?
The first one is based on SF data, the second on Minic (the engine) ones.
Ed can answer, because the names listed are the filenames he used to save the data. I think, though I'm not entirely sure, he took the NNs and ran them within SF12 and SF13, i.e. he's trying to compare nets, not search. There seem to be three Minic-connected nets, which he called Minic, nascent and nexus. Does that figure?
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: NNUE Research Project

Post by David Carteau »

Rebel wrote: Wed Mar 10, 2021 9:29 pm NNUE Research Project
March 10, 2021

(...)

http://rebel13.nl/home/nnue.html
That's an interesting initiative, thank you !

Just to say a few words about Orion 0.7 (the site states that "Orion 0.7 - From the information on the website the nnue origin of version 0.7 is unclear, the RMS implies a strong correlation with SF12."): the site should mention "Orion 0.7 NNUE" and not "Orion 0.7".

In fact, "Orion 0.7" embeds a traditional hand-crafted evaluation function, the latter being entirely replaced by the NNUE network of SF12 (hence the high similarity observed) during my experiments to implement my own NNUE inference code. The resulting "experimental" version of Orion was then named "Orion 0.7 NNUE".

Kind regards from France ;)
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: NNUE Research Project

Post by xr_a_y »

chrisw wrote: Thu Mar 11, 2021 12:58 pm
xr_a_y wrote: Thu Mar 11, 2021 12:09 pm What is called "Minic" here is a net if I understand well. But is it "Napping Nexus" or "Nascent Nutrient" ?
The first one is based on SF data, the second on Minic (the engine) ones.
Ed can answer, because the names listed are the filenames he used to save the data. I think, though I'm not entirely sure, he took the NNs and ran them within SF12 and SF13, i.e. he's trying to compare nets, not search. There seem to be three Minic-connected nets, which he called Minic, nascent and nexus. Does that figure?
OK, I missed the "nexus" and "nascent" in here. So my guess is that the net called "Minic" here is something else. Maybe the good SV net I was using at first.