LC0 vs. NNUE - some tech details...

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

User avatar
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: LC0 vs. NNUE - some tech details...

Post by Ovyron »

Milos wrote: Thu Jul 30, 2020 2:08 am Since the hand-crafted Komodo eval was better than the SF eval, the only thing one can expect is that Komodo NNUE will not benefit as much from an NNUE eval as Stockfish does
Oh, it will. We're comparing Komodo's static eval to Komodo's eval at depth 8, or whatever they train it with (if they do). The better the eval, the better the jump, because all the net does is get there faster. Stockfish NNUE still uses Stockfish's search; the Komodo team could optimize their search to make use of NNUE. Komodo 14 is just 70 elo behind Stockfish, and since NNUE without search optimization is giving that boost to Stockfish, getting a Komodo NNUE to top the rating lists is finally there for the taking.

But they need to be quick: if they don't implement anything by the time they pass today's Stockfish dev, the merge will have happened and nobody will care, because Komodo NNUE will remain 70 elo behind whatever the merge produces.
Leo
Posts: 1080
Joined: Fri Sep 16, 2016 6:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Re: LC0 vs. NNUE - some tech details...

Post by Leo »

smatovic wrote: Wed Jul 29, 2020 9:33 am I am a noob in neural networks and their implementation, so others may wish to correct me
or add something; anyway, here it is, since it may come up repeatedly...

- LC0 uses CNNs, Convolutional Neural Networks, for position evaluation
- NNUE is currently a kind of MLP, Multi-Layer Perceptron, with incremental updates for the first layer

- A0 originally used about 50 million neural network weights
- NNUE currently uses about 10 million weights, or more, depending on net size

- LC0 uses an MCTS-PUCT search
- NNUE uses the Alpha-Beta search of its "host" engine

- LC0 uses the Zero approach, with Reinforcement Learning on a GPU cloud cluster
- NNUE uses initial RL with the addition of SL, Supervised Learning, via engine-engine games

- LC0 runs the NN part well on GPU (up to hundreds of vector units) via batches
- NNUE runs on the vector unit of the CPU (SSE, AVX, NEON), no batches needed

Because NNUE runs a smaller kind of NN efficiently on a CPU, it gains more NPS in an
AB search than previous approaches like Giraffe. You can view it as combining both
worlds, the LC0 NN part and the SF AB search part, on a CPU; a rough sketch of such
a net follows below.
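
For illustration, here is a minimal sketch of such a net in C++. The feature count, layer widths, and quantization are my own assumptions, loosely modeled on published HalfKP descriptions, not the actual Stockfish code:

Code: Select all

#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Illustrative sizes: 41024 * 256 first-layer weights is roughly the
// "about 10 million weights" mentioned above.
constexpr int kInputs = 41024; // sparse (king, piece, square) features, HalfKP-like
constexpr int kAcc    = 256;   // first-layer width, the "accumulator"
constexpr int kDense  = 32;    // width of the small dense layers

std::vector<int16_t> W1((size_t)kInputs * kAcc); // almost all weights live here
std::array<int16_t, kAcc> b1{};
std::array<std::array<int8_t, kAcc>, kDense> W2{};
std::array<int32_t, kDense> b2{};
std::array<int8_t, kDense> W3{};
int32_t b3 = 0;

using Accumulator = std::array<int16_t, kAcc>;

// Full refresh: only a few dozen input features are active per position,
// so the huge matrix-vector product collapses to summing a few columns.
Accumulator refresh(const std::vector<int>& active_features) {
    Accumulator acc = b1; // start from the bias
    for (int f : active_features)
        for (int i = 0; i < kAcc; ++i)
            acc[i] += W1[(size_t)f * kAcc + i];
    return acc;
}

// The rest of the net is tiny, so it is cheap per node; these loops
// vectorize well on the CPU with SSE/AVX/NEON.
int32_t evaluate(const Accumulator& acc) {
    std::array<int8_t, kAcc> h1;
    for (int i = 0; i < kAcc; ++i) // clipped ReLU
        h1[i] = (int8_t)std::clamp<int>(acc[i], 0, 127);

    std::array<int8_t, kDense> h2;
    for (int j = 0; j < kDense; ++j) {
        int32_t s = b2[j];
        for (int i = 0; i < kAcc; ++i) s += W2[j][i] * h1[i];
        h2[j] = (int8_t)std::clamp<int>(s / 64, 0, 127); // illustrative rescaling
    }

    int32_t out = b3;
    for (int j = 0; j < kDense; ++j) out += W3[j] * h2[j];
    return out; // centipawn-like score for the side to move
}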

--
Srdja
Who are the geniuses who invented NNUE?
Advanced Micro Devices fan.
smatovic
Posts: 2647
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: LC0 vs. NNUE - some tech details...

Post by smatovic »

Leo wrote: Sat Aug 08, 2020 5:12 am
smatovic wrote: Wed Jul 29, 2020 9:33 am ...
Who are the geniuses who invented NNUE?
Yeah, it is really an interesting kind of "trick".

The first NN layer is overparametrized, that is where most of the weights are,
but it is computed efficiently via incremental updates during make/unmake move.
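
For illustration, a minimal sketch of that update in C++, assuming the hypothetical W1 / kAcc / Accumulator definitions from the sketch quoted above; this is the idea, not the actual SF implementation:

Code: Select all

// A quiet move removes one (piece, from-square) feature and adds one
// (piece, to-square) feature; captures touch one more. (A king move
// invalidates all features of that perspective in HalfKP-like schemes
// and forces a full refresh() instead.)
void update_accumulator(Accumulator& acc,
                        const std::vector<int>& removed,
                        const std::vector<int>& added) {
    for (int f : removed)
        for (int i = 0; i < kAcc; ++i)
            acc[i] -= W1[(size_t)f * kAcc + i]; // subtract old feature columns
    for (int f : added)
        for (int i = 0; i < kAcc; ++i)
            acc[i] += W1[(size_t)f * kAcc + i]; // add new feature columns
}
// On unmake move, apply the same call with removed/added swapped, or
// restore a copy of the accumulator saved on the search stack.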

Yu Nasu came up with NNUE for Shogi in 2018:

https://www.chessprogramming.org/NNUE#cite_note-3

It was used successfully in several Shogi engines.

Then Hisayori Noda (Nodchip) ported NNUE to SF in 2019 as a proof of concept,
and it simply took off in 2020 with the help of Henk Drost (Raphexon) et al.

https://www.chessprogramming.org/Hisayori_Noda

https://www.chessprogramming.org/NNUE#Stockfish_NNUE

--
Srdja
Modern Times
Posts: 3549
Joined: Thu Jun 07, 2012 11:02 pm

Re: LC0 vs. NNUE - some tech details...

Post by Modern Times »

Ovyron wrote: Thu Jul 30, 2020 7:02 am
But they need to be quick: if they don't implement anything by the time they pass today's Stockfish dev, the merge will have happened and nobody will care, because Komodo NNUE will remain 70 elo behind whatever the merge produces.
Maybe they have been working on it already for some time, who knows. Being a commercial engine, they don't shout about what they are doing, and you have no idea what is going on behind the scenes. But I suspect not.
User avatar
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: LC0 vs. NNUE - some tech details...

Post by Ovyron »

Modern Times wrote: Sat Aug 08, 2020 10:08 am
Ovyron wrote: Thu Jul 30, 2020 7:02 am ...
Maybe they have been working on it already for some time, who knows. Being a commercial engine, they don't shout about what they are doing, and you have no idea what is going on behind the scenes. But I suspect not.
Too late already, Stockfish-dev is now 150 elo stronger than Stockfish-dev from before the merge (2 days ago!). Those 150 elo were there for the taking: Komodo could have made an emergency release and been the undisputed #1 engine. Though I guess nobody would have cared anyway, since progress is happening much faster than people can run meaningful tests.

Stockfish has made the progress of the next 3 years in 1 month.
Leo
Posts: 1080
Joined: Fri Sep 16, 2016 6:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Re: LC0 vs. NNUE - some tech details...

Post by Leo »

smatovic wrote: Sat Aug 08, 2020 9:54 am
Leo wrote: Sat Aug 08, 2020 5:12 am Who are the geniuses who invented NNUE?
...
Thanks.
Advanced Micro Devices fan.
Rom77
Posts: 45
Joined: Wed Oct 24, 2018 7:37 am
Full name: Roman Zhukov

Re: LC0 vs. NNUE - some tech details...

Post by Rom77 »

smatovic wrote: Sat Aug 08, 2020 9:54 am
Leo wrote: Sat Aug 08, 2020 5:12 am Who are the geniuses who invented NNUE?
...

Yu Nasu came up with NNUE for Shogi in 2018:

https://www.chessprogramming.org/NNUE#cite_note-3
It seems a similar evaluation function was used back in 2005 in the Bonanza program, though it was linear (?):
http://yaneuraou.yaneu.com/2020/05/03/% ... e3%81%ae1/
http://yaneuraou.yaneu.com/2020/05/27/% ... e3%81%ae2/

Of course, I can't say for sure, since I read it through Google Translate.
smatovic
Posts: 2647
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: LC0 vs. NNUE - some tech details...

Post by smatovic »

Rom77 wrote: Sat Aug 08, 2020 1:27 pm
smatovic wrote: Sat Aug 08, 2020 9:54 am ...
It seems a similar evaluation function was used back in 2005 in the Bonanza program, though it was linear (?):
http://yaneuraou.yaneu.com/2020/05/03/% ... e3%81%ae1/
http://yaneuraou.yaneu.com/2020/05/27/% ... e3%81%ae2/

Of course, I can't say for sure, since I read it through Google Translate.
Yes, hard to read; here is the Bonanza PDF the article mentions:

http://www.ipsj.or.jp/10jigyo/forum/sof ... -print.pdf

I can imagine that this incremental update trick was used before by others, maybe also in domains other than game tree search.
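
For a purely linear eval the trick is even simpler: the evaluation is a dot product of weights with sparse 0/1 features, so the incremental update degenerates to adjusting one running scalar. A minimal sketch in C++ (illustrative names, not Bonanza's actual code):

Code: Select all

#include <cstdint>
#include <vector>

std::vector<int32_t> w;  // one weight per board feature
int32_t score = 0;       // running evaluation, kept in sync with the board

// A move only flips a few features on/off, so the dot product is
// maintained by adjusting the running scalar.
void on_make_move(const std::vector<int>& removed, const std::vector<int>& added) {
    for (int f : removed) score -= w[f];
    for (int f : added)   score += w[f];
}
// NNUE does the same bookkeeping on a whole vector (the first-layer
// accumulator) instead of one scalar, and then feeds it through a small
// nonlinear net.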

--
Srdja