How much work is it to train an NNUE?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Gabor Szots
Posts: 1362
Joined: Sat Jul 21, 2018 7:43 am
Location: Szentendre, Hungary
Full name: Gabor Szots

How much work is it to train an NNUE?

Post by Gabor Szots »

To develop a chess engine usually takes several months, years or even a lifetime. But how much work is it to take an existing engine and replace its NNUE with a different one?
In my naive view, to make an NNUE you collect a huge number of games, determine which features of positions you want to analyze, then let your computer do the rest while you are on holiday. When you return, a new NNUE is waiting for you to use.
Which means, at least for me, that FF2 has taken Stockfish's years of development work and put in a couple of days' work of its own. Which works out to roughly 99% Stockfish, 1% ChessBase.

What is the reality?
Gabor Szots
CCRL testing group
Modern Times
Posts: 3548
Joined: Thu Jun 07, 2012 11:02 pm

Re: How much work is it to train an NNUE?

Post by Modern Times »

With regard to Fat Fritz 2, the reality is that Albert said this:

"The final two months used more than 700 Xeon Platinum threads 24/7"

That is far more than just leaving your home machine running while you go on holiday for a couple of weeks. You vastly underestimate the amount of work Albert and ChessBase put into this.
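Just to put that figure in rough perspective (back-of-the-envelope only; the thread count and two-month duration are taken from Albert's quote above, and the 16-thread desktop used for comparison is purely an illustrative assumption):

Code: Select all

# Rough scale of "more than 700 Xeon Platinum threads 24/7" over two months.
# Thread count and duration come from the quote above; the desktop comparison
# is only an illustration.
threads = 700
days = 60
thread_hours = threads * 24 * days                     # 1,008,000 thread-hours
desktop_threads = 16
desktop_years = thread_hours / (desktop_threads * 24 * 365)
print(f"{thread_hours:,} thread-hours ~= {desktop_years:.1f} years on a 16-thread desktop")
# -> 1,008,000 thread-hours ~= 7.2 years on a 16-thread desktop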
Graham Banks
Posts: 41433
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: How much work is it to train an NNUE?

Post by Graham Banks »

Modern Times wrote: Thu Feb 11, 2021 10:48 am With regard to Fat Fritz 2, the reality is that Albert said this:

"The final two months used more than 700 Xeon Platinum threads 24/7"

That is far more than just leaving your home machine running while you go on holiday for a couple of weeks. You vastly underestimate the amount of work Albert and ChessBase put into this.
I regard Albert as a good friend (we chat on messenger often), and I believe exactly what he has said.
He has been openly honest about Fat Fritz 2.
gbanksnz at gmail.com
Gabor Szots
Posts: 1362
Joined: Sat Jul 21, 2018 7:43 am
Location: Szentendre, Hungary
Full name: Gabor Szots

Re: How much work is it to train an NNUE?

Post by Gabor Szots »

I'd prefer the opinion of NNUE experts. How much human work and how much CPU work is involved? It is irrelevant how many threads you use; with fewer threads you just have to wait longer for the computer to finish, but the human input does not change, IMO.
Gabor Szots
CCRL testing group
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: How much work is it to train an NNUE?

Post by xr_a_y »

* "Computer work"

For Minic it takes around 2 weeks on 8-core hardware to generate 2B sfens scored at depth 8.
Then, using a GTX 1060 for the training process, it takes roughly a week to generate some nets.
After that, another week of testing to extract the best nets from the many generated.

So I'd say around a month per net for Minic.
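
Taking those figures at face value, a rough throughput sanity check (the numbers below are just the ones quoted above, nothing more):

Code: Select all

# Implied generation throughput: 2B sfens in ~2 weeks on 8 cores, scored at depth 8.
positions = 2_000_000_000
seconds = 14 * 24 * 3600
cores = 8
total_rate = positions / seconds          # positions per second across all cores
per_core = total_rate / cores
print(f"~{total_rate:,.0f} positions/s total, ~{per_core:,.0f} per core")
# -> ~1,653 positions/s total, ~207 per core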

* "Human brain work"

If you are using the SF implementation (from SF itself or from an available separate lib), integrating the code is simple (less than a day).
Building your own is not that hard either.

What really takes time is searching for other (better?) architectures. This is a lot of trial and error.
There is really a lot of stuff to test!

Just for the trainer, have a look at the Stockfish team's effort to build, configure and tune the pytorch trainer (https://docs.google.com/document/d/1UJe ... aYpBE/edit)

About architecture, there is so much to try! Size of layers, connections between layers, quantization, activation functions, ...
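
To give an idea of what those knobs look like, here is a minimal, purely illustrative NNUE-style net in pytorch. This is not Minic's or Stockfish's actual architecture; the feature count and layer sizes are just example values.

Code: Select all

# Toy NNUE-style network: a big sparse input layer ("feature transformer"),
# a couple of small dense layers, and clipped-ReLU activations (chosen so the
# weights quantize well to small integers). All sizes here are example values.
import torch
import torch.nn as nn

NUM_FEATURES = 41024   # e.g. a HalfKP-sized input; an assumption for this sketch
ACC_SIZE = 256         # accumulator width per perspective

class ToyNNUE(nn.Module):
    def __init__(self):
        super().__init__()
        self.ft = nn.Linear(NUM_FEATURES, ACC_SIZE)  # feature transformer (one side)
        self.l1 = nn.Linear(2 * ACC_SIZE, 32)        # both perspectives concatenated
        self.l2 = nn.Linear(32, 32)
        self.out = nn.Linear(32, 1)

    def forward(self, white_features, black_features):
        w = torch.clamp(self.ft(white_features), 0.0, 1.0)   # clipped ReLU
        b = torch.clamp(self.ft(black_features), 0.0, 1.0)
        x = torch.cat([w, b], dim=1)
        x = torch.clamp(self.l1(x), 0.0, 1.0)
        x = torch.clamp(self.l2(x), 0.0, 1.0)
        return self.out(x)

Every one of those choices (accumulator width, number of hidden layers, activation, how the two perspectives are combined, the quantization scheme) interacts with inference speed, so each architecture idea means another full train-and-test cycle.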


To go in your direction, I'd say that the NNUE generation and testing process takes more time than the standard evaluation tuning process, so with the same hardware you tend to try fewer things, develop less, and indeed "train" more. But this is not a "choice", it is a hardware (in fact price ...) limitation. This is why I have started renting hardware in the cloud, but it is an expense! Maybe I'll consider going in the "Patreon" direction soon ...
Gabor Szots
Posts: 1362
Joined: Sat Jul 21, 2018 7:43 am
Location: Szentendre, Hungary
Full name: Gabor Szots

Re: How much work is it to train an NNUE?

Post by Gabor Szots »

Thanks Vivien.
Gabor Szots
CCRL testing group
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: How much work is it to train an NNUE?

Post by Milos »

Gabor Szots wrote: Thu Feb 11, 2021 8:58 am To develop a chess engine usually takes several months, years or even a lifetime. But how much work is it to take an existing engine and replace its NNUE with a different one?
In my naive view, to make an NNUE you collect a huge number of games, determine which features of positions you want to analyze, then let your computer do the rest while you are on holiday. When you return, a new NNUE is waiting for you to use.
Which means, at least for me, that FF2 has taken Stockfish's years of development work and put in a couple of days' work of its own. Which works out to roughly 99% Stockfish, 1% ChessBase.

What is the reality?
Much, much less than what Alberto Plata or his German PR are trying to make you believe in order to sell their snake oil.
Modern Times
Posts: 3548
Joined: Thu Jun 07, 2012 11:02 pm

Re: How much work is it to train an NNUE?

Post by Modern Times »

It is a case of "how long is a piece of string"

You ask the question, "how much effort is it to create a chess engine?" Well, the answer for a 1000 Elo engine is very different from the answer for a 3600 Elo engine. I know it isn't the same, but how many tens of thousands of hours have gone into Lc0 nets, for example? That is a process without an end.
dkappe
Posts: 1631
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: How much work is it to train an NNUE?

Post by dkappe »

Milos wrote: Thu Feb 11, 2021 12:49 pm
Gabor Szots wrote: Thu Feb 11, 2021 8:58 am To develop a chess engine usually takes several months, years or even a lifetime. But how much work is it to take an existing engine and replace its NNUE with a different one?
In my naive view, to make an NNUE you collect a huge number of games, determine which features of positions you want to analyze, then let your computer do the rest while you are on holiday. When you return, a new NNUE is waiting for you to use.
Which means, at least for me, that FF2 has taken Stockfish's years of development work and put in a couple of days' work of its own. Which works out to roughly 99% Stockfish, 1% ChessBase.

What is the reality?
Much, much less than what Alberto Plata or his German PR are trying to make you believe in order to sell their snake oil.
Angry man is back! :D I’ve missed you, Milos.

For Night Nurse, do you count the time to train the Bad Gyal networks from a blend of human and ab data? Then a year or more. Just Night Nurse? 3 months to generate data and months to experiment with various lambda, eta and eval limit rounds to goose a decent net out of mcts/nn data, which has a different training characteristic from ab data. Lots of failed experiments around RL with Night Nurse. How do I count that?

With a simple ab engine like Toga II, it's fairly easy by comparison to produce an NNUE. Data generation depends on CPU resources, and then 2-3 rounds of training with various lambda and eta settings can take 2-3 weeks, once you know what you're doing.
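
For anyone wondering what those knobs do: eta is just the learning rate, and lambda blends the search score with the game result to form the training target. A rough sketch of the usual convention (the 600-centipawn scaling constant is only an illustrative assumption):

Code: Select all

# "lambda": target = lambda * sigmoid(eval) + (1 - lambda) * game_result
import torch

def blended_target(search_eval_cp, game_result, lam, scale=600.0):
    """search_eval_cp: search score in centipawns; game_result: 1 / 0.5 / 0
    from the side to move's point of view. lam=1.0 trains purely on evals,
    lam=0.0 purely on game results."""
    eval_wdl = torch.sigmoid(torch.as_tensor(search_eval_cp, dtype=torch.float32) / scale)
    result = torch.as_tensor(game_result, dtype=torch.float32)
    return lam * eval_wdl + (1.0 - lam) * result

# "eta" is simply the optimizer learning rate, e.g.
# optimizer = torch.optim.SGD(model.parameters(), lr=eta)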

Getting an NNUE to equal or exceed SF nets with RL is a bear, however. Every 5 Elo is an ordeal.

Developing a0lite julia, which combines Night Nurse/ab and Bad Gyal/mcts, is more complicated but less time-consuming.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: How much work is it to train an NNUE?

Post by Milos »

dkappe wrote: Thu Feb 11, 2021 3:08 pm
Milos wrote: Thu Feb 11, 2021 12:49 pm
Gabor Szots wrote: Thu Feb 11, 2021 8:58 am To develop a chess engine usually takes several months, years or even a lifetime. But how much work is it to take an existing engine and replace its NNUE with a different one?
In my naive view, to make an NNUE you collect a huge number of games, determine which features of positions you want to analyze, then let your computer do the rest while you are on holiday. When you return, a new NNUE is waiting for you to use.
Which means, at least for me, that FF2 has taken Stockfish's years of development work and put in a couple of days' work of its own. Which works out to roughly 99% Stockfish, 1% ChessBase.

What is the reality?
Much, much less than what Alberto Plata or his German PR are trying to make you believe in order to sell their snake oil.
Angry man is back! :D I’ve missed you, Milos.

For Night Nurse, do you count the time to train the Bad Gyal networks from a blend of human and ab data? Then a year or more. Just Night Nurse? 3 months to generate data and months to experiment with various lambda, eta and eval limit rounds to goose a decent net out of mcts/nn data, which has a different training characteristic from ab data. Lots of failed experiments around RL with Night Nurse. How do I count that?

With a simple ab engine like Toga II, it's fairly easy by comparison to produce an NNUE. Data generation depends on CPU resources, and then 2-3 rounds of training with various lambda and eta settings can take 2-3 weeks, once you know what you're doing.

Getting an NNUE to equal or exceed SF nets with RL is a bear, however. Every 5 Elo is an ordeal.

Developing a0lite julia, which combines Night Nurse/ab and Bad Gyal/mcts, is more complicated but less time-consuming.
You see, the amount of electricity you burn and the time you spend tweaking parameters is not proportional to the end result you get, i.e. the strength of the net. First, no one made you pursue that hopeless direction of using human data. That's just a monumental waste of time and resources. Second, poking at different parameters randomly using "intuition" is very far from the optimal way of doing things. I know the people training both Leela and NNUE nets are mostly hobbyists and at best master's students at the start of their ML careers. But there are certainly better ways, like AutoAI.

Finally, I agree it is very hard to equal SF nets, but that is exactly my point. Alberto Plata (if you believe his claims) basically wasted a humongous amount of resources just to end up with a result that is obviously subpar to current SFdev, even using a larger net. And his "contribution" of changing the net architecture is totally trivial.