crem wrote: ↑Sun Sep 15, 2019 9:20 pm
Now, I'm continuing to recount my March correspondence with TCEC, this time regarding the other two parts of the TCEC uniqueness rule: engine and neural network.
Here the intention was good, but the interpretation is quite bent.
Let’s go to our favorite example, Allie+Stein.
They use Lc0's NN architecture (together with the entire Lc0 NN backend).
NN architecture is something distinctive to an engine (see below); it should be considered part of the neural network, or part of the engine (or both). In TCEC's interpretation, it's neither!
- Neural network
Somehow NN architecture is not part of a Neural Network.
So (I guess Albert Silver started this with "DeusX", and now continues it with "Fat Fritz", by the way) people somehow use "neural network weights" and "neural network" interchangeably.
In my opinion, since they share exactly the same architecture in every detail, all those DeusX, Einstein, etc. nets are not different neural networks. They are the same neural network, trained differently. It's just like retuning all of Stockfish's constants to make it play stronger/sharper/more-like-a-human/whatever: you wouldn't say you created a unique evaluation function, you'd say you tuned the existing one. Different attempts to write an NN-based engine lead to differences in architecture, both internal and in inputs/outputs. Yes, all current attempts are similar to AlphaZero, but they still differ in architectural details, and they have completely different implementations.
Yet, for TCEC, filling the same network with different numbers somehow makes it a different neural net.
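The architecture-vs-weights distinction above can be made concrete with a toy sketch (purely illustrative; nothing here resembles Lc0's actual network, and all the numbers are made up). The "architecture" is the code; the "weights" are just numbers plugged into it. Two weight sets are two trainings of the same network:

```python
def tiny_net(x, weights):
    """A fixed two-layer architecture: the structure never changes,
    only the numbers in `weights` do."""
    w1, b1, w2, b2 = weights
    hidden = max(0.0, w1 * x + b1)  # one ReLU "layer"
    return w2 * hidden + b2         # one linear output "layer"

weights_a = (1.0, 0.0, 2.0, 0.5)   # one training run's numbers (made up)
weights_b = (0.5, 1.0, -1.0, 0.0)  # a different training run (made up)

# Same function (architecture), different parameters (weights):
print(tiny_net(2.0, weights_a))  # 4.5
print(tiny_net(2.0, weights_b))  # -2.0
```

In this framing, swapping `weights_a` for `weights_b` is "tuning", not "creating a new neural network": `tiny_net` itself is unchanged.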
(Random thought: maybe TCEC originally meant "NN architecture" under their "Neural Network" uniqueness rule, and "training method" rather than "training script", and later botched applying their own intention? That would make some sense. Allie+Stein would then be "same NN architecture, (arguably) different training method, different engine [which, with NN architecture as a separate entry, may translate to 'different search']": 2 out of 3.)
(One more note, regarding the network weights file format: for Lc0 it is currently not some generic format; we change the format every time we change the NN architecture. So, at the moment, the same file format automatically means the same architecture, which is why the file format is often brought up as an indication of non-uniqueness.)
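To illustrate why "same file format" can imply "same architecture", here is a hypothetical sketch (this is NOT Lc0's real weights format; the version field and layout are invented): if the file carries a format version that is bumped on every architecture change, then any loader accepting the same version is, by construction, implementing the same architecture.

```python
import struct

# Hypothetical format version, bumped whenever the architecture changes.
SUPPORTED_VERSION = 3

def write_weights(path, weights):
    """Write a little-endian version header followed by float32 weights."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", SUPPORTED_VERSION))
        f.write(struct.pack(f"<{len(weights)}f", *weights))

def read_weights(path):
    """Refuse files whose version (i.e. architecture) doesn't match ours."""
    with open(path, "rb") as f:
        (version,) = struct.unpack("<I", f.read(4))
        if version != SUPPORTED_VERSION:
            raise ValueError(
                f"weights file is for a different architecture "
                f"(file version {version}, expected {SUPPORTED_VERSION})")
        data = f.read()
        return list(struct.unpack(f"<{len(data) // 4}f", data))
```

Under this scheme, two engines that both load version-3 files without error must share the version-3 architecture.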
- Engine
So, according to TCEC, NN architecture is not part of NN. Maybe it’s part of an engine then?
Nope, from TCEC's point of view, nothing NN-related is part of an engine, it seems.
So Allie doesn't contain a single line of code that deals with neural networks. Allie's author didn't have to think about how to encode a chess position for the NN, which outputs to produce, where to place batch normalization layers, whether to use int8 or fp16 arithmetic, anything like that.
Why care about all of that, when there is this "standard library for NN evals" called "Lc0 backends"?
So, Allie doesn't have a single line of its own NN-related code.
Is it still a "unique NN-based engine"? According to TCEC, yes!
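Schematically, "reusing the backend" means the borrowing engine's own code only does search, while every NN concern lives behind the borrowed backend's evaluation call. A minimal sketch (hypothetical interfaces; neither Allie's nor Lc0's real API):

```python
class BorrowedBackend:
    """Stands in for the reused backend: all NN knowledge is in here
    (input encoding, layers, int8/fp16 choices...), hidden from the engine."""
    def evaluate(self, position):
        # Made-up output: a value and a per-move policy distribution.
        moves = position["moves"]
        return {"value": 0.0,
                "policy": {m: float(i) for i, m in enumerate(moves)}}

class Engine:
    """The 'unique engine': search/selection only, zero NN-related lines."""
    def __init__(self, backend):
        self.backend = backend
    def best_move(self, position):
        out = self.backend.evaluate(position)
        return max(out["policy"], key=out["policy"].get)

engine = Engine(BorrowedBackend())
print(engine.best_move({"moves": ["e2e4", "d2d4"]}))  # d2d4
```

The point of the argument above is that, in a setup like this, every improvement made inside `BorrowedBackend` flows to the borrowing engine without it changing a line.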
Some background on why competing against engines that reuse the Lc0 NN backend feels unfair, even if it were formally allowed:
I'd say 90% of all code changes that led to Lc0's strength growth over the last 12 months were in the neural network backend code (either NN architecture changes or performance optimizations). It has been the main vector of improvement for Lc0 for a year, and projects reusing it get those same improvements for free. Competing with such projects means competing with yourself.
The Lc0 NN backend is not a "generic library to do NN evals", and it is not "a standard NN library which is de facto the standard for all open source NN engines"; it's an important, distinctive part of the engine.