Copyright and Machine Learning IP

hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Copyright and Machine Learning IP

Post by hgm »

Sven wrote: Sun Aug 12, 2018 1:32 pm
Maybe he is a "training data provider"?
How about 'trainer' or 'coach'?
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Copyright and Machine Learning IP

Post by Sven »

hgm wrote: Sun Aug 12, 2018 1:34 pm
Sven wrote: Sun Aug 12, 2018 1:16 pm
What this program does is not like compilation of source code, and the NN weights are not another representation of source code resulting from some transformation.
This is debatable. The weights are another representation of (part of) the knowledge contained in the training data set. How to obtain it from the training set is a precisely defined, mechanical procedure, which fits the definition of 'transformation'. That it is a non-invertible transformation, possibly not capturing all the knowledge in the training set, also holds for conventional compilation (which discards layout and comments).
True, but that "knowledge" is only contained implicitly in the training data set; it is not written down explicitly there, so it is not like "source code", and the transformation does nothing even remotely comparable to compiling source code.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
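The disagreement above turns on whether training is "a precisely defined, mechanical procedure" and on what its non-invertibility implies. A deliberately tiny stand-in makes both properties concrete: an ordinary least-squares line fit, assumed here in place of NN training (which is vastly larger but just as mechanical). The function fit_line and the two data sets are hypothetical. The same training set always yields the same "weights", yet two different training sets can yield identical weights, so the training set cannot be recovered from them.

```python
from fractions import Fraction


def fit_line(points):
    """Least-squares fit of y = a*x + b to (x, y) pairs.

    A fixed, mechanical procedure: identical input always gives identical
    "weights" (a, b). Integer coordinates are assumed so that exact
    Fractions can be used and no floating-point noise creeps in.
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    a = Fraction(n * sxy - sx * sy, denom)
    b = (sy - a * sx) / n
    return a, b


# Two different (hypothetical) training sets...
set_a = [(0, 0), (1, 1), (2, 2)]
set_b = [(0, 0), (2, 2), (1, 0), (1, 2)]

# ...map to the same weights, so the mapping cannot be inverted.
print(fit_line(set_a) == fit_line(set_b))  # True
```

Exact fractions are used only so that the equality check is not at the mercy of rounding; with floats the same point holds up to rounding error.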
Sven

Re: Copyright and Machine Learning IP

Post by Sven »

hgm wrote: Sun Aug 12, 2018 1:40 pm
Sven wrote: Sun Aug 12, 2018 1:32 pm
Maybe he is a "training data provider"?
How about 'trainer' or 'coach'?
I think the "trainer" in this case is a program.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Copyright and Machine Learning IP

Post by syzygy »

hgm wrote: Sun Aug 12, 2018 1:17 pm
syzygy wrote: Sun Aug 12, 2018 12:52 pm
Anyway, it is clear that the Leela code determines only functional and no creative properties of its output, so the Leela authors have no copyright in any weights produced by the program.
That is what I thought too.

Under the sweat-of-the-brow doctrine, someone training the NN might be able to acquire copyrights on the output if it was a very computationally intensive task, requiring special equipment and lots of electricity that he had to pay for.
Indeed, and there might be countries where this is still the law (perhaps Australia, as it used to be the law in the UK). But in the US and the EU it is not, so what are the chances that the rest of the world will not gradually shift to the same position...

In the EU there is special protection outside copyright law for certain databases. I don't think a set of weights is protected as a database under this law, but I am not 100% sure.
hgm

Re: Copyright and Machine Learning IP

Post by hgm »

the 1976 revisions to the Copyright Act leave no doubt that originality, not “sweat of the brow,” is the touchstone of copyright protection in directories and other fact-based works.
The phrase I emphasized, "in directories and other fact-based works", seems to seriously undercut the generality of this statement, however. NN weights are not directories or lists of facts. That these are specifically mentioned does in fact suggest that the preceding is not true in other cases. Why else add the phrase?
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Copyright and Machine Learning IP

Post by chrisw »

Sven wrote: Sun Aug 12, 2018 1:32 pm
chrisw wrote: Sun Aug 12, 2018 1:17 pm And there is no good reason at all to credit AS [...]
Maybe he is a "training data provider"? For me his contribution is, at least remotely, comparable to that of an opening book creator who uses an existing book creation tool and feeds selected games into it. Or, perhaps even more closely related to NNs built by supervised learning, to the contribution of the creator of a set of training positions that are fed into an evaluation tuning program. Neither involves a significant amount of creativity, but both are at least "contributions".
Yes, I agree, AS is a contributor. But then so are a very large number of project volunteers, programmers (insert long list here), and none of those are being singled out, nor are they self-promoting. Not that I ever agree with Bob on anything, but I am beginning to see his problem with Vas, who, as Bob perceived it, came, took, ran away and so on. There is a fundamental problem with seizing work and credit out of a communal, open, free system for one's own private purposes, and with misrepresenting it.
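Sven's opening-book comparison can be made concrete with a minimal sketch of such a book tool. The names build_book and book_move are hypothetical; moves are SAN strings, and the book is keyed by the sequence of moves played so far rather than by position (a real tool would typically key by position, for example via a hash, so that transpositions merge). The tool itself is fixed; what the book creator contributes is the selection of games fed into it, and a different selection gives a different book without a single change to the code.

```python
from collections import Counter, defaultdict


def build_book(games, max_ply=8):
    """Map each move prefix (tuple of SAN moves) to a Counter of next moves.

    Simplification: keyed by move order, not by position, so transpositions
    are treated as different entries.
    """
    book = defaultdict(Counter)
    for moves in games:
        for ply in range(min(max_ply, len(moves))):
            book[tuple(moves[:ply])][moves[ply]] += 1
    return book


def book_move(book, moves_so_far):
    """Most frequently played continuation, or None if out of book."""
    counts = book.get(tuple(moves_so_far))  # .get does not add new keys
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# The "selected games" are the creator's contribution (hypothetical sample).
games = [
    ["e4", "e5", "Nf3", "Nc6"],
    ["e4", "c5", "Nf3", "d6"],
    ["e4", "e5", "Nf3", "Nf6"],
    ["d4", "d5"],
]
book = build_book(games)
print(book_move(book, []))      # e4
print(book_move(book, ["e4"]))  # e5
```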
hgm

Re: Copyright and Machine Learning IP

Post by hgm »

Sven wrote: Sun Aug 12, 2018 1:42 pm
I think the "trainer" in this case is a program.
The program seems more analogous to the training equipment, IMO. If a person stipulates a schedule of exercises that an athlete, say a high jumper, has to do with weights or springs to build the relevant muscles, wouldn't he be referred to as the athlete's trainer?

Because training of a NN is also described as 'learning', it would perhaps be better to describe his function as 'teacher' or 'mentor' anyway.
syzygy

Re: Copyright and Machine Learning IP

Post by syzygy »

hgm wrote: Sun Aug 12, 2018 1:45 pm
the 1976 revisions to the Copyright Act leave no doubt that originality, not “sweat of the brow,” is the touchstone of copyright protection in directories and other fact-based works.
The phrase I emphasized, "in directories and other fact-based works", seems to seriously undercut the generality of this statement, however. NN weights are not directories or lists of facts.
Fiction-based works have originality by virtue of being fiction.

I don't see why an NN weight is not a fact and a collection of NN weights is not a collection of facts.

In the same way, even a book is, in the end, an arrangement of facts once you focus on the individual letters or words. If the author has arranged these facts to express an idea with a modicum of creativity, the arrangement will be a creative work. If the writer has merely produced a shopping list, there will be no copyright protection.

Anyway, section 102 of the US copyright law explicitly requires originality. The Supreme Court decided that originality is also a requirement for compilations under section 103 (which is rather obvious, since section 103 states that the subject matter of copyright "as specified by section 102" includes compilations).
hgm

Re: Copyright and Machine Learning IP

Post by hgm »

I would not call letters and digits, or even words and numbers, 'facts', and they certainly are not facts in the sense of the quoted law. A fact is a true statement about reality, which requires at least a sentence.

But even if they were, it is obvious that a specific, non-trivial ordering of words is protected by copyright. So it is all in the ordering. This also holds for NN weights: permute them, and you get a completely different NN, just like permuting the words in a novel can give you a completely different story (but more likely just gibberish). An NN description is far more than a shopping list of numbers.

Also, not everything that is not a 'list of facts' has to be fiction. A history book discussing the relation of historical events would most certainly be protected by copyright. Newspaper reports are protected by copyrights, right? The operative word here is (unordered / trivially ordered) list, i.e. an ultimately unimaginative presentation of the (in themselves not copyrightable) items.
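hgm's claim that "it is all in the ordering" can be checked on a toy network. The sketch below assumes a hypothetical network with 2 inputs, 3 tanh hidden units and 1 linear output, with made-up weights; forward, flatten, unflatten and permute_hidden_units are illustrative names, not any engine's API. Reversing the order of the nine weights keeps exactly the same numbers but changes the output. One caveat, a well-known property of such networks: moving a hidden unit together with its incoming and outgoing weights (relabelling the units) leaves the function unchanged, so the ordering that matters is the arrangement up to such symmetries.

```python
import math


def forward(x, w1, w2):
    """Single hidden layer: w1[j] holds the incoming weights of hidden unit j,
    w2[j] its outgoing weight; tanh hidden units, linear output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return sum(v * h for v, h in zip(w2, hidden))


def flatten(w1, w2):
    """All weights as one flat list: hidden-layer rows first, then outputs."""
    return [w for row in w1 for w in row] + list(w2)


def unflatten(flat, n_hidden, n_in):
    """Inverse of flatten for a network of the given shape."""
    w1 = [flat[j * n_in:(j + 1) * n_in] for j in range(n_hidden)]
    return w1, flat[n_hidden * n_in:]


def permute_hidden_units(w1, w2, perm):
    """Relabel hidden units, keeping each unit's weights together."""
    return [w1[p] for p in perm], [w2[p] for p in perm]


# Hypothetical weights and input.
w1 = [[1.0, -2.0], [0.5, 0.3], [-1.5, 2.0]]
w2 = [1.0, -1.0, 0.5]
x = [0.2, 0.7]

y = forward(x, w1, w2)
# Same nine numbers, different arrangement: a different network.
y_shuffled = forward(x, *unflatten(flatten(w1, w2)[::-1], 3, 2))
# Units relabelled consistently: the same function.
y_relabelled = forward(x, *permute_hidden_units(w1, w2, [2, 0, 1]))
print(round(y, 4), round(y_shuffled, 4), round(y_relabelled, 4))
```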
chrisw

Re: Copyright and Machine Learning IP

Post by chrisw »

syzygy wrote: Sun Aug 12, 2018 12:52 pm Anyway, it is clear that the Leela code determines only functional and no creative properties of its output, so the Leela authors have no copyright in any weights produced by the program.
That's bizarre.

A grandmaster will have looked at many chess games and abstracted chess knowledge that allows him to play chess as well as or better than other grandmasters, and better than he played at the age of 10 months.

No doubt his neurons have been firing, growing and connecting in some sort of functional way too. That "process" from 10 months to GM, which, btw, we don't understand, would appear to contain at least a modicum of creativity, no? Nor do we understand a) how an artificial entity manages to turn chess game data into a usable form of chess knowledge, other than by saying "it's just the backpropagation algorithm", which reduces to not actually understanding it at all, or b) what this chess knowledge actually is, either in part or as a whole.

If one argues "there is no creativity in the Leela(Trainer) code that produces the weights", then where exactly is the "creativity" in the whole process?

Because one thing is very certain: neither the Leela authors nor the training operators have very much chess knowledge at all. And what they have, such as it is, is not translated directly into the LC0+weights system. The LC0 system got its knowledge elsewhere, because it contains a process able to find and extract intangible knowledge from data.

Einstein got his "knowledge" from somewhere too. Earth data, say I, jokingly. When he rearranges things into a new way of seeing them, is he not being creative at that point? And what has LC0(Trainer).EXE done, other than rearrange things into a new way of seeing them?

I beg to disagree therefore.
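The phrase "it's just the backpropagation algorithm" refers to a concrete, mechanical computation. As a minimal sketch, assuming a hypothetical one-input network with a few tanh hidden units, a mean-squared-error loss and plain gradient descent, fitted to y = x² as a stand-in task (this is not Leela's training code), backpropagation applies the chain rule to get the gradient of the loss with respect to every weight, and training repeatedly nudges the weights downhill. The names init, loss_and_grads and train are illustrative. The sketch only shows what the algorithm does, not what its output means.

```python
import math
import random


def init(n_hidden, seed=0):
    """Hypothetical network: 1 input, n_hidden tanh units, 1 linear output."""
    rng = random.Random(seed)
    return {
        "w1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    h = [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    return h, sum(v * hj for v, hj in zip(p["w2"], h)) + p["b2"]


def loss_and_grads(p, data):
    """Mean squared error and its gradient, computed by backpropagation."""
    n = len(data)
    g = {"w1": [0.0] * len(p["w1"]), "b1": [0.0] * len(p["b1"]),
         "w2": [0.0] * len(p["w2"]), "b2": 0.0}
    loss = 0.0
    for x, t in data:
        h, y = forward(p, x)          # forward pass
        err = y - t
        loss += err * err / n
        dy = 2.0 * err / n            # dL/dy
        g["b2"] += dy
        for j, hj in enumerate(h):
            g["w2"][j] += dy * hj                   # dL/dw2_j
            dz = dy * p["w2"][j] * (1.0 - hj * hj)  # back through tanh
            g["w1"][j] += dz * x                    # dL/dw1_j
            g["b1"][j] += dz                        # dL/db1_j
    return loss, g


def train(p, data, lr=0.05, steps=2000):
    """Plain full-batch gradient descent; updates p in place."""
    for _ in range(steps):
        _, g = loss_and_grads(p, data)
        p["b2"] -= lr * g["b2"]
        for key in ("w1", "b1", "w2"):
            p[key] = [w - lr * gw for w, gw in zip(p[key], g[key])]
    return p


data = [(x / 2, (x / 2) ** 2) for x in range(-2, 3)]  # stand-in task: y = x^2
params = init(3)
before, _ = loss_and_grads(params, data)
train(params, data)
after, _ = loss_and_grads(params, data)
print(f"loss before {before:.4f}, after {after:.4f}")
```

A finite-difference comparison of the gradients is the usual way to confirm that such hand-written backpropagation is correct.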