Copyright and Machine Learning IP


syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Copyright and Machine Learning IP

Post by syzygy »

hgm wrote: Sun Aug 12, 2018 12:11 pm An important precedent is this: once a copyright holder releases his work into the public domain, he irrevocably waives the possibility to exert that right. And by releasing Leela under the GPL the authors explicitly placed any Leela output, creative or not, in the public domain.
Well, the GPLv3 just says this:
The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work.
This seems to be an entirely empty statement. It rains only if it rains.

Anyway, it is clear that the Leela code determines only functional and no creative properties of its output, so the Leela authors have no copyright in any weights produced by the program.

For the output of a program like GNU bison the situation is more interesting. Since its output includes the yyparse() code from GNU bison, it is in principle covered by the copyright on GNU bison. The GNU-bison license nowadays makes clear that the GPL does not apply to the code produced by the program. I guess this means that the yyparse() code has been placed in the public domain (even though that is not really possible in the EU).
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Copyright and Machine Learning IP

Post by hgm »

Branko Radovanovic wrote: Sun Aug 12, 2018 12:21 pm NN weights are intellectual property, but they are not source code. The GPL covers source code only.
I would say that the GPL also covers object code, and even more so: you can distribute source without object, but not object without source.

I fully agree that weights are not source code. But how about object code?

Is 'byte code' in a Java class file 'object code', or could the class file of a Java program that was released under GPL be distributed without a copy of the Java source? Or does the author of the Java byte-code compiler own the copyrights of all byte code produced with it?
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Copyright and Machine Learning IP

Post by syzygy »

Branko Radovanovic wrote: Sun Aug 12, 2018 12:21 pm Just some brief remarks - I could write 5 pages on this issue.

Creativity is not an absolute requirement for copyright; see https://en.wikipedia.org/wiki/Sweat_of_the_brow.
That same page confirms that the sweat-of-the-brow doctrine has been rejected both in the US and in the EU. From Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991):
[44] In summary, the 1976 revisions to the Copyright Act leave no doubt that originality, not “sweat of the brow,” is the touchstone of copyright protection in directories and other fact-based works. Nor is there any doubt that the same was true under the 1909 Act. The 1976 revisions were a direct response to the Copyright Office's concern that many lower courts had misconstrued this basic principle, and Congress emphasized repeatedly that the purpose of the revisions was to clarify, not change, existing law. The revisions explain with painstaking clarity that copyright requires originality; that facts are never original; that the copyright in a compilation does not extend to the facts it contains; and that a compilation is copyrightable only to the extent that it features an original selection, coordination, or arrangement.
And originality means creativity:
[49] The question that remains is whether Rural selected, coordinated, or arranged these uncopyrightable facts in an original way. As mentioned, originality is not a stringent standard; it does not require that facts be presented in an innovative or surprising way. It is equally true, however, that the selection and arrangement of facts cannot be so mechanical or routine as to require no creativity whatsoever. The standard of originality is low, but it does exist. (...) As this Court has explained, the Constitution mandates some minimal degree of creativity (...).
NN weights are intellectual property, but they are not source code.
How are they intellectual property? What law protects them?
The GPL covers source code only. With the rise of NN, this could be an interesting issue, and personally I'd like to hear the position of the Free Software Foundation on that, but I don't think it's likely they'd ever view NN weights as "code", even if there is a degree of functional equivalence.
More often than not the FSF's position is at odds with copyright law, so I'm personally not particularly interested in what they might say about it.
The idea that because monkey pressed the shutter button, the photographer does not own the copyright, is perversely stupid and wrong.
What may be stupid is arguing that the monkey owns the copyright. That being the owner of the camera does not make you the photographer seems pretty obvious.
The final and the most important thing: copyright law has always been largely utilitarian both in theory and practice - of course, because it serves a purpose, like all laws. And let's be honest, it's not to protect "creativity" or "innovation" per se, but rather to protect the financial interests of creators. You have to create something that hasn't existed before, and you have to put some effort, resources and skill into it: this is your investment, and the law is meant to protect it.
Wrong. Sweat of the brow was rejected a long time ago. Copyright really is about (a modicum of) creativity. And its purpose is to stimulate the production of creative works.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Copyright and Machine Learning IP

Post by Sven »

hgm wrote: Sun Aug 12, 2018 12:58 pm
Branko Radovanovic wrote: Sun Aug 12, 2018 12:21 pm NN weights are intellectual property, but they are not source code. The GPL covers source code only.
I would say that the GPL also covers object code, and even more so: you can distribute source without object, but not object without source.

I fully agree that weights are not source code. But how about object code?

Is 'byte code' in a Java class file 'object code', or could the class file of a Java program that was released under GPL be distributed without a copy of the Java source? Or does the author of the Java byte-code compiler own the copyrights of all byte code produced with it?
Object code is the output of a compiler after it processes source code. So Java byte code is object code but NN weights aren't. NN weights are parameter values produced by a special program that conducts a learning process. What this program does is not like compilation of source code, and the NN weights are not another representation of source code resulting from some transformation.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Copyright and Machine Learning IP

Post by hgm »

syzygy wrote: Sun Aug 12, 2018 12:52 pm Anyway, it is clear that the Leela code determines only functional and no creative properties of its output, so the Leela authors have no copyright in any weights produced by the program.
That is what I thought too.

Under the sweat-of-the-brow doctrine, someone training the NN might be able to acquire copyrights on the output if it was a very computationally intensive task, requiring special equipment and lots of electricity that he had to pay for.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Copyright and Machine Learning IP

Post by chrisw »

syzygy wrote: Sun Aug 12, 2018 11:50 am
chrisw wrote: Sun Aug 12, 2018 11:20 am You couldn't make this up!
I see no (copyright) problem as long as it is acknowledged that Lc0 is the engine behind Deus X. Then its use complies with the GPLv3.

TCEC could have felt cheated but apparently they do not. (But they probably knew it from the start anyway? They can't be that naive.)

TCEC spectators might feel upset, but they are not obliged to be TCEC spectators.
Agreed that there is no "copyright problem" in the sense that copyright law is being broken, or somebody could be sued or whatever; but instead we have the bizarre situation where the computer chess world narrative is that "AS is the author of (insert some ever-changing definition here)", and this narrative is false. But everybody believes it and acts on it. It's a false narrative with no factual basis. It may even be that people don't believe it but act on it anyway. Just as an aside, I'm fascinated by these situations of "dominant narrative with no factual base"; they abound in politics (Chomsky, Manufacturing Consent).

Ok, so the current TCEC expression used is: "Albert Silver, the author of the net Deus X powered by Lc0".

AS doesn't have any IP in Deus X; the Lc0 authors own the IP.

An "author" is a writer/creator/composer, as distinguished from a compiler/translator/editor/copyist. Authors have "author's rights".
Author's rights divide into:
1. moral rights, based on the view that a creative work is in some way an expression of the author’s personality.
2. economic rights, based on ownership of the IP.

Neither applies here. The term "author" doesn't apply. It's misleading (possibly deliberately so, given it's a steadily retreating backstop expression).

Better would be to call it what it is, neutrally and factually. LC0 trained on non-zero data (C) Leela Authors or LC(NonZero) for short. And there is no good reason at all to credit AS, in particular after the initial and lasting well-documented deceptions.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Copyright and Machine Learning IP

Post by syzygy »

hgm wrote: Sun Aug 12, 2018 12:58 pm
Branko Radovanovic wrote: Sun Aug 12, 2018 12:21 pm NN weights are intellectual property, but they are not source code. The GPL covers source code only.
I would say that the GPL also covers object code, and even more so: you can distribute source without object, but not object without source.

I fully agree that weights are not source code. But how about object code?

Is 'byte code' in a Java class file 'object code', or could the class file of a Java program that was released under GPL be distributed without a copy of the Java source? Or does the author of the Java byte-code compiler own the copyrights of all byte code produced with it?
The GPL obviously also covers the object code created from the source code. It says so ("derivative works").

An interesting question is why there would be copyright on object code. In the EU this is so because Directive 2009/24/EC says so:
Art. 1(2) wrote: Protection in accordance with this Directive shall apply to the expression in any form of a computer program. Ideas and principles which underlie any element of a computer program, including those which underlie its interfaces, are not protected by copyright under this Directive.
Art. 4(1) wrote: Subject to the provisions of Articles 5 and 6, the exclusive rights of the rightholder within the meaning of Article 2 shall include the right to do or to authorise:

(a) the permanent or temporary reproduction of a computer program by any means and in any form, in part or in whole; in so far as loading, displaying, running, transmission or storage of the computer program necessitate such reproduction, such acts shall be subject to authorisation by the rightholder;

(b) the translation, adaptation, arrangement and any other alteration of a computer program and the reproduction of the results thereof, without prejudice to the rights of the person who alters the program;

(c) any form of distribution to the public, including the rental, of the original computer program or of copies thereof.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Copyright and Machine Learning IP

Post by Sven »

chrisw wrote: Sun Aug 12, 2018 1:17 pm And there is no good reason at all to credit AS [...]
Maybe he is a "training data provider"? For me his contribution is, at least remotely, comparable to that of an opening book creator who uses an existing book creation tool and feeds selected games into it. Or, perhaps even more closely related to NNs built by supervised learning, to the contribution of the creator of a set of training positions being fed into an evaluation tuning program. Neither involves a significant amount of creativity, but both are at least "contributions".
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Copyright and Machine Learning IP

Post by chrisw »

hgm wrote: Sun Aug 12, 2018 12:11 pm
chrisw wrote: Sun Aug 12, 2018 11:20 am You couldn't make this up!
The most common cause for a conclusion that makes no sense, though, is that the reasoning behind it was flawed.

Copyright law was not written with this particular case in mind, so judgement on this case will have to rely on interpretation and precedent. You assume the training of a NN will be interpreted as a creative act on the part of the NN, which is rather dubious. An important precedent is this: once a copyright holder releases his work into the public domain, he irrevocably waives the possibility to exert that right. And by releasing Leela under the GPL the authors explicitly placed any Leela output, creative or not, in the public domain.
The conclusion makes perfect sense. That it conflicts with the dominant group narrative suggests that the dominant group narrative is false. These situations happen all the time, in politics often because the dominant narrative is the result of effective propaganda. Again: Chomsky, Manufacturing Consent.
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Copyright and Machine Learning IP

Post by hgm »

Sven wrote: Sun Aug 12, 2018 1:16 pm What this program does is not like compilation of source code, and the NN weights are not another representation of source code resulting from some transformation.
This is debatable. The weights are another representation of (part of) the knowledge contained in the training data set. How to obtain them from the training set is a precisely defined, mechanical procedure, which fits the definition of 'transformation'. That it is a non-invertible transformation, possibly not capturing all knowledge in the training set, also holds for conventional compilation (which discards layout and comment info).
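
To make the "mechanical but non-invertible" point concrete, here is a minimal sketch in Python. It is a toy and not Leela's training code: the function name `fit_weight` and both sample sets are made up for illustration, and a one-parameter least-squares fit stands in for NN training (real training would also depend on things like the random seed, which I assume fixed here).

```python
def fit_weight(samples):
    """Toy stand-in for training: least-squares fit of y = w * x.

    The result depends only on the input samples, so the same
    training set always yields the same weight.
    """
    numerator = sum(x * y for x, y in samples)
    denominator = sum(x * x for x, _ in samples)
    return numerator / denominator


# Two different (hypothetical) training sets.
set_a = [(1.0, 2.0), (2.0, 4.0)]
set_b = [(3.0, 6.0)]

w_a = fit_weight(set_a)
w_b = fit_weight(set_b)

# Mechanical: repeating the procedure reproduces the weight exactly.
print(fit_weight(set_a) == w_a)

# Non-invertible: both training sets end up at the same weight, so the
# weight alone cannot tell you which set it was derived from.
print(w_a == w_b)
```

Whether such a transformation counts as 'compilation' in the legal sense is of course a separate question; the sketch only illustrates that the procedure is deterministic and lossy.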