Copyright and Machine Learning IP

Discussion of anything and everything relating to chess playing software and machines.

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Copyright and Machine Learning IP

Post by syzygy »

hgm wrote: Sun Aug 12, 2018 2:29 pm But even if they were, it is obvious that a specific, non-trivial ordering of words is protected by copyright.
A random ordering of words is not protected. Nor is an alphabetical ordering or any other type of functional ordering, however ingenious.
Also, not everything that is not a 'list of facts' has to be fiction. A history book discussing the relation of historical events would most certainly be protected by copyright. Newspaper reports are protected by copyrights, right?
True, it does not need to be fiction. But in case of a history book discussing historical events or ideas about historical events, copyright will only protect the expression of those historical events or ideas. Another person is free to express the same events and ideas in his own words. In case of a novel, you are more likely to infringe copyright if you retell (parts of) the story in your own words.
The operative word here is (unordered / trivially ordered) list, i.e. an ultimately unimaginative presentation of the (in themselves not copyrightable) items.
And in case of an NN, the individual numbers are not copyrightable and their selection and arrangement is not imaginative but dictated by functional requirements. (Because in the end you can produce them by feeding the training program an unimaginative collection of games, and even if you do select them with imagination, one wouldn't be able to tell from the result.)

(There may be exceptions. If the NN produces paintings and someone carefully selects training data to teach the NN to paint in a particular style, then there might be copyright at least on the paintings. I'm not sure if one would then need to recognise copyright on the NN weights. Perhaps as a computer program.)
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Copyright and Machine Learning IP

Post by syzygy »

chrisw wrote: Sun Aug 12, 2018 3:00 pm
syzygy wrote: Sun Aug 12, 2018 12:52 pm Anyway, it is clear that the Leela code determines only functional and no creative properties of its output, so the Leela authors have no copyright in any weights produced by the program.
That's bizarre.

A grandmaster will have looked at many chess games and abstracted chess knowledge that allows him to play chess as well as or better than other grandmasters, and better than he was at playing chess aged 10 months old.

No doubt his neurons have been firing and growing and connecting in some sort of functional way too. That "process", that we don't understand, btw, from 10 months to GM would appear to contain at least a modicum of creativity, no? Nor do we understand a) how an artificial entity manages to extract chess game data into a useable form of chess knowledge, save as to say "it's just the back projection algorithm" reduction to actually not understanding at all, or b) what this chess knowledge actually is, either in part or as a whole.

If one argues "there is no creativity in the Leela(Trainer) code that produces the weights", then where exactly is the "creativity" in the whole process?
If you think there is an analogy to be found here, then your view of what the Leela code does is quite different from mine. Let's put it like this: if the basic brain structure of a new-born future grandmaster was created by some "creator", then that creator does not, in my view, have a copyright on the synaptic connections that have formed when the brain achieves GM level.

Btw, I was not talking about copyright on the trainer computer-program code (which obviously exists but is irrelevant to this discussion) but copyright on the NN as a structured set of weights. There is no copyright on that structure.
Einstein got his "knowledge" from somewhere too. Earth data, says I jokingly. When he rearranges things to a new way of seeing them, is he not creative at this point? And what has LC0(Trainer).EXE done, other than rearrange things into a new way of seeing them?
There is certainly no copyright in E=mc².
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Copyright and Machine Learning IP

Post by hgm »

syzygy wrote: Sun Aug 12, 2018 3:30 pm A random ordering of words is not protected. Nor is an alphabetical ordering or any other type of functional ordering, however ingenious.
It seems to me this is an untenable distinction. Ultimately 'creativity' is also just the product of an (admittedly complex) functional process.
And in case of an NN, the individual numbers are not copyrightable and their selection and arrangement is not imaginative but dictated by functional requirements. (Because in the end you can produce them by feeding the training program an unimaginative collection of games, and even if you do select them with imagination, one wouldn't be able to tell from the result.)
But of course one would be able to tell it from the result. The trained NN is supposed to perform a specific task, and the performance at this task can be measured. If it performs the task much better than what you would get from training it with the typical unimaginative collection of games, it must be because of the quality of the selection, and the result would be a work of art. Otherwise you might as well say: "paintings cannot be copyrighted, because they are just uncopyrightable drops of paint on a canvas. And even if the painter would have used imagination in applying the paint, you cannot tell the difference".

It also seems strange to err by default towards the side of 'no protection'. Many works of modern art look to me as if an ape could have painted them. Yet I think I would be in for a lot of trouble if I distributed copies of those (rather than having an ape actually paint a similar work). It seems this criterion is more about the copyrights being moot when it is trivial to independently generate something identical.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Copyright and Machine Learning IP

Post by chrisw »

syzygy wrote: Sun Aug 12, 2018 3:46 pm Btw, I was not talking about copyright on the trainer computer-program code (which obviously exists but is irrelevant to this discussion)
Agreed.
I am talking about whether or not the trainer computer-program code is or is not producing, unaided by creative human hand, a "work".

Normally this is straightforward, because it involves a human carrying out some "creative" act(s), a painter choosing a colour, a writer choosing words, a photographer selecting viewpoint and bla-di-bla. The human chosen "acts" build up, the "work" forms, et voila, the "Work". Copyright exists. IP exists. The acts may be individually functional, but hey, Picasso did them, it's a Work.

Our problem arises when these "acts" are chosen and performed by a machine, without human assistance. But you are not relying on the "machines can't produce works" argument, as far as I can tell. They can, but .....
but copyright on the NN as a structured set of weights. There is no copyright on that structure.
An MP3 file consists of a 1-D structured set of numbers. But it represents music. That's copyrightable because of what it represents. Helpfully it's a one-to-one mapping, and decodable by MP3 player. And the music is tangible to the ear.

The NN represents chess knowledge. It's not any form of one-to-one mapping, it's intangible, and it's decodable by LC0.EXE and into LC0.EXE language and performing weird transformations on it and then outputting chess moves AND THEN a smart chess knowledgable person decoding what she thinks the system "knows". OR, by a possibly less smart person comparatively looking at the numeric move value outputs.
..... the NN as a structured set of weights. There is no copyright on that structure.
Because the LC0(Trainer) carries out a repetitive, functional series of steps on the data. Not creative. Therefore no copyright. That, I think, is your argument.

Ok, the machine is different to Picasso, because? He carries out a series of functional repetitive steps with his brush on the canvas we could cynically say. But we could be a little more charitable, each brush stroke is a little different, it builds on the others, it changes a slightly different place, but all the time the whole picture is changing, developing.

I think if we reduce machines to things that carry out repetitive functional steps, we've kind of "got them" on the copyright front, haven't we?

But we could be kind to the machine, like we are kind to Picasso, each little back projection error correction is a little different, it builds on the others, it changes more or less in different places, all the time the whole network is changing, developing.

As ever, it's how we look at things. If a machine can produce, by itself, things that when humans do them, we would say "creative", but they can't be allocated "creative" label because functional steps, well, time to reassess fundamental assumptions.
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: Copyright and Machine Learning IP

Post by noobpwnftw »

1) If I write a very sophisticated implementation of rand(), do I own IP over the result of my algorithm from whoever runs it?
2) My rand() implementation is seeded by an integer value, person A and person B chose different integers based on their own creative methods that may or may not involve extensive study of my algorithm, do I share the IP of their work?
3) If I spend an enormous amount of resource to run the algorithm, is the very lengthy output of my algorithm then becoming copyright-able?
4) If I refine my algorithm to take no input (aka zero-based), does it make any difference to answer 1 and 3?

I think it's just 4 NOs. Then find and replace "rand()" with "NN" problem solved.
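
To make the rand() comparison concrete, here is a minimal sketch of a seeded generator. It is only an illustration: `lcg_stream` and its constants are hypothetical stand-ins (a plain linear congruential generator), not anyone's actual implementation. What it shows is that the output is fully fixed by the algorithm plus the chosen integer, so the same seed always reproduces the same numbers.

```python
def lcg_stream(seed, count, modulus=2**31, a=1103515245, c=12345):
    """Return `count` pseudo-random integers from a simple linear
    congruential generator (illustrative constants, assumed for this sketch)."""
    state = seed % modulus
    values = []
    for _ in range(count):
        # Each value follows mechanically from the previous state.
        state = (a * state + c) % modulus
        values.append(state)
    return values


# Person A runs the algorithm twice with the same seed, person B picks another seed.
run_a = lcg_stream(seed=42, count=5)
run_a_again = lcg_stream(seed=42, count=5)
run_b = lcg_stream(seed=43, count=5)

print("same seed reproduces output:", run_a == run_a_again)
print("different seed changes output:", run_a != run_b)
```

Whoever picks the seed contributes one integer; everything after that is computed without any further choice by either the algorithm's author or the person running it.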
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Copyright and Machine Learning IP

Post by chrisw »

noobpwnftw wrote: Sun Aug 12, 2018 6:19 pm 1) If I write a very sophisticated implementation of rand(), do I own IP over the result of my algorithm from whoever runs it?
2) My rand() implementation is seeded by an integer value, person A and person B chose different integers based on their own creative methods that may or may not involve extensive study of my algorithm, do I share the IP of their work?
3) If I spend an enormous amount of resource to run the algorithm, is the very lengthy output of my algorithm then becoming copyright-able?
4) If I refine my algorithm to take no input (aka zero-based), does it make any difference to answer 1 and 3?

I think it's just 4 NOs. Then find and replace "rand()" with "NN" problem solved.
not parallel, unfortunately.

You write a program LC which writes another program NN which plays chess. That's a little over-simplified for conceptual clarity, try:

You write a program LC which creates the smarts of another program LCNN which plays chess.

The question is: are the smarts copyright?

Muller says: yes, you get copyright, LC producing of LCNN is creative, and if he creatively data-select assists LC to make smarter smarts, then he gets (some) copyright too.
Whittington says: yes, you get copyright, LC producing of LCNN is creative, any data-select assists are likely trivial, especially functional ones.
Syzygy says: no, LC producing of LCNN is functional, so no copyright, any data-select assists are likely trivial, especially functional ones.

Hopefully, that's not entirely random and will get corrected if so.
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Copyright and Machine Learning IP

Post by hgm »

Actually I say that whether you get copyrights is entirely dependent on the license agreement you made with the user of LC, and not on whether anything creative or functional is done.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Copyright and Machine Learning IP

Post by syzygy »

hgm wrote: Sun Aug 12, 2018 4:32 pm
syzygy wrote: Sun Aug 12, 2018 3:30 pm A random ordering of words is not protected. Nor is an alphabetical ordering or any other type of functional ordering, however ingenious.
It seems to me this is an untenable distinction. Ultimately 'creativity' is also just the product of an (admittedly complex) functional process.
Well, it's the law. For example the Endstra tapes were ultimately found not to be copyrighted (even though I think that outcome was wrong).
Hof wrote:5.13

The product for which the Endstra heirs invoke copyright, however, lies not in the transcripts of the back-seat conversations but in the back-seat conversations themselves. Although, once written down, the texts spoken by Endstra can hardly be made sense of, if at all (see the quotations just reproduced), those texts were evidently comprehensible enough to the CIE officers with whom Endstra held the conversations. In oral communication the requirements for comprehensibility are apparently different/lower than in written communication, as also emerges from the quotation from Karel van het Reve reproduced above. To that extent there is thus a difference between the transcripts of the back-seat conversations and the back-seat conversations themselves. There is, however, also a similarity between the two. Because the transcripts are verbatim renderings of the back-seat conversations, the formal composition/structure of those conversations can be fully known from the transcripts. According to those transcripts, the texts spoken by Endstra consist of an almost endless series of unfinished, poorly flowing and downright crooked sentences (which is also why the transcripts are so hard to read). The form of the product in question therefore in no way indicates that, as far as that form is concerned, there is creative labour (see para. 5.7) / an intellectual creation (see para. 5.5 in fine) on Endstra's part. On the contrary, given the banality of that form, it cannot be assumed that what Endstra said rested on creative labour of any significance. Arguments ii) and iii) of the Endstra heirs can consequently not be accepted as correct, at least insofar as they refer to the personal-stamp requirement; that Endstra's statements have a character of their own is not (or no longer) in dispute (see para. 5.9).
With regard to argument iii) it is further considered that the occasional use of an unusual expression, such as 'she has a black belt in shopping, you know, so eh…' as a description of the shopping behaviour of Endstra's wife (MnV under 2.13), cannot turn an otherwise banally or trivially shaped conversation into a work protected by copyright.
And in case of an NN, the individual numbers are not copyrightable and their selection and arrangement is not imaginative but dictated by functional requirements. (Because in the end you can produce them by feeding the training program an unimaginative collection of games, and even if you do select them with imagination, one wouldn't be able to tell from the result.)
But of course one would be able to tell it from the result. The trained NN is supposed to perform a specific task, and the performance at this task can be measured. If it performs the task much better than what you would get from training it with the typical unimaginative collection of games, it must be because of the quality of the selection, and the result would be a work of art.
I doubt that you can outperform the network trained on the unimaginative collection of all games between top players selected from some pre-existing collection of games. But even if you do outperform it, that alone does not make the resulting NN copyrightable, precisely because it is an objective criterion.
Otherwise you might as well say: "paintings cannot be copyrighted, because they are just uncopyrightable drops of paint on a canvas. And even if the painter would have used imagination in applying the paint, you cannot tell the difference".
We can tell the difference, or at least lawyers pretend that we can tell the difference between random drops of paint and compositions of paint.
It also seems strange to err by default towards the side of 'no protection'.
Err towards one side seems to imply that there could be room for doubt here.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Copyright and Machine Learning IP

Post by chrisw »

hgm wrote: Sun Aug 12, 2018 7:31 pm Actually I say that whether you get copyrights is entirely dependent on the license agreement you made with the user of LC, and not on whether anything creative or functional is done.
oh, ok, sorry, you're quite right. I hadn't noticed that one. From an earlier posting as well,

syzygy wrote: Sun Aug 12, 2018 11:52 am
Anyway, it is clear that the Leela code determines only functional and no creative properties of its output, so the Leela authors have no copyright in any weights produced by the program.

Hgm wrote:
That is what I thought too.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Copyright and Machine Learning IP

Post by syzygy »

chrisw wrote: Sun Aug 12, 2018 4:51 pm An MP3 file consists of a 1-D structured set of numbers. But it represents music. That's copyrightable because of what it represents. Helpfully it's a one-to-one mapping, and decodable by MP3 player. And the music is tangible to the ear.
This is a good example.

If you feed an mp3 encoder copyrighted music, the resulting mp3 file will be covered by the copyright on the music (because the mp3 file easily preserves enough of the human-perceivable features of the music). It will not be covered by the copyright on the mp3 encoder because the mp3 encoder determines only the functional structure of the mp3 file.

Instead of an mp3 encoder taking as input music and producing as output an mp3 file, here we have an NN trainer program taking as input a collection of games and producing as output a set of weights. For the set of weights to be copyrighted we at least need a (somewhat) original collection of games and to somehow be able to convince a judge that enough of that originality is perceivable from the set of weights. That's going to be tough.

At the moment a case is pending before the CJEU in which the Court will have to decide whether a taste can be a copyrighted work. The advocate general recently issued his opinion advising the court to decide that the notion of "work" encompasses only subject matter that can be perceived through sight or hearing and "with precision, stability and objectivity".

A neural network as a set of numbers can be perceived by the human eye but is then just meaningless. One will not be able to say that a particular network infringes on the copyright of another network by just looking at the similarity between the weights. (The same holds true for an mp3 file: as a set of numbers it is meaningless to the human eye or ear. You have to run it through an mp3 player to perceive it meaningfully.)
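
One way to see why eyeballing the numbers says little, as a rough sketch rather than anything taken from the Lc0 code: in a small network with one hidden layer, reordering the hidden units gives a weight set that looks different but computes the same output. The toy `forward` function and all weight values below are invented for the illustration.

```python
import math


def forward(w_hidden, w_out, inputs):
    """Toy network: one hidden layer with ReLU, then a single linear output."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))


# Made-up weights: three hidden units, two inputs each.
w_hidden = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]]
w_out = [1.0, -2.0, 0.5]

# Reorder the hidden units, moving each unit's output weight along with it.
order = [2, 0, 1]
w_hidden_perm = [w_hidden[i] for i in order]
w_out_perm = [w_out[i] for i in order]

inputs = [0.5, -1.0]
original_output = forward(w_hidden, w_out, inputs)
permuted_output = forward(w_hidden_perm, w_out_perm, inputs)

print("weights look identical:", w_hidden == w_hidden_perm)
print("outputs agree:", math.isclose(original_output, permuted_output))
```

So a line-by-line comparison of two weight files can miss that they behave identically, which fits the point that the numbers only become meaningful once they are run through the client.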

So a neural network will have to be run through the Lc0 client, but can we then perceive it "with precision, stability and objectivity"? I'm not convinced that "playing ability/style" is something that can be the object of copyright. Of course playing strength can be measured to some extent, but that is just a functional criterion (like how aerodynamically a car is shaped).

It is possible that the criteria formulated by the CJEU will be more relaxed than those proposed by the advocate general.
The NN represents chess knowledge. It's not any form of one-to-one mapping, it's intangible, and it's decodable by LC0.EXE and into LC0.EXE language and performing weird transformations on it and then outputting chess moves AND THEN a smart chess knowledgable person decoding what she thinks the system "knows". OR, by a possibly less smart person comparatively looking at the numeric move value outputs.
..... the NN as a structured set of weights. There is no copyright on that structure.
Because the LC0(Trainer) carries out a repetitive, functional series of steps on the data. Not creative. Therefore no copyright. That, I think, is your argument.
My argument is that the Lc0 code does not determine the values of the weights but probably only the number of weights. If Lc0 requires that the NN has 5x5 weights, then you will agree that the copyright on Lc0 does not extend to all collections of 5x5 numbers. Even if it determines much more structure than something like "5x5", that will not be enough, because the structure is just what is necessary to make it work with the Lc0 client. Just like the structure imposed by an mp3 encoder is just what is necessary to allow the mp3 file to be decoded by an mp3 player. (There is probably a bit more to it in case of mp3 encoders since they do not all produce the same quality output, but any variation between mp3 encoders is not an expression of creative freedom but of an attempt to reach a high quality or a good quality/time trade-off -- technical criteria. And the perceivable differences between two encodings of the same music or between an mp3 file and the original music will anyway be extremely unlikely to establish a new/additional copyright.)
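
A toy sketch of the 5x5 point, assuming a hypothetical client that only checks the shape of the weight grid. `fits_client_format` and the 5x5 layout are made up for illustration and are not the real Lc0 weight format.

```python
import random

# Hypothetical format requirement, taken from the "5x5" example above.
EXPECTED_ROWS, EXPECTED_COLS = 5, 5


def fits_client_format(weights):
    """Return True if `weights` is a 5x5 grid of numbers, whatever the values are."""
    return (
        len(weights) == EXPECTED_ROWS
        and all(len(row) == EXPECTED_COLS for row in weights)
        and all(isinstance(w, (int, float)) for row in weights for w in row)
    )


rng = random.Random(0)
some_values = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(5)]
all_zeros = [[0.0] * 5 for _ in range(5)]
wrong_shape = [[0.0] * 4 for _ in range(5)]

print("arbitrary 5x5 values accepted:", fits_client_format(some_values))
print("all-zero 5x5 grid accepted:", fits_client_format(all_zeros))
print("5x4 grid accepted:", fits_client_format(wrong_shape))
```

The format check constrains the shape only; it accepts every possible 5x5 collection of numbers, which is why a claim over the format cannot single out any particular set of values.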