Copyright and Machine Learning IP

hgm · Post by **hgm** » Mon Aug 13, 2018 12:41 pm

Fulvio wrote: ↑Mon Aug 13, 2018 11:27 am
Let's consider the games as source code, the training software as a compiler and megadatabase as a collection of source code files.
If you remove some files from that collection and compile the result are you entitled to any copyright?

I think the compiler analogy is a valid one. So IMO the copyrights the trainer/teacher gets on the NN weights should be the same as what he would get on the collection of training examples, or at least on the aspects of the examples that affect the NN weights. (E.g. the order of the training examples would not matter, so if copyrights depended on creatively ordering the examples, they would not transfer to the weights.) Just dumping megadatabase on the NN would not create any copyrights, because the trainer would not own any copyrights on megadatabase in the first place (and I assume no one else would). Just like dumping a bucket of red paint on a canvas would not give a copyrighted painting.

chrisw · Post by **chrisw** » Mon Aug 13, 2018 1:14 pm

hgm wrote: ↑Mon Aug 13, 2018 12:41 pm
I think the compiler analogy is a valid one. So IMO the copyrights the trainer/teacher gets on the NN weights should be the same as what he would get on the collection of training examples, or at least on the aspects of the examples that affect the NN weights. (E.g. the order of the training examples would not matter, so if copyrights depended on creatively ordering the examples, they would not transfer to the weights.) Just dumping megadatabase on the NN would not create any copyrights, because the trainer would not own any copyrights on megadatabase in the first place (and I assume no one else would). Just like dumping a bucket of red paint on a canvas would not give a copyrighted painting.

Assuming the game set is large and contains generally sensible game data, there's no real evidence that the choice of game set affects the chessic nature of the eventual output. There's some second-hand evidence: DeepMind appear to be of the opinion that a Zero approach is best, where "best" presumably equates with Elo rating.

Why then does anyone imagine that training on, say, a very large set of self-generated games would produce anything chessicly different from training on a very large set of human games, a very large set of computer games, or anything else? Please remember that what drives the "learning process" and nudges the weights is the game *result*. The software picks out those features of chess positions that are statistically more likely to lead to a win. Whilst it is picking out those features, it could be said to be "learning chess heuristics", known and unknown, and balancing their relative contributions in a non-linear manner. This learning of heuristics takes place whether or not the original players of the game "knew" the heuristic in the first place; the learning learns more than the game players ever individually knew. Indeed, the DeepMinders said that lots of self-play games are superior because they allow learning without any preconceived notions, and are therefore "better" at discovering the "true nature" of chess and how to win it.

How on earth does anyone imagine they are going to get a play style into the LC0 pipe system (well, random would be easy enough, stupid would be easy enough) when the learning system is only interested in win or not win? Style, BTW, depends heavily on the opponent: you can't play "Tal" if the opponent doesn't play along with you. Style is deeply subjective. And of course, "style" is open to all the BS merchants to talk nonsense about.

So, first, to all the alleged "compilers" of new nets who use the LC0 pipe but "select" games, and who think they do anything better: justify yourselves. I am calling BS on the idea (via game selection in the LC0 pipe) that "creative selection" is superior to either a mass of randomly sourced games or a mass of self-play games.

hgm · Post by **hgm** » Mon Aug 13, 2018 2:20 pm

This discussion is not just about LC, but about copyrights on NNs in general. LC is just used as an example, and if it indeed does not matter at all what set of games you feed it, that only shows it is a rather poor example for the general case. Most of us already came to the conclusion that in the case of Deus X no effort whatsoever was put into selecting the training games, so the trainer doesn't own any copyrights on the Deus-X weights.

The question is whether there do exist cases where it does matter what training examples are selected, and how these would be judged by existing law.

BTW, I can imagine that even in the case of Chess it could be possible to get better nets by feeding it judiciously chosen training examples. E.g. feed it many solutions and refutations to tactical problems (disguised as games), to make it better at tactics. Or feeding it elementary end-games to make it better at end-games. Even if that would not be true for large NN, it might be true for smaller NN. Also note that 'better' doesn't necessarily mean "higher score in matches starting from the standard opening position", but could very well mean "more helpful in analysis of positions from humans that first got themselves into tactical trouble, or hard-to-win end-games". I expect LC0 or Deus X to be pretty poor at that.

chrisw · Post by **chrisw** » Mon Aug 13, 2018 2:40 pm

hgm wrote: ↑Mon Aug 13, 2018 2:20 pm This discussion is not just about LC, but about copyrights on NNs in general. LC is just used as an example, and if it indeed does not matter at all what set of games you feed it, that only shows it is a rather poor example for the general case. Most of us already came to the conclusion that in the case of Deus X no effort whatsoever was put into selecting the training games, so the trainer doesn't own any copyrights on the Deus-X weights.

The question is whether there do exist cases where it does matter what training examples are selected, and how these would be judged by existing law.

BTW, I can imagine that even in the case of Chess it could be possible to get better nets by feeding it judiciously chosen training examples. E.g. feed it many solutions and refutations to tactical problems (disguised as games), to make it better at tactics. Or feeding it elementary end-games to make it better at end-games. Even if that would not be true for large NN, it might be true for smaller NN. Also note that 'better' doesn't necessarily mean "higher score in matches starting from the standard opening position", but could very well mean "more helpful in analysis of positions from humans that first got themselves into tactical trouble, or hard-to-win end-games". I expect LC0 or Deus X to be pretty poor at that.

D'accord. We are discussing whether different selections of training games will have a chessic effect on the final entity, all other things being equal. I.e., feeding a large number of selected games down the training pipe, as opposed to a large set of self-play games or just a large set of everything available down the same pipe.

If you'd like to add EGTB data or test-position data for training, there would be a problem with using lc0, because lc0 demands history data and produces different results depending on the history sequence. As things stand, directly adding static positions to training would probably reduce lc0's (and AlphaZero's) playing strength and generate worse results. So, for NNs in general, including problems and EGTB data possibly means using NNs without history input.

hgm · Post by **hgm** » Mon Aug 13, 2018 3:28 pm

This is why I wanted to add main lines of problem solutions, and some refutations of plausible non-solutions, rather than just static positions. Or in the case of end-games the entire optimal path to mate.

For the matter of copyrights it would be better to focus on another example. Say I make a program to generate syntactically correct song lyrics, and equip it with a NN to guide the choice of words. Now I train that NN by feeding it all the songs written by Bob Dylan. And, lo and behold, it then starts to generate world-class poetry with a high market value.

Although the selection I made was purely functional ("written by Bob Dylan"), the set of training examples was obviously copyrighted (by Bob). So would Bob Dylan now be able to claim copyrights on the NN weights?

I think it would be reasonable if he could. After all, I reverse-engineered his mind. Shouldn't he have copyrights on his own mind? Or should anyone be free to clone his musical genius if they can get enough access to it (through his published songs)?

A trickier question is what the legal status of the songs generated by the NN should be. I think the most reasonable solution would be to declare that the NN I make is already a copy (of Dylan's mind, which is the original), so that the copyright holder (Dylan) can dictate conditions on its use.

chrisw · Post by **chrisw** » Mon Aug 13, 2018 4:47 pm

Well, I am never quite sure whether we are discussing how we would like it to be, how we think it currently is, or, in the case of a new developing sector with artefacts beyond anything known before, how the legal system is likely to respond.

Me, I prefer to stick to how it is, and for new stuff (like machines creating works unaided) how the system would likely respond. Since we have an example in this field, and we are from this field, it makes sense to discuss examples (in a general sense) from this field. LC0 is one example, but no doubt there will be plenty more NNs designed to run within LC0 and plenty more other NNs built to run in an author's own home-brewed framework.

But to answer your question. No. It wouldn't be reasonable. You certainly haven't reverse engineered his mind. Bob Dylan songs draw on all manner of linkages to the world and worldviews and histories that are undeclared in the texts and only understandable by assuming or sharing context. Your neural net would basically be a joke.

However, if a miracle happened and you did manage the impossible, creating some entity able to emulate Bob Dylan by somehow capturing all of Bob Dylan's output and intelligently connecting it to the rest of Bob Dylan's world, finding all contexts (something nobody has ever been able to do, by the way, and not for lack of trying), then it would remain a joke, because it won't be him. And we'll know it. The output of this impostor won't be copyright Bob Dylan anyway.

As to the rights to or of the data you train on: there are none. Anyone is free to read copyrighted books, look at copyrighted paintings, absorb general principles from them and then "rearrange" the world in any way they want into a "work", without the "original data" or its copyright owner having any say-so at all. Obviously. Else how would we ever do anything?

This is just not a good example. The Bob Dylan system interacts with and is contained within an infinitely large and complex universe. Chess is contained within the context of an 8x8 board. Just not comparable.

Hopefully back to chess and chess neural nets.

Milos · Post by **Milos** » Mon Aug 13, 2018 5:00 pm

hgm wrote: ↑Mon Aug 13, 2018 3:28 pm This is why I wanted to add main lines of problem solutions, and some refutations of plausible non-solutions, rather than just static positions. Or in the case of end-games the entire optimal path to mate.

For the matter of copyrights it would be better to focus on another example. Say I make a program to generate syntactically correct song lyrics, and equip it with a NN to guide the choice of words. Now I train that NN by feeding it all the songs written by Bob Dylan. And, lo and behold, it then starts to generate world-class poetry with a high market value.

Although the selection I made was purely functional ("written by Bob Dylan"), the set of training examples was obviously copyrighted (by Bob). So would Bob Dylan now be able to claim copyrights on the NN weights?

I think it would be reasonable if he could. After all, I reverse-engineered his mind. Shouldn't he have copyrights on his own mind? Or should anyone be free to clone his musical genius if they can get enough access to it (through his published songs)?

A trickier question is what the legal status of the songs generated by the NN should be. I think the most reasonable solution would be to declare that the NN I make is already a copy (of Dylan's mind, which is the original), so that the copyright holder (Dylan) can dictate conditions on its use.

This is all fine, no matter how hypothetical, but it is a totally different case from the Lc0 NN. There, Lc0 produces chess games. Does Kasparov hold copyright on his chess games?
And to be even more ludicrous: does the SF team hold copyright on self-played SF games?
If you believe answer to any of those questions is yes, I guess there is not much point discussing.

hgm · Post by **hgm** » Mon Aug 13, 2018 5:21 pm

But of course no one here thinks that. The main question, though, was not whether the output of an NN is copyrighted (although in cases where the output can be copyrighted this is certainly an interesting issue), but whether the NN itself can be protected by copyrights.

Another relevant example:
I write a C program

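#include <stdio.h>
/* Writes the poem (elided as "...") to the printer device. */
int main(void) { FILE *f = fopen("/dev/printer", "w"); if (f == NULL) return 1; fprintf(f, "..."); fclose(f); return 0; }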

where ... is a beautiful poem, conceived by myself. I then run the program to print it. Does the printed version of the poem now qualify for copyrights?

Suppose I show the program to someone else, and he copies and runs it on his own computer. Who now has copyrights on the poem?

Would this answer change if I released the program under the GPL first?

chrisw · Post by **chrisw** » Mon Aug 13, 2018 6:04 pm

Milos wrote: ↑Mon Aug 13, 2018 5:00 pm
This is all fine, no matter how hypothetical, but it is a totally different case from the Lc0 NN. There, Lc0 produces chess games. Does Kasparov hold copyright on his chess games?
And to be even more ludicrous: does the SF team hold copyright on self-played SF games?
If you believe answer to any of those questions is yes, I guess there is not much point discussing.

Ah well! Maybe some chess games, or collections of chess games.

It's generally thought that a chess game has no copyright, but I think the argument for that view depends on the game being played by two opposing players. It's not entirely clear that the machine's programmer could not claim copyright on a self-play game produced by the machine, or on a collection of such games, especially if he was also the operator. Not saying he could, but ...

Fulvio · Post by **Fulvio** » Mon Aug 13, 2018 6:37 pm

hgm wrote: ↑Mon Aug 13, 2018 2:20 pm The question is whether there do exist cases where it does matter what training examples are selected, and how these would be judged by existing law.

The training set is a database:
https://en.wikipedia.org/wiki/Database_Directive
and you have to acquire the rights for data mining:
https://en.wikipedia.org/wiki/Data_mini ... n_Europe_2

But after that using the data to train a NN is just an elaborate analysis and the copyright of the database will not apply to the NN.
