How a court detects a derivative work

tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)

How a court detects a derivative work

Post by tiger »

In a post buried in the middle of the discussions about the possibility that Rybka 1.0 is a derivative work of Fruit 2.1, Chris Whittington has criticized the work done by the reverse engineer, claiming that he is just an "artist" inventing variable names and the like. I assume this was meant to tear down any attempt to compare the source code of program A (say Fruit 2.1) with the disassembly of program B (say Rybka 1.0). The word "artist" is presumably meant to reduce the scientific credibility of the process.

Chris Whittington speaks with authority about what the courts would do, but unfortunately he has not done his homework. He does not know how the courts work through such cases.

The courts do not have to compare source code. This means that a copyright or GPL infringement can be detected even if no source code, from either program A or program B, is actually available. Only the object code (the "executables") is really required.

The goal of the analysis conducted by the courts in order to determine if one program is a derivative work of the other is to find out if the ideas used by a program (which are not protectable) are expressed in the same way in both. What is protected, ultimately, is the "expression" of the ideas.

To achieve this, the courts compare the semantics of both programs in order to determine if the ideas and algorithms have been expressed in the same way. The data structures are also taken into account in the process.

The semantics of a program can be expressed in a number of ways: with logical graphs, in plain English, or in pseudocode, for example.

But in order to compare the semantics of two programs, an effective way can be... to disassemble one and reconstruct its source code with the goal of making it as similar as possible to the source code of the other program without touching the semantics.

There are a number of consequences of this process:
- variable and function names have absolutely no importance
- comments, blank lines, spaces... have no importance
- the order of the instructions can be changed as long as the program is semantically untouched
- even the programming language has no importance!

So the process of disassembling a program and reconstructing the source code so it is as close as possible to the source code of another program is a perfectly valid process and it is actually used by the legal system.
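The first two consequences in the list above (names, comments and spacing carry no weight) can be illustrated with a minimal sketch. It is only an illustration, not a description of how any court or expert actually works: the helper names `strip_comments` and `normalize`, the tiny keyword set and the sample snippets are all assumptions made for this example. It reduces a C-like snippet to a token list in which every identifier is replaced by a placeholder numbered in order of first appearance.

```python
import re

# Hypothetical helpers for illustration only; a real comparison is far more involved.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")
KEYWORDS = {"int", "double", "bool", "if", "else", "for", "while",
            "return", "true", "false"}


def strip_comments(code):
    """Remove C-style block comments and line comments."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.S)
    return re.sub(r"//[^\n]*", " ", code)


def normalize(code):
    """Return the token list with identifiers renamed to v0, v1, ...

    Whitespace and comments vanish; literals and operators are kept as-is.
    """
    names = {}
    tokens = []
    for tok in TOKEN.findall(strip_comments(code)):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            # The same identifier always maps to the same placeholder.
            tok = names.setdefault(tok, f"v{len(names)}")
        tokens.append(tok)
    return tokens


SNIPPET_A = "int timer = 0; // clock\nwhile (timer < 10) { timer = timer + 1; }"
SNIPPET_B = "int plydepth=0;\nwhile(plydepth<10){plydepth=plydepth+1;}"
SNIPPET_C = "int timer = 25;\nwhile (timer < 10) { timer = timer + 1; }"

print(normalize(SNIPPET_A) == normalize(SNIPPET_B))  # names and layout differ only
print(normalize(SNIPPET_A) == normalize(SNIPPET_C))  # a literal differs
```

Under these assumptions the first two snippets normalize to the same token list even though one says `timer` and the other `plydepth`, while the third does not because a constant differs. The sketch does not handle reordered instructions or a change of programming language; those need a real semantic comparison.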

And something that has not really been discussed so far: the data structures used by a program are placed at the same level as the semantics. Similarity of data structures can contribute to evaluating the similarity of two programs.



// Christophe
Karmazen & Oliver
Posts: 374
Joined: Sat Mar 10, 2007 12:34 am

Re: How a court detects a derivative work

Post by Karmazen & Oliver »

So, would program B (say Rybka 1.0) have to release its code freely under the GPL license? And would the later versions (say 3.0) not need to, or would they too?

The text in blue is very important...

P.S.: And the question is whether the program Strelka goes the same way. What is the problem? Is it all "cracked" ideas from Fruit, in a cascade: Fruit, Rybka, Strelka? Who will be next? And where are the limits on the ideas in a program?

uff...
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)

Re: How a court detects a derivative work

Post by tiger »

Karmazen & Oliver wrote:
So, would program B (say Rybka 1.0) have to release its code freely under the GPL license? And would the later versions (say 3.0) not need to, or would they too?

The text in blue is very important...


The FSF, which now holds the copyright on Fruit 2.1, prefers to resolve GPL violations in a friendly way. They ask the author who has breached the GPL either to release the source code under the GPL, or simply to remove all parts that constitute a derivative work from the infringing program.

If Rybka 1.0 is found to be a derivative work of Fruit 2.1, then I guess Vas could simply rewrite the offending parts of the program and release it again, for example as "Rybka 1.0 GPL-free edition".

Remember: The discussion is only about Rybka 1.0. The FSF is not after the source code of infringing programs. They just want the GPL to be respected.

Now breathe. :-)



// Christophe
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: How a court detects a derivative work

Post by Albert Silver »

tiger wrote:If Rybka 1.0 is found to be a derivative work of Fruit 2.1, then I guess Vas could simply rewrite the offending parts of the program and release it again, for example as "Rybka 1.0 GPL-free edition".
You are aware that Rybka 1.0 is no longer being distributed, right? It has been replaced by Rybka 2.2n as the free demo version of Rybka.

Albert
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Michael J Fitch
Posts: 124
Joined: Tue Mar 20, 2007 6:04 am
Location: Hattiesburg,Mississippi

Re: How a court detects a derivative work

Post by Michael J Fitch »

tiger wrote:
The FSF, which now holds the copyright on Fruit 2.1 prefers to resolve GPL violations in a friendly way. They ask the author who has breached the GPL to either release the source code under GPL, or to simply remove all parts that constitute a derivative work from the infringing program.

-------------------------------------------------------------------------------------

Being new here, may I ask you a few questions?

1. Who is the FSF?
2. Did the FSF say Vas breached the GPL, or is it your assumption?
3. If the FSF has facts showing that Vas breached the GPL, why is it taking so long to resolve this assumed infraction?
4. When will you and Chris kiss and make up? :P :D :lol: :wink:

-------------------------------------------------------------------------------------
If Rybka 1.0 is found to be a derivative work of Fruit 2.1, then I guess Vas could simply rewrite the offending parts of the program and release it again, for example as "Rybka 1.0 GPL-free edition".

Remember: The discussion is only about Rybka 1.0. The FSF is not after the source code of infringing programs. They just want the GPL to be respected.

-------------------------------------------------------------------------------------

tiger wrote:
If Rybka 1.0 is found to be a derivative work of Fruit 2.1, then I guess Vas could simply rewrite the offending parts of the program and release it again, for example as "Rybka 1.0 GPL-free edition".


You are aware that Rybka 1.0 is no longer being distributed, right? It has been replaced by Rybka 2.2n as the free demo version of Rybka.

Albert
_________________
"Patience means restraining one's inclinations. There are seven emotions: joy, anger, anxiety, love, grief, fear, and hate, and if a man does not give way to these he can be called patient."

Tokugawa Ieyasu (1543-1616)

-------------------------------------------------------------------------------------
Now breathe. :-)


:lol: :lol: :lol:



// Christophe
Where ever you go, there you are!!
chrisw

Re: How a court detects a derivative work

Post by chrisw »

Actually I have not criticised the work of the reverse engineering artist specifically. Rather I pointed out that in the absence of the symbol table any attempt to recreate the original source is full of problems, that the work is art not science, and that the labels used in the recreated 'source' are entirely the creation of the artist.

for example to say

"int timer;" in one source is equivalent to "int timer;" in another recreated source is highly dubious, since the reverse engineer artist will have chosen the word "timer" himself in the second source. How do we know it isn't actually "int plydepth", for example?

Basically, any recreated source is going to be full of text put in by the artist. Why do we believe the text is as he says, when he has simply guessed at the names? This recreated source is made to look and appear more similar than it actually is.

In the quoted comparison:

infinite  = false;      infinite  = 0;
ponder    = false;      ponder    = 0;
movestogo = -1;         movestogo = 25;
winc      = -1.0;       winc      = 0;
wtime     = -1.0;       wtime     = 0;
binc      = -1.0;       binc      = 0;
btime     = -1.0;       btime     = 0;
movetime  = -1.0;       movetime  = 0;

Well, apart from the allegedly identical code (is it? are you sure?) loading the variables with *different* values (-1 is not equal to 25, is it?), who says the variable names listed in the second column are as written? Or that they even do the same thing?

Why is the second column's binc actually binc? Because the artist wrote it so, that's why.

I suggest you're trying to influence non-programmers and laypeople with creative art masquerading as fact.

N'est-ce pas?
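For readers who want to check the point about differing values, the listing quoted above can be tallied mechanically. This is only a sketch: the rows are transcribed from the two columns of that listing, and the names `ROWS` and `matching_rows` are made up for the example. It counts equal initial values and nothing else, so it says nothing either way about the surrounding code.

```python
# Initial values transcribed from the two columns of the quoted listing.
# Python's False compares equal to 0, mirroring C/C++ where false converts to 0.
ROWS = [
    ("infinite", False, 0),
    ("ponder", False, 0),
    ("movestogo", -1, 25),
    ("winc", -1.0, 0),
    ("wtime", -1.0, 0),
    ("binc", -1.0, 0),
    ("btime", -1.0, 0),
    ("movetime", -1.0, 0),
]


def matching_rows(rows):
    """Return the names whose initial values are equal in both columns."""
    return [name for name, left, right in rows if left == right]


same = matching_rows(ROWS)
print(f"{len(same)} of {len(ROWS)} rows have equal initial values: {same}")
```

With the values as transcribed, only `infinite` and `ponder` are initialised to equal values in both columns; the other six rows differ.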




henkf

Re: How a court detects a derivative work

Post by henkf »

Hmm, in order to 'end this all' couldn't you ask Vas to make a Rybka 1.0 compile with the symbol table in place for investigation, so everybody can check that Rybka 1.0 is clean as a whistle? I can understand that Vas is reluctant to show his sources, but wouldn't this amount to the same result without giving anything away?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How a court detects a derivative work

Post by bob »

chrisw wrote:Actually I have not criticised the work of the reverse engineering artist specifically. Rather I pointed out that in the absence of the symbol table any attempt to recreate the original source is full of problems, that the work is art not science, and that the labels used in the recreated 'source' are entirely the creation of the artist.

for example to say

"int timer;" in one source is equivalent to "int timer;" in another recreated source is highly dubious, since the reverse engineer artist will have chosen the word "timer" himself in the second source. How do we know it isn't actually "int plydepth", for example?

Basically, any recreated source is going to be full of text put in by the artist. Why do we believe the text is as he says, when he has simply guessed at the names? This recreated source is made to look and appear more similar than it actually is.

In the quoted comparison:

infinite  = false;      infinite  = 0;
ponder    = false;      ponder    = 0;
movestogo = -1;         movestogo = 25;
winc      = -1.0;       winc      = 0;
wtime     = -1.0;       wtime     = 0;
binc      = -1.0;       binc      = 0;
btime     = -1.0;       btime     = 0;
movetime  = -1.0;       movetime  = 0;

Well, apart from the allegedly identical code (is it? are you sure?) loading the variables with *different* values (-1 is not equal to 25, is it?), who says the variable names listed in the second column are as written? Or that they even do the same thing?

Why is the second column's binc actually binc? Because the artist wrote it so, that's why.

I suggest you're trying to influence non-programmers and laypeople with creative art masquerading as fact.

N'est-ce pas?




While that is all well and good, the problem is you are saying that the basic "cheater" in programming assignments is perfectly safe. Because the first level of change a student will make is to change the variable names. Then the comments. And if they are still nervous, they change things syntactically, such as loop structures, altering the order of instructions when possible, etc.

As far as artist goes, this is not "art" which is creative by definition. This is no more art than the first attempt at deciphering writings in the great pyramids. It is just a lot of work. Variable names are meaningless and are not even a part of the executable, ditto for procedure names, structure names, etc.

Somehow you refuse to accept this plain fact... that if I have an assembly language program A, and I want to compare it to B, that I can do the following:

(1) verify that both function normally.

(2) change the variable names in A so that they match B (presumably undoing what the original copier did).

(3) verify that the changed program still works identically to the way it did before the variable name substitutions were done. That is simple and absolute proof that changing the variable names changed nothing semantically, which then proves that this is not "creative" but rather "factual / scientific" in the way it was done. I don't see why you refuse to accept that this is not just possible, but is done every day somewhere. It is not misleading. It is not hyperbole. And something tells me that you actually know that...
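The three-step check above can be tried out in miniature. This is a toy sketch under loud assumptions: Python source stands in for the assembly-language program, and the function `budget`, the mapping `MAPPING` and the helpers `rename` and `run` are all invented for the illustration (it needs Python 3.9 or later for `ast.unparse`). It renames the variables mechanically and then verifies that the renamed program returns the same result as the original.

```python
import ast


class Rename(ast.NodeTransformer):
    """Rename variables and parameters according to a mapping; touch nothing else."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def rename(source, mapping):
    """Step 2: return the source with its variable names substituted."""
    return ast.unparse(Rename(mapping).visit(ast.parse(source)))


def run(source, func, *args):
    """Steps 1 and 3: execute the source and call one of its functions."""
    env = {}
    exec(source, env)
    return env[func](*args)


# Hypothetical "program A" with meaningless names, as a disassembly would give.
ORIGINAL = """
def budget(a, b, c):
    x = a / c
    return x + b
"""

# Hypothetical names chosen by the person doing the comparison.
MAPPING = {"a": "time_left", "b": "increment", "c": "moves_to_go", "x": "per_move"}

RENAMED = rename(ORIGINAL, MAPPING)
print(RENAMED)
print(run(ORIGINAL, "budget", 60000, 1000, 25) == run(RENAMED, "budget", 60000, 1000, 25))
```

The check shows only that the renaming left the behaviour of this one program unchanged on the inputs tried. Whether the chosen names are the right ones, and whether two different programs correspond, are separate questions.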
chrisw

Re: How a court detects a derivative work

Post by chrisw »

bob wrote:
While that is all well and good, the problem is you are saying that the basic "cheater" in programming assignments is perfectly safe. Because the first level of change a student will make is to change the variable names. Then the comments. And if they are still nervous, they change things syntactically, such as loop structures, altering the order of instructions when possible, etc.

As far as artist goes, this is not "art" which is creative by definition. This is no more art than the first attempt at deciphering writings in the great pyramids. It is just a lot of work. Variable names are meaningless and are not even a part of the executable, ditto for procedure names, structure names, etc.

Somehow you refuse to accept this plain fact... that if I have an assembly language program A, and I want to compare it to B, that I can do the following:

(1) verify that both function normally.

(2) change the variable names in A so that they match B (presumably undoing what the original copier did).

(3) verify that the changed program still works identically to the way it did before the variable name substitutions were done. That is simple and absolute proof that changing the variable names changed nothing semantically, which then proves that this is not "creative" but rather "factual / scientific" in the way it was done. I don't see why you refuse to accept that this is not just possible, but is done every day somewhere. It is not misleading. It is not hyperbole. And something tells me that you actually know that...
Hehe!! The change subject obfuscation - very good ;-)

My concern expressed is the laying side by side of one source with another source derived from disassembly, and then claiming they correspond.

1. They correspond 28% only

2. The variable names correspond only because the reverse artist named them to correspond. Hence the correspondence is actually way less than 28%, because the apparent correspondence is creatively inspired.

My concern is that the published listing comparison, which apparently forms a major part of your evidence, looks more similar than it actually is unless one understands how it is produced. I seek to provide that understanding to lays and non-programmers.

All your above blah-blah is about something else and not relevant to the expressed concern. Why even append it to the post? Obfuscation?

Now, to go to your completely disconnected points ...

Even those make little sense. You claim rewriting the labels to some other form of words changes nothing. Absolutely. Programming 101. So what? Nor does it prove anything about any program, copying, similarity or anything else.

Your side has still not presented any identical program fragments, let alone blocks of anything. What's been represented is different. Unless, that is, 28% == 100% in the C++ witchhunters' variant language.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How a court detects a derivative work

Post by bob »

chrisw wrote:
bob wrote:
chrisw wrote:Actually I have not criticised the work of the reverse engineering artist specifically. Rather I pointed out that in the absence of the symbol table any attempt to recreate the original source is full of problems, that the work is art not science and the labels used in the recreated 'source' are entirely the creation of the artist.

for example to say

"int timer;" in one source is equivalent to "int timer;" in another recreated source is highly dubious, since the reverse engineer artist will have chosen the word "timer" himself in the second source. How do we know it isn't actually "int plydepth", for example?

Basically, any recreated source is going to be full of text put in by the artist. Why do we believe the text is as he says, when he has simply guessed at the names? This recreated source is made to look and appear more similar than it actually is.

In the quoted comparison:

45  infinite = false;    infinite = 0;
46  ponder = false;      ponder = 0;
47  movestogo = -1;      movestogo = 25;
48  winc = -1.0;         winc = 0;
49  wtime = -1.0;        wtime = 0;
50  binc = -1.0;         binc = 0;
51  btime = -1.0;        btime = 0;
52  movetime = -1.0;     movetime = 0;

well, apart from loading the alleged identical code (is it, you sure?) variables with *different* values (-1 not equal to 25 is it?) who says the second column listed variable names are as written? Or even do the same thing?

why is 2nd column binc actually binc? Because the artist wrote it so, that's why.

I suggest you're trying to influence non-programmers and lays with creative art masquerading as fact.

N'est ce pas?




tiger wrote:In a post buried in the middle of the discussions about the possibility that Rybka 1.0 is a derivative work of Fruit 2.1, Chris Whittington has criticized the work done by the reverse-engineerer, claiming that he is just an "artist" inventing variable names and the like. I assume this was supposed to tear down any attempt to compare the source code of program A (say Fruit 2.1) with the disassembly of program B (say Rybka 1.0). The word "artist" is probably supposed to reduce the scientific credibility of the process.

Chris Whittington speaks with authority of what the courts would do, but unfortunately he has not done his homework. He does not know how the courts work through such cases.

The courts do not have to compare the source codes. This means that a copyright or GPL infringement can be detected even if no source code, from neither program A nor program B, is actually available. Only the object codes (the "executables") are really required.

The goal of the analysis conducted by the courts in order to determine if one program is a derivative work of the other is to find out if the ideas used by a program (which are not protectable) are expressed in the same way in both. What is protected, ultimately, is the "expression" of the ideas.

To achieve this, the courts compare the semantics of both programs in order to determine if the ideas and algorithms have been expressed in the same way. The data structures are also taken into account in the process.

The semantics of a program can be expressed in a number of ways: with logical graphs, in plain English or in pseudocode, for example.

But in order to compare the semantics of two programs, an effective way can be... to disassemble one and reconstruct its source code with the goal of making it as similar as possible to the source code of the other program without touching the semantics.

There are a number of consequences of this process:
- variable and function names have absolutely no importance
- comments, blank lines, spaces... have no importance
- the order of the instructions can be changed as long as the program is semantically untouched
- even the programming language has no importance!

So the process of disassembling a program and reconstructing the source code so it is as close as possible to the source code of another program is a perfectly valid process and it is actually used by the legal system.

And something that has not really been discussed so far: the data structures used by a program are placed at the same level as the semantics. The similarity of data structures can help in evaluating the similarity of two programs.



// Christophe
While that is all well and good, the problem is you are saying that the basic "cheater" in programming assignments is perfectly safe. Because the first level of change a student will make is to change the variable names. Then the comments. And if they are still nervous, they change things syntactically, such as loop structures, altering the order of instructions when possible, etc.

As far as "artist" goes, this is not "art", which is creative by definition. It is no more art than the first attempts at deciphering the writings in the great pyramids. It is just a lot of work. Variable names are meaningless and are not even a part of the executable, ditto for procedure names, structure names, etc.

Somehow you refuse to accept this plain fact... that if I have an assembly language program A, and I want to compare it to B, that I can do the following:

(1) verify that both function normally.

(2) change the variable names in A so that they match B (presumably undoing what the original copier did).

(3) verify that the changed program still works identically to the way it did before the variable name substitutions were done. That is simple and absolute proof that changing the variable names changed nothing semantically, which then proves that this is not "creative" but rather "factual / scientific" in the way it was done. I don't see why you refuse to accept that this is not just possible, but is done every day somewhere. It is not misleading. It is not hyperbole. And something tells me that you actually know that...
Hehe!! The change subject obfuscation - very good ;-)

My concern expressed is the laying side by side of one source with another source derived from disassembly, and then claiming they correspond.

1. They correspond 28% only

2. The variable names correspond only because the reverse artist named them to correspond. Hence the correspondence is actually way less than 28%, because the apparent correspondence is creatively inspired.

My concern is that the published listing comparison, which apparently forms a major part of your evidence, looks more similar than it actually is unless one understands how it is produced. I seek to provide that understanding to lays and non-programmers.

All your above blah-blah is about something else and not relevant to the expressed concern. Why even append it to the post? Obfuscation?

Now, to go to your completely disconnected points ...

Even those make little sense. You claim rewriting the labels to some other form of words changes nothing. Absolutely. Programming 101. So what? Nor does it prove anything about any program, copying, similarity or anything else.

Your side has still not presented any identical program fragments, let alone blocks of anything. What's been represented is different. Unless, that is, 28% == 100% in the C++ witchhunters' variant language.
My response was made to show that your point 2 is absolutely irrelevant. The variable name issue is just absolute and utter nonsense. You know that. I know that. I know that you know that. You are just trying to argue points around the main issue in an attempt to divert attention from the simple fact that, in 200 lines of code, finding 50 identical lines is absolutely unexpected. When those lines contain major control decisions, the odds go down even further. You talk as if 28% were tiny, when you know perfectly well that it is absolutely unexpected, as the "novel" discussion pointed out. Just copy 28% of a Clancy novel and see what happens. Just copy 10%. You might just as well argue "oh, you flipped a coin and got heads 1,000 times in a row? So what, statistics show that such a thing is bound to happen if you flip the coin enough times." But the probability of it happening on the _first_ 1000 flips is not so hot.
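To put rough numbers on the coin-flip comparison, here is a minimal Python sketch. The coin figure is exact. The line-matching figure rests on a toy model that is purely an assumption for illustration: each of the 200 lines is treated as matching by chance independently, with a made-up probability of 1 in 20. Real code lines are neither independent nor equally likely to coincide, so only the order of magnitude is meant to be suggestive. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_all_heads(flips):
    """Exact probability that a fair coin lands heads on every one of `flips` tosses."""
    return Fraction(1, 2 ** flips)


def prob_at_least(matches, lines, p):
    """Binomial tail: chance of `matches` or more hits in `lines` independent trials,
    each succeeding with probability `p` (an assumed, illustrative value)."""
    p = Fraction(p)
    return sum(
        comb(lines, k) * p ** k * (1 - p) ** (lines - k)
        for k in range(matches, lines + 1)
    )


# Heads on the first 1000 flips of a fair coin.
print(float(prob_all_heads(1000)))  # about 9.33e-302

# Toy model only: 50 or more chance matches in 200 lines, assuming p = 1/20 per line.
print(float(prob_at_least(50, 200, Fraction(1, 20))))
```

Under those assumptions the second figure comes out many orders of magnitude below one in a million, which is the sense in which 50 matching lines out of 200 is "absolutely unexpected" if the two programs were written independently.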