How a court detects a derivative work
Posted: Sat Aug 30, 2008 4:58 am
In a post buried in the middle of the discussion about whether Rybka 1.0 is a derivative work of Fruit 2.1, Chris Whittington criticized the work done by the reverse engineer, claiming that he is just an "artist" inventing variable names and the like. I assume this was meant to discredit any attempt to compare the source code of program A (say Fruit 2.1) with the disassembly of program B (say Rybka 1.0). The word "artist" is presumably meant to undermine the scientific credibility of the process.
Chris Whittington speaks with authority about what the courts would do, but unfortunately he has not done his homework. He does not know how the courts work through such cases.
The courts do not have to compare source code. This means that a copyright or GPL infringement can be detected even if no source code is available, neither for program A nor for program B. Only the object code (the "executables") is really required.
The goal of the courts' analysis, when determining whether one program is a derivative work of another, is to find out whether the ideas used by the programs (which are not protectable) are expressed in the same way in both. What is protected, ultimately, is the "expression" of the ideas.
To achieve this, the courts compare the semantics of both programs in order to determine if the ideas and algorithms have been expressed in the same way. The data structures are also taken into account in the process.
The semantics of a program can be expressed in a number of ways: with logical graphs, in plain English, or in pseudocode, for example.
But to compare the semantics of two programs, an effective way can be... to disassemble one and reconstruct its source code, with the goal of making it as similar as possible to the source code of the other program without altering the semantics.
There are a number of consequences of this process:
- variable and function names have absolutely no importance
- comments, blank lines, spaces... have no importance
- the order of the instructions can be changed as long as the program is semantically untouched
- even the programming language has no importance!
So disassembling a program and reconstructing its source code so that it is as close as possible to the source code of another program is a perfectly valid process, and it is actually used by the legal system.
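To illustrate why names, comments and whitespace carry no weight in such a comparison, here is a minimal sketch in Python. It is only an illustration of the idea, not the procedure a court or an expert actually follows. It parses two snippets, replaces every identifier with a positional placeholder, and compares the resulting syntax trees. The helper names (`_Renamer`, `normalized`, `same_shape`) and the two sample functions are my own inventions and are not taken from Fruit or Rybka. The sketch does not handle reordered instructions or different programming languages, which need a real semantic analysis.

```python
import ast


class _Renamer(ast.NodeTransformer):
    """Replace every identifier with a placeholder based on first appearance."""

    def __init__(self):
        self.names = {}

    def _alias(self, name):
        # The first distinct name becomes id0, the second id1, and so on.
        return self.names.setdefault(name, f"id{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node


def normalized(source):
    """Return a textual dump of the syntax tree with all names neutralized.

    Comments and blank lines never reach the tree, and ast.dump leaves out
    line and column positions by default.
    """
    tree = _Renamer().visit(ast.parse(source))
    return ast.dump(tree)


def same_shape(source_a, source_b):
    """True if both snippets are identical once names and layout are ignored."""
    return normalized(source_a) == normalized(source_b)


# Hypothetical "original" source.
ORIGINAL = '''
def evaluate(board, side):
    # material count
    score = board + side
    return score
'''

# Hypothetical "reconstruction" with invented names and different layout.
RECONSTRUCTED = '''
def f1(x, y):

    t = x + y
    return t
'''

# Same names as ORIGINAL, but a different operation.
DIFFERENT = '''
def evaluate(board, side):
    score = board - side
    return score
'''
```

Calling `same_shape(ORIGINAL, RECONSTRUCTED)` reports a match even though every name was chosen freely by whoever wrote the reconstruction, while `same_shape(ORIGINAL, DIFFERENT)` does not, although the names are identical. The choice of names is the "artistic" part, and it has no influence on the result.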
And something that has not really been discussed so far: the data structures used by a program are placed at the same level as the semantics. The similarity of data structures can contribute to evaluating the similarity of two programs.
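As a rough illustration of how data structures can be compared without any regard for field names, here is a small Python sketch. It assumes each structure has been reduced to an ordered list of field types, and it uses the standard `difflib.SequenceMatcher` to score how closely the two lists match. The function name `layout_similarity` and the sample layouts are hypothetical, and the score is only an illustrative measure, not a legal test.

```python
from difflib import SequenceMatcher


def layout_similarity(layout_a, layout_b):
    """Return a ratio in [0, 1] describing how closely two layouts match.

    Each layout is an ordered list of field types; field names are
    deliberately left out because they play no role in the comparison.
    """
    return SequenceMatcher(None, layout_a, layout_b).ratio()


# Hypothetical layouts, invented for this example.
LAYOUT_A = ["int", "int", "uint64[12]", "int[64]"]
LAYOUT_B = ["int", "int", "uint64[12]", "int[64]"]
LAYOUT_C = ["float", "char[8]", "int"]
```

Here `layout_similarity(LAYOUT_A, LAYOUT_B)` gives the maximum ratio because the field types appear in the same order, whatever the fields are called, while `LAYOUT_C` scores lower against either of them.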
// Christophe