How a court detects a derivative work

michiguel · Post by **michiguel** » Sat Aug 30, 2008 10:36 pm

Gerd Isenberg wrote:
bob wrote: Of course it isn't suspicious. We all know that if you put enough monkeys in a room with enough computers, one of them would eventually write a comparison that violates good programming practice (nobody uses == in floating point comparison, I'll leave it as an exercise to the reader to find out why). So this proves nothing, as we now know.

To compare for equality (not the case here) with zero might be an exception, since binary representation of IEEE 754-1985 float and int is the same, also for IEEE 754-1985 double and long long. Some compiler may use the one or the other instruction set for comparison with zero. One may force it on source level with weird unions of int/float or u64/double.

Is it possible that the disassembler interpreted that binary representation as 0.0 when in the source was 0x0?

Miguel

Zach Wegner · Post by **Zach Wegner** » Sat Aug 30, 2008 10:43 pm

Gerd Isenberg wrote:
bob wrote: Of course it isn't suspicious. We all know that if you put enough monkeys in a room with enough computers, one of them would eventually write a comparison that violates good programming practice (nobody uses == in floating point comparison, I'll leave it as an exercise to the reader to find out why). So this proves nothing, as we now know.

To compare for equality (not the case here) with zero might be an exception, since binary representation of IEEE 754-1985 float and int is the same, also for IEEE 754-1985 double and long long. Some compiler may use the one or the other instruction set for comparison with zero. One may force it on source level with weird unions of int/float or u64/double.

Gerd,

This is the assembly. Clearly float instructions. I don't think this would be the result of compiler optimizations though??

Code:

.text:004097E6                 fild    [esp+2Ch+movetime]
.text:004097EA                 fcomp   ds:dbl_6623D0
.text:004097F0                 fnstsw  ax
.text:004097F2                 test    ah, 41h
.text:004097F5                 jnz     short loc_40980E

And for the record, the comparison is >= 0.0, not == 0.0.
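
For readers following along, here is a minimal sketch of the kind of source that matches a load-integer-then-compare-double sequence. It assumes `movetime` is an `int` (as the `fild` suggests) and that the source compared it with a floating-point literal; the function names are hypothetical and not taken from either program. In C, the integer operand is converted to `double` before the comparison. What a particular compiler emits for this depends on its version and flags.

```c
#include <limits.h>

/* Hypothetical helper names; `movetime` only mirrors the label in the listing. */

/* int compared against a double literal: the int operand is converted
   to double first (usual arithmetic conversions), then compared. */
int nonneg_via_double(int movetime)
{
    return movetime >= 0.0;
}

/* The same test written as a plain integer comparison. */
int nonneg_via_int(int movetime)
{
    return movetime >= 0;
}

/* Returns 1 if both forms agree on a handful of boundary values. */
int forms_agree(void)
{
    const int samples[] = { INT_MIN, -1, 0, 1, INT_MAX };
    for (int i = 0; i < (int)(sizeof samples / sizeof samples[0]); i++)
        if (nonneg_via_double(samples[i]) != nonneg_via_int(samples[i]))
            return 0;
    return 1;
}
```

Both forms give the same answer for every `int`, since every 32-bit integer is exactly representable as a double; the difference is only in which instructions a compiler is likely to pick.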

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Aug 30, 2008 10:51 pm

michiguel wrote:
Gerd Isenberg wrote:
bob wrote: Of course it isn't suspicious. We all know that if you put enough monkeys in a room with enough computers, one of them would eventually write a comparison that violates good programming practice (nobody uses == in floating point comparison, I'll leave it as an exercise to the reader to find out why). So this proves nothing, as we now know.

To compare for equality (not the case here) with zero might be an exception, since binary representation of IEEE 754-1985 float and int is the same, also for IEEE 754-1985 double and long long. Some compiler may use the one or the other instruction set for comparison with zero. One may force it on source level with weird unions of int/float or u64/double.
Is it possible that the disassembler interpreted that binary representation as 0.0 when in the source was 0x0?

Miguel

I would guess it depends on the assembly instructions, x87 fcom (0.0) versus cmp (0). Since the sign-bit semantics and bit layout of signed int and float are also the same, < 0 and >= 0 may be used equivalently with both instruction sets, forced at source level by an int/float union.

Code:

union {int i; float f;} x;
x.i = intExpression;
if ( x.f < 0.0 )

is semantically equivalent to

Code:

if ( x.i < 0 )

but saves some flags...

Whether this is likely, is another question...

tiger · Post by **tiger** » Sat Aug 30, 2008 10:51 pm

chrisw wrote: Actually I have not criticised the work of the reverse engineering artist specifically. Rather I pointed out that in the absence of the symbol table any attempt to recreate the original source is full of problems, that the work is art not science and the labels used in the recreated 'source' are entirely the creation of the artist.

for example to say

"int timer;" in one source is equivalent to "int timer;" in another recreated source is highly dubious, since the reverse engineer artist will have chosen the word "timer" himself in the second source. How do we know it isn't actually "int plydepth", for example?

Basically, any recreated source is going to be full of text put in by the artist. Why do we believe the text is as he says, when he has simply guessed at the names? This recreated source is made to look and appear more similar than it actually is.

In the quoted comparison:

45  infinite = false;    infinite = 0;
46  ponder = false;      ponder = 0;
47  movestogo = -1;      movestogo = 25;
48  winc = -1.0;         winc = 0;
49  wtime = -1.0;        wtime = 0;
50  binc = -1.0;         binc = 0;
51  btime = -1.0;        btime = 0;
52  movetime = -1.0;     movetime = 0;

well, apart from loading the alleged identical code (is it, you sure?) variables with *different* values (-1 not equal to 25 is it?) who says the second column listed variable names are as written? Or even do the same thing?

why is 2nd column binc actually binc? Because the artist wrote it so, that's why.

I suggest you're trying to influence non-programmers and lays with creative art masquerading as fact.

N'est ce pas?

tiger wrote: In a post buried in the middle of the discussions about the possibility that Rybka 1.0 is a derivative work of Fruit 2.1, Chris Whittington has criticized the work done by the reverse engineer, claiming that he is just an "artist" inventing variable names and the like. I assume this was supposed to tear down any attempt to compare the source code of program A (say Fruit 2.1) with the disassembly of program B (say Rybka 1.0). The word "artist" is probably supposed to reduce the scientific credibility of the process.

Chris Whittington speaks with authority of what the courts would do, but unfortunately he has not done his homework. He does not know how the courts work through such cases.

The courts do not have to compare the source codes. This means that a copyright or GPL infringement can be detected even if no source code, for either program A or program B, is actually available. Only the object code (the "executables") is really required.

The goal of the analysis conducted by the courts in order to determine if one program is a derivative work of the other is to find out if the ideas used by a program (which are not protectable) are expressed in the same way in both. What is protected, ultimately, is the "expression" of the ideas.

To achieve this, the courts compare the semantics of both programs in order to determine if the ideas and algorithms have been expressed in the same way. The data structures are also taken into account in the process.

The semantics of a program can be expressed in a number of ways: with logical graphs, in plain English, or in pseudocode, for example.

But in order to compare the semantics of two programs, an effective way can be... to disassemble one and reconstruct its source code with the goal of making it as similar as possible to the source code of the other program without touching the semantics.

There are a number of consequences of this process:
- variable and function names have absolutely no importance
- comments, blank lines, spaces... have no importance
- the order of the instructions can be changed as long as the program is semantically untouched
- even the programming language has no importance!

So the process of disassembling a program and reconstructing the source code so it is as close as possible to the source code of another program is a perfectly valid process and it is actually used by the legal system.
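
As a toy illustration of the points in the list above, here is a sketch of two C functions that differ in names, layout and statement order but compute the same thing. Every identifier is made up for the example and nothing here is taken from either program. Agreement on sampled inputs is of course only evidence of equivalent behaviour, not a proof.

```c
/* Version A: descriptive names. All identifiers here are hypothetical. */
int time_budget(int clock_ms, int increment_ms, int moves_to_go)
{
    int budget = clock_ms / moves_to_go;
    budget += increment_ms;
    return budget;
}

/* Version B: renamed, reformatted, and with the two steps reordered. */
int f(int a, int b, int c) { int r = b; r += a / c; return r; }

/* Returns 1 if both versions agree on every input in a small grid. */
int same_behaviour(void)
{
    for (int clock_ms = 0; clock_ms <= 60000; clock_ms += 7919)
        for (int inc = 0; inc <= 5000; inc += 1000)
            for (int mtg = 1; mtg <= 40; mtg++)
                if (time_budget(clock_ms, inc, mtg) != f(clock_ms, inc, mtg))
                    return 0;
    return 1;
}
```

Renaming `f`, `a`, `b`, `c` and `r` back to the names used in version A, and swapping the two steps, gives the same function text, which is the kind of substitution described above.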

And something that has not really been discussed so far: the data structures used by a program are placed at the same level as the semantics. The similarity of data structures can help evaluate the similarity of two programs.

// Christophe

Sorry but all you are doing now is called obstruction.

Your only goal seems to be to block any progress that could let the discussion advance.

I think one does not have to be a programmer to understand your tactics. I hope so. And the fact that you are a programmer seriously taints your attempt: you know what we are talking about but you present it as if it was incorrect in the hope that people lacking the expertise will believe you. You are deceiving people on purpose just to defend your point.

I believe people who have some expertise in a field should not use it to deceive those who do not have it. And I think that is what you are doing.

The courts do not want to be stopped by syntax differences or name differences, because, as Bob pointed out several times, even a student could change the syntax or variable names in order to hide plagiarism.

Imagine you have read a good book about a guy called Neo, a girl called Trinity and a man called Morpheus. You really believe that you won't get caught if you publish the story as your own after changing the names to John, Sarah and Peter?

If in your story I can replace John by Neo, Sarah by Trinity and Peter by Morpheus and now I have exactly the same story as in the original book, you're done. So it is perfectly correct to proceed by name substitutions and see if it gets us closer to an existing work.

Now if you publish a story called Star Wars that takes place in space with characters called Luke Skywalker, Darth Vader and Yoda, there is no way I will be able to get anywhere by substituting Neo, Trinity and Morpheus for your names.

I hope these examples can speak to everybody and show that your argument is just obstruction.

// Christophe

tiger · Post by **tiger** » Sat Aug 30, 2008 10:52 pm

chrisw wrote: Actually I have not criticised the work of the reverse engineering artist specifically. Rather I pointed out that in the absence of the symbol table any attempt to recreate the original source is full of problems, that the work is art not science and the labels used in the recreated 'source' are entirely the creation of the artist.

for example to say

"int timer;" in one source is equivalent to "int timer;" in another recreated source is highly dubious, since the reverse engineer artist will have chosen the word "timer" himself in the second source. How do we know it isn't actually "int plydepth", for example?

Basically, any recreated source is going to be full of text put in by the artist. Why do we believe the text is as he says, when he has simply guessed at the names? This recreated source is made to look and appear more similar than it actually is.

In the quoted comparison:

45  infinite = false;    infinite = 0;
46  ponder = false;      ponder = 0;
47  movestogo = -1;      movestogo = 25;
48  winc = -1.0;         winc = 0;
49  wtime = -1.0;        wtime = 0;
50  binc = -1.0;         binc = 0;
51  btime = -1.0;        btime = 0;
52  movetime = -1.0;     movetime = 0;

well, apart from loading the alleged identical code (is it, you sure?) variables with *different* values (-1 not equal to 25 is it?) who says the second column listed variable names are as written? Or even do the same thing?

why is 2nd column binc actually binc? Because the artist wrote it so, that's why.

I suggest you're trying to influence non-programmers and lays with creative art masquerading as fact.

N'est ce pas?

Sorry but all you are doing now is called obstruction.

Your only goal seems to be to block any progress that could allow the discussion to advance either towards a yes or a no, or anything in between.

I think one does not have to be a programmer to understand your tactics. I hope so. And the fact that you are a programmer seriously taints your attempt: you know what we are talking about but you present it as if it was incorrect in the hope that people lacking the expertise will believe you. You are deceiving people on purpose just to defend your point.

I believe people who have some expertise in a field should not use it to deceive those who do not have it. And I think that is what you are doing.

The courts do not want to be stopped by syntax differences or name differences, because, as Bob pointed out several times, even a student could change the syntax or variable names in order to hide plagiarism.

Imagine you have read a good book about a guy called Neo, a girl called Trinity and a man called Morpheus. You really believe that you won't get caught if you publish the story as your own after changing the names to John, Sarah and Peter?

If in your story I can replace John by Neo, Sarah by Trinity and Peter by Morpheus and now I have exactly the same story as in the original book, you're done. So it is perfectly correct to proceed by name substitutions and see if it gets us closer to an existing work.

Now if you publish a story called Star Wars that takes place in space with characters called Luke Skywalker, Darth Vader and Yoda, there is no way I will be able to get anywhere by substituting Neo, Trinity and Morpheus for your names.

I hope these examples can speak to everybody and show that your argument is just obstruction.

// Christophe

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Aug 30, 2008 11:12 pm

Gerd Isenberg wrote: I would guess it depends on the assembly instructions, x87 fcom (0.0) versus cmp (0). Since the sign-bit semantics and bit layout of signed int and float are also the same, < 0 and >= 0 may be used equivalently with both instruction sets, forced at source level by an int/float union.

Oops, not exactly, since 0x80000000 (-0) as a float pattern is not less than 0.0. Need to check some NaN patterns as well. XOR with 0x80000000 is quite common to multiply floats by -1 with SSE floats; not sure about x87.
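
A small sketch of the two exceptions mentioned here, assuming `float` is a 32-bit IEEE 754 single. It uses `memcpy` rather than a union to reinterpret the bits; the helper names are hypothetical. It checks whether "sign bit set" (the integer view of `< 0`) matches `< 0.0f` (the float view) for a given pattern, and shows the sign flip by XOR.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes 32-bit IEEE 754 float). */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a float as its 32-bit pattern. */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Returns 1 if "int < 0" and "float < 0.0" agree for this bit pattern. */
int sign_tests_agree(uint32_t bits)
{
    int int_negative = (bits & 0x80000000u) != 0;   /* sign bit of a 32-bit int */
    int float_negative = bits_to_float(bits) < 0.0f;
    return int_negative == float_negative;
}

/* Negate a float by flipping its sign bit. */
float negate_by_xor(float f)
{
    return bits_to_float(float_to_bits(f) ^ 0x80000000u);
}
```

With these helpers, `sign_tests_agree(0x80000000u)` is 0 (negative zero is not less than 0.0), and so is `sign_tests_agree(0xFFC00000u)`, a NaN pattern with the sign bit set, because every ordered comparison with NaN is false. Ordinary values such as 0xBF800000 (-1.0f) agree.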

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Aug 30, 2008 11:43 pm

Zach Wegner wrote:
Gerd Isenberg wrote:
bob wrote: Of course it isn't suspicious. We all know that if you put enough monkeys in a room with enough computers, one of them would eventually write a comparison that violates good programming practice (nobody uses == in floating point comparison, I'll leave it as an exercise to the reader to find out why). So this proves nothing, as we now know.

To compare for equality (not the case here) with zero might be an exception, since binary representation of IEEE 754-1985 float and int is the same, also for IEEE 754-1985 double and long long. Some compiler may use the one or the other instruction set for comparison with zero. One may force it on source level with weird unions of int/float or u64/double.
Gerd,

This is the assembly. Clearly float instructions. I don't think this would be the result of compiler optimizations though??
Code:
.text:004097E6                 fild    [esp+2Ch+movetime]
.text:004097EA                 fcomp   ds:dbl_6623D0
.text:004097F0                 fnstsw  ax
.text:004097F2                 test    ah, 41h
.text:004097F5                 jnz     short loc_40980E
And for the record, the comparison is >= 0.0, not == 0.0.

You are right, Zach. I checked a sample with MSVC 6.0:

Code:

#include <stdio.h>

int foo()
{
	union {int i; float f;} x; 
	int n = 0;
	for (x.i = 1; x.i; x.i++)
//		if ( x.i >= 0 ) n++;
		if ( x.i >= 0.0 ) n++; // equivalent but weird x87 opcode
//		if ( x.f >= 0.0 ) n++; // not equivalent
	return n;
}

int main(int argc, char* argv[])
{
	printf("%d\n", foo() );
	return 0;
}

?foo@@YAHXZ PROC NEAR					; foo, COMDAT
; Line 7
  00000	51		 push	 ecx
; Line 9
  00001	33 c9		 xor	 ecx, ecx
; Line 10
  00003	c7 44 24 00 01
	00 00 00	 mov	 DWORD PTR _x$[esp+4], 1
$L591:
; Line 11
  0000b	db 44 24 00	 fild	 DWORD PTR _x$[esp+4]
  0000f	dc 1d 00 00 00
	00		 fcomp	 QWORD PTR __real@0000000000000000
  00015	df e0		 fnstsw	 ax
  00017	25 00 01 00 00	 and	 eax, 256		; 00000100H
  0001c	75 01		 jne	 SHORT $L592
  0001e	41		 inc	 ecx
$L592:
; Line 10
  0001f	8b 44 24 00	 mov	 eax, DWORD PTR _x$[esp+4]
  00023	40		 inc	 eax
  00024	89 44 24 00	 mov	 DWORD PTR _x$[esp+4], eax
  00028	75 e1		 jne	 SHORT $L591
  00028	75 e1		 jne	 SHORT $L591
; Line 13
  0002a	8b c1		 mov	 eax, ecx
; Line 14
  0002c	59		 pop	 ecx
  0002d	c3		 ret	 0
?foo@@YAHXZ ENDP					; foo
_TEXT	ENDS

Same x87 instructions you mentioned. Phew, strong point.
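
For anyone decoding the flag tests in the two listings, here is a sketch of which x87 status-word bits each mask selects. After `fnstsw ax` the condition codes sit at C0 = bit 8, C2 = bit 10 and C3 = bit 14 of AX. After a compare, C0 is set when ST(0) is less than the operand, C3 when they are equal, and all three when the operands are unordered. The macro and function names below are hypothetical; the sketch only models the masks and makes no claim about the original source.

```c
#include <stdint.h>

#define X87_C0 0x0100u  /* bit 8:  set when ST(0) <  operand */
#define X87_C2 0x0400u  /* bit 10: set (with C0, C3) when unordered */
#define X87_C3 0x4000u  /* bit 14: set when ST(0) == operand */

/* Models "test ah, 41h": AH is the high byte of AX, and 0x41 selects
   bit 0 and bit 6 of AH, i.e. C0 and C3 of the status word. */
int test_ah_41h(uint16_t status_word)
{
    uint8_t ah = (uint8_t)(status_word >> 8);
    return (ah & 0x41u) != 0;
}

/* Models "and eax, 256": 256 is 0x0100, which selects C0 only. */
int and_eax_256(uint16_t status_word)
{
    return (status_word & 256u) != 0;
}
```

So the first mask is nonzero for "less than" or "equal", while the second is nonzero for "less than" only; both are also nonzero in the unordered case.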

bob · Post by **bob** » Sat Aug 30, 2008 11:43 pm

Gerd Isenberg wrote:
bob wrote: Of course it isn't suspicious. We all know that if you put enough monkeys in a room with enough computers, one of them would eventually write a comparison that violates good programming practice (nobody uses == in floating point comparison, I'll leave it as an exercise to the reader to find out why). So this proves nothing, as we now know.

To compare for equality (not the case here) with zero might be an exception, since binary representation of IEEE 754-1985 float and int is the same, also for IEEE 754-1985 double and long long. Some compiler may use the one or the other instruction set for comparison with zero. One may force it on source level with weird unions of int/float or u64/double.

The general rule is _never_ compare for equality. Normally you do this:

if (fabs (float1 - float2) < .000001) (close enough to zero).

I don't believe _any_ compiler will turn "if (a == 0)" into "if (a == 0.0)" and do an fcomp as opposed to a simple test or jz if the value was just computed.
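
A minimal sketch of the tolerance rule above, using the classic 0.1 + 0.2 case. The function names and the choice of tolerance are assumptions for illustration; a suitable epsilon depends on the magnitudes involved.

```c
#include <math.h>

/* Tolerance comparison: true if a and b differ by less than eps. */
int nearly_equal(double a, double b, double eps)
{
    return fabs(a - b) < eps;
}

/* Exact comparison of 0.1 + 0.2 against 0.3. The volatile qualifiers
   keep the compiler from folding the expression at compile time. */
int exact_sum_matches(void)
{
    volatile double a = 0.1, b = 0.2, c = 0.3;
    return (a + b) == c;
}
```

With IEEE 754 doubles, `exact_sum_matches()` returns 0 because none of the three constants is exactly representable in binary, while `nearly_equal(0.1 + 0.2, 0.3, 1e-6)` returns 1.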

bob · Post by **bob** » Sat Aug 30, 2008 11:46 pm

Gerd Isenberg wrote:
michiguel wrote:
Gerd Isenberg wrote:
bob wrote: Of course it isn't suspicious. We all know that if you put enough monkeys in a room with enough computers, one of them would eventually write a comparison that violates good programming practice (nobody uses == in floating point comparison, I'll leave it as an exercise to the reader to find out why). So this proves nothing, as we now know.

To compare for equality (not the case here) with zero might be an exception, since binary representation of IEEE 754-1985 float and int is the same, also for IEEE 754-1985 double and long long. Some compiler may use the one or the other instruction set for comparison with zero. One may force it on source level with weird unions of int/float or u64/double.
Is it possible that the disassembler interpreted that binary representation as 0.0 when in the source was 0x0?

Miguel
I would guess it depends on the assembly instructions, x87 fcom (0.0) versus cmp (0). Since the sign-bit semantics and bit layout of signed int and float are also the same, < 0 and >= 0 may be used equivalently with both instruction sets, forced at source level by an int/float union.
Code:
union {int i; float f;} x;
x.i = intExpression;
if ( x.f < 0.0 ) 
is semantically equivalent to
Code:
if ( x.i < 0 ) 
but saves some flags...

Whether this is likely, is another question...

This test is problematic for 0x80000000: as an integer it is the smallest possible negative number, but as a float it is interpreted as "negative zero", and that will blow this up. Again, not that such a number is very likely... but mixing them seems problematic anyway.

edit: now I see you caught that as well. I responded after the first post, then read the second.

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Aug 30, 2008 11:57 pm

bob wrote:
Gerd Isenberg wrote:
bob wrote: Of course it isn't suspicious. We all know that if you put enough monkeys in a room with enough computers, one of them would eventually write a comparison that violates good programming practice (nobody uses == in floating point comparison, I'll leave it as an exercise to the reader to find out why). So this proves nothing, as we now know.

To compare for equality (not the case here) with zero might be an exception, since binary representation of IEEE 754-1985 float and int is the same, also for IEEE 754-1985 double and long long. Some compiler may use the one or the other instruction set for comparison with zero. One may force it on source level with weird unions of int/float or u64/double.
The general rule is _never_ compare for equality. Normally you do this:

if (fabs (float1 - float2) < .000001) (close enough to zero).

I don't believe _any_ compiler will turn "if (a == 0)" into "if (a == 0.0)" and do an fcomp as opposed to a simple test or jz if the value was just computed.

Sure, it was about the binary representation of zero and the semantic equality of interpreting int == 0x000 as float == 0x000. I was wrong though, due to the -0:

Code:

int foo()
{
	union {int i; float f;} x; 
	int n = 0;
	for (x.i = 1; x.i; x.i++)
//		if ( x.i == 0.0 ) n++;
		if ( x.f == 0.0 ) n++; // not equivalent, since 0x80000000 == 0.0 as well
	return n;
}
