Bob Hyatt says that....

michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Bob Hyatt says that....

Post by michiguel »

Alexander Schmidt wrote:
fern wrote: show us specific lines of code that are equal to those from fruit.
http://pagesperso-orange.fr/ct_chess/Fr ... rt_go.html
Is this all?

My goodness. I stayed away from this riot but I thought there was more about it.
I can see a similarity that is easily explained in this manner: Vas looked at the code and said, "I see a loop with strtok to parse UCI" and he implemented it that way. Of course, UCI is "universal" and both codes will converge towards those commands. Several lines are mixed, so it does not make any sense to copy and paste a code and later mix it up for no purpose. No reason to obfuscate this... it is easier to rewrite it from scratch!
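To make the "loop with strtok" point concrete, here is a minimal sketch of how a UCI "go" line might be parsed in C. It is not taken from Fruit or from the Rybka disassembly: the names go_params, next_long and parse_go are made up for illustration, and only a handful of the "go" parameters are handled. The shape (one strtok call to start, then a while loop of strcmp tests) is close to forced by the protocol.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a few "go" parameters; -1 means "not given". */
struct go_params {
    long wtime;
    long btime;
    long movetime;
    int depth;
    int infinite;
};

/* Read the next token as a number; leaves *out untouched if there is none. */
static void next_long(long *out)
{
    char *val = strtok(NULL, " \t\r\n");
    if (val != NULL)
        *out = strtol(val, NULL, 10);
}

/* Parse a UCI "go" line in place (strtok writes NULs into the buffer).
   Returns 1 if the line starts with "go", 0 otherwise. */
static int parse_go(char *line, struct go_params *p)
{
    char *tok;
    long depth = -1;

    p->wtime = p->btime = p->movetime = -1;
    p->depth = -1;
    p->infinite = 0;

    tok = strtok(line, " \t\r\n");
    if (tok == NULL || strcmp(tok, "go") != 0)
        return 0;

    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        if (strcmp(tok, "wtime") == 0)
            next_long(&p->wtime);
        else if (strcmp(tok, "btime") == 0)
            next_long(&p->btime);
        else if (strcmp(tok, "movetime") == 0)
            next_long(&p->movetime);
        else if (strcmp(tok, "depth") == 0)
            next_long(&depth);
        else if (strcmp(tok, "infinite") == 0)
            p->infinite = 1;
        /* unknown tokens are simply skipped in this sketch */
    }
    p->depth = (int)depth;
    return 1;
}
```

Two people writing this independently would likely share the strtok calls, the keyword strings and the braces, and differ mostly in names and in the order of the tests.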

I see zero proof of violation of GPL from this file.

Miguel
chrisw

Re: Bob Hyatt says that....

Post by chrisw »

michiguel wrote:
Alexander Schmidt wrote:
fern wrote: show us specific lines of code that are equal to those from fruit.
http://pagesperso-orange.fr/ct_chess/Fr ... rt_go.html
Is this all?

My goodness. I stayed away from this riot but I thought there was more about it.
I can see a similarity that is easily explained in this manner: Vas looked at the code and said, "I see a loop with strtok to parse UCI" and he implemented it that way. Of course, UCI is "universal" and both codes will converge towards those commands. Several lines are mixed, so it does not make any sense to copy and paste a code and later mix it up for no purpose. No reason to obfuscate this... it is easier to rewrite it from scratch!

I see zero proof of violation of GPL from this file.

Miguel
Precisely. They are showing a chunk of code in two programs where the code chunks do the same thing, namely pass parameters to the main engine. There are obviously going to be similarities because they do the same thing (no choice, it has to be done to speak to the interface), but it is in no way cut 'n paste code; it has been written separately by two different people for two different programs.
chrisw

Re: Bob Hyatt says that....

Post by chrisw »

Uri Blass wrote:
chrisw wrote:
bob wrote:
chrisw wrote:
Alexander Schmidt wrote:
fern wrote: show us specific lines of code that are equal to those from fruit.
http://pagesperso-orange.fr/ct_chess/Fr ... rt_go.html
This is some kind of joke?!

The 'code' contains 200 lines, many of which are blank, ignoring those, there are:

33 lines same
81 lines different

that's a 28% correspondence. Very funny joke.

You have no source of Rybka, so the variable names are guesswork, btw.

Given that the code chunks are doing the same thing, I find 81 different lines to 33 same completely reasonable for programs written by two different people.
Please look again. "33 lines the same". One of us can't count. I stopped at 50. If two lines of C are on the same line they are equivalent.

At least don't try to distort what is being presented. That code is absolutely _not_ independently written.
50? my goodness me, that's a lot out of 114. Not even half. And many of the equivalences rely on creative naming of variables and functions to, guess what, be the same!

Less than half of your only disassembled code block so far? Very funny joke, Bob. Hahahaha

This is the famous identical corresponding code blocks is it? The famous 4000 lines of Christophe?

That code could perfectly well be independently written.

Are you going to try and get Vas's source code revealed at icga by this method? Hmmm?
No way to know the names of variables or functions, so here is my analysis, ignoring empty lines of Rybka and comparing only lines of Rybka (note that I do not think that it proves copying, and I may have a few mistakes because I did not check for errors).

http://pagesperso-orange.fr/ct_chess/Fr ... rt_go.html

same: lines 1,2,4,5,45,46,54,56,57,62,63,66,68,69,72,74,75,79,96-98,100,109,112-113,116,123-126,130-132,135,137-138,141-142,158-166,176-177,181,183,200,202,206,208,210,213-214,216-218 (61 identical lines)

same at different place: 23,25,27,29-31,33-37,39,103-104,106-107,144 (17 that are claimed to be identical lines)
not same but same meaning: 7,8,61,67,73,86-89,111,136,157,179-180,182,190,194,196,198,201,207 (21 equivalent lines)
almost the same: 193
different value: 47-52,65,71,115,134,140
not same: 14,16,17,18,78,80-84,184,204

We get at most 100 equivalent lines and have no big blocks that are identical (the biggest block that can be considered equivalent is 141-183, and I can see many empty lines there that are not counted).

Note that { and } are considered identical lines, and by that logic I can easily find many identical lines between any 2 programs.

Uri
Uri,

Where do I start here?!

Line 1 is the function name, chosen by the decompiler. He chose it to be the same. Line 1 is in no way allowable as a comparison.

Line 2 is a brace character, absolutely forced after the function name, no choice at all, it HAS to be there. In no way allowable as a comparison.

Just for starters.

Did you include all other brace characters? They're forced whenever they appear.

There are lines in Fruit that are NOT in the Rybka disassembly. Line 70 for example, as well as many others. Line 70 you've not mentioned, different. Line 73 not mentioned, different. Line 76 not mentioned, different.

Line 74 is a brace, mentioned as identical, but braces are forced. It shouldn't be included in the identical list.

169, 170 not mentioned, different

I could go on ....
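For anyone who wants to redo the counting without the forced lines, here is a rough sketch of a line counter in C. It is not the method used by anyone in this thread; the function names trim, is_forced and shared_lines are invented here, and "forced" is assumed to mean only blank lines and lone braces. It also does not track which lines of the second listing were already matched, so treat the result as an upper bound.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy s into out with leading and trailing whitespace removed. */
static void trim(const char *s, char *out, size_t outsz)
{
    size_t len;
    while (*s != '\0' && isspace((unsigned char)*s))
        s++;
    len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, s, len);
    out[len] = '\0';
}

/* Assumption: a line is "forced" if it is blank or a lone brace. */
static int is_forced(const char *t)
{
    return t[0] == '\0' || strcmp(t, "{") == 0 || strcmp(t, "}") == 0;
}

/* Count lines of a[] that also occur somewhere in b[] after trimming,
   skipping forced lines. *counted receives how many lines of a[] were
   actually compared. */
static int shared_lines(const char *const a[], size_t na,
                        const char *const b[], size_t nb,
                        int *counted)
{
    char ta[256], tb[256];
    int same = 0;
    size_t i, j;

    *counted = 0;
    for (i = 0; i < na; i++) {
        trim(a[i], ta, sizeof ta);
        if (is_forced(ta))
            continue;
        (*counted)++;
        for (j = 0; j < nb; j++) {
            trim(b[j], tb, sizeof tb);
            if (strcmp(ta, tb) == 0) {
                same++;
                break;
            }
        }
    }
    return same;
}
```

Dividing the return value by *counted gives a match percentage that no longer rewards braces and blank lines, which is the objection being made above.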
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bob Hyatt says that....

Post by bob »

chrisw wrote:
bob wrote:
chrisw wrote:
bob wrote:
chrisw wrote:
bob wrote:
fern wrote:...the argument that points how many programs or even all use the same algorithms is irrelevant as much they can be written in so many different ways. So, he adds, the reasoning that programs share lot of stuff, as Fabian said, would be not valid.
Ok. Then, if it is so and surely must be because, after all, Bob Hyatt and none other said that, if really the line of code and how was written is the core of the issue, then let the attackers of Rybka originality show us specific lines of code that are equal to those from fruit.
Of course I wonder how they will do such a thing as much I presume the Rybka lines of code are not easily accessible.

Wondering regards
fern
What is being done is to take the executable, and run it through a disassembler which produces the assembly language code the compiler produced when the source was compiled. An experienced programmer can then take that assembly language code and reconstruct the C source code it came from.

inc i => i++; in C for example.

It takes time because the optimizer in the compiler re-orders instructions to make them run as efficiently as possible, so the human reverse-engineer has to undo all of that...
Perhaps you should also point out that when the executable code is compiled from the source in the first place, enormous amounts of information are thrown away, so to re-generate the source from an executable requires creativity and massive amounts of creative guesswork from the reverse engineer. Better to call him reverse creative artist actually.

Your final alleged C source, rehashed from the executable is, shall we say, open to question.

Who tells you the label names, for example? Oh whoops, Reverse creative artist calls them himself, whatever he wants to call them, and so on.
Perhaps for you. Not for me. I can't reconstruct variable names. But I can figure out what code is doing and then deduce reasonable names, given enough interest to do so.
Yeah, yeah, yeah, Bob.

Only there won't be deduced "reasonable names", there will be deduced, deliberately, names that correspond to names in the target program, to make it seem there are more correspondences than there really might be.

That's the creativity.

Do you deny that identical names to the target program will be used wherever possible? No, of course you don't. Because already that is what has happened in the disassemblies we've been presented with.

Creative creation of variable and function names, deliberately to match the target.

Let's be quite honest to all the lays trying to follow this, shall we?
I will say it again. If the instructions match exactly, and I then copy the names from one program and use them in the other on the same instructions, and everything matches and works properly, and suddenly the two programs appear to be identical, is that "creating" evidence or "discovering" evidence?
28% appears "identical"?

28% lines up and matches "exactly"?

Variable and function names deliberately chosen to match?

Hahahahahaha

Maybe in Creative Art 101.

I thought this was computer science.
I believe it is the twilight zone, but there's no point in arguing about it...

What would be fun, however, is for you to write a novel, and copy 28% of the text from (say) Tom Clancy. And then see how long you laugh. Be sure to tell the judge "it is only 28%, not the whole damned thing. Shoot, your honor, I changed 3 words on page 3... The name of _my_ submarine is Laramie, his is Cheyenne. Even if they are in the same US state, they are different cities..."
Uri Blass
Posts: 10844
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Bob Hyatt says that....

Post by Uri Blass »

chrisw wrote:
Uri,

Where do I start here?!

1. is the function name, chosen by the decompiler. He chose it to be the same. 1. is no way allowable as a comparison.

2. is a brace character, absolutely forced after the function name, no choice at all, it HAS to be there. No way allowable as a comparison.
I agree that the data does not prove violation of the GPL.

The question is basically how many lines are the same after changing names of functions and variables (I do not think that the number of lines proves copying, and I think that we cannot decide whether the code was copied based only on counting the number of lines).

Note only that there is a choice about the function line, because it is possible to have a function that returns int in case of an error instead of void.
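A small illustration of that choice, as a sketch only: the two functions below do the same job, but one is declared void and the other returns an int error code. The names set_depth_void and set_depth_int are hypothetical and come from neither program.

```c
#include <stdlib.h>

/* Variant A: returns nothing; a bad input silently leaves *depth alone. */
static void set_depth_void(const char *text, int *depth)
{
    char *end;
    long v = strtol(text, &end, 10);
    if (end != text && v > 0)
        *depth = (int)v;
}

/* Variant B: same job, but reports failure through an int return code. */
static int set_depth_int(const char *text, int *depth)
{
    char *end;
    long v = strtol(text, &end, 10);
    if (end == text || v <= 0)
        return -1;          /* not a positive number */
    *depth = (int)v;
    return 0;
}
```

So the first line of a function is a design decision by the author, even when what follows the name is dictated by the syntax.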

Uri
Last edited by Uri Blass on Sat Aug 30, 2008 12:32 am, edited 1 time in total.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bob Hyatt says that....

Post by bob »

Just for the record, not _all_ braces are forced.

if (a) {
    xxx
}

can omit the braces quite nicely when xxx is a single statement. Once again, this has to be taken in context.
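A compilable version of the same point, as a minimal sketch with made-up function names (abs_braced, abs_unbraced). The braces around a function body are required by the language; the ones around a single-statement if body are a style choice.

```c
/* Braced form. */
static int abs_braced(int a)
{
    if (a < 0) {
        a = -a;
    }
    return a;
}

/* Same logic with the optional braces left out; this is legal only
   because the if controls a single statement. */
static int abs_unbraced(int a)
{
    if (a < 0)
        a = -a;
    return a;
}
```

Both compile to the same behavior, so a brace line after an if can carry a little stylistic information, while a brace line after a function header carries none.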
chrisw

Re: Bob Hyatt says that....

Post by chrisw »

bob wrote:
I believe it is the twilight zone, but there's no point in arguing about it...

What would be fun, however, is for you to write a novel, and copy 28% of the text from (say) Tom Clancy. And then see how long you laugh. Be sure to tell the judge "it is only 28%, not the whole damned thing. Shoot, your honor, I changed 3 words on page 3..."
Tom is sure to shoot the judge. You changed the novel from text to words. Sure it is only 28%. Laugh be damned.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Bob Hyatt says that....

Post by michiguel »

bob wrote:
I believe it is the twilight zone, but there's no point in arguing about it...

What would be fun, however, is for you to write a novel, and copy 28% of the text from (say) Tom Clancy. And then see how long you laugh. Be sure to tell the judge "it is only 28%, not the whole damned thing. Shoot, your honor, I changed 3 words on page 3..."
The probability of two things being derived from the same source depends on the % of similarity *AND* the length. You give the example of a novel, which is very large (many thousands of words). That is a fallacy.
For instance, in biochemistry, if you have two protein sequences of 100 amino acids, 28% similarity is not enough to claim an evolutionary or structural relationship. With DNA, it is even worse. If the length increases, you may need a lower % to claim it.

If we both write a bubble sort routine, I am pretty sure that it will be very easy to find 28% of similar lines (particularly after compiling and disassembly!). It will be a different story if 28% is present in a whole program of 100000 lines.
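One rough way to put numbers on the length argument, under a deliberately crude assumption that is mine and not from any analysis in this thread: suppose each of the n compared lines matches by chance, independently, with some unknown probability p. Then the observed match fraction has a standard error that shrinks with the square root of n, and the ratio between a 114-line block (the 33 + 81 counted earlier) and a 100000-line program does not depend on p at all.

```latex
\[
  \hat{f} = \frac{\text{matching lines}}{n},
  \qquad
  \operatorname{SE}(\hat{f}) = \sqrt{\frac{p\,(1-p)}{n}}
\]
\[
  \frac{\operatorname{SE}_{n=114}}{\operatorname{SE}_{n=100000}}
  = \sqrt{\frac{100000}{114}} \approx 29.6
\]
```

So under this toy model the chance fluctuation in the match percentage is roughly 30 times larger for the short block than for the whole program. Real source lines are not independent of each other, so this is only an order-of-magnitude illustration.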

Miguel
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bob Hyatt says that....

Post by bob »

kranium wrote:
chris-
there's no proof about 'creative' naming either...it's just as plausible that it was accurately done.
Most likely, in your alternate reality, student plagiarism would never happen. Could never happen in fact. Because _no_ students have ever been so stupid as to turn in exactly identical codes. They at _least_ know enough to change the author's name at the top, which guarantees no 100% match. Doesn't quite match _my_ definition of plagiarism however.

I, however, have little problem dealing with this problem, the same as many others in fact.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bob Hyatt says that....

Post by bob »

michiguel wrote:
The probability of two things being derived from the same source depends on the % of similarity *AND* the length. You give the example of a novel, which is very large (many thousands of words). That is a fallacy.
For instance, in biochemistry, if you have two protein sequences of 100 amino acids, 28% similarity is not enough to claim an evolutionary or structural relationship. With DNA, it is even worse. If the length increases, you may need a lower % to claim it.

If we both write a bubble sort routine, I am pretty sure that it will be very easy to find 28% of similar lines (particularly after compiling and disassembly!). It will be a different story if 28% is present in a whole program of 100000 lines.

Miguel
I believe your last statement is what I have been saying all along. Although I think the probability of even a 10% match is very low when you look at blocks. And no, I don't look at individual lines as meaning anything. But common structure (procedures, functions they provide), data structures, semantic programming structures: all are revealing.