Bob Hyatt says that....

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Bob Hyatt says that....

Post by michiguel »

trojanfoe wrote:
michiguel wrote: The probability that two things derive from the same source depends on the % of similarity *AND* the length. You give the example of a novel, which is very long (thousands of words). That is a fallacy.
For instance, in biochemistry, if you have two protein sequences of 100 amino acids, 28% similarity is not enough to claim an evolutionary or structural relationship. With DNA, it is even worse. If the length increases, a lower % may be enough to claim it.
The properties of amino acids and DNA have nothing at all to do with this subject - they are irrelevant. There are plenty of things on the planet that contain 28% of another thing and could still reasonably be described as unrelated.
michiguel wrote: If we both write a bubble sort routine, I am pretty sure that it will be very easy to find 28% of similar lines (particularly after compiling and disassembling!).
Another poor example - what's a bubble sort - 6 lines of code at most?
michiguel wrote: It will be a different story if 28% is present in a whole program of 100000 lines.
Agreed.
It is not a poor example. Both are extreme examples. You accept one extreme and not the other? The case in point lies in between. My main point is that Bob's example of the novel is a dialectic trick, a fallacy. That is a fact, because the % of similarity is not enough without knowing the length.
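Miguel's point that a percentage means little without the length can be made concrete with a simple chance model. The C sketch below assumes, purely for illustration, that each of n compared positions matches by accident independently with probability p; real sequences and real code are not independent trials, and the value p = 0.2 used in the note is hypothetical, not a figure from this thread. The function (its name, chance_of_matches, is also hypothetical) returns the probability of seeing k or more chance matches, i.e. the upper tail of a binomial distribution, computed from ratios of neighbouring terms so that no factorials are needed.

```c
/* P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.
 * Terms are computed relative to the most likely count m using
 * pmf(i+1)/pmf(i) = (n-i)/(i+1) * p/(1-p), so no factorials or math
 * library calls are needed; negligible terms simply underflow to 0. */
double chance_of_matches(int n, int k, double p)
{
    double odds = p / (1.0 - p);
    int m = (int)((n + 1) * p);            /* mode of the distribution */
    if (m > n) m = n;

    double total = 1.0;                    /* relative weight of count m */
    double tail = (m >= k) ? 1.0 : 0.0;

    double term = 1.0;                     /* walk upward from the mode */
    for (int i = m; i < n && term > 0.0; i++) {
        term *= (double)(n - i) / (i + 1) * odds;      /* weight of i+1 */
        total += term;
        if (i + 1 >= k) tail += term;
    }

    term = 1.0;                            /* walk downward from the mode */
    for (int i = m; i > 0 && term > 0.0; i--) {
        term *= (double)i / (n - i + 1) / odds;        /* weight of i-1 */
        total += term;
        if (i - 1 >= k) tail += term;
    }
    return tail / total;
}
```

Under these assumptions, chance_of_matches(100, 28, 0.2) comes out at a few percent, so 28 matches out of 100 is weak evidence, while chance_of_matches(100000, 28000, 0.2) underflows to 0.0: the same 28% over a much longer text is essentially impossible by accident in this model.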

Miguel
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Bob Hyatt says that....

Post by michiguel »

bob wrote:
michiguel wrote:
bob wrote:
chrisw wrote:
bob wrote:
chrisw wrote:
bob wrote:
chrisw wrote:
bob wrote:
fern wrote:...the argument that many programs, or even all of them, use the same algorithms is irrelevant, since they can be written in so many different ways. So, he adds, the reasoning that programs share a lot of stuff, as Fabian said, would not be valid.
OK. Then, if that is so (and surely it must be, because after all Bob Hyatt and none other said it), if the lines of code and how they were written really are the core of the issue, then let the attackers of Rybka's originality show us specific lines of code that are equal to those from Fruit.
Of course, I wonder how they will do such a thing, since I presume the Rybka lines of code are not easily accessible.

Wondering regards
fern
What is being done is to take the executable and run it through a disassembler, which produces the assembly language code the compiler generated when the source was compiled. An experienced programmer can then take that assembly language code and reconstruct the C source code it came from.

inc i => i++; in C for example.

It takes time because the optimizer in the compiler re-orders instructions to make them run as efficiently as possible, so the human reverse-engineer has to undo all of that...
Perhaps you should also point out that when the executable code is compiled from the source in the first place, enormous amounts of information are thrown away, so regenerating the source from an executable requires creativity and massive amounts of creative guesswork from the reverse engineer. Better to call him a reverse creative artist, actually.

Your final alleged C source, rehashed from the executable is, shall we say, open to question.

Who tells you the label names, for example? Oh whoops, the reverse creative artist names them himself, whatever he wants to call them, and so on.
Perhaps for you. Not for me. I can't reconstruct variable names. But I can figure out what code is doing and then deduce reasonable names, given enough interest to do so.
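The disagreement here is about where names come from. In a typical stripped release build, local variable names (and usually internal function names) are not present in the executable, so a reverse engineer first sees placeholders and later substitutes names of his own choosing; only the behavior comes from the binary. The sketch below shows one routine, counting the set bits of a 64-bit bitboard, written both ways. The placeholder names imitate the style of labels that disassemblers generate and, like popcount_bitboard, are hypothetical; neither function is taken from Fruit or Rybka.

```c
#include <stdint.h>

/* What a reverse engineer might first write down: placeholder names
 * in the style of disassembler-generated labels (hypothetical). */
int sub_401000(uint64_t arg_0)
{
    int var_4 = 0;
    while (arg_0 != 0) {
        arg_0 &= arg_0 - 1;
        var_4++;
    }
    return var_4;
}

/* The same logic after working out what it does and choosing names:
 * count the set bits (occupied squares) of a 64-bit bitboard. */
int popcount_bitboard(uint64_t bb)
{
    int count = 0;
    while (bb != 0) {
        bb &= bb - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

Both functions return the same value for every input (for example 3 for 0x13); nothing in their behavior tells you which names the original author used.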
Yeah, yeah, yeah, Bob.

Only there won't be deduced "reasonable names"; there will be names deliberately chosen to correspond to names in the target program, to make it seem there are more correspondences than there really might be.

That's the creativity.

Do you deny that names identical to the target program's will be used wherever possible? No, of course you don't, because that is already what has happened in the disassemblies we've been presented with.

Creative creation of variable and function names, deliberately to match the target.

Let's be quite honest with all the laypeople trying to follow this, shall we?
I will say it again. If the instructions match exactly, and I then copy the names from one program and use them in the other on the same instructions, and everything matches and works properly, and suddenly the two programs appear to be identical, is that "creating" evidence or "discovering" evidence?
28% appears "identical"?

28% lines up and matches "exactly"?

Variable and function names deliberately chosen to match?

Hahahahahaha

Maybe in Creative Art 101.

I thought this was computer science.
I believe it is the twilight zone, but there's no point in arguing about it...

What would be fun, however, is for you to write a novel, and copy 28% of the text from (say) Tom Clancy. And then see how long you laugh. Be sure to tell the judge "it is only 28%, not the whole damned thing. Shoot, your honor, I changed 3 words on page 3..."
The probability that two things derive from the same source depends on the % of similarity *AND* the length. You give the example of a novel, which is very long (thousands of words). That is a fallacy.
For instance, in biochemistry, if you have two protein sequences of 100 amino acids, 28% similarity is not enough to claim an evolutionary or structural relationship. With DNA, it is even worse. If the length increases, a lower % may be enough to claim it.

If we both write a bubble sort routine, I am pretty sure that it will be very easy to find 28% of similar lines (particularly after compiling and disassembling!). It will be a different story if 28% is present in a whole program of 100000 lines.

Miguel
I believe your last statement is what I have been saying all along. Although I think the probability of even a 10% match is very low when you look at blocks. And no, I don't look at individual lines as meaning anything. But common structure (procedures and the functions they provide), data structures, semantic programming structures: all are revealing.
If you believe my statement is what you have been saying all along, why do you come up with the novel example? So you know it is an exaggerated argument and you use it anyway?

Miguel
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Bob Hyatt says that....

Post by Zach Wegner »

michiguel wrote:
trojanfoe wrote:
michiguel wrote: The probability that two things derive from the same source depends on the % of similarity *AND* the length. You give the example of a novel, which is very long (thousands of words). That is a fallacy.
For instance, in biochemistry, if you have two protein sequences of 100 amino acids, 28% similarity is not enough to claim an evolutionary or structural relationship. With DNA, it is even worse. If the length increases, a lower % may be enough to claim it.
The properties of amino acids and DNA have nothing at all to do with this subject - they are irrelevant. There are plenty of things on the planet that contain 28% of another thing and could still reasonably be described as unrelated.
michiguel wrote: If we both write a bubble sort routine, I am pretty sure that it will be very easy to find 28% of similar lines (particularly after compiling and disassembling!).
Another poor example - what's a bubble sort - 6 lines of code at most?
michiguel wrote: It will be a different story if 28% is present in a whole program of 100000 lines.
Agreed.
It is not a poor example. Both are extreme examples. You accept one extreme and not the other? The case in point lies in between. My main point is that Bob's example of the novel is a dialectic trick, a fallacy. That is a fact, because the % of similarity is not enough without knowing the length.

Miguel
Trying to assign a percentage of similarity in the first place is a fallacy. It's a very subjective measure based on decompilation, and even then it's based on code lines.

What's more interesting to me is what is needed to change the code. Convert the time to int, remove error checking and a couple of "useless" options (mate and searchmoves), convert all the variable initializations (some variables initialized to 0 in Fruit are -1 in Rybka), and change a few things in the time control, and they are basically identical.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Bob Hyatt says that....

Post by michiguel »

bob wrote:
trojanfoe wrote:
michiguel wrote:
Alexander Schmidt wrote:
fern wrote: show us specific lines of code that are equal to those from fruit.
http://pagesperso-orange.fr/ct_chess/Fr ... rt_go.html
Is this all?

My goodness. I stayed away from this riot, but I thought there was more to it.
I can see a similarity that is easily explained in this manner: Vas looked at the code and said, "I see a loop with strtok to parse UCI", and he implemented it that way. Of course, UCI is "universal" and both codes will converge towards those commands. Several lines are in a different order, and it does not make any sense to copy and paste code and later shuffle it for no purpose. There is no reason to obfuscate this... it is easier to rewrite it from scratch!

I see zero proof of violation of GPL from this file.

Miguel
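For readers who have not written one, the strtok-loop style of UCI parsing discussed here looks roughly like the sketch below. It is not Fruit's or Rybka's code: the struct go_params and the function uci_parse_go are hypothetical names, and only a subset of the "go" parameters defined by the UCI protocol is handled (wtime, btime, winc, binc, movestogo, depth, movetime, infinite). Other tokens, such as searchmoves and its move list, ponder, mate or nodes, are simply skipped.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the "go" parameters handled below.
 * A value of -1 means the parameter was not given. */
struct go_params {
    int wtime, btime, winc, binc;     /* milliseconds */
    int movestogo, depth, movetime;   /* movetime in milliseconds */
    int infinite;                     /* 1 if "infinite" was given */
};

/* Parse a line such as "go wtime 300000 btime 300000 movestogo 40".
 * strtok() writes '\0' over separators, so `line` must be writable.
 * Unrecognised tokens (searchmoves and its moves, ponder, mate, nodes)
 * are skipped one at a time. */
void uci_parse_go(char *line, struct go_params *gp)
{
    const char *sep = " \t\r\n";
    gp->wtime = gp->btime = gp->winc = gp->binc = -1;
    gp->movestogo = gp->depth = gp->movetime = -1;
    gp->infinite = 0;

    (void)strtok(line, sep);                    /* skip the "go" keyword */
    for (char *tok = strtok(NULL, sep); tok != NULL; tok = strtok(NULL, sep)) {
        int *field = NULL;
        if      (strcmp(tok, "wtime") == 0)     field = &gp->wtime;
        else if (strcmp(tok, "btime") == 0)     field = &gp->btime;
        else if (strcmp(tok, "winc") == 0)      field = &gp->winc;
        else if (strcmp(tok, "binc") == 0)      field = &gp->binc;
        else if (strcmp(tok, "movestogo") == 0) field = &gp->movestogo;
        else if (strcmp(tok, "depth") == 0)     field = &gp->depth;
        else if (strcmp(tok, "movetime") == 0)  field = &gp->movetime;
        else if (strcmp(tok, "infinite") == 0)  gp->infinite = 1;

        if (field != NULL) {                    /* keyword takes one number */
            char *arg = strtok(NULL, sep);
            if (arg == NULL) break;             /* truncated command */
            *field = atoi(arg);
        }
    }
}
```

Because strtok() overwrites separators with '\0' and keeps hidden state between calls, the caller must pass a writable copy of the line and the function is not reentrant; that is one reason some programmers prefer other ways of tokenizing.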
Why would Vas look at the code for a UCI parser? It's a pretty simple specification - I don't understand why he would need to see someone else's code. I recently implemented a UCI parser myself and never used strtok at all - but then it's my style not to use strtok much anyway. It's strange that Vas's style is to use strtok in the same way someone else has. Hmmmm.

There is more than zero proof here...
This is a hopeless discussion.. Some are _not_ going to be convinced, because of whatever personal reasons they have. Evidence be damned. Discussions be damned. It is just not going to change their opinion or adamant refusal to accept any kind of evidence, period.

The best that can be hoped for is that enough questions are raised, enough evidence is produced, that eventually the question has to be answered in a way that is consistent with all the facts, whatever they are by then. Might be nothing to it. Might be a lot to it. But the detractors are not going to help in any way so it is just wasted bandwidth. I know, I know, I've been just as guilty of wasting bandwidth. But I'm 60 years old now so we can blame that on old age. :)
I stayed away from this discussion. I contributed only that message. You replied to someone who answered me with a patronizing attitude. Indirectly, you are referring to me as someone who damns evidence and discussion. You may be referring to the rest of the planet, but my name is there. This is rude. I had better stop right now.

For the record, I could not care less about the result of this particular case. I just realized that I stuck my nose in a street fight that is not mine. My mistake.

Good luck in this crusade, to both sides.
Miguel
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)

Re: Bob Hyatt says that....

Post by tiger »

chrisw wrote:
Uri Blass wrote:
chrisw wrote:
bob wrote:
chrisw wrote:
Alexander Schmidt wrote:
fern wrote: show us specific lines of code that are equal to those from fruit.
http://pagesperso-orange.fr/ct_chess/Fr ... rt_go.html
This is some kind of joke?!

The 'code' contains 200 lines, many of which are blank; ignoring those, there are:

33 lines same
81 lines different

that's a 28% correspondence. Very funny joke.

You have no source of Rybka, so the variable names are guesswork, btw.

Given that the code chunks are doing the same thing, I find 81 different lines against 33 identical ones completely reasonable for programs written by two different people.
Please look again. "33 lines the same". One of us can't count. I stopped at 50. If two lines of C are on the same line they are equivalent.

At least don't try to distort what is being presented. That code is absolutely _not_ independently written.
50? My goodness me, that's a lot out of 114. Not even half. And many of the equivalences rely on creative naming of variables and functions to, guess what, be the same!

Less than half of your only disassembled code block so far? Very funny joke, Bob. Hahahaha

This is the famous identical corresponding code blocks, is it? The famous 4000 lines of Christophe?

That code could perfectly well be independently written.

Are you going to try and get Vas's source code revealed at the ICGA by this method? Hmm?
There is no way to know the names of variables or functions, so here is my analysis, ignoring empty lines of Rybka and comparing only lines of Rybka (not that I do not think that it proves copying, and I may have a few mistakes because I did not check for errors).

http://pagesperso-orange.fr/ct_chess/Fr ... rt_go.html

Same: lines 1,2,4,5,45,46,54,56,57,62,63,66,68,69,72,74,75,79,96-98,100,109,112-113,116,123-126,130-132,135,137-138,141-142,158-166,176-177,181,183,200,202,206,208,210,213-214,216-218 (61 identical lines)

Same at a different place: 23,25,27,29-31,33-37,39,103-104,106-107,144 (17 lines that are claimed to be identical)
Not the same but same meaning: 7,8,61,67,73,86-89,111,136,157,179-180,182,190,194,196,198,201,207 (21 equivalent lines)
Almost the same: 193
Different value: 47-52,65,71,115,134,140
Not the same: 14,16,17,18,78,80-84,184,204

We get at most 100 equivalent lines and no big identical blocks (the biggest block that can be considered equivalent is 141-183, and I can see many empty lines there that are not counted).
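Uri's tallies can be checked mechanically. The sketch below (the function name count_listed_lines is hypothetical) counts how many line numbers a list such as "96-98,100,112-113" covers, treating "a-b" as the inclusive range from a to b.

```c
#include <stdlib.h>

/* Count how many line numbers a list such as "1,2,96-98,112-113" covers.
 * A single number counts once; a range "a-b" counts b - a + 1 lines.
 * Returns -1 if the list is malformed. */
int count_listed_lines(const char *list)
{
    int total = 0;
    const char *p = list;
    while (*p != '\0') {
        char *end;
        long first = strtol(p, &end, 10);
        if (end == p) return -1;                /* expected a number */
        long last = first;
        if (*end == '-') {                      /* range "first-last" */
            p = end + 1;
            last = strtol(p, &end, 10);
            if (end == p || last < first) return -1;
        }
        total += (int)(last - first + 1);
        p = end;
        if (*p == ',') p++;                     /* move to next entry */
        else if (*p != '\0') return -1;         /* unexpected character */
    }
    return total;
}
```

Applied to his first three lists it returns 61, 17 and 21, matching the totals he gives; together with line 193 ("almost the same") that makes the 100 equivalent lines he mentions.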

Note that lines consisting only of { or } are counted as identical lines, and by that logic I can easily find many identical lines between any two programs.

Uri
Uri,

Where do I start here?!

Line 1 is the function name, chosen by the decompiler. He chose it to be the same. Line 1 is in no way allowable as a comparison.

Line 2 is a brace character, absolutely forced after the function name; no choice at all, it HAS to be there. In no way allowable as a comparison.

Just for starters.

Did you include all other brace characters? They're forced whenever they appear.

There are lines in Fruit that are NOT in the Rybka disassembly. Line 70, for example, as well as many others. Line 70 you've not mentioned: different. Line 73 not mentioned: different. Line 76 not mentioned: different.

Line 74 is a brace, mentioned as identical, but braces are forced. Shouldn't be included in identical list.
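Chris's objection is that brace lines (and blank lines) are forced by C syntax and carry no evidence. One way to make such a rule explicit is to trim each line and skip those containing nothing but braces or semicolons before counting matches. The sketch below works under those assumptions: what counts as "forced" is a judgment call, the function names are hypothetical, and textual matching says nothing about semantics.

```c
#include <ctype.h>
#include <string.h>

/* Copy s into out (capacity outsz > 0) without leading/trailing
 * whitespace; longer lines are truncated to outsz - 1 characters. */
static void trim_copy(const char *s, char *out, size_t outsz)
{
    while (isspace((unsigned char)*s)) s++;
    size_t n = strlen(s);
    while (n > 0 && isspace((unsigned char)s[n - 1])) n--;
    if (n >= outsz) n = outsz - 1;
    memcpy(out, s, n);
    out[n] = '\0';
}

/* Hypothetical rule: a trimmed line is "forced" (no evidence either way)
 * if it is empty or contains only braces and semicolons. */
static int is_forced(const char *t)
{
    for (; *t != '\0'; t++)
        if (*t != '{' && *t != '}' && *t != ';')
            return 0;
    return 1;
}

/* Count non-forced lines of a[] that also occur, after trimming,
 * anywhere in b[] (one line of b[] may match several lines of a[]).
 * *considered receives the number of non-forced lines of a[]. */
int count_shared_lines(const char *a[], int na, const char *b[], int nb,
                       int *considered)
{
    char ta[256], tb[256];
    int shared = 0;
    *considered = 0;
    for (int i = 0; i < na; i++) {
        trim_copy(a[i], ta, sizeof ta);
        if (is_forced(ta)) continue;
        (*considered)++;
        for (int j = 0; j < nb; j++) {
            trim_copy(b[j], tb, sizeof tb);
            if (strcmp(ta, tb) == 0) { shared++; break; }
        }
    }
    return shared;
}
```

The match percentage is then shared / considered. For comparison, the figures of 33 matching lines out of 114 quoted earlier in the thread give 33 / 114 ≈ 0.289.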

169, 170 not mentioned, different

I could go on ....


Pointless.

The courts compare the semantics of both programs, not the spelling of the source code, variable names or function names.



// Christophe
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Bob Hyatt says that....

Post by michiguel »

Zach Wegner wrote:
michiguel wrote:
trojanfoe wrote:
michiguel wrote: The probability that two things derive from the same source depends on the % of similarity *AND* the length. You give the example of a novel, which is very long (thousands of words). That is a fallacy.
For instance, in biochemistry, if you have two protein sequences of 100 amino acids, 28% similarity is not enough to claim an evolutionary or structural relationship. With DNA, it is even worse. If the length increases, a lower % may be enough to claim it.
The properties of amino acids and DNA have nothing at all to do with this subject - they are irrelevant. There are plenty of things on the planet that contain 28% of another thing and could still reasonably be described as unrelated.
michiguel wrote: If we both write a bubble sort routine, I am pretty sure that it will be very easy to find 28% of similar lines (particularly after compiling and disassembling!).
Another poor example - what's a bubble sort - 6 lines of code at most?
michiguel wrote: It will be a different story if 28% is present in a whole program of 100000 lines.
Agreed.
It is not a poor example. Both are extreme examples. You accept one extreme and not the other? The case in point lies in between. My main point is that Bob's example of the novel is a dialectic trick, a fallacy. That is a fact, because the % of similarity is not enough without knowing the length.

Miguel
Trying to assign a percentage of similarity in the first place is a fallacy. It's a very subjective measure based on decompilation, and even then it's based on code lines.

What's more interesting to me is what is needed to change the code. Convert the time to int, remove error checking and a couple of "useless" options (mate and searchmoves), convert all the variable initializations (some variables initialized to 0 in Fruit are -1 in Rybka), and change a few things in the time control, and they are basically identical.
Good point. That is called parsimony and can be quantified too. Has anybody ever done it? I do not see it in that file. That depends on the length of the information too. How many steps do you need to make both codes identical?

Let's take two routines with a clear purpose (e.g. quicksort, though it may be too short) of similar length, written independently by two programmers after they have seen an example (to mimic this case), and compare those numbers.

This analysis is common in evolutionary biology, and a quick look is often misleading.
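Miguel's question, "How many steps do you need to make both codes identical?", can be given one simple numeric form: the minimum number of line insertions, deletions and substitutions that turns one listing into the other, i.e. the Levenshtein edit distance computed over whole lines instead of characters. This is a swapped-in technique for illustration, not something anyone in the thread computed, and it weights every edit equally, so renaming one variable costs as much as rewriting a line. The function name is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Minimum number of line insertions, deletions and substitutions
 * needed to turn listing a[0..na) into listing b[0..nb)
 * (Levenshtein distance over whole lines). Keeps only two rows of
 * the dynamic-programming table. Returns -1 on allocation failure. */
int line_edit_distance(const char *a[], int na, const char *b[], int nb)
{
    int *prev = malloc((nb + 1) * sizeof *prev);
    int *cur  = malloc((nb + 1) * sizeof *cur);
    if (prev == NULL || cur == NULL) { free(prev); free(cur); return -1; }

    for (int j = 0; j <= nb; j++) prev[j] = j;          /* insert j lines */
    for (int i = 1; i <= na; i++) {
        cur[0] = i;                                      /* delete i lines */
        for (int j = 1; j <= nb; j++) {
            int cost = strcmp(a[i - 1], b[j - 1]) != 0;
            int best = prev[j - 1] + cost;               /* keep or substitute */
            if (prev[j] + 1 < best) best = prev[j] + 1;          /* delete a[i-1] */
            if (cur[j - 1] + 1 < best) best = cur[j - 1] + 1;    /* insert b[j-1] */
            cur[j] = best;
        }
        int *tmp = prev; prev = cur; cur = tmp;          /* reuse rows */
    }
    int d = prev[nb];
    free(prev);
    free(cur);
    return d;
}
```

For example, {"x = 0;", "y = 1;", "z = 2;"} against {"x = 0;", "z = 2;", "w = 3;"} gives 2 (drop one line, add one). Computing such distances for independently written routines, as Miguel proposes, would give a baseline to compare against.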

I will be winding down.
Miguel
Karmazen & Oliver
Posts: 374
Joined: Sat Mar 10, 2007 12:34 am

Re: Bob Hyatt says that....

Post by Karmazen & Oliver »

tiger wrote:
chrisw wrote:
Uri Blass wrote:
chrisw wrote:
bob wrote:
chrisw wrote:
Alexander Schmidt wrote:
fern wrote: show us specific lines of code that are equal to those from fruit.
http://pagesperso-orange.fr/ct_chess/Fr ... rt_go.html
This is some kind of joke?!

The 'code' contains 200 lines, many of which are blank; ignoring those, there are:

33 lines same
81 lines different

that's a 28% correspondence. Very funny joke.

You have no source of Rybka, so the variable names are guesswork, btw.

Given that the code chunks are doing the same thing, I find 81 different lines against 33 identical ones completely reasonable for programs written by two different people.
Please look again. "33 lines the same". One of us can't count. I stopped at 50. If two lines of C are on the same line they are equivalent.

At least don't try to distort what is being presented. That code is absolutely _not_ independently written.
50? My goodness me, that's a lot out of 114. Not even half. And many of the equivalences rely on creative naming of variables and functions to, guess what, be the same!

Less than half of your only disassembled code block so far? Very funny joke, Bob. Hahahaha

This is the famous identical corresponding code blocks, is it? The famous 4000 lines of Christophe?

That code could perfectly well be independently written.

Are you going to try and get Vas's source code revealed at the ICGA by this method? Hmm?
There is no way to know the names of variables or functions, so here is my analysis, ignoring empty lines of Rybka and comparing only lines of Rybka (not that I do not think that it proves copying, and I may have a few mistakes because I did not check for errors).

http://pagesperso-orange.fr/ct_chess/Fr ... rt_go.html

Same: lines 1,2,4,5,45,46,54,56,57,62,63,66,68,69,72,74,75,79,96-98,100,109,112-113,116,123-126,130-132,135,137-138,141-142,158-166,176-177,181,183,200,202,206,208,210,213-214,216-218 (61 identical lines)

Same at a different place: 23,25,27,29-31,33-37,39,103-104,106-107,144 (17 lines that are claimed to be identical)
Not the same but same meaning: 7,8,61,67,73,86-89,111,136,157,179-180,182,190,194,196,198,201,207 (21 equivalent lines)
Almost the same: 193
Different value: 47-52,65,71,115,134,140
Not the same: 14,16,17,18,78,80-84,184,204

We get at most 100 equivalent lines and no big identical blocks (the biggest block that can be considered equivalent is 141-183, and I can see many empty lines there that are not counted).

Note that lines consisting only of { or } are counted as identical lines, and by that logic I can easily find many identical lines between any two programs.

Uri
Uri,

Where do I start here?!

Line 1 is the function name, chosen by the decompiler. He chose it to be the same. Line 1 is in no way allowable as a comparison.

Line 2 is a brace character, absolutely forced after the function name; no choice at all, it HAS to be there. In no way allowable as a comparison.

Just for starters.

Did you include all other brace characters? They're forced whenever they appear.

There are lines in Fruit that are NOT in the Rybka disassembly. Line 70, for example, as well as many others. Line 70 you've not mentioned: different. Line 73 not mentioned: different. Line 76 not mentioned: different.

Line 74 is a brace, mentioned as identical, but braces are forced. Shouldn't be included in identical list.

169, 170 not mentioned, different

I could go on ....

Pointless.

The courts compare the semantics of both programs, not the spelling of the source code, variable names or function names.



// Christophe
That is what is called abstract programming. Programmers usually understand that different variable names say nothing about whether the underlying algorithm is the same one...

And the algorithm = what you devise... ;-) Is this the most important thing...??? Of course... YES.

Bye, from Spain.

P.S.: And the question is whether the program Strelka goes the same way?! What is the problem... is it all "cracked" ideas from Fruit...?? In cascade? Fruit-Rybka-Strelka? Who will be next...? And where are the limits of a program's ideas?
uff...
Mike S.
Posts: 1480
Joined: Thu Mar 09, 2006 5:33 am

Re: Bob Hyatt says that....

Post by Mike S. »

Dann Corbit wrote: The following Crafty file names are exact matches:
attacks.c
hash.c
init.c
list.c
make.c
moves.c
next.c
phase.c
ponder.c
search.c
searchr.c
test.c
time.c
utility.c
It is still as absurd as it was, then. I mean, what are programmers talking about all the time, regarding ANY chess engine? Am I right that they are things like hash, moves, ponder, search...? If that's enough for a clone suspicion, then every engine is suspected of being a copy & paste clone of every other engine.

It's like taking a famous novel, selecting some frequent words from everyday speech which will be found in almost every book, and based on that, claiming that it is a copy of another novel, because both contain words like "mother", "husband", "house", and maybe "airplane". By that criterion, almost any novel would be suspected of being a clone of any other novel. How intelligent is that? It is moronic.

Nevertheless, the unscrupulous, reckless ICGA disqualified a participant who was not present on site, based on accusations of that ridiculous quality. I almost exploded about that excess of incompetence and injustice. In contrast, F. Reul did not care much about it and almost never wrote anything in a public forum. Apparently he kept very cool about it and continued to do his things, not caring about internet message board noise. Enviable.
Regards, Mike
Uri Blass
Posts: 10280
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Bob Hyatt says that....

Post by Uri Blass »

Mike S. wrote:
Dann Corbit wrote: The following Crafty file names are exact matches:
attacks.c
hash.c
init.c
list.c
make.c
moves.c
next.c
phase.c
ponder.c
search.c
searchr.c
test.c
time.c
utility.c
It is still as absurd as it was, then. I mean, what are programmers talking about all the time, regarding ANY chess engine? Am I right that they are things like hash, moves, ponder, search...? If that's enough for a clone suspicion, then every engine is suspected of being a copy & paste clone of every other engine.

It's like taking a famous novel, selecting some frequent words from everyday speech which will be found in almost every book, and based on that, claiming that it is a copy of another novel, because both contain words like "mother", "husband", "house", and maybe "airplane". By that criterion, almost any novel would be suspected of being a clone of any other novel. How intelligent is that? It is moronic.

Nevertheless, the unscrupulous, reckless ICGA disqualified a participant who was not present on site, based on accusations of that ridiculous quality. I almost exploded about that excess of incompetence and injustice. In contrast, F. Reul did not care much about it and almost never wrote anything in a public forum. Apparently he kept very cool about it and continued to do his things, not caring about internet message board noise. Enviable.
I agree that variable names prove nothing, but I do not support List, because the ponder hit tables suggest that List and Fruit derivatives are very similar; see

http://www.computerchess.org.uk/ccrl/40 ... es+only%29

I do not think it can be an accident that List is more similar to Fruit than other engines are.
If I remove engines with fewer than 500 moves, then no engines have more than 70% ponder hit, except Loop-Fruit.

#   Engine pair                                        Ponder hit (%)   Moves
1   Loop 13.6 32-bit – Fruit 051103                    75.8             777
2   Loop 13.6 32-bit – Fruit 2.2.1                     75.8             1186
3   Loop 10.32f – Fruit 2.2.1                          75.6             1018
6   Loop M1-T 64-bit 4CPU – Toga II 1.3.1              73.4             1306
7   Loop M1-T 64-bit 2CPU – Toga II 1.3.1              72.7             1532
9   Toga II 1.3.1 – Loop 13.6 32-bit                   70.8             960
11  Toga II 1.4 beta5c 4CPU – Loop M1-T 64-bit 4CPU    70.0             1358
12  Loop 13.6 64-bit 4CPU – Toga II 1.2.1a             70.0             1914
15  Loop M1-T 64-bit 4CPU – Toga II 1.2.1a             69.7             1582
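Uri's filtering step can be written down explicitly. The sketch keeps only pairs based on at least min_moves moves and counts those whose ponder-hit percentage is strictly above a threshold. The struct layout and function name are hypothetical, and the figures in the test are the rows quoted above, not fresh data from CCRL.

```c
#include <stddef.h>

/* One row of the ponder-hit table quoted above (layout hypothetical). */
struct ponder_row {
    const char *pair;     /* engine pair as listed */
    double hit_pct;       /* ponder hit, percent */
    int moves;            /* number of moves the figure is based on */
};

/* Count rows based on at least min_moves moves whose ponder hit is
 * strictly above threshold_pct. */
int count_high_hit_pairs(const struct ponder_row *rows, size_t n,
                         int min_moves, double threshold_pct)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        if (rows[i].moves >= min_moves && rows[i].hit_pct > threshold_pct)
            count++;
    return count;
}
```

Loaded with the nine rows above, count_high_hit_pairs(rows, 9, 500, 70.0) returns 6, all of them Loop paired with Fruit or Toga II; the two rows at exactly 70.0 and the one at 69.7 fall outside the strict threshold.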


Uri
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)

Re: Bob Hyatt says that....

Post by tiger »

chrisw wrote:
GenoM wrote:
chrisw wrote:
GenoM wrote:
Alexander Schmidt wrote: Chris, I guess I know what you are doing right now. You talked to Vas, asking about all this stuff here. He is a really nice guy and you believed him when he said he didn't do anything wrong.

I believed him too when I asked right after the Rybka 1.0 release.

I hope you will not be disappointed too much one day. I mean that honestly.
Yes, his POV is mainly based on personal feelings: it seems that for him the most important thing is to contradict everything Bob Hyatt is saying on the matter, and to challenge EVERY point Bob Hyatt made.

He (ChrisW) is ready to deny that 2+2=4 written by two different people is the same because it's written in different handwriting.
No, no!! I say 33 out of 114 is 28% and is different hand ;-)
Writing 2+2=4 is not wrong, even if it was cribbed in the classroom :-)
More often 2+2 makes 5 - it's what to look for

as in 100% identical = 28% identical

4000 corresponding identical code lines = metaphorical 4000 code lines

they will provide more examples, it's only a matter of time .....


Where did I write that there were "4000 corresponding identical code lines"?

At least until now your reasoning was just stubborn like "one line means nothing, let's go to the next, loop". Now you are making things up.



// Christophe