Rybka 1.0 vs. Strelka

Uri Blass · Post by **Uri Blass** » Fri Aug 22, 2008 7:26 am

bob wrote:
Uri Blass wrote:
tiger wrote:
Uri Blass wrote:
tiger wrote:
Uri Blass wrote:
bob wrote:
Uri Blass wrote:
tiger wrote:Zach is showing code snippets where Rybka 1.0 is actually more similar to Fruit than Strelka 2.0.

A few days ago there was some vocal opposition to the idea that Rybka 1.0 could be a derived work of Fruit 2.1.

Where is the opposition now?

There are several skilled people ready to explain why many programmers think (without daring to say so) that Rybka started its life as Fruit 2.1.

The evidence is now being shown factually. Feel free to contradict it factually.

// Christophe
There is a second possibility: that Rybka started its life with part of Fruit but never had the full source.

I know that Movei started its life with some of TSCP's structures and its names of variables and constants (but no working chess code).

Uri
Unfortunately that is enough to settle this immediately. "part of fruit" is unacceptable since _all_ of fruit is GPL'ed. This issue is black or white, with no grey at all.

BTW, in some cases development is obvious. You can go back to rec.games.chess.computer circa November 1994 or so, and find posts by me where I was working on a _new_ program (now called Crafty). I started with the move generator and published the source there and got lots of feedback. You can also find discussions about search and evaluation as they were written, not copied. So starting with someone else's code is not a normal development course.
I do not know what the normal course is.

I know some programs that started with the full TSCP code, such as Trace, and I am not sure whether most chess programs started without code from other programs.

In my case, I started with a legal move generator, but I used some constants and variables from TSCP and also some function names.

My move generator never used the mailbox representation that is part of TSCP, and it used some structures of my own that are not part of TSCP, so it is clearly different from the TSCP move generator.

Uri

Is TSCP protected by the GPL?

It looks like you have acted without even looking at the licence your model has been published under. Read the licence and so you will know if you have infringed on it or not.

If you have, maybe it's not too late to apologize in good faith and to clean your work. That is more honorable behaviour than the silence we see from another person.

I see that most of your posts seem tainted by the fear that you have yourself done something wrong. What is legal and what is not is already well defined and will not change because of what you argue here.

So just clean your stuff first.

// Christophe
I simply responded to Bob Hyatt.
TSCP is not protected by the GPL.

I think that what is legal and what is not legal is not well defined.

What is a derivative work?
If you copied one line, I do not think that you can call the result a derivative work, and you can also have one line in common without copying.

What is the minimum number of copied lines at which your program is defined to be derivative?

Things are not clear.

Uri

The GPL licence is clear:

From section 0: ...a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications...
So there is no minimal number of lines required to start to infringe on the GPL. It starts at one line, or in the case of programs presented on one line (there are such programs), it starts at a few characters.

What's left to be proven is that some part of a source code that is supposed to infringe on the GPL has indeed been taken from a GPL source code and is not an original creation.

This is going to be left to the judgment of software experts. So if all you have copied is "i++;" I guess you are safe. If you have copied a one-hundred-line routine, I think you are not, because the experts are likely to rule that there is no chance that you have come up with exactly the same 100 lines of code by pure luck.

Remember, it does not matter if the routine you have copied is of vital importance in your program or if it is not. It does not even matter if the routines you have copied come from a program that has absolutely nothing to do with yours. For example, you are not allowed to use a routine from a financial program even if you are going to use it in a chess program.

The spirit of the GPL is that some guy gives away his work by generosity. What he asks you in exchange is to do the same if you take all or even just a part of his work to include it in your creations. So if your intention is to keep your work closed, then the least you can do is to respect the will of the guy and not copy ANY part of his work. Hence the relative intolerance against such re-use, even for what you call "unimportant" parts.

It is also the reason why there is no minimal number of identical lines you would be allowed to use. You are not allowed to use this work at all if you do not want to accept the rules of the game, so if there is evidence of re-use of code, as small as it is, you are caught.

It's up to you to decide if you could be caught or not with your current code, knowing the original programs you may have copied in part.

Now if you want to be safe, at least make sure that you do not have 800 lines of code identical to those of some program protected by the GPL.

It has been shown that at least 800 lines of Strelka 2.0 are identical to lines of Fruit 2.1. Ask an expert what he thinks about it. I don't think the answer will be unclear.

// Christophe
I doubt that 800 is correct.

http://64.68.157.89/forum/viewtopic.php?t=23095&start=0

Nobody commented on or counted the identical parts, and it is not clear how to count what is identical.

I think that the word "equivalent" is more correct than "identical" in some of the cases.

In the following example, lines 1, 4, 5, 6, 8 and 9 are equivalent, not identical:

FRUIT:
static void parse_setoption( char string []) //1
{
const char *name; //2
char* value; //3

name = strstr(string, "name "); //4
value = strstr(string, "value "); //5
if (name == NULL || value == NULL || name >= value)return;//6
value[-1] = '\0'; //7
name += 5; //8
value += 6; //9

STRELKA:
void parse_setoption(char string[]) //1 without static
{
char *name, *value;//2,3 in one line
int size;

name = strstr(string,"name "); //4
value = strstr(string,"value "); //5
if (name == NULL || value == NULL || name >= value) return;//6
value[-1] = 0; //7
name += 5; //8
value += 6; //9
Sorry, but that is garbage. The general usage is identical semantics, not identical syntax. int a=1 and int alpha=1 are considered the same if you can replace all occurrences of a with alpha in the first and it works the same. Ditto for a "static" modifier, which in the case above simply says that the procedure can only be called by procedures in the same source file, and doesn't change a single thing semantically.

If your approach were taken, students would _never_ be accused of plagiarism. Even in publishing. I just print your article in Russian and I am home free since those are not "identical".

I did not claim that equivalence is not enough to accuse somebody of plagiarism, only that there is a difference between identical and equivalent.

You can have 10 equivalent lines without using copy and paste between 2 different chess playing programs.
The probability for 10 identical lines is clearly smaller.

I know that the number of lines here is clearly bigger than 10 but the point is that we should not say identical for equivalent.

Uri
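
For readers who want to check the "equivalent versus identical" question on the two fragments quoted above, here is a minimal C sketch. Both fragments are cut off after line 9, so everything beyond those nine lines is an assumption added only to make them compile and comparable: the function names `parse_fruit_style` and `parse_strelka_style`, the out-parameters, the return codes, and the helper `same_result` do not come from Fruit 2.1 or Strelka 2.0. Strelka's unused `int size;` is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Lines 1-9 of the Fruit fragment as quoted, plus hypothetical
   out-parameters so the result can be inspected. */
static int parse_fruit_style(char string[], const char **name_out, char **value_out)
{
    const char *name;
    char *value;

    name = strstr(string, "name ");
    value = strstr(string, "value ");
    if (name == NULL || value == NULL || name >= value) return 0;
    value[-1] = '\0';   /* line 7: cut the string just before "value " */
    name += 5;          /* skip past "name " */
    value += 6;         /* skip past "value " */

    *name_out = name;
    *value_out = value;
    return 1;
}

/* Lines 1-9 of the Strelka fragment as quoted: no "static",
   one combined declaration, and 0 instead of '\0'. */
int parse_strelka_style(char string[], const char **name_out, char **value_out)
{
    char *name, *value;

    name = strstr(string,"name ");
    value = strstr(string,"value ");
    if (name == NULL || value == NULL || name >= value) return 0;
    value[-1] = 0;      /* line 7: same store, since '\0' == 0 */
    name += 5;
    value += 6;

    *name_out = name;
    *value_out = value;
    return 1;
}

/* Hypothetical helper: run both versions on private copies of one
   command and report whether they extract the same name and value. */
static int same_result(const char *command)
{
    char a[256], b[256];
    const char *name_a, *name_b;
    char *value_a, *value_b;
    int ok_a, ok_b;

    assert(strlen(command) < sizeof a);
    strcpy(a, command);
    strcpy(b, command);

    ok_a = parse_fruit_style(a, &name_a, &value_a);
    ok_b = parse_strelka_style(b, &name_b, &value_b);
    if (ok_a != ok_b) return 0;
    if (!ok_a) return 1;   /* both rejected the input */
    return strcmp(name_a, name_b) == 0 && strcmp(value_a, value_b) == 0;
}
```

Calling `same_result("setoption name Hash value 128")` exercises both versions on the same input. This only shows that the nine quoted lines behave the same; it says nothing about how either program came to contain them.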

bob · Post by **bob** » Fri Aug 22, 2008 7:49 am

swami wrote:Note that this is the discussion about Free version of Rybka (Rybka 1.0)

Zach gave me permission to edit the thread title and insert "1.0" next to the engine name wherever he mentioned the engine, just to avoid confusion.

I agree with ChrisW that whenever you mention Rybka, it'd help if you include the version number along with it.

I am not personally convinced that it matters. What is the probability that Rybka 3 is vastly different from Rybka2, and R2 vastly different from V1? Most do not do _complete_ rewrites, which means much GPL code, if it was present in R1 will also be present in R3. So this realistically applies to all versions, if it applies to any.

Uri Blass · Post by **Uri Blass** » Fri Aug 22, 2008 7:53 am

bob wrote:
bnemias wrote:
tiger wrote:the generated code is exactly the same in both cases.
I'm no expert in disassembly, but it seems clear to me that for the above statement, ANY source producing identical code is identical for the purposes of comparing actual source with a disassembled binary. The variable names could be different, presumably.

I would be looking at implementation quirks such as comment 7:

Fruit
static void parse_setoption( char string []) //1
{
const char *name; //2
char* value; //3

name = strstr(string, "name "); //4
value = strstr(string, "value "); //5
if (name == NULL || value == NULL || name >= value)return;//6
value[-1] = '\0'; //7
name += 5; //8
value += 6; //9  
Strelka
void parse_setoption(char string[]) //1 without static
{
char *name, *value;//2,3 in one line
int size;

name = strstr(string,"name "); //4
value = strstr(string,"value "); //5
if (name == NULL || value == NULL || name >= value) return;//6
value[-1] = 0; //7
name += 5; //8
value += 6; //9
Comment 7's implementation detail is far from the obvious choice. It's certainly not a very good practice.
That is one of many tests... is the executable identical? But there are ways to prevent that from happening. For example, doing a sort a bit earlier in the code, rather than exactly where it is needed.

Other student tricks:

for (i=0; i<n; i++) {
...
}

vs

loop_index = 0;
while (loop_index < n) {
...
loop_index++;
}

Those are semantically identical, though not syntactically identical. Any decent programmer would pick up on that immediately. And comments don't count since they are not used by the compiler. They are actually the easiest thing to change to try to disguise plagiarism.

The code is equivalent, but that does not prove plagiarism unless the task is complicated enough and many non-trivial things are like that.

If you ask 2 students to write a program that gets a number p from the user and determines whether it is prime, then you can be sure that independent programmers may show the type of similarity that you give in your example.

Uri
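
To make the prime-number example concrete, here is a minimal sketch of the core of such an assignment. It is an illustration written for this point, not code from any of the programs under discussion, and the name `is_prime` is an assumption. The task leaves so little room for variation that two independent solutions would plausibly both look close to this.

```c
#include <stdbool.h>

/* Trial division: p is prime if it is at least 2 and no d with
   d * d <= p divides it. */
bool is_prime(unsigned int p)
{
    unsigned int d;

    if (p < 2) return false;
    for (d = 2; d <= p / d; d++) {   /* d <= p / d avoids overflow in d * d */
        if (p % d == 0) return false;
    }
    return true;
}
```

A ten-line function like this is the kind of case where equivalence alone proves little. The disagreement in the thread is about whether the same can be said of hundreds of lines in a chess engine.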

bob · Post by **bob** » Fri Aug 22, 2008 7:54 am

Rolf wrote:
bob wrote:
Rolf wrote:
bob wrote:You are assuming too much. For example, would you file suit against someone that was making a claim that hurt you, if you _knew_ that the claim was true? Because to file the suit, you have to make a sworn statement that the claim is false in order to seek damages. And if the claim is later proven true, you just committed perjury and are now looking at prison time rather than seeking financial redress from someone else. The sword of justice cuts both ways so caution is required.
I could cut out the other stuff because this already shows your bias. You always argue from the position that Vas has done something wrong. I already wrote a message expressing my astonishment at how biased experts can be. If only you would at least find arguments in favor of Vas, just on principle, or point out where something might have no legal relevance.
Has Vas responded in any way about this? How can one find points in his favor if there is nothing but a deafening silence from his side of the table???
I think he can rely on a purely psychological standpoint for the moment. When did you talk to a commercial programmer colleague during the last 50 years? Now that's going too far. You are looking upon everything like the guy with a hammer. Everything looks to him like a nail while hidden nails frighten him. A commercial programmer cannot discuss what he does, Bob, he lets his program speak. He's in chess what you are on ICC in Bullet. Simply the best!

Sorry, but that boat won't float. When I was accused of cheating several years ago, about something that happened back in 1986, I chose not to sit idly by and let the accusations reverberate around r.g.c.c... I replied factually, quoted a specific letter from David Levy which described the investigation he did and the conclusion that absolutely nothing wrong was done, and so forth. It would be easy enough to simply post "Rybka is my own unique work, I didn't borrow any code from any GPL or open-source programs, so I don't know why this kind of discussion has come up." I can think of only one reason why _I_ would not write that were the discussion about me, I'll leave discovering that reason as an exercise for the reader. I think it is obvious enough that anyone will "get it."

Uri Blass · Post by **Uri Blass** » Fri Aug 22, 2008 7:56 am

bob wrote:
swami wrote:Note that this is the discussion about Free version of Rybka (Rybka 1.0)

Zach gave me permission to edit the thread title and insert "1.0" next to the engine name wherever he mentioned the engine, just to avoid confusion.

I agree with ChrisW that whenever you mention Rybka, it'd help if you include the version number along with it.
I am not personally convinced that it matters. What is the probability that Rybka 3 is vastly different from Rybka2, and R2 vastly different from V1? Most do not do _complete_ rewrites, which means much GPL code, if it was present in R1 will also be present in R3. So this realistically applies to all versions, if it applies to any.

Vas did not need to make a complete rewrite to avoid similarity to Fruit.
Some parts of Rybka 1, like the move generator, have no relation to Fruit.

Uri

kranium · Post by **kranium** » Fri Aug 22, 2008 8:01 am

there's only 1 way possible to get so much 'equivalent' code in two programs:

the source code from one was used as a starting point, and then changed.
for ex: a global search and replace to change all instances of "true" to 1, and "false" to 0.

sorry, but the code is still identical, and i think any programmer would agree.

Uri - with all due respect, aren't you just playing with words here?
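
A minimal sketch of the search-and-replace case, assuming C99 or C11, where `<stdbool.h>` defines `true` and `false` as macros that expand to 1 and 0. The function names and the chess-square example are hypothetical and are not taken from Fruit, Strelka, or Rybka.

```c
#include <stdbool.h>

/* Version A: written with the stdbool.h macros.
   file and rank run from 0 to 7, with a1 = (0, 0), a dark square. */
bool is_light_square_a(int file, int rank)
{
    if ((file + rank) % 2 != 0) return true;
    return false;
}

/* Version B: the same text after replacing "true" with 1 and "false" with 0. */
bool is_light_square_b(int file, int rank)
{
    if ((file + rank) % 2 != 0) return 1;
    return 0;
}
```

After preprocessing, the two function bodies are the same token for token, so a compiler cannot tell them apart even though a plain text comparison reports a difference on two lines.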

GenoM · Post by **GenoM** » Fri Aug 22, 2008 8:06 am

In this thread Uri acts as a "дървен философ", as we say in Bulgarian.
[edit]Translated literally it would sound like "wooden philosopher". In the English dictionary it's "wiseacre".

bob · Post by **bob** » Fri Aug 22, 2008 8:09 am

Uri Blass wrote:
bob wrote:
swami wrote:Note that this is the discussion about Free version of Rybka (Rybka 1.0)

Zach gave me permission to edit the thread title and insert "1.0" next to the engine name wherever he mentioned the engine, just to avoid confusion.

I agree with ChrisW that whenever you mention Rybka, it'd help if you include the version number along with it.
I am not personally convinced that it matters. What is the probability that Rybka 3 is vastly different from Rybka2, and R2 vastly different from V1? Most do not do _complete_ rewrites, which means much GPL code, if it was present in R1 will also be present in R3. So this realistically applies to all versions, if it applies to any.
Vas did not need to make a complete rewrite to avoid similarity to Fruit.
Some parts of Rybka 1, like the move generator, have no relation to Fruit.

Uri

Can we _please_ stay on topic? The issue is this (again):

1. If you copy a GPL program, your program becomes GPL.

2. Once your program is derived from a GPL program, it remains GPL until _every_ line that was copied has been removed.

So, if we follow the thread carefully and consider the set {Rybka, Strelka} to be equivalent (Vas' own words), and Strelka is derived from Fruit, then it follows that Rybka 1 was derived from Fruit.

Given that, Rybka is subject to the GPL requirements, and unless _every copied line_ were replaced, that GPL requirement passes from R1 to R2 and from R2 to R3.

Understand the connection? That is why I posed my question when someone wants to quibble "but you have the source that came from Rybka 1, but it has nothing to do with rybka 2 or 3." In fact, it has _everything_ to do with those as well, because of the specific wording of the GPL.

bob · Post by **bob** » Fri Aug 22, 2008 8:17 am

Uri Blass wrote:
bob wrote:
bnemias wrote:
tiger wrote:the generated code is exactly the same in both cases.
I'm no expert in disassembly, but it seems clear to me that for the above statement, ANY source producing identical code is identical for the purposes of comparing actual source with a disassembled binary. The variable names could be different, presumably.

I would be looking at implementation quirks such as comment 7:

Fruit
static void parse_setoption( char string []) //1
{
const char *name; //2
char* value; //3

name = strstr(string, "name "); //4
value = strstr(string, "value "); //5
if (name == NULL || value == NULL || name >= value)return;//6
value[-1] = '\0'; //7
name += 5; //8
value += 6; //9  
Strelka
void parse_setoption(char string[]) //1 without static
{
char *name, *value;//2,3 in one line
int size;

name = strstr(string,"name "); //4
value = strstr(string,"value "); //5
if (name == NULL || value == NULL || name >= value) return;//6
value[-1] = 0; //7
name += 5; //8
value += 6; //9
Comment 7's implementation detail is far from the obvious choice. It's certainly not a very good practice.
That is one of many tests... is the executable identical? But there are ways to prevent that from happening. For example, doing a sort a bit earlier in the code, rather than exactly where it is needed.

Other student tricks:

for (i=0; i<n; i++) {
...
}

vs

loop_index = 0;
while (loop_index < n) {
...
loop_index++;
}

Those are semantically identical, though not syntactically identical. Any decent programmer would pick up on that immediately. And comments don't count since they are not used by the compiler. They are actually the easiest thing to change to try to disguise plagiarism.
The code is equivalent, but that does not prove plagiarism unless the task is complicated enough and many non-trivial things are like that.

If you ask 2 students to write a program that gets a number p from the user and determines whether it is prime, then you can be sure that independent programmers may show the type of similarity that you give in your example.

Uri

What I am talking about is two different programs. They are identical except that every loop of type 1 above in program A has been replaced by a loop of type 2 in program B. Programs A and B are now no longer syntactically identical, but they are certainly semantically identical. That was my point. You are claiming they are different. _Most_ of us would say they are identical. It is a judgement I have to deal with every semester.
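
A minimal sketch of the "type 1" and "type 2" loops as two complete functions, so the claim can be checked rather than argued. The summing task and the function names are assumptions chosen for illustration. Only the loop shapes and the renamed index `loop_index` follow the example quoted above.

```c
#include <stddef.h>

/* "Program A": counted for loop (type 1). */
long sum_with_for(const int *values, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}

/* "Program B": the same loop rewritten as a while loop (type 2),
   with the index renamed. */
long sum_with_while(const int *values, size_t n)
{
    long total = 0;
    size_t loop_index = 0;

    while (loop_index < n) {
        total += values[loop_index];
        loop_index++;
    }
    return total;
}
```

Both functions return the same value for every input. One caveat: the rewrite is only behavior-preserving when the loop body contains no `continue`, because in the while form a `continue` would skip the increment.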

bob · Post by **bob** » Fri Aug 22, 2008 8:21 am

The correct term is "semantically equivalent". That is a common term in discussing student assignments and plagiarism/copying. The probability of even 800 syntactically identical lines is vanishingly small, however 800 lines semantically identical is almost as small. About as likely as two authors writing two different mystery novels, and the storylines are identical, with only the character names and locations being different. It just doesn't happen. And when a couple of chapters are identical, and word-for-word, the probability of "simultaneous invention" is zero.
