World Computer Chess Championship ?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: World Computer Chess Championship ?

Post by Laskos »

bob wrote:
So you believe there is some "magic match percentage" (such as the one chosen by CSVN) that is a safe number. Anything above that is simply a clone with no investigation needed, anything below that is not?

(Hint: CSVN's number doesn't look particularly "safe" to me)...
As of now, anything higher than 60% is suspicious at 100ms (on 1 modern core) with Don's 8,000 or so positions, although the approach is a bit simplistic. Special care must be taken that the testing positions are not publicly available. It's true that cloning must be proven by inspecting the sources, so these suspicious engines must be dealt with separately from the main body of engines in a tourney (asking for sources, etc.). There could be "false negatives" at, say, the 57% or even 55% level. Ponder hit numbers from games are very similar (and are not dependent on the choice of positions).
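
Roughly, a match test of this kind boils down to asking both engines for their best move on every test position and counting agreements. A minimal sketch of the idea (not Don's actual tool; it assumes the python-chess library, and the engine paths and position file below are only placeholders):

Code:

# Minimal sketch of a move-matching similarity test (illustration only).
# Assumes two UCI engine binaries and a file with one FEN per line.
import chess
import chess.engine

def best_moves(engine_path, fens, movetime=0.1):
    """Ask one engine for its best move on every test position."""
    moves = []
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        for fen in fens:
            board = chess.Board(fen)
            result = engine.play(board, chess.engine.Limit(time=movetime))
            moves.append(result.move.uci())
    return moves

def match_percentage(moves_a, moves_b):
    """Percentage of positions on which both engines chose the same move."""
    hits = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * hits / len(moves_a)

fens = [line.strip() for line in open("secret_positions.fen") if line.strip()]
moves_a = best_moves("./engine_a", fens)
moves_b = best_moves("./engine_b", fens)
print(match_percentage(moves_a, moves_b))  # above ~60% would only be flagged as suspicious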

Kai

ps I was not extremely enthusiastic about the CSVN approach, but seeing so much useless talk about what must be done, the approach now seems pretty adequate.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: World Computer Chess Championship ?

Post by bob »

Laskos wrote:
bob wrote:
So you believe there is some "magic match percentage" (such as the one chosen by CSVN) that is a safe number. Anything above that is simply a clone with no investigation needed, anything below that is not?

(Hint: CSVN's number doesn't look particularly "safe" to me)...
As of now, anything higher than 60% is suspicious at 100ms (on 1 modern core) with Don's 8,000 or so positions, although the approach is a bit simplistic. Special care must be taken that the testing positions are not publicly available. It's true that cloning must be proven by inspecting the sources, so these suspicious engines must be dealt with separately from the main body of engines in a tourney (asking for sources, etc.). There could be "false negatives" at, say, the 57% or even 55% level. Ponder hit numbers from games are very similar (and are not dependent on the choice of positions).

Kai

ps I was not extremely enthusiastic about the CSVN approach, but seeing so much useless talk about what must be done, the approach now seems pretty adequate.
I dislike simple detection schemes. They always have built-in error rates that are non-zero. I'd hate to see someone branded a clone just because of a similarity test, particularly once lots of newcomers are measured...
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: World Computer Chess Championship ?

Post by Laskos »

bob wrote:
Laskos wrote:
bob wrote:
So you believe there is some "magic match percentage" (such as the one chosen by CSVN) that is a safe number. Anything above that is simply a clone with no investigation needed, anything below that is not?

(Hint: CSVN's number doesn't look particularly "safe" to me)...
As of now, anything higher than 60% is suspicious at 100ms (on 1 modern core) with Don's 8,000 or so positions, although the approach is a bit simplistic. Special care must be taken that the testing positions are not publicly available. It's true that cloning must be proven by inspecting the sources, so these suspicious engines must be dealt with separately from the main body of engines in a tourney (asking for sources, etc.). There could be "false negatives" at, say, the 57% or even 55% level. Ponder hit numbers from games are very similar (and are not dependent on the choice of positions).

Kai

ps I was not extremely enthusiastic about the CSVN approach, but seeing so much useless talk about what must be done, the approach now seems pretty adequate.
I dislike simple detection schemes. They always have built-in error rates that are non-zero. I'd hate to see someone branded a clone just because of a similarity test, particularly once lots of newcomers are measured...
You seem to be very scrupulous with regard to this test, but you yourself (and some others) are, directly or indirectly, accusing a lot of authors quite unscrupulously. Generally speaking, this test is, say, >95% correct in its positive detections, while your accusations are pretty random. Besides that, the engines will only be labeled as suspicious, and inspection of the sources will establish the copying.

Kai
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: World Computer Chess Championship ?

Post by michiguel »

Laskos wrote:
bob wrote:
So you believe there is some "magic match percentage" (such as the one chosen by CSVN) that is a safe number. Anything above that is simply a clone with no investigation needed, anything below that is not?

(Hint: CSVN's number doesn't look particularly "safe" to me)...
As of now, anything higher than 60% is suspicious at 100ms (on 1 modern core) with Don's 8,000 or so positions, although the approach is a bit simplistic. Special care must be taken that the testing positions are not publicly available. It's true that cloning must be proven by inspecting the sources, so these suspicious engines must be dealt with separately from the main body of engines in a tourney (asking for sources, etc.). There could be "false negatives" at, say, the 57% or even 55% level. Ponder hit numbers from games are very similar (and are not dependent on the choice of positions).

Kai

ps I was not extremely enthusiastic about the CSVN approach, but seeing so much useless talk about what must be done, the approach now seems pretty adequate.
Keeping the positions secret only has a purpose if the absolute numbers alone are used. Using any set of positions, even a random one, is still fine. But, for practical purposes, I understand the secrecy. Anyway, a further bootstrap analysis to determine the confidence of the branches is the most accurate way.
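
Something along these lines, just to illustrate the resampling; here it is only a confidence interval for the match percentage of one pair, not for whole branches of the similarity tree, and the numbers are made up:

Code:

# Percentile bootstrap CI for a match percentage (illustration only).
import random

def bootstrap_ci(agreements, n_resamples=10000, alpha=0.05):
    """Resample the per-position 0/1 agreements and return a 95% CI."""
    n = len(agreements)
    stats = []
    for _ in range(n_resamples):
        sample = [agreements[random.randrange(n)] for _ in range(n)]
        stats.append(100.0 * sum(sample) / n)
    stats.sort()
    return stats[int(alpha / 2 * n_resamples)], stats[int((1 - alpha / 2) * n_resamples)]

# e.g. 8000 positions with a 58% observed match rate
agreements = [1] * 4640 + [0] * 3360
print(bootstrap_ci(agreements))  # roughly (56.9, 59.1)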

Miguel
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: World Computer Chess Championship ?

Post by Laskos »

michiguel wrote:
Laskos wrote:
bob wrote:
So you believe there is some "magic match percentage" (such as the one chosen by CSVN) that is a safe number. Anything above that is simply a clone with no investigation needed, anything below that is not?

(Hint: CSVN's number doesn't look particularly "safe" to me)...
As of now, anything higher than 60% is suspicious at 100ms (on 1 modern core) with Don's 8,000 or so positions, although the approach is a bit simplistic. Special care must be taken that the testing positions are not publicly available. It's true that cloning must be proven by inspecting the sources, so these suspicious engines must be dealt with separately from the main body of engines in a tourney (asking for sources, etc.). There could be "false negatives" at, say, the 57% or even 55% level. Ponder hit numbers from games are very similar (and are not dependent on the choice of positions).

Kai

ps I was not extremely enthusiastic about the CSVN approach, but seeing so much useless talk about what must be done, the approach now seems pretty adequate.
Keeping the positions secret only has a purpose if the absolute numbers alone are used. Using any set of positions, even a random one, is still fine. But, for practical purposes, I understand the secrecy. Anyway, a further bootstrap analysis to determine the confidence of the branches is the most accurate way.

Miguel
Yes, better to use random in-game positions from a large database, changing them over time; the numbers will gain some more generality (at least for a given 100ms time on one decent core). Ponder hits are good too. My worry is that some may tailor their engines to those 8,000 positions of Don's, and CSVN adopted a strict 60% rule (which, as of now, is not bad for flagging suspicious engines).
Yes, a bootstrap would be even better; I never managed to do that for the set, but I remember you did.
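
Pulling such positions out of a big PGN database is straightforward; a rough sketch (assumes python-chess; the file name and the ply window are just examples):

Code:

# Sample random in-game positions from a PGN database (illustration only).
import random
import chess.pgn

def sample_positions(pgn_path, n_positions=8000, min_ply=16, max_ply=80):
    """Take one randomly chosen mid-game position per game until enough are collected."""
    fens = []
    with open(pgn_path) as f:
        while len(fens) < n_positions:
            game = chess.pgn.read_game(f)
            if game is None:
                break  # ran out of games
            moves = list(game.mainline_moves())
            if len(moves) <= min_ply:
                continue
            stop = random.randint(min_ply, min(max_ply, len(moves) - 1))
            board = game.board()
            for move in moves[:stop]:
                board.push(move)
            fens.append(board.fen())
    return fens

positions = sample_positions("games.pgn")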

Kai
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: World Computer Chess Championship ?

Post by Adam Hair »

bob wrote:
Laskos wrote:
bob wrote:
BubbaTough wrote:
bob wrote: As to the "statute of limitations"... what is a reasonable period of time? 3 months? 3 years? This is a pretty serious issue, and serious crimes have a very long statute of limitations (or none at all)...
Not sure, maybe 3 months, or perhaps just until the event is over. If the protest is not registered before the event, then the protest should be due to behavior during the event, and in my mind such things should be obvious enough that it does not take years to uncover. Also, I would only want other participants to be protesting, not a situation where anyone in the world is invited to take pot shots.

To me personally, this is not very comparable to serious crimes, and should not be treated as such. It's more like a sport, and while some sports have obviously contemplated some odd retroactive, anything-goes type of penalties for things they don't like, most don't. If you win, you win. Protests can happen during the event by competing teams, you can have testing before, during, and immediately after events, but when the whistle blows and the testing is done, it's over and the winner is crowned.

-Sam
Counterexamples:

(1) Ben Johnson. Gold medal revoked well after the Olympics ended.

(2) An NCAA football championship revoked 5 years after it was won.

There are others. This is not that uncommon, as sometimes it takes a good bit of time to investigate, and sometimes evidence surfaces well after the event (Strelka starting the Fruit/Rybka investigation, etc.).
Taking these Ben Johnson analogies, the CSVN (sim) method would weed out the derivatives much more efficiently than the doping tests in most sports. I don't know why you keep blabbering about the unreliability of the sim or ponder-hit methods; they are by far more efficient than anything you propose. Do you have in mind a proven false positive on sim or ponder hits? I think it's easier to have a false negative than a false positive on sim or ponder hits.

Kai
Both Don AND Adam have clearly stated that the similarity testing is a good filter. If a program passes it, then consider it most likely to be OK. That's certainly not perfectly accurate, but a good first approximation. If a program fails the similarity test, it still has not been proven to be a clone, a derivative, or anything else. It is just more suspicious than the others that were tested, and it needs actual verification with traditional approaches, including source comparison and such...

I don't think there is any particular evidence that suggests that more or fewer false positives than false negatives will occur. It is certainly obvious that some of each will happen, which is why the source comparison is still necessary.
The following, which can be found at Ed's web site, is precisely what I think about the subject:


An additional observation is that a minimum of 5 standard deviations should be used to judge that a pair's match percentage is beyond the norm. If 1000 unique engines (where unique means unrelated engines with unique authors) are considered to be an upper limit, then there could possibly be 999*1000/2 = 499,500 pairs of unique engines.


4 standard deviations represents an event that occurs 1 time out of approximately 31,600, or approximately 16 times in 499,500.

5 standard deviations represents 1 time in 3,448,556.
(In the context of my data, 5 standard deviations is approximately 59.5%)

While not a guarantee of avoiding a false positive, the threshold of 5 standard deviations greatly reduces the chance of it occurring.
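
For reference, those tail figures follow from the one-sided normal tail; the Python standard library is enough to reproduce the arithmetic:

Code:

# One-sided normal tail P(Z > z) = erfc(z / sqrt(2)) / 2, and the expected
# number of chance exceedances among 499,500 unique engine pairs.
from math import erfc, sqrt

PAIRS = 499500  # 999 * 1000 / 2

for z in (4, 5):
    tail = erfc(z / sqrt(2)) / 2
    print(z, "sigma: 1 in", round(1 / tail), "- expected over all pairs:", round(PAIRS * tail, 2))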

The drawback of setting the false positive threshold so high is that more false negatives will occur (two similar engines would be deemed non-similar). However, there are several things to consider.

1) The use of statistical methods assumes that the authors have access to a common pool of ideas, but that there are no interactions between authors/engines. In reality, authors/engines do interact.

2) There are permissible methods by which one author can make his engine more similar to another engine. We have no standard for when some author goes too far. Thus, we have no way to determine an exact threshold.

3) The need to avoid false accusation is greater than the need to catch authors who break the rules slightly. In other words, it is better to let lesser offenders slip through than to make accusations against innocent authors.

4) This tool should not be the sole means of determining derivatives and clones. Other methods should be used in conjunction with it. Ultimately, any accusation of cloning requires an examination of the code of the accused author.

Sincerely,
Adam Hair
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: World Computer Chess Championship ?

Post by Rebel »

bob wrote:
Laskos wrote:
bob wrote:
So you believe there is some "magic match percentage" (such as the one chosen by CSVN) that is a safe number. Anything above that is simply a clone with no investigation needed, anything below that is not?

(Hint: CSVN's number doesn't look particularly "safe" to me)...
As of now, anything higher than 60% is suspicious at 100ms (on 1 modern core) with Don's 8,000 or so positions, although the approach is a bit simplistic. Special care must be taken that the testing positions are not publicly available. It's true that cloning must be proven by inspecting the sources, so these suspicious engines must be dealt with separately from the main body of engines in a tourney (asking for sources, etc.). There could be "false negatives" at, say, the 57% or even 55% level. Ponder hit numbers from games are very similar (and are not dependent on the choice of positions).

Kai

ps I was not extremely enthusiastic about the CSVN approach, but seeing so much useless talk about what must be done, the approach now seems pretty adequate.
I dislike simple detection schemes.
And how much of that is driven by Adam's low percentage for Rybka 1.0? Can you honestly state you are totally objective here?
They always have built-in error rates that are non-zero. I'd hate to see someone branded a clone just because of a similarity test, particularly once lots of newcomers are measured...
1. There are options to deal with false positives. In case of doubt:
1a. Run the test again, now at fixed depth, as a second opinion.
1b. Run a second set of 8,000 positions.
1c. I have made a start with a database of (odd and secret) positions that form a kind of fingerprint of the open sources and how they handle them; it measures the absence or presence of certain chess knowledge and as such can serve as extra information in case of doubt (a small sketch of the idea is at the end of this post).

2. I am not very impressed by your sudden concern about false accusations, since you are a master of accusations yourself and have elevated it to a kind of art.

3. Besides, the tool is not meant to brand programs as clones, just to exclude programs from participation. An author simply has to make sure his brainchild is original enough to stay under the percentage set by the TD.
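
The fingerprint idea from 1c, in its simplest form, just records for each probe position whether an engine finds the move that a particular piece of eval knowledge produces, and then compares those patterns. A rough sketch (all positions, moves and names below are made-up placeholders):

Code:

# Sketch of a knowledge-fingerprint comparison (illustration only).
# Each probe position has one "knowledge move" that an engine only plays
# if a specific piece of eval knowledge is present.
PROBE_MOVES = ["a5a6", "c3d5", "h4h5", "f1e1", "b2b4"]  # secret in practice

def fingerprint(engine_moves):
    """1 if the engine found the knowledge move on that probe, else 0."""
    return [1 if played == probe else 0 for played, probe in zip(engine_moves, PROBE_MOVES)]

def agreement(fp_a, fp_b):
    """Percentage of probes on which the two engines behave the same way."""
    same = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return 100.0 * same / len(fp_a)

# suspect vs. a known open source, move lists obtained as in any sim run
suspect = fingerprint(["a5a6", "c3d5", "h4h5", "g1h1", "b2b4"])
fruit = fingerprint(["a5a6", "c3d5", "h4h5", "f1e1", "b2b4"])
print(agreement(suspect, fruit))  # 80.0 - extra information, not proof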
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: World Computer Chess Championship ?

Post by Rebel »

In addition I can tell the following: some programmer (who wants to remain anonymous) has done the following experiment:

1. Take the Fruit 2.1 source and modify each of Fruit's eval values to equal Rybka 1.0's.

2. The similarity detector reported only a 4% increase.

This is BAD NEWS for cloners who think they can take an existing source code, modify all eval values (some even use multiplication) and get away with it. Playing style is hard to remove from an engine.
matejst
Posts: 364
Joined: Mon May 14, 2007 8:20 pm
Full name: Boban Stanojević

Re: World Computer Chess Championship ?

Post by matejst »

I often read posts on TalkChess, but since the beginning of the clones saga I have avoided posting: posting about this topic is an excellent way to get insulted, and at my age I really don't need it.

But the other day I downloaded an interesting engine, Naraku, and I saw that the author had stopped its development because of insipid and ugly clone accusations.

In the beginning, I was also quite confused about the topic. I felt that F. Letouzey had contributed enough to deserve a fair share of the money one can make selling chess engines. But after some more thought, I see the situation a bit differently.

The aim of open source engines is not to put an indirect, hidden copyright on certain ideas and procedures by publishing them. Quite the contrary: it is to share those ideas, to have that code enhanced and optimized, and the greatest satisfaction should be to see your code used and improved.

So, on the one hand, I am a bit disappointed by the GPL violations where there seem to be violations, but otherwise I am perfectly OK with the work of many young programmers building on the Fruit, Glaurung, Toga and, with some reservations, even the Ippo foundations.

In this light, I now see Vas Rajlich's work not only as perfectly legal (I don't even think he broke the GPL, though that remains to be checked; besides, I believe the FSF would have no problem taking his word on it), but also as quite ethical, just as I see Richard Vida's work, or Don Dailey's, despite the differences.

Anyway, although there is a lot of disagreement about this topic (and that is quite understandable), I suggest being more cautious about proclaiming engines to be clones and about attacking and insulting. The author of Naraku wrote it plainly: he published his work on his blog, was clear about the process, and didn't ask anybody to use it.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: World Computer Chess Championship ?

Post by mar »

Rebel wrote: This is BAD NEWS for cloners who think they can take an existing source code, modify all eval values (some even use multiplication) and get away with it. Playing style is hard to remove from an engine.
I dare to disagree, Ed. Playing style is determined by the eval in the first place, so I would say it's actually good news for them.
If I took, for example, Stockfish and heavily lobotomized its eval (losing several hundred Elo), I bet it would appear "crystal clear" in the similarity test, and yet be much stronger than the vast majority of other engines.

Martin