Rebel wrote:
bob wrote:
Laskos wrote:
bob wrote:
So you believe there is some "magic match percentage" (such as the one chosen by CSVN) that is a safe number. Anything above that is simply a clone with no investigation needed, anything below that is not?
(Hint: CSVN's number doesn't look particularly "safe" to me)...
As of now, anything higher than 60% is suspicious at 100ms (on 1 modern core) with Don's set of 8,000 or so positions, although the approach is a bit simplistic. Special care must be taken that the testing positions are not publicly available. It's true that cloning must be proven by inspecting the sources, so these suspicious engines must be dealt with separately from the main body of engines in a tourney (asking for sources, etc.). There could be "false negatives" at, say, the 57% or even 55% level. Ponder-hit numbers from games are very similar (and do not depend on the choice of positions).
Kai
P.S. I was not extremely enthusiastic about the CSVN approach, but seeing so much useless talk about what must be done, the approach now seems pretty adequate.
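For anyone unfamiliar with how such a similarity test is scored: both engines search the same positions (here, at roughly 100ms each) and you count how often they pick the same move. A minimal sketch, with the move lists and the 60% threshold as illustrative stand-ins (a real run would feed ~8,000 positions to actual UCI engines):

```python
def match_percentage(moves_a, moves_b):
    """Return the percentage of positions where both engines chose the same move."""
    if len(moves_a) != len(moves_b) or not moves_a:
        raise ValueError("move lists must be non-empty and the same length")
    hits = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * hits / len(moves_a)

# Toy data: 8 positions instead of 8,000; moves are hypothetical UCI strings.
engine_x = ["e2e4", "g1f3", "d2d4", "f1c4", "e1g1", "c2c3", "d1b3", "b1d2"]
engine_y = ["e2e4", "g1f3", "d2d4", "f1b5", "e1g1", "c2c3", "d1e2", "b1d2"]

score = match_percentage(engine_x, engine_y)
print(f"match: {score:.1f}%")                      # 6 of 8 agree -> 75.0%
print("suspicious" if score > 60 else "not suspicious")
```

The single number is only a first filter; as noted above, anything it flags still has to be investigated by other means.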
I dislike simple detection schemes.
And how much of that is driven by Adam's low match percentage for Rybka 1.0? Can you honestly state you are totally objective here?
My comments have absolutely nothing to do with Rybka 1.0. They are based on simple observations from past experiments that tried to show similarity, or to predict the rating of a program. ICC has done a TON of work on detecting computer usage. I worked with Tim over there years ago and saw just how hard it is to show that this "matching moves" stuff indicates anything other than playing good chess in the majority of the cases.
The normal way to develop a model such as this is to put it together, and then run it against several trial groups, specifically composed to test the model thoroughly: groups that have nothing to do with each other, and groups that include a couple of related versions. I've not seen any of the "control experiments" to see what ponder-hit might show in a group of GM players that are obviously not clones. The more programs that are tested in this way, the more false positives one will see due to nothing more than "the pigeonhole issue"...
They always have built-in error rates that are non-zero. I'd hate to see someone branded a clone just because of a similarity test, particularly once lots of newcomers are measured...
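The "pigeonhole" objection can be made concrete with a back-of-the-envelope multiple-comparisons calculation: even a small per-pair false-positive rate compounds as the field grows. The per-pair rate of 1% below is an assumed illustrative number, not a measured property of any real similarity tester, and the tests are assumed independent:

```python
def family_false_positive(num_engines, alpha):
    """Probability that at least one pairwise test fires by chance,
    assuming independent tests with per-pair false-positive rate alpha."""
    pairs = num_engines * (num_engines - 1) // 2
    return 1.0 - (1.0 - alpha) ** pairs

# How the chance of at least one spurious "clone" flag grows with field size,
# at an assumed 1% per-pair false-positive rate.
for n in (5, 20, 50):
    print(f"{n:2d} engines: {family_false_positive(n, 0.01):.3f}")
```

With 50 entrants that is over a thousand pairwise comparisons, so under these assumptions a spurious flag somewhere becomes a near-certainty, which is exactly why a flag alone cannot settle anything.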
1. There are options to deal with false positives. In case of doubt:
1a. Run the test again now at fixed depth as a second opinion.
1b. Run a second set of 8,000 positions.
1c. I have made a start on a database of (odd and secret) positions that form a kind of fingerprint of the open sources: it measures the absence or presence of certain chess knowledge and as such can serve as extra information in case of doubt.
"Secret positions" is not very useful. They don't remain secret for long. I have seen examples of programs in the past with a "hidden built-in opening book" so that they could play reasonably even without a book file. I have seen examples of programs with built-in learning information so that they would do better on some "test positions" but which were caught when someone noticed that if you flipped the positions left-right or flipped colors, the program played completely differently. So "secret positions" don't impress me very much at all.
2. I am not very impressed by your sudden concern about false accusations, since you have mastered accusations yourself and elevated them to a kind of art.
A direct challenge: how about listing exactly WHO I have accused of copying code? Vas/Rybka and Houdart/Houdini, because there is a TON of evidence showing both convincingly. The others would include El Chinito, Le Petite, Bionic Impakt, Voyager, and several other Crafty clones that were proven beyond any doubt. Who ELSE have I accused? I'll be waiting to see if you answer or run from this one...
3. Besides, the tool is not meant to brand programs as clones, just to exclude programs from participation. An author just has to make sure his brainchild is original enough according to the percentage threshold set by the TD.
"exclude from participation" is not the same as "branding them as a clone"?