The individuals who will find your test the most useful are the cloners. With it, the easiest and most effective kinds of changes to make in their clones will become well known, and so will the kinds of changes that are not worth the time and effort. IOW, you will be making the cloners more efficient and more competent.

Don wrote:
Suppose you ran 1000 random positions on many different versions of
the same program, then ran the same positions on many versions of
other programs. What could be deduced statistically from how often
the various program versions picked the same move?
Such a thing could serve as a crude clone detector. I ran such an
experiment on many different programs to get a kind of measurement of
correlation between different program "families" and different versions
within the same family of programs, and the result is surprising.
The 1000 positions are from a set that Larry Kaufman and I created
long ago, designed to compare chess programs to humans in playing
style. So few of the problems are blatantly tactical, and in many of
these positions the choice of move is going to be based on preference
more than raw strength.
The test compares any two programs by how often they pick the same
move out of a sample of 1000 positions. I run each program to the
same time limit, which in this case is 1/10 of a second.
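A minimal Python sketch of the pairwise comparison described above. It assumes each engine's chosen move for every position has already been collected (for instance by sending a UCI `go movetime 100` command, i.e. 1/10 of a second, to each engine) and stored as a list of move strings in the same position order. The names `match_counts` and `ranked` and the toy data are hypothetical, not Don's actual tooling.

```python
from itertools import combinations


def match_counts(moves_by_program):
    """Count, for every pair of programs, the positions where both chose the same move.

    moves_by_program maps a program name to the list of moves it chose,
    one entry per test position, with every list in the same position order.
    Moves are compared as plain strings (e.g. UCI "e2e4"), so all programs
    must report them in the same notation.
    """
    names = sorted(moves_by_program)
    if not names:
        return {}
    n = len(moves_by_program[names[0]])
    for name in names:
        if len(moves_by_program[name]) != n:
            raise ValueError(f"{name}: expected {n} moves, got {len(moves_by_program[name])}")

    counts = {}
    for a, b in combinations(names, 2):
        counts[(a, b)] = sum(
            1 for move_a, move_b in zip(moves_by_program[a], moves_by_program[b])
            if move_a == move_b
        )
    return counts


def ranked(counts):
    """Pairs sorted from most to least correlated, like the table in the post."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical moves for three made-up programs on four positions.
    toy = {
        "progA": ["e2e4", "g1f3", "d2d4", "c2c4"],
        "progB": ["e2e4", "g1f3", "c2c4", "c2c4"],
        "progC": ["d2d4", "g1f3", "d2d4", "b1c3"],
    }
    for (a, b), same in ranked(match_counts(toy)):
        print(same, a, b)
```

With the real data, each program's list would hold 1000 moves and the printed ranking would have the same shape as the table below.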
Below is a table of the results, from the most correlated to the least
correlated pairs of programs. I ran various versions of my own
program, all the stockfish versions including glaurung, and all the
so-called Rybka clones as well as Rybka herself.
What is interesting in the table is that if you assume that ippolito,
Robbolito and Rybka are in the same family of programs, and that any
pair of programs with a score above 594 is to be considered a clone
pair, then my test gets it right in every single case. It identifies
families of programs and non-related programs accurately.
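The threshold rule above can be checked mechanically. This sketch re-types the table from this post and flags every pair where the "score above 594 means same family" verdict disagrees with an assumed family assignment. The family labels are an assumption: rybka/robbo/ippolito and the stockfish versions plus glaurung follow the post, while grouping doch and komodo (Don's own programs) as a third family is my reading. `parse_table`, `misclassified` and `FAMILY` are hypothetical names.

```python
# Scores copied from the table in this post, as "score prog_a prog_b" triples.
TABLE = """
846 sf_strong sf16 758 doch-1.2 doch-1.0 734 robbo ippolito
720 komodo doch-1.2 706 sf15 sf14 687 komodo doch-1.0
671 sf16 sf15 655 rybka robbo 649 sf16 sf14
644 rybka ippolito 639 sf14 glaurung 638 sf_strong sf15
630 sf15 glaurung 617 sf_strong sf14 600 sf_strong glaurung
595 sf16 glaurung 594 rybka doch-1.2 582 rybka doch-1.0
581 rybka komodo 579 ippolito doch-1.0 573 robbo komodo
571 sf15 robbo 571 ippolito doch-1.2 569 sf15 rybka
568 robbo doch-1.2 565 sf_strong rybka 563 sf14 ippolito
560 komodo ippolito 559 sf14 robbo 559 robbo doch-1.0
558 sf14 rybka 557 sf_strong robbo 556 sf15 ippolito
554 sf16 rybka 554 sf16 robbo 554 sf15 komodo
551 sf14 doch-1.0 549 sf15 doch-1.0 544 glaurung doch-1.0
542 rybka glaurung 541 sf16 doch-1.0 538 sf16 ippolito
536 sf_strong doch-1.0 536 sf14 komodo 532 komodo glaurung
531 sf_strong ippolito 531 sf15 doch-1.2 528 glaurung doch-1.2
527 sf_strong komodo 527 sf16 doch-1.2 525 robbo glaurung
524 sf_strong doch-1.2 523 sf14 doch-1.2 521 ippolito glaurung
519 sf16 komodo
"""

# ASSUMPTION: family labels (see the lead-in above).
FAMILY = {
    "rybka": "rybka", "robbo": "rybka", "ippolito": "rybka",
    "sf_strong": "stockfish", "sf16": "stockfish", "sf15": "stockfish",
    "sf14": "stockfish", "glaurung": "stockfish",
    "doch-1.2": "doch", "doch-1.0": "doch", "komodo": "doch",
}

THRESHOLD = 594  # pairs scoring strictly above this are called clones


def parse_table(text):
    """Turn the flat 'score a b score a b ...' text into {(a, b): score}."""
    tokens = text.split()
    if len(tokens) % 3:
        raise ValueError("table must be a sequence of score/program/program triples")
    return {
        (tokens[i + 1], tokens[i + 2]): int(tokens[i])
        for i in range(0, len(tokens), 3)
    }


def misclassified(scores, threshold=THRESHOLD):
    """Pairs whose threshold verdict disagrees with the assumed families."""
    errors = []
    for (a, b), score in scores.items():
        called_clone = score > threshold
        same_family = FAMILY[a] == FAMILY[b]
        if called_clone != same_family:
            errors.append((a, b, score))
    return errors


if __name__ == "__main__":
    scores = parse_table(TABLE)
    print(len(scores), "pairs;", misclassified(scores) or "no misclassified pairs")
```

In this particular data the cut-off sits exactly on the gap: lowering it to 593 would misclassify rybka/doch-1.2, and raising it to 595 would misclassify sf16/glaurung.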
The most correlated pair of programs that we know are not clones of
each other is rybka and doch-1.2. However, these 2 programs do have a
program author in common.
Just in case this could be interpreted as a strength tester, I added a
version of stockfish 1.6 which I call sf_strong. It is stockfish 1.6
run at 1/4 of a second instead of 1/10 of a second. This was a sanity
test to determine if stockfish would look more like Rybka if it was
run at a level where it was closer to Rybka's chess strength. As you
can see, this did not foil my test.
I'm not going to attach any special significance to this test; look at
the data and draw your own conclusions. I don't pretend it's
scientifically accurate or anything like that. It is whatever it is.
Score  Program A  Program B
846    sf_strong  sf16
758    doch-1.2   doch-1.0
734    robbo      ippolito
720    komodo     doch-1.2
706    sf15       sf14
687    komodo     doch-1.0
671    sf16       sf15
655    rybka      robbo
649    sf16       sf14
644    rybka      ippolito
639    sf14       glaurung
638    sf_strong  sf15
630    sf15       glaurung
617    sf_strong  sf14
600    sf_strong  glaurung
595    sf16       glaurung
594    rybka      doch-1.2
582    rybka      doch-1.0
581    rybka      komodo
579    ippolito   doch-1.0
573    robbo      komodo
571    sf15       robbo
571    ippolito   doch-1.2
569    sf15       rybka
568    robbo      doch-1.2
565    sf_strong  rybka
563    sf14       ippolito
560    komodo     ippolito
559    sf14       robbo
559    robbo      doch-1.0
558    sf14       rybka
557    sf_strong  robbo
556    sf15       ippolito
554    sf16       rybka
554    sf16       robbo
554    sf15       komodo
551    sf14       doch-1.0
549    sf15       doch-1.0
544    glaurung   doch-1.0
542    rybka      glaurung
541    sf16       doch-1.0
538    sf16       ippolito
536    sf_strong  doch-1.0
536    sf14       komodo
532    komodo     glaurung
531    sf_strong  ippolito
531    sf15       doch-1.2
528    glaurung   doch-1.2
527    sf_strong  komodo
527    sf16       doch-1.2
525    robbo      glaurung
524    sf_strong  doch-1.2
523    sf14       doch-1.2
521    ippolito   glaurung
519    sf16       komodo
BTW, am I right in stating that sf_strong & sf14 are highly related? And that sf15 & robbo are highly independent? If true on both counts, then the respective scores for these pairings, 617 & 571, seem too close for comfort.
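One way to put a number on "too close for comfort" is to ask how much a count out of 1000 positions can move around by chance. The sketch below uses a simple binomial model, which is an assumption: real test positions are not independent coin flips, and both pairings are scored on the same positions. `count_stderr` and `gap_z_score` are hypothetical helpers.

```python
import math


def count_stderr(count, n=1000):
    """Standard error of a same-move count out of n positions.

    ASSUMPTION: each position is an independent trial with a fixed match
    probability, estimated here as count / n (a binomial model).
    """
    p = count / n
    return math.sqrt(n * p * (1 - p))


def gap_z_score(count_a, count_b, n=1000):
    """Approximate z-score for the gap between two pairings' counts.

    ASSUMPTION: the two counts are treated as independent, which is only
    roughly true here because both pairings are scored on the same positions.
    """
    combined = math.hypot(count_stderr(count_a, n), count_stderr(count_b, n))
    return (count_a - count_b) / combined


if __name__ == "__main__":
    for label, count in [("sf_strong vs sf14", 617), ("sf15 vs robbo", 571)]:
        print(f"{label}: {count} +/- {count_stderr(count):.1f}")
    print(f"z for the 617 vs 571 gap: {gap_z_score(617, 571):.2f}")
```

Under that model each count has a standard error of roughly 15 to 16 positions, and the 46-point gap works out to about two combined standard errors.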
Another thing that bothers me about your test is that calibrating it (deciding which positions to use and which not to) requires making assumptions about which programs are clones and which are not. How else could it be calibrated? A lawyer would argue that all you have done with your test is pick positions that tend to corroborate your belief and omit positions that tend not to, and that an easier way to determine your belief is simply to ask you.
