the same program, then run the same positions on many versions of
other programs. What could be deduced statistically from how often
the various program versions picked the same move?
Such a thing could serve as a crude clone detector. I ran such an
experiment on many different programs to get a kind of measurement of
correlation between different program "families" and between different
versions within the same family of programs, and the result is surprising.
The 1000 positions are from a set of positions that Larry Kaufman and
I created long ago that are designed to compare chess programs to
humans in playing style. So few problems are blatantly tactical, and
in many of these positions the choice of move is going to be based on
preference more than raw strength.
The test compares any two programs by how often they pick the same
move, out of a sample of 1000 positions. I run each program to the
same time limit which in this case is 1/10 of a second.
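The counting itself is simple. Here is a minimal sketch in Python, assuming each program's chosen moves have already been collected into a list with one entry per position. The engine names and moves below are made-up toy data, and the function names are only illustrative, not what my actual tester uses.

```python
from itertools import combinations

def agreement(moves_a, moves_b):
    """Count the positions where both programs picked the same move."""
    if len(moves_a) != len(moves_b):
        raise ValueError("both programs must be run on the same positions")
    return sum(a == b for a, b in zip(moves_a, moves_b))

def similarity_table(choices):
    """choices maps a program name to its list of chosen moves.

    Returns (score, name_a, name_b) rows, most similar pair first.
    """
    rows = [
        (agreement(choices[a], choices[b]), a, b)
        for a, b in combinations(sorted(choices), 2)
    ]
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows

# Toy data: three hypothetical engines on four positions.
toy = {
    "engine_a": ["e2e4", "g1f3", "d2d4", "c2c4"],
    "engine_b": ["e2e4", "g1f3", "d2d4", "b1c3"],
    "engine_c": ["d2d4", "g1f3", "c2c4", "e2e4"],
}

table = similarity_table(toy)
for score, a, b in table:
    print(score, a, b)
```

With 1000 positions instead of four, the score for a pair is just the number of positions out of 1000 where the two programs agreed.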
Below is a table of the results, ordered from the most correlated to
the least correlated pair of programs. I ran various versions of my own
program, all the stockfish versions including glaurung, and all the
so-called Rybka clones as well as Rybka herself.
What is interesting in the table is that if you assume that ippolito,
Robbolito and Rybka are in the same family of programs, and that any
pair of programs with a score above 594 is to be considered a clone, then
my test gets it right in every single case. It identifies families of
programs and non-related programs accurately.
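To make the threshold rule concrete, here is a rough sketch that checks it against a handful of rows copied from the table. The family grouping is an assumption spelled out in the code (glaurung with stockfish, komodo with doch, ippolito and robbo with rybka), and the helper names are only illustrative.

```python
THRESHOLD = 594  # cutoff quoted in the post; a pair is flagged above this

# Assumed family grouping (an assumption for this sketch).
FAMILY = {
    "rybka": "rybka", "robbo": "rybka", "ippolito": "rybka",
    "sf14": "stockfish", "sf15": "stockfish", "sf16": "stockfish",
    "sf_strong": "stockfish", "glaurung": "stockfish",
    "komodo": "doch", "doch-1.0": "doch", "doch-1.2": "doch",
}

# A few rows from the results table, on both sides of the cutoff.
SAMPLE_ROWS = [
    (734, "robbo", "ippolito"),
    (720, "komodo", "doch-1.2"),
    (655, "rybka", "robbo"),
    (595, "sf16", "glaurung"),
    (594, "rybka", "doch-1.2"),
    (581, "rybka", "komodo"),
    (571, "sf15", "robbo"),
    (519, "sf16", "komodo"),
]

def same_family(a, b):
    """True when both programs belong to the same assumed family."""
    return FAMILY[a] == FAMILY[b]

def flagged_as_clone(score, threshold=THRESHOLD):
    """Apply the rule from the post: strictly above the threshold."""
    return score > threshold

# Rows where the rule disagrees with the assumed grouping.
mismatches = [
    (score, a, b)
    for score, a, b in SAMPLE_ROWS
    if flagged_as_clone(score) != same_family(a, b)
]
print(mismatches)
```

For these sample rows the list of mismatches comes out empty. Note that the cutoff sits right between the 595 and 594 rows, so it was picked after looking at the data rather than predicted in advance.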
The most correlated pair of programs that we know are not clones of each
other is rybka and doch-1.2. However, these 2 programs do have a
program author in common.
Just in case this could be interpreted as a strength tester, I added a
version of stockfish 1.6 which I call sf_strong. It is stockfish 1.6
run at 1/4 of a second instead of 1/10 of a second. This was a sanity
test to determine if stockfish would look more like Rybka if it was
run at a level where it was closer to Rybka's chess strength. As you
can see, this did not foil my test.
I'm not going to attach any special significance to this test - look at
the data and draw your own conclusions. I don't pretend it's
scientifically accurate or anything like that. It is whatever it is.
846 sf_strong sf16
758 doch-1.2 doch-1.0
734 robbo ippolito
720 komodo doch-1.2
706 sf15 sf14
687 komodo doch-1.0
671 sf16 sf15
655 rybka robbo
649 sf16 sf14
644 rybka ippolito
639 sf14 glaurung
638 sf_strong sf15
630 sf15 glaurung
617 sf_strong sf14
600 sf_strong glaurung
595 sf16 glaurung
594 rybka doch-1.2
582 rybka doch-1.0
581 rybka komodo
579 ippolito doch-1.0
573 robbo komodo
571 sf15 robbo
571 ippolito doch-1.2
569 sf15 rybka
568 robbo doch-1.2
565 sf_strong rybka
563 sf14 ippolito
560 komodo ippolito
559 sf14 robbo
559 robbo doch-1.0
558 sf14 rybka
557 sf_strong robbo
556 sf15 ippolito
554 sf16 rybka
554 sf16 robbo
554 sf15 komodo
551 sf14 doch-1.0
549 sf15 doch-1.0
544 glaurung doch-1.0
542 rybka glaurung
541 sf16 doch-1.0
538 sf16 ippolito
536 sf_strong doch-1.0
536 sf14 komodo
532 komodo glaurung
531 sf_strong ippolito
531 sf15 doch-1.2
528 glaurung doch-1.2
527 sf_strong komodo
527 sf16 doch-1.2
525 robbo glaurung
524 sf_strong doch-1.2
523 sf14 doch-1.2
521 ippolito glaurung
519 sf16 komodo