Clone detection test

Don · Post by **Don** » Wed Jan 27, 2010 8:42 pm

Suppose you ran 1000 random positions on many different versions of
the same program, then ran the same positions on many versions of
other programs. What could be deduced statistically from how often
the various program versions picked the same move?

Such a thing could serve as a crude clone detector. I ran such an
experiment on many different programs to get a kind of measurement of
correlation between different program "families" and between different
versions within the same family, and the result is surprising.

The 1000 positions are from a set of positions that Larry Kaufman and
I created long ago, designed to compare chess programs to humans in
playing style. So, few of the problems are blatantly tactical, and
in many of these positions the choice of move is going to be based on
preference more than raw strength.

The test compares any two programs by how often they pick the same
move, out of a sample of 1000 positions. I run each program to the
same time limit which in this case is 1/10 of a second.
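
As a rough illustration of the scoring described above, the count for one pair of programs could be computed along these lines. This is only a sketch, not the actual tester: the function name `agreement_score` and the sample moves are made up for the example, and it assumes each program's chosen moves are already collected in position order.

```python
def agreement_score(moves_a, moves_b):
    """Count positions where two programs chose the same move.

    moves_a and moves_b are equal-length lists of moves, one per test
    position, in the same position order.
    """
    if len(moves_a) != len(moves_b):
        raise ValueError("both programs must be run on the same positions")
    return sum(1 for a, b in zip(moves_a, moves_b) if a == b)


# Hypothetical output of two engines on five positions (not real data).
engine_x = ["e2e4", "g1f3", "d2d4", "c2c4", "e1g1"]
engine_y = ["e2e4", "g1f3", "c2c4", "c2c4", "e1g1"]
print(agreement_score(engine_x, engine_y))  # 4 of the 5 moves match
```

With 1000 positions the same function would give a score on the 0 to 1000 scale used in the table.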

Below is a table of the results, ordered from the most correlated to
the least correlated pairs of programs. I ran various versions of my
own program, all the stockfish versions including glaurung, and all the
so-called Rybka clones as well as Rybka herself.

What is interesting in the table is that if you assume that ippolito,
Robbolito and Rybka are in the same family of programs, and that any
pair of programs with a score above 594 is to be considered clones, then
my test gets it right in every single case. It identifies families of
programs and non-related programs accurately.

The most correlated pair of programs that we know are not clones of
each other is rybka and doch-1.2. However, these 2 programs do have a
program author in common.

Just in case this could be interpreted as a strength tester, I added a
version of stockfish 1.6 which I call sf_strong. It is stockfish 1.6
run at 1/4 of a second instead of 1/10 of a second. This was a sanity
test to determine if stockfish would look more like Rybka if it was
run at a level where it was closer to Rybka's chess strength. As you
can see, this did not foil my test.

I'm not going to attach any special significance to this test - look at
the data and draw your own conclusions. I don't pretend it's
scientifically accurate or anything like that. It is whatever it is.

  846  sf_strong         sf16            
  758  doch-1.2          doch-1.0        
  734  robbo             ippolito        
  720  komodo            doch-1.2        
  706  sf15              sf14            
  687  komodo            doch-1.0        
  671  sf16              sf15            
  655  rybka             robbo           
  649  sf16              sf14            
  644  rybka             ippolito        
  639  sf14              glaurung        
  638  sf_strong         sf15            
  630  sf15              glaurung        
  617  sf_strong         sf14            
  600  sf_strong         glaurung        
  595  sf16              glaurung        

  594  rybka             doch-1.2        
  582  rybka             doch-1.0        
  581  rybka             komodo          
  579  ippolito          doch-1.0        
  573  robbo             komodo          
  571  sf15              robbo           
  571  ippolito          doch-1.2        
  569  sf15              rybka           
  568  robbo             doch-1.2        
  565  sf_strong         rybka           
  563  sf14              ippolito        
  560  komodo            ippolito        
  559  sf14              robbo           
  559  robbo             doch-1.0        
  558  sf14              rybka           
  557  sf_strong         robbo           
  556  sf15              ippolito        
  554  sf16              rybka           
  554  sf16              robbo           
  554  sf15              komodo          
  551  sf14              doch-1.0        
  549  sf15              doch-1.0        
  544  glaurung          doch-1.0        
  542  rybka             glaurung        
  541  sf16              doch-1.0        
  538  sf16              ippolito        
  536  sf_strong         doch-1.0        
  536  sf14              komodo          
  532  komodo            glaurung        
  531  sf_strong         ippolito        
  531  sf15              doch-1.2        
  528  glaurung          doch-1.2        
  527  sf_strong         komodo          
  527  sf16              doch-1.2        
  525  robbo             glaurung        
  524  sf_strong         doch-1.2        
  523  sf14              doch-1.2        
  521  ippolito          glaurung        
  519  sf16              komodo
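
To make the "above 594" rule concrete, here is a small sketch that links any two programs whose pair score exceeds the threshold and then prints the resulting groups. The grouping method (linking through shared partners, union-find style) is my own assumption about how to turn pair scores into families, and `families` is a made-up name. The rows are a subset copied from the table above.

```python
def families(pair_scores, threshold):
    """Group programs that are linked by any pair scoring above threshold."""
    parent = {}

    def find(x):
        # Find the representative of x's group, registering x if new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the chain as we go
            x = parent[x]
        return x

    for score, a, b in pair_scores:
        root_a, root_b = find(a), find(b)
        if score > threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for name in list(parent):
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(group) for group in groups.values())


# A few rows copied from the table above: (score, program, program).
rows = [
    (846, "sf_strong", "sf16"),
    (758, "doch-1.2", "doch-1.0"),
    (734, "robbo", "ippolito"),
    (720, "komodo", "doch-1.2"),
    (706, "sf15", "sf14"),
    (671, "sf16", "sf15"),
    (655, "rybka", "robbo"),
    (639, "sf14", "glaurung"),
    (594, "rybka", "doch-1.2"),
    (519, "sf16", "komodo"),
]
for family in families(rows, threshold=594):
    print(family)
```

On these rows the sketch yields three groups: the doch/komodo versions, the glaurung/stockfish versions, and ippolito/robbo/rybka.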

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Wed Jan 27, 2010 8:51 pm

Can you throw Crafty, Toga and Fruit into the mix?

Don · Post by **Don** » Wed Jan 27, 2010 8:59 pm

Gian-Carlo Pascutto wrote: Can you throw Crafty, Toga and Fruit into the mix?

I cannot easily run Crafty because the tester uses UCI as the interface. But I can try Fruit and Toga. Let me see if I have those on my system.

Don

michiguel · Post by **michiguel** » Wed Jan 27, 2010 9:14 pm

If you give me a matrix of which positions each program solved, I think I will be able to adapt some software we run to do phylogenetic analysis.
http://en.wikipedia.org/wiki/Phylogenetic_tree

We could even trace the ancestors, **if** the concept is valid.

Miguel
PS: I mean something like

Rybka {10011110110011........101010}
Komodo{10010010101000........111000}

Where 1 is solved, 0 is not solved, in order. Whatever format you have, I will convert it.

Don · Post by **Don** » Wed Jan 27, 2010 9:27 pm

Gian-Carlo Pascutto wrote: Can you throw Crafty, Toga and Fruit into the mix?

Ok, I added fruit and toga, the versions I had on my Linux system. According to this test they are highly correlated with each other but are not related to the other families.

I think the test could be improved if I made an effort to remove positions from the set where many programs at many different levels agree on the same moves. This would have to be done without regard to which programs were selecting which moves of course.
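
One way the filtering idea above might look in code is sketched below. It only looks at how many programs picked the most popular move in each position, never at which programs they were. The name `informative_positions`, the `max_agreement` cutoff and the sample data are all assumptions for illustration.

```python
from collections import Counter


def informative_positions(moves_by_program, max_agreement=0.8):
    """Return indices of positions worth keeping.

    A position is kept when its most popular move was chosen by at most
    max_agreement (a fraction) of the programs.
    """
    move_lists = list(moves_by_program.values())
    n_positions = len(move_lists[0])
    keep = []
    for i in range(n_positions):
        counts = Counter(moves[i] for moves in move_lists)
        top_share = counts.most_common(1)[0][1] / len(move_lists)
        if top_share <= max_agreement:
            keep.append(i)
    return keep


# Hypothetical moves from four engines on three positions (not real data).
sample = {
    "engine_a": ["e2e4", "g1f3", "d2d4"],
    "engine_b": ["e2e4", "b1c3", "d2d4"],
    "engine_c": ["e2e4", "g1f3", "c2c4"],
    "engine_d": ["e2e4", "d2d3", "g2g3"],
}
print(informative_positions(sample, max_agreement=0.5))  # [1, 2]
```

Position 0 is dropped because all four engines agree on it, so it cannot help tell families apart.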

  846  sf_strong         sf16            
  758  doch-1.2          doch-1.0        
  734  robbo             ippolito        
  720  komodo            doch-1.2        
  706  sf15              sf14            
  687  komodo            doch-1.0        
  671  sf16              sf15            
  655  rybka             robbo           
  649  sf16              sf14            
  644  rybka             ippolito        
  639  sf14              glaurung        
  638  sf_strong         sf15            
  636  toga2             fruit           
  630  sf15              glaurung        
  617  sf_strong         sf14            
  600  sf_strong         glaurung        
  595  sf16              glaurung        

  594  rybka             doch-1.2        
  589  fruit             doch-1.0        
  582  rybka             doch-1.0        
  581  rybka             komodo          
  579  ippolito          doch-1.0        
  573  robbo             komodo          
  571  sf15              robbo           
  571  ippolito          doch-1.2        
  569  sf15              rybka           
  568  robbo             doch-1.2        
  565  sf_strong         rybka           
  564  toga2             sf15            
  563  sf14              ippolito        
  561  toga2             sf14            
  561  toga2             doch-1.0        
  560  toga2             glaurung        
  560  komodo            ippolito        
  559  sf14              robbo           
  559  robbo             doch-1.0        
  559  fruit             doch-1.2        
  558  sf14              rybka           
  557  sf_strong         robbo           
  556  sf15              ippolito        
  554  sf16              rybka           
  554  sf16              robbo           
  554  sf15              komodo          
  553  komodo            fruit           
  553  glaurung          fruit           
  551  sf14              doch-1.0        
  549  sf15              doch-1.0        
  548  sf14              fruit           
  546  toga2             doch-1.2        
  544  sf15              fruit           
  544  glaurung          doch-1.0        
  543  toga2             sf16            
  542  rybka             glaurung        
  541  sf16              doch-1.0        
  538  sf16              ippolito        
  537  toga2             ippolito        
  536  toga2             komodo          
  536  sf_strong         doch-1.0        
  536  sf14              komodo          
  535  toga2             robbo           
  534  sf16              fruit           
  534  ippolito          fruit           
  532  komodo            glaurung        
  531  sf_strong         ippolito        
  531  sf15              doch-1.2        
  530  robbo             fruit           
  528  glaurung          doch-1.2        
  527  sf_strong         komodo          
  527  sf16              doch-1.2        
  526  toga2             rybka           
  525  robbo             glaurung        
  524  toga2             sf_strong       
  524  sf_strong         doch-1.2        
  523  sf14              doch-1.2        
  523  rybka             fruit           
  521  ippolito          glaurung        
  519  sf16              komodo          
  514  sf_strong         fruit

mcostalba · Post by **mcostalba** » Wed Jan 27, 2010 9:31 pm

Interesting. I was thinking that the correlation between SF and Komodo / Doch was much higher than it appears from the table.

Don · Post by **Don** » Wed Jan 27, 2010 9:35 pm

michiguel wrote:
If you give me a matrix of which positions each program solved, I think I will be able to adapt some software we run to do phylogenetic analysis.
http://en.wikipedia.org/wiki/Phylogenetic_tree

We could even trace the ancestors, **if** the concept is valid.

Miguel
PS: I mean something like

Rybka {10011110110011........101010}
Komodo{10010010101000........111000}

Where 1 is solved, 0 is not solved, in order. Whatever format you have, I will convert it.

That's not how the test works. It's not a test to see if a given move is selected, it's a test to compare 2 programs to see if they choose the same move. So for any pair of programs I could give you such a matrix, where 1 means the 2 programs in question picked the same move. But I would have to generate such a matrix for every pair of programs (which is easily done of course.)
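
A sketch of what generating one such 0/1 vector for every pair of programs could look like, assuming the chosen moves are available per program in position order. The function name `pairwise_match_vectors` and the sample moves are hypothetical.

```python
from itertools import combinations


def pairwise_match_vectors(moves_by_program):
    """Map each pair of program names to a string of 1s and 0s.

    A 1 means both programs chose the same move in that position.
    """
    vectors = {}
    for a, b in combinations(sorted(moves_by_program), 2):
        vectors[(a, b)] = "".join(
            "1" if x == y else "0"
            for x, y in zip(moves_by_program[a], moves_by_program[b])
        )
    return vectors


# Hypothetical moves from three programs on three positions (not real data).
sample = {
    "prog_a": ["g8f6", "f1c4", "d1e4"],
    "prog_b": ["g8f6", "b8c6", "d1e4"],
    "prog_c": ["e7e5", "f1c4", "d1e4"],
}
for pair, bits in pairwise_match_vectors(sample).items():
    print(pair, bits)
```

The number of 1s in each vector is the pair's score, so the table of scores can be recovered from these vectors.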

benstoker · Post by **benstoker** » Wed Jan 27, 2010 9:49 pm

Since UCI spits out both a bestmove and a ponder move, could you not use the additional ponder move for comparison? Then, instead of one value per position, you would have 2 values.
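
For illustration, a UCI engine finishes a search with a line of the form `bestmove <move>`, optionally followed by `ponder <move>`. A sketch of using both values might look like this. The function names and the sample lines are made up, and counting a "full match" only when both moves agree is just one possible choice.

```python
def parse_bestmove(line):
    """Parse a UCI 'bestmove' line into (bestmove, ponder_or_None)."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "bestmove":
        raise ValueError("not a bestmove line: %r" % line)
    ponder = None
    if len(tokens) >= 4 and tokens[2] == "ponder":
        ponder = tokens[3]
    return tokens[1], ponder


def two_move_agreement(lines_a, lines_b):
    """Return (same best move count, same best and ponder move count)."""
    same_best = same_both = 0
    for line_a, line_b in zip(lines_a, lines_b):
        best_a, ponder_a = parse_bestmove(line_a)
        best_b, ponder_b = parse_bestmove(line_b)
        if best_a == best_b:
            same_best += 1
            # The ponder move is optional, so only count it when present.
            if ponder_a is not None and ponder_a == ponder_b:
                same_both += 1
    return same_best, same_both


# Hypothetical engine output for three positions (not real data).
out_a = ["bestmove e2e4 ponder e7e5", "bestmove g1f3 ponder d7d5", "bestmove d2d4"]
out_b = ["bestmove e2e4 ponder c7c5", "bestmove g1f3 ponder d7d5", "bestmove d2d4"]
print(two_move_agreement(out_a, out_b))  # (3, 1)
```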

michiguel · Post by **michiguel** » Wed Jan 27, 2010 9:51 pm

Don wrote:
That's not how the test works. It's not a test to see if a given move is selected, it's a test to compare 2 programs to see if they choose the same move. So for any pair of programs I could give you such a matrix, where 1 means the 2 programs in question picked the same move. But I would have to generate such a matrix for every pair of programs (which is easily done of course.)

Even better!

If I understand correctly, this will be even more specific:

Rybka = {
Nf6
Bc4
...
Qe4+
}

Komodo = {
Nf6
Nc6
...
Qe4+
}

But the easiest would be something that provides this type of data, in any format that I could process (where 100 is a perfect match; I think you get the idea):

           Ro   Ry   Ko   Do
Robbo     100   95   85   80
Rybka          100   90   85
Komodo              100   98
Doch 1                   100
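
A sketch of producing a matrix in that shape from per-program move lists, with each cell the percentage of positions on which two programs agreed. The helper names and the sample moves are invented for the example and are not real results.

```python
def similarity_matrix(moves_by_program):
    """Return (names, matrix); matrix[i][j] is the percentage of
    positions on which programs i and j chose the same move."""
    names = list(moves_by_program)
    matrix = []
    for a in names:
        row = []
        for b in names:
            pairs = list(zip(moves_by_program[a], moves_by_program[b]))
            same = sum(1 for x, y in pairs if x == y)
            row.append(round(100 * same / len(pairs)))
        matrix.append(row)
    return names, matrix


def format_upper_triangle(names, matrix):
    """Render the matrix as text with the lower triangle left blank."""
    lines = [" " * 10 + "".join("%5s" % n[:2] for n in names)]
    for i, name in enumerate(names):
        cells = "".join(
            "%5d" % matrix[i][j] if j >= i else " " * 5
            for j in range(len(names))
        )
        lines.append("%-10s%s" % (name, cells))
    return "\n".join(lines)


# Hypothetical moves on four positions (not real engine output).
sample = {
    "Robbo": ["g8f6", "f1c4", "e1g1", "d1e4"],
    "Rybka": ["g8f6", "f1c4", "e1g1", "d1h5"],
    "Komodo": ["g8f6", "b8c6", "a2a3", "d1h5"],
}
names, matrix = similarity_matrix(sample)
print(format_upper_triangle(names, matrix))
```

Since the matrix is symmetric, printing only the upper triangle loses nothing.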

hgm · Post by **hgm** » Wed Jan 27, 2010 9:51 pm

That is even better. Just give a list of the moves. (E.g. in long algebraic notation, all concatenated to a very long string.)
