Following the messages from Miguel A. Ballorca and Christophe Théron in this thread, I read some scientific papers on detecting similarities between documents and between source code files with the NCD (normalized compression distance) method. I then looked for tools that perform this kind of analysis.
This morning I discovered Baldr, a program written at a French engineering school by a computer science teacher (Hubert Wassner) and his students. It is written in Java, released under the GPL2 license, and meant to detect plagiarism in student assignments. Baldr is based on the NCD method.
We could certainly build a protocol to detect clones or derivative engines more efficiently with this kind of software. For example:
- Take the executables (or the source code, for free software) of all chess engines.
- Compute the distances between these engines.
- Select the engine pairs with the lowest distances.
- Check the similarities between those pairs more carefully (compare the source code with diff or Kompare, or first reverse-engineer the executables in the case of closed-source programs).
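To make the distance and ranking steps concrete, here is a rough sketch in Python. It uses the usual NCD definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The choice of zlib as compressor and the names compressed_size, ncd and rank_pairs are my own assumptions for illustration; I do not know which compressor Baldr uses internally.

```python
import itertools
import zlib
from typing import Dict, List, Tuple


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_pairs(engines: Dict[str, bytes]) -> List[Tuple[float, str, str]]:
    """Compute the NCD of every engine pair and return the pairs, lowest distance first."""
    pairs = [
        (ncd(engines[a], engines[b]), a, b)
        for a, b in itertools.combinations(sorted(engines), 2)
    ]
    return sorted(pairs)
```

The pairs at the top of the list returned by rank_pairs would be the candidates for manual inspection. One caveat: zlib only looks back 32 KB, so for inputs larger than that it will miss most of the shared content, and a compressor with a larger window (the bz2 or lzma modules, for instance) is probably a better fit for whole executables.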
The following issues should be solved first:
- Differences in programming languages (several engines are written in Delphi or Java).
- Source code is often split across many files and directories.
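For the second issue, one possible workaround is to flatten each source tree into a single byte string before measuring anything. The sketch below is only an assumption about how that could be done: the function name flatten_source_tree and the default list of extensions are hypothetical, and it does nothing about the first issue (different programming languages).

```python
from pathlib import Path
from typing import Iterable


def flatten_source_tree(root: str, extensions: Iterable[str] = (".c", ".cpp", ".h")) -> bytes:
    """Concatenate all source files under root into one byte string.

    Files are visited in sorted path order so that the result does not depend
    on the order in which the file system lists them.
    """
    wanted = {ext.lower() for ext in extensions}
    files = sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
    return b"\n".join(path.read_bytes() for path in files)
```

The resulting byte string can then be fed to whatever distance tool is used, one string per engine. Note that two engines with the same code laid out in differently named files will be concatenated in a different order, which may raise the measured distance somewhat.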
A derivative-searching protocol
Re: A derivative-searching protocol
Jérémy Pages wrote: We could certainly build a protocol to detect clones or derivative engines more efficiently with this kind of software.
I think this is doable, but it is uncertain how sensitive and accurate the methods will be. In any case, it is very interesting.
Miguel