A derivative-searching protocol

Jérémy Pages
Posts: 11
Joined: Sun Aug 17, 2008 2:28 am
Location: Mulhouse (Alsace, France)

Post by Jérémy Pages »

Following the messages from Miguel A. Ballorca and Christophe Théron in this thread, I have read some scientific papers about detecting similarities in documents and source code with the NCD (Normalized Compression Distance) method. I then looked for tools that perform this kind of analysis.

This morning I discovered Baldr, a program written at a French engineering school by a computer science teacher (Hubert Wassner) and his students, in Java and under the GPL2 license, to detect plagiarism in student assignments. Baldr is based on the NCD method.

We could certainly build a protocol to detect clones or derivative engines more efficiently with this kind of software. For example:
- Take the executables (or the source code, for free software) of all chess engines.
- Compute the distances between these engines.
- Select the engine pairs with the lowest distances.
- Examine the similarities between those pairs more carefully (compare the source code with diff or Kompare, or first reverse-engineer the executables in the case of closed-source programs).
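
To make the first three steps concrete, here is a minimal sketch in Python. It is not how Baldr is implemented; it only applies the usual NCD formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The choice of the lzma compressor and the function names (`compressed_size`, `ncd`, `rank_pairs`) are my own assumptions for illustration.

```python
import lzma
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Size in bytes of the data after compression (the C(.) of the NCD formula)."""
    return len(lzma.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_pairs(engines: dict[str, bytes]) -> list[tuple[float, str, str]]:
    """Return (distance, name_a, name_b) for every pair, lowest distance first."""
    names = sorted(engines)  # hypothetical engine names mapped to their raw bytes
    return sorted((ncd(engines[a], engines[b]), a, b) for a, b in combinations(names, 2))
```

The pairs at the top of the list returned by `rank_pairs` are only candidates for the manual check in the last step; a low distance is a hint, not a proof of derivation. A compressor with a large dictionary seems preferable here, because a small window (zlib's is 32 KB) cannot see a match between two files that are each larger than the window.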

The following issues should be solved first:
- Differences in programming languages (several engines are written in Delphi or Java).
- Source code is often spread across many files and directories.
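
For the second issue, one possible workaround (an assumption on my part, not something Baldr is known to do) is to flatten each source tree into a single byte string before computing distances, reading the files in a fixed order so that the result is reproducible. The function name `flatten_source_tree` and the default list of suffixes are hypothetical.

```python
from pathlib import Path


def flatten_source_tree(root: str, suffixes=(".c", ".h", ".cpp")) -> bytes:
    """Concatenate all source files under root, in sorted path order, into one byte string."""
    files = sorted(
        (p for p in Path(root).rglob("*") if p.is_file() and p.suffix.lower() in suffixes),
        key=lambda p: p.as_posix(),  # deterministic order, independent of the file system
    )
    return b"\n".join(p.read_bytes() for p in files)
```

This does nothing for the first issue: two engines written in different languages will still look distant even if one is a translation of the other.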
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: A derivative-searching protocol

Post by michiguel »

I think this is doable, but it is uncertain how sensitive and accurate the methods will be. In any case, it is very interesting.

Miguel