Pairwise Analysis of Chess Engine Move Selections

Discussion of anything and everything relating to chess playing software and machines.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Post Reply
Adam Hair
Posts: 3201
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Pairwise Analysis of Chess Engine Move Selections

Post by Adam Hair » Sun Apr 17, 2011 1:25 am

I am presenting some data that I have accumulated with the purpose
of posing some questions concerning some of the results.

In general, it is supposed that unrelated engines will tend to choose
different moves when given a position to analyse that has more than
one "good" move that could be made, due in part to differences in
their evaluation code. By unrelated, I mean engines from different
authors, who have varying skills in programming and the game of
chess. Of course, given the flow of ideas in the computer chess
community, one would expect there to be some overlap of ideas when
comparing two unrelated engines. Yet, the actual implementation of
those ideas would be expected to differ some, and thus cause some
differences in move selection. Conversely, one would expect in most
cases that related engines would tend to choose the same moves at
a higher frequency. The data I have generated seems to indicate that
these assumptions are true in general.

However, there are some cases where seemingly unrelated engines
select the same moves at a higher rate than is expected. Also, in
almost every case, where there is a group of engines that select
the same moves with a higher frequency there is also a open source
engine in that group. I am refering to Fruit, Strelka, Robbolito/IvanHoe,
and Glaurung/Stockfish. This seems to argue against the possibility that
the effect I have observed is due to convergent ideas or successful
attempts to emulate the evaluation of other, closed sourced engines. It
seems to imply the flow of ideas from the open source to other engines.
It does not imply any wrong doing, for authors of original engines (Crafty,
Fruit, Glaurung to name a few) choose to open their sources to share
their ideas and help others learn. However, in most cases I could not
find any acknowledgement of the source of the ideas being used. Given
the standing in the community of some of the authors involved, I am
led to ask certain questions and make no assumptions or accusations.
Furthermore, the data I have generated is no proof of anything. It is only
circumstantial, patterns of data that require additional information to
interpret correctly. So, here are my questions:

Is there a discernable threshhold where the fair use of ideas turns into
plagarism?

Is it possible to use many of the ideas from another engine and yet create
a unique engine that avoids plagarism?

Is it possible that there are explanations for my data other than the overuse
of ideas from open source engines?

I used Don Dailey's similarity tool to create the data. The tool contains
8000+ positions for the engines to analyse. While not all of the positions
are such where information can be extracted, there are more than enough
"good" positions to be able to make statistical inferences. Since it sends
UCI commands, I had to use WB2UCI for the few WB engines that I could
get to run the test properly. In one case (Delfi), I had to use InBetween
to change the "go depth 50" command to "go infinite" in order to run the
test correctly. There were several engines that will not run the test
correctly and are not included.

I gave Houdini 1.5 10 ms per position and determined the thinking time
for the other engines by this formula: 10 ms * 2^(Elo Diff/100), where
Elo Diff is the difference in elo from Houdini 1.5. I had to guesstimate
the Elo difference in several cases and this is merely an attempt to
cancel, to some degree, the correlation between depth searched and
matched moves. From other data I know that the correlation is not very
strong (the number of matched moves does not change greatly with
changes in thinking time). Let me add that Houdini searches to a depth
of 8 plies when given 10 ms, depending on the position. Also, each engine
used one core and 64 MB of hash (I thought that more than enough hash
was better than not enough).

Some engines obey the UCI commands more faithfully than others,
particularly the "stop" command. Also, as I noted above, I guesstimated
some Elo ratings. In fact, the Elo ratings I used are not that precise.
The end result is that the data generated has some impreciseness
inherent to it. So, borderline cases of "similarity" should be ignored.
Also, instead of looking at the raw percentages (which are relative to
the positions found in the tool and the conditions of the test), I
standardized the data. This involves computing the mean (average) and
the sample standard deviation and then converted the data with this
formula: (Mean - Percent matched)/ standard deviation. The resulting
numbers convey how far from the mean the Percent of matched moves
is in each case. This also informs how likely this result would occur if
the assumptions of the test is true. The following table gives the
probability of the result occuring:

Code: Select all

 
1σ  31.7310508%          1 / 3.1514872 
2σ  4.5500264%            1 / 21.977895 
3σ  0.2699796%            1 / 370.398 
4σ  0.006334%              1 / 15,787 
5σ  0.0000573303%       1 / 1,744,278 
6σ  0.0000001973%       1 / 506,800,000 
7σ  0.0000000002560%  1 / 390,700,000,000 
The underlying assumption that I am using is that engines from different
families are unrelated in regards to evaluation, unrelated in the sense
that the overlap of ideas is not great. In the case of Spark and Bright,
one would have to think that they share a lot of code, but they act
differently in regards to choosing moves. I chose the following 37
engines as the basis for my comparisons. They were chosen because they
are not highly similar in regards to move selection.

Code: Select all

1) Alfil 8.1.1 Opt (time: 485 ms  scale: 1.0)
 2) Arasan 12.3 (time: 761 ms  scale: 1.0)
 3) Aristarch 4.50 (time: 686 ms  scale: 1.0)
 4) Bison 9.11 (time: 288 ms  scale: 1.0)
 5) Bright 0.5c (time: 166 ms  scale: 1.0)
 6) Crafty 23.4 (time: 149 ms  scale: 1.0)
 7) Critter 0.80 (time: 39 ms  scale: 1.0)
 8) Cyrano 0.6b17 (time: 422 ms  scale: 1.0)
 9) Daydreamer 1.75 (time: 343 ms  scale: 1.0)
10) Et Chess 130108 (time: 408 ms  scale: 1.0)
11) Fruit 2.1 (time: 381 ms  scale: 1.0)
12) Gandalf 6.0 (time: 485 ms  scale: 1.0)
13) Gaviota 0.80 (time: 844 ms  scale: 1.0)
14) Gull 1.2 (time: 59 ms  scale: 1.0)
15) Hannibal 1.0a (time: 89 ms  scale: 1.0)
16) Hermann 2.6 (time: 970 ms  scale: 1.0)
17) HIARCS 12 (time: 113 ms  scale: 1.0)
18) Houdini 1.5 (time: 10 ms  scale: 1.0)
19) Jonny 4.00 (time: 209 ms  scale: 1.0)
20) Komodo 1.3 (time: 39 ms  scale: 1.0)
21) List 512 (time: 577 ms  scale: 1.0)
22) Little Goliath Evolution (time: 1004 ms  scale: 1.0)
23) Movei 00.8.438 (time: 453 ms  scale: 1.0)
24) Naum 2.0 (time: 279 ms  scale: 1.0)
25) Protector 1.4.0 (time: 77 ms  scale: 1.0)
26) RedQueen 0.9.5 (time: 663 ms  scale: 1.0)
27) Ruffian 2.1.0 (time: 502 ms  scale: 1.0)
28) Rybka 4 (time: 18 ms  scale: 1.0)
29) Shredder 11 (time: 102 ms  scale: 1.0)
30) SOS 5.1 (time: 640 ms  scale: 1.0)
31) Spark 1.0 (time: 61 ms  scale: 1.0)
32) Spike 1.4 (time: 65 ms  scale: 1.0)
33) Stockfish 2.0 (time: 20 ms  scale: 1.0)
34) Thinker 5.4d Active (time: 144 ms  scale: 1.0)
35) Tornado 4.4 (time: 243 ms  scale: 1.0)
36) Ufim 8.02 (time: 844 ms  scale: 1.0)
37) Yace 9987 (time: 1040 ms  scale: 1.0)
This list produced 666 data points.
Each data point represents the percentage of matching moves for each
possible pair of engines. The mean for the percentage of
matched moves for pairs of engines is 45.71% and the sample deviation
is 2.31 The only pairs which were more than 3 standard deviations from
the mean were:
Aristarch 4.50 - List 5.12 = 3.94 deviations
Cyrano 0.6b17 - Fruit 2.1 = 3.36 deviations
List 5.12 - Ruffian 2.1.0 = 3.44 deviations

Then, I proceeded by including the data for one engine, compared to
these engines, to the 666 data points. I then recomputed the mean
and deviation on this combined data and then standardized the data.
In cases where the engine being tested was related to an engine
included in the base list, for example Rybka 1.0 beta, I exchanged
the existing engine with the engine being tested.

Here is the list of engine pairs that were at least 3 standard deviations
from their means. 3 standard deviations may not be significant enough
in this context. Perhaps 4 or more deviations would be more appropriate.
Also, the first column of numbers is the percent of matched moves and
the second column is the number of standard deviations from the mean of
the group of engines pairs that the particular pair was compared with.
The numbers of the first column are compareable, the numbers of the
second column are weakly compareable.

Code: Select all

Houdini 1.00                 RobboLito 0.085g3	        71.15	9.90
Loop 2007	                 Fruit 2.1	                71.13	9.74
Onno 1.0.4	                 Fruit 2.1	                67.83	8.71
Rybka 1.0 beta	             Strelka 2.0 B	            66.78	8.65
Thinker 5.4D Inert	         Strelka 2.0 B	            62.07	6.80
Naum 4.2	                 Rybka 1.0 beta	            61.00	6.71
RobboLito 0.085g3	         Houdini 1.5	            61.87	6.59
Naum 4.2	                 Strelka 2.0 B	            61.37	6.48
Philou 3.5.1	             Stockfish 2.0	            61.17	6.23
Alaric 707	                 Fruit 2.1	                60.28	6.02
Colossus 2008	             Fruit 2.1	                59.69	5.83
Glaurung 1.2.1	             Alfil 8.1.1 Opt	        59.07	5.82
Thinker 5.4D Inert	         Rybka 1.0 beta	            58.50	5.78
Fruit 1.5	                 Rotor 0.6	                59.09	5.76
Rybka 3	                     RobboLito 0.085g3	        59.16	5.37
Critter 1.0	                 RobboLito 0.085g3	        59.04	5.29
Strelka 2.0 B	             Fruit 2.1	                58.18	5.02
Delfi 5.4	                 Fruit 2.1	                57.50	4.85
Naraku 1.31	                 Fruit 2.1	                57.00	4.79
TwistedLogic 20090922	     Strelka 2.0 B	            56.65	4.73
Rybka 2.3.2a	             Fruit 2.1	                57.26	4.71
Thinker 5.4D Passive	     Fruit 2.1	                56.65	4.69
Rybka 1.0 Beta	             Fruit 2.1	                56.85	4.68
Critter 1.0	                 Houdini 1.5	            57.40	4.64
SmarThink 1.20	             Fruit 2.1	                55.89	4.33
Hamsters 0.7.1	             Fruit 2.1	                55.38	4.07
Aristarch 4.50	             List 512	                54.82	3.94
RobboLito 0.085g3	         Rybka 4	                54.98	3.77
Rybka 3	                     Houdini 1.5	            55.16	3.76
Pupsi 0.08	                 Fruit 2.1	                54.36	3.71
Glaurung 2.2	             Fruit 2.1	                54.29	3.68
Strelka 2.0 B	             Rybka 4	                54.07	3.62
Daydreamer 1.7	             Strelka 2.0 B	            54.07	3.60
Philou 3.5.1	             Fruit 2.1	                54.53	3.53
List 512	                 Ruffian 2.1.0	            53.65	3.44
Cyrano 0.6b17	             Fruit 2.1	                53.48	3.36
Philou 3.5.1	             Hannibal 1.0a	            53.98	3.30
Strelka 2.0 B	             Hannibal 1.0a	            52.72	3.04
Strelka 2.0 B	             Bison 9.11	                52.63	3.00
All of my data and analysis can be download here: http://www.mediafire.com/file/8d3pq334a ... nalysis.7z

User avatar
Graham Banks
Posts: 33141
Joined: Sun Feb 26, 2006 9:52 am
Location: Auckland, NZ

Re: Pairwise Analysis of Chess Engine Move Selections

Post by Graham Banks » Sun Apr 17, 2011 9:38 pm

Very interesting Adam.
My email addresses:
gbanksnz at gmail.com
gbanksnz at yahoo.co.nz

walter koroljow
Posts: 8
Joined: Fri Oct 15, 2010 9:06 pm
Location: Delaware, USA

Re: Pairwise Analysis of Chess Engine Move Selections

Post by walter koroljow » Sun Apr 17, 2011 10:09 pm

Thank you for your work. This is very interesting, indeed.

Walter

User avatar
kranium
Posts: 1824
Joined: Thu May 29, 2008 8:43 am

Re: Pairwise Analysis of Chess Engine Move Selections

Post by kranium » Sun Apr 17, 2011 11:32 pm

Hi Adam,
thanks for the analysis...

I'm not at all surprised by the Loop/Fruit results in your data above, years ago
I noticed a very strong correlation according to the CCRL ponder hit statistics:

Ponder hit: Most similar pairs (different families only)
# Pair Ponder hit Moves counted
1 Loop 13.6 (Loop 2007) 32-bit – Fruit 2.2.1 75.8 1183
2 Loop 10.32f – Fruit 2.2.1 75.6 1016

3 Onno 1.1.1 32-bit – Loop 13.6 (Loop 2007) 32-bit 73.5 1302
4 Loop M1-T 64-bit 4CPU – Toga II 1.3.1 73.4 1305
5 Loop M1-T 64-bit 2CPU – Toga II 1.3.1 72.6 1529

6 Rybka 2.3.2a 64-bit 2CPU – Naum 3.1 64-bit 2CPU 71.4 1746
7 Cyclone 3.4 – Loop 13.6 (Loop 2007) 32-bit 71.0 1284
8 Toga II 1.4.1SE 4CPU – Onno 1.0 64-bit 70.4 1496
9 Toga II 1.4 beta5c 4CPU – Loop M1-T 64-bit 4CPU 70.0 1354
10 Loop 13.6 (Loop 2007) 64-bit 4CPU – Toga II 1.2.1a 69.9 1904
11 Loop M1-T 64-bit 4CPU – Toga II 1.2.1a 69.7 1577

12 Naum 3.1 64-bit 4CPU – Rybka 2.3.2a 64-bit 69.6 1771
13 Toga II 1.4.1SE6 – Loop 13.6 (Loop 2007) 32-bit 69.6 1415
14 Colossus 2008b – Sloppy 0.2.3 32-bit 69.6 2333
15 Naum 4.2 32-bit – Fritz 12 69.5 1227
16 Loop 13.6 (Loop 2007) 32-bit – Toga II 1.2.1a 69.5 1180
17 Toga II 3.1.2SE – Loop 13.6 (Loop 2007) 32-bit 69.4 1210
18 Loop 13.6 (Loop 2007) 64-bit 4CPU – Toga II 1.3.1 69.2 1420
19 Toga II 1.4.1SE – Loop 13.6 (Loop 2007) 32-bit 69.2 1613

20 Naum 3.1 64-bit – Fritz 11 68.9 2786

Loop correlating to Fruit (or any of it's derivatives)
takes 13 out of the top 20 spots!

Adam Hair
Posts: 3201
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Re: Pairwise Analysis of Chess Engine Move Selections

Post by Adam Hair » Mon Apr 18, 2011 1:07 am

Thanks for the interest. I wish I could make the analysis more complete,
in regards to including more engines. There are a dozen or more engines
I have not been able find some way to make work with the sim tool.
I have no interest in singling out any engine or make accusations, and so
it seems to me the fairest thing would be to test every engine above
2600 Elo (CCRL). Unfortunately, that is not proving to be possible at the
moment.

User avatar
Graham Banks
Posts: 33141
Joined: Sun Feb 26, 2006 9:52 am
Location: Auckland, NZ

Re: Pairwise Analysis of Chess Engine Move Selections

Post by Graham Banks » Wed Apr 20, 2011 6:39 am

Adam Hair wrote:........ so it seems to me the fairest thing would be to test every engine above 2600 Elo (CCRL). Unfortunately, that is not proving to be possible at the moment.
Well your analysis has certainly changed my opinion to a degree.
In light of the fact that it seems that many engines might have heavily borrowed ideas from strong open source engines, I'm more amenable now to the the idea of testing Houdini as the Ippo family representative, and also to continuing testing Naraku.
It would seem pretty difficult now to tar one or two engines without having lingering doubts about others.
Must have a think about all this.

Cheers,
Graham.
My email addresses:
gbanksnz at gmail.com
gbanksnz at yahoo.co.nz

User avatar
Onno Garms
Posts: 224
Joined: Mon Mar 12, 2007 6:31 pm
Location: Bonn, Germany

Re: Pairwise Analysis of Chess Engine Move Selections

Post by Onno Garms » Wed Apr 20, 2011 9:09 am

I feel obliged to comment on some points of your post. I'll go though
them in order of appearance in you post, but adding anything that
comes into my mind and is partially only loosely related to your post.

I post not just because your posting implicitly raises the question
if there are copyright infringements in Onno, but also because in my
opinion the so called clone debate is going into a very wrong
direction.
However, in most cases I could not
find any acknowledgment of the source of the ideas being used.
First, rules in commercial (software) development are different from
the rules in science. While in science, a proper citation is required
for using ideas (even if you don't use them verbatim or translated), this
is not required in commercial development. Commercial development is
ruled by the copyright only.

NB: Science is mostly paid by tax money. Commercial software
development is paid by customers. Both groups are not well paid, but
as for chess, science is better paid than commercial
development. However few are in the lucky situation to have their
chess programming activities sponsored by the tax payer, many work
without any salary and some try to found a business on it. I find it
all OK, but I hope that everybody understands that different
situations lead to different policies what to tell and what not.

Second, in spite of this, I did give credits. They have been on my
website from the day where Onno was published to about one month after
its end. Actually I removed the credits one day day before you made
your post, together with most other stuff that was specific to my
former engine.

The explicitness of such credits is a balancing act between giving
honor to those who deserve it and wanting to sell. In may case, this
amounted to:
Onno Website 2009 to 2011-04-15 wrote: First, there is Fabien Letouzey and all the other people who
contributed to the Fruit project. The study of the fruit sources was
what initially motivated me to start chess programming. The
algorithms were strikingly simple - at least for my scale
- yet very effective. I was pleased with the source but yet I
spotted many things that I wanted to do fundamentally differently. So
I wrote a new engine from scratch. Many but not all of its algorithms
are similar to those in Fruit; it would not have been possible to make
progress so quickly without having the Fruit sources to look at.
Onno's data structures and Onno's software design are however very
different from Fruit and Onno does not contain any code from Fruit or
another engine.
There were also credits to other people which you can look up in
offline version of the Onno website that was contained in the
distribution.
Furthermore, the data I have generated is no proof of anything. It is only
circumstantial, patterns of data that require additional information to
interpret correctly.
Your approach is made to measure similarity of algorithms. At first
glance it seems a reasonable approach, but we chess engine authors
know that not all reasonable looking approaches work. However, as I
currently cannot see major weaknesses of you approach, let us assume
for now that is measures what it attempts to measure.
Is there a discernable threshold where the fair use of ideas turns into
plagarism?
No.
Is it possible to use many of the ideas from another engine and yet create
a unique engine that avoids plagarism?
Don't know what you mean by "unique" and if "plagarism" has a legal
definition. But apart from that: Yes, you can use many of the ideas
from another engine and yet create an engine that avoids copyright
infringement.

As I said, your tests wants to measure similarity of
algorithms. Algorithms are not copyrighted, but their implementation
is.

You must not copy and paste implementations, you must not retype
implementations, you must not translate implementations. You may
however understand algorithms and implement very similar ones (as I
did) - or even identical ones if restructuring is required, if you find
that fun, some employer pays you, some voices tell you to do so,
whatever.

Consequently it is well possible that a high similarity in the
algorithms is neither a coincidence nor a copyright infringement.

The fact that algorithms are not copyrighted is a feature of our
copyright system, not a bug. Algorithms are ideas and there must be
freedom of ideas. Putting copyright on algorithms would be harmful for
science and global progress. In the US, you can patent algorithms, but
Fruit is not patented. In the EU, there are many people up in arms against
software patents - for a good reason.

Freedom of ideas includes the freedom to use them commercially.

If you feel inclined to hunt for copyright infringements, you should go
for the implementations, not for the algorithms. Check for
initialization mechanisms. In case of Fruit, check for a
longjump. Check for bugs in the UCI implementation, such as this one:

Code: Select all

UCI specification April 2006, downloaded 2011-04-20 from www.shredderchess.com:
* arbitrary white space between tokens is allowed
  Example: "debug on\n" and  "   debug     on  \n" and "\t  debug \t  \t\ton\t  \n"
  all set the debug mode of the engine on.


onno@diden:~/Schach/engines/fruit_2.1> ./fruit 
Fruit 2.1 UCI by Fabien Letouzey
uci
id name Fruit 2.1
id author Fabien Letouzey
<snip>
uciok
position startpos moves      e2e4
go depth 1
info depth 1
info depth 1 seldepth 1 score cp -402 time 0 nodes 2 pv b1a3
info depth 1 seldepth 1 score cp -374 time 0 nodes 3 pv b1c3
info depth 1 seldepth 2 time 0 nodes 33 nps 0
info time 0 nodes 33 nps 0 cpuload 1000
info hashfull 0
bestmove b1c3
NB: Consequently, the "evidence" against Vas is interesting to read
from a scientific viewpoint, but most of it is legally meaningless.

(Disclaimer: I have not verified the evidence. However the evidence
is extremely detailed and therefore verifiable or falsifiable. I
cannot imagine that anybody is bold enough to invent such an
evidence. I hope that several people do check it.)

You might still argue that in spite of the legal pointlessness, the
ICGA should adapt your minimum pairwise distance metric to define
standards. I would disagree on such an approach.

The same way as putting copyright on algorithms would be harmful for
science and global progress, defining standards for chess software
that protect algorithms would be harmful for the future development of
the playing strength of engines.

I hope that the ICGA will not try to create more rigorous rules of
originality than the legal rules of copyright. Either contestants
would not obey these rules, or the market would split into strong (but
still legal) engines and ICGA-compatible engines. Guess which ones
would be used by ordinary chess players. The latter would be highly
harmful to the ICGAs reputation.

Have there been any formal ICGA standards what a violation of
originality is at the time where Rybka was released? Up to now I have
not seen such standards, but of course I might have overlooked them. I
am certain that I never confirmed that Onno complies with any such
standards, because I would have checked them carefully.

I am sure that there would be findings in many other new engines
too. Rybka was the one that was reverse engineered because it is the
World Champion.

The downside of reducing the standards of originality to the copyright
is that it becomes possible to do an unnecessary reimplementation of
an open source engine just to go closed source with it and sell it.

Note however that this never happened until now. Neither Rybka nor
Onno is a mere reimplementation of Fruit. It made sense to reimplement
a major portion of the algorithms for other reasons than the
copyright. Also, both have added new things and changed others.

I doubt that ever anybody will do an unnecessary rewrite of an open
source engine and do nothing else. Rewriting a well written engine would
be a lot of dull work. Also you can neither expect to win the world
championship nor expect to make an adequate amount of money if you
don't add something fundamentally new.

If anybody has so revolutionary ideas how to improve a chess engine A
that he thinks that with that improvement engine A is still ahead of
other engines A' and B which have made progress over the time required
for a rewrite, and furthermore the title is important enough for him
to do a rewrite, he should be allowed to do so and release it as chess
engine C.
Is it possible that there are explanations for my data other than the overuse
of ideas from open source engines?
I don't know. I ran a similar thing in 2008. I used SCID game analysis
with same time for all engines; I compared number of matching moves
and evaluation (including search of course, as static evaluation is
not accessible). Note however that in case an author would want to
hide a relationship, he might disguise the output of the evaluation
just as Vas disguised the output of nps and depth.

I compared only the raw data; I did not normalize to standard
deviations. Furthermore I think that your normalizing is questionable, see
below.

Using my method I found that Onno is more similar to Loop than to
Fruit. However, while the similarity to Fruit is not surprising, I
have never seen the Loop sources and never reverse engineered Loop.

As I am much less interested than you in which engine uses algorithms
from which other engine, I did not go into a detailed analysis, so I
am not sure if the larger similarity to Loop can be confirmed.
Also, instead of looking at the raw percentages (which are relative to
the positions found in the tool and the conditions of the test), I
standardized the data. This involves computing the mean (average) and
the sample standard deviation and then converted the data with this
formula: (Mean - Percent matched)/ standard deviation. The resulting
numbers convey how far from the mean the Percent of matched moves
is in each case. This also informs how likely this result would occur if
the assumptions of the test is true. The following table gives the
probability of the result occurring:

Code:

1&#963; 31.7310508% 1 / 3.1514872
2&#963; 4.5500264% 1 / 21.977895
3&#963; 0.2699796% 1 / 370.398
4&#963; 0.006334% 1 / 15,787
5&#963; 0.0000573303% 1 / 1,744,278
6&#963; 0.0000001973% 1 / 506,800,000
7&#963; 0.0000000002560% 1 / 390,700,000,000
Normalization requires the standard deviation. If I got you correctly,
the computation of the standard deviation was based on 37 presumed
"original" engines. However, their selection is highly
subjective. Either you have to select them in advance, based on the
publically available knowledge which often does not include the
sources, or you have to choose them based on your results, which would
contradict the fundamental principles of a statistical experiment. For that
reason, above cannot be considered as the probability of a
coincidence. I would think that it is impossible to give anything like
a probability of coincidence.
This list produced 666 data points.
However, these 666 "data points" are values of random variables that
are not stochastically independent. I am not sure if it is OK to apply
your maths under these conditions.


Finally, let's have a look at Onno and Rybka 1.0 Beta.

Rybka 1.0 Beta:
- significantly stronger than Fruit
- significantly stronger than any other engine those days
- was released in 2005 with great rumor
- written by an author who had quit his job, so risked everything, to
go full time and become world computer chess champion
- development possibly sped up by illegal copying of code from Fruit

Onno
- significantly stronger than Fruit
- about the same strength as Rybka 1.0 Beta
- was released in 2009 and was mostly unconsidered
- contains only minor ideas that had become available *publically*
in the years 2006-2009
- written by an author who was not crazy enough to quit his nice job
- free of copyrighted code from other engines

Basically the same thing, just a fast and a slow version.

Shake-hands to Vas. He went high risk and his and Fabien's contribution
to computer chess is the most significant in the past years. Neither
Vas nor Fabien would have nearly as much influence without the
other. Just look how the community messed up the Fruit sources :-(

Vas could not have done his work without going closed source as he is
said to have been broke when he started. He could have done his work
without copyright infringements (if there are any - Ei incumbit
probatio qui dicit, non qui negat), but he might have needed some
financial support from his relatives and would have increased the risk
that others are faster than he is.

Vas also contributed significantly to the global progress as closing
the source did not hide it successfully from the public.

If I were Fabien, I would not want to hang the man or men who helped
him to leave much more distinct permanent traces in the world of
computer chess than he alone would have been able to leave. But of
course it is his code and his decision.

Hope this helps to understand my position.

Sven
Posts: 3826
Joined: Thu May 15, 2008 7:57 pm
Location: Berlin, Germany
Full name: Sven Schüle
Contact:

Re: Pairwise Analysis of Chess Engine Move Selections

Post by Sven » Wed Apr 20, 2011 11:18 am

Graham Banks wrote:
Adam Hair wrote:........ so it seems to me the fairest thing would be to test every engine above 2600 Elo (CCRL). Unfortunately, that is not proving to be possible at the moment.
Well your analysis has certainly changed my opinion to a degree.
In light of the fact that it seems that many engines might have heavily borrowed ideas from strong open source engines, I'm more amenable now to the the idea of testing Houdini as the Ippo family representative, and also to continuing testing Naraku.
It would seem pretty difficult now to tar one or two engines without having lingering doubts about others.
Must have a think about all this.
I must admit that this analysis is indeed very interesting, and I highly appreciate the effort. But I don't see how it changes anything substantial for the "engine origins" discussion (and as I understand Adam's statements this was also not his main intention). The key point of this analysis is that certain engines behave similar to each other regarding their move selection. There are several possible explanations for this similarity. A common explanation is that these engines are partially based on the same ideas and algorithms.

Some people may think that selecting the same moves as another engine means that code must have been copied from that other engine. But "borrowing ideas" should not be mixed up with "borrowing implementation (code)", and it is important to understand that selecting the same moves as another program *may* imply reuse of algorithms but *does not* imply reuse of code. (And, for those not being familiar with formal logic, it is also important to understand the meaning of "to imply" ;-) )

The former, "borrowing ideas", is fully o.k., even to a high extent, although one could argue that having two engines sharing the same ideas but implementing them differently is somewhat less "interesting" than having two engines with different ideas.

The latter, "borrowing implementation", is o.k. as long as copyright and license are not violated.


If your rule of thumb were that there should be only one representative from a "family" being included into testing activities, and if it is defined that an engine B belongs to the "family" of another (original) engine A if it is based on at least a certain percentage X of ideas and algorithms taken from A, then you would face these two problems:

- you would have to define the value of X, which is practically impossible;

- you would have to exclude a lot of engines from testing, including all Ippo* engines, Houdini, Fire, Strelka, Rybka, but - according to the data presented here - perhaps also Onno, Loop, Critter, Thinker, Delfi, Naraku, TwistedLogic, SmarThink, Hamsters, Pupsi, Daydreamer, Philou, Cyrano, ... - which obviously goes into the wrong direction.

Another problem would be, for consistency you would also have to exclude those open source engines which are legally derived from another engine in full GPL compliance, like Toga. But for which reason? Legally sharing implementation does of course imply sharing ideas and algorithms. This shows that the notion of "borrowing ideas" does not qualify as a guideline to decide about testing or not testing an engine.


In my opinion a better rule of thumb would be to continue concentrating on the cases where copyright/license violation is proven - difficult enough as it is, but looking more consistent to me at least.

This rule does not help much for the Ippolit case itself, though, since there is no clear agreement yet about the question whether heavily reusing code by reverse engineering a commercial program is considered as a copyright violation, and also whether such a heavy code reuse actually happened in case of Ippolit. In my opinion both get a "yes" but certainly others disagree.

As a second rule, I would still insist on not testing engines by anonymous authors.

Sven

User avatar
Graham Banks
Posts: 33141
Joined: Sun Feb 26, 2006 9:52 am
Location: Auckland, NZ

Re: Pairwise Analysis of Chess Engine Move Selections

Post by Graham Banks » Wed Apr 20, 2011 11:28 am

Sven Schüle wrote:
Graham Banks wrote:
Adam Hair wrote:........ so it seems to me the fairest thing would be to test every engine above 2600 Elo (CCRL). Unfortunately, that is not proving to be possible at the moment.
Well your analysis has certainly changed my opinion to a degree.
In light of the fact that it seems that many engines might have heavily borrowed ideas from strong open source engines, I'm more amenable now to the the idea of testing Houdini as the Ippo family representative, and also to continuing testing Naraku.
It would seem pretty difficult now to tar one or two engines without having lingering doubts about others.
Must have a think about all this.
I must admit that this analysis is indeed very interesting, and I highly appreciate the effort. But I don't see how it changes anything substantial for the "engine origins" discussion (and as I understand Adam's statements this was also not his main intention). The key point of this analysis is that certain engines behave similar to each other regarding their move selection. There are several possible explanations for this similarity. A common explanation is that these engines are partially based on the same ideas and algorithms.

Some people may think that selecting the same moves as another engine means that code must have been copied from that other engine. But "borrowing ideas" should not be mixed up with "borrowing implementation (code)", and it is important to understand that selecting the same moves as another program *may* imply reuse of algorithms but *does not* imply reuse of code. (And, for those not being familiar with formal logic, it is also important to understand the meaning of "to imply" ;-) )

The former, "borrowing ideas", is fully o.k., even to a high extent, although one could argue that having two engines sharing the same ideas but implementing them differently is somewhat less "interesting" than having two engines with different ideas.

The latter, "borrowing implementation", is o.k. as long as copyright and license are not violated.


If your rule of thumb were that there should be only one representative from a "family" being included into testing activities, and if it is defined that an engine B belongs to the "family" of another (original) engine A if it is based on at least a certain percentage X of ideas and algorithms taken from A, then you would face these two problems:

- you would have to define the value of X, which is practically impossible;

- you would have to exclude a lot of engines from testing, including all Ippo* engines, Houdini, Fire, Strelka, Rybka, but - according to the data presented here - perhaps also Onno, Loop, Critter, Thinker, Delfi, Naraku, TwistedLogic, SmarThink, Hamsters, Pupsi, Daydreamer, Philou, Cyrano, ... - which obviously goes into the wrong direction.

Another problem would be, for consistency you would also have to exclude those open source engines which are legally derived from another engine in full GPL compliance, like Toga. But for which reason? Legally sharing implementation does of course imply sharing ideas and algorithms. This shows that the notion of "borrowing ideas" does not qualify as a guideline to decide about testing or not testing an engine.


In my opinion a better rule of thumb would be to continue concentrating on the cases where copyright/license violation is proven - difficult enough as it is, but looking more consistent to me at least.

This rule does not help much for the Ippolit case itself, though, since there is no clear agreement yet about the question whether heavily reusing code by reverse engineering a commercial program is considered as a copyright violation, and also whether such a heavy code reuse actually happened in case of Ippolit. In my opinion both get a "yes" but certainly others disagree.

As a second rule, I would still insist on not testing engines by anonymous authors.

Sven
Hi Sven,

I didn't mean to imply any copying of code by any of these engine authors. I agree that the use of ideas is obviously fine.
The main point that I was trying to make was that I now find it difficult to single out one or two engines in particular, given that Adam's analysis seems to show that the implementation of ideas from open source engines is fairly widespead.
Please correct me if you think I'm wrong in this assumption.

Cheers,
Graham.
My email addresses:
gbanksnz at gmail.com
gbanksnz at yahoo.co.nz

Adam Hair
Posts: 3201
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Re: Pairwise Analysis of Chess Engine Move Selections

Post by Adam Hair » Wed Apr 20, 2011 6:01 pm

Onno Garms wrote:I feel obliged to comment on some points of your post. I'll go though
them in order of appearance in you post, but adding anything that
comes into my mind and is partially only loosely related to your post.

I post not just because your posting implicitly raises the question
if there are copyright infringements in Onno, but also because in my
opinion the so called clone debate is going into a very wrong
direction.
Onno, first of all I would like to thank you for responding. Although there
is an implicit question of your program's origin in my post, that really was
not my intent. Let's blame it on my inability to find the right words to
avoid this. My purpose was to question why certain patterns appeared in
the data. Although I am not equipped to question the methods used and
the conclusions made in the comparison of Rybka and Fruit 2.1, the
question arose in me as to why there were engines that choose moves
in common with Fruit 2.1 at a higher frequency than Rybka 1.0 beta
does, given the amount of evaluation features in common.
Onno Garms wrote:
However, in most cases I could not
find any acknowledgment of the source of the ideas being used.
First, rules in commercial (software) development are different from
the rules in science. While in science, a proper citation is required
for using ideas (even if you don't use them verbatim or translated), this
is not required in commercial development. Commercial development is
ruled by the copyright only.

NB: Science is mostly paid by tax money. Commercial software
development is paid by customers. Both groups are not well paid, but
as for chess, science is better paid than commercial
development. However few are in the lucky situation to have their
chess programming activities sponsored by the tax payer, many work
without any salary and some try to found a business on it. I find it
all OK, but I hope that everybody understands that different
situations lead to different policies what to tell and what not.
I have to agree with you in general. However, given the particular
mixture of commercial, amateur, and academic authors in this field,
the determination of what is proper acknowledgement seems to be
seems to be in the eye of the beholder.
Onno Garms wrote: Second, in spite of this, I did give credits. They have been on my
website from the day where Onno was published to about one month after
its end. Actually I removed the credits one day day before you made
your post, together with most other stuff that was specific to my
former engine.

The explicitness of such credits is a balancing act between giving
honor to those who deserve it and wanting to sell. In may case, this
amounted to:
Onno Website 2009 to 2011-04-15 wrote: First, there is Fabien Letouzey and all the other people who
contributed to the Fruit project. The study of the fruit sources was
what initially motivated me to start chess programming. The
algorithms were strikingly simple - at least for my scale
- yet very effective. I was pleased with the source but yet I
spotted many things that I wanted to do fundamentally differently. So
I wrote a new engine from scratch. Many but not all of its algorithms
are similar to those in Fruit; it would not have been possible to make
progress so quickly without having the Fruit sources to look at.
Onno's data structures and Onno's software design are however very
different from Fruit and Onno does not contain any code from Fruit or
another engine.
There were also credits to other people which you can look up in
offline version of the Onno website that was contained in the
distribution.
I chose not list any authors who credited their source, out of the fear
that I missed finding an acknowledgement.
Onno Garms wrote:
Furthermore, the data I have generated is no proof of anything. It is only
circumstantial, patterns of data that require additional information to
interpret correctly.
Your approach is made to measure similarity of algorithms. At first
glance it seems a reasonable approach, but we chess engine authors
know that not all reasonable looking approaches work. However, as I
currently cannot see major weaknesses of you approach, let us assume
for now that is measures what it attempts to measure.
Perhaps it does not. If it does not, and somebody shows why, I would
consider that a satisfying result.
Onno Garms wrote:
Is there a discernable threshold where the fair use of ideas turns into
plagarism?
No.
Is it possible to use many of the ideas from another engine and yet create
a unique engine that avoids plagarism?
Don't know what you mean by "unique" and if "plagarism" has a legal
definition. But apart from that: Yes, you can use many of the ideas
from another engine and yet create an engine that avoids copyright
infringement.

As I said, your tests wants to measure similarity of
algorithms. Algorithms are not copyrighted, but their implementation
is.

You must not copy and paste implementations, you must not retype
implementations, you must not translate implementations. You may
however understand algorithms and implement very similar ones (as I
did) - or even identical ones if restructuring is required, if you find
that fun, some employer pays you, some voices tell you to do so,
whatever.

Consequently it is well possible that a high similarity in the
algorithms is neither a coincidence nor a copyright infringement.
I guess this is a sticking point amongst the different factions of
authors. Plagarism and copyright infringement seem to be two
different standards.
Onno Garms wrote: The fact that algorithms are not copyrighted is a feature of our
copyright system, not a bug. Algorithms are ideas and there must be
freedom of ideas. Putting copyright on algorithms would be harmful for
science and global progress. In the US, you can patent algorithms, but
Fruit is not patented. In the EU, there are many people up in arms against
software patents - for a good reason.

Freedom of ideas includes the freedom to use them commercially.
That is true.
Onno Garms wrote: If you feel inclined to hunt for copyright infringements, you should go
for the implementations, not for the algorithms. Check for
initialization mechanisms. In case of Fruit, check for a
longjump. Check for bugs in the UCI implementation, such as this one:

Code: Select all

UCI specification April 2006, downloaded 2011-04-20 from www.shredderchess.com&#58;
* arbitrary white space between tokens is allowed
  Example&#58; "debug on\n" and  "   debug     on  \n" and "\t  debug \t  \t\ton\t  \n"
  all set the debug mode of the engine on.


onno@diden&#58;~/Schach/engines/fruit_2.1> ./fruit 
Fruit 2.1 UCI by Fabien Letouzey
uci
id name Fruit 2.1
id author Fabien Letouzey
<snip>
uciok
position startpos moves      e2e4
go depth 1
info depth 1
info depth 1 seldepth 1 score cp -402 time 0 nodes 2 pv b1a3
info depth 1 seldepth 1 score cp -374 time 0 nodes 3 pv b1c3
info depth 1 seldepth 2 time 0 nodes 33 nps 0
info time 0 nodes 33 nps 0 cpuload 1000
info hashfull 0
bestmove b1c3
NB: Consequently, the "evidence" against Vas is interesting to read
from a scientific viewpoint, but most of it is legally meaningless.

I do not have the ability nor the inclination to hunt clones. Though, I
am interested in the standards being applied to make such a
determination.

(Disclaimer: I have not verified the evidence. However the evidence
is extremely detailed and therefore verifiable or falsifiable. I
cannot imagine that anybody is bold enough to invent such an
evidence. I hope that several people do check it.)
I do not have the ability nor the inclination to hunt clones. Though, I
am interested in the standards being applied to make such a
determination.
Onno Garms wrote: You might still argue that in spite of the legal pointlessness, the
ICGA should adapt your minimum pairwise distance metric to define
standards. I would disagree on such an approach.

The same way as putting copyright on algorithms would be harmful for
science and global progress, defining standards for chess software
that protect algorithms would be harmful for the future development of
the playing strength of engines.

I hope that the ICGA will not try to create more rigorous rules of
originality than the legal rules of copyright. Either contestants
would not obey these rules, or the market would split into strong (but
still legal) engines and ICGA-compatible engines. Guess which ones
would be used by ordinary chess players. The latter would be highly
harmful to the ICGAs reputation.

Have there been any formal ICGA standards what a violation of
originality is at the time where Rybka was released? Up to now I have
not seen such standards, but of course I might have overlooked them. I
am certain that I never confirmed that Onno complies with any such
standards, because I would have checked them carefully.

I am sure that there would be findings in many other new engines
too. Rybka was the one that was reverse engineered because it is the
World Champion.

The downside of reducing the standards of originality to the copyright
is that it becomes possible to do an unnecessary reimplementation of
an open source engine just to go closed source with it and sell it.

Note however that this never happened until now. Neither Rybka nor
Onno is a mere reimplementation of Fruit. It made sense to reimplement
a major portion of the algorithms for other reasons than the
copyright. Also, both have added new things and changed others.

I doubt that ever anybody will do an unnecessary rewrite of an open
source engine and do nothing else. Rewriting a well written engine would
be a lot of dull work. Also you can neither expect to win the world
championship nor expect to make an adequate amount of money if you
don't add something fundamentally new.

If anybody has so revolutionary ideas how to improve a chess engine A
that he thinks that with that improvement engine A is still ahead of
other engines A' and B which have made progress over the time required
for a rewrite, and furthermore the title is important enough for him
to do a rewrite, he should be allowed to do so and release it as chess
engine C.
Onno Garms wrote:
Is it possible that there are explanations for my data other than the overuse
of ideas from open source engines?
I don't know. I ran a similar thing in 2008. I used SCID game analysis
with same time for all engines; I compared number of matching moves
and evaluation (including search of course, as static evaluation is
not accessible). Note however that in case an author would want to
hide a relationship, he might disguise the output of the evaluation
just as Vas disguised the output of nps and depth.

I compared only the raw data; I did not normalize to standard
deviations. Furthermore I think that your normalizing is questionable, see
below.

Using my method I found that Onno is more similar to Loop than to
Fruit. However, while the similarity to Fruit is not surprising, I
have never seen the Loop sources and never reverse engineered Loop.

As I am much less interested than you in which engine uses algorithms
from which other engine, I did not go into a detailed analysis, so I
am not sure if the larger similarity to Loop can be confirmed.
Also, instead of looking at the raw percentages (which are relative to
the positions found in the tool and the conditions of the test), I
standardized the data. This involves computing the mean (average) and
the sample standard deviation and then converted the data with this
formula: (Mean - Percent matched)/ standard deviation. The resulting
numbers convey how far from the mean the Percent of matched moves
is in each case. This also informs how likely this result would occur if
the assumptions of the test is true. The following table gives the
probability of the result occurring:

Code:

1&#963; 31.7310508% 1 / 3.1514872
2&#963; 4.5500264% 1 / 21.977895
3&#963; 0.2699796% 1 / 370.398
4&#963; 0.006334% 1 / 15,787
5&#963; 0.0000573303% 1 / 1,744,278
6&#963; 0.0000001973% 1 / 506,800,000
7&#963; 0.0000000002560% 1 / 390,700,000,000
Normalization requires the standard deviation. If I got you correctly,
the computation of the standard deviation was based on 37 presumed
"original" engines. However, their selection is highly
subjective. Either you have to select them in advance, based on the
publically available knowledge which often does not include the
sources, or you have to choose them based on your results, which would
contradict the fundamental principles of a statistical experiment. For that
reason, above cannot be considered as the probability of a
coincidence. I would think that it is impossible to give anything like
a probability of coincidence.
There was one presumption that I made in this, besides not including
multiple engines from the same family in the comparisons. That was,
if an engine showed a high rate of similarity to Fruit 2.1, it was not
considered to be different from Fruit. Thus, it was not included in the
group of 37 engines, which included Fruit 2.1 for most of the comparisons.
I realise that there is some question to the validity of this. However,
including all engines washes out the higher correlation between Rybka 1.0
and Fruit 2.1, which are assumed to be related due to the large amount
of overlap in terms of evaluation. This is the reason why I left open
how many deviations from the mean would be appropriate to assume
statistical significance.
Onno Garms wrote:
This list produced 666 data points.
However, these 666 "data points" are values of random variables that
are not stochastically independent. I am not sure if it is OK to apply
your maths under these conditions.
I have to admit that I am relying on the observation that independence
necessary in mathematical proofs of statistical theorems but tends to
have lesser importance in application. And again, that is why I did not
try to attach the usual standards for statistical significance to my data.
Onno Garms wrote: Finally, let's have a look at Onno and Rybka 1.0 Beta.

Rybka 1.0 Beta:
- significantly stronger than Fruit
- significantly stronger than any other engine those days
- was released in 2005 with great rumor
- written by an author who had quit his job, so risked everything, to
go full time and become world computer chess champion
- development possibly sped up by illegal copying of code from Fruit

Onno
- significantly stronger than Fruit
- about the same strength as Rybka 1.0 Beta
- was released in 2009 and was mostly unconsidered
- contains only minor ideas that had become available *publically*
in the years 2006-2009
- written by an author who was not crazy enough to quit his nice job
- free of copyrighted code from other engines

Basically the same thing, just a fast and a slow version.

Shake-hands to Vas. He went high risk and his and Fabien's contribution
to computer chess is the most significant in the past years. Neither
Vas nor Fabien would have nearly as much influence without the
other. Just look how the community messed up the Fruit sources :-(

Vas could not have done his work without going closed source as he is
said to have been broke when he started. He could have done his work
without copyright infringements (if there are any - Ei incumbit
probatio qui dicit, non qui negat), but he might have needed some
financial support from his relatives and would have increased the risk
that others are faster than he is.

Vas also contributed significantly to the global progress as closing
the source did not hide it successfully from the public.

If I were Fabien, I would not want to hang the man or men who helped
him to leave much more distinct permanent traces in the world of
computer chess than he alone would have been able to leave. But of
course it is his code and his decision.

Hope this helps to understand my position.
Thanks again for your reply Onno. I appreciate your opinions and
critique.

Adam

Post Reply