Similarity tool myth - debunked.

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Similarity tool myth - debunked.

Post by Don »

Over the many months since I released the similarity tool I have heard
a LOT of comments implying that the similarity test is highly
sensitive to piece square tables. The way it is typically expressed is
that all you have to do is change the piece square tables to fool the
similarity tool and "that's all it really measures."

Sometimes someone will make an assertion and over time it is given
"factual" status simply by being repeated enough times that people start
to believe it. This is a form of (unintended) social indoctrination
because the indoctrinated person is not expected to question or examine
the facts critically. They go along with what they have heard and it
becomes part of their own belief system. Could that be happening
here?

I have never seen proof of this presented, only the assertion
itself. If anyone is aware of any studies I would be very interested
in seeing them.

Meanwhile, I decided to do my own study and I'm now reporting what I
have found.

I am starting with a comparison to Ivanhoe, using a development
version of Komodo and 3 additional modified versions of Komodo. Two of
the modified versions of Komodo incorporate the Ivanhoe piece square
tables.

I am not very familiar with the Ippo family of programs so I first had
to get familiar with the code and find the tables in the source code -
that was not too hard. It turns out the tables are computed in
static.c - so the first step was to create Komodo compatible
declarations. After doing so we end up with the following Ivanhoe
tables. Note that each square in the tables is represented by 2
values (a pair) where the first is the opening value and the second is
the endgame value. Since most major programs including Komodo now do
it that way there are no compatibility issues. The actual value used
in the program is interpolated by game phase. The tables are laid out
from the WHITE point of view, as they would appear in a diagram, and I
use the same convention in Komodo too.


PAWN
  0,  0,    0,  0,    0,  0,    0,  0,    0,  0,    0,  0,    0,  0,   0,   0, 
-17, -2,   -5, -4,    1, -6,    8, -8,    8, -8,    1, -6,   -5, -4,  -17, -2, 
-18, -4,   -6, -6,    0, -8,    7,-10,    7,-10,    0, -8,   -6, -6,  -18, -4, 
-19, -5,   -7, -7,   -1, -9,    6,-11,    6,-11,   -1, -9,   -7, -7,  -19, -5, 
-21, -6,   -9, -8,   -3,-10,    4,-12,    4,-12,   -3,-10,   -9, -8,  -21, -6, 
-22, -7,  -10, -9,   -4,-11,    3,-13,    3,-13,   -4,-11,  -10, -9,  -22, -7, 
-23, -7,  -11, -9,   -5,-11,    2,-13,    2,-13,   -5,-11,  -11, -9,  -23, -7, 
  0,  0,    0,  0,    0,  0,    0,  0,    0,  0,    0,  0,    0,  0,    0,  0, 

KNIGHT
-120, -15,  -21,-10,  -10, -5,   -6, -2,   -6, -2,  -10, -5,  -21,-10, -120,-15, 
 -16, -8,     0, -1,   11,  3,   15,  5,   15,  5,   11,  3,    0, -1,  -16, -8, 
  -7, -3,     9,  3,   20,  8,   24, 10,   24, 10,   20,  8,    9,  3,   -7, -3, 
  -5, -4,    11,  1,   22,  6,   26, 10,   26, 10,   22,  6,   11,  1,   -5, -4, 
 -11, -6,     5, -1,   16,  4,   20,  8,   20,  8,   16,  4,    5, -1,  -11, -6, 
 -20,-10,    -4, -4,    7,  1,   11,  3,   11,  3,    7,  1,   -4, -4,  -20,-10, 
 -36,-15,   -20, -8,   -9, -4,   -5, -2,   -5, -2,   -9, -4,  -20, -8,  -36,-15, 
 -58,-22,   -42,-17,  -31,-12,  -27, -9,  -27, -9,  -31,-12,  -42,-17,  -58,-22, 

BISHOP
 -2,  0,   -3, -1,   -6, -2,   -8, -2,   -8, -2,   -6, -2,   -3, -1,   -2,  0, 
 -3, -1,    3,  1,    0,  0,   -2,  0,   -2,  0,    0,  0,    3,  1,   -3, -1, 
 -6, -2,    0,  0,    7,  3,    6,  2,    6,  2,    7,  3,    0,  0,   -6, -2, 
 -8, -2,   -2,  0,    6,  2,   15,  5,   15,  5,    6,  2,   -2,  0,   -8, -2, 
 -8, -2,   -2,  0,    6,  2,   15,  5,   15,  5,    6,  2,   -2,  0,   -8, -2, 
 -6, -2,    0,  0,    7,  3,    6,  2,    6,  2,    7,  3,    0,  0,   -6, -2, 
 -3, -1,    3,  1,    0,  0,   -2,  0,   -2,  0,    0,  0,    3,  1,   -3, -1, 
 -7,  0,   -8, -1,  -11, -2,  -13, -2,  -13, -2,  -11, -2,   -8, -1,   -7,  0, 

ROOK
 -4, -2,    0, -2,    4, -2,    8, -2,    8, -2,    4, -2,    0, -2,   -4, -2, 
 -4,  1,    0,  1,    4,  1,    8,  1,    8,  1,    4,  1,    0,  1,   -4,  1, 
 -4,  1,    0,  1,    4,  1,    8,  1,    8,  1,    4,  1,    0,  1,   -4,  1, 
 -4,  1,    0,  1,    4,  1,    8,  1,    8,  1,    4,  1,    0,  1,   -4,  1, 
 -4,  0,    0,  0,    4,  0,    8,  0,    8,  0,    4,  0,    0,  0,   -4,  0, 
 -4,  0,    0,  0,    4,  0,    8,  0,    8,  0,    4,  0,    0,  0,   -4,  0, 
 -4,  0,    0,  0,    4,  0,    8,  0,    8,  0,    4,  0,    0,  0,   -4,  0, 
 -4,  0,    0,  0,    4,  0,    8,  0,    8,  0,    4,  0,    0,  0,   -4,  0, 

QUEEN
-11,-15,   -7,-10,   -4, -8,   -2, -7,   -2, -7,   -4, -8,   -7,-10,  -11,-15, 
 -7,-10,   -1, -5,    1, -3,    3, -2,    3, -2,    1, -3,   -1, -5,   -7,-10, 
 -4, -8,    1, -3,    5,  0,    6,  2,    6,  2,    5,  0,    1, -3,   -4, -8, 
 -2, -7,    3, -2,    6,  2,    9,  5,    9,  5,    6,  2,    3, -2,   -2, -7, 
 -2, -7,    3, -2,    6,  2,    9,  5,    9,  5,    6,  2,    3, -2,   -2, -7, 
 -4, -8,    1, -3,    5,  0,    6,  2,    6,  2,    5,  0,    1, -3,   -4, -8, 
 -7,-10,   -1, -5,    1, -3,    3, -2,    3, -2,    1, -3,   -1, -5,   -7,-10, 
-16,-15,  -12,-10,   -9, -8,   -7, -7,   -7, -7,   -9, -8,  -12,-10,  -16,-15, 

KING
  5,-53,   10,-30,  -20,-14,  -40, -8,  -40, -8,  -20,-14,   10,-30,    5,-53, 
 15,-35,   20,-10,  -10,  2,  -30,  8,  -30,  8,  -10,  2,   20,-10,   15,-35, 
 25,-24,   30, -3,    0, 12,  -20, 18,  -20, 18,    0, 12,   30, -3,   25,-24, 
 30,-18,   35,  3,    5, 18,  -15, 27,  -15, 27,    5, 18,   35,  3,   30,-18, 
 35,-23,   40, -2,   10, 13,  -10, 22,  -10, 22,   10, 13,   40, -2,   35,-23, 
 38,-29,   43, -8,   13,  7,   -7, 13,   -7, 13,   13,  7,   43, -8,   38,-29, 
 41,-40,   46,-15,   16, -3,   -4,  3,   -4,  3,   16, -3,   46,-15,   41,-40, 
 44,-73,   49,-50,   19,-34,   -1,-28,   -1,-28,   19,-34,   49,-50,   44,-73, 
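
The (opening, endgame) pairs in the tables above are blended by game phase. A minimal sketch of that interpolation follows; the linear formula, the function name and the phase range of 0 to 24 are assumptions for illustration, not taken from Komodo or Ivanhoe.

```python
def tapered_value(opening, endgame, phase, max_phase=24):
    """Blend an (opening, endgame) pair linearly by game phase.

    phase == max_phase means a full opening position, phase == 0 a bare
    endgame. The 0..24 range is an assumed convention for this sketch.
    """
    return (opening * phase + endgame * (max_phase - phase)) / max_phase


# First KNIGHT entry from the tables above: (-120, -15).
print(tapered_value(-120, -15, 24))  # -120.0
print(tapered_value(-120, -15, 0))   # -15.0
print(tapered_value(-120, -15, 12))  # -67.5
```
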

In the evaluation function, the value of a pawn in Komodo is 1000.
I'm pretty sure that it's 100 in Ivanhoe and that the tables represent
their true values. If anyone knows differently please let me know. My
goal was to create a more or less drop in replacement table for Komodo
but without unduly obsessing about getting it perfect.

So to get the equivalent tables I multiplied the Ivanhoe values by 10.
I believe this gives me piece square tables that are exactly like the
Ivanhoe tables. In Komodo I use tables that can be adjusted by a
multiplier so I made a version also that uses the Komodo multipliers
as a second reference point.
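
The rescaling step can be sketched as below. The factor of 10 comes from the pawn values quoted above (1000 versus 100); the `multiplier` parameter is only a stand-in for Komodo's adjustable multipliers, whose real values are not given here, and the function name is made up for this example.

```python
# Second row of the PAWN table above, as (opening, endgame) pairs.
IVANHOE_PAWN_ROW = [(-17, -2), (-5, -4), (1, -6), (8, -8),
                    (8, -8), (1, -6), (-5, -4), (-17, -2)]


def scale_row(row, unit=10, multiplier=1.0):
    """Convert Ivanhoe-scale pairs (pawn = 100) to Komodo scale (pawn = 1000).

    `multiplier` is a hypothetical extra weight, not an actual Komodo setting.
    """
    return [(round(o * unit * multiplier), round(e * unit * multiplier))
            for o, e in row]


print(scale_row(IVANHOE_PAWN_ROW)[:2])  # [(-170, -20), (-50, -40)]
```
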

I also made a third version that is not based on Ivanhoe - I took the
mobility values in Komodo and reduced them to 1/10 of their current
values. The idea here is to see if it's relatively simple to change
the evaluation function in a way that easily defeats the similarity
test. This was something I could do quickly and easily that should
have a relatively large impact on the evaluation function and yet not
completely cripple Komodo.

For reference I threw in a couple of Critter versions, Stockfish
versions and Komodo 3. So the question that comes to mind is this:
did I succeed in creating a program that plays just like Ivanhoe by
simply copying their piece square tables? Strangely enough the answer
seems to be that it had no impact whatsoever. At the end I will explain
why this should be no big surprise.


sim version 3
------ IvanHoe 9.47b x64 (time: 100 ms scale: 1.0) ------
62.09 Critter 1.4 64-bit SSE4 (time: 100 ms scale: 1.0)
61.58 Critter 1.6 64-bit (time: 100 ms scale: 1.0)
53.12 Komodo 4476.03 64-bit (time: 100 ms scale: 1.0)
52.40 Komodo 3 (time: 100 ms scale: 1.0)
52.33 Komodo with Ivanhoe tables x 10 (time: 100 ms scale: 1.0)
51.77 Stockfish 2.0 JA 64bit (time: 100 ms scale: 1.0)
51.53 Komodo with 1/10 mobility (time: 100 ms scale: 1.0)
51.02 Stockfish 2.2.2 JA (time: 100 ms scale: 1.0)
49.79 Komodo with Ivanhoe tables (time: 100 ms scale: 1.0)


When comparing the reference development version it is interesting to
see if this major table change affects Komodo's similarity to itself.
That is useful because we can imagine a cloner starting with Komodo RE
source and then trying to change the tables in order to defeat the
similarity tool:

sim version 3
------ Komodo 4476.03 64-bit (time: 100 ms scale: 1.0) ------
67.08 Komodo with Ivanhoe tables x 10 (time: 100 ms scale: 1.0)
63.15 Komodo 3 (time: 100 ms scale: 1.0)
61.45 Komodo with Ivanhoe tables (time: 100 ms scale: 1.0)
60.57 Komodo with 1/10 mobility (time: 100 ms scale: 1.0)
53.84 Critter 1.6 64-bit (time: 100 ms scale: 1.0)
53.63 Critter 1.4 64-bit SSE4 (time: 100 ms scale: 1.0)
53.12 IvanHoe 9.47b x64 (time: 100 ms scale: 1.0)
51.06 Stockfish 2.0 JA 64bit (time: 100 ms scale: 1.0)
50.87 Stockfish 2.2.2 JA (time: 100 ms scale: 1.0)

Here is the reason this should not surprise anyone:

Komodo has roughly 200 evaluation terms, many of them configurable and
some of them hard coded into the program. You could count the piece
square tables as 64 x 6 x 2 = 768 different evaluation terms and thus
view them as a major component. But if you look at the code of any of
the programs that build their tables by construction, you see that in
reality they represent a minor subset of the terms. How minor depends
on the program itself, on the complexity of the evaluation function,
and on how much any given term impacts the play in general. So to get
a more realistic picture of the impact of piece square tables you have
to realize that an equivalent program can be constructed without piece
square tables, with the same knowledge represented by just a small
handful of evaluation terms.

Here is an example to make that clear. Suppose I wanted to have a term
in the program which penalized having a knight on the edge of the
board. A single term could express that concept and it could be
computed easily. The piece square table is just a particularly
efficient way to express that concept and other basic concepts such as
piece centralization.
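
To illustrate the knight-on-the-edge example, here is a small sketch showing that a single evaluation term and a 64-entry table built from it give the same score. The penalty value of -20, the square numbering (a1 = 0, h8 = 63) and all names are assumptions for this example, not code from any engine.

```python
KNIGHT_EDGE_PENALTY = -20  # hypothetical weight for the single term


def on_edge(square):
    """True if the square (0..63, a1 = 0) lies on the rim of the board."""
    file, rank = square % 8, square // 8
    return file in (0, 7) or rank in (0, 7)


def edge_term(knight_squares):
    """The concept written as one explicit evaluation term."""
    return sum(KNIGHT_EDGE_PENALTY for sq in knight_squares if on_edge(sq))


# The same concept baked into a piece square table "by construction".
KNIGHT_PST = [KNIGHT_EDGE_PENALTY if on_edge(sq) else 0 for sq in range(64)]


def pst_term(knight_squares):
    """The concept looked up from the table."""
    return sum(KNIGHT_PST[sq] for sq in knight_squares)


knights = [1, 18]  # b1 and c3
print(edge_term(knights), pst_term(knights))  # -20 -20
```

The table has 64 entries but carries exactly one tunable number, which is the point made above about counting terms.
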

So it should come as no surprise that the piece square tables do not
represent a major component of the evaluation function. This is a myth
that needs to be debunked.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
velmarin
Posts: 1600
Joined: Mon Feb 21, 2011 9:48 am

Re: Similarity tool myth - debunked.

Post by velmarin »

Mr Don,
You are quite right.
Your tool is very good in my opinion; the pity is that we keep going over these things.

We have had two calm weeks on "the origins of the engine", and that is much appreciated.

Your tool has done very well in my tests.
Dan Honeycutt
Posts: 5258
Joined: Mon Feb 27, 2006 4:31 pm
Location: Atlanta, Georgia

Re: Similarity tool myth - debunked.

Post by Dan Honeycutt »

Don wrote:So it should come as no surprise that the piece square tables do not represent a major component of the evaluation function. This is a myth that needs to be debunked.
I think the correct statement is that piece square tables do not represent a major component in Komodo's evaluation function. For engines with far fewer than your ~200 evaluation terms, the PSTs likely represent a much more significant component.

Best
Dan H.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity tool myth - debunked.

Post by Don »

Dan Honeycutt wrote:
Don wrote:So it should come as no surprise that the piece square tables do not represent a major component of the evaluation function. This is a myth that needs to be debunked.
I think the correct statement is that piece square tables do not represent a major component in Komodo's evaluation function. For engines with far fewer than your ~200 evaluation terms, the PSTs likely represent a much more significant component.

Best
Dan H.
Yes, that is obviously true. But I don't think that is the case for the stronger engines and right now it's these strong engines that are the focus of all the attention.

Here is really how I stand on this issue. My primary assertion is that it's very difficult to take an existing program of top 10 strength and make it play significantly differently without also making it significantly weaker. Houdart was able to make Ivanhoe significantly stronger but he failed to make it play significantly differently. And I think I have shown that it isn't just the piece square tables that he copied.

Yes, I could make major changes to the piece square tables and give the program a different playing style if I am willing to make drastic changes and use heavy weights so that they override the other evaluation features. I could double the queen side values so that the program wants to play all its pieces to the king side for example. But the primary concepts that the piece square tables capture are piece centralization and general purpose pawn advancement and perhaps a few other trivial things. These are universal concepts so the trick isn't just changing the tables, it's changing them in a way that can make it play a lot differently and yet still not be totally stupid. Presumably ALL piece square tables will value knight centralization - you are not going to get around that without weakening the program.

So if you merely substitute someone else's piece square tables it's going to capture the same concepts; the only difference will be in the details. If that can make your program play a lot differently then either your program sucks or theirs does.
Hood
Posts: 659
Joined: Mon Feb 08, 2010 12:52 pm
Location: Polska, Warszawa

Re: Similarity tool myth - debunked.

Post by Hood »

Excuse me,

I am a bit sceptical about the similarity tool from a chess point of view.

I think that in a given position there are 2-3 best moves (contrary to Tarrasch's 1 best :)) so programs of a certain level will choose the same moves.
Algorithms can be different but should lead to the same decision.

I think that tracing who has adapted what is a road to nowhere because it wastes energy and time. Let's look at the car industry.
If we designed a similarity tool for that area what would we get?
I suppose if there is a future for chess programs it is in the teaching area.

Rgds Hood.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Similarity tool myth - debunked.

Post by michiguel »

Hood wrote:Excuse me,

I am a bit sceptical about the similarity tool from a chess point of view.

I think that in a given position there are 2-3 best moves (contrary to Tarrasch's 1 best :)) so programs of a certain level will choose the same moves.
The fact is, they don't.

For the same reason you mention, there are 2-3 suitable moves in many positions to pick from. Expand this over 1000 positions and a higher match means a clear similarity in move selection or style.

Miguel
Algorithms can be different but should lead to the same decision.

I think that tracing who has adapted what is a road to nowhere because it wastes energy and time. Let's look at the car industry.
If we designed a similarity tool for that area what would we get?
I suppose if there is a future for chess programs it is in the teaching area.

Rgds Hood.
Hood
Posts: 659
Joined: Mon Feb 08, 2010 12:52 pm
Location: Polska, Warszawa

Re: Similarity tool myth - debunked.

Post by Hood »

michiguel wrote: The fact is, they don't.

For the same reason you mention, there are 2-3 suitable moves in many positions to pick from. Expand this over 1000 positions and a higher match means a clear similarity in move selection or style.

Miguel
Maybe you are right but... I do not trust statistics. Is chess a lottery game?

I think creative people are wasting their energy designing detective tools. That is a role for others.
Uri Blass
Posts: 10874
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Similarity tool myth - debunked.

Post by Uri Blass »

1) I think that you showed nothing about Houdini by similarity analysis.

I remember that your similarity tool also detected that Critter is too similar to IvanHoe, when you have the source of Critter and
Richard copied only the piece square table,
so what worked for Komodo did not work for Critter and it seems that not all the top programs are the same.

2) I do not know if it is hard to change move choice without reducing the playing strength significantly, but if I try to do it then I may try to change not only the evaluation.

The idea is that out of 2 different equal moves based on deep search
I would like the program to prefer the move that at very small depth seems to be worse (usually this does not happen because the program needs to see a better move to change its mind, not an equal move).

In order to do it
I may simply tell the program to change its mind at small depth (for example depth<10) not only if it finds a better move but also when it finds a move that has the same score (and it is simple to do it by taking the score that it has for the best move and reducing it by 1).

If the program changes its mind based on a move with the same score then I may tell it to stop this behaviour, so there is a good chance that in cases where there are 2 equal moves according to the program,
my modification is going to produce a different move.
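
One possible reading of the tie-breaking idea above, as a rough sketch only: at shallow depth the acceptance threshold is the best score minus 1, so a later move with an equal score replaces the current best. The function name, the list-of-pairs input and the integer scores are assumptions for illustration, and the "stop this behaviour" step from the last paragraph is not modelled.

```python
def choose_move(scored_moves, depth, tie_depth_limit=10):
    """Pick a root move from (move, score) pairs given in search order.

    Below tie_depth_limit a move with an equal integer score also replaces
    the current best; at or above it only a strictly better move does.
    """
    best_move, best_score = scored_moves[0]
    for move, score in scored_moves[1:]:
        # Reducing the best score by 1 lets an equal score pass the test.
        threshold = best_score - 1 if depth < tie_depth_limit else best_score
        if score > threshold:
            best_move, best_score = move, score
    return best_move


moves = [("e2e4", 20), ("d2d4", 20), ("c2c4", 10)]
print(choose_move(moves, depth=5))   # d2d4
print(choose_move(moves, depth=12))  # e2e4
```
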
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity tool myth - debunked.

Post by Don »

Hood wrote:Excuse me,

I am a bit sceptical about the similarity tool from a chess point of view.

I think that in a given position there are 2-3 best moves (contrary to Tarrasch's 1 best :)) so programs of a certain level will choose the same moves.
Algorithms can be different but should lead to the same decision.
If you go by gut instinct you will often reach the wrong conclusion. What you say sounds reasonable but in fact that is not how it works.

When I first published the tester the very first objection was based on your same thinking. Two extremely strong programs were compared and found to be very similar. I think it was Robbolito and Houdini. The immediate objection was that "brilliant minds think alike", in other words we should not be surprised that 2 strong programs play the same moves most of the time.

That was trivially rebutted by simply running one of the programs 10x longer - putting it in a completely different playing class. It made only a very small difference. So it turns out that each program has its own personality and playing style which is largely independent of playing strength.

The similarity tool is not a "find the best move" test. The positions were taken randomly from a large set of grandmaster games.
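
For readers unfamiliar with this kind of test, here is a minimal sketch of how a move-agreement score could be computed. It assumes the score is simply the percentage of test positions on which two engines choose the same move; the exact formula used by the tool is not spelled out in this thread, and the move lists below are made up.

```python
def similarity(moves_a, moves_b):
    """Percentage of positions on which two engines picked the same move."""
    if not moves_a or len(moves_a) != len(moves_b):
        raise ValueError("need two non-empty move lists of equal length")
    matches = sum(a == b for a, b in zip(moves_a, moves_b))
    return 100.0 * matches / len(moves_a)


# One chosen move per test position, in the same position order.
engine_a = ["e2e4", "g1f3", "d2d4", "e1g1"]
engine_b = ["e2e4", "b1c3", "d2d4", "c2c4"]
print(similarity(engine_a, engine_b))  # 50.0
```
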


I think that tracing who has adapted what is a road to nowhere because it wastes energy and time. Let's look at the car industry.
If we designed a similarity tool for that area what would we get?
I suppose if there is a future for chess programs it is in the teaching area.

Rgds Hood.
The auto-makers jealously guard all their technology with patents and lawsuits. Try taking one of the engines from your competitor and see what happens to you. Even if you make some fix-ups to change things a bit you will be in a world of hurt.

That's not pettiness on their part either because they sink millions of dollars into the R&D and design of each engine. So if you can get away with simply taking the design for yourself you have succeeded in criminal dishonesty: they pay millions and you reap the advantages. If you steal millions of dollars what do you think should happen?
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity tool myth - debunked.

Post by Don »

Uri Blass wrote:
1) I think that you showed nothing about Houdini by similarity analysis.
But I showed more than you did. I sometimes produce studies and give results. All you do is stand by and think of ways to say no. Show me something.

I remember that your similarity tool also detected that Critter is too similar to IvanHoe, when you have the source of Critter and
Richard copied only the piece square table,
so what worked for Komodo did not work for Critter and it seems that not all the top programs are the same.
What are you talking about? I think Critter is closely patterned after Ivanhoe in more ways than just the piece square tables. But much less so than Houdini, which started out from the exact same code base as Robbolito.

What is it that worked and did not work? I don't know what you are saying here.

2) I do not know if it is hard to change move choice without reducing the playing strength significantly, but if I try to do it then I may try to change not only the evaluation.
I don't believe the sim tool is very good at detecting search similarities. If that's what you are saying then I completely agree with you.

The idea is that out of 2 different equal moves based on deep search
I would like the program to prefer the move that at very small depth seems to be worse (usually this does not happen because the program needs to see a better move to change its mind, not an equal move).

In order to do it
I may simply tell the program to change its mind at small depth (for example depth<10) not only if it finds a better move but also when it finds a move that has the same score (and it is simple to do it by taking the score that it has for the best move and reducing it by 1).

If the program changes its mind based on a move with the same score then I may tell it to stop this behaviour, so there is a good chance that in cases where there are 2 equal moves according to the program,
my modification is going to produce a different move.
Are you simply trying to outline a way to defeat the sim tool? Go ahead and do so.

The tool does whatever the tool does, no more and no less, and my only interest is scientific. I think it has already taught us one very important lesson: each program has its own personality, and that personality is not defined by its Elo. This is something that strong players who work with computers have known for a very long time but evidently is something many of us are naively unaware of.