Uri's Challenge : TwinFish

michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Post by michiguel

bob wrote:
Rebel wrote:
bob wrote:If you want to ask me "do you believe that the test is pretty accurate, statistically?" I would answer yes. "pretty accurate" however. NOT "perfect". Do I consider it proof that two programs are clones? No. I consider it a suggestion, one that requires code inspection to actually prove the clone status. Do I consider it proof that two programs are not related? No, I consider it pretty reasonable evidence they are not, but not proof. That STILL requires code comparison.
I remember a different reasoning from you back in 2008.
Bob wrote:
CW wrote:My position is that an accused person is innocent until proven guilty.

What's yours?
Mine is the same, but the evidence has become substantial. We have the gun that killed someone. We have fingerprints on the gun. We have gunshot residue on the suspect. We have established motive. We have established opportunity. The suspect was seen entering and leaving the building during the time the victim was killed. Gunshots were heard from inside the building while the suspect was there. The suspect had the victim's blood on his clothes. All we lack is an eyewitness. But the case _still_ looks pretty bad, and people have been convicted on far less.
:wink:
If you are going to be that obtuse, you are going to have to explain your point. In the post you linked to, I said "innocent until proven guilty". Not ONE comment in this thread has said anything contrary to that statement.
If we all cooperate and leave unnecessary insults out, there will be less chance of this (or any other) thread escalating into noisy chaos.

Miguel

I said (a) the similarity tester can provide a suggestion that one program is a derivative of another, NOT "proof". (b) the similarity tester can provide a suggestion that two programs are completely different and one is not a derivative of the other, but not "proof" that they are not.

So please explain what your point is supposed to be, I have always been "innocent until proven guilty" in this clone nonsense.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Post by Laskos

bob wrote:
Laskos wrote: 1. Hundreds of studied engines mean tens of thousands of pairs to compare, and not a single false positive has appeared. So it's entirely possible that pink elephants exist, but highly unlikely.

2. It makes sense for someone to try to avoid detection and produce a false negative. The opposite, intentionally trying to make your engine more similar on Sim to another engine, would be silly. There is no incentive to do that.
Nobody mentioned "incentive". Just "the possibility it could happen." And we have nothing to prove it cannot. It just hasn't happened yet.
I only mentioned the possibility of pink elephants. By your reasoning, there is no such thing as empirical evidence (a posteriori knowledge). Also, your holy grail, "comparison of sources", _can_ fail. Not to mention that it could be many orders of magnitude harder to accomplish if the sources are not readily available.
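For scale, the "tens of thousands of pairs" follows from simple counting: comparing every engine with every other one gives n(n-1)/2 distinct pairs. A minimal Python sketch; the engine counts below are illustrative only, since the thread does not give the exact number of engines tested:

```python
from math import comb

def pair_count(n_engines: int) -> int:
    """Number of distinct engine pairs when every engine is compared with every other one."""
    return comb(n_engines, 2)  # equals n * (n - 1) / 2

# Illustrative engine counts, not the actual size of anyone's test set.
for n in (100, 200, 300):
    print(f"{n} engines -> {pair_count(n)} pairs")
```

With 200 engines this gives 19,900 pairs. Note that these pairs are not independent samples: each engine takes part in n-1 of them.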
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Post by Laskos

Laskos wrote:
bob wrote:
Laskos wrote: 1. Hundreds of studied engines mean tens of thousands of pairs to compare, and not a single false positive has appeared. So it's entirely possible that pink elephants exist, but highly unlikely.

2. It makes sense for someone to try to avoid detection and produce a false negative. The opposite, intentionally trying to make your engine more similar on Sim to another engine, would be silly. There is no incentive to do that.
Nobody mentioned "incentive". Just "the possibility it could happen." And we have nothing to prove it cannot. It just hasn't happened yet.
I only mentioned the possibility of pink elephants. By your reasoning, there is no such thing as empirical evidence (a posteriori knowledge). Also, your holy grail, "comparison of sources", _can_ fail. Not to mention that it could be many orders of magnitude harder to accomplish if the sources are not readily available.
Not to mention the Rybka case, where your holy grail produced, or came close to producing, a false positive. But that's another discussion I will not go into further.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Post by bob

michiguel wrote:
bob wrote:
Rebel wrote:
bob wrote:If you want to ask me "do you believe that the test is pretty accurate, statistically?" I would answer yes. "pretty accurate" however. NOT "perfect". Do I consider it proof that two programs are clones? No. I consider it a suggestion, one that requires code inspection to actually prove the clone status. Do I consider it proof that two programs are not related? No, I consider it pretty reasonable evidence they are not, but not proof. That STILL requires code comparison.
I remember a different reasoning from you back in 2008.
Bob wrote:
CW wrote:My position is that an accused person is innocent until proven guilty.

What's yours?
Mine is the same, but the evidence has become substantial. We have the gun that killed someone. We have fingerprints on the gun. We have gunshot residue on the suspect. We have established motive. We have established opportunity. The suspect was seen entering and leaving the building during the time the victim was killed. Gunshots were heard from inside the building while the suspect was there. The suspect had the victim's blood on his clothes. All we lack is an eyewitness. But the case _still_ looks pretty bad, and people have been convicted on far less.
:wink:
If you are going to be that obtuse, you are going to have to explain your point. In the post you linked to, I said "innocent until proven guilty". Not ONE comment in this thread has said anything contrary to that statement.
If we all cooperate and leave unnecessary insults out, there will be less chance of this (or any other) thread escalating into noisy chaos.

Miguel

What "unnecessary insult" are you talking about? Certainly not the word "obtuse"???

Here's my intended usage: "indistinctly felt or perceived, as pain or sound."

Not an insult. Your meaning/implication was just very unclear (to me).

I said (a) the similarity tester can provide a suggestion that one program is a derivative of another, NOT "proof". (b) the similarity tester can provide a suggestion that two programs are completely different and one is not a derivative of the other, but not "proof" that they are not.

So please explain what your point is supposed to be, I have always been "innocent until proven guilty" in this clone nonsense.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Post by bob

Laskos wrote:
bob wrote:
Laskos wrote: 1. Hundreds of studied engines mean tens of thousands of pairs to compare, and not a single false positive has appeared. So it's entirely possible that pink elephants exist, but highly unlikely.

2. It makes sense for someone to try to avoid detection and produce a false negative. The opposite, intentionally trying to make your engine more similar on Sim to another engine, would be silly. There is no incentive to do that.
Nobody mentioned "incentive". Just "the possibility it could happen." And we have nothing to prove it cannot. It just hasn't happened yet.
I only mentioned the possibility of pink elephants. By your reasoning, there is no such thing as empirical evidence (a posteriori knowledge). Also, your holy grail, "comparison of sources", _can_ fail. Not to mention that it could be many orders of magnitude harder to accomplish if the sources are not readily available.
If source comparison "can fail", then the similarity test is hopeless from the get-go, because source comparison is about 100x more accurate.

That is the ONLY way to be sure. Anything else has a significantly large error margin.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Post by bob

Laskos wrote:
Laskos wrote:
bob wrote:
Laskos wrote: 1. Hundreds of studied engines mean tens of thousands of pairs to compare, and not a single false positive has appeared. So it's entirely possible that pink elephants exist, but highly unlikely.

2. It makes sense for someone to try to avoid detection and produce a false negative. The opposite, intentionally trying to make your engine more similar on Sim to another engine, would be silly. There is no incentive to do that.
Nobody mentioned "incentive". Just "the possibility it could happen." And we have nothing to prove it cannot. It just hasn't happened yet.
I only mentioned the possibility of pink elephants. By your reasoning, there is no such thing as empirical evidence (a posteriori knowledge). Also, your holy grail, "comparison of sources", _can_ fail. Not to mention that it could be many orders of magnitude harder to accomplish if the sources are not readily available.
Not to mention the Rybka case, where your holy grail produced, or came close to producing, a false positive. But that's another discussion I will not go into further.
There was NO false positive there, sorry. Try some other argument.

There is even MORE evidence about that case today. Go to the Rybka forum and look at the hash code Richard Vida uncovered. False match indeed...
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Post by michiguel

bob wrote:
michiguel wrote:
bob wrote:
Rebel wrote:
bob wrote:If you want to ask me "do you believe that the test is pretty accurate, statistically?" I would answer yes. "pretty accurate" however. NOT "perfect". Do I consider it proof that two programs are clones? No. I consider it a suggestion, one that requires code inspection to actually prove the clone status. Do I consider it proof that two programs are not related? No, I consider it pretty reasonable evidence they are not, but not proof. That STILL requires code comparison.
I remember a different reasoning from you back in 2008.
Bob wrote:
CW wrote:My position is that an accused person is innocent until proven guilty.

What's yours?
Mine is the same, but the evidence has become substantial. We have the gun that killed someone. We have fingerprints on the gun. We have gunshot residue on the suspect. We have established motive. We have established opportunity. The suspect was seen entering and leaving the building during the time the victim was killed. Gunshots were heard from inside the building while the suspect was there. The suspect had the victim's blood on his clothes. All we lack is an eyewitness. But the case _still_ looks pretty bad, and people have been convicted on far less.
:wink:
If you are going to be that obtuse, you are going to have to explain your point. In the post you linked to, I said "innocent until proven guilty". Not ONE comment in this thread has said anything contrary to that statement.
If we all cooperate and leave unnecessary insults out, there will be less chance of this (or any other) thread escalating into noisy chaos.

Miguel

What "unnecessary insult" are you talking about? Certainly not the word "obtuse"???

Here's my intended usage: "indistinctly felt or perceived, as pain or sound."

Not an insult. Your meaning/implication was just very unclear (to me).
What insult? Future insults. We know how things escalate, and how they start with minor remarks about how one person perceives the other. If we all try to smooth the rough edges, the thread will stay enjoyable with very minimal effort. This is not a moderation warning of any sort, just an observation, which may be useful as prevention. Let's get back to the thread.

Miguel

I said (a) the similarity tester can provide a suggestion that one program is a derivative of another, NOT "proof". (b) the similarity tester can provide a suggestion that two programs are completely different and one is not a derivative of the other, but not "proof" that they are not.

So please explain what your point is supposed to be, I have always been "innocent until proven guilty" in this clone nonsense.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Post by Adam Hair

bob wrote:
Adam Hair wrote:
Milos wrote:
Laskos wrote:What sense do these analogies make? It's about statistics, and no false positives in hundreds of studied engines.
Again talking BS as usual.
How do you know there are no false positives???
Do you have source code of all those hundreds of studied engines?
The only thing you do when you see a high score from the BS similarity test is scream "clone". You never perform any serious analysis, look at sources, or disassemble the engines in question.
Your claims are a joke.
Several closed source engines that have the highest similarity percentages have been looked at by Mark Watkins, though some in more detail than others. The only one in question may be Fritz 11. Otherwise, there have not been any signs of a false positive. If you have any evidence of a false positive, please share it.
The problem is this:

(1) citing that "hundreds of tests have been done with no false positives found."

(2) citing that "several closed source engines have been looked at."

"hundreds" and "several" leave me in a vague state. For example, you said that several with the highest similarity were tested and found not to be similar by Mark. Doesn't that actually show that there might be a weakness in the test, if it says "they are similar" but inspection says "they are not"?
I wrote the opposite of that, though not clearly enough. The closed source engines with high similarity percentages that have been checked (to some degree) do show evaluation code similarities. Furthermore, the evaluation code similarities have been with open source engines.
bob wrote: This is unsound reasoning. If you are suspicious of something, just saying "I have not seen an exception" doesn't really support the argument very well. If someone shows an exception, it will prove the process is flawed. Until one is shown, it is only a guesstimate of whether the process is flawed or not. How long did the search for a Higgs boson go on? Since no one had found one, did that mean such did not exist?
The lack of an exception increases the confidence in the test. I tested a large number of engines, in part, to find an exception. Based on the statistical assumptions that I have been using, if I test enough engines I should find an exception. This is why I judged 60% to be a proper threshold (for my data) for judging whether or not 2 engines are related. That makes a naturally occurring exception very unlikely.
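The statistics behind "no exception found yet" can be made concrete. If a test produces zero false positives in N independent comparisons, the exact one-sided 95% upper bound on its false-positive rate p solves (1 - p)^N = 0.05, which is close to the familiar "rule of three" value 3/N. The Python sketch below assumes, optimistically, that the comparisons are independent; because every engine appears in many pairs, the effective N is smaller than the raw pair count. The function names and trial counts are illustrative, not taken from the thread.

```python
def zero_event_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """Exact upper bound on an event rate p after observing 0 events in n_trials
    independent trials: the largest p with (1 - p) ** n_trials >= 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)

def rule_of_three(n_trials: int) -> float:
    """Common approximation of the 95% bound above: 3 / n."""
    return 3.0 / n_trials

# Illustrative trial counts; they are not figures reported in the thread.
for n in (100, 1_000, 19_900):
    print(n, f"{zero_event_upper_bound(n):.6f}", f"{rule_of_three(n):.6f}")
```

For 19,900 independent comparisons both values come out near 0.015%: small, but a bound of this kind never shows that the false-positive rate is zero.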
bob wrote: If you want to ask me "do you believe that the test is pretty accurate, statistically?" I would answer yes. "pretty accurate" however. NOT "perfect". Do I consider it proof that two programs are clones? No. I consider it a suggestion, one that requires code inspection to actually prove the clone status. Do I consider it proof that two programs are not related? No, I consider it pretty reasonable evidence they are not, but not proof. That STILL requires code comparison.
It is not enough to definitively declare that an engine is a derivative. Such a declaration does require code inspection, though that process is more subjective than has been admitted. However, the similarity test can determine clones.
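As a rough illustration of what a similarity percentage measures, the sketch below assumes that the tester gives both engines the same set of positions and reports the share of positions on which they pick the same move. The function is hypothetical and is not the tool's actual code; the move lists are made-up examples.

```python
def similarity_percent(moves_a: list[str], moves_b: list[str]) -> float:
    """Percentage of positions on which two engines chose the same move.

    Assumes both lists are aligned position by position and use the same
    move notation (here: coordinate notation such as "e2e4")."""
    if len(moves_a) != len(moves_b):
        raise ValueError("both engines must be scored on the same positions")
    if not moves_a:
        raise ValueError("at least one position is required")
    matches = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * matches / len(moves_a)

# Hypothetical best moves from two engines on the same four test positions.
engine_x = ["e2e4", "g1f3", "d7d5", "c2c4"]
engine_y = ["e2e4", "b1c3", "d7d5", "c2c4"]
print(similarity_percent(engine_x, engine_y))  # 75.0
```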
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Post by Adam Hair

Laskos wrote:
Uri Blass wrote:
The incentive is to have a stronger engine, and as I remember it, the programmer of Naum already did this with Rybka, by his own account (I think he did it with Rybka 2.3.2a, but I am not sure about the exact version).

I think that we can at least agree that high similarity is not something that can happen by accident, and that such an engine is derived either from the code or from the output of another engine.
I don't believe this Naum story. Try to optimize an engine's strength with a test suite instead of pure Elo-based tests: you will get a weaker engine. Tuning an engine to play these neutral Sim positions the way a stronger engine does, when your own engine has a very different eval, will only wreck it. There are hundreds of parameters to tune; it would be a miracle if a completely different eval could be tuned toward the same parameters and still produce a stronger engine.
If you trace the changes in Naum with the similarity tool, you will see a relatively unique engine (v1.91, v2.0) start displaying increased similarity with Strelka 2.0B (v2.1, v2.2), then high similarity with Rybka 2.x (v3.1, v4.2). I do believe that Strelka was essential for tuning Naum to Rybka.

  Key:

  1) Fruit 2.1 (time: 290 ms  scale: 1.0)
  2) Naum 1.91 (time: 502 ms  scale: 1.0)
  3) Naum 2.0 (time: 290 ms  scale: 1.0)
  4) Naum 2.1 (time: 217 ms  scale: 1.0)
  5) Naum 2.2 (time: 180 ms  scale: 1.0)
  6) Naum 3.1 (time: 114 ms  scale: 1.0)
  7) Naum 4.2 (time: 58 ms  scale: 1.0)
  8) Rybka 1.0 Beta (time: 171 ms  scale: 1.0)
  9) Rybka 1.1 (time: 121 ms  scale: 1.0)
 10) Rybka 1.2f (time: 114 ms  scale: 1.0)
 11) Rybka 2.1o (time: 116 ms  scale: 1.0)
 12) Rybka 2.2n2 (time: 76 ms  scale: 1.0)
 13) Rybka 2.3.2a (time: 60 ms  scale: 1.0)
 14) Strelka 2.0 B (time: 114 ms  scale: 1.0)
 15) Thinker 5.4c Inert (time: 102 ms  scale: 1.0)

         1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
  1.  ----- 48.51 47.32 53.12 52.33 54.38 54.92 55.75 56.00 55.32 55.00 55.16 56.71 57.60 53.82
  2.  48.51 ----- 68.44 51.30 51.75 44.53 46.03 45.63 46.10 46.53 46.04 45.61 47.48 48.35 46.54
  3.  47.32 68.44 ----- 52.49 53.69 43.93 45.36 45.31 45.62 45.42 44.93 45.44 46.97 46.78 45.91
  4.  53.12 51.30 52.49 ----- 71.11 54.33 55.54 54.81 56.06 55.29 55.06 55.41 54.92 57.71 53.56
  5.  52.33 51.75 53.69 71.11 ----- 53.30 54.99 53.74 54.71 54.45 53.65 54.75 53.92 56.55 52.76
  6.  54.38 44.53 43.93 54.33 53.30 ----- 67.01 59.48 65.06 67.84 68.66 63.22 60.33 61.51 57.48
  7.  54.92 46.03 45.36 55.54 54.99 67.01 ----- 60.37 64.20 64.25 64.13 64.54 62.01 62.99 58.24
  8.  55.75 45.63 45.31 54.81 53.74 59.48 60.37 ----- 67.30 65.32 64.72 65.25 62.19 68.52 58.76
  9.  56.00 46.10 45.62 56.06 54.71 65.06 64.20 67.30 ----- 73.61 72.51 69.66 65.11 68.57 60.34
 10.  55.32 46.53 45.42 55.29 54.45 67.84 64.25 65.32 73.61 ----- 87.14 71.78 66.16 66.58 60.67
 11.  55.00 46.04 44.93 55.06 53.65 68.66 64.13 64.72 72.51 87.14 ----- 72.07 64.91 65.96 59.86
 12.  55.16 45.61 45.44 55.41 54.75 63.22 64.54 65.25 69.66 71.78 72.07 ----- 66.39 66.31 59.84
 13.  56.71 47.48 46.97 54.92 53.92 60.33 62.01 62.19 65.11 66.16 64.91 66.39 ----- 65.19 59.59
 14.  57.60 48.35 46.78 57.71 56.55 61.51 62.99 68.52 68.57 66.58 65.96 66.31 65.19 ----- 63.26
 15.  53.82 46.54 45.91 53.56 52.76 57.48 58.24 58.76 60.34 60.67 59.86 59.84 59.59 63.26 -----
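To apply a cut-off such as Adam's 60% to a matrix in this layout, a small parser is enough. The helper below is hypothetical and not part of the similarity tool, and the 60.0 default only mirrors the threshold Adam describes as suited to his own data. For brevity it is run on four engines copied from the table above and renumbered 1 to 4 (Fruit 2.1, Naum 4.2, Rybka 2.2n2, Strelka 2.0 B).

```python
from itertools import combinations

def parse_matrix(text: str) -> dict[tuple[int, int], float]:
    """Parse rows like ' 1.  ----- 54.92 ...' into {(row, col): percent}.
    The diagonal ('-----') is skipped."""
    sim = {}
    for line in text.strip().splitlines():
        fields = line.split()
        row = int(fields[0].rstrip("."))
        for col, value in enumerate(fields[1:], start=1):
            if value != "-----":
                sim[(row, col)] = float(value)
    return sim

def pairs_at_or_above(sim, names, threshold=60.0):
    """Return (engine, engine, percent) for each unordered pair >= threshold,
    highest similarity first."""
    hits = []
    for i, j in combinations(sorted(names), 2):
        if sim[(i, j)] >= threshold:
            hits.append((names[i], names[j], sim[(i, j)]))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)

# Four engines copied from the table above, renumbered 1-4.
SUBSET = """
 1.  ----- 54.92 55.16 57.60
 2.  54.92 ----- 64.54 62.99
 3.  55.16 64.54 ----- 66.31
 4.  57.60 62.99 66.31 -----
"""
NAMES = {1: "Fruit 2.1", 2: "Naum 4.2", 3: "Rybka 2.2n2", 4: "Strelka 2.0 B"}

for a, b, pct in pairs_at_or_above(parse_matrix(SUBSET), NAMES):
    print(f"{a} vs {b}: {pct:.2f}%")
```

Feeding it the full 15-engine matrix works the same way; the row and column numbers then match the Key list.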
Uri Blass
Posts: 10321
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Post by Uri Blass

Laskos wrote:
Uri Blass wrote:
The incentive is to have a stronger engine, and as I remember it, the programmer of Naum already did this with Rybka, by his own account (I think he did it with Rybka 2.3.2a, but I am not sure about the exact version).

I think that we can at least agree that high similarity is not something that can happen by accident, and that such an engine is derived either from the code or from the output of another engine.
I don't believe this Naum story. Try to optimize an engine's strength with a test suite instead of pure Elo-based tests: you will get a weaker engine. Tuning an engine to play these neutral Sim positions the way a stronger engine does, when your own engine has a very different eval, will only wreck it. There are hundreds of parameters to tune; it would be a miracle if a completely different eval could be tuned toward the same parameters and still produce a stronger engine.
It is not clear to me that tuning, when you start from a different eval, is going to make your engine weaker.

Programmers usually do not work this way, so we have no evidence about it, and I guess it may depend on the evaluation you start from.