Similarity test

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mar
Posts: 2555
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Similarity test

Post by mar »

Just for fun, I ran Don's sim test on cheng4 0.36c and 0.38.
These engines are nearly identical: aside from some minor changes in 0.38, the search is exactly the same.
The most important difference is the tuned evaluation.
The result: 47.7% :)

This is in line with what I thought: the sim test is only good for comparing evaluation between engines.
What does this imply? Not much, really. Just that a high sim result alone is not enough to accuse someone of cloning
(and, consequently, a low sim percentage doesn't really mean anything either).
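
The idea behind such a move-matching test, in a minimal sketch (assuming UCI engines and the python-chess library; the engine paths, position file, and time per move are placeholders):

Code:

import chess
import chess.engine

# Minimal sketch of a move-matching similarity test: run two engines
# on the same positions at a fixed time per move and count how often
# they pick the same move.
def similarity(path_a, path_b, fens, movetime=0.1):
    a = chess.engine.SimpleEngine.popen_uci(path_a)
    b = chess.engine.SimpleEngine.popen_uci(path_b)
    matches = 0
    for fen in fens:
        board = chess.Board(fen)
        limit = chess.engine.Limit(time=movetime)
        if a.play(board, limit).move == b.play(board, limit).move:
            matches += 1
    a.quit()
    b.quit()
    return 100.0 * matches / len(fens)

fens = [line.strip() for line in open("positions.fen")]
print(f"{similarity('./cheng4-036c', './cheng4-038', fens):.1f}%")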
velmarin
Posts: 1600
Joined: Mon Feb 21, 2011 9:48 am

Re: Similarity test

Post by velmarin »

The differences between the static_eval of the two versions are tremendous.
Static eval has a great influence on the quick searches SIM runs, among other things.

I have never seen SIM's starting positions; it would be interesting to see them.

You could put Cheng 3 into the test, for fun.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Similarity test

Post by Laskos »

mar wrote: What does this imply? Not much, really. Just that a high sim result alone is not enough to accuse someone of cloning
You got a false negative. So how do you conclude anything about positives?
mar wrote: (and, consequently, a low sim percentage doesn't really mean anything either).
That's correct; false negatives happen, as Ed has already shown.
mar
Posts: 2555
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Similarity test

Post by mar »

Laskos wrote: You got a false negative. So how do you conclude anything about positives?
mar wrote: (and, consequently, a low sim percentage doesn't really mean anything either).
Laskos wrote: That's correct; false negatives happen, as Ed has already shown.
Yes, I probably cannot make any claims about positives; that was a rash conclusion.
But I still think that eval has the most impact on sim results.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Similarity test

Post by Laskos »

mar wrote: But I still think that eval has the most impact on sim results.
Yes, Don himself hinted at that when releasing Sim. And it's still a good tool anyway, as most cloners are unable to modify the eval significantly.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Similarity test

Post by Adam Hair »

Laskos wrote:
mar wrote: But I still think that eval has the most impact on sim results.
Yes, Don himself hinted at that when releasing Sim. And it's still a good tool anyway, as most cloners are unable to modify the eval significantly.
It is easy to change the sim results by changing the parameter values. But it is damn tough to fool the sim test without screwing up the strength too much.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Similarity test

Post by bob »

Adam Hair wrote:
Laskos wrote:
mar wrote: But I still think that eval has the most impact on sim results.
Yes, Don himself hinted at that when releasing Sim. And it's still a good tool anyway, as most cloners are unable to modify the eval significantly.
It is easy to change the sim results by changing the parameter values. But it is damn tough to fool the sim test without screwing up the strength too much.
This is something that needs to be tested. I've seen the statement made repeatedly, but I haven't seen anyone make a concerted effort to actually try to do it. I'm skeptical myself, because I encounter so many tuning parameters where changing them makes little difference in terms of skill, but a lot of difference in how a program plays, style-wise.

The problem is that all eval terms are general-purpose. If you think of them as a linear programming problem with an objective function, there are a BUNCH of feasible solutions that will meet the same requirements. A general-purpose piece of "knowledge" is, by definition, imperfect. I have seen cases where two different values produce no Elo change, yet change the personality of the engine quite a bit. That's all that is needed to fool a test that is based mainly on the evaluation.
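
A toy illustration of that point, under the assumption of a purely linear eval (score = feature counts dotted with weights; all data below is synthetic): moving the weight vector along a direction orthogonal to the tuning set's mean feature vector leaves the aggregate objective unchanged while scoring individual positions differently.

Code:

import numpy as np

# Toy illustration: a linear eval, score = features . weights.
# Directions orthogonal to the mean feature vector of the tuning set
# change individual scores (and thus move choices and "style") while
# leaving the average score over the set (the objective) unchanged.
rng = np.random.default_rng(1)
features = rng.integers(0, 4, size=(1000, 8)).astype(float)  # synthetic
w1 = np.array([100.0, 320.0, 330.0, 500.0, 900.0, 10.0, 5.0, 25.0])

mean_f = features.mean(axis=0)
d = rng.normal(size=8)
d -= (d @ mean_f) / (mean_f @ mean_f) * mean_f  # project out mean_f
w2 = w1 + 8.0 * d                               # a second feasible solution

print((features @ w1).mean(), (features @ w2).mean())  # ~equal objective
print(np.abs(features @ (w1 - w2)).max())              # per-position shifts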

If this test can be run on Linux, tell me how. I can test a bunch of old Crafty versions from the last year or so and see if they all show up as very similar, or if some drop below the "magic number" that some are using to claim originality. I wouldn't otherwise spend the time trying to disprove this myself, because it would take a while. But since I have lots of old versions lying around, many of which are very modest Elo improvements, if I can find a version N and N+1 that fool the similarity test yet have close Elos, that would prove this is not as hard to do as might be expected...

I believe this "false positive" argument is irrelevant. What we REALLY care about is a clone that produces a false negative, hides its origin, and gets away with it. Of course there will be false positives. But they can easily be proven false by code examination. False negatives avoid that scrutiny and represent the real problem.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Similarity test

Post by Pio »

Hi

I think it is really dangerous to accuse someone of cheating just because his engine plays moves similar to those of other engines.

There are many reasons why an engine plays similar moves to another engine:

1) I suspect that good engines play more similar moves just because they are good.

2) A person might have tuned his engine against another engine. If that is the case, I guess the moves will be more similar.

3) A person might have tuned his weights using another engine as an oracle/reference, i.e. using the other engine as a sort of trainer (see the sketch at the end of this post).

4) You can get the same result in very different ways. Bats can fly, and so can birds. Whales can swim, and so can sharks. I have not even mentioned insects.

I do not think it is wise to assume an engine is original just because it has low similarity to all the other engines. What about better/smarter cheaters who steal from many engines and take ideas from wiki pages? You might call that research, but it is still not original. I also believe that you can make small changes to the evaluation and search that lead to very big differences in the similarity test.
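
For point 3, a minimal sketch of what such oracle tuning could look like (assuming a linear eval; the features and oracle scores here are synthetic stand-ins, not from any real engine):

Code:

import numpy as np

# Sketch of "oracle" tuning: fit linear eval weights so they reproduce
# a reference engine's scores on a position set, via least squares.
# Real feature extraction and real oracle scores are omitted here.
rng = np.random.default_rng(0)
features = rng.integers(-2, 3, size=(5000, 10)).astype(float)
oracle = features @ rng.normal(scale=50.0, size=10)  # stand-in scores

weights, *_ = np.linalg.lstsq(features, oracle, rcond=None)
# An eval using these weights will tend to rank moves the way the
# oracle does, raising similarity without sharing any code with it.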
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Similarity test

Post by michiguel »

Pio wrote: Hi

I think it is really dangerous to accuse someone of cheating just because his engine plays moves similar to those of other engines.

There are many reasons why an engine plays similar moves to another engine:

1) I suspect that good engines play more similar moves just because they are good.
This is not true, and it is one of the interesting aspects of testing "similarity": strong engines play different moves in positions where there are several reasonable options. And this is a statistical test over a large number of positions.
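
A back-of-envelope sketch of why the number of positions matters (p and n below are assumed values, just for illustration):

Code:

import math

# If two unrelated engines agree by chance with probability p per
# position, the similarity measured over n positions fluctuates by
# roughly 2*sqrt(p*(1-p)/n) at ~95% confidence, so a large n makes
# the percentage a stable, meaningful statistic.
p, n = 0.5, 5000
half_width = 2 * math.sqrt(p * (1 - p) / n)
print(f"similarity ~ {100*p:.0f}% +/- {100*half_width:.1f}%")  # 50% +/- 1.4%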

Miguel

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Similarity test

Post by Laskos »

bob wrote:
I believe this "false positive" argument is irrelevant. What we REALLY care about is a clone that produces a false negative, hides its origin, and gets away with it. Of course there will be false positives. But they can easily be proven false by code examination. False negatives avoid that scrutiny and represent the real problem.
First, I have to agree with Adam by completing my sentence: "cloners have difficulty modifying the eval significantly without significantly decreasing the strength".

Then, to your point: false negatives are possible with smart cloners, perhaps even without decreasing the strength dramatically. Maybe we have to look only at false positives; so far I am unaware of any. What is an honest programmer supposed to do to end up with high similarity to another program? The poster after you presented some hypotheses.

1/ The absolute strength seems _not_ to be an issue. Strong programs do not play the same moves; it is even possible that two perfect players would not be very similar.

2/ Tuning against other programs or mimicking other programs. This was claimed by a program author. Show me how, if it's an original engine.

Getting rid of false positives IS real progress. But folks have to agree on this issue with Sim.