tpetzke wrote:(...) I just like the concept of property (...)
+1
Indeed!
Steve
+2
+3
I have sent my code to roughly 8-10 people privately over the five years I've worked on Myrddin. One of those ended up cloning it, literally by changing only the engine name. And this was a version that, at the time, was rated maybe 1500. So you can guess the lengths a motivated cloner will go to for an engine that's 2500+ or even 3000+.
I've spent a lot of hours on Myrddin, and who cares if it is still only about 2300? Why should somebody else benefit from all of that work practically instantly?
jm
Agreed... nobody should benefit from that unless the author allows it, and that permission should be documented.
Tennison wrote:The only changes made to reach a "<55%" similarity are completely asymmetric PSTs (based on Adam Hair's values).
If you want to see the changes, just search for "Robber" in the source files.
It has been well known for a long time that the similarity test actually measures PST matching; all other eval terms are completely irrelevant.
It's a totally unscientific thing, made to look like science.
It is not logical that other terms are completely irrelevant (perhaps they are relatively unimportant if you have crazy high values in the piece-square tables).
If you change the mobility evaluation or the king safety evaluation, you change the choice of move, so mobility and king safety should be relevant.
Also, if you change the search, you change the choice of moves, so search should be relevant too.
Note that I expect a crazy asymmetric PST to be relatively weaker at longer time controls, so some questions:
1) At what time control do you measure the 70-80 Elo difference, and what happens at a time control that is 3 times slower?
2) What happens to the similarity at a longer time control, and do you find relatively bigger similarity to Stockfish (when you compare with other engines) if you use a longer time control?
Tennison wrote:The only changes made to reach a "<55%" similarity are completely asymmetric PSTs (based on Adam Hair's values).
If you want to see the changes, just search for "Robber" in the source files.
It has been well known for a long time that the similarity test actually measures PST matching; all other eval terms are completely irrelevant.
It's a totally unscientific thing, made to look like science.
As I recall, you are the only one who has espoused the idea that the similarity test basically only measures PST matching. Are you sure that this is a position that you want to take? Yes, PSTs have a definite influence on move selection. However, it has been shown more than once (and soon, once again) that simply using the same PSTs does not make two engines highly similar. This negates the statement "All other eval terms are completely irrelevant".
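For readers who have not run the tool: at its core, a similarity tester of this kind is just a move-matching count over a fixed position set. Here is a minimal sketch of that idea in Python; the function name and the toy move lists are my own illustration, not the actual sim tool, which runs each engine on thousands of positions at a fixed time per move.

```python
def similarity(moves_a, moves_b):
    """Percentage of positions on which two engines chose the same move.

    moves_a and moves_b are the best moves (UCI strings) that engines A
    and B returned for the same list of test positions.
    """
    assert len(moves_a) == len(moves_b), "one move per engine per position"
    matches = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * matches / len(moves_a)

# Toy example: two engines agreeing on 3 of 4 positions.
engine_a = ["e2e4", "g1f3", "d2d4", "c2c4"]
engine_b = ["e2e4", "g1f3", "d2d4", "b1c3"]
print(similarity(engine_a, engine_b))  # 75.0
```

This also makes the debate above concrete: anything that shifts move choice (PSTs, mobility, king safety, search) shifts this percentage, so the question is only how strongly each term does so.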
velmarin wrote:I've never seen the positions used in SIM, although it seems they can be changed.
My guess is that they are middlegame positions.
What would happen if you changed them to positions near the opening, or to endgame positions?
Larry may recall the nature of the positions. I do know that they include opening and midgame positions. Possibly early endgame, but I have not looked that closely.
Rebel wrote:And so we are witnessing the death of the similarity tester. Now that the cat is out of the bag, I can confirm Ben's findings. During the PST thread in the programmers' forum I did some experiments with the various posted PSTs and piece values, and indeed they bring the similarity percentage down dreadfully without too much Elo loss (20-30).
So folks be aware, cloners will find out anyway.
Still, no false positives with Sim, only false negatives.
Yes. Two or three years ago I did some experiments with Fruit, and substantially changing the piece-square values would fool the test. However, it appears that the changes do not have to be as drastic as the ones I used.
lucasart wrote:At least going open source means you have nothing to hide. It still puzzles me why people develop private engines (so you can't even run the similarity test?) or closed source engines when they are hundreds of elo below the top engines. Why do they fear to show us their code?
As others have remarked, this has nothing to do with fear. My chess engine is my IP, my sweat and effort, my time away from my family, and my "baby". Turning the question around, what advantage does anybody gain if I opened the source, and why should I care?
And even if you ignore those arguments and have the source code in front of you, there are limits to what a human reader can absorb from thousands of lines of text designed primarily to function, not to convey meaning. When knowledge passes into code, it changes state; like water turned to ice, it becomes a new thing, with new properties. That's one reason why transplanting code from one program to another doesn't usually have the desired effect.
There are two types of people in the world: Avoid them both.
sim version 3
------ TwinFish 0.07 (time: 100 ms scale: 1.0) ------
52.25 Stockfish 070114 64 SSE4.2 (time: 100 ms scale: 1.0)
47.54 Houdini 4 x64 (time: 100 ms scale: 1.0)
47.51 Bouquet 1.8 x64 (time: 100 ms scale: 1.0)
47.20 Komodo TCECr 64-bit (time: 100 ms scale: 1.0)
47.17 Fire 3.0 x64 (time: 100 ms scale: 1.0)
46.71 IvanHoe-Beta 999946h6 x64 Tr (time: 100 ms scale: 1.0)
45.65 Chiron 2 64bit (time: 100 ms scale: 1.0)
44.62 Protector 1.6.0 x64 (time: 100 ms scale: 1.0)
44.11 DiscoCheck 5.2 (time: 100 ms scale: 1.0)
39.80 Fruit 2.1 (time: 100 ms scale: 1.0)
I see a difference of 4.71 between Stockfish and the second most similar engine at 100 ms:
52.25 - 47.54 = 4.71
I think search is relatively more significant at longer time controls, so I wonder whether the difference between Stockfish and the second most similar engine is bigger at 500 ms.
Maybe it is possible to use the sim test to identify suspect programs not from the single number alone, but from the finding that the 4.71 gap increases at a longer time control (note that I do not know whether the 4.71 actually goes up).