Similarity tests

Henk · Post by **Henk** » Mon Oct 06, 2014 5:16 pm

I only test my own chess engine, Skipper, so I know I'm cheating: I copy everything from the chess programming web site. Not quite original. My own ideas never work (and neither do others'). But for me nothing works as Don Dailey told me one and a half years ago. I don't understand how other programmers are able to get Elo > 2400 without cloning.

Guenther · Post by **Guenther** » Mon Oct 06, 2014 5:38 pm

Modern Times wrote: Yes - which takes us back to:

Everyone can do what they are comfortable and happy doing.

Not really, but the tests need quite some additional work.
A few days ago I downloaded everything for simtest, but then I noticed how much work and time it still needs to get more reliable results.
Adam did a great job, that's all I can say now!

Sedat Canbaz · Post by **Sedat Canbaz** » Mon Oct 06, 2014 5:42 pm

Hello Adam,

It seems you missed my locked thread:
http://www.talkchess.com/forum/viewtopic.php?t=53960

Btw, many thanks to Talkchess moderators!!!

And now about the current issue,

As I already stated... the rule SCCT uses is not perfect, but it is the best one I've seen so far...

1) There are engines that do not respond correctly to the similarity tool.
Right... but almost all the chess engines I have tested so far respond correctly on my PCs.
Exceptions: Junior and Delfi

2) The similarity tool only sends UCI commands. In my experience, it can be difficult to properly test WB engines.
Yes... but with the Wb2UCI adapter almost all WB engines can be tested via the sim tool.

Plus... we very rarely see a WB engine that is a clone...
Nowadays it is mainly UCI engines that get copied...

3) There is a positive correlation between engine strength and similarity scores. Since there is also a positive correlation between engine strength and processor speed, different people will have different similarity measurements.
So far (according to my sim test results, +55%)
I have not seen any positive correlation between engine strength and similarity scores.
And please correct me if I am wrong (based on my published sim test results):
which original engine do I not allow, so that I would be applying a double standard in SCCT?

4) Whatever threshold used should have some statistical analysis to back it up. I do not suggest using my numbers because they are relative to the computer I used.
No problem... SCCT's sim test data can be used as the main one. Do you have any doubts??

And now let's talk a little about CCRL's superior conditions:
I can't see Ivanhoe, Heron, BabyMaster... do you plan to test them too??

Best,
Sedat

Modern Times · Post by **Modern Times** » Mon Oct 06, 2014 7:52 pm

Ivanhoe is tested by CCRL, look more closely.

Sedat Canbaz · Post by **Sedat Canbaz** » Mon Oct 06, 2014 8:51 pm

Modern Times wrote: Ivanhoe is tested by CCRL, look more closely.

Dear Ray,

Oh yes, now I see it, BRAVO!

OK... but there are still many engines that I cannot see in CCRL.
But at least... the CCRL team avoids double standards as far as possible... and that does not sound bad!!

However,
I respect that... those are CCRL's rules... and I wish all testers good luck!!

Keep up the good work!

Sedat

Adam Hair · Post by **Adam Hair** » Tue Oct 07, 2014 1:31 pm

Sedat Canbaz wrote:Hello Adam,

It seems you missed my locked thread:
http://www.talkchess.com/forum/viewtopic.php?t=53960

Btw, many thanks to Talkchess moderators!!!

And now about the current issue,

As I already stated... the rule SCCT uses is not perfect, but it is the best one I've seen so far...

1) There are engines that do not respond correctly to the similarity tool.
Right... but almost all the chess engines I have tested so far respond correctly on my PCs.
Exceptions: Junior and Delfi

I do not remember exactly which engines would either abort their search before receiving the stop command or would keep searching after receiving it. I seem to remember that Zappa, Junior, Sjeng, Booot, Ktulu, Chess Tiger, and Bison had problems of some sort. I remember having a result for Delfi (~57% match with Fruit 2.1), but I do not remember what I had to do to get it.

The best way to check if an engine is running the test correctly is to use Polyglot or wb2uci to make logs. I have found that InBetween does not work as well (the test can stall at times).

Sedat Canbaz wrote: 2) The similarity tool only sends UCI commands. In my experience, it can be difficult to properly test WB engines.
Yes... but with the Wb2UCI adapter almost all WB engines can be tested via the sim tool.

Plus... we very rarely see a WB engine that is a clone...
Nowadays it is mainly UCI engines that get copied...

Many WB engines will not stop searching and send a best move when they are sent the command to stop.

Sedat Canbaz wrote: 3) There is a positive correlation between engine strength and similarity scores. Since there is also a positive correlation between engine strength and processor speed, different people will have different similarity measurements.
So far (according to my sim test results, +55%)
I have not seen any positive correlation between engine strength and similarity scores.
And please correct me if I am wrong (based on my published sim test results):
which original engine do I not allow, so that I would be applying a double standard in SCCT?

If you take a large enough sample and plot average Elo versus similarity score, I believe that you will find that there is a correlation. If I can dig up my data, I will plot it to show you what I mean.
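A minimal sketch of how such a check could be run, assuming you already have two equal-length lists: one with an average Elo per engine (or per engine pair) and one with the matching similarity percentage. The function name `pearson_r` and its inputs are hypothetical; nothing here comes from the sim tool itself, and it is only one way to quantify "a correlation".

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    xs: e.g. average Elo values (assumed input, supplied by the tester)
    ys: e.g. similarity percentages for the same engines or pairs
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)
```

A value near +1 would support a positive correlation between strength and similarity, a value near 0 would not. With a small sample the number is noisy, which is why a large enough sample matters.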

I have not looked over your posted results. My primary objection is that 55% may be too low for your conditions.

Sedat Canbaz wrote: 4) Whatever threshold used should have some statistical analysis to back it up. I do not suggest using my numbers because they are relative to the computer I used.
No problem... SCCT's sim test data can be used as the main one. Do you have any doubts??

I only have doubts if you do not do a thorough survey of engines and do not use that information to determine a threshold percentage.
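One possible way to turn a survey into a threshold, sketched under loud assumptions: you collect similarity scores for many engine pairs believed to be unrelated, all measured on the same machine with the same settings, and place the cutoff a few standard deviations above their mean. This is not a method stated anywhere in this thread; the function name `survey_threshold` and the multiplier `k` are hypothetical choices for illustration.

```python
from statistics import mean, stdev


def survey_threshold(unrelated_scores, k=3.0):
    """Suggest a similarity cutoff from a survey of unrelated engine pairs.

    unrelated_scores: similarity percentages measured under one fixed
                      hardware/time setting (assumed input)
    k: how many sample standard deviations above the mean to place the
       cutoff (an arbitrary illustrative default, not a recommendation)
    """
    if len(unrelated_scores) < 2:
        raise ValueError("need at least two survey scores")
    return mean(unrelated_scores) + k * stdev(unrelated_scores)
```

Because the scores depend on the computer used, a cutoff computed this way would only apply to results produced under the same conditions as the survey.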

Sedat Canbaz wrote: And now let's talk a little about CCRL's superior conditions:
I can't see Ivanhoe, Heron, BabyMaster... do you plan to test them too??

Best,
Sedat

BabyMaster? Never heard of that one.

Where did I say anything about the CCRL having superior conditions?

Sedat Canbaz · Post by **Sedat Canbaz** » Tue Oct 07, 2014 6:36 pm

Dear Adam,

BabyMaster is my chess engine; I simply cloned ChessMaster, but at least I am honest!)
I hope to see more honest programmers who will tell the truth; there are not many like me ))
And it would be great if you tested my chess engine under CCRL conditions (40/40).
In blitz, I know its performance is approx. 30 Elo stronger than the original default.
I wonder... what the performance of my chess engine will be at slow time controls...

Apart from BabyMaster, which you have never heard of, be sure of this:
you will hear of many new engine releases that are based on others' work...
But at the same time, their authors will claim that their engines are original work.

This is already proven...

And be sure of this too:
the great Don Dailey's tool is not perfect at detection,
but it is the best one I've seen so far...

And I am very happy to use it... really!!
Once more I note: Don Dailey is among the greatest chess programmers!!

One more little note:
for those engines which have difficulties with the sim tool, there is another solution (as always):
- We can ask chess engine experts for help and advice.
It just needs a little work and effort from us...

Btw, I managed (via the sim tool) to test some of the engines you mentioned without problems!
And right now I am still testing the rest...

But this is also true:
I don't blame CCRL or CEGT for anything, and I have no right to...
I know very well what collective work is, and it is not so easy... that's why, congrats!
You have to strictly follow the same views as the rest of the testers on the same team...
Otherwise... for example, CCRL or CEGT would not exist...
And I fully understand both rating teams (CCRL and CEGT).

But individual rating creators
can fight for and protect originality.

Otherwise,
we will see a lot of mushrooms of the same type (similar engines with different settings).

And last,
- Once more I say: my rule is not perfect, but it is the best one!!

Feel the difference in SCCT!

Hope this helps..

Best,
Sedat

Sedat Canbaz · Post by **Sedat Canbaz** » Tue Oct 07, 2014 8:12 pm

Hello Adam,

One more thing: you helped a lot regarding the sim tool, now it's my turn.

A little advice about how to run the engines which you could not test via the sim tool,
for example Bison, Ktulu, Chess Tiger etc.:
you need to put all their files in the sim test folder (during the sim test process).

And I expect you can test them successfully too!

Hope this helps too

Sedat

Adam Hair · Post by **Adam Hair** » Tue Oct 07, 2014 9:44 pm

Sedat,

I never said that I was unable to get any results for those engines. Rather, I think that the results for these engines may be problematic. I will have to look over my data.

http://www.top-5000.nl/clone.htm

Sedat Canbaz · Post by **Sedat Canbaz** » Tue Oct 07, 2014 11:23 pm

Btw, here are the sons of the Shark (Rybka)!!

Hope this helps...

sim version 3
------ Rybka 3 (time: 100 ms scale: 1.0) ------
61.79 RobboLito 0.085g3 w32 (time: 100 ms scale: 1.0)
59.18 Elektro 1.0 (time: 100 ms scale: 1.0)
58.76 Fire 3.0 x64 (time: 100 ms scale: 1.0)
58.45 BlackMamba 2.0 x64 (time: 100 ms scale: 1.0)
57.66 Critter 1.6a 64-bit (time: 100 ms scale: 1.0)
57.65 Equinox 3.20 x64mp (time: 100 ms scale: 1.0)
57.00 Naum 4.6 (time: 100 ms scale: 1.0)
56.98 Murka 3 x64 UCI (time: 100 ms scale: 1.0)
56.48 Houdini 4 x64 (time: 100 ms scale: 1.0)
55.12 Critter 0.90 64-bit SSE4 (time: 100 ms scale: 1.0)
54.87 Rybka 1.0 Beta (time: 100 ms scale: 1.0)
53.93 Fruit 090705 Test Beta (time: 100 ms scale: 1.0)
53.69 Stockfish 2.1 JA 64bit (time: 100 ms scale: 1.0)
53.44 Stockfish 2.1 JA 64bit (time: 50 ms scale: 1.0)
53.29 Strelka 2.0 B (time: 50 ms scale: 1.0)
52.82 Komodo64 2.03 DC (time: 100 ms scale: 1.0)
52.80 Heron impossible 231113 X64 Normal mode (time: 100 ms scale: 1.0)
52.51 Stockfish 1.5 JA 64bit (time: 100 ms scale: 1.0)
52.10 Stockfish 1.7 JA 64bit (time: 100 ms scale: 1.0)
52.05 Senpai 1.0 (time: 100 ms scale: 1.0)
51.69 Stockfish 1.7.1 JA (time: 100 ms scale: 1.0)
51.21 Protector 1.7.0 (time: 100 ms scale: 1.0)
51.08 Komodo 8 64-bit (time: 100 ms scale: 1.0)
50.50 Gull 3 x64 (time: 100 ms scale: 1.0)
50.47 Glaurung 2.2 JA (time: 100 ms scale: 1.0)
50.46 Stockfish 140614 64 SSE4.2 (time: 100 ms scale: 1.0)
50.34 Deep Shredder 12 x64 (time: 100 ms scale: 1.0)
50.04 spark-1.0 (time: 100 ms scale: 1.0)
49.58 Toga II 3.0 (time: 100 ms scale: 1.0)
49.09 MinkoChess 1.3 x64 (time: 100 ms scale: 1.0)
49.00 Crafty 23.8 x64 (time: 100 ms scale: 1.0)
48.85 Chiron 2 64bit (time: 100 ms scale: 1.0)
48.54 Fruit 2.1 (time: 100 ms scale: 1.0)
48.54 Bobcat 3.25 (time: 100 ms scale: 1.0)
47.54 Daydreamer 1.75 JA (time: 100 ms scale: 1.0)
47.29 Octochess revision 5190 (time: 100 ms scale: 1.0)
47.23 Bison 9.11 (time: 100 ms scale: 1.0)
46.78 TwinFish 0.07 (time: 100 ms scale: 1.0)
46.71 cheng4 0.36a (time: 100 ms scale: 1.0)
46.55 Rodent 1.4 (build 2) (time: 100 ms scale: 1.0)
46.33 Cyrano 0.6b17 (time: 100 ms scale: 1.0)
45.78 Tornado 5.0 x64 SSE4 (time: 100 ms scale: 1.0)
45.63 Chess Tiger 2007.1 (time: 100 ms scale: 1.0)
45.59 Spike 1.4 (time: 100 ms scale: 1.0)
44.32 Vajolet 2.48 (time: 100 ms scale: 1.0)
44.26 Movei00_8_438 (time: 100 ms scale: 1.0)
43.28 EXchess v7.31b x64 (time: 100 ms scale: 1.0)
42.33 Igorrit 0.086v8_x64 (time: 100 ms scale: 1.0)
40.65 Ktulu 8 (time: 100 ms scale: 1.0)
40.25 Booot 5.2.0(64) (time: 100 ms scale: 1.0)
40.06 Zappa Mexico II (time: 100 ms scale: 1.0)
35.71 Deep Sjeng WC2008 x64 (time: 100 ms scale: 1.0)
27.40 Arasan 17.2 (time: 100 ms scale: 1.0)
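For anyone who wants to process a report like the one above, here is a rough sketch that reads the lines and lists the engines at or above a chosen percentage. It assumes every result line has the shape `score name (time: N ms scale: S)` as in this post. The function names and the choice of `>=` for the 55% rule are assumptions for illustration and are not part of the sim tool.

```python
import re

# One result line: "<score> <engine name> (time: <n> ms scale: <s>)".
# The name is matched lazily so names that contain their own parentheses,
# such as "Rodent 1.4 (build 2)", are kept whole.
LINE_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s+(.+?)\s+\(time:\s*(\d+)\s*ms\s+scale:\s*([\d.]+)\)\s*$"
)


def parse_sim_report(text):
    """Return a list of (score, engine_name) tuples from a pasted report.

    Lines that do not look like result lines (the "sim version" line and
    the "------ reference engine ------" header) are skipped.
    """
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((float(m.group(1)), m.group(2)))
    return rows


def at_or_above(rows, threshold=55.0):
    """Names of engines whose similarity score is >= threshold."""
    return [name for score, name in rows if score >= threshold]
```

Usage would be `at_or_above(parse_sim_report(report_text))` with the report pasted into `report_text`. Which threshold is appropriate is exactly the open question discussed earlier in the thread.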
