STS 1.0 revisited

Thomas Mayer
Posts: 385
Joined: Thu Mar 09, 2006 6:45 pm
Location: Nellmersbach, Germany

Post by Thomas Mayer »

Since I have seen that Swaminathan uses the Strategic Test Suite to make a first guess about engine strength/development, I was interested in that test suite as well and ran some engines through STS1 (maybe 2-8 will follow). I am especially interested in whether it scales somehow with engine strength. For now the picture is unclear; it might get clearer after running all the engines through the whole set. Here is a list of engines and their scores:

Core 2 Duo 3 GHz, 32-bit, one CPU used, 32 MB hash each, all Nalimov 5-men, bitbases when possible, 10 seconds per position
Using epd2wb with the database feature

Engine                    Solved  Solve-Time  CEGT-Elo  WBEC-Elo
Rybka 1.0 Beta 32-bit         90         146      2815         ?
Gandalf 5.1                   76         296    ?2600?    ?2650?
Aristarch 4.21                76         309    ?2550?    ?2620?
Zappa 1.0                     74         359      2573         ?
LambChop 10.99                74         379         ?    ?2524?
King of Kings 2.40            71         346    ?2450?    ?2410?
WildCat 2.79                  71         359         ?         ?
Gromit 3.82                   71         360         ?      2478
Little Goliath 2000 v3.9      70         371         ?         ?
Ruffian 1.0.5                 68         355      2618      2620
Nimzo 2000b                   67         383         ?         ?
Phalanx 22                    66         400         ?      2392
Bringer 1.9                   65         398         ?    ?2476?
Patzer 3.61                   62         453         ?         ?
Adam 2.9                      61         420         ?    ?2050?
Quark v2.70beta               60         467    ?2550?      2447
Beowulf 2.2                   60         473    ?2284?         ?
Horizon 4.1                   59         479    ?2300?         ?
GnuChess 4.14                 55         497    ?2207?         ?
Mint v2.3                     51         555         ?      1410
Celes 0.75c                   48         538         ?      2193
Gerbil 02                     45         595         ?      1963
PolarEngine 1.3               35         679         ?      1648

Elo ratings written as ?...? are more or less guessed, because the rating lists do not contain those exact engine versions. IMO it doesn't scale too badly, with some exceptions maybe: King of Kings & Adam look overrated, Ruffian and maybe Quark a bit underrated. But of course "undermining" might be a weakness / strong point of those engines. ...to be continued
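
One way to put a number on "does it scale" is a rank correlation between the Solved column and a rating column. The sketch below is not part of the original test run. It assumes Spearman's rho as the measure, uses only the rows that have a figure in the last (WBEC) column, and treats the guessed ?...? ratings as if they were exact, so the result is only as reliable as those guesses. The names RESULTS, average_ranks and spearman are made up for this illustration.

```python
from math import sqrt

# (engine, positions solved in STS1, WBEC rating, rating_is_a_guess)
# Copied from the table above; rows without a WBEC figure are left out.
RESULTS = [
    ("Gandalf 5.1", 76, 2650, True),
    ("Aristarch 4.21", 76, 2620, True),
    ("LambChop 10.99", 74, 2524, True),
    ("King of Kings 2.40", 71, 2410, True),
    ("Gromit 3.82", 71, 2478, False),
    ("Ruffian 1.0.5", 68, 2620, False),
    ("Phalanx 22", 66, 2392, False),
    ("Bringer 1.9", 65, 2476, True),
    ("Adam 2.9", 61, 2050, True),
    ("Quark v2.70beta", 60, 2447, False),
    ("Mint v2.3", 51, 1410, False),
    ("Celes 0.75c", 48, 2193, False),
    ("Gerbil 02", 45, 1963, False),
    ("PolarEngine 1.3", 35, 1648, False),
]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


solved = [row[1] for row in RESULTS]
elo = [row[2] for row in RESULTS]
rho = spearman(solved, elo)
print(f"Spearman rho over {len(RESULTS)} engines: {rho:.2f}")
```

A value near 1 would mean the suite orders the engines almost like the rating list does, a value near 0 that it does not. With only 14 engines and several guessed ratings this is a rough indication at best.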

Greets, Thomas
Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: STS 1.0 revisited

Post by Frank Quisinsky »

Hi Thomas,

King of Kings and Aristarch are quite strong in tactics, Gandalf 4 and 5 too (Gandalf 4.32f more so than the Gandalf 5 version). Gandalf 4.32d is the strongest of the older Gandalf versions.

The big surprise is LambChop 10.99 I think.

Best
Frank
Thomas Mayer
Posts: 385
Joined: Thu Mar 09, 2006 6:45 pm
Location: Nellmersbach, Germany

Re: STS 1.0 revisited

Post by Thomas Mayer »

Frank Quisinsky wrote:King of Kings and Aristarch are quite strong in tactics, Gandalf 4 and 5 too (Gandalf 4.32f more so than the Gandalf 5 version). Gandalf 4.32d is the strongest of the older Gandalf versions.

The big surprise is LambChop 10.99 I think.
Hi Frank,

I have no idea how much you know about the Strategic Test Suite made by Swaminathan & Dann Corbit. It definitely doesn't test tactics; at least that is not what the authors intend. Therefore I have no idea what your comment is about?!

Greets, Thomas
Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: STS 1.0 revisited

Post by Frank Quisinsky »

Hi Thomas,

that's the main problem most people have.

1. Person A says position X is strategic.
2. Person B could say ... this isn't strategic, position X is clearly tactical.

Of everything I have read in the past, this topic produces the most differing opinions:
positional knowledge, strategy and tactics. You can ask 100 people for their opinion on 10 engines and you will never find even two opinions that are the same.

It is not important to know this test.

A short look at the results and the opinion is clear for me. It need not be right, but it is right for me :-)

Best
Frank

I am sure (though I am no fortune-teller) that Gandalf 4.32d will score more points than Gandalf 5 :-)
The King of Kings results are quite clear.
Thomas Mayer
Posts: 385
Joined: Thu Mar 09, 2006 6:45 pm
Location: Nellmersbach, Germany

Re: STS 1.0 revisited

Post by Thomas Mayer »

Hi Frank,
Frank Quisinsky wrote:It is not important to know this test.

A short look at the results and the opinion is clear for me. It need not be right, but it is right for me :-)
well, as long as you don't know anything about something, you shouldn't say anything about it, no? So this post was mainly for those interested in and informed about the test suite, because they know why and for what purpose it was created. Anyway, I wonder why you are posting in CCC again; shall I quote some of your statements about CCC from the past? ;)

Greets, Thomas
Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: STS 1.0 revisited

Post by Frank Quisinsky »

Hi Thomas,

first part of your message:
If I look at tactical test suites in which the same programs take part ... the results are the same.

So I wonder, because I read "strategic".
Strategy is the most controversial item in chess.
It is very hard to say ... strategic or not.

Second part of your message:
I think the discussion between you and me should end for now.
For personal things you can send a mail.

But feel free to add some information.
Childish, sorry.

Best
Frank
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS 1.0 revisited

Post by swami »

Thanks for the STS tests, Thomas. I certainly do appreciate the results.
Thomas Mayer wrote: But of course "undermining" might be a weakness / strong point of those engines. ...to be continued
Yep. Undermining is definitely the weakness/strong point of those engines.

We need about 20 more STS suites in order to get ratings that are close to the ratings obtained from engine vs engine matches.

Since we have only 8 or so test suites, I believe they test an engine's exact knowledge of the themes in those suites.
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS 1.0 revisited

Post by swami »

Thomas Mayer wrote:Since I have seen that Swaminathan uses the Strategic Test Suite to make a first guess about engine strength/development, I was interested in that test suite as well and ran some engines through STS1 (maybe 2-8 will follow). I am especially interested in whether it scales somehow with engine strength.
I test new releases/updates for 3 reasons :)

1. To see whether the new update scores better than the previous version.

2. To get a rough idea of the strength of a *newly* released engine, by comparing its total score with engines that have a similar overall score.

3. To see how far the new update has improved, where it is now placed and at which level the engine now plays.

For #3, STS accurately guessed that the latest Daydreamer currently plays at the level of Abrok, Pepito, Delphil, Gromit, Kiwi, Diablo etc.

For #2, STS gave a rough estimate for most newly released engines (for example: Jabba plays around 2100).

For #1, STS correctly guessed the strength improvement of 95% of the engine updates.
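
Reason #2 can be sketched as a nearest-neighbour lookup: take the engines whose score is closest to the new engine's score and average their ratings. This is only an illustration under assumptions, not the way the STS estimates are actually produced. As reference data it reuses the STS1 Solved counts and the unguessed WBEC ratings from Thomas's table at the top of the thread, and REFERENCES, estimate_rating and k are hypothetical names.

```python
# (engine, STS1 positions solved, WBEC rating) for the rows of Thomas's table
# whose WBEC rating is not marked as a guess.
REFERENCES = [
    ("Gromit 3.82", 71, 2478),
    ("Ruffian 1.0.5", 68, 2620),
    ("Phalanx 22", 66, 2392),
    ("Quark v2.70beta", 60, 2447),
    ("Mint v2.3", 51, 1410),
    ("Celes 0.75c", 48, 2193),
    ("Gerbil 02", 45, 1963),
    ("PolarEngine 1.3", 35, 1648),
]


def estimate_rating(solved, references=REFERENCES, k=3):
    """Return (peers, estimate) for an engine that solved `solved` positions.

    peers    -- names of the k reference engines with the closest score
    estimate -- mean rating of those k engines
    """
    if not 1 <= k <= len(references):
        raise ValueError("k must be between 1 and the number of references")
    # sorted() is stable, so equally distant engines keep their table order.
    nearest = sorted(references, key=lambda ref: abs(ref[1] - solved))[:k]
    peers = [name for name, _, _ in nearest]
    estimate = sum(rating for _, _, rating in nearest) / k
    return peers, estimate


# Hypothetical new engine that solves 64 of the STS1 positions.
peers, estimate = estimate_rating(64)
print("Plays roughly at the level of:", ", ".join(peers))
print(f"Rough rating estimate: {estimate:.0f}")
```

With so few reference engines a single outlier among the neighbours can move the estimate a lot, which fits the point above that more suites are needed before the numbers get close to match ratings.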
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS 1.0 revisited

Post by swami »

Frank Quisinsky wrote:Hi Thomas,

that's the main problem most people have.

1. Person A says position X is strategic.
2. Person B could say ... this isn't strategic, position X is clearly tactical.
Hi Frank, I tend to disagree here.

How can "Bishop vs Knight", "Square Vacancy", "Knight outposts", "Open files", "Advancement of f/g/h pawns" and others be "Tactics"?

Tactics = sacrifice, combination, mate threat, huge material gain, etc.
Anything that involves an immediate gain in material or threats of checkmate is tactics.

Strategy = gain in positional themes: pawn structure, square weakness,
long term plans, space, mobility, material superiority, threats, material evaluation, king safety.

Strategy doesn't involve huge material gain but is intended to contribute to positional gain, or in rare cases to a small material gain that may be inevitable as the result of strategic threats.
Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: STS 1.0 revisited

Post by Frank Quisinsky »

Hi Swami,

good!!
You are the first one I have read in a chess forum who comes with an explanation of "strategic".

I will give the test a first look.

I only saw the results and looked in the Excel tables I have. I saw, oops, the same programs have the same good results in a tactical test. In each of those test suites King of Kings has a better performance.

I like King of Kings very much. I made some analyses with King of Kings in the past, with Gandalf and Aristarch too.

I have nothing against test suites, but in most test suites we have to find one best move. With a best move, most people are speaking of tactics. I have never seen a good test suite without best moves.

That means one with 2 or 3 possible good moves.
I have such a chess book. I got it around 25 years ago and used it for chess computers in the past. I like this book.

Sorry, I have nothing against you ... nothing against the strategic test suite.

Best
Frank