STS 1.0 revisited
Posted: Thu Jan 07, 2010 6:17 pm
by Thomas Mayer
Since I have seen that Swaminathan uses the Strategic Test Suite to make a first guess about engine strength/development, I got interested in that test suite as well and ran some engines through STS1 (maybe 2-8 will follow). I am especially interested in whether it scales somehow with engine strength. For now the picture is unclear; maybe it will get clearer after running all the engines through the whole set. Here is a list of engines and their scores:
Core 2 Duo 3 GHz, 32-bit, one CPU used, 32 MB hash each, all Nalimov 5-men, bitbases when possible, 10 seconds per position
Using epd2wb with the database feature

Engine                    Solved  Solve-Time  CEGT-Elo  WBEC-Elo
Rybka 1.0 Beta 32-bit     90      146         2815      ?
Gandalf 5.1               76      296         ?2600?    ?2650?
Aristarch 4.21            76      309         ?2550?    ?2620?
Zappa 1.0                 74      359         2573      ?
LambChop 10.99            74      379         ?         ?2524?
King of Kings 2.40        71      346         ?2450?    ?2410?
WildCat 2.79              71      359         ?         ?
Gromit 3.82               71      360         ?         2478
Little Goliath 2000 v3.9  70      371         ?         ?
Ruffian 1.0.5             68      355         2618      2620
Nimzo 2000b               67      383         ?         ?
Phalanx 22                66      400         ?         2392
Bringer 1.9               65      398         ?         ?2476?
Patzer 3.61               62      453         ?         ?
Adam 2.9                  61      420         ?         ?2050?
Quark v2.70beta           60      467         ?2550?    2447
Beowulf 2.2               60      473         ?2284?    ?
Horizon 4.1               59      479         ?2300?    ?
GnuChess 4.14             55      497         ?2207?    ?
Mint v2.3                 51      555         ?         1410
Celes 0.75c               48      538         ?         2193
Gerbil 02                 45      595         ?         1963
PolarEngine 1.3           35      679         ?         1648
Elo ratings written as ?...? are more or less guessed, because the exact engine version is not in the rating lists. IMO it doesn't scale too badly, with some exceptions maybe: King of Kings & Adam look overrated, Ruffian and maybe Quark a bit underrated. But of course "undermining" might be a weakness / strong point of those engines. ...to be continued
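One way to put a number on "does it scale with engine strength" is a rank correlation between the solved count and the rating. The following is only a minimal sketch: it uses just the eight engines whose WBEC rating in the list above is not marked as guessed, the helper names (`average_ranks`, `pearson`, `spearman`) are made up for this illustration, and with so few data points the result is a rough indication at best.

```python
# Engines from the list above with an exact (non-guessed) WBEC rating:
# (engine, positions solved in STS1, WBEC Elo)
WBEC_EXACT = [
    ("Gromit 3.82", 71, 2478),
    ("Ruffian 1.0.5", 68, 2620),
    ("Phalanx 22", 66, 2392),
    ("Quark v2.70beta", 60, 2447),
    ("Mint v2.3", 51, 1410),
    ("Celes 0.75c", 48, 2193),
    ("Gerbil 02", 45, 1963),
    ("PolarEngine 1.3", 35, 1648),
]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


solved = [row[1] for row in WBEC_EXACT]
elo = [row[2] for row in WBEC_EXACT]
rho = spearman(solved, elo)
print(f"Spearman rank correlation (n={len(solved)}): {rho:.2f}")
```

A value near 1 would mean the STS1 ordering mostly matches the rating-list ordering; the guessed ratings are left out on purpose so they cannot push the result either way.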
Greets, Thomas
Re: STS 1.0 revisited
Posted: Thu Jan 07, 2010 6:23 pm
by Frank Quisinsky
Hi Thomas,
King Of Kings and Aristarch are quite strong in tactics, Gandalf 4 and 5 too (Gandalf 4.32f more than the Gandalf 5 version). Gandalf 4.32d is the strongest of the older Gandalf versions.
The big surprise is LambChop 10.99 I think.
Best
Frank
Re: STS 1.0 revisited
Posted: Thu Jan 07, 2010 6:47 pm
by Thomas Mayer
Frank Quisinsky wrote:King Of Kings and Aristarch are quite strong in tactics, Gandalf 4 and 5 too (Gandalf 4.32f more than the Gandalf 5 version). Gandalf 4.32d is the strongest of the older Gandalf versions.
The big surprise is LambChop 10.99 I think.
Hi Frank,
I have no idea how much you know about the Strategic Test Suite made by Swaminathan & Dann Corbit. It definitely doesn't test tactics, at least that is not what the authors want. Therefore I have no idea what your comment is about ?!
Greets, Thomas
Re: STS 1.0 revisited
Posted: Thu Jan 07, 2010 7:03 pm
by Frank Quisinsky
Hi Thomas,
that's the main problem most people have.
1. Person A says position X is strategic.
2. Person B could say ... this isn't strategic, position X is clearly tactical.
The most divergent opinions I have read in the past were on these topics:
positional knowledge, strategy and tactics. You can ask 100 people for their opinion on 10 engines. You will never find even two opinions that are the same.
It is not important to know this test.
A short look at the results and my opinion is clear to me. It need not be right, but it is right for me
Best
Frank
I am sure (I am not a predictor, but) Gandalf 4.32d will score more points than Gandalf 5
The King of Kings results are quite clear.
Re: STS 1.0 revisited
Posted: Thu Jan 07, 2010 8:16 pm
by Thomas Mayer
Hi Frank,
Frank Quisinsky wrote:It is not important to know this test.
A short look at the results and my opinion is clear to me. It need not be right, but it is right for me
well, as long as you don't know anything about something you shouldn't say anything about it, no ? So this post was mainly for those interested in and informed about the test suite, because they know why and for what purpose it was created. Anyway, I wonder why you post again in CCC, shall I quote some of your statements about CCC from the past ?
Greets, Thomas
Re: STS 1.0 revisited
Posted: Thu Jan 07, 2010 8:29 pm
by Frank Quisinsky
Hi Thomas,
first part of your message:
If I look at tactical test suites in which the same programs participate ... the results are the same.
So I wondered when I read "strategic".
Strategy is the most controversial item in chess.
It is very hard to say ... strategic or not.
Second part of your message:
I think the discussion between you and me should end for now.
For personal things you can send a mail.
But feel free to add some information.
Childlike, sorry.
Best
Frank
Re: STS 1.0 revisited
Posted: Fri Jan 08, 2010 2:45 am
by swami
Thanks for the STS tests, Thomas. I certainly do appreciate the results.
Thomas Mayer wrote: But of course "undermining" might be a weakness / strong point of those engines. ...to be continued
Yep. Undermining is definitely the weakness/strong point of those engines.
We need to have 20 more STS suites in order to get ratings that are close to the ratings obtained from engine vs engine matches.
Since we have only 8 or so test suites, I believe they do test an engine's exact knowledge on those suites.
Re: STS 1.0 revisited
Posted: Fri Jan 08, 2010 2:47 am
by swami
Thomas Mayer wrote:Since I have seen that Swaminathan uses the Strategic Test Suite to make a first guess about engine strength/development, I got interested in that test suite as well and ran some engines through STS1 (maybe 2-8 will follow). I am especially interested in whether it scales somehow with engine strength.
I test new releases/updates for 3 reasons:
1. Want to see if the new update has shown better scores relative to the previous version.
2. Want to get a rough idea of the strength of a *newly* released engine,
by comparing its total score with engines that have a similar overall score.
3. Want to see how far up the new update has moved, where it is now placed and at which level the engine now plays.
For #3, STS accurately guessed that the latest Daydreamer plays currently at the level of Abrok, Pepito, Delphil, Gromit, Kiwi, Diablo etc
For #2 STS guessed the rough estimate of most newly released engines. (For example: Jabba plays around 2100)
For #1, STS guessed strength improvement of 95% of the engine updates.
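Reasons #1 and #2 boil down to two small lookups, sketched below. This is only an illustration and not the procedure actually used: the reference scores are the STS1 solved counts from Thomas's list earlier in this thread (not total scores over all suites), and the function names, the tolerance, and the example scores passed in are hypothetical.

```python
# STS1 solved counts taken from Thomas's list earlier in the thread.
STS1_SOLVED = {
    "Rybka 1.0 Beta 32-bit": 90, "Gandalf 5.1": 76, "Aristarch 4.21": 76,
    "Zappa 1.0": 74, "LambChop 10.99": 74, "King of Kings 2.40": 71,
    "WildCat 2.79": 71, "Gromit 3.82": 71, "Little Goliath 2000 v3.9": 70,
    "Ruffian 1.0.5": 68, "Nimzo 2000b": 67, "Phalanx 22": 66,
    "Bringer 1.9": 65, "Patzer 3.61": 62, "Adam 2.9": 61,
    "Quark v2.70beta": 60, "Beowulf 2.2": 60, "Horizon 4.1": 59,
    "GnuChess 4.14": 55, "Mint v2.3": 51, "Celes 0.75c": 48,
    "Gerbil 02": 45, "PolarEngine 1.3": 35,
}


def score_change(old_score, new_score):
    """Reason #1: difference in solved positions between two versions."""
    return new_score - old_score


def similar_engines(score, reference, tolerance=2):
    """Reason #2: reference engines within `tolerance` points, closest first."""
    hits = [
        (abs(ref_score - score), name, ref_score)
        for name, ref_score in reference.items()
        if abs(ref_score - score) <= tolerance
    ]
    # Sort by distance first, then by name so the order is deterministic.
    return [(name, ref_score) for _, name, ref_score in sorted(hits)]


# Hypothetical new release that solves 70 positions (previous version: 66).
print("Change vs previous version:", score_change(66, 70))
for name, ref_score in similar_engines(70, STS1_SOLVED, tolerance=1):
    print(f"{name}: {ref_score}")
```

The engines returned by `similar_engines` are the ones whose known playing level would then serve as the rough estimate for the new release.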
Re: STS 1.0 revisited
Posted: Fri Jan 08, 2010 3:05 am
by swami
Frank Quisinsky wrote:Hi Thomas,
that's the main problem most people have.
1. Person A says position X is strategic.
2. Person B could say ... this isn't strategic, position X is clearly tactical.
Hi Frank, I tend to disagree here.
How can "Bishop vs Knight", "Square Vacancy", "Knight outposts", "Open files", "Advancement of f/g/h pawns" and others be "Tactics"?
Tactics = sacrifice, combination, mate threat, huge material gain, etc.
Anything that involves immediate gain in material or threats of checkmate is tactics.
Strategy = gain in positional themes: pawn structure, square weakness,
long-term plans, space, mobility, material superiority, threats, material evaluation, king safety.
Strategy doesn't involve huge material gain but is intended to contribute to positional gain, or in rare cases a small material gain that may be inevitable as the result of strategic threats.
Re: STS 1.0 revisited
Posted: Fri Jan 08, 2010 9:33 am
by Frank Quisinsky
Hi Swami,
good !!
You are the first one I have read in a chess forum who comes with an explanation of "strategic".
I will give the test my first look.
I only saw the results and looked in the Excel tables I have. I saw, oops, the same programs have the same good results in a tactical test. In each of those test suites King of Kings has a better performance.
I like King of Kings very much. With King of Kings I made some analyses in the past, with Gandalf too, Aristarch too.
I have nothing against test suites, but in most test suites we look for one best move. With a best move, most people are speaking of tactics. I have never seen a good test suite without best moves,
meaning one with 2 or 3 possible good moves.
I have such a chess book. I got it around 25 years ago and used it for chess computers in the past; I like this book.
Sorry, I have nothing against you ... nothing against the strategic test suite.
Best
Frank