'STS' Test Suite (v2.0): Open Files and Diagonals.. Released

swami
Posts: 6535
Joined: Thu Mar 09, 2006 3:21 am

'STS' Test Suite (v2.0): Open Files and Diagonals.. Released

Post by swami » Sat Apr 04, 2009 8:17 am

Chapter 2 of the Strategic Test Suite, "Open Files and Diagonals", is now available for download.

* This test suite consists of 100 carefully selected questions on Open Files and Diagonals.

* All of the questions in this suite were thoroughly analysed by Dann Corbit with the assistance of Rybka 3/Naum/Zappa. Each question was analysed by the engines for hours.

* All the answers share some similar traits:
    (1) A Rook move to seize control of an open/semi-open file.
    (2) A Rook move to form a battery on an open/semi-open file.
    (3) A Bishop move to seize control of an open/semi-open diagonal.
    (4) A Queen move that serves the functions of both the Bishop and the Rook.

I initially selected 190 problems and sent them to Dann, of which 100 passed the criteria. Dann made the final selection and did all the analysis with the help of the engines.

Download it here: STS 2.0
http://computerchessblogger.googlepages.com/sts

Feel free to report the results from your own engine or your favourite engine. Suggestions are welcome.

Special thanks to Pedro Castro, Allard Siemelink, and Zach Wegner for testing this suite with their own engines and correcting mistakes throughout the process. Thanks also to those who reported bugs and tested V 1.0 of the test suite, called "Undermining".

Test suite release date: 4th of April, 2009
Dann Corbit/Swaminathan.

Eelco de Groot
Posts: 4131
Joined: Sun Mar 12, 2006 1:40 am
Location: Groningen

Re: 'STS' Test Suite (v2.0): Open Files and Diagonals.. Rele

Post by Eelco de Groot » Sat Apr 04, 2009 4:25 pm

Thanks Swami and Dann for part two of your testsuite!

A lot of work has gone into it, I can see. I think this suite may be a little less hard to solve reasonably well than your first one? At least I got somewhat better results with 'Open files and diagonals'; for the first test they were very poor. Now I got just over 50% with Blueberry on the Athlon :) At the same time the antivirus program decided it was a good time to start downloading and installing new virus definitions, so I think Blueberry might have gotten one more solution right with some more time!

Code: Select all


Engine: Blueberry Beta 4 DM70 Build 421 (Athlon 2009 MHz, 128 MB)
by F. Letouzey, T. Gaksch, E. de Groot
Right until now: 52 of 100  ;  20:51m
20 seconds per position, fixed time. Only first solutions counted.

         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
 -------------------------------------------------------------------------------------
   0 |   -   0   1   -   -   -   -   -   -   -   -  10   -   -   -  16   -   4   -   -
  20 |   -   -   0   3   2   -  11   2   -   0   -   -   0   1   -   0  17   -  16   0
  40 |  18   1   2   -   9   0   0   1   6   -   -   -   -   3   -   0   -   -   -   4
  60 |   -   1   -   7   -   -  15   -   3   0  14  14   -   3   3   0   0   -   2   -
  80 |   0   -   -   -   1   -   6   -  17   7   -  17   3   -   9   0   5  14   -   0

   1 sec ->  15/100
   2 sec ->  21/100
   3 sec ->  25/100
   4 sec ->  31/100
   5 sec ->  33/100
   6 sec ->  34/100
   7 sec ->  36/100
   8 sec ->  38/100
   9 sec ->  38/100
  10 sec ->  40/100
  11 sec ->  41/100
  12 sec ->  42/100
  13 sec ->  42/100
  14 sec ->  42/100
  15 sec ->  45/100
  20 sec ->  52/100
  n/s: 554.997  
  TotTime: 33:29m    SolTime: 20:51m
Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
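
The cumulative lines in the report above ("1 sec -> 15/100" through "20 sec -> 52/100") can be recomputed from the solve-time grid. The following is a minimal Python sketch, not the GUI's own method. It assumes that each cell is a solve time in whole seconds, that "-" marks an unsolved position, and that a position counts at "n sec" when its recorded time is below n. That last rule is inferred from the numbers, and the function names are made up for illustration.

```python
# Solve times copied from the grid in the report above, read row by row.
# Assumption: each cell is a solve time in whole seconds; "-" means the
# position was not solved within the 20-second limit.
GRID = """
 -   0   1   -   -   -   -   -   -   -   -  10   -   -   -  16   -   4   -   -
 -   -   0   3   2   -  11   2   -   0   -   -   0   1   -   0  17   -  16   0
18   1   2   -   9   0   0   1   6   -   -   -   -   3   -   0   -   -   -   4
 -   1   -   7   -   -  15   -   3   0  14  14   -   3   3   0   0   -   2   -
 0   -   -   -   1   -   6   -  17   7   -  17   3   -   9   0   5  14   -   0
"""


def parse_times(grid):
    """Return one entry per position: an int (seconds) or None if unsolved."""
    return [None if cell == "-" else int(cell) for cell in grid.split()]


def solved_within(times, limit):
    """Count positions whose recorded solve time is below `limit` seconds.

    Assumption: a recorded time of 0 means "solved in under one second",
    so the "1 sec" line counts exactly the zeros in the grid.
    """
    return sum(1 for t in times if t is not None and t < limit)


times = parse_times(GRID)
for limit in (1, 5, 10, 15, 20):
    print(f"{limit:2d} sec -> {solved_within(times, limit)}/{len(times)}")
```

Under these assumptions the five printed lines match the corresponding lines of the report (15, 33, 40, 45 and 52 out of 100).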

Spock

Re: 'STS' Test Suite (v2.0): Open Files and Diagonals.. Rele

Post by Spock » Sat Apr 04, 2009 11:27 pm

Crafty 23.0 x64 4CPU
Quad Core Opteron 1352
20 secs per move

80 of 100 matching moves
04/04/2009 23:11:27, Total time: 00:35:47
Rated time: 08:25 = 505 Seconds

I wonder whether Crafty would improve on that even further if Bob came along with his 8-way Xeon.

Spock

Re: 'STS' Test Suite (v2.0): Open Files and Diagonals.. Rele

Post by Spock » Sun Apr 05, 2009 12:11 am

Bright 0.4a 4CPU
Quad Core Opteron 1352
20 secs per move

83 of 100 matching moves
05/04/2009 00:10:50, Total time: 00:34:46
Rated time: 07:17 = 437 Seconds

swami
Posts: 6535
Joined: Thu Mar 09, 2006 3:21 am

Re: 'STS' Test Suite (v2.0): Open Files and Diagonals.. Rele

Post by swami » Sun Apr 05, 2009 3:02 am

Eelco de Groot wrote:Thanks Swami and Dann for part two of your testsuite!
You're welcome, Eelco. :)
Eelco de Groot wrote: A lot of work has gone into it, I can see. I think this suite may be a little less hard to solve reasonably well than your first one?
Could be. But from my tests, it seems that some engines did better in Undermining than in Open Files and Diagonals, for example Booot and Bright. I need to re-run the same engines on the Undermining suite to find out the differences. I have conducted this (open files and diagonals) test on nearly half of the engines available. I'll post the results soon.
Eelco de Groot wrote:At least I got somewhat better results with 'Open files and diagonals'; for the first test they were very poor. Now I got just over 50% with Blueberry on the Athlon :) At the same time the antivirus program decided it was a good time to start downloading and installing new virus definitions, so I think Blueberry might have gotten one more solution right with some more time!
Well, considering it was run on an Athlon, I'd think it's indeed a good result. These test suites are mainly designed for engines rated below 2700. Authors who are developing their engine from an initial 1800 rating would find this useful. They could make tweaks, implement understanding, etc., in an attempt to improve its knowledge in specific test suites. I'd hope they could gain more Elo via tweaks.

Strategy and understanding of positional patterns play the most important role in computer chess. For humans, it has undoubtedly got to be tactics. Engines are generally considered good tacticians; what they lack is deep positional understanding. Some lack one concept of strategy while they play better in others.
Eelco de Groot wrote:

Code: Select all

Engine: Blueberry Beta 4 DM70 Build 421 (Athlon 2009 MHz, 128 MB)
by F. Letouzey, T. Gaksch, E. de Groot
Right until now: 52 of 100  ;  20:51m
20 seconds per position, fixed time. Only first solutions counted.

         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
 -------------------------------------------------------------------------------------
   0 |   -   0   1   -   -   -   -   -   -   -   -  10   -   -   -  16   -   4   -   -
  20 |   -   -   0   3   2   -  11   2   -   0   -   -   0   1   -   0  17   -  16   0
  40 |  18   1   2   -   9   0   0   1   6   -   -   -   -   3   -   0   -   -   -   4
  60 |   -   1   -   7   -   -  15   -   3   0  14  14   -   3   3   0   0   -   2   -
  80 |   0   -   -   -   1   -   6   -  17   7   -  17   3   -   9   0   5  14   -   0

   1 sec ->  15/100
   2 sec ->  21/100
   3 sec ->  25/100
   4 sec ->  31/100
   5 sec ->  33/100
   6 sec ->  34/100
   7 sec ->  36/100
   8 sec ->  38/100
   9 sec ->  38/100
  10 sec ->  40/100
  11 sec ->  41/100
  12 sec ->  42/100
  13 sec ->  42/100
  14 sec ->  42/100
  15 sec ->  45/100
  20 sec ->  52/100
  n/s: 554.997  
  TotTime: 33:29m    SolTime: 20:51m
Eelco
This is a very interesting test: from 1 sec to 20 sec, the score leaps forward considerably when the engine is given more time. I thought of running this kind of test whenever an engine update is announced on the forum. Work on the third test suite is underway. Perhaps we could release it in 2 months' time. :)

Spock

Re: 'STS' Test Suite (v2.0): Open Files and Diagonals.. Rele

Post by Spock » Sun Apr 05, 2009 11:50 am

swami wrote: I have conducted this (open files and diagonals) test on nearly half of the engines available. I'll post the results soon.
That will be very interesting :)

Meantime, another result here

Glaurung 2.2 x64 4CPU
Quad Core Opteron 1352
20 secs per move

80 of 100 matching moves
05/04/2009 12:48:26, Total time: 01:34:41
Rated time: 08:08 = 488 Seconds

bob
Posts: 20402
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: 'STS' Test Suite (v2.0): Open Files and Diagonals.. Rele

Post by bob » Sun Apr 05, 2009 3:40 pm

I am running this for fun, but I do _not_ consider these a "positional test suite". For example, in undermine #2, c5 is a tactical move. +4.0 is not the mark of a great "positional test move".

I think that a positional test suite should be one where it is irrelevant what other programs think in general; the positions should be about ideas that are actually positional in nature and where tactics don't play a role at all. For example, half of the original Kopec-Bratko test positions were pawn lever positions that were not tactical in nature...

Spock

Re: 'STS' Test Suite (v2.0): Open Files and Diagonals.. Rele

Post by Spock » Sun Apr 05, 2009 3:55 pm

I was running it just for fun :wink: Look forward to your results

bob
Posts: 20402
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: results

Post by bob » Sun Apr 05, 2009 4:05 pm

Spock wrote:I was running it just for fun :wink: Look forward to your results
Hardware: dual-socket Nehalem 2.26 GHz, 12 GB RAM.

Latest version of Crafty.

undermine:

Code: Select all

total positions searched..........         100
number right......................          96
number wrong......................           4
percentage right..................          96
percentage wrong..................           4
total nodes searched..............  4908791123
average search depth..............         6.3
nodes per second..................    14098024
total time........................        5:48
I made crafty compute until it had the correct move for three consecutive iterations, but I modified it to capture the results when it first found the "correct" move assuming it held it for 3 consecutive iterations without changing its mind.
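
The stopping rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Crafty's code. It assumes the search reports one (elapsed seconds, best move) pair per completed iteration, and the function name, the moves and the timings are all hypothetical.

```python
def first_stable_solution(iterations, correct_move, needed=3):
    """Return the time at which `correct_move` was first found, provided it
    then stayed the best move for `needed` consecutive iterations.

    `iterations` is a list of (elapsed_seconds, best_move) pairs, one per
    completed search iteration (an assumed format). Returns None if the
    move is never held for `needed` iterations in a row.
    """
    run_start = None  # time at which the current run of correct moves began
    run_length = 0    # consecutive iterations showing the correct move
    for elapsed, move in iterations:
        if move == correct_move:
            if run_length == 0:
                run_start = elapsed
            run_length += 1
            if run_length >= needed:
                return run_start
        else:
            run_length = 0  # the engine changed its mind; start over
    return None


# Hypothetical search log: c5 appears, is dropped, then holds for three iterations.
log = [(0.0, "Nf3"), (0.1, "c5"), (0.3, "Nf3"), (0.8, "c5"), (1.9, "c5"), (4.2, "c5")]
print(first_stable_solution(log, "c5"))
```

With this made-up log the credited time is 0.8 seconds: the brief appearance of c5 at 0.1 does not count because the engine abandoned it on the next iteration.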

NPS is slow here because many of them are found in zero time which screws up the NPS calculations.

Crafty doesn't understand the Re1=5 type stuff in the files/diags test, so it only counts things correct if it matches the "bm" given.
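
The difference between strict "bm" matching and crediting scored alternatives such as "Re1=5" can be illustrated with a short Python sketch. The exact layout of the alternatives in the suite file is not shown in this thread, so the comma-separated "move=points" string below is an assumption, and the moves, point values and function names are hypothetical.

```python
def parse_scored_moves(text):
    """Parse a string like "Rd1=10, Re1=5" into {"Rd1": 10, "Re1": 5}.

    rpartition splits on the last "=", so a promotion such as "e8=Q=10"
    is read as move "e8=Q" worth 10 points.
    """
    scores = {}
    for item in text.split(","):
        move, _, points = item.strip().rpartition("=")
        scores[move] = int(points)
    return scores


def strict_score(engine_move, best_move):
    """All-or-nothing scoring: only the single "bm" move counts."""
    return 1 if engine_move == best_move else 0


def graded_score(engine_move, scores):
    """Partial-credit scoring: listed alternatives earn their points."""
    return scores.get(engine_move, 0)


# Hypothetical position: best move Rd1, with two lower-valued alternatives.
alternatives = parse_scored_moves("Rd1=10, Re1=5, Qd2=3")
print(strict_score("Re1", "Rd1"))         # secondary move gets nothing here
print(graded_score("Re1", alternatives))  # but earns partial credit here
```

An engine that only compares against "bm", as described above, corresponds to `strict_score`; a tool that understands the alternatives would use something like `graded_score`.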

Code: Select all

total positions searched..........         100
number right......................          93
number wrong......................           7
percentage right..................          93
percentage wrong..................           7
total nodes searched..............  8553232076
average search depth..............         6.7
nodes per second..................    14500690
total time........................        9:49
I had it search for 1 minute max on both tests, so the ones that it didn't find took 60 seconds each. Looking at the last run, 7 wrong means 7 minutes of the total time were used on those 7 positions. Or, it took 2:49 to find the 93 correct answers. No idea how many of the "secondary" answers it got right since it doesn't use those at all.
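
The arithmetic above can be checked with a small Python sketch. It assumes every missed position used the full 60-second limit, as stated; the helper names are made up for illustration.

```python
def to_seconds(mmss):
    """Convert an "m:ss" string such as "9:49" to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds):
    """Convert seconds back to an "m:ss" string."""
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


def time_on_solved(total_time, number_wrong, limit_seconds=60):
    """Time left for the solved positions once each miss is charged the
    full search limit (assumption: every miss ran to the limit)."""
    return to_mmss(to_seconds(total_time) - number_wrong * limit_seconds)


print(time_on_solved("9:49", 7))  # files/diagonals run: 7 wrong
print(time_on_solved("5:48", 4))  # undermine run: 4 wrong
```

This reproduces the 2:49 quoted for the 93 correct answers, and the same calculation gives 1:48 for the 96 correct answers in the undermine run.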

swami
Posts: 6535
Joined: Thu Mar 09, 2006 3:21 am

Re: 'STS' Test Suite (v2.0): Open Files and Diagonals.. Rele

Post by swami » Sun Apr 05, 2009 4:06 pm

bob wrote:I am running this for fun, but I do _not_ consider these a "positional test suite". For example, in undermine #2, c5 is a tactical move. +4.0 is not the mark of a great "positional test move".

I think that a positional test suite should be one where it is irrelevant what other programs think in general; the positions should be about ideas that are actually positional in nature and where tactics don't play a role at all. For example, half of the original Kopec-Bratko test positions were pawn lever positions that were not tactical in nature...
There are some tests where high scores do indeed happen. So OK, I shall stop calling it a positional test suite; from now on I'll instead call the suites by name only: "Open Files and Diagonals" and "Undermining".

Looking forward to results from Crafty on that faster hardware you have in your possession.
Last edited by swami on Sun Apr 05, 2009 4:08 pm, edited 2 times in total.
