STS Test Results

MikeB · Post by **MikeB** » Sun Jun 14, 2015 3:47 am

Latest developmental version of Crafty.

At 10 seconds per move. iMac Retina 5K, 4-core i7-4790, 4.0 GHz.

test results summary:

total positions searched..........        1300
number right......................        1056
number wrong......................         244
percentage right..................          81
percentage wrong..................          18
total nodes searched..............355918365972
average search depth..............        24.6
nodes per second..................    27160114
total time........................      218:24

Not sure how this stacks up against other programs...
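For anyone who wants to compare runs like this programmatically, here is a minimal sketch that parses a Crafty-style summary and recomputes the derived figures. It assumes that "total time" is in minutes:seconds and that the percentages are truncated rather than rounded (which would explain why 81 and 18 add up to 99); the function names are made up for illustration and are not part of Crafty.

```python
import re

SUMMARY = """\
total positions searched..........        1300
number right......................        1056
number wrong......................         244
total nodes searched..............355918365972
total time........................      218:24
"""

def parse_summary(text):
    """Return {label: raw value string} for each 'label.....value' line."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(.*?)\.{2,}\s*(\S+)\s*$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2)
    return fields

def mmss_to_seconds(value):
    """Convert 'minutes:seconds' to seconds (the time format is an assumption)."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + int(seconds)

fields = parse_summary(SUMMARY)
total = int(fields["total positions searched"])
right = int(fields["number right"])
wrong = int(fields["number wrong"])

# Integer division truncates, matching the reported 81 / 18.
pct_right = right * 100 // total
pct_wrong = wrong * 100 // total

# Approximate, because the reported total time is rounded to whole seconds.
nps = int(fields["total nodes searched"]) / mmss_to_seconds(fields["total time"])

print(pct_right, pct_wrong, round(nps))
```

The recomputed nodes per second comes out slightly different from the reported figure, which is expected if Crafty uses a more precise elapsed time internally.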

Ferdy · Post by **Ferdy** » Sun Jun 14, 2015 6:19 am

STS now has 15 themes, equivalent to 100 x 15 = 1500 positions.
Download the complete EPD here:
https://sites.google.com/site/strategictestsuite/

Or, if you process the alternative moves and point system, download the complete 15 themes from the following thread, at the bottom of the first post. There are a couple of changes in the alternative moves, and also uniform formatting in the EPD, with additional opcodes converting SAN to LAN.
http://www.talkchess.com/forum/viewtopi ... 06&t=56653

BTW, is there an option for Crafty to output moves in LAN format, like e2e4 instead of e4?

MikeB · Post by **MikeB** » Sun Jun 14, 2015 6:15 pm

Appreciate the reply. Crafty does output in LAN - option "output long". Looks like your program is for PC. If you are interested, I can try to compile a Mac version.

st 200ms - single core:

Code:

test results summary:

total positions searched..........        1500
number right......................         862
number wrong......................         638
percentage right..................          57
percentage wrong..................          42
total nodes searched..............  2055634974
average search depth..............        14.0
nodes per second..................     6852116
total time........................        5:00
White(1):

st 200ms - 4 core:

Code:

test results summary:

total positions searched..........        1500
number right......................         953
number wrong......................         547
percentage right..................          63
percentage wrong..................          36
total nodes searched..............  6717873153
average search depth..............        15.8
nodes per second..................    22392910
total time........................        5:00
White(1):

MikeB · Post by **MikeB** » Sun Jun 14, 2015 7:05 pm

st 800ms - single core:

Code:

test results summary:

total positions searched..........        1500
number right......................         979
number wrong......................         521
percentage right..................          65
percentage wrong..................          34
total nodes searched..............  8627711134
average search depth..............        16.7
nodes per second..................     7189759
total time........................       20:00
White(1):

st 800ms - 4 core:

Code:

test results summary:

total positions searched..........        1500
number right......................        1043
number wrong......................         457
percentage right..................          69
percentage wrong..................          30
total nodes searched.............. 30004379169
average search depth..............        18.9
nodes per second..................    25003649
total time........................       20:00
White(1):

Ferdy · Post by **Ferdy** » Mon Jun 15, 2015 8:27 am

MikeB wrote:
Appreciate the reply. Crafty does output in LAN - option "output long".

I just learned about it. Unfortunately, it also outputs the piece letter, as in Ng1f3. I thought it would be g1f3, as in UCI output. So in this case the tool is of no use for calculating points. Later I will implement SAN, but I need to create a new format for the EPD. Also on my list is support for the st command (decimal and integer values in seconds) for WB engines.
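As a rough illustration of the mismatch, a tool could normalize long-notation moves such as Ng1f3 to the UCI-style g1f3 by dropping the leading piece letter. The sketch below is not from the STS script; the function name is hypothetical, and the handling of separators ("-", "x"), promotion suffixes, and check marks is an assumption about what long notation might contain rather than a description of Crafty's exact output.

```python
import re

# Pattern for one long-notation move. Only the piece letter + two squares
# form (Ng1f3) is confirmed in this thread; the rest is assumed.
_LONG_MOVE = re.compile(
    r"^[KQRBN]?"            # optional piece letter, e.g. the N in Ng1f3
    r"([a-h][1-8])"         # origin square
    r"[-x]?"                # optional separator (assumed)
    r"([a-h][1-8])"         # destination square
    r"(?:=?([QRBNqrbn]))?"  # optional promotion piece (assumed)
    r"[+#]?$"               # optional check or mate marker (assumed)
)

def to_coordinate(move):
    """Return a UCI-style coordinate move, or None if the text is not recognized."""
    match = _LONG_MOVE.match(move.strip())
    if match is None:
        return None  # e.g. castling written as O-O is not handled here
    origin, dest, promo = match.groups()
    return origin + dest + (promo.lower() if promo else "")

for text in ("Ng1f3", "e2e4", "Bb1xc2+", "e7e8=Q", "O-O"):
    print(text, "->", to_coordinate(text))
```

The match is case-sensitive on purpose, so the bishop letter B is not confused with the b-file.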

MikeB wrote:
Looks like your program is for PC. If you are interested, I can try to compile a Mac version.

I am using Windows; the source is a Python script converted to an exe file. I am using Python 2.7.6. Does Python run on your system?

MikeB wrote:
st 200ms - single core:

Code:

test results summary:

total positions searched..........        1500
number right......................         862
number wrong......................         638
percentage right..................          57
percentage wrong..................          42
total nodes searched..............  2055634974
average search depth..............        14.0
nodes per second..................     6852116
total time........................        5:00
White(1):

I also ran Crafty on my system at 200ms/pos.

Code:

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Physical Cores: 4, HT: ON

Crafty v24.1 (1 cpus)

White(1): bench
Running benchmark. . .
.......
Total nodes: 194234149
Raw nodes per second: 5242487
Total elapsed time: 37.05

STS 1-15, 0.2s/pos
test results summary:

total positions searched..........        1500
number right......................         826
number wrong......................         674
percentage right..................          55
percentage wrong..................          44
total nodes searched..............  1055659387
average search depth..............        12.9
nodes per second..................     3468114
total time........................        5:04

Your machine is faster than mine. Here are some results from different engines, also at 200ms. We can only compare the BestCnt line.

Code:

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: Rodent 1.7 build 1
Hash: 128, Threads: 1, time/pos: 0.200s
Test duration: 00:03:33
Expected time to finish: 00:05:45

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     65     61     50     62     65     64     49     33     48     60     52     54     59     69     33    824
   Score    746    680    646    728    726    819    621    509    598    670    633    620    707    783    558  10044
Score(%)   74.6   68.0   64.6   72.8   72.6   81.9   62.1   50.9   59.8   67.0   63.3   62.0   70.7   78.3   55.8   67.0

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: Arasan 17.5
Hash: 128, Threads: 1, time/pos: 0.200s
Test duration: 00:05:23
Expected time to finish: 00:05:45

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     61     60     56     55     59     51     47     29     39     65     47     47     52     65     37    770
   Score    688    674    678    643    690    762    600    486    527    741    617    574    636    760    575   9651
Score(%)   68.8   67.4   67.8   64.3   69.0   76.2   60.0   48.6   52.7   74.1   61.7   57.4   63.6   76.0   57.5   64.3

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: Deuterium v14.3.34.130
Hash: 128, Threads: 1, time/pos: 0.200s
Test duration: 00:05:25
Expected time to finish: 00:05:45

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     67     61     63     57     67     67     59     43     38     69     45     60     69     63     31    859
   Score    762    699    743    687    736    842    699    573    508    775    579    687    759    730    547  10326
Score(%)   76.2   69.9   74.3   68.7   73.6   84.2   69.9   57.3   50.8   77.5   57.9   68.7   75.9   73.0   54.7   68.8

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: Gaviota v1.0
Hash: 128, Threads: 1, time/pos: 0.200s
Test duration: 00:05:04
Expected time to finish: 00:05:45

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     66     56     62     58     65     65     63     41     57     66     52     63     62     63     45    884
   Score    743    680    714    724    717    817    717    559    642    750    602    712    729    724    695  10525
Score(%)   74.3   68.0   71.4   72.4   71.7   81.7   71.7   55.9   64.2   75.0   60.2   71.2   72.9   72.4   69.5   70.2

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: Senpai 1.0
Hash: 128, Threads: 1, time/pos: 0.200s
Test duration: 00:05:44
Expected time to finish: 00:05:45

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     67     68     62     64     75     69     63     38     51     78     63     69     64     68     42    941
   Score    756    746    741    749    794    829    730    575    643    852    759    768    753    801    627  11123
Score(%)   75.6   74.6   74.1   74.9   79.4   82.9   73.0   57.5   64.3   85.2   75.9   76.8   75.3   80.1   62.7   74.2

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: Nemo SP64o 1.0.1 Beta
Hash: 128, Threads: 1, time/pos: 0.200s
Test duration: 00:05:22
Expected time to finish: 00:05:45

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     69     57     60     56     65     69     55     37     53     73     54     66     60     55     36    865
   Score    769    697    699    684    723    843    676    527    630    803    655    743    699    673    577  10398
Score(%)   76.9   69.7   69.9   68.4   72.3   84.3   67.6   52.7   63.0   80.3   65.5   74.3   69.9   67.3   57.7   69.3

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: Rhetoric 1.4.1 x64
Hash: 128, Threads: 1, time/pos: 0.200s
Test duration: 00:05:44
Expected time to finish: 00:05:45

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     61     62     63     55     66     59     56     41     53     68     45     63     63     63     39    857
   Score    697    682    721    679    711    765    683    585    632    767    591    719    744    746    639  10361
Score(%)   69.7   68.2   72.1   67.9   71.1   76.5   68.3   58.5   63.2   76.7   59.1   71.9   74.4   74.6   63.9   69.1

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: Stockfish 6 64 POPCNT
Hash: 128, Threads: 1, time/pos: 0.200s
Test duration: 00:05:20
Expected time to finish: 00:05:45

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     84     76     73     70     77     76     69     68     69     81     73     72     78     75     48   1089
   Score    874    847    818    807    821    893    787    794    765    864    795    786    842    849    708  12250
Score(%)   87.4   84.7   81.8   80.7   82.1   89.3   78.7   79.4   76.5   86.4   79.5   78.6   84.2   84.9   70.8   81.7

Comparison based on my machine. Ratings in parentheses are from CCRL 40/4.

Code:

Arasan 17.5                 770/1500 (2849)
Rodent 1.7 build 1          824/1500 (2833)
Crafty v24.1 (1 cpus)       826/1500 (No record, but v24.0 is 2801)
Rhetoric 1.4.1 x64          857/1500 (2794)
Deuterium v14.3.34.130      859/1500 (2889)
Nemo SP64o 1.0.1 Beta       865/1500 (2864)
Gaviota v1.0                884/1500 (2895)
Senpai 1.0                  941/1500 (3016)
Stockfish 6 64 POPCNT      1089/1500 (3317)

This is not really a good fit, just a rough formula for estimating a CCRL 40/4 rating from the best-move percentage over 1500 positions at 200ms/pos.
I think Crafty does fine here, as do the other engines, though the data set is small.

Code:

sts_rating = 25.212 x bestmove_percentage + 1439.4
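The post does not say how the formula was derived. As a sketch, an ordinary least-squares line through the nine engines in the comparison list above appears to land very close to it. The data pairs are copied from that list; using v24.0's 2801 for Crafty v24.1 is an assumption, and the function names are only illustrative.

```python
# (best-move count out of 1500, CCRL 40/4 rating) from the comparison list.
# Crafty v24.1 has no CCRL record; 2801 (v24.0) is used as an assumption.
DATA = [
    (770, 2849),   # Arasan 17.5
    (824, 2833),   # Rodent 1.7 build 1
    (826, 2801),   # Crafty v24.1 (rating of v24.0)
    (857, 2794),   # Rhetoric 1.4.1 x64
    (859, 2889),   # Deuterium v14.3.34.130
    (865, 2864),   # Nemo SP64o 1.0.1 Beta
    (884, 2895),   # Gaviota v1.0
    (941, 3016),   # Senpai 1.0
    (1089, 3317),  # Stockfish 6 64 POPCNT
]
TOTAL_POSITIONS = 1500

def least_squares(points):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sts_rating(bestmove_percentage, slope=25.212, intercept=1439.4):
    """Apply the rough formula from the post (defaults are the posted coefficients)."""
    return slope * bestmove_percentage + intercept

# Convert counts to percentages, then fit rating against percentage.
points = [(100.0 * count / TOTAL_POSITIONS, rating) for count, rating in DATA]
slope, intercept = least_squares(points)
print("fitted slope and intercept:", round(slope, 3), round(intercept, 1))

# Estimate for Crafty v24.1 on Ferdy's machine (826/1500 at 200ms/pos).
print("Crafty estimate:", round(sts_rating(100.0 * 826 / TOTAL_POSITIONS)))
```

With only nine data points at a single time control, the estimate should be read as a rough indication, as the post itself says.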

MikeB · Post by **MikeB** » Wed Jun 17, 2015 3:51 am

Ferdy wrote:
...the source is a python script and converted to exe file. I am using python 2.7.6. Does python run on your system?

Yes - python scripts run on Mac. I can adjust output to LAN as well (with no piece indicators).

Ferdy · Post by **Ferdy** » Wed Jun 17, 2015 8:24 am

I have implemented SAN support for WB engines, and also support for the st command with both integer and decimal values in seconds, as in the --st 0.2 --san options.
The link to the script and the new EPD format has been sent by PM. For WB engines, automatic STS rating calculation is still not implemented, but you can get the points percentage.
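The script itself was shared privately, so the following is only a guess at how options like --st 0.2 --san might be parsed, using Python's standard argparse module. The default value, help text, and function name are hypothetical.

```python
import argparse

def build_parser():
    """Build a parser for the two options named in the post (sketch only)."""
    parser = argparse.ArgumentParser(description="Hypothetical STS runner options")
    # float accepts both integer-looking ("1") and decimal ("0.2") values.
    parser.add_argument("--st", type=float, default=1.0,
                        help="search time per position in seconds (default is assumed)")
    # A plain flag: present means the engine's moves are read as SAN.
    parser.add_argument("--san", action="store_true",
                        help="expect SAN moves from the engine")
    return parser

args = build_parser().parse_args(["--st", "0.2", "--san"])
print(args.st, args.san)
```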

MikeB · Post by **MikeB** » Sat Jul 18, 2015 4:41 pm

MikeB wrote:

Code:

test results summary:

total positions searched..........        1500
number right......................        1043
number wrong......................         457
percentage right..................          69
percentage wrong..................          30
total nodes searched.............. 30004379169
average search depth..............        18.9
nodes per second..................    25003649
total time........................       20:00
White(1):

Latest developmental version:

Code:

test results summary:

total positions searched..........        1500
number right......................        1057
number wrong......................         443
percentage right..................          70
percentage wrong..................          29
total nodes searched.............. 26303312891
average search depth..............        19.2
nodes per second..................    21918879
total time........................       20:00
White(1):

A tad slower and a tad smarter.

MikeB · Post by **MikeB** » Sat Jul 18, 2015 5:21 pm

MikeB wrote:

Code:

test results summary:

total positions searched..........        1500
number right......................         862
number wrong......................         638
percentage right..................          57
percentage wrong..................          42
total nodes searched..............  2055634974
average search depth..............        14.0
nodes per second..................     6852116
total time........................        5:00
White(1):

Code:

latest developmental version:

test results summary:

total positions searched..........        1500
number right......................         887
number wrong......................         613
percentage right..................          59
percentage wrong..................          40
total nodes searched..............  2165113989
average search depth..............        14.4
nodes per second..................     7217046
total time........................        5:00
White(1)

xr_a_y · Post by **xr_a_y** » Mon Apr 23, 2018 6:16 pm

Wow, I just found these old STS results, so I decided to score Weini on the suite... well, not good news: at 5 sec per position (mean depth was around 11), Weini only gets 688/1500 right...

Split by STS pack, this gives:
1: 47 (nothing special is implemented)
2: 36 (Wow! I thought my thing on this subject was working... :'-()
3: 49 (just PSQT)
4: 46 (just something via PSQT)
5: 51 (more or less the same as in CPW)
6: 57
7: 37 (a kind of piece-trade bonus for the winning side is activated)
8: 38 (ok, pawn storm was not activated in eval...)
9: 45 (ok, pawn storm was not activated in eval...)
10: 51
11: 44 (Weini has both some king centralization via PSQT and king safety by keeping pawns in front of the king, but king tropism was not activated for this test)
12: 46 (center control by threat lookup was not activated, but the PSQT has some stuff on this subject)
13: 65
14: 52 (just by PSQT)
15: 26 (... how can I define "pointless queen exchange" in eval?)

A lot of work to do... Is it a good idea to work on evaluation based on STS?
