STS test suite and engine analysis interface

Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Post by Ferdy »

Now v3 has basic support for WinBoard (WB) engines. It estimates an STS rating only for UCI engines, not for WB engines.
For WB engines, the analysis time is set with the level command. For example, if you want the engine to analyze at around 200 ms/pos, you may use the --mps 40 --tc 0:8 command line options (40 moves in 8 seconds). Only WB engines that output moves in LAN notation, similar to "move e2e4", are supported, and only those that support the setboard and level commands. The tool also has no control over the number of threads or the hash size of WB engines. For UCI engines, you can control the hash and the threads with --hash <value> and --threads <value> respectively.
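The arithmetic behind --mps and --tc can be illustrated with a small Python sketch. The function names are hypothetical, not part of STS_Rating_v3; the sketch assumes, as in the examples in this post, that --tc is either whole minutes ("1") or mm:ss ("0:8"), and that the time per position is simply the session time divided by the moves per session.

```python
def parse_tc_seconds(tc: str) -> int:
    """Convert a --tc value to seconds: "1" is 1 minute, "0:8" is 0 min 8 s."""
    if ":" in tc:
        minutes, seconds = tc.split(":")
        return int(minutes) * 60 + int(seconds)
    return int(tc) * 60


def time_per_pos(mps: int, tc: str) -> float:
    """Approximate seconds per position: session time divided by moves per session."""
    return parse_tc_seconds(tc) / mps


def level_command(mps: int, tc: str, increment: int = 0) -> str:
    """Build a WinBoard level command such as 'level 40 0:8 0'."""
    return f"level {mps} {tc} {increment}"


# Both settings used in the reports below aim at about 0.200 s/pos.
for mps, tc in [(40, "0:8"), (300, "1")]:
    print(level_command(mps, tc), "->", f"{time_per_pos(mps, tc):.3f} s/pos")
# level 40 0:8 0 -> 0.200 s/pos
# level 300 1 0 -> 0.200 s/pos
```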

Sample command lines.

Code:

STS_Rating_v3 -f "STS1-STS15_LAN.epd" -e "LambChop_1099.exe -hash 128" --proto wb --mps 300 --tc 1 --log
or

Code:

STS_Rating_v3 -f "STS1-STS15_LAN.epd" -e "C:\Chess\engines\nobook\RomiChess_P3L\RomiChessp3L64.exe" --proto wb --mps 40 --tc 0:8 --log
For UCI engines:

Code:

STS_Rating_v3 -f "STS1-STS15_LAN.epd" -e "Stockfish 6.exe" --proto uci -h 128 --getrating
With --getrating, the tool determines the analysis time and the number of threads by itself.

Usage:

Code:

program -f <epdfile> -e <engname> -t <numthreads> --movetime <timeinms>

Options:
-f or --file <value>, for epd file input
-e or --engine <value>, name of engine
-h or --hash <value>, hash size in MB, default is 32 MB
-t or --threads <value>, for threads, Cores and Max CPUs setting
--movetime <value>, time in ms, default is 1000ms
--log, save engine log
--getrating, calculate CCRL 40/4 rating estimate for uci engines only
--mps <integer value>, moves per session for winboard engines
--tc <integer value in minutes or mm:ss>, time control for winboard engines
--proto <uci or wb>, protocol the engine supports

Example:
Ex1. Analyze test.epd with 2 threads and 128 MB hash, at 3s/pos using sf 6.exe
STS_Rating -f test.epd -e "sf 6.exe" -h 128 -t 2 --movetime 3000
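For reference, the option list above can be mirrored with Python's argparse, as in the hypothetical sketch below; it is not the tool's actual source. The defaults for --hash (32 MB) and --movetime (1000 ms) come from the list; the --threads default of 1 and the --proto default of uci are assumptions. Because -h means hash here, argparse's automatic -h help flag has to be disabled and help moved to --help only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # add_help=False: -h is the hash option, not help.
    p = argparse.ArgumentParser(prog="STS_Rating_v3", add_help=False)
    p.add_argument("--help", action="help", help="show this help and exit")
    p.add_argument("-f", "--file", required=True, help="EPD file input")
    p.add_argument("-e", "--engine", required=True, help="engine name or command line")
    p.add_argument("-h", "--hash", type=int, default=32, help="hash size in MB")
    # Assumed default; the option list does not state one.
    p.add_argument("-t", "--threads", type=int, default=1, help="threads, Cores and Max CPUs")
    p.add_argument("--movetime", type=int, default=1000, help="time in ms per position")
    p.add_argument("--log", action="store_true", help="save engine log")
    p.add_argument("--getrating", action="store_true", help="CCRL 40/4 estimate, UCI only")
    p.add_argument("--mps", type=int, help="moves per session, WB only")
    p.add_argument("--tc", help="time control in minutes or mm:ss, WB only")
    # Assumed default; Ex1 above passes no --proto.
    p.add_argument("--proto", choices=["uci", "wb"], default="uci", help="engine protocol")
    return p


# Ex1 from the usage text.
args = build_parser().parse_args(
    ["-f", "test.epd", "-e", "sf 6.exe", "-h", "128", "-t", "2", "--movetime", "3000"]
)
print(args.hash, args.threads, args.movetime)  # 128 2 3000
```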
Tested WB engines:

Code:

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: LambChop_1099.exe -hash
Estimated time/pos: 0.200s
Test duration: 00:10:50
Expected time to finish: 00:05:45
Command: level 300 1 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     36     40     44     32     56     46     25     18     23     65     21     34     37     55     18    550
   Score    439    478    560    419    625    704    398    344    405    717    351    429    524    659    376   7428
Score(%)   43.9   47.8   56.0   41.9   62.5   70.4   39.8   34.4   40.5   71.7   35.1   42.9   52.4   65.9   37.6   49.5


Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: Averno081
Estimated time/pos: 0.200s
Test duration: 00:10:28
Expected time to finish: 00:05:45
Command: level 300 1 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     60     41     50     54     67     47     43     30     38     60     38     48     47     50     39    712
   Score    668    499    624    638    740    720    543    460    495    682    508    598    571    667    563   8976
Score(%)   66.8   49.9   62.4   63.8   74.0   72.0   54.3   46.0   49.5   68.2   50.8   59.8   57.1   66.7   56.3   59.8


Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: Myrddin_0.87-64
Estimated time/pos: 0.200s
Test duration: 00:05:00
Expected time to finish: 00:05:45
Command: level 40 0:8 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     52     47     43     44     65     54     40     22     37     58     34     46     64     53     30    689
   Score    606    581    551    588    696    747    559    391    508    673    476    590    723    651    480   8820
Score(%)   60.6   58.1   55.1   58.8   69.6   74.7   55.9   39.1   50.8   67.3   47.6   59.0   72.3   65.1   48.0   58.8


Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: Natwarlal_v0.14
Estimated time/pos: 0.200s
Test duration: 00:10:18
Expected time to finish: 00:05:45
Command: level 300 1 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     54     52     41     44     64     51     42     33     29     54     39     59     66     64     28    720
   Score    634    634    536    587    715    736    534    451    446    654    513    681    739    749    477   9086
Score(%)   63.4   63.4   53.6   58.7   71.5   73.6   53.4   45.1   44.6   65.4   51.3   68.1   73.9   74.9   47.7   60.6


Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: Gerbil_02_x64_ja
Estimated time/pos: 0.200s
Test duration: 00:06:14
Expected time to finish: 00:05:45
Command: level 40 0:8 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     37     30     31     35     58     53     30     26     24     55     31     37     45     49     29    570
   Score    482    410    491    457    691    751    412    364    403    633    440    485    565    626    483   7693
Score(%)   48.2   41.0   49.1   45.7   69.1   75.1   41.2   36.4   40.3   63.3   44.0   48.5   56.5   62.6   48.3   51.3


Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: scorpio-276-64-ja
Estimated time/pos: 0.200s
Test duration: 00:07:54
Expected time to finish: 00:05:45
Command: level 40 0:8 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     73     58     61     67     69     68     54     32     46     67     45     67     59     65     35    866
   Score    784    687    739    761    751    849    617    474    592    740    554    764    697    755    608  10372
Score(%)   78.4   68.7   73.9   76.1   75.1   84.9   61.7   47.4   59.2   74.0   55.4   76.4   69.7   75.5   60.8   69.1


Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: C:\Chess\engines\nobook\RomiChess_P3L\RomiChessp3L64
Estimated time/pos: 0.200s
Test duration: 00:05:47
Expected time to finish: 00:05:45
Command: level 40 0:8 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     51     49     51     45     63     56     46     30     42     67     41     46     51     63     22    723
   Score    605    575    612    572    718    763    585    457    517    742    532    583    641    724    407   9033
Score(%)   60.5   57.5   61.2   57.2   71.8   76.3   58.5   45.7   51.7   74.2   53.2   58.3   64.1   72.4   40.7   60.2


Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: C:\Chess\engines\nobook\Thinker54D\ThinkerInert 5.4D x64 SP
Estimated time/pos: 0.200s
Test duration: 00:05:55
Expected time to finish: 00:05:45
Command: level 40 0:8 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     69     67     70     59     67     59     53     51     55     72     60     60     63     67     43    915
   Score    762    751    775    692    763    808    681    662    667    799    705    687    725    766    621  10864
Score(%)   76.2   75.1   77.5   69.2   76.3   80.8   68.1   66.2   66.7   79.9   70.5   68.7   72.5   76.6   62.1   72.4


Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: C:\Chess\engines\nobook\Satana.2.0.7\Satana.2.0.7.w64bit
Estimated time/pos: 0.200s
Test duration: 00:06:51
Expected time to finish: 00:05:45
Command: level 40 0:8 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     18     28     22     25     34     34     27     18     18     61     14     15     19     34      5    372
   Score    334    385    339    322    430    597    355    314    275    676    279    247    287    458    176   5474
Score(%)   33.4   38.5   33.9   32.2   43.0   59.7   35.5   31.4   27.5   67.6   27.9   24.7   28.7   45.8   17.6   36.5


Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: C:\Chess\engines\nobook\shallow-rev688-win-ja\shallow-rev688-win-ja\Windows\shallow-rev688-64-ja
Estimated time/pos: 0.200s
Test duration: 00:06:52
Expected time to finish: 00:05:45
Command: level 40 0:8 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     50     43     38     55     58     52     41     28     37     54     36     46     59     67     32    696
   Score    607    535    525    639    663    723    542    447    504    642    520    574    707    747    517   8892
Score(%)   60.7   53.5   52.5   63.9   66.3   72.3   54.2   44.7   50.4   64.2   52.0   57.4   70.7   74.7   51.7   59.3


Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: C:\Chess\engines\nobook\Nemeton\Nemeton\Nemeton_1
Estimated time/pos: 0.200s
Test duration: 00:06:17
Expected time to finish: 00:05:45
Command: level 40 0:8 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     39     40     45     42     57     55     33     20     25     63     38     45     55     45     25    627
   Score    528    507    576    544    635    763    480    395    380    704    503    551    672    612    458   8308
Score(%)   52.8   50.7   57.6   54.4   63.5   76.3   48.0   39.5   38.0   70.4   50.3   55.1   67.2   61.2   45.8   55.4


Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Engine: C:\Chess\engines\nobook\Knightx1.92\Knightx192
Estimated time/pos: 0.400s
Test duration: 00:08:52
Expected time to finish: 00:10:45
Command: level 40 0:16 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     47     49     47     46     57     46     40     27     35     60     38     49     55     64     27    687
   Score    567    555    600    589    642    709    518    397    475    689    506    580    654    756    452   8689
Score(%)   56.7   55.5   60.0   58.9   64.2   70.9   51.8   39.7   47.5   68.9   50.6   58.0   65.4   75.6   45.2   57.9

Download:
http://www.mediafire.com/download/3xyap ... ing_v3.rar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: STS test suite and engine analysis interface

Post by Laskos »

Thank you very much Ferdy. Only today I discovered your STS tool v1 (UCI only), and I am using it now. I will use v3 too. To get ratings from v1, you map the global STS result (15,000 points maximum, 100%) to CCRL via a linear formula, right? And the factor translating percentages to Elo is close to 45, right? It seems the STS test suite is the best one for approximating ratings based on games. The more tactical suites always favor tactical settings, like Houdini Tactical, which are not the best ones in games.
cetormenter
Posts: 170
Joined: Sun Oct 28, 2012 9:46 pm

Re: STS test suite and engine analysis interface

Post by cetormenter »

Fantastic tool Ferdinand. Here is a run with Nirvana and Stockfish.

Code:

Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
Engine: Nirvanachess 2.1c
Hash: 32, Threads: 1, time/pos: 0.131s
Test duration: 00:03:29
Expected time to finish: 00:04:01
STS rating: 3012

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     75     57     67     67     73     60     66     53     55     71     58     69     65     68     38    942
   Score    791    669    789    751    778    795    737    645    655    771    681    757    739    786    621  10965
Score(%)   79.1   66.9   78.9   75.1   77.8   79.5   73.7   64.5   65.5   77.1   68.1   75.7   73.9   78.6   62.1   73.1
  Rating   3279   2736   3270   3101   3221   3297   3038   2629   2673   3190   2789   3128   3047   3257   2522   3012

Engine: Stockfish 6 64
Hash: 32, Threads: 1, time/pos: 0.131s
Test duration: 00:03:25
Expected time to finish: 00:04:01
STS rating: 3345

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     84     74     74     72     73     73     68     68     69     78     70     70     75     74     47   1069
   Score    869    825    812    822    794    870    775    785    772    849    772    771    815    842    714  12087
Score(%)   86.9   82.5   81.2   82.2   79.4   87.0   77.5   78.5   77.2   84.9   77.2   77.1   81.5   84.2   71.4   80.6
  Rating   3626   3430   3372   3417   3292   3631   3208   3252   3194   3537   3194   3190   3386   3506   2936   3345

Their actual CCRL 40/4 ratings are 3029 +/- 19 and 3314 +/- 14 respectively, so the estimates are almost spot on. Nice work!
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: STS test suite and engine analysis interface

Post by Ferdy »

Laskos wrote: Thank you very much Ferdy. Only today I discovered your STS tool v1 for UCI only, using it now. I will use v3 too. To get ratings from v1, you approximate STS global result (15,000 points maximum, 100%) to CCRL via a linear formula, right?
Yes, that is based on 15000 points.
Laskos wrote: And the translator from percentages to ELO is a factor close to 45, right?
The formula in the tool is:
Rating = 44.523 * scorePercent - 242.85
scorePercent should be based on all 15 themes (the 15000-point total). The output also shows a rating per theme (STS1, STS2, ...); those are computed by applying the same formula, fitted on the 15000-point total, to each theme's own percentage. I show them only to see how each theme's score would look as a rating; of course, it would be possible to fit a separate regression per theme of 100 positions.
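The formula can be checked against the reports in this thread with a short Python sketch (the function name is made up for illustration). The slope of 44.523 Elo per percentage point is the "factor close to 45" asked about; it works out to about 0.3 Elo per point out of 15000. Passing a single theme's score with max_score=1000 reproduces the per-theme Rating row.

```python
def sts_rating(score: int, max_score: int = 15000) -> int:
    """Estimate a CCRL 40/4 rating from an STS score.

    Uses the linear fit Rating = 44.523 * scorePercent - 242.85.
    For a single theme, pass the theme score with max_score=1000.
    """
    score_percent = 100.0 * score / max_score
    return round(44.523 * score_percent - 242.85)


print(sts_rating(10965))      # 3012, Nirvanachess 2.1c total
print(sts_rating(12087))      # 3345, Stockfish 6 total
print(sts_rating(791, 1000))  # 3279, Nirvanachess 2.1c on STS1 only
```

Note that the tool appears to use the exact score (12087/15000 = 80.58%) rather than the rounded 80.6% shown in the table; with 80.6% the result would come out one point higher.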
Laskos wrote: It seems STS testsuite is the best one to approximate ratings based on games, the more tactical ones always favor tactical settings, like in Houdini Tactical, which are not the best ones in games.
This is also what I have observed: weaker engines usually get lower scores, even in the lower rating range. Also, if an engine misses the best move, it may still get points for an alternative move with a lower score. Those alternative moves further help differentiate engines in the strength estimate.
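To make the scoring concrete: each position is worth at most 10 points (100 positions give the 1000-point maximum per theme, 15000 in total), so Score is usually more than 10 × BestCnt because alternatives still earn points. The hypothetical Python sketch below assumes the per-move points are stored as "move=points" pairs in LAN, for example in an EPD c0 field; the exact layout of the reformatted EPD file is an assumption.

```python
def parse_points(c0: str) -> dict[str, int]:
    """Parse 'move=points' pairs such as 'f3e5=10, d2d4=3' into a dict."""
    points = {}
    for item in c0.split(","):
        item = item.strip()
        if item:
            move, value = item.split("=")
            points[move.strip()] = int(value)
    return points


def summarize(results: list[tuple[dict[str, int], str]]) -> tuple[int, int]:
    """Return (BestCnt, Score) for (points, engine_move) pairs.

    A move missing from the points table earns 0. Only the top-valued move
    counts toward BestCnt; a listed alternative still adds its lower points.
    """
    best_cnt = score = 0
    for points, move in results:
        earned = points.get(move, 0)
        score += earned
        if earned and earned == max(points.values()):
            best_cnt += 1
    return best_cnt, score


results = [
    (parse_points("f3e5=10, d2d4=3"), "f3e5"),  # best move: 10 points
    (parse_points("e1g1=10, h2h3=6"), "h2h3"),  # alternative: 6 points
    (parse_points("a2a4=10"), "b2b3"),          # unlisted move: 0 points
]
print(summarize(results))  # (1, 16)
```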

We need to encourage Swaminathan and Dann to create more :).

I am thinking of creating a suite (not organized by themes like STS), based on an idea from Miguel for selecting positions. Parse the games in a PGN file, find positions where the next 6 plies contain no captures (perhaps also no promotions, checks or evasions), and save those positions. Then analyze the saved positions with a strong engine in multipv mode, say 5. If a position has some interesting properties based on analyzing and comparing the scores of every PV, save it together with the moves and corresponding points derived from the multipv scores (now we have a system similar to STS). What we get is a quiet test suite that challenges the engine's positional play for the next 3 moves (6 quiet plies). All this can be done automatically.
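The first filtering step of that idea could look like the Python sketch below. It works on SAN move strings (assumed to come from a PGN parser, not shown) and uses the SAN markers x, =, + and # to detect captures, promotions, checks and mates; it also skips a start position if the previous move gave check, so the first move is not an evasion. The multipv_points function is a purely hypothetical way to turn multipv scores into STS-style points; the post does not specify the mapping.

```python
import math


def is_quiet_san(san: str) -> bool:
    """True if a SAN move is not a capture (x), promotion (=), check (+) or mate (#)."""
    return not any(marker in san for marker in "x=+#")


def quiet_start_plies(moves: list[str], window: int = 6) -> list[int]:
    """Return ply indices i where moves[i:i + window] are all quiet.

    Index i is the position before moves[i] is played. The move before the
    window must not give check, so the first move is not an evasion; later
    moves cannot be evasions because checks inside the window are excluded.
    """
    starts = []
    for i in range(len(moves) - window + 1):
        if i > 0 and any(m in moves[i - 1] for m in "+#"):
            continue  # moves[i] would be a check evasion
        if all(is_quiet_san(m) for m in moves[i:i + window]):
            starts.append(i)
    return starts


def multipv_points(lines: list[tuple[str, int]]) -> dict[str, int]:
    """Hypothetical mapping from multipv scores (centipawns) to STS-style points.

    The best move gets 10 points; every 10 cp (or part of it) below the best
    costs 1 point, with a floor of 0.
    """
    best = max(cp for _, cp in lines)
    return {move: max(0, 10 - math.ceil((best - cp) / 10)) for move, cp in lines}


game = ["e4", "d5", "Bb5+", "c6", "Ba4", "Nf6", "Nc3", "e6", "d3", "Be7"]
print(quiet_start_plies(game))  # [4]
```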
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: STS test suite and engine analysis interface

Post by Ferdy »

cetormenter wrote: Fantastic tool Ferdinand. Here is a run with Nirvana and Stockfish.

Their actual ccrl 40/4 ratings are 3029 +/- 19 and 3314 +/- 14 respectively so the estimates are almost spot on. Nice work!
Looks good. I see your time/pos was 0.131s. The formula is based on my computer using 0.2s/pos, so in this case the benchmark in the tool seemed to scale the time just right :).
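One plausible way such a benchmark could adapt the analysis time is to scale the 0.2 s/pos reference inversely with measured speed. The sketch below assumes exactly that; the speed values (e.g. nodes per second from a fixed benchmark) and the function name are hypothetical, and the tool's actual bench may work differently. Under this assumption, the 0.131 s/pos reported on the i7-4790K corresponds to a machine roughly 1.53 times faster than the reference (0.2 / 0.131 ≈ 1.53).

```python
def scaled_time_per_pos(local_speed: float, ref_speed: float, ref_time: float = 0.2) -> float:
    """Hypothetical: scale the reference time/pos inversely with benchmark speed.

    ref_time is the 0.2 s/pos the rating formula was calibrated at; ref_speed
    and local_speed are benchmark speeds (e.g. nodes per second) measured on
    the reference machine and on the local machine.
    """
    if local_speed <= 0 or ref_speed <= 0:
        raise ValueError("benchmark speeds must be positive")
    return ref_time * ref_speed / local_speed


# A machine about 1.53 times faster than the reference gets roughly 0.131 s/pos.
print(f"{scaled_time_per_pos(1.53, 1.0):.3f}")  # 0.131
```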
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: STS test suite and engine analysis interface

Post by lucasart »

Ferdinand,

Thank you. This is a very meaningful contribution to computer chess, as opposed to yet another tactics/zugzwang test suite.

Potentially, people can develop and improve engines based on this measure, instead of having to play tens of thousands of games to test every patch. At least for weak engines (I suspect that by cutting corners this way you eventually hit a glass ceiling).

Where can I find the code and epd files to run it? Is it Windows only or can I compile it for Linux?
JVMerlino
Posts: 1357
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: STS test suite and engine analysis interface

Post by JVMerlino »

Marvelous tool. Can't believe I didn't know about it.

Thanks very much, Ferdinand!

jm
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: STS test suite and engine analysis interface

Post by Ferdy »

lucasart wrote: Where can I find the code and epd files to run it? Is it Windows only or can I compile it for Linux?
The original test suite is available here:
https://sites.google.com/site/strategictestsuite/

To make it easy to compare the suite's moves with the LAN-format bestmove output of UCI engines, I reformatted those EPDs and combined the 15 themes (100 positions each) into one file. The reformatted EPDs are in the download linked at the bottom of my first post in this thread. The tool that reads the EPDs and lets the engine analyze them is included in the same archive; it is a Python script converted to an .exe with py2exe.
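The core of that comparison can be sketched in Python using standard UCI commands. The sketch is hypothetical (not the tool's code) and assumes the reformatted EPD records start with the four FEN fields and that the expected move is stored in LAN. Because an EPD record omits the halfmove clock and fullmove number, the sketch fills them in as "0 1" before sending "position fen".

```python
from typing import Optional


def epd_to_uci(epd: str, movetime_ms: int) -> list[str]:
    """UCI commands to analyse one EPD position for a fixed time.

    An EPD record holds only the first four FEN fields (board, side to move,
    castling, en passant) followed by opcodes such as bm/id, so the halfmove
    clock and fullmove number are filled in as "0 1".
    """
    fields = epd.split()
    fen = " ".join(fields[:4]) + " 0 1"
    return [f"position fen {fen}", f"go movetime {movetime_ms}"]


def parse_bestmove(line: str) -> Optional[str]:
    """Extract the LAN move from 'bestmove e2e4 ponder e7e5'; None otherwise."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "bestmove":
        return parts[1]
    return None


def matches(expected_lan: str, engine_line: str) -> bool:
    """True if the engine's bestmove equals the expected LAN move."""
    return parse_bestmove(engine_line) == expected_lan


epd = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 bm e7e5;"
for cmd in epd_to_uci(epd, 200):
    print(cmd)
```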
Maarten Claessens
Posts: 106
Joined: Mon May 12, 2014 10:08 am
Location: Near Nijmegen

Re: STS test suite and engine analysis interface

Post by Maarten Claessens »

I ran the tool with WaDuuttie twice: the first time through the Wb2Uci adapter, with WaDuuttie interpreting st 1000 as milliseconds, and the second time as the WinBoard engine it actually is (the command-line option "-h 23" makes sure at most 128 MB of hash is used).
These are the results:

Code:

Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Engine: WaDuuttie
Hash: 128, Threads: 1, time/pos: 0.175s
Test duration: 00:05:42
Expected time to finish: 00:05:07
STS rating: 2822

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     66     56     63     67     64     63     61     43     51     70     48     64     55     64     30    865
   Score    749    635    725    769    734    782    720    581    633    797    589    709    640    741    520  10324
Score(%)   74.9   63.5   72.5   76.9   73.4   78.2   72.0   58.1   63.3   79.7   58.9   70.9   64.0   74.1   52.0   68.8
  Rating   3092   2584   2985   3181   3025   3239   2963   2344   2575   3306   2380   2914   2607   3056   2072   2822

Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Engine: WaDuuttie.exe -h 23
Estimated time/pos: 0.200s
Test duration: 00:06:27
Expected time to finish: 00:05:45
Command: level 40 0:8 0

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     68     59     62     66     67     64     65     48     54     71     58     65     56     68     41    912
   Score    762    670    749    750    746    792    737    619    658    796    674    729    655    766    643  10746
Score(%)   76.2   67.0   74.9   75.0   74.6   79.2   73.7   61.9   65.8   79.6   67.4   72.9   65.5   76.6   64.3   71.6
It seems to me the UCI results and the WinBoard results cannot be compared one-to-one.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: STS test suite and engine analysis interface

Post by Ferdy »

Maarten Claessens wrote: I ran the tool with WaDuuttie twice: the first time with the Wb2Uci-adaptor and WaDuuttie interpreting st 1000 as milliseconds, the second time as the WinBoard-engine he actually is (the commandline option "-h 23" makes sure at most 128M hash is used).
It seems to me the UCI-results and the WinBoard-results cannot be compared with each other one on one.
The WB run took longer than the run through the WB2UCI adapter.
Also, with less than 200 ms/pos of analysis time, anything can happen with the communication lag when using WB2UCI.
Try re-running with WB2UCI to see whether the results change.

You may also try a longer time/pos. For UCI engines, do not use the --getrating option; the tool will then use the time given by --movetime <value in ms>.

For WB, just adjust the --mps and --tc options. I don't know whether your engine thinks longer on the first position it receives, or how it behaves when it receives the level command.

Try enabling the --log option and compare the search depths for a couple of positions in the WB and WB2UCI logs.

Another option is to adjust the WB time control to match. You could use level 40 0:7 0, which gives
time/pos = 7s/40 = 0.175s
that is, --mps 40 --tc 0:7
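Going the other way, from a target time/pos to a --tc value, can be sketched in Python as below. The function name is hypothetical; the sketch rounds the session to whole seconds and writes it as whole minutes ("1") or minutes:seconds ("0:7"), the style used in this thread, where the part after the colon is a plain number of seconds.

```python
def tc_for_target(seconds_per_pos: float, mps: int = 40) -> str:
    """Return a --tc value giving about seconds_per_pos when mps moves share the session.

    The session is rounded to whole seconds, so the achieved time/pos may
    differ slightly from the target.
    """
    total = round(seconds_per_pos * mps)
    if total < 1:
        raise ValueError("target too small for this number of moves per session")
    minutes, seconds = divmod(total, 60)
    return str(minutes) if seconds == 0 else f"{minutes}:{seconds}"


tc = tc_for_target(0.175)
print(f"--mps 40 --tc {tc}")  # --mps 40 --tc 0:7
```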