STS test suite and engine analysis interface

Moderator: Ras

Ferdy
Posts: 4846
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: STS test suite and engine analysis interface

Post by Ferdy »

MikeB wrote: Very nice Ferdy!

Could you post/send the source so I can compile a mac version?

Best, Mike
There is not much change so far. I am trying to fix the wrong number of cores reported on XP. I will send the code once significant changes have been made.

I am adding more info to the summary, but this is still under testing.
Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Physical Cores: 4, Logical Cores: 8
Engine: crafty-24.1-x64-sse42
Number of positions in STS1-STS15_LAN_v2.epd: 1500
Max score = 1500 x 10 = 15000
Estimated time/pos: 0.100s
Test duration: 00h:03m:01s
Expected time to finish: 00h:03m:15s

Command: st 0.1

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     67     49     55     59     56     60     53     48     40     57     48     57     54     57     36    796
   Score    733    571    668    691    658    786    628    562    543    665    584    665    674    713    553   9694
Score(%)   73.3   57.1   66.8   69.1   65.8   78.6   62.8   56.2   54.3   66.5   58.4   66.5   67.4   71.3   55.3   64.6

Legend:
STS 01: Undermining
STS 02: Open Files and Diagonals
STS 03: Knight Outposts
STS 04: Square Vacancy
STS 05: Bishop vs Knight
STS 06: Re-Capturing
STS 07: Offer of Simplification
STS 08: Advancement of f/g/h Pawns
STS 09: Advancement of a/b/c Pawns
STS 10: Simplification
STS 11: Activity of the King
STS 12: Center Control
STS 13: Pawn Play in the Center
STS 14: Queens and Rooks to the 7th rank
STS 15: Avoid Pointless Exchange
I also plan to add the top 3 STS sets with the strongest results, followed by the top 3 STS sets with the weakest results.
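
The Score(%) row in the summary above follows from the "Max score" line: each position is worth up to 10 points, so a set's percentage is its score divided by NumPos x 10. A minimal sketch of that arithmetic, written for Python 3 (the function name `score_percent` is made up for illustration and is not taken from STSrating):

```python
def score_percent(score, num_pos, max_per_pos=10):
    """Percentage of the maximum score, rounded to one decimal place."""
    return round(100.0 * score / (num_pos * max_per_pos), 1)


# Values taken from the Crafty summary above
print(score_percent(733, 100))     # STS1: 733 of 1000 points
print(score_percent(9694, 1500))   # ALL: 9694 of 15000 points
```

With the figures above this gives 73.3 for STS1 and 64.6 overall, matching the table.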
brianr
Posts: 540
Joined: Thu Mar 09, 2006 3:01 pm
Full name: Brian Richardson

Re: STS test suite and engine analysis interface

Post by brianr »

Thank you for updating STSrating.

As you are working on it now, please look into the questionable "setboard not supported" claims.

Perhaps STSrating could ignore engine output until getting "done=1" before expecting to process protover features.

Thank you again,
Brian Richardson
(author of Tinker)
Ferdy
Posts: 4846
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: STS test suite and engine analysis interface

Post by Ferdy »

brianr wrote:Perhaps STSrating could ignore engine output until getting "done=1" before expecting to process protover features.
After starting the engine, I send xboard, followed by the protover 2 command.
Then I parse every string the engine outputs, looking for setboard=1 and done=1. After getting done=1, I stop parsing the engine output. If the setboard_flag is true, meaning I got the setboard=1 string, I send new commands to continue the test process. But if the setboard_flag is false, I send the quit command to the engine and stop the test altogether.
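
The handshake described above can be sketched roughly as follows. This is only an illustration written for Python 3, not STSrating's actual code: the function name `parse_features` is hypothetical, the sample lines are made up, and the sketch works on a list of already-received output lines rather than on a live engine process.

```python
import re

# Matches name=value pairs such as setboard=1, done=1 or myname="Some Engine"
FEATURE_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_features(lines):
    """Scan engine output for 'feature' lines until done=1 is seen.

    Returns (setboard_flag, got_done). Lines that do not start with
    'feature' (banners, debug output) are skipped, not treated as errors.
    """
    setboard_flag = False
    for raw in lines:
        line = raw.strip()
        if not line.startswith('feature'):
            continue
        for name, value in FEATURE_RE.findall(line[len('feature'):]):
            if name == 'setboard':
                setboard_flag = (value == '1')
            elif name == 'done' and value == '1':
                return setboard_flag, True
    return setboard_flag, False


# Made-up sample output, for illustration only
output = [
    'some engine banner and debug text',
    'feature myname="Some Engine" setboard=1 ping=1',
    'feature done=1',
    'feature setboard=0',   # arrives after done=1, so it is never read
]
print(parse_features(output))
```

If the second value returned is False, the caller still has to decide how long to keep waiting before it trusts the setboard flag.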

This is my latest development so far: it displays the top 3 strong and weak areas of an engine.

Crafty at 0.2s/pos

Code: Select all

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Physical Cores: 4, Logical Cores: 8
Engine: crafty-24.1-x64-sse42
Number of positions in STS1-STS15_LAN_v2.epd: 1500
Max score = 1500 x 10 = 15000
Estimated time/pos: 0.200s
Test duration: 00h:05m:51s
Expected time to finish: 00h:05m:45s
Command: st 0.2

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     67     49     58     65     57     60     52     53     43     57     49     56     56     62     38    822
   Score    739    571    682    747    672    784    636    632    572    682    598    653    693    750    574   9985
Score(%)   73.9   57.1   68.2   74.7   67.2   78.4   63.6   63.2   57.2   68.2   59.8   65.3   69.3   75.0   57.4   66.6

::STS ID AND Titles::
STS 01: Undermining
STS 02: Open Files and Diagonals
STS 03: Knight Outposts
STS 04: Square Vacancy
STS 05: Bishop vs Knight
STS 06: Re-Capturing
STS 07: Offer of Simplification
STS 08: Advancement of f/g/h Pawns
STS 09: Advancement of a/b/c Pawns
STS 10: Simplification
STS 11: Activity of the King
STS 12: Center Control
STS 13: Pawn Play in the Center
STS 14: Queens and Rooks to the 7th rank
STS 15: Avoid Pointless Exchange

::Top 3 STS with high result::
1. STS 06, 78.4%, "Re-Capturing"
2. STS 14, 75.0%, "Queens and Rooks to the 7th rank"
3. STS 04, 74.7%, "Square Vacancy"

::Top 3 STS with low result::
1. STS 02, 57.1%, "Open Files and Diagonals"
2. STS 09, 57.2%, "Advancement of a/b/c Pawns"
3. STS 15, 57.4%, "Avoid Pointless Exchange"
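
The two top-3 lists above can be derived from the Score(%) row with a pair of sorts. The following Python 3 sketch is an illustration under that assumption, not the tool's own code; `rank_sts` is a made-up name, and the data are the Crafty 0.2s/pos figures from the summary above.

```python
TITLES = [
    'Undermining', 'Open Files and Diagonals', 'Knight Outposts',
    'Square Vacancy', 'Bishop vs Knight', 'Re-Capturing',
    'Offer of Simplification', 'Advancement of f/g/h Pawns',
    'Advancement of a/b/c Pawns', 'Simplification', 'Activity of the King',
    'Center Control', 'Pawn Play in the Center',
    'Queens and Rooks to the 7th rank', 'Avoid Pointless Exchange',
]

# Score(%) row for STS1..STS15 from the Crafty summary above
PERCENT = [73.9, 57.1, 68.2, 74.7, 67.2, 78.4, 63.6, 63.2,
           57.2, 68.2, 59.8, 65.3, 69.3, 75.0, 57.4]


def rank_sts(percents, titles, n=3):
    """Return (strongest, weakest), each a list of (sts_id, percent, title)."""
    rows = [(i + 1, pct, titles[i]) for i, pct in enumerate(percents)]
    # sorted() is stable, so sets with equal percentages keep STS order
    strongest = sorted(rows, key=lambda r: r[1], reverse=True)[:n]
    weakest = sorted(rows, key=lambda r: r[1])[:n]
    return strongest, weakest


strongest, weakest = rank_sts(PERCENT, TITLES)
for label, rows in (('high', strongest), ('low', weakest)):
    print('::Top %d STS with %s result::' % (len(rows), label))
    for rank, (sts_id, pct, title) in enumerate(rows, 1):
        print('%d. STS %02d, %.1f%%, "%s"' % (rank, sts_id, pct, title))
```

Passing `n=5` gives a top-5 variant of the same lists.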
I also plan to integrate plots built from two data files that the test itself could output. Run the test with the option "--plotdata 1.txt" enabled, then run it again with another engine and "--plotdata 2.txt".
Then run a program to parse the 1.txt and 2.txt files and produce a plot like this.

Red is my old version, blue is my new version. The new version is around 25 Elo better in actual engine gauntlets.

Image

So now I know that the new version is doing well on the difficult tests, but the old one has something of an advantage on the easier ones. The plot helps me decide which direction to take next :).
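
The same kind of comparison can also be made numerically before plotting. The format of the "--plotdata" files is not shown in the thread, so this Python 3 sketch assumes the input is the text summary and reads its "Score" row instead; `score_row` and `compare` are hypothetical names. The sample numbers are the two Crafty runs posted earlier in the thread (0.1s and 0.2s per position), used only as stand-in input.

```python
def score_row(summary_text):
    """Return the per-set scores from the 'Score' row, without the ALL total."""
    for line in summary_text.splitlines():
        fields = line.split()
        # 'Score(%)' is a different first field, so that row is not matched
        if fields and fields[0] == 'Score':
            return [int(x) for x in fields[1:-1]]
    raise ValueError('no Score row found')


def compare(first_text, second_text):
    """Return [(sts_id, second - first), ...] for each test set."""
    first, second = score_row(first_text), score_row(second_text)
    return [(i + 1, s - f) for i, (f, s) in enumerate(zip(first, second))]


RUN_1 = """
   Score    733    571    668    691    658    786    628    562    543    665    584    665    674    713    553   9694
Score(%)   73.3   57.1   66.8   69.1   65.8   78.6   62.8   56.2   54.3   66.5   58.4   66.5   67.4   71.3   55.3   64.6
"""
RUN_2 = """
   Score    739    571    682    747    672    784    636    632    572    682    598    653    693    750    574   9985
Score(%)   73.9   57.1   68.2   74.7   67.2   78.4   63.6   63.2   57.2   68.2   59.8   65.3   69.3   75.0   57.4   66.6
"""

for sts_id, diff in compare(RUN_1, RUN_2):
    print('STS %02d: %+d' % (sts_id, diff))
```

A positive number means the second run scored higher on that set.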
brianr
Posts: 540
Joined: Thu Mar 09, 2006 3:01 pm
Full name: Brian Richardson

Re: STS test suite and engine analysis interface

Post by brianr »

That is what I sort of thought.

However, in Tinker's case if I send "feature setboard=1" in the second line of output right after an engine name and version line it works fine with STSrating.

Unfortunately, Tinker normally spews quite a lot of somewhat spurious debugging and status information before processing the "protover" command. In that "normal" case STSrating claims Tinker does not support setboard, but it is supported, apparently sort of.

Perhaps you have some suggestions.
Thanks again,
Brian
Ferdy
Posts: 4846
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: STS test suite and engine analysis interface

Post by Ferdy »

brianr wrote:That is what I sort of thought.

However, in Tinker's case if I send "feature setboard=1" in the second line of output right after an engine name and version line it works fine with STSrating.

Unfortunately, Tinker normally spews quite a lot of somewhat spurious debugging and status information before processing the "protover" command. In that "normal" case STSrating claims Tinker does not support setboard, but it is supported, apparently sort of.

Perhaps you have some suggestions.
Thanks again,
Brian
Looking at the code here, I have actually added a waiting time of around 5s, which is multiplied by 10 if I receive done=0. If I cannot get done=1 within that time, I stop parsing the engine output and look at the setboard_flag. This was not thoroughly tested and might be the cause; somehow the waiting time was not enough. I will look into this.
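
The timeout logic described above might look something like the following Python 3 sketch. It is not STSrating's code: `wait_for_done`, its parameters, and the idea of feeding it (arrival time, line) pairs instead of reading a live pipe are assumptions made so that the example stays self-contained. The thread does not say whether the extended wait is counted from the start or from the moment done=0 arrives; the sketch counts it from the start.

```python
def wait_for_done(timed_lines, base_wait=5.0, factor=10):
    """Process (seconds_since_protover, line) pairs in arrival order.

    Stops at done=1, or as soon as a line arrives after the current
    deadline. Returns (setboard_flag, got_done).
    """
    deadline = base_wait
    setboard_flag = False
    for arrived, line in timed_lines:
        if arrived > deadline:
            break                          # waited long enough, give up
        if not line.startswith('feature'):
            continue                       # banner or debug output
        tokens = line.split()[1:]
        if 'setboard=1' in tokens:
            setboard_flag = True
        if 'done=0' in tokens:
            deadline = base_wait * factor  # engine asked for more time
        if 'done=1' in tokens:
            return setboard_flag, True
    return setboard_flag, False


# Made-up timings: an engine that is slow to answer but announces done=0
patient = [(0.1, 'feature done=0'),
           (7.5, 'feature setboard=1'),
           (8.0, 'feature done=1')]
# The same slow engine without done=0: its features arrive too late
late = [(7.5, 'feature setboard=1 done=1')]

print(wait_for_done(patient))
print(wait_for_done(late))
```

The second case shows how an engine that does support setboard can still be reported as lacking it when its feature lines arrive after the wait has expired.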
Ferdy
Posts: 4846
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: STS test suite and engine analysis interface

Post by Ferdy »

Canoike wrote:Thank you.
It worked on my computer with this new version.
The CPU has 4 physical cores and not 4 logical cores.
Windows XP Professional 64 bits edition.
It would be nice if your utility worked under Linux, because the same engine runs faster under that OS. From the fastest to the slowest OS: 1) Linux 2) macOS 3) guess...

Stockfish latest dev, my compile from GitHub with MinGW:

Code: Select all

STS_Rating_v9 -f "STS1-STS15_LAN_v2.epd" -e "stockfish20150910.exe" --proto uci -h 128 --getrating

Intel(R) Core(TM) i5-3450S CPU @ 2.80GHz
Physical Cores: 1, Logical Cores: 4
Engine: Stockfish 120915 64 POPCNT
Hash: 128, Threads: 1, time/pos: 0.166s
Test duration: 00:04:41
Expected time to finish: 00:04:54
STS rating: 2978

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     87     64     72     74     71     75     53     68     66     78     65     52     67     62     42    996
   Score    891    683    796    787    752    794    573    783    703    820    696    566    747    706    553  10850
Score(%)   89.1   68.3   79.6   78.7   75.2   79.4   57.3   78.3   70.3   82.0   69.6   56.6   74.7   70.6   55.3   72.3
  Rating   3724   2798   3301   3261   3105   3292   2308   3243   2887   3408   2856   2277   3083   2900   2219   2978
Komodo 9.2 x64

Code: Select all

STS_Rating_v9 -f "STS1-STS15_LAN_v2.epd" -e "komodo-9.2-64bit.exe" --proto uci -h 128 --getrating
Intel(R) Core(TM) i5-3450S CPU @ 2.80GHz
Physical Cores: 1, Logical Cores: 4
Engine: Komodo 9.2 64-bit
Hash: 128, Threads: 1, time/pos: 0.165s
Test duration: 00:04:24
Expected time to finish: 00:04:52
STS rating: 2990

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     85     67     71     74     74     75     49     66     66     71     66     52     70     63     47    996
   Score    874    741    789    770    768    800    561    736    706    778    725    573    764    703    602  10890
Score(%)   87.4   74.1   78.9   77.0   76.8   80.0   56.1   73.6   70.6   77.8   72.5   57.3   76.4   70.3   60.2   72.6
  Rating   3648   3056   3270   3185   3177   3319   2255   3034   2900   3221   2985   2308   3159   2887   2437   2990
I tried my new parsing tool to read the summary and plot it. Here is the result.

Image

Unfortunately, I cannot convert the script to an exe file.
Vinvin
Posts: 5287
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: STS test suite and engine analysis interface

Post by Vinvin »

Great tool and graph ! Thank you very much !
Ferdy
Posts: 4846
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: STS test suite and engine analysis interface

Post by Ferdy »

Try this one and see if it works.
Changes:

Code: Select all

v12
1. Modified the epd reader; it now reads the sts version 3 epd
2. Refactored winboard code, increased wait time for done=1
3. Updated STS 12, now using epd's with more alternatives on pos
   1 to 23. There are 2 such files on the official website, the other one
   has fewer alternative moves.
4. Modified test set epd formats, opcodes are now in order, c0, c7, c8 and c9.
   Test set is now at version 3
5. Display top 5 strong and weak test set results
6. Show test set title in summary
7. Use latest version of psutil, the module that detects the number
   of logical and physical cores
Sample output.

Code: Select all

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Physical Cores: 4, Logical Cores: 8
Engine: Stockfish 6 64 POPCNT
Hash: 128, Threads: 1, time/pos: 0.188s
Number of positions in STS1-STS15_LAN_v3.epd: 1500
Max score = 1500 x 10 = 15000
Test duration: 00h:04m:53s
Expected time to finish: 00h:05m:27s
STS rating: 3395

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     83     77     75     71     76     75     70     68     69     80     73     72     77     74     49   1089
   Score    864    844    830    817    814    892    786    786    765    862    795    802    833    844    723  12257
Score(%)   86.4   84.4   83.0   81.7   81.4   89.2   78.6   78.6   76.5   86.2   79.5   80.2   83.3   84.4   72.3   81.7
  Rating   3604   3515   3453   3395   3381   3729   3257   3257   3163   3595   3297   3328   3466   3515   2976   3395

:: STS ID and Titles ::
STS 01: Undermining
STS 02: Open Files and Diagonals
STS 03: Knight Outposts
STS 04: Square Vacancy
STS 05: Bishop vs Knight
STS 06: Re-Capturing
STS 07: Offer of Simplification
STS 08: Advancement of f/g/h Pawns
STS 09: Advancement of a/b/c Pawns
STS 10: Simplification
STS 11: Activity of the King
STS 12: Center Control
STS 13: Pawn Play in the Center
STS 14: Queens and Rooks to the 7th rank
STS 15: Avoid Pointless Exchange

:: Top 5 STS with high result ::
1. STS 06, 89.2%, "Re-Capturing"
2. STS 01, 86.4%, "Undermining"
3. STS 10, 86.2%, "Simplification"
4. STS 02, 84.4%, "Open Files and Diagonals"
5. STS 14, 84.4%, "Queens and Rooks to the 7th rank"

:: Top 5 STS with low result ::
1. STS 15, 72.3%, "Avoid Pointless Exchange"
2. STS 09, 76.5%, "Advancement of a/b/c Pawns"
3. STS 07, 78.6%, "Offer of Simplification"
4. STS 08, 78.6%, "Advancement of f/g/h Pawns"
5. STS 11, 79.5%, "Activity of the King"
Download:
http://www.mediafire.com/download/kncmw ... ng_v12.rar

I have this script to plot results based on the summary. Save the summary of each engine to 1.txt, 2.txt, and 3.txt.
Sample summary:

Code: Select all

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Physical Cores: 4, Logical Cores: 8
Engine: Gaviota v1.0
Hash: 128, Threads: 1, time/pos: 0.200s
Number of positions in STS1-STS15_LAN_v3.epd: 1500
Max score = 1500 x 10 = 15000
Test duration: 00h:05m:27s
Expected time to finish: 00h:05m:45s

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     65     59     62     58     65     65     63     42     57     67     53     64     62     64     45    891
   Score    739    704    714    730    717    820    717    576    642    753    612    745    729    731    696  10625
Score(%)   73.9   70.4   71.4   73.0   71.7   82.0   71.7   57.6   64.2   75.3   61.2   74.5   72.9   73.1   69.6   70.8
Here is the plot comparing 3 engines.

Image

This is the Python script. You need Python with matplotlib and numpy installed. The plot will be saved in png format. So far I have not been successful in converting this to an exe file.

Code: Select all

# sts_chart_v2.py
# Python 2.7.6

import numpy as np
import matplotlib.pyplot as plt
import sys
import os


def autoLabel(rects, ax):
    """ Add labels """
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x()+rect.get_width()/2.,\
                1.05*height, '%d'%int(height),\
                ha='center', va='bottom')

def readSummary(fname):
    """ Read one summary file.
        Returns a list: the engine id name first,
        then the score of each test set
    """
    data = []

    with open(fname, 'r') as fdata:
        for lines in fdata:
            line = lines.strip()
            if 'Engine:' in line:
                # Engine id name, e.g. "Engine: Gaviota v1.0"
                data.append(line.split(':')[1].strip())
            elif 'Score ' in line:
                # The "Score" row; "Score(%)" does not contain 'Score '
                for n in line.split()[1:]:
                    data.append(int(n))

    # Delete the last entry since this is just the overall score
    del data[-1]

    return data


def readData(f1, f2, f3):
    """ Read sts data for plotting
        f1 is from first engine, f2 is from second engine
        and f3 is from third engine.
        The first entry in each returned list is the engine id name
    """
    return (readSummary(f1), readSummary(f2), readSummary(f3))


def plotSts():
    """ plot data from sts score"""

    # Summary files to read, one per engine
    file1 = '1.txt'
    file2 = '2.txt'
    file3 = '3.txt'

    # Check that all three summary files exist
    for fname in (file1, file2, file3):
        if not os.path.isfile(fname):
            print 'file %s is missing?!' % fname
            raw_input('Press enter key to exit')
            sys.exit(1)

    e1, e2, e3 = readData(file1, file2, file3)
    
    e1name = e1[0]

    # Delete the first entry since this is just the engine id
    del e1[0]
    firstEngine = e1
    min1 = min(firstEngine)

    e2name = e2[0]
    del e2[0]
    secondEngine = e2
    min2 = min(secondEngine)

    e3name = e3[0]
    del e3[0]
    thirdEngine = e3
    min3 = min(thirdEngine)

    # Get the minimum score for plotting
    minval = min(min1, min2, min3)
    
    N = len(firstEngine)    

    ind = np.arange(N) # For x axis
    # width = 0.30       # bar width
    margin = 0.05
    width = (1.-3.*margin)/3

    fig, ax = plt.subplots(figsize=(20,9))

    rects1 = ax.bar(ind, firstEngine, width, color='crimson')
    rects2 = ax.bar(ind+width, secondEngine, width, color='green')
    rects3 = ax.bar(ind+width+width, thirdEngine, width, color='dodgerblue')

    # add some text for labels, title and axes ticks
    minScore = minval - 50
    ax.set_ylim(minScore, 900)
    
    leftMargin = 0.3
    rightMargin = 0.1
    ax.set_xlim(0-leftMargin, 15+rightMargin)

    plt.subplots_adjust(bottom=0.25)
    
    ax.set_ylabel('Score')
    ax.set_title('STS Score comparison')
    ax.set_xticks(ind+width*2)
    ax.set_xticklabels( ('STS.01 - Undermining', 'Open Files and Diagonals', 'Knight Outposts',\
                         'Square Vacancy', 'Bishop vs Knight', 'Re-Capturing',\
                         'STS.07 - Offer of Simplification', 'Advancement of f/g/h Pawns',\
                         'Advancement of a/b/c Pawns', 'Simplification', 'Activity of the King',\
                         'Center Control', 'Pawn Play in the Center', 'Queens and Rooks to the 7th rank',\
                         'STS.15 - Avoid Pointless Exchange'), ha='right', rotation=45)

    ax.legend( (rects1[0], rects2[0], rects3[0]), (e1name, e2name, e3name) )

    autoLabel(rects1, ax)
    autoLabel(rects2, ax)
    autoLabel(rects3, ax)

    # Save to disk
    fig_fname = e1name + '_' + e2name + '_' + e3name + '_' + 'stsbar.png'
    
    # Delete existing file
    if os.path.isfile(fig_fname):
        os.remove(fig_fname)
        
    plt.savefig(fig_fname, format="png", dpi=300)

    plt.show()

if __name__ == "__main__":
    plotSts()
Ferdy
Posts: 4846
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: STS test suite and engine analysis interface

Post by Ferdy »

Canoike wrote:No warning of psutil. Just this :
STS_Rating_v9 -f "STS1-STS15_LAN_v2.epd" -e "stockfish20150910.exe" --proto uci -h 128 --getrating
STS Rating v9.0

Intel(R) Core(TM) i5-3450S CPU @ 2.80GHz
Cores, physical: 1, logical: 4

Engine: stockfish20150910.exe
Hash: 128, Threads: 1, MoveTime: 1.0s
Number of positions in STS1-STS15_LAN_v2.epd: 1500

Your bench : 2.118753s
My bench : 2.553400s
Analysis Time to get CCRL 40/4 rating estimate : 166ms
Starting engine stockfish20150910.exe ...
id name: Stockfish 120915 64 POPCNT
v12 is available here; you might like to try it.
http://www.talkchess.com/forum/viewtopi ... 50&t=56653
tttony
Posts: 271
Joined: Sun Apr 24, 2011 12:33 am

Re: STS test suite and engine analysis interface

Post by tttony »

Big thanks!!! Excellent tool!

Do you have a site or blog? I recommend creating one so you can post updates there.