EARS (Engine Analysis Reliability Score)

Posted: Sat Mar 09, 2019 8:23 pm
by Ferdy
It is the absolute difference between the scores an engine gives to two related positions.

Given the first position

[d]r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - -
We let an engine analyze it for 10 s (or any other fixed time), then save the score and the pv.

position fen r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - - 0 1
go movetime 10000

Analysis result:
info depth 15 seldepth 29 score cp -104 time 6490 nodes 10283125 nps 1606738 hashfull 39 tbhits 0 pv e4a8 g5g4 d2e2 g4f3 a8f3 h8g8 c1b1 d7f6 h1e1 d8d6 h2h3 c8d7 e2e5 f6h7 c2c4 h7g5 e5d6 c7d6

So the score is -104 cp
and the pv is e4a8 g5g4 d2e2 g4f3 a8f3 h8g8
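
As a rough illustration of this step, the score and the leading pv moves can be pulled out of a UCI info line with plain string handling. The function name parse_uci_info is made up for this sketch. It only handles "score cp" and ignores mate scores and lowerbound/upperbound flags, which ears.py further down handles through python-chess.

```python
def parse_uci_info(line, pv_len=6):
    """Return (score_cp, first pv_len pv moves) from one UCI info line.

    Hypothetical helper: only 'score cp' is understood here.
    """
    tokens = line.split()

    score_cp = None
    if "cp" in tokens:
        # The centipawn value is the token right after 'cp'
        score_cp = int(tokens[tokens.index("cp") + 1])

    pv = []
    if "pv" in tokens:
        # Everything after 'pv' is the move list, keep the first pv_len moves
        pv = tokens[tokens.index("pv") + 1:][:pv_len]

    return score_cp, pv


line = ("info depth 15 seldepth 29 score cp -104 time 6490 nodes 10283125 "
        "nps 1606738 hashfull 39 tbhits 0 pv e4a8 g5g4 d2e2 g4f3 a8f3 h8g8 "
        "c1b1 d7f6 h1e1 d8d6")
score, pv = parse_uci_info(line)
print(score, pv)
```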

We take only the first 6 pv moves in this example. We make these pv moves on the board and let the engine analyze the resulting position (the 2nd position).

position fen r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - - 0 1 moves e4a8 g5g4 d2e2 g4f3 a8f3 h8g8
And we have this position.

[d]2bq1rk1/p1pnbpp1/1p2p3/8/3P4/5B2/PPP1QPPP/2KR3R w - -
Send the command
go movetime 10000
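
The two commands sent for the second position can be assembled like this. It is only a sketch of the string building; the helper name uci_commands is an assumption, and actually talking to the engine process is left out.

```python
def uci_commands(fen, pv_moves, movetime_ms):
    """Build the UCI 'position' and 'go' commands for one analysis run."""
    position = "position fen " + fen
    if pv_moves:
        # The engine replays these moves from the given FEN
        position += " moves " + " ".join(pv_moves)
    return [position, "go movetime %d" % movetime_ms]


fen = "r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - - 0 1"
pv = ["e4a8", "g5g4", "d2e2", "g4f3", "a8f3", "h8g8"]
for cmd in uci_commands(fen, pv, 10000):
    print(cmd)
```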

Analysis result:
info depth 14 seldepth 28 score cp -100 time 6592 nodes 10004920 nps 1539218 hashfull 49 tbhits 0 pv c1b1 e7d6 f3c6 d8f6 g2g3 d7b8 c6e4 c8a6 e2h5 g7g6 h5f3 d6e7 f3f6 e7f6 c2c3

So the score is -100 cp

We calculate ears like this.
ears = abs(score1 - score2) = abs(-104 - (-100)) = abs(-104 + 100) = abs(-4) = 4 cp

This means the engine's analysis is reliable in terms of score: we analyze the first position, then the second position reached after its first 6 pv moves, and the two scores differ by only 4 cp. The lower the ears, the more reliable the score the engine reported for the first position.
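
The calculation above fits in a few lines. The sketch below assumes both scores are in centipawns from the point of view of the side to move, as UCI engines report them. With an even number of pv moves the same side is to move in both positions; with an odd number the second score has to be negated first. The function name ears is used here only for illustration.

```python
def ears(score1, score2, pv_len=6):
    """Absolute score difference between the start position and the
    position reached after pv_len pv moves.

    score1, score2: centipawns from the side to move in each position.
    """
    if pv_len % 2 == 1:
        # After an odd number of plies the opponent is to move,
        # so bring score2 back to the first position's point of view
        score2 = -score2
    return abs(score1 - score2)


print(ears(-104, -100))  # the example from this post
```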

To test engine analysis, we take a set of positions and calculate the mean ears over them. The 1st set is from Kai's opening test suite and the 2nd set is positions from the Arasan EPD test suite.

First, the results using the opening suite. We use only 100 of its 200 positions.

Image

So the mean ars (analysis reliability score) values are close. Hannibal achieved the best average of 8 cp while the last engine got 19 cp. On average they are about equally good, but notice the maximum and the standard deviation. A low max and a low stddev are good.
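
To see why the maximum and the standard deviation matter next to the mean, here is a small sketch using Python's statistics module. The two lists are made-up numbers, not measurements from any engine, and the helper name summarize is hypothetical. Both lists have the same mean, but one hides a single very unreliable position.

```python
import statistics


def summarize(ars_values):
    """Summary statistics for a list of per-position ars values (cp)."""
    n = len(ars_values)
    return {
        "count": n,
        "mean": statistics.mean(ars_values),
        # stdev needs at least two data points
        "stdev": statistics.stdev(ars_values) if n > 1 else 0.0,
        "min": min(ars_values),
        "max": max(ars_values),
    }


# Made-up values for illustration only
steady = [9, 10, 11, 10]
spiky = [0, 0, 0, 40]

print(summarize(steady))
print(summarize(spiky))
```

Both lists average 10 cp, yet the second one has a much larger max and stddev, which is the kind of difference the tables are meant to expose.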

Here are the results from the Arasan test suite, again with only 100 positions and 10 s per position.

Image

Senpai tops the list with only 24 cp.

To test your favorite engine, try this Python script. You need Python 3 and python-chess v0.26.0 installed.

ears.py

# -*- coding: utf-8 -*-
"""
ears.py

Engine Analysis Reliability Score

Run engine on different positions and measure its analysis reliability score (ars).

* Analyse initial position get the first 6 pv moves after time t and
  save the score as score1
* Walk the first 6 pv moves from initial position to get new position
  and get its analysis and save the score as score2
* Get the absolute score = abs(score1 - score2) to get the ars for this position.
* Finally calculate the average ars for all positions to get the ears

Lower average ars is better.

Requirements:
    python 3
    python-chess v0.26.0
    uci engine that supports go movetime command
    uci engine that can show analysis of 6 or more pv moves
    xboard engine that supports setboard and ping and a neat pv output

"""


import os
import statistics as st
import argparse
import chess
import chess.engine
import logging


def main():
    parser = argparse.ArgumentParser(prog='Engine Analysis Reliability Score v0.1', 
                description='Let engine analyze some positions ' +
                'from a file and get its ars (analysis reliability score)',
                epilog='%(prog)s')    
    parser.add_argument("-i", "--inepd", help="input epd",
                        required=True)
    parser.add_argument("-e", "--engine", help="engine file or path",
                        required=True)
    parser.add_argument("-m", "--name", help="engine name", required=False)
    parser.add_argument("-a", "--threads", help="engine threads (default=1)",
                        default=1, type=int, required=False)
    parser.add_argument("-b", "--hash", help="engine hash in MB (default=128)",
                        default=128, type=int, required=False)
    parser.add_argument("-w", "--weight", help="weight file for NN engine",
                        required=False)
    parser.add_argument("-t", "--movetime", 
                        help="analysis movetime in ms (default=5000)",
                        default=5000, type=int, required=False)
    parser.add_argument("-n", "--numpos", 
                        help="max number of positions to analyze (default=20)",
                        default=20, type=int, required=False)
    parser.add_argument("-p", "--pvlength", 
                        help="length of pv to extend (default=6)",
                        default=6, type=int, required=False)
    parser.add_argument("-r", "--proto", help="engine protocol [uci/xboard]",
                        required=True)
    parser.add_argument("-d", "--debug", help="save logs to file",
                        action="store_true")
    
    args = parser.parse_args()  
    
    eng_path = args.engine
    epdfn = args.inepd
    mtsec = args.movetime/1000.0
    hash_value = args.hash
    thread_value = args.threads
    max_epd_cnt = args.numpos
    pv_len_max = args.pvlength
    csvfn = 'output_ears.csv'
    
    if args.proto == 'uci':
        engine = chess.engine.SimpleEngine.popen_uci(eng_path)
    else:
        engine = chess.engine.SimpleEngine.popen_xboard(eng_path)
        
    if args.name:
        eng_name = args.name
    else:
        eng_name = engine.id['name']
        
    print(eng_name)
    
    # Replace weird chars in engine name
    for r in (("\\", "_"), ("/", "_")):
        eng_name = eng_name.replace(*r)
        
    if args.debug:
        ename = '_'.join(eng_name.split())
        # Strip the file extension whatever its length
        epdname = '_'.join(os.path.splitext(epdfn)[0].split())
        logfn = epdname + '_' + ename + '_ears.log'
        
        logging.basicConfig(level=logging.DEBUG,
                        filename=logfn, filemode='w')  
    
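    # Set engine basic options; engines that lack an option are left as is
    for opt_name, opt_value in (("Hash", hash_value), ("Threads", thread_value)):
        try:
            engine.configure({opt_name: opt_value})
        except Exception:
            logging.warning('%s did not accept option %s' % (eng_name, opt_name))

    # For Lc0 weights file, if not on same dir with lc0
    if args.weight and 'lc0' in eng_name.lower():
        try:
            engine.configure({"WeightsFile": args.weight})
        except Exception:
            logging.warning('%s did not accept option WeightsFile' % (eng_name))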
    
    # Define engine search limit
    limit = chess.engine.Limit(time=mtsec)
    
    tried_epd_cnt = 0
    actual_epd_cnt = 0
    ars_list = []
    
    with open(epdfn, 'r') as f:
        for epd in f:
            if tried_epd_cnt >= max_epd_cnt:
                break

            epd = epd.strip()
            if not epd:
                # Skip blank lines, they are not valid EPD records
                continue

            actual_epd_cnt += 1
            
            logging.info('position %d' % (actual_epd_cnt))
            
            print('|| position %d ||' % (actual_epd_cnt))
            print()
    
            # (1) Analyze the current position and save the score and pv
            pos, epd_info = chess.Board.from_epd(epd)
            side1 = pos.turn
            logging.info('starting position')
            logging.info('%s' % (epd))
            print('starting position')
            print('%s' % (epd))            
            print(pos)
            
            score1 = None
            pv = None
            
            # python-chess will send uci_analyzemode and go movetime commands
            with engine.analysis(pos, limit) as analysis:
                for info in analysis:
                    # Use exact scores only, skip lowerbound/upperbound info
                    if 'score' not in info or 'lowerbound' in info \
                            or 'upperbound' in info:
                        continue

                    # Not all engines might have a matescore of 32000
                    score1 = info['score'].relative.score(mate_score=32000)
                    if info['score'].is_mate():
                        logging.info('Convert mate_score {} to score cp {}'.format(info['score'], score1))

                    # Save the pv, to be used later. If the last pv is too
                    # short, the saved pv comes from an earlier info line
                    # than the final score1.
                    if len(info.get('pv', [])) >= pv_len_max:
                        pv = info['pv']
                    
            if score1 is None:
                logging.warning('%s failed to show score1 from %s' % (eng_name, pos.epd()))
                continue
            
            if pv is None:
                logging.warning('%s failed to show required pv from %s' % (eng_name, pos.epd()))
                continue

            print('analysis score1: %d' % (score1))
            print()
            
            # Cut the pv up to 6 moves
            pvline = pv[0:pv_len_max]
            
            # (2) Update the current position by pushing the 6 pv moves.
            # Search and save the score to score2
            for move in pvline:
                pos.push(move)
                
            print()
            side2 = pos.turn
            logging.info('position after %d pv moves' % (pv_len_max))
            logging.info('%s' % (pos.epd()))
            print('position after %d pv moves' % (pv_len_max))
            print('%s' % (pos.epd()))            
            print(pos)
            
            score2 = None
            with engine.analysis(pos, limit) as analysis:
                for info in analysis:
                    # Use exact scores only, skip lowerbound/upperbound info
                    if 'score' not in info or 'lowerbound' in info \
                            or 'upperbound' in info:
                        continue

                    score2 = info['score'].relative.score(mate_score=32000)
                    if info['score'].is_mate():
                        logging.info('Convert mate_score {} to score cp {}'.format(info['score'], score2))
                    
            if score2 is None:
                logging.warning('%s failed to show score2 from %s' % (eng_name, pos.epd()))
                continue
                        
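            # Needed when pv_length is odd: the opponent is then to move in
            # the 2nd position, so flip score2 to the point of view of the
            # side to move in the starting position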
            if side1 != side2:
                score2 = -score2

            print('analysis score2: %d' % (score2))
            print()
            
            ars = abs(score1 - score2)
            logging.info('ars (Analysis Reliability Score) = abs(score1 - (score2))')
            logging.info('ars = abs(%+d - (%+d)) = %+d' % (score1, score2, ars))
            print('Analysis reliability Score (ars = abs(s1 - s2), lower is better):', ars)
            print()
            
            tried_epd_cnt += 1
            ars_list.append(ars)
            
    if not ars_list:
        # mean() and stdev() raise StatisticsError on an empty list
        print('No position was scored, nothing to report.')
        engine.quit()
        return

    mean_ars = st.mean(ars_list)
    # stdev needs at least two data points
    stdev_ars = st.stdev(ars_list) if len(ars_list) > 1 else 0.0

    print('engine           : %s' % (eng_name))
    print('movetime sec     : %0.1f' % (mtsec))
    print('test epd file    : %s' % (epdfn))
    print('cnt ars          : %d' % (len(ars_list)))
    print('mean ars cp      : %0.2f' % (mean_ars))

    # Save to csv file
    if not os.path.isfile(csvfn):
        # Write header
        with open(csvfn, 'a') as f:
            f.write('%s,%s,%s,%s,%s,%s,%s,%s,%s\n' % ('engine', 'tsec/pos',
                    'numpos', 'mean_ars', 'stdev_ars', 'min_ars',
                    'max_ars', 'pv_length', 'epd file'))

    # Write data
    with open(csvfn, 'a') as f:
        f.write('%s,%0.1f,%d,%d,%d,%d,%d,%d,%s\n' % (eng_name, mtsec,
                len(ars_list), mean_ars,
                stdev_ars, min(ars_list),
                max(ars_list), pv_len_max, epdfn))
    
    engine.quit()


if __name__ == "__main__":
    main()

To show help, type:
python ears.py -h

Let engine analyze some positions from a file and get its ars (analysis
reliability score)

optional arguments:
  -h, --help            show this help message and exit
  -i INEPD, --inepd INEPD
                        input epd
  -e ENGINE, --engine ENGINE
                        engine file or path
  -m NAME, --name NAME  engine name
  -a THREADS, --threads THREADS
                        engine threads (default=1)
  -b HASH, --hash HASH  engine hash in MB (default=128)
  -w WEIGHT, --weight WEIGHT
                        weight file for NN engine
  -t MOVETIME, --movetime MOVETIME
                        analysis movetime in ms (default=5000)
  -n NUMPOS, --numpos NUMPOS
                        max number of positions to analyze (default=20)
  -p PVLENGTH, --pvlength PVLENGTH
                        length of pv to extend (default=6)
  -r PROTO, --proto PROTO
                        engine protocol [uci/xboard]
  -d, --debug           save logs to file

Engine Analysis Reliability Score v0.1

Sample command line for analysis:

python ears.py --inepd "arasan.epd" --engine "C:\chess\engines\stockfish\stockfish_10.exe" --proto uci --threads 1 --hash 256 --movetime 10000 --numpos 100 --pvlength 6 --debug

Analysis results will be saved in output_ears.csv (append mode).
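
Once several engines have been run, the csv can be ranked with a short script along these lines. This is only a sketch: the function name rank_by_mean_ars is made up, the sample rows are placeholders and not real results, and it assumes engine names and file names contain no commas, since ears.py writes the rows without quoting.

```python
import csv
import io


def rank_by_mean_ars(csv_file):
    """Return (engine, mean_ars, max_ars) tuples, lowest mean first."""
    rows = list(csv.DictReader(csv_file))
    # Break ties on max_ars, since a low maximum is also desirable
    rows.sort(key=lambda r: (float(r["mean_ars"]), float(r["max_ars"])))
    return [(r["engine"], float(r["mean_ars"]), float(r["max_ars"]))
            for r in rows]


# Made-up rows for illustration only, not measured results
sample = io.StringIO(
    "engine,tsec/pos,numpos,mean_ars,stdev_ars,min_ars,max_ars,pv_length,epd file\n"
    "EngineA,10.0,100,15,20,0,90,6,test.epd\n"
    "EngineB,10.0,100,9,12,0,60,6,test.epd\n"
)
for engine, mean_ars, max_ars in rank_by_mean_ars(sample):
    print(engine, mean_ars, max_ars)
```

For the real file, open it with open('output_ears.csv', newline='') and pass the file object to the function. Rows from different epd files or movetimes end up in the same csv, so compare only rows with matching settings.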