Take the first position:
[d]r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - -
We let an engine analyze it for 10 s (any fixed time works) and save the score and the pv.
position fen r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - - 0 1
go movetime 10000
Analysis result:
info depth 15 seldepth 29 score cp -104 time 6490 nodes 10283125 nps 1606738 hashfull 39 tbhits 0 pv e4a8 g5g4 d2e2 g4f3 a8f3 h8g8 c1b1 d7f6 h1e1 d8d6 h2h3 c8d7 e2e5 f6h7 c2c4 h7g5 e5d6 c7d6
So the score is -104 cp
and the pv is e4a8 g5g4 d2e2 g4f3 a8f3 h8g8
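Pulling the score and the leading pv moves out of an info line like the one above can be sketched as follows. This is only an illustration: `parse_info_line` is a hypothetical helper, not part of ears.py, and it assumes a plain `score cp` line (mate scores and bound flags are not handled).

```python
def parse_info_line(line, pv_len=6):
    """Return (score_cp, first pv_len pv moves) from a UCI 'info' line."""
    tokens = line.split()
    score_cp = None
    pv = []
    # 'score cp <n>' carries a centipawn score; anything else is ignored here.
    if "score" in tokens:
        i = tokens.index("score")
        if i + 2 < len(tokens) and tokens[i + 1] == "cp":
            score_cp = int(tokens[i + 2])
    # Everything after the 'pv' token is the principal variation.
    if "pv" in tokens:
        pv = tokens[tokens.index("pv") + 1:]
    return score_cp, pv[:pv_len]


line = ("info depth 15 seldepth 29 score cp -104 time 6490 nodes 10283125 "
        "nps 1606738 hashfull 39 tbhits 0 pv e4a8 g5g4 d2e2 g4f3 a8f3 h8g8 "
        "c1b1 d7f6")
score, pv = parse_info_line(line)
print(score, pv)
```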
We only take the first 6 pv moves in this example. We then play these pv moves on the board and let the engine analyze the resulting position (the 2nd position).
position fen r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - - 0 1 moves e4a8 g5g4 d2e2 g4f3 a8f3 h8g8
And we have this position.
[d]2bq1rk1/p1pnbpp1/1p2p3/8/3P4/5B2/PPP1QPPP/2KR3R w - -
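The position command used to reach this second position is just the original FEN followed by the truncated pv. A minimal sketch of assembling it, where `position_command` is a hypothetical helper name and not something ears.py defines:

```python
def position_command(fen, moves):
    """Build a UCI 'position' command from a FEN and a list of UCI moves."""
    cmd = "position fen " + fen
    if moves:
        cmd += " moves " + " ".join(moves)
    return cmd


fen = "r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - - 0 1"
pv6 = ["e4a8", "g5g4", "d2e2", "g4f3", "a8f3", "h8g8"]
cmd = position_command(fen, pv6)
print(cmd)
```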
Send the command
go movetime 10000
Analysis result:
info depth 14 seldepth 28 score cp -100 time 6592 nodes 10004920 nps 1539218 hashfull 49 tbhits 0 pv c1b1 e7d6 f3c6 d8f6 g2g3 d7b8 c6e4 c8a6 e2h5 g7g6 h5f3 d6e7 f3f6 e7f6 c2c3
So the score is -100 cp
We calculate ears like this.
ears = abs(score1 - score2) = abs(-104 - (-100)) = abs(-104 + 100) = abs(-4) = 4 cp
This means the engine's analysis is reliable in terms of score: we analyzed the first position, then the second position reached after its first 6 pv moves, and the scores differ by only 4 cp. The lower the ears, the more reliable the score the engine reported for the first position.
To test engine analysis we take a set of positions and calculate the mean ears over them. The 1st set is from Kai's opening test suite and the 2nd set is positions from the Arasan epd.
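The per-position calculation can be sketched as a small function. The sign flip for an odd number of pv moves mirrors the side-to-move check in the script further down; the function name `ars` and the second call's numbers are only for illustration.

```python
def ars(score1, score2, pv_len):
    """Absolute score difference, both scores taken from the first side's view."""
    # After an odd number of plies the side to move has changed, so score2
    # is from the opponent's point of view and is negated before comparing.
    if pv_len % 2 == 1:
        score2 = -score2
    return abs(score1 - score2)


print(ars(-104, -100, 6))  # the example above
print(ars(-104, 100, 5))   # same difference, score2 reported by the other side
```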
First, the results using the opening suite; we use only 100 of its 200 positions.
So the mean ars (analysis reliability score) values are close. Hannibal achieved the best average of 8 cp while the last engine got 19 cp. By average they are about equally good, but notice the maximum and the standard deviation. A low max and a low stddev are good.
Here are the results from the Arasan test suite, also 100 positions only and 10 s/pos.
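To see why the maximum and the standard deviation matter even when the means agree, here is a small sketch with two made-up lists of ars values (hypothetical numbers, not taken from the results above):

```python
import statistics as st

# Hypothetical per-position ars values in cp; both lists have the same mean.
steady = [8, 9, 10, 11, 12]
spiky = [2, 4, 6, 8, 30]

for name, values in (("steady", steady), ("spiky", spiky)):
    print("%-6s mean=%.1f max=%d stdev=%.2f"
          % (name, st.mean(values), max(values), st.stdev(values)))
```

Both lists average 10 cp, but the second one hides a 30 cp outlier, which shows up in its max and its larger stdev.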
Senpai tops the list with only 24 cp.
To test your favorite engine, try this Python script. You need Python 3 and python-chess v0.26.0 installed.
ears.py
# -*- coding: utf-8 -*-
"""
ears.py

Engine Analysis Reliability Score

Run engine on different positions and measure its analysis reliability score (ars).

* Analyse initial position, get the first 6 pv moves after time t and
  save the score as score1
* Walk the first 6 pv moves from initial position to get new position
  and get its analysis and save the score as score2
* Get the absolute score = abs(score1 - score2) to get the ars for this position.
* Finally calculate the average ars for all positions to get the ears

Lower average ars is better.

Requirements:
    python 3
    python-chess v0.26.0
    uci engine that supports go movetime command
    uci engine that can show analysis of 6 or more pv moves
    xboard engine that supports setboard and ping and a neat pv output
"""


import os
import statistics as st
import argparse
import chess
import chess.engine
import logging


def main():
    parser = argparse.ArgumentParser(prog='Engine Analysis Reliability Score v0.1',
                description='Let engine analyze some positions ' +
                'from a file and get its ars (analysis reliability score)',
                epilog='%(prog)s')
    parser.add_argument("-i", "--inepd", help="input epd",
                        required=True)
    parser.add_argument("-e", "--engine", help="engine file or path",
                        required=True)
    parser.add_argument("-m", "--name", help="engine name", required=False)
    parser.add_argument("-a", "--threads", help="engine threads (default=1)",
                        default=1, type=int, required=False)
    parser.add_argument("-b", "--hash", help="engine hash in MB (default=128)",
                        default=128, type=int, required=False)
    parser.add_argument("-w", "--weight", help="weight file for NN engine",
                        required=False)
    parser.add_argument("-t", "--movetime",
                        help="analysis movetime in ms (default=5000)",
                        default=5000, type=int, required=False)
    parser.add_argument("-n", "--numpos",
                        help="max number of positions to analyze (default=20)",
                        default=20, type=int, required=False)
    parser.add_argument("-p", "--pvlength",
                        help="length of pv to extend (default=6)",
                        default=6, type=int, required=False)
    parser.add_argument("-r", "--proto", help="engine protocol [uci/xboard]",
                        required=True)
    parser.add_argument("-d", "--debug", help="save logs to file",
                        action="store_true")

    args = parser.parse_args()

    eng_path = args.engine
    epdfn = args.inepd
    mtsec = args.movetime/1000.0
    hash_value = args.hash
    thread_value = args.threads
    max_epd_cnt = args.numpos
    pv_len_max = args.pvlength
    csvfn = 'output_ears.csv'

    if args.proto == 'uci':
        engine = chess.engine.SimpleEngine.popen_uci(eng_path)
    else:
        engine = chess.engine.SimpleEngine.popen_xboard(eng_path)

    if args.name:
        eng_name = args.name
    else:
        eng_name = engine.id['name']
    print(eng_name)

    # Replace weird chars in engine name
    for r in (("\\", "_"), ("/", "_")):
        eng_name = eng_name.replace(*r)

    if args.debug:
        ename = '_'.join(eng_name.split())
        epdname = '_'.join(epdfn[0:-4].split())
        logfn = epdname + '_' + ename + '_ears.log'
        logging.basicConfig(level=logging.DEBUG,
                            filename=logfn, filemode='w')

    # Set engine basic options, ignore engines that do not support them
    try:
        engine.configure({"Hash": hash_value})
    except Exception:
        pass

    try:
        engine.configure({"Threads": thread_value})
    except Exception:
        pass

    # For Lc0 weights file, if not on same dir with lc0
    try:
        if 'lc0' in eng_name.lower():
            engine.configure({"WeightsFile": args.weight})
    except Exception:
        pass

    # Define engine search limit
    limit = chess.engine.Limit(time=mtsec)

    tried_epd_cnt = 0
    actual_epd_cnt = 0
    ars_list = []

    with open(epdfn, 'r') as f:
        for epd in f:
            actual_epd_cnt += 1
            if tried_epd_cnt >= max_epd_cnt:
                break

            epd = epd.strip()
            # Skip blank lines, from_epd() cannot parse them
            if not epd:
                continue

            logging.info('position %d' % (actual_epd_cnt))
            print('|| position %d ||' % (actual_epd_cnt))
            print()

            # (1) Analyze the current position and save the score and pv
            pos, epd_info = chess.Board.from_epd(epd)
            side1 = pos.turn

            logging.info('starting position')
            logging.info('%s' % (epd))
            print('starting position')
            print('%s' % (epd))
            print(pos)

            score1 = None
            pv = None

            # python-chess will send uci_analyzemode and go movetime commands
            with engine.analysis(pos, limit) as analysis:
                for info in analysis:
                    try:
                        if 'score' in info and 'lowerbound' not in info \
                                and 'upperbound' not in info:
                            if info['score'].is_mate():
                                # Not all engines might have a matescore of 32000
                                score1 = info['score'].relative.score(mate_score=32000)
                                logging.info('Convert mate_score {} to score cp {}'.format(info['score'], score1))
                            else:
                                score1 = info['score'].relative.score()
                    except Exception:
                        pass

                    # Save the pv, to be used later
                    try:
                        if 'score' in info and 'lowerbound' not in info \
                                and 'upperbound' not in info and 'pv' in info \
                                and len(info['pv']) >= pv_len_max:
                            pv = info['pv']
                    except Exception:
                        pass

            if score1 is None:
                logging.warning('%s failed to show score1 from %s' % (eng_name, pos.epd()))
                continue

            if pv is None:
                logging.warning('%s failed to show required pv from %s' % (eng_name, pos.epd()))
                continue

            print('analysis score1: %d' % (score1))
            print()

            # Cut the pv up to 6 moves
            pvline = pv[0:pv_len_max]

            # (2) Update the current position by pushing the 6 pv moves.
            # Search and save the score to score2
            for move in pvline:
                pos.push(move)
            print()

            side2 = pos.turn

            logging.info('position after %d pv moves' % (pv_len_max))
            logging.info('%s' % (pos.epd()))
            print('position after %d pv moves' % (pv_len_max))
            print('%s' % (pos.epd()))
            print(pos)

            score2 = None

            with engine.analysis(pos, limit) as analysis:
                for info in analysis:
                    try:
                        if 'score' in info and 'lowerbound' not in info \
                                and 'upperbound' not in info:
                            if info['score'].is_mate():
                                score2 = info['score'].relative.score(mate_score=32000)
                                logging.info('Convert mate_score {} to score cp {}'.format(info['score'], score2))
                            else:
                                score2 = info['score'].relative.score()
                    except Exception:
                        pass

            if score2 is None:
                logging.warning('%s failed to show score2 from %s' % (eng_name, pos.epd()))
                continue

            # Only matters when the pv length is odd: score2 is then from
            # the other side's point of view and has to be negated
            if side1 != side2:
                score2 = -score2

            print('analysis score2: %d' % (score2))
            print()

            ars = abs(score1 - score2)
            logging.info('ars (Analysis Reliability Score) = abs(score1 - (score2))')
            logging.info('ars = abs(%+d - (%+d)) = %+d' % (score1, score2, ars))
            print('Analysis reliability Score (ars = abs(s1 - s2), lower is better):', ars)
            print()

            tried_epd_cnt += 1
            ars_list.append(ars)

    # Nothing to summarize if every position failed
    if not ars_list:
        print('no position was successfully analyzed')
        engine.quit()
        return

    # stdev needs at least two data points
    stdev_ars = st.stdev(ars_list) if len(ars_list) > 1 else 0.0

    print('engine        : %s' % (eng_name))
    print('movetime sec  : %0.1f' % (mtsec))
    print('test epd file : %s' % (epdfn))
    print('cnt ars       : %d' % (len(ars_list)))
    print('mean ars cp   : %0.2f' % (st.mean(ars_list)))

    # Save to csv file
    if not os.path.isfile(csvfn):
        # Write header
        with open(csvfn, 'a') as f:
            f.write('%s,%s,%s,%s,%s,%s,%s,%s,%s\n' % ('engine', 'tsec/pos',
                    'numpos', 'mean_ars', 'stdev_ars', 'min_ars',
                    'max_ars', 'pv_length', 'epd file'))

    # Write data
    with open(csvfn, 'a') as f:
        f.write('%s,%0.1f,%d,%d,%d,%d,%d,%d,%s\n' % (eng_name, mtsec,
                len(ars_list), st.mean(ars_list),
                stdev_ars, min(ars_list),
                max(ars_list), pv_len_max, epdfn))

    engine.quit()


if __name__ == "__main__":
    main()
python ears.py -h
Let engine analyze some positions from a file and get its ars (analysis
reliability score)
optional arguments:
-h, --help show this help message and exit
-i INEPD, --inepd INEPD
input epd
-e ENGINE, --engine ENGINE
engine file or path
-m NAME, --name NAME engine name
-a THREADS, --threads THREADS
engine threads (default=1)
-b HASH, --hash HASH engine hash in MB (default=128)
-w WEIGHT, --weight WEIGHT
weight file for NN engine
-t MOVETIME, --movetime MOVETIME
analysis movetime in ms (default=5000)
-n NUMPOS, --numpos NUMPOS
max number of positions to analyze (default=20)
-p PVLENGTH, --pvlength PVLENGTH
length of pv to extend (default=6)
-r PROTO, --proto PROTO
engine protocol [uci/xboard]
-d, --debug save logs to file
Engine Analysis Reliability Score v0.1
python ears.py --inepd "arasan.epd" --engine "C:\chess\engines\stockfish\stockfish_10.exe" --proto uci --threads 1 --hash 256 --movetime 10000 --numpos 100 --pvlength 6 --debug
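Each run appends one row to output_ears.csv, so several engines can be compared afterwards. A minimal sketch of ranking them by mean ars (ties broken by max ars) is shown here. The header matches what the script writes, but the two data rows and the engine names `engine_a` and `engine_b` are made up for illustration, and an in-memory string stands in for the real file.

```python
import csv
import io

# Hypothetical contents of output_ears.csv: real header, invented data rows.
sample = io.StringIO(
    "engine,tsec/pos,numpos,mean_ars,stdev_ars,min_ars,max_ars,pv_length,epd file\n"
    "engine_a,10.0,100,12,15,0,88,6,arasan.epd\n"
    "engine_b,10.0,100,9,11,0,60,6,arasan.epd\n"
)

rows = list(csv.DictReader(sample))
# Lower mean ars is better; use max ars as the tie-breaker.
ranking = sorted(rows, key=lambda r: (float(r["mean_ars"]), float(r["max_ars"])))
for r in ranking:
    print(r["engine"], r["mean_ars"], r["max_ars"])
```

To use it on real data, replace `sample` with `open('output_ears.csv', newline='')`.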