Best EPD Testing Software


Dann Corbit
Posts: 12550
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Best EPD Testing Software

Post by Dann Corbit »

can00336 wrote:
Dann Corbit wrote: Use Arena.
You can run an EPD test suite with that.
I think that ChessGui can do it also, but I did not try it myself.
I have tried three different versions of Arena. They all stop communicating with the engine at some random point during long runs.

I haven't tried ChessGui, but I would prefer a command-line interface if possible.
Thanks for the suggestion!
I have seen the same thing (loss of connection to the output file).
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
BBauer
Posts: 658
Joined: Wed Mar 08, 2006 8:58 pm

Re: Best EPD Testing Software

Post by BBauer »

You may do this:

Code: Select all

  polyglot.exe  epd-test -min-time 0.1 -max-time 0.50 -min-depth 12 -max-depth 127 -depth-delta 5 -epd G:\Testsets\EET_1.epd 
Kind regards
Bernhard
Ferdy
Posts: 4840
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Best EPD Testing Software

Post by Ferdy »

The tool generates 5 output files.

1. AET_result_summary.txt
Append mode: every engine test is recorded here. It accepts any normal EPD with am or bm opcodes.

Code: Select all

                                  Engine  Hash(mb)  Thre  Time/pos(s)         TotalTime  Positions  Correct     %  TestFile  
              Stockfish 7Beta1 64 POPCNT       128     1       0.200  00h:00m:50s:448ms        250       40  16.0  arasan18.epd
                  Houdini 4 x64 Tactical       128     1       0.200  00h:00m:50s:000ms        250       25  10.0  arasan18.epd
                           Houdini 4 x64       128     1       0.200  00h:00m:50s:000ms        250       15   6.0  arasan18.epd
      Deuterium v2015.1.35.321 offensive       128     1       0.200  00h:00m:50s:699ms        250       68  27.2  arasan18.epd
                Deuterium v2015.1.35.321       128     1       0.200  00h:00m:50s:699ms        250       11   4.4  arasan18.epd
                Deuterium v2015.1.35.321       128     1       0.200  00h:00m:20s:280ms        100       32  32.0  sts15.epd 
      Deuterium v2015.1.35.321 offensive       128     1       0.200  00h:00m:20s:279ms        100       47  47.0  sts15.epd 
                             Arasan 18.2       128     1       0.200  00h:00m:50s:700ms        250       10   4.0  arasan18.epd
                   Hakkapeliitta 3.0 x64       128     1       0.200  00h:00m:50s:407ms        250       30  12.0  arasan18.epd
                              Fire 4 x64       128     1       0.200  00h:00m:29s:454ms        250       21   8.4  arasan18.epd
2. <engine>_log.txt
Overwrite mode: this is for manual inspection of the engine analysis and solutions.

Code: Select all

Starting engine stockfish_15122720_x64_modern.exe ...
>> uci
<< Stockfish 7Beta1 64 POPCNT by T. Romstad, M. Costalba, J. Kiiski, G. Linscott
<< id name Stockfish 7Beta1 64 POPCNT
<< id author T. Romstad, M. Costalba, J. Kiiski, G. Linscott
<< 
<< option name Write Debug Log type check default false
<< option name Contempt type spin default 0 min -100 max 100
<< option name Threads type spin default 1 min 1 max 128
<< option name Hash type spin default 16 min 1 max 1048576
<< option name Clear Hash type button
<< option name Ponder type check default false
<< option name MultiPV type spin default 1 min 1 max 500
<< option name Skill Level type spin default 20 min 0 max 20
<< option name Move Overhead type spin default 30 min 0 max 5000
<< option name Minimum Thinking Time type spin default 20 min 0 max 5000
<< option name Slow Mover type spin default 84 min 10 max 1000
<< option name nodestime type spin default 0 min 0 max 10000
<< option name UCI_Chess960 type check default false
<< option name SyzygyPath type string default <empty>
<< option name SyzygyProbeDepth type spin default 1 min 1 max 100
<< option name Syzygy50MoveRule type check default true
<< option name SyzygyProbeLimit type spin default 6 min 0 max 6
<< uciok
>> setoption name Hash value 128
>> setoption name Threads value 1

Pos 1
r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - - bm g4; id "arasan18.1"; c0 "J. Polgar-Berkes, Budapest Hunguest Hotels 2003";

2015-12-30T05:58:48.253000 >> isready
2015-12-30T05:58:48.253000 << readyok
2015-12-30T05:58:48.253000 >> ucinewgame
2015-12-30T05:58:48.253000 >> position fen r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - - 0 1
2015-12-30T05:58:48.253000 >> go movetime 200
2015-12-30T05:58:48.284000 << info depth 1 seldepth 1 multipv 1 score cp 120 nodes 41 nps 41000 tbhits 0 time 1 pv e4a8
2015-12-30T05:58:48.284000 << info depth 2 seldepth 2 multipv 1 score cp 68 nodes 90 nps 90000 tbhits 0 time 1 pv e4a8 g5g4
2015-12-30T05:58:48.284000 << info depth 3 seldepth 3 multipv 1 score cp -125 nodes 160 nps 160000 tbhits 0 time 1 pv e4a8 g5g4 c2c3 g4f3
2015-12-30T05:58:48.284000 << info depth 4 seldepth 4 multipv 1 score cp 49 nodes 351 nps 351000 tbhits 0 time 1 pv g2g4 d7f6 e4a8 f6g4
2015-12-30T05:58:48.284000 << info depth 5 seldepth 5 multipv 1 score cp 73 nodes 619 nps 309500 tbhits 0 time 2 pv e4a8 c8a6 a8c6 g5g4 f3e1
2015-12-30T05:58:48.284000 << info depth 6 seldepth 7 multipv 1 score cp 123 nodes 1178 nps 589000 tbhits 0 time 2 pv e4a8 e7f6 a8c6 g5g4 f3e1 a7a6
2015-12-30T05:58:48.284000 << info depth 7 seldepth 8 multipv 1 score cp -129 nodes 3152 nps 1050666 tbhits 0 time 3 pv h2h4 g5g4 e4a8 g4f3 a8f3 c8a6 g2g3
2015-12-30T05:58:48.284000 << info depth 8 seldepth 8 multipv 1 score cp -112 nodes 3993 nps 998250 tbhits 0 time 4 pv h2h4 g5g4 e4a8 c8a6 a8c6 g4f3 c6f3 e7h4
2015-12-30T05:58:48.284000 << info depth 9 seldepth 12 multipv 1 score cp -92 nodes 6562 nps 1312400 tbhits 0 time 5 pv e4a8 g5g4 d2e2 g4f3 a8f3 d7f6 c1b1 a7a6 h2h4
2015-12-30T05:58:48.299000 << info depth 10 seldepth 13 multipv 1 score cp -107 nodes 13477 nps 1497444 tbhits 0 time 9 pv e4a8 g5g4 d2e2 g4f3 a8f3 a7a5 c1b1 d7f6 h2h4 h8g8 g2g3 c8d7
2015-12-30T05:58:48.299000 << info depth 11 seldepth 16 multipv 1 score cp -106 nodes 21058 nps 1504142 tbhits 0 time 14 pv e4a8 g5g4 d2e2 g4f3 a8f3 d7f6 c1b1 h8g8 h2h4 c8d7 g2g4 f6d5 g4g5
2015-12-30T05:58:48.299000 << info depth 12 seldepth 18 multipv 1 score cp -99 nodes 31810 nps 1590500 tbhits 0 time 20 pv e4a8 g5g4 d2e2 g4f3 a8f3 d7f6 c1b1 c8d7 h2h4 a7a5 g2g4 f6d5 f3d5 e6d5
2015-12-30T05:58:48.315000 << info depth 13 seldepth 19 multipv 1 score cp -89 nodes 56711 nps 1667970 tbhits 0 time 34 pv e4a8 g5g4 d2e2 g4f3 a8f3 d7f6 c1b1 c8d7 h2h4 a7a5 g2g4 f6d5 g4g5 h8g8 a2a3
2015-12-30T05:58:48.377000 << info depth 14 seldepth 20 multipv 1 score cp -91 nodes 154867 nps 1683336 tbhits 0 time 92 pv e4a8 g5g4 d2e2 g4f3 a8f3 d7f6 c1b1 c8d7 h2h4 f6d5 g2g3 d7a4 a2a3 a7a5 h4h5 e7g5 h5h6 g5h6
2015-12-30T05:58:48.487000 << info nodes 346834 time 202
2015-12-30T05:58:48.487000 << bestmove e4a8 ponder g5g4
epd bm: g4
Engine bestmove in san: Bxa8
Engine bestmove does not match the epd bm??
Total position to be evaluated: 250
Correct: 0, Evaluated: 1, CorrectRate: 0.0%
3. <engine>_test_details.txt
Overwrite mode: details on the test conditions, the engine's score (cp/mate) for each position, and the elapsed-time calculations.

Code: Select all

AET - Arasan EPD Tester v1.0

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Physical Cores: 4, Hyper-Threading: ON
Physical Memory: Total = 12 GB, Available = 8 GB

Engine: Stockfish 7Beta1 64 POPCNT
Hash: 128, Threads: 1, Time: 0.2s/pos

Test file: arasan18.epd, TotalPos 250
AnalyzedPos : 250, Correct: 40 (16.00%)

Total time as reported by engine    : 00h:00m:50s:448ms
Expected time based on time/pos     : 00h:00m:50s:000ms
Engine start/quit wall time elapsed : 00h:00m:54s:509ms

   Pos  Correct  EngineBM  ScoreCP    Mate    EPD
     1        0      Bxa8      -91       -    r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - - bm g4; id "arasan18.1"; c0 "J. Polgar-Berkes, Budapest Hunguest Hotels 2003";
     2        0       Bb3      -21       -    r1b2rk1/1p1nbppp/pq1p4/3B4/P2NP3/2N1p3/1PP3PP/R2Q1R1K w - - bm Rxf7; id "arasan18.2"; c0 "Van der Wiel-Ribli, IBM Amsterdam 1980";
     3        1        g5     +136       -    r1br2k1/pp2qpp1/1b2p2p/3nB3/6P1/3B4/PPPNQP1P/1K1R3R w - - bm g5; id "arasan18.3"; c0 "Victorious (Stockfish 191013SL)-AKIM(Houdini 3 Pro), playchess.com 2013";
4. <engine>_not_solved.epd
5. <engine>_solved.epd
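The figures in the summary and details files follow simple arithmetic: the expected time is positions × time/pos (250 × 200 ms = 50 s), and the % column is Correct / Positions × 100 (40 / 250 = 16.0). Below is a minimal Python sketch of those calculations. The function names are hypothetical, and this is not taken from the AET script, which has not been released yet.

```python
from __future__ import annotations


def format_duration(ms: int) -> str:
    """Format milliseconds in the report style, e.g. 00h:00m:50s:448ms."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}h:{minutes:02d}m:{seconds:02d}s:{millis:03d}ms"


def expected_time_ms(positions: int, movetime_ms: int) -> int:
    """Expected total search time if every position uses its full movetime."""
    return positions * movetime_ms


def correct_rate(correct: int, analyzed: int) -> float:
    """Percentage of analyzed positions that were solved."""
    if analyzed == 0:
        return 0.0
    return 100.0 * correct / analyzed


if __name__ == "__main__":
    print(format_duration(expected_time_ms(250, 200)))  # 00h:00m:50s:000ms
    print(f"{correct_rate(40, 250):.2f}%")              # 16.00%
```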

Command line options:

Code: Select all

ArasanEpdTester_v1 -f "arasan18.epd" -e "stockfish_15122720_x64_modern.exe" --movetime 200 --log --option "Hash value 128, Threads value 1"
That movetime is in milliseconds.
The --log flag enables the engine log only; the solved, unsolved and other files are always written.
The --option argument has the format:

Code: Select all

--option "<option name> value <option value>"
That is for a single option only. Note the double quotes.
For two or more options, separate them with commas:

Code: Select all

ArasanEpdTester_v1 -f "arasan18.epd" -e "stockfish_15122720_x64_modern.exe" --movetime 200 --log --option "Hash value 128, Threads value 1"
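As the engine log shows, "Hash value 128, Threads value 1" becomes one UCI setoption command per entry. Here is a minimal Python sketch of that mapping, assuming entries are split on commas and the name/value pair is split on the last " value ". The function name is hypothetical and this is not the AET source. Splitting at the last " value " lets option names contain spaces, such as "Tactical Mode". An option value that itself contains a comma would break this simple format.

```python
from __future__ import annotations


def option_arg_to_setoption(option_arg: str) -> list[str]:
    """Turn an --option string like "Hash value 128, Threads value 1"
    into one UCI setoption command per comma-separated entry."""
    commands = []
    for entry in option_arg.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        # Split at the last " value " so names may contain spaces.
        name, sep, value = entry.rpartition(" value ")
        if not sep or not name.strip():
            raise ValueError(f"expected '<name> value <value>', got {entry!r}")
        commands.append(f"setoption name {name.strip()} value {value.strip()}")
    return commands


if __name__ == "__main__":
    for cmd in option_arg_to_setoption("Hash value 128, Threads value 1"):
        print(cmd)  # setoption name Hash value 128 / setoption name Threads value 1
```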
There is also an option --name <engine name> to label a run that uses customized engine options, for example:

Code: Select all

ArasanEpdTester_v1 -f "arasan18.epd" -e "H4.exe" --movetime 200 --log --option "Hash value 128, Threads value 1, Tactical Mode value true" --name "Houdini 4 x64 Tactical"
That name is displayed in the summary and is also used for <name>_log.txt and the other output files.

Limitation:
1. This tool cannot handle an EPD line that contains both am and bm. It is intended only for EPDs that use either am throughout or bm throughout. EPD lines without bm or am are skipped.
2. It does not interrupt the engine search, even when the solution is found early. It assumes that the UCI engine obeys the command
go movetime <time in millisec>
and stops searching once it reaches that time limit.
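Because of limitation 2, the tester relies entirely on go movetime. The per-position command sequence visible in the engine log (isready, ucinewgame, position fen ... 0 1, go movetime 200) can be reproduced with the small Python sketch below. The function names are hypothetical and this is not the AET source. Converting an EPD line to a FEN here means keeping its four position fields and appending "0 1", as the log shows.

```python
from __future__ import annotations


def epd_to_fen(epd_line: str) -> str:
    """An EPD line starts with the four FEN position fields; append the
    halfmove clock and fullmove number (0 1) to get a full FEN."""
    fields = epd_line.split()
    if len(fields) < 4:
        raise ValueError("EPD line needs at least four fields")
    return " ".join(fields[:4]) + " 0 1"


def uci_commands_for_position(epd_line: str, movetime_ms: int) -> list[str]:
    """Commands sent for one test position, in the order seen in the log."""
    return [
        "isready",  # the driver waits for readyok before continuing
        "ucinewgame",
        f"position fen {epd_to_fen(epd_line)}",
        f"go movetime {movetime_ms}",  # the engine is trusted to stop on time
    ]
```

A more defensive driver could send the standard UCI stop command if no bestmove arrives shortly after the time limit.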

This was tested only on Windows 7.

Please test it; you may find errors, especially in the engine log output. The next version will write its output in CSV format for viewing in a spreadsheet app.

This tool uses the python-chess library to convert the engine's UCI move to a SAN move. I can then compare it with the am or bm moves in the EPD to determine whether the move matches. The script was converted to an exe with py2exe.
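The comparison step can be sketched in plain Python, assuming the engine move has already been converted to SAN (python-chess provides Board.san() for that). This is a hypothetical illustration, not the tool's code. It reads the EPD opcodes after the four position fields, treating semicolons inside quoted strings as part of the string, and then checks the SAN against bm or am. Check and annotation marks are ignored. Like the tool, it does not try to combine am and bm; if both are present, bm is checked first.

```python
from __future__ import annotations


def parse_epd_ops(epd_line: str) -> dict[str, list[str]]:
    """Return EPD opcodes (bm, am, id, c0, ...) after the position fields."""
    fields = epd_line.strip().split(None, 4)
    ops_text = fields[4] if len(fields) > 4 else ""
    ops: dict[str, list[str]] = {}
    current: list[str] = []
    in_quotes = False
    for ch in ops_text + ";":  # trailing ';' flushes the last operation
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            op = "".join(current).strip()
            current = []
            if op:
                name, _, operands = op.partition(" ")
                operands = operands.strip()
                if operands.startswith('"') and operands.endswith('"'):
                    ops[name] = [operands[1:-1]]  # one quoted string operand
                else:
                    ops[name] = operands.split()  # e.g. several bm moves
        else:
            current.append(ch)
    return ops


def is_solved(engine_san: str, ops: dict[str, list[str]]) -> bool | None:
    """True/False for a bm or am position; None if it has neither (skipped)."""
    def norm(san: str) -> str:
        return san.rstrip("+#!?")  # ignore check/mate/annotation marks

    move = norm(engine_san)
    if "bm" in ops:
        return move in {norm(m) for m in ops["bm"]}
    if "am" in ops:
        return move not in {norm(m) for m in ops["am"]}
    return None
```

With position 1 of arasan18.epd, the engine's Bxa8 against bm g4 gives False, as in the log.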

I will release the messy script :) once this tool is usable and stable.

Download the exe file.
https://app.box.com/s/fm0pv5s9gfvymnek2loaggp7mjegeeg5

Download the sample batch file.
https://app.box.com/s/ses501locnbgaaytdjr6qjbua3g4q8tz
jdart
Posts: 4368
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Best EPD Testing Software

Post by jdart »

Can polyglot output multi-pv solutions for a test suite?

--Jon
Ferdy
Posts: 4840
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Best EPD Testing Software

Post by Ferdy »

jdart wrote: Can polyglot output multi-pv solutions for a test suite?

--Jon
I can't understand that question. Can you describe a sample situation?
Dann Corbit
Posts: 12550
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Best EPD Testing Software

Post by Dann Corbit »

Ferdy wrote:
jdart wrote: Can polyglot output multi-pv solutions for a test suite?

--Jon
I can't understand that question. Can you describe a sample situation?
Perhaps like gradualtest.

Some test sets have different scores for different move choices. Like Tony Hedlund's positional test suite:
http://privat.bahnhof.se/wb432434/fentest.htm
or STS:
https://sites.google.com/site/strategictestsuite/
Ferdy
Posts: 4840
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Best EPD Testing Software

Post by Ferdy »

Dann Corbit wrote:
Ferdy wrote:
jdart wrote: Can polyglot output multi-pv solutions for a test suite?

--Jon
I can't understand that question. Can you describe a sample situation?
Perhaps like gradualtest.

Some test sets have different scores for different move choices. Like Tony Hedlund's positional test suite:
http://privat.bahnhof.se/wb432434/fentest.htm
or STS:
https://sites.google.com/site/strategictestsuite/
I thought of something like this.
Given an EPD with bm e2e4, let the engine run with MultiPV set to, say, 3.
If pvmove1 does not match the EPD bm e2e4, compare the bm with pvmove2, then with pvmove3; the engine gets points if any of the 3 PV moves is a match. A match on pvmove1 earns the highest score and a match on pvmove3 the lowest.
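The scoring idea above can be sketched in a few lines of Python. The point values (3, 2, 1) and the function name are hypothetical; the proposal only says that rank 1 should score highest and rank 3 lowest. The PV moves and the bm set must use the same notation, for example both in UCI form as in the e2e4 example.

```python
from __future__ import annotations


def multipv_points(pv_moves: list[str], best_moves: set[str],
                   points: tuple[int, ...] = (3, 2, 1)) -> int:
    """pv_moves holds the first move of MultiPV lines 1..n, best first.
    Return the points for the first line whose move is an EPD bm, or 0."""
    for rank, move in enumerate(pv_moves[:len(points)]):
        if move in best_moves:
            return points[rank]
    return 0


if __name__ == "__main__":
    print(multipv_points(["d2d4", "g1f3", "e2e4"], {"e2e4"}))  # 1 (matched on pvmove3)
```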
jdart
Posts: 4368
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Best EPD Testing Software

Post by jdart »

I mean, output the n best solutions, not just the single best, with scores.

--Jon
Ferdy
Posts: 4840
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Best EPD Testing Software

Post by Ferdy »

jdart wrote: I mean, output the n best solutions, not just the single best, with scores.

--Jon
What if the EPD has only one bm?
I think polyglot could be revised to output the n best moves, though.
jdart
Posts: 4368
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Best EPD Testing Software

Post by jdart »

By n best, I mean what the engine thinks are the best moves, regardless of the bm tag.

You really need to do this if you are not certain of the quality of the testsuite. Many tests are "busted" in that the allegedly best move is not actually the best, or have an alternate solution that is practically as good as the best one.
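The Stockfish log earlier in the thread advertises "option name MultiPV type spin default 1 min 1 max 500". With MultiPV raised via setoption, a UCI engine reports each ranked line as an info line tagged "multipv k". Below is a minimal Python sketch that collects the n best moves with their scores from raw engine output, regardless of any bm tag. The function names are hypothetical. It assumes "pv" is the last field of an info line, which is the usual convention, and it keeps the latest line per rank so that deeper iterations replace shallower ones.

```python
from __future__ import annotations


def parse_info(line: str):
    """Parse 'info ... multipv k score cp|mate v ... pv m1 m2 ...'.
    Return (multipv, (kind, value), pv_moves), or None if there is no pv."""
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        return None
    multipv, score = 1, None
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok == "string":
            return None  # free text follows; never treat it as a pv
        if tok == "multipv":
            multipv = int(tokens[i + 1])
            i += 2
        elif tok == "score":
            score = (tokens[i + 1], int(tokens[i + 2]))  # ("cp", 35) or ("mate", 3)
            i += 3  # a following lowerbound/upperbound is skipped as a plain token
        elif tok == "pv":
            return multipv, score, tokens[i + 1:]
        else:
            i += 1
    return None


def n_best(engine_output: list[str]) -> list[tuple[int, tuple[str, int] | None, str]]:
    """Return (rank, score, first pv move) for each MultiPV rank, sorted by rank."""
    latest = {}
    for line in engine_output:
        parsed = parse_info(line)
        if parsed and parsed[2]:
            rank, score, pv = parsed
            latest[rank] = (score, pv[0])
    return [(rank, *latest[rank]) for rank in sorted(latest)]
```

For example, the depth-1 line from the log, "info depth 1 seldepth 1 multipv 1 score cp 120 nodes 41 nps 41000 tbhits 0 time 1 pv e4a8", parses to (1, ("cp", 120), ["e4a8"]).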

--Jon