Here is a sample system for identifying a good engine, based on positions where we know the result is a draw but where engines have trouble showing an even score, even though they pick the best move. I got the idea from Steve.
I collected some UCI engines and let them analyse 8 fortress positions at 1 sec per position. The search score each engine returns is converted to points: the closer the score is to zero, the more points the engine gets.
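To illustrate how a search score could be collected, here is a minimal sketch that reads the last `score` reported in an engine's UCI `info` lines before `bestmove`. The function names `parse_uci_score` and `last_score` are hypothetical, and taking the final reported score is an assumption; the tool used for the results below may do it differently.

```python
import re
from typing import Iterable, Optional, Tuple

# UCI engines report evaluations as "score cp <n>" or "score mate <n>".
_SCORE_RE = re.compile(r"\bscore (cp|mate) (-?\d+)")


def parse_uci_score(info_line: str) -> Optional[Tuple[str, int]]:
    """Return (kind, value) from one UCI info line, or None if it has no score."""
    m = _SCORE_RE.search(info_line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


def last_score(lines: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Return the last score seen before the engine prints bestmove (assumed policy)."""
    result = None
    for line in lines:
        if line.startswith("info "):
            parsed = parse_uci_score(line)
            if parsed is not None:
                result = parsed
        elif line.startswith("bestmove"):
            break
    return result


# Example engine output (made up for illustration).
sample_output = [
    "info depth 10 score cp 312 nodes 51234 pv f5g5",
    "info depth 11 score cp 48 nodes 90211 pv f5g5",
    "bestmove f5g5",
]
print(last_score(sample_output))
```

Mate scores come back as `("mate", n)` and would need their own handling before being turned into points.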
A. Platform:
System : Windows
Release : 7
Version : 6.1.7601
Machine : AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
B. Engine parameters:
Threads : 1
Hash : 64mb
Time/pos : 1000ms
C. Test settings:
Total engine count : 45
Total positions : 8 (input file: test.fen)
Total max points : 800
Estimated total time: 8 pos x 1000ms/pos = 8000 ms
D. Summary (higher points is better):
1 id name Fire 4 x64 (time 6101 ms, Points 537, ratio 67.1%)
2 id name Gull 3 x64 (time 8000 ms, Points 475, ratio 59.4%)
3 id name Houdini 4 x64 (time 8000 ms, Points 472, ratio 59.0%)
4 id name Critter 1.6a 64-bit (time 6110 ms, Points 466, ratio 58.2%)
5 id name Strelka 6 w32 (time 8000 ms, Points 466, ratio 58.2%)
6 id name Komodo 6 64-bit (time 7203 ms, Points 358, ratio 44.8%)
7 id name Texel 1.04 64-bit (time 6930 ms, Points 341, ratio 42.6%)
8 id name Stockfish 131214 64 POPCNT (time 7139 ms, Points 296, ratio 37.0%)
9 id name HIARCS 14 WCSC (time 6161 ms, Points 269, ratio 33.6%)
10 id name Bouquet 1.8 x64 (time 8112 ms, Points 266, ratio 33.2%)
11 id name Hannibal 1.4x64 (time 5572 ms, Points 260, ratio 32.5%)
12 id name Booot 5.2.0(64) (time 140 ms, Points 212, ratio 26.5%)
13 id name Equinox 3.30 x64mp (time 6660 ms, Points 204, ratio 25.5%)
14 id name Deuterium v14.4.35.17 64bit POPCNT (time 6902 ms, Points 200, ratio 25.0%)
15 id name Fruit reloaded 2.1 (time 6099 ms, Points 199, ratio 24.9%)
16 id name Octochess revision 5190 (time 4945 ms, Points 186, ratio 23.2%)
17 id name Amyan 1.72 (time 80 ms, Points 173, ratio 21.6%)
18 id name Naum 4.6 (time 6726 ms, Points 163, ratio 20.4%)
19 id name Protector 1.7.0 (time 7488 ms, Points 159, ratio 19.9%)
20 id name Spike 1.4 (time 7488 ms, Points 130, ratio 16.2%)
21 id name DiscoCheck 5.2.1 (time 6705 ms, Points 127, ratio 15.9%)
22 id name Ruffian 1.0.5 (time 5920 ms, Points 115, ratio 14.4%)
23 id name Yace 0.99.87 (time 7936 ms, Points 115, ratio 14.4%)
24 id name Gaviota v1.0 (time 6661 ms, Points 110, ratio 13.8%)
25 id name Andscacs 0.71 (time 5981 ms, Points 96, ratio 12.0%)
26 id name Maverick 0.51 x64 (time 6193 ms, Points 94, ratio 11.8%)
27 id name Nebula 2.0 (time 6490 ms, Points 93, ratio 11.6%)
28 id name cheng4 0.36c (time 6666 ms, Points 90, ratio 11.2%)
29 id name Deuterium v14.3.34.130 (time 7316 ms, Points 90, ratio 11.2%)
30 id name Nemo SP64o 1.0.1 Beta (time 8000 ms, Points 88, ratio 11.0%)
31 id name AnMon 5.75 (time 7083 ms, Points 85, ratio 10.6%)
32 id name Rybka 2.3.2a mp (time 5625 ms, Points 80, ratio 10.0%)
33 id name Rodent 1.6 (build 6) (time 5835 ms, Points 73, ratio 9.1%)
34 id name Senpai 1.0 (time 7032 ms, Points 70, ratio 8.8%)
35 id name Arasan 17.4 (time 8149 ms, Points 67, ratio 8.4%)
36 id name Rhetoric 1.4.1 x64 (time 5505 ms, Points 66, ratio 8.2%)
37 id name Bobcat 3.25 (time 6503 ms, Points 49, ratio 6.1%)
38 id name GreKo 12.1 (time 6255 ms, Points 49, ratio 6.1%)
39 id name Vajolet2 1.45 (time 6844 ms, Points 41, ratio 5.1%)
40 id name spark-1.0 (time 8112 ms, Points 37, ratio 4.6%)
41 id name DisasterArea-1.54 (time 6724 ms, Points 32, ratio 4.0%)
42 id name Daydreamer 1.75 JA (time 6030 ms, Points 19, ratio 2.4%)
43 id name GNU Chess 5.60-64 (time 6357 ms, Points 14, ratio 1.8%)
44 id name Quazar 0.4 x64 (time 7207 ms, Points 8, ratio 1.0%)
45 id name iCE 2.0 v2240 x64/popcnt (time 2028 ms, Points 6, ratio 0.8%)
E. Positions:
1 6k1/8/6PP/3B1K2/8/2b5/8/8 b - - 0 1
2 8/8/r5kP/6P1/1R3K2/8/8/8 w - - 0 1
3 7k/R7/7P/6K1/8/8/2b5/8 w - - 0 1
4 8/8/5k2/8/8/4qBB1/6K1/8 w - - 0 1
5 8/8/8/3K4/8/4Q3/2p5/1k6 w - - 0 1
6 8/8/4nn2/4k3/8/Q4K2/8/8 w - - 0 1
7 8/k7/p7/Pr6/K1Q5/8/8/8 w - - 0 1
8 k7/p4R2/P7/1K6/8/6b1/8/8 w - - 0 1
F. Point System:
abs(score) <= 50 : 100 points
abs(score) <= 100 : 61 - 70 points
abs(score) <= 150 : 51 - 60 points
abs(score) <= 200 : 41 - 50 points
abs(score) <= 250 : 31 - 40 points
abs(score) <= 300 : 21 - 30 points
abs(score) <= 350 : 11 - 20 points
abs(score) <= 400 : 1 - 10 points
Other scores : 0 points
G. Engines that do not report time:
1 id name Gull 3 x64
2 id name Nemo SP64o 1.0.1 Beta
Sample positions. See section E for the complete set of 8 positions.
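The point system in section F and the ratio column in section D can be sketched as follows. This is only an illustration under stated assumptions: scores are taken to be integer centipawns, and within each 10-point band the points are spread linearly so that a score closer to zero earns more. The exact mapping inside a band is not given above, and the names `fortress_points` and `ratio_percent` are hypothetical.

```python
def fortress_points(score_cp: int) -> int:
    """Map a search score (assumed integer centipawns) to points per section F."""
    a = abs(score_cp)
    if a <= 50:
        return 100
    if a > 400:
        return 0
    band = (a - 1) // 50        # 1 for 51..100, 2 for 101..150, ..., 7 for 351..400
    hi = 50 * (band + 1)        # upper edge of the band
    bottom = 71 - 10 * band     # lowest points in the band: 61, 51, ..., 1
    # Assumed linear spread: 9 extra points at the near-zero edge, 0 at the far edge.
    return bottom + (9 * (hi - a)) // 49


def ratio_percent(points: int, n_positions: int) -> float:
    """Points as a percentage of the maximum (100 points per position)."""
    return 100.0 * points / (100 * n_positions)


for s in (0, -50, 51, 100, 250, 400, -401):
    print(s, fortress_points(s))

# Fire 4 x64 in section D: 537 points over 8 positions.
print(round(ratio_percent(537, 8), 1))
```

With 8 positions the maximum is 800 points, so 537 points corresponds to the 67.1% shown for the top entry in section D.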