Hello,
I would like to submit my engine (under development) to the famous WAC.epd test. Do you know of a (UCI + Linux) interface that allows that?
Basically I'm looking for a feature like:
* load positions sequentially
* computer analyzes for X seconds
* compares the best move found by computer to the one indicated in the EPD
* output a score (like 200/300 meaning 200 correct moves out of the 300 WAC.epd positions)
Thank you
PS: here's a position that seems relatively easy for humans but tricky for computers (my engine never finds Rxb2! and Houdini only finds it at depth 20, although it reaches that depth in a couple of seconds...)
[D]8/7p/5k2/5p2/p1p2P2/Pr1pPK2/1P1R3P/8 b - -
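The four steps listed in the post above (load, analyze, compare, score) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `parse_epd`, `normalize` and `run_suite` are hypothetical names, the engine is represented by a callable `find_move` that is assumed to return its move already in SAN (converting a UCI coordinate move such as `b3b2` to SAN is left out), and only the `bm` opcode is read.

```python
import re
from typing import Callable, Iterable, List, Tuple


def parse_epd(line: str) -> Tuple[str, List[str]]:
    """Split one EPD record into (position, best_moves)."""
    fields = line.split()
    # An EPD record starts with four position fields:
    # piece placement, side to move, castling rights, en passant square.
    position = " ".join(fields[:4])
    operations = " ".join(fields[4:])
    # The "bm" opcode lists one or more best moves, terminated by ';'.
    match = re.search(r"\bbm\s+([^;]+);", operations)
    best_moves = match.group(1).split() if match else []
    return position, best_moves


def normalize(san: str) -> str:
    """Drop check, mate and annotation suffixes so 'Rxb2!' equals 'Rxb2'."""
    return san.rstrip("+#!?")


def run_suite(lines: Iterable[str],
              find_move: Callable[[str], str]) -> Tuple[int, int]:
    """Return (solved, total) over all non-empty EPD lines."""
    solved = total = 0
    for line in lines:
        if not line.strip():
            continue
        position, best_moves = parse_epd(line)
        total += 1
        move = normalize(find_move(position))
        if move in {normalize(m) for m in best_moves}:
            solved += 1
    return solved, total
```

Calling `run_suite(open("WAC.epd"), find_move)` and printing the pair as `solved/total` would give a score in the "200/300" form asked for; the time limit of X seconds per position would live inside `find_move`.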
WAC test
-
- Posts: 134
- Joined: Mon May 16, 2011 6:58 pm
- Location: Denmark
Re: WAC test
I am fairly sure Arena can load .epd files, and you can run it under Wine, but I don't know whether it can run the WAC test.
-
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: WAC test
lucasart wrote: PS: here's a position that seems relatively easy for humans but tricky for computers (my engine never finds Rxb2! and Houdini only finds it at depth 20, although it reaches that depth in a couple of seconds...)
[D]8/7p/5k2/5p2/p1p2P2/Pr1pPK2/1P1R3P/8 b - -
Three plies for Gaviota.
Miguel
Code: Select all
setboard 8/7p/5k2/5p2/p1p2P2/Pr1pPK2/1P1R3P/8 b - -
d
+-----------------+
| . . . . . . . . | [Black]
| . . . . . . . p |
| . . . . . k . . |
| . . . . . p . . | Castling:
| p . p . . P . . | ep: -
| P r . p P K . . |
| . P . R . . . P |
| . . . . . . . . |
+-----------------+
analyze
iterative deepening --> start, thread=0
set timer to infinite
26 1: 0.0 +1.01 c3 2.bxc3 Rxc3
96 2: 0.0 +1.01 c3 2.bxc3 Rxc3
269 3 0.0 :-) Rxb2
327 3 0.0 :-) Rxb2
429 3: 0.0 +3.32 Rxb2 2.Rxb2 c3 3.Rb6+ Kf7
781 4: 0.0 +3.30 Rxb2 2.Rxb2 c3 3.Rb6+ Kf7 4.e4
2047 5: 0.0 +3.18 Rxb2 2.Rxb2 c3 3.Rb6+ Kf7 4.Rb7+ Kg6
5.e4
14310 6 0.0 :-) Rxb2
18800 6: 0.1 +4.03 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Rb7+ Kd6
5.e4 c2 6.e5+ Ke6
34030 7: 0.1 +4.31 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Rb7+ Kd6
5.Rxh7 c2 6.Rh8 c1=Q 7.Rd8+ Ke6 8.Rxd3
71096 8 0.2 +4.19 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Rb7+ Kd6
5.Rb4 c2 6.Rxa4 c1=Q 7.Rd4+ Ke6 8.Rxd3
71192 8: 0.2 +4.19 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Rb7+ Kd6
5.Rb4 c2 6.Rxa4 c1=Q 7.Rd4+ Ke6 8.Rxd3
198743 9 0.4 :-) Rxb2
378710 9 0.6 +5.13 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
5.Rc6 Kd7 6.Rc5 d2 7.Rxc2 d1=Q
378780 9: 0.6 +5.13 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
5.Rc6 Kd7 6.Rc5 d2 7.Rxc2 d1=Q
567366 10 0.8 +5.14 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
5.Rc6 Kd7 6.Rc3 d2 7.Rxc2 d1=Q 8.Rc5
Qd2+ 9.Kf3
567476 10: 0.8 +5.14 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
5.Rc6 Kd7 6.Rc3 d2 7.Rxc2 d1=Q 8.Rc5
Qd2+ 9.Kf3
1004162 11 1.5 +5.29 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kd6 8.Rxh7
Qd2+ 9.Kf3 Qd5+ 10.Ke2 Qa2+ 11.Kf3 Qxa3
1004994 11: 1.5 +5.29 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kd6 8.Rxh7
Qd2+ 9.Kf3 Qd5+ 10.Ke2 Qa2+ 11.Kf3 Qxa3
1747600 12 2.6 +5.29 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kd6 8.Rxh7
Qd2+ 9.Kf3 Qd5+ 10.Ke2 Qa2+ 11.Kf3 Qxa3
1748679 12: 2.6 +5.29 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kd6 8.Rxh7
Qd2+ 9.Kf3 Qd5+ 10.Ke2 Qa2+ 11.Kf3 Qxa3
4025055 13 6.1 +5.35 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kf6 8.Rc6+
Kf7 9.Rc5 Qd2+ 10.Kf3 Kg6 11.h3
4028199 13: 6.1 +5.35 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kf6 8.Rc6+
Kf7 9.Rc5 Qd2+ 10.Kf3 Kg6 11.h3
9893944 14 15.1 +5.38 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kf6 8.Rxh7
Qd2+ 9.Kf3 Qd5+ 10.Ke2 Qa2+ 11.Kf3 Qxa3
12.Rh6+ Kg7
9902869 14: 15.2 +5.38 Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kf6 8.Rxh7
Qd2+ 9.Kf3 Qd5+ 10.Ke2 Qa2+ 11.Kf3 Qxa3
12.Rh6+ Kg7
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: WAC test
michiguel wrote: Three plies for Gaviota.
Nice!!
I think I'll have a look at rationalising my search extensions a little bit. And of course I need to add the one for pawn pushes to the 7th rank.
-
- Posts: 27808
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: WAC test
WinBoard does not support EPD test suites yet, and I have toyed with the idea of adding it. The problem is that I have never used test suites of any kind, and am not sure what kind of reporting they expect. If it is just counting correct/incorrect solutions it would be simple enough. But would there also have to be modes for timing how long it takes to find the correct move, and such? I understand there can also be 'avoid moves', and how should I score those? Can a single EPD have both a solution move and an avoid move, so that there are actually three possible outcomes for the scoring?
If it would just be a matter of scoring pass/fail, I came to the conclusion that it would be best not to implement it in the GUI, but as an engine. There could be a pseudo-engine "Tester", which you could then have play a match against the true engine under test (or use any other tournament form WinBoard supports, like a gauntlet for testing a collection of engines). The pseudo-engine would then never play a move itself, but immediately send a result message after receiving the move of the engine under test, reporting a win for that engine if the move was good, and a loss for it if the move was bad. That would make the GUI move on to the next game, and the match result tallied by the GUI would reflect the engine's performance on the test.
You could then just install '"Tester.exe WAC.epd" /fd=".\start positions"', and playing a match against that would then run the WAC test. No special support in the GUI would be required for this.
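On the question above of how to score 'avoid moves': here is a minimal Python sketch of one possible convention, not the behaviour of WinBoard or of any existing tester. The names `outcome` and `passed` are hypothetical, and the rule for a move that is neither a listed best move nor a listed avoid move is an assumption: it fails when the record lists best moves and passes when the record lists only avoid moves.

```python
from typing import Set


def outcome(move: str, best: Set[str], avoid: Set[str]) -> str:
    """Classify a move into the three cases raised in the post."""
    if move in avoid:
        return "avoid"   # the engine played a move it was told to avoid
    if move in best:
        return "best"    # the engine played a listed solution move
    return "other"       # neither a solution nor an avoid move


def passed(move: str, best: Set[str], avoid: Set[str]) -> bool:
    """Collapse the three outcomes into pass/fail (assumed convention)."""
    result = outcome(move, best, avoid)
    if result == "other":
        # A neutral move only counts as a pass when no best move was listed.
        return not best
    return result == "best"
```

With this rule the pseudo-engine idea still works unchanged: "Tester" would report a win for the engine under test when `passed(...)` is true and a loss otherwise.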
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: WAC test
lucasart wrote: Basically I'm looking for a feature like:
* load positions sequentially
* computer analyzes for X seconds
* compares the best move found by computer to the one indicated in the EPD
* output a score (like 200/300 meaning 200 correct moves out of the 300 WAC.epd positions)
Polyglot can do something like this for UCI engines.
Code: Select all
$ polyglot epd-test -noini -ec gnuchess -epd wac.epd
PolyGlot 1.4.58b by Fabien Letouzey.
EngineName=GNU Chess 5.07.172b-64
[Search parameters: MaxDepth=63 MaxTime=5.0 DepthDelta=3 MinDepth=8 MinTime=1.0]
1: "WAC.001" OK 1 score=+99.97 pv [D= 2, T= 0.01s, N= 0k] =Qg6 fxg6 Nxg6#
2: "WAC.002" -- 1 score= +1.39 pv [D=10, T= 1.27s, N= 182k] =Rb8 Rg2 Re8 Rd2 Re7 Rg2 Rd7 Rd2 Rg7 e4 fxe4+ Kxe4 Re7+ Kd4 Re2 Rd1
3: "WAC.003" OK 2 score= +3.88 pv [D= 1, T= 0.00s, N= 0k] =Rg3
4: "WAC.004" OK 3 score=+99.97 pv [D= 1, T= 0.00s, N= 0k] =Qxh7+ Kxh7 hxg6#
5: "WAC.005" OK 4 score=+99.97 pv [D= 1, T= 0.00s, N= 0k] =Qc4+ Nxc4 bxc4#
6: "WAC.006" OK 5 score= +7.61 pv [D= 2, T= 0.00s, N= 0k] =Rb7 Rb5 Rxb5 Kg8 Rb7 a5 Kg5 a4 Ra7 Kf8 Kxg4 a3 Kf4 a2 Ke4 a1=Q Rxa1
7: "WAC.007" OK 6 score= +7.58 pv [D= 2, T= 0.00s, N= 0k] =Ne3 Ngf3 Nxd1 Kxd1 d5 exd6 Bxd6 Nc4 Bb4+ Bd2 O-O e4
8: "WAC.008" OK 7 score=+15.82 pv [D= 1, T= 0.00s, N= 0k] =Rf7 Qg8 Nxg8 Rxg8 Rxd7 f3 Rxg7 Rxg7 Re1 h6 Re8+ Kh7
^C
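The run above was interrupted, so no final total is shown. As a rough sketch, the per-position lines can be tallied with a short Python function. The line layout (`N: "ID" OK` or `N: "ID" --`) is inferred only from the transcript above, not from PolyGlot's documentation, and `tally` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Matches lines such as:  2: "WAC.002" -- 1 score= +1.39 pv [...]
RESULT_LINE = re.compile(r'^\s*\d+:\s+"([^"]+)"\s+(OK|--)\s')


def tally(output: str) -> Tuple[int, int, List[str]]:
    """Return (solved, total, ids_of_failed_positions)."""
    solved = total = 0
    failed: List[str] = []
    for line in output.splitlines():
        match = RESULT_LINE.match(line)
        if not match:
            continue  # banner, search parameters, or anything else
        total += 1
        if match.group(2) == "OK":
            solved += 1
        else:
            failed.append(match.group(1))
    return solved, total, failed
```

Feeding it the captured output of the command would give the solved count, the number of positions seen, and the list of positions to look at more closely (here, "WAC.002").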
-
- Posts: 295
- Joined: Wed Mar 08, 2006 8:29 pm
Re: WAC test
Hi,
Polyglot does all you're looking for just nicely.
"polyglot epd-test -epd wac.epd -max-time 3"
This will run the EPD test and analyze every position for 3 seconds.
Now, regarding WAC2: my engine also had trouble solving that position in reasonable time. But in any case I wouldn't tune the engine for difficult WAC positions, as that would only help it solve those specific positions while weakening its overall play considerably.
With almost no hash my engine takes about 30s to solve WAC2 on a (rather outdated) AMD Sempron 2400+. With 128MB of hash it still takes about 9s.
Roman
Code: Select all
D Time Score Best line
1 0.00 0.41 ... c4-c3
2 0.00 0.65 ... c4-c3 b2xc3
3 0.00 0.65 ... c4-c3 b2xc3 rb3xc3
4 0.00 0.67 ... c4-c3 b2xc3 rb3xc3 Rd2-a2
5 0.01 0.82 ... c4-c3 b2xc3 rb3xc3 Rd2-a2 d3-d2
6 0.03 1.21 ... c4-c3 b2xc3 rb3xc3 Rd2-a2 d3-d2 Kf3-e2
7 0.07 1.21 ... c4-c3 b2xc3 rb3xc3 Rd2-a2 d3-d2 Kf3-e2 rc3xe3+
Ke2xd2
8 0.16 1.02 ... c4-c3 b2xc3 rb3xc3 e3-e4 f5xe4+ Kf3xe4 rc3xa3
h2-h4 h7-h5
9 0.30 1.02 ... c4-c3 b2xc3 rb3xc3 e3-e4 f5xe4+ Kf3xe4 rc3xa3
h2-h4 h7-h5 Rd2xd3
10 0.64 1.17 ... c4-c3 b2xc3 rb3xc3 e3-e4 f5xe4+ Kf3xe4 rc3xa3
h2-h4 ra3-a1 Rd2xd3 a4-a3
11 1.25 1.07 ... c4-c3 b2xc3 rb3xc3 e3-e4 f5xe4+ Kf3xe4 rc3xa3
h2-h4 ra3-a1 Rd2xd3 a4-a3 Rd3-d7
12 2.69 1.21 ... c4-c3 b2xc3 rb3xc3 e3-e4 f5xe4+ Kf3xe4 rc3xa3
h2-h4 h7-h5 Rd2xd3 ra3xd3 Ke4xd3 a4-a3
13 8.77 1.58 ... rb3xb2 Rd2xb2 c4-c3 Rb2-b6+ kf6-e7 Rb6-b7+ ke7-d6
Rb7-b6+ kd6-c5 Rb6-b8 c3-c2 Rb8-c8+ kc5-b6 h2-h4
d3-d2 Rc8-b8+ kb6-a7 Rb8-b7+ ka7xb7
14 28.37 2.71 ... rb3xb2 Rd2xb2 c4-c3 Rb2-b6+ kf6-e7 Rb6-b7+ ke7-d6
Rb7-b6+ kd6-c5 Rb6-f6 c3-c2 Rf6xf5+ kc5-c4 Rf5-a5
c2-c1=Q Ra5xa4+ kc4-d5
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: WAC test
Michel wrote: Polyglot can do something like this for UCI engines. Check the readme file of polyglot for the possible options.
Awesome! I'll go ahead and install polyglot. I did it programmatically anyway, with a quick and dirty shell script to parse the output and compare.
My current score is only WAC 65/300... 8MB Hash and 1 second per position. And I still have some illegal moves in the PV... still a lot of work to do
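For anyone scripting this by hand, here is a minimal Python sketch of the UCI text exchanged for one position with the settings mentioned above (8 MB hash, 1 second per position). It is not the script referred to in the post; `uci_commands` and `parse_bestmove` are hypothetical helpers, and padding the four EPD position fields with "0 1" is an assumption made for engines that expect a full six-field FEN.

```python
from typing import List, Optional


def uci_commands(epd_position: str, hash_mb: int, movetime_ms: int) -> List[str]:
    """Build the UCI commands to search one position for a fixed time."""
    fields = epd_position.split()[:4]
    # EPD carries no move counters; append placeholder values.
    fen = " ".join(fields + ["0", "1"])
    return [
        f"setoption name Hash value {hash_mb}",
        f"position fen {fen}",
        f"go movetime {movetime_ms}",
    ]


def parse_bestmove(line: str) -> Optional[str]:
    """Extract the move from a 'bestmove <move> [ponder <move>]' line."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "bestmove":
        return parts[1]
    return None
```

The engine answers in coordinate notation, so for the position from the first post the solution Rxb2 would come back as `b3b2` and still has to be mapped to SAN (or the EPD move mapped to coordinates) before comparing.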
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: WAC test
lucasart wrote: My current score is only WAC 65/300... 8MB Hash and 1 second per position. And I still have some illegal moves in the PV... still a lot of work to do
And 226/300 *without* the hash table. Clearly something is wrong with my hash table...
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: WAC test
Roman Hartmann wrote: Hi,
Polyglot does all you're looking for just nicely.
Can polyglot run a small tournament, engine vs engine, and output a PGN and a score?
For example, I'd want it to play 100 games at 1'+1" against another UCI engine, and see the score and the PGN.