WAC test

lucasart
Posts: 3036
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart

WAC test

Post by lucasart » Sun Jul 17, 2011 6:54 am

hello

I would like to run my engine (under development) through the famous WAC.epd test. Do you know of a (UCI + Linux) interface that allows that?
Basically I'm looking for a feature like:
* load positions sequentially
* computer analyzes for X seconds
* compares the best move found by computer to the one indicated in the EPD
* output a score (like 200/300 meaning 200 correct moves out of the 300 WAC.epd positions)

Thank you

PS: here's a position that seems relatively easy for humans but tricky for computers (my engine never finds Rxb2! and Houdini only finds it at depth 20, although it reaches that depth in a couple of seconds...)

[D]8/7p/5k2/5p2/p1p2P2/Pr1pPK2/1P1R3P/8 b - -
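
The four steps listed in this post can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a replacement for an existing tool: `parse_epd`, `uci_commands`, `run_suite` and the `find_best_san` callback are hypothetical names, and the callback is assumed to return the engine's move already converted to SAN (a UCI engine answers in coordinate notation such as b3b2, so a real harness would need a move generator for that conversion). The sample record is the position above written as an EPD line with `bm Rxb2`.

```python
def parse_epd(line):
    """Split one EPD record into its 4-field position and its operations."""
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    ops = {}
    if len(fields) > 4:
        # Naive split: assumes no ';' appears inside a quoted operand.
        for op in fields[4].split(";"):
            op = op.strip()
            if op:
                opcode, _, operand = op.partition(" ")
                ops[opcode] = operand.strip().strip('"')
    return position, ops


def strip_suffix(san):
    """Drop check, mate and annotation marks so 'Rxb2!' matches 'Rxb2'."""
    return san.rstrip("+#!?")


def uci_commands(position, seconds):
    """UCI commands for a fixed-time search of one position.

    EPD has no move counters, so "0 1" is appended to form a full FEN.
    """
    return [
        f"position fen {position} 0 1",
        f"go movetime {int(seconds * 1000)}",
    ]


def run_suite(lines, find_best_san):
    """Return (solved, total) over all records that carry a 'bm' operation."""
    solved = total = 0
    for line in lines:
        if not line.strip():
            continue
        position, ops = parse_epd(line)
        if "bm" not in ops:
            continue
        total += 1
        expected = {strip_suffix(m) for m in ops["bm"].split()}
        if strip_suffix(find_best_san(position)) in expected:
            solved += 1
    return solved, total


sample = ['8/7p/5k2/5p2/p1p2P2/Pr1pPK2/1P1R3P/8 b - - bm Rxb2; id "WAC.002";']
solved, total = run_suite(sample, lambda position: "Rxb2")  # stand-in "engine"
print(f"{solved}/{total}")
```

In a real run the lambda would be replaced by a function that sends the strings from `uci_commands` to the engine process and waits for its `bestmove` line.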

ethanara
Posts: 134
Joined: Mon May 16, 2011 4:58 pm
Location: Denmark

Re: WAC test

Post by ethanara » Sun Jul 17, 2011 7:03 am

I am quite sure Arena supports .epd files, and you can run it under Wine, but I don't know if it includes the WAC test.

michiguel
Posts: 6387
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA

Re: WAC test

Post by michiguel » Sun Jul 17, 2011 7:09 am

lucasart wrote: PS: here's a position that seems relatively easy for humans but tricky for computers (my engine never finds Rxb2! and Houdini only finds it at depth 20, although it reaches that depth in a couple of seconds...)

[D]8/7p/5k2/5p2/p1p2P2/Pr1pPK2/1P1R3P/8 b - -
Three plies for Gaviota.

Miguel


setboard 8/7p/5k2/5p2/p1p2P2/Pr1pPK2/1P1R3P/8 b - -
d
+-----------------+
| . . . . . . . . | [Black]
| . . . . . . . p |
| . . . . . k . . |
| . . . . . p . . |    Castling: 
| p . p . . P . . |    ep: -
| P r . p P K . . |
| . P . R . . . P |
| . . . . . . . . |
+-----------------+

analyze
iterative deepening --> start, thread=0
set timer to infinite
        26   1:      0.0    +1.01  c3 2.bxc3 Rxc3
        96   2:      0.0    +1.01  c3 2.bxc3 Rxc3
       269   3       0.0      :-)  Rxb2
       327   3       0.0      :-)  Rxb2
       429   3:      0.0    +3.32  Rxb2 2.Rxb2 c3 3.Rb6+ Kf7
       781   4:      0.0    +3.30  Rxb2 2.Rxb2 c3 3.Rb6+ Kf7 4.e4
      2047   5:      0.0    +3.18  Rxb2 2.Rxb2 c3 3.Rb6+ Kf7 4.Rb7+ Kg6
                                   5.e4
     14310   6       0.0      :-)  Rxb2
     18800   6:      0.1    +4.03  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Rb7+ Kd6
                                   5.e4 c2 6.e5+ Ke6
     34030   7:      0.1    +4.31  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Rb7+ Kd6
                                   5.Rxh7 c2 6.Rh8 c1=Q 7.Rd8+ Ke6 8.Rxd3
     71096   8       0.2    +4.19  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Rb7+ Kd6
                                   5.Rb4 c2 6.Rxa4 c1=Q 7.Rd4+ Ke6 8.Rxd3
     71192   8:      0.2    +4.19  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Rb7+ Kd6
                                   5.Rb4 c2 6.Rxa4 c1=Q 7.Rd4+ Ke6 8.Rxd3
    198743   9       0.4      :-)  Rxb2
    378710   9       0.6    +5.13  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
                                   5.Rc6 Kd7 6.Rc5 d2 7.Rxc2 d1=Q
    378780   9:      0.6    +5.13  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
                                   5.Rc6 Kd7 6.Rc5 d2 7.Rxc2 d1=Q
    567366  10       0.8    +5.14  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
                                   5.Rc6 Kd7 6.Rc3 d2 7.Rxc2 d1=Q 8.Rc5
                                   Qd2+ 9.Kf3
    567476  10:      0.8    +5.14  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
                                   5.Rc6 Kd7 6.Rc3 d2 7.Rxc2 d1=Q 8.Rc5
                                   Qd2+ 9.Kf3
   1004162  11       1.5    +5.29  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
                                   5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kd6 8.Rxh7
                                   Qd2+ 9.Kf3 Qd5+ 10.Ke2 Qa2+ 11.Kf3 Qxa3
   1004994  11:      1.5    +5.29  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
                                   5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kd6 8.Rxh7
                                   Qd2+ 9.Kf3 Qd5+ 10.Ke2 Qa2+ 11.Kf3 Qxa3
   1747600  12       2.6    +5.29  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
                                   5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kd6 8.Rxh7
                                   Qd2+ 9.Kf3 Qd5+ 10.Ke2 Qa2+ 11.Kf3 Qxa3
   1748679  12:      2.6    +5.29  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
                                   5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kd6 8.Rxh7
                                   Qd2+ 9.Kf3 Qd5+ 10.Ke2 Qa2+ 11.Kf3 Qxa3
   4025055  13       6.1    +5.35  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
                                   5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kf6 8.Rc6+
                                   Kf7 9.Rc5 Qd2+ 10.Kf3 Kg6 11.h3
   4028199  13:      6.1    +5.35  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
                                   5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kf6 8.Rc6+
                                   Kf7 9.Rc5 Qd2+ 10.Kf3 Kg6 11.h3
   9893944  14      15.1    +5.38  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
                                   5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kf6 8.Rxh7
                                   Qd2+ 9.Kf3 Qd5+ 10.Ke2 Qa2+ 11.Kf3 Qxa3
                                   12.Rh6+ Kg7
   9902869  14:     15.2    +5.38  Rxb2 2.Rxb2 c3 3.Rb6+ Ke7 4.Kf2 c2
                                   5.Rc6 d2 6.Rxc2 d1=Q 7.Rc7+ Kf6 8.Rxh7
                                   Qd2+ 9.Kf3 Qd5+ 10.Ke2 Qa2+ 11.Kf3 Qxa3
                                   12.Rh6+ Kg7

lucasart
Posts: 3036
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart

Re: WAC test

Post by lucasart » Sun Jul 17, 2011 7:20 am

michiguel wrote: Three plies for Gaviota.
Nice !!

I think I'll have a look at rationalising my search extensions a little. And of course I need to add the extension for pawn pushes to the 7th rank.

hgm
Posts: 23523
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: WAC test

Post by hgm » Sun Jul 17, 2011 8:22 am

WinBoard does not support EPD test suites yet, and I have toyed with the idea of adding it. The problem is that I have never used test suites of any kind, and am not sure what kind of reporting they expect. If it is just counting correct/incorrect solutions it would be simple enough. But would there also have to be modes for timing how long it takes to find the correct move, and such? I understand there can also be 'avoid moves', and how should I score those? Can a single EPD have both a solution move and an avoid move, so that there are actually three possible outcomes for the scoring?

If it is just a matter of scoring pass/fail, I came to the conclusion that it would be best not to implement it in the GUI, but as an engine. There could be a pseudo-engine "Tester", which you could then have play a match against the true engine under test (or use any other tournament form WinBoard supports, like gauntlet for testing a collection of engines). The pseudo-engine would then never play a move itself, but immediately send a result message after receiving the move of the engine under test, reporting a win for that engine if the move was good, and a loss for it if the move was bad. That would make the GUI move on to the next game, and the match result accounted by the GUI would reflect the engine performance on the test.

You could then just install '"Tester.exe WAC.epd" /fd=".\start positions"', and playing a match against that would then run the WAC test. No special support in the GUI would be required for this.
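
The decision the pseudo-engine has to make can be sketched as follows. This is a hypothetical illustration in Python, not the design of any existing Tester program: `verdict` and `result_message` are invented names, and folding 'avoid moves' into a plain pass/fail (fail on an avoid move, fail when best moves are listed and the move is not among them, pass otherwise) is just one possible convention for the scoring question raised above. The result line uses the `RESULT {comment}` form of the WinBoard protocol.

```python
def verdict(move, best_moves=(), avoid_moves=()):
    """Pass/fail for one test position (one possible convention)."""
    if move in avoid_moves:
        return False          # played a move the record says to avoid
    if best_moves and move not in best_moves:
        return False          # a solution was given and was not found
    return True


def result_message(passed, engine_plays_white):
    """Result line the pseudo-engine would send after seeing the move.

    A pass is reported as a win for the engine under test, a fail as a loss.
    """
    white_wins = (passed == engine_plays_white)
    result = "1-0" if white_wins else "0-1"
    comment = "solution found" if passed else "wrong move"
    return f"{result} {{{comment}}}"


# The engine under test has Black in the position discussed in this thread.
passed = verdict("Rxb2", best_moves={"Rxb2"})
print(result_message(passed, engine_plays_white=False))
```

With this convention the GUI's match score directly counts solved positions, at the cost of not distinguishing "missed the best move" from "played an avoid move".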

Michel
Posts: 2038
Joined: Sun Sep 28, 2008 11:50 pm

Re: WAC test

Post by Michel » Sun Jul 17, 2011 8:43 am

lucasart wrote: Basically I'm looking for a feature like:
* load positions sequentially
* computer analyzes for X seconds
* compares the best move found by computer to the one indicated in the EPD
* output a score (like 200/300 meaning 200 correct moves out of the 300 WAC.epd positions)
Polyglot can do something like this for uci engines.


$ polyglot epd-test -noini -ec gnuchess -epd wac.epd
PolyGlot 1.4.58b by Fabien Letouzey.

EngineName=GNU Chess 5.07.172b-64

[Search parameters: MaxDepth=63   MaxTime=5.0   DepthDelta=3   MinDepth=8   MinTime=1.0]

 1: "WAC.001"       OK    1 score=+99.97    pv [D= 2, T=   0.01s, N=     0k] =Qg6 fxg6 Nxg6#
 2: "WAC.002"       --    1 score= +1.39    pv [D=10, T=   1.27s, N=   182k] =Rb8 Rg2 Re8 Rd2 Re7 Rg2 Rd7 Rd2 Rg7 e4 fxe4+ Kxe4 Re7+ Kd4 Re2 Rd1
 3: "WAC.003"       OK    2 score= +3.88    pv [D= 1, T=   0.00s, N=     0k] =Rg3
 4: "WAC.004"       OK    3 score=+99.97    pv [D= 1, T=   0.00s, N=     0k] =Qxh7+ Kxh7 hxg6#
 5: "WAC.005"       OK    4 score=+99.97    pv [D= 1, T=   0.00s, N=     0k] =Qc4+ Nxc4 bxc4#
 6: "WAC.006"       OK    5 score= +7.61    pv [D= 2, T=   0.00s, N=     0k] =Rb7 Rb5 Rxb5 Kg8 Rb7 a5 Kg5 a4 Ra7 Kf8 Kxg4 a3 Kf4 a2 Ke4 a1=Q Rxa1
 7: "WAC.007"       OK    6 score= +7.58    pv [D= 2, T=   0.00s, N=     0k] =Ne3 Ngf3 Nxd1 Kxd1 d5 exd6 Bxd6 Nc4 Bb4+ Bd2 O-O e4
 8: "WAC.008"       OK    7 score=+15.82    pv [D= 1, T=   0.00s, N=     0k] =Rf7 Qg8 Nxg8 Rxg8 Rxd7 f3 Rxg7 Rxg7 Re1 h6 Re8+ Kh7
^C
Check the readme file of polyglot for the possible options.
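
If the run is interrupted as above, or if only a pass/fail list is wanted, the per-position lines can be tallied with a short script. The sketch below is in Python and rests on assumptions read off the sample output only: that each result line starts with an index, a quoted id, and then either `OK` (solved) or `--` (not solved). `RESULT_LINE` and `tally` are hypothetical names, and the format may differ in other PolyGlot versions.

```python
import re

# index, quoted position id, then the status column ("OK" or "--")
RESULT_LINE = re.compile(r'^\s*(\d+):\s+"([^"]+)"\s+(OK|--)\s')


def tally(output):
    """Return (solved_ids, failed_ids) from captured epd-test output."""
    solved, failed = [], []
    for line in output.splitlines():
        match = RESULT_LINE.match(line)
        if not match:
            continue  # banner, parameters, blank lines
        target = solved if match.group(3) == "OK" else failed
        target.append(match.group(2))
    return solved, failed


captured = '''
PolyGlot 1.4.58b by Fabien Letouzey.

 1: "WAC.001"       OK    1 score=+99.97    pv [D= 2, T=   0.01s, N=     0k] =Qg6 fxg6 Nxg6#
 2: "WAC.002"       --    1 score= +1.39    pv [D=10, T=   1.27s, N=   182k] =Rb8 Rg2 Re8
 3: "WAC.003"       OK    2 score= +3.88    pv [D= 1, T=   0.00s, N=     0k] =Rg3
'''
solved, failed = tally(captured)
print(f"{len(solved)}/{len(solved) + len(failed)} solved, failed: {failed}")
```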

Roman Hartmann
Posts: 295
Joined: Wed Mar 08, 2006 7:29 pm

Re: WAC test

Post by Roman Hartmann » Sun Jul 17, 2011 9:26 am

Hi,
Polyglot does everything you're looking for quite nicely.

"polyglot epd-test -epd wac.epd -max-time 3"

This will run the EPD test and analyze every position for 3 seconds.

Now regarding WAC2: my engine also had trouble solving that position in a reasonable time. But in any case I wouldn't tune the engine for difficult WAC positions, as that would only help it solve those specific positions while weakening its overall play considerably.

With almost no hash my engine takes about 30s to solve WAC2 on a (rather outdated) AMD Sempron 2400+. With 128MB of hash it still takes about 9s.

Roman


  D   Time    Score  Best line
  1   0.00    0.41   ...  c4-c3 
  2   0.00    0.65   ...  c4-c3 b2xc3 
  3   0.00    0.65   ...  c4-c3 b2xc3 rb3xc3 
  4   0.00    0.67   ...  c4-c3 b2xc3 rb3xc3 Rd2-a2 
  5   0.01    0.82   ...  c4-c3 b2xc3 rb3xc3 Rd2-a2 d3-d2 
  6   0.03    1.21   ...  c4-c3 b2xc3 rb3xc3 Rd2-a2 d3-d2 Kf3-e2 
  7   0.07    1.21   ...  c4-c3 b2xc3 rb3xc3 Rd2-a2 d3-d2 Kf3-e2 rc3xe3+ 
                     Ke2xd2 
  8   0.16    1.02   ...  c4-c3 b2xc3 rb3xc3 e3-e4 f5xe4+ Kf3xe4 rc3xa3 
                     h2-h4 h7-h5 
  9   0.30    1.02   ...  c4-c3 b2xc3 rb3xc3 e3-e4 f5xe4+ Kf3xe4 rc3xa3 
                     h2-h4 h7-h5 Rd2xd3 
 10   0.64    1.17   ...  c4-c3 b2xc3 rb3xc3 e3-e4 f5xe4+ Kf3xe4 rc3xa3 
                     h2-h4 ra3-a1 Rd2xd3 a4-a3 
 11   1.25    1.07   ...  c4-c3 b2xc3 rb3xc3 e3-e4 f5xe4+ Kf3xe4 rc3xa3 
                     h2-h4 ra3-a1 Rd2xd3 a4-a3 Rd3-d7 
 12   2.69    1.21   ...  c4-c3 b2xc3 rb3xc3 e3-e4 f5xe4+ Kf3xe4 rc3xa3 
                     h2-h4 h7-h5 Rd2xd3 ra3xd3 Ke4xd3 a4-a3 
 13   8.77    1.58   ...  rb3xb2 Rd2xb2 c4-c3 Rb2-b6+ kf6-e7 Rb6-b7+ ke7-d6 
                     Rb7-b6+ kd6-c5 Rb6-b8 c3-c2 Rb8-c8+ kc5-b6 h2-h4 
                     d3-d2 Rc8-b8+ kb6-a7 Rb8-b7+ ka7xb7 
 14  28.37    2.71   ...  rb3xb2 Rd2xb2 c4-c3 Rb2-b6+ kf6-e7 Rb6-b7+ ke7-d6 
                     Rb7-b6+ kd6-c5 Rb6-f6 c3-c2 Rf6xf5+ kc5-c4 Rf5-a5 
                     c2-c1=Q Ra5xa4+ kc4-d5 

lucasart
Posts: 3036
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart

Re: WAC test

Post by lucasart » Sun Jul 17, 2011 1:58 pm

Michel wrote:
Polyglot can do something like this for uci engines.

Check the readme file of polyglot for the possible options.
Awesome! I'll go ahead and install polyglot. I did it programmatically anyway, with a quick and dirty shell script to parse the output and compare.
My current score is only WAC 65/300, with 8MB hash and 1 second per position. And I still have some illegal moves in the PV... still a lot of work to do :(

lucasart
Posts: 3036
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart

Re: WAC test

Post by lucasart » Sun Jul 17, 2011 2:10 pm

lucasart wrote: My current score is only WAC 65/300... 8MB Hash and 1 second per position. And I still have some illegal moves in the PV... still a lot of work to do :(
And 226/300 *without* the hash table. Clearly something is wrong with my hash table...

lucasart
Posts: 3036
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart

Re: WAC test

Post by lucasart » Sun Jul 17, 2011 2:15 pm

Roman Hartmann wrote:Hi,
Polyglot does all you're looking for just nicely.
Can polyglot run a small tournament, engine vs engine, and output a PGN and a score?
For example, I'd want it to play 100 games at 1'+1" against another UCI engine, and see the score and the PGN.
