PGN parsing conundrum

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

PGN parsing conundrum

Post by jshriver »

Till now whenever working with pgn files I use a 3rd party tool to convert it to algebraic notation (e2e4, etc).

However I'm wondering how parsers get around the following scenario.
Say this is your board:

4k3/8/6K1/2n5/8/3P4/5n2/8 b KQkq - 0 1

and black makes this move in the PGN file
Nxd3 how should it know which Knight actually moved, since both knights can legally move to d3? Seems to me a full pgn parser would almost have to track legal moves and various other chess rules just to keep the board state accurate.

Using just algebraic representation it's easy since it's a literal move x to y.

Any comments or suggestions is greatly appreciated.
-Josh
Aaron Becker
Posts: 292
Joined: Tue Jul 07, 2009 4:56 am

Re: PGN parsing conundrum

Post by Aaron Becker »

Nxd3 is not a legally formatted move in that position. The relevant moves in standard algebraic notation are Ncxd3 and Nfxd3. Of course a pgn parser can't be expected to deal with ambiguous moves.
User avatar
ilari
Posts: 750
Joined: Mon Mar 27, 2006 7:45 pm
Location: Finland

Re: PGN parsing conundrum

Post by ilari »

A bad PGN parser would probably just pick the first move that matches "Nxd3". A good parser should notice the ambiguity and throw an error.
Seems to me a full pgn parser would almost have to track legal moves and various other chess rules just to keep the board state accurate.
True
Richard Allbert
Posts: 792
Joined: Wed Jul 19, 2006 9:58 am

Re: PGN parsing conundrum

Post by Richard Allbert »

What caused me a lot of headaches when debugging my PGN parser (for the database statistics maker), was the opposite... when a move was described as ambiguous when it wasn't.

e.g. Ndxf3, where the "d" was not needed. I never found away of working around it.

:evil:
User avatar
ilari
Posts: 750
Joined: Mon Mar 27, 2006 7:45 pm
Location: Finland

Re: PGN parsing conundrum

Post by ilari »

Richard Allbert wrote:What caused me a lot of headaches when debugging my PGN parser (for the database statistics maker), was the opposite... when a move was described as ambiguous when it wasn't.

e.g. Ndxf3, where the "d" was not needed. I never found away of working around it.

:evil:
Did your PGN parser first convert legal moves to SAN format and then compare them to the parsed string? I used to do that, and while it's easy it's also very inflexible and slow. It's better to just do it the hard way by parsing the string properly.
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: PGN parsing conundrum

Post by Vinvin »

Or better : it corrects the move by looking at next moves wich knight moved :-)
ilari wrote:A bad PGN parser would probably just pick the first move that matches "Nxd3". A good parser should notice the ambiguity and throw an error.
Seems to me a full pgn parser would almost have to track legal moves and various other chess rules just to keep the board state accurate.
True
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: PGN parsing conundrum

Post by jwes »

Vinvin wrote:Or better : it corrects the move by looking at next moves wich knight moved :-)
A quantum pgn parser :wink: The board is left in an indeterminate state until it is resolved by a subsequent observation.
Richard Allbert
Posts: 792
Joined: Wed Jul 19, 2006 9:58 am

Re: PGN parsing conundrum

Post by Richard Allbert »

It does this...

Code: Select all


uint santomove(const string sanmove)
{

    uint length = sanmove.length();
    uint last = length-1;
    ASS&#40;length>1&&length<8&#41;;

   // cout<<"\n move in "<<sanmove;


    bool capture = false;
    bool promote = false;
    bool fileambig = false;
    bool rankambig = false;
    bool found = false;


    string&#58;&#58;size_type locater  = sanmove.find&#40;"x",0&#41;;
    if&#40;locater != string&#58;&#58;npos&#41; capture = true;
    locater  = sanmove.find&#40;"=",0&#41;;
    if&#40;locater != string&#58;&#58;npos&#41; promote = true;

//    if&#40;capture&#41; cout<<" capture "; else cout<<" not capture ";
 //   if&#40;promote&#41; cout<<" promote "; else cout<<" not promote ";

    uint piece=0;
    uint side = SIDENOW;
    uint ff,tf,fr,tr,from=0,to=0,cap=0,prom=0,flag=0,move;

    ASS&#40;side>=cW&&side<=cB&#41;;

    if&#40;sanmove == "0-0" || sanmove == "O-O" || sanmove == "0-0+" || sanmove == "O-O+" || sanmove == "0-0#" || sanmove == "O-O#")
    &#123;
        cap = pE; flag = FlagCA;
        if&#40;side==cW&#41; &#123; from = E1; to = G1;  &#125;
        else &#123; from = E8; to = G8;  &#125;
    &#125;
    else if&#40;sanmove == "0-0-0" || sanmove == "O-O-O" || sanmove == "0-0-0+" || sanmove == "O-O-O+" || sanmove == "0-0-0#" || sanmove == "O-O-O#")
    &#123;
        cap = pE; flag = FlagCA;
        if&#40;side==cW&#41; &#123; from = E1; to = C1;  &#125;
        else &#123; from = E8; to = C8;  &#125;
    &#125;
    else if&#40;sanmove&#91;0&#93; >= 'a')//pawn move?
    &#123;
      //  cout<<" pawn move ";
        ASS&#40;sanmove&#91;0&#93; >= 'a' && sanmove&#91;0&#93; <= 'h');
        if&#40;side == cW&#41;  piece = pwP;
        else piece = pbP;

        //if we don't have a capture, the first letter will be the file of the target square
        if&#40;!capture&#41; //e4 or e8=Q
        &#123;
            ASS&#40;sanmove&#91;1&#93; >= '1' && sanmove&#91;1&#93; <= '8');
            tf = chartofile&#40;sanmove&#91;0&#93;);
            tr = chartorank&#40;sanmove&#91;1&#93;);
            to = fr2sq&#40;tf,tr&#41;;

            ASS&#40;BRDPCE&#40;to&#41;==pE&#41;;
            ASS&#40;onbrd&#40;to&#41;);

            //could have been a 2 move start
            if&#40;side==cW&#41;
            &#123;
              if&#40;ranks&#91;to&#93;==RANK4&#41;
              &#123;
                if&#40;BRDPCE&#40;to+S&#41;==pE&#41; from=to+S+S;
                else from=to+S;
              &#125;
              else
              &#123;
                  from = to+S;
              &#125;
            &#125;
            else
            &#123;
               if&#40;ranks&#91;to&#93;==RANK5&#41;
              &#123;
                if&#40;BRDPCE&#40;to+N&#41;==pE&#41; from=to+N+N;
                else from=to+N;
              &#125;
              else
              &#123;
                  from = to+N;
              &#125;
            &#125;
        &#125;//end of if !capture
        else //form will be exf4
        &#123;
            //cout<<" pcap ";
            tf = chartofile&#40;sanmove&#91;2&#93;);
            tr = chartorank&#40;sanmove&#91;3&#93;);
            if&#40;side==cW&#41; fr = tr-1;
            else fr = tr+1;
            ff = chartofile&#40;sanmove&#91;0&#93;);
            to = fr2sq&#40;tf,tr&#41;;
           // cout<<"\n to = "<<printsquare&#40;to&#41;;
            cap = BRDPCE&#40;to&#41;;
           // cout<<" f "<<ff<<" r "<<fr;
            from = fr2sq&#40;ff,fr&#41;;
           // cout<<" ? "<<printsquare&#40;to&#41;;
            ASS&#40;onbrd&#40;from&#41;);
            ASS&#40;onbrd&#40;to&#41;);
            ASS&#40; &#40;cap==pE && flag==FlagEP&#41; || piecegood&#40;cap&#41; );

            if&#40;BRDPCE&#40;to&#41;==pE&#41; flag=FlagEP; //ep cap
        &#125;
    &#125;
    else//not pawn move
    &#123;
        //identify the moving piece
        switch&#40;sanmove&#91;0&#93;)
        &#123;
            case 'K'&#58; if &#40;side==cW&#41; piece=pwK;else piece=pbK; break;
            case 'Q'&#58; if &#40;side==cW&#41; piece=pwQ;else piece=pbQ; break;
            case 'R'&#58; if &#40;side==cW&#41; piece=pwR;else piece=pbR; break;
            case 'B'&#58; if &#40;side==cW&#41; piece=pwB;else piece=pbB; break;
            case 'N'&#58; if &#40;side==cW&#41; piece=pwN;else piece=pbN; break;
            default&#58; &#123;cout<<"\n piece unidentified, move "<<sanmove<<endl<<endl;exit&#40;1&#41;;&#125;
        &#125;

        ASS&#40;piecegood&#40;piece&#41;);
    //    cout<<" piece move "<<piecechar&#40;piece&#41;<<" ";

        //idendtify ambiguity
        if&#40;sanmove&#91;1&#93; >='1' && sanmove&#91;1&#93; <= '8')
        &#123;
            rankambig=true;
           // cout<<" rankambig ";
        &#125;
        else
        &#123;
             if&#40;capture&#41;
             &#123;
                 if&#40;sanmove&#91;1&#93; >='a' && sanmove&#91;1&#93; <= 'h')
                 &#123;
                     fileambig=true;
                    // cout<<" fileambig ";
                 &#125;
             &#125;
             else
             &#123;
                 if&#40;&#40;sanmove&#91;1&#93; >='a' && sanmove&#91;1&#93; <= 'h') && &#40;sanmove&#91;2&#93; >='a' && sanmove&#91;2&#93; <= 'h') )
                 &#123;
                     fileambig=true;
                   //  cout<<" fileambig2 ";
                 &#125;
             &#125;
        &#125;

        //now identify the to sq
        if&#40;!capture&#41;
        &#123;
            if&#40;!rankambig && !fileambig&#41;
            &#123;
                tf = chartofile&#40;sanmove&#91;1&#93;);
                tr = chartorank&#40;sanmove&#91;2&#93;);
                to = fr2sq&#40;tf,tr&#41;;
                ASS&#40;onbrd&#40;to&#41;);
                ASS&#40;BRDPCE&#40;to&#41;==pE&#41;;
            &#125;
            else
            &#123;
                tf = chartofile&#40;sanmove&#91;2&#93;);
                tr = chartorank&#40;sanmove&#91;3&#93;);
                to = fr2sq&#40;tf,tr&#41;;
                ASS&#40;onbrd&#40;to&#41;);
                ASS&#40;BRDPCE&#40;to&#41;==pE&#41;;
            &#125;
        &#125;
        else//is capture
        &#123;
            if&#40;!rankambig && !fileambig&#41; //Nxd4
            &#123;
                tf = chartofile&#40;sanmove&#91;2&#93;);
                tr = chartorank&#40;sanmove&#91;3&#93;);
                to = fr2sq&#40;tf,tr&#41;;
                cap = BRDPCE&#40;to&#41;;
                ASS&#40;onbrd&#40;to&#41;);
                ASS&#40;BRDPCE&#40;to&#41;!=pE&#41;;
            &#125;
            else //Nexd4
            &#123;
                tf = chartofile&#40;sanmove&#91;3&#93;);
                tr = chartorank&#40;sanmove&#91;4&#93;);
                to = fr2sq&#40;tf,tr&#41;;
                cap = BRDPCE&#40;to&#41;;
                ASS&#40;onbrd&#40;to&#41;);
                ASS&#40;BRDPCE&#40;to&#41;!=pE&#41;;
            &#125;
        &#125;

        uint ar;
        uint af;
        uint i,pcenum,sq,tsq;
        const int *movedir;
        //now need the from sqaure, start with ambig, it's easier, just look for the piece on relevant file or rank
        if&#40;rankambig&#41;
        &#123;
            ar = chartorank&#40;sanmove&#91;1&#93;);
            ASS&#40;ranks&#91;N*ar&#93;==ar&#41;;
            for&#40;i = N*ar; !&#40;i&0x88&#41;; i+=E&#41;
            &#123;
                if&#40;BRDPCE&#40;i&#41;==piece&#41; &#123; from=i; break; &#125;
            &#125;
        &#125;
        else if&#40;fileambig&#41;
        &#123;
            af = chartofile&#40;sanmove&#91;1&#93;);
           // cout<<" ambigfile = "<<af;
            for&#40;i = af; !&#40;i&0x88&#41;; i+=N&#41;
            &#123;
              //  cout<<" trying "<<printsquare&#40;i&#41;;
                if&#40;BRDPCE&#40;i&#41;==piece&#41; &#123;
                     from=i; break; &#125;
            &#125;
        &#125;
        else//not ambig, need to find the piece using the to square and the known piece type
        &#123;
            pcenum = PCENUM&#40;piece&#41;;
            while&#40;pcenum&#41;
            &#123;
                sq = PCESQ&#40;piece, pcenum&#41;;
              //  cout<<" onsq "<<printsquare&#40;sq&#41;;
                ASS&#40;onbrd&#40;sq&#41;);
                for&#40;movedir = dirPieces&#91;piece&#93;; *movedir ; movedir++)
                &#123;
                    if&#40;ISSLIDE&#40;piece&#41;)
                    &#123;
                     for&#40;tsq = sq+*movedir; !&#40;tsq & 0x88&#41;; tsq += *movedir&#41;
                     &#123;
                        // cout<<" to tsq "<<printsquare&#40;tsq&#41;;
                        if&#40;tsq==to&#41; &#123;from=sq; found = true; break;&#125;
                        if&#40;BRDPCE&#40;tsq&#41;!=pE&#41; break;
                     &#125;
                    &#125;
                    else
                    &#123;
                        if&#40;sq+*movedir==to&#41; &#123;from=sq; found = true; break;&#125;
                    &#125;
                &#125;
                if&#40;found&#41; break;
                pcenum--;
            &#125;
        &#125;
    &#125;


    if&#40;promote&#41;
    &#123;
        ASS&#40;ranks&#91;to&#93;==RANK1 || ranks&#91;to&#93;==RANK8&#41;;
        if&#40;sanmove&#91;last&#93;=='#' || sanmove&#91;last&#93;=='+') last--;
     switch &#40;sanmove&#91;last&#93;)
     &#123;
      case 'Q'&#58; if&#40;side==cW&#41; prom = pwQ; else prom = pbQ; break;
      case 'R'&#58; if&#40;side==cW&#41; prom = pwR; else prom = pbR; break;
      case 'B'&#58; if&#40;side==cW&#41; prom = pwB; else prom = pbB; break;
      case 'N'&#58; if&#40;side==cW&#41; prom = pwN; else prom = pbN; break;
      default&#58; cout<<"\n sanmove prom piece not found";exit&#40;1&#41;;ASS&#40;true==false&#41;;break;
      &#125;
    &#125;

           move = &#40;from | &#40;to<<7&#41; | &#40;cap<<16&#41; | &#40;prom<<20&#41; | flag&#41;;

         //  cout<<"\n move is "<<printmove&#40;move&#41;<<" "<<move;
         //  cout<<" from "<<FROM&#40;move&#41;<<" to "<<TO&#40;move&#41;<<" flag "<<FLAG&#40;move&#41;<<" cap "<<CAP&#40;move&#41;;

#ifdef DEBUG
    gen_all_moves&#40;NULLMOVE&#41;;
    found = false;
    uint i;

    //first loop, looking for move and if it gives check, or mate
    for&#40; i = 0; i < MOVECOUNT&#40;PLYNOW&#41;; ++i&#41;
    &#123;
           if&#40;MOVE&#40;PLYNOW,i&#41; == move&#41;
           found=true;
    &#125;
    ASS&#40;found&#41;;
#endif

return move;
&#125;

Which I'm sure is the most inefficient method possible.... :D :D .

(This isn't my job, it's a hobbby :) )

It then tries to "make" the returned move. If it's illegal, then I enter a slow mode where I generate the SAN internally for every legal move in the current position, and compare. If this fails, give up, move to next game.

It worked on debug checking of a database of 960k games on all but 28 moves... and these were all "incorrect" ambiguity, as described above.

Richard