PGN parsing conundrum

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
jshriver
Posts: 1358
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

PGN parsing conundrum

Post by jshriver »

Till now whenever working with pgn files I use a 3rd party tool to convert it to algebraic notation (e2e4, etc).

However I'm wondering how parsers get around the following scenario.
Say this is your board:

4k3/8/6K1/2n5/8/3P4/5n2/8 b KQkq - 0 1

and black makes this move in the PGN file
Nxd3 how should it know which Knight actually moved, since both knights can legally move to d3? Seems to me a full pgn parser would almost have to track legal moves and various other chess rules just to keep the board state accurate.

Using just algebraic representation it's easy since it's a literal move x to y.

Any comments or suggestions is greatly appreciated.
-Josh
Aaron Becker
Posts: 292
Joined: Tue Jul 07, 2009 4:56 am

Re: PGN parsing conundrum

Post by Aaron Becker »

Nxd3 is not a legally formatted move in that position. The relevant moves in standard algebraic notation are Ncxd3 and Nfxd3. Of course a pgn parser can't be expected to deal with ambiguous moves.
User avatar
ilari
Posts: 750
Joined: Mon Mar 27, 2006 7:45 pm
Location: Finland

Re: PGN parsing conundrum

Post by ilari »

A bad PGN parser would probably just pick the first move that matches "Nxd3". A good parser should notice the ambiguity and throw an error.
Seems to me a full pgn parser would almost have to track legal moves and various other chess rules just to keep the board state accurate.
True
Richard Allbert
Posts: 794
Joined: Wed Jul 19, 2006 9:58 am

Re: PGN parsing conundrum

Post by Richard Allbert »

What caused me a lot of headaches when debugging my PGN parser (for the database statistics maker), was the opposite... when a move was described as ambiguous when it wasn't.

e.g. Ndxf3, where the "d" was not needed. I never found away of working around it.

:evil:
User avatar
ilari
Posts: 750
Joined: Mon Mar 27, 2006 7:45 pm
Location: Finland

Re: PGN parsing conundrum

Post by ilari »

Richard Allbert wrote:What caused me a lot of headaches when debugging my PGN parser (for the database statistics maker), was the opposite... when a move was described as ambiguous when it wasn't.

e.g. Ndxf3, where the "d" was not needed. I never found away of working around it.

:evil:
Did your PGN parser first convert legal moves to SAN format and then compare them to the parsed string? I used to do that, and while it's easy it's also very inflexible and slow. It's better to just do it the hard way by parsing the string properly.
Vinvin
Posts: 5298
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: PGN parsing conundrum

Post by Vinvin »

Or better : it corrects the move by looking at next moves wich knight moved :-)
ilari wrote:A bad PGN parser would probably just pick the first move that matches "Nxd3". A good parser should notice the ambiguity and throw an error.
Seems to me a full pgn parser would almost have to track legal moves and various other chess rules just to keep the board state accurate.
True
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: PGN parsing conundrum

Post by jwes »

Vinvin wrote:Or better : it corrects the move by looking at next moves wich knight moved :-)
A quantum pgn parser :wink: The board is left in an indeterminate state until it is resolved by a subsequent observation.
Richard Allbert
Posts: 794
Joined: Wed Jul 19, 2006 9:58 am

Re: PGN parsing conundrum

Post by Richard Allbert »

It does this...

Code: Select all


uint santomove(const string sanmove)
{

    uint length = sanmove.length();
    uint last = length-1;
    ASS(length>1&&length<8);

   // cout<<"\n move in "<<sanmove;


    bool capture = false;
    bool promote = false;
    bool fileambig = false;
    bool rankambig = false;
    bool found = false;


    string::size_type locater  = sanmove.find("x",0);
    if(locater != string::npos) capture = true;
    locater  = sanmove.find("=",0);
    if(locater != string::npos) promote = true;

//    if(capture) cout<<" capture "; else cout<<" not capture ";
 //   if(promote) cout<<" promote "; else cout<<" not promote ";

    uint piece=0;
    uint side = SIDENOW;
    uint ff,tf,fr,tr,from=0,to=0,cap=0,prom=0,flag=0,move;

    ASS(side>=cW&&side<=cB);

    if(sanmove == "0-0" || sanmove == "O-O" || sanmove == "0-0+" || sanmove == "O-O+" || sanmove == "0-0#" || sanmove == "O-O#")
    {
        cap = pE; flag = FlagCA;
        if(side==cW) { from = E1; to = G1;  }
        else { from = E8; to = G8;  }
    }
    else if(sanmove == "0-0-0" || sanmove == "O-O-O" || sanmove == "0-0-0+" || sanmove == "O-O-O+" || sanmove == "0-0-0#" || sanmove == "O-O-O#")
    {
        cap = pE; flag = FlagCA;
        if(side==cW) { from = E1; to = C1;  }
        else { from = E8; to = C8;  }
    }
    else if(sanmove[0] >= 'a')//pawn move?
    {
      //  cout<<" pawn move ";
        ASS(sanmove[0] >= 'a' && sanmove[0] <= 'h');
        if(side == cW)  piece = pwP;
        else piece = pbP;

        //if we don't have a capture, the first letter will be the file of the target square
        if(!capture) //e4 or e8=Q
        {
            ASS(sanmove[1] >= '1' && sanmove[1] <= '8');
            tf = chartofile(sanmove[0]);
            tr = chartorank(sanmove[1]);
            to = fr2sq(tf,tr);

            ASS(BRDPCE(to)==pE);
            ASS(onbrd(to));

            //could have been a 2 move start
            if(side==cW)
            {
              if(ranks[to]==RANK4)
              {
                if(BRDPCE(to+S)==pE) from=to+S+S;
                else from=to+S;
              }
              else
              {
                  from = to+S;
              }
            }
            else
            {
               if(ranks[to]==RANK5)
              {
                if(BRDPCE(to+N)==pE) from=to+N+N;
                else from=to+N;
              }
              else
              {
                  from = to+N;
              }
            }
        }//end of if !capture
        else //form will be exf4
        {
            //cout<<" pcap ";
            tf = chartofile(sanmove[2]);
            tr = chartorank(sanmove[3]);
            if(side==cW) fr = tr-1;
            else fr = tr+1;
            ff = chartofile(sanmove[0]);
            to = fr2sq(tf,tr);
           // cout<<"\n to = "<<printsquare(to);
            cap = BRDPCE(to);
           // cout<<" f "<<ff<<" r "<<fr;
            from = fr2sq(ff,fr);
           // cout<<" ? "<<printsquare(to);
            ASS(onbrd(from));
            ASS(onbrd(to));
            ASS( (cap==pE && flag==FlagEP) || piecegood(cap) );

            if(BRDPCE(to)==pE) flag=FlagEP; //ep cap
        }
    }
    else//not pawn move
    {
        //identify the moving piece
        switch(sanmove[0])
        {
            case 'K': if (side==cW) piece=pwK;else piece=pbK; break;
            case 'Q': if (side==cW) piece=pwQ;else piece=pbQ; break;
            case 'R': if (side==cW) piece=pwR;else piece=pbR; break;
            case 'B': if (side==cW) piece=pwB;else piece=pbB; break;
            case 'N': if (side==cW) piece=pwN;else piece=pbN; break;
            default: {cout<<"\n piece unidentified, move "<<sanmove<<endl<<endl;exit(1);}
        }

        ASS(piecegood(piece));
    //    cout<<" piece move "<<piecechar(piece)<<" ";

        //idendtify ambiguity
        if(sanmove[1] >='1' && sanmove[1] <= '8')
        {
            rankambig=true;
           // cout<<" rankambig ";
        }
        else
        {
             if(capture)
             {
                 if(sanmove[1] >='a' && sanmove[1] <= 'h')
                 {
                     fileambig=true;
                    // cout<<" fileambig ";
                 }
             }
             else
             {
                 if((sanmove[1] >='a' && sanmove[1] <= 'h') && (sanmove[2] >='a' && sanmove[2] <= 'h') )
                 {
                     fileambig=true;
                   //  cout<<" fileambig2 ";
                 }
             }
        }

        //now identify the to sq
        if(!capture)
        {
            if(!rankambig && !fileambig)
            {
                tf = chartofile(sanmove[1]);
                tr = chartorank(sanmove[2]);
                to = fr2sq(tf,tr);
                ASS(onbrd(to));
                ASS(BRDPCE(to)==pE);
            }
            else
            {
                tf = chartofile(sanmove[2]);
                tr = chartorank(sanmove[3]);
                to = fr2sq(tf,tr);
                ASS(onbrd(to));
                ASS(BRDPCE(to)==pE);
            }
        }
        else//is capture
        {
            if(!rankambig && !fileambig) //Nxd4
            {
                tf = chartofile(sanmove[2]);
                tr = chartorank(sanmove[3]);
                to = fr2sq(tf,tr);
                cap = BRDPCE(to);
                ASS(onbrd(to));
                ASS(BRDPCE(to)!=pE);
            }
            else //Nexd4
            {
                tf = chartofile(sanmove[3]);
                tr = chartorank(sanmove[4]);
                to = fr2sq(tf,tr);
                cap = BRDPCE(to);
                ASS(onbrd(to));
                ASS(BRDPCE(to)!=pE);
            }
        }

        uint ar;
        uint af;
        uint i,pcenum,sq,tsq;
        const int *movedir;
        //now need the from sqaure, start with ambig, it's easier, just look for the piece on relevant file or rank
        if(rankambig)
        {
            ar = chartorank(sanmove[1]);
            ASS(ranks[N*ar]==ar);
            for(i = N*ar; !(i&0x88); i+=E)
            {
                if(BRDPCE(i)==piece) { from=i; break; }
            }
        }
        else if(fileambig)
        {
            af = chartofile(sanmove[1]);
           // cout<<" ambigfile = "<<af;
            for(i = af; !(i&0x88); i+=N)
            {
              //  cout<<" trying "<<printsquare(i);
                if(BRDPCE(i)==piece) {
                     from=i; break; }
            }
        }
        else//not ambig, need to find the piece using the to square and the known piece type
        {
            pcenum = PCENUM(piece);
            while(pcenum)
            {
                sq = PCESQ(piece, pcenum);
              //  cout<<" onsq "<<printsquare(sq);
                ASS(onbrd(sq));
                for(movedir = dirPieces[piece]; *movedir ; movedir++)
                {
                    if(ISSLIDE(piece))
                    {
                     for(tsq = sq+*movedir; !(tsq & 0x88); tsq += *movedir)
                     {
                        // cout<<" to tsq "<<printsquare(tsq);
                        if(tsq==to) {from=sq; found = true; break;}
                        if(BRDPCE(tsq)!=pE) break;
                     }
                    }
                    else
                    {
                        if(sq+*movedir==to) {from=sq; found = true; break;}
                    }
                }
                if(found) break;
                pcenum--;
            }
        }
    }


    if(promote)
    {
        ASS(ranks[to]==RANK1 || ranks[to]==RANK8);
        if(sanmove[last]=='#' || sanmove[last]=='+') last--;
     switch (sanmove[last])
     {
      case 'Q': if(side==cW) prom = pwQ; else prom = pbQ; break;
      case 'R': if(side==cW) prom = pwR; else prom = pbR; break;
      case 'B': if(side==cW) prom = pwB; else prom = pbB; break;
      case 'N': if(side==cW) prom = pwN; else prom = pbN; break;
      default: cout<<"\n sanmove prom piece not found";exit(1);ASS(true==false);break;
      }
    }

           move = (from | (to<<7) | (cap<<16) | (prom<<20) | flag);

         //  cout<<"\n move is "<<printmove(move)<<" "<<move;
         //  cout<<" from "<<FROM(move)<<" to "<<TO(move)<<" flag "<<FLAG(move)<<" cap "<<CAP(move);

#ifdef DEBUG
    gen_all_moves(NULLMOVE);
    found = false;
    uint i;

    //first loop, looking for move and if it gives check, or mate
    for( i = 0; i < MOVECOUNT(PLYNOW); ++i)
    {
           if(MOVE(PLYNOW,i) == move)
           found=true;
    }
    ASS(found);
#endif

return move;
}

Which I'm sure is the most inefficient method possible.... :D :D .

(This isn't my job, it's a hobbby :) )

It then tries to "make" the returned move. If it's illegal, then I enter a slow mode where I generate the SAN internally for every legal move in the current position, and compare. If this fails, give up, move to next game.

It worked on debug checking of a database of 960k games on all but 28 moves... and these were all "incorrect" ambiguity, as described above.

Richard