Chess game data structure in C

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mathmoi
Posts: 286
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Chess game data structure in C

Post by mathmoi »

Hi,

I'm currently working on a PGN parser in C, a language I don't have much experience in. The reason I'm writing it in C is because I might want to open source it in the future if I can get proud enough of my code and it will be more useful to others if it can be integrated in C programs than if it's written in C++.

I'm at the step where I need to find out how I will store a parsed game in memory. The PGN specifications allows for lot of complicated cases. For example, a move can be followed by one of more of the following : A comment, a Numeric Annotation Glyph, a variation (this one can be nested indefinitely), the end of the game. To makes maters worse some of these elements can happens before any move (for example a comment before the first move of a variation.

Last year in this thread. H.G.Muller suggested that each types of elements that can be attached to a move (variations, comment, etc.) be stored in a separate linked list on each move. I like this idea, even if it has some shortcomings. For example, elements that happens before the first move need to be linked to the variation instead of a move. We also need a way to determine the order in which theses elements originaly appeared in the PGN file (Imagine somethings like : move - comment - variation - NAG - comment).

Considering all this I came up with the following structures. Since I'd like this parser to be useful to others, I though I would ask your opinion about this data structure.

Code: Select all

typedef struct PgnTag
{
   char* name;
   char* value;
   struct PgnTag* next;
} PgnTag;

typedef struct PgnMove
{
   char* move;
   
   PgnComments* comments; /* linked list */
   PgnNag* nags; /* linked list */
   PgnVariation* variations; /* linked list */

   PgnMove* next; 
} PgnMove;

typedef struct PgnVariation
{
   PgnMove* moves; /* linked list. Should I use an array instead? */
   int order; /* orders within the list of elements (comments, variations, NAG) */

   /* The next lines are for comments/nags/variations that would
    * appears before the first move of the variation */
   PgnComments* comments; /* linked list */
   PgnNag* nags; /* linked list */
   PgnVariation* variations; /* linked list */

} PgnVariation;

typedef struct PgnGame
{
   char* Event;
   char* Site;
   char* Date;
   char* Round;
   char* White;
   char* Black;
   char* Result;

   PgnTag* otherTags;

   PgnVariation* mainVariation;
} PgnGame;
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Chess game data structure in C

Post by sje »

I would treat all PGN tags the same way and so not have special storage treatment for the basic seven tags.

For the PGN move text, I would have a two-way linked list of MTIs (Move Text Items). An MTI can be a move, a NAG, a Game Termination Mark, or a recursive alternative variation. A recursive alternative variation itself contains the head and tail pointers for a nested list of MTIs.
mathmoi
Posts: 286
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: Chess game data structure in C

Post by mathmoi »

sje wrote:I would treat all PGN tags the same way and so not have special storage treatment for the basic seven tags.
The reason I did it this way is to speed things up when we're looking up who are the players for example. If the information is included in the linked list you would need to scan it. Of course, in C++ or any higher level language I would have used some sort of dictionary.
For the PGN move text, I would have a two-way linked list of MTIs (Move Text Items). An MTI can be a move, a NAG, a Game Termination Mark, or a recursive alternative variation. A recursive alternative variation itself contains the head and tail pointers for a nested list of MTIs.
This would solve some of my problem at the cost of some kind of pseudo-heritage. Some people advise me not to do this because it was too "C++ish". I'm not sure how I can implement this elegantly in C, but I'll look into it.

Thanks for you inputs
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Chess game data structure in C

Post by sje »

mathmoi wrote:
sje wrote:I would treat all PGN tags the same way and so not have special storage treatment for the basic seven tags.
The reason I did it this way is to speed things up when we're looking up who are the players for example. If the information is included in the linked list you would need to scan it. Of course, in C++ or any higher level language I would have used some sort of dictionary.
For the PGN move text, I would have a two-way linked list of MTIs (Move Text Items). An MTI can be a move, a NAG, a Game Termination Mark, or a recursive alternative variation. A recursive alternative variation itself contains the head and tail pointers for a nested list of MTIs.
This would solve some of my problem at the cost of some kind of pseudo-heritage. Some people advise me not to do this because it was too "C++ish". I'm not sure how I can implement this elegantly in C, but I'll look into it.
"Things (members of the PGN tag names set) that are the same should be treated the same." For tag names, you should use a simple dictionary and have a game's tag pair names point to dictionary entries. Since pointers are smaller than (most) strings, it is also a space saver.

For efficient construction of MTI items, consider using C unions.
mathmoi
Posts: 286
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: Chess game data structure in C

Post by mathmoi »

sje wrote:
mathmoi wrote:
sje wrote:I would treat all PGN tags the same way and so not have special storage treatment for the basic seven tags.
The reason I did it this way is to speed things up when we're looking up who are the players for example. If the information is included in the linked list you would need to scan it. Of course, in C++ or any higher level language I would have used some sort of dictionary.
For the PGN move text, I would have a two-way linked list of MTIs (Move Text Items). An MTI can be a move, a NAG, a Game Termination Mark, or a recursive alternative variation. A recursive alternative variation itself contains the head and tail pointers for a nested list of MTIs.
This would solve some of my problem at the cost of some kind of pseudo-heritage. Some people advise me not to do this because it was too "C++ish". I'm not sure how I can implement this elegantly in C, but I'll look into it.
"Things (members of the PGN tag names set) that are the same should be treated the same." For tag names, you should use a simple dictionary and have a game's tag pair names point to dictionary entries. Since pointers are smaller than (most) strings, it is also a space saver.

For efficient construction of MTI items, consider using C unions.
Hi Steven,

Theses are great ideas, exactly the kind of feedback/brainstorming I was looking for. I plan on following your two recommendations : treating all tags the same and using unions to implement pseudo heritage of MTIs.

Thanks
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Chess game data structure in C

Post by Sven »

mathmoi wrote:I'm currently working on a PGN parser in C, a language I don't have much experience in. The reason I'm writing it in C is because I might want to open source it in the future if I can get proud enough of my code and it will be more useful to others if it can be integrated in C programs than if it's written in C++.
Hi Mathieu,

why don't you implement it in C++, which is presumably the language you are more familiar with, and offer an additional C interface for the (IMO rare) case that people want to use your code within a C program? It should be possible to hide the whole C++ implementation behind a C interface in your case.

That way you could combine your existing C++ experience with the "integration in C programs" requirement while avoiding all the mess with C unions and "simulating inheritance".

Sven
mathmoi
Posts: 286
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: Chess game data structure in C

Post by mathmoi »

Sven Schüle wrote:
mathmoi wrote:I'm currently working on a PGN parser in C, a language I don't have much experience in. The reason I'm writing it in C is because I might want to open source it in the future if I can get proud enough of my code and it will be more useful to others if it can be integrated in C programs than if it's written in C++.
Hi Mathieu,

why don't you implement it in C++, which is presumably the language you are more familiar with, and offer an additional C interface for the (IMO rare) case that people want to use your code within a C program? It should be possible to hide the whole C++ implementation behind a C interface in your case.

That way you could combine your existing C++ experience with the "integration in C programs" requirement while avoiding all the mess with C unions and "simulating inheritance".

Sven
Hi Sven,

My rational for choosing C was that most chess programming is done in C instead of C++. Also it's easier to use a C library in C++ than writing a C wrapper around C++ objects.

Even if I did write it in C++ and offer a C interface to access it from C programs I would still have to deal with the "simulated inheritance" to pass the game back to C programs.
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Chess game data structure in C

Post by lucasart »

mathmoi wrote:
Sven Schüle wrote:
mathmoi wrote:I'm currently working on a PGN parser in C, a language I don't have much experience in. The reason I'm writing it in C is because I might want to open source it in the future if I can get proud enough of my code and it will be more useful to others if it can be integrated in C programs than if it's written in C++.
Hi Mathieu,

why don't you implement it in C++, which is presumably the language you are more familiar with, and offer an additional C interface for the (IMO rare) case that people want to use your code within a C program? It should be possible to hide the whole C++ implementation behind a C interface in your case.

That way you could combine your existing C++ experience with the "integration in C programs" requirement while avoiding all the mess with C unions and "simulating inheritance".

Sven
Hi Sven,

My rational for choosing C was that most chess programming is done in C instead of C++. Also it's easier to use a C library in C++ than writing a C wrapper around C++ objects.

Even if I did write it in C++ and offer a C interface to access it from C programs I would still have to deal with the "simulated inheritance" to pass the game back to C programs.
If you code in C, you'll be missing on all the joy and fun that is C++ programming:
http://yosefk.com/c++fqa/defective.html
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Chess game data structure in C

Post by Sven »

lucasart wrote:If you code in C, you'll be missing on all the joy and fun that is C++ programming:
http://yosefk.com/c++fqa/defective.html
Hi Lucas,

since discussing your remark and the link you gave would miss the subject of this thread and would start another "language war" instead, I'd suggest that you open another thread "Defects of C++ from a C++ hater's viewpoint" :lol:

In the meantime, please study Stockfish as an excellent example for being successful with all those defects.

Sven
Harald
Posts: 318
Joined: Thu Mar 09, 2006 1:07 am

Re: Chess game data structure in C

Post by Harald »

Hi

The code below is not in C or C++ but in Python.
But it has the data structures that I used to parse one or many PGNs in a file.
Then I build a big game tree from it counting win, draw and loss.
That was an attempt to get an opening database or general statistics.
I did a lot of work to recognize all the comments in a PGN and assign them
to the right object. I could also post the whole code here if anybody wants to see it.
You may look at my basic data structures:

Code: Select all

class Board(object):
    """This is the chess board information,
    including the pieces on board, the side to move, castling flags,
    en passant square, half move count for 50 move rule, full move number.
    More info like ply or king positions is added, most of it
    is search related.
    """
    
    opening_board = [  4,  2,  3,  6,  5,  3,  2,  4,
                       1,  1,  1,  1,  1,  1,  1,  1,
                       0,  0,  0,  0,  0,  0,  0,  0,
                       0,  0,  0,  0,  0,  0,  0,  0,
                       0,  0,  0,  0,  0,  0,  0,  0,
                       0,  0,  0,  0,  0,  0,  0,  0,
                      -1, -1, -1, -1, -1, -1, -1, -1,
                      -4, -2, -3, -6, -5, -3, -2, -4
                    ]

    def __init__(self):
        """Build the opening board."""
        #self.set_fen('rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1')    
        self.board = list(Board.opening_board)
        self.color = white
        self.castle = woo | wooo | boo | booo
        self.ep = 0
        self.move50 = 0
        self.full_move_number = 1
        self.wking_pos = E1
        self.bking_pos = E8
        self.ply = 0

class MoveInfo(object):
    """A move info is a bundle of a full move number, color, move, annotation,
    comment and a list of variants. Number, color and move are mandatory,
    the other infos may have empty/null default values.
    With the move infos inside the optional list of variants the move info
    is a recursive data structure.
    """

    def __init__(self):
        """Initialise the move info.
        """
        self.number = 0       # full move number
        self.color = white    # white or black move
        self.move = None      # Move object or None
        self.annotation = 0   # see PGN specification
        self.comment = ''
        self.variants = []    # list of variants, each a MoveInfo list starting with other moves
        # The following attributes often found in comments are not standardized.
        # self.msec = 0       # used time for the move
        # self.depths = 0     # depths, iteration or number of plies reached while searching the move
        # self.value = 0.0    # score or value of move
        self.win = 0          # number of white wins
        self.draw = 0         # number of draws
        self.loss = 0         # number of black wins

class Variant(object):
    """A Variant is a list of move infos together with a comment.
    With the optional list of variants inside each move info the
    variant is a recursive data structure.
    """

    def __init__(self):
        """Initialise the variant variables.
        """
        self.comment = ''
        self.move_infos = []  # list of MoveInfo objects

class Game(object):
    """This is the chess game information,
    including the board, PGN infos and move list.
    """

    def __init__(self):
        """Initialise the game variables.
        """
        # Tags are a dict of tag strings and string values.
        # Required tags are: Event, Site, Date, Round, White, Black, Result.
        self.tags = { 'Event'  : '',
                      'Site'   : '',
                      'Date'   : '',
                      'Round'  : '',
                      'White'  : '',
                      'Black'  : '',
                      'Result' : ''
                    }
        self.tag_comments = { 'Event'  : '',
                              'Site'   : '',
                              'Date'   : '',
                              'Round'  : '',
                              'White'  : '',
                              'Black'  : '',
                              'Result' : ''
                            }
        # The game needs an initial board, normally the opening position.
        # It can be changed with the FEN tag.
        self.board = Board()
        # The moves of a game are stored in a variant.
        self.moves = Variant()
        # The result is one of *, 1-0, 0-1, 1/2-1/2. It must match the result tag.
        self.result = ''
        # The comments can be found before or after a PGN description of the game.
        self.pre_comment = ''
        self.post_comment = ''
Harald