New Interchange Protocol / Alternative to PGN

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

gcramer
Posts: 40
Joined: Mon Oct 28, 2013 11:21 pm
Location: Bad Homburg, Germany

New Interchange Protocol / Alternative to PGN

Post by gcramer »

It's time for a new interchange format. PGN is still a useful protocol, but it is not sufficient for exchanging data between modern chess applications. So a new format, called C/CIF, is now available. The format is still a draft, but it is close to a final version.

C/CIF provides both an XML format and a more compact binary format. For the format definition and the features, have a closer look at the C/CIF homepage.

Some points in the C/CIF documentation are probably not yet clear; if you find one, please post a note.

Suggestions, improvements and corrections are very welcome.
hgm
Posts: 27817
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: New Interchange Protocol / Alternative to PGN

Post by hgm »

Error 503 Service Unavailable

Service Unavailable
Guru Meditation:

XID: 876334965
gcramer
Posts: 40
Joined: Mon Oct 28, 2013 11:21 pm
Location: Bad Homburg, Germany

Re: New Interchange Protocol / Alternative to PGN

Post by gcramer »

hgm wrote:Error 503 Service Unavailable

Service Unavailable
Guru Meditation:

XID: 876334965
The host, SourceForge, sometimes has big performance problems, especially today. Please try the link again.
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: New Interchange Protocol / Alternative to PGN

Post by stegemma »

I don't like XML very much, but it is a standard, so the choice to use XML is fine by me.

If the format is meant to be universal, you should consider multidimensional (3D) chessboards and maybe other board games similar to chess.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: New Interchange Protocol / Alternative to PGN

Post by lucasart »

This new XML format looks horrible. Far worse than PGN.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
gcramer
Posts: 40
Joined: Mon Oct 28, 2013 11:21 pm
Location: Bad Homburg, Germany

Re: New Interchange Protocol / Alternative to PGN

Post by gcramer »

Thanks for your response.
stegemma wrote:I don't like XML very much, but it is a standard, so the choice to use XML is fine by me.
The XML text format, CIF, is only the second choice; the first choice is the binary format, CCIF. Conversion from CIF to CCIF and vice versa is a quite simple mapping (although it uses dictionaries). Another advantage of XML is that it is very simple to parse. I've implemented a PGN parser, so I know that parsing C/CIF is a cakewalk compared to PGN.
stegemma wrote:If the format is meant to be universal, you should consider multidimensional (3D) chessboards and maybe other board games similar to chess.
I don't want to define a universal format. The disadvantage of a universal format is that its definition and usage become more complicated, and the goal of C/CIF is to be simple. IMO, for multidimensional chess or other board games it is more appropriate to define a specific format for each game; a specialized format is easier to handle. C/CIF is in fact quite elaborate, supporting all chess variants and all board sizes. But this has been achieved without much complexity, because the chess variants are quite close to standard chess, so only a few additional attributes were needed to support them. Only Bughouse required a small extension of the move section, but Bughouse is an important chess variant. It would be a different matter if 3D chess could be supported without increasing complexity, but I don't know 3D chess, so I would need a submission of an updated format (the differences) before making a final decision. I don't think that other board games need C/CIF, but C/CIF could serve as a model for other formats; it is the result of about eight years of experience with the development of a chess database.

In short: C/CIF isn't universal.
gcramer
Posts: 40
Joined: Mon Oct 28, 2013 11:21 pm
Location: Bad Homburg, Germany

Re: New Interchange Protocol / Alternative to PGN

Post by gcramer »

lucasart wrote:This new XML format looks horrible. Far worse than PGN.
Please do not compare the look and feel of PGN with that of C/CIF. PGN is designed to be human-readable, whereas C/CIF is designed for loss-free transfer of complex chess archives, a goal PGN does not meet at all. So how can this format be far worse than PGN?

Furthermore, please keep in mind that the human-readable XML is not the primary format. The binary format CCIF is primary; it is not human-readable at all, but it is quite compact and supports all the features of a modern chess application. Conversion between CIF (the XML format) and CCIF (the binary format) is quite simple. The XML format CIF is only sugar, there to provide a human-readable form; it is a 1:1 mapping of the binary format.

Another point: it is absolutely impossible to transfer chess games from ChessBase to Scidb or vice versa with PGN, even if no additional data, like documents or videos, is involved. PGN has no notion of the various languages in the world, and it only knows about post-move comments and NAGs. ChessBase and Scidb are considerably more elaborate than PGN can support. In general, a chess game is ruined by a transfer via PGN between ChessBase and Scidb.

In short: C/CIF is not a replacement for or successor to PGN; it is an additional format. For the transfer of plain games, PGN remains the primary format. But for the transfer of complex archives, PGN isn't usable at all.
hgm
Posts: 27817
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: New Interchange Protocol / Alternative to PGN

Post by hgm »

OK, I haven't had time to read the entire description yet, but already some initial remarks:

* For computers SAN is a pain: it cannot be parsed without having the context of a current position, and knowledge of the rules. A notation which explicitly mentions the from-square, like CAN, avoids all that. For humans, however, SAN is much more convenient. (I guess that is why it became standard.) So I wonder if supporting a text format using CAN is not 'serving two masters', and in practice will give you the worst of both worlds. When targeting computers it seems best to use a binary format with CAN, and when targeting humans text + SAN seems to be most convenient.

* You mention that parsing PGN for arbitrary variants is impossible. But the "VariantMen" PGN tag which the latest XBoard (4.8.0) uses for non-standard variants is supposed to solve that (for variants that are not extremely weird). It is supposed to supply the rule knowledge on piece movement needed to decode the SAN.

* The 8000 lines used by scidb for PGN parsing is much more than you typically need (probably for efficiency reasons). The XBoard low-level parser (lexical scanner, actually), which recognizes not just SAN but also Shogi notation, is only 633 lines, and the Betza-driven move generator (needed to decode SAN) is only 230 lines. XBoard's LoadGame routine, which does the high-level parsing, is 470 lines, but a large part of that is used for locating a game in a multi-game PGN file (which involves lots of code duplication from the actual reading). So I guess 1000-1500 lines would in general be enough to have a PGN parser supporting multiple move notation standards.

* How does your CAN handle multi-leg moves, like a Lion double capture in Chu Shogi or Mighty-Lion Chess (e.g. Le2xe3xf3)?
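
To illustrate the first remark above, here is a minimal Python sketch of why a coordinate notation needs no position context: the from-square and to-square can be read straight off the text. The names `CoordMove` and `parse_coordinate_move` are hypothetical, the sketch assumes a plain 8x8 board with an optional lowercase promotion letter, and it is not taken from XBoard, Scidb or the C/CIF draft.

```python
import re
from typing import NamedTuple, Optional, Tuple

class CoordMove(NamedTuple):
    """A move given purely by coordinates (hypothetical type for illustration)."""
    from_sq: Tuple[int, int]      # (file, rank), both zero-based
    to_sq: Tuple[int, int]
    promotion: Optional[str]      # 'q', 'r', 'b', 'n' or None

# Assumption: 8x8 board, files a-h, ranks 1-8, optional promotion suffix.
_COORD = re.compile(r"^([a-h])([1-8])([a-h])([1-8])([qrbn])?$")

def parse_coordinate_move(text: str) -> CoordMove:
    """Decode a move such as 'g1f3' without any board state or rule knowledge."""
    m = _COORD.match(text)
    if m is None:
        raise ValueError(f"not a coordinate move: {text!r}")
    f1, r1, f2, r2, promo = m.groups()
    return CoordMove(
        (ord(f1) - ord("a"), int(r1) - 1),
        (ord(f2) - ord("a"), int(r2) - 1),
        promo,
    )

# The SAN move "Nf3" names only the target square; finding the origin
# requires the current position plus knowledge of how a knight moves.
print(parse_coordinate_move("g1f3"))
```

Decoding the SAN equivalent would need a board representation and a move generator, which is where most of the parser line counts mentioned above come from.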
gcramer
Posts: 40
Joined: Mon Oct 28, 2013 11:21 pm
Location: Bad Homburg, Germany

Re: New Interchange Protocol / Alternative to PGN

Post by gcramer »

hgm wrote:* For computers SAN is a pain: it cannot be parsed without having the context of a current position, and knowledge of the rules. A notation which explicitly mentions the from-square, like CAN, avoids all that. For humans, however, SAN is much more convenient. (I guess that is why it became standard.) So I wonder if supporting a text format using CAN is not 'serving two masters', and in practice will give you the worst of both worlds. When targeting computers it seems best to use a binary format with CAN, and when targeting humans text + SAN seems to be most convenient.
Yes, I agree: for humans SAN is much more readable, and in general even more compact than CAN (with XML). But a human-readable format is not really the goal; the XML format is a 1:1 mapping of the binary format. XML is the readable one of the two, so the specification is written in terms of XML, and it is the basis for the definition. The rules for the conversion between CIF (XML) and CCIF (binary) will be released later. I will keep in mind that a later version of CIF might also support SAN; I think an additional attribute in the right place would be sufficient to identify the move notation.
hgm wrote:* You mention that parsing PGN for arbitrary variants is impossible. But the "VariantMen" PGN tag which the latest XBoard (4.8.0) uses for non-standard variants is supposed to solve that (for variants that are not extremely weird). It is supposed to supply the rule knowledge on piece movement needed to decode the SAN.
Many thanks for your hint. Naturally I don't know all the chess programming techniques, so I will have a look at your latest XBoard release; perhaps C/CIF can then provide a conversion tool for (nearly) all variants. But some severe problems are not yet solved. For example, a game downloaded from FICS might contain illegal moves (FICS allows a castling move even if the king has already castled; this is of course a bug). In C/CIF such a game MUST be flagged as containing an illegal move, because an application might crash if it plays such a move. How can such things be detected without specific chess logic when converting from PGN to C/CIF? The PGN parser in Scidb detects illegal and invalid moves.
hgm wrote:* The 8000 lines used by scidb for PGN parsing is much more than you typically need
Yes.
hgm wrote:(probably for efficiency reasons).
No. Have a look at the sources: db_pgn_reader.cpp and db_reader.cpp. Scidb supports all the undocumented features used in PGN files, even language-dependent piece sets. Furthermore, the Scidb parser emits error messages and warnings, performs many conversions, and validates all the data. The goal of C/CIF is that none of these checks are needed anymore while reading the archive.
hgm wrote:* How does your CAN handle multi-leg moves, like a Lion double capture in Chu Shogi or Mighty-Lion Chess (e.g. Le2xe3xf3)?
I'm not familiar with Chu Shogi or Mighty-Lion Chess, so I would be very pleased if you could suggest how to handle multi-leg moves.
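
As a narrow illustration of the FICS castling case mentioned above, the following Python sketch flags a second castling by the same side using only the SAN tokens, with no board state. The function name `flag_repeated_castling` is hypothetical, the sketch assumes the move list starts with White to move, and it catches only this one kind of illegal move; it is not how Scidb's parser works, and general legality checking still needs real chess logic.

```python
from typing import List

def flag_repeated_castling(san_moves: List[str]) -> List[int]:
    """Return the zero-based ply indices at which a side castles a second time.

    Assumption: san_moves alternates White, Black, White, ... starting
    with White. Only castling tokens are inspected.
    """
    castled = {"white": False, "black": False}
    flagged = []
    for ply, san in enumerate(san_moves):
        side = "white" if ply % 2 == 0 else "black"
        token = san.rstrip("+#!?")        # drop check marks and annotations
        if token in ("O-O", "O-O-O"):
            if castled[side]:
                flagged.append(ply)       # this side has castled before
            castled[side] = True
    return flagged

moves = ["e4", "e5", "Nf3", "Nc6", "Bc4", "Bc5", "O-O", "Nf6", "O-O"]
print(flag_repeated_castling(moves))
```

A converter could use a result like this to set the "contains an illegal move" flag on the game, but castling after an ordinary king or rook move would still slip through without tracking the position.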
hgm
Posts: 27817
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: New Interchange Protocol / Alternative to PGN

Post by hgm »

gcramer wrote:Many thanks for your hint. Naturally I don't know all the chess programming techniques, so I will have a look at your latest XBoard release; perhaps C/CIF can then provide a conversion tool for (nearly) all variants. But some severe problems are not yet solved. For example, a game downloaded from FICS might contain illegal moves (FICS allows a castling move even if the king has already castled; this is of course a bug). In C/CIF such a game MUST be flagged as containing an illegal move, because an application might crash if it plays such a move. How can such things be detected without specific chess logic when converting from PGN to C/CIF? The PGN parser in Scidb detects illegal and invalid moves.
The idea of the VariantMen tag is that it would supply all the rule knowledge relating to piece movement. This should enable the SAN parser to do legality checking, and catch moves that were illegal according to the rules of the variant.
gcramer wrote:I'm not familiar with Chu Shogi or Mighty-Lion Chess, so I would be very pleased if you could suggest how to handle multi-leg moves.
In the protocol extension used in the WinBoard Alien Edition, a multi-leg move is indicated by writing each leg in the normal long-algebraic notation of WB protocol and separating the legs by commas. This also covers multi-move variants like Marseillaise Chess, and non-standard castlings. For moves where a single piece performs captures along the way, you could just concatenate all the squares it visits.
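
A minimal Python sketch of the two conventions just described, under stated assumptions: squares are one file letter followed by a rank number (so boards larger than 8x8 work), promotion suffixes are not handled, and the function names `split_legs` and `split_path` and the example strings are made up for illustration rather than taken from the WinBoard sources.

```python
import re
from typing import List, Tuple

Leg = Tuple[str, str]                     # (from-square, to-square)
_SQUARE = re.compile(r"[a-z][0-9]+")      # e.g. 'e2' or 'k12'

def split_legs(move: str) -> List[Leg]:
    """Comma-separated legs: 'e2e3,e3f3' -> [('e2', 'e3'), ('e3', 'f3')]."""
    legs = []
    for leg in move.split(","):
        squares = _SQUARE.findall(leg)
        # Each leg must consist of exactly two squares and nothing else.
        if len(squares) != 2 or "".join(squares) != leg:
            raise ValueError(f"bad leg: {leg!r}")
        legs.append((squares[0], squares[1]))
    return legs

def split_path(move: str) -> List[Leg]:
    """Concatenated squares of one piece: 'e2e3f3' -> the same two legs."""
    squares = _SQUARE.findall(move)
    if len(squares) < 2 or "".join(squares) != move:
        raise ValueError(f"bad path: {move!r}")
    # Pair each visited square with the next one.
    return list(zip(squares, squares[1:]))

print(split_legs("e2e3,e3f3"))
print(split_path("e2e3f3"))
```

The comma form can also express legs made by different pieces (as in a castling or a Marseillaise double move), while the concatenated form only fits a single piece travelling through several squares.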