JGN: A PGN Replacement

Discussion of chess software programming and technical issues.

Moderator: Ras

hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: JGN: A PGN Replacement

Post by hgm »

I am not sure about this 'unclear specification' of PGN. What exactly is unclear here?
Harald
Posts: 318
Joined: Thu Mar 09, 2006 1:07 am

Re: JGN: A PGN Replacement

Post by Harald »

HGM wrote:
I am not sure about this 'unclear specification' of PGN. What exactly is unclear here?
It was mentioned in the thread that there are small differences between implementations that read and write PGN files.
Such differences in interpretation, and such extensions, should not be possible in a new machine-readable chess game format.

Re: JGN: A PGN Replacement

Post by hgm »

Implementation differences are not the same as an unclear specification. They can originate from implementations not sticking to the specification. This can happen with any specification, and becomes more likely when a format becomes popular enough to get many implementations.
Steve Maughan
Posts: 1277
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: JGN: A PGN Replacement

Post by Steve Maughan »

hgm wrote: Thu Nov 11, 2021 9:29 am
flok wrote: Wed Nov 10, 2021 9:03 pm
"binary formats are superior" is spoken as a non-developer

Yes they may be faster to parse but in this day and age it is more important that it is relatively easy to implement a processor and debug that code. Not fast enough? Buy a faster computer!
Why do you think speed is an issue here? Binary formats are superior in all ways. Most important is of course that they are easier to process.
My mapping program (AlignMix) switched from a binary format to a JSON file format, and it's been revolutionary. The main reason for the switch was that JSON enables forward as well as backward file compatibility, i.e. AlignMix 2019 is able to read files created by AlignMix 2022. This is HUGE! It is easily accomplished using the dictionary-like features of JSON. For example, AlignMix 2019 might request the "View" object: if it doesn't exist, it uses a default value; if it does exist, it reads it in. There could be other objects in the file's dictionary, but AlignMix 2019 just skips over them. Using this approach I can develop and add new features without breaking the file format, which frees up the development process tremendously. I know this can in theory be accomplished with a binary file format, but it's much more difficult: you need to record the length of each object in the file so that a skip is possible.
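A minimal Python sketch of both approaches described above. The "View" key and its default zoom value are hypothetical stand-ins for AlignMix's real objects, and the binary half uses a made-up tag-length-value layout (4-byte ASCII tag, 4-byte little-endian length, payload) purely to show that the stored length is what lets an older reader skip records it does not understand.

```python
import json
import struct

# Hypothetical default used when an older file has no "View" object.
DEFAULT_VIEW = {"zoom": 1.0}

def read_json_doc(text: str) -> dict:
    """Pick out the keys this version understands; unknown keys are ignored."""
    doc = json.loads(text)
    return {"View": doc.get("View", dict(DEFAULT_VIEW))}  # missing -> default

def write_tlv(records: list) -> bytes:
    """Each record: 4-byte ASCII tag, 4-byte little-endian length, payload."""
    return b"".join(struct.pack("<4sI", tag.encode("ascii"), len(payload)) + payload
                    for tag, payload in records)

def read_tlv(data: bytes, known_tags: set) -> dict:
    """Read records, skipping any whose tag this version does not know."""
    out, pos = {}, 0
    while pos < len(data):
        tag, length = struct.unpack_from("<4sI", data, pos)
        pos += 8
        payload = data[pos:pos + length]
        pos += length  # the stored length is what makes the skip possible
        if tag.decode("ascii") in known_tags:
            out[tag.decode("ascii")] = payload
    return out

# A "newer" file with an extra key / extra record that an old reader ignores.
print(read_json_doc('{"View": {"zoom": 2.0}, "Layers": [1, 2]}'))
blob = write_tlv([("VIEW", b"2.0"), ("LAYR", b"\x01\x02")])
print(read_tlv(blob, {"VIEW"}))
```

In the JSON case the skip comes for free from the parser; in the binary case the writer has to plan for it up front.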

Anyway, IMHO JSON can be extremely useful and, for me at least, superior to a binary file format.

Steve
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: JGN: A PGN Replacement

Post by Sopel »

hgm wrote: Thu Nov 11, 2021 10:23 am I am not sure about this 'unclear specification' of PGN. What exactly is unclear here?
From the spec at http://www.saremba.de/chessgml/standard ... mplete.htm
There are two formats in the PGN specification. These are the "import" format and the "export" format. These are the two different ways of formatting the same PGN data according to its source. The details of the two formats are described throughout the following sections of this document.

The import format is rather flexible and is used to describe data that may have been prepared by hand.

A program that can read PGN data should be able to handle the somewhat lax import
I don't know of a single implementation that supports the "import" format.
Export format should also be used for archival storage. Here, "archival" storage is defined as storage that may be accessed by a variety of computing systems. The only extra requirement for archival storage is that the newline character have a specific representation that is independent of its value for a particular computing system's text file usage. The archival representation of a newline is the ASCII control character LF (line feed, decimal value 10, hexadecimal value 0x0a).
What's this??? Anyone???
PGN data is represented using a subset of the eight bit ISO 8859/1 (Latin 1) character set.

Because some PGN users' environments may not support presentation of non-ASCII characters, PGN game authors should refrain from using such characters in critical commentary or string values in game data that may be referenced in such environments. PGN software authors should have their programs handle such environments by displaying a question mark ("?") for non-ASCII character codes.
Severely limiting; I'm not sure anyone actually follows it. I've seen PGNs with a wild selection of character encodings.
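A small Python sketch of what the quoted rule asks for, assuming the input really is ISO 8859-1: decode as Latin-1 and display "?" for anything outside ASCII. The second call illustrates the encoding mess mentioned above: the same name stored as UTF-8 decodes into two wrong characters, which then show as two question marks. Both function names are hypothetical.

```python
def decode_latin1(raw: bytes) -> str:
    """PGN is specified as ISO 8859-1: each byte maps to exactly one character."""
    return raw.decode("latin-1")

def ascii_safe(text: str) -> str:
    """Display form for environments without non-ASCII support: '?' per character."""
    return "".join(ch if ord(ch) < 128 else "?" for ch in text)

print(ascii_safe(decode_latin1("Müller".encode("latin-1"))))  # M?ller
print(ascii_safe(decode_latin1("Müller".encode("utf-8"))))    # M??ller (file was really UTF-8)
```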
Only four of the ASCII control characters are permitted in PGN import format; these are the horizontal and vertical tabs
[...]
Tab characters, both horizontal and vertical, are not permitted in the export format.
???
Also, tab characters may not appear inside of string data.
Ah, yes, surely people are checking for this.
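For what it's worth, checking these two tab rules is not hard. A Python sketch of one reading: export-format text may contain no tabs at all, while import-format text may contain tabs anywhere except inside strings. It assumes string tokens only need checking on tag-pair lines ([Name "value"]); the name `tab_problems` is made up for illustration.

```python
import re

# A PGN string token: double quotes, with backslash escapes for \" and \\.
STRING_TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"')
TABS = ("\t", "\v")  # horizontal and vertical tab

def tab_problems(pgn_text: str, export: bool) -> list:
    problems = []
    for lineno, line in enumerate(pgn_text.split("\n"), start=1):
        if export and any(t in line for t in TABS):
            problems.append(f"line {lineno}: tab not permitted in export format")
            continue
        # Assumption: only tag-pair lines carry string tokens worth checking.
        if line.lstrip().startswith("["):
            for m in STRING_TOKEN.finditer(line):
                if any(t in m.group(1) for t in TABS):
                    problems.append(f"line {lineno}: tab inside string")
    return problems

sample = '[Event "Club\tChamp"]\n1. e4\te5 1-0'
print(tab_problems(sample, export=True))   # both lines flagged
print(tab_problems(sample, export=False))  # only the tab inside the string
```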
Import format PGN text lines are limited to having a maximum of 255 characters per line including the newline character.
What does that mean for an implementation?
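One plausible reading for a reader implementation: any line longer than 255 characters, counting its newline, is malformed import input, so a reader could reject such lines or size a line buffer accordingly. A minimal Python check under that assumption, with the hypothetical helper name `overlong_lines`:

```python
MAX_IMPORT_LINE = 255  # per the quoted rule, this count includes the newline character

def overlong_lines(pgn_text: str) -> list:
    """Return 1-based numbers of lines exceeding the import-format limit."""
    return [lineno
            for lineno, line in enumerate(pgn_text.splitlines(keepends=True), start=1)
            if len(line) > MAX_IMPORT_LINE]

text = "a" * 254 + "\n" + "b" * 255 + "\n" + "c" * 255
print(overlong_lines(text))  # [2]: 255 characters plus the newline is one too many
```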
SAN as presented in this document uses English language single character abbreviations for chess pieces, although this is easily changed in the source.
So is this a standard or what?
An alternative to SAN is FAN (Figurine Algebraic Notation). FAN uses miniature piece icons instead of single letter piece abbreviations. The two notations are otherwise identical.
Why is this mentioned? Do I need to support it?
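Since the spec says FAN differs from SAN only in the piece symbols, a reader that wants to accept it can map figurines back to letters before parsing. A Python sketch, assuming the figurines arrive as the standard Unicode chess symbols (that choice is an assumption here, not something taken from the spec); `fan_to_san` is a hypothetical name.

```python
# Unicode figurines: white U+2654..U+2659, black U+265A..U+265F (K, Q, R, B, N, P).
FIGURINE_TO_LETTER = {
    "\u2654": "K", "\u2655": "Q", "\u2656": "R", "\u2657": "B", "\u2658": "N", "\u2659": "",
    "\u265a": "K", "\u265b": "Q", "\u265c": "R", "\u265d": "B", "\u265e": "N", "\u265f": "",
}

def fan_to_san(move: str) -> str:
    """Swap figurines for SAN letters; pawn figurines are dropped, as SAN has no pawn letter."""
    return "".join(FIGURINE_TO_LETTER.get(ch, ch) for ch in move)

print(fan_to_san("\u2658f3"))   # Nf3
print(fan_to_san("e8=\u2655"))  # e8=Q
```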
The letter code for a pawn is not used for SAN moves in PGN export format movetext. However, some PGN import software disambiguation code may allow for the appearance of pawn letter codes.
So it's not needed but some software might require it?
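As far as I can read that sentence, export-format writers must omit the pawn letter, and import-format readers may tolerate it. Tolerating it is a one-line normalization, sketched here in Python with a hypothetical helper name. A form like "Pxd5" is deliberately left alone, because recovering the missing origin file needs the board position.

```python
import re

# An optional pawn letter directly before a file letter, e.g. "Pe4" or "Pexd5".
PAWN_LETTER = re.compile(r"^P(?=[a-h])")

def strip_pawn_letter(move: str) -> str:
    return PAWN_LETTER.sub("", move)

print(strip_pawn_letter("Pexd5"))  # exd5
```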
Import format is somewhat more relaxed and it makes allowances for moves that do not conform exactly to the canonical format. However, these allowances may differ among different PGN reader programs. There are a number of suggested guidelines for use with implementing PGN reader software for permitting non-canonical SAN move representation. The idea is to have a PGN reader apply various transformations to attempt to discover the move that is represented by non-canonical input. Some suggested transformations include: letter case remapping, capture indicator insertion, check indicator insertion, and checkmate indicator insertion.
What does that mean? Is this within the standard or not? It says "import" format, so I guess it is? So what is actually standardized here?
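The transformations listed there all boil down to "find the legal move this token was probably meant to be". A Python sketch of one way to do that, assuming a move generator (not shown) supplies the legal moves of the current position already in canonical SAN: compare both sides after stripping the optional marks, and accept only an unambiguous match. The helper names are hypothetical, and this is only one possible strategy, not the spec's.

```python
import re
from typing import List, Optional

def _key(san: str) -> str:
    """Comparison key: drop capture (x or :), check, mate, promotion '=' and
    annotation marks, then upper-case a leading piece letter."""
    s = re.sub(r"[x:+#=!?]", "", san)
    if s[:1] in ("n", "r", "q", "k"):  # 'b' is left alone: it may be the b-file
        s = s[0].upper() + s[1:]
    return s

def resolve_move(token: str, legal_sans: List[str]) -> Optional[str]:
    """Return the unique legal SAN move matching a non-canonical token, else None."""
    key = _key(token)
    matches = [san for san in legal_sans if _key(san) == key]
    return matches[0] if len(matches) == 1 else None

legal = ["e4", "Nf3", "Nxe5", "Bb5+", "exd5"]  # assumed output of a move generator
print(resolve_move("nf3", legal))  # letter case remapping -> Nf3
print(resolve_move("Ne5", legal))  # capture indicator insertion -> Nxe5
print(resolve_move("Bb5", legal))  # check indicator insertion -> Bb5+
```

Requiring a unique match means a sloppy token that fits two legal moves is rejected rather than guessed.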
When exported, a move suffix annotation is translated into the corresponding Numeric Annotation Glyph as described in a later section of this document. For example, if the single move symbol "Qxa8?" appears in an import format PGN movetext, it would be replaced with the two adjacent symbols "Qxa8 $2".
Hmm, yes.
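The suffix-to-NAG translation itself is mechanical. A Python sketch using the six traditional suffixes and their NAG numbers ($1 to $6) from the spec's NAG table; the helper name `export_move` is hypothetical.

```python
import re

# Traditional move suffix annotations and their Numeric Annotation Glyphs.
SUFFIX_TO_NAG = {"!": 1, "?": 2, "!!": 3, "??": 4, "!?": 5, "?!": 6}

# Shortest move part first, so "e4!?" splits as "e4" + "!?", not "e4!" + "?".
_SUFFIX = re.compile(r"^(.*?)([!?]{1,2})$")

def export_move(token: str) -> str:
    m = _SUFFIX.match(token)
    if m and m.group(2) in SUFFIX_TO_NAG:
        return f"{m.group(1)} ${SUFFIX_TO_NAG[m.group(2)]}"
    return token

print(export_move("Qxa8?"))  # Qxa8 $2
```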
Note: The specification for import/export representation of RAV elements needs further development.
Very nice standard
There is a standard sorting order for PGN games within a file. This collation is based on eight keys; these are the seven tag values of the STR and also the movetext itself.
I'm sure everyone does that. What a stupid thing to have in a standard.
English language piece names are used to define the letter set for identifying chesspieces in PGN movetext. However, authors of programs which are used only for local presentation or scanning of chess move data may find it convenient to use piece letter codes common in their locales. This is not a problem as long as PGN data that resides in archival storage or that is exchanged among programs still uses the SAN (English) piece letter codes: "PNBRQK".
It is a problem if you're trying to specify a standard... There is no way to tell which of these the game record uses.
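The objection can be made concrete: the same token means a different move under a different locale table, so a reader has to be told which table applies. In this Python sketch the German and French letter sets are the usual ones (König/Dame/Turm/Läufer/Springer, Roi/Dame/Tour/Fou/Cavalier), but the table layout and the name `to_san` are hypothetical.

```python
# Hypothetical per-locale tables: localized piece letter -> SAN (English) letter.
LOCALE_PIECES = {
    "en": {"K": "K", "Q": "Q", "R": "R", "B": "B", "N": "N"},
    "de": {"K": "K", "D": "Q", "T": "R", "L": "B", "S": "N"},
    "fr": {"R": "K", "D": "Q", "T": "R", "F": "B", "C": "N"},
}

def to_san(move: str, locale: str) -> str:
    table = LOCALE_PIECES[locale]
    # Keys are upper case; files, ranks and 'x' are untouched.
    return "".join(table.get(ch, ch) for ch in move)

print(to_san("Rf1", "en"))  # Rf1: a rook move
print(to_san("Rf1", "fr"))  # Kf1: the same token is a king move in French
```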

tl;dr: pretty much everyone implements only the "export" format and happily ignores the rest of the messy spec.
Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: JGN: A PGN Replacement

Post by Fulvio »

hgm wrote: Thu Nov 11, 2021 10:23 am I am not sure about this 'unclear specification' of PGN. What exactly is unclear here?
Some things are not specified in the standard, in particular the contents of comments.
For example, this is valid but confusing:

1.e4 {[%eval 5,5][%clk 0:01:00]} 
{bad move!}1...h5 {[%eval +1.25][%clk 0:01:00]} {many} {comments} {do fold?} 
({better move!}1... c5{+0.05/5 Stock 14})
1-0
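Those `[%eval ...]` and `[%clk ...]` entries follow a widespread convention of embedding commands in comments, which is not part of the PGN standard itself. A Python sketch of extracting them, assuming the `[%name value]` shape seen in the example: values come out as raw strings, and deciding whether "5,5" means 5.5 is left entirely to the caller, which is exactly the kind of gap being pointed out. The function name is hypothetical.

```python
import re

COMMENT = re.compile(r"\{([^}]*)\}")             # brace comments do not nest in PGN
COMMAND = re.compile(r"\[%(\w+)\s+([^\]]*)\]")   # the [%name value] convention

def comment_commands(movetext: str) -> list:
    """Return (name, raw value) pairs found inside brace comments, in order."""
    found = []
    for comment in COMMENT.finditer(movetext):
        for cmd in COMMAND.finditer(comment.group(1)):
            found.append((cmd.group(1), cmd.group(2).strip()))
    return found

text = ('1.e4 {[%eval 5,5][%clk 0:01:00]}\n'
        '{bad move!}1...h5 {[%eval +1.25][%clk 0:01:00]} {many} {comments} {do fold?}\n'
        '({better move!}1... c5{+0.05/5 Stock 14})\n1-0')
print(comment_commands(text))
```

The engine output in "{+0.05/5 Stock 14}" uses yet another ad hoc convention and is not picked up at all.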

Re: JGN: A PGN Replacement

Post by hgm »

Doesn't WinBoard support the import format? It reads pretty much anything that remotely looks like a collection of chess moves. You can even paste an HTML table of moves into it.

The standard is of course the export format. The import format is just a collection of suggestions for what deviations from the standard to expect. One can of course argue that it is a bad thing to cater to non-compliance. But if that is your philosophy, you can simply ignore the PGN import format.

The same problem will apply to any format specification. A JGN implementation will also be confronted with incorrect input, and will have to deal with it somehow. Having it crash on any violation doesn't make it better than an export-format-only PGN implementation.

Re: JGN: A PGN Replacement

Post by hgm »

Fulvio wrote: Thu Nov 11, 2021 2:43 pm
hgm wrote: Thu Nov 11, 2021 10:23 am I am not sure about this 'unclear specification' of PGN. What exactly is unclear here?
Some things are not specified in the standard, in particular the comments.
For example this is valid but confusing:

1.e4 {[%eval 5,5][%clk 0:01:00]} 
{bad move!}1...h5 {[%eval +1.25][%clk 0:01:00]} {many} {comments} {do fold?} 
({better move!}1... c5{+0.05/5 Stock 14})
1-0
Well, the standard says such additional information can be included as comments. If it is desirable to standardize the format for such 'meta data', a standard should be adopted for it. There isn't anything unclear here, just absent.

Re: JGN: A PGN Replacement

Post by Fulvio »

Sopel wrote: Thu Nov 11, 2021 2:19 pm I know not a single implementation that supports the "import" format.
Export format should also be used for archival storage. Here, "archival" storage is defined as storage that may be accessed by a variety of computing systems. The only extra requirement for archival storage is that the newline character have a specific representation that is independent of its value for a particular computing system's text file usage. The archival representation of a newline is the ASCII control character LF (line feed, decimal value 10, hexadecimal value 0x0a).
What's this??? Anyone???
I believe all implementations support the "import" format.
Like all treatises since Euclid's Elements, it begins with definitions:
"Import format" -> file that is imported into the program
"export format" -> file that is created by the program
"Archival storage" -> file that can be used portably between different computers and systems

The extra requirement says: a PGN file created manually in Windows that uses "\r\n" for new lines is acceptable, because it is governed by the "import format".
A PGN file created by a program should always use "\n" for new lines to be portable.
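In implementation terms that reading amounts to: be liberal about line endings on input, and write only LF when producing archival output. A Python sketch, assuming Latin-1 encoded data as the spec prescribes; the function names are hypothetical. When saving, write the bytes in binary mode (or open the file with newline="\n"), since Python's text mode on Windows would otherwise turn each "\n" back into "\r\n".

```python
def _to_lf(text: str) -> str:
    # CRLF first, then any stray CR, so "\r\n" does not become two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")

def read_pgn_text(raw: bytes) -> str:
    """Import side: accept LF, CRLF or CR line endings."""
    return _to_lf(raw.decode("latin-1"))

def archival_bytes(text: str) -> bytes:
    """Archival side: every newline is a single LF (0x0a)."""
    return _to_lf(text).encode("latin-1")

windows_file = b'[Event "x"]\r\n\r\n1. e4 *\r\n'
print(repr(read_pgn_text(windows_file)))
```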

Re: JGN: A PGN Replacement

Post by hgm »

Well, it seems clear that the PGN specs in some areas exceed their jurisdiction. (Which is something entirely different from being unclear.) E.g. requiring a specific encoding of end-of-line basically makes it a binary format, and would prevent the interchanged files from being readable on Windows. That seems a very bad idea. Banning the use of non-ASCII is also problematic. OTOH, it avoids a real problem, namely that there is no generally accepted standard for encoding such characters, so that they might change when files get exchanged between different locales. But since this is a general problem in file interchange, it would have been wise to delegate dealing with it to the OS, and to how that handles locales.
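In the absence of a declared encoding, a common practical heuristic (not something the PGN spec prescribes) is to try UTF-8 first and fall back to Latin-1, which accepts any byte sequence. A Python sketch with a hypothetical function name; "utf-8-sig" also strips a leading byte-order mark if one is present. The heuristic can misread a Latin-1 file whose bytes happen to form valid UTF-8, though that is rare for real names and comments.

```python
def decode_pgn(raw: bytes) -> str:
    """Heuristic: UTF-8 (BOM optional) first, Latin-1 as a fallback that never fails."""
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return raw.decode("latin-1")

print(decode_pgn("Müller".encode("utf-8")))    # Müller
print(decode_pgn("Müller".encode("latin-1")))  # Müller
```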