I'm playing around with parsing PGN files. I've created a repository of PGN games in different encodings, plus slightly (or not so slightly) malformed ones. Each file has three games: a normal game to start (in most cases), a game with an error, and finally a normal game. Ideally a PGN parser should be able to either skip or correct the second game in each file, then recover and parse the third game.
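A minimal Python sketch of one way to get that skip-and-recover behaviour. It assumes a new game always begins with a tag-pair line (`[Name "Value"]`) that follows the previous game's movetext. The helper names (`split_games`, `parse_game`, `parse_pgn`) and the tiny tag-pair regex are hypothetical. A real parser would also validate SAN moves, comments, variations and the result token. Splitting first and parsing second means a broken game can only spoil its own chunk, so the third game still parses.

```python
import re

# Hypothetical, deliberately minimal tag-pair pattern: [Name "Value"],
# allowing backslash-escaped characters inside the quoted value.
TAG_RE = re.compile(r'^\[(\w+)\s+"((?:[^"\\]|\\.)*)"\]$')


def split_games(text):
    """Split PGN text into per-game chunks of lines.

    Assumption: a tag line ("[...") that appears after movetext starts a new
    game. This lets us resynchronise after a malformed game without having
    to understand what went wrong inside it.
    """
    games, current, in_movetext = [], [], False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and in_movetext:
            games.append(current)  # the previous game ends here
            current, in_movetext = [], False
        elif stripped and not stripped.startswith("["):
            in_movetext = True
        current.append(line)
    if any(l.strip() for l in current):
        games.append(current)
    return games


def parse_game(lines):
    """Return (tags, movetext) or raise ValueError for a malformed game."""
    tags, moves = {}, []
    for line in lines:
        s = line.strip()
        if not s:
            continue
        if s.startswith("["):
            m = TAG_RE.match(s)
            if m is None:
                raise ValueError(f"malformed tag pair: {s!r}")
            tags[m.group(1)] = m.group(2)
        else:
            moves.append(s)
    if not moves:
        raise ValueError("game has no movetext")
    return tags, " ".join(moves)


def parse_pgn(text):
    """Parse every game, skipping (and counting) the ones that fail."""
    games, skipped = [], 0
    for chunk in split_games(text):
        try:
            games.append(parse_game(chunk))
        except ValueError:
            skipped += 1  # drop this game; the next chunk is parsed as usual
    return games, skipped
```

On a file laid out like the ones in the collection (good, broken, good), `parse_pgn` would be expected to return the first and third games and report one skipped game. Correcting rather than skipping the broken game would need game-specific repair rules on top of this.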
https://github.com/stevemaughan/Dirty-P ... -PGN-Files
Hopefully it's useful for someone other than me!
— Steve
Dirty & Malformed PGN Games Collection
Steve Maughan
- Posts: 1349
- Joined: Wed Mar 08, 2006 8:28 pm
- Location: Florida, USA
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
mar
- Posts: 2871
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Dirty & Malformed PGN Games Collection
It's cool to be able to check that both UTF-8 and Latin-1 work fine. Fortunately it's easy to tell whether a file is valid UTF-8 and fall back to ISO-8859-1 (Latin-1) when it isn't.
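The UTF-8-first, Latin-1-fallback strategy fits in a few lines of Python. `decode_pgn_bytes` is a hypothetical helper name, and treating "not valid UTF-8" as "Latin-1" is an assumption: some files described as Latin-1 are really Windows-1252 (`cp1252`), which differs from Latin-1 in the 0x80-0x9F range.

```python
def decode_pgn_bytes(data: bytes) -> str:
    """Decode PGN bytes as UTF-8, falling back to ISO-8859-1 (Latin-1).

    Strict UTF-8 decoding raises on any invalid byte sequence, which is what
    makes UTF-8 easy to recognise: Latin-1 text containing accented letters
    is very unlikely to also be valid UTF-8.
    """
    try:
        # "utf-8-sig" also accepts (and strips) an optional UTF-8 BOM.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point, so this cannot fail.
        return data.decode("latin-1")
```

For example, the bytes of "Réti" encoded as UTF-8 and as Latin-1 would both decode back to "Réti".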
It's definitely useful, since I'm writing a GUI at the moment.
However, I don't recall the standard ever mentioning UTF-16, yet you have it among the valid files.
UTF-16 with a BOM should be easy enough to detect; without a BOM it's a bit trickier. But before I start on this: is it actually a thing?
Are there UTF-16-encoded PGN files out there? I'd like to avoid wasting time implementing something that isn't useful at all.
Just asking...
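For the UTF-16 question, a BOM check plus a simple zero-byte heuristic is one way to sniff it. This Python sketch assumes the text is overwhelmingly ASCII (tag names, SAN moves, results), so in BOM-less UTF-16 roughly every other byte is 0x00. The function names and the 90%/10% thresholds are hypothetical, not taken from the PGN standard or from the repository.

```python
import codecs


def sniff_utf16(data: bytes, sample_size: int = 4096):
    """Guess a UTF-16 codec name for `data`, or return None.

    With a BOM, return "utf-16": Python's codec reads the byte order from
    the BOM and strips it. Without a BOM, fall back to a heuristic that
    assumes mostly-ASCII text, i.e. one zero byte per character.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    sample = data[:sample_size]
    half = len(sample) // 2
    if half == 0:
        return None
    even_zeros = sample[0::2].count(0)  # high byte of each ASCII unit in big-endian
    odd_zeros = sample[1::2].count(0)   # high byte of each ASCII unit in little-endian
    # Hypothetical thresholds: nearly all zeros on one side, almost none on the other.
    if odd_zeros >= 0.9 * half and even_zeros <= 0.1 * half:
        return "utf-16-le"
    if even_zeros >= 0.9 * half and odd_zeros <= 0.1 * half:
        return "utf-16-be"
    return None


def decode_if_utf16(data: bytes):
    """Decode `data` as UTF-16 if it sniffs as such, else return None."""
    encoding = sniff_utf16(data)
    return data.decode(encoding) if encoding else None
```

A reader could try `decode_if_utf16` first and only fall back to the UTF-8/Latin-1 check when it returns None. Ordinary UTF-8 or Latin-1 PGN text contains no zero bytes, so it should not trigger the BOM-less branch.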