Dirty & Malformed PGN Games Collection

Steve Maughan
Posts: 1349
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Post by Steve Maughan »

I'm playing around with parsing PGN files. I've created a repository of PGN files in different encodings, each containing slightly (or not so slightly) malformed games. Each file has three games: a normal game to start (in most cases), a game with an error, and finally another normal game. Ideally a PGN parser should be able to either skip or correct the second game in each file, then recover and parse the third game.
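To make the "skip, then recover" idea concrete, here is a rough Python sketch. It rests on one big assumption: every game, even a broken one, starts with a line beginning with `[Event `, which will not hold for every malformed file. The names `split_games`, `parse_with_recovery` and the `parse_game` callback are hypothetical and only there for illustration; `parse_game` stands in for whatever real parser you have, and is assumed to raise `ValueError` on a bad game.

```python
def split_games(text):
    """Yield one chunk of text per game.

    Assumption: a new game starts at every line beginning with '[Event '.
    """
    chunk = []
    for line in text.splitlines():
        if line.startswith("[Event ") and chunk:
            yield "\n".join(chunk)
            chunk = []
        # Drop blank lines that appear before the first game.
        if line.strip() or chunk:
            chunk.append(line)
    if chunk:
        yield "\n".join(chunk)


def parse_with_recovery(text, parse_game):
    """Parse every game, recording failures instead of aborting the file.

    parse_game is a hypothetical callback that raises ValueError
    when a game cannot be parsed.
    """
    games, errors = [], []
    for index, chunk in enumerate(split_games(text)):
        try:
            games.append(parse_game(chunk))
        except ValueError as exc:
            # Remember which game failed, then carry on with the next one.
            errors.append((index, str(exc)))
    return games, errors
```

Because each game is parsed in isolation, an error in the second game cannot corrupt the parser state for the third one. The trade-off is that a game with a missing or mangled `[Event` tag gets glued onto the previous chunk.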

https://github.com/stevemaughan/Dirty-P ... -PGN-Files

Hopefully it's useful to someone other than me!

— Steve
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine

mar
Posts: 2871
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Dirty & Malformed PGN Games Collection

Post by mar »

It's cool to be able to validate that both UTF-8 and Latin-1 work fine. Fortunately, it's easy to recognize valid UTF-8 and fall back to ISO Latin-1 otherwise.
It's useful for sure, since I'm writing a GUI at the moment.
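The UTF-8 check with a Latin-1 fallback can be sketched in a few lines of Python. This is only an illustration of the idea, not what my GUI actually does, and `decode_pgn` is a made-up name. It relies on the fact that text containing Latin-1 accented characters is very rarely valid UTF-8, while decoding as Latin-1 never fails because every byte value maps to a character.

```python
def decode_pgn(data):
    """Decode raw PGN bytes, returning (text, encoding_used).

    Try strict UTF-8 first; 'utf-8-sig' also strips a leading
    UTF-8 BOM if one is present. Fall back to Latin-1 on failure.
    """
    try:
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return data.decode("latin-1"), "latin-1"
```

For example, `decode_pgn("Réti".encode("latin-1"))` takes the fallback path, because the lone byte 0xE9 is not a valid UTF-8 sequence.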

I don't recall the standard ever mentioning UTF-16, however, and you have it among the valid files.
UTF-16 with a BOM should be easy enough to detect; without a BOM it's a bit trickier. But before I start on this: is it a thing? Are there UTF-16 encoded PGN files out there? I'd like to avoid wasting time implementing something that's not useful at all.
Just asking...
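For reference, this is roughly what I have in mind for detection, as an untested-in-the-wild Python sketch. The BOM case is a simple prefix check. For the no-BOM case I'm assuming PGN text is almost entirely ASCII, so in UTF-16 nearly every second byte would be zero; the name `sniff_utf16`, the sample size and the 0.9 threshold are all arbitrary choices of mine, not anything from the standard.

```python
import codecs


def sniff_utf16(data, sample_size=4096, threshold=0.9):
    """Guess whether raw bytes are UTF-16; return a codec name or None."""
    # With a BOM: Python's 'utf-16' codec consumes the BOM and picks
    # the byte order itself. (A UTF-32 LE BOM shares this prefix; that
    # case is ignored here.)
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # Without a BOM: heuristic, assuming mostly-ASCII content.
    sample = data[:sample_size]
    pairs = len(sample) // 2
    if pairs == 0:
        return None
    even_nuls = sample[0:pairs * 2:2].count(0)  # NULs at offsets 0, 2, 4, ...
    odd_nuls = sample[1:pairs * 2:2].count(0)   # NULs at offsets 1, 3, 5, ...

    if odd_nuls >= threshold * pairs and even_nuls < odd_nuls:
        return "utf-16-le"  # high (zero) byte comes second
    if even_nuls >= threshold * pairs and odd_nuls < even_nuls:
        return "utf-16-be"  # high (zero) byte comes first
    return None
```

A caller would run this before the UTF-8 check and decode with the returned codec name when it is not `None`. The heuristic would misfire on files that are mostly non-ASCII text, which seems unlikely for PGN.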