Extreme PGN cleaner for html file

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

MikeGL
Posts: 1010
Joined: Thu Sep 01, 2011 2:49 pm


Post by MikeGL »

Is there an extreme PGN cleaner that can scrape PGN games from an online HTML page?
I want to convert an HTML file containing a game into a pure PGN file,
but I'm not sure whether such a tool already exists.

As an example, I would like to download the game I have uploaded to the forum:
http://talkchess.com/forum/viewtopic.ph ... 25&t=64417

But if I look at the page source, the PGN is wrapped in HTML tags and some JavaScript.

Of course, I could open each page, click "Quote", and copy-paste the PGN game,
but with as many pages as the thread linked above, that would take a couple of hours.
I'd prefer an automated way: feed the HTML file to a utility as a parameter and get an output file with clean PGN.
Is there such a tool?

If anyone can point me to a URL or a utility/CLI tool that can do this, it would be appreciated.
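A rough first step toward the automated approach asked about here can be sketched in C. This is a minimal, hypothetical helper (`strip_html` is not an existing tool or library function) that removes HTML tags, turns `<br>` into a line break, and decodes a handful of common entities such as `&quot;`, so PGN shown as ordinary page text becomes plain text again. It assumes the PGN is part of the visible page text; it does not run or parse JavaScript, and a `>` inside an attribute value will confuse it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The few entities decoded here; any other entity is copied through unchanged. */
static const struct { const char *name; char ch; } ENTITIES[] = {
    {"&quot;", '"'}, {"&amp;", '&'}, {"&lt;", '<'},
    {"&gt;", '>'},   {"&#39;", '\''}, {"&nbsp;", ' '}
};

/* p points just after '<'. Returns 1 for <br>, <br/> or <br ...> (any case). */
static int is_br_tag(const char *p) {
    return (p[0] == 'b' || p[0] == 'B') && (p[1] == 'r' || p[1] == 'R') &&
           (p[2] == '>' || p[2] == '/' || isspace((unsigned char)p[2]));
}

/* Copies `in` to `out` with tags removed, <br> turned into '\n' and the
   entities above decoded. The output is always NUL-terminated and is
   truncated to fit out_size. Returns the number of characters written. */
size_t strip_html(const char *in, char *out, size_t out_size) {
    size_t n = 0;
    assert(out_size > 0);
    while (*in != '\0' && n + 1 < out_size) {
        if (*in == '<') {
            const char *end = strchr(in, '>');
            if (end == NULL) break;            /* unterminated tag: stop here */
            if (is_br_tag(in + 1)) out[n++] = '\n';
            in = end + 1;                      /* drop the whole tag */
            continue;
        }
        if (*in == '&') {
            int matched = 0;
            size_t i;
            for (i = 0; i < sizeof ENTITIES / sizeof ENTITIES[0]; i++) {
                size_t len = strlen(ENTITIES[i].name);
                if (strncmp(in, ENTITIES[i].name, len) == 0) {
                    out[n++] = ENTITIES[i].ch;
                    in += len;
                    matched = 1;
                    break;
                }
            }
            if (matched) continue;
        }
        out[n++] = *in++;                      /* ordinary character */
    }
    out[n] = '\0';
    return n;
}
```

To use it as a command-line filter, one would read the whole saved page into a buffer (for example with `fread`), call `strip_html`, and then trim whatever non-PGN page text remains; looping over several saved pages and appending the results would cover the multi-page case.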
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Extreme PGN cleaner for html file

Post by jdart »

There is a trick to download PGN from a post:

1. Start a reply to the post by clicking the "Quote" button in the upper left.
2. Highlight the part of the text that is pure PGN (excluding the pgn markers).
3. Copy and paste into your favorite program/editor.

You can then abandon the reply (if you don't click Submit it will not be posted).

--Jon
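Step 2 of the trick above can be partly automated once the quoted text has been pasted into a file. Below is a minimal sketch, assuming each game in the quote box is wrapped in `[pgn]` and `[/pgn]` markers (check the actual marker names in your copy). `extract_blocks` is a hypothetical helper, not part of any forum tool: it copies everything between each pair of markers and separates games with a blank line. Markers are matched case-sensitively, and an unterminated block is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends up to len bytes of src to out (capacity out_size), keeping it NUL-terminated. */
static void append(char *out, size_t out_size, size_t *n, const char *src, size_t len) {
    while (len-- > 0 && *n + 1 < out_size) out[(*n)++] = *src++;
    out[*n] = '\0';
}

/* Copies the text of every open...close block found in `text` into `out`,
   separating blocks with a blank line. Returns the number of blocks copied. */
int extract_blocks(const char *text, const char *open, const char *close,
                   char *out, size_t out_size) {
    size_t n = 0, open_len = strlen(open), close_len = strlen(close);
    int blocks = 0;
    assert(out_size > 0 && *open != '\0' && *close != '\0');
    out[0] = '\0';
    for (;;) {
        const char *start = strstr(text, open);
        if (start == NULL) break;
        start += open_len;                     /* skip the opening marker */
        const char *end = strstr(start, close);
        if (end == NULL) break;                /* unterminated block: ignore it */
        if (blocks > 0) append(out, out_size, &n, "\n\n", 2);
        append(out, out_size, &n, start, (size_t)(end - start));
        blocks++;
        text = end + close_len;                /* continue after the closing marker */
    }
    return blocks;
}
```

A call such as `extract_blocks(text, "[pgn]", "[/pgn]", buf, sizeof buf)` on the pasted reply text would leave only the games in `buf`.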
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: Extreme PGN cleaner for html file

Post by CMCanavessi »

There's also this:
3- getting the best out of pgn4web:
When you have a pgn4web widget in a page, there are a few hidden functionalities you should know about:
- click on a8 to get debug information
- click on b8 to get the current FEN string and the PGN score
- click on c8 to get the PGN score of the current game only
- click on d8 to get the PGN scores of all the games in the collection
- click on e8 to pop up a window with a great feature: an analysis board with a JavaScript version of Garbochess pondering the current position!
- click on g8 to open pgn4web's shortcut squares help page in a new window
- click on h8 to open pgn4web's general help.

Click h8 to discover more interesting features of pgn4web (there are many!)

Here are some especially useful ones:
- a1: go to the start of the game (note that you can also click any move in the list to reach that position)
- b1: go to the end of the game
Cheers!
MikeGL
Posts: 1010
Joined: Thu Sep 01, 2011 2:49 pm

Re: Extreme PGN cleaner for html file

Post by MikeGL »

CMCanavessi wrote: There's also this:
Thanks for the tip; I didn't know about those pgn4web features.
jdart wrote: There is a trick to download PGN from a post:

Noted. Thanks.

There are also some blogs with test positions
that I would like to back up to disk for use in my free time and in tactical training sessions.
I'm not sure whether scraping PGN from another blog is legal, but it would be only for my private, personal use,
and I won't publish it in any copyrighted material (books or software).

Also, when a thread runs to many pages, like the "Faster Forced Mate" thread with around
a dozen puzzles and their solutions, clicking through them one by one is tedious and
error-prone during the right-click copy/paste step.

I have a similar PGN cleaner written in C, but it is very basic: it just strips the comment
sections of a PGN, with no legality checking or other bells and whistles like those
in PGN Extract, PGNTRIM5, and other powerful PGN utilities.
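A basic comment stripper of the kind described above might look like the following minimal sketch. `strip_pgn_comments` is a hypothetical name; this is not the poster's actual code or any existing utility. It handles the two comment forms of the PGN standard: brace comments `{...}`, which do not nest, and rest-of-line comments starting with `;`. Lines beginning with `[` are assumed to be tag pairs and are copied untouched, so braces inside quoted tag values survive. It does no legality checking, and a removed comment can leave a doubled space behind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies PGN text from `in` to `out` without {...} and ';' comments.
   Output is NUL-terminated and truncated to out_size.
   Returns the number of characters written. */
size_t strip_pgn_comments(const char *in, char *out, size_t out_size) {
    size_t n = 0;
    int at_line_start = 1;
    assert(out_size > 0);
    while (*in != '\0' && n + 1 < out_size) {
        char c = *in;
        if (at_line_start && c == '[') {
            /* Tag pair such as [Event "..."]: copy the line verbatim. */
            while (*in != '\0' && *in != '\n' && n + 1 < out_size) out[n++] = *in++;
            continue;                          /* the newline is copied below */
        }
        if (c == '{') {
            /* Brace comment: skip to the closing brace (braces do not nest). */
            in += strcspn(in, "}");
            if (*in == '}') in++;
            at_line_start = 0;
            continue;
        }
        if (c == ';') {
            /* Rest-of-line comment: skip up to, but not including, the newline. */
            in += strcspn(in, "\n");
            continue;
        }
        out[n++] = c;                          /* movetext character */
        in++;
        at_line_start = (c == '\n');
    }
    out[n] = '\0';
    return n;
}
```

Variations in parentheses are left alone here; removing them would need a nesting counter, since PGN variations can nest.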