https://github.com/benini/pgn_to_parquet
As an example I converted the latest 1516 and 1515 archives from The Week In Chess and uploaded them to Github.
Why it is so cool?
1) It is possible to run SQL queries on the Parquet files!
2) It is possible to query the file remotely, without downloading the entire file!
3) It is possible to join multiple files into a single database, again without downloading them!
4) It is possible to output the queries in multiple file formats: json, csv, markdown, etc...
Example using DuckDB:
Code: Select all
CREATE VIEW chess_games AS SELECT * FROM read_parquet(
['https://github.com/benini/pgn_to_parquet/raw/main/twic1515.parquet',
'https://github.com/benini/pgn_to_parquet/raw/main/twic1516.parquet']);
.mode markdown
SELECT White,Black,Result,Date FROM chess_games WHERE White ilike '%vachier%';
Code: Select all
| White | Black | Result | Date |
|-------------------|------------------|---------|---------------------|
| Vachier Lagrave,M | Caruana,F | 1/2-1/2 | 2023-11-14 00:00:00 |
| Vachier Lagrave,M | Nepomniachtchi,I | 1/2-1/2 | 2023-11-15 00:00:00 |
| Vachier Lagrave,M | Robson,R | 1/2-1/2 | 2023-11-16 00:00:00 |
| Vachier Lagrave,M | Sevian,Samuel | 1/2-1/2 | 2023-11-16 00:00:00 |
| Vachier Lagrave,M | Nepomniachtchi,I | 0-1 | 2023-11-17 00:00:00 |
| Vachier Lagrave,M | Giri,A | 1-0 | 2023-11-17 00:00:00 |
| Vachier Lagrave,M | So,W | 1-0 | 2023-11-17 00:00:00 |
| Vachier Lagrave,M | Caruana,F | 1/2-1/2 | 2023-11-17 00:00:00 |
| Vachier Lagrave,M | Le,Quang Liem | 0-1 | 2023-11-18 00:00:00 |
| Vachier Lagrave,M | Firouzja,Alireza | 1-0 | 2023-11-18 00:00:00 |
| Vachier Lagrave,M | Robson,R | 1/2-1/2 | 2023-11-18 00:00:00 |
| Vachier Lagrave,M | Xiong,Jeffery | 1/2-1/2 | 2023-11-18 00:00:00 |
| Vachier Lagrave,M | Sevian,Samuel | 1/2-1/2 | 2023-11-18 00:00:00 |
| Vachier Lagrave,M | Aronian,L | 1/2-1/2 | 2023-11-21 00:00:00 |
| Vachier Lagrave,M | Caruana,F | 1/2-1/2 | 2023-11-23 00:00:00 |
| Vachier Lagrave,M | Nepomniachtchi,I | 1/2-1/2 | 2023-11-25 00:00:00 |