As I was collecting games to build an opening book for my chess engine (in development), I thought, "Why not share my games with everyone?" And then it became a project of its own!
Well, here's the very first release of the CGR database!
For those interested, the details are here:
https://chessgamesrepository.wordpress. ... 0219-full/
The first release of the CGR games database
-
- Posts: 19
- Joined: Sat Oct 08, 2016 10:10 pm
- Location: Montreal
- Full name: Benoît St-Jean
-
- Posts: 44
- Joined: Sun Aug 07, 2016 5:24 pm
Re: The first release of the CGR games database
Thanks for your great work. Sadly, the archive is corrupt.
-
- Posts: 1535
- Joined: Sun Oct 25, 2009 2:30 am
Re: The first release of the CGR games database
Quote: It’s been known for a while, games from the CCRL (Computer Chess Rating Lists) use a “custom” round number and many chess database programs don’t seem to like it. Besides, importing CCRL games will often cause the famous “Round Name limit of 262143 exceeded” error in Scid or Scid vs PC. So I have decided to replace the round number in the CCRL games by the default value of “?”.
So, that's why the CCRL404 was messing up my DB! I had to filter it through several programs, without actually knowing what was wrong with it. Good to know.
Quote: we will hit the 16 million games limit in Scid… For those who use other chess database software, are there similar limits? Do you want the database in multiple Zip files or just one Zip file?
16? More like 12. I haven't read about CB or CA having a hardcoded limit, but I'm sure any of them will crash with enough games. Finally, using the Zip format isn't the best way to go.
Quote: Finally, as a side note, the next release will probably be another FULL one as I have another 83G of PGN games ready! I have kept the 206G that made it into this first release
Are you saying that the DB is already 206G big, but below 16 million games?
-
- Posts: 395
- Joined: Fri Aug 12, 2016 8:43 pm
Re: The first release of the CGR games database
bstjean wrote: It’s been known for a while, games from the CCRL (Computer Chess Rating Lists) use a “custom” round number and many chess database programs don’t seem to like it. Besides, importing CCRL games will often cause the famous “Round Name limit of 262143 exceeded” error in Scid or Scid vs PC. So I have decided to replace the round number in the CCRL games by the default value of “?”. Does anyone have a problem with this? Do you have any idea/suggestion/comment on this?
CCRL tags are like this:
[Event "CCRL 40/40"]
[Round "529.2.403"]
and I believe the best thing would be to change them to:
[Event "CCRL 40/40 - 529"]
[Round "2.403"]
if 529 is the tournament number.
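The tag rewrite suggested above could be sketched roughly as follows. This is only an illustration, not something CGR or CCRL actually runs: the function name `split_ccrl_round` is made up, and it assumes (as the post does) that the first component of a three-part Round value is the tournament number. Headers that do not fit that shape are left alone.

```python
import re

# Matches a single PGN tag pair such as [Round "529.2.403"].
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def split_ccrl_round(header_lines):
    """Move the leading number of a three-part Round into the Event tag.

    Hypothetical helper. Assumes the first Round component is the
    tournament number; any other header is returned unchanged.
    """
    # Collect the tag values so Round can be inspected before rewriting.
    tags = {}
    for line in header_lines:
        m = TAG_RE.match(line)
        if m:
            tags[m.group(1)] = m.group(2)

    parts = tags.get("Round", "").split(".")
    fits = len(parts) == 3 and all(p.isdigit() for p in parts)
    if not fits or "Event" not in tags:
        return list(header_lines)

    tournament, rest = parts[0], ".".join(parts[1:])
    out = []
    for line in header_lines:
        m = TAG_RE.match(line)
        if m and m.group(1) == "Event":
            out.append(f'[Event "{m.group(2)} - {tournament}"]')
        elif m and m.group(1) == "Round":
            out.append(f'[Round "{rest}"]')
        else:
            out.append(line)
    return out


header = ['[Event "CCRL 40/40"]', '[Round "529.2.403"]']
print("\n".join(split_ccrl_round(header)))
```

Whether a two-part Round such as "2.403" is accepted by every database program is not something this sketch checks.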
-
- Posts: 19
- Joined: Sat Oct 08, 2016 10:10 pm
- Location: Montreal
- Full name: Benoît St-Jean
Re: The first release of the CGR games database
Looks like the problem is on your end.
I downloaded the zip file myself, again, to test it and it works just fine. Besides, it's been downloaded 50+ times so far and I haven't received any comment or email from anyone saying the archive was corrupted.
I'm using the 7-Zip software (on Windows) and it unzips fine, if that helps.
-
- Posts: 19
- Joined: Sat Oct 08, 2016 10:10 pm
- Location: Montreal
- Full name: Benoît St-Jean
Re: The first release of the CGR games database
retep1 wrote: Thanks for your great work. Sadly, the archive is corrupt.
Looks like the problem is on your end.
I downloaded the zip file myself, again, to test it and it works just fine. Besides, it's been downloaded 50+ times so far and I haven't received any comment or email from anyone saying the archive was corrupted.
I'm using the 7-Zip software (on Windows) and it unzips fine, if that helps.
-
- Posts: 19
- Joined: Sat Oct 08, 2016 10:10 pm
- Location: Montreal
- Full name: Benoît St-Jean
Re: The first release of the CGR games database
Fulvio wrote:
bstjean wrote: It’s been known for a while, games from the CCRL (Computer Chess Rating Lists) use a “custom” round number and many chess database programs don’t seem to like it. Besides, importing CCRL games will often cause the famous “Round Name limit of 262143 exceeded” error in Scid or Scid vs PC. So I have decided to replace the round number in the CCRL games by the default value of “?”. Does anyone have a problem with this? Do you have any idea/suggestion/comment on this?
CCRL tags are like this:
[Event "CCRL 40/40"]
[Round "529.2.403"]
and I believe the best thing would be to change them to:
[Event "CCRL 40/40 - 529"]
[Round "2.403"]
if 529 is the tournament number.

I'm just processing PGN files the way they were produced! Right now, doing that kind of "magic" is not an option but that's definitely doable in a not-so-distant future!
-
- Posts: 19
- Joined: Sat Oct 08, 2016 10:10 pm
- Location: Montreal
- Full name: Benoît St-Jean
Re: The first release of the CGR games database
Quote: Are you saying that the DB is already 206G big, but below 16 Million games?
No! I'm saying I have downloaded 206G of PGN games and this database was built from that. Obviously, it looks like everyone has MANY games in common!!
-
- Posts: 338
- Joined: Tue Mar 13, 2012 9:59 pm
- Location: Germany
Re: The first release of the CGR games database
A very weird ZIP format but I managed to unzip it (using 7zip). It just took an unnecessarily long time.
Have you considered providing the database in SCID format? It occupies 100 MB less space than the zipped PGN file.
As for the database: nice stuff. Thank you.
Just for your information: There are at least 431,000 duplicate games in this database.
-
- Posts: 19
- Joined: Sat Oct 08, 2016 10:10 pm
- Location: Montreal
- Full name: Benoît St-Jean
Re: The first release of the CGR games database
styx wrote: A very weird ZIP format but I managed to unzip it (using 7zip). It just took an unnecessarily long time.
Have you considered providing the database in SCID format? It occupies 100 MB less space than the zipped PGN file.
As for the database: nice stuff. Thank you.
Just for your information: There are at least 431,000 duplicate games in this database.

1) I wasn't sure the sudden burst of downloads wouldn't cause problems so I zipped the file with the maximum compression I could find (see my post on the blog regarding this)
2) For now, I will stick to the PGN format as not everyone uses Scid. The goal is to provide a quality database to *everyone*, not only Scid users! And the portability of the PGN format is currently the best solution!
3) Thanks for the info. But which options did you use to detect those? I made 5 passes of "twin checks" and, obviously, I missed some! I'm interested to know how you detected those duplicates!
4) Next release, I will probably provide 5 zip files, one per ECO classification.
Thanks!
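On point 3, here is one simple way duplicates can be flagged outside of a database program. It is a rough sketch only: it is not Scid's "twin check" and not necessarily the method styx used, and the names `normalize_moves`, `game_key` and `find_duplicates`, the dict layout and the sample games are all made up for illustration. Two games count as duplicates here when the player names (case-insensitive) and the bare move list match, ignoring comments, NAGs, move numbers and the trailing result.

```python
import hashlib
import re
from collections import defaultdict


def normalize_moves(movetext):
    """Reduce PGN movetext to a bare, space-separated list of moves."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)             # drop {comments}
    text = re.sub(r"\$\d+", " ", text)                      # drop NAGs such as $1
    text = re.sub(r"\d+\.(\.\.)?", " ", text)               # drop "1." and "1..."
    text = re.sub(r"(1-0|0-1|1/2-1/2|\*)\s*$", " ", text)   # drop trailing result
    return " ".join(text.split())


def game_key(game):
    """Hash the player names and normalized moves of one game (a dict)."""
    raw = "|".join([
        game["white"].lower(),
        game["black"].lower(),
        normalize_moves(game["moves"]),
    ])
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def find_duplicates(games):
    """Return lists of indexes of games that share the same key."""
    groups = defaultdict(list)
    for index, game in enumerate(games):
        groups[game_key(game)].append(index)
    return [indexes for indexes in groups.values() if len(indexes) > 1]


# Made-up sample games, only to show the call.
games = [
    {"white": "Engine A", "black": "Engine B", "moves": "1. e4 e5 2. Nf3 {book} Nc6 1-0"},
    {"white": "engine a", "black": "engine b", "moves": "1.e4 e5 2.Nf3 Nc6 1-0"},
    {"white": "Engine A", "black": "Engine B", "moves": "1. d4 d5 1/2-1/2"},
]
print(find_duplicates(games))
```

An exact-match key like this misses games whose player names are spelled differently or whose move lists are truncated, which is probably where several passes with different options come in.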