The first release of the CGR games database

bstjean
Posts: 19
Joined: Sat Oct 08, 2016 10:10 pm
Location: Montreal
Full name: Benoît St-Jean

The first release of the CGR games database

Post by bstjean »

As I was collecting games to build an opening book for my chess engine (in development), I thought, "Why not share my games with everyone?" And then it became a project of its own!

Well, here's the very first release of the CGR database!

For those interested, the details are here:

https://chessgamesrepository.wordpress. ... 0219-full/
retep1
Posts: 44
Joined: Sun Aug 07, 2016 5:24 pm

Re: The first release of the CGR games database

Post by retep1 »

Thanks for your great work. Sadly, the archive is corrupt.
Ozymandias
Posts: 1532
Joined: Sun Oct 25, 2009 2:30 am

Re: The first release of the CGR games database

Post by Ozymandias »

bstjean wrote: It’s been known for a while, games from the CCRL (Computer Chess Rating Lists) use a “custom” round number and many chess database programs don’t seem to like it. Besides, importing CCRL games will often cause the famous “Round Name limit of 262143 exceeded” error in Scid or Scid vs PC. So I have decided to replace the round number in the CCRL games by the default value of “?”.
So, that's why the CCRL404 was messing up my DB! I had to filter it through several programs, without actually knowing what was wrong with it. Good to know.
bstjean wrote: we will hit the 16 million games limit in Scid… For those who use other chess database software, are there similar limits? Do you want the database in multiple Zip files or just one Zip file?
16? More like 12. I haven't read about CB or CA having a hardcoded limit, but I'm sure any of them will crash with enough games. Finally, using the Zip format isn't the best way to go.
bstjean wrote: Finally, as a side note, the next release will probably be another FULL one as I have another 83G of PGN games ready! I have kept the 206G that made it into this first release
Are you saying that the DB is already 206G big, but below 16 million games?
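On the round-name issue quoted above, a minimal Python sketch of the two steps involved: counting how many distinct Round values a PGN file contains, and replacing every Round value with "?". The function names are hypothetical, the limit is taken only from the quoted error message, and the sketch blanks the Round tag in every game it is given rather than only in CCRL games.

```python
import re
from collections import Counter

# Matches a PGN Round tag line such as: [Round "529.2.403"]
ROUND_TAG = re.compile(r'^\[Round\s+"(.*)"\]\s*$')

# Value taken from the error message quoted in the thread (assumption:
# it applies to the number of distinct round names).
SCID_ROUND_NAME_LIMIT = 262143


def distinct_rounds(lines):
    """Count how often each Round tag value occurs in an iterable of PGN lines."""
    counts = Counter()
    for line in lines:
        match = ROUND_TAG.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def blank_rounds(lines):
    """Yield the PGN lines with every Round value replaced by "?"."""
    for line in lines:
        if ROUND_TAG.match(line):
            yield '[Round "?"]\n'
        else:
            yield line
```

Comparing `len(distinct_rounds(open_file))` with `SCID_ROUND_NAME_LIMIT` before importing gives a rough idea of whether a file is likely to trigger the error.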
Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Re: The first release of the CGR games database

Post by Fulvio »

bstjean wrote: It’s been known for a while, games from the CCRL (Computer Chess Rating Lists) use a “custom” round number and many chess database programs don’t seem to like it. Besides, importing CCRL games will often cause the famous “Round Name limit of 262143 exceeded” error in Scid or Scid vs PC. So I have decided to replace the round number in the CCRL games by the default value of “?”. Does any one have a problem with this? Do you have any idea/suggestion/comment on this?
CCRL tags are like this:
[Event "CCRL 40/40"]
[Round "529.2.403"]
and I believe the best thing would be to change them to:
[Event "CCRL 40/40 - 529"]
[Round "2.403"]
if 529 is the tournament number.
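The retagging suggested above could be sketched in Python roughly as follows. This assumes, as the post does, that the first dotted component of the Round value is the tournament number; `retag_game` is a hypothetical helper that works on the header lines of a single game and leaves non-CCRL games untouched.

```python
import re

EVENT_TAG = re.compile(r'^\[Event\s+"(.*)"\]\s*$')
# Round values of the form "<number>.<rest>", e.g. "529.2.403"
ROUND_TAG = re.compile(r'^\[Round\s+"(\d+)\.(.+)"\]\s*$')


def retag_game(header_lines):
    """Move the first Round component into the Event tag of one game's headers.

    Assumption: the first component of a CCRL Round value is the
    tournament number. Games without a matching Event/Round pair, or
    whose Event does not start with "CCRL", are returned unchanged.
    """
    event = next((m for m in map(EVENT_TAG.match, header_lines) if m), None)
    rnd = next((m for m in map(ROUND_TAG.match, header_lines) if m), None)
    if not event or not rnd or not event.group(1).startswith("CCRL"):
        return list(header_lines)

    prefix, rest = rnd.group(1), rnd.group(2)
    out = []
    for line in header_lines:
        if EVENT_TAG.match(line):
            out.append(f'[Event "{event.group(1)} - {prefix}"]')
        elif ROUND_TAG.match(line):
            out.append(f'[Round "{rest}"]')
        else:
            out.append(line)
    return out
```

Applied to the two tags shown in the post, this yields `[Event "CCRL 40/40 - 529"]` and `[Round "2.403"]`.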
bstjean
Posts: 19
Joined: Sat Oct 08, 2016 10:10 pm
Location: Montreal
Full name: Benoît St-Jean

Re: The first release of the CGR games database

Post by bstjean »

retep1 wrote:thx for your great work. sadly the archiv ist corrupt.
Looks like the problem is on your end.

I downloaded the zip file myself, again, to test it and it works just fine. Besides, it's been downloaded 50+ times so far and I haven't received any comment or email from anyone saying the archive was corrupted.

I'm using the 7-Zip software (on Windows) and it unzips fine, if that helps.
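For anyone who wants to tell a damaged download apart from an archive their tool simply cannot read, a small Python sketch using the standard `zipfile` module is shown here. `check_archive` is a hypothetical helper, and the sketch assumes the archive is a regular zip file; Python's reader supports only some compression methods, so "unsupported" does not mean the file is corrupt.

```python
import zipfile


def check_archive(path):
    """Return a short status string for the zip archive at `path`."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and checks its CRC;
            # it returns the first bad member name, or None.
            bad = zf.testzip()
    except zipfile.BadZipFile:
        return "not a valid zip file"
    except NotImplementedError:
        # Raised when a member uses a compression method this reader lacks.
        return "unsupported compression method"
    return "ok" if bad is None else f"corrupt member: {bad}"
```

A result of "unsupported compression method" would point to the unpacking tool rather than to the download itself.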
bstjean
Posts: 19
Joined: Sat Oct 08, 2016 10:10 pm
Location: Montreal
Full name: Benoît St-Jean

Re: The first release of the CGR games database

Post by bstjean »

Fulvio wrote:
bstjean wrote: It’s been known for a while, games from the CCRL (Computer Chess Rating Lists) use a “custom” round number and many chess database programs don’t seem to like it. Besides, importing CCRL games will often cause the famous “Round Name limit of 262143 exceeded” error in Scid or Scid vs PC. So I have decided to replace the round number in the CCRL games by the default value of “?”. Does any one have a problem with this? Do you have any idea/suggestion/comment on this?
CCRL tags are like this:
[Event "CCRL 40/40"]
[Round "529.2.403"]
and I believe the best thing would be to change them to:
[Event "CCRL 40/40 - 529"]
[Round "2.403"]
if 529 is the tournament number.
I'm just processing PGN files the way they were produced! Right now, doing that kind of "magic" is not an option but that's definitely doable in a not-so-distant future!
bstjean
Posts: 19
Joined: Sat Oct 08, 2016 10:10 pm
Location: Montreal
Full name: Benoît St-Jean

Re: The first release of the CGR games database

Post by bstjean »

Ozymandias wrote: Are you saying that the DB is already 206G big, but below 16 Million games?
No! I'm saying I have downloaded 206G of PGN games and this database was built from that. Obviously, it looks like everyone has MANY games in common!!
styx
Posts: 338
Joined: Tue Mar 13, 2012 9:59 pm
Location: Germany

Re: The first release of the CGR games database

Post by styx »

A very weird ZIP format, but I managed to unzip it (using 7-Zip). It just took an unnecessarily long time.
Have you considered providing the database in SCID format? It occupies 100 MB less space than the zipped PGN file.

As for the database: nice stuff. Thank you.

Just for your information: there are at least 431,000 duplicates in this database.
bstjean
Posts: 19
Joined: Sat Oct 08, 2016 10:10 pm
Location: Montreal
Full name: Benoît St-Jean

Re: The first release of the CGR games database

Post by bstjean »

styx wrote: A very weird ZIP format, but I managed to unzip it (using 7-Zip). It just took an unnecessarily long time.
Have you considered providing the database in SCID format? It occupies 100 MB less space than the zipped PGN file.

As for the database: nice stuff. Thank you.

Just for your information: there are at least 431,000 duplicates in this database.
1) I wasn't sure the sudden burst of downloads wouldn't cause problems, so I zipped the file with the maximum compression I could find (see my post on the blog regarding this).
2) For now, I will stick to the PGN format as not everyone uses Scid. The goal is to provide a quality database to *everyone*, not only Scid users! And the portability of the PGN format is currently the best solution!
3) Thanks for the info. But which options did you use to detect those? I made 5 passes of "twin checks" and, obviously, I missed some! I'm interested to know how you detected those duplicates!
4) Next release, I will probably provide 5 zip files, one per ECO classification.

Thanks!
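On the duplicate question in point 3, one simple approach is to compare games by their bare move sequence. The Python sketch below illustrates that idea only; it is not how Scid's twin check or styx's count works. It assumes movetext without variations or NAGs, and `movetext_key` and `find_duplicates` are hypothetical names. Two genuinely different games with identical moves (short draws, for instance) would also be grouped together, so any hit still needs a look at the players and date.

```python
import hashlib
import re

COMMENT = re.compile(r'\{[^}]*\}')          # {brace comments}
MOVE_NUMBER = re.compile(r'\d+\.+')         # "12." and "12..."
RESULT = re.compile(r'(1-0|0-1|1/2-1/2|\*)\s*$')


def movetext_key(movetext):
    """Hash the bare move sequence, ignoring comments, move numbers and result."""
    text = COMMENT.sub(' ', movetext)
    text = RESULT.sub(' ', text)
    text = MOVE_NUMBER.sub(' ', text)
    moves = ' '.join(text.split())          # collapse all whitespace
    return hashlib.sha1(moves.encode('utf-8')).hexdigest()


def find_duplicates(games):
    """Group game ids that share the same move sequence.

    `games` is an iterable of (game_id, movetext) pairs; the return
    value is a list of id lists, one per group with more than one game.
    """
    groups = {}
    for game_id, movetext in games:
        groups.setdefault(movetext_key(movetext), []).append(game_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Hashing the normalized moves keeps memory use at one digest per game, which matters for a collection of several million games.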