You claim the (wh) and (bl) tags are useless and idiotic on the basis of redundancy. If it were just to indicate which side he is playing I would agree with you. However...
What these tags do is allow a player to be analyzed as 2 different entities. We can examine the player from each color. We can compute his Elos, examine his repertoire (ECO), see style differences like aggressiveness, and so on.
You would first have to standardize each player's name and make sure each player is not playing under multiple spellings of his name. This is not practical in a super-mega-database.
The first release of the CGR games database
-
- Posts: 1056
- Joined: Thu Mar 09, 2006 4:15 pm
- Location: Long Island, NY, USA
-
- Posts: 19
- Joined: Sat Oct 08, 2016 10:10 pm
- Location: Montreal
- Full name: Benoît St-Jean
Re: The first release of the CGR games database
Norm Pollock wrote: You claim the (wh) and (bl) tags are useless and idiotic on the basis of redundancy. If it were just to indicate which side he is playing I would agree with you. However...
What these tags do is allow a player to be analyzed as 2 different entities. We can examine the player from each color. We can compute his Elos, examine his repertoire (ECO), see style differences like aggressiveness, and so on.
You would first have to standardize each player's name and make sure each player is not playing under multiple spellings of his name. This is not practical in a super-mega-database.

That kind of filter can be applied in 2 seconds in any decent chess database! Specify the player and the color, and you're done. Repeat for Black.
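For readers without a chess database at hand, the player-plus-color filter discussed here can be sketched in a few lines of Python. This is a minimal illustration, not anything posted in the thread: the function names `read_headers`, `split_by_color` and `eco_counts` are made up for the example, it relies only on the standard `White`, `Black` and `ECO` tags, and it assumes the player's name is spelled identically in every game (the very standardization problem raised above).

```python
import re
from collections import Counter

# One PGN tag pair per line, e.g. [White "Player A"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def read_headers(pgn_text):
    """Yield one dict of tag pairs per game found in pgn_text."""
    headers = {}
    in_tags = False
    for line in pgn_text.splitlines():
        match = TAG_RE.match(line.strip())
        if match:
            # A tag line after movetext or a blank line starts a new game.
            if not in_tags and headers:
                yield headers
                headers = {}
            in_tags = True
            headers[match.group(1)] = match.group(2)
        else:
            in_tags = False
    if headers:
        yield headers


def split_by_color(pgn_text, player):
    """Return (games as White, games as Black) for an exact player name."""
    as_white, as_black = [], []
    for game in read_headers(pgn_text):
        if game.get("White") == player:
            as_white.append(game)
        elif game.get("Black") == player:
            as_black.append(game)
    return as_white, as_black


def eco_counts(games):
    """Count ECO codes over a list of header dicts ('?' when missing)."""
    return Counter(game.get("ECO", "?") for game in games)
```

Calling `split_by_color(text, "Player A")` and then `eco_counts` on each half gives a per-color repertoire without any (wh)/(bl) suffix on the name, provided the name matches exactly.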
-
- Posts: 389
- Joined: Wed Sep 26, 2012 1:29 pm
- Location: Hungary
Re: The first release of the CGR games database
styx wrote: A very weird ZIP format but I managed to unzip it (using 7zip). It just took an unnecessarily high amount of time.
Have you considered providing the database in SCID format? It occupies 100 MB less space than the zipped PGN file.
As for the database: nice stuff. Thank you.
Just for your information: There are at least 431000 doublets in this database.
bstjean wrote: 1) I wasn't sure the sudden burst of downloads wouldn't cause problems so I zipped the file with the maximum compression I could find (see my post on the blog regarding this)
2) For now, I will stick to the PGN format as not everyone uses Scid. The goal is to provide a quality database to *everyone*, not only Scid users! And the portability of the PGN format is currently the best solution!
3) Thanks for the info. But which options did you use to detect those? I made 5 passes of "twin checks" and, obviously, I missed some! I'm interested to know how you detected those duplicates!
4) Next release, I will probably provide 5 zip files, one per ECO classification.
Thanks!

I've tried to unzip it on Ubuntu 16.04 without success. I finally uncompressed it on Windows, but it took a huge amount of time. I suggest using the default settings for .zip compression in the next version!
May I ask why you strip annotations/comments? They are one of the most useful parts of some .pgn files, IMO.
Another idea would be to split the database into ICS/computer games/OTB games parts instead of by ECO.
Thx for your hard work!
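On the doublets question raised in the quoted exchange: the thread does not say how the 431000 duplicates were found, so the following is only one possible approach, sketched under assumptions. It reduces each game's movetext to a bare move list (dropping comments, variations, NAGs, move numbers, annotation suffixes and the result) and groups games whose move lists hash to the same value. The names `split_games`, `move_key` and `find_doublets` are hypothetical. Games that differ by a truncated ending or a transposed move would not be caught by this.

```python
import hashlib
import re
from collections import defaultdict

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def split_games(pgn_text):
    """Split PGN text into (tag_section, movetext) pairs."""
    games, tags, moves = [], [], []
    for line in pgn_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if moves:  # a tag line after movetext starts a new game
                games.append(("\n".join(tags), " ".join(moves)))
                tags, moves = [], []
            tags.append(stripped)
        elif stripped:
            moves.append(stripped)
    if tags or moves:
        games.append(("\n".join(tags), " ".join(moves)))
    return games


def move_key(movetext):
    """Hash of the bare main-line moves, ignoring formatting and comments."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)  # brace comments
    previous = None
    while previous != text:  # peel nested variations from the inside out
        previous = text
        text = re.sub(r"\([^()]*\)", " ", text)
    text = re.sub(r"\$\d+", " ", text)  # numeric annotation glyphs
    text = re.sub(r"\d+\.+", " ", text)  # move numbers such as "12." or "12..."
    tokens = [t.rstrip("!?+#") for t in text.split() if t not in RESULTS]
    moves = " ".join(t for t in tokens if t)
    return hashlib.sha1(moves.encode("utf-8")).hexdigest()


def find_doublets(pgn_text):
    """Return lists of game indices that share an identical move list."""
    groups = defaultdict(list)
    for index, (_tags, movetext) in enumerate(split_games(pgn_text)):
        groups[move_key(movetext)].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]
```

Keying on moves alone ignores player names on purpose, which sidesteps spelling variants but will also pair up short, genuinely distinct games that happen to follow the same line; a stricter key could add the date or event tags.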
-
- Posts: 19
- Joined: Sat Oct 08, 2016 10:10 pm
- Location: Montreal
- Full name: Benoît St-Jean
Re: The first release of the CGR games database
gbtami wrote: I've tried to unzip it on Ubuntu 16.04 without success. I finally uncompressed it on Windows, but it took a huge amount of time. I suggest using the default settings for .zip compression in the next version!
May I ask why you strip annotations/comments? They are one of the most useful parts of some .pgn files, IMO.
Another idea would be to split the database into ICS/computer games/OTB games parts instead of by ECO.
Thx for your hard work!

1) So far, the time it takes to decompress the file and the compression method/format are THE major complaint! It's well noted, guys!
2) Because they take an awful lot of space. You could easily double the size of the files.
3) I was thinking of providing the WhiteIsComp/BlackIsComp tag (when it is not already there) in a future release so you could filter out games played by computers. As I said, I don't want to impose my choices on people. I'd rather give them everything while making sure they have all they need to tailor their copy of the database to their own needs!
-
- Posts: 389
- Joined: Wed Sep 26, 2012 1:29 pm
- Location: Hungary
Re: The first release of the CGR games database
bstjean wrote: 2) Because they take an awful lot of space. You could easily double the size of the files.
3) I was thinking of providing the WhiteIsComp/BlackIsComp tag (when it is not already there) in a future release so you could filter out games played by computers. As I said, I don't want to impose my choices on people. I'd rather give them everything while making sure they have all they need to tailor their copy of the database to their own needs!

2) I disagree here. Very few games in .pgn files have annotations. The extra space is negligible compared to the full .pgn size.
3) Good idea!
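Whether comments double a file or barely register is something anyone can measure on their own PGN files. The sketch below is a rough, assumption-laden way to do that: it strips brace comments and semicolon line comments (leaving tag lines alone) and reports what fraction of the characters disappeared. The names `strip_comments` and `comment_share` are invented for this example, and variations and NAGs are deliberately left in place.

```python
import re

# {...} comments may span several lines; PGN brace comments do not nest.
BRACE_COMMENT_RE = re.compile(r"\{[^}]*\}")
# A ';' comment runs to the end of the line; tag lines are skipped so that
# a semicolon inside a tag value is not mistaken for a comment.
LINE_COMMENT_RE = re.compile(r"(?m)^(?![ \t]*\[)([^;\n]*);[^\n]*")


def strip_comments(pgn_text):
    """Return pgn_text with brace and semicolon comments removed."""
    text = BRACE_COMMENT_RE.sub("", pgn_text)
    return LINE_COMMENT_RE.sub(r"\1", text)


def comment_share(pgn_text):
    """Fraction of characters in pgn_text that belonged to comments."""
    if not pgn_text:
        return 0.0
    return 1.0 - len(strip_comments(pgn_text)) / len(pgn_text)
```

Running `comment_share` over a sample of the source files would put a number on the disagreement above, at least for character count before compression.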
-
- Posts: 338
- Joined: Tue Mar 13, 2012 9:59 pm
- Location: Germany
Re: The first release of the CGR games database
gbtami wrote: I've tried to unzip it on Ubuntu 16.04 without success. I finally uncompressed it on Windows, but it took a huge amount of time. I suggest using the default settings for .zip compression in the next version!

Ubuntu: make sure you have the p7zip-full package installed, and then
Code:
7z e filename.zip
works
-
- Posts: 19
- Joined: Sat Oct 08, 2016 10:10 pm
- Location: Montreal
- Full name: Benoît St-Jean
Re: The first release of the CGR games database
bstjean wrote: 2) Because they take an awful lot of space. You could easily double the size of the files.
3) I was thinking of providing the WhiteIsComp/BlackIsComp tag (when it is not already there) in a future release so you could filter out games played by computers. As I said, I don't want to impose my choices on people. I'd rather give them everything while making sure they have all they need to tailor their copy of the database to their own needs!
gbtami wrote: 2) I disagree here. Very few games in .pgn files have annotations. The extra space is negligible compared to the full .pgn size.
3) Good idea!

Anyway, keeping the comments/annotations just creates another problem. Often, you'll find the same "historical" games annotated multiple times by multiple people. Do I keep Karpov's comments? Or Kortchnoi's annotations? Or Seirawan's comments? I would always find people who'd prefer one analysis over the other... Besides, most annotated games are also annotated on a lot of online sites if one absolutely needs comments. And nowadays, a LOT of people prefer having a chess engine running (and alerting them to blunders and better lines) as they go through the game...
-
- Posts: 389
- Joined: Wed Sep 26, 2012 1:29 pm
- Location: Hungary
Re: The first release of the CGR games database
gbtami wrote: I've tried to unzip it on Ubuntu 16.04 without success. I finally uncompressed it on Windows, but it took a huge amount of time. I suggest using the default settings for .zip compression in the next version!
styx wrote: Ubuntu: make sure you have the p7zip-full package installed, and then
Code:
7z e filename.zip
works

Good idea, thx!
-
- Posts: 389
- Joined: Wed Sep 26, 2012 1:29 pm
- Location: Hungary
Re: The first release of the CGR games database
bstjean wrote: 2) Because they take an awful lot of space. You could easily double the size of the files.
3) I was thinking of providing the WhiteIsComp/BlackIsComp tag (when it is not already there) in a future release so you could filter out games played by computers. As I said, I don't want to impose my choices on people. I'd rather give them everything while making sure they have all they need to tailor their copy of the database to their own needs!
gbtami wrote: 2) I disagree here. Very few games in .pgn files have annotations. The extra space is negligible compared to the full .pgn size.
3) Good idea!
bstjean wrote: Anyway, keeping the comments/annotations just creates another problem. Often, you'll find the same "historical" games annotated multiple times by multiple people. Do I keep Karpov's comments? Or Kortchnoi's annotations? Or Seirawan's comments? I would always find people who'd prefer one analysis over the other... Besides, most annotated games are also annotated on a lot of online sites if one absolutely needs comments. And nowadays, a LOT of people prefer having a chess engine running (and alerting them to blunders and better lines) as they go through the game...

I don't see "another problem" here. If one game occurs with annotations by Karpov and by Kortchnoi too, I want to read both! This is not a problem but a huge value! The ChessBase Mega Database has this feature. "The exclusive annotated database. Contains more than 6.8 millions games from 1560 to 2016 in the highest ChessBase quality standard. 70,000 games contain commentary from top players"
Why do you think it's a problem?
Analysis engines in a GUI will never give you insights like the ones Karpov and other human giants give in their annotations!
-
- Posts: 19
- Joined: Sat Oct 08, 2016 10:10 pm
- Location: Montreal
- Full name: Benoît St-Jean
Re: The first release of the CGR games database
bstjean wrote: 2) Because they take an awful lot of space. You could easily double the size of the files.
3) I was thinking of providing the WhiteIsComp/BlackIsComp tag (when it is not already there) in a future release so you could filter out games played by computers. As I said, I don't want to impose my choices on people. I'd rather give them everything while making sure they have all they need to tailor their copy of the database to their own needs!
gbtami wrote: 2) I disagree here. Very few games in .pgn files have annotations. The extra space is negligible compared to the full .pgn size.
3) Good idea!
bstjean wrote: Anyway, keeping the comments/annotations just creates another problem. Often, you'll find the same "historical" games annotated multiple times by multiple people. Do I keep Karpov's comments? Or Kortchnoi's annotations? Or Seirawan's comments? I would always find people who'd prefer one analysis over the other... Besides, most annotated games are also annotated on a lot of online sites if one absolutely needs comments. And nowadays, a LOT of people prefer having a chess engine running (and alerting them to blunders and better lines) as they go through the game...
gbtami wrote: I don't see "another problem" here. If one game occurs with annotations by Karpov and by Kortchnoi too, I want to read both! This is not a problem but a huge value! The ChessBase Mega Database has this feature. "The exclusive annotated database. Contains more than 6.8 millions games from 1560 to 2016 in the highest ChessBase quality standard. 70,000 games contain commentary from top players"
Why do you think it's a problem?
Analysis engines in a GUI will never give you insights like the ones Karpov and other human giants give in their annotations!

Let's put it this way: I started with 206G of games and ended up with only 7G. There are **TONS** of games annotated by chess engines out there. We would end up with **TONS** of crap most people don't want. You'd be amazed to see how many times I've seen the Karpov-Korchnoi 1974 match, second game (http://www.chessgames.com/perl/chessgame?gid=1067858) annotated by chess engines or even some John Doe! Besides, since annotations mostly originate from stuff that has been published and reproduced, there is a copyright problem with this!