The first release of the CGR games database


Re: The first release of the CGR games database

Post by MikeB

jdart wrote: A few comments: I am sure this took a lot of effort, and contributing it to the public is a fine idea. But there are some issues:

1. I generally don't use computer match games played without a book or with a limited-depth book. CCRL, for example, uses a limited book: the lines change from time to time, but the same lines are repeated many times. They are not necessarily the best opening moves. Also, many engines will play a suboptimal move in the first few moves out of book.

2. Quite a few games, especially correspondence games and those on Playchess, are lost by forfeit, so they have a result but that result is not a good guide to how well the players were doing when the game was terminated. For example, White might have been winning but overstepped the time limit and lost. I have a filter program that will weed these games out, since my book building program considers game results.

3. I mostly don't use blitz games due to the higher rate of errors in blitz play. I don't even use computer blitz games, although there is less reason to avoid those now, since with today's search speeds engines reach quite high depths even in blitz. Still, I started weeding them out when engines were much slower.

4. While there are a lot of free game collections by human players available on the Internet, the quality of these is often low. The games themselves sometimes have errors, but the metadata (Event/Site/Date/Round) very frequently is wrong/incomplete or has things like round in the Site tag. I think some of these were pirated from Chessbase in the early days and auto-converted out of their format by buggy software. TWIC though is a very good source and doesn't have these issues.

--Jon
+1. Jon is right on the mark with all of his comments. You really do not want to use CCRL games for building a quality book. There's a Python script on GitHub (Ted Wong) that, with a little tweaking, will auto-download all of the TWIC back issues back to around issue 1120. For a modest donation to TWIC, Mark Crowther will provide issues 1 through 1120. That is very worthwhile for book building, and it helps keep TWIC going.

http://theweekinchess.com

https://github.com/student-t/BuildPGN

https://github.com/robwheeler/WYCC2015
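
For anyone who would rather see the idea than dig through the scripts, the download loop can be sketched roughly as below. This is not taken from either repository. The zip URL pattern is an assumption about how TWIC names its archives, and the names `issue_urls` and `download_issues` are made up for illustration, so check the pattern against the site before relying on it. The site may also refuse requests that lack a browser-like User-Agent; that is not handled here.

```python
import time
import urllib.request
from pathlib import Path

# ASSUMPTION: each TWIC issue is published as a zipped PGN at this pattern.
# Verify it on theweekinchess.com before use.
URL_PATTERN = "https://theweekinchess.com/zips/twic{issue}g.zip"


def issue_urls(first, last, pattern=URL_PATTERN):
    """Return the download URL for every issue number in [first, last]."""
    if first > last:
        raise ValueError("first issue must not exceed last issue")
    return [pattern.format(issue=n) for n in range(first, last + 1)]


def download_issues(first, last, dest="twic", delay=1.0):
    """Fetch each issue's zip into dest, skipping files already present."""
    out_dir = Path(dest)
    out_dir.mkdir(exist_ok=True)
    for url in issue_urls(first, last):
        target = out_dir / url.rsplit("/", 1)[-1]
        if target.exists():
            continue  # already downloaded on an earlier run
        urllib.request.urlretrieve(url, str(target))
        time.sleep(delay)  # be polite to the server
```

Calling `download_issues(1120, 1125)` would try to fetch six issues into a local `twic` directory; the upper issue number is whatever the latest issue happens to be when you run it.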

Edit: I might have gotten Ted's script mixed up with somebody else's, so I added the third link. They can all come in handy.
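
On Jon's point 2, a forfeit filter can be sketched in a few lines of Python. This is not Jon's program, just a minimal illustration. It assumes the games carry the optional PGN `Termination` tag, which many collections leave out, so a game without the tag is kept. The names `split_games`, `keep_game` and `filter_pgn` are made up for this example, and the splitting is naive: it treats any tag-pair line that follows movetext as the start of the next game.

```python
import re

# Matches a PGN tag pair such as [Termination "time forfeit"].
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')

# Termination values from the PGN standard whose result says little
# about the position on the board when the game stopped.
BAD_TERMINATIONS = {
    "time forfeit", "abandoned", "rules infraction", "emergency", "death",
}


def split_games(pgn_text):
    """Yield (tags, raw_text) for each game in a PGN string."""
    tags, lines, in_moves = {}, [], False
    for line in pgn_text.splitlines():
        match = TAG_RE.match(line.strip())
        if match and in_moves:
            # A tag line after movetext begins the next game.
            yield tags, "\n".join(lines).strip()
            tags, lines, in_moves = {}, [], False
        if match:
            tags[match.group(1)] = match.group(2)
        elif line.strip():
            in_moves = True
        lines.append(line)
    if tags or in_moves:
        yield tags, "\n".join(lines).strip()


def keep_game(tags):
    """Keep a game unless its Termination tag marks a forfeit-style ending."""
    return tags.get("Termination", "normal").lower() not in BAD_TERMINATIONS


def filter_pgn(pgn_text):
    """Return the raw text of every game that passes keep_game."""
    return [raw for tags, raw in split_games(pgn_text) if keep_game(tags)]
```

The same `keep_game` hook is where a blitz check on the `TimeControl` tag could go, if the source fills that tag in.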