+1 Jon is right on the mark with all of his comments. You really do not want to use CCRL games for building a quality book. There's a Python script on GitHub (Ted Wong) that, with a little tweaking, will auto-download all of the TWIC back issues up to around issue 1120. For a modest fee, Mark Crowther will let you donate to TWIC and will provide issues 1 through 1120. That is very worthwhile for book building for a very modest investment, and it helps keep TWIC going.

jdart wrote:
A few comments: I am sure this took a lot of effort, and contributing it to the public is a fine idea. But there are some issues:
1. I generally don't use computer match games played without book or with a limited-depth book. CCRL for example uses a limited book - the lines change from time to time but the same lines are repeated many times. They are not necessarily the best opening moves. Also many engines will play a suboptimal move in the first few moves out of book.
2. Quite a few games, especially correspondence games and those on Playchess, are lost by forfeit, so they have a result but that result is not a good guide to how well the players were doing when the game was terminated. For example, White might have been winning but overstepped the time limit and lost. I have a filter program that will weed these games out, since my book building program considers game results.
3. I mostly don't use blitz games due to the higher rate of errors in blitz play. I don't even use computer blitz games, although there is less reason to avoid those now, since with today's search speeds engines reach quite high depths even in blitz. Still, I started weeding them out back when engines were much slower.
4. While there are a lot of free game collections by human players available on the Internet, the quality of these is often low. The games themselves sometimes have errors, and the metadata (Event/Site/Date/Round) is very frequently wrong or incomplete, or has things like the round in the Site tag. I think some of these were pirated from ChessBase in the early days and auto-converted out of its format by buggy software. TWIC, though, is a very good source and doesn't have these issues.
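A minimal sketch of the kind of filter described in points 2 and 3, assuming the games are plain PGN text and that the source fills in the standard `Termination` and `TimeControl` tags (many collections don't). This is not jdart's actual filter program. The function names are hypothetical, and the 600-second blitz cutoff and the "base + 40 × increment" length estimate are my own assumptions. It uses only the Python standard library.

```python
import re

# Matches a PGN tag pair line such as: [Termination "time forfeit"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')

# Termination values from the PGN standard whose result does not reflect the
# position on the board. Many collections omit the tag, so this is best-effort.
FORFEIT_TERMINATIONS = {"time forfeit", "abandoned", "rules infraction"}

# Assumed cutoff: games estimated shorter than this (seconds) count as blitz.
BLITZ_MAX_SECONDS = 600


def split_games(pgn_text):
    """Split PGN text into a list of (tags, movetext) pairs."""
    games, tags, moves = [], {}, []
    for line in pgn_text.splitlines():
        match = TAG_RE.match(line.strip())
        if match:
            if moves:  # a tag line after movetext starts the next game
                games.append((tags, "\n".join(moves)))
                tags, moves = {}, []
            tags[match.group(1)] = match.group(2)
        elif line.strip():
            moves.append(line.strip())
    if tags or moves:
        games.append((tags, "\n".join(moves)))
    return games


def estimated_seconds(time_control):
    """Rough game length from a TimeControl tag, or None if unknown.

    Handles "base", "base+inc", "moves/seconds" and "*seconds" forms, using
    only the first period and estimating length as base + 40 * increment.
    """
    if not time_control or time_control in ("?", "-"):
        return None
    first = time_control.split(":")[0]      # first period only
    if "/" in first:
        first = first.split("/", 1)[1]      # "40/7200" -> "7200"
    base, _, inc = first.lstrip("*").partition("+")
    try:
        return int(base) + 40 * int(inc or 0)
    except ValueError:
        return None


def keep_game(tags):
    """True if the game is not a forfeit and not (estimated) blitz."""
    if tags.get("Termination", "").lower() in FORFEIT_TERMINATIONS:
        return False
    seconds = estimated_seconds(tags.get("TimeControl"))
    return seconds is None or seconds >= BLITZ_MAX_SECONDS


SAMPLE = """[White "A"]
[Result "1-0"]
[Termination "time forfeit"]
[TimeControl "5400+30"]

1. e4 e5 1-0

[White "B"]
[Result "1/2-1/2"]
[TimeControl "180+2"]

1. d4 d5 1/2-1/2

[White "C"]
[Result "0-1"]
[TimeControl "40/7200:3600"]

1. c4 e5 0-1
"""

if __name__ == "__main__":
    kept = [tags for tags, _ in split_games(SAMPLE) if keep_game(tags)]
    print([tags["White"] for tags in kept])  # ['C']
```

Games with a missing or unknown `TimeControl` are kept rather than dropped, since older collections often leave it out. Flip that choice if you'd rather be strict.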
Edit: I might have gotten Ted's script mixed up with somebody else's, so I added the 3rd link. They all can come in handy.