jshriver wrote:Checked and right now my raw data streams from fics over the past 3 years is 61gigs. So up for options on what people would like to have grabbed from it.
Wow. Then may I please have:
all standard and blitz games,
min 10 moves, no dupes, no variant games
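For anyone who wants to apply the same wish list to a PGN dump themselves, here is a minimal Python sketch. It is not jshriver's parser: it assumes the Event tag contains the game type as a separate word (for example "FICS rated blitz game"), that move numbers appear as "12." in the movetext, and that two games are duplicates when players, date and moves all match. The names split_games, move_count and filter_games are made up for illustration.

```python
import re

TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')

def split_games(pgn_text):
    """Yield (headers, movetext) pairs from a PGN string."""
    headers, moves = {}, []
    for raw in pgn_text.splitlines():
        line = raw.strip()
        tag = TAG_RE.match(line)
        if tag:
            # A tag after movetext (or a second Event tag) starts a new game.
            if moves or (tag.group(1) == "Event" and headers):
                yield headers, " ".join(moves)
                headers, moves = {}, []
            headers[tag.group(1)] = tag.group(2)
        elif line:
            moves.append(line)
    if headers or moves:
        yield headers, " ".join(moves)

def move_count(movetext):
    """Return the highest move number found outside {...} comments."""
    body = re.sub(r"\{[^}]*\}", " ", movetext)
    return max((int(n) for n in re.findall(r"\b(\d+)\.", body)), default=0)

def filter_games(pgn_text, min_moves=10, kinds=("standard", "blitz")):
    """Keep wanted game types with enough moves, dropping exact duplicates."""
    seen = set()
    for headers, movetext in split_games(pgn_text):
        # Assumption: the game type is a whole word inside the Event tag.
        event_words = headers.get("Event", "").lower().split()
        if not any(kind in event_words for kind in kinds):
            continue
        if move_count(movetext) < min_moves:
            continue
        key = (headers.get("White"), headers.get("Black"),
               headers.get("Date"), " ".join(movetext.split()))
        if key in seen:
            continue
        seen.add(key)
        yield headers, movetext
```

On a 61 GB dump the set of seen games would get large, so storing a short digest of each key instead of the full movetext is probably the first thing to change.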
Sounds good. I can't sleep and am working on fics2008 right now. I have to wait 15 min between downloads, but should have everything done and uploaded sometime tomorrow. Once it's up I'll post the URLs on the board.
It's also payday, and with RAM being cheap it's about time I upgraded this little box from 1 GB to maybe 4 or 8. That should make crunching these datasets a lot quicker.
jshriver wrote:Checked and right now my raw data streams from fics over the past 3 years is 61gigs. So up for options on what people would like to have grabbed from it.
I'd like to see it run through pgn-extract, with very short games removed, and split into a few large chunks. I'd prefer a split by Elo, but a split by opening would be good too. pgn-extract claims to do both.
Hi Wes,
My step-by-step questions:
How would you proceed with such a huge amount of data?
chunk size: which is better? 250 MB? or 500 MB?
very short games: minimum number of moves? 4, 7 or 10?
type of games: standard, rapid and blitz together or separate?
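Whichever chunk size wins, the splitting itself is simple. Below is a rough Python sketch of size-based chunking that never cuts a game in half. It assumes the games are already available as separate strings; the names chunk_games and write_chunks and the "fics_part" file prefix are hypothetical.

```python
def chunk_games(games, max_bytes):
    """Group game strings into lists whose UTF-8 size stays under max_bytes.

    A single game larger than max_bytes still gets a chunk of its own.
    """
    chunk, size = [], 0
    for game in games:
        n = len(game.encode("utf-8"))
        if chunk and size + n > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(game)
        size += n
    if chunk:
        yield chunk

def write_chunks(games, max_bytes, prefix="fics_part"):
    """Write each chunk to prefix_001.pgn, prefix_002.pgn, ... and return the paths."""
    paths = []
    for i, chunk in enumerate(chunk_games(games, max_bytes), start=1):
        path = f"{prefix}_{i:03d}.pgn"
        with open(path, "w", encoding="utf-8") as out:
            out.writelines(chunk)
        paths.append(path)
    return paths
```

For 250 MB chunks the call would be something like write_chunks(games, 250 * 1024 * 1024). Because games is consumed one item at a time, it can be a generator, which matters on a box with 1 GB of RAM.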
Unfortunately my parser isn't very smart in this regard. It assumes that whatever move was made on the server is legal (no real chess rule checking, just dumping from one stream into another, formatted as PGN).
I believe that's where pgn-extract or Scid comes in; they can clean out the bad games.
I think they're invalid, not necessarily illegal. For example, "Qd1d3" is normally not a SAN-compliant move. It should be "Qd3", or "Qdd3" or "Q1d3" if more than one queen could move to d3. The full "Qd1d3" form is only correct in the rare case where neither the file nor the rank alone identifies which queen is moving.
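To make the disambiguation rule concrete, here is a small Python sketch of how a parser could turn a from-square/to-square pair into SAN for a piece move (not pawns or castling). It assumes the caller already knows which other pieces of the same type and colour could legally reach the target square; the function name san_piece_move and the rivals parameter are invented for this example.

```python
def san_piece_move(piece, from_sq, to_sq, rivals=(), capture=False):
    """Build SAN for a non-pawn move such as Q d1 -> d3.

    rivals: squares of other same-type, same-colour pieces that could
    also legally move to to_sq (assumed to be computed elsewhere).
    """
    file_, rank = from_sq[0], from_sq[1]
    if not rivals:
        origin = ""        # no ambiguity: "Qd3"
    elif all(r[0] != file_ for r in rivals):
        origin = file_     # file is enough: "Qdd3"
    elif all(r[1] != rank for r in rivals):
        origin = rank      # rank is enough: "Q1d3"
    else:
        origin = from_sq   # rare case, both needed: "Qd1d3"
    return f"{piece}{origin}{'x' if capture else ''}{to_sq}"
```

Called as san_piece_move("Q", "d1", "d3") it gives "Qd3"; with a second queen on f1 it gives "Qdd3", and with one on d5 instead it gives "Q1d3". The file is tried before the rank, which is the order the PGN standard prescribes.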