I've been running a game observation bot on FICS for just about 3 years. I cleaned up the parser and, after all this time, consolidated the raw data and ran it through the parser to get a nice big 3.5 GB PGN file.
If anyone wants it I've put it on my website.
It's pretty much unfiltered raw PGN: games at all Elo ratings, unrated games, even guest games. The only filtering I did was for variants, though I kept my raw streams for processing later.
http://olympuschess.com/fics-1.pgn.bz2
Hope releasing this into the wild will help other people besides myself.
It's definitely a fun dataset. If anyone finds any problems, please let me know. I wish I could have run this through Scid for error checking, but my machine isn't powerful enough to do it. It took my parser (written in Perl) almost 2 weeks to crunch the raw data.
-Josh
P.S. I just checked and this file has 4,182,419 games.
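For anyone who wants to confirm the game count without unpacking the whole file to disk, here is a minimal Python sketch that streams the bzip2 archive and counts games. It assumes every game carries exactly one `[Event "..."]` tag (the sample games in this thread do, though not as the first tag), and the function name `count_games` is made up for illustration.

```python
import bz2


def count_games(path, tag="[Event "):
    """Count games in a bzip2-compressed PGN file without unpacking it to disk.

    Assumption: each game has exactly one line starting with `tag`.
    """
    count = 0
    # latin-1 never fails to decode, which is handy for unfiltered server data.
    with bz2.open(path, "rt", encoding="latin-1") as handle:
        for line in handle:
            if line.startswith(tag):
                count += 1
    return count
```

Calling `count_games("fics-1.pgn.bz2")` on the downloaded file should land close to the figure quoted above if the one-tag-per-game assumption holds.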
FICS Data
-
- Posts: 1356
- Joined: Wed Mar 08, 2006 9:41 pm
- Location: Morgantown, WV, USA
Re: FICS Data
For those interested, I'm also rewriting the bot because of some account limitations on FICS. Right now Oannes (the account name) looks for running games, gets a history, observes, and pops the data off (no processing). The major drawback is that FICS only lets you observe up to 10 games at a time, so the bot is losing A LOT of the overall game traffic.
To get around this, I'm keeping a list of user accounts; right now I have over 9k usernames. The new bot will grab data from each user's history, maintain an internal table of previously seen games, and cycle accordingly. FICS history numbers run from 1-99 or 0-99, I forget which. But since there are only 10 games in a history listing, the numbers will never overlap.
In my tests so far, just letting it run a single pass through the names, it has accumulated about 72 MB of raw data, resulting in 36,489 games. Which goes to show how much my original bot has been missing all this time.
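The history-cycling idea can be sketched roughly as follows. This is not the actual bot, just a small Python illustration: `HistoryTracker` and its methods are hypothetical names, and it assumes each history listing arrives as an oldest-first list of history numbers. Because a listing holds at most 10 entries out of roughly 100 possible numbers, the last number seen is enough to tell which entries are new, even across the wraparound.

```python
from collections import deque


class HistoryTracker:
    """Round-robin over usernames, remembering the last history number seen."""

    def __init__(self, usernames):
        self.queue = deque(usernames)
        self.last_seen = {}  # username -> newest history number already fetched

    def next_user(self):
        """Return the next username and move it to the back of the queue."""
        user = self.queue[0]
        self.queue.rotate(-1)
        return user

    def new_games(self, user, history_numbers):
        """Return the history numbers not fetched yet (oldest first).

        `history_numbers` is the user's current listing, oldest first.
        If the last seen number has scrolled out of the listing, every
        entry is treated as new.
        """
        last = self.last_seen.get(user)
        if last in history_numbers:
            fresh = history_numbers[history_numbers.index(last) + 1:]
        else:
            fresh = list(history_numbers)
        if history_numbers:
            self.last_seen[user] = history_numbers[-1]
        return fresh
```

One known blind spot in this sketch: if a user plays a full cycle of games between two visits, the same number reappears and the new games look already seen. Storing the game's date or opponent next to the number would close that gap.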
Plan to keep releasing game dumps for other people to use for learning, opening books, or whatever.
Hope this helps.
-Josh
Re: FICS Data
I have been looking for something like this for a while. Does this include human-engine games as well?
Thanks.
-
- Posts: 1356
- Joined: Wed Mar 08, 2006 9:41 pm
- Location: Morgantown, WV, USA
Re: FICS Data
Sure it does. It's pretty much everything that's visible to users. If you have a specific account you're looking for, I can do a quick grep to see which games, and how many, are in this first release.
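For those who would rather check an account themselves, a rough Python equivalent of that grep is below. It counts how often a handle shows up in the `[White "..."]` and `[Black "..."]` tags. The name `count_player_games` is invented for this example, and the case-insensitive comparison is an assumption about how handles should be matched.

```python
import re

# Matches [White "name"] or [Black "name"], but not [WhiteElo "..."].
TAG_RE = re.compile(r'^\[(White|Black)\s+"([^"]*)"\]')


def count_player_games(lines, handle):
    """Return (games_as_white, games_as_black) for `handle`.

    `lines` is any iterable of PGN text lines, e.g. an open file.
    """
    wanted = handle.lower()
    as_white = as_black = 0
    for line in lines:
        match = TAG_RE.match(line)
        if match is None or match.group(2).lower() != wanted:
            continue
        if match.group(1) == "White":
            as_white += 1
        else:
            as_black += 1
    return as_white, as_black
```

It can be fed straight from the compressed file, for example `count_player_games(bz2.open("fics-1.pgn.bz2", "rt", encoding="latin-1"), "Oannes")` after `import bz2`.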
-Josh
P.S. I changed the file and the link above from my initial post because I wanted to give some kind of version to it.

Re: FICS Data
My download will complete in 40 minutes, but I am curious how many games there are and what percentage are human-engine?
Thanks again.
-
- Posts: 1356
- Joined: Wed Mar 08, 2006 9:41 pm
- Location: Morgantown, WV, USA
Re: FICS Data
Hope it works well for you. By the way this is bzip2 compressed. If others would like I could put a zip version up as well.
I also added MD5 and SHA-1 checksums:
http://olympuschess.com/fics-1.bz2.sh1
http://olympuschess.com/fics-1.bz2.md5
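To check a download against those sums without extra tools, something like the following Python sketch should do. It reads the file in chunks so a multi-gigabyte archive never has to fit in memory. The function name `file_digests` is hypothetical, and how the digest is laid out inside the `.md5` and `.sh1` files is not known here, so compare the hex strings by eye or adapt as needed.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for the file at `path`, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops as soon as read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```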
-Josh
Re: FICS Data
Did you try 7zip/LZMA with a large word/dictionary size? I just took 60,000 pseudo-random games and reduced them to 13% of their original size, compared to your 27%, so you might be able to cut the file size in half.
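A quick way to try this kind of comparison on a sample of games is sketched below in Python, using the standard-library `bz2` and `lzma` modules. Note the assumptions: Python's `lzma` writes the `.xz` container rather than `.7z`, and its presets will not match 7-Zip's dictionary settings exactly, so the numbers are only indicative. `compression_ratios` is a made-up helper name.

```python
import bz2
import lzma


def compression_ratios(data):
    """Return compressed size / original size for bzip2 and LZMA (xz)."""
    if not data:
        raise ValueError("need a non-empty sample")
    original = len(data)
    bz2_size = len(bz2.compress(data, compresslevel=9))
    # PRESET_EXTREME trades a lot of CPU time for slightly smaller output.
    xz_size = len(lzma.compress(data, preset=9 | lzma.PRESET_EXTREME))
    return {"bz2": bz2_size / original, "lzma": xz_size / original}
```

Running it on the first few hundred megabytes of the decompressed PGN would show whether the gap seen on the 60,000-game sample holds for this data.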
-
- Posts: 1356
- Joined: Wed Mar 08, 2006 9:41 pm
- Location: Morgantown, WV, USA
Re: FICS Data
Tried 7zr with maximum compression and it only sliced off a little more than bzip2 with -9.
Can put it up as well upon request.
-Josh
-
- Posts: 778
- Joined: Sat Jul 01, 2006 7:11 am
Re: FICS Data
After running it through pgn-extract, importing it into scid and eliminating duplicates and very short games, I ended up with 1601772 games.
Here is an example of a badly formatted game.
[White "Flesch"]
[Black "Vadasz"]
[WhiteElo "0"]
[BlackElo "0"]
[Date "2006.10.18"]
[Event "None"]
[Site "FICS"]
[Round "0"]
1 ...
1 Bxh7+
1 Nf5
1 Qh5
1 Rd7
1 Rg7+
1 Rxd4
1 d5
1 none Kxf6Kxh7Nxg3Qc5Qc6Qe7Rad8Rxa3Rxd2Rxe4cxd5gxf5gxh5
2 Ba3
2 Bxc6
2 Bxd2
2 Bxf5
2 Ne4+
2 Nxg3
2 Qa1#
2 Qf6
2 Qxd2
2 Re5
2 Rxd6
2 Rxe4
2 bxa3
2 h4 Bd5Bxc6+Bxd6Kf7Nb3+Ng5Qd8Qf3Qh4Rd2Rxd6Rxg3+
3 Bxc6
3 Kb1
3 Kg1
3 Ne7+
3 Qf6
3 Qxe2
3 Rg7+
3 Rh4+
3 hxg3 Bxc6+Ke8Kg6Kg8Ncd2+Nh3#Qxe7Rd1+Rxg3+gxf6
4 Kc2
4 Kf1
4 Kg1
4 Qxd1
4 Qxh7+
4 Rg4+
4 Rh8+ Kh7Kh8Kxh7Kxh8Nd4+Rxd3Rxe2e2+
5 Be3
5 Bxf6#
5 Kxd2
5 Qh6+
5 Rg4
5 Rh5+
5 Rxg7+
5 bxc5 Bxe3#Kg8Kh8Nxe2Qxb2Rg2+
6 Kf1
6 Kh1
6 Qh6#
6 Qxg7#
6 Rf8+
6 Rh8#
6 bxa3 Bg8Ng3Rxg3#e2+
7 Ke1
7 Rfxg8# Rg1+
1-0
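Games like this one can be flagged automatically before import. The Python sketch below is one possible check, not what pgn-extract or Scid actually do: it splits the movetext on whitespace and reports tokens that are not a plausible SAN move, a move number, or a result, plus move numbers that repeat or go backwards. The regular expression only checks the shape of a move, not its legality, and comments or variations in the movetext are not handled. All names here are invented for the example.

```python
import re

# Shape of a SAN move: castling, or optional piece, optional disambiguation,
# optional capture, target square, optional promotion, optional check/mate.
SAN_RE = re.compile(
    r"(?:O-O(?:-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?)[+#]?"
)
NUMBER_RE = re.compile(r"(\d+)\.*")
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def movetext_problems(movetext):
    """Return a list of human-readable problems found in `movetext`."""
    problems = []
    last_number = 0
    for token in movetext.split():
        if token in RESULTS:
            continue
        number = NUMBER_RE.fullmatch(token)
        if number:
            value = int(number.group(1))
            continuation = token.endswith("...")  # e.g. "12..." for Black
            if value < last_number or (value == last_number and not continuation):
                problems.append(f"move number {value} repeated or out of order")
            last_number = value
            continue
        if not SAN_RE.fullmatch(token):
            problems.append(f"unrecognised token {token!r}")
    return problems
```

On the game above it would complain about the repeated move numbers, the `none` and `...` tokens, and the fused runs such as `Kxf6Kxh7Nxg3...`.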
-
- Posts: 1356
- Joined: Wed Mar 08, 2006 9:41 pm
- Location: Morgantown, WV, USA
Re: FICS Data
Thanks for the input, will take a look at my data and the parser.
Had seen a problem similar to that when I was testing. Occasionally, as the original bot was recording "style 12" lines from the FICS server, it might miss a beat and lose a line, either because the line wasn't transmitted or because it got lost somehow, resulting in holes in the game data.
It didn't seem frequent enough to be a big concern, but the line with the random garbage (or moves from other games) within a single move section is troubling.
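One way to spot the dropped-line case in the raw streams is to track, per game number, how many plies each style 12 line says have been played and flag any jump of more than one. The Python sketch below does that. The field positions are an assumption based on my reading of the FICS style 12 layout (side to move in field 9, game number in field 16, move number in field 26, counting from 0) and should be checked against the server's help text before relying on them. The function names are made up.

```python
def parse_style12(line):
    """Pull the few fields needed here out of a '<12>' line, or return None.

    Assumed layout (0-based, whitespace-separated): 0 '<12>', 1-8 board ranks,
    9 side to move, 16 game number, 26 number of the move about to be made.
    """
    fields = line.split()
    if len(fields) < 31 or fields[0] != "<12>":
        return None
    return {
        "game": int(fields[16]),
        "to_move": fields[9],
        "move_number": int(fields[26]),
    }


def plies_played(record):
    """Number of half-moves already played in the position described."""
    extra = 1 if record["to_move"] == "B" else 0
    return 2 * (record["move_number"] - 1) + extra


def find_gaps(lines):
    """Return (game, previous_ply, current_ply) for every skipped position."""
    last_ply = {}
    gaps = []
    for line in lines:
        record = parse_style12(line)
        if record is None:
            continue
        ply = plies_played(record)
        previous = last_ply.get(record["game"])
        if previous is not None and ply > previous + 1:
            gaps.append((record["game"], previous, ply))
        last_ply[record["game"]] = ply
    return gaps
```

Takebacks make the ply count go down rather than up, so this sketch ignores them. Keying on the game number also keeps interleaved lines from different observed games apart, which may help with the mixed-moves symptom too.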
Will look at it this weekend.
Any other concerns are appreciated.
-Josh