Made a start trying (emphasis added) to separate the wheat from the chaff in the giant monthly Lichess downloads.
So far I have 3.7 million human-human games between players rated at least 2200 Elo.
About 8 million annotated (score only) Stockfish games. And I am not even halfway.
http://rebel13.nl/download/data.html
For data lovers only
-
- Posts: 6997
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
90% of coding is debugging, the other 10% is writing bugs.
-
- Posts: 4556
- Joined: Tue Jul 03, 2007 4:30 am
Re: For data lovers only
Why was 2200 Elo chosen as the cutoff? I believe it's equivalent to 1900 Elo on chess.com.
Your beliefs create your reality, so be careful what you wish for.
-
- Posts: 12542
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: For data lovers only
Rebel wrote: ↑Wed Jul 10, 2019 7:57 am
Made a start trying (emphasis added) to separate the wheat from the chaff in the giant monthly Lichess downloads.
So far I have 3.7 million human-human games between players rated at least 2200 Elo.
About 8 million annotated (score only) Stockfish games. And I am not even halfway.
http://rebel13.nl/download/data.html

I think all of it has value because of the tremendous volume.
What mistakes do players below 1000 tend to make?
Below 1500?
Below 2000?
If we only care to find the best moves we need to filter. But there are answers to other interesting questions hidden in that data.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 1470
- Joined: Mon Apr 23, 2018 7:54 am
Re: For data lovers only
Dann Corbit wrote: ↑Wed Jul 10, 2019 9:13 am
I think all of it has value because of the tremendous volume.
What mistakes do players below **** tend to make?

But then it'd really help to know what Lichess ratings would correspond to in FIDE Elo.
Otherwise, we just know they are very bad, but not exactly how bad.
-
- Posts: 6997
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: For data lovers only
To keep the volume reasonable. Maybe I will do 2100 and 2000 later.
-
- Posts: 133
- Joined: Fri Apr 09, 2010 3:26 am
Re: For data lovers only
The higher you go, the closer the ratings get to chess.com's. At least, Lichess 2500 is already 2500 on chess.com.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: For data lovers only
Rebel wrote: ↑Wed Jul 10, 2019 7:57 am
Made a start trying (emphasis added) to separate the wheat from the chaff in the giant monthly Lichess downloads.
So far I have 3.7 million human-human games between players rated at least 2200 Elo.
About 8 million annotated (score only) Stockfish games. And I am not even halfway.
http://rebel13.nl/download/data.html

This will be useful for supervised neural network training.
It would be good if it were sorted by Elo, from the lowest-rated to the highest-rated pairs, so that it can emulate
the progression of the neural network in self-training.
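
A minimal sketch of that ordering, under stated assumptions: the post does not say how a "pair" should be ranked, so the mean of the two ratings is used here as the sort key, with the weaker player's rating as a tie-breaker. The `(white_elo, black_elo, game_text)` tuple layout and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Game = Tuple[int, int, str]  # (white_elo, black_elo, game_text), assumed layout


def pair_strength(white_elo: int, black_elo: int) -> float:
    """Mean rating of the two players (an assumed measure of pair strength)."""
    return (white_elo + black_elo) / 2


def curriculum_order(games: Iterable[Game]) -> List[Game]:
    """Return games sorted from the weakest to the strongest pair."""
    return sorted(
        games,
        key=lambda g: (pair_strength(g[0], g[1]), min(g[0], g[1])),
    )
```

`sorted` holds everything in memory, so for millions of games it may be more practical to sort only the rating keys together with file offsets, or to write the games into rating buckets first.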
-
- Posts: 6997
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: For data lovers only
Daniel Shawul wrote: ↑Wed Jul 10, 2019 3:18 pm
    Rebel wrote: ↑Wed Jul 10, 2019 7:57 am
    Made a start trying (emphasis added) to separate the wheat from the chaff in the giant monthly Lichess downloads.
    So far I have 3.7 million human-human games between players rated at least 2200 Elo.
    About 8 million annotated (score only) Stockfish games. And I am not even halfway.
    http://rebel13.nl/download/data.html
This will be useful for supervised neural network training.
It would be good if it were sorted by Elo, from the lowest-rated to the highest-rated pairs, so that it can emulate
the progression of the neural network in self-training.

I heard that before. Just for the record, only the 3.7M human games are filtered at 2200 Elo; the SF games are not.
-
- Posts: 6997
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: For data lovers only
Once you have the human database of 3,784,887 games installed, you can extract higher-rated subsets from it (2300, 2400, 2500, etc.) with SOMU 1.5; see the page.
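
Before extracting a subset, it can help to know how many games each cutoff would leave. The sketch below is independent of SOMU 1.5 and says nothing about how that tool works; it only counts, for a list of `(white_elo, black_elo)` pairs, how many games have both players at or above each threshold. The function name and the default cutoffs are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, Sequence, Tuple


def games_per_cutoff(
    pairs: Iterable[Tuple[int, int]],
    cutoffs: Sequence[int] = (2200, 2300, 2400, 2500),
) -> Dict[int, int]:
    """Count games in which BOTH players are rated at or above each cutoff."""
    counts = {cutoff: 0 for cutoff in cutoffs}
    for white_elo, black_elo in pairs:
        weakest = min(white_elo, black_elo)
        for cutoff in cutoffs:
            if weakest >= cutoff:
                counts[cutoff] += 1
    return counts
```

The counts are cumulative: a game between two 2550-rated players is counted under every cutoff up to 2500.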