Interesting data graphs
Posted: Mon Jun 18, 2018 10:54 pm
I am currently processing chess positions for use in an engine-mentored learning project. Each position is evaluated, and it takes about 90 minutes to process 100,000 positions. I have 9 cores working on it, but it will still take a couple more weeks to complete the entire set.
The data is from a huge data set of positions (FENs) offered by Mathieu Pagé. I downloaded the set a year or so ago from his website, but it appears his website is no longer active. I believe each position is unique (no dups), but I have no idea what games sourced the data. The entire data set has about 286M positions. The split between white-to-move and black-to-move positions is statistically even.
I took 20% of the positions and produced two graphs, but when I saw the graphs, I thought there must have been some bias, so I ran the collection again against twice the amount of data. The graphs were almost identical; the ones from the larger run are what I have uploaded here.
Positions found with only two kings were culled, which is why there aren't any data points on the second graph below 3. In the future I'll also eliminate unwinnable endgame positions as well as those positions where the side to move is in check (about 1%).
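For anyone who wants to reproduce the cull, here is a minimal sketch of how the two-kings filter could be done directly on the FEN strings. This is not my actual processing code; the function names `count_pieces` and `cull_bare_kings` are made up for illustration, and it assumes each FEN is well formed with the piece placement as the first field. It does not cover the in-check or unwinnable-endgame filters, which need real move generation.

```python
def count_pieces(fen: str) -> int:
    """Count every piece on the board (kings and pawns included)."""
    placement = fen.split()[0]  # first FEN field is the piece placement
    # Letters are pieces; digits are runs of empty squares; '/' separates ranks.
    return sum(ch.isalpha() for ch in placement)


def cull_bare_kings(fens):
    """Keep only positions with at least three pieces (drops king vs king)."""
    return [fen for fen in fens if count_pieces(fen) >= 3]


# Hypothetical sample: start position, bare kings, king and pawn vs king.
sample = [
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "8/8/4k3/8/8/3K4/8/8 w - - 0 1",
    "8/8/4k3/8/8/3KP3/8/8 b - - 0 1",
]
kept = cull_bare_kings(sample)
```

Since a legal position always has both kings, "fewer than 3 pieces" is the same as "only two kings", which is why the piece graph starts at 3.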
The chart.pgn graph is a breakdown of the frequency of positions by the number of pawns in each position. I'll leave it to the readers to figure out what it means, but the most interesting data point to me is the dislike of being one pawn behind in the opening.
The second graph is the frequency of positions by the number of pieces (all pieces) in each position. This is a very interesting graph, showing a tendency to maintain piece equality, more so in the early game. Your questions and comments are welcome, and if there are any interesting what-ifs, I'll attempt to graph them.
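If you want to try the same kind of tally on your own FEN collection, the following is a rough sketch of how the two frequency counts could be gathered. It is an assumption about the approach, not the exact code behind the graphs: the names `pawn_and_piece_counts` and `tally` are hypothetical, and it counts total pawns and total pieces for both sides combined.

```python
from collections import Counter


def pawn_and_piece_counts(fen: str):
    """Return (total pawns, total pieces) for one FEN position."""
    placement = fen.split()[0]  # ignore side to move, castling, etc.
    pawns = sum(ch in "pP" for ch in placement)
    pieces = sum(ch.isalpha() for ch in placement)
    return pawns, pieces


def tally(fens):
    """Build two histograms: positions per pawn count and per piece count."""
    by_pawns = Counter()
    by_pieces = Counter()
    for fen in fens:
        pawns, pieces = pawn_and_piece_counts(fen)
        by_pawns[pawns] += 1
        by_pieces[pieces] += 1
    return by_pawns, by_pieces


# Hypothetical sample: start position, position after 1.e4, K+P vs K.
sample = [
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1",
    "8/8/4k3/8/8/3KP3/8/8 b - - 0 1",
]
by_pawns, by_pieces = tally(sample)
```

The two `Counter` objects map a count to the number of positions with that count, so they can be fed straight into any plotting tool. For a set this size you would read the FENs line by line from the file rather than hold them all in a list.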