Interesting data graphs

Posted: Mon Jun 18, 2018 10:54 pm
by MOBMAT
I am currently processing chess positions for use in an engine-mentored learning project. Each position is evaluated, and processing 100,000 positions takes about 90 minutes. I have 9 cores working on it, but it will still take a couple more weeks to complete the entire set.

The data is from a huge data set of positions (FENs) offered by Mathieu Pagé. I downloaded the set a year or so ago from his website, but it appears his website is no longer active. I believe each position is unique (no duplicates), but I have no idea which games the data came from. The entire data set has about 286M positions. The split of positions between white and black is statistically even.

I took 20% of the positions and produced two graphs, but when I saw them I thought there must have been some bias, so I ran the collection again on twice the amount of data. The graphs were almost identical; the latter set is what I have uploaded here.

Positions found with only two kings were culled, which is why there aren't any data points on the second graph below 3. In the future I'll also eliminate unwinnable endgame positions as well as those positions where the side to move is in check (about 1%).

The chart.pgn graph is a breakdown of the frequency of positions by the number of pawns in each position. I'll leave it to the readers to figure out what it means, but the most interesting data point to me is the dislike of being one pawn behind in the opening.

The second graph is the frequency of positions by number of pieces (all pieces) in each position. This is a very interesting graph, showing a tendency to maintain piece equality, more so in the early game.
NumPieces.png
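As a rough illustration of how the two frequency tables could be built, here is a minimal Python sketch. It is not the code used for the graphs. It assumes "number of pieces" counts every man on the board (kings and pawns included), which matches the statement that nothing appears below 3 once bare-king positions are culled. The function names are made up for this example.

```python
from collections import Counter

def board_field(fen: str) -> str:
    """Return the piece-placement field (the first field) of a FEN string."""
    return fen.split()[0]

def count_pawns(fen: str) -> int:
    """Number of pawns of both colours on the board."""
    board = board_field(fen)
    return board.count("P") + board.count("p")

def count_pieces(fen: str) -> int:
    """Number of men of both colours, kings and pawns included (assumption)."""
    return sum(ch.isalpha() for ch in board_field(fen))

def histograms(fens):
    """Build frequency tables by pawn count and by piece count."""
    by_pawns, by_pieces = Counter(), Counter()
    for fen in fens:
        total = count_pieces(fen)
        if total <= 2:  # only the two kings remain: culled
            continue
        by_pawns[count_pawns(fen)] += 1
        by_pieces[total] += 1
    return by_pawns, by_pieces

sample = [
    "1B1b1k1r/1B3pp1/4pn2/Pn5p/8/1P6/6PP/2R2R1K w - -",
    "8/8/4k3/8/8/3K4/8/8 w - -",  # bare kings, skipped
]
by_pawns, by_pieces = histograms(sample)
```

The resulting counters can be passed to any plotting tool, with the keys on the x-axis and the counts on the y-axis.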
Your questions and comments are welcome, and if there are any interesting what-ifs, I'll attempt to graph them.

Re: Interesting data graphs

Posted: Wed Jun 20, 2018 4:53 am
by Ferdy
MOBMAT wrote: Mon Jun 18, 2018 10:54 pm I am currently processing chess positions for use in an engine-mentored learning project. Each position is evaluated, and processing 100,000 positions takes about 90 minutes. I have 9 cores working on it, but it will still take a couple more weeks to complete the entire set.

The data is from a huge data set of positions (FENs) offered by Mathieu Pagé. I downloaded the set a year or so ago from his website, but it appears his website is no longer active. I believe each position is unique (no duplicates), but I have no idea which games the data came from. The entire data set has about 286M positions. The split of positions between white and black is statistically even.
Can you post a single example FEN from the original data set? If it is purely a FEN without a result, then there is no need to post it.

It is also interesting to see the plot on the frequency of passed pawns say total pawns or total pieces [Q=9, R=5, R=B=3] in x-axis.

Re: Interesting data graphs

Posted: Wed Jun 20, 2018 6:10 am
by MOBMAT
Ferdy wrote: Wed Jun 20, 2018 4:53 am Can you post an example single fen from the original data set. If it is purely fen without result then there is no need to post it.
The original data set does not have any results. I would not be interested in them anyway, since I'm taking a different approach.
An original FEN looks like this:

1B1b1k1r/1B3pp1/4pn2/Pn5p/8/1P6/6PP/2R2R1K w - - 
A static evaluation is applied and the output is:

1B1b1k1r/1B3pp1/4pn2/Pn5p/8/1P6/6PP/2R2R1K w - - ce +349
The score is in centipawns from the point of view of the side to move.
The score did not come into play when producing the graphs.
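For readers who want to consume lines in this format, a minimal Python sketch follows. It splits off the `ce` score and, as an optional extra step that is not part of the original processing, converts the side-to-move score to White's point of view. The function names are illustrative only.

```python
def parse_scored_line(line: str):
    """Split 'FEN ... ce <score>' into (fen, score for the side to move)."""
    fen_part, _, score_part = line.rpartition(" ce ")
    # int() accepts an explicit sign such as '+349'
    return fen_part.strip(), int(score_part.strip().rstrip(";"))

def white_relative(fen: str, score_stm: int) -> int:
    """Convert a side-to-move score into a score from White's point of view."""
    side_to_move = fen.split()[1]
    return score_stm if side_to_move == "w" else -score_stm

line = "1B1b1k1r/1B3pp1/4pn2/Pn5p/8/1P6/6PP/2R2R1K w - - ce +349"
fen, score = parse_scored_line(line)
```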
Ferdy wrote: It is also interesting to see the plot on the frequency of passed pawns say total pawns or total pieces [Q=9, R=5, R=B=3] in x-axis.
So, as an example, for each number of pawns/pieces remaining (X-axis), you would want to see a count of how many positions had passed pawns? For your second idea, I'm not sure what you mean by " [Q=9, R=5, R=B=3] in x-axis". I could run the count of passed pawns against the total pieces remaining, though.

Re: Interesting data graphs

Posted: Wed Jun 20, 2018 9:26 am
by Ferdy
MOBMAT wrote: Wed Jun 20, 2018 6:10 am
Ferdy wrote: Wed Jun 20, 2018 4:53 am Can you post an example single fen from the original data set. If it is purely fen without result then there is no need to post it.
The original data set does not have any results. I would not be interested in them anyway, since I'm taking a different approach.
An original FEN looks like this:

1B1b1k1r/1B3pp1/4pn2/Pn5p/8/1P6/6PP/2R2R1K w - - 
A static evaluation is applied and the output is:

1B1b1k1r/1B3pp1/4pn2/Pn5p/8/1P6/6PP/2R2R1K w - - ce +349
The score is in centipawns from the point of view of the side to move.
I have similar training sets, but I just extracted the positions from PGN files. The advantage is that the score and result are already there. Example:

r1bqr1k1/p2p1ppp/1pnP1n2/b1p5/7N/2N3P1/PP2PPBP/R1BQ1RK1 w - - ce 73; sm Re1; c0 "1-0";

ce 73 - the evaluation of the engine playing as white
sm Re1 - sm (supplied move), the move made by the engine from that position
c0 "1-0" - the result of the actual game.
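A record like the one above can be split into its position and its operations with a few lines of Python. This is only a sketch: it assumes four position fields followed by semicolon-terminated operations, and it does not handle a semicolon inside a quoted operand. The name `parse_epd` is made up for this example.

```python
def parse_epd(line: str):
    """Parse an EPD-style record into (position, {opcode: operand})."""
    fields = line.split(None, 4)      # board, side, castling, en passant, rest
    position = " ".join(fields[:4])
    rest = fields[4] if len(fields) > 4 else ""
    ops = {}
    # Simplification: a ';' inside a quoted operand would break this split.
    for chunk in rest.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        opcode, _, operand = chunk.partition(" ")
        ops[opcode] = operand.strip().strip('"')
    return position, ops

record = ('r1bqr1k1/p2p1ppp/1pnP1n2/b1p5/7N/2N3P1/PP2PPBP/R1BQ1RK1 w - - '
          'ce 73; sm Re1; c0 "1-0";')
position, ops = parse_epd(record)
```

Operands are returned as strings, so `ops["ce"]` still needs `int()` before it is used as a score.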

Extraction conditions:
1. The king of the side to move is not under attack
2. The sm is not a capture, promotion, castling move, or a move that checks the opponent's king
3. The sm of the previous position is not a capture or promotion
4. The future sm (1-ply look-ahead), i.e. the sm of the position after the current position, is not a capture or promotion
This tries to ensure that the moves played in the game from the previous, current and future positions are not tactical
Other conditions are:
5. Start examining the position of every game at move 12
6. Minimum number of pawns for each side is 2.
7. Minimum piece value for each side is 8 [Q=9, R=5, N=B=3]
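Conditions 6 and 7 depend only on the material on the board, so they can be checked directly from the FEN. The Python sketch below does that under one assumption: "piece value" is read as non-pawn material, with kings counted as 0. Conditions 1 to 5 need move generation and game context, so they are left out. The function names are illustrative.

```python
PIECE_VALUE = {"q": 9, "r": 5, "b": 3, "n": 3}  # [Q=9, R=5, N=B=3]

def side_stats(fen: str):
    """Return pawn count and non-pawn piece value for each side."""
    stats = {"w": {"pawns": 0, "value": 0}, "b": {"pawns": 0, "value": 0}}
    for ch in fen.split()[0]:
        if not ch.isalpha():
            continue  # skip digits and '/'
        side = "w" if ch.isupper() else "b"
        kind = ch.lower()
        if kind == "p":
            stats[side]["pawns"] += 1
        else:
            stats[side]["value"] += PIECE_VALUE.get(kind, 0)  # king counts as 0
    return stats

def passes_material_conditions(fen: str, min_pawns: int = 2, min_value: int = 8) -> bool:
    """Check conditions 6 and 7 for both sides."""
    return all(s["pawns"] >= min_pawns and s["value"] >= min_value
               for s in side_stats(fen).values())

example = "r1bqr1k1/p2p1ppp/1pnP1n2/b1p5/7N/2N3P1/PP2PPBP/R1BQ1RK1 w - -"
ok = passes_material_conditions(example)
```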

I use c0 "1-0" for Texel tuning.
I also use ce 73 for tuning: get my engine's score for the position, compute error_square = (myenginescore-73)^2, then minimize the error by adjusting the parameter values.
sm Re1 can also be used for tuning by move.
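The two error measures can be sketched in Python as follows. The first is the squared difference against the stored `ce` score. The second is the usual Texel-tuning form, where a White-relative centipawn score is mapped to an expected result with a sigmoid and compared against the game result. The scaling constant `k` is normally fitted to the data set, so the default of 1.0 here is only a placeholder, and all function names are made up for this example.

```python
RESULT = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}

def squared_error(my_score_cp: float, target_cp: float) -> float:
    """Squared difference between the engine's score and the stored ce score."""
    return (my_score_cp - target_cp) ** 2

def expected_score(score_cp: float, k: float = 1.0) -> float:
    """Sigmoid mapping a White-relative centipawn score into [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

def texel_error(samples, k: float = 1.0) -> float:
    """Mean squared error between game results and expected scores.

    samples: iterable of (white_relative_score_cp, result_string).
    """
    samples = list(samples)
    total = sum((RESULT[res] - expected_score(cp, k)) ** 2 for cp, res in samples)
    return total / len(samples)

err_ce = squared_error(100, 73)          # engine says 100 cp, target is 73 cp
err_result = texel_error([(73, "1-0")])  # single-position example
```

A tuner would recompute one of these errors after each parameter change and keep the change only if the error goes down.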
MOBMAT wrote: The score did not come into play when producing the graphs.
It is also interesting to see the plot on the frequency of passed pawns say total pawns or total pieces [Q=9, R=5, R=B=3] in x-axis.
So, as an example, for each number of pawn/pieces remaining (X-axis), you would want to see a count of how many positions had passed pawns?
Yes.
MOBMAT wrote: For your second idea, I'm not sure what you mean by " [Q=9, R=5, R=B=3] in x-axis". I could run the count of passed pawns against the total pieces remaining, though.
My mistake, it should have been [Q=9, R=5, N=B=3]. Count pieces by piece values. If there are 2 queens remaining on the board and there are no other pieces, total piece value = 9+9 = 18, then find the number of passers if remaining piece value is 18.
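A possible way to produce that plot is sketched below in Python: compute the total piece value of both sides for the x-axis and count passed pawns per position. The passed-pawn test used here is a common simplified definition (no enemy pawn ahead on the same or an adjacent file), so a pawn standing behind a friendly pawn on the same file still counts as passed. The function names are illustrative.

```python
from collections import Counter

PIECE_VALUE = {"q": 9, "r": 5, "b": 3, "n": 3}  # [Q=9, R=5, N=B=3]

def parse_board(fen: str):
    """Return (piece, file 0-7, rank 0-7) tuples; rank 0 is White's first rank."""
    squares = []
    for row_index, row in enumerate(fen.split()[0].split("/")):
        rank, file = 7 - row_index, 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)
            else:
                squares.append((ch, file, rank))
                file += 1
    return squares

def total_piece_value(fen: str) -> int:
    """Sum of piece values for both sides; pawns and kings count as 0."""
    return sum(PIECE_VALUE.get(ch.lower(), 0) for ch, _, _ in parse_board(fen))

def count_passed_pawns(fen: str) -> int:
    """Count passed pawns of both colours (simplified definition)."""
    squares = parse_board(fen)
    white = [(f, r) for ch, f, r in squares if ch == "P"]
    black = [(f, r) for ch, f, r in squares if ch == "p"]
    passed = sum(not any(abs(bf - f) <= 1 and br > r for bf, br in black)
                 for f, r in white)
    passed += sum(not any(abs(wf - f) <= 1 and wr < r for wf, wr in white)
                  for f, r in black)
    return passed

def passers_by_piece_value(fens):
    """Total number of passed pawns, keyed by total piece value."""
    table = Counter()
    for fen in fens:
        table[total_piece_value(fen)] += count_passed_pawns(fen)
    return table

two_queens = "3qk3/8/8/8/8/8/8/3QK3 w - -"
example = "1B1b1k1r/1B3pp1/4pn2/Pn5p/8/1P6/6PP/2R2R1K w - -"
table = passers_by_piece_value([two_queens, example])
```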