Training data

Posted: Thu May 10, 2018 7:57 am
by MOBMAT
I recall a thread regarding a big file of quiet positions with scores in FEN format.

I searched and browsed but couldn't find the thread.

If someone has a link to that kind of file, please let me know. It is time to do some engine brain training.

Re: Training data

Posted: Thu May 10, 2018 9:46 am
by Vinvin
Openings ? Middlegames ? Endgames ?

Re: Training data

Posted: Thu May 10, 2018 12:40 pm
by pkumar

Re: Training data

Posted: Thu May 10, 2018 2:27 pm
by AlvaroBegue
I can give you two related things. The first one is the file I used to tune RuyDos a couple of years ago: https://bitbucket.org/alonamaloh/ruy_tu ... th_results

It looks like this:

3r4/4k3/8/5p1R/8/1b2PB2/1P6/4K3 b - - 1-0
3nk2r/rp1b2pp/pR3p2/3P4/5Q2/3B1N2/5PPP/5RK1 b k - 1-0
1R6/7p/4k1pB/p1Ppn3/3K3P/8/r7/8 w - - 0-1
3R4/5B1k/2b4p/5p2/1P6/4q3/P4RPP/6K1 b - - 1/2-1/2
8/5kp1/p4n1p/3pK3/1B6/8/8/8 w - - 0-1
3q3k/1br2pp1/1p6/pP1pR1b1/3P4/P2Q2P1/1B5P/5RK1 b - - 1-0
2b1rbk1/1p1n1pp1/3B3p/6q1/2B1P3/2N2P1P/R2Q2P1/6K1 b - - 1/2-1/2
2q3k1/5pp1/p3p2p/1p6/1Q1P4/5PP1/PP2N2P/3R2K1 b - - 1-0
8/7Q/p2p1pp1/4b1k1/6r1/8/P4PP1/3R1RK1 b - - 1-0
rq3rk1/2p2ppp/p2b4/1p1Rp1BQ/4P3/1P5P/1PP2PP1/3R2K1 b - - 1-0
[...]
The positions were initially obtained by sampling from evaluation calls in my program RuyDos. A game was played from each sampled position (SF7 vs. SF7 at a very fast time control), but then each position was replaced with the leaf reached by running QS (quiescence search) from it. The result of the game is at the end of each line. If I were to generate data like this again, I would probably do it slightly differently.
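
A minimal Python sketch of reading lines in the format shown above. It assumes each line holds exactly the first four FEN fields (piece placement, side to move, castling rights, en passant square) followed by the game result in PGN notation, where 1-0 means White won. The file does not contain the halfmove clock or fullmove number, so the sketch pads them with "0 1" to get a complete FEN string; that padding, and the names `parse_line` and `load_positions`, are assumptions of this sketch.

```python
# PGN-style result -> score from White's point of view.
RESULT_SCORE = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}


def parse_line(line):
    """Parse 'placement side castling ep result' into (fen, score).

    Only the first four FEN fields are present, so the halfmove clock and
    fullmove number are padded with '0 1' (an assumption of this sketch).
    """
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    placement, side, castling, ep, result = fields
    if result not in RESULT_SCORE:
        raise ValueError(f"unknown result {result!r}")
    return f"{placement} {side} {castling} {ep} 0 1", RESULT_SCORE[result]


def load_positions(path):
    """Read every non-blank line of the file into a list of (fen, score)."""
    with open(path) as f:
        return [parse_line(line) for line in f if line.strip()]
```

For example, `parse_line("3r4/4k3/8/5p1R/8/1b2PB2/1P6/4K3 b - - 1-0")` returns `("3r4/4k3/8/5p1R/8/1b2PB2/1P6/4K3 b - - 0 1", 1.0)`. The score stays from White's point of view even when Black is to move, so an evaluation that scores from the side to move needs to be flipped before comparing the two.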


The second one is a Kaggle data set that was created for a different purpose, but it contains 50,000 games where every position has been scored by letting Stockfish (not sure what version, but the files are dated 2014) search for 1 second: https://www.kaggle.com/c/finding-elo/data

Re: Training data

Posted: Thu May 10, 2018 2:46 pm
by Henk
For policy training you also need move probabilities: the relative frequencies of all legal moves in each position. And who can tell whether these values are right?

Re: Training data

Posted: Thu May 10, 2018 3:55 pm
by AlvaroBegue
Henk wrote (Thu May 10, 2018 2:46 pm): "For policy training you also need move probabilities. Relative frequencies for all legal moves per position. And who can tell that these values are right."
No, for policy training you just need moves. Take any collection of games (I believe they don't need to be of particularly high quality or evenly matched for this purpose) and use cross-entropy as your loss function.
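
To make this concrete, here is a small pure-Python sketch of the cross-entropy loss. With only the played move as the target, the loss is minus the log of the probability the network assigns to that move after a softmax over the legal moves. Averaging that loss over every game that reached the same position gives exactly the cross-entropy against the empirical move frequencies, so those frequencies come out of the data instead of having to be supplied. The function names and the plain-list representation of the legal-move logits are illustrative assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over the legal-move logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def loss_played_move(logits, played):
    """Cross-entropy with a one-hot target: -log p(played move)."""
    return -log_softmax(logits)[played]


def loss_target_distribution(logits, target):
    """Cross-entropy against a full target distribution over the same moves."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)) if t > 0)
```

If a position occurs in four games where moves 0, 0, 1 and 2 were played, the mean of the four `loss_played_move` values equals `loss_target_distribution` with target `[0.5, 0.25, 0.25]`.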

Re: Training data

Posted: Thu May 10, 2018 5:47 pm
by MOBMAT
Vinvin wrote (Thu May 10, 2018 9:46 am): "Openings ? Middlegames ? Endgames ?"
All of the above. I'm not doing NN; I'm doing logistic regression analysis, so I need a spread across all phases of the game.
I just need a win/loss indicator; I won't use draws. I'm breaking the game up into stages and fitting a separate regression to each stage, with some overlap between stages for smoothing. I used this technique in an Othello engine a long time ago and it worked very well, although there I was using linear regression (Othello scores range from -64 to +64).
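
A rough sketch of the scheme described above: win/loss labels only (1 = win, 0 = loss), a separate logistic regression per game stage, and widened stage intervals so positions near a boundary train both neighbouring models. It assumes a game phase already scaled to [0, 1] (for example from remaining material) and a list of numeric evaluation features per position. The number of stages, the overlap width, the learning rate, and all function names are placeholders, and plain batch gradient descent stands in for whatever optimizer would actually be used.

```python
import math


def sigmoid(z):
    # Branch on the sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(samples, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on mean log-loss.

    samples: non-empty list of (features, y), y = 1 for a win, 0 for a loss.
    """
    n_feat = len(samples[0][0])
    w = [0.0] * n_feat
    b = 0.0
    m = len(samples)
    for _ in range(epochs):
        gw = [0.0] * n_feat
        gb = 0.0
        for x, y in samples:
            # Gradient of log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                gw[i] += err * xi
            gb += err
        w = [wi - lr * g / m for wi, g in zip(w, gw)]
        b -= lr * gb / m
    return w, b


def stage_bounds(n_stages, overlap):
    """Widened [lo, hi] phase interval for each stage; phase runs 0..1."""
    width = 1.0 / n_stages
    return [(max(0.0, k * width - overlap), min(1.0, (k + 1) * width + overlap))
            for k in range(n_stages)]


def fit_stages(samples, n_stages=3, overlap=0.05):
    """samples: list of (phase, features, y) with draws already removed."""
    models = []
    for lo, hi in stage_bounds(n_stages, overlap):
        subset = [(x, y) for phase, x, y in samples if lo <= phase <= hi]
        if not subset:
            raise ValueError(f"no samples with phase in [{lo:.2f}, {hi:.2f}]")
        models.append(fit_logistic(subset))
    return models


def predict(models, phase, x):
    """Win probability from the model of the (unwidened) stage containing phase."""
    k = min(int(phase * len(models)), len(models) - 1)
    w, b = models[k]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

At prediction time this picks the single stage the phase falls into, so the overlap only affects training. Blending the two nearest stage models by phase would be the other obvious way to smooth the transitions.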

If I can't find a suitable file, the alternative is to take the database(s) with millions of GM games and split them into positions using the python-chess library, but I was hoping someone had already done something like this.
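
The splitting step itself is short. This sketch assumes the FEN strings for each game have already been produced, for example by replaying the game with python-chess: read it with `chess.pgn.read_game`, push each move of `game.mainline_moves()` onto `game.board()`, and record `board.fen()` after each move. The sketch drops draws and unfinished games, skips the first few plies, keeps every k-th position to reduce correlation between samples, and writes lines in the same placement/side/castling/en-passant/result layout as the file Alvaro posted. `skip_plies`, `every`, and the function names are hypothetical.

```python
# Games whose Result tag is not in this table (draws "1/2-1/2", unfinished "*")
# are skipped, since only win/loss labels are wanted.
RESULT_LABEL = {"1-0": 1, "0-1": 0}


def samples_from_game(result, fens, skip_plies=10, every=4):
    """Turn one game into (fen, label) pairs; label is from White's view.

    result: PGN Result tag, e.g. "1-0".
    fens:   FEN after each ply, in game order.
    skip_plies / every: hypothetical defaults that drop the opening and keep
    every k-th position so samples from one game are less correlated.
    """
    label = RESULT_LABEL.get(result)
    if label is None:
        return []
    return [(fen, label) for fen in fens[skip_plies::every]]


def to_training_line(fen, result):
    """Same layout as the RuyDos file: four FEN fields, then the result."""
    return " ".join(fen.split()[:4] + [result])
```

Labels here follow the PGN convention (White's point of view). If the regression features are computed from the side to move, flip the label for positions with Black to move.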

Re: Training data

Posted: Thu May 10, 2018 9:58 pm
by Werner Taelemans
MOBMAT wrote (Thu May 10, 2018 5:47 pm): "if i can't find a suitable file, ....."
Did you miss the link that Alvaro gave to you?
https://bitbucket.org/alonamaloh/ruy_tu ... th_results

It's a file with 1.3 million positions, plus their result.