I recall a thread regarding a big file of quiet positions with scores in FEN format.
I searched and browsed but couldn't find the thread.
If someone has a link to that kind of file, please let me know. It is time to do some engine brain training.
Training data
-
- Posts: 385
- Joined: Sat Feb 04, 2017 11:57 pm
- Location: USA
Training data
i7-6700K @ 4.00Ghz 32Gb, Win 10 Home, EGTBs on PCI SSD
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
-
- Posts: 5228
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: Training data
Openings ? Middlegames ? Endgames ?
-
- Posts: 100
- Joined: Tue Oct 15, 2013 5:45 pm
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Training data
I can give you two related things. The first one is the file I used to tune RuyDos a couple of years ago: https://bitbucket.org/alonamaloh/ruy_tu ... th_results
It looks like this:
Code: Select all
3r4/4k3/8/5p1R/8/1b2PB2/1P6/4K3 b - - 1-0
3nk2r/rp1b2pp/pR3p2/3P4/5Q2/3B1N2/5PPP/5RK1 b k - 1-0
1R6/7p/4k1pB/p1Ppn3/3K3P/8/r7/8 w - - 0-1
3R4/5B1k/2b4p/5p2/1P6/4q3/P4RPP/6K1 b - - 1/2-1/2
8/5kp1/p4n1p/3pK3/1B6/8/8/8 w - - 0-1
3q3k/1br2pp1/1p6/pP1pR1b1/3P4/P2Q2P1/1B5P/5RK1 b - - 1-0
2b1rbk1/1p1n1pp1/3B3p/6q1/2B1P3/2N2P1P/R2Q2P1/6K1 b - - 1/2-1/2
2q3k1/5pp1/p3p2p/1p6/1Q1P4/5PP1/PP2N2P/3R2K1 b - - 1-0
8/7Q/p2p1pp1/4b1k1/6r1/8/P4PP1/3R1RK1 b - - 1-0
rq3rk1/2p2ppp/p2b4/1p1Rp1BQ/4P3/1P5P/1PP2PP1/3R2K1 b - - 1-0
[...]
The positions were obtained initially by sampling from evaluation calls in my program RuyDos. A game was played from that position (SF7-vs-SF7 with very fast time control), but then each position was replaced with the leaf from running QS. At the end of each line you have the result of the game. I probably would do it slightly differently if I were to generate data like this again.
The second one is a Kaggle data set that was created for a different purpose, but it contains 50,000 games where every position has been scored by letting Stockfish (not sure what version, but the files are dated 2014) search for 1 second: https://www.kaggle.com/c/finding-elo/data
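For anyone who wants to load the FEN-plus-result lines from the sample in this post, here is a minimal sketch. It assumes each line is a four-field FEN followed by one result token, and that the token follows the usual PGN convention (1-0 means White won). The names RESULT_TO_SCORE, LabeledPosition and parse_line are made up for illustration and are not part of RuyDos.

```python
from typing import NamedTuple

# Assumption: the result token uses the PGN convention, scored here
# from White's point of view.
RESULT_TO_SCORE = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}


class LabeledPosition(NamedTuple):
    fen: str      # placement, side to move, castling rights, en passant
    score: float  # game result from White's point of view


def parse_line(line: str) -> LabeledPosition:
    # The result is the last whitespace-separated token; the rest is the FEN.
    fen, result = line.strip().rsplit(None, 1)
    return LabeledPosition(fen, RESULT_TO_SCORE[result])


# Three lines copied from the sample in this post.
sample = [
    "3r4/4k3/8/5p1R/8/1b2PB2/1P6/4K3 b - - 1-0",
    "1R6/7p/4k1pB/p1Ppn3/3K3P/8/r7/8 w - - 0-1",
    "3R4/5B1k/2b4p/5p2/1P6/4q3/P4RPP/6K1 b - - 1/2-1/2",
]
positions = [parse_line(line) for line in sample]
```

If draws are not wanted, filtering on score != 0.5 after parsing is enough.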
-
- Posts: 7218
- Joined: Mon May 27, 2013 10:31 am
Re: Training data
For policy training you also need move probabilities: relative frequencies for all legal moves in each position. And who can tell whether those values are right?
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Training data
No, for policy training you just need moves. Take any collection of games (I believe they don't need to be of particularly high quality or evenly matched for this purpose) and use cross-entropy as your loss function.
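To make the cross-entropy point concrete, here is a minimal sketch of the per-position loss when the only label is the move that was played. It assumes the policy outputs one raw score (logit) per legal move. The function name cross_entropy and the example numbers are hypothetical.

```python
import math


def cross_entropy(logits, played_index):
    """Negative log-probability of the played move under softmax(logits)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[played_index]


# Hypothetical policy outputs for a position with three legal moves,
# where the game continued with the first move in the list.
logits = [2.0, 0.5, -1.0]
loss = cross_entropy(logits, 0)
```

Averaged over many games, this loss is smallest when the predicted probabilities match how often each move is actually played in similar positions, which is why explicit per-move frequencies are not needed as labels.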
-
- Posts: 385
- Joined: Sat Feb 04, 2017 11:57 pm
- Location: USA
Re: Training data
All of the above. I'm not doing NN; I'm doing logistic regression, so I need a spread of positions from all phases of the game.
I just need a win or loss indicator; I won't use draws. I'm breaking the game up into stages and applying a separate regression to each stage, with some overlap between stages for smoothing. I used this technique in an Othello engine a long time ago and it worked very well, although there I was using linear regression (scores in Othello range from -64 to +64).
If I can't find a suitable file, the alternative is to take a database of a million or more GM games and split the games into positions using python-chess, but I was hoping someone had already done something like this.
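One possible reading of the staged regression with overlap, as a rough sketch only: each stage has its own coefficient vector, and a position between two stages uses a linear blend of the neighbouring vectors. This is a jointly trained variant and not necessarily the separate-regressions setup described above. The phase value in [0, 1], the feature list, and all function names are assumptions made up for illustration.

```python
import math


def sigmoid(z):
    # tanh form of the logistic function; avoids overflow for large |z|.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def stage_weights(phase, n_stages):
    """Blend factor per stage for phase in [0, 1]; needs n_stages >= 2."""
    pos = phase * (n_stages - 1)
    lo = min(int(pos), n_stages - 2)
    frac = pos - lo
    weights = [0.0] * n_stages
    weights[lo] = 1.0 - frac   # nearer stage below
    weights[lo + 1] = frac     # nearer stage above
    return weights


def predict(coefs, features, phase):
    """Win probability from the blended per-stage linear scores."""
    blend = stage_weights(phase, len(coefs))
    z = sum(b * sum(c * f for c, f in zip(stage, features))
            for b, stage in zip(blend, coefs))
    return sigmoid(z)


def sgd_step(coefs, features, phase, won, lr=0.1):
    """One gradient step on the log loss; won is 1 for a win, 0 for a loss."""
    err = predict(coefs, features, phase) - won
    blend = stage_weights(phase, len(coefs))
    for b, stage in zip(blend, coefs):
        for i, f in enumerate(features):
            stage[i] -= lr * err * b * f


# Three stages, two hypothetical features per position.
coefs = [[0.0, 0.0] for _ in range(3)]
sgd_step(coefs, [1.0, 0.5], phase=0.25, won=1)
```

A position at phase 0.25 with three stages updates the first and second coefficient vectors equally and leaves the third untouched, which is what gives the smoothing across stage boundaries.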
i7-6700K @ 4.00Ghz 32Gb, Win 10 Home, EGTBs on PCI SSD
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
-
- Posts: 119
- Joined: Mon Feb 03, 2014 11:57 am
- Location: Belgium
- Full name: Werner Taelemans
Re: Training data
Did you miss the link that Alvaro gave to you?
https://bitbucket.org/alonamaloh/ruy_tu ... th_results
It's a file with 1.3 million positions, plus their result.