Training data

MOBMAT
Posts: 385
Joined: Sat Feb 04, 2017 11:57 pm
Location: USA

Training data

Post by MOBMAT »

I recall a thread regarding a big file of quiet positions with scores in FEN format.

I searched and browsed but couldn't find the thread.

If someone has a link to that kind of file, please let me know. It is time to do some engine brain training.
i7-6700K @ 4.00Ghz 32Gb, Win 10 Home, EGTBs on PCI SSD
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Training data

Post by Vinvin »

Openings? Middlegames? Endgames?
pkumar
Posts: 100
Joined: Tue Oct 15, 2013 5:45 pm

Re: Training data

Post by pkumar »

AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Training data

Post by AlvaroBegue »

I can give you two related things. The first one is the file I used to tune RuyDos a couple of years ago: https://bitbucket.org/alonamaloh/ruy_tu ... th_results

It looks like this:

3r4/4k3/8/5p1R/8/1b2PB2/1P6/4K3 b - - 1-0
3nk2r/rp1b2pp/pR3p2/3P4/5Q2/3B1N2/5PPP/5RK1 b k - 1-0
1R6/7p/4k1pB/p1Ppn3/3K3P/8/r7/8 w - - 0-1
3R4/5B1k/2b4p/5p2/1P6/4q3/P4RPP/6K1 b - - 1/2-1/2
8/5kp1/p4n1p/3pK3/1B6/8/8/8 w - - 0-1
3q3k/1br2pp1/1p6/pP1pR1b1/3P4/P2Q2P1/1B5P/5RK1 b - - 1-0
2b1rbk1/1p1n1pp1/3B3p/6q1/2B1P3/2N2P1P/R2Q2P1/6K1 b - - 1/2-1/2
2q3k1/5pp1/p3p2p/1p6/1Q1P4/5PP1/PP2N2P/3R2K1 b - - 1-0
8/7Q/p2p1pp1/4b1k1/6r1/8/P4PP1/3R1RK1 b - - 1-0
rq3rk1/2p2ppp/p2b4/1p1Rp1BQ/4P3/1P5P/1PP2PP1/3R2K1 b - - 1-0
[...]
The positions were initially obtained by sampling from evaluation calls in my program RuyDos. A game was played from each position (SF7 vs. SF7 at a very fast time control), and each position was then replaced with the leaf reached by running quiescence search (QS) from it. The result of the game is at the end of each line. I would probably do it slightly differently if I were to generate data like this again.
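
A minimal sketch of reading this format, assuming every line consists of the first four FEN fields (piece placement, side to move, castling rights, en passant square) followed by the game result, as in the sample above. The name `parse_line` and the numeric encoding of results are illustrative choices, not part of the RuyDos tuning code.

```python
# Assumed convention: score is from White's point of view.
RESULT_TO_SCORE = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}

def parse_line(line):
    """Split one record into (position, score)."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    *fen_fields, result = fields
    return " ".join(fen_fields), RESULT_TO_SCORE[result]

# Three lines copied from the sample above.
sample = [
    "3r4/4k3/8/5p1R/8/1b2PB2/1P6/4K3 b - - 1-0",
    "3R4/5B1k/2b4p/5p2/1P6/4q3/P4RPP/6K1 b - - 1/2-1/2",
    "8/5kp1/p4n1p/3pK3/1B6/8/8/8 w - - 0-1",
]
records = [parse_line(s) for s in sample]
```

Note that the position strings lack the halfmove and fullmove counters, so a strict FEN parser may need them appended (for example `0 1`) before it accepts the position.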


The second one is a Kaggle data set that was created for a different purpose, but it contains 50,000 games where every position has been scored by letting Stockfish (not sure what version, but the files are dated 2014) search for 1 second: https://www.kaggle.com/c/finding-elo/data
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Training data

Post by Henk »

For policy training you also need move probabilities: relative frequencies for all legal moves in each position. And who can tell whether these values are right?
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Training data

Post by AlvaroBegue »

Henk wrote: Thu May 10, 2018 2:46 pm For policy training you also need move probabilities: relative frequencies for all legal moves in each position. And who can tell whether these values are right?
No, for policy training you just need moves. Take any collection of games (I believe they don't need to be of particularly high quality or evenly matched for this purpose) and use cross-entropy as your loss function.
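
To make this concrete: with cross-entropy, the target for each position is simply the move that was played (a one-hot target), so no per-move probabilities are required. A minimal sketch in plain Python, where `logits` stands in for hypothetical model outputs over the legal moves and `played_index` is the index of the move actually played in the game:

```python
import math

def policy_cross_entropy(logits, played_index):
    """Cross-entropy between softmax(logits) and a one-hot target."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    # -log(softmax(logits)[played_index])
    return log_sum_exp - logits[played_index]

# Three legal moves with made-up scores; the game continued with move 0.
loss = policy_cross_entropy([2.0, 0.5, -1.0], played_index=0)
```

Averaged over many games, minimizing this loss pushes the predicted distribution toward the empirical move frequencies, without those frequencies ever being written down per position.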
MOBMAT
Posts: 385
Joined: Sat Feb 04, 2017 11:57 pm
Location: USA

Re: Training data

Post by MOBMAT »

Vinvin wrote: Thu May 10, 2018 9:46 am Openings? Middlegames? Endgames?
All of the above. I'm not doing NN; I'm doing logistic regression analysis, so I need a spread from all aspects of the game.
I just need a win or loss indicator; I won't use draws. I'm breaking the game up into stages and applying a separate regression to each stage, with some overlap between stages for smoothing. I used this technique in an Othello engine a long time ago and it worked very well, although there I was using linear regression (scores in Othello range from -64 to +64).
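
One possible reading of this scheme, sketched under stated assumptions: draws are dropped, each stage has its own coefficient vector, and neighbouring stages are blended with overlapping triangular weights. The `phase` value (0.0 for the opening, 1.0 for the endgame), the three stage centers, and all function names are hypothetical and not taken from any engine.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def label_from_result(result):
    """1.0 for a White win, 0.0 for a Black win, None for a draw (skipped)."""
    return {"1-0": 1.0, "0-1": 0.0}.get(result)

def stage_weights(phase, centers=(0.0, 0.5, 1.0), width=0.5):
    """Triangular weights over stages; neighbouring stages overlap."""
    phase = min(1.0, max(0.0, phase))  # clamp so at least one weight is nonzero
    raw = [max(0.0, 1.0 - abs(phase - c) / width) for c in centers]
    total = sum(raw)
    return [w / total for w in raw]

def predict(features, phase, stage_coeffs):
    """Blend the per-stage linear scores, then squash to a win probability."""
    weights = stage_weights(phase)
    score = sum(
        w * sum(c * f for c, f in zip(coeffs, features))
        for w, coeffs in zip(weights, stage_coeffs)
    )
    return sigmoid(score)
```

With these defaults, a position at `phase=0.25` draws equally on the opening and middlegame coefficients, which is what smooths the transition between stages.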

If I can't find a suitable file, the alternative is to take a database of a million or more GM games and split the games up into positions using python-chess, but I was hoping someone had already done something like this.
Werner Taelemans
Posts: 119
Joined: Mon Feb 03, 2014 11:57 am
Location: Belgium
Full name: Werner Taelemans

Re: Training data

Post by Werner Taelemans »

MOBMAT wrote: Thu May 10, 2018 5:47 pm if i can't find a suitable file, .....
Did you miss the link that Alvaro gave to you?
https://bitbucket.org/alonamaloh/ruy_tu ... th_results

It's a file with 1.3 million positions, each with its game result.