Training data

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

MOBMAT
Posts: 74
Joined: Sat Feb 04, 2017 10:57 pm
Location: USA

Training data

Post by MOBMAT » Thu May 10, 2018 5:57 am

I recall a thread regarding a big file of quiet positions with scores in FEN format.

I searched and browsed but couldn't find the thread.

If someone has a link to that kind of file, please let me know. It is time to do some engine brain training.
Vince S
Author of MOBMAT

"Reductions, extensions, and pruning, oh my!"

Vinvin
Posts: 4369
Joined: Thu Mar 09, 2006 8:40 am
Full name: Vincent Lejeune

Re: Training data

Post by Vinvin » Thu May 10, 2018 7:46 am

Openings ? Middlegames ? Endgames ?

pkumar
Posts: 93
Joined: Tue Oct 15, 2013 3:45 pm

Re: Training data

Post by pkumar » Thu May 10, 2018 10:40 am


AlvaroBegue
Posts: 919
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Training data

Post by AlvaroBegue » Thu May 10, 2018 12:27 pm

I can give you two related things. The first one is the file I used to tune RuyDos a couple of years ago: https://bitbucket.org/alonamaloh/ruy_tu ... th_results

It looks like this:

Code:

3r4/4k3/8/5p1R/8/1b2PB2/1P6/4K3 b - - 1-0
3nk2r/rp1b2pp/pR3p2/3P4/5Q2/3B1N2/5PPP/5RK1 b k - 1-0
1R6/7p/4k1pB/p1Ppn3/3K3P/8/r7/8 w - - 0-1
3R4/5B1k/2b4p/5p2/1P6/4q3/P4RPP/6K1 b - - 1/2-1/2
8/5kp1/p4n1p/3pK3/1B6/8/8/8 w - - 0-1
3q3k/1br2pp1/1p6/pP1pR1b1/3P4/P2Q2P1/1B5P/5RK1 b - - 1-0
2b1rbk1/1p1n1pp1/3B3p/6q1/2B1P3/2N2P1P/R2Q2P1/6K1 b - - 1/2-1/2
2q3k1/5pp1/p3p2p/1p6/1Q1P4/5PP1/PP2N2P/3R2K1 b - - 1-0
8/7Q/p2p1pp1/4b1k1/6r1/8/P4PP1/3R1RK1 b - - 1-0
rq3rk1/2p2ppp/p2b4/1p1Rp1BQ/4P3/1P5P/1PP2PP1/3R2K1 b - - 1-0
[...]
The positions were initially obtained by sampling from evaluation calls in my program RuyDos. A game was played from each position (SF7 vs. SF7 at a very fast time control), and then each position was replaced with the leaf reached by running quiescence search (QS) from it. At the end of each line you have the result of the game. I would probably do it slightly differently if I were to generate data like this again.
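
For anyone loading a file in this layout, here is a minimal Python sketch. It assumes each line is the first four FEN fields followed by one result token, as in the sample above, and that the token uses standard notation (1-0 means White won). The names `parse_line` and `load_positions` are made up for illustration and are not part of RuyDos or its tuning code.

```python
# Hypothetical helpers; the line layout is inferred from the sample lines above.
RESULT_TO_SCORE = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}

def parse_line(line):
    """Split one line into (position fields, score from White's point of view)."""
    parts = line.strip().rsplit(None, 1)
    if len(parts) != 2 or parts[1] not in RESULT_TO_SCORE:
        raise ValueError(f"unexpected line: {line!r}")
    return parts[0], RESULT_TO_SCORE[parts[1]]

def load_positions(lines, skip_draws=False):
    """Collect (position, score) pairs, optionally dropping drawn games."""
    samples = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        position, score = parse_line(line)
        if skip_draws and score == 0.5:
            continue
        samples.append((position, score))
    return samples

sample = [
    "3r4/4k3/8/5p1R/8/1b2PB2/1P6/4K3 b - - 1-0",
    "1R6/7p/4k1pB/p1Ppn3/3K3P/8/r7/8 w - - 0-1",
    "3R4/5B1k/2b4p/5p2/1P6/4q3/P4RPP/6K1 b - - 1/2-1/2",
]
for position, score in load_positions(sample):
    print(score, position)
```

Passing `skip_draws=True` keeps only decisive games, which suits a win/loss-only fit.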


The second one is a Kaggle data set that was created for a different purpose, but it contains 50,000 games where every position has been scored by letting Stockfish (not sure what version, but the files are dated 2014) search for 1 second: https://www.kaggle.com/c/finding-elo/data

Henk
Posts: 5797
Joined: Mon May 27, 2013 8:31 am

Re: Training data

Post by Henk » Thu May 10, 2018 12:46 pm

For policy training you also need move probabilities. Relative frequencies for all legal moves per position. And who can tell that these values are right?

AlvaroBegue
Posts: 919
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Training data

Post by AlvaroBegue » Thu May 10, 2018 1:55 pm

Henk wrote:
Thu May 10, 2018 12:46 pm
For policy training you also need move probabilities. Relative frequencies for all legal moves per position. And who can tell that these values are right?
No, for policy training you just need moves. Take any collection of games (I believe they don't need to be of particularly high quality or evenly matched for this purpose) and use cross-entropy as your loss function.
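
To make that concrete, here is a rough Python sketch of the loss for a single position. The target is just the move that was played (a one-hot label), and the loss is the negative log of the probability the model gave that move. Averaged over many games, minimizing this pushes the predicted distribution toward the observed move frequencies, so they never have to be tabulated explicitly. The logits below are made-up numbers standing in for a model's output over the legal moves.

```python
import math

def softmax(logits):
    """Turn raw move scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_cross_entropy(logits, played_index):
    """Cross-entropy against a one-hot target: -log p(move actually played)."""
    probs = softmax(logits)
    return -math.log(probs[played_index])

# Three legal moves with invented scores; the game continued with move 0.
loss = policy_cross_entropy([2.0, 0.5, -1.0], played_index=0)
print(round(loss, 4))
```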

MOBMAT
Posts: 74
Joined: Sat Feb 04, 2017 10:57 pm
Location: USA

Re: Training data

Post by MOBMAT » Thu May 10, 2018 3:47 pm

Vinvin wrote:
Thu May 10, 2018 7:46 am
Openings ? Middlegames ? Endgames ?
All of the above. I'm not training a neural network; I'm doing logistic regression, so I need a spread of positions from all phases of the game.
I just need a win or loss indicator; I won't use draws. I'm breaking the game up into stages and fitting a separate regression to each stage, with some overlap between stages for smoothing. I used this technique in an Othello engine a long time ago and it worked very well, although there I was using linear regression (scores in Othello range from -64 to +64).
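
One possible reading of that staged setup, as a rough Python sketch: each position is assigned to every stage whose range, widened by an overlap margin, contains it, and an independent logistic regression is fitted per stage. Everything specific here is an assumption for illustration only: stages are defined by ply number, the boundaries and overlap are arbitrary, the features are a bias term plus a material difference, and the fit is plain gradient descent.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def stages_for(phase, boundaries, overlap):
    """Indices of every stage whose range, widened by `overlap`, contains `phase`."""
    hits = []
    for i, (lo, hi) in enumerate(zip(boundaries, boundaries[1:])):
        if lo - overlap <= phase < hi + overlap:
            hits.append(i)
    return hits

def fit_stage(samples, n_features, lr=0.1, epochs=200):
    """Gradient-descent logistic regression on (features, won) pairs."""
    coeffs = [0.0] * n_features
    for _ in range(epochs):
        for features, won in samples:
            p = sigmoid(sum(c * f for c, f in zip(coeffs, features)))
            for i, f in enumerate(features):
                coeffs[i] -= lr * (p - won) * f  # gradient of the log-loss
    return coeffs

def fit_all(data, boundaries, overlap, n_features):
    """data holds (phase, features, won) with won in {0, 1}; draws already removed."""
    buckets = [[] for _ in boundaries[1:]]
    for phase, features, won in data:
        for s in stages_for(phase, boundaries, overlap):
            buckets[s].append((features, won))
    return [fit_stage(b, n_features) for b in buckets]

# Toy data: (ply, [bias, material difference], won). All values are invented.
data = [
    (10, [1.0, 1.0], 1), (18, [1.0, -2.0], 0),
    (22, [1.0, 3.0], 1), (35, [1.0, -1.0], 0),
    (60, [1.0, 2.0], 1), (70, [1.0, -3.0], 0),
]
boundaries = [0, 20, 40, 200]  # three stages, measured in plies
stage_coeffs = fit_all(data, boundaries, overlap=4, n_features=2)
for s, coeffs in enumerate(stage_coeffs):
    print(s, [round(c, 3) for c in coeffs])
```

Positions near a boundary (plies 18 and 22 here) feed both neighbouring fits, which is what keeps the coefficients from jumping at a stage change.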

If I can't find a suitable file, the alternative is to take a database of a million or more GM games and split them into positions using python-chess, but I was hoping someone had already done something like this.
Vince S
Author of MOBMAT

"Reductions, extensions, and pruning, oh my!"

Werner Taelemans
Posts: 102
Joined: Mon Feb 03, 2014 10:57 am
Location: Belgium

Re: Training data

Post by Werner Taelemans » Thu May 10, 2018 7:58 pm

MOBMAT wrote:
Thu May 10, 2018 3:47 pm
if i can't find a suitable file, .....
Did you miss the link that Alvaro gave to you?
https://bitbucket.org/alonamaloh/ruy_tu ... th_results

It's a file with 1.3 million positions, each with its game result.
