3 million games for training neural networks

AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

3 million games for training neural networks

Post by AlvaroBegue »

Hi,

I had my 8-core Ryzen 7 computer spend about 2 months generating quick Stockfish-vs-Stockfish games. The result is 3 million games that I think could be used for training neural networks, or to tune evaluation functions.

I have the games available in PGN format, with comments that include the score and search depth, or in a much more compact and easy-to-parse format: one game per line, consisting of the moves in UCI notation followed by the result.

Code:

d2d4 g8f6 g1f3 g7g6 b1c3 f8g7 e2e4 d7d6 c1e3 c7c6 d1d2 b8d7 a2a4 e8g8 e3h6 e7e5 e1c1 d8a5 d2g5 g7h6 g5h6 d7b6 d1d3 c8e6 f3g5 b6d7 f1e2 b7b5 d4e5 d7e5 d3g3 b5b4 c3b1 b4b3 c2b3 c6c5 b1d2 c5c4 g5e6 f7e6 b3c4 a8b8 h6h3 b8b2 h3e6 g8g7 d2b3 b2b3 g3b3 f6e4 e6d5 e4c5 b3b5 a5c3 c1b1 f8f2 d5d6 f2e2 d6e7 g7h6 e7h4 h6g7 h4e7 g7h6 0-1
I used openings from swcr-fq-openings-v3.5.pgn, whose lines have a fixed length of 8 moves for each player.

At least one person in this forum has expressed an interest in the data. I will make it available soon, but I wanted to ask this: Would you be interested in the PGN version of the games, or is the simple format enough?
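
For readers who want to try the compact format, here is a minimal parsing sketch in Python. It assumes each line is whitespace-separated UCI moves followed by a result token; only `0-1` appears in the sample above, so treating `1-0` and `1/2-1/2` as the other result tokens is an assumption, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# A UCI move is from-square, to-square, and an optional promotion piece.
UCI_MOVE = re.compile(r"^[a-h][1-8][a-h][1-8][qrbn]?$")

# Assumed result tokens; the sample line in the post only shows "0-1".
RESULT_SCORES = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}


def parse_game_line(line: str) -> Tuple[List[str], str]:
    """Split one line into (list of UCI moves, result token)."""
    tokens = line.split()
    if not tokens or tokens[-1] not in RESULT_SCORES:
        raise ValueError("line does not end with a recognized result")
    moves, result = tokens[:-1], tokens[-1]
    for move in moves:
        if not UCI_MOVE.match(move):
            raise ValueError(f"not a UCI move: {move!r}")
    return moves, result


def result_to_score(result: str) -> float:
    """Map the result token to a score from White's point of view."""
    return RESULT_SCORES[result]


moves, result = parse_game_line("d2d4 g8f6 g1f3 g7g6 b1c3 f8g7 0-1")
print(len(moves), result, result_to_score(result))
```

The White-perspective score is one common choice of training target; other encodings would work just as well.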
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: 3 million games for training neural networks

Post by Ferdy »

What is the time control? Do you apply game adjudication? I would like the full PGN file with move comments. I am interested in extracting positions with material imbalance and positions with mate in N (where N is 10 or less), writing them out as EPD, and adding ce, where ce is taken from the move comments in the game itself, for evaluation tuning.
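
One way to filter for material imbalance is sketched below in Python, working directly on the piece-placement field of a FEN or EPD string. The 1/3/3/5/9 piece values and the definition of "imbalance" (the two sides hold different sets of non-king pieces) are assumptions for illustration, not something specified in this thread; the function names are hypothetical.

```python
from collections import Counter

# Conventional piece values in pawns; an assumption, tune as needed.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_balance(fen: str) -> int:
    """Material in pawn units, positive when White is ahead."""
    placement = fen.split()[0]
    balance = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits, slashes, and kings carry no material value
        balance += value if ch.isupper() else -value
    return balance


def is_imbalanced(fen: str) -> bool:
    """True when the two sides hold different sets of non-king pieces."""
    placement = fen.split()[0]
    white = Counter(ch.lower() for ch in placement if ch in "PNBRQ")
    black = Counter(ch for ch in placement if ch in "pnbrq")
    return white != black


# Rook and pawn against knight and bishop: equal points, different pieces.
fen = "1nb1k3/8/8/8/8/8/7P/R3K3 w - - 0 1"
print(material_balance(fen), is_imbalanced(fen))
```

Checking the piece sets rather than only the point total catches cases such as rook and pawn against two minor pieces, where the totals are equal.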
Ozymandias
Posts: 1532
Joined: Sun Oct 25, 2009 2:30 am

Re: 3 million games for training neural networks

Post by Ozymandias »

Full PGN is preferable. Thx in advance.
smatovic
Posts: 2639
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: 3 million games for training neural networks

Post by smatovic »

AlvaroBegue wrote:Would you be interested in the PGN version of the games, or is the simple format enough?
PGN preferred, thanks.

--
Srdja
abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: 3 million games for training neural networks

Post by abulmo2 »

AlvaroBegue wrote:I used openings from swcr-fq-openings-v3.5.pgn, whose lines have a fixed length of 8 moves for each player.
Are the 5120 opening lines varied enough for 3 million games?
To tune Amoeba's evaluation function, I play the first moves randomly, to be sure to have enough variety. During an alpha-beta search, the leaves of the tree can reach some strange positions that need to be evaluated correctly.
Richard Delorme
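
To put the question in numbers, a quick back-of-the-envelope calculation in Python, using only the two figures mentioned in the thread (3 million games, 5120 opening lines):

```python
games = 3_000_000
opening_lines = 5120

# Average number of games that start from each opening line.
games_per_line = games / opening_lines
print(f"{games_per_line:.1f} games per opening line on average")
```

This only counts how often each opening position is reused; it says nothing about how quickly the games diverge afterwards.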
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: 3 million games for training neural networks

Post by AlvaroBegue »

Ferdy wrote:What is the time control? Do you apply game adjudication? I would like the full PGN file with move comments. I am interested in extracting positions with material imbalance and positions with mate in N (where N is 10 or less), writing them out as EPD, and adding ce, where ce is taken from the move comments in the game itself, for evaluation tuning.
Time control is 1 s + 0.1 s/move. No adjudication, so you'll get plenty of mate-in-n positions. My notes are below.

What do you mean by "ce"?

----------

Engine: Stockfish 101217 64 BMI2

CPU: AMD Ryzen 7 1800X Eight-Core Processor

This is the command used to generate games:

Code:

nohup time unbuffer ~/Downloads/cutechess-master/projects/cli/cutechess-cli -recover -engine cmd=stockfish proto=uci option.SyzygyPath=/home/alvaro/syzygy option.Hash=256 tc=inf/1+.1 -engine cmd=stockfish proto=uci option.SyzygyPath=/home/alvaro/syzygy option.Hash=256 tc=inf/1+.1 -games 1000000 -concurrency 8 -openings file=~/ruy/swcr-fq-openings-v3.5.pgn format=pgn order=random -pgnout db.pgn &
It takes 19.8 days to complete one of these runs.
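
As a rough sanity check on these numbers, the sketch below works out the average wall-clock time per game. It assumes all 8 concurrent games were running for the whole 19.8 days, and that the 3 million games came from three runs of 1,000,000 games each (inferred from the `-games 1000000` option, not stated explicitly).

```python
SECONDS_PER_DAY = 86_400

days_per_run = 19.8
games_per_run = 1_000_000
concurrency = 8


def seconds_per_game(days: float, games: int, workers: int) -> float:
    """Average wall-clock seconds one worker spends on one game."""
    return days * SECONDS_PER_DAY * workers / games


avg = seconds_per_game(days_per_run, games_per_run, concurrency)
total_days = 3 * days_per_run  # assumed: three runs back to back
print(f"{avg:.1f} s per game, {total_days:.1f} days for three runs")
```

Under those assumptions each game takes a little under 14 seconds on average, and three runs add up to roughly the two months mentioned in the first post.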
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: 3 million games for training neural networks

Post by AlvaroBegue »

Here's the link: https://drive.google.com/drive/folders/ ... itamyJD5_k

Enjoy!
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: 3 million games for training neural networks

Post by Ferdy »

AlvaroBegue wrote:What do you mean by "ce"?
It is an opcode in the EPD standard; it stands for centipawn evaluation.

https://chessprogramming.wikispaces.com ... escription

Example.

Code:

rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w KQkq - bm Nf3; ce 15;
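
A minimal Python sketch of reading and writing such a line follows. It assumes the simple case shown above: four position fields, then `opcode operand;` pairs. Operands that are quoted strings containing semicolons are not handled, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def parse_epd(line: str) -> Tuple[str, Dict[str, str]]:
    """Split an EPD line into (4-field position, {opcode: operand})."""
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    operations: Dict[str, str] = {}
    if len(fields) > 4:
        for op in fields[4].split(";"):
            op = op.strip()
            if not op:
                continue  # skip the empty piece after the final semicolon
            opcode, _, operand = op.partition(" ")
            operations[opcode] = operand.strip()
    return position, operations


def format_epd(position: str, ce: int) -> str:
    """Attach a centipawn evaluation to a 4-field position."""
    return f"{position} ce {ce};"


epd = "rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w KQkq - bm Nf3; ce 15;"
position, operations = parse_epd(epd)
print(position)
print(operations)
print(format_epd(position, int(operations["ce"])))
```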
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: 3 million games for training neural networks

Post by Ferdy »

AlvaroBegue wrote:Here's the link: https://drive.google.com/drive/folders/ ... itamyJD5_k

Enjoy!
Thanks.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: 3 million games for training neural networks

Post by Joost Buijs »

AlvaroBegue wrote:Here's the link: https://drive.google.com/drive/folders/ ... itamyJD5_k

Enjoy!
Thanks for making these games available!

At the moment I'm not really into NNs, but I'm curious to see whether these games will make a difference compared with the self-play games I use to tune the evaluation function.