3 million games for training neural networks

zenpawn
Posts: 294
Joined: Sat Aug 06, 2016 6:31 pm
Location: United States

Re: 3 million games for training neural networks

Post by zenpawn » Sat Feb 24, 2018 11:33 pm

Likewise, I plan to use them for tuning. Thank you, Alvaro!

To everyone downloading these, don't forget to prune the time forfeits if you're using the result. Either that, or go through them by hand and decide what the results should have been. That might not be too daunting as there are only 8 of them in db_3 (haven't checked the count in the others yet).
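A minimal sketch of that pruning step for the PGN files. It assumes each game starts with an [Event tag and that time losses are marked with a Termination tag containing "time forfeit" (the value the PGN standard defines). The thread does not say how the forfeits are marked in these files, so check a few games by hand first. The function names are made up for illustration.

```python
import re

def split_pgn_games(text):
    """Split PGN text into per-game chunks, each starting at an [Event tag."""
    chunks = re.split(r"(?m)^(?=\[Event )", text)
    return [c for c in chunks if c.strip()]

def is_time_forfeit(game):
    """True if the game's Termination tag mentions a time forfeit.

    Assumption: the files carry a Termination tag at all.
    """
    m = re.search(r'(?m)^\[Termination\s+"([^"]*)"\]', game)
    return bool(m) and "time forfeit" in m.group(1).lower()

def prune_time_forfeits(text):
    """Return (kept_games, number_dropped)."""
    games = split_pgn_games(text)
    kept = [g for g in games if not is_time_forfeit(g)]
    return kept, len(games) - len(kept)
```

Printing the dropped count per file is an easy sanity check against the numbers reported in this thread.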

AlvaroBegue
Posts: 919
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: 3 million games for training neural networks

Post by AlvaroBegue » Sun Feb 25, 2018 2:46 am

zenpawn wrote:To everyone downloading these, don't forget to prune the time forfeits if you're using the result. Either that, or go through them by hand and decide what the results should have been. That might not be too daunting as there are only 8 of them in db_3 (haven't checked the count in the others yet).
That's a very good point. There are 2 on db_2 and none in db_1. I guess the computer had some issues in the third run (swapping?).

They should probably just be removed. Or you can ignore the problem since the pollution is minimal.

AlvaroBegue
Posts: 919
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: 3 million games for training neural networks

Post by AlvaroBegue » Sun Feb 25, 2018 3:10 am

AlvaroBegue wrote:That's a very good point. There are 2 on db_2 and none in db_1. I guess the computer had some issues in the third run (swapping?).
Sorry, only 1 on db_2. I don't know what I did earlier.

Daniel Shawul
Posts: 3749
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia

Re: 3 million games for training neural networks

Post by Daniel Shawul » Thu Oct 11, 2018 10:04 pm

AlvaroBegue wrote:
Sat Feb 24, 2018 3:08 am
Hi,

I had my 8-core Ryzen 7 computer spend about 2 months generating quick Stockfish-vs-Stockfish games. The result is 3 million games that I think could be used for training neural networks, or to tune evaluation functions.

I have the games available in PGN format with comments that include the score and search depth, or in a much more compact and easy to parse format: One game per line, consisting of moves in UCI notation followed by the result at the end.

d2d4 g8f6 g1f3 g7g6 b1c3 f8g7 e2e4 d7d6 c1e3 c7c6 d1d2 b8d7 a2a4 e8g8 e3h6 e7e5 e1c1 d8a5 d2g5 g7h6 g5h6 d7b6 d1d3 c8e6 f3g5 b6d7 f1e2 b7b5 d4e5 d7e5 d3g3 b5b4 c3b1 b4b3 c2b3 c6c5 b1d2 c5c4 g5e6 f7e6 b3c4 a8b8 h6h3 b8b2 h3e6 g8g7 d2b3 b2b3 g3b3 f6e4 e6d5 e4c5 b3b5 a5c3 c1b1 f8f2 d5d6 f2e2 d6e7 g7h6 e7h4 h6g7 h4e7 g7h6 0-1
I used openings from swcr-fq-openings-v3.5.pgn, whose lines have a fixed length of 8 moves for each player.

At least one person in this forum has expressed an interest in the data. I will make it available soon, but I wanted to ask this: Would you be interested in the PGN version of the games, or is the simple format enough?
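For reference, the compact format described in the quoted post can be read with a few lines of Python. This is only a sketch: the sample shows a "0-1" result, and the "1-0" and "1/2-1/2" tokens are assumed to follow the usual PGN convention. The names parse_game_line, RESULTS and BOOK_PLIES are made up here.

```python
import re

# A UCI move: from-square, to-square, optional promotion piece.
UCI_MOVE = re.compile(r"^[a-h][1-8][a-h][1-8][qrbn]?$")

# Result tokens mapped to White's score (tokens other than "0-1" are assumed).
RESULTS = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}

# 8 opening moves per player, per the post; useful for skipping book plies.
BOOK_PLIES = 16

def parse_game_line(line):
    """Return (moves, white_score) for one line of the compact format."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[-1] not in RESULTS:
        raise ValueError("line does not end in a recognized result")
    moves = tokens[:-1]
    for mv in moves:
        if not UCI_MOVE.match(mv):
            raise ValueError(f"not a UCI move: {mv!r}")
    return moves, RESULTS[tokens[-1]]
```

Positions before BOOK_PLIES come from the opening book rather than from Stockfish's own play, so a training pipeline may want to treat them separately.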
Alvaro, did you have any success training neural networks with these games?
I have had no success using CCRL/CEGT 40/40 games for training. The accuracy is about 50%, and the resulting network is unable to beat one trained from quiet.epd+your_epd_set (2M positions in total). I have even trained with ~1 billion positions extracted from various PGN databases, using the game outcome, but I still get a weaker network than the one trained on 2M positions.
I guess what I am seeing is that quality is more important than quantity.
Maybe your games are of better quality since they are all Stockfish games (though at a fast time control), so I will try.

Daniel
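To make "using the game outcome" concrete, here is one common way to label positions from a game. It is a sketch only, since the post does not say which labeling was used. Every position in the game gets the final result as its target, flipped to the point of view of the side to move. The function name and the skip_plies parameter are illustrative.

```python
def outcome_targets(num_plies, white_score, skip_plies=0):
    """Label each position of a game with the final result.

    ply is the number of moves already played, so White is to move
    on even plies. white_score is 1.0, 0.5 or 0.0 from White's side.
    skip_plies drops early positions (for example opening-book moves).
    """
    targets = []
    for ply in range(skip_plies, num_plies):
        white_to_move = ply % 2 == 0
        target = white_score if white_to_move else 1.0 - white_score
        targets.append((ply, target))
    return targets
```

Every position inherits the same label no matter where the game was actually decided, which makes these targets noisy compared with a per-position score.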

AlvaroBegue
Posts: 919
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: 3 million games for training neural networks

Post by AlvaroBegue » Thu Oct 11, 2018 11:53 pm

Daniel,

No, life caught up with me and I haven't done much NN training or chess development in general in many months. If you end up using these games for training, I would love to know how it went.

Álvaro.

Dann Corbit
Posts: 9986
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: 3 million games for training neural networks

Post by Dann Corbit » Fri Oct 12, 2018 12:28 am

Perhaps collections like high level TCEC games would be useful for training. They have scores attached to each position.
That should give very high quality answers for the actual value of a position.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Daniel Shawul
Posts: 3749
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia

Re: 3 million games for training neural networks

Post by Daniel Shawul » Fri Oct 12, 2018 3:41 pm

Dann Corbit wrote:
Fri Oct 12, 2018 12:28 am

Perhaps collections like high level TCEC games would be useful for training. They have scores attached to each position.
That should give very high quality answers for the actual value of a position.
Yes, I think that is what I need. Training from PGN games using the result tag just doesn't seem to work for me for some reason.
I just finished training with Alvaro's 3 million games. While the resulting network plays good chess, it still loses to the network trained from 2M positions.

Daniel

AlvaroBegue
Posts: 919
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: 3 million games for training neural networks

Post by AlvaroBegue » Fri Oct 12, 2018 7:16 pm

If you do use games where White and Black are different engines (TCEC or CCRL), it would be interesting to inform the network of the Elo difference between the players (assuming Elo estimates are available). That input can then be used to implement something like contempt factor in a more principled way.
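A sketch of what such an input could look like. The expected-score expression is the standard Elo formula. Dividing the difference by 400 to get a network input is just one possible normalization, and the function name is made up.

```python
def elo_features(white_elo, black_elo, white_to_move, scale=400.0):
    """Return (normalized Elo difference, Elo expected score),
    both from the point of view of the side to move.

    Using diff / scale as the network input is an assumption;
    any bounded, roughly unit-sized encoding would do.
    """
    diff = white_elo - black_elo
    if not white_to_move:
        diff = -diff
    expected = 1.0 / (1.0 + 10.0 ** (-diff / scale))
    return diff / scale, expected
```

At play time the same input can be set by hand: a positive value tells the network it is the stronger side, which is where the contempt-like behaviour would come from.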

Ratosh
Posts: 71
Joined: Mon Apr 16, 2018 4:56 pm

Re: 3 million games for training neural networks

Post by Ratosh » Fri Oct 12, 2018 7:38 pm

Are those 2M positions available?

Daniel Shawul
Posts: 3749
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia

Re: 3 million games for training neural networks

Post by Daniel Shawul » Fri Oct 12, 2018 9:19 pm

Ratosh wrote:
Fri Oct 12, 2018 7:38 pm
Are those 2M positions available?
Those are 770k positions from quiet_labeled.epd (zurichess) and 1.x million positions from Alvaro that everybody here has.
I don't know why I have not been able to beat those networks with PGN-trained networks so far...
I just tried Cheng's 12 million positions mentioned in this thread, with the same outcome. But John's big3.epd + lichess.epd did help improve my network.

Daniel
