41 million EPDs with evals for tuning purposes

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

chrisw
Posts: 4317
Joined: Tue Apr 03, 2012 4:28 pm

41 million EPDs with evals for tuning purposes

Post by chrisw »

Ed Schroeder is kindly hosting.

http://rebel13.nl/download/data.html#one

From the README:

Data files for Texel Tuning (part II)

Chris Whittington contributed 41 million EPD positions analysed by Stockfish 11 at 25ms for Texel Tuning. In his own words:

Files contain 41 million EPDs, sampled from the LiChess PGN database of human games, on rebel13.nl

All with evaluation by Stockfish 11, search set at 25 milliseconds. Format is a 6-part FEN plus a centipawn evaluation from the point of view of the side to move. Filtered for a) legality, b) availability of more than one legal move from the position, and c) not being immediately game-terminal. The sampling rate from the PGNs was around 12%

Code:

rnb1k2r/ppppqppp/5n2/4N3/1bPP4/2N5/PP2PPPP/R1BQKB1R b KQkq - 0 1; sf11=-89.0
r1bqk2r/pp1pppbp/2n3p1/8/2P1N3/1P3N2/P2QPPPP/R1B1KB1R b KQkq - 0 1; sf11=170.0
1r1q2kr/p6p/1nB2p1B/3b1P2/3p2P1/5Q1P/PP6/R4RK1 w - - 0 1; sf11=381.0
8/k7/P5R1/1Pb5/2P2r2/3K4/8/8 b - - 0 1; sf11=0.0
rn3k1b/4p2p/bqp2np1/p3P3/1p6/3B1P2/PPPQNNP1/R3K2R b KQ - 0 1; sf11=-678.0
2r1n1k1/1p3p1p/3p2p1/1p1Pp3/1B2P3/1P2bP2/P3N1PP/1R5K w - - 0 1; sf11=-20.0
8/4kp2/7p/5K1P/BP4P1/b7/8/8 b - - 0 1; sf11=-13.0

Suitable for, and designed for, (Texel) tuning of a chess evaluation function. These are suitable for first-shot tuning and proof of concept. Chess engine programmers would probably want to develop their own test sets and their own evaluations later.
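For anyone loading the file, a minimal sketch of parsing one record, assuming exactly the layout shown in the sample above (6-field FEN, a semicolon, then an `sf11=` score; the function name is my own):

```python
def parse_epd_line(line):
    """Split one record into a 6-field FEN string and its SF11 score
    in centipawns (POV of the side to move)."""
    fen, _, tag = line.partition(";")
    key, _, value = tag.strip().partition("=")
    if key != "sf11":
        raise ValueError(f"unexpected tag: {key!r}")
    return fen.strip(), float(value)

# Example using one of the sample lines above:
fen, cp = parse_epd_line("8/k7/P5R1/1Pb5/2P2r2/3K4/8/8 b - - 0 1; sf11=0.0")
```

A real loader would stream the 41M lines rather than hold them in memory, but the per-line split is this simple.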

Release is into Public Domain. Thanks to Ed Schroeder for hosting.

Cordially,
Chris Whittington
cucumber
Posts: 144
Joined: Sun Oct 14, 2018 8:21 pm
Full name: JSmith

Re: 41 million EPDs with evals for tuning purposes

Post by cucumber »

This is awesome. I was trying to come up with a similar set myself a while ago, but failed pretty badly. This is great, though. Thanks so much for creating and sharing it, and also thanks to Ed for hosting it.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: 41 million EPDs with evals for tuning purposes

Post by Daniel Shawul »

Is there a "quiet" version of this collection? A 25ms Stockfish search is probably depth=8, so there could be many tactical moves.
This is probably good for training NNUE or shallow nets but I am trying to tune my hand-crafted eval with this.
I can't get it to converge with this test set.
I may have a bug with my tuning code even though it doesn't seem to have a problem with quiet.epd of 725k positions.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: 41 million EPDs with evals for tuning purposes

Post by Daniel Shawul »

Never mind, it turned out to be a bug in my code in the centipawn to winning-percentage conversion.
Your dataset has about 5074 mini-batches (8192 positions per batch), so the evaluation is updated only 5074 times per epoch.
I definitely need to do multiple epochs and see if I get a better evaluation than the one I trained with just 2 million quiet positions.

Also I learned that NNUE training actually filters out non-quiet positions. I thought that a neural network (even a shallow one)
would at least be good enough to learn some tactics, but that doesn't seem to be the case.
You don't need to filter for quiet positions when training deep ResNets with value/policy heads... Or maybe a small fraction
of non-quiet positions doesn't hurt anyway. The bug I had before made me think it was important, but it probably isn't.
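For reference, the centipawn to winning-percentage conversion discussed above is, in the usual Texel-tuning setup, a logistic curve. A sketch, where the scaling constant K is illustrative (engines normally fit their own K to their data):

```python
def cp_to_win_prob(cp, k=1.13):
    """Map a centipawn eval (POV side to move) to an expected score in [0, 1].

    Standard Texel-tuning logistic: 1 / (1 + 10^(-K*cp/400)).
    K = 1.13 is an illustrative value, not taken from this thread.
    """
    return 1.0 / (1.0 + 10.0 ** (-k * cp / 400.0))
```

A quick sanity check for this kind of bug: 0 cp must map to exactly 0.5, and the curve must be symmetric, i.e. `cp_to_win_prob(x) + cp_to_win_prob(-x) == 1`.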
chrisw
Posts: 4317
Joined: Tue Apr 03, 2012 4:28 pm

Re: 41 million EPDs with evals for tuning purposes

Post by chrisw »

Daniel Shawul wrote: Fri Sep 25, 2020 1:09 am Is there a "quiet" version of this collection? A 25ms stockfish search is probably depth=8 so there could be many tactical moves.
This is probably good for training NNUE or shallow nets but I am trying to tune my hand-crafted eval with this.
I can't get it to converge with this test set.
I may have a bug with my tuning code even though it doesn't seem to have a problem with quiet.epd of 725k positions.
You can quietify them yourself if you have AB source code (which, by definition of being an engine programmer, you will have). The problem with quietification is that it depends on what you call quiet. I think the best way is to use the exact same definition as your own engine, namely your qsearch(), and most engines have their own flavour of that. Run the positions through your qsearch(), get the PV (if there is one), play it out, and use the resulting position to train on. Those are the positions you ultimately call evaluate() on.
On the other hand, if you are training for a holistic net evaluation, then you would want to keep all position types; "quiet" no longer applies.
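The quietify loop described above can be sketched generically. This is not code from the thread: the `evaluate`, `gen_captures`, and `make_move` hooks are placeholders for whatever an engine's own qsearch machinery provides, and the toy demo at the bottom is made up purely to exercise the recursion:

```python
def quietify(pos, evaluate, gen_captures, make_move,
             alpha=-10**9, beta=10**9):
    """Return (score, quiet_position) by playing out a qsearch-style PV.

    Stand pat on the static eval, then try captures (negamax); the leaf
    position of the best line is the one to train evaluate() on.
    evaluate/gen_captures/make_move are engine-specific placeholder hooks.
    """
    stand_pat = evaluate(pos)
    if stand_pat >= beta:
        return stand_pat, pos          # fail-high: current position is the leaf
    alpha = max(alpha, stand_pat)
    best = (stand_pat, pos)            # doing nothing keeps this position quiet
    for move in gen_captures(pos):
        child = make_move(pos, move)
        score, leaf = quietify(child, evaluate, gen_captures, make_move,
                               -beta, -alpha)
        score = -score                 # flip to our point of view
        if score > best[0]:
            best = (score, leaf)
            alpha = max(alpha, score)
            if alpha >= beta:
                break
    return best

# Toy demo: positions are labels, evals are POV side to move,
# and "root" has two captures leading to "a" and "b".
vals = {"root": 0, "a": -50, "b": 30}
caps = {"root": ["a", "b"]}
score, leaf = quietify("root", lambda p: vals[p],
                       lambda p: caps.get(p, []),
                       lambda p, m: m)
```

Capturing into "a" leaves the opponent at -50, i.e. +50 for us, which beats standing pat, so "a" is the quiet leaf to train on.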
chrisw
Posts: 4317
Joined: Tue Apr 03, 2012 4:28 pm

Re: 41 million EPDs with evals for tuning purposes

Post by chrisw »

Daniel Shawul wrote: Fri Sep 25, 2020 4:59 am Never mind, it turned out to be a bug in my code in the centipawn to winning-percentage conversion.
Your dataset has about 5074 mini-batches (8192 positions per batch), so the evaluation is updated only 5074 times per epoch.
I definitely need to do multiple epochs and see if I get a better evaluation than the one I trained with just 2 million quiet positions.

Also I learned that NNUE training actually filters out non-quiet positions.
That makes sense, because the NNUE paradigm uses SF search, and that includes qsearch. Well, maybe they change that one day. Possibly the small size of the net precludes disentangling non-quietness. The LC0 paradigm uses way larger nets.

I thought that a neural network (even a shallow one)
would at least be good enough to learn some tactics, but that doesn't seem to be the case.
You don't need to filter for quiet positions when training deep ResNets with value/policy heads... Or maybe a small fraction
of non-quiet positions doesn't hurt anyway. The bug I had before made me think it was important, but it probably isn't.
chrisw
Posts: 4317
Joined: Tue Apr 03, 2012 4:28 pm

Re: 41 million EPDs with evals for tuning purposes

Post by chrisw »

Daniel Shawul wrote: Fri Sep 25, 2020 1:09 am Is there a "quiet" version of this collection? A 25ms stockfish search is probably depth=8 so there could be many tactical moves.
This is probably good for training NNUE or shallow nets but I am trying to tune my hand-crafted eval with this.
I can't get it to converge with this test set.
I may have a bug with my tuning code even though it doesn't seem to have a problem with quiet.epd of 725k positions.
I see from a later post that you kind of got this sorted. The 41M positions are not my best learning suites; they come from LiChess human games, and I had to dump the PGN results of those because they were just too unreliable. So they are useful only for the SF scores. One reason I put them up, apart from all the people saying they really needed training suites, was to hopefully start a trend whereby engine programmers put their training suites into the public domain for everybody to use. Generating suites is straightforward but consumes, or can consume, a lot of processing time, and can be beyond the reach of someone with, say, just a laptop to work with.
However, despite all the GPL sharing-and-caring morality, it's to be noted that this appears not to extend to test suites, and they remain "private". Just saying.
*If* there's a move to put "private" suites plus evals for "GPL" engines into the public domain, my intention is to distribute more or all of mine as well.