
Leela data publicly available for use

Posted: Tue Jun 15, 2021 10:04 pm
by Madeleine Birchfield
Recently the Leela development team put out the following blog post:

https://lczero.org/blog/2021/06/the-imp ... open-data/
2021-06-14

The importance of open data

In the Leela Chess project, we generate a huge amount of data. We use them to generate the network files to use with Lc0 for further data generation, but also with other chess engines, like Ceres. The same data are often used by individual project contributors to generate additional network files using the “supervised learning” approach.

Our intention has always been for “our” data to be open and available to everyone to use. To that end, we adopted an open license to allow their wide use:
This collection of training data for Leela Chess Zero is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/
Therefore we are very pleased that Stockfish, starting from today, is using an NNUE network file trained on the same data.

Both projects have mentioned before that our “teams will join forces to demonstrate our commitment to open source chess engines and training tools, and open data.” This is the first concrete result stemming from this effort, and we promise it won’t be the last.
This might be of great use for training networks and tuning evaluations.

Re: Leela data publicly available for use

Posted: Wed Jun 16, 2021 12:27 am
by noobpwnftw
It always has been. Efforts were made to keep those data available for everyone; the only problem is that a few people don't seem willing to pay any tribute, or even care enough to spell people's last names right.

Re: Leela data publicly available for use

Posted: Wed Jun 16, 2021 10:00 am
by Andrew
There have already been two versions on Abrok using this, which is nice to see!

Andrew

Re: Leela data publicly available for use

Posted: Wed Jun 16, 2021 8:13 pm
by Ozymandias
So they already did the equivalent of what AS did for FF2? Looks already stronger and it can only get better. Time for a SF14 that will retake ALL the first spots in the rating lists.

Re: Leela data publicly available for use

Posted: Thu Jun 24, 2021 7:17 pm
by dkappe
So I started training Night Nurse from data generated by Bad Gyal 8 (then 9) as an exercise to see what kind of NNUE net an MCTS/NN engine would spawn. This data was generated over UCI from a set of random openings and also a very large set of human openings evaluated within ±200 cp.

After generating a large amount of this data, I thought about converting all the “free” Bad Gyal self-play training data I had sitting about. A few lines of python later and I had maybe 250m positions I could add to my existing 300m positions. Instant elo boost, right?

Nope. They added maybe 10 Elo. A net trained on just the training data was maybe 80 Elo weaker than an NNUE net trained on the non-training data.

Now I had found a sweet spot of lambda = 0.7 for the Bad Gyal data. Moving to lambda 1.0 reduced the difference to 20 elo, but didn’t wipe it out, and the resulting nets were weaker than the 0.7 nets. My hypothesis is that the use of temperature makes the data perform worse when used with lambda < 1.0.
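
For readers unfamiliar with the lambda parameter, here is a minimal sketch of how it is commonly used in NNUE training, assuming the usual convention that lambda = 1.0 trains purely on the search score and lambda = 0.0 purely on the game result. The function name `blended_target` and the `scale` constant are hypothetical, chosen for illustration; this is not the exact code of any particular trainer.

```python
import math


def blended_target(score_cp, game_result, lam, scale=400.0):
    """Blend a search score with the game outcome into one training target.

    score_cp:    search evaluation in centipawns, side to move's view.
    game_result: 1.0 win, 0.5 draw, 0.0 loss, same point of view.
    lam:         1.0 uses the search score only, 0.0 the game result only.
    scale:       hypothetical centipawn-to-probability scaling constant.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    # Map the centipawn score to an expected score in (0, 1).
    expected = 1.0 / (1.0 + math.exp(-score_cp / scale))
    # Linear interpolation between the search's opinion and the outcome.
    return lam * expected + (1.0 - lam) * game_result
```

Under this convention, any lambda below 1.0 mixes in the game result. That fits the hypothesis above: self-play games played with temperature contain deliberately randomized moves, so their outcomes are a noisier label for each position.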

There’s quite a bit to critique about my experiment — data that doesn’t exactly match the source nets, etc. — but the difference was big enough that I stopped using training data as a source.

Re: Leela data publicly available for use

Posted: Sun Jun 27, 2021 10:24 pm
by Wilson
And what about the "NNUE bandwagon"?

https://lczero.org/blog/2021/04/jumping ... bandwagon/

Re: Leela data publicly available for use

Posted: Mon Jun 28, 2021 12:07 am
by Madeleine Birchfield
Wilson wrote: Sun Jun 27, 2021 10:24 pm And what about the "NNUE bandwagon"?

https://lczero.org/blog/2021/04/jumping ... bandwagon/
That particular blog post about jumping on the NNUE bandwagon was written on April Fools' Day.