Instead of wasting so much electricity, why don't you ask Demis Hassabis for insights?

crem wrote: ↑Sun Nov 25, 2018 11:21 pm

1. So far in our tests we fail to reach A0 strength given the same number of training games (44 million).
2. We don't know why. Maybe we have a bug (probably), maybe we use the wrong FPU, maybe we guessed Cpuct wrong, maybe we understood the paper incorrectly, maybe we don't shuffle training games well enough, maybe we release new networks too rarely, maybe something else.
3. I agree that the best (or rather the only) way to get consistent improvements is to run lots of small tests with different ideas.
4. Currently the infrastructure for such tests doesn't exist (it's been discussed for 6 months already, but it's constantly preempted by more urgent tasks [rushing a release for some CCCC/TCEC season, changing Lc0 so that new features can be added in a more elegant way, or implementing some new Lc0 feature myself because it's more fun]).
5. Without an easy way of testing in place, setting up and running a fresh test is a cumbersome task. Especially if it requires engine changes; currently that takes weeks to roll out. The server-side part is not a one-click thing either: it requires some hours of wiring up training scripts, transferring data, typing some SQL, making sure that clients don't keep sending training data from the old test after a restart, etc.
6. Often things are not changed simply because the changes needed for a new idea haven't been implemented yet. Or sometimes it's because all the devs are too busy with their non-Lc0 lives for a week or two, etc.
7. Yes, the current use of contributors' GPUs is not optimal. But to make it more optimal, things have to be implemented, and the devs just cannot keep up.
8. The current idea (as I perceive it) is: "We'll do testing properly (many small-scale experiments that anyone can submit, with statistically sound conclusions) once we have a framework. Until that's ready, let's run full-size tests with intuitively guessed params/ideas and hope the result will be stronger than everything we had before."
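For readers wondering what item 2's "FPU" and "Cpuct" actually are: they are the two knobs in the PUCT child-selection rule from the AlphaZero paper. The sketch below is purely illustrative (the constants and data layout are my assumptions, not Lc0's actual code), but it shows where each parameter enters: Cpuct scales the exploration bonus, and FPU ("first play urgency") is the value assumed for moves that have never been visited.

```python
import math

CPUCT = 1.25          # assumed value; A0's exact setting was a long-running debate
FPU_REDUCTION = 0.25  # assumed: penalty applied to unvisited children's value

def select_child(parent_q, children):
    """Pick the child maximizing Q + U.

    children: list of dicts with keys 'prior', 'visits', 'value_sum'.
    """
    total_visits = sum(c["visits"] for c in children)
    sqrt_total = math.sqrt(total_visits + 1)
    best, best_score = None, -float("inf")
    for c in children:
        if c["visits"] == 0:
            # FPU: unexplored moves get the parent's value minus a reduction
            q = parent_q - FPU_REDUCTION
        else:
            q = c["value_sum"] / c["visits"]
        # Exploration term: Cpuct scales how strongly the prior pulls search
        u = CPUCT * c["prior"] * sqrt_total / (1 + c["visits"])
        if q + u > best_score:
            best, best_score = c, q + u
    return best
```

Small changes to either constant shift the explore/exploit balance of the whole search, which is why getting them wrong could plausibly cost a lot of strength.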
So yes, we fail to reach A0 level; yes, we should run well-designed experiments; yes, we should have done lots of them; and yes, they should be small and frequent instead of rare and large (and largely based on intuitive guesses rather than some scientifically sound method). But there's really no infrastructure and very little dev time to implement that infrastructure. And even doing it manually, starting a new small test every week is too time-consuming.
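The "small, frequent, statistically sound" testing that the post calls for is usually done in engine development with a sequential probability ratio test (SPRT), which stops each experiment as soon as the evidence is conclusive. A minimal sketch, assuming the BayesElo-style win/draw/loss model used by common testing tools (the Elo bounds and draw parameter here are illustrative, not anything Lc0 runs):

```python
import math

def wdl_probs(elo, drawelo=200.0):
    """Win/loss/draw probabilities for a given Elo edge (BayesElo model)."""
    p_win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss

def sprt_llr(wins, losses, draws, elo0=0.0, elo1=5.0):
    """Log-likelihood ratio of H1 (change is +elo1) vs H0 (change is +elo0)."""
    w0, l0, d0 = wdl_probs(elo0)
    w1, l1, d1 = wdl_probs(elo1)
    return (wins * math.log(w1 / w0)
            + losses * math.log(l1 / l0)
            + draws * math.log(d1 / d0))

def sprt_decision(llr, alpha=0.05, beta=0.05):
    """Accept H1, reject (accept H0), or keep playing games."""
    upper = math.log((1.0 - beta) / alpha)   # ~2.94 for 5%/5%
    lower = math.log(beta / (1.0 - alpha))
    if llr >= upper:
        return "accept"
    if llr <= lower:
        return "reject"
    return "continue"
```

The appeal for a volunteer-GPU project is that clearly bad or clearly good ideas terminate after relatively few games, so many small experiments can share the same pool of contributed compute.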
I totally agree that if a team of 2-3 full-time developers appeared, they would leave the LCZero project behind within one month. I don't know what to do with that knowledge though.
PS. As for "More resources were used than in DeepMind A0 project, not being at all near A0 level strength with 20xxx and 30xxx nets": I hope you mean one run of DM vs one run of Lc0. In total resources (including trial and testing), I'm sure DeepMind used hundreds if not thousands of times more than we have so far.
He may not give you the exact secret sauce for everything, but he can at least clarify what you've assumed from his paper (where it's unclear), perhaps tell you what parameters he used, or at least suggest ideas on how to estimate such parameters.
A simple email could save the planet a few GWh. Think of the polar bears, man…