hgm wrote:
If one, however, tries several approaches and elaborates on the one that works best, it is extremely unlikely to fail.
...
Not necessarily. Sometimes a match of 10 games is enough to show with > 95% confidence which engine is best. Of course we are not interested in finding something that only works marginally better. We are interested in finding something that works orders of magnitude better. Like achieving with 3 residual blocks and 64 filters what otherwise would take 20 residual blocks and 256 filters.
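He's right about the statistics, by the way. A minimal sketch in Python - the 9-1 score and the no-draws assumption are mine, purely for illustration:

```python
# How decisive can a 10-game match be?
# Null hypothesis: both engines are equally strong (p = 0.5); draws ignored.
from scipy.stats import binomtest

result = binomtest(k=9, n=10, p=0.5, alternative="greater")
print(f"One-sided p-value for 9 wins out of 10: {result.pvalue:.4f}")
# ~0.0107, i.e. well over 95% confidence that the winner is stronger
```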
But there's the rub.
1. Trying a different approach takes just as long to validate as the first. Even a single million-game run can take up to a week to run and process (see the back-of-envelope sketch after this list). And what do you know at the end? Only how that approach performed over its first million games of training. It may have reached that level faster or slower than the first approach, but that tells you nothing about where it ultimately plateaus. So really, you have to take every approach extremely far before you can tell whether it is better or worse than another.
2. What idea do you have that is likely to work orders of magnitude better or faster? For two orders of magnitude you could afford 100 failed attempts and still come out ahead, since the one approach that works would repay all that searching. But what if none of your attempts turns out to be orders of magnitude faster? Then you are a year further on with absolutely nothing to show for it, because you never pursued any single approach to the end.
3. There's nothing preventing you or anyone else from trying a better approach. Complaining that one team decided to take approach A instead of first testing B, C, D, and E is a bit like complaining that Stockfish spends its time on evaluation and extension tweaks instead of starting a research group to find the grand unifying theory of chess, which would make all chess engines obsolete.
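As promised under point 1, the back-of-envelope: the one-week figure is from this post, the rest is plain arithmetic, and the assumed pipeline (self-play plus processing) is hypothetical.

```python
# What "a million games in up to a week" implies in sustained throughput.
games = 1_000_000
seconds_per_week = 7 * 24 * 3600  # 604,800 s
print(f"{games / seconds_per_week:.2f} games/s")  # ~1.65 games/s, around the clock
```

That is games finished every second, day and night, before you have even one data point on a new approach.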
I may think that the fastest way to train a deep net is to train on random 3-piece positions and gradually increase the piece count. Great. But if I want to do that, it's on me - I shouldn't expect other people to drop what they are doing just to prove that it doesn't work nearly as well.
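For concreteness, here is roughly what I mean by random 3-piece positions - a toy sampler written with the python-chess library. The piece distribution and the rejection scheme are my own assumptions for illustration, not a vetted training recipe:

```python
import random
import chess  # the python-chess library

def random_position(num_pieces: int) -> chess.Board:
    """Sample a random legal position with num_pieces pieces, kings included."""
    assert num_pieces >= 2  # both kings are always on the board
    while True:
        board = chess.Board(None)  # empty board, no castling rights
        squares = random.sample(chess.SQUARES, num_pieces)
        # The two kings first, one per side.
        board.set_piece_at(squares[0], chess.Piece(chess.KING, chess.WHITE))
        board.set_piece_at(squares[1], chess.Piece(chess.KING, chess.BLACK))
        # Fill the rest with random non-king pieces of random color.
        for sq in squares[2:]:
            piece_type = random.choice(
                [chess.PAWN, chess.KNIGHT, chess.BISHOP, chess.ROOK, chess.QUEEN]
            )
            board.set_piece_at(sq, chess.Piece(piece_type, random.choice(chess.COLORS)))
        board.turn = random.choice(chess.COLORS)
        # Reject illegal setups: adjacent kings, pawns on back ranks, etc.
        if board.is_valid():
            return board

# Start the curriculum at 3 pieces and raise the count as the net converges.
print(random_position(3).fen())
```

The curriculum itself would just raise num_pieces on some schedule - and how fast to raise it is exactly the kind of question nobody else is obliged to answer for me.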