Aaron Becker wrote: I can't speak for the authors, but I think they're trying to prove that their technique is superior to the earlier TD techniques that they refer to as TD-Leaf and TD-Root, and I think they make a reasonable case. You say that you could use any learning technique to improve on random data, but those earlier TD techniques don't work well without a reasonable starting vector. Obviously improving a real top engine would be a better result, but rewriting the evaluation of an engine like Stockfish to be expressed as a weighted feature vector would be a lot of work on its own, and improving values that have been painstakingly hand-tuned is an awfully high bar to clear for a new technique.

That's pretty much spot on.
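For readers unfamiliar with the "weighted feature vector" formulation mentioned in the quote: the idea is that the static evaluation is a dot product between a learned weight vector and a vector of position features. A minimal sketch (the feature names below are invented placeholders, not any engine's actual feature set):

```python
# Minimal sketch of a linear evaluation function: a dot product between
# a weight vector and a vector of hand-crafted position features.
# The features (material balance, mobility, king safety) are illustrative
# placeholders, not the feature set of any particular engine.

def extract_features(position):
    """Map a position to a feature vector (here: a plain list of floats)."""
    return [
        position["material_balance"],   # e.g. pawn-units of material up/down
        position["mobility"],           # e.g. legal-move count difference
        position["king_safety"],        # e.g. some king-shelter measure
    ]

def evaluate(weights, position):
    """Linear evaluation: the weighted sum of the feature values."""
    features = extract_features(position)
    return sum(w * f for w, f in zip(weights, features))

weights = [1.0, 0.1, 0.5]
position = {"material_balance": 1.0, "mobility": 4.0, "king_safety": -2.0}
score = evaluate(weights, position)  # 1.0*1.0 + 0.1*4.0 + 0.5*(-2.0), i.e. ~0.4
```

With the evaluation in this form, "learning" reduces to adjusting the entries of `weights`, which is what makes the TD-style and TreeStrap techniques applicable.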
Yes, I am also curious to know whether this technique (or an adaptation of it) could help a strong engine. It is something I would personally be interested in trying in the future... that said, I think it would be crazy to invest such a huge effort in a new technique without first validating it with some self-play / random-weight experiments on a simple program.
Some more general comments (directed at the whole thread):
a) This paper looks at learning from self-play, starting from random weights... and nothing more. So please try to understand it in this context, and don't get angry when it doesn't answer _your_ question.
b) I agree that the paper could have been improved by reporting the results of multiple learning runs. If I get a chance in the future, I will address this. For what it is worth, in my private testing (with different, and progressively more sophisticated, sets of features), the TreeStrap method consistently outperformed TD-Leaf by similar margins.
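For context on what is being compared: TD-Leaf updates the evaluation toward the value of the principal-variation leaf at a later time step, while TreeStrap updates the evaluation of every searched interior node toward the minimax value backed up to that node from the same search. A toy sketch of a TreeStrap(minimax)-style weight update for a linear evaluation; the node contents are invented for illustration, where a real engine would supply them from its search tree:

```python
# Toy sketch of a TreeStrap(minimax)-style update for a linear evaluation:
# every searched node pulls the static evaluation toward the minimax value
# backed up to that node.  The example nodes below are fictional.

def evaluate(weights, features):
    """Linear evaluation: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))

def treestrap_update(weights, tree_nodes, alpha=0.01):
    """One TreeStrap step: for each searched node, move the weights in the
    direction that reduces (backed_up_value - evaluation)^2.  For a linear
    evaluation the gradient of the evaluation is just the feature vector."""
    for features, backed_up_value in tree_nodes:
        delta = backed_up_value - evaluate(weights, features)
        weights = [w + alpha * delta * f for w, f in zip(weights, features)]
    return weights

# Two fictional searched nodes: (feature vector, minimax value from search).
nodes = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.25)]
new_weights = treestrap_update([0.0, 0.0], nodes, alpha=0.1)
```

The key difference from TD-Leaf is the volume of training signal per search: one update per searched node rather than one per root position, which is one plausible reason for the margins reported above.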
c) Apologies if the paper is difficult to read. The mathematics is really not that demanding: if you can understand TD-Leaf or an ANN, then you can easily understand this work. I might post a chess-programmer-friendly version on my website if there is sufficient interest; in the meantime, please email me if you have any questions.
d) Automatic evaluation improvement of strong chess engines is an entire research topic in itself. It is different from, and more open-ended than, learning from scratch with self-play. And, I dare say, more difficult. E.g. you can try to use the information provided by strong engines in a number of different ways (build a score-based training set, try to match moves, etc.)... Or, as mentioned somewhere in this thread, you could look at methods that tune only a small number of parameters in isolation... Or you could use a cluster and some computation-heavy technique like simulated annealing (with ELO as the objective function)... Take your pick, be creative, and enjoy the age of auto-tuning.
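The last option in (d) can be sketched in a few lines. This is a standard simulated-annealing loop, not a method from the paper; `play_match` is a stand-in that here scores candidate weights against a fictional hidden optimum, whereas a real tuner would play games and return a match score or ELO estimate:

```python
import math
import random

# Toy sketch of "simulated annealing with playing strength as the objective".
# play_match() is a synthetic stand-in: it peaks at a fictional hidden
# optimum.  A real tuner would play engine matches here instead.

HIDDEN_OPTIMUM = [1.0, 3.0, 0.5]  # fictional "true" best weights

def play_match(weights):
    """Fake objective: higher is better, maximal at HIDDEN_OPTIMUM."""
    return -sum((w - o) ** 2 for w, o in zip(weights, HIDDEN_OPTIMUM))

def anneal(weights, steps=2000, temp=1.0, cooling=0.995, step_size=0.2):
    """Standard simulated annealing: propose a random perturbation, always
    accept improvements, and accept regressions with probability
    exp(delta / temperature), so early exploration can escape local optima."""
    score = play_match(weights)
    for _ in range(steps):
        candidate = [w + random.uniform(-step_size, step_size) for w in weights]
        cand_score = play_match(candidate)
        delta = cand_score - score
        if delta > 0 or random.random() < math.exp(delta / temp):
            weights, score = candidate, cand_score
        temp *= cooling  # gradually become greedier
    return weights, score

random.seed(0)
tuned, final_score = anneal([0.0, 0.0, 0.0])
```

The obvious practical catch, and why a cluster gets mentioned, is that each objective evaluation costs a full match of games rather than a cheap function call.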