You're right, I forgot to mention the error functions: cross entropy vs. sum of squares.
Gerd Isenberg wrote:
Wow, so simple is that
Slowly the dust settles. Thanks for the insight, yes, a single-neuron ANN.
I don't exactly get that with the cost function requiring gradient descent to find the optimum, i.e. the error formula with least squares as used in Texel Tuning and Deep Thought Tuning and the "wild formulas" in Buro's paper?
http://www.jair.org/media/179/live-179-1475-jair.pdf
I would expect statisticians to be religious about using cross entropy when you want to interpret the output as a probability. In practice, we can just try both.
I think the practical advice that can be found on the net about ANN learning parameters is both very useful and easier to understand (e.g. the bottom of https://visualstudiomagazine.com/Articl ... spx?Page=1 though this focuses on data representation):
- if the output is continuous, use a linear output and sum of squares as error function
- if the output is binary/probability, use the logistic function + cross entropy
- if the output is discrete, use Softmax + log loss (not sure about the official name)
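A minimal Python sketch of the three pairings listed above, using only the standard library. The function names and sample values are made up for illustration; they do not come from any particular engine or library, and each function scores a single training sample.

```python
import math

def logistic(x):
    """Logistic (sigmoid) function, maps a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Softmax over a list of raw outputs; subtracting the max avoids overflow."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def squared_error(output, target):
    """Continuous target: linear output + squared error."""
    return (output - target) ** 2

def binary_cross_entropy(raw, target):
    """Binary/probability target in [0, 1]: logistic output + cross entropy."""
    p = logistic(raw)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def log_loss(raws, target_index):
    """Discrete target: Softmax output + negative log of the true class."""
    return -math.log(softmax(raws)[target_index])

print(squared_error(0.3, 0.5))
print(binary_cross_entropy(0.0, 1.0))  # logistic(0) = 0.5, so this is log(2)
print(log_loss([1.0, 1.0, 1.0], 2))    # uniform over 3 classes, so log(3)
```

Summing any of these over the training set gives the total error to minimise.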
The binary case is just an optimisation of Softmax with two outputs (since q = 1 - p), but IMO it complicates the formulas.
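To make the q = 1 - p remark concrete, here is a small numerical check (a sketch, with a hypothetical helper `softmax2` and arbitrary test values): with two raw outputs a and b, Softmax only depends on the difference a - b, so a single logistic output carries the same information.

```python
import math

def logistic(x):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax2(a, b):
    """Softmax restricted to two raw outputs a and b."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)

for a, b in [(0.0, 0.0), (2.0, -1.0), (-3.5, 0.25)]:
    p, q = softmax2(a, b)
    # Two-output Softmax collapses to one logistic output on a - b.
    assert math.isclose(p, logistic(a - b))
    # The second probability is redundant.
    assert math.isclose(q, 1.0 - p)
```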
That was for choosing an error function though, which might not be what your question was about.
Fabien.