Thanks for the replies. I have been trying to study up a bit on calculus and have read some docs/examples on derivatives and gradient descent.
Alvaro wrote: "Compute the derivative of the loss function (the thing you are trying to minimize) with respect to each of the parameters."
Here's where I am lost.
First, just to be clear, isn't my loss function the following (the "sum of the squares", averaged over the N positions):
$$E = \frac{1}{N}\sum_{i=1}^{N}\bigl(R_i - \mathrm{Sigmoid}(Q_i)\bigr)^2$$
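A minimal Python sketch of this loss, assuming the Texel-style sigmoid 1/(1 + 10^(-K*Q/400)). The names `texel_sigmoid` and `mean_squared_error` and the scaling constant `k` are hypothetical, introduced only for illustration; the post does not say which sigmoid is used.

```python
from typing import Sequence


def texel_sigmoid(q: float, k: float = 1.0) -> float:
    """Map a centipawn score q to an expected result in [0, 1].

    Assumed form: 1 / (1 + 10^(-k*q/400)); k is a hypothetical scaling constant.
    """
    return 1.0 / (1.0 + 10.0 ** (-k * q / 400.0))


def mean_squared_error(results: Sequence[float], scores: Sequence[float], k: float = 1.0) -> float:
    """E = 1/N * sum_i (R_i - Sigmoid(Q_i))^2, with R_i a game result and Q_i a score."""
    if len(results) != len(scores) or not results:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for r, q in zip(results, scores):
        total += (r - texel_sigmoid(q, k)) ** 2  # squared error for one position
    return total / len(results)
```

For example, `mean_squared_error([1.0, 0.0, 0.5], [0.0, 0.0, 0.0])` is 1/6, because a score of 0 maps to an expected result of 0.5 under this sigmoid.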
If this is correct:
* All the derivative examples I covered were basic ones, like y = x^2 + x - 4, so I do not know how to calculate this equation's derivative "with respect to" anything. I need to keep studying.
* Also, since any QSearch result is a single value that depends on a bunch of parameters, how could one calculate the derivative with respect to a single parameter that is fed into the QSearch function? I believe Peter Osterlund put it as "this is a black box".
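A sketch of how the chain rule handles both points, assuming Q_i is the qsearch score of position i, R_i is its game result, and p_j is one evaluation parameter (the index j is introduced here only to keep it separate from the position index i):

$$\frac{\partial E}{\partial p_j} = \frac{2}{N}\sum_{i=1}^{N}\bigl(\mathrm{Sigmoid}(Q_i) - R_i\bigr)\,\mathrm{Sigmoid}'(Q_i)\,\frac{\partial Q_i}{\partial p_j}$$

If, as an assumption, Sigmoid has the Texel form $\mathrm{Sigmoid}(q) = 1/(1 + 10^{-Kq/400})$ with a scaling constant K, then

$$\mathrm{Sigmoid}'(q) = \frac{K\ln 10}{400}\,\mathrm{Sigmoid}(q)\bigl(1 - \mathrm{Sigmoid}(q)\bigr)$$

Only the last factor, $\partial Q_i/\partial p_j$, involves the qsearch "black box". It can be estimated numerically as $(Q_i(p_j + h) - Q_i(p_j))/h$ by rerunning qsearch. If the evaluation is assumed to be a linear sum of parameters times features, and the small change does not alter which leaf position qsearch returns, this factor is simply that feature's value at the leaf.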
I looked at Arasan's code, and it appears that "computeTexelDeriv" is where the dT is actually calculated. Unfortunately, I will need to study this code a lot longer to understand what is going on.
Because of this, I looked for additional answers and read that Peter wrote something like this (going from memory here):
$$\frac{dE}{dp_j} \approx E(p_j + 1) - E(p_j)$$
This is the direction I have been thinking in and have started to test. Basically, with all parameters at their defaults, I calculate E, then adjust each parameter one at a time and compute a new E for each. This dE/dP is more of a delta, or change in E. It tells me in which direction to increase or decrease that parameter, but not the rate or pace at which to make the change. Once I have collected dE/dP for every parameter, I could start to move all the parameters simultaneously, but I do not know the rate at which each parameter needs to change.
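A minimal Python sketch of the procedure described above: a forward-difference estimate of each dE/dp_j (step 1, as in the formula), followed by a plain gradient descent update in which a hypothetical `learning_rate` supplies the missing "rate". The function names and the toy quadratic loss are assumptions for illustration, not Peter's or Arasan's code; in a real tuner, `loss` would rerun the qsearch-based E over all positions.

```python
from typing import Callable, List, Sequence

Loss = Callable[[Sequence[float]], float]


def finite_difference_gradient(loss: Loss, params: Sequence[float], step: float = 1.0) -> List[float]:
    """Estimate dE/dp_j as (E(p_j + step) - E(p_j)) / step for every j,
    holding all other parameters fixed (a forward difference)."""
    base = loss(params)
    grad = []
    for j in range(len(params)):
        bumped = list(params)
        bumped[j] += step  # nudge only parameter j
        grad.append((loss(bumped) - base) / step)
    return grad


def gradient_descent_step(loss: Loss, params: Sequence[float], learning_rate: float,
                          step: float = 1.0) -> List[float]:
    """Move every parameter at once, against its gradient component.
    Each move is learning_rate * gradient, so steeper directions move further."""
    grad = finite_difference_gradient(loss, params, step)
    return [p - learning_rate * g for p, g in zip(params, grad)]


def tune(loss: Loss, params: Sequence[float], learning_rate: float, iterations: int,
         step: float = 1.0) -> List[float]:
    """Repeat the descent step a fixed number of times (a hypothetical stopping rule)."""
    current = list(params)
    for _ in range(iterations):
        current = gradient_descent_step(loss, current, learning_rate, step)
    return current


def toy_loss(p: Sequence[float]) -> float:
    """Toy stand-in for E with its minimum at p0 = 3, p1 = -2."""
    return (p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2


if __name__ == "__main__":
    print(finite_difference_gradient(toy_loss, [0.0, 0.0]))  # [-5.0, 5.0]
    print(tune(toy_loss, [0.0, 0.0], learning_rate=0.1, iterations=200))
```

Two trade-offs are visible here. With step 1 the estimate is biased: on the toy loss the descent settles near 2.5 and -2.5 rather than 3 and -2, while a smaller step such as 1e-4 lands much closer. And each gradient costs one extra full evaluation of E per parameter.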
Is this line of thinking OK or valid? Maybe this is where and why I need to understand the derivative and its purpose?
Thanks again