Re: Gradient Descent Introduction
Posted: Sun Dec 09, 2018 3:43 pm
Let's start very simple, and say each training example has an input x (let's say material balance - single number) and output y^ (result).
Desperado wrote: ↑Sun Dec 09, 2018 3:32 pm
Hello everybody,
i am interested in a simple gradient descent implementation. Unfortunately i am not able to put the puzzle pieces together.
Here is what i think that i understand and what i can do for now:
base model:
1. i have a sample set of positions including results
2. i have a parameter list with N elements.
3. i have a cost function (MSE)
a. to minimize the cost function, which is a squared function, i need the derivative which leads to a linear model y=mx+b.
b. solved this, i can tune the parameter the way, that y=mx+b gets close to 0.
example:
1. SAMPLESIZE 10000
2. PARAMETERLIST 5
3. MSE = sum((result - computed_value)^2) / SAMPLESIZE
How do i have to iterate over my parameterlist and the samples to compute m,b ?
Do i have to compute m,b for each single sample ? m,b for the batch ? how do i get m,b for the batch ?
I read some articles on the web, but i am interested in the dialogue and the practice how to handle it in the context of chess parameter tuning.
So, i think i got the idea but need to know how to do it.
Thanks a lot in advance...
Code:
dL(x)/dw = dL(x)/dy(x) * dy(x)/dw    # chain rule
         = (y(x) - y^) * x
dL(x)/db = (y(x) - y^)               # dy(x)/db = 1
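A minimal sketch of how these two derivatives could drive a training loop for the single-feature model y(x) = wx + b. It assumes the per-example loss is L(x) = 0.5 * (y(x) - y^)^2, which is the loss whose derivative with respect to y(x) is (y(x) - y^). The function names, learning rate, step count and toy data are illustrative assumptions, not something from the thread. The gradient is computed per sample and then averaged over the batch, so there is one (w, b) update per pass over the samples:

```python
def gradients(w, b, xs, targets):
    """Average dL/dw and dL/db over a batch for the model y(x) = w*x + b."""
    n = len(xs)
    dw = 0.0
    db = 0.0
    for x, y_hat in zip(xs, targets):
        err = (w * x + b) - y_hat  # y(x) - y^
        dw += err * x              # dL(x)/dw = (y(x) - y^) * x
        db += err                  # dL(x)/db = (y(x) - y^)
    return dw / n, db / n


def train(xs, targets, lr=0.1, steps=2000):
    """Plain batch gradient descent; lr and steps are arbitrary choices."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradients(w, b, xs, targets)
        w -= lr * dw  # step against the gradient
        b -= lr * db
    return w, b


# Toy data generated from y^ = 2x + 1 (made up for illustration).
xs = [-1.0, 0.0, 1.0, 2.0]
targets = [2.0 * x + 1.0 for x in xs]
w, b = train(xs, targets)
```

On this toy data w and b should end up close to the 2 and 1 used to generate it. Updating after every single sample instead of after the whole batch would be stochastic gradient descent; the per-sample formulas stay the same.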
Already at this point my confusion starts, sounds funny, i know...
matthewlai wrote: ↑Sun Dec 09, 2018 3:57 pm
Let's start very simple, and say each training example has an input x (let's say material balance - single number) and output y^ (result).
Code:
x = feature-parameter-list
e(x) = evaluation using x
y^ = output of e(x)
y(x) = result given by sample
err = y(x) - y^
Maybe you will catch me before i can catch myself...
And let's say you use a very simple model y(x) = wx + b. That is, you think y is linear in x. w and b would be your parameters. Note that here y(x) is the output of the model given x, and you want it to predict y^ (the actual result).
If you want to work with multiple features already, that's fine too, but single feature is easier and I just thought it would be best to get the single feature case sorted out first.
Desperado wrote: ↑Mon Dec 10, 2018 9:21 pm
Thanks both of you so far. Unfortunately my time is very limited, so there might be a delay for further posts.
Now, first, i would like to follow Matthew's description and i would appreciate if you can guide me through.
Already at this point my confusion starts, sounds funny, i know...
matthewlai wrote: ↑Sun Dec 09, 2018 3:57 pm
Let's start very simple, and say each training example has an input x (let's say material balance - single number) and output y^ (result).
1.=> e(x) = y^
For me x are my feature parameters (say material p,n,b,r,q ->1,3,3,5,9) used by my evaluation function e(), that produces an output y^
2. error = given result in sample - evaluation output = y(x) - y^
That is how i get the error (and finally sum all squared errors of the sample set).
But what you are saying is, that x is based on one feature. (eg material delta, put into a number). That only would go hand in hand
if my evaluation would be based on one feature (eg material balance). I simply do not have an x0,x1,...xn this way.
Code:
x = feature-parameter-list
e(x) = evaluation using x
y^ = output of e(x)
y(x) = result given by sample
err = y(x) - y^
Based on my explanation above, y(x) is constant as x is constant too (my feature parameter list). That is why i would have only one x.
If i use the way you described, would i need to break the features/parameters into separate x0,x1,...,xn input values ?
like..
delta material -> 200 = x0
delta mobility -> 40 = x1
delta passer -> 75 = x2
and so on...
Thanks a lot
PS: i guess it does not make sense to go on with your description as long as this is not clarified. Maybe one additional note. The "parameters" m,b
are not the "parameters" i want to tune, but i need them to modify my x (feature parameters) rapidly. At least this is the way i think at the moment.
Maybe you will catch me before i can catch myself...
And let's say you use a very simple model y(x) = wx + b. That is, you think y is linear in x. w and b would be your parameters. Note that here y(x) is the output of the model given x, and you want it to predict y^ (the actual result).
In general the goal of training is not changing your features. You are trying to find a function that given a set of features, will give you a useful output, and you achieve that by changing the weights (w0, w1, w2, b). Features are, in general, not something you control, but something you get from the board.
PS: i guess it does not make sense to go on with your description as long as this is not clarified. Maybe one additional note. The "parameters" m,b
are not the "parameters" i want to tune, but i need them to modify my x (feature parameters) rapidly. At least this is the way i think at the moment.
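To make the distinction between features and weights concrete, here is a rough sketch of one update step for a three-feature linear model y(x) = w0*x0 + w1*x1 + w2*x2 + b. The feature values 200, 40 and 75 are the example deltas from the post above; the target of 1.0, the learning rate, the squared-error loss 0.5 * (y(x) - y^)^2 and the function names are assumptions made only for illustration. The features are read and never modified; only the weights and the bias move:

```python
def predict(weights, bias, features):
    """Linear model: y(x) = w0*x0 + w1*x1 + ... + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def sgd_step(weights, bias, features, target, lr):
    """One gradient step on a single example; returns new weights and bias."""
    err = predict(weights, bias, features) - target  # y(x) - y^
    # dL/dw_i = err * x_i, dL/db = err (assuming L = 0.5 * err^2)
    new_weights = [w - lr * err * x for w, x in zip(weights, features)]
    new_bias = bias - lr * err
    return new_weights, new_bias


# Feature values taken from the board (material, mobility, passer deltas).
features = [200.0, 40.0, 75.0]
target = 1.0  # hypothetical result for this position
weights, bias = [0.0, 0.0, 0.0], 0.0

old_err = predict(weights, bias, features) - target
new_weights, new_bias = sgd_step(weights, bias, features, target, lr=1e-5)
new_err = predict(new_weights, new_bias, features) - target
```

The learning rate is tiny here because the raw feature values are large; in practice one would probably rescale the features or tune the rate, but that is outside what the thread covers.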