How to calc the derivative for gradient descent?
Hello, this is my first post.
I'd like to know which way you suggest for calculating the derivative of the evaluation for each parameter for gradient descent with Texel tuning. I've read about (Eval(xi+1) - Eval(xi-1))/2, Eval(xi+1) - Eval(xi), auto-differentiation libraries, the Jacobian matrix, and so forth.
I'm currently using local search. My engine is written in C.

 Posts: 1030
 Joined: Tue Apr 19, 2016 4:08 am
 Location: U.S.A
 Full name: Andrew Grant
 Contact:
Re: How to calc the derivative for gradient descent?
BrianNeal wrote: ↑Mon Jan 04, 2021 9:03 pm
> Hello, this is my first post.
> I'd like to know which way you suggest for calculating the derivative of the evaluation for each parameter for gradient descent with Texel tuning. I've read about (Eval(xi+1) - Eval(xi-1))/2, Eval(xi+1) - Eval(xi), auto-differentiation libraries, the Jacobian matrix, and so forth.
> I'm currently using local search. My engine is written in C.

Might not directly answer your question, but I can plug something I wrote on the topic. There is a section dedicated to the derivative of each term.

 Posts: 927
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
 Full name: Álvaro Begué (RuyDos)
Re: How to calc the derivative for gradient descent?
This is much easier to do in C++, because you can just use a custom data type instead of `float` and the gradient will be computed for you automatically.
For instance, see the example here: https://github.com/autodiff/autodiff
It's not hard to write your own reverse-mode automatic differentiation library in C++.

 Posts: 568
 Joined: Mon Jul 20, 2015 3:06 pm
 Contact:
Re: How to calc the derivative for gradient descent?
BrianNeal wrote: ↑Mon Jan 04, 2021 9:03 pm
> Hello, this is my first post.
> I'd like to know which way you suggest for calculating the derivative of the evaluation for each parameter for gradient descent with Texel tuning. I've read about (Eval(xi+1) - Eval(xi-1))/2, Eval(xi+1) - Eval(xi), auto-differentiation libraries, the Jacobian matrix, and so forth.
> I'm currently using local search. My engine is written in C.

My question is: what does the slope do once it is calculated? The calculation is for a variable based on a local minimum. However, there may be several local minima for a position (king safety, pawn promotions, etc.), the current best estimator will change with the next iteration, and the slope may be in the wrong direction for convergence.
Re: How to calc the derivative for gradient descent?
D Sceviour wrote: ↑Tue Jan 05, 2021 5:00 pm
> My question is: what does the slope do once it is calculated? The calculation is for a variable based on a local minimum. However, there may be several local minima for a position (king safety, pawn promotions, etc.), the current best estimator will change with the next iteration, and the slope may be in the wrong direction for convergence.

If the eval is linear, then you have only a single global minimum.
Re: How to calc the derivative for gradient descent?
D Sceviour wrote: ↑Tue Jan 05, 2021 5:00 pm
> My question is: what does the slope do once it is calculated? [...]

One way of making convergence faster and more reliable is to do batch updates. The reason is that with batch updates you get a much more reliable approximation of the direction. When I made some simple CNNs for picture recognition, I observed that doing big batches helped reduce the validation error enormously and my net could become much, much better. The training error was a little bit worse (if I remember correctly), and that is just as well. What it shows is that when you are doing single updates, each update is like a Brownian motion, drifting further and further from the optimum/optima, and the optimum it is attracted to is probably only good for your training set.
If you have a very good set of values and just want to get a good value for a new feature I have a really good way of doing so.

 Posts: 568
 Joined: Mon Jul 20, 2015 3:06 pm
 Contact:
Re: How to calc the derivative for gradient descent?
tomitank wrote: ↑Tue Jan 05, 2021 8:32 pm
> If the eval is linear, then you have only a single global minimum.

How does one prove the eval is linear? From the information available at a glance, it appears to be a struggle for mathematicians:
https://math.stackexchange.com/question ... alminimum
https://math.stackexchange.com/question ... alminimum
Can anybody translate the links to NN chess for dummies? King placement is very important, and this seems to be a starting point for NNUE tuning. What about pawn promotion, which may or may not be king-placement sensitive?

 Posts: 568
 Joined: Mon Jul 20, 2015 3:06 pm
 Contact:
Re: How to calc the derivative for gradient descent?
Pio wrote: ↑Tue Jan 05, 2021 8:38 pm
> One way of making convergence faster and more reliable is to do batch updates. [...]

Thank you. That is a very good explanation of the importance of batch updating. This looks worth further experimentation.
Re: How to calc the derivative for gradient descent?
D Sceviour wrote: ↑Tue Jan 05, 2021 10:02 pm
> How does one prove the eval is linear? [...] Can anybody translate the links to NN chess for dummies? King placement is very important and this seems to be a starting point for NNUE tuning. What about pawn promotion which may or may not be king placement sensitive?

The discussion about linearity of the evaluation function is confused. You might have a function that is linear with respect to your features, i.e. a linear combination of those features, but the features themselves might not be linear. You can also create a linear function of your highly non-linear evaluation function: just define your entire evaluation function as a feature, and you have a linear evaluation function.
I have previously had the idea of doing a pawn eval with an NN, since it can be reused 99% of the time. You could do a very advanced neural network for the pawns only: maybe learn a 16-byte output for each square that is later combined with the kings' placement by a second NN producing a more complex representation per square, where the original pawn output is identity-mapped and the combined pawn-and-king output goes to another 8 bytes or so. The nice thing is that you seldom move your king, so a big part of the calculation can be saved.
You can do the same thing for other pieces, but I don't think you would gain as much, since they usually move a lot.
Good luck with your engine!

 Posts: 568
 Joined: Mon Jul 20, 2015 3:06 pm
 Contact:
Re: How to calc the derivative for gradient descent?
Pio wrote: ↑Tue Jan 05, 2021 10:47 pm
> You can also create a linear function of your highly non-linear evaluation function. Just define your entire evaluation function as a feature and you have a linear evaluation function.

I believe chess evaluation is non-linear, but any argument that would demonstrate linearity is welcome. However, creating a linear function of a highly non-linear evaluation function should lead to very bad results.