tool to create derivates of a given function
Moderators: hgm, Harvey Williamson, bob
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.

 Posts: 3724
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia
 Contact:
Re: tool to create derivates of a given function
Actually there is no need for this if the eval is linear  which it commonly is. Compute it through finite difference once and store the Jacobian matrix, after that one doesn't have to call back eval() again. An example implementation is here : https://github.com/dshawul/Scorpio/blob/master/eval.cpp The jacobian takes about 1.5G bytes to store for 400 or so variables. The tuning with the Jacobian is much much faster than without it as one have to just multiply coefficients to compute the eval. I guess I could use SIMD vectorization on CPUs or even GPUs to make computing the eval and its derivative much much faster. Another advantage of this is that you can still use ints in your eval, and use float/double to store the Jacobian. Moreover, the "standard" implementation of autodiff using operator overloading is pretty slow compared to the sourcetosource transformation used in fortran and other languages.
Daniel
Daniel

 Posts: 919
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
 Full name: Álvaro Begué (RuyDos)
Re: tool to create derivates of a given function
Oh, that's my fault. I could have provided those operators (and I probably should have).Rein Halbersma wrote:[...]
It's reasonably straightforward to code this in C++. See e.g. Alvaro's RuyTunehttp://chessprogramming.wikispaces.com/RuyTune
It may require some minor rewrites in your evaluation function, mainly if you use compound assignment operators (+= etc., instead of a+=b; you need to write a = a + b; ).

 Posts: 685
 Joined: Tue May 22, 2007 9:13 am
Re: tool to create derivates of a given function
It's not obvious to me how to implement it, and Python autograd also does not support it. You want that a = b; a += c; to be equivalent to a = b + c; I am curious to how you would implement it.AlvaroBegue wrote:Oh, that's my fault. I could have provided those operators (and I probably should have).Rein Halbersma wrote:[...]
It's reasonably straightforward to code this in C++. See e.g. Alvaro's RuyTunehttp://chessprogramming.wikispaces.com/RuyTune
It may require some minor rewrites in your evaluation function, mainly if you use compound assignment operators (+= etc., instead of a+=b; you need to write a = a + b; ).

 Posts: 919
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
 Full name: Álvaro Begué (RuyDos)
Re: tool to create derivates of a given function
Rein Halbersma wrote:It's not obvious to me how to implement it, and Python autograd also does not support it. You want that a = b; a += c; to be equivalent to a = b + c; I am curious to how you would implement it.AlvaroBegue wrote:Oh, that's my fault. I could have provided those operators (and I probably should have).Rein Halbersma wrote:[...]
It's reasonably straightforward to code this in C++. See e.g. Alvaro's RuyTunehttp://chessprogramming.wikispaces.com/RuyTune
It may require some minor rewrites in your evaluation function, mainly if you use compound assignment operators (+= etc., instead of a+=b; you need to write a = a + b; ).
Code: Select all
inline Variable &operator+=(Variable &v1, Variable v2) {
return v1 = v1 + v2;
}
Here's a sample program, using autodiff.hpp from RuyTune:
Code: Select all
#include "autodiff.hpp"
namespace autodiff {
inline Variable &operator+=(Variable &v1, Variable v2) {
return v1 = v1 + v2;
}
}
template <typename Score>
Score three_times(Score s) {
s += 2.0 * s;
return s;
}
int main() {
autodiff::Variable x = 2.0;
autodiff::Variable y = three_times(x);
y.p>derivative = 1.0;
autodiff::backpropagate();
std::cout << x.p>derivative << '\n'; // Prints 3
}

 Posts: 685
 Joined: Tue May 22, 2007 9:13 am
Re: tool to create derivates of a given function
Ah, that's a nice way to do it. One thing that bothers me, however, is that it is not the canonical way to overload operators. Usually you define op@= (with @ = *, /, +,  etc.) as a member function, returning *this, and then you define op@ as a nonmember function in terms of op@=.AlvaroBegue wrote:That seems to work in RuyTune. Through the magic of shared_ptr the old value of v1 is not lost.Code: Select all
inline Variable &operator+=(Variable &v1, Variable v2) { return v1 = v1 + v2; }
Your way of defining op@= in terms of op@ is nonstandard. I would have to think it through very carefully if all the usual arithmetic properties are still being satisfied (likely, but I'd like to prove it to myself first ).
One other thing that I noticed in your RuyTune library is that you have a global storage of the computational graph, where you rely on the LIFO nature in which the various contributions to the loss functions are being added. The way e.g. the machine learning library PyTorch works, is to have each variable hold its child edges by shared pointer, that the user then calls backward() on any such variable (in Python Autograd library, this is done implicitly), after which you can call grad() on any of its children (the weights to be learned).
Here's a toy implementation of this approach:
Code: Select all
#include <cassert> // assert
#include <iostream> // cout
#include <memory> // make_shared, shared_ptr
#include <utility> // pair
#include <vector> // vector
// class with value semantics
template<class T>
class Var
{
struct Node;
using EdgeWeight = std::pair<std::shared_ptr<Node>, T>;
// nested type held by shared_ptr inside Var
struct Node
{
T data {};
T grad {};
std::vector<EdgeWeight> edge;
explicit Node(T const& v) noexcept
:
data(v)
{}
Node(T const& v, std::vector<EdgeWeight> const& e)
:
data(v),
edge(e)
{}
};
std::shared_ptr<Node> node_ptr;
Node const* operator>() const noexcept { return node_ptr.get(); }
// recursive traversal of all child nodes
void propagate(std::shared_ptr<Node> const& n)
{
for (auto i = std::size_t{0}, N = n>edge.size(); i < N; ++i) {
auto& c = n>edge[i].first;
c>grad += n>edge[i].second * n>grad;
propagate(c);
}
}
public:
Var(T const& v)
:
node_ptr(std::make_shared<Node>(v))
{}
Var(T const& v, std::vector<EdgeWeight> const& e)
:
node_ptr(std::make_shared<Node>(v, e))
{}
// back propagation over all child nodes
void backward()
{
node_ptr>grad = T{1};
propagate(node_ptr);
}
// requires previous call to backward()
friend auto grad(Var const& v)
{
return v>grad;
}
friend std::ostream& operator<<(std::ostream& ostr, Var<T> const& v)
{
return ostr << v>data;
}
friend Var operator+(Var const& L, Var const& R)
{
return
{
L>data + R>data,
{
{ L.node_ptr, T{1} },
{ R.node_ptr, T{1} }
}
};
}
friend Var operator*(Var const& L, Var const& R)
{
return
{
L>data * R>data,
{
{ L.node_ptr, R>data },
{ R.node_ptr, L>data }
}
};
}
};
int main()
{
Var w1 = 2.0;
Var w2 = 3.0;
Var w3 = 5.0;
Var loss = w1 + w2 * w3; // forward computation of loss is implicit
std::cout << w1 << '\n';
std::cout << w2 << '\n';
std::cout << w3 << '\n';
std::cout << loss << '\n';
loss.backward(); // backward propagation of gradient is explicit
std::cout << grad(w1) << '\n';
std::cout << grad(w2) << '\n';
std::cout << grad(w3) << '\n';
}

 Posts: 919
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
 Full name: Álvaro Begué (RuyDos)
Re: tool to create derivates of a given function
Oh, that does seem better than my solution. I'll study it carefully and I will probably adopt it. Thanks!

 Posts: 685
 Joined: Tue May 22, 2007 9:13 am
Re: tool to create derivates of a given function
Finite difference computations of gradients scale linearly in the number of parameters, whereas automatic differentiation is a constant (for both methods, it is also scales as a constant times the forward computation of the loss). There's a reason that all modern machine learning libraries use automatic differentiation.Daniel Shawul wrote:Actually there is no need for this if the eval is linear  which it commonly is. Compute it through finite difference once and store the Jacobian matrix, after that one doesn't have to call back eval() again. An example implementation is here : https://github.com/dshawul/Scorpio/blob/master/eval.cpp The jacobian takes about 1.5G bytes to store for 400 or so variables. The tuning with the Jacobian is much much faster than without it as one have to just multiply coefficients to compute the eval. I guess I could use SIMD vectorization on CPUs or even GPUs to make computing the eval and its derivative much much faster. Another advantage of this is that you can still use ints in your eval, and use float/double to store the Jacobian. Moreover, the "standard" implementation of autodiff using operator overloading is pretty slow compared to the sourcetosource transformation used in fortran and other languages.
Daniel

 Posts: 685
 Joined: Tue May 22, 2007 9:13 am
Re: tool to create derivates of a given function
I will open a new project Autograd on GitHub, will probably be my xmas break project.AlvaroBegue wrote:Oh, that does seem better than my solution. I'll study it carefully and I will probably adopt it. Thanks!
I'm still not completely satisfied. Feature it needs IMO is that every Var should hold a std::set of all it's unique children. This enables writing loss.grad(w1) instead of the toy example's grad(w1), so that you can compute gradients of any variable with respect to any of its children. Look at Python's autograd or PyTorch for inspiration about nice syntactic sugar
Re: the op+= example that you gave. There are some corner cases like x+=a; followed by x=a, or selfassignment x=x; that need careful unit testing to make sure they are correct. Or you need a general proof that no cycles between shared pointers can be made through your definitions. As soon as you have shared_ptr cycles, there's a bug.

 Posts: 919
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
 Full name: Álvaro Begué (RuyDos)
Re: tool to create derivates of a given function
I don't get what seems so scary about the operator+= I wrote. I am as convinced that it is correct as any other part of the library. In some sense my very primitive implementation, with a global vector that stores all the relevant partial derivatives, makes this easier to think about. The time ordering of the operations is proof that there is no loop.Rein Halbersma wrote: Re: the op+= example that you gave. There are some corner cases like x+=a; followed by x=a, or selfassignment x=x; that need careful unit testing to make sure they are correct. Or you need a general proof that no cycles between shared pointers can be made through your definitions. As soon as you have shared_ptr cycles, there's a bug.
But if you do find a case where it fails, I am ready to learn something new.

 Posts: 3724
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia
 Contact:
Re: tool to create derivates of a given function
You misunderstood, the FD gradient calculation is done only once at the the beginning and then you store the gradient for all positions. During optimization you read the eval and its gradient from the stored Jacobian matrix so this is faster than anything you can come up with that calls eval() during optimization  which the autodiff approach does. I mentioned in the other thread how the jacobian method can be used for some nonlinear eval() as well.Rein Halbersma wrote:Finite difference computations of gradients scale linearly in the number of parameters, whereas automatic differentiation is a constant (for both methods, it is also scales as a constant times the forward computation of the loss). There's a reason that all modern machine learning libraries use automatic differentiation.Daniel Shawul wrote:Actually there is no need for this if the eval is linear  which it commonly is. Compute it through finite difference once and store the Jacobian matrix, after that one doesn't have to call back eval() again. An example implementation is here : https://github.com/dshawul/Scorpio/blob/master/eval.cpp The jacobian takes about 1.5G bytes to store for 400 or so variables. The tuning with the Jacobian is much much faster than without it as one have to just multiply coefficients to compute the eval. I guess I could use SIMD vectorization on CPUs or even GPUs to make computing the eval and its derivative much much faster. Another advantage of this is that you can still use ints in your eval, and use float/double to store the Jacobian. Moreover, the "standard" implementation of autodiff using operator overloading is pretty slow compared to the sourcetosource transformation used in fortran and other languages.
Daniel
In any case, machine learning people are kind of newbies to autodiff. The method has been in use in computational fluid dynamics for optimization of nonlinear differential equations for long. The sourcetosource autodiff used in fortran is much faster than the c++ operator overloading approach because compilers are not good at all in compiling that kind of code. Even we have to do some optimization to handle a dual (mg,eg) score in an efficient way. Using atleast expression templates for lazy evaluation of expressions is necessary to get anywhere close to the sourcetosource transformed code. I recall there is such library that would hopefully bring c++ to the level of fortran when it comes to autodiff. This used to be the case for matrix manupilation as well but now c++ is on the level of fortran thanks to expression templates.
Edit: Something like this library http://www.met.reading.ac.uk/clouds/adept/ for C++.