tool to create derivatives of a given function

brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

tool to create derivatives of a given function

Post by brtzsnr »

Useful for users of the Texel Tuning method. If you have a model more complex than a linear function you can use this new library to compute the derivative for you.

https://research.googleblog.com/2017/11 ... gable.html
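
For readers unfamiliar with the method: the quantity being differentiated in Texel tuning is the mean squared error between the game results and a sigmoid of the eval. Below is a minimal sketch with a linear placeholder eval; the names Position, eval_with_params, texel_loss and the scaling constant K are purely illustrative, not taken from any particular engine or from the linked library.

Code: Select all

#include <cmath>
#include <cstddef>
#include <vector>

// One training position: the game result (0, 0.5 or 1) and, as a stand-in
// for real board data, a feature vector (e.g. material counts).
struct Position {
    double result;
    std::vector<double> features;
};

// Placeholder eval, linear in the parameters w; a real engine plugs in its
// own (possibly non-linear) evaluation here.
double eval_with_params(Position const& pos, std::vector<double> const& w)
{
    double e = 0.0;
    for (std::size_t j = 0; j < w.size(); ++j)
        e += w[j] * pos.features[j];
    return e;
}

// Texel tuning loss: E(w) = (1/N) * sum_i (result_i - sigmoid(K * eval_i))^2.
// Minimizing it by gradient descent needs dE/dw_j, which by the chain rule
// needs d(eval_i)/dw_j -- trivial for a linear eval, but once the model is
// non-linear this is where an autodiff tool earns its keep.
double texel_loss(std::vector<Position> const& positions,
                  std::vector<double> const& w,
                  double K = 1.0 / 400.0)
{
    double E = 0.0;
    for (auto const& pos : positions) {
        double s = 1.0 / (1.0 + std::exp(-K * eval_with_params(pos, w)));
        double d = pos.result - s;
        E += d * d;
    }
    return E / positions.size();
}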
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: tool to create derivatives of a given function

Post by Rein Halbersma »

brtzsnr wrote:Useful for users of the Texel Tuning method. If you have a model more complex than a linear function you can use this new library to compute the derivative for you.

https://research.googleblog.com/2017/11 ... gable.html
It's reasonably straightforward to code this in C++. See e.g. Alvaro's RuyTune: http://chessprogramming.wikispaces.com/RuyTune

It may require some minor rewrites in your evaluation function, mainly if you use compound assignment operators (+=, etc.): instead of a += b; you need to write a = a + b;.

Python's autograd has similar restrictions: https://github.com/HIPS/autograd/blob/m ... utorial.md
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: tool to create derivatives of a given function

Post by Daniel Shawul »

Actually there is no need for this if the eval is linear -- which it commonly is. Compute the derivatives once through finite differences and store the Jacobian matrix; after that you don't have to call eval() again. An example implementation is here: https://github.com/dshawul/Scorpio/blob/master/eval.cpp The Jacobian takes about 1.5 GB to store for 400 or so variables. Tuning with the Jacobian is much, much faster than without it, since you only have to multiply coefficients to compute the eval. I guess I could use SIMD vectorization on CPUs, or even GPUs, to make computing the eval and its derivative faster still. Another advantage of this is that you can still use ints in your eval and use float/double to store the Jacobian. Moreover, the "standard" implementation of autodiff using operator overloading is pretty slow compared to the source-to-source transformation used in Fortran and other languages.

Daniel
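
A rough sketch of the approach Daniel describes, under the assumption of an eval that is linear in its parameters; the names build_jacobian, fast_eval and the EvalFn hook are illustrative, not Scorpio's actual code. The Jacobian rows are computed once by central differences; afterwards the eval of each position reduces to a dot product with its stored row.

Code: Select all

#include <cstddef>     // size_t
#include <functional>  // function
#include <vector>      // vector

// Hypothetical engine hook: evaluate training position i with parameters w.
using EvalFn = std::function<double(int, std::vector<double> const&)>;

// Build the Jacobian once by central differences: J[i][j] = d eval_i / d w_j.
// If the eval is linear in the parameters these entries are exact, so eval()
// never needs to be called again during tuning.
std::vector<std::vector<double>> build_jacobian(EvalFn const& eval,
                                                int num_positions,
                                                std::vector<double> w,
                                                double h = 1.0)
{
    std::vector<std::vector<double>> J(num_positions,
                                       std::vector<double>(w.size()));
    for (int i = 0; i < num_positions; ++i) {
        for (std::size_t j = 0; j < w.size(); ++j) {
            double const save = w[j];
            w[j] = save + h;
            double const up = eval(i, w);
            w[j] = save - h;
            double const down = eval(i, w);
            w[j] = save;
            J[i][j] = (up - down) / (2.0 * h);
        }
    }
    return J;
}

// During tuning, the eval of position i is just a dot product with its row
// (plus the eval at w = 0 if the eval has a constant term).
double fast_eval(std::vector<double> const& row, std::vector<double> const& w)
{
    double e = 0.0;
    for (std::size_t j = 0; j < w.size(); ++j)
        e += row[j] * w[j];
    return e;
}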
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: tool to create derivatives of a given function

Post by AlvaroBegue »

Rein Halbersma wrote:[...]
It's reasonably straightforward to code this in C++. See e.g. Alvaro's RuyTune: http://chessprogramming.wikispaces.com/RuyTune

It may require some minor rewrites in your evaluation function, mainly if you use compound assignment operators (+=, etc.): instead of a += b; you need to write a = a + b;.
Oh, that's my fault. I could have provided those operators (and I probably should have).
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: tool to create derivatives of a given function

Post by Rein Halbersma »

AlvaroBegue wrote:
Rein Halbersma wrote:[...]
It's reasonably straightforward to code this in C++. See e.g. Alvaro's RuyTune: http://chessprogramming.wikispaces.com/RuyTune

It may require some minor rewrites in your evaluation function, mainly if you use compound assignment operators (+=, etc.): instead of a += b; you need to write a = a + b;.
Oh, that's my fault. I could have provided those operators (and I probably should have).
It's not obvious to me how to implement it, and Python's autograd also does not support it. You want a = b; a += c; to be equivalent to a = b + c;. I am curious how you would implement it.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: tool to create derivatives of a given function

Post by AlvaroBegue »

Rein Halbersma wrote:
AlvaroBegue wrote:
Rein Halbersma wrote:[...]
It's reasonably straightforward to code this in C++. See e.g. Alvaro's RuyTune: http://chessprogramming.wikispaces.com/RuyTune

It may require some minor rewrites in your evaluation function, mainly if you use compound assignment operators (+=, etc.): instead of a += b; you need to write a = a + b;.
Oh, that's my fault. I could have provided those operators (and I probably should have).
It's not obvious to me how to implement it, and Python's autograd also does not support it. You want a = b; a += c; to be equivalent to a = b + c;. I am curious how you would implement it.

Code: Select all

  inline Variable &operator+=(Variable &v1, Variable v2) {
    return v1 = v1 + v2;
  }
That seems to work in RuyTune. Through the magic of shared_ptr the old value of v1 is not lost.

Here's a sample program, using autodiff.hpp from RuyTune:

Code: Select all

#include "autodiff.hpp"

namespace autodiff {
  inline Variable &operator+=(Variable &v1, Variable v2) {
   return v1 = v1 + v2;
 }
}

template <typename Score>
Score three_times&#40;Score s&#41; &#123;
  s += 2.0 * s;
  return s;
&#125;

int main&#40;) &#123;
  autodiff&#58;&#58;Variable x = 2.0;
  autodiff&#58;&#58;Variable y = three_times&#40;x&#41;;
  y.p->derivative = 1.0;
  autodiff&#58;&#58;backpropagate&#40;);
  std&#58;&#58;cout << x.p->derivative << '\n'; // Prints 3
&#125;

Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: tool to create derivatives of a given function

Post by Rein Halbersma »

AlvaroBegue wrote:

Code: Select all

  inline Variable &operator+=(Variable &v1, Variable v2) {
    return v1 = v1 + v2;
  }
That seems to work in RuyTune. Through the magic of shared_ptr the old value of v1 is not lost.
Ah, that's a nice way to do it. One thing that bothers me, however, is that it is not the canonical way to overload operators. Usually you define op@= (with @ = *, /, +, - etc.) as a member function, returning *this, and then you define op@ as a non-member function in terms of op@=.

Your way of defining op@= in terms of op@ is non-standard. I would have to think through very carefully whether all the usual arithmetic properties are still satisfied (likely, but I'd like to prove it to myself first :) ).
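
For reference, the canonical pattern referred to here looks roughly like this (a generic sketch with a made-up Num type, not RuyTune code):

Code: Select all

// Compound assignment as a member that mutates *this and returns it;
// the binary operator as a non-member defined in terms of it, taking
// its left operand by value.
struct Num {
    double value;

    Num& operator+=(Num const& rhs) {
        value += rhs.value;
        return *this;
    }
};

inline Num operator+(Num lhs, Num const& rhs) {
    lhs += rhs;
    return lhs;
}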

One other thing that I noticed in your RuyTune library is that you have global storage of the computational graph, where you rely on the LIFO order in which the various contributions to the loss function are added. The way the machine learning library PyTorch works, for example, is to have each variable hold its child edges by shared pointer; the user then calls backward() on any such variable (in Python's Autograd library this is done implicitly), after which you can call grad() on any of its children (the weights to be learned).

Here's a toy implementation of this approach:

Code: Select all

#include <cassert>  // assert
#include <iostream> // cout
#include <memory>   // make_shared, shared_ptr
#include <utility>  // pair
#include <vector>   // vector

// class with value semantics
template<class T>
class Var
{
    struct Node;
    using EdgeWeight = std::pair<std::shared_ptr<Node>, T>;

    // nested type held by shared_ptr inside Var
    struct Node
    {
        T data {};
        T grad {};
        std::vector<EdgeWeight> edge;

        explicit Node(T const& v) noexcept
        :
            data(v)
        {}

        Node(T const& v, std::vector<EdgeWeight> const& e)
        :
            data(v),
            edge(e)
        {}
    };

    std::shared_ptr<Node> node_ptr;
    Node const* operator->() const noexcept { return node_ptr.get(); }

    // recursive traversal of all child nodes
    void propagate(std::shared_ptr<Node> const& n)
    {
        for (auto i = std::size_t{0}, N = n->edge.size(); i < N; ++i) {
            auto& c = n->edge[i].first;
            c->grad += n->edge[i].second * n->grad;
            propagate(c);
        }
    }

public:
    Var(T const& v)
    :
        node_ptr(std::make_shared<Node>(v))
    {}

    Var(T const& v, std::vector<EdgeWeight> const& e)
    :
        node_ptr(std::make_shared<Node>(v, e))
    {}

    // back propagation over all child nodes
    void backward()
    {
        node_ptr->grad = T{1};
        propagate(node_ptr);
    }

    // requires previous call to backward()
    friend auto grad(Var const& v)
    {
        return v->grad;
    }

    friend std::ostream& operator<<(std::ostream& ostr, Var<T> const& v)
    {
        return ostr << v->data;
    }

    friend Var operator+(Var const& L, Var const& R)
    {
        return
        {
            L->data + R->data,
            {
                { L.node_ptr, T{1} },
                { R.node_ptr, T{1} }
            }
        };
    }

    friend Var operator*(Var const& L, Var const& R)
    {
        return
        {
            L->data * R->data,
            {
                { L.node_ptr, R->data },
                { R.node_ptr, L->data }
            }
        };
    }
};

int main()
{
    Var w1 = 2.0;
    Var w2 = 3.0;
    Var w3 = 5.0;
    Var loss = w1 + w2 * w3;    // forward computation of loss is implicit
    std::cout << w1 << '\n';
    std::cout << w2 << '\n';
    std::cout << w3 << '\n';
    std::cout << loss << '\n';
    loss.backward();            // backward propagation of gradient is explicit
    std::cout << grad(w1) << '\n';
    std::cout << grad(w2) << '\n';
    std::cout << grad(w3) << '\n';
}
Live Example
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: tool to create derivatives of a given function

Post by AlvaroBegue »

Oh, that does seem better than my solution. I'll study it carefully and I will probably adopt it. Thanks!
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: tool to create derivatives of a given function

Post by Rein Halbersma »

Daniel Shawul wrote:Actually there is no need for this if the eval is linear -- which it commonly is. Compute the derivatives once through finite differences and store the Jacobian matrix; after that you don't have to call eval() again. An example implementation is here: https://github.com/dshawul/Scorpio/blob/master/eval.cpp The Jacobian takes about 1.5 GB to store for 400 or so variables. Tuning with the Jacobian is much, much faster than without it, since you only have to multiply coefficients to compute the eval. I guess I could use SIMD vectorization on CPUs, or even GPUs, to make computing the eval and its derivative faster still. Another advantage of this is that you can still use ints in your eval and use float/double to store the Jacobian. Moreover, the "standard" implementation of autodiff using operator overloading is pretty slow compared to the source-to-source transformation used in Fortran and other languages.

Daniel
Finite-difference computation of a gradient scales linearly in the number of parameters, whereas reverse-mode automatic differentiation computes the whole gradient at a cost that is a small constant multiple of one forward computation of the loss, independent of the number of parameters. There's a reason that all modern machine learning libraries use automatic differentiation.
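
To make the scaling concrete: a forward-difference gradient of a loss with n parameters costs n + 1 evaluations of the loss, whereas one reverse-mode sweep (as in the toy Var class above) yields all n partial derivatives for a small constant multiple of a single forward pass. A generic sketch of the finite-difference side (names are illustrative):

Code: Select all

#include <cstddef>     // size_t
#include <functional>  // function
#include <vector>      // vector

// Forward-difference gradient of an arbitrary loss: n + 1 loss evaluations
// for n parameters, which is what makes finite differences scale linearly.
std::vector<double> fd_gradient(
    std::function<double(std::vector<double> const&)> const& loss,
    std::vector<double> x, double h = 1e-6)
{
    std::vector<double> g(x.size());
    double const f0 = loss(x);                     // 1 evaluation
    for (std::size_t j = 0; j < x.size(); ++j) {   // n more evaluations
        double const save = x[j];
        x[j] = save + h;
        g[j] = (loss(x) - f0) / h;
        x[j] = save;
    }
    return g;
}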
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: tool to create derivatives of a given function

Post by Rein Halbersma »

AlvaroBegue wrote:Oh, that does seem better than my solution. I'll study it carefully and I will probably adopt it. Thanks!
I will open a new Autograd project on GitHub; it will probably be my Christmas break project.

I'm still not completely satisfied. One feature it needs, IMO, is that every Var should hold a std::set of all its unique children. This enables writing loss.grad(w1) instead of the toy example's grad(w1), so that you can compute the gradient of any variable with respect to any of its children. Look at Python's autograd or PyTorch for inspiration on nice syntactic sugar :)

Re: the op+= example that you gave: there are some corner cases, like x += a; followed by x -= a;, or self-assignment like x -= x;, that need careful unit testing to make sure they are correct. Alternatively, you need a general proof that no cycles between shared pointers can be created through your definitions. As soon as you have shared_ptr cycles, there's a bug.