## tool to create derivates of a given function

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 3:02 pm
Contact:

### tool to create derivates of a given function

Useful for users of the Texel Tuning method. If you have a model more complex than a linear function you can use this new library to compute the derivative for you.

Rein Halbersma
Posts: 685
Joined: Tue May 22, 2007 9:13 am

### Re: tool to create derivates of a given function

brtzsnr wrote:Useful for users of the Texel Tuning method. If you have a model more complex than a linear function you can use this new library to compute the derivative for you.

It's reasonably straightforward to code this in C++. See e.g. Alvaro's RuyTunehttp://chessprogramming.wikispaces.com/RuyTune

It may require some minor rewrites in your evaluation function, mainly if you use compound assignment operators (+= etc., instead of a+=b; you need to write a = a + b; ).

Daniel Shawul
Posts: 3593
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

### Re: tool to create derivates of a given function

Actually there is no need for this if the eval is linear -- which it commonly is. Compute it through finite difference once and store the Jacobian matrix, after that one doesn't have to call back eval() again. An example implementation is here : https://github.com/dshawul/Scorpio/blob/master/eval.cpp The jacobian takes about 1.5G bytes to store for 400 or so variables. The tuning with the Jacobian is much much faster than without it as one have to just multiply coefficients to compute the eval. I guess I could use SIMD vectorization on CPUs or even GPUs to make computing the eval and its derivative much much faster. Another advantage of this is that you can still use ints in your eval, and use float/double to store the Jacobian. Moreover, the "standard" implementation of autodiff using operator overloading is pretty slow compared to the source-to-source transformation used in fortran and other languages.

Daniel

AlvaroBegue
Posts: 916
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

### Re: tool to create derivates of a given function

Rein Halbersma wrote:[...]
It's reasonably straightforward to code this in C++. See e.g. Alvaro's RuyTunehttp://chessprogramming.wikispaces.com/RuyTune

It may require some minor rewrites in your evaluation function, mainly if you use compound assignment operators (+= etc., instead of a+=b; you need to write a = a + b; ).
Oh, that's my fault. I could have provided those operators (and I probably should have).

Rein Halbersma
Posts: 685
Joined: Tue May 22, 2007 9:13 am

### Re: tool to create derivates of a given function

AlvaroBegue wrote:
Rein Halbersma wrote:[...]
It's reasonably straightforward to code this in C++. See e.g. Alvaro's RuyTunehttp://chessprogramming.wikispaces.com/RuyTune

It may require some minor rewrites in your evaluation function, mainly if you use compound assignment operators (+= etc., instead of a+=b; you need to write a = a + b; ).
Oh, that's my fault. I could have provided those operators (and I probably should have).
It's not obvious to me how to implement it, and Python autograd also does not support it. You want that a = b; a += c; to be equivalent to a = b + c; I am curious to how you would implement it.

AlvaroBegue
Posts: 916
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

### Re: tool to create derivates of a given function

Rein Halbersma wrote:
AlvaroBegue wrote:
Rein Halbersma wrote:[...]
It's reasonably straightforward to code this in C++. See e.g. Alvaro's RuyTunehttp://chessprogramming.wikispaces.com/RuyTune

It may require some minor rewrites in your evaluation function, mainly if you use compound assignment operators (+= etc., instead of a+=b; you need to write a = a + b; ).
Oh, that's my fault. I could have provided those operators (and I probably should have).
It's not obvious to me how to implement it, and Python autograd also does not support it. You want that a = b; a += c; to be equivalent to a = b + c; I am curious to how you would implement it.

Code: Select all

``````  inline Variable &operator+=&#40;Variable &v1, Variable v2&#41; &#123;
return v1 = v1 + v2;
&#125;
``````
That seems to work in RuyTune. Through the magic of shared_ptr the old value of v1 is not lost.

Here's a sample program, using autodiff.hpp from RuyTune:

Code: Select all

``````#include "autodiff.hpp"

namespace autodiff &#123;
inline Variable &operator+=&#40;Variable &v1, Variable v2&#41; &#123;
return v1 = v1 + v2;
&#125;
&#125;

template <typename Score>
Score three_times&#40;Score s&#41; &#123;
s += 2.0 * s;
return s;
&#125;

int main&#40;) &#123;
autodiff&#58;&#58;Variable x = 2.0;
autodiff&#58;&#58;Variable y = three_times&#40;x&#41;;
y.p->derivative = 1.0;
autodiff&#58;&#58;backpropagate&#40;);
std&#58;&#58;cout << x.p->derivative << '\n'; // Prints 3
&#125;

``````

Rein Halbersma
Posts: 685
Joined: Tue May 22, 2007 9:13 am

### Re: tool to create derivates of a given function

AlvaroBegue wrote:

Code: Select all

``````  inline Variable &operator+=&#40;Variable &v1, Variable v2&#41; &#123;
return v1 = v1 + v2;
&#125;
``````
That seems to work in RuyTune. Through the magic of shared_ptr the old value of v1 is not lost.
Ah, that's a nice way to do it. One thing that bothers me, however, is that it is not the canonical way to overload operators. Usually you define op@= (with @ = *, /, +, - etc.) as a member function, returning *this, and then you define op@ as a non-member function in terms of op@=.

Your way of defining op@= in terms of op@ is non-standard. I would have to think it through very carefully if all the usual arithmetic properties are still being satisfied (likely, but I'd like to prove it to myself first ).

One other thing that I noticed in your RuyTune library is that you have a global storage of the computational graph, where you rely on the LIFO nature in which the various contributions to the loss functions are being added. The way e.g. the machine learning library PyTorch works, is to have each variable hold its child edges by shared pointer, that the user then calls backward() on any such variable (in Python Autograd library, this is done implicitly), after which you can call grad() on any of its children (the weights to be learned).

Here's a toy implementation of this approach:

Code: Select all

``````#include <cassert>  // assert
#include <iostream> // cout
#include <memory>   // make_shared, shared_ptr
#include <utility>  // pair
#include <vector>   // vector

// class with value semantics
template<class T>
class Var
&#123;
struct Node;
using EdgeWeight = std&#58;&#58;pair<std&#58;&#58;shared_ptr<Node>, T>;

// nested type held by shared_ptr inside Var
struct Node
&#123;
T data &#123;&#125;;
std&#58;&#58;vector<EdgeWeight> edge;

explicit Node&#40;T const& v&#41; noexcept
&#58;
data&#40;v&#41;
&#123;&#125;

Node&#40;T const& v, std&#58;&#58;vector<EdgeWeight> const& e&#41;
&#58;
data&#40;v&#41;,
edge&#40;e&#41;
&#123;&#125;
&#125;;

std&#58;&#58;shared_ptr<Node> node_ptr;
Node const* operator->() const noexcept &#123; return node_ptr.get&#40;); &#125;

// recursive traversal of all child nodes
void propagate&#40;std&#58;&#58;shared_ptr<Node> const& n&#41;
&#123;
for &#40;auto i = std&#58;&#58;size_t&#123;0&#125;, N = n->edge.size&#40;); i < N; ++i&#41; &#123;
auto& c = n->edge&#91;i&#93;.first;
propagate&#40;c&#41;;
&#125;
&#125;

public&#58;
Var&#40;T const& v&#41;
&#58;
node_ptr&#40;std&#58;&#58;make_shared<Node>&#40;v&#41;)
&#123;&#125;

Var&#40;T const& v, std&#58;&#58;vector<EdgeWeight> const& e&#41;
&#58;
node_ptr&#40;std&#58;&#58;make_shared<Node>&#40;v, e&#41;)
&#123;&#125;

// back propagation over all child nodes
void backward&#40;)
&#123;
propagate&#40;node_ptr&#41;;
&#125;

// requires previous call to backward&#40;)
&#123;
&#125;

friend std&#58;&#58;ostream& operator<<&#40;std&#58;&#58;ostream& ostr, Var<T> const& v&#41;
&#123;
return ostr << v->data;
&#125;

friend Var operator+&#40;Var const& L, Var const& R&#41;
&#123;
return
&#123;
L->data + R->data,
&#123;
&#123; L.node_ptr, T&#123;1&#125; &#125;,
&#123; R.node_ptr, T&#123;1&#125; &#125;
&#125;
&#125;;
&#125;

friend Var operator*&#40;Var const& L, Var const& R&#41;
&#123;
return
&#123;
L->data * R->data,
&#123;
&#123; L.node_ptr, R->data &#125;,
&#123; R.node_ptr, L->data &#125;
&#125;
&#125;;
&#125;
&#125;;

int main&#40;)
&#123;
Var w1 = 2.0;
Var w2 = 3.0;
Var w3 = 5.0;
Var loss = w1 + w2 * w3;    // forward computation of loss is implicit
std&#58;&#58;cout << w1 << '\n';
std&#58;&#58;cout << w2 << '\n';
std&#58;&#58;cout << w3 << '\n';
std&#58;&#58;cout << loss << '\n';
loss.backward&#40;);            // backward propagation of gradient is explicit
&#125;``````
Live Example

AlvaroBegue
Posts: 916
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

### Re: tool to create derivates of a given function

Oh, that does seem better than my solution. I'll study it carefully and I will probably adopt it. Thanks!

Rein Halbersma
Posts: 685
Joined: Tue May 22, 2007 9:13 am

### Re: tool to create derivates of a given function

Daniel Shawul wrote:Actually there is no need for this if the eval is linear -- which it commonly is. Compute it through finite difference once and store the Jacobian matrix, after that one doesn't have to call back eval() again. An example implementation is here : https://github.com/dshawul/Scorpio/blob/master/eval.cpp The jacobian takes about 1.5G bytes to store for 400 or so variables. The tuning with the Jacobian is much much faster than without it as one have to just multiply coefficients to compute the eval. I guess I could use SIMD vectorization on CPUs or even GPUs to make computing the eval and its derivative much much faster. Another advantage of this is that you can still use ints in your eval, and use float/double to store the Jacobian. Moreover, the "standard" implementation of autodiff using operator overloading is pretty slow compared to the source-to-source transformation used in fortran and other languages.

Daniel
Finite difference computations of gradients scale linearly in the number of parameters, whereas automatic differentiation is a constant (for both methods, it is also scales as a constant times the forward computation of the loss). There's a reason that all modern machine learning libraries use automatic differentiation.

Rein Halbersma
Posts: 685
Joined: Tue May 22, 2007 9:13 am

### Re: tool to create derivates of a given function

AlvaroBegue wrote:Oh, that does seem better than my solution. I'll study it carefully and I will probably adopt it. Thanks!
I will open a new project Autograd on GitHub, will probably be my xmas break project.

I'm still not completely satisfied. Feature it needs IMO is that every Var should hold a std::set of all it's unique children. This enables writing loss.grad(w1) instead of the toy example's grad(w1), so that you can compute gradients of any variable with respect to any of its children. Look at Python's autograd or PyTorch for inspiration about nice syntactic sugar

Re: the op+= example that you gave. There are some corner cases like x+=a; followed by x-=a, or self-assignment x-=x; that need careful unit testing to make sure they are correct. Or you need a general proof that no cycles between shared pointers can be made through your definitions. As soon as you have shared_ptr cycles, there's a bug.