tool to create derivatives of a given function


AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: tool to create derivatives of a given function

Post by AlvaroBegue »

Rein Halbersma wrote: Re: the op+= example that you gave. There are some corner cases like x+=a; followed by x-=a, or self-assignment x-=x; that need careful unit testing to make sure they are correct. Or you need a general proof that no cycles between shared pointers can be made through your definitions. As soon as you have shared_ptr cycles, there's a bug.
I don't get what seems so scary about the operator+= I wrote. I am as convinced that it is correct as any other part of the library. In some sense my very primitive implementation, with a global vector that stores all the relevant partial derivatives, makes this easier to think about. The time ordering of the operations is proof that there is no loop.

But if you do find a case where it fails, I am ready to learn something new.
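For concreteness, here is a minimal sketch of that "global vector" idea -- using plain indices into a global tape instead of shared pointers, with purely illustrative names, not the actual library code -- which makes the time-ordering argument explicit: every new entry refers only to entries that already exist, so no cycle can form.

Code: Select all

    // Minimal sketch of a tape-based reverse-mode autodiff variable with a
    // global "tape" (the global vector of partial derivatives mentioned above).
    // Names and layout are illustrative only.
    #include <cstddef>
    #include <vector>

    struct TapeEntry {
        // Indices of the (up to two) parent nodes and the partial derivatives
        // of this node with respect to them.
        std::size_t parents[2];
        double partials[2];
    };

    std::vector<TapeEntry> tape;  // global tape; entries are strictly time-ordered

    struct Var {
        std::size_t index;  // position of this node on the tape
        double value;
    };

    // Create a leaf node (an independent variable).
    Var make_var(double v) {
        tape.push_back({{0, 0}, {0.0, 0.0}});
        return {tape.size() - 1, v};
    }

    // x + y pushes a new node whose partials w.r.t. both parents are 1.
    Var operator+(Var const& x, Var const& y) {
        tape.push_back({{x.index, y.index}, {1.0, 1.0}});
        return {tape.size() - 1, x.value + y.value};
    }

    // op+= simply rebinds x to the new sum node; because every new node refers
    // only to nodes created earlier, no cycle can ever be formed.
    Var& operator+=(Var& x, Var const& y) {
        return x = x + y;
    }

    // Reverse sweep: propagate adjoints from the result back to the leaves.
    std::vector<double> gradient(Var const& result) {
        std::vector<double> adjoint(tape.size(), 0.0);
        adjoint[result.index] = 1.0;
        for (std::size_t i = tape.size(); i-- > 0;)
            for (int k = 0; k < 2; ++k)
                adjoint[tape[i].parents[k]] += tape[i].partials[k] * adjoint[i];
        return adjoint;
    }
The reverse sweep walks the tape backwards exactly once, so corner cases like x += x just add more entries to the tape and cannot break anything.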
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: tool to create derivatives of a given function

Post by Daniel Shawul »

Rein Halbersma wrote:
Daniel Shawul wrote: Actually there is no need for this if the eval is linear -- which it commonly is. Compute it through finite differences once and store the Jacobian matrix; after that one doesn't have to call eval() again. An example implementation is here: https://github.com/dshawul/Scorpio/blob/master/eval.cpp The Jacobian takes about 1.5 GB to store for 400 or so variables. Tuning with the Jacobian is much faster than without it, as one only has to multiply coefficients to compute the eval. I guess I could use SIMD vectorization on CPUs, or even GPUs, to make computing the eval and its derivative much faster. Another advantage of this is that you can still use ints in your eval and use float/double to store the Jacobian. Moreover, the "standard" implementation of autodiff using operator overloading is pretty slow compared to the source-to-source transformation used in Fortran and other languages.

Daniel
Finite-difference computation of gradients scales linearly in the number of parameters, whereas automatic differentiation scales as a constant (in both cases the cost is measured in multiples of the forward computation of the loss). There's a reason that all modern machine learning libraries use automatic differentiation.
You misunderstood: the FD gradient calculation is done only once, at the beginning, and then you store the gradient for all positions. During optimization you read the eval and its gradient from the stored Jacobian matrix, so this is faster than anything you can come up with that calls eval() during optimization -- which the autodiff approach does. I mentioned in the other thread how the Jacobian method can be used for some non-linear evals as well.
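In pseudo-C++ the idea looks roughly like this (a minimal sketch assuming a strictly linear eval with no constant term; eval(), the position indices and the step size are placeholders, not Scorpio's actual code):

Code: Select all

    // Sketch of the stored-Jacobian idea for a linear eval: finite-difference
    // the eval once per position, then never call eval() again during tuning.
    #include <cstddef>
    #include <vector>

    using Params = std::vector<double>;

    // The engine's evaluation of one training position; placeholder declaration.
    double eval(int pos, Params const& w);

    // One-time setup: central differences give one gradient row per position.
    // For a linear eval these rows never change as the parameters move.
    std::vector<std::vector<double>> build_jacobian(std::vector<int> const& positions,
                                                    Params const& w, double h = 1.0) {
        std::vector<std::vector<double>> J(positions.size(),
                                           std::vector<double>(w.size()));
        for (std::size_t p = 0; p < positions.size(); ++p) {
            Params wp = w;
            for (std::size_t j = 0; j < w.size(); ++j) {
                wp[j] = w[j] + h;
                double plus = eval(positions[p], wp);
                wp[j] = w[j] - h;
                double minus = eval(positions[p], wp);
                wp[j] = w[j];
                J[p][j] = (plus - minus) / (2.0 * h);  // central difference
            }
        }
        return J;
    }

    // During tuning, the eval is just a dot product with the stored row, and
    // the gradient of the eval w.r.t. the parameters is the row itself.
    double fast_eval(std::vector<double> const& row, Params const& w) {
        double s = 0.0;
        for (std::size_t j = 0; j < w.size(); ++j)
            s += row[j] * w[j];
        return s;
    }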

In any case, machine learning people are relative newcomers to autodiff. The method has been in use in computational fluid dynamics for the optimization of non-linear differential equations for a long time. The source-to-source autodiff used in Fortran is much faster than the C++ operator-overloading approach, because compilers are not good at optimizing that kind of code. We even have to do some optimization to handle a dual (mg,eg) score in an efficient way. Using at least expression templates for lazy evaluation of expressions is necessary to get anywhere close to the source-to-source transformed code. I recall there is such a library that would hopefully bring C++ to the level of Fortran when it comes to autodiff. This used to be the case for matrix manipulation as well, but now C++ is on the level of Fortran thanks to expression templates.

Edit: Something like this library http://www.met.reading.ac.uk/clouds/adept/ for C++.
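To illustrate what the plain operator-overloading approach looks like, and why it is slow without expression templates, here is a minimal forward-mode "dual number" sketch (illustrative only, not Adept's implementation):

Code: Select all

    // Naive operator-overloading forward-mode autodiff: each value carries its
    // derivatives with respect to N tuning parameters. Illustrative sketch only.
    #include <array>
    #include <cstddef>

    constexpr std::size_t N = 4;  // number of tuning parameters (illustrative)

    struct Dual {
        double value;
        std::array<double, N> deriv;  // d(value)/d(parameter_i)
    };

    // A tuning parameter seeds a 1 in its own derivative slot.
    Dual make_param(double v, std::size_t i) {
        Dual d{v, {}};
        d.deriv[i] = 1.0;
        return d;
    }

    Dual operator+(Dual const& a, Dual const& b) {
        Dual r{a.value + b.value, {}};
        for (std::size_t i = 0; i < N; ++i) r.deriv[i] = a.deriv[i] + b.deriv[i];
        return r;  // every operator returns a full temporary and runs its own loop
    }

    Dual operator*(Dual const& a, Dual const& b) {
        Dual r{a.value * b.value, {}};
        for (std::size_t i = 0; i < N; ++i)
            r.deriv[i] = a.deriv[i] * b.value + a.value * b.deriv[i];  // product rule
        return r;
    }
Every operator allocates a full temporary and runs its own loop over the derivative array; expression templates (and source-to-source transformation) exist precisely to fuse those loops and eliminate the temporaries.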
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: tool to create derivatives of a given function

Post by Rein Halbersma »

Daniel Shawul wrote: Edit: Something like this library http://www.met.reading.ac.uk/clouds/adept/ for C++.
Thanks, that's a useful link.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: tool to create derivatives of a given function

Post by Rein Halbersma »

AlvaroBegue wrote:
Rein Halbersma wrote: Re: the op+= example that you gave. There are some corner cases like x+=a; followed by x-=a, or self-assignment x-=x; that need careful unit testing to make sure they are correct. Or you need a general proof that no cycles between shared pointers can be made through your definitions. As soon as you have shared_ptr cycles, there's a bug.
I don't get what seems so scary about the operator+= I wrote. I am as convinced that it is correct as any other part of the library. In some sense my very primitive implementation, with a global vector that stores all the relevant partial derivatives, makes this easier to think about. The time ordering of the operations is proof that there is no loop.

But if you do find a case where it fails, I am ready to learn something new.
I now defined op+= inside Var as a member function.

Code: Select all

    Var& operator+=(Var const& R)
    {
        return *this = *this + R;
    }
See Live Example

Now everything has exactly the same signature and interface as regular types with overloaded operators ("do as the ints do"). At least the correctness is now obvious. Still, it's quite a cute thing with the shared pointers under the hood; I hadn't thought of that before you showed me.
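For example, the corner cases from earlier in the thread now work out exactly as for ints (constructing a Var from a double and an operator-= written in the same *this = *this - R style are assumptions here, not necessarily the library's exact interface):

Code: Select all

    Var a(2.0);
    Var x(3.0);

    x += a;   // builds the node for x + a first, then rebinds x to it
    x -= a;   // x is back to a value of 3; the nodes for a are untouched
    x -= x;   // self-assignment: the node for x - x is fully built before x
              // is rebound, so x simply becomes 0 with zero derivatives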
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: tool to create derivatives of a given function

Post by AlvaroBegue »

Rein Halbersma wrote:
AlvaroBegue wrote:
Rein Halbersma wrote: Re: the op+= example that you gave. There are some corner cases like x+=a; followed by x-=a, or self-assignment x-=x; that need careful unit testing to make sure they are correct. Or you need a general proof that no cycles between shared pointers can be made through your definitions. As soon as you have shared_ptr cycles, there's a bug.
I don't get what seems so scary about the operator+= I wrote. I am as convinced that it is correct as any other part of the library. In some sense my very primitive implementation, with a global vector that stores all the relevant partial derivatives, makes this easier to think about. The time ordering of the operations is proof that there is no loop.

But if you do find a case where it fails, I am ready to learn something new.
I now defined op+= inside Var as a member function.

Code: Select all

    Var& operator+=(Var const& R)
    {
        return *this = *this + R;
    }
See Live Example

Now everything has exactly the same signature and interface as regular types with overloaded operators ("do as the ints do"). At least the correctness is now obvious. Still, it's quite a cute thing with the shared pointers under the hood; I hadn't thought of that before you showed me.
Making the binary operators member functions or free functions is just a matter of taste. I prefer to have them as free functions, because the first operand is sometimes not an object of the class (e.g., double times vector), but it really makes no difference.
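A minimal illustration of the point, with a toy Vec type that has nothing to do with the autodiff library: a member operator* could never be called as 2.0 * v, because the left operand is not a Vec, while free functions handle both orders.

Code: Select all

    #include <array>

    struct Vec {
        std::array<double, 3> data;
    };

    // Free functions: both s * v and v * s work.
    Vec operator*(double s, Vec const& v) {
        return {{s * v.data[0], s * v.data[1], s * v.data[2]}};
    }

    Vec operator*(Vec const& v, double s) {
        return s * v;
    }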