The Azimuth Project
Automatic differentiation

Given a function defined by a piece of computer code, automatic differentiation produces code that, except in certain extreme cases, computes the function's derivative. This page accumulates some notes on the technique.


There are three main advantages to automatic differentiation:

  1. If a function, particularly one that is subject to change, is naturally expressed as program code, it is generally less error-prone to automatically differentiate the code rather than manually differentiate a conventional symbolic expression and manually convert it to code. (This is particularly true for code involving loops, but we ignore this on this page for now.)

  2. It requires less “tuning” than estimating derivatives by finite differencing (as finite differencing requires choosing the step size carefully to balance machine precision/rounding errors against the rate at which the function changes, particularly if the function is defined piecewise). This is most important in cases where slightly inaccurate gradients may cause some procedure to fail to converge.

  3. It is generally significantly more efficient than finite differencing (as common values can mostly be detected and reused, rather than recomputed). However, this is of less importance than the “correctness” issue in 2.
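The step-size sensitivity mentioned in point 2 is easy to demonstrate. The sketch below (an illustration not from the original page; the function and step sizes are arbitrary choices) compares a one-sided finite-difference estimate of a derivative at two step sizes: a moderate step gives a good estimate, while a step near machine precision is dominated by rounding error.

```python
import math

def forward_diff(f, x, h):
    """One-sided finite-difference estimate of f'(x) with step h."""
    return (f(x + h) - f(x)) / h

f = math.sin
x = 1.0
exact = math.cos(x)  # d/dx sin(x) = cos(x)

# A moderate step size gives a reasonable estimate...
err_good = abs(forward_diff(f, x, 1e-6) - exact)

# ...but a step near machine precision is dominated by cancellation
# in f(x + h) - f(x), so the estimate degrades badly.
err_tiny = abs(forward_diff(f, x, 1e-14) - exact)
```

Automatic differentiation avoids this tuning problem entirely, since no differencing of nearby function values is involved.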

Forward rules

Extended number system based rules for forward automatic differentiation (here a prime denotes derivative with respect to some basic variable – let’s call this $x$ – and $u$ denotes some general expression):

(1) variable \quad x \quad gives \quad \langle x, 1 \rangle
(2) variable \quad u \ne x \quad gives \quad \langle u, 0 \rangle
(3) constant \quad c \quad gives \quad \langle c, 0 \rangle
(4) \langle u,u'\rangle + \langle v,v'\rangle = \langle u+v, u'+v' \rangle
(5) \langle u,u'\rangle - \langle v,v'\rangle = \langle u-v, u'-v' \rangle
(6) \langle u,u'\rangle * \langle v,v'\rangle = \langle u v, u' v + u v' \rangle
(7) \langle u,u'\rangle / \langle v,v'\rangle = \left\langle \frac{u}{v}, \frac{u' v - u v'}{v^2} \right\rangle \quad (v \ne 0)
(8) \sin\langle u,u'\rangle = \langle \sin(u), u' \cos(u) \rangle
(9) \cos\langle u,u'\rangle = \langle \cos(u), -u' \sin(u) \rangle
(10) \exp\langle u,u'\rangle = \langle \exp u, u' \exp u \rangle
(11) \log\langle u,u'\rangle = \langle \log(u), u'/u \rangle \quad (u \gt 0)
(12) \langle u,u'\rangle^k = \langle u^k, k u^{k-1} u' \rangle \quad (u \ne 0)
(13) \left| \langle u,u'\rangle \right| = \langle \left| u \right|, u' \sign u \rangle \quad (u \ne 0)
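Rules (1)–(13) above can be implemented directly by overloading arithmetic on value/derivative pairs, often called dual numbers. The sketch below (class and function names are my own, for illustration) implements a subset of the rules and uses them to differentiate an expression built from `*`, `+`, `sin` and `exp`:

```python
import math

class Dual:
    """A value paired with its derivative: the <u, u'> of the rules above."""
    def __init__(self, val, dot=0.0):
        self.val = val   # u
        self.dot = dot   # u'

    def _lift(self, other):
        # Rule (3): a plain number is a constant with derivative 0.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):                      # rule (4)
        other = self._lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __sub__(self, other):                      # rule (5)
        other = self._lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __mul__(self, other):                      # rule (6)
        other = self._lift(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

    def __truediv__(self, other):                  # rule (7)
        other = self._lift(other)
        return Dual(self.val / other.val,
                    (self.dot * other.val - self.val * other.dot) / other.val**2)

def sin(u): return Dual(math.sin(u.val),  u.dot * math.cos(u.val))  # rule (8)
def cos(u): return Dual(math.cos(u.val), -u.dot * math.sin(u.val))  # rule (9)
def exp(u): return Dual(math.exp(u.val),  u.dot * math.exp(u.val))  # rule (10)
def log(u): return Dual(math.log(u.val),  u.dot / u.val)            # rule (11)

# Differentiate f(x) = x*sin(x) + exp(x) at x = 0.5.
x = Dual(0.5, 1.0)          # rule (1): the basic variable carries derivative 1
y = x * sin(x) + exp(x)
# y.dot is then sin(0.5) + 0.5*cos(0.5) + exp(0.5), as the product
# and chain rules predict.
```

Running the program once evaluates the function and its derivative together, which is the sense in which forward mode reuses common values.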

For a two-argument function

(14) g(\langle u,u' \rangle, \langle v,v' \rangle) = \langle g(u,v), g^{(1)}(u,v) u' + g^{(2)}(u,v) v' \rangle

where $g^{(1)}$ and $g^{(2)}$ are the derivatives of $g$ with respect to its first and second arguments, respectively.
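As a consistency check, taking $g(u,v) = u v$ in (14) gives $g^{(1)}(u,v) = v$ and $g^{(2)}(u,v) = u$, so

$$ g(\langle u,u' \rangle, \langle v,v' \rangle) = \langle u v, v u' + u v' \rangle $$

which recovers the product rule (6).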

Reverse formulation

All this is just for first derivatives. Does anyone know of a reference for directly computing second derivatives? (Most references just say “differentiate the once-differentiated code”, but that’s rather awkward and inefficient.) Having said that, looking around it appears that using even exact second derivatives in optimisation decreases the basin of convergence, so it’s probably not worth the effort – David Tweed

Suppose we have a sequence of assignments of the form

(15) x_i = f_i(\mathrm{inputs})

If each “variable” is only assigned to once (so-called static single assignment, or SSA, form) then the distinction between assignment and mathematical equality disappears. In order to clarify the equations, we use the convention that the final value $x_n$ is denoted by $r$. The goal is to compute the derivative of $r$ with respect to some variable $x_i$. The basic rule can be expressed as

(16) \frac{d r}{d x_i} = \sum_{j \,:\, x_i \in \mathrm{inputs}(f_j)} \frac{d r}{d x_j} \frac{\partial f_j}{\partial x_i}

This is basically the chain rule for a scalar function of vector-valued inputs, without any geometric underpinning and discarding terms known to be zero.
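As a small worked example (not in the original), decompose $f(x) = x \sin x$ into the SSA sequence

$$ x_1 = x, \qquad x_2 = \sin x_1, \qquad x_3 = x_1 \times x_2 $$

with $r = x_3$. Then $d r/d x_3 = 1$, rule (16) gives $d r/d x_2 = \frac{d r}{d x_3} \frac{\partial (x_1 x_2)}{\partial x_2} = x_1$, and since $x_1$ is an input of both $f_2$ and $f_3$,

$$ \frac{d r}{d x_1} = \frac{d r}{d x_2} \cos x_1 + \frac{d r}{d x_3} x_2 = x_1 \cos x_1 + \sin x_1 $$

which agrees with $(x \sin x)' = \sin x + x \cos x$.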

Unfortunately this form is awkward to use in practice, so an alternative formulation involving accumulating values is given here.

Warning: these are independently derived and may need correcting. Denote $d r/d x_i$ by $r_{(i)}$.


(17) \begin{aligned} x_i &= x_j + x_k \quad gives\\ r_{(j)} &\mathrel{+=} r_{(i)}\\ r_{(k)} &\mathrel{+=} r_{(i)} \end{aligned}

(18) \begin{aligned} x_i &= x_j - x_k \quad gives\\ r_{(j)} &\mathrel{+=} r_{(i)}\\ r_{(k)} &\mathrel{+=} -r_{(i)} \end{aligned}

(19) \begin{aligned} x_i &= x_j \times x_k \quad gives\\ r_{(j)} &\mathrel{+=} r_{(i)} \times x_k\\ r_{(k)} &\mathrel{+=} r_{(i)} \times x_j \end{aligned}

(20) \begin{aligned} x_i &= x_j / x_k \quad gives\\ r_{(j)} &\mathrel{+=} r_{(i)} / x_k\\ r_{(k)} &\mathrel{+=} -r_{(i)} x_j / x_k^2 \quad ie \quad -r_{(i)} x_i / x_k \end{aligned}

(21) \begin{aligned} x_i &= x_j^{x_k} \quad gives\\ r_{(j)} &\mathrel{+=} r_{(i)} x_k x_j^{x_k - 1} \quad ie \quad r_{(i)} x_i x_k / x_j\\ r_{(k)} &\mathrel{+=} r_{(i)} x_j^{x_k} \log x_j \quad ie \quad r_{(i)} x_i \log x_j \end{aligned}

(22) \begin{aligned} x_i &= \exp x_j \quad gives\\ r_{(j)} &\mathrel{+=} r_{(i)} x_i \end{aligned}

(23) \begin{aligned} x_i &= \log x_j \quad gives\\ r_{(j)} &\mathrel{+=} r_{(i)} / x_j \end{aligned}

(24) \begin{aligned} x_i &= \sin x_j \quad gives\\ r_{(j)} &\mathrel{+=} r_{(i)} \cos x_j \end{aligned}

(25) \begin{aligned} x_i &= \cos x_j \quad gives\\ r_{(j)} &\mathrel{+=} -r_{(i)} \sin x_j \end{aligned}

Absolute value:

(26) \begin{aligned} x_i &= |x_j| \quad gives\\ r_{(j)} &\mathrel{+=} r_{(i)} \sign x_j \end{aligned}
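The accumulation rules above translate directly into a “tape” implementation: record each SSA assignment during the forward evaluation, then sweep the tape backwards applying the relevant $\mathrel{+=}$ updates. The following minimal sketch (class and operation names are my own, covering only a few of the rules above) computes all partial derivatives of the output in a single reverse sweep:

```python
import math

class Tape:
    """Record SSA assignments x_i = f_i(inputs); replay in reverse for gradients."""
    def __init__(self):
        self.vals = []   # x_i values, in SSA order
        self.ops = []    # (kind, i, j, k) with j, k the input indices (or None)

    def _new(self, val, kind, j=None, k=None):
        i = len(self.vals)
        self.vals.append(val)
        self.ops.append((kind, i, j, k))
        return i

    def var(self, val):  return self._new(val, 'input')
    def add(self, j, k): return self._new(self.vals[j] + self.vals[k], 'add', j, k)
    def mul(self, j, k): return self._new(self.vals[j] * self.vals[k], 'mul', j, k)
    def sin(self, j):    return self._new(math.sin(self.vals[j]), 'sin', j)
    def exp(self, j):    return self._new(math.exp(self.vals[j]), 'exp', j)

    def grad(self):
        """Reverse sweep: r[i] accumulates r_(i) = dr/dx_i, seeded by r_(n) = 1."""
        r = [0.0] * len(self.vals)
        r[-1] = 1.0
        for kind, i, j, k in reversed(self.ops):
            if kind == 'add':                         # rule (17)
                r[j] += r[i]
                r[k] += r[i]
            elif kind == 'mul':                       # rule (19)
                r[j] += r[i] * self.vals[k]
                r[k] += r[i] * self.vals[j]
            elif kind == 'sin':                       # rule (24)
                r[j] += r[i] * math.cos(self.vals[j])
            elif kind == 'exp':                       # rule (22): x_i = exp x_j
                r[j] += r[i] * self.vals[i]
        return r

# f(x) = x*sin(x) + exp(x), all partials from one reverse sweep:
t = Tape()
x = t.var(0.5)
y = t.add(t.mul(x, t.sin(x)), t.exp(x))
g = t.grad()
# g[x] is then sin(0.5) + 0.5*cos(0.5) + exp(0.5).
```

Note that each intermediate value is written once and its adjoint accumulated with $\mathrel{+=}$, exactly as in the rules; this is why reverse mode yields the whole gradient at roughly the cost of one extra function evaluation.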


See also separable function on Azimuth.

A web site listing lots of Automatic Differentiation software: