The Azimuth Project
Automatic differentiation (Rev #5)

Automatic differentiation

Idea

Given a function defined by a section of computer code, automatic differentiation produces a piece of computer code which, except for certain extreme cases, computes its derivative. This page accumulates some notes on it.

Details

Forward rules

Extended number system based rules for forward automatic differentiation (here a prime denotes the derivative with respect to some basic variable, call it $x$, and $u$ denotes some general expression):

(1) variable $x$ gives $\langle x, 1 \rangle$
(2) variable $u \ne x$ gives $\langle u, 0 \rangle$
(3) constant $c$ gives $\langle c, 0 \rangle$
(4) $\langle u, u' \rangle + \langle v, v' \rangle = \langle u + v,\; u' + v' \rangle$
(5) $\langle u, u' \rangle - \langle v, v' \rangle = \langle u - v,\; u' - v' \rangle$
(6) $\langle u, u' \rangle * \langle v, v' \rangle = \langle u v,\; u' v + u v' \rangle$
(7) $\langle u, u' \rangle / \langle v, v' \rangle = \left\langle \frac{u}{v},\; \frac{u' v - u v'}{v^2} \right\rangle \quad (v \ne 0)$
(8) $\sin \langle u, u' \rangle = \langle \sin u,\; u' \cos u \rangle$
(9) $\cos \langle u, u' \rangle = \langle \cos u,\; -u' \sin u \rangle$
(10) $\exp \langle u, u' \rangle = \langle \exp u,\; u' \exp u \rangle$
(11) $\log \langle u, u' \rangle = \langle \log u,\; u'/u \rangle \quad (u \gt 0)$
(12) $\langle u, u' \rangle^k = \langle u^k,\; k u^{k-1} u' \rangle \quad (u \ne 0)$
(13) $\left| \langle u, u' \rangle \right| = \langle |u|,\; u' \operatorname{sign} u \rangle \quad (u \ne 0)$

For a two argument function

(14) $g(\langle u, u' \rangle, \langle v, v' \rangle) = \langle g(u,v),\; g^{(1)}(u,v)\, u' + g^{(2)}(u,v)\, v' \rangle$

where $g^{(1)}$ and $g^{(2)}$ are the derivatives of $g$ with respect to its first and second arguments, respectively.
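The forward rules above can be packaged as an overloaded number type. Here is a minimal Python sketch (the class name `Dual` and helper names are illustrative, not from the original); each operator implements the corresponding rule on pairs $\langle u, u' \rangle$:

```python
import math

class Dual:
    """A value paired with its derivative: the pair <u, u'>."""
    def __init__(self, val, dot=0.0):
        self.val = val   # u
        self.dot = dot   # u'

    @staticmethod
    def _lift(x):
        # Rule (3): a plain constant c becomes <c, 0>.
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):                      # rule (4)
        other = Dual._lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __sub__(self, other):                      # rule (5)
        other = Dual._lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __mul__(self, other):                      # rule (6)
        other = Dual._lift(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

    def __truediv__(self, other):                  # rule (7)
        other = Dual._lift(other)
        return Dual(self.val / other.val,
                    (self.dot * other.val - self.val * other.dot) / other.val**2)

def sin(u):  return Dual(math.sin(u.val),  u.dot * math.cos(u.val))   # rule (8)
def cos(u):  return Dual(math.cos(u.val), -u.dot * math.sin(u.val))   # rule (9)
def exp(u):  return Dual(math.exp(u.val),  u.dot * math.exp(u.val))   # rule (10)
def log(u):  return Dual(math.log(u.val),  u.dot / u.val)             # rule (11)

# d/dx [x * sin(x)] at x = 2 should be sin(2) + 2 cos(2).
x = Dual(2.0, 1.0)   # rule (1): the basic variable x gives <x, 1>
y = x * sin(x)
```

Seeding the basic variable with derivative 1 and every constant with derivative 0 is exactly rules (1)–(3); the arithmetic then propagates derivatives alongside values with no symbolic manipulation.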

Reverse formulation

Suppose we have a sequence of assignments of the form

(15) $x_i = f_i(\text{inputs})$

If each “variable” is only assigned to once (so-called SSA form) then the distinction between assignment and mathematical equality disappears. To clarify the equations, we use the convention that the final value $x_n$ is denoted by $r$. The goal is to compute the derivative of $r$ with respect to some variable $x_i$. The basic rule can be expressed as

(16) $\frac{d r}{d x_i} = \sum_{j \,:\, x_i \in \operatorname{inputs}(f_j)} \frac{d r}{d x_j} \frac{\partial f_j}{\partial x_i}$

This is basically the chain rule for a scalar function of vector-valued inputs, without any geometric underpinning and discarding terms known to be zero.
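To see rule (16) in action, here is a small worked example (not in the original). Take the SSA sequence $x_2 = \sin(x_1)$, $x_3 = x_1 x_2 = r$. Working backwards from $dr/dx_3 = 1$:

$$
\frac{d r}{d x_2} = \frac{d r}{d x_3} \frac{\partial f_3}{\partial x_2} = 1 \cdot x_1,
\qquad
\frac{d r}{d x_1} = \frac{d r}{d x_3} \frac{\partial f_3}{\partial x_1} + \frac{d r}{d x_2} \frac{\partial f_2}{\partial x_1} = x_2 + x_1 \cos x_1,
$$

which matches $\frac{d}{d x_1}\bigl( x_1 \sin x_1 \bigr) = \sin x_1 + x_1 \cos x_1$.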

Unfortunately this form is awkward to use in practice, so an alternative formulation involving accumulating values is given here.

Warning: these are independently derived and may need correcting. Denote $d r/d x_i$ by $r_{(i)}$.

Addition:

(17)
$$\begin{aligned}
x_i &= x_j + x_k \quad \text{gives} \\
r_{(j)} &\mathrel{+=} r_{(i)} \\
r_{(k)} &\mathrel{+=} r_{(i)}
\end{aligned}$$

Subtraction:

(18)
$$\begin{aligned}
x_i &= x_j - x_k \quad \text{gives} \\
r_{(j)} &\mathrel{+=} r_{(i)} \\
r_{(k)} &\mathrel{+=} - r_{(i)}
\end{aligned}$$

Multiplication:

(19)
$$\begin{aligned}
x_i &= x_j \times x_k \quad \text{gives} \\
r_{(j)} &\mathrel{+=} r_{(i)} \times x_k \\
r_{(k)} &\mathrel{+=} r_{(i)} \times x_j
\end{aligned}$$

Division:

(20)
$$\begin{aligned}
x_i &= x_j / x_k \quad \text{gives} \\
r_{(j)} &\mathrel{+=} r_{(i)} / x_k \\
r_{(k)} &\mathrel{+=} - r_{(i)} x_j / x_k^2, \quad \text{i.e.}\; - r_{(i)} x_i / x_k
\end{aligned}$$

Power:

(21)
$$\begin{aligned}
x_i &= x_j^{x_k} \quad \text{gives} \\
r_{(j)} &\mathrel{+=} r_{(i)} x_k x_j^{x_k - 1} \\
r_{(k)} &\mathrel{+=} r_{(i)} x_j^{x_k} \log x_j, \quad \text{i.e.}\; r_{(i)} x_i \log x_j
\end{aligned}$$

Exponential:

(22)
$$\begin{aligned}
x_i &= \exp x_j \quad \text{gives} \\
r_{(j)} &\mathrel{+=} r_{(i)} x_i
\end{aligned}$$

Logarithm:

(23)
$$\begin{aligned}
x_i &= \log x_j \quad \text{gives} \\
r_{(j)} &\mathrel{+=} r_{(i)} / x_j
\end{aligned}$$

Sine:

(24)
$$\begin{aligned}
x_i &= \sin x_j \quad \text{gives} \\
r_{(j)} &\mathrel{+=} r_{(i)} \cos x_j
\end{aligned}$$

Cosine:

(25)
$$\begin{aligned}
x_i &= \cos x_j \quad \text{gives} \\
r_{(j)} &\mathrel{+=} - r_{(i)} \sin x_j
\end{aligned}$$

Absolute value:

(26)
$$\begin{aligned}
x_i &= |x_j| \quad \text{gives} \\
r_{(j)} &\mathrel{+=} r_{(i)} \operatorname{sign} x_j
\end{aligned}$$
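The accumulation rules above amount to a forward sweep that evaluates each $x_i$, followed by a backward sweep that applies the $\mathrel{+=}$ updates in reverse order. A minimal Python sketch (the tape representation and function names are illustrative, not from the original; only a few of the operations are included):

```python
import math

def reverse_ad(tape, n_inputs, inputs):
    """tape: list of (op, j, k) entries, one per SSA assignment x_i = f_i(x_j, x_k).
    Returns the final value r = x_n and the gradients r_(i) for the inputs."""
    # Forward sweep: evaluate every x_i in order.
    x = list(inputs) + [None] * len(tape)
    for i, (op, j, k) in enumerate(tape, start=n_inputs):
        if op == '+':
            x[i] = x[j] + x[k]
        elif op == '*':
            x[i] = x[j] * x[k]
        elif op == 'sin':
            x[i] = math.sin(x[j])
        elif op == 'exp':
            x[i] = math.exp(x[j])
    # Backward sweep: r_(i) = dr/dx_i, seeded with r_(n) = 1,
    # applying the += rules in reverse assignment order.
    r = [0.0] * len(x)
    r[-1] = 1.0
    for i in range(len(x) - 1, n_inputs - 1, -1):
        op, j, k = tape[i - n_inputs]
        if op == '+':            # rule (17)
            r[j] += r[i]
            r[k] += r[i]
        elif op == '*':          # rule (19)
            r[j] += r[i] * x[k]
            r[k] += r[i] * x[j]
        elif op == 'sin':        # rule (24)
            r[j] += r[i] * math.cos(x[j])
        elif op == 'exp':        # rule (22): reuses the forward value x_i
            r[j] += r[i] * x[i]
    return x[-1], r[:n_inputs]

# r = x0 * sin(x0) in SSA form: x1 = sin(x0); x2 = x0 * x1.
tape = [('sin', 0, None), ('*', 0, 1)]
val, grads = reverse_ad(tape, 1, [2.0])
```

One backward sweep yields the derivative of $r$ with respect to every input at once, which is why reverse accumulation is preferred when there are many inputs and a single scalar output.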

References