# Automatic differentiation

## Idea

Given a function defined by a section of computer code, automatic differentiation produces a piece of computer code which, except for certain extreme cases, computes its derivative function. This page accumulates some notes of it.

## Details

### Forward rules

Extended number system based rules for forward automatic differentiation (here a prime denotes derivative with respect to some basic variable – let’s call this $x$ – and $u$ denotes some general expression):

(1)$variable \quad x \quad gives \quad \langle x, 1 \rangle$
(2)$variable \quad u \ne x \quad gives \quad \langle u, 0 \rangle$
(3)$constant \quad c \quad gives \quad \langle c, 0 \rangle$
(4)$\langle u,u'\rangle +\langle v,v'\rangle = \langle u+v, u'+v' \rangle$
(5)$\langle u,u'\rangle -\langle v,v'\rangle = \langle u-v, u'-v' \rangle$
(6)$\langle u,u'\rangle *\langle v,v'\rangle = \langle u v, u' v+u v' \rangle$
(7)$\langle u,u'\rangle /\langle v,v'\rangle = \left\langle \frac{u}{v}, \frac{u' v-u v'}{v^2} \right\rangle \quad ( v\ne 0)$
(8)$\sin\langle u,u'\rangle = \langle \sin(u) , u' \cos(u) \rangle$
(9)$\cos\langle u,u'\rangle = \langle \cos(u) , -u' \sin(u) \rangle$
(10)$\exp\langle u,u'\rangle = \langle \exp u , u' \exp u \rangle$
(11)$\log\langle u,u'\rangle = \langle \log(u) , u'/u \rangle \quad (u \gt 0)$
(12)$\langle u,u'\rangle^k = \langle u^k , k u^{k-1} u' \rangle \quad (u \ne 0)$
(13)$\left| \langle u,u'\rangle \right| = \langle \left| u \right|, u' \sign u \rangle \quad (u \ne 0)$

For a two argument function

(14)$g(\langle u,u' \rangle , \langle v,v' \rangle ) = \langle g(u,v) , g^{(1)}(u,v) u' + g^{(2)}(u,v) v' \rangle$

where $g^{(1)}$ and $g^{(2)}$ are the derivatives of $g$ with respect to its first and second arguments, respectively.

## Reverse formulation

Suppose we have a sequence of assignments of the form

(15)$x_i\mathrel{=}f_i( inputs )$

If each “variable” is only assigned to once (so-called SSA form) then the distinction between assignment and mathematical equality disappears. In order to clarify the equations, we use the convention that the final value $x_n$ is denoted by $r$. The goal is to compute the derivative of $r$ with respect to some variable $x_i$. The basic rule can be expressed as

(16)$\frac{d r}{d x_i}\mathrel{=}\sum_{ins \quad j \quad with\quad i} \frac{d r}{d x_j} \frac{\partial f_j}{\partial x_i}$

This is basically the chain rule for a scalar function of vector valued inputs, without any geometric underpinning and discarding terms known to be zero.

Unfortunately this form is awkward to sue in practice, so an alternative formulation involving accumulating values is given here.

Warning: these are independently derived and may need correcting: Denote $d r/d x_i$ by $r_{(i)}$.

(17)\begin{aligned} x_i &= x_j + x_k \quad gives\\ r_{(j)} &{\mathrel{+=}} r_{(i)}\\ r_{(k)} &{\mathrel{+=}} r_{(i)} \end{aligned}

Subtraction:

(18)\begin{aligned} x_i &= x_j - x_k \quad gives\\ r_{(j)} &{\mathrel{+=}} r_{(i)}\\ r_{(k)} &{\mathrel{+=}} - r_{(i)} \end{aligned}

Mulitplication:

(19)\begin{aligned} x_i &= x_j \times x_k \quad gives\\ r_{(j)} &{\mathrel{+=}} r_{(i)} \times x_k\\ r_{(k)} &{\mathrel{+=}} r_{(i)} \times x_j \end{aligned}

Division:

(20)\begin{aligned} x_i &= x_j / x_k \quad gives\\ r_{(j)} &{\mathrel{+=}} r_{(i)} / x_k\\ r_{(k)} &{\mathrel{+=}} - r_{(i)} x_j / (x_k^2)\\ &{ie}\quad - r_{(i)} x_i / x_k \end{aligned}

Power:

(21)\begin{aligned} x_i &= x_j ^ {x_k} \quad gives\\ r_{(j)} &{\mathrel{+=}} r_{(i)} x_k x_j ^ {x_k-1}\\ r_{(k)} &{\mathrel{+=}} r_{(i)} x_j ^ {x_k} \log x_j\\ &{ie} \quad r_{(i)} x_i \log x_j \end{aligned}

Exponential:

(22)\begin{aligned} x_i &= \exp x_j \quad gives\\ r_{(j)} &{\mathrel{+=}} r_{(i)} x_i \end{aligned}

Logarithm:

(23)\begin{aligned} x_i &= \log {x_j} \quad gives\\ r_{(j)} &{\mathrel{+=}} r_{(i)} / x_j \end{aligned}

Sin:

(24)\begin{aligned} x_i &= \sin {x_j} \quad gives\\ r_{(j)} &{\mathrel{+=}} r_{(i)} \cos {x_j} \end{aligned}

Cosine:

(25)\begin{aligned} x_i &= \cos {x_j} \quad gives\\ r_{(j)} &{\mathrel{+=}} - r_{(i)} \sin {x_j} \end{aligned}

Absolute value:

(26)\begin{aligned} x_i &= |{x_j}| \quad gives\\ r_{(j)} &{\mathrel{+=}} r_{(i)} \sign {x_j} \end{aligned}