Showing changes from revision #8 to #9:
Added | Removed | Changed
Given a function defined by a section of computer code, automatic differentiation produces a piece of computer code which, except for certain extreme cases, computes its derivative function. This page accumulates some notes on regarding it.
There are three main advantages to automatic differentiation:
If a function, particuarly one that is subject to change, is naturally expressed as program code, it is generally less error-prone to automatically differentiate the code rather than manually differentiate a conventional symbolic expression and manually convert it to code. (This is particulary true for code involving loops, but we ignore this on this page for now.)
It requires less “tuning” than estimating derivatives by finite differencing (as finite differencing requires care about the offset magnitude to best deal with machine precision/rounding errors and function change, particularly if the function may be defined using multiple pieces). This is most important in cases where slightly inaccurate gradients may cause some procedure to fail to converge.
It is generally significantly more efficient that finite differencing (as common values can mostly be detected and reused, rather than recomputed). However, this is of less importance than the “correctness” issue in 2.
Extended number system based rules for forward automatic differentiation (here a prime denotes derivative with respect to some basic variable – let’s call this $x$ – and $u$ denotes some general expression):
For a two argument function
where $g^{(1)}$ and $g^{(2)}$ are the derivatives of $g$ with respect to its first and second arguments, respectively.
All this is just for first derivaties. Does anyone know of any reference for directly computing second derivatives? (Most refs just say “differentiate the once-differentiated code code”, but that’s rather awkward and inefficient.) Having said that, looking around it appears using even exact second derivatives in optimisation decreases the basin of convergence, so it’s probably not worth the effort – David Tweed
Suppose we have a sequence of assignments of the form
If each “variable” is only assigned to once (so-called SSA form) then the distinction between assignment and mathematical equality disappears. In order to clarify the equations, we use the convention that the final value $x_n$ is denoted by $r$. The goal is to compute the derivative of $r$ with respect to some variable $x_i$. The basic rule can be expressed as
This is basically the chain rule for a scalar function of vector valued inputs, without any geometric underpinning and discarding terms known to be zero.
Unfortunately this form is awkward to sue in practice, so an alternative formulation involving accumulating values is given here.
Warning: these are independently derived and may need correcting: Denote $d r/d x_i$ by $r_{(i)}$.
Addition:
Subtraction:
Mulitplication:
Division:
Power:
Exponential:
Logarithm:
Sin:
Cosine:
Absolute value: