The Azimuth Project
Stochastic differential equation



Stochastic (ordinary) differential equations, or SDEs, are differential equations that describe certain random processes. They are used in statistical physics to model systems subject to both a deterministic influence and a random influence. The simplest example is a classical particle in a potential coupled to a heat bath. Physicists often refer to the corresponding class of equations as ‘the’ Langevin equation.

The weather, and the climate, can be modelled using similar but more complicated SDEs. For example, the El Niño/Southern Oscillation is an important phenomenon centered in the Pacific Ocean, which displays irregular oscillatory behavior and has been studied using stochastic differential equations. For more, see ENSO.

Numerical SDE solvers can also be used to synthesize random processes, producing synthetic time series on which techniques of time series analysis can be tested when real data is not available.

This page is about ordinary differential equations only; for the partial differential equation case, see stochastic partial differential equations.

Mathematical Details

We will concentrate on one-dimensional ordinary SDEs, the Itô calculus, and integration with respect to Brownian motion (also known as the ‘Wiener process’).

Motivation from physics, Langevin equation

We’ll deliberately switch to the sloppy physicist’s language in this paragraph.

We have a system that can be modelled as a particle moving in a potential V subject to friction proportional to its velocity and random influences from the coupling to a heat bath. Let t \in \mathbb{R} be the time, and write x(t) for the space coordinate of our particle and \dot{x} for its time derivative.

For a non-relativistic classical particle, the equation of motion is:

m \, \ddot{x} = -V'(x) - \gamma \dot{x} + \sqrt{2D} \, \dot{W}

Here the constant m is the mass of our particle. The constant \gamma describes a frictional force which depends linearly on the velocity of the particle; this ansatz for the friction is called Newtonian friction.

It is the term \sqrt{2D} \, \dot{W} that makes this differential equation stochastic: it describes the influence of the heat bath. The constant D is often called the diffusion constant or diffusion coefficient. W is not a fixed function of time; it is a random function, or technically a random process, called the Wiener process, also known as Brownian motion or red noise. \dot{W} is used to denote the ‘time derivative’ of the Wiener process, which is called white noise. Physicists often speak as if white noise were a random process. Strictly speaking there is no random process that is the time derivative of the Wiener process, but it is possible to construct white noise as a distribution-valued random process in a framework called the ‘white noise calculus’. Luckily, it is not necessary to understand this in any more detail for what follows.
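Although white noise is not an honest random process, the Wiener process itself is easy to sample on a time grid, since its increments over disjoint intervals are independent Gaussian random variables with mean 0 and variance equal to the interval length. A minimal sketch in Python (the function name is ours):

```python
import numpy as np

def wiener_path(n_steps, dt, rng=None):
    """Sample a discretized Wiener path W(0), W(dt), ..., W(n_steps * dt),
    using the fact that the increments W(t + dt) - W(t) are independent
    Gaussian random variables with mean 0 and variance dt."""
    rng = np.random.default_rng() if rng is None else rng
    increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    return np.concatenate(([0.0], np.cumsum(increments)))

# One path on [0, 1] with 1000 steps; W(0) = 0 by construction.
path = wiener_path(1000, 0.001, rng=np.random.default_rng(42))
```

Averaged over many such paths, the endpoint W(1) has variance 1, as it should.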

If we assume our particle moves in the ‘high friction limit’ where the acceleration is negligible, we can rewrite the equation as

\dot{x} = -V'(x) + \sqrt{2D} \, \dot{W}

absorbing the constants m and \gamma into the potential function V and the diffusion constant D.

This is the physicists’ and chemists’ way of writing what mathematicians would express as the stochastic differential equation

dX_t = f(X_t, t) \, dt + g(X_t, t) \, dW_t

Motivation from physics II, Fokker-Planck equation

Every Langevin equation or SDE of the form

dX_t = f(X_t, t) \, dt + g(X_t, t) \, dW_t

has an associated Fokker-Planck equation (as physicists say), a.k.a. Kolmogorov forward equation (as mathematicians say), that describes the time evolution of the probability distribution of the solution process. From the physicist’s point of view, this probability distribution tells us the probability of finding the particle at a certain place at a certain time.

The mathematical theorem making all of this precise is the Feynman-Kac formula.

In our example, the associated Fokker-Planck equation is:

\frac{\partial}{\partial t} P(x, t | x', t') = \left( - \frac{\partial}{\partial x} f(x, t) + \frac{1}{2} \frac{\partial^2}{\partial x^2} g^2(x, t) \right) P(x, t | x', t')

with P(x, t | x', t') denoting the probability density of finding the particle at position x at time t, given that it was at position x' at time t'.

If the particle starts at position y at time 0, the initial condition is

P(x, 0 | y, 0) = \delta(x - y)
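One consequence of the Fokker-Planck equation that is easy to check numerically: for the overdamped Langevin equation \dot{x} = -V'(x) + \sqrt{2D} \dot{W}, the stationary solution is proportional to \exp(-V(x)/D), so for the quadratic potential V(x) = x^2/2 a long simulated trajectory should have empirical variance close to D. A hedged sketch (the function name and all parameter values are illustrative):

```python
import numpy as np

def simulate_overdamped(V_prime, D, x0, dt, n_steps, rng):
    """Euler scheme for the overdamped Langevin equation
    dx = -V'(x) dt + sqrt(2D) dW."""
    noise = rng.normal(0.0, np.sqrt(2 * D * dt), size=n_steps)
    x = x0
    xs = np.empty(n_steps)
    for i in range(n_steps):
        x += -V_prime(x) * dt + noise[i]
        xs[i] = x
    return xs

# Quadratic potential V(x) = x^2 / 2, so V'(x) = x and the stationary
# Fokker-Planck solution exp(-V(x)/D) is a Gaussian with variance D.
D = 0.5
xs = simulate_overdamped(lambda x: x, D, 0.0, 0.01, 200_000,
                         np.random.default_rng(1))
# After discarding a burn-in, the empirical variance should be near D.
stationary_var = xs[10_000:].var()
```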

Mathematical theory

We’ll collect here some mathematical theorems and other results that are needed for the models and simulations on the Azimuth Project. For starters, note that the usual chain rule of calculus is not valid for Itô stochastic processes; one has to use the Itô formula instead.

Qualitative analysis

Some theory has been developed for understanding the qualitative properties of SDEs, that is, for figuring out things about families of solutions without actually solving the SDE in question. Schenk-Hoppé (1998) writes,

Attractors play an important role in the study of the asymptotic behavior of dynamical systems. There is an extensive literature dealing with attractors of deterministic dynamical systems. For stochastic dynamical systems, however, comparably little progress has been made by now.

The first salient point is that the notion of an “attractor” has to be changed. In a deterministic, unperturbed dynamical system, we can talk about a region A in phase space which evolves into itself over any length of time, \varphi(t) A = A for all t, and which has a basin of attraction U within which the distance between a point and the attractor A tends to zero. In the stochastic case, the attractor becomes a family of phase-space regions A(W) dependent on the perturbation W. Convergence of points within the basin to the attractor itself is then something which happens “almost surely”.

Two different approaches to stochastic bifurcation have been pursued so far, the “physicist’s” (P) and the “dynamical” (D) approach, see [L. Arnold et al. (1995)]. Suppose a random dynamical system which depends on a parameter is given. Then the approach called (P)-bifurcation deals with qualitative changes of stationary measures when a parameter is varied. It is particularly useful for stochastic differential equations, since densities of stationary measures are delivered by the Fokker–Planck equation. (P)-bifurcations have been studied intensively by engineers […]

The approach called (D)-bifurcation studies the loss of stability of invariant measures and the occurrence of new invariant measures when a parameter is varied. It is tied to sample stability through the use of Lyapunov exponents (given by the multiplicative ergodic theorem) to characterize stability of invariant measures. This approach has proven to be very successful.

Numerical Solvers

It is possible to solve SDEs numerically, in the sense that one can compute numerical approximations to certain properties of the stochastic process that solves a given SDE. Algorithms for solving SDEs numerically are generalizations of algorithms used for solving ordinary differential equations (ODEs).

For some general terminology see discrete approximation scheme.

The Euler Scheme

Let X_t be an Itô process that is a solution to the SDE

dX_t = f(X_t, t) \, dt + g(X_t, t) \, dW_t

on [0, T] with the initial condition X_0 = C \in \mathbb{R}. For a given equidistant discretization 0 = \tau_0 < \tau_1 < \dots < \tau_n = T of [0, T], an Euler approximation is a stochastic process Y satisfying the recursion relation

Y_{n+1} = Y_n + f(Y_n, \tau_n) (\tau_{n+1} - \tau_n) + g(Y_n, \tau_n) (W(\tau_{n+1}) - W(\tau_n))

and Y_0 = C. If we compare this approximation to the usual Euler scheme for an ODE, we see that in order to compute Y numerically we need random numbers with a Gaussian distribution, since the increment W(\tau_{n+1}) - W(\tau_n) is distributed according to N(0, \tau_{n+1} - \tau_n). For this, we need a random number generator.
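In code, this recursion is straightforward to implement. Here is a minimal sketch in Python (the function name and the example parameters are ours, not from any particular reference):

```python
import numpy as np

def euler_maruyama(f, g, x0, T, n, rng):
    """Euler scheme for dX_t = f(X_t, t) dt + g(X_t, t) dW_t on [0, T],
    with n equidistant steps and initial condition x0. The Wiener
    increments are drawn as independent N(0, dt) random numbers."""
    dt = T / n
    t, y = 0.0, x0
    ys = np.empty(n + 1)
    ys[0] = y
    for i in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))
        y = y + f(y, t) * dt + g(y, t) * dW
        t += dt
        ys[i + 1] = y
    return ys

# Sanity check: with g = 0 the scheme reduces to the ordinary Euler
# scheme, so dX = -X dt, X_0 = 1 should give roughly exp(-1) at T = 1.
path = euler_maruyama(lambda x, t: -x, lambda x, t: 0.0,
                      1.0, 1.0, 1000, np.random.default_rng(0))
```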

The simplest scheme that achieves a higher order of strong convergence (see below) than the Euler scheme is the Milstein scheme.
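For a scalar, autonomous SDE dX_t = f(X_t) dt + g(X_t) dW_t, the Milstein scheme adds the correction term \frac{1}{2} g g' ((\Delta W)^2 - \Delta t) to each Euler step. A sketch with our own function names; the sanity check below only exercises the degenerate deterministic case g = 0:

```python
import numpy as np

def milstein(f, g, g_prime, x0, T, n, rng):
    """Milstein scheme for a scalar autonomous SDE
    dX_t = f(X_t) dt + g(X_t) dW_t: the Euler step plus the
    correction (1/2) g g' (dW^2 - dt), which raises the strong
    order from 0.5 to 1.0."""
    dt = T / n
    y = x0
    for _ in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))
        y = (y + f(y) * dt + g(y) * dW
             + 0.5 * g(y) * g_prime(y) * (dW * dW - dt))
    return y

# Degenerate sanity check: with g = 0 the scheme reduces to the Euler
# scheme for dX = 0.05 X dt, so the result should be near exp(0.05).
y = milstein(lambda x: 0.05 * x, lambda x: 0.0, lambda x: 0.0,
             1.0, 1.0, 1000, np.random.default_rng(0))
```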

Orders of Convergence

We will use the nomenclature of the preceding paragraph about the Euler scheme.

We need to generalize the concept of convergence of a discrete approximation to the solution of an ODE to the stochastic setting. There are several ways to do this; two common ones are:

Let \delta := \max_n (\tau_{n+1} - \tau_n) be the maximum time step of a given discretization of [0, T].


Strong Convergence

A discrete approximation Y to an Itô process X converges strongly if

\lim_{\delta \to 0} \; E(|X_T - Y_T|) = 0

It is said to be of strong order \gamma if there is a constant C such that

E(|X_T - Y_T|) \leq C \delta^{\gamma}

As we see from the definition, a strong approximation needs to approximate the original process X_t pathwise.


Weak Convergence

A discrete approximation Y to an Itô process X converges weakly with respect to a given class of functions G if for every function g \in G we have

\lim_{\delta \to 0} \; \left| E(g(X_T)) - E(g(Y_T)) \right| = 0

where we assume that all appearing expectation values exist and are finite.

It is said to be of weak order \gamma with respect to G if there is a constant C such that for every function g \in G:

\left| E(g(X_T)) - E(g(Y_T)) \right| \leq C \delta^{\gamma}

We speak simply of weak convergence when G is the set of all polynomials.

So weak convergence means that the approximation needs to reproduce the probability distribution of the original process X_T. Weak and strong convergence are indeed different concepts; for example:

The Euler scheme is of strong order 0.5 and of weak order 1.0.
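This claim can be checked empirically on geometric Brownian motion dX_t = \mu X_t \, dt + \sigma X_t \, dW_t, whose exact solution X_T = X_0 \exp((\mu - \sigma^2/2) T + \sigma W_T) is known in closed form. The sketch below (all parameter values are illustrative) measures the mean pathwise endpoint error of the Euler scheme at several step sizes, reusing the same Brownian increments for the exact solution; the fitted log-log slope of error against step size should come out near the strong order 0.5:

```python
import numpy as np

def strong_error_gbm(mu, sigma, x0, T, n_fine, levels, n_paths, rng):
    """Mean endpoint error E|X_T - Y_T| of the Euler scheme applied to
    geometric Brownian motion, at several step sizes. The exact
    solution X_T = x0 * exp((mu - sigma^2/2) T + sigma W_T) is
    evaluated on the same Brownian paths as the approximations."""
    dt_fine = T / n_fine
    dWs = rng.normal(0.0, np.sqrt(dt_fine), size=(n_paths, n_fine))
    X_T = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * dWs.sum(axis=1))
    errors = []
    for R in levels:                       # coarsening factors
        dt = R * dt_fine
        Y = np.full(n_paths, float(x0))
        for k in range(n_fine // R):
            dW = dWs[:, k * R:(k + 1) * R].sum(axis=1)
            Y = Y + mu * Y * dt + sigma * Y * dW
        errors.append(np.mean(np.abs(X_T - Y)))
    return np.array(errors)

levels = [1, 2, 4, 8, 16]
errs = strong_error_gbm(2.0, 1.0, 1.0, 1.0, 512, levels, 5000,
                        np.random.default_rng(3))
# Fitted log-log slope of error against step size: near 0.5 for Euler.
slope = np.polyfit(np.log([r / 512 for r in levels]), np.log(errs), 1)[0]
```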

Strong approximations are needed if we are interested in properties of the process X_t that depend on properties of sample paths, for example when we are interested in trajectories of projectiles. Weak approximations are needed when we are interested in properties of the probability distribution of X_t, for example when we would like to compute functionals of X_t: according to the Feynman-Kac formula, we can use weak approximations to solve the associated Fokker-Planck equation numerically. This may be an interesting alternative to numerical methods for parabolic partial differential equations applied to Fokker-Planck equations, in high dimensions and with complex boundary conditions.


General Climate Applications

Applications of stochastic differential equations to climate physics are described here:

  • Tim Palmer and Paul Williams, eds., Stochastic Physics and Climate Modelling, Cambridge U. Press, Cambridge, 2010.

In particular, the following article describes some very simple models of long-term climate variability:

As a sample, here is the simplest one:

c_s \frac{d T_s}{d t} = b_d T_a - b_s T_s
c_a \frac{d T_a}{d t} = b_2 T_s - b_3 T_a + \sigma \dot{W}

This system describes the interaction between the atmosphere and the ocean’s surface. Here T_a is the atmosphere temperature and T_s is the ocean surface temperature, both idealized as functions of time only. The constants c_a and c_s describe the heat capacities of the atmosphere and the ocean surface, respectively. In this idealization the atmosphere and ocean temperatures are assumed to interact with each other and with cold outer space in a linear way, following Newton’s law of cooling:

The rate of heat loss of a body is proportional to the difference in temperatures between the body and its surroundings.

These linear interactions are described by the constants b_d, b_s, b_2, b_3. Finally, we assume the atmosphere is affected by random fluctuations described by the \sigma \dot{W} term: here \dot{W} is white noise, and \sigma is a constant.

For a somewhat realistic model, we assume the heat capacity of the ocean surface is much greater than that of the atmosphere:

c_s \approx 30 \, c_a
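As a hedged illustration, here is an Euler-scheme simulation of the two equations above; apart from the ratio c_s = 30 c_a, all coefficient values below are invented for the example and not taken from the cited book:

```python
import numpy as np

def simulate_two_box(b_d, b_s, b2, b3, c_s, c_a, sigma, dt, n, rng):
    """Euler-scheme simulation of the coupled model
        c_s dT_s/dt = b_d T_a - b_s T_s
        c_a dT_a/dt = b_2 T_s - b_3 T_a + sigma * (white noise)."""
    Ts, Ta = np.empty(n), np.empty(n)
    ts, ta = 0.0, 0.0
    noise = rng.normal(0.0, np.sqrt(dt), size=n)
    for i in range(n):
        ts_new = ts + (b_d * ta - b_s * ts) * dt / c_s
        ta_new = ta + ((b2 * ts - b3 * ta) * dt + sigma * noise[i]) / c_a
        ts, ta = ts_new, ta_new
        Ts[i], Ta[i] = ts, ta
    return Ts, Ta

# c_s = 30 c_a as in the text; the other constants are made up,
# chosen so the deterministic part of the system is stable.
Ts, Ta = simulate_two_box(1.0, 1.0, 0.5, 1.0, 30.0, 1.0, 0.3,
                          0.01, 100_000, np.random.default_rng(7))
```

In the output, the ocean temperature responds slowly: its fluctuations are smaller and smoother than the atmosphere’s, exactly the low-pass behavior discussed below.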

Starting from these assumptions one can work out the power spectrum for both the atmosphere and ocean surface temperatures, T_a and T_s. The power spectrum of a random process is the square of the absolute value of its Fourier transform: it measures how ‘strong’ the process is at various frequencies. It turns out that the power spectrum for T_s is roughly flat for frequencies below

b_4 b_s / (b_3 c_s) \approx \frac{1}{1600 \; \mathrm{days}} \, ,

and drops off like the power spectrum of ‘red noise’ at higher frequencies. As mentioned above, red noise is another name for Brownian motion; its power is inversely proportional to the square of the frequency.

The moral is that rapid fluctuations of the atmospheric temperature can drive fluctuations in the ocean temperature that last much longer, even in this simple linearized model. This is a small first step towards understanding how temperature fluctuations at time scales greater than a year can arise in the Earth’s climate. However, many examples of such fluctuations, such as the ENSO and AMO, are quasi-periodic in nature and require much more sophisticated models.

Stochastic Resonance

Stochastic resonance is a phenomenon that is best studied using SDEs.
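A standard setting for stochastic resonance is an overdamped particle in a double-well potential with a weak periodic forcing, dx = (x - x^3 + A \cos(\omega t)) \, dt + \sigma \, dW. A sketch of an Euler-scheme simulation (the function name and all parameters are illustrative):

```python
import numpy as np

def double_well_forced(A, omega, sigma, dt, n, rng):
    """Euler-scheme simulation of
        dx = (x - x^3 + A cos(omega t)) dt + sigma dW,
    an overdamped particle in the double-well potential
    V(x) = x^4/4 - x^2/2 with a weak periodic forcing."""
    x = -1.0                        # start in the left well
    xs = np.empty(n)
    noise = rng.normal(0.0, np.sqrt(dt), size=n)
    for i in range(n):
        x += (x - x**3 + A * np.cos(omega * i * dt)) * dt + sigma * noise[i]
        xs[i] = x
    return xs

# With moderate noise the particle hops between the wells near x = -1
# and x = +1; near the 'resonant' noise level the hops synchronize
# with the forcing.
xs = double_well_forced(0.2, 0.05, 0.4, 0.01, 200_000,
                        np.random.default_rng(5))
```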

Parametric Estimation

A given time series can be used to fit an SDE from a fixed set of models; see Parametric estimation for stochastic differential equations.
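As a small illustration of the idea (not the method of the linked page): the Ornstein-Uhlenbeck model dX = -\theta X \, dt + \sigma \, dW, sampled at a fixed time step, is exactly an AR(1) process, so its parameters can be estimated by lag-one regression. A sketch with our own function names:

```python
import numpy as np

def fit_ou(x, dt):
    """Estimate (theta, sigma) of dX = -theta X dt + sigma dW from a
    regularly sampled path. Sampled at step dt, the OU process is an
    AR(1) process X_{k+1} = a X_k + noise with a = exp(-theta dt) and
    noise variance sigma^2 (1 - a^2) / (2 theta)."""
    x0, x1 = x[:-1], x[1:]
    a = np.dot(x0, x1) / np.dot(x0, x0)      # lag-one regression
    theta = -np.log(a) / dt
    resid_var = np.var(x1 - a * x0)
    sigma = np.sqrt(resid_var * 2 * theta / (1 - a * a))
    return theta, sigma

# Generate synthetic data with known parameters, then recover them.
rng = np.random.default_rng(2)
theta_true, sigma_true, dt = 1.5, 0.5, 0.01
a = np.exp(-theta_true * dt)
s = sigma_true * np.sqrt((1 - a * a) / (2 * theta_true))
x = np.empty(100_000)
x[0] = 0.0
for i in range(1, len(x)):
    x[i] = a * x[i - 1] + s * rng.normal()
theta_hat, sigma_hat = fit_ou(x, dt)
```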

Equations with Delay

If the right-hand side of an SDE depends on values of the process in the past, one speaks of stochastic differential delay equations.
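The Euler scheme generalizes directly to this setting: the drift is simply evaluated at the delayed value, which requires keeping a history buffer. A sketch for the linear delay equation dX_t = -a X_{t-\tau} \, dt + \sigma \, dW_t (all names and parameters are illustrative):

```python
import numpy as np

def euler_sdde(a, tau, sigma, history, dt, n, rng):
    """Euler scheme for the stochastic delay equation
        dX_t = -a X_{t - tau} dt + sigma dW_t,
    whose drift depends on the value of the process a lag tau in the
    past. `history` gives X on [-tau, 0] at resolution dt, so it must
    have round(tau / dt) + 1 entries."""
    lag = int(round(tau / dt))
    xs = list(history)
    noise = rng.normal(0.0, np.sqrt(dt), size=n)
    for i in range(n):
        xs.append(xs[-1] - a * xs[-1 - lag] * dt + sigma * noise[i])
    return np.array(xs)

# Constant history X = 1 on [-tau, 0]; for a * tau < pi/2 the delayed
# feedback is stable and the path fluctuates around zero.
xs = euler_sdde(1.0, 0.5, 0.1, np.ones(51), 0.01, 20_000,
                np.random.default_rng(4))
```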



References

The following textbook proceeds by explaining little chunks of theory and then progresses through exercises of increasing complexity. It starts with basic probability theory and ends with SDEs:

  • Zdzisław Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes: A Course Through Exercises. (ZMATH)

A modern classic, due to its lucid style and the many applications that are explained, is this:

  • Bernt Øksendal, Stochastic Differential Equations: An Introduction with Applications, 6th ed. (ZMATH)

Numerical Solution

The standard references are

  • Peter E. Kloeden and Eckhard Platen, Numerical Solution of Stochastic Differential Equations. (ZMATH)

and its accompanying volume

  • Peter E. Kloeden, Eckhard Platen and Henri Schurz, Numerical Solution of SDE Through Computer Experiments. (ZMATH)

Applications in Natural Sciences

The following book is an elementary introduction to models in physics, chemistry and other sciences using Langevin equations:

  • William T. Coffey, Yuri P. Kalmykov, and John T. Waldron: The Langevin equation. With applications to stochastic problems in physics, chemistry and electrical engineering. 2nd ed. (ZMATH)

The following book is a classic reference for the practitioner; Risken explains model building and approximate solution methods like linear response theory using a minimum of mathematical background:

  • Hannes Risken: The Fokker-Planck equation. Methods of solution and applications. (ZMATH)