The Azimuth Project
Blog - increasing the signal-to-noise ratio with more noise

This page is a blog article in progress, written by Glyn Adgie and Tim van Beek. To discuss this article while it was being written, visit the Azimuth Forum. To the see the final polished version, go to the Azimuth Blog.

Or: Are the Milankovich Cycles Causing the Ice Ages?

The Milankovich cycles are periodic changes in how the Earth orbits the Sun. One question is: can these changes can be responsible for the ice ages? On the first sight it seems impossible, because the changes are simply too small. But it turns out that we can find a dynamical system where a small periodic external force is actually strengthened by random ‘noise’ in the system. This phenomenon has been dubbed ‘stochastic resonance’ and has been proposed as an explanation for the ice ages.

In this blog post we would like to provide an introduction to it. We will look at a bistable toy example. And you’ll even get to play with an online simulation that some members of the Azimuth Project created, namely Glyn Adgie, Allan Erskine and Jim Stuttard!

Equations with "Noise"

In This Weeks Finds, back in ‘week308’, we had a look at a model with ‘noise’. A lot of systems can be described by ordinary differential equations:

dxdt=f(x,t) \frac{d x}{d t} = f(x, t)

If ff is nice, the time evolution of the system x(t)x(t) will be a nice smooth function, like the trajectory of a thrown stone. But there are situations where we have some kind of noise, a chaotic, fluctuating influence, that we would like to take into account. This could be, for example, turbulence in the air flow around a rocket. Or, in our case, short term fluctuations of the weather of the earth. If we take this into account, we get an equation of the form

dxdt=f(x,t)+W(t) \frac{d x}{d t} = f(x, t) + W(t)

where the “W” is some model of noise. In all the examples above, the noise is actually a simplified way to take into account fast degrees of freedom of the system at hand. This way we do not have to explicitly model these degrees of freedom, which is often impossible. But we could still try to formulate a model where long term influences of these degrees of freedom are included.

A way to make the above mathematically precise is to write the equation in the form

dx t=f(x,t)dt+g(x,t)dW t d x_t = f(x, t) d t + g(x, t) d W_t

as an Ito stochastic differential equation driven by a Wiener process.

The mathematics behind this is somewhat sophisticated, but we don’t need to delve into it in this blog post. For those who are interested in it, we have gathered some material on the Azimuth Wiki on the page stochastic differential equation.

Physicists and engineers who model systems with ordinary and partial differential equations should think of stochastic differential equations as another tool in the toolbox. With this tool, it is possible to easily model systems that exhibit new behaviour.

What is Stochastic Resonance ?

Listening to your friend who sits across the table, the noise around you may become a serious distraction, although humans are famous for their ability to filter the signal - what your friend is saying - from the noise - all other sounds, people’s chatter, dishes clattering etc. But could you imagine a situation where more noise makes it easier to detect the signal?

Picture a marble that sits in a double well potential:

double well potential

This graphic is taken from

• Roberto Benzi, Stochastic resonance: from climate to biology.

This review paper is also a nice introduction to the topic.

This simple model can be applied to various situations. Since we think about the ice ages, we can interpret the two metastable states at the two minima of the potential in this way: the minimum to the left at -1 corresponds to the Earth in an ice age, or, earlier in the Earth’s history, a Snowball Earth event. The minimum to the right at 1 corresponds to the Earth at temperatures as we know it today.

Let us add a small periodic force to the model. This periodic force corresponds to different solar input due to periodic changes in Earth’s orbit. These are the’Milankovich cycles’ mentioned in the introduction. There are several different cycles with different periods, but in our toy model we will consider only one force with one period.

If the force itself is too small to get the marble over the well that is between the two potential minima, we will not observe any periodic change in the position of the marble. But if we add noise to the motion, it may happen that the force together with the noise succeed in getting the marble from one minimum to the other! This is more likely to happen when the force is pushing the right way. So, it should happen with roughly the same periodicity as the force: we should see a ‘quasi-periodic’ motion of the marble.

But if we increase the noise more and more, the influence of the external force will become unrecognizable, and we will not recognize any pattern at all.

There should be a noise strength somewhere in between that makes the Milankovich cycles as visible as possible from the Earth’s temperature record. In other words, if we think of the periodic force as a ‘signal’, there should be some amount of noise that maximizes the ‘signal-to-noise ratio’. Can we find out what it is?

In order to find out, let us make our model precise. Let’s take the time-independent potential to be

V(x)14x 412x 2 V(x) \coloneqq \frac{1}{4} x^4 - \frac{1}{2} x^2

This potential has two local minima at latexx=±1.latex x = \pm 1. Let’s take the time-dependent periodic forcing, or ‘signal’, to be

F(t)=Asin(t) F(t) = - A \; \sin(t)

The constant AA is the amplitude of the periodic forcing.

The time-dependent effective potential of the system therefore is:

V(x,t)14x 412x 2Asin(t)x V(x, t) \coloneqq \frac{1}{4} x^4 - \frac{1}{2} x^2 - A \; \sin(t) x

Including noise leads to the equation of motion, which is usually called a Langevin equation by physicists and chemists:

dX=V(X t,t)dt+2DdW t=(X tX t 3+Asin(t))dt+2DdW t dX \; = -V'(X_t, t) \; dt + \sqrt{2D} \; dW_t = (X_t - X_t^3 + A \; \sin(t)) \; dt + \sqrt{2D} \; dW_t

For more about what a Langevin equation is, see our article on stochastic differential equations.

This model has two free parameters, AA and DD. As already mentioned, AA is the amplitude of the forcing. DD is called the diffusion coefficient and represents the strength of noise. In our case it is a constant, but in more general models it could depend on space and time.

It is possible to write programs that generate simulations aka numerical approximations to solutions of stochastic differential equations. For those interested in how this works, there is an addendum at the bottom of the blog post that explains this. We can use such a program to graph the three cases of small, high and optimal noise level.

In the following graphs, green is the external force, while red is the marble position. Here is a simulation with a low level of noise:

low noise level

As you can see, within the time of the simulation there is no transition from the metastable state at 1 to the one at -1. That is, in this situation Earth stays in the warm state.

Here is a simulation with a high noise level:

high noise level

The marble jumps wildly between the states. By inspecting the graph with your eyes only, you don’t see any pattern in it, do you?

And finally, here is a simulation where the noise level seems to be about right to make the signal visible via a quasi-periodic motion:

high noise level

Sometimes the marble makes a transition in phase with the force, sometimes it does not, sometimes it has a phase lag. It can even happen that the marble makes a transition while the force acts in the opposite direction!

Some members of the Azimuth crew have created an online model that calculates simulations of the model, for different values of the involved constants. You can see it here:

Stochastic resonance example.

Our special thanks go to Allan Erskine and Jim Stuttard.

You can change the values using the sliders under the graphic and see what happens. You can also set different “random seeds”, which means that the random numbers used in the simulation will be different.

Above we asked if one could calculate for a fixed AA the level of noise DD that optimally increases the signal to noise ratio. We have defined the model system, but we still need to define “signal to noise ratio” in order to address the question.

There is no general definition that makes sense for all kinds of systems. For our model system, let’s assume that the expectation value of x(t)x(t) has a Fourier expansion like this:

x(t)= n= M nexp(int) \langle x(t) \rangle = \sum_{n = - \infty}^{\infty} M_n \exp{(i n t)}

Then one useful definition of the signal to noise ratio η\eta is

η:=|M 1| 2A 2 \eta := \frac{|M_1|^2}{A^2}

So, can one calculate the value of latexDlatex D, depdending on latexAlatex A that maximizes latexηlatex \eta? We don’t know! Do you?

If you would like to know more, have a look at this page:

Stochastic resonance, Azimuth Library.

Also, read on for more about how we numerically solved our stochastic differential equation, and the design decisions we had to make to create a simulation that">runs on your web browser.

For übernerds only

Numerically solving stochastic differential equations

Maybe you are asking yourself what is behind the online model? How does one “solve numerically” a stochastic differential equation and in what sense are the graphs that you see there “close” to the exact solution? No problem, we will explain it here!

First, you need C code like this:

#define znew ((z = 36969 * (z&65535) + (z >> 16)) << 16)

#define wnew ((w = 18000 * (w&65535) + (w >> 16)) & 65535)

static unsigned long z = 362436069, w = 521288629;

float rnor() {
float x, y, v;

x = ((signed long) (znew + wnew)) * 1.167239e - 9;

if (fabs(x) < 1.17741)
	return (x);

y = (znew + wnew) * 2.328306e - 10;

if (log(y) < 0.69314722 - 0.5 * (x * x))
	return (x);

x = (x > 0) ? 0.8857913 * (2.5066282 - x) : -0.8857913 * (2.506628 + x);

if (log(1.8857913 - y) < 0.5718733 - 0.5 * (x * x))
	return (x);

do {
	v = ((signed long) (znew + wnew)) * 4.656613e - 10;

	x = -log(fabs(v)) * 0.3989423;
	y = -log((znew + wnew) * 2.328306e - 10);

} while (y + y < x * x);
return( (v > 0? 2.506628 + x : -2.5066282 - x);

Pretty scary, huh? Do you see what it does? We don’t think anyone stands a chance to understand all the “magical constants” in it. We pasted it here so you can go around and show it to friends who claim that they know the programming language C and understand every program in seconds, and see what they can tell you about it. We’ll explain it below.

Since this section is for übernerds, we will assume that you know stochastic processes in continuous time. Brownian motion on \mathbb{R} can be viewed as a probability distribution on the space of all continuous functions on a fixed interval, C[0,T]C[0, T]. This is also true for solutions of stochastic differential equations in general. A “numerical approximation” to a stochastic differential equation is an discrete time approximation that samples this probability space.

Another way to think about it is to start with numerical approximations of ordinary differential equations. When you are handed such an equation of the form

dxdt=f(x,t) \frac{d x}{d t} = f(x, t)

with an initial condition x(0)=x 0x(0) = x_0, you know that there are different discrete approximation schemes to calculate an approximate solution to it. The simplest one is the Euler scheme. You need to choose a step size hh and calcluate values for xx recusively using the formula

x n+1=x n+f(x n,t n)h x_{n + 1} = x_n + f(x_n, t_n) \; h

with t n+1=t n+ht_{n+1} = t_n + h. Connect the discrete values of x nx_n with straight lines to get your approximating function x hx^h. For h0h \to 0, you get convergence for example in the supremum norm of x h(t)x^h(t) to the exact solution x(t)x(t):

lim h0x hx =0 \lim_{h \to 0} \| x^h - x \|_\infty = 0

It is possible to extend some schemes from ordinary differential equations to stochastic differential equations that involve certain random variables in every step. The best textbook on the subject that we know is this one:

• Peter E. Kloeden and Eckhard Platen, Numerical Solution of Stochastic Differential Equations. (Springer 2010, ZMATH review)

The extension of the Euler scheme is very simple, deceptively so! It is much less straightforward to determine the extensions of higher Runge-Kutta schemes, for example. You have been warned! The Euler scheme for stochastic differential equations of the form

dX t=f(X,t)dt+g(X,t)dW t d X_t = f(X, t) d t + g(X, t) d W_t

is simply

X n+1=X n+f(X n,t n)h+g(X n,t n)w n X_{n + 1} = X_n + f(X_n, t_n) \; h + g(X_n, t_n) \; w_n

where all the w nw_n are independent random variables with a normal distribution N(0,h)N(0, h). If you write a computer program that calculates approximations using this scheme, you need a pseudorandom number generator. This is what the code I posted above is about. Actually, it is an efficient algorithm to calculate pseudorandom numbers with a normal distribution. It’s from this paper:

• George Marsaglia and Wai Wan Tsang, The Monty Python method for generating random variables, ACM Transactions on Mathematical Software 24 (1998), 341–350.

While an approximation to an ordinary differential equation approximates a function, an approximation to a stochastic differential equation approximates a probability distribution on C[0,T]C[0, T]. So we need a different concept of ‘convergence’ of the approximation for h0h \to 0. The two most important ones are called ‘strong’ and ‘weak’ convergence.

A discrete approximation X hX^h to an Ito process XX converges strongly at time TT, if

lim h0E(|X T hX T|)=0 \lim_{h \to 0} \; E(|X_T^h - X_T|) = 0

It is said to be of strong order γ\gamma if there is a constant CC such that

E(|X T hX T|)Ch γ E(|X_T^h - X_T|) \leq C h^{\gamma}

As we see from the definition, a strong approximation needs to approximate the path of the original process XX at the end of the time interval [0,T][0, T].

What about weak convergence?

A discrete approximation X hX^h to an Ito process XX converges weakly for a given class of functions GG at a time TT, if for every function gGg \in G we have

lim h0E(g(X T h))E(g(X T))=0 \lim_{h \to 0} \; E(g(X_T^h)) - E(g(X_T)) = 0

where we assume that all appearing expectation values exist and are finite.

It is said to be of weak order γ\gamma with respect to GG if there is a constant CC such that for every function gGg \in G:

E(g(X T h))E(g(X T))Ch γ E(g(X_T^h)) - E(g(X_T)) \leq C h^{\gamma}

One speaks simply of weak convergence when GG is the set of all polynomials.

So weak convergence means that the approximation needs to approximate the probability distribution of the original process XX. Weak and strong convergence are indeed different concepts, for example: The Euler scheme is of strong order 0.5 and of weak order 1.0.

Our simple online model implements the Euler scheme to calculate approximations to our model equation. So now you know it what sense it is indeed an approximation to the solution process!

Implementation details of the interactive online model

There appear to be three main approaches to implementing an interactive online model. It is assumed that all the main browsers understand Javascript.

  1. Implement the model using a server-side programme that produces graph data in response to parameter settings, and use Javascript in the browser to plot the graphs and provide interaction inputs.

  2. Pre-compute the graph data for various sets of parameter values, send these graphs to the browser, and use Javascript to select a graph and plot it. The computation of the graph data can be done on the designers machine, using their preferred language.

  3. Implement the model entirely in Javascript.

Option 1) is only practical if one can run interactive programmes on the server. While there is technology to do this, it is not something that most ISPs or hosting companies provide. You might be provided with Perl and PHP, but neither of these is a language of choice for numeric coding.

Option 2) has been tried. The problem with this approach is the large amount of pre-computed data required. For example, if we have 4 independent parameters, each of which takes on 10 values, then we need 10000 graphs. Also, it is not practical to provide much finer control of each parameter, as this increases the data size even more.

Option 3) looked doable. Javascript has some nice language features, such as closures, which are a good fit for implementing numerical methods for solving ODEs. The main question then is whether a typical Javascript engine can cope with the compute load.

One problem that arose with a pure Javascript implementation is the production of normal random deviates required for the numeric solution of an SDE. Javascript has a random() function, that produces uniform random deviates, and there are algorithms that could be implemented in Javascript to produce normal deviates from uniform deviates. The problems here are that there is no guarantee that all Javascript engines use the same PRNG, or that the PRNG is actually any good, and we cannot set a specific seed in Javascript. Even if the PRNG is adequate, we have the problem that different users will see different outputs for the same parameter settings. Also, reloading the page will reseed the PRNG from some unknown source, so a user will not be able to replicate a particular output.

The chosen solution was to pre-compute a sequence of random normal deviates, using a PRNG of known characteristics. This sequence is loaded from the server, and will naturally be the same for all users and sessions. This is the C++ code used to create the fixed Javascript array:

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <iostream>

int main(int argc, char * argv[])
    boost::mt19937 engine(1234); // PRNG with seed = 1234
    boost::normal_distribution<double> rng(0.0, 1.0); // Mean 0.0, standard deviation 1.0
    std::cout << "normals = [\n";
    int nSamples = 10001;
        std::cout << rng(engine) << ",\n";
    std::cout << "];";

Deviates with a standard deviation other than 1.0 can be produced in Javascript by multiplying the pre-computed deviates by a suitable scaling factor.

The graphs only use 1000 samples from the Javascript array. To see the effects of different sample paths, the Javascript code selects one of ten slices from the pre-computed array.

category: blog