# The Azimuth Project Blog - exploring climate data (part 1)


This is a blog article in progress. To see discussions of the article as it is being written, visit the Azimuth Forum.

joint with Dara O Shayda and David Tweed

Emboldened by our experiments in El Niño analysis and prediction, people in the Azimuth Code Project have been starting to analyze weather and climate data. A lot of this work is exploratory, with no big conclusions. But it’s still interesting! So, let’s try some blog articles where we present this work.

This one will be about the air pressure on the island of Tahiti and in a city called Darwin in Australia: how they’re correlated, and how each one varies. This article will also be a quick introduction to some basic statistics, as well as the concept of ‘continuous wavelet transforms’.

### Darwin, Tahiti and El Niños

The El Niño Southern Oscillation is often studied using the air pressure in Darwin, Australia versus the air pressure in Tahiti. When there’s an El Niño, it gets stormy in the eastern Pacific, so air pressures tend to be lower in Tahiti and higher in Darwin. When there’s a La Niña, it’s the other way around:

The Southern Oscillation Index or SOI is a normalized version of the monthly mean air pressure anomaly in Tahiti minus that in Darwin. Here anomaly means we subtract off the mean, and normalized means that we divide by the standard deviation.

So, the SOI tends to be negative when there’s an El Niño. On the other hand, when there’s an El Niño the Niño 3.4 index tends to be positive—this says it’s hotter than usual in a certain patch of the Pacific.
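To make the recipe concrete, here’s a toy sketch in Python with NumPy. The pressure values below are invented, and the official SOI uses a more careful standardization than this, so treat it only as an illustration of ‘anomaly’ and ‘normalized’:

```python
import numpy as np

# Hypothetical monthly sea level pressures (hPa); the real data is linked below.
tahiti = np.array([1012.1, 1010.3, 1011.8, 1009.5, 1013.0, 1011.2])
darwin = np.array([1008.4, 1010.9, 1009.1, 1011.6, 1007.8, 1009.9])

def standardize(p):
    """Anomaly (subtract the mean) divided by the standard deviation."""
    return (p - p.mean()) / p.std()

# A simplified SOI-like index: standardized Tahiti minus standardized Darwin.
soi = standardize(tahiti) - standardize(darwin)
print(soi)
```

Negative values of this index would then correspond to El Niño-like conditions.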

Here you can see how this works:

When the Niño 3.4 index is positive, the SOI tends to be negative, and vice versa!

It might be fun to explore precisely how well correlated they are. You can get the data to do that by clicking on the links above.

But here’s another question: how similar are the air pressure anomalies in Darwin and in Tahiti? Do we really need to take their difference, or are they so strongly anticorrelated that either one would be enough to detect an El Niño?

You can get the data to answer such questions here:

Southern Oscillation Index based upon annual standardization, Climate Analysis Section, NCAR/UCAR. This includes links to monthly sea level pressure anomalies in Darwin and Tahiti, in either ASCII format (click the second two links) or netCDF format (click the first one and read the explanation).

In fact this website has some nice graphs already made, which I might as well show you! Here’s the SOI and also the sum of the air pressure anomalies in Darwin and Tahiti, normalized in some way:

(Click to enlarge.)

If the sum were zero, the air pressure anomalies in Darwin and Tahiti would contain the same information and life would be simple. But it’s not!

Dara took the air pressure anomaly data from 1866 to 2012 and graphed it:

PICTURE NEEDS TO BE FIXED!!!

More interestingly, he used Mathematica to compute some ‘continuous wavelet transforms’ of these air pressure anomalies. Let me start by explaining how a continuous wavelet transform works.

### Very basic statistics

It helps to start with some very basic statistics. Suppose you have a list of numbers

$x = (x_1, \dots, x_n)$

You probably know how to take their mean, or average. People often write this with angle brackets:

$\langle x \rangle = \frac{1}{n} \sum_{i = 1}^n x_i$

You can also calculate the mean of their squares:

$\langle x^2 \rangle = \frac{1}{n} \sum_{i = 1}^n x_i^2$

If you were naive you might think $\langle x^2 \rangle = \langle x \rangle^2$, but in fact we have:

$\langle x^2 \rangle \ge \langle x \rangle^2$

and they’re equal only if all the $x_i$ are the same. The point is that if the numbers $x_i$ are spread out, the squares of the big ones (positive or negative) contribute more to the average of the squares than if we had averaged them out before squaring. The difference

$\langle x^2 \rangle - \langle x \rangle^2$

is called the variance; it says how spread out our numbers are. The square root of the variance is the standard deviation:

$\sigma_x = \sqrt{\langle x^2 \rangle - \langle x \rangle^2 }$

and this has the slight advantage that if you multiply all the numbers $x_i$ by some constant $c$, the standard deviation gets multiplied by $|c|$. (The variance gets multiplied by $c^2$.)
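These formulas are easy to check numerically; here’s a quick sketch in Python with NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

mean = x.mean()               # <x>
mean_sq = (x**2).mean()       # <x^2>
variance = mean_sq - mean**2  # <x^2> - <x>^2
sigma = np.sqrt(variance)     # standard deviation

print(mean, mean_sq, variance, sigma)   # 2.5 7.5 1.25 1.118...

# Scaling check: multiplying x by c multiplies sigma by |c|.
c = -3.0
assert np.isclose((c * x).std(), abs(c) * sigma)
```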

We can generalize the variance to a situation where we have two lists of numbers:

$x = (x_1, \dots, x_n)$
$y = (y_1, \dots, y_n)$

Namely, we can form the covariance

$\langle x y \rangle - \langle x \rangle \langle y \rangle$

This reduces to the variance when $x = y$. It measures how much $x$ and $y$ vary together — ‘hand in hand’, as it were. A bit more precisely: if $x_i$ is greater than its mean value mainly for $i$ such that $y_i$ is greater than its mean value, the covariance is positive. On the other hand, if $x_i$ tends to be greater than average when $y_i$ is smaller than average — like with the air pressures at Darwin and Tahiti — the covariance will be negative.

For example, if

$x = (1,-1), \quad y = (1,-1)$

then they ‘vary hand in hand’, and the covariance

$\langle x y \rangle - \langle x \rangle \langle y \rangle = 1 - 0 = 1$

is positive. But if

$x = (1,-1), \quad y = (-1,1)$

then one is positive when the other is negative, so the covariance

$\langle x y \rangle - \langle x \rangle \langle y \rangle = -1 - 0 = -1$

is negative.

Of course the covariance will get bigger if we multiply both $x$ and $y$ by some big number. If we don’t want this effect, we can normalize the covariance and get the correlation:

$\frac{ \langle x y \rangle - \langle x \rangle \langle y \rangle }{\sigma_x \sigma_y}$

which will always be between $-1$ and $1$.
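The two examples above, and the normalization, can be checked in a few lines of Python with NumPy:

```python
import numpy as np

def covariance(x, y):
    return (x * y).mean() - x.mean() * y.mean()   # <xy> - <x><y>

def correlation(x, y):
    return covariance(x, y) / (x.std() * y.std())

x = np.array([1.0, -1.0])
print(covariance(x, x))    # 1.0: varying 'hand in hand'
print(covariance(x, -x))   # -1.0: one is positive when the other is negative
print(correlation(x, -x))  # -1.0: the extreme of the range [-1, 1]
```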

DISCUSS CORRELATION FOR DARWIN AND TAHITI!!!

Okay, we’re almost ready for continuous wavelet transforms! If the mean of either $x$ or $y$ is zero, the formula for covariance simplifies a lot, to

$\langle x y \rangle = \frac{1}{n} \sum_{i = 1}^n x_i y_i$

So, this quantity says how much the numbers $x_i$ ‘vary hand in hand’ with the numbers $y_i,$ in the special case when one (or both) has mean zero.

We can do something similar if $x, y : \mathbb{R} \to \mathbb{R}$ are functions of time defined for all real numbers $t$. The sum becomes an integral, and we have to give up on dividing by $n$. We get:

$\int_{-\infty}^\infty x(t) y(t)\; d t$

This is called the inner product of the functions $x$ and $y$, and it’s often written $\langle x, y \rangle$, but it’s a lot like the covariance.

### Continuous wavelet transforms

What are continuous wavelet transforms, and why should we care?

People have lots of tricks for studying ‘signals’, like series of numbers $x_i$ or functions $x : \mathbb{R} \to \mathbb{R}$. One method is to ‘transform’ the signal in a way that reveals useful information. The Fourier transform decomposes a signal into sines and cosines of different frequencies. This lets us see how much power the signal has at different frequencies, but it doesn’t reveal how the power at different frequencies changes with time. For that we should use something else, like the Gabor transform explained by Blake Pollard in a previous post.

Sines and cosines are great, but we might want to look for other patterns in a signal. A ‘continuous wavelet transform’ lets us scan a signal for appearances of a given pattern at different times and also at different time scales: a pattern could go by quickly, or in a stretched out slow way.

To implement the continuous wavelet transform, we need a signal and a pattern to look for. The signal could be a function $x : \mathbb{R} \to \mathbb{R}$. The pattern would then be another function $y: \mathbb{R} \to \mathbb{R},$ usually called a wavelet.

Here’s an example of a wavelet:

This function is an example of a wavelet. If we’re in a relaxed mood, we can call any function that looks like a bump with wiggles in it a wavelet. There are lots of famous wavelets, but this particular one is the fourth derivative of a certain Gaussian. Mathematica calls this wavelet DGaussianWavelet[4], and you can look up the formula under ‘Details’ on their webpage.

However, the exact formula doesn’t matter at all now! If we call this wavelet $y$, all that matters is that it’s a bump with wiggles on it, and that its mean value is 0, or more precisely:

$\int_{-\infty}^\infty y(t) \; d t = 0$

As we saw in the last section, this fact lets us take our function $x$ and the wavelet $y$ and see how much they ‘vary hand in hand’ simply by computing their inner product:

$\langle x , y \rangle = \int_{-\infty}^\infty x(t) y(t)\; d t$

Loosely speaking, this measures the ‘amount of $y$-shaped wiggle in the function $x$’. It’s amazing how hard it is to say something in plain English that perfectly captures the meaning of a simple formula like the above one—so take the quoted phrase with a huge grain of salt. But it gives a rough intuition.

Our wavelet $y$ happens to be centered at $t = 0$. However, we might be interested in $y$-shaped wiggles that are centered not at zero but at some other number $s$. We could detect these by shifting the function $y$ before taking its inner product with $x$:

$\int_{-\infty}^\infty x(t) y(t-s)\; d t$

We could also be interested in measuring the amount of some stretched-out or squashed version of a $y$-shaped wiggle in the function $x$. Again we could do this by changing $y$ before taking its inner product with $x$:

$\int_{-\infty}^\infty x(t) \; y\left(\frac{t}{P}\right) \; d t$

When $P$ is big, we get a stretched-out version of $y$. People sometimes call $P$ the period, since the period of the wiggles in $y$ will be proportional to this (though usually not equal to it).

Finally, we can combine these ideas, and compute

$\int_{-\infty}^\infty x(t) \; y\left(\frac{t- s}{P}\right)\; dt$

This is a function of the shift $s$ and period $P$ which says how much of the $s$-shifted, $P$-stretched wavelet $y$ is lurking in the function $x$. It’s a version of the continuous wavelet transform!

Mathematica implements this idea for ‘time series’, meaning lists of numbers $x = (x_1,\dots,x_n)$ instead of functions $x : \mathbb{R} \to \mathbb{R}$. The idea is that we think of the numbers as samples of a function $x$:

$x_1 = x(\Delta t)$
$x_2 = x(2 \Delta t)$

and so on, where $\Delta t$ is some time step, and replace the integral above by a suitable sum. Mathematica has a function ContinuousWaveletTransform that does this, giving

$w(s,P) = \frac{1}{\sqrt{P}} \sum_{i = 1}^n x_i \; y\left(\frac{i \Delta t - s}{P}\right)$

The factor of $1/\sqrt{P}$ in front is a useful extra trick: it’s the right way to compensate for the fact that when you stretch out your wavelet $y$ by a factor of $P$, it gets bigger. So, when we’re doing integrals, we should define the continuous wavelet transform of $x$ by:

$w(s,P) = \frac{1}{\sqrt{P}} \int_{-\infty}^\infty x(t) \; y\left(\frac{t- s}{P}\right)\; dt$
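As a rough illustration, here’s the discrete sum from above in Python with NumPy, using the fourth derivative of a Gaussian as the wavelet. (This matches the shape of DGaussianWavelet[4] only up to normalization, which is an assumption here; Mathematica’s conventions may differ.)

```python
import numpy as np

def wavelet(t):
    """Fourth derivative of exp(-t^2/2): a bump with wiggles, mean zero.
    Assumed here to have the shape of DGaussianWavelet[4] up to a constant."""
    return (t**4 - 6*t**2 + 3) * np.exp(-t**2 / 2)

def cwt(x, shifts, periods, dt=1.0):
    """Discrete continuous wavelet transform:
       w(s, P) = (1/sqrt(P)) * sum_i x_i * y((i*dt - s)/P)."""
    i = np.arange(1, len(x) + 1)
    w = np.empty((len(shifts), len(periods)))
    for a, s in enumerate(shifts):
        for b, P in enumerate(periods):
            w[a, b] = (x * wavelet((i*dt - s) / P)).sum() / np.sqrt(P)
    return w

# Toy signal: a burst of wiggles centered at t = 50.
t = np.arange(1, 101, dtype=float)
x = np.sin(2*np.pi*t/10) * np.exp(-((t - 50)/10)**2)

w = cwt(x, shifts=np.arange(0, 100, 5), periods=np.array([1.0, 2.0, 4.0]))
print(w.shape)   # (20, 3): one value per (shift, period) pair
```

A scalogram is then just a color plot of $|w(s,P)|$ over the $(s,P)$ plane.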

### The results

Starting with the air pressure anomaly at Darwin, and taking DGaussianWavelet[4] as his wavelet, Dara Shayda computed the continuous wavelet transform $w(s,P)$ as above. To show us the answer, he created a scalogram:

This shows roughly how big the continuous wavelet transform $w(s,P)$ is for different shifts $s$ and periods $P$. Blue means it’s very small, green means it’s bigger, yellow means even bigger and red means very large.

Tahiti gave this:

You’ll notice that the patterns at Darwin and Tahiti are quite similar.

Puzzle 1. What can you say about the similarities and differences?

You’ll see there’s very little power at periods shorter than 4.7 months; IS THIS BECAUSE THE TIME SERIES HAD BEEN DENOISED??? HOW WAS IT DENOISED??? The most interesting feature is the bursts of power at longer periods, which make interesting patterns near the bottom of the charts. The bursts occur for both Darwin and Tahiti, but they’re noticeably different. They are stronger for Tahiti.

Puzzle 2. Do a Gabor transform, also known as a ‘windowed Fourier transform’, of the same data. Blake Pollard explained the Gabor transform in his article Milankovitch vs the Ice Ages. This is a way to see how much a signal wiggles at a given frequency at a given time: we multiply the signal by a shifted Gaussian and then take its Fourier transform.
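The recipe in Puzzle 2 (multiply the signal by a shifted Gaussian, then take a Fourier transform) can be sketched in Python with NumPy. The window width below is an arbitrary choice, not something fixed by the transform itself:

```python
import numpy as np

def gabor_transform(x, dt=1.0, width=10.0):
    """Windowed Fourier transform: multiply the signal by a Gaussian
    centered at each time step, then take an FFT of the product.
    `width` is the window's standard deviation, in units of dt."""
    n = len(x)
    t = np.arange(n) * dt
    rows = []
    for center in t:
        window = np.exp(-((t - center) / width)**2 / 2)
        rows.append(np.fft.rfft(x * window))
    return np.abs(np.array(rows))   # magnitude at each (time, frequency)

# Toy signal whose frequency rises over time (a chirp).
t = np.arange(256, dtype=float)
x = np.sin(2*np.pi * t**2 / 4000)
g = gabor_transform(x)
print(g.shape)   # (256, 129): 256 window positions, 129 frequencies
```

Plotting `g` should show the dominant frequency drifting upward with time.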

Puzzle 3. Read about continuous wavelet transforms and find a second, deeper reason why we want to use a wavelet $y$ with


$\int_{-\infty}^\infty y(t) \; d t = 0$

In fact we want a somewhat stronger condition, which is implied by the above equation when the Fourier transform of $y$ is smooth and integrable:

$\int_{-\infty}^\infty \frac{|\hat{y}(\omega)|^2}{|\omega|} \; d\omega < \infty$

where $\hat{y}$ is the Fourier transform of $y$.


Continuous wavelet transform, Wikipedia.

### Another way to understand correlations

David Tweed mentioned another approach from signal processing to understanding the quantity

$\langle x y \rangle = \frac{1}{n} \sum_{i = 1}^n x_i y_i$

If we’ve got two lists of data $x$ and $y$ that we want to compare to see if they behave similarly, the first thing we ought to do is multiplicatively scale each one so they’re of comparable magnitude. There are various possibilities for assigning a scale, but a reasonable one is to ensure they have equal ‘energy’

$\sum_{i=1}^n x_i^2 = \sum_{i=1}^n y_i^2$

(This can be achieved by dividing each list by its standard deviation, which is equivalent to what was done in the main derivation above.) Once we’ve done that then it’s clear that looking at

$\sum_{i=1}^n (x_i-y_i)^2$

gives small values when they have a very good match and progressively bigger values as they become less similar. Observe that

$\sum_{i=1}^n (x_i-y_i)^2 = \sum_{i=1}^n (x_i^2 - 2 x_i y_i + y_i^2) = \sum_{i=1}^n x_i^2 - 2 \sum_{i=1}^n x_i y_i + \sum_{i=1}^n y_i^2$

Since we’ve scaled things so that $\sum_{i=1}^n x_i^2$ and $\sum_{i=1}^n y_i^2$ are constants, we can see that when $\sum_{i=1}^n x_i y_i$ becomes bigger,

$\sum_{i=1}^n (x_i-y_i)^2$

becomes smaller. So,

$\sum_{i=1}^n x_i y_i$

serves as a measure of how close the lists are, under these assumptions.
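Here’s a quick numerical check of this argument in Python with NumPy, using random data scaled to unit ‘energy’:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(size=100)

# Scale each list so that sum(x_i^2) = sum(y_i^2) = 1.
x = x / np.sqrt((x**2).sum())
y = y / np.sqrt((y**2).sum())

# Expand sum (x_i - y_i)^2 as in the derivation above.
lhs = ((x - y)**2).sum()
rhs = (x**2).sum() - 2*(x*y).sum() + (y**2).sum()
assert np.isclose(lhs, rhs)

# With the energies fixed at 1, the squared distance is 2 - 2*sum(x_i y_i),
# so a bigger inner product means the lists are closer.
assert np.isclose(lhs, 2 - 2*(x*y).sum())
```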

category: blog, climate