# The Azimuth Project: Climate network

## Idea

A climate network is an undirected graph whose nodes represent points in a spatial grid, and where the edge weight (link strength) between nodes $i$ and $j$ is calculated from the historical weather record at the two points. For example, it could be based on the cross-correlation of the temperature histories at $i$ and $j$. In weighted graph formulations, the cross-correlation may supply the raw data for the edge weight. In unweighted graph formulations, a binary decision rule, based on the cross-correlations between the histories at $i$ and $j$, may be used to specify whether $i$ and $j$ are “connected” or not.
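As a minimal sketch of the two formulations, the following R snippet builds a weighted and an unweighted graph from simulated temperature histories. The data, the use of the zero-lag absolute correlation as the weight, and the threshold value are all assumptions made for illustration only.

```r
# Hypothetical data: 4 grid points, 200 time steps of simulated temperatures.
set.seed(1)
n.points <- 4
temps <- matrix(rnorm(200 * n.points), ncol = n.points)
temps[, 2] <- temps[, 1] + 0.1 * rnorm(200)  # make points 1 and 2 strongly related

# Weighted formulation: edge weight = absolute correlation between histories.
weights <- abs(cor(temps))
diag(weights) <- 0                           # no self-loops

# Unweighted formulation: connect i and j when the weight exceeds a threshold.
threshold <- 0.5                             # assumed value, for illustration only
adjacency <- weights > threshold
```

Both matrices are symmetric, which reflects the fact that the graph is undirected.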

The graph structure of the network is a function of time.

Climate networks have found important applications to the detection and forecasting of El Niño events. See:

## Definition of climate networks

There are many ways in which a “link strength” could be defined. First we will set up some notation. Given a time $t$, we can define a map which takes $t$ to a vector of times no later than $t$. For example, if the units of time are days, a “previous year” map $Y(t)$ can be defined as:

$Y(t) = (t-364, \dots, t-1, t).$

The covariance of two vectors $x$ and $y$ of the same length $n$ is defined in the usual way:

$\operatorname{cov}(x,y) = \operatorname{E}{\big[(x - \operatorname{E}[x])(y - \operatorname{E}[y])\big]} = \frac{1}{n}\sum_i \big( x_i - \frac{1}{n}\sum_j x_j \big) \big(y_i - \frac{1}{n}\sum_j y_j \big)$

From this, the correlation can be defined as

$\operatorname{cor}(x,y) = \frac{\operatorname{cov}(x,y)}{\operatorname{cov}(x,x)^{1/2} \; \operatorname{cov}(y,y)^{1/2}}.$
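The two formulas above can be written directly in R. This is only a sketch: the names `pop.cov` and `pop.cor` are hypothetical helpers, introduced because R's built-in `cov` normalises by $n-1$ rather than $n$. The normalisation cancels in the correlation, so `cor` agrees with the definition here.

```r
# Covariance with the 1/n normalisation used in the definition above.
pop.cov <- function(x, y) mean((x - mean(x)) * (y - mean(y)))

# Correlation built from it, as in the second formula.
pop.cor <- function(x, y) pop.cov(x, y) / sqrt(pop.cov(x, x) * pop.cov(y, y))

x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 6)

pop.cov(x, y)                          # 1/n normalisation
cov(x, y)                              # built-in: 1/(n-1) normalisation
all.equal(pop.cor(x, y), cor(x, y))    # the normalisation cancels in the ratio
```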

Finally, if $f(t)$ is a function of time, such as the temperature, we extend $f$ to a map between vectors in the obvious way:

$f((x_1, \dots, x_n)) = (f(x_1), \dots, f(x_n))$
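In R, the map $Y$ and the elementwise extension of $f$ might look as follows. The function `temperature` is a made-up seasonal curve, used only so that the example runs; it is not real data.

```r
# "Previous year" map: the 365 days ending at day t (units of time are days).
Y <- function(t) (t - 364):t

# Hypothetical function of time, standing in for a temperature record.
temperature <- function(day) 15 + 10 * sin(2 * pi * day / 365)

window <- Y(1000)              # days 636, 637, ..., 1000
temps  <- temperature(window)  # R arithmetic is vectorised, so f is applied elementwise
```

If the temperature is stored as a vector indexed by day rather than as a function, the same extension is plain indexing, for example `temps.vec[Y(1000)]`, where `temps.vec` is again a hypothetical name.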

The following definitions are derived from:

Firstly, $T_i(t)$ is the temperature on day $t$ at point $i$. For a time lag of $\tau$ days, $0 \leq \tau \leq 200$, they define time-lagged cross-covariances

$C^{(t)}_{i,j}(-\tau) = \operatorname{cov}( T_i(Y(t)), \; T_j(Y(t-\tau)) )$

and

$C^{(t)}_{i,j}(\tau) = \operatorname{cov}( T_i(Y(t-\tau)), \; T_j(Y(t)) )$

and then divide these by the corresponding standard deviations to obtain cross-correlations:

$c^{(t)}_{i,j}(-\tau) = \frac{C^{(t)}_{i,j}(-\tau)}{C^{(t)}_{i,i}(0)^{1/2} \; C^{(t-\tau)}_{j,j}(0)^{1/2}},$

and similarly

$c^{(t)}_{i,j}(\tau) = \frac{C^{(t)}_{i,j}(\tau)}{C^{(t-\tau)}_{i,i}(0)^{1/2} \; C^{(t)}_{j,j}(0)^{1/2}}.$
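A sketch of the time-lagged cross-correlation in R, using a single signed lag argument to cover both cases. The function name `lagged.cor` and the simulated series are assumptions for illustration. The temperature records are taken to be vectors indexed by day, and $t$ must be at least 565 so that every window with a lag of up to 200 days stays inside the record.

```r
Y <- function(t) (t - 364):t

# c^{(t)}_{i,j}(tau) for a signed lag tau:
#   tau <= 0: correlate T_i over Y(t) with T_j over Y(t - |tau|)
#   tau >  0: correlate T_i over Y(t - tau) with T_j over Y(t)
lagged.cor <- function(Ti, Tj, t, tau) {
  if (tau <= 0) {
    cor(Ti[Y(t)], Tj[Y(t + tau)])
  } else {
    cor(Ti[Y(t - tau)], Tj[Y(t)])
  }
}

# Simulated example: point j repeats point i with a delay of 10 days.
set.seed(42)
n.days <- 800
Ti <- rnorm(n.days)
Tj <- c(rep(0, 10), Ti[1:(n.days - 10)])   # Tj[d] == Ti[d - 10] for d > 10

lagged.cor(Ti, Tj, t = 700, tau = 10)      # windows line up exactly
lagged.cor(Ti, Tj, t = 700, tau = -10)     # windows are 20 days apart
```

Dividing the covariance by the two standard deviations is exactly what `cor` does for the two windows, so no separate normalisation step is needed.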

The description of $S_{ij}(t)$ is changed in the correction to the paper. It is confusing, but I hope the following is correct: for each point in time $t$ and each pair of points $i,j$, they determine the maximum, the mean, and the standard deviation around the mean of $| c^{(t)}_{i,j}(\tau) |$, all taken over $\tau$. The link strength $S_{ij}(t)$ is then defined as the difference between the maximum and the mean value, divided by the standard deviation.
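Under that reading, the link strength is a one-liner in R. The helper name `link.strength` and the two test profiles are hypothetical. Note also that R's `sd` divides by $n-1$, which is an assumption about the intended normalisation, although for 401 lags it makes little difference.

```r
# Link strength from the vector of |c(tau)| values over all lags tau.
link.strength <- function(c.abs) (max(c.abs) - mean(c.abs)) / sd(c.abs)

# Hypothetical profile 1: weak correlation everywhere except one sharp peak.
c.peaked <- c(rep(0.1, 400), 0.9)

# Hypothetical profile 2: uniformly high correlation, varying smoothly.
c.high <- seq(0.87, 0.97, length.out = 401)

link.strength(c.peaked)
link.strength(c.high)
```

The first profile scores much higher than the second: the measure rewards a peak that stands out from the other lags, not a high overall level of correlation.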

## El Niño prediction

The time-dependent average link strength $S(t)$ is obtained by averaging $S_{ij}(t)$ over all pairs where $i$ is in the “El Niño basin” and $j$ is in an area of the Pacific outside the basin.

It is observed that $S(t)$ decreases during El Niño events. The prediction of El Niño events is done by choosing a threshold, and predicting an El Niño event when $S(t)$ rises above the threshold.
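The averaging and the threshold rule can be sketched as follows. The matrix `S.pairs` (one row per time step, one column per pair $(i,j)$), its values, and the threshold are invented for illustration. Only the rule as stated above is implemented, with no further conditions.

```r
# Hypothetical link strengths: 8 time steps (rows) for 2 basin/non-basin pairs (columns).
S.pairs <- cbind(c(1.9, 2.0, 2.2, 2.7, 2.5, 2.1, 2.4, 2.8),
                 c(2.1, 2.2, 2.4, 2.5, 2.3, 2.3, 2.6, 2.6))

# Average link strength S(t): mean over all pairs at each time step.
S <- rowMeans(S.pairs)

threshold <- 2.45                      # assumed value, for illustration only
above <- S > threshold

# Time steps at which S(t) rises above the threshold (crossing from below).
alarms <- which(diff(above) == 1) + 1
alarms
```

Here `alarms` contains the time steps 4 and 7, the two points at which the averaged series crosses the threshold from below.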

The link strength defined above by Yamasaki et al. has some surprising behaviour. The following graphs are all based on simulated data. There are two time series of length 565, called “signal 1” and “signal 2” in the graphs, which consist of quadratics $q_1$ and $q_2$ plus independent Gaussian noise. The noise has the same amplitude (standard deviation) in all cases, but $q_1$ and $q_2$ are multiplied by 1000 (leftmost column), 9 (second column), 3 (third column) and 1 (fourth column). See Experiments with varieties of link strength for El Niño prediction.

Examples of the signals themselves are shown in the top two rows, the value of $| c^{(t)}_{i,j}(\tau) |$ is in the third row, and the fourth row shows an estimated density of the link strength derived from 100 replicates (different samplings of noise).

In the first column, $q_1$ and $q_2$ overwhelm the Gaussian noise, so you can see their shapes. In particular, note that they have positive correlation for all delays: it varies between about 0.87 and 0.97. The other three columns are intended to be more realistic signals which roughly resemble climate data. One would expect that as the multiplier for $q_1$ and $q_2$ decreases, the link strength would also decrease, but the opposite is the case. The code is below.

```r
# Two smooth signals: quadratics on a time axis rescaled to (0, 1].
signal1 <- function(period) {
  x <- period / length(period)
  x + (x - .5)^2
}

signal2 <- function(period) {
  x <- period / length(period)
  x - (x - .5)^2
}

period <- 1:566                  # only indices 1..565 (tau.max + cperiod) are used below
tau.max <- 200
tau.range <- -tau.max:tau.max
cperiod <- 365                   # window length for each correlation

# Simulate nreps noisy signal pairs and compute the link strength of each.
make.Csamples <- function(nreps, scale) {
  LSs <- rep(0, nreps)
  C <- rep(0, length(tau.range))
  for (r in 1:nreps) {
    t1 <- scale * signal1(period) + rnorm(length(period))
    t2 <- scale * signal2(period) + rnorm(length(period))
    for (tau in tau.range) {
      if (tau <= 0) {
        x <- t1[1:cperiod]
        y <- t2[(-tau + 1):(-tau + cperiod)]
      } else {
        x <- t1[(tau + 1):(tau + cperiod)]
        y <- t2[1:cperiod]
      }
      C[tau.max + 1 + tau] <- abs(cor(x, y))
    }
    # Link strength: (max - mean) / sd of |c(tau)| over all lags.
    LSs[r] <- (max(C) - mean(C)) / sd(C)
  }
  quantiles <- quantile(LSs, probs = c(.05, .25, .5, .75, .95))
  # C, t1 and t2 are those of the last replicate.
  list(C = C, LSs = LSs, t1 = t1, t2 = t2, quantiles = round(quantiles, digits = 2))
}

# One column of plots per scaling; rows: signal 1, signal 2, |c(tau)|, link strength density.
op <- par(mfcol = c(4, 4), mar = c(4, 5, 1, 1))
for (s in 1:4) {
  scaling <- c(1000, 9, 3, 1)[s]
  dsmps <- make.Csamples(100, scaling)
  maintxt <- paste0("signal scaled by ", scaling)

  plot(period, dsmps$t1, type = 'l', ylab = "signal 1", xlab = "days")
  plot(period, dsmps$t2, type = 'l', ylab = "signal 2", xlab = "days")
  plot(tau.range, dsmps$C, type = 'l', ylab = "C(tau)", xlab = "tau")
  dens <- density(dsmps$LSs)
  plot(dens$x, dens$y, type = 'l', xlab = "link strength", ylab = "density")
}
par(op)
```

category: climate