# The Azimuth Project Experiments with varieties of link strength for El Niño prediction

## Idea

Ludescher et al use a link strength defined in Experiments in El Niño analysis and prediction. The aim of this page is to explore related notions, and especially to find more comprehensible definitions with similar bahaviour.

## Peculiarities of the Ludescher definition

If independent gaussian noise is added to the temperatures for each day and for each grid point, the signal strength as calculated by Ludescher et al increases. Roughly speaking, adding noise with an standard deviation of 0.05C adds about 0.05 to the average link strength. This is very counter-intuitive behaviour for a “link strength” which is supposed to measure correlation.

The following graphs are all based on simulated data. There are two time series of length 565, called “signal 1” and “signal 2” in the graphs, which consist of quadratics $q_1$ and $q_2$ plus independent gaussian noise. The noise has the same amplitude (standard deviation) in all cases, but $q_1$ and $q_2$ are multiplied by 1000 (leftmost column), 9 (second column), 3 (third column) and 1 (fourth column).

Examples of the signals themselves are shown in the top two rows, the value of $c^{(t)}_{i,j}(\tau)$ is in the third row, and the fourth row shows an estimated density of the link strength derived from 100 replicates (different samplings of noise).

In the first column, the $q_1$ and $q_2$ overwhelm the guassian noise, so you can see their shapes. In particular, note that have positive correlation for all delays: it varies between about 0.87 and 0.97. The other three columns are intended to be more realistic signals which roughly resemble climate data. One would expect that as the multiplier for $q_1$ and $q_2$ decreases, the link strength would also decrease, but the opposite is the case.

The code is below.

signal1 <- function(period) {
x <- period / length(period)
x + (x-.5)^2
}

signal2 <- function(period) {
x <- period / length(period)
x - (x-.5)^2
}

period <- 1:566
tau.max <- 200
tau.range <- -tau.max :tau.max
cperiod <- 365

make.Csamples <- function(nreps, scale) {
LSs <- rep(0, nreps)
C <- rep(0, length(tau.range))
for (r in 1:nreps) {
t1 <- scale * signal1(period) + rnorm(length(period))
t2 <- scale * signal2(period) + rnorm(length(period))
for (tau in tau.range) {
if (tau <= 0) {
x <- t1[1:cperiod]
y <- t2[(-tau+1):(-tau+cperiod)]
} else {
x <- t1[(tau+1):(tau+cperiod)]
y <- t2[1:cperiod]
}
C[tau.max+1+tau] <- abs(cor(x,y))
}
LSs[r] <- (max(C)-mean(C))/ sd(C)
}
qauntiles <- quantile(LSs, probs=c(.05,.25,.5,.75,.95))
list(C=C, LSs=LSs, t1=t1, t2=t2, qauntiles = round(qauntiles, digits=2))
}

op <- par(mfcol=c(4,4), mar=c(4,5,1,1))
for (s in 1:4) {
scaling <- c(1000,9,3,1)[s]
dsmps <- make.Csamples(100,scaling)
maintxt <- paste0("signal scaled by ", scaling)

plot(period, dsmps$t1, type='l', ylab="signal 1", xlab="days") plot(period, dsmps$t2, type='l', ylab="signal 2", xlab="days")
plot(tau.range, dsmps$C, type='l', ylab="C(tau)", xlab="tau") dens <- density(dsmps$LSs)
plot(dens$x, dens$y, type='l', xlab="signal strength", ylab="density")
}
par(op)

## An alternative

My basic idea is to define some statistics $f(i, j, d; \sigma)$ where $i$ and $j$ are grid points, $d$ is time (in days) and $\sigma$ is a time scale (also in days) which measures the agreement of the air temperatures between $i$ and $j$ on a time scale of $\sigma$. (“Agreement” and “on a time scale of $\sigma$” are to be made precise later.) As far as I understand the behaviour of Ludescher et al’s link strength $S(i,j,d)$, it responds positively to correlations on a small time scale, that is, if two grid points have similar fluctuations from day to day, possibly with a delay, then the signal goes up. And it responds negatively to correlations on a large time scale, so that if the temperatures of two points do dimilar things over several months, the signal goes down. So, loosely, I expect

$S(i,j,d) \approx f(i, j, d; small) - f(i, j, d; big)$

### A recipe

Like Ludescher et al, the value of $f(i, j, d; \sigma)$ will be based on the 365+200 days leading up to $d$.

1. Seasonally adjust the temperatures by subtracting a climatalogical average for each day and grid point. Call these $A(i,d)$.

2. For any day $x$, define $Y(i, x,\sigma)$ to be the vector of numbers $( A(i,x-364), \dots, A(i,x) )$, smoothed by a gaussian with scale constant $\sigma$ using reflection to pad the ends. (Other options are available.)

3. $U(i, x; \sigma)$ is the standard deviation of $Y(i, x,\sigma)$. $V(i; \sigma)$ is a climatalogical average of $U(i, x; \sigma)$ over many $x$ values. It is intended to capture the spatial variation of the standard deviation over 365 days. It will tend be higher over land than sea, for example. It used to normalise values, and this is the only normalisation in this recipe.

4. For pairs of points $i$ in the basin and $j$ outside the basin, and $\tau$ in $\{0,1,2, \dots, 200\}$ define

$W(i, j, d, \tau; \sigma) = \frac{cov(Y(i, d-\tau,\sigma), \; Y(j, d,\sigma))}{V(i; \sigma)V(j; \sigma)}$

and for $\tau$ in $\{0,-1,-2, \dots, -200\}$, define

$W(i, j, d, -\tau; \sigma) = W(j, i, d, \tau; \sigma)$
5. Find the maximum $f(i, j, d; \sigma)$ over $\tau$ in some window $(-w, \dots, -1,0,1, \dots w)$ of $W(i, j, d, \tau; \sigma)$. $w$ should probably depend on $\sigma$ and on the distance from $i$ to $j$ and is intended to be a plausible range over which two grid points could be physically connected.

category: experiments