Blog - El Niño project (part 8)

This is a blog article in progress, written by John Baez. To see discussions of the article as it was being written, visit the Azimuth Forum.


This time I’d like to look at a different paper on climate networks:

• Y. Berezin, A. Gozolchiani, O. Guez and S. Havlin, Stability of climate networks with time, *Scientific Reports* **2** (2012).

The goal of this paper is to see how stable climate networks are over time. They divide the world into 9 zones:

For each zone they construct several **climate networks**. Each one is an array of numbers $W_{l r}^y$, one for each year $y$ and each pair of grid points $l, r$ in that zone. They call $W_{l r}^y$ a **link strength**: it’s a measure of how correlated the weather is at those two grid points during that year.

I’ll say more later about how they compute these link strengths. In Part 3 we explained one method for doing it. This paper uses a similar but subtly different method.

The paper’s first big claim is that $W_{l r}^y$ doesn’t change much from year to year, “in complete contrast” to the pattern of local daily air temperature and pressure fluctuations. In simple terms: *the strength of the correlation between weather at two different points tends to be quite stable.*

Moreover, the definition of link strength involves an adjustable time delay, $\tau$. We can measure the correlation between the weather at point $l$ at any given time and point $r$ at a time $\tau$ days later. The link strength is computed by taking a *maximum* over time delays $\tau$. Naively speaking, the value of $\tau$ that gives the maximum correlation is “how long it typically takes for weather at point $l$ to affect weather at point $r$”. Or the other way around, if $\tau$ is negative.

This is a naive way of explaining the idea, because I’m mixing up correlation with causation. But you get the idea, I hope.
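
To make the naive picture concrete, here is a minimal Python sketch on toy data. It is not the paper’s procedure: it uses a plain Pearson correlation, it only compares the days where the two shifted series overlap, and the names `lagged_correlation` and `best_delay` are made up for illustration.

```python
import math
from statistics import mean, pstdev

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = mean((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (pstdev(a) * pstdev(b))

def lagged_correlation(left, right, tau):
    """Correlate left[t] with right[t + tau], using only overlapping days."""
    n = len(left)
    if tau >= 0:
        a, b = left[:n - tau], right[tau:]
    else:
        a, b = left[-tau:], right[:n + tau]
    return pearson(a, b)

def best_delay(left, right, tau_max):
    """Delay in [-tau_max, tau_max] giving the largest absolute correlation."""
    return max(range(-tau_max, tau_max + 1),
               key=lambda tau: abs(lagged_correlation(left, right, tau)))

# Toy "weather": the signal at the right point is the signal at the
# left point delayed by 5 days (padded with zeros at the start).
left = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(200)]
right = [0.0] * 5 + left[:-5]

print(best_delay(left, right, 20))
```

Here the maximizing delay recovers the 5-day shift that was built into the toy data. With real data the peak is much less clean, which is why the paper cares about how stable it is from year to year.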

Their second big claim is that when the link strength between two points $l$ and $r$ is big, the value of $\tau$ that gives the maximum correlation doesn’t change much from year to year. In simple terms: *if the weather at two locations is strongly correlated, the amount of time it takes for weather at one point to reach the other point doesn’t change very much.*

How do Berezin *et al* define their climate network?

They use data obtained from here:

This is not exactly the same data set that Ludescher *et al* use, namely:

“Reanalysis 2” is a newer attempt to reanalyze and fix up the same pile of data. That’s a very interesting issue, but never mind that now!

Berezin *et al* use data for:

• the geopotential height for six different pressures

and

• the air temperature at those different heights

The **geopotential height** for some pressure says roughly how high you have to go for air to have that pressure. Click the link if you want a more precise definition! Here’s the geopotential height field for the pressure of 500 millibars on some particular day of some particular year:

The height is in meters.

Berezin *et al* use daily values for this data for:

• locations world-wide on a grid with a resolution of 5° × 5°,

during:

• the years from 1948 to 2006.

They divide the globe into 9 zones, and separately study each zone:

So, they’ve got twelve different functions of space and time (geopotential height and air temperature, each at six pressure levels), where space is a rectangle discretized using a 5° × 5° grid, and time is discretized in days. From each such function they build a ‘climate network’.

How do they do it?

Berezin *et al*’s method of defining a climate network is similar to Ludescher *et al*’s, but different. Compare Part 3 if you want to think about this.

Let $\tilde{S}^y_l(t)$ be any one of their functions, evaluated at the grid point $l$ on day $t$ of year $y$.

Let $S_l^y(t)$ be $\tilde{S}^y_l(t)$ minus its **climatological average**. For example, if $t$ is June 1st and $y$ is 1970, we average the temperature at location $l$ over all June 1sts from 1948 to 2006, and subtract that from $\tilde{S}^y_l(t)$ to get $S^y_l(t)$. In other words:

$\displaystyle{ S^y_l(t) = \tilde{S}^y_l(t) - \frac{1}{N} \sum_{y'} \tilde{S}^{y'}_l(t) }$

where $N$ is the number of years considered and the sum runs over all those years $y'$.
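
Here is a small Python sketch of this step, subtracting the climatological average from raw data. It assumes the raw data for one grid point is stored as a list of years, each a list of daily values of the same length (so leap days are ignored); the name `climatological_anomalies` and the numbers are hypothetical.

```python
def climatological_anomalies(raw):
    """raw[y][t] is the raw value in year y on day t at one grid point.

    Returns S with S[y][t] = raw[y][t] minus the average of raw[.][t]
    over all years, i.e. the value minus its climatological average.
    Assumes every year has the same number of days.
    """
    n_years = len(raw)
    n_days = len(raw[0])
    climatology = [sum(raw[y][t] for y in range(n_years)) / n_years
                   for t in range(n_days)]
    return [[raw[y][t] - climatology[t] for t in range(n_days)]
            for y in range(n_years)]

# Three made-up "years" of three "days" each.
raw = [[10.0, 12.0, 14.0],
       [11.0, 12.0, 17.0],
       [12.0, 15.0, 14.0]]
anomalies = climatological_anomalies(raw)
print(anomalies)
```

By construction, the anomalies for any fixed day of the year average to zero over all the years.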

For any function of time $f$, let $\langle f^y(t) \rangle$ be the average of the function over all days in year $y$. This is different from the ‘running average’ used by Ludescher *et al*, and I can’t even be 100% sure that Berezin *et al* mean what I just said: they use the notation $\langle f^y(t) \rangle$.

Let $l$ and $r$ be two grid points, and $\tau$ any number of days in the interval $[-\tau_{\mathrm{max}}, \tau_{\mathrm{max}}]$. Define the **cross-covariance function** at time $t$ by:

$\Big(f_l(t) - \langle f_l(t) \rangle\Big) \; \Big( f_r(t + \tau) - \langle f_r(t + \tau) \rangle \Big)$

I believe Berezin *et al* mean to consider this quantity, because they mention two grid points $l$ and $r$. Their notation omits the subscripts $l$ and $r$ so it is impossible to be completely sure what they mean! But what I wrote is the reasonable quantity to consider here, so I’ll assume this is what they meant.

They normalize this quantity and take its absolute value, forming:

$\displaystyle{ X_{l r}^y(\tau) = \frac{\Big|\Big(f_l(t) - \langle f_l(t) \rangle\Big) \; \Big( f_r(t + \tau) - \langle f_r(t + \tau) \rangle \Big)\Big|} {\sqrt{\Big\langle \Big(f_l(t) - \langle f_l(t)\rangle \Big)^2 \Big\rangle } \; \sqrt{\Big\langle \Big(f_r(t+\tau) - \langle f_r(t+\tau)\rangle\Big)^2 \Big\rangle } } }$
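
As written, the numerator still depends on the day $t$, while the left-hand side does not. One plausible reading, and this is my assumption rather than something the notation settles, is that the numerator is also averaged over the days of year $y$:

$\displaystyle{ X_{l r}^y(\tau) = \frac{\Big|\Big\langle \Big(f_l(t) - \langle f_l(t) \rangle\Big) \; \Big( f_r(t + \tau) - \langle f_r(t + \tau) \rangle \Big) \Big\rangle\Big|} {\sqrt{\Big\langle \Big(f_l(t) - \langle f_l(t)\rangle \Big)^2 \Big\rangle } \; \sqrt{\Big\langle \Big(f_r(t+\tau) - \langle f_r(t+\tau)\rangle\Big)^2 \Big\rangle } } }$

Under that reading, $X_{l r}^y(\tau)$ is the absolute value of a time-lagged Pearson correlation coefficient, so by the Cauchy–Schwarz inequality it lies between 0 and 1.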

They then take the maximum value of $X_{l r}^y(\tau)$ over delays $\tau \in [-\tau_{\mathrm{max}}, \tau_{\mathrm{max}}]$, subtract its mean over delays in this range, and divide by the standard deviation. They write something like this:

$\displaystyle{ W_{l r}^y = \frac{\mathrm{MAX}\Big( X_{l r}^y - \langle X_{l r}^y\rangle \Big) }{\mathrm{STD} X_{l r}^y} }$

and say that the maximum, mean and standard deviation are taken over the (not written) variable $\tau \in [-\tau_{\mathrm{max}}, \tau_{\mathrm{max}}]$.
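
Since the mean over delays is a constant, the maximum of $X_{l r}^y - \langle X_{l r}^y \rangle$ is just the maximum of $X_{l r}^y$ minus its mean. Here is a minimal Python sketch of that last step, assuming the values $X_{l r}^y(\tau)$ have already been computed and stored in a dictionary keyed by $\tau$. The function name, the made-up numbers, and the choice of the population standard deviation (rather than the sample one) are my assumptions.

```python
from statistics import mean, pstdev

def link_strength(x_by_delay):
    """x_by_delay maps each delay tau to the value X(tau).

    Returns (W, tau_star): W = (max X - mean X) / std X, with all three
    statistics taken over the delays, and tau_star the maximizing delay.
    Uses the population standard deviation (an assumption).
    """
    values = list(x_by_delay.values())
    tau_star = max(x_by_delay, key=x_by_delay.get)
    w = (x_by_delay[tau_star] - mean(values)) / pstdev(values)
    return w, tau_star

# Made-up values of X for delays -2, ..., 2.
x = {-2: 0.1, -1: 0.2, 0: 0.3, 1: 0.9, 2: 0.5}
w, tau_star = link_strength(x)
print(w, tau_star)
```

So $W_{l r}^y$ measures how far the peak of $X_{l r}^y(\tau)$ stands out above its typical value, in units of standard deviations: a sharp isolated peak gives a strong link, while a flat profile gives a weak one.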

Each number $W_{l r}^y$ is called a **link strength**. For each year, the matrix of numbers $W_{l r}^y$ where $l$ and $r$ range over all grid points in our zone is called a **climate network**.

We can think of a climate network as a weighted complete graph with the grid points $l$ as nodes. Remember, an **undirected graph** is one without arrows on the edges. A **complete graph** is an undirected graph with one edge between any pair of nodes:

A **weighted graph** is an undirected graph where each edge is labelled by a number called its **weight**. But right now we’re also calling the weight the ‘link strength’.
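
As a data structure, a weighted complete graph can be sketched in Python as a dictionary with one entry per unordered pair of nodes. The grid points and the weight function below are hypothetical stand-ins, not real link strengths.

```python
from itertools import combinations

def complete_weighted_graph(nodes, weight):
    """One edge per unordered pair of distinct nodes, labelled by its weight."""
    return {frozenset(pair): weight(*pair) for pair in combinations(nodes, 2)}

def toy_weight(l, r):
    """Made-up weight that decays with the grid separation of two nodes."""
    return 1.0 / (1 + abs(l[0] - r[0]) + abs(l[1] - r[1]))

# Four hypothetical grid points, given as (latitude, longitude) in degrees.
nodes = [(0, 0), (0, 5), (5, 0), (5, 5)]
graph = complete_weighted_graph(nodes, toy_weight)

print(len(graph))  # n(n-1)/2 edges for n nodes
```

Using `frozenset` keys makes the graph undirected: the pair $l, r$ and the pair $r, l$ give the same edge.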

A lot of what’s usually called ‘network theory’ is the study of weighted graphs. You can learn about it here:

• Ernesto Estrada, *The Structure of Complex Networks: Theory and Applications*, Oxford U. Press, Oxford, 2011.

Suffice it to say that given a weighted graph, there are a lot of quantities you can compute from it, which are believed to tell us interesting things!

I will not delve into the real meat of the paper, namely what they actually *do* with their climate networks! The paper is free online, so you can read this yourself.

I will just quote their conclusions and show you a couple of graphs.

The conclusions touch on an issue that’s important for the network-based approach to El Niño prediction. If climate networks are ‘stable’, not changing much in time, why would we use them to predict a time-dependent phenomenon like the El Niño Southern Oscillation? Here is what they say:

We have established the stability of the network of connections between the dynamics of climate variables (e.g. temperatures and geopotential heights) in different geographical regions. This stability stands in fierce contrast to the observed instability of the original climatological field pattern. Thus the coupling between different regions is, to a large extent, constant and predictable. The links in the climate network seem to encapsulate information that is missed in analysis of the original field.

The strength of the physical connection, $W_{l r}$, that each link in this network represents, changes only between 5% to 30% over time. A clear boundary between links that represent real physical dependence and links that emerge due to noise is shown to exist. The distinction is based on both the high link average strength $\overline{W_{l r}}$ and on the low variability of time delays $\mathrm{STD}(T_{l r})$.

Recent studies indicate that the strength of the links in the climate network changes during the El Niño Southern Oscillation and the North Atlantic Oscillation cycles. These changes are within the standard deviation of the strength of the links found here. Indeed in Fig. 3 it is clearly seen that the coefficient of variation of links in the El Niño basin (zone 9) is larger than other regions such as zone 1. Note that even in the El Niño basin the coefficient of variation is relatively small (less than 30%).

Beside the stability of single links, also the hierarchy of the link strengths in the climate network is preserved to a large extent. We have shown that this hierarchy is partially due to the two dimensional space in which the network is embedded, and partially due to pure physical coupling processes. Moreover the contribution of each of these effects, and the level of noise was explicitly estimated. The spatial effect is typically around 50% of the observed stability, and the noise reduces the stability value by typically 5%–10%.

The network structure was further shown to be consistent across different altitudes, and a monotonic relation between the altitude distance and the correspondence between the network structures is shown to exist. This yields another indication that the observed network structure represents effects of physical coupling.

The stability of the network and the contributions of different effects were summarized in specific relation to different geographical areas, and a clear distinction between equatorial and off–equatorial areas was observed. Generally, the network structure of equatorial regions is less stable and more fluctuative.

The stability and consistence of the network structure during time and across different altitudes stands in contrast to the known unstable variability of the daily anomalies of climate variables. This contrast indicates an analogy between the behavior of nodes in the climate network and the behavior of coupled chaotic oscillators. While the fluctuations of each coupled oscillators are highly erratic and unpredictable, the interactions between the oscillators is stable and can be predicted. The possible outreach of such an analogy lies in the search for known behavior patterns of coupled chaotic oscillators in the climate system. For example, existence of phase slips in coupled chaotic oscillators is one of the fingerprints for their cooperated behavior, which is evident in each of the individual oscillators. Some abrupt changes in climate variables, for example, might be related to phase slips, and can be understood better in this context.

On the basis of our measured coefficient of variation of single links (around 15%), and the significant overall network stability of 20–40%, one may speculatively assess the extent of climate change. However, for this assessment our current available data is too short and does not include enough time from periods before the temperature trends. An assessment of the relation between the network stability and climate change might be possible mainly through launching of global climate model “experiments” realizing other climate conditions, which we indeed intend to perform.

A further future outreach of our work can be a mapping between network features (such as network motifs) and known physical processes. Such a mapping was previously shown to exist between an autonomous cluster in the climate network and El Niño. Further structures without such a climate interpretation might point towards physical coupling processes which were not observed earlier.

(I have expanded some acronyms and deleted some reference numbers.)
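
The ‘coefficient of variation’ in these conclusions is the standard deviation divided by the mean. Here is a minimal Python sketch for a single link followed over several years. The yearly values are made up, and using the population standard deviation is my assumption.

```python
from statistics import mean, pstdev

def coefficient_of_variation(strengths_by_year):
    """Standard deviation over the years divided by the mean over the years."""
    return pstdev(strengths_by_year) / mean(strengths_by_year)

# Made-up link strengths W for one pair of grid points in four years.
w_years = [4.0, 5.0, 6.0, 5.0]
cv = coefficient_of_variation(w_years)
print(f"{100 * cv:.1f}%")
```

A link whose strength hardly changes from year to year has a coefficient of variation near zero.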

Finally, here are two nice graphs showing the average link strength as a function of distance. The first is based on four climate networks for Zone 1, the southern half of South America:

The second is based on four climate networks for Zone 9, a big patch of the Pacific north of the Equator:

As we expect, temperatures and geopotential heights get less correlated at points further away. But the rate at which the correlation drops off conveys interesting information! Graham Jones has made some interesting charts of this for the rectangle of the Pacific that Ludescher *et al* use for El Niño prediction, and I’ll show you those next time.
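
If you want to make this kind of chart yourself, here is a rough Python sketch of averaging link strength in distance bins. It assumes distance means great-circle distance on a spherical Earth, and the bin width and sample links are made up; I don’t know exactly how the graphs above were binned.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0

def great_circle_km(p, q):
    """Haversine distance between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def mean_strength_by_distance(links, bin_km=500.0):
    """links is an iterable of (point, point, W) triples.

    Returns {bin_start_km: mean W of the links whose length falls in that bin}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p, q, w in links:
        bin_start = int(great_circle_km(p, q) // bin_km) * bin_km
        sums[bin_start] += w
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

# Made-up links: two short ones and one long one.
links = [((0, 0), (0, 5), 6.0),
         ((0, 0), (5, 0), 4.0),
         ((0, 0), (0, 20), 2.0)]
profile = mean_strength_by_distance(links)
print(profile)
```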

• El Niño project (part 1): basic introduction to El Niño and our project here.

• El Niño project (part 2): introduction to the physics of El Niño.

• El Niño project (part 3): summary of the work of Ludescher *et al*.

• El Niño project (part 4): how Graham Jones replicated the work by Ludescher *et al*, using software written in R.

• El Niño project (part 5): how to download R and use it to get files of climate data.

• El Niño project (part 6): Steve Wenner’s statistical analysis of the work of Ludescher *et al*.

• El Niño project (part 7): the definition of El Niño.

• El Niño project (part 8): Berezin *et al* on the stability of climate networks.
