The Azimuth Project
Blog - El Niño project (part 3)

This is a blog article in progress, written by John Baez. To see discussions of the article as it is being written, visit the Azimuth Forum.

If you want to write your own article, please read the directions on How to blog.

In February, this paper claimed that there’s a 75% chance that the next El Niño will arrive by the end of 2014:

• Josef Ludescher, Avi Gozolchiani, Mikhail I. Bogachev, Armin Bunde, Shlomo Havlin, and Hans Joachim Schellnhuber, Very early warning of next El Niño, Proceedings of the National Academy of Sciences, February 2014. (Click title for free version, journal name for official version.)

Since it was published in a reputable journal, it created a big stir! But that’s not the main reason we at the Azimuth Project want to analyze and improve this paper. The main reason is that it uses a climate network.

Very roughly, the idea is this. Draw a big network of dots representing different places in the Pacific Ocean. For each pair of dots, compute a number saying how strongly correlated the sea surface temperatures are at those two places. The paper claims that when an El Niño is getting ready to happen, the average of these numbers is big. In other words, temperatures in the Pacific tend to go up and down in synch!

Whether this idea is right or wrong, it’s interesting, and it’s not very hard for programmers to dive in and study it.

Two Azimuth members have done just that: David Tanzer, a software developer who works for financial firms in New York, and Graham Jones, a self-employed programmer who also works on genomics and Bayesian statistics. These guys have really brought new life to the Azimuth Code Project in the last few weeks, and it’s exciting! It’s even gotten me to do some programming myself.

Soon I’ll start talking about the programs they’ve written, and how you can help.
But today I’ll summarize the paper by Ludescher et al. Their methodology is also explained here:

• Josef Ludescher, Avi Gozolchiani, Mikhail I. Bogachev, Armin Bunde, Shlomo Havlin, and Hans Joachim Schellnhuber, Improved El Niño forecasting by cooperativity detection, Proceedings of the National Academy of Sciences, 30 May 2013.

The basic idea

The basic idea is to use a climate network. There are lots of variants on this idea, but here’s a simple one. Start with a bunch of dots representing different places on the Earth. For any pair of dots $i$ and $j$, compute the cross-correlation of temperature histories at those two places. Call some function of this the ‘link strength’ for that pair of dots. Compute the average link strength… and get excited when this gets bigger than a certain value.

The papers by Ludescher et al use this strategy to predict El Niños. They build their climate network using correlations between daily temperature data for 14 grid points in the El Niño basin and 193 grid points outside this region, as shown here:

The red dots are the points in the El Niño basin.

Starting from this temperature data, they compute an ‘average link strength’ in a way I’ll describe later. When this number is bigger than a certain fixed value, they claim an El Niño is coming.

How do they decide if they’re right? How do we tell when an El Niño actually arrives? One way is to use the ‘Niño 3.4 index’. This is the area-averaged sea surface temperature anomaly in the yellow region here:

Anomaly means the temperature minus its average over time: how much hotter than usual it is. When the Niño 3.4 index is over 0.5°C for at least 3 months, Ludescher et al say there’s an El Niño.
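The criterion above is simple enough to sketch in code. The following Python function is only an illustration: it assumes the Niño 3.4 index is given as a plain list of monthly values in °C, and the name `el_nino_episodes` and its parameters are made up for this example rather than taken from Ludescher et al.

```python
def el_nino_episodes(nino34, threshold=0.5, min_months=3):
    """Find runs where a monthly index stays above `threshold`.

    Returns a list of (start, end) index pairs, with `end` exclusive,
    for every run lasting at least `min_months` consecutive months.
    Assumes `nino34` holds one value per month (an assumption of this sketch).
    """
    episodes = []
    start = None  # index where the current run began, if any
    for k, value in enumerate(nino34):
        if value > threshold:
            if start is None:
                start = k
        else:
            if start is not None and k - start >= min_months:
                episodes.append((start, k))
            start = None
    # A run may still be open when the data ends.
    if start is not None and len(nino34) - start >= min_months:
        episodes.append((start, len(nino34)))
    return episodes
```

For example, a run of only two warm months is ignored, while a run of three or more is reported as one episode.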

Here is what they get:

The blue peaks are El Niños: episodes where the Niño 3.4 index is over 0.5°C for at least 3 months. The red line is their ‘average link strength’. Whenever this exceeds a certain threshold $\Theta = 2.82$, they predict an El Niño will start in the following calendar year.

The green arrows show their successes. The dashed arrows show their false alarms. You can see a little letter n whenever an El Niño occurred that they failed to predict.

Actually, chart A here shows the ‘learning phase’ of their calculation. In this phase, they adjusted the threshold $\Theta$ so their procedure would do a good job. Chart B shows the ‘testing phase’. Here they used the value of $\Theta$ chosen in the learning phase, and checked to see how good a job it did.
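The alarm rule described above can be sketched as a threshold crossing. This is a rough illustration only: it reads ‘exceeds’ as ‘rises above $\Theta$ from at or below it’, which is an assumption, and the paper’s actual rule may involve further conditions. The function name `alarm_times` is hypothetical.

```python
def alarm_times(average_link_strength, theta=2.82):
    """Return the indices where the series rises above `theta`.

    An alarm is recorded at index k when the previous value was at or
    below `theta` and the current value is above it (an assumed reading
    of the rule, not necessarily the one used by Ludescher et al).
    """
    alarms = []
    for k in range(1, len(average_link_strength)):
        previous = average_link_strength[k - 1]
        current = average_link_strength[k]
        if previous <= theta < current:
            alarms.append(k)
    return alarms
```

Counting only upward crossings means that a long stretch above the threshold gives one alarm rather than one per time step.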

The details

Now I mainly need to explain how they compute their ‘average link strength’.

Let $i$ stand for any point in this 9 × 27 grid:

For each day $t$ between June 1948 and November 2013, let $\tilde{T}_i(t)$ be the average surface air temperature at the point $i$ on day $t$. You can get these numbers from here:

National Centers for Environmental Prediction and the National Center for Atmospheric Research Reanalysis I Project.

Let $T_i(t)$ be $\tilde{T}_i(t)$ minus its climatological average: that is, minus its average value on that day of the year. The point is that we don’t care about the temperature itself: we care how much hotter it is than usual for that day of the year. For example, if $t$ is June 1st 1970, we average the temperature at location $i$ over all June 1sts from 1948 to 2013, and subtract that from $\tilde{T}_i(t)$ to get $T_i(t)$. They call $T_i(t)$ the temperature anomaly.
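Here is a minimal sketch of the anomaly calculation for a single grid point. It assumes the temperatures come as a list with a matching list of calendar-day labels (for example `"06-01"`); the function name and this data layout are choices made for the example, and how to treat February 29th is left open.

```python
from collections import defaultdict


def temperature_anomalies(temps, calendar_days):
    """Subtract from each temperature the mean over all entries
    sharing the same calendar-day label.

    `temps` and `calendar_days` are parallel lists (an assumed layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for temp, day in zip(temps, calendar_days):
        sums[day] += temp
        counts[day] += 1
    # Climatological average for each calendar day.
    climatology = {day: sums[day] / counts[day] for day in sums}
    return [temp - climatology[day] for temp, day in zip(temps, calendar_days)]
```

With two years of data for two calendar days, each value is compared only with the other year’s value for the same day.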

For any function of time, denote its moving average over the last 365 days by:

$$ \langle f(t) \rangle = \frac{1}{365} \sum_{d = 0}^{364} f(t - d) $$
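As a sketch, the moving average can be written directly from the formula. Here `f` is assumed to be a list indexed by day, and the `window` parameter is added only so the function can be tried on short lists.

```python
def moving_average(f, t, window=365):
    """Average of f[t], f[t-1], ..., f[t-window+1].

    Raises ValueError if fewer than `window` values are available up to
    index t, since negative list indices would otherwise wrap around.
    """
    if t - (window - 1) < 0:
        raise ValueError("not enough history before index t")
    return sum(f[t - d] for d in range(window)) / window
```

Note that only values at or before day `t` enter the average.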

Let $i$ be a node in the El Niño basin, and $j$ be a node outside of it.

Let $t$ range over every tenth day in the time span from 1950 to 2011.

Let $T_k(t)$ be the daily atmospheric temperature anomalies (actual temperature value minus climatological average for each calendar day).

Define the time-delayed cross-covariance function by:

$$ C_{i,j}^{t}(-\tau) = \langle T_i(t) T_j(t - \tau) \rangle - \langle T_i(t) \rangle \langle T_j(t - \tau) \rangle $$
$$ C_{i,j}^{t}(\tau) = \langle T_i(t - \tau) T_j(t) \rangle - \langle T_i(t - \tau) \rangle \langle T_j(t) \rangle $$

They consider time lags $\tau$ between 0 and 200 days, where “a reliable estimate of the background noise level can be guaranteed.”
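The two formulas above can be folded into one function by letting the lag be signed. This is a sketch under assumptions: `Ti` and `Tj` are lists of daily anomalies, a positive `tau` delays `Ti` and a negative `tau` delays `Tj`, and the `window` parameter is there only to allow small test cases.

```python
def cross_covariance(Ti, Tj, t, tau, window=365):
    """Time-delayed cross-covariance at day t for a signed lag tau.

    tau >= 0:  <Ti(t - tau) Tj(t)> - <Ti(t - tau)> <Tj(t)>
    tau <  0:  <Ti(t) Tj(t + tau)> - <Ti(t)> <Tj(t + tau)>
    where <.> is the moving average over the last `window` days.
    """
    lag_i, lag_j = (tau, 0) if tau >= 0 else (0, -tau)
    if t - max(lag_i, lag_j) - (window - 1) < 0:
        raise ValueError("not enough history before index t")
    xs = [Ti[t - lag_i - d] for d in range(window)]
    ys = [Tj[t - lag_j - d] for d in range(window)]
    mean_x = sum(xs) / window
    mean_y = sum(ys) / window
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / window
    return mean_xy - mean_x * mean_y
```

To cover the lags considered in the paper, one would evaluate this for every `tau` from -200 to 200.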

Divide the cross-covariances by the standard deviations of $T_i$ and $T_j$ to obtain the cross-correlations.

Only temperature data from the past are considered when estimating the cross-correlation function at day tt.
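A sketch of this normalization follows. The text does not say over which days the standard deviations are taken; this example assumes they are computed over the same lagged 365-day windows as the covariance, which makes the result an ordinary correlation coefficient between -1 and 1. The function name and the `window` parameter are hypothetical.

```python
import math


def cross_correlation(Ti, Tj, t, tau, window=365):
    """Cross-covariance at day t and signed lag tau, divided by the
    standard deviations of the two lagged windows (an assumed choice).

    Uses only values at or before day t. A window with zero variance
    leads to a division by zero.
    """
    lag_i, lag_j = (tau, 0) if tau >= 0 else (0, -tau)
    if t - max(lag_i, lag_j) - (window - 1) < 0:
        raise ValueError("not enough history before index t")
    xs = [Ti[t - lag_i - d] for d in range(window)]
    ys = [Tj[t - lag_j - d] for d in range(window)]
    mean_x = sum(xs) / window
    mean_y = sum(ys) / window
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / window
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / window)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / window)
    return cov / (std_x * std_y)
```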

Next, for nodes $i$ and $j$, and for each time point $t$, the maximum, the mean and the standard deviation around the mean are determined for $C_{i,j}^t$, as $\tau$ varies across its range.

Define the link strength $S_{i j}(t)$ as the difference between the maximum and the mean value, divided by the standard deviation.

They say:

Accordingly, $S_{i j}(t)$ describes the link strength at day $t$ relative to the underlying background and thus quantifies the dynamical teleconnections between nodes $i$ and $j$.
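Given the cross-correlation values for one pair of nodes at one time, taken over all lags, the link strength is a one-line calculation. The sketch below assumes the values arrive as a list and uses the population standard deviation; whether the paper uses the population or the sample version is not stated here, so treat that as an assumption.

```python
import statistics


def link_strength(correlations):
    """(maximum - mean) / standard deviation of the values over all lags.

    `correlations` holds the cross-correlation at each lag tau for one
    pair of nodes at one time t. Uses the population standard deviation
    (an assumption of this sketch).
    """
    mean = statistics.mean(correlations)
    spread = statistics.pstdev(correlations)
    return (max(correlations) - mean) / spread
```

A single sharp peak standing well above an otherwise flat background gives a large value, while a featureless curve gives a small one.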

Niño 3.4

Niño 3.4 is the area-averaged sea surface temperature anomaly in the region 5°S-5°N and 170°W-120°W. You can get Niño 3.4 data here:

Niño 3.4 is just one of several official regions in the Pacific:

  • Niño 1: 80°W-90°W and 5°S-10°S.
  • Niño 2: 80°W-90°W and 0°S-5°S.
  • Niño 3: 90°W-150°W and 5°S-5°N.
  • Niño 3.4: 120°W-170°W and 5°S-5°N.
  • Niño 4: 160°E-150°W and 5°S-5°N.
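For programming purposes, the regions listed above can be stored as latitude and longitude bounds. In this sketch longitudes are converted to degrees east from 0 to 360 (so 170°W becomes 190), which keeps Niño 4 from wrapping around the date line. The dictionary name, the tuple layout and the helper function are all choices made for this example.

```python
# (south, north, west, east): latitudes in degrees north,
# longitudes in degrees east on a 0-360 scale.
NINO_REGIONS = {
    "Niño 1": (-10.0, -5.0, 270.0, 280.0),
    "Niño 2": (-5.0, 0.0, 270.0, 280.0),
    "Niño 3": (-5.0, 5.0, 210.0, 270.0),
    "Niño 3.4": (-5.0, 5.0, 190.0, 240.0),
    "Niño 4": (-5.0, 5.0, 160.0, 210.0),
}


def regions_containing(lat, lon_east):
    """Names of all regions whose bounds include the given point.

    Bounds are treated as inclusive, so a point on a shared edge is
    reported for both neighbouring regions.
    """
    return [
        name
        for name, (south, north, west, east) in NINO_REGIONS.items()
        if south <= lat <= north and west <= lon_east <= east
    ]
```

For instance, a point on the equator at 160°W (200° east) lies in both Niño 3.4 and Niño 4, since those two regions overlap.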

For more details, read this:

• Kevin E. Trenberth, The definition of El Niño, Bulletin of the American Meteorological Society 78 (1997), 2771–2777.

category: blog, climate