The Azimuth Project
Jan Galkowski

I haven’t been very active at Azimuth of late, because of a rush of climate activism and related involvements, and because of a recovery period from the Azimuth Data Backup Project which was a bit of a burnout for me, even though I am delighted it was done and really enjoyed working with the team that made it happen. I’m also deeply grateful to Professor John Baez for throwing his support behind it and pushing it on. All indications are that our fears were justified but also that the Trump administration is either too ineffective or too corrupt (A New Argentina) to be potent at environmental destruction.

But I’m back, and hope to be writing about things here more.

Readers interested in my travels might want to check out my blog and podcast, too.

I am a statistician and data scientist working for Akamai Technologies in Cambridge, MA, where much of my time is devoted to supporting internal teams with statistics questions and, sometimes, to commercial customer-sponsored research regarding the Internet. I live in Westwood, MA, with my wife, Claire, in a nearly zero-carbon home.

I’m an active student of environmental sciences, applications of James-Stein estimators like the LASSO, innovative applications of symmetrized versions of normalized compression divergence, and dimension reduction using random projections and corollaries of the Johnson-Lindenstrauss Lemma.

While I consider myself a Bayesian statistician, my view of that is now more nuanced. I see Bayesian inference as optimizing a regularized Likelihood function and, so, think of it more as a computational problem than a conceptual one. I surely do think classical frequentist practices like hypothesis testing and p-value hacking are severely misguided. But you don’t need to be a Bayesian to conclude that. Similar judgments are rendered by advocates of exploiting shrinkage, like Professor Brad Efron, and statistical sages like Konishi and Kitagawa or Burnham and Anderson.
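The view of Bayesian inference as optimizing a regularized Likelihood can be made concrete with a standard identity, sketched here. The maximum a posteriori estimate maximizes the log-Likelihood plus a log-prior term:

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta}\,\bigl[\log L(\theta \mid y) + \log \pi(\theta)\bigr]
```

and with a Gaussian prior $\pi(\theta) = \mathcal{N}(0, \tau^{2} I)$ this becomes

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\min_{\theta}\,\Bigl[-\log L(\theta \mid y)
      + \tfrac{1}{2\tau^{2}}\,\lVert \theta \rVert_{2}^{2}\Bigr],
```

which is the ridge objective; a Laplace prior yields instead the $\ell_{1}$ penalty of the LASSO. The prior plays exactly the role of the regularizer.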

Moreover, there are new sets of techniques and new problems about which Bayesian statistics does not have a strong opinion, whether because these are not of interest yet, because Bayesian methods don’t or can’t apply, or because they just don’t fall into the same corral. The aforementioned random projections techniques, and normalized compressed divergence, go in this direction. For problems, there is the regression problem of small n, large p, where n denotes the number of observations and p denotes the number of predictors. It’s not that Bayesian inference, and especially Bayesian computation, has nothing to say about the latter; it actually does. But these problems typically don’t come packaged with defined Likelihood functions either, so to do standard Bayesian inference, either Empirical Likelihood needs to be embraced, or something like Approximate Bayesian Computation. These might maintain the connection to the Bayesian worldview, but I do not yet know what else they bring.
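When no Likelihood is in hand, the simplest likelihood-free scheme is rejection ABC: draw a parameter from the prior, forward-simulate data, and keep the draw only if a summary statistic of the simulation lands close to the observed one. A minimal sketch, with an illustrative Gaussian toy model, flat prior, and tolerance chosen purely for exposition:

```python
# Rejection Approximate Bayesian Computation (ABC), a minimal sketch.
# The model N(mu, 1), the flat prior on [-5, 5], and the tolerance
# are illustrative assumptions only.
import random
import statistics

random.seed(1)

# "Observed" data: pretend we only know its summary statistic (the mean).
observed = [random.gauss(2.0, 1.0) for _ in range(100)]
obs_stat = statistics.mean(observed)

def simulate(mu, n=100):
    """Forward-simulate a dataset from the assumed model N(mu, 1)."""
    return [random.gauss(mu, 1.0) for _ in range(n)]

def abc_rejection(n_draws=20000, tol=0.05):
    """Draw mu from the prior; keep draws whose simulated summary
    statistic lands within tol of the observed one."""
    accepted = []
    for _ in range(n_draws):
        mu = random.uniform(-5.0, 5.0)            # prior draw
        sim_stat = statistics.mean(simulate(mu))  # simulated summary
        if abs(sim_stat - obs_stat) < tol:        # likelihood-free accept step
            accepted.append(mu)
    return accepted

posterior = abc_rejection()
# The accepted draws approximate the posterior for mu; no Likelihood
# was ever evaluated, only the forward simulator.
print(len(posterior), round(statistics.mean(posterior), 2))
```

The price of avoiding the Likelihood is inefficiency: acceptance rates fall quickly as the tolerance shrinks or the summary statistic grows in dimension, which is why more elaborate ABC samplers exist.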

You can learn more about me here, and from my LinkedIn profile. I am a member of the American Association for the Advancement of Science, the Ecological Society of America, the American Statistical Association (active in its Boston Chapter), the International Society for Bayesian Analysis, and three organizations at Woods Hole Oceanographic Institution, the Associates, the 1930 Society, and its Fye Society.

I am also active in social and political activities relating to the environment, partly through the Green Congregation Committee at the Unitarian Universalist congregation to which I belong, First Parish in Needham, MA, and partly as a staunch advocate for distributed, locally owned solar energy.

Blog Articles

“Warming slowdown? (part 1 of 2)” The idea of a global warming slowdown or hiatus is critically examined, emphasizing the literature, the datasets, and means and methods for telling such. Also available at the Azimuth Project wiki.

“Warming slowdown? (part 2 of 2)” The idea of a global warming slowdown or hiatus is critically examined, emphasizing the literature, the datasets, and means and methods for telling such.

“Bayesian inversion of commingled tonnage of municipal solid waste to isolate components” Bayesian inversion to recover latent components in mixtures is a standard technique, with wide application. Yet, apparently, it is not well known. Frequentist methods for doing this are known as algorithms for blind source separation.

Unrelated to Azimuth, I also write a technical and, occasionally, political blog which records developments in renewable energy, offers the occasional statistical and computational illustration and guidance, comments on Climate Science and other sciences of interest to me, e.g., Quantitative Population Biology, and sometimes takes deep dives into subjects far from any of these, e.g., gun control or Geology.

Areas of Interest

  • Devising less people-hour-intensive methods for achieving scientific results comparable in quality to existing techniques, whether by using ensembles of simpler agents, swarm intelligence, machine learning, or data mining.
  • Statistical applications in quantitative ecology and population biology, particularly the dynamics of species invasions and their diffusion throughout an ecosystem.
  • Applying modern computational and statistical techniques to engineering problems in which people seem mired in a late 20th-century way of thinking and calculating, at least judging by their published technical literature.
  • Applications of Generalized Linear Mixed Models using Markov Chain Monte Carlo methods. (See also this Azimuth introduction.)
  • Insights regarding inland hydrology and the implications of enhanced burst rainfall due to climate change for inland flooding. This is not as Pollyannish as it might seem at first. Putting aside projections of climate change, NOAA projections of inland rainfall have crept steadily upwards over the last few decades. In my home town of Westwood, MA, for example, the probability per annum of 8 inches of rain in 24 hours is now 2%, which I found shocking. Yet our local conservation commission and planners still use tables of rainfall projections which date from the 1970s. I heard a talk from someone from NOAA in 2017 where she expressed her frustration that, for the most part, these updates were being ignored by local administrators, policymakers, and planners.
  • Statistical support to citizen science efforts, after Kosmala, Wiggins, Swanson, and Simmons.
  • Keeping up with the R package ecosystem.
  • Applying techniques of symmetrized normalized compressed divergence to develop insights regarding time series, particularly those derived from hydrological observations and from series of energy consumption, especially electrical energy consumption of households.
  • Reframing standard problems as small n, large p problems, and using techniques of compressed sensing and random projections to gain new insights.
  • Inferring latent causes of shortages in the drinking water supply in the town of Sharon, MA, and, more generally, extending such analyses to predictive inference.
  • Developing techniques which facilitate interpretation of data gathered by volunteers in the field and in natural settings, where the data are censored by seasonal availability and of varying quality.
  • Studying ecosystem relationships of the much-maligned Alliaria petiolata (“garlic mustard”), considering the insights of Professor Peter Del Tredici and colleagues. See here for an incomplete overview.
  • Supporting local towns in their development of climate and weather resilience plans by providing them with contacts, sources of data and maps, and doing pro bono statistical analyses.
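The symmetrized compression-divergence idea above can be sketched in a few lines. Here zlib stands in as the compressor and the string serialization is a crude assumption; both are illustrative choices, not the setup behind the clustering figure below:

```python
# Symmetrized normalized compression distance (NCD) between time series,
# a minimal sketch. The compressor (zlib) and the serialization are
# illustrative assumptions.
import zlib

def _csize(b: bytes) -> int:
    """Compressed size in bytes: the C(.) in the NCD formula."""
    return len(zlib.compress(b, 9))

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = _csize(x), _csize(y), _csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def symmetrized_ncd(x: bytes, y: bytes) -> float:
    """Average over both concatenation orders, since C(xy) != C(yx)
    in general for real compressors."""
    return 0.5 * (ncd(x, y) + ncd(y, x))

def encode(series) -> bytes:
    """Crude serialization of a numeric series; rounding coarsens the
    alphabet so the compressor can find shared structure."""
    return ",".join(f"{v:.1f}" for v in series).encode()

# Toy daily-consumption-like series: two with the same weekly pattern,
# one with different structure.
a = [10 + (i % 7) for i in range(200)]
b = [10.5 + (i % 7) for i in range(200)]
c = [10 + ((i * i) % 13) for i in range(200)]

d_ab = symmetrized_ncd(encode(a), encode(b))
d_ac = symmetrized_ncd(encode(a), encode(c))
# The full pairwise matrix of such distances can feed standard
# hierarchical clustering (e.g., scipy.cluster.hierarchy.linkage).
print(round(d_ab, 3), round(d_ac, 3))
```

Because no model of the series is assumed, the distance is driven entirely by whatever regularities the compressor can exploit, which is what makes it attractive for heterogeneous sources like household electrical consumption.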

Daily consumption of electricity from Westwood, MA, senior high school for 366 days, clustered using symmetrized normalized compression divergence and standard hierarchical clustering

What I’m Reading


category: members