The Azimuth Project

Jan Galkowski

Readers interested in my travels might want to check out my blog and podcast, too.

I am a statistician and quantitative engineer for Akamai Technologies in Cambridge, MA, where much of my time is devoted to supporting internal teams with statistics questions, to customer-sponsored research regarding the Internet, and, in general, to what I call studying Internet sociology. I live in Westwood, MA, with my wife, Claire, in a nearly zero-carbon home.

I’m an active student of environmental sciences, applications of shrinkage estimators (like James-Stein estimators and the LASSO) and Bayesian additive classification and regression trees, and innovative applications of Procrustes Tangent Projections and Distance.

While I consider myself a Bayesian statistician, my view of that is nuanced. I see Bayesian inference as the formal term for a collection of techniques which optimize a regularized likelihood function and, to me, it is more a computational problem than a conceptual or ideological one. I surely do think classical frequentist methods like hypothesis testing and p-value hacking are severely misguided. But you don’t need to be a Bayesian to conclude that. Similar judgments are rendered by advocates of exploiting shrinkage, like Professor Brad Efron, and statistical sages like Konishi and Kitagawa or Burnham and Anderson.
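One standard way to make the "regularized likelihood" reading concrete is the maximum a posteriori (MAP) estimate. The notation below (data $y$, parameter $\theta$, likelihood $L$, prior $\pi$, penalty weight $\lambda$) is introduced here for illustration only and is not taken from the text above:

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta}\; \bigl[\, \log L(\theta;\, y) + \log \pi(\theta) \,\bigr],
\qquad
\log \pi(\theta) = -\lambda \sum_{j} \lvert \theta_j \rvert + \text{const}
\;\;\Longrightarrow\;\;
\hat{\theta}_{\mathrm{MAP}}
  = \arg\min_{\theta}\; \Bigl[\, -\log L(\theta;\, y) + \lambda \sum_{j} \lvert \theta_j \rvert \,\Bigr].
```

With a Gaussian likelihood, the independent Laplace prior on the right gives the LASSO objective, and a Gaussian prior gives ridge regression instead. This covers only the posterior mode; a full Bayesian analysis works with the whole posterior distribution.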

Moreover, there are new sets of techniques and new problems for which Bayesian statistics does not have a strong opinion, whether that’s because these are not of interest yet, or it’s because Bayesian methods don’t or can’t apply, or because they just don’t fall into the same corral. For example, there is the regression problem of small n, large p, where n denotes the number of observations and p denotes the number of predictors. It’s not that Bayesian inference, and especially Bayesian computation, has nothing to say about such problems; it does. But these problems typically don’t come packaged with defined likelihood functions either, so to do standard Bayesian inference, either Empirical Likelihood needs to be embraced, or something like Approximate Bayesian Computation. These might maintain the connection to the Bayesian worldview, but I do not yet know what else they bring.
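To show what inference without an explicit likelihood can look like, here is a minimal sketch of rejection-style Approximate Bayesian Computation. It is a toy, not a small n, large p example: the model (a normal mean with known standard deviation), the uniform prior, the tolerance `eps`, and the function name `abc_rejection` are all assumptions made for illustration.

```python
import random
from statistics import fmean


def abc_rejection(observed, n_draws=20000, eps=0.1, sd=1.0, seed=1):
    """Rejection ABC for the mean of a normal sample with known sd (toy model)."""
    rng = random.Random(seed)
    target = fmean(observed)  # summary statistic of the observed data
    n = len(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw a candidate from the assumed prior
        simulated = [rng.gauss(theta, sd) for _ in range(n)]  # forward-simulate data
        # Keep the candidate only if the simulated summary is close to the observed one.
        if abs(fmean(simulated) - target) < eps:
            accepted.append(theta)
    return accepted


# Hypothetical data: twenty observations with sample mean 2.0.
observed = [1.5, 2.5] * 10
posterior_draws = abc_rejection(observed)
posterior_mean = fmean(posterior_draws)
```

The likelihood is never evaluated: only the ability to simulate from the model is used, and the accepted draws approximate the posterior up to the tolerance and the choice of summary statistic.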

I spend the most time developing methods which bridge quantitative and qualitative descriptions of data. For example, I am working on automated means of deriving shapelets from time series using the aforementioned Procrustes Tangent Distance as a measure of closeness, something which is better and faster than either dynamic time warping or normalized compression distance. Once found, these shapelets are collected in libraries which categorize a series, and a classification scheme in the form of a decision tree can be used to classify series using such libraries.
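As a rough illustration of matching a shapelet against a series, the sketch below uses the ordinary full Procrustes distance for planar shapes. This is a simpler stand-in and should not be read as the Procrustes Tangent Distance method described above. Treating each window as points (time, value) in the plane, and the names `to_shape`, `procrustes_distance`, and `best_match`, are assumptions made for this example.

```python
import math


def to_shape(window):
    """Embed a window as centred, unit-size planar points (time, value) stored as complex numbers."""
    n = len(window)
    pts = [complex(i / (n - 1), v) for i, v in enumerate(window)]
    mean = sum(pts) / n
    centred = [p - mean for p in pts]
    size = math.sqrt(sum(abs(p) ** 2 for p in centred))
    return [p / size for p in centred]


def procrustes_distance(a, b):
    """Full Procrustes distance between two equal-length windows (0 means identical shape)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("windows must have the same length, at least 2")
    za, zb = to_shape(a), to_shape(b)
    inner = sum(x * y.conjugate() for x, y in zip(za, zb))
    # The optimal rotation and scale are absorbed by taking the modulus of the inner product.
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


def best_match(series, shapelet):
    """Slide the shapelet along the series; return (start index, distance) of the closest window."""
    m = len(shapelet)
    dists = [procrustes_distance(series[i:i + m], shapelet)
             for i in range(len(series) - m + 1)]
    start = min(range(len(dists)), key=dists.__getitem__)
    return start, dists[start]


# Hypothetical series containing one bump, and a shapelet describing that bump.
series = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
shapelet = [0, 1, 3, 1, 0]
start, dist = best_match(series, shapelet)
```

Because this distance is invariant to rotation and scale in the (time, value) plane, two straight ramps with different slopes count as the same shape. Whether that invariance is wanted depends on the application.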

I am also interested in means for devising sampling plans for surveys enhanced with pre-existing knowledge, such as spatial surveys for environmental data collection, or social surveys exploiting data available from Google Maps.

You can learn more about me here, and from my LinkedIn profile. I am a member of the American Association for the Advancement of Science, the Ecological Society of America, the American Statistical Association (active in its Boston Chapter), the International Society for Bayesian Analysis, the International Society of Survey Statisticians, the TeX Users Group (TUG), and three organizations at Woods Hole Oceanographic Institution: the Associates, the 1930 Society, and its Fye Society.

I am also active in social and political activities relating to the environment, partly through the Green Congregation Committee at the Unitarian Universalist congregation to which I belong, First Parish in Needham, MA, and partly as a staunch advocate for distributed, locally owned solar energy.

Blog Articles

“Warming slowdown? (part 1 of 2)” The idea of a global warming slowdown or hiatus is critically examined, emphasizing the literature, the datasets, and means and methods for telling such. Also available at the Azimuth Project wiki.

“Warming slowdown? (part 2 of 2)” The idea of a global warming slowdown or hiatus is critically examined, emphasizing the literature, the datasets, and means and methods for telling such.

“Bayesian inversion of commingled tonnage of municipal solid waste to isolate components” Bayesian inversion to recover latent components in mixtures is a standard technique, with wide application. Yet, apparently, it is not well known. Frequentist methods for doing this are known as algorithms for blind source separation.

Unrelated to Azimuth, I keep a technical and, occasionally, political blog which records developments in renewable energy, offers the occasional statistical and computational illustration and guidance, comments on Climate Science and other sciences of interest to me, e.g., Quantitative Population Biology, and sometimes takes deep dives into subjects far from any of these, e.g., gun control or Geology.

Areas of Interest

  • Devising less people-hour intensive methods for achieving scientific results comparable in quality to existing techniques.
  • Statistical applications in quantitative ecology and population biology, particularly examining dynamics of species invasions, and diffusion throughout an ecosystem.
  • Applying modern computational and statistical techniques to engineering problems in which people seem mired in a late 20th century way of thinking and calculating, at least according to their published technical literature.
  • Insights regarding inland hydrology and the implications of enhanced burst rainfall due to climate change for inland flooding.
  • Statistical support to citizen science efforts, after Kosmala, Wiggins, Swanson, and Simmons.
  • Keeping up with the R package ecosystem.
  • Uses of cumulants and the Cornish-Fisher expansion to estimate quantiles of empirical distributions, including extensions for multivariate distributions on non-exchangeable random variables and other interdependencies.
  • Applications of the Johnson-Lindenstrauss lemma for projecting large but sparse data of high dimension onto smaller dimensions. See also this and this.


  • Applying the shapelet techniques mentioned above to develop insights regarding time series, particularly those derived from hydrological observations and from series of energy consumption, especially electrical energy consumption of households.
  • Inferring latent causes of shortages in the drinking water supply in the town of Sharon, MA, and, more generally, predictive inference about that supply.
  • Developing techniques which facilitate interpretation of data gathered by volunteers in the field and natural settings censored by seasonal availability and varying quality.
  • Studying ecosystem relationships of the much maligned Alliaria petiolata (“Garlic mustard”) considering the insights of Professor Peter Del Tredici and colleagues. See here for an incomplete overview.
  • Supporting local towns in their development of climate and weather resilience plans by providing them with contacts, sources of data and maps, and doing pro bono statistical analyses.
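
The Cornish-Fisher expansion named among the areas of interest above can be sketched for the univariate case as follows. This is the standard fourth-order form in terms of skewness and excess kurtosis; the multivariate and non-exchangeable extensions mentioned in the list are not attempted. The function names and the example numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist, fmean


def sample_moments(xs):
    """Return mean, standard deviation, skewness, and excess kurtosis (plain moment estimates)."""
    n = len(xs)
    mu = fmean(xs)
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return mu, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def cornish_fisher_quantile(p, mu, sd, skew, exkurt):
    """Approximate the p-quantile by adjusting the normal quantile for skewness and kurtosis."""
    z = NormalDist().inv_cdf(p)
    w = (z
         + (z ** 2 - 1) * skew / 6
         + (z ** 3 - 3 * z) * exkurt / 24
         - (2 * z ** 3 - 5 * z) * skew ** 2 / 36)
    return mu + sd * w


# Hypothetical usage: estimate the 95% quantile of a small right-skewed sample.
data = [1.0, 1.2, 1.3, 1.5, 1.9, 2.4, 3.8, 6.1]
q95 = cornish_fisher_quantile(0.95, *sample_moments(data))
```

With zero skewness and zero excess kurtosis the adjustment vanishes and the result is the ordinary normal quantile. The approximation can behave badly, even non-monotonically, when skewness or kurtosis is large, so it is best treated as a quick estimate.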

Daily consumption of electricity from Westwood, MA, senior high school for 366 days, clustered using symmetrized network compression divergence and standard hierarchical clustering

What I’m Reading


category: members