# Jan Galkowski (The Azimuth Project)


I haven’t been very active at Azimuth of late, because of a rush of climate activism and related involvements, and because of a recovery period from the Azimuth Data Backup Project which was a bit of a burnout for me, even though I am delighted it was done and really enjoyed working with the team that made it happen. I’m also deeply grateful to Professor John Baez for throwing his support behind it and pushing it on. All indications are that our fears were justified but also that the Trump administration is either too ineffective or too corrupt (A New Argentina) to be potent at environmental destruction.

But I’m back and hope to be writing more here.

Readers interested in my travels might want to check out my blog and podcast, too.

I am a statistician and quantitative data scientist and engineer working for Akamai Technologies in Cambridge, MA, where much of my time is devoted to supporting internal teams with statistics questions and, sometimes, to commercial customer-sponsored research regarding the Internet and, in general, what I call studying Internet sociology. I live in Westwood, MA, with my wife, Claire, in a nearly zero-Carbon home.

I’m an active student of environmental sciences, of applications of James-Stein estimators like the LASSO, of innovative applications of symmetrized versions of normalized compression divergence, of Bayesian additive classification and regression trees, and of dimension reduction using random projections, Procrustes Tangent Distance, and corollaries of the Johnson-Lindenstrauss Lemma.
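A minimal numerical sketch of the Johnson-Lindenstrauss idea mentioned above: a Gaussian random projection preserves pairwise distances to within a small relative error. The dimensions, seed, and tolerance here are illustrative choices, not anything prescribed by the Lemma itself.

```python
import numpy as np

rng = np.random.default_rng(42)
n, d, k = 30, 5000, 500          # 30 points in 5000 dims, projected down to 500

X = rng.normal(size=(n, d))

# Gaussian random projection, scaled so squared distances are preserved
# in expectation (the classic Johnson-Lindenstrauss construction)
R = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ R

def pairwise(A):
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

orig, proj = pairwise(X), pairwise(Y)
mask = ~np.eye(n, dtype=bool)
max_rel_err = np.max(np.abs(proj[mask] - orig[mask]) / orig[mask])
print(max_rel_err)  # pairwise distances survive a 10x dimension reduction
```

With these sizes the worst relative distortion typically comes out around ten percent, consistent with the Lemma's guarantee for this projection dimension.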

While I consider myself a Bayesian statistician, my view of that is now more nuanced. I see “Bayesian inference” as the formal term for a collection of techniques which optimize regularized Likelihood functions, and so think of it more as a computational matter than a conceptual or ideological one. I surely do think classical frequentist methods like hypothesis testing and p-value thresholds are severely misguided. But you don’t need to be a Bayesian to conclude that. Similar judgments are rendered by advocates of exploiting shrinkage, like Professor Brad Efron, and by statistical sages like Konishi and Kitagawa or Burnham and Anderson.
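The equivalence between Bayesian inference and regularized Likelihood optimization can be seen in the simplest case: the posterior mode under a Gaussian prior on regression coefficients is exactly ridge regression with penalty λ = σ²/τ². A small sketch with synthetic data (all names and values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])
sigma, tau = 1.0, 0.7            # noise sd, prior sd of coefficients
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# Maximizing the posterior under beta ~ N(0, tau^2 I) is ridge regression
# with penalty lambda = sigma^2 / tau^2 -- i.e., a regularized Likelihood.
lam = sigma ** 2 / tau ** 2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # unregularized, for contrast

print(beta_ridge.round(2), beta_ols.round(2))  # ridge is shrunk toward zero
```

The prior does the work of the regularizer: a tighter τ means a larger λ and stronger shrinkage, which is the same mechanism Efron's shrinkage arguments appeal to.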

Moreover, there are new sets of techniques and new problems for which Bayesian statistics does not have a strong opinion, whether because these are not of interest yet, because Bayesian methods don’t or can’t apply, or because they just don’t fall into the same corral. The aforementioned random projections techniques, or normalized compression divergence, go in this direction. For example, there is the regression problem of small n, large p, where n denotes the number of observations and p denotes the number of predictors. It’s not as if Bayesian inference, and especially Bayesian computation, has nothing to say about the latter, for it actually does. But these problems typically don’t come packaged with defined Likelihood functions, so to do standard Bayesian inference, either Empirical Likelihood needs to be embraced, or something like Approximate Bayesian Computation. These might maintain the connection to the Bayesian worldview, but I do not yet know what else they bring.
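Approximate Bayesian Computation, mentioned above, sidesteps the missing Likelihood by simulating: draw a parameter from the prior, simulate data, and keep the draw when a summary statistic lands close to the observed one. A toy rejection-ABC sketch (the model, prior, and tolerance are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
observed = rng.normal(loc=3.0, scale=1.0, size=100)   # pretend no Likelihood is available
obs_mean = observed.mean()

# Rejection ABC: sample theta from the prior, simulate a dataset of the
# same size, and accept theta when the simulated summary is within
# tolerance of the observed summary.
n_draws, tol = 20_000, 0.1
theta = rng.uniform(-10, 10, size=n_draws)            # flat prior on the mean
sims = rng.normal(loc=theta[:, None], scale=1.0, size=(n_draws, 100))
accepted = theta[np.abs(sims.mean(axis=1) - obs_mean) < tol]

print(len(accepted), accepted.mean())  # accepted draws concentrate near 3
```

The accepted draws approximate the posterior without ever evaluating a Likelihood, which is exactly why ABC fits problems that "don't come packaged" with one.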

I spend most of my time developing methods which bridge quantitative and qualitative descriptions of data. For example, I am working on automated means of deriving shapelets from time series, using the aforementioned Procrustes Tangent Distance as a measure of closeness, something which is better and faster than either dynamic time warping or normalized compression distance. Once found, these shapelets are collected in libraries which characterize a series, and a classification scheme in the form of a decision tree can be used to classify series using such libraries.
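The shapelet-plus-decision-tree scheme can be sketched in miniature. Here the shapelet distance is plain Euclidean over sliding windows (a common simplification, not the Procrustes Tangent Distance described above), and a one-split "tree" separates series containing a bump shape from pure noise; all data are synthetic.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    equal-length sliding window of the series."""
    m = len(shapelet)
    return min(np.linalg.norm(series[i:i + m] - shapelet)
               for i in range(len(series) - m + 1))

rng = np.random.default_rng(7)
bump = np.exp(-0.5 * ((np.arange(20) - 10) / 3.0) ** 2)  # the shape to find

def make_series(with_bump):
    s = rng.normal(scale=0.1, size=100)
    if with_bump:
        i = rng.integers(0, 80)
        s[i:i + 20] += bump     # hide the bump at a random offset
    return s

# decision stump on shapelet distance: below the threshold -> "has the shape"
d_yes = shapelet_distance(make_series(True), bump)
d_no = shapelet_distance(make_series(False), bump)
print(round(d_yes, 2), round(d_no, 2))
```

A library of such shapelets, each paired with a learned threshold, yields the distance features on which a classification tree splits.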

I am also interested in means for devising sampling plans for surveys enhanced with pre-existing knowledge, such as spatial surveys for environmental data collection, or social surveys exploiting data available from Google Maps.
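One classical way of folding pre-existing knowledge into a sampling plan is Neyman allocation: if rough per-stratum variabilities are known in advance, sampling effort proportional to N_h·σ_h minimizes the variance of the overall estimate. A sketch with invented numbers:

```python
import numpy as np

# Neyman allocation: given stratum sizes N_h and prior guesses of the
# per-stratum standard deviations sigma_h, choose n_h proportional to
# N_h * sigma_h to minimize the variance of the stratified estimator.
N = np.array([5000, 3000, 2000])       # stratum sizes (hypothetical)
sigma = np.array([1.0, 4.0, 10.0])     # guessed stratum std devs (hypothetical)
n_total = 300

weights = N * sigma
n_h = np.round(n_total * weights / weights.sum()).astype(int)
print(n_h)  # the small but highly variable stratum gets the most samples
```

The same logic carries over to spatial surveys, where prior maps stand in for the guessed σ_h.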

You can learn more about me here, and from my LinkedIn profile. I am a member of the American Association for the Advancement of Science, the Ecological Society of America, the American Statistical Association (active in its Boston Chapter), the International Society for Bayesian Analysis, the International Society of Survey Statisticians, the TeX Users Group (TUG), and three organizations at Woods Hole Oceanographic Institution, the Associates, the 1930 Society, and its Fye Society.

I am also active in social and political activities relating to the environment, partly through the Green Congregation Committee at the Unitarian Universalist congregation to which I belong, First Parish in Needham, MA, and partly as a staunch advocate for distributed, locally owned solar energy.

## Blog Articles

“Warming slowdown? (part 1 of 2)” The idea of a global warming slowdown or hiatus is critically examined, emphasizing the literature, the datasets, and means and methods for telling such. Also available at the Azimuth Project wiki.

“Warming slowdown? (part 2 of 2)” The idea of a global warming slowdown or hiatus is critically examined, emphasizing the literature, the datasets, and means and methods for telling such.

Bayesian inversion of commingled tonnage of municipal solid waste to isolate components” Bayesian inversion to recover latent components in mixtures is a standard technique, with wide application. Yet, apparently, it is not well known. Frequentist methods for doing this are known as algorithms for blind source separation.
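The post above uses Bayesian inversion; as a minimal stand-in, here is the analogous linear unmixing by least squares: recover the weights of latent component streams from observed commingled totals, given each component's temporal signature. The signatures and tonnages below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
weeks = 52
t = np.arange(weeks)

# hypothetical component "signatures" over a year: a flat residual stream,
# a summer yard-waste bulge, and a year-end holiday spike
S = np.stack([np.ones(weeks),
              np.exp(-0.5 * ((t - 26) / 6.0) ** 2),
              np.exp(-0.5 * ((t - 50) / 2.0) ** 2)], axis=1)
true_w = np.array([100.0, 40.0, 25.0])              # tons/week per component
y = S @ true_w + rng.normal(scale=2.0, size=weeks)  # observed commingled totals

# least-squares recovery of the latent component weights
w_hat, *_ = np.linalg.lstsq(S, y, rcond=None)
print(w_hat.round(1))
```

A Bayesian treatment replaces the point estimate with a posterior over the weights, which also quantifies how strongly the signatures identify each component.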

Unrelated to Azimuth, I also write a technical and, occasionally, political blog which records developments in renewable energy, offers occasional statistical and computational illustration and guidance, comments on Climate Science and other sciences of interest to me, e.g., Quantitative Population Biology, and sometimes dives deep into subjects far from any of these, e.g., gun control or Geology.

## Areas of Interest

• Devising less people-hour-intensive methods for achieving scientific results comparable in quality to those of existing techniques, whether by using ensembles of simpler agents, swarm intelligence, machine learning, or data mining.
• Statistical applications in quantitative ecology and population biology, particularly examining dynamics of species invasions, and diffusion throughout an ecosystem.
• Applying modern computational and statistical techniques to engineering problems in which people seem mired in a late 20th-century way of thinking and calculating, at least according to their published technical literature.
• Applications of Generalized Linear Mixed Models using Markov Chain Monte Carlo methods. (See also this Azimuth introduction.)
• Insights regarding inland hydrology and the implications of enhanced burst rainfall due to climate change for inland flooding. This is not as Pollyannish as it might seem at first. Putting aside projections of climate change, NOAA projections of inland rainfall have crept steadily upwards over the last few decades. In my home town of Westwood, MA, for example, the probability per annum of 8 inches of rain in 24 hours is now 2%, which I found shocking. Yet our local conservation commission and planners still use tables of rainfall projections which date from the 1970s. I heard a talk from someone from NOAA in 2017 where she expressed her frustration that, for the most part, these updates were being ignored by local administrators, policymakers, and planners.
• Statistical support to citizen science efforts, after Kosmala, Wiggins, Swanson, and Simmons.
• Keeping up with R package ecosystem.
• Uses of cumulants and the Cornish-Fisher expansion to estimate quantiles of empirical distributions, including extensions for multivariate distributions on non-exchangeable random variables and other interdependencies.
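The rainfall numbers in the flooding bullet above compound quickly: a 2% annual exceedance probability (a “50-year storm”) is anything but rare over a typical planning horizon, assuming independent years.

```python
# probability of at least one 2%-per-annum event over a 30-year horizon,
# assuming years are independent
p_annual, years = 0.02, 30
p_at_least_once = 1 - (1 - p_annual) ** years
print(round(p_at_least_once, 3))  # -> 0.455, nearly a coin flip
```

That is the arithmetic local planners miss when they keep using 1970s-era rainfall tables.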
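A minimal sketch of the Cornish-Fisher idea in the last bullet: correct a Gaussian quantile for skewness estimated from the sample. This keeps only the univariate skewness term of the expansion, and the test distribution is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.gamma(shape=20.0, size=200_000)   # a mildly right-skewed sample

mu, sd = x.mean(), x.std()
skew = ((x - mu) ** 3).mean() / sd ** 3   # third standardized cumulant

# Cornish-Fisher (skewness term only): adjust the Gaussian quantile
z = 1.6449                                # standard normal 95% point
w = z + (z ** 2 - 1) * skew / 6
q_cf = mu + sd * w

q_emp = np.quantile(x, 0.95)
print(round(q_cf, 2), round(q_emp, 2))    # expansion vs. empirical quantile
```

Higher-order terms bring in kurtosis; the multivariate and non-exchangeable extensions mentioned above require the joint cumulants as well.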

## Projects

• Applying the shapelet techniques and symmetrized normalized compressed divergence mentioned above to develop insights regarding time series, particularly those derived from hydrological observations and from series of energy consumption, especially electrical energy consumption of households.
• Reframing standard problems as small n, large p problems, and using techniques of compressed sensing and random projections to gain new insights.
• Inferring latent causes of shortages in the drinking water supply of the town of Sharon, MA, and, more generally, including predictive inference.
• Developing techniques which facilitate interpretation of data gathered by volunteers in the field and natural settings censored by seasonal availability and varying quality.
• Studying ecosystem relationships of the much maligned Alliaria petiolata (“Garlic mustard”) considering the insights of Professor Peter Del Tredici and colleagues. See here for an incomplete overview.
• Supporting local towns in their development of climate and weather resilience plans by providing them with contacts, sources of data and maps, and doing pro bono statistical analyses.
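A sketch of the small n, large p reframing in the projects above: with a sparse signal, far fewer measurements than unknowns suffice for recovery. This uses ISTA (iterative soft thresholding) for ℓ1-penalized least squares; the sizes, support, and penalty are illustrative, not from any particular project dataset.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 200                       # far fewer observations than predictors
A = rng.normal(size=(n, p)) / np.sqrt(n)
x_true = np.zeros(p)
x_true[[10, 50, 120]] = [3.0, -2.0, 4.0]   # sparse ground truth
y = A @ x_true                              # noiseless measurements

# ISTA: proximal gradient descent on the l1-penalized least squares (lasso)
lam = 0.01
step = 0.9 / np.linalg.norm(A, 2) ** 2
x = np.zeros(p)
for _ in range(5000):
    x = x - step * (A.T @ (A @ x - y))                         # gradient step
    x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)   # soft threshold

support = np.flatnonzero(np.abs(x) > 0.5)
print(support, x[support].round(2))  # the three planted coefficients recovered
```

This is the compressed-sensing phenomenon: random measurement directions plus sparsity turn an underdetermined system into a solvable one.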