The Azimuth Project
Blog - Your model is verified, but not valid! Huh?

This page is a blog article in progress, written by Tim van Beek. To see discussion of this article while it was being written, visit the Azimuth Forum. For the final polished version, go to the Azimuth Blog.

Introduction

Among the prominent tools in climate science are complicated computer models. If you are interested in both climate science and computer science, you should check out this blog:

• Steve Easterbrook, Serendipity, or What has Software Engineering got to do with Climate Change?

After reading Easterbrook's blog post about "climate model validation", and some discussions of this topic elsewhere, I noticed that there is some “computer terminology” floating around that disguises itself as plain English! This has led to some confusion, so I’d like to explain some of it here.

Technobabble: The Quest of Cooperation

Climate change may be the first problem in the history of humankind that has to be tackled on a global scale, by people all over the world working together. Of course, a prerequisite of working together is a mutual understanding and a mutual language. Unfortunately every single one of the many professions that scientists and engineers engage in have created their own dialect. And most experts are proud of it!

When I read about the confusion that “validation” versus “verification” of climate models has caused, I was reminded of the phrase “technobabble”, which screenwriters for the TV series “Star Trek” used whenever they had to write a dialog involving the engineers on the Starship Enterprise. Something like this:

“Captain, we have to send an inverse tachyon beam through the main deflector dish!”

“Ok, make it so!”

Fortunately, neither Captain Picard nor the audience had to understand what was really going on.

It’s a bit different in the real world, where not everyone may have the luxury of staying on the sidelines while the trustworthy crew members in the Enterprise’s engine room solve all the problems. We can start today by explaining some software engineering technobabble that came up in the context of climate models. But why would software engineers bother in the first place?

Short Review of Climate Models

Climate models come in a hierarchy of complexity. The simplest ones only try to simulate the energy balance of the planet earth. These are called energy balance models. They don’t take into account the spherical shape of the earth, for example.

At the opposite extreme, the most complex ones try to simulate the material and heat flow of the atmosphere and the oceans on a topographical model of the spinning earth. These are called general circulation models, or GCMs for short. GCMs have a lot of code, sometimes more than a million lines of code.

A line of code is basically one instruction for the computer to carry out, like:

add 1, 1/2, 1/6 and store the result in a variable called e

print e on the console

In order to understand what a computer program does, in theory, one has to memorize every single line of code and understand it. And most programs use a lot of other programs, so, in theory, one would have to understand those, too. This is of course not possible for a single person!

We hope that taking into account a lot of effects, which results in a lot of lines of code, makes the models more accurate. But it certainly means that they are complex enough to be interesting for software engineers.

In the case of software that is used to run an internet shop, a million lines of code isn’t much. But it is already too big for one single person to handle. Basically, this is where all the problems start, that software engineering seeks to solve.

When more than one person works on a software project things often get complicated.

(From the manual of CVS, the “Concurrent Versions System”.)

Software Design Circle

The job of software engineer is in some terms similar to the work of an architect. The differences are mainly due to the abstract nature of software. Everybody can see if a building is finished or if it isn’t, but that’s not possible with software. Nevertheless every software project does come to an end, and people have to decide whether or not the product, the software, is finished and does what it should. But since software is so abstract, people have come up with special ideas about how the software “production process” should work and how it should be determined if the software is correct. I would like to explain these a little bit further.

Stakeholders and Shifts in Stakeholder Analysis

There are many different people working in an office building with different interests: cleaning squads, janitors, plant security, and so on. When you design a new office building, you need to identify and take into account all the different interests of all these groups. Most software projects are similar, and the process just mentioned is usually called stakeholder analysis.

Of course, if you take into account only the groups already mentioned, you’ll build an office building without any offices, because that would obviously be the simplest one to monitor and to keep working. Such an office building wouldn’t make much sense, of course! This is because we made a fatal mistake with our stakeholder analysis: we failed to take into account the most important stakeholders, the people who will actually use the offices. These are the key stakeholders of the office building project.

After all, the primary purpose of an office building is to provide offices. And in the end, if we have an office building without offices, we’ll notice that no one will pay us for our efforts.

Gathering Requirements

While it may be obvious what most people want from an office building, the situation is usually much more abstract, hence much more complicated, for software projects.

This is why software people carry out a requirement analysis, where they ask the stakeholders what they would like the software to do. A requirement for an office building might be, for example, “we need a railway station nearby, because most of the people who will work in the building don’t have cars.” A requirement for a software project might be, for example, “we need the system to send email notifications to our clients on a specific schedule”.

In an ideal world, the requirement analysis would result in a document —usually called something like a system specification—that contains both the requirements, and also descriptions of the test cases that are needed to test whether the finished system meets the requirements. For example:

• “Employee A lives in an apartment 29km away from the office building and does not have a car. She gets to work within 30 minutes by using public transportation.”

Verification and Validation?

When we have finished the office building (or the software system), we’ll have to do some acceptance testing, in order to convince our customer that she should pay us (or simply to use the system, if it is for free). When you buy a car, your “acceptance test” is driving away with it - if that does not work, you know that there is something wrong with your car! But for complicated software - or office buildings - we need to agree on what we do to test if the system is finished. That’s what we need the test cases for.

If we are lucky, the relevant test cases will already be described in the system specification, as noted above. But that is not the whole story.

Every scientific community that has its own identity invents a new specific language, often borrowing words from everyday language and defining new, surprising, specific meanings for them. Software engineers are no different. There are, for example, two very different aspects to testing a system:

  • Did we do everything according to the system specification?

and:

  • Now that the system is there, and our key stakeholders can see it for themselves, did we get the system specification right: is our product useful to them?

The first is called verification, the second validation. As you can see, software engineers took two almost synonymous words from everyday language and gave them quite different meanings!

For example, if you wrote in the specification for an online book seller:

  • “we calculate the book price by multiplying the ISBN number by pi”

and the final software system does just that, then the system is verified. But if the book seller would like to stay in business, I bet that he won’t say the system has been validated.

Stakeholders of Climate Models

So, for business applications, it’s not quite right to ask “is the software correct?” The really important question is: “is the software as useful for the key stakeholders as it should be?”

But in Mathematics Everything is Either True or False!

One may wonder if this “true versus useful” stuff above makes any sense when we think about a piece of software that calculates, for example, a known mathematical function like a “modified Bessel function of the first kind”. After all, it is defined precisely in mathematics what these functions look like.

If we are talking about creating a program that can evaluate these functions, there are a lot of technical choices that need to be specified. Here is a random example (if you don’t understand it, don’t bother, that is not necessary to get the point):

  • Current computers know data types with a finite value range and finite precision only, so we need to agree on which such data type we want as a model of the real or complex numbers. For example, we might want to use the “double precision floating-point format”, which is an international standard.

Another aspect is, for example, “how long may the function take to return a value?” This is an example of a non-functional requirement (see Wikipedia). These requirements will play a role in the implementation too, of course.

However, apart from these technical choices, there is no ambiguity as to what the function should do, so there is no need to distinguish verification and validation. Thank god that mathematics is eternal! A Bessel function will always be the same, for all of eternity.

Unfortunately, this is no longer true when a computer program computes something that we would like to compare to the real world. Like, for example, a weather forecast. In this case the computer model will, like all models, include some aspects of the real world. Or rather, some specific implementations of a mathematical model of a part of the real world.

Verification will still be the same, if we understand it to be the stage where we test to see if the single pieces of the program compute what they are supposed to. The parts of the program that do things that can be defined in a mathematically precise way. But validation will be a whole different step if understood in the sense of “is the model useful?”

But Everybody Knows What Weather is!

But still, does this apply to climate models at all? I mean, everybody knows what “climate” is, and “climate models” should simulate just that, right?

As it turns out, it is not so easy, because climate models serve very different purposes:

  • Climate scientists want to test their understanding of basic climate processes, just as physicists calculate a lot of solutions to their favorite theories to gain a better understanding of what these theories can and do model.

  • Climate models are also used to analyse observational data, to supplement such data and/or to correct them. Climate models have had success in detecting misconfiguration and biases in observational instruments.

  • Finally, climate models are also used for global and/or local predictions of climate change.

The question “is my climate model right?” therefore translates to the question “is my climate model useful?” This question has to refer to a specific use of the model, or rather: to the viewpoint of the key stakeholders.

The Shift of Stakeholders

One problem of the discussions of the past seems to be due to a shift of the key stakeholders, for example: Some climate models have been developed as a tool for climate scientists to play around with certain aspects of the climate. When the scientists published papers, including insights gained from these models, they usually did not publish anything about the implementation details. Mostly, they did not publish anything about the model at all.

This is nothing unusual. After all, a physicist or mathematician will routinely publish her results and conclusions - maybe with proofs. But she is not required to publish every single thought she had to think to produce her results.

But after the results of climate science became a topic in international politics, a change of the key stakeholders occurred: A lot of people outside the climate science community developed an interest in the models. This is a good thing. There is a legitimate need of researchers to limit participation in the review process, of course. But when the results of a scientific community become the basis of far reaching political decisions, there is a legitimate public interest in the details of the ongoing research process, too. The problem in this case is that the requirements of the new key stakeholders, like interested software engineers outside the climate research community, are quite different from the requirements of the former key stakeholders, climate scientists.

For example, if you write a program for your own eyes only, there is hardly any need to write a detailed documentation of it. If you write it for others to understand it, as rule of thumb, you’ll have to produce at least as much documentation as code.

Back to the Start: Farms, Fields and Forests

As an example of a rather prominent critic of climate models, let’s quote the physicist Freeman Dyson:

The models solve the equations of fluid dynamics and do a very good job of describing the fluid motions of the atmosphere and the oceans.

They do a very poor job of describing the clouds, the dust, the chemistry and the biology of fields, farms and forests. They are full of fudge factors so the models more or less agree with the observed data. But there is no reason to believe the same fudge factors would give the right behaviour in a world with different chemistry, for example in a world with increased CO2.

Let’s assume that Dyson is talking here about GCMs, with all their parametrizations of unresolved processes (which he calls “fudge factors”). Then the first question that comes to my mind is “why would a climate model need to describe fields, farms and forests in more detail?”

I’m quite sure that the answer will depend on what aspects of the climate the model should represent, in what regions and over what timescale.

And that certainly depends on the answer to the question “what will we use our model for?” Dyson seems to assume that the answer to this question is obvious, but I don’t think that this is true. So, maybe we should start with “stakeholder analysis” first.

category: blog