# Invasion fitness in moment-closure treatments


This page is a blog article in progress, written by Blake Stacey. To discuss as it is being written, go to the Azimuth Forum.

### Idea

Moment closures are a way of forgetting information about a system in a controlled fashion, in the hope that an incomplete, fairly heavily “coarse-grained” picture of the system will still be useful in figuring out what will happen to it. Sometimes, this is a justifiable hope, but in other cases, we are right to wonder whether all the algebra it generates actually leads us to any insights. Here, we’ll be concerned with a particular application of this technology: studying the vulnerability of an ecosystem to invasion. We shall find expressions for invasion fitness, the expected relative growth rate of an initially-rare species or variety.

### Chatty Presentation

Consider a lattice, each site of which can be occupied by an individual of “resident” type ($R$), occupied by a mutant ($M$), or empty ($0$). The difference between the two is encoded in the choice of transition rules representing death, birth and migration. We can get an aggregate measure of the situation by finding the probability that a randomly chosen site will be in state $a$, where $a$ can take values in the set $\{R, M, 0\}$. A finer degree of distinction is provided by the conditional probabilities $q_{a|b}$, where, for example, $q_{R|M}$ denotes the probability that a randomly chosen neighbor site to a randomly chosen mutant is of resident type. Note that if a mutant is injected into a native resident population and its offspring form a geographical cluster, $q_{M|M}$ can be much larger than $p_M$: few individuals are mutants overall, but the probability of a mutant life-form interacting with another mutant is high.
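This distinction between local and global densities is easy to see numerically. Here is a quick illustration (the lattice and cluster sizes are arbitrary choices of ours, not taken from any model above):

```python
import numpy as np

# 1-D periodic lattice of 1000 sites: residents everywhere (1),
# except for one contiguous cluster of 20 mutants (2).
lattice = np.ones(1000, dtype=int)
lattice[500:520] = 2

p_M = np.mean(lattice == 2)   # global mutant density: 0.02

# q_{M|M}: probability that a neighbour of a mutant is itself a mutant.
mutants = np.where(lattice == 2)[0]
neighbours = np.concatenate([lattice[mutants - 1],
                             lattice[(mutants + 1) % lattice.size]])
q_MM = np.mean(neighbours == 2)

print(p_M, q_MM)   # 0.02 vs 0.95: mutants are rare but clustered
```

Only two per cent of sites hold mutants, yet ninety-five per cent of the neighbours of mutants are mutants.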

The pair dynamics of the system involves the time evolution of the probabilities $p_{a b}$, that is, the probability that a randomly selected lattice edge will have $a$ on one end and $b$ on the other. The differential equation for $d p_{R M}/d t$, for example, will have terms reflecting the processes which can form and destroy $R M$ pairs: $R M\rightarrow R R$ is one possibility, and $R M\rightarrow M M$ is another. Death, which comes for organisms and leaves empty spaces behind, introduces processes like $R M \rightarrow R 0$, $R M \rightarrow 0 M$ and $R M \rightarrow 0 0$. Reproduction can lead to formerly empty spaces becoming occupied: $R 0 \rightarrow R R$ and $M 0 \rightarrow M M$. In language closer to that of the Azimuth discussions of stochastic Petri net theory, we’ve moved beyond just creating and annihilating residents and mutants, and now we’re dynamically changing the number of “resident–resident” and “resident–mutant” pairs. So, we probably could make a Petri net diagram for these processes, but the labels on the boxes might start looking a little funny.

Each term in our differential equations will have a transition rate dependent upon a conditional probability of the form $q_{a|b c}$, denoting the probability that a $b$ of a $b c$ pair will have a neighbor of type $a$. The differential equations for the pair probabilities $p_{a b}$ thus depend on triplet probabilities $p_{a b c}$, which depend upon quadruplet probabilities and so forth. To make progress, we truncate this hierarchy, in a manner reminiscent of cutting off the BBGKY hierarchy of higher-order correlations in statistical mechanics: we impose the pair approximation that

$q_{a|b c} \approx q_{a|b}.$

This imposition, a pair approximation, destroys information about spatial structure and thereby introduces bias which in an ideal world ought to be accounted for. In theoretical ecology, this maneuver dates back at least to Matsuda et al. (1992), though it has antecedents in statistical physics, going back to the kinetic theory work of Bogoliubov, Born, Green, Kirkwood and Yvon, for whom the “BBGKY hierarchy” is named.

Invasion fitness is judged in the following manner. We start with a lattice devoid of mutants ($p_{M a} = 0$) and find the equilibrium densities $p_{R R}^*$ and $p_{R 0}^*$ by setting

$\frac{d p_{R0}}{d t} = \frac{d p_{R R}}{d t} = 0.$

The exact form of $p_{R R}^*$ and $p_{R0}^*$ will depend upon interaction details which we won’t worry about just yet. We then inject a mutant strain into this situation; as the mutants are initially rare, we can say they do not affect the large-scale dynamics of the resident population. Summarizing the pair probabilities $p_{M a}$ with the shorthand $\underline{p}$, we write the differential equation in matrix form

$\frac{d\underline{p}}{d t} = T(q_{a|b c})\underline{p},$

where the matrix $T(q_{a|b c})$ encapsulates the details of our chosen dynamics. The pair approximation, in which we discard correlations of third and higher order, lets us simplify this to

$\frac{d\underline{p}}{d t} = T(q_{a|b})\underline{p}.$

When people started doing simulations of lattice models like these, they found that the conditional probabilities $q_{a|M}$ equilibrate. That is to say, even if the global density of mutants $p_M$ changes, the local statistical structure of a mutant cluster remains constant. This is the key statement which allows us to linearize the dynamics and write the behavior of $\underline{p}$ in terms of eigenvectors and eigenvalues:

$\underline{p}(t) = C\underline{e}_A \exp(\lambda t).$

The dominant eigenvalue $\lambda$ of $T$ is the “invasion exponent” which characterizes whether an invasion will fail ($\lambda \lt 0$) or succeed ($\lambda \gt 0$). The eigenvector $\underline{e}_A$ associated with $\lambda$ describes the vehicle of selection for the mutants’ particular genetic variation, by summarizing the structure of their geographical cluster.
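In code, the test amounts to checking the sign of the leading eigenvalue. A minimal sketch (the matrix entries below are made-up numbers, not derived from any particular lattice dynamics):

```python
import numpy as np

def invasion_succeeds(T):
    """Return True when the dominant eigenvalue of the time-evolution
    matrix T has positive real part, i.e. when the rare-mutant pair
    densities grow exponentially."""
    lam = max(np.linalg.eigvals(T), key=lambda ev: ev.real)
    return bool(lam.real > 0)

# Toy 2x2 time-evolution matrix: one positive eigenvalue, so the
# mutant cluster grows and the invasion succeeds.
T = np.array([[-0.5, 0.3],
              [ 0.4, -0.2]])
print(invasion_succeeds(T))   # True
```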

All of this, of course, is only as good as our linearization! If something interesting happens further away from the fixed point, just looking at the eigenvalues we got from our matrix $T$ won’t tell us about it. In addition, if the actual dynamics of the system tend to form patterns which can’t be represented very well by pairwise correlations, then pair approximation will run into a wall. Minus van Baalen puts the issue in the following way:

The extent to which pair-dynamics models are satisfactory depends on the goal of the modeler. As we have seen, these models do not capture all of the phenomena that can be observed in simulations of fully spatial probabilistic cellular automata. Basically, the approximation fails whenever spatial structures arise that are difficult to “describe” using pairs alone. More technically, the method fails whenever significant higher-order correlations arise – that is, whenever the frequency of particular triplets (or triangles, squares, or all sorts of star-like configurations) starts to diverge from what one would expect on the basis of pair densities. Thus, pair-dynamics models satisfactorily describe probabilistic cellular automata in which only “small-scale” patterns arise. Larger, “meso-scale” patterns such as spirals are difficult to capture using this method.

—in Dieckmann et al. (2000), chapter 19.

### Example 1: Birth, Death, Movement

We follow van Baalen (in Dieckmann et al. (2000), chapter 19).

(Petri net pictures will go here.)

We write $z$ for the coordination number of the lattice. Birth:

$R 0 \rightarrow R R,$

with rate $b/z$.

Death:

$R a \rightarrow 0 a,$

with rate $d$; death does not depend on the state of the neighboring site, so no factor of $1/z$ appears.

Movement or migration:

$R 0 \rightarrow 0 R,$

with rate $m/z$.

Putting these processes together and bookkeeping how each one creates and destroys pairs, we obtain the equations of motion for the pair probabilities:

$\begin{array}{rcl} \frac{d p_{R 0}}{d t} & = & -p_{R 0}[b/z + d + (1 - 1/z)q_{0|R0}m + (1 - 1/z)q_{R|0R}(b + m)] \\ & & + p_{00} (1 - 1/z) q_{R|00} (b + m) \\ & & + p_{R R} [d + (1 - 1/z) q_{0|R R} m]. \end{array}$
$\frac{d p_{0 0}}{d t} = -p_{00} 2(1 - 1/z) q_{R|00} (b + m) + p_{R 0} 2 [d + (1 - 1/z) q_{0|R 0} m].$
$\frac{d p_{R R}}{d t} = p_{R 0} 2[b/z + (1 - 1/z)q_{R|0 R} (b + m)] - p_{R R} 2[d + (1 - 1/z) q_{0|R R} m].$
Site probabilities are recovered from pair probabilities by marginalizing,

$p_i = \sum_j p_{i j},$

and summing the pair equations yields the overall dynamics of the resident density:

$\frac{d p_R}{d t} = (b q_{0|R} - d) p_R.$
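To see what these equations actually do, here is a minimal numerical sketch. We close the system with $q_{a|b c} \approx q_{a|b} = p_{a b}/p_b$ and integrate by forward Euler; the rate values are illustrative choices of ours:

```python
import numpy as np

def pair_dynamics(p, b, d, m, z):
    """Right-hand side of the pair equations for birth, death and
    movement, closed with the pair approximation q_{a|bc} = q_{a|b}."""
    p_R0, p_00, p_RR = p
    phi = 1.0 - 1.0 / z
    p_R = p_RR + p_R0            # site densities from pair densities
    p_0 = p_00 + p_R0
    q_0R = p_R0 / p_R            # q_{0|R}
    q_R0 = p_R0 / p_0            # q_{R|0}
    dp_R0 = (-p_R0 * (b / z + d + phi * q_0R * m + phi * q_R0 * (b + m))
             + p_00 * phi * q_R0 * (b + m)
             + p_RR * (d + phi * q_0R * m))
    dp_00 = -2 * p_00 * phi * q_R0 * (b + m) + 2 * p_R0 * (d + phi * q_0R * m)
    dp_RR = 2 * p_R0 * (b / z + phi * q_R0 * (b + m)) - 2 * p_RR * (d + phi * q_0R * m)
    return np.array([dp_R0, dp_00, dp_RR])

b, d, m, z = 2.0, 1.0, 1.0, 4      # illustrative rates
p = np.array([0.25, 0.25, 0.25])   # p_00 + 2 p_R0 + p_RR = 1
for _ in range(20000):             # forward Euler to t = 20
    p = p + 0.001 * pair_dynamics(p, b, d, m, z)

p_R0, p_00, p_RR = p
print("equilibrium p_R:", p_RR + p_R0)
print("normalization:  ", p_00 + 2 * p_R0 + p_RR)   # conserved at 1
```

The normalization $p_{00} + 2 p_{R 0} + p_{R R} = 1$ is conserved by the dynamics, which is a useful sanity check on the signs in the equations.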

If we ignore spatial structure altogether, we can say that

$q_{0|R} = p_0,$

which by normalization of probability means

$q_{0|R} = 1 - p_R.$

So,

$\frac{d p_R}{d t} = (b(1 - p_R) - d) p_R.$

This should look familiar: it’s a logistic equation for population growth, with growth rate $b - d$ and equilibrium population $1 - d/b$.
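A few lines of numerics confirm this (the rates $b$ and $d$ here are arbitrary picks):

```python
# Mean-field dynamics: dp_R/dt = (b (1 - p_R) - d) p_R.
b, d = 2.0, 0.5
p_R = 0.01                 # start from a rare resident population
dt = 0.01
for _ in range(5000):      # forward Euler to t = 50
    p_R += dt * (b * (1 - p_R) - d) * p_R

print(p_R, 1 - d / b)      # both approximately 0.75
```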

It’s worth pausing a moment here and using this result to touch on a more general concern. Often, a logistic-growth model is presented with the growth rate and the equilibrium population size as its parameters. When we see the model in that form, we naturally start thinking of those parameters as independently variable quantities. We imagine that a mutation or a change in the environmental conditions could change one without affecting the other. However, if the growth rate and the equilibrium population size are both functions of other parameters taken together, then the changes which are biologically reasonable to consider will likely affect both of them. To understand which quantities we should treat as independent, we need to spend time looking at how the numbers which apply to population-scale phenomena arise from the smaller-scale physiological and ecological goings-on (Fox 2011).

### Example 2: Epidemic in an Adaptive Network

(This might spin off into its own page later, depending on whether I decide to emphasize more the idea of invasion fitness or the general relation between stochastic Petri nets and moment closures. Saving it here so I don’t forget about it.)

We can juice things up a little by considering another example, one which is topical for these jittery times. Let’s look at the spread of an epidemic. There’s a classic genre of models for this, which we can call after a prototypical representative, the Susceptible–Infected–Recovered or SIR model. In the SIR model, we imagine a population of individuals through which a disease can spread. Each individual is either Susceptible to the disease, Infected with it or Recovered from it. The Petri net for an SIR model would look like this:

We can read off the possible transitions from the diagram. Contact with an Infected individual can turn a Susceptible one into an Infected, so an $S$ plus an $I$ becomes an $I$ plus an $I$. If the disease runs its course in an individual, they gain the status of $R$ and are thereafter immune to further infection. (Perfect immunity is, mathematically, the same as death, but we’ll be optimistic with our labels today.) We can complicate the model in many ways, for example by making the immune response imperfect, so that individuals who have recovered can be re-infected later. This could happen by the immunity fading over time, so that $R$ individuals transition back to $S$, or the immunity might only be partial, so that we have transitions from $R$ directly back to $I$. We can also add population structure: interesting things happen when the individuals are not all in direct contact with one another. This complication is obviously something we’ll have to address if we want to do epidemiology with real-world diseases! Speaking from a more mathematical perspective, we find neat phase-transition effects when we put these epidemic models on a lattice; see Stacey et al. (2011) and references therein.

The complicating factor we’ll consider today is the following: the spread of a disease through a social network can itself change the way people contact each other. This makes epidemiology a candidate subject for the study of adaptive networks: graph-structured systems in which the states associated with the vertices and the topology of the edges can change on the same timescales, feeding back on one another.

To the SIR problem we described earlier, we add two extra wrinkles: first, the individuals are arranged in a network, and infection spreads only along the links within that network. Second, if a susceptible individual is in contact with an infected one, that link can be broken, and a new link established to another susceptible individual, which we pick at random from the pool of eligible susceptibles (that is, those who aren’t already neighbours — we are disallowing multiple links). This rewiring rule makes this scenario an adaptive-network problem.

For the moment, let’s say that these organisms gain no lasting immunity: an infected individual, upon recovery, goes straight back into the susceptible class, so we have only $S$ and $I$ in the population at any time. Therefore, the probability that an organism chosen at random has status $S$ and the probability that an organism chosen at random has status $I$ sum to 1:

$p_S + p_I = 1.$
Similarly, counting pairs per node, the pair densities satisfy

$p_{S S} + p_{S I} + p_{I I} = \langle k \rangle,$

where $\langle k \rangle$ is the average degree of the nodes in the network.

The number of infected individuals decreases as organisms recover from the disease, while it increases as the contagion spreads over links from those already infected. We say how quickly recovery happens using the parameter $r$, and we encode the ease with which the disease travels along network links with the transmissibility $\tau$. With these definitions, we can write the following rate equation for the density of infected individuals, $p_I$:

$\frac{d}{d t} p_I = \tau p_{S I} - r p_{I}.$

In the original SIR example, where everyone was in contact with everyone else, we could say $p_{S I} \approx \langle k \rangle p_S p_I$. But we ought to be wary of using this approximation here, for two reasons. First, any time we have a system which has a chance of developing heterogeneity, of forming lumps in one region which don’t directly affect lumps in another, then averaging over the whole system becomes a risky business. Second, more specifically, this approximation can’t capture rewiring. Links are breaking and re-forming all the time in our model, but the product $p_S p_I$ stays the same when we move links around. So, to write an equation for how $p_I$ changes, we need to write down how the density of $S I$ pairs will change, but in order to do that, we have to include the rewiring effect.

Let’s introduce a third parameter, $w$, to indicate how much rewiring is going on. The density of $S S$ pairs will go down as the disease spreads to them from infected nodes, but it will go up as nodes recover and as susceptible nodes rewire their links.

$\frac{d}{d t} p_{S S} = (r + w) p_{S I} - \tau p_{S S I}.$
$\frac{d}{d t} p_{I I} = \tau(p_{S I} + p_{I S I}) - 2r p_{I I}.$

We now enforce a pair approximation, by writing three-way probabilities in terms of lower-order ones:

$p_{I S I} = \frac{\langle q \rangle}{\langle k \rangle} \cdot \frac{p_{S I}^2}{p_S}.$

The “mean excess degree” $\langle q \rangle$ is the number of additional links we expect to find after we follow a random link. It turns out (Do and Gross 2009) that $\langle q \rangle = \langle k \rangle$ is a reasonable approximation.
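The closure is compact enough to package as a helper function; the name and the default $\langle q \rangle / \langle k \rangle = 1$ are our own choices:

```python
def p_ISI(p_SI, p_S, q_over_k=1.0):
    """Pair-approximation closure for the density of I-S-I triples:
    p_ISI = (<q>/<k>) * p_SI**2 / p_S, taking <q>/<k> = 1 by default."""
    return q_over_k * p_SI ** 2 / p_S

print(p_ISI(0.3, 0.6))   # approximately 0.15
```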

Our dynamical system is defined by three equations. First, we have the rule we wrote before,

$\frac{d}{d t} p_I = \tau p_{S I} - r p_I,$

and then the rules we deduced using pair approximation:

$\frac{d}{d t} p_{S S} = (r + w)p_{S I} - 2 \tau p_{S I} \frac{p_{S S}}{p_S},$

and

$\frac{d}{d t} p_{I I} = \tau p_{S I} \left(1 + \frac{p_{S I}}{p_S}\right) - 2r p_{I I}.$

Having written these equations, we can compare the behaviour of the dynamical system they define to that of a simulation of the original model. They actually agree pretty well (Gross et al. 2006; Do and Gross 2009).
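Here is a minimal sketch of the deterministic side of such a comparison: forward-Euler integration of the three closed equations, with parameter values that are illustrative choices of ours rather than numbers from the cited papers:

```python
# tau: transmissibility, r: recovery rate, w: rewiring rate, k: mean degree
tau, r, w, k = 0.004, 0.02, 0.01, 10.0
p_I = 0.01                              # one per cent initially infected
p_SS = k * 0.99 ** 2                    # well-mixed initial pair densities
p_II = k * 0.01 ** 2
dt = 0.1
for _ in range(200000):
    p_S = 1.0 - p_I
    p_SI = k - p_SS - p_II              # normalization fixes p_SI
    dp_I = tau * p_SI - r * p_I
    dp_SS = (r + w) * p_SI - 2 * tau * p_SI * p_SS / p_S
    dp_II = tau * p_SI * (1 + p_SI / p_S) - 2 * r * p_II
    p_I += dt * dp_I
    p_SS += dt * dp_SS
    p_II += dt * dp_II

print("endemic infected fraction:", round(p_I, 3))
```

With these rates the system settles onto an endemic state with roughly three-quarters of the nodes infected; raising the rewiring rate $w$ raises the epidemic threshold, which is the adaptive-network effect the simulations display.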

The study of dynamical processes on and of graph structures brings together the subjects of “network theory”, as the term has been used around Azimuth, and “complex networks”, to invoke a term which has been trending elsewhere.

### Example 3: Evolution of Altruism

One application of this machinery has been to understand how evolution can produce altruistic behaviour. A behaviouristic definition of “altruism” would go something like, “Acting to increase the reproductive success of another individual at the expense of one’s own”. This can be written out using game theory; one defines a parameter $b$ to stand for benefit, $c$ to stand for cost and a payoff function which depends on $b$, $c$ and the strategy employed by an organism.

In the reproduction process, an organism spawns an offspring into an adjacent empty lattice site. This turns a pair of type $S 0$ into a pair of type $S S$. At what rate should the transition $S 0 \rightarrow S S$ occur? If we presume some baseline reproductive rate, call it $b_0$, then the presence of altruistic neighbours should augment that rate. We’ll say that if the number of nearby altruists is $n_A$, then selfish individuals will reproduce at a rate $b_0 + B n_A / z$, where $z$ is again the coordination number and the parameter $B$ specifies how helpful altruists are. The reproduction process for altruists, which we can write $A 0 \rightarrow A A$, occurs at a rate $b_0 + B n_A / z - C$. Here, the parameter $C$ is the cost of altruism: it’s how much an altruist gives up to help others.

In the differential equation for $d p_{S 0}/d t$, the $S 0 \rightarrow S S$ transition contributes a term proportional to the density of $S 0$ pairs:

$-[(1 - \phi) b_S + (b_S + m) \phi q_{S|0 S}] p_{S 0},$

where we have written $\phi$ for $1 - 1/z$, to save a little ink. All things told, the rate of change of $p_{S 0}$ is given by

$\begin{array}{rcl} \frac{d p_{S 0}}{d t} &= &(b_S + m) \phi q_{S|0 0} p_{0 0}\\ & & - [d_S + m \phi q_{0|S 0}] p_{S 0}\\ & & - [(1 - \phi) b_S + (b_S + m) \phi q_{S|0 S}] p_{S 0}\\ & & + [d_S + m \phi q_{0|S S}] p_{S S}\\ & & - (b_A + m) \phi q_{A|0 S} p_{S 0}\\ & & + [d_A + m \phi q_{0|A S}] p_{S A}. \end{array}$

Yuck! After writing a few equations like that, it’s easy to wonder if maybe we should look for new mathematical ideas which could help us better organise our thinking.

The next steps follow the general plan we laid out above. We write differential equations for the pairwise probabilities $p_{a b}$, which depend on triplet quantities $q_{a|b c}$. Then, we impose a pair approximation, declaring that $q_{a|b c} = q_{a|b}$, which gives us a closed system of equations. Next, we find the fixed point with $p_A = 0$, and we perturb around that fixed point to see what happens when a strain of altruists is introduced into a selfish population. The dominant eigenvalue $\lambda$ of the time-evolution matrix $T$ tells us, in this approximation, whether the altruistic strain will invade the lattice or wither away. The condition $\lambda \gt 0$ can be written in the form

$B \phi \tilde{q}_{A | A} - C \gt 0.$

Here, we’ve written $\tilde{q}_{A | A}$ for the conditional probability of altruists contacting altruists which obtains as the local densities equilibrate. That is to say, an attempted invasion by altruists will succeed if a measure of benefit, $B$, multiplied by an indicator of “assortment” among genetically similar individuals, is greater than the cost of altruistic behaviour, $C$.
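The invasion condition is simple enough to express as a one-line predicate; the function name and the numbers in the example are purely illustrative:

```python
def altruists_invade(B, C, q_AA, z):
    """Check the invasion condition B * phi * q_AA > C, with
    phi = 1 - 1/z for a lattice of coordination number z."""
    phi = 1.0 - 1.0 / z
    return B * phi * q_AA > C

# With z = 4 neighbours and equilibrium assortment q_AA = 0.5,
# a benefit of 1 overcomes a cost of 0.2 but not a cost of 0.4:
print(altruists_invade(1.0, 0.2, 0.5, 4))   # True
print(altruists_invade(1.0, 0.4, 0.5, 4))   # False
```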

After all our mucking with eigenvalues, we have found a condition which is strongly reminiscent of a classic and influential idea from mid-twentieth-century evolutionary theory. In biology, the inequality

$\text{(benefit)} \times \text{(relatedness)} \gt \text{(cost)}$

is known as Hamilton’s rule (Van Dyken et al., 2011). This is a rule-of-thumb for when natural selection can favour altruistic behaviour: altruists can prosper when the inequality is satisfied. Hamilton’s rule was originally derived for unstructured populations, with no network topology or spatial arrangement to them. We can understand Hamilton’s rule in this context in the following way:

How well an organism fares in the great contest of life depends on the environment it experiences. During the course of its life, an individual member of a species will interact with a set of others, which we could call its “social circle”. The composition of that social circle affects how well an individual will propagate its genetic information to the next generation — its fitness. In an unstructured population, we can think of such circles being formed by taking random samples of the population. An altruist, by our definition, sacrifices some of its own potential so that offspring of other individuals can prosper. A social circle of altruists can fare better than a social circle of selfish individuals, increasing the chances that social circles which form in the next generation will contain altruists (Van Dyken et al., 2011).

It’s common to treat “benefit” and “cost” as parameters of the system. We could potentially derive them from more fundamental dynamics, if we looked more closely at the interactions within a particular ecosystem, but right now, they’re just knobs we can turn. What about the remaining quantity in Hamilton’s rule: what does “relatedness” mean? Excellent question! We can get a feel for where the term came from by taking a gene’s-eye view: copies of many of my particular genetic variants will be sitting inside the cells of my close relatives. Consequently, as far as my genes are concerned, if my relatives survive, that’s almost as good as my surviving. When reckoning the benefit of altruism against its cost, then, the aid one organism brings to another ought to be weighted by how “related” they are.

So, we can say that we have “recovered Hamilton’s rule as an emergent property of the spatial dynamics” — if we are willing to draw a circle around the middle of our formula and declare those terms to be the “relatedness”.

Knowing where our invasion condition came from, we can appreciate some of the caveats which scientists have raised in connection with Hamilton’s rule.

Simon et al. (2011):

In particular, $r$ is often taken to be the average relatedness of interacting individuals, as compared to the average relatedness in the population, in which case inequality (1) $[r B \gt C]$ is referred to as Hamilton’s rule. It is important to note that inequality (1) is only a description of whether the current level of assortment as subsumed in the parameter $r$ is sufficient to favour cooperation, but not a description of the mechanisms that would lead to such assortment. It has been suggested repeatedly that the problem of cooperation can be understood entirely based on Hamilton’s rules of the form (1). Even though often taken as gospel, this claim is wrong in general, for two reasons.

First, and foremost, even if a rule of the form (1) predicts the direction of selection for cooperation at a given point in time, the long-term evolution of cooperation cannot be understood without having a dynamic equation for the quantity $r$, i.e., without understanding the temporal dynamics of assortment. The dynamics of $r$ in turn cannot be understood based solely on the current level of cooperation, and hence expressions of type (1) are in general insufficient to describe the evolutionary dynamics of cooperation. Second, the quantity $r$, which measures the average relatedness among interacting individuals, is insufficient to construct Hamilton’s rule in models that account for variable individual-level death rates and/or group-level events.

Damore and Gore (2011):

Contrary to the popular use of the word, “relatedness” describes a population of interacting individuals, where $r$ refers to how assorted similar individuals are in the population.

And in more detail:

every definition of relatedness must take into account the population. Therefore, relatedness is not the percent of genome shared, genetic distance, or any extent of similarity between two isolated individuals in a larger population. Also, because horizontal gene transfer is commonplace between microbes and selection is strong, phylogenetic distance or any other indirect genetic measure is likely to be inaccurate. Many of these false definitions live on partly because ambiguous heuristics like “$\frac{1}{2}$ for brothers, $\frac{1}{8}$ for cousins,” which require very specific assumptions, are repeated in the primary literature. Also, most non-theoretical papers simply define relatedness as “a measure of genetic similarity” and do not elaborate or instead leave the precise definition to the supplemental information …. Unfortunately, scientists can easily misinterpret this “measure of genetic similarity” to be anything that is empirically convenient such as genetic distance or percent of genome shared. Largely because of this confusion, we support the more widespread use of the term “assortment,” which is harder to misinterpret …. For similar reasons of reader understanding, we also encourage authors to make calculations more explicit, either in the main or supplemental text, and to avoid repeating previous results without giving the assumptions that went into deriving them.

### Envoi

Stepping back for a moment, notice that although the terms and coefficients started to proliferate on us, we haven’t introduced any remarkably “advanced” or “esoteric” mathematics. Derivatives, matrices, eigenvalues — this is undergraduate stuff! The amount of algebra we’ve been able to stir up without really even trying is, however, a little worrying. We can invent a mathematical model for some particular biological scenario, and we might even be able to solve it, or at least tell how it’ll behave in certain interesting circumstances. But what if we want general results which extend across models, or ideas which will help us identify the common features and the key disparities among a host of examples?

With that attitude, then, a thought towards “higher” mathematics:

A Petri net specifies a symmetric monoidal category. Each truncation of the moment-dynamics hierarchy for a system yields a Petri net, and so successive truncations of the moment-dynamics hierarchy yield mappings between categories. Going from a pair approximation to a mean-field approximation, for example, transforms a Petri net whose circles are labelled with pair states to one labelled by site states. Category theory might be able to say something interesting here. Anything which can tame the horrible spew of equations which arises in these problems would be great to have. Ought we be considering, say, the strict 2-category whose objects are moment-closure approximations to an ecosystem, and whose morphisms are symmetric monoidal functors between them?

### References

• H. Matsuda, N. Ogita, A. Sasaki, and K. Sato (1992), “Statistical mechanics of population”, Progress of Theoretical Physics 88, 6: 1035–49 (web).
• U. Dieckmann, R. Law, and J. A. J. Metz, eds., The Geometry of Ecological Interactions: Simplifying Spatial Complexity. Cambridge University Press, 2000.
• T. Gross, C. J. Dommar D’Lima and B. Blasius (2006), “Epidemic dynamics on an adaptive network”, Physical Review Letters 96, 20: 208701 (web). arXiv:q-bio/0512037.
• A.-L. Do and T. Gross (2009), “Contact processes and moment closure on adaptive networks”, in T. Gross and H. Sayama, eds., Adaptive Networks: Theory, Models and Applications. Springer.
• B. Allen (2010), Studies in the Mathematics of Evolution and Biodiversity. PhD thesis, Boston University (web).
• J. A. Damore and J. Gore (2011), “Understanding microbial cooperation”, Journal of Theoretical Biology, DOI:10.1016/j.jtbi.2011.03.008 (pdf).
• B. Simon, J. A. Fletcher and M. Doebeli (2011), “Hamilton’s rule in multi-level selection models”, Journal of Theoretical Biology. PMID:21820447.
• B. C. Stacey, A. Gros and Y. Bar-Yam (2011), “Beyond the mean field in host-pathogen spatial ecology”, arXiv:1110.3845 [nlin.AO].
• J. D. Van Dyken, T. A. Linksvayer and M. J. Wade (2011), “Kin Selection–Mutation Balance: A Model for the Origin, Maintenance, and Consequences of Social Cheating”, The American Naturalist 177, 3: 288–300. JSTOR:10.1086/658365 (pdf).

category: blog