The Azimuth Project
Blog - the mathematical origin of irreversibility

guest post by Matteo Smerlak


Thermodynamical dissipation and adaptive evolution are two faces of the same Markovian coin!

Consider this. The second law of thermodynamics states that the entropy of an isolated thermodynamic system can never decrease; Landauer’s principle maintains that the erasure of information inevitably causes dissipation; Fisher’s fundamental theorem of natural selection asserts that any fitness difference within a population leads to adaptation in an evolution process governed by natural selection. Diverse as they are, these statements have two common characteristics:

  1. they express the irreversibility of certain natural phenomena, and

  2. the dynamical processes underlying these phenomena involve an element of randomness.

Doesn’t this suggest to you the following question: Could it be that thermal phenomena, forgetful information processing and adaptive evolution are governed by the same stochastic mechanism?

The answer is—yes! The key to this rather profound connection resides in a universal property of Markov processes discovered recently in the context of non-equilibrium statistical mechanics, and known as the ‘fluctuation theorem’. Typically stated in terms of ‘dissipated work’ or ‘entropy production’, this result can be seen as an extension of the second law of thermodynamics to small systems, where thermal fluctuations cannot be neglected. But it is actually much more than this: it is the mathematical underpinning of irreversibility itself, be it thermodynamical, evolutionary, or otherwise. To make this point clear, let me start by giving a general formulation of the fluctuation theorem that makes no reference to physics concepts such as ‘heat’ or ‘work’.

The mathematical fact

Consider a system randomly jumping between states $a, b, \dots$ with (possibly time-dependent) transition rates $\gamma_{a b}(t)$ (where $a$ is the state prior to the jump, while $b$ is the state after the jump). I’ll assume that this dynamics defines a (continuous-time) Markov process, viz. that the numbers $\gamma_{a b}$ are the matrix entries of an infinitesimal stochastic matrix, which means that its off-diagonal entries are non-negative and that its rows sum up to zero (each diagonal entry being minus the total exit rate from the corresponding state).
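In code, this setup can be sketched as follows (a minimal Python sketch with made-up rates): we build the rate matrix, check the two infinitesimal-stochastic conditions, and evolve the occupation probabilities, which stay normalized precisely because of the zero-sum condition.

```python
import numpy as np

# Hypothetical rates: rates[a, b] is the rate of jumping a -> b (a != b);
# the diagonal of the full matrix is minus the total exit rate from each state,
# so that every row sums to zero.
rates = np.array([[0.0, 2.0, 1.0],
                  [1.0, 0.0, 3.0],
                  [0.5, 0.5, 0.0]])
Gamma = rates - np.diag(rates.sum(axis=1))

assert (rates[~np.eye(3, dtype=bool)] >= 0).all()   # off-diagonal entries non-negative
assert np.allclose(Gamma.sum(axis=1), 0.0)          # rows sum to zero

# The occupation probabilities evolve by d(pi)/dt = Gamma^T pi (the master
# equation stated later in the post); a small Euler step conserves total
# probability exactly, because the rows of Gamma sum to zero.
pi, dt = np.array([1.0, 0.0, 0.0]), 1e-3
for _ in range(5000):
    pi = pi + dt * (Gamma.T @ pi)
print(pi, pi.sum())
```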

Now, each possible history $\omega=(\omega_t)_{0\leq t\leq T}$ of this process can be characterized by the sequence of occupied states $a_{j}$ and by the times $\tau_{j}$ at which the transitions $a_{j-1}\longrightarrow a_{j}$ occur ($1\leq j\leq N$):

$$\omega=(\omega_{0}=a_{0}\overset{\tau_{1}}{\longrightarrow} a_{1} \overset{\tau_{2}}{\longrightarrow}\cdots \overset{\tau_{N}}{\longrightarrow} a_{N}=\omega_{T}).$$

Define the skewness $\sigma_{j}(\tau_{j})$ of each of these transitions to be the logarithmic ratio of the backward and forward transition rates:

$$\sigma_{j}(\tau_{j}):=\ln\frac{\gamma_{a_{j}a_{j-1}}(\tau_{j})}{\gamma_{a_{j-1}a_{j}}(\tau_{j})}$$

Also define the self-information of the system in state $a$ at time $t$ by:

$$i_a(t):=-\ln\pi_{a}(t)$$

where $\pi_{a}(t)$ is the probability that the system is in state $a$ at time $t$, given some prescribed initial distribution $\pi_{a}(0)$.

Then the following identity—the detailed fluctuation theorem—holds:

$$\mathrm{Prob}[\Delta i-\Sigma=-A]=e^{-A}\,\mathrm{Prob}[\Delta i-\Sigma=A]$$

where

$$\Sigma:=\sum_{j}\sigma_{j}(\tau_{j})$$

is the cumulative skewness along a trajectory of the system, and $\Delta i=i_{a_N}(T)-i_{a_0}(0)$ is the variation of the self-information between the endpoints of this trajectory. The self-information is also sometimes called the “surprisal”, as it measures the “surprise” of finding out that the system is in state $a$ at time $t$.

This identity has an immediate consequence: if $\langle\,\cdot\,\rangle$ denotes the average over all realizations of the process, then we have the integral fluctuation theorem:

$$\langle e^{-\Delta i+\Sigma}\rangle=1,$$

which implies (by convexity of the exponential and Jensen’s inequality)

$$\langle \Delta i\rangle=\Delta S\geq\langle\Sigma\rangle.$$

In short: the mean variation of self-information, aka the variation of the Shannon entropy

$$S(t):=\sum_{a}\pi_{a}(t)\,i_a(t)=-\sum_{a}\pi_{a}(t)\ln\pi_{a}(t),$$

is bounded from below by the mean cumulative skewness of the underlying stochastic trajectory.
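Both the integral fluctuation theorem and this inequality can be checked by simulation. Here is a minimal Monte Carlo sketch (the rates, time horizon and initial distribution are all made up): a two-state process with constant rates, sampled with the Gillespie algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
g01, g10 = 2.0, 1.0               # hypothetical jump rates 0 -> 1 and 1 -> 0
T = 1.0
pi0 = np.array([0.9, 0.1])        # initial distribution

# Exact solution of the two-state master equation at time T:
lam = g01 + g10
pstar = np.array([g10, g01]) / lam
piT = pstar + (pi0 - pstar) * np.exp(-lam * T)

def trajectory():
    """Gillespie simulation on [0, T]: returns (a_0, a_T, cumulative skewness)."""
    a = 0 if rng.random() < pi0[0] else 1
    a0, t, Sigma = a, 0.0, 0.0
    while True:
        fwd = g01 if a == 0 else g10        # exit rate = rate of the only jump
        t += rng.exponential(1.0 / fwd)
        if t > T:
            return a0, a, Sigma
        rev = g10 if a == 0 else g01
        Sigma += np.log(rev / fwd)          # skewness of this transition
        a = 1 - a

samples = [trajectory() for _ in range(50_000)]
di = np.array([-np.log(piT[aT]) + np.log(pi0[a0]) for a0, aT, _ in samples])
Sig = np.array([s for _, _, s in samples])

ift = np.mean(np.exp(-di + Sig))    # integral fluctuation theorem: close to 1
dS, mSig = di.mean(), Sig.mean()    # Second Law: dS >= mSig
print(ift, dS, mSig)
```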

This is the fundamental mathematical fact underlying irreversibility. To unravel its physical and biological consequences, it suffices to consider the origin and interpretation of the ‘skewness’ term in different contexts. (By the way, people usually call $\Sigma$ the “entropy production” or “dissipation function”—but how tautological is that?)

The physical and biological consequences

Consider first the standard stochastic-thermodynamic scenario where a physical system is kept in contact with a thermal reservoir at inverse temperature $\beta$ and undergoes thermally induced transitions between states $a, b, \dots$. By virtue of the detailed balance condition

$$e^{-\beta E_{a}(t)}\gamma_{a b}(t)=e^{-\beta E_{b}(t)}\gamma_{b a}(t),$$

the skewness $\sigma_{j}(\tau_{j})$ of each such transition is $\beta$ times the energy difference between the states $a_{j}$ and $a_{j-1}$, namely $\beta$ times the heat received from the reservoir during the transition. Hence, the mean cumulative skewness $\langle \Sigma\rangle$ is nothing but $\beta\langle Q\rangle$, with $Q$ the total heat received by the system along the process. It follows from the integral fluctuation theorem that

$$\langle e^{-\Delta i+\beta Q}\rangle=1$$

and therefore

$$\Delta S\geq\beta\langle Q\rangle,$$

which is of course Clausius’ inequality. In a computational context where the control parameter is the entropy variation itself (such as in a bit-erasure protocol, where $\Delta S=-\ln 2$), this inequality in turn expresses Landauer’s principle: it is impossible to decrease the self-information of the system’s state without dissipating a minimal amount of heat into the environment (in this case $-Q \geq k T\ln 2$, the ‘Landauer bound’). More general situations (several types of reservoirs, Maxwell-demon-like feedback controls) can be treated along the same lines, and the various forms of the second law of thermodynamics can be derived from the detailed fluctuation theorem.
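Here is a small numerical illustration of this correspondence. The energy levels below are made up, and the rates use the symmetric choice $\gamma_{ab}=e^{-\beta(E_b-E_a)/2}$, one of many satisfying detailed balance; the check confirms that the skewness of each jump is $\beta$ times the energy gained by the system, i.e. $\beta$ times the heat received.

```python
import numpy as np

beta = 2.0
E = np.array([0.0, 1.3, 0.7])   # hypothetical energy levels

# One standard rate choice satisfying detailed balance:
# gamma[a, b] = exp(-beta (E_b - E_a) / 2) for a != b.
gamma = np.exp(-beta * (E[None, :] - E[:, None]) / 2.0)

for a in range(len(E)):
    for b in range(len(E)):
        if a == b:
            continue
        # detailed balance: e^{-beta E_a} gamma_ab = e^{-beta E_b} gamma_ba
        assert np.isclose(np.exp(-beta * E[a]) * gamma[a, b],
                          np.exp(-beta * E[b]) * gamma[b, a])
        # skewness of the jump a -> b equals beta * (E_b - E_a)
        assert np.isclose(np.log(gamma[b, a] / gamma[a, b]),
                          beta * (E[b] - E[a]))
print("detailed balance and skewness = beta * (heat received) hold")
```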

Now, many would agree that evolutionary dynamics is a wholly different business from thermodynamics; in particular, notions such as ‘heat’ or ‘temperature’ are clearly irrelevant to Darwinian evolution. However, the stochastic framework of Markov processes is relevant to describe the genetic evolution of a population, and this fact alone has important consequences. As a simple example, consider the time evolution of mutant fixations $x_{a}$ in a population, with $a$ ranging over the possible genotypes. In a “symmetric mutation scheme”, which I understand is biological parlance for “reversible Markov process” or “detailed balance”, the ratio between the $a\mapsto b$ and $b\mapsto a$ transition rates is completely determined by the fitnesses $f_{a}$ and $f_{b}$ of $a$ and $b$, according to

$$\frac{\gamma_{a b}}{\gamma_{b a}} =\left(\frac{f_{b}}{f_{a}}\right)^{\nu}$$

where $\nu$ is a model-dependent function of the effective population size [Sella2005]. Along a given history of mutant fixations, the cumulative skewness $\Sigma$ is therefore given by minus the fitness flux:

$$\Phi=\nu\sum_{j}\big(\ln f_{a_j}-\ln f_{a_{j-1}}\big).$$
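A quick sanity check of the relation $\Sigma=-\Phi$ (the fitnesses, the exponent $\nu$ and the fixation history below are all made up, and $\gamma_{ab}\propto f_b^{\nu}$ is just one convenient way to realize the rate ratio above):

```python
import numpy as np

nu = 10.0                                  # hypothetical effective-population-size exponent
f = {"A": 1.0, "B": 1.4, "C": 0.9}         # hypothetical genotype fitnesses
history = ["A", "B", "C", "B"]             # a made-up sequence of fixations

def gamma(a, b):
    # Any rates with gamma(a, b) / gamma(b, a) = (f_b / f_a)^nu will do;
    # here we simply take gamma(a, b) proportional to f_b^nu.
    return f[b] ** nu

Sigma = sum(np.log(gamma(b, a) / gamma(a, b))     # cumulative skewness
            for a, b in zip(history, history[1:]))
Phi = nu * sum(np.log(f[b]) - np.log(f[a])        # fitness flux
               for a, b in zip(history, history[1:]))
print(Sigma, Phi)
```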

The integral fluctuation theorem then becomes the fitness flux theorem:

$$\langle e^{-\Delta i-\Phi}\rangle=1,$$

discussed recently by Mustonen and Lässig, and implying Fisher’s fundamental theorem of natural selection as a special case. (Incidentally, the “fitness flux theorem” derived in this reference is more general than this; for instance, it does not rely on the “symmetric mutation scheme” assumption above.) The ensuing inequality

$$\langle \Phi\rangle\geq-\Delta S$$

shows that a positive fitness flux is “an almost universal evolutionary principle of biological systems” (Mustonen2010), with negative contributions limited to time intervals with a systematic loss of adaptation ($\Delta S > 0$). This statement may well be the closest thing to a version of the Second Law of Thermodynamics applying to evolutionary dynamics.

It is really quite remarkable that thermodynamical dissipation and Darwinian evolution can be reduced to the same stochastic mechanism, and that notions such as ‘fitness flux’ and ‘heat’ can arise as two faces of the same mathematical coin, namely the ‘skewness’ of Markovian transitions. After all, the phenomenon of life is in itself a direct challenge to thermodynamics, isn’t it? Where thermal phenomena tend to increase the world’s disorder, life strives to bring about and maintain exquisitely fine spatial and chemical structures—which is why Schrödinger famously proposed to define life as negative entropy. Could there be a more striking confirmation of his intuition—and a reconciliation of evolution and thermodynamics in one go—than the fundamental inequality of adaptive evolution $\langle\Phi\rangle\geq-\Delta S$?

Surely the detailed fluctuation theorem for Markov processes has other applications, pertaining neither to thermodynamics nor to adaptive evolution. Can you think of any?

Proof of the fluctuation theorem

I am a physicist, but knowing that many readers of John’s blog are mathematicians, I’ll do my best to frame—and prove—the FT as an actual theorem.

Let $(\Omega,\mathcal{T},p)$ be a probability space and $(\,\cdot\,)^{\dagger}:\Omega\to \Omega$ a measurable involution of $\Omega$. Denote by $p^{\dagger}$ the push-forward probability measure through this involution, and by

$$R=\ln \frac{d p}{d p^\dagger}$$

the logarithm of the corresponding Radon–Nikodym derivative (we assume $p^\dagger$ and $p$ are mutually absolutely continuous). Then the following lemmas hold, with $(1)\Rightarrow(2)\Rightarrow(3)$:

Lemma 1. The detailed fluctuation relation:

$$\forall A\in\mathbb{R},\quad p\big(R^{-1}(-A) \big)=e^{-A}\,p \big(R^{-1}(A) \big)$$

Lemma 2. The integral fluctuation relation:

$$\int_{\Omega} d p(\omega)\,e^{-R(\omega)}=1$$

Lemma 3. The positivity of the Kullback-Leibler divergence:

$$D(p\,\Vert\, p^{\dagger}):=\int_{\Omega} d p(\omega)\,R(\omega)\geq 0.$$

These are basic facts which anyone can show: $(2)\Rightarrow(3)$ by Jensen’s inequality, $(1)\Rightarrow(2)$ trivially, and $(1)$ follows from $R(\omega^{\dagger})=-R(\omega)$ and the change of variables theorem, as follows:

$$\int_{R^{-1}(-A)} d p(\omega)=\int_{R^{-1}(A)}d p^{\dagger}(\omega)=\int_{R^{-1}(A)} d p(\omega)\, e^{-R(\omega)}=e^{-A} \int_{R^{-1}(A)} d p(\omega).$$
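On a finite probability space these lemmas can be verified exactly; the following sketch uses an arbitrary random measure and a pairwise-swap permutation as the involution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Finite probability space Omega = {0, ..., 5} with a random positive measure p,
# and an involution given by an order-two permutation (swap in pairs).
p = rng.random(6) + 0.1
p /= p.sum()
dagger = np.array([1, 0, 3, 2, 5, 4])
assert (dagger[dagger] == np.arange(6)).all()   # dagger o dagger = identity

p_dag = p[dagger]                # push-forward of p through the involution
R = np.log(p / p_dag)

# R changes sign under the involution:
assert np.allclose(R[dagger], -R)
# Integral fluctuation relation, exactly: E_p[e^{-R}] = 1
assert np.isclose((p * np.exp(-R)).sum(), 1.0)
# Positivity of the Kullback-Leibler divergence: E_p[R] >= 0
D = (p * R).sum()
assert D >= 0.0
print("lemmas verified; D(p || p^dagger) =", D)
```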

But here is the beauty: if

• $(\Omega,\mathcal{T},p)$ is actually a Markov jump process defined over some time interval $[0,T]$ and valued in some (say discrete) state space, with the instantaneous probability $\pi_{a}(t)=p\big(\{\omega_{t}=a\} \big)$ of each state $a$ satisfying the “master equation” (aka Kolmogorov equation)

$$\frac{d\pi_{a}(t)}{dt}=\sum_{b\neq a}\Big(\gamma_{b a}(t)\pi_{b}(t)-\gamma_{a b}(t)\pi_{a}(t)\Big),$$

and

• the dagger involution is time-reversal, that is $\omega^{\dagger}_{t}:=\omega_{T-t}$,

then for a given path

$$\omega=(\omega_{0}=a_{0}\overset{\tau_{1}}{\longrightarrow} a_{1} \overset{\tau_{2}}{\longrightarrow}\cdots \overset{\tau_{N}}{\longrightarrow} a_{N}=\omega_{T})\in\Omega$$

the logarithmic ratio $R(\omega)$ decomposes into “variation of self-information” and “cumulative skewness” along $\omega$:

$$R(\omega)=\underbrace{\Big(-\ln\pi_{\omega_{T}}(T)+\ln\pi_{\omega_{0}}(0) \Big)}_{\Delta i(\omega)}-\underbrace{\sum_{j=1}^{N}\ln\frac{\gamma_{a_{j}a_{j-1}}(\tau_{j})}{\gamma_{a_{j-1}a_{j}}(\tau_{j})}}_{\Sigma(\omega)}.$$

This is easy to see if one writes the probability of a path explicitly as

$$p(\omega)=\pi_{a_{0}}(0)\left[\prod_{j=1}^{N}\phi_{a_{j-1}}(\tau_{j-1},\tau_{j})\,\gamma_{a_{j-1}a_{j}}(\tau_{j})\right]\phi_{a_{N}}(\tau_{N},T),$$

where, with the convention $\tau_{0}=0$,

$$\phi_{a}(\tau,\tau')=\phi_{a}(\tau',\tau)=\exp\Big(-\sum_{b\neq a}\int_{\tau}^{\tau'}dt\, \gamma_{a b}(t)\Big)$$

is the probability that the process remains in the state $a$ between the times $\tau$ and $\tau'$. It follows from the above lemmas that:

Theorem. Let $(\Omega,\mathcal{T},p)$ be a Markov jump process and $\Delta i,\Sigma:\Omega\rightarrow \mathbb{R}$ be defined as above. Then we have:

  1. The detailed fluctuation theorem:
$$\forall A\in\mathbb{R},\quad p\big((\Delta i-\Sigma)^{-1}(-A) \big)=e^{-A}\,p \big((\Delta i-\Sigma)^{-1}(A) \big)$$
  2. The integral fluctuation theorem:
$$\int_{\Omega} d p(\omega)\,e^{-\Delta i(\omega)+\Sigma(\omega)}=1$$
  3. The “Second Law” inequality:
$$\Delta S:=\int_{\Omega} d p(\omega)\,\Delta i(\omega)\geq \int_{\Omega} d p(\omega)\,\Sigma(\omega)$$

The same theorem can be formulated for other kinds of Markov processes as well, including diffusion processes (in which case it follows from the Girsanov theorem).
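The decomposition $R(\omega)=\Delta i(\omega)-\Sigma(\omega)$ can also be checked path by path. The sketch below (made-up constant rates, three states) starts the process in its stationary distribution, so that $p^{\dagger}$ can be computed from the same path-probability formula; the survival factors then cancel between $\omega$ and $\omega^{\dagger}$, leaving exactly the boundary and skewness terms.

```python
import numpy as np

# Hypothetical constant rates gamma[a, b] for a three-state jump process:
gamma = np.array([[0.0, 2.0, 0.5],
                  [0.3, 0.0, 1.5],
                  [1.0, 0.4, 0.0]])
Gamma = gamma - np.diag(gamma.sum(axis=1))

# Stationary distribution = null vector of Gamma^T; starting there makes
# pi(0) = pi(T), so the push-forward p^dagger uses the same pi.
w, v = np.linalg.eig(Gamma.T)
pi = np.real(v[:, np.argmin(np.abs(w))])
pi = pi / pi.sum()

def log_p(path, waits, T):
    """log-density of a path: initial weight, survival factors, jump rates."""
    lp, t = np.log(pi[path[0]]), 0.0
    for (a, b), dt in zip(zip(path, path[1:]), waits):
        lp += -gamma[a].sum() * dt + np.log(gamma[a, b])
        t += dt
    return lp - gamma[path[-1]].sum() * (T - t)   # survival after the last jump

path, waits, T = [0, 1, 2, 0, 2], [0.3, 0.2, 0.4, 0.5], 2.0
rev_path = path[::-1]
rev_waits = [T - sum(waits)] + waits[::-1][:-1]   # residence times of the reversed path

R = log_p(path, waits, T) - log_p(rev_path, rev_waits, T)
di = np.log(pi[path[0]]) - np.log(pi[path[-1]])
Sigma = sum(np.log(gamma[b, a] / gamma[a, b]) for a, b in zip(path, path[1:]))
print(R, di - Sigma)
```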


References

Landauer’s principle was introduced here:

• (Landauer1961) R. Landauer, Irreversibility and heat generation in the computing process, IBM Journal of Research and Development 5 (1961), 183–191.

and is now being verified experimentally by various groups worldwide.

The “fundamental theorem of natural selection” was derived by Fisher in his book:

• (Fisher1930) R. Fisher, The Genetical Theory of Natural Selection, Clarendon Press, Oxford, 1930.

His derivation has long been considered obscure, even perhaps wrong, but apparently the theorem is now well accepted. I believe the first Markovian models of genetic evolution appeared here:

• (Fisher1922) R. A. Fisher, On the dominance ratio, Proc. Roy. Soc. Edinb. 42 (1922), 321–341.

• (Wright1931) S. Wright, Evolution in Mendelian populations, Genetics 16 (1931), 97–159.

Fluctuation theorems are reviewed here:

• (Sevick2008) E. Sevick, R. Prabhakar, S. R. Williams, and D.J. Searles, Fluctuation theorems, Ann. Rev. Phys. Chem. 59 (2008), 603–633.

Two of the key ideas for the “detailed fluctuation theorem” discussed here are due to Crooks:

• (Crooks1999) G. Crooks, Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences, Phys. Rev. E 60 (1999), 2721–2726.

who identified the energy difference $(E_{a_{j}}(\tau_{j})-E_{a_{j-1}}(\tau_{j}))$ between successive states as heat, and Seifert:

• (Seifert2005) U. Seifert, Entropy production along a stochastic trajectory and an integral fluctuation theorem, Phys. Rev. Lett. 95 (2005), 040602.

who understood the relevance of the self-information in this context.

The connection between statistical physics and evolutionary biology is discussed here:

• (Sella2005) G. Sella and A. E. Hirsh, The application of statistical physics to evolutionary biology, Proc. Nat. Acad. Sci. USA 102 (2005), 9541–9546.

and the “fitness flux theorem” is derived in

• (Mustonen2010) V. Mustonen and M. Lässig, Fitness flux and ubiquity of adaptive evolution, Proc. Nat. Acad. Sci. USA 107 (2010), 4248–4253.

Schrödinger’s famous discussion of the physical nature of life was published here:

• (Schrodinger1944) E. Schrödinger, What is Life?, Cambridge University Press, Cambridge, 1944.
