The Azimuth Project
Blog - network theory (part 12)

This page is a blog article in progress, written by John Baez. To see discussions of this page as it was being written, go to the Azimuth Forum. To see the final polished version, visit the Azimuth Blog.

Last time we proved a version of Noether’s theorem for stochastic mechanics. Now I want to compare that to the more familiar quantum version.

But to do this, I need to say more about the analogy between stochastic mechanics and quantum mechanics. And whenever I try, I get pulled toward explaining some technical issues involving analysis: whether sums converge, whether derivatives exist, and so on. I’ve been trying to avoid such stuff—not because I dislike it, but because I’m afraid you might. But the more I put off discussing these issues, the more they fester and make me unhappy. In fact, that’s why it’s taken so long for me to write this post!

So, this time I will gently explore some of these technical issues. But don’t be scared: I’ll mainly talk about some simple big ideas. Next time I’ll discuss Noether’s theorem. I hope that by getting the technicalities out of my system, I’ll feel okay about hand-waving whenever I want.

And if you’re an expert on analysis, maybe you can help me with a question.

Stochastic mechanics versus quantum mechanics

First, we need to recall the analogy we began sketching in Part 5, and push it a bit further. The idea is that stochastic mechanics differs from quantum mechanics in two big ways:

• First, instead of complex amplitudes, stochastic mechanics uses nonnegative real probabilities. The complex numbers form a ring; the nonnegative real numbers form a mere rig, which is a ‘ring without negatives’. Rigs are much neglected in the typical math curriculum, but unjustly so: they’re almost as good as rings in many ways, and there are lots of important examples, like the natural numbers \mathbb{N} and the nonnegative real numbers [0,\infty). For probability theory, we should learn to love rigs.

But there are, alas, situations where we need to subtract probabilities, even when the answer comes out negative: namely when we’re taking the time derivative of a probability. So sometimes we need \mathbb{R} instead of just [0,\infty).

• Second, while in quantum mechanics a state is described using a ‘wavefunction’, meaning a complex-valued function obeying

\int |\psi|^2 = 1

in stochastic mechanics it’s described using a ‘probability distribution’, meaning a nonnegative real function obeying

\int \psi = 1

So, let’s try our best to present the theories in close analogy, while respecting these two differences.

States

We’ll start with a set X whose points are states that a system can be in. Last time I assumed X was a finite set, but this post is so mathematical I might as well let my hair down and assume it’s a measure space. A measure space lets you do integrals, but a finite set is a special case, and then these integrals are just sums. So, I’ll write things like

\int f

and mean the integral of the function f over the measure space X, but if X is a finite set this just means

\sum_{x \in X} f(x)

Now, I’ve already defined the word ‘state’, but both quantum and stochastic mechanics need a more general concept of state. Let’s call these ‘quantum states’ and ‘stochastic states’:

• In quantum mechanics, the system has an amplitude \psi(x) of being in any state x \in X. These amplitudes are complex numbers with

\int |\psi|^2 = 1

We call \psi : X \to \mathbb{C} obeying this equation a quantum state.

• In stochastic mechanics, the system has a probability \psi(x) of being in any state x \in X. These probabilities are nonnegative real numbers with

\int \psi = 1

We call \psi : X \to [0,\infty) obeying this equation a stochastic state.
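To make the two normalization conditions concrete, here is a small numerical sketch on a hypothetical 3-point state space (the raw numbers are arbitrary, chosen only for illustration):

```python
import numpy as np

# Hypothetical 3-point state space X = {0, 1, 2}; the raw vectors below are
# arbitrary, chosen just to illustrate the two normalizations.

# Quantum state: complex amplitudes, normalized so the "integral" (here a
# sum) of |psi|^2 equals 1.
psi_q = np.array([1 + 1j, 1 - 1j, 2.0], dtype=complex)
psi_q /= np.sqrt(np.sum(np.abs(psi_q) ** 2))
assert np.isclose(np.sum(np.abs(psi_q) ** 2), 1.0)

# Stochastic state: nonnegative reals, normalized so the sum of psi equals 1.
psi_s = np.array([3.0, 1.0, 4.0])
psi_s /= np.sum(psi_s)
assert np.all(psi_s >= 0) and np.isclose(np.sum(psi_s), 1.0)
```

Note the different norms: the quantum state is normalized in L^2, the stochastic state in L^1.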

In quantum mechanics we often use this abbreviation:

\langle \phi, \psi \rangle = \int \overline{\phi} \psi

so that a quantum state has

\langle \psi, \psi \rangle = 1

Similarly, we could introduce this notation in stochastic mechanics:

\langle \psi \rangle = \int \psi

so that a stochastic state has

\langle \psi \rangle = 1

But this notation is a bit risky, since angle brackets of this sort often stand for expectation values of observables. So, I’ve been writing \int \psi, and I’ll keep on doing this.

In quantum mechanics, \langle \phi, \psi \rangle is well-defined whenever both \phi and \psi live in the vector space

L^2(X) = \{ \psi : X \to \mathbb{C} \; : \; \int |\psi|^2 < \infty \}

In stochastic mechanics, \langle \psi \rangle is well-defined whenever \psi lives in the vector space

L^1(X) = \{ \psi : X \to \mathbb{R} \; : \; \int |\psi| < \infty \}

You’ll notice I wrote \mathbb{R} rather than [0,\infty) here. That’s because in some calculations we’ll need functions that take negative values, even though our stochastic states are nonnegative.

Observables

A state is a way our system can be. An observable is something we can measure about our system. They fit together: we can measure an observable when our system is in some state. If we repeat the measurement we may get different answers, but there’s a nice formula for the average or ‘expected’ answer.

• In quantum mechanics, an observable is a self-adjoint operator A on L^2(X). The expected value of A in the state \psi is

\langle \psi, A \psi \rangle

Here I’m assuming that we can apply A to \psi and get a new vector A \psi \in L^2(X). This is automatically true when X is a finite set, but in general we need to be more careful.

• In stochastic mechanics, an observable is a real-valued function A on X. The expected value of A in the state \psi is

\int A \psi

Here we’re using the fact that we can multiply A and \psi and get a new vector A \psi \in L^1(X), at least if A is bounded. Again, this is automatic if X is a finite set, but not otherwise.
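Both expectation formulas can be sketched numerically on a hypothetical 2-point space. The observable and the two states below are made up purely for illustration:

```python
import numpy as np

# A hypothetical observable on a 2-point space: A(0) = +1, A(1) = -1 (think
# of a "spin" that is up at one point and down at the other).
A = np.array([1.0, -1.0])

# Quantum expected value: <psi, A psi>, with A acting by multiplication.
psi_q = np.array([1.0, 1.0j]) / np.sqrt(2)     # a quantum state
exp_q = np.vdot(psi_q, A * psi_q).real

# Stochastic expected value: the integral (here, a sum) of A psi.
psi_s = np.array([0.25, 0.75])                 # a stochastic state
exp_s = np.sum(A * psi_s)

assert np.isclose(exp_q, 0.0)     # |amplitudes|^2 = (1/2, 1/2): +1 and -1 average out
assert np.isclose(exp_s, -0.5)    # 0.25*(+1) + 0.75*(-1)
```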

Symmetries

Besides states and observables, we need ‘symmetries’, which are transformations that map states to states. We use these to describe how our system changes when we wait a while, for example.

• In quantum mechanics, an isometry is a linear map U : L^2(X) \to L^2(X) such that

\langle U \phi, U \psi \rangle = \langle \phi, \psi \rangle

for all \psi, \phi \in L^2(X). If U is an isometry and \psi is a quantum state, then U \psi is again a quantum state.

• In stochastic mechanics, a stochastic operator is a linear map U : L^1(X) \to L^1(X) such that

\int U \psi = \int \psi

and

\psi \ge 0 \; \; \Rightarrow \; \; U \psi \ge 0

for all \psi \in L^1(X). If U is stochastic and \psi is a stochastic state, then U \psi is again a stochastic state.
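For a finite set, a stochastic operator is just a matrix with nonnegative entries whose columns each sum to 1. Here is a hypothetical 2×2 example checking both defining conditions; it also hints at Puzzle 2 below, since the matrix happens to be invertible but its inverse is not stochastic:

```python
import numpy as np

# A hypothetical stochastic operator on L^1(X) for X = {0, 1}, written as a
# matrix: entries are nonnegative and each COLUMN sums to 1, so U preserves
# both nonnegativity and the total integral (here, the sum of entries).
U = np.array([[0.9, 0.5],
              [0.1, 0.5]])
assert np.all(U >= 0) and np.allclose(U.sum(axis=0), 1.0)

psi = np.array([0.3, 0.7])          # a stochastic state
out = U @ psi
assert np.isclose(out.sum(), 1.0) and np.all(out >= 0)

# Unlike a unitary, a stochastic operator rarely runs backwards: this one
# happens to be invertible, but its inverse has negative entries, so the
# inverse is NOT stochastic.
U_inv = np.linalg.inv(U)
assert np.any(U_inv < 0)
```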

In quantum mechanics we are mainly interested in invertible isometries, which are called unitary operators. There are lots of these. There are, however, very few invertible stochastic operators:

Puzzle 1. Suppose X is a finite set. Show that every isometry U : L^2(X) \to L^2(X) is invertible.

Puzzle 2. Suppose X is a finite set. What are the invertible stochastic operators U : L^1(X) \to L^1(X)?

This is why we usually think of time evolution as being reversible in quantum mechanics, but not in stochastic mechanics! In quantum mechanics we often describe time evolution using a ‘1-parameter group’, while in stochastic mechanics we describe it using a 1-parameter semigroup… meaning that we can run time forwards, but not backwards.

But let’s see how this works in detail!

Time evolution in quantum mechanics

In quantum mechanics there’s a beautiful relation between observables and symmetries, which goes like this. Suppose that for each time t we want a unitary operator U(t) : L^2(X) \to L^2(X) that describes time evolution. Then it makes a lot of sense to demand that these operators form a 1-parameter group:

Definition. A collection of linear operators U(t) (t \in \mathbb{R}) on some vector space forms a 1-parameter group if

U(0) = 1

and

U(s+t) = U(s) U(t)

for all s, t \in \mathbb{R}.

Note that these conditions force all the operators U(t) to be invertible.

Now suppose our vector space is a Hilbert space, like L^2(X). Then we call a 1-parameter group a 1-parameter unitary group if the operators involved are all unitary.

It turns out that 1-parameter unitary groups are either continuous in a certain way, or so pathological that you can’t even prove they exist without the axiom of choice! So, we always focus on the continuous case:

Definition. A 1-parameter unitary group is strongly continuous if U(t) \psi depends continuously on t for all \psi, in this sense:

t_i \to t \;\; \Rightarrow \;\; \|U(t_i) \psi - U(t) \psi\| \to 0

Then we get a classic result proved by Marshall Stone back in the early 1930s. You may not know him, but he was so influential at the University of Chicago during this period that it’s often called the “Stone Age”. And here’s one reason why:

Stone’s Theorem. There is a one-to-one correspondence between strongly continuous 1-parameter unitary groups on a Hilbert space and self-adjoint operators on that Hilbert space, given as follows. Given a strongly continuous 1-parameter unitary group U(t) we can always write

U(t) = \exp(-i t H)

for a unique self-adjoint operator H. Conversely, any self-adjoint operator determines a strongly continuous 1-parameter group this way. For all vectors \psi for which H \psi is well-defined, we have

\left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = -i H \psi

Moreover, for any of these vectors, if we set

\psi(t) = \exp(-i t H) \psi

we have

\frac{d}{d t} \psi(t) = -i H \psi(t)

When U(t) = \exp(-i t H) describes the evolution of a system in time, H is called the Hamiltonian, and it has the physical meaning of ‘energy’. The equation I just wrote down is then called Schrödinger’s equation.

So, simply put, in quantum mechanics we have a correspondence between observables and nice one-parameter groups of symmetries. Not surprisingly, our favorite observable, energy, corresponds to our favorite symmetry: time evolution!

However, if you were paying attention, you noticed that I carefully avoided explaining how we define \exp(-i t H). I didn’t even say what a self-adjoint operator is. This is where the technicalities come in: they arise when H is unbounded, and not defined on all vectors in our Hilbert space.

Luckily, these technicalities evaporate for finite-dimensional Hilbert spaces, such as L^2(X) for a finite set X. Then we get:

Stone’s Theorem (Baby Version). Suppose we are given a finite-dimensional Hilbert space. In this case, a linear operator H on this space is self-adjoint iff it’s defined on the whole space and

\langle \phi, H \psi \rangle = \langle H \phi, \psi \rangle

for all vectors \phi, \psi. Given a strongly continuous 1-parameter unitary group U(t) we can always write

U(t) = \exp(-i t H)

for a unique self-adjoint operator H, where

\exp(-i t H) \psi = \sum_{n = 0}^\infty \frac{(-i t H)^n}{n!} \psi

with the sum converging for all \psi. Conversely, any self-adjoint operator on our space determines a strongly continuous 1-parameter group this way. For all vectors \psi in our space we then have

\left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = -i H \psi

and if we set

\psi(t) = \exp(-i t H) \psi

we have

\frac{d}{d t} \psi(t) = -i H \psi(t)
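The baby version can be checked numerically. The sketch below picks a hypothetical self-adjoint H on \mathbb{C}^2 (the entries are chosen only for illustration) and builds U(t) = \exp(-i t H) by diagonalizing H, which is exact for Hermitian matrices:

```python
import numpy as np

# A hypothetical self-adjoint H on C^2.
H = np.array([[0.0, 1.0],
              [1.0, 0.0]])
assert np.allclose(H, H.conj().T)              # self-adjoint

def U(t):
    w, V = np.linalg.eigh(H)                   # H = V diag(w) V*
    return V @ np.diag(np.exp(-1j * t * w)) @ V.conj().T

psi = np.array([1.0, 0.0], dtype=complex)      # a quantum state

# A 1-parameter unitary group: U(s+t) = U(s) U(t), each U(t) unitary...
assert np.allclose(U(0.3 + 0.4), U(0.3) @ U(0.4))
assert np.allclose(U(0.7).conj().T @ U(0.7), np.eye(2))

# ...so U(t) psi stays a quantum state.
assert np.isclose(np.linalg.norm(U(2.0) @ psi), 1.0)

# Schrodinger's equation d/dt psi(t) = -iH psi(t), checked at t = 0
# with a finite difference:
eps = 1e-6
assert np.allclose((U(eps) @ psi - psi) / eps, -1j * H @ psi, atol=1e-5)
```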

Time evolution in stochastic mechanics

We’ve seen that in quantum mechanics, time evolution is usually described by a 1-parameter group of operators that comes from an observable: the Hamiltonian. Stochastic mechanics is different!

First, since stochastic operators aren’t usually invertible, we typically describe time evolution by a mere ‘semigroup’:

Definition. A collection of linear operators U(t) (t \in [0,\infty)) on some vector space forms a 1-parameter semigroup if

U(0) = 1

and

U(s+t) = U(s) U(t)

for all s, t \ge 0.

Now suppose this vector space is L^1(X) for some measure space X. We want to focus on the case where the operators U(t) are stochastic and depend continuously on t in the same sense we discussed earlier.

Definition. A strongly continuous 1-parameter semigroup of stochastic operators U(t) : L^1(X) \to L^1(X) is called a Markov semigroup.

What’s the analogue of Stone’s theorem for Markov semigroups? I don’t know a fully satisfactory answer! If you know, please tell me.

Later I’ll say what I do know—I’m not completely clueless—but for now let’s look at the ‘baby’ case where X is a finite set. Then the story is neat and complete:

Theorem. Suppose we are given a finite set X. In this case, a linear operator H on L^1(X) is infinitesimal stochastic iff it’s defined on the whole space,

\int H \psi = 0

for all \psi \in L^1(X), and the matrix of H in terms of the obvious basis obeys

H_{i j} \ge 0

for all j \ne i. Given a Markov semigroup U(t) on L^1(X), we can always write

U(t) = \exp(t H)

for a unique infinitesimal stochastic operator H, where

\exp(t H) \psi = \sum_{n = 0}^\infty \frac{(t H)^n}{n!} \psi

with the sum converging for all \psi. Conversely, any infinitesimal stochastic operator on our space determines a Markov semigroup this way. For all \psi \in L^1(X) we then have

\left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = H \psi

and if we set

\psi(t) = \exp(t H) \psi

we have the master equation:

\frac{d}{d t} \psi(t) = H \psi(t)
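Here is the stochastic analogue of the previous numerical sketch, for a hypothetical infinitesimal stochastic H on a 2-point set (the rates are made up, and the matrix exponential is hand-rolled just to keep the example self-contained):

```python
import numpy as np

# A hypothetical infinitesimal stochastic H on a 2-point set: columns sum to
# zero (so the integral of H psi vanishes) and off-diagonal entries are
# nonnegative. State 0 decays at rate 2, state 1 at rate 1.
H = np.array([[-2.0,  1.0],
              [ 2.0, -1.0]])
assert np.allclose(H.sum(axis=0), 0.0) and H[0, 1] >= 0 and H[1, 0] >= 0

def expm(A, terms=25):
    """Matrix exponential by scaling and squaring a truncated Taylor series."""
    k = max(0, int(np.ceil(np.log2(max(1.0, np.abs(A).sum())))))
    B = A / 2.0 ** k                  # scaled down so the series converges fast
    out, term = np.eye(len(A)), np.eye(len(A))
    for n in range(1, terms):
        term = term @ B / n
        out = out + term
    for _ in range(k):                # undo the scaling: exp(A) = exp(B)^(2^k)
        out = out @ out
    return out

psi = np.array([1.0, 0.0])            # all probability on state 0

# U(t) = exp(tH) keeps psi a stochastic state for every t >= 0...
for t in [0.1, 1.0, 5.0]:
    out = expm(t * H) @ psi
    assert np.isclose(out.sum(), 1.0) and np.all(out >= 0)

# ...and drives it toward the equilibrium (1/3, 2/3), which satisfies H psi = 0.
assert np.allclose(expm(50.0 * H) @ psi, [1/3, 2/3], atol=1e-6)
```

Note that, unlike the unitary case, the evolution here forgets its initial condition as t grows: that’s non-invertibility in action.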

In short, time evolution in stochastic mechanics is a lot like time evolution in quantum mechanics, except it’s typically not invertible, and the Hamiltonian is typically not an observable.

Why not? Because we defined an observable to be a function A : X \to \mathbb{R}. We can think of this as giving an operator on L^1(X), namely the operator of multiplication by A. That’s a nice trick, which we used to good effect last time. However, at least when X is a finite set, this operator will be diagonal in the obvious basis consisting of functions that equal 1 at one point of X and zero elsewhere. So, it can only be infinitesimal stochastic if it’s zero!

Puzzle 3. If X is a finite set, show that any operator on L^1(X) that’s both diagonal and infinitesimal stochastic must be zero.

The Hille–Yosida theorem

I’ve now told you everything you really need to know… but not everything I want to say. What happens when X is not a finite set? What are Markov semigroups like then? I can’t abide letting this obvious question go unresolved! Unfortunately I only know a partial answer.

We can get a certain distance using the Hille–Yosida theorem, which is much more general.

Definition. A Banach space is a vector space with a norm such that any Cauchy sequence converges.

Examples include Hilbert spaces like L^2(X) for any measure space, but also other spaces like L^1(X) for any measure space!

Definition. If V is a Banach space, a 1-parameter semigroup of operators U(t) : V \to V is called a contraction semigroup if it’s strongly continuous and

\| U(t) \psi \| \le \| \psi \|

for all t \ge 0 and all \psi \in V.

Examples include strongly continuous 1-parameter unitary groups, but also Markov semigroups!

Puzzle 4. Show any Markov semigroup is a contraction semigroup.
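As numerical evidence for Puzzle 4 (a sanity check, not a proof), here is a hypothetical random stochastic matrix applied to vectors with arbitrary signs; the L^1 norm never grows, since |U\psi| \le U|\psi| entrywise and U preserves sums:

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded, so the check is repeatable

# A hypothetical 4x4 stochastic matrix: nonnegative entries, columns sum to 1.
U = rng.random((4, 4))
U /= U.sum(axis=0)

# Apply U to many vectors in L^1 with arbitrary signs: the L^1 norm can
# only shrink or stay the same.
for _ in range(100):
    psi = rng.normal(size=4)
    assert np.sum(np.abs(U @ psi)) <= np.sum(np.abs(psi)) + 1e-12
```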

The Hille–Yosida theorem generalizes Stone’s theorem to contraction semigroups. In my misspent youth, I spent a lot of time carrying around Yosida’s book Functional Analysis. Furthermore, Einar Hille was the advisor of my thesis advisor, Irving Segal. Segal generalized the Hille–Yosida theorem to nonlinear operators. I used this generalization a lot back when I studied nonlinear partial differential equations. So, I feel compelled to tell you this theorem:

Hille–Yosida Theorem. Given a contraction semigroup U(t) we can always write

U(t) = \exp(t H)

for some closed and densely defined operator H such that H - \lambda I has a bounded inverse for all \lambda > 0 and

\| (H - \lambda I)^{-1} \psi \| \le \frac{1}{\lambda} \| \psi \|

for all \psi \in V. Conversely, any such operator determines a contraction semigroup this way. For all vectors \psi for which H \psi is well-defined, we have

\left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = H \psi

Moreover, for any of these vectors, if we set

\psi(t) = U(t) \psi

we have

\frac{d}{d t} \psi(t) = H \psi(t)

If you like, you can take the stuff at the end of this theorem to be what we mean by saying U(t) = \exp(t H).

But now suppose V = L^1(X). What extra conditions on H are necessary and sufficient for \exp(t H) to be a Markov semigroup? In other words, what’s a definition of ‘infinitesimal stochastic operator’ that’s suitable not only when X is a finite set, but for an arbitrary measure space?

I asked this question on MathOverflow a few months ago, and so far the answers have not been completely satisfactory. Some people mentioned the Hille–Yosida theorem, which is surely a step in the right direction, but not the full answer.

Others discussed the special case when \exp(t H) extends to a bounded self-adjoint operator on L^2(X). When X is a finite set, this special case happens precisely when the matrix H_{i j} is symmetric: the probability of hopping from j to i equals the probability of hopping from i to j. This is a fascinating special case, not least because when H is both infinitesimal stochastic and self-adjoint, we can use it as a Hamiltonian for both stochastic mechanics and quantum mechanics! However, it’s just a special case.
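That doubly-usable case can be sketched numerically with a hypothetical symmetric infinitesimal stochastic matrix (chosen only for illustration): the same H generates a Markov semigroup on L^1 and a unitary group on L^2.

```python
import numpy as np

# A hypothetical SYMMETRIC infinitesimal stochastic H: symmetric, columns
# sum to zero, off-diagonal entries nonnegative.
H = np.array([[-1.0,  1.0],
              [ 1.0, -1.0]])
assert np.allclose(H, H.T) and np.allclose(H.sum(axis=0), 0.0)

w, V = np.linalg.eigh(H)                  # exact for symmetric matrices

# Stochastic mechanics: exp(tH) at t = 2 is a stochastic matrix.
Ut = V @ np.diag(np.exp(2.0 * w)) @ V.T
assert np.all(Ut >= -1e-12) and np.allclose(Ut.sum(axis=0), 1.0)

# Quantum mechanics: exp(-itH) at t = 2 is unitary.
Wt = V @ np.diag(np.exp(-2.0j * w)) @ V.T
assert np.allclose(Wt.conj().T @ Wt, np.eye(2))
```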

After grabbing people by the collar and insisting that I wanted to know the answer to the question I actually asked—not some vaguely similar question—the best answer seems to be Martin Gisser’s reference to this book:

• Zhi-Ming Ma and Michael Röckner, Introduction to the Theory of (Non-Symmetric) Dirichlet Forms, Springer, Berlin, 1992.

However, as best as I can tell, this does not answer my question in general, but only when the skew-symmetric part of HH is dominated (in a certain sense) by the symmetric part.

So, I’m stuck on this front, but that needn’t bring the whole project to a halt. We’ll just sidestep this question.

For a good well-rounded introduction to Markov semigroups and what they’re good for, try:

• Ryszard Rudnicki, Katarzyna Pichór and Marta Tyran-Kamińska, Markov semigroups and their applications.

category: blog