# The Azimuth Project Blog - network theory (part 11)

This page is a blog article in progress, written by John Baez. To see discussions of this page as it was being written, go to the Azimuth Forum. For the final polished version, visit the Azimuth Blog.

jointly written with Brendan Fong

Noether proved lots of theorems, but when people talk about Noether's theorem, they always seem to mean her result linking symmetries to conserved quantities. Her original result applied to classical mechanics, but today we’d like to present a version that applies to ‘stochastic mechanics’—or in other words, Markov processes.

What’s a Markov process? We’ll say precisely in a minute—but in plain English, it’s a physical system where something hops around randomly from state to state, where its probability of hopping anywhere depends only on where it is now, not its past history. Markov processes include, as a special case, the stochastic Petri nets we’ve been talking about.

Our stochastic version of Noether’s theorem is copied after a well-known quantum version. It’s yet another example of how we can exploit the analogy between stochastic mechanics and quantum mechanics. But for now we’ll just present the stochastic version. Next time we’ll compare it to the quantum one.

#### Markov processes

We should and probably will be more general, but let’s start by considering a finite set of states, say $X$. To describe a Markov process we then need a matrix of real numbers $H = (H_{ij})_{i,j \in X}$. The idea is that if $i \ne j$, the number $H_{ij}$ describes the probability per unit time of hopping from the state $j$ to the state $i$. So, if $\psi_i(t)$ is the probability of being in the state $i$ at time $t$, we want the master equation to hold:

$\frac{d}{dt} \psi_i(t) = \sum_{j \in X} H_{ij} \psi_j(t)$

This motivates the definition of ‘infinitesimal stochastic’, which we recall from Part 5:

Definition. Given a finite set $X$, a matrix of real numbers $H = (H_{ij})_{i,j \in X}$ is infinitesimal stochastic if

$i \ne j \implies H_{ij} \ge 0$

and

$\sum_{i \in X} H_{ij} = 0$

for all $j\in X$.

The inequality says that the probability per unit time of hopping from one place to another is nonnegative. The equation says that probabilities are conserved. Together, they imply that the probability of staying in the same place goes down:

$H_{ii} \le 0$
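If you like seeing definitions as code, here is a quick NumPy check of these two conditions. This is our own illustration, not part of the original series, and the function name `is_infinitesimal_stochastic` is ours:

```python
import numpy as np

def is_infinitesimal_stochastic(H, tol=1e-12):
    """Check the two defining conditions: nonnegative off-diagonal
    entries, and every column summing to zero."""
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    off_diag_ok = all(H[i, j] >= -tol
                      for i in range(n) for j in range(n) if i != j)
    columns_ok = np.allclose(H.sum(axis=0), 0.0, atol=tol)
    return bool(off_diag_ok and columns_ok)

# A 2-state example: hop from state 1 to state 0 at rate 3,
# and from state 0 to state 1 at rate 2.
H = np.array([[-2.0,  3.0],
              [ 2.0, -3.0]])

print(is_infinitesimal_stochastic(H))   # True
print(np.all(np.diag(H) <= 0))          # True: the diagonal entries must be <= 0
```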

Using the magic of matrix multiplication, we can rewrite the master equation as follows:

$\frac{d}{dt} \psi(t) = H \psi(t)$

and we can solve it like this:

$\psi(t) = \exp(tH) \psi(0)$
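If you want to play with this numerically, here is a sketch in NumPy, ours rather than from the series, that solves the master equation by exponentiating $tH$. The hand-rolled `expm` uses scaling and squaring of a truncated Taylor series, which is adequate for matrices this small:

```python
import numpy as np

def expm(A, squarings=10, terms=30):
    """exp(A) by scaling and squaring a truncated Taylor series."""
    B = A / (2.0 ** squarings)
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for n in range(1, terms):
        term = term @ B / n
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

# Hop from state 1 to state 0 at rate 3, from state 0 to state 1 at rate 2.
H = np.array([[-2.0,  3.0],
              [ 2.0, -3.0]])
psi0 = np.array([1.0, 0.0])          # start surely in state 0

for t in (0.0, 0.5, 5.0):
    psi_t = expm(t * H) @ psi0
    print(t, psi_t, psi_t.sum())     # total probability stays 1
```

As $t$ grows, $\psi(t)$ approaches the equilibrium distribution $(3/5, 2/5)$, which spans the kernel of this particular $H$.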

If $H$ is an infinitesimal stochastic operator we call $\exp(tH)$ a Markov process, and $H$ its Hamiltonian.

Noether’s theorem is about ‘conserved quantities’, that is, observables whose expected values don’t change with time. To understand this theorem, you need to know a bit about observables. In stochastic mechanics an observable is simply a function assigning a number ${O}_{i}$ to each state $i\in X$.

However, in quantum mechanics we often think of observables as matrices, so it’s nice to do that here, too. It’s easy: we just create a matrix whose diagonal entries are the values of the function $O$. And just to confuse you, we’ll call this matrix $O$. So:

$O_{ij} = \left\{ \begin{array}{ccl} O_i & \mathrm{if} & i = j \\ 0 & \mathrm{if} & i \ne j \end{array} \right.$

One advantage of this trick is that it lets us ask whether an observable commutes with the Hamiltonian. Remember, the commutator of matrices is defined by

$[O,H] = OH - HO$

Noether’s theorem will say that $\left[O,H\right]=0$ if and only if $O$ is ‘conserved’ in some sense. What sense? First, recall that a stochastic state is just our fancy name for a probability distribution $\psi$ on the set $X$. Second, the expected value of an observable $O$ in the stochastic state $\psi$ is defined to be

$\sum_{i \in X} O_i \psi_i$

In Part 5 we introduced the notation

$\int \phi = \sum_{i \in X} \phi_i$

for any function $\phi$ on $X$. The reason for this notation is that later, when we generalize $X$ from a finite set to a measure space, the sum on the right will become an integral over $X$. A sum is just a special sort of integral!

Using this notation and the magic of matrix multiplication, we can write the expected value of $O$ in the stochastic state $\psi$ as

$\int O \psi$
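In code, this expected value is just a dot product. Here is a tiny NumPy illustration of our own, taking $X = \{0, 1, 2\}$:

```python
import numpy as np

O_vals = np.array([0.0, 1.0, 2.0])     # the observable as a function on X
O = np.diag(O_vals)                    # ...and as a diagonal matrix
psi = np.array([0.5, 0.25, 0.25])      # a stochastic state on X = {0, 1, 2}

def integral(phi):
    """Our 'integral' over the finite set X is just a sum."""
    return phi.sum()

# The expected value two ways: as the integral of O psi, and as sum_i O_i psi_i.
print(integral(O @ psi))               # 0.75
print(np.dot(O_vals, psi))             # 0.75 again
```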

We can calculate how this changes in time if $\psi$ obeys the master equation… and we can write the answer using the commutator $\left[O,H\right]$!

Lemma. Suppose $H$ is an infinitesimal stochastic operator and $O$ is an observable. If $\psi \left(t\right)$ obeys the master equation, then

$\frac{d}{dt} \int O \psi(t) = \int [O,H] \psi(t)$

Proof. Using the master equation we have

(1) $\frac{d}{dt} \int O \psi(t) = \int O \frac{d}{dt} \psi(t) = \int O H \psi(t)$

But since $H$ is infinitesimal stochastic,

$\sum_{i \in X} H_{ij} = 0$

so for any function $\phi$ on $X$ we have

$\int H \phi = \sum_{i,j \in X} H_{ij} \phi_j = 0$

and in particular

(2) $\int H O \psi(t) = 0$

Since $\left[O,H\right]=OH-HO$, we conclude from (1) and (2) that

$\frac{d}{dt} \int O \psi(t) = \int [O,H] \psi(t)$

as desired. █
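Here is a quick numerical sanity check of the Lemma, a NumPy sketch of our own. We build a random infinitesimal stochastic $H$ and verify the two facts the proof turns on: $\int H O \psi = 0$, and hence $\int O H \psi = \int [O,H] \psi$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# A random infinitesimal stochastic H: nonnegative off-diagonal rates,
# with the diagonal chosen so that every column sums to zero.
H = rng.random((n, n))
np.fill_diagonal(H, 0.0)
np.fill_diagonal(H, -H.sum(axis=0))

O = np.diag(rng.random(n))             # a random observable, as a diagonal matrix
psi = rng.random(n)
psi /= psi.sum()                       # a random stochastic state

# The key step in the proof: the integral of H O psi vanishes,
# because the columns of H sum to zero...
print((H @ (O @ psi)).sum())
# ...so the integrals of O H psi and [O,H] psi agree, as the Lemma says.
comm = O @ H - H @ O
print((O @ H @ psi).sum(), (comm @ psi).sum())
```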

The commutator doesn’t look like it’s doing much here, since we also have

$\frac{d}{dt} \int O \psi(t) = \int O H \psi(t)$

which is even simpler. But the commutator will become useful when we get to Noether’s theorem!

#### Noether's theorem

Here’s a version of Noether’s theorem for Markov processes. It says an observable commutes with the Hamiltonian iff the expected values of that observable and its square don’t change as time passes:

Theorem. Suppose $H$ is an infinitesimal stochastic operator and $O$ is an observable. Then

$[O,H] = 0$

if and only if

$\frac{d}{dt} \int O \psi(t) = 0$

and

$\frac{d}{dt} \int O^2 \psi(t) = 0$

for all $\psi \left(t\right)$ obeying the master equation.

If you know Noether’s theorem from quantum mechanics, you might be surprised that in this version we need not only the observable but also its square to have an unchanging expected value! We’ll explain this, but first let’s prove the theorem.

Proof. The easy part is showing that if $[O,H] = 0$ then $\frac{d}{dt} \int O \psi(t) = 0$ and $\frac{d}{dt} \int O^2 \psi(t) = 0$. In fact there’s nothing special about these two powers of $O$; we’ll show that

$\frac{d}{dt} \int O^n \psi(t) = 0$

for all $n$. The point is that since $H$ commutes with $O$, it commutes with all powers of $O$:

$[O^n, H] = 0$

So, applying the Lemma to the observable ${O}^{n}$ we see

$\frac{d}{dt} \int O^n \psi(t) = \int [O^n, H] \psi(t) = 0$

The backward direction is a bit trickier. We now assume that

$\frac{d}{dt} \int O \psi(t) = \frac{d}{dt} \int O^2 \psi(t) = 0$

for all solutions $\psi \left(t\right)$ of the master equation. This implies

$\int O H \psi(t) = \int O^2 H \psi(t) = 0$

Since this holds for all solutions, we can take $\psi(t)$ to be concentrated at any state $j \in X$ and conclude that

(3) $\sum_{i \in X} O_i H_{ij} = \sum_{i \in X} O_i^2 H_{ij} = 0$

We wish to show that $[O,H] = 0$.

First, recall that we can think of $O$ as a diagonal matrix with

$O_{ij} = \left\{ \begin{array}{ccl} O_i & \mathrm{if} & i = j \\ 0 & \mathrm{if} & i \ne j \end{array} \right.$

So, we have

$[O,H]_{ij} = (OH - HO)_{ij} = \sum_{k \in X} \left( O_{ik} H_{kj} - H_{ik} O_{kj} \right) = O_i H_{ij} - H_{ij} O_j = (O_i - O_j) H_{ij}$

To show this is zero for each pair of elements $i, j \in X$, it suffices to show that if $H_{ij} \ne 0$ then $O_j = O_i$. That is, we need to show that if the system can move from state $j$ to state $i$, then the observable takes the same value on these two states.

In fact, it’s enough to show that this sum is zero for any $j\in X$:

$\sum_{i \in X} (O_j - O_i)^2 H_{ij}$

Why? When $i = j$, $O_j - O_i = 0$, so that term in the sum vanishes. But when $i \ne j$, $(O_j - O_i)^2$ and $H_{ij}$ are both nonnegative—the latter because $H$ is infinitesimal stochastic. So every term in the sum is nonnegative, and if the whole sum is zero, each term must be zero. Thus for all $i \ne j$ we have $(O_j - O_i)^2 H_{ij} = 0$. But this means that either $O_i = O_j$ or $H_{ij} = 0$, which is what we need to show.

So, let’s take that sum and expand it:

$\begin{array}{ccl} \sum_{i \in X} (O_j - O_i)^2 H_{ij} & = & \sum_i \left( O_j^2 H_{ij} - 2 O_j O_i H_{ij} + O_i^2 H_{ij} \right) \\ & = & O_j^2 \sum_i H_{ij} - 2 O_j \sum_i O_i H_{ij} + \sum_i O_i^2 H_{ij} \end{array}$

The three terms here are each zero: the first because $H$ is infinitesimal stochastic, and the latter two by equation (3). So, we’re done! █
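To see the easy direction in action, here is a NumPy example of our own: a block-diagonal $H$ that commutes with $O$ because the system can only hop between states where $O$ takes the same value, exactly the situation the proof describes:

```python
import numpy as np

# States {0,1} carry observable value 5 and states {2,3} carry value 7;
# H only allows hops within each block, so O and H commute.
O = np.diag([5.0, 5.0, 7.0, 7.0])
H = np.array([[-1.0,  2.0,  0.0,  0.0],
              [ 1.0, -2.0,  0.0,  0.0],
              [ 0.0,  0.0, -3.0,  4.0],
              [ 0.0,  0.0,  3.0, -4.0]])

print(np.allclose(O @ H - H @ O, 0.0))        # True: [O,H] = 0

# Every power of O then commutes with H too, so every expected value
# of O^n is conserved, just as the proof shows.
for k in (1, 2, 3):
    Ok = np.diag(np.diag(O) ** k)
    print(np.allclose(Ok @ H - H @ Ok, 0.0))  # True for each power
```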

#### Markov chains

So that’s the proof… but why do we need both $O$ and its square to have an expected value that doesn’t change with time to conclude $\left[O,H\right]=0$? There’s an easy counterexample if we leave out the condition involving ${O}^{2}$. However, the underlying idea is clearer if we work with Markov chains instead of Markov processes.

In a Markov process, time passes by continuously. In a Markov chain, time comes in discrete steps! We get a Markov process by forming $\mathrm{exp}\left(tH\right)$ where $H$ is an infinitesimal stochastic operator. We get a Markov chain by forming the operators $U,{U}^{2},{U}^{3},\dots$ where $U$ is a ‘stochastic operator’. Remember:

Definition. Given a finite set $X$, a matrix of real numbers $U = (U_{ij})_{i,j \in X}$ is stochastic if

$U_{ij} \ge 0$

for all $i, j \in X$ and

$\sum_{i \in X} U_{ij} = 1$

for all $j\in X$.

The idea is that $U$ describes a random hop, with ${U}_{ij}$ being the probability of hopping to the state $i$ if you start at the state $j$.

Any stochastic operator gives rise to a Markov chain $U,{U}^{2},{U}^{3},\dots$. And in case it’s not clear, that’s how we’re defining a Markov chain: the sequence of powers of a stochastic operator. There are other definitions, but they’re equivalent.
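In code, checking that a matrix is stochastic, and that its powers are too, looks like this. This is a NumPy sketch of ours, and the name `is_stochastic` is our own:

```python
import numpy as np

def is_stochastic(U, tol=1e-12):
    """Nonnegative entries, and each column summing to one."""
    U = np.asarray(U, dtype=float)
    return bool(np.all(U >= -tol)) and bool(np.allclose(U.sum(axis=0), 1.0, atol=tol))

U = np.array([[0.9, 0.5],
              [0.1, 0.5]])

print(is_stochastic(U))        # True
print(is_stochastic(U @ U))    # True: every power of U is again stochastic
```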

We can draw a Markov chain by drawing a bunch of states and arrows labelled by transition probabilities, which are the matrix elements ${U}_{ij}$:

Here is Noether’s theorem for Markov chains:

Theorem. Suppose $U$ is a stochastic operator and $O$ is an observable. Then

$[O,U] = 0$

if and only if

$\int O U \psi = \int O \psi$

and

$\int O^2 U \psi = \int O^2 \psi$

for all stochastic states $\psi$.

In other words, an observable commutes with $U$ iff the expected values of that observable and its square don’t change when we evolve our state one time step using $U$.

You can probably prove this theorem by copying the proof for Markov processes:

Puzzle. Prove Noether’s theorem for Markov chains.

But let’s see why we need the condition on the square of the observable! That’s the intriguing part. Here’s a nice little stochastic operator $U$:

where we haven’t drawn arrows labelled by 0. So, state 1 has a 50% chance of hopping to state 0 and a 50% chance of hopping to state 2; the other two states just sit there. Now, consider the observable $O$ with

$O_i = i$

It’s easy to check that the expected value of this observable doesn’t change with time:

$\int O U \psi = \int O \psi$

for all $\psi$. The reason, in plain English, is this. Nothing at all happens if you start at states 0 or 2: you just sit there, so the expected value of $O$ doesn’t change. If you start at the state 1, the observable equals 1. You then have a 50% chance of going to a state where the observable equals 0 and a 50% chance of going to a state where it equals 2, so its expected value doesn’t change: it still equals 1.

On the other hand, we do not have $\left[O,U\right]=0$ in this example, because we can hop between states where $O$ takes different values. Furthermore,

$\int O^2 U \psi \ne \int O^2 \psi$

After all, if you start at the state 1, ${O}^{2}$ equals 1 there. You then have a 50% chance of going to a state where ${O}^{2}$ equals 0 and a 50% chance of going to a state where it equals 4, so its expected value changes!

So, that’s why $\int OU\psi =\int O\psi$ for all $\psi$ is not enough to guarantee $\left[O,U\right]=0$. The same sort of counterexample works for Markov processes, too.
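Here is the whole counterexample in NumPy, as our own check, ordering the states $0, 1, 2$ as in the story above:

```python
import numpy as np

# State 1 hops to state 0 or state 2 with probability 1/2 each;
# states 0 and 2 just sit there. Column j is where you go from state j.
U = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.5, 1.0]])
O = np.diag([0.0, 1.0, 2.0])          # the observable O_i = i
O2 = O @ O

psi = np.array([0.2, 0.5, 0.3])       # any stochastic state works here

print((O @ U @ psi).sum(), (O @ psi).sum())      # equal: <O> is conserved
print((O2 @ U @ psi).sum(), (O2 @ psi).sum())    # different: <O^2> is not
print(np.allclose(O @ U - U @ O, 0.0))           # False: [O,U] != 0
```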

Finally, we should add that there’s nothing terribly sacred about the square of the observable. For example, we have:

Theorem. Suppose $H$ is an infinitesimal stochastic operator and $O$ is an observable. Then

$[O,H] = 0$

if and only if

$\frac{d}{dt} \int f(O) \psi(t) = 0$

for all smooth $f : \mathbb{R} \to \mathbb{R}$ and all $\psi(t)$ obeying the master equation.

Theorem. Suppose $U$ is a stochastic operator and $O$ is an observable. Then

$[O,U] = 0$

if and only if

$\int f(O) U \psi = \int f(O) \psi$

for all smooth $f : \mathbb{R} \to \mathbb{R}$ and all stochastic states $\psi$.

These make the ‘forward direction’ of Noether’s theorem stronger… and in fact, the forward direction, while easier, is probably more useful! However, if we ever use Noether’s theorem in the ‘reverse direction’, it might be easier to check a condition involving only $O$ and its square.

category: blog