Birkhoff's Ergodic Theorem and Permutations

Satyadev Nandakumar
Fri Nov 4 13:17:58 2016

This article is about a trivial observation about Birkhoff's Ergodic Theorem which I have not seen stated explicitly elsewhere, but might be helpful to understand the definition of ergodicity. First, the definition and the statement of the Ergodic Theorem.

Let (Ω F, P) be a probability space. A transformation T: Ω → Ω is said to preserve measure if P(T^-1(A)) = P(A) for every measurable set.

A measure-preserving transformation is "ergodic" if every invariant set (i.e. A = T^-1) has measure 0 or 1. Equivalently, if P(A) is strictly between 0 and 1, then T^-1A is different from A. (Think of this as saying that T must move subsets of Ω around in a non-trivial manner.)

If you have such an ergodic system, then the Birkhoff ergodic theorem says that for any P-integrable function f, the average [(f(x)+f(Tx)+f(T²x)+ ... +f(T^n-1x))/n] converges to the expectation of f, for almost every x in Ω

The definition and the conclusion are slightly opaque at first sight, and it is not clear why ergodicity is a sufficient condition for the Ergodic Theorem. The point of the article is to give a simple, finite example which illustrates how the theorem follows from the definition of an ergodic transformation.

An Example in a finite setting

Let Ω be the set of the first k natural numbers, and T be a transitive permutation - that is, a permutation which consists of a single cycle - for example, if k=4, the permutation which sends (1,2,3,4) to (2,4,3,1).

In a transitive permutation P, if we follow the "orbit" of any element, say 1, it will take k steps before we come back to the same element. As a transformation, a permutation preserves the measure. It can also be verified that any non-empty proper subset A of {1, 2, ..., k} will be changed by the permutation : T(A) ≠ A.

Thus a transitive permutation is ergodic.

Suppose we take any real-valued (or complex-valued) function f defined on the k elements. The " time averge " of f at position i, where 1 ≤ i ≤ k, is

                                        (n-1) 
 lim            f(i)+f(P(i)) + ... + f(P     (i))
              ------------------------------------ 
n -> oo                        n

For any n, we can write the above n terms in the numerator as a multiple of k and some remaining terms, numbering at most (k-1) - i.e. n = km + r, where 0 ≤ r ≤ k-1. Starting from any number i, k iterations of the permutation P(i), P(P(i)), ..., P^k(i) enumerates {1, 2, ..., k} in some order where each element is enumerated exactly once. Hence the sum of f evaluated at these iterates is equal to f(1)+...+f(k-1). Let us call this sum as S(f). Note that S(f) = kE(f), where E(f) is the average of f over the k elements. Then the above term can be upper bounded by

                                       
 lim            mk E(f) +  (k-1) max(f(1), ..., f(k-1))
              ----------------------------------------
n -> oo                        n

and lower bounded by

                                       
 lim            mk E(f) +  (k-1) min(f(1), ..., f(k-1))
              ----------------------------------------
n -> oo                        n

The difference between these terms goes to 0 as n goes to ∞ since the excess term in the upper bound does not depend on n. Moreover, mk/n converges to 1 as n grows large.

Thus the "time average" at any point in Ω converges to the "space average" E(f). This establishes the Ergodic Theorem in the case of permutations.

Questions? Comments? Insults? e-mail me.