Friday, May 29, 2015

Basic ergodic theory

Consider a "random" process where the next state depends on the previous one, but in a chaotic manner; say, a particle moving in a box. One would expect that as time passes, the trajectory of the particle fills the entire box, and does so in a uniform way, so that different subsets are visited with frequency proportional to their volume. The path of the particle would then be called ergodic. Ergodic theory offers tools for analyzing dynamical systems, in particular as the time parameter of the system goes to infinity. Central questions are whether any subset of positive measure in our measure space is visited infinitely often, and whether the process is equidistributed, meaning that the measure of the subset tells the probability that the process lies in the subset at a given time.

In this post, we present the basics of ergodic theory, and use them to prove a strong version of Weyl's equidistribution result for the numbers $\gamma,2\gamma,3\gamma,\ldots$ modulo $1$ with $\gamma$ irrational. Another application of ergodic theory we present is the fact that almost all real numbers are normal in the sense that each finite sequence of digits occurs with the expected frequency, in each base. We also sketch a proof of the law of large numbers, assuming merely that the i.i.d. random variables have an expectation. We require only the very basics of measure theory.


We follow Walters' book Ergodic Theory - Introductory Lectures. Here and in what follows, let $(X,\mathcal{F},\mu)$ be a measure space and $T:X\to X$ any map. The main question in ergodic theory is what guarantees that the iterates $T^k=T\circ...\circ T$ behave in a sense randomly. This should mean that the time average of any $\mu$-integrable function $f$ along the orbit of a point $x$, that is
\[\begin{eqnarray}\frac{1}{n}\sum_{k=0}^{n-1}f(T^k(x)),\end{eqnarray}\]
converges to the spatial average
\[\begin{eqnarray}\frac{1}{\mu(X)}\int_X f d\mu\end{eqnarray}\]
as $n\to \infty$. When this happens, intuitively we can think of $T$ forgetting its initial state $x$ as the iteration proceeds. Such processes naturally arise in probability theory and statistics, and this is indeed where the term ergodicity was coined. For example, the law of large numbers in probability theory is one consequence of the ergodic theorem that we present. The map $T$ might also be connected to a fractal, in which case one could analyze the structure of the fractal in this way. Moreover, ergodic theory can be applied to number theory as well, since for example irrational numbers exhibit behavior that seems chaotic.

In order for the time averages to converge to the spatial average, we need some assumptions on $T$. We will require that $T$ is measure-preserving and ergodic. These concepts are defined next.

Measure-preserving maps

A central concept in ergodic theory is that of a measure-preserving map.

Definition. We say that a map $T:X\to X$ is measure-preserving (w.r.t $\mu$) if $T$ is measurable and $\mu(T^{-1}(A))=\mu(A)$ for all measurable $A$, where $T^{-1}(A)$ is the inverse image of $A$.

At first it might seem more logical to require that $\mu(T(A))=\mu(A)$ for all measurable $A$, but (besides the fact that $T(A)$ is not necessarily measurable) the present definition is more useful, as it implies for example that
\[\begin{eqnarray}\int_X g(T(x))d\mu(x)=\int_X g(x)d\mu(x)\end{eqnarray}\]
for all nonnegative measurable $g:X\to \mathbb{R}$ (and hence for all integrable $g$). To prove this, notice that it holds for characteristic functions, then for nonnegative linear combinations of characteristic functions, and finally use the monotone convergence theorem along with the fact that any nonnegative measurable function is the monotone limit of a sequence of linear combinations of characteristic functions.

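When one wants to check that a map $T$ is measure-preserving, it actually suffices to show $\mu(T^{-1}(A))=\mu(A)$ for all sets $A\in \mathcal{G}$, where $\mathcal{G}$ is a family of sets that is closed under finite intersections and generates the sigma algebra of $\mu$-measurable sets (for instance, the intervals in the case of the Lebesgue measure on $[0,1)$). Indeed, when $\mu$ is finite, the sets $A$ satisfying $\mu(T^{-1}(A))=\mu(A)$ are closed under complements and countable disjoint unions, so the claim follows from Dynkin's $\pi$-$\lambda$ theorem. Next we present some concrete examples of measure-preserving maps, some of which actually arise from number theory.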

Example 1. Let our measure space be $X=\mathbb{R}^n$ endowed with the Lebesgue measure $m_n$. Then the translation $T(x)=x+y_0$ is measure-preserving (because it preserves the measures of boxes, that is, products of intervals).

Example 2. Let $X=\{z\in \mathbb{C}:|z|=1\}$ be the complex unit circle. A natural measure in $X$ is the Lebesgue measure along the arc (that is, we identify $X$ with $[0,2\pi)$ where one can use the ordinary Lebesgue measure). Now the rotation map $T(z)=az$ with $a\in X$ is measure-preserving (since the measures of arcs remain invariant).

Example 3. Let $X=[0,1)$ equipped with the Lebesgue measure and $T(x)=2x \pmod 1$. Then $T$ is measure-preserving. Indeed, we have $T^{-1}(A)=\frac{1}{2}A\cup (\frac{1}{2}A+\frac{1}{2})$, where translation and scaling of sets is defined as a pointwise operation. Now $m(T^{-1}(A))=m(\frac{1}{2}A)+m(\frac{1}{2}A+\frac{1}{2})=\frac{1}{2}m(A)+\frac{1}{2}m(A)=m(A)$.

Example 4. Again let $X=\{z\in \mathbb{C}:|z|=1\}$ with the same measure, and let $T(z)=z^n$, where $n$ is a positive integer. Then $T$ is measure-preserving. To see this, note that equivalently $x\mapsto nx \pmod 1$ is measure-preserving on $[0,1)$. Now apply a similar argument as with $n=2$ above.
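As a small sanity check of Examples 3 and 4, the following sketch computes the preimage of an interval $[a,b)$ under $x\mapsto nx \pmod 1$ with exact rational arithmetic and confirms that its total length is $b-a$. The function names `preimage_intervals` and `total_length` are made up for this illustration, and the check covers intervals only, not general measurable sets.

```python
from fractions import Fraction

def preimage_intervals(a, b, n):
    """Preimage of [a, b) under x -> n*x (mod 1): n intervals of length (b-a)/n."""
    return [((a + j) / n, (b + j) / n) for j in range(n)]

def total_length(intervals):
    """Sum of the lengths of pairwise disjoint intervals given as (left, right)."""
    return sum(right - left for left, right in intervals)

a, b = Fraction(1, 5), Fraction(7, 10)
for n in (2, 3, 10):
    pieces = preimage_intervals(a, b, n)
    print(n, total_length(pieces), b - a)
```

For $n=2$ the two pieces are exactly $\frac{1}{2}A$ and $\frac{1}{2}A+\frac{1}{2}$ from Example 3.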

Example 5. Let $X=\mathbb{R}^2$ with the Lebesgue measure $m_2$. The map $T(x)=Ax$, where $A$ is a $2\times 2$ matrix of determinant one, preserves $m_2$. Indeed, it is easy to see that such maps preserve the area of any rectangle, so the claim follows.

Example 6. Let $X=[0,1)$, and instead of the Lebesgue measure take $\mu$ to be the Gauss measure $\mu(A)=\frac{1}{\log 2}\int_{A}\frac{dx}{1+x}$. Then the map
\[\begin{eqnarray}T(x)=\begin{cases}0,\quad x=0\\\frac{1}{x}\pmod 1,x\neq 0,\end{cases}\end{eqnarray}\]
preserves $\mu$. This is seen as follows. For any $[a,b]\subset (0,1)$, we have
\[\begin{eqnarray}\mu(T^{-1}([a,b]))&=&\frac{1}{\log 2}\sum_{n=1}^{\infty}\log \frac{1+\frac{1}{n+a}}{1+\frac{1}{n+b}}\\&=&\frac{1}{\log 2}\lim_{N\to \infty}\log \prod_{n=1}^{N}\left(\frac{n+a+1}{n+a}\cdot \frac{n+b}{n+b+1}\right)\\&=&\frac{1}{\log 2}\lim_{N\to \infty}\log\frac{(N+a+1)(b+1)}{(a+1)(N+b+1)}\\&=&\frac{1}{\log 2}(\log(b+1)-\log(a+1))=\mu([a,b]),\end{eqnarray}\]
as wanted. The map $T$ can be applied to prove Khinchin's celebrated theorem stating that for almost all numbers the geometric mean of the coefficients in their continued fraction expansion approaches a certain constant (which is defined as a strange-looking infinite product).
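The telescoping computation in Example 6 can also be checked numerically. The sketch below sums the Gauss measures of the first intervals $[\frac{1}{n+b},\frac{1}{n+a}]$ making up $T^{-1}([a,b])$ and compares the result with $\mu([a,b])$. The helper names and the cutoff `terms` are choices made for this illustration; the truncated sum misses a tail of size roughly $(b-a)/(N\log 2)$.

```python
import math

def gauss_measure(a, b):
    """Gauss measure of the interval [a, b] inside [0, 1)."""
    return (math.log(1 + b) - math.log(1 + a)) / math.log(2)

def gauss_measure_of_preimage(a, b, terms=200_000):
    """Approximate mu(T^{-1}([a, b])) for the Gauss map T(x) = 1/x mod 1.

    The preimage is the disjoint union of the intervals [1/(n+b), 1/(n+a)],
    n = 1, 2, ..., so we add up the Gauss measures of the first `terms` of them.
    """
    return sum(gauss_measure(1 / (n + b), 1 / (n + a)) for n in range(1, terms + 1))

a, b = 0.2, 0.7
print(gauss_measure(a, b))              # mu([a, b])
print(gauss_measure_of_preimage(a, b))  # truncated sum for mu(T^{-1}([a, b]))
```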


We are now ready to define the concept of ergodicity. Let $\mu(X)<\infty$ and $T:X\to X$ be a measure-preserving map, and further assume that $T^{-1}(A)=A$ for some measurable $A$ with $0<\mu(A)<\mu(X)$. Then $T$ maps the subspace $A$ into itself, and nothing else gets mapped into $A$. In other words, $T$ does not mix the space $X$ well, which is the intuitive requirement for ergodicity, if we for example think of a particle moving randomly in a box. Even if we wanted to give up on this intuitive requirement of randomness, maps such as $T$ above would give us nothing new, since we could study the restrictions $T|_A$ and $T|_{X\setminus A}$ separately. This leads us to the definition of ergodicity.

Definition. A measure-preserving map $T:X\to X$ (with $\mu(X)<\infty$) is ergodic if $T^{-1}(A)=A$ implies $\mu(A)=0$ or $\mu(A)=\mu(X)$.

As such, ergodicity is a difficult property to verify. The following lemma characterizes ergodicity in terms of measurable functions, and leads to an easier criterion for checking it.

Lemma 1. Let $T:X\to X$ be measure-preserving with respect to $\mu$. The following are equivalent.
(i) $T$ is ergodic
(ii) Whenever $f:X\to \mathbb{R}$ is measurable and $f(T(x))=f(x)$ for $\mu$-almost all $x$, $f$ is constant almost everywhere.
(iii) Whenever $f\in L^2(\mu)$ and $f(T(x))=f(x)$ for $\mu$-almost all $x$, $f$ is constant almost everywhere.

Proof. Clearly $(ii)$ implies $(iii)$. We prove first that $(i)$ implies $(ii)$. Let $X_{n}^k=\{x\in X:\frac{k}{n}\leq f(x)<\frac{k+1}{n}\}$ when $k,n\in \mathbb{Z}$ and $n>0$. If $f(T(x))=f(x)$ for some $x$, then $x$ belongs to both $X_n^k$ and $T^{-1}(X_{n}^k)$ or to neither one of them. Now if $\Delta$ denotes the symmetric difference, we have $T^{-1}(X_n^k)\Delta X_n^k\subset \{x\in X:f(T(x))\neq f(x)\}$, and the latter set has measure zero. Next we show with the help of $(i)$ that if $\mu(T^{-1}(B)\Delta B)=0$ for some set $B$, we have $\mu(B)\in \{0,\mu(X)\}$. Defining
\[\begin{eqnarray}B_0=\bigcap_{n=0}^{\infty} \bigcup_{i=n}^{\infty}T^{-i}(B),\end{eqnarray}\]
which is the set of points belonging to infinitely many of the sets $T^{-i}(B)$, we have $T^{-1}(B_0)=B_0$, so $\mu(B_0)\in \{0,\mu(X)\}$. On the other hand, $B_0\Delta B$ has measure zero, since it is contained in the union of the sets $T^{-i}(B)\Delta B$, each of which has measure zero by $\mu(T^{-1}(B)\Delta B)=0$ and the fact that $T$ is measure-preserving. Hence $\mu(B)\in \{0,\mu(X)\}$. As
\[\begin{eqnarray}X=\bigcup_{k\in \mathbb{Z}}X_n^k\end{eqnarray}\]
is a disjoint union, we conclude that for every $n$ there is a unique $k_n$ such that $\mu(X_n^{k_n})=\mu(X)$; for all other values of $k$ we have $\mu(X_n^k)=0$. Lastly, let
\[\begin{eqnarray}Y=\bigcap_{n=1}^{\infty}X_n^{k_n}.\end{eqnarray}\]
Then $\mu(Y)=\mu(X)$ and $f$ is constant on $Y$, so it is constant almost everywhere.

To show that $(iii)$ implies $(i)$, suppose $T^{-1}(A)=A$ for a measurable set $A$. Then $1_A\in L^2(\mu)$ and $1_A(T(x))=1_A(x)$ for $\mu$-almost all $x$. Therefore, $1_A$ is constant almost everywhere, so $\mu(A)\in \{0,\mu(X)\}$.■

Example 7. As in Examples 2 and 4, let $X$ be the unit circle in the complex plane and $m$ the Lebesgue arc measure. Then $T(z)=az$ is ergodic if and only if $a\in X$ is not a root of unity.

Proof. First suppose $a^k=1$ for an integer $k\geq 1$. We make use of Lemma 1. If $f(z)=z^k$, then $f$ is measurable, non-constant and $f\circ T=f$, so $T$ is not ergodic.

Now suppose that $a$ is not a root of unity, and let $f(T(x))=f(x)$ for almost all $x$ and some $f\in L^2(m)$. It is known (see e.g. this post) that $f$ can be represented as a Fourier series
\[\begin{eqnarray}f(z)=\sum_{n=-\infty}^{\infty}a_nz^n,\end{eqnarray}\]
where the convergence is in $L^2$ norm. By assumption, also
\[\begin{eqnarray}f(z)=f(az)=\sum_{n=-\infty}^{\infty}a_na^nz^n.\end{eqnarray}\]
Now Parseval's identity (see this post) tells that the two Fourier series must have equal coefficients, so $a_n(a^n-1)=0$ for all $n$, so $a_n=0$ for $n\neq 0$ (as $a$ is not a root of unity), so $f$ is constant almost everywhere. Hence $T$ is ergodic.■

Example 8. Let $X$ and $m$ be as before and $T(z)=z^k$ with $k\geq 2$ an integer. Then $T$ is ergodic.

Proof. Suppose $f(T(x))=f(x)$ for some non-constant $f\in L^2(m)$. Again comparing the Fourier series, we see that if $(a_n)$ are the Fourier coefficients of $f$, then $a_n=a_{kn}=a_{k^2n}=...$. Now if $a_n\neq0$ and $n\neq 0$, we have an infinite sequence of equal Fourier coefficients, which contradicts the fact that $\sum_{n=-\infty}^{\infty}|a_n|^2$ converges. Therefore $f$ is constant and $T$ ergodic.■

Example 9. Let $X=[0,1)$ with $m$ the Lebesgue measure. From the previous two examples it follows that the maps $T(x)=x+\gamma\pmod 1$ and $T(x)=kx\pmod 1$ (with $k>0$ an integer) are ergodic precisely when $\gamma$ is irrational and $k\geq 2$, respectively.

Birkhoff's ergodic theorem

We are now in a position to prove Birkhoff's ergodic theorem from 1931, which is a cornerstone in the theory.

Theorem 2 (Birkhoff's ergodic theorem). Let $f\in L^1(\mu)$, and let $T:X\to X$ be measure-preserving, with $X$ $\sigma$-finite (that is, $X$ is a countable union of sets with finite measure). Then for $\mu$-almost all $x\in X$ we have
\[\begin{eqnarray}\lim_{n\to \infty}\frac{1}{n}\sum_{k=0}^{n-1}f(T^k(x))=\bar{f}(x),\end{eqnarray}\]
where $\bar{f}\in L^1(\mu)$ (in particular, $\bar{f}$ is finite almost everywhere). In addition, $\bar{f}$ satisfies $\bar{f}\circ T=\bar{f}$. If we assume also that $0<\mu(X)<\infty$, we have $\int_X f d\mu=\int_X \bar{f}d\mu$. If $T$ is also ergodic, we have
\[\begin{eqnarray}\bar{f}(x)=\frac{1}{\mu(X)}\int_X f d\mu.\end{eqnarray}\]

Note that Birkhoff's ergodic theorem shows that the time and spatial averages agree for ergodic measure-preserving transformations (on a space of finite positive measure), as anticipated. Taking $f(x)=1_A(x)$, where $A$ is measurable, shows that ergodicity guarantees that each measurable subset $A\subset X$ is visited a proportion $\frac{\mu(A)}{\mu(X)}$ of the time. It is not clear though in the first place that the time average even exists for a general integrable function, and this is actually the hardest part of the proof.
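To see the statement in action, here is a minimal numerical sketch for the circle rotation $x\mapsto x+\gamma \pmod 1$ and $f(x)=\cos^2(2\pi x)$, whose spatial average over $[0,1)$ is $\frac{1}{2}$. The helper names `time_average` and `rotation`, the starting point $0.1$ and the number of steps are arbitrary choices for illustration; floating-point arithmetic only approximates the true orbit, so this is an experiment, not a proof.

```python
import math

def time_average(f, T, x, n):
    """Birkhoff average (1/n) * sum_{k=0}^{n-1} f(T^k(x))."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n

def rotation(gamma):
    """The circle rotation x -> x + gamma (mod 1) on [0, 1)."""
    return lambda x: (x + gamma) % 1.0

def f(x):
    return math.cos(2 * math.pi * x) ** 2   # spatial average over [0, 1) is 1/2

# Irrational angle: the rotation is ergodic, the time average approaches 1/2.
irrational = time_average(f, rotation(math.sqrt(2)), 0.1, 100_000)
# Rational angle 1/2: the orbit is {0.1, 0.6}, so the time average stays f(0.1).
rational = time_average(f, rotation(0.5), 0.1, 100_000)
print(irrational, rational, f(0.1))
```

The rational rotation is still measure-preserving, so its time averages converge, but the limit depends on the starting point, which is exactly what ergodicity rules out.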

Proof of Birkhoff's ergodic theorem. Consider the quantities
\[\begin{eqnarray}f^*(x)=\limsup_{n\to \infty}\frac{1}{n}\sum_{k=0}^{n-1}f(T^k(x)) \hspace{0.2cm}\text{and}\hspace{0.2cm}f_*(x)=\liminf_{n\to \infty}\frac{1}{n}\sum_{k=0}^{n-1}f(T^k(x)),\end{eqnarray}\]
which are natural to consider when one wants to show the existence of a limit. We have
\[\begin{eqnarray}f^*(T(x))&=&\limsup_{n\to \infty}\frac{1}{n}\sum_{k=1}^{n} f(T^k(x))\\&=&\limsup_{n\to \infty} \left(\left(1+\frac{1}{n}\right)\frac{1}{n+1}\sum_{k=0}^{n} f(T^k(x))-\frac{f(x)}{n}\right)\\&=&f^*(x),\end{eqnarray}\]
because $\frac{f(x)}{n}\to 0$ and $\limsup_n(1+\frac{1}{n})a_n=\limsup_n a_n.$ Similarly we see that $f_{*}(T(x))=f_{*}(x)$.

We will show that $f^{*}=f_{*}$, which then tells that the limit function $\bar{f}$ exists and satisfies $\bar{f}\circ T=\bar{f}.$

For all $a<b$, define
\[\begin{eqnarray}E_{a,b}=\{x\in X:f_{*}(x)<a,f^{*}(x)>b\}.\end{eqnarray}\]
If $E$ denotes the set where $f^{*}$ and $f_{*}$ do not agree, then
\[\begin{eqnarray}E= \bigcup_{a<b\atop a,b\in \mathbb{Q}} E_{a,b},\end{eqnarray}\]
and as the union is countable, it suffices to show that $\mu(E_{a,b})=0$. In other words, we are applying the standard measure theory trick that difficult sets can be divided into countably many simpler pieces, and then combined by additivity. We also notice $T^{-1}(E_{a,b})=E_{a,b}$.

The core of Birkhoff's ergodic theorem is in the following lemma, which may at first seem simple, but is quite subtle (in particular, it is equivalent to a later lemma that looks far less obvious).

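Lemma 3 (Maximal ergodic theorem). Let $T:X\to X$ be measure-preserving, let $f\in L^1(\mu)$ and define the maximal operator
\[\begin{eqnarray}M_Tf(x)=\sup_{n\geq 0}\sum_{k=0}^{n-1}f(T^k(x))\end{eqnarray}\]
(for $n=0$ the sum is empty, so $M_Tf\geq 0$; note that $M_Tf(x)>0$ precisely when some average $\frac{1}{n}\sum_{k=0}^{n-1}f(T^k(x))$ with $n\geq 1$ is positive). We have
\[\begin{eqnarray}\int_{\{M_T f>0\}}fd\mu\geq 0.\end{eqnarray}\]

Proof. We give A. Garsia's ingenious proof from 1965. Write $S_nf=\sum_{k=0}^{n-1}f\circ T^k$ (so $S_0f=0$) and $M_Nf=\max_{0\leq n\leq N}S_nf$, so that $M_Nf\in L^1(\mu)$, $M_Nf\geq 0$ and $M_Nf$ increases to $M_Tf$ as $N\to \infty$. Define $U_Tg=g\circ T$. Then $U_T$ is a linear operator in $L^1(\mu)$ that preserves order and integrals. For $0\leq n\leq N$ we have $U_TM_Nf+f\geq U_TS_nf+f=S_{n+1}f$, so $U_TM_Nf+f\geq \max_{1\leq n\leq N}S_nf$, and the right-hand side equals $M_Nf$ on the set $\{M_Nf>0\}$. Now
\[\begin{eqnarray}\int_{\{M_Nf>0\}} fd\mu &\geq&\int_{\{M_Nf>0\}} M_Nf d\mu-\int_{\{M_Nf>0\}}U_T M_Nf d\mu\\&=&\int_X M_Nf d\mu-\int_{\{M_Nf>0\}} U_T M_Nf d\mu\\&\geq& \int_X M_Nf d\mu-\int_X U_T M_Nf d\mu=0,\end{eqnarray}\]
as $M_Nf$ vanishes outside $\{M_Nf>0\}$, $U_TM_Nf\geq 0$ and $T$ is measure-preserving. The sets $\{M_Nf>0\}$ increase to $\{M_Tf>0\}$, so letting $N\to \infty$ and using the dominated convergence theorem gives the claim.■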

We also have the following stronger-looking but equivalent formulation of the maximal ergodic theorem.

Lemma 4 (Maximal ergodic theorem, second version). Let $T:X\to X$ be measure-preserving. If $f\in L^1(\mu)$ and
\[\begin{eqnarray}E_b=\left\{x\in X:\sup_{n\geq 1}\frac{1}{n}\sum_{k=0}^{n-1}f(T^k(x))>b\right\},\end{eqnarray}\]
then
\[\begin{eqnarray}\int_{E_b\cap A} f d \mu\geq b \mu(E_b \cap A),\end{eqnarray}\]
whenever $T^{-1}(A)=A$ and $\mu(A)<\infty$.

Proof. We may assume $X=A$; if the claim is known in this case, we may replace $T$ by the restriction $T|_A$ and $\mu$ by $\mu|_A$. Now $\mu(X)<\infty$, so our claim is equivalent to saying that $E_b=\{M_T(f-b)>0\}$ satisfies $\int_{E_b}(f-b)d\mu\geq 0$, but this is already known.■

Using the first version of the maximal ergodic theorem and sigma-finiteness of $X$, we see that $\mu(E_{a,b})<\infty$. Now we apply the second version of the maximal ergodic theorem with $A=E_{a,b}$, noting that $E_{a,b}\subset E_b$ (with $E_b$ as in Lemma 4) because $f^*>b$ on $E_{a,b}$. Applying it also to $-f$ and $-a$ in place of $f$ and $b$ (as $f_*<a$ on $E_{a,b}$), we find
\[\begin{eqnarray}a\mu(E_{a,b})\geq \int_{E_{a,b}}fd\mu\geq b\mu(E_{a,b}),\end{eqnarray}\]
so, as $a<b$, we get $\mu(E_{a,b})=0$ and $\bar{f}$ exists.

Next we show that $\bar{f}\in L^1(\mu)$. Let
\[\begin{eqnarray}g_n(x)=\left| \frac{1}{n}\sum_{k=0}^{n-1}f(T^k(x)) \right|\geq 0.\end{eqnarray}\]
Then
\[\begin{eqnarray}\int_X g_n d\mu&=&\int_X \left| \frac{1}{n}\sum_{k=0}^{n-1}f(T^k(x)) \right| d\mu\\ &\leq& \frac{1}{n}\sum_{k=0}^{n-1}\int_X |f(T^k(x))| d\mu\\&=&\int_X |f| d \mu,\end{eqnarray}\]
as $T$ is measure-preserving. Fatou's lemma then yields
\[\begin{eqnarray}\int_X \liminf_n g_n d\mu\leq \liminf_n \int_X g_n d\mu \leq \int_X |f| d\mu<\infty,\end{eqnarray}\]
so $\liminf_n g_n$ is integrable. On the other hand, $\liminf_n g_n=|\bar{f}|$, so $\bar{f}$ is integrable.

Our next task is to show that assuming $\mu(X)<\infty$ we have $\int_X \bar{f}d\mu=\int_{X}fd\mu$. Let $D_{n}^k=\{x\in X:\frac{k}{n}\leq \bar{f}(x)<\frac{k+1}{n}\}$ for all $k\in \mathbb{Z}$ and $n\geq 1$. For any $\varepsilon>0$, we have $D_{n}^k\cap E_{\frac{k}{n}-\varepsilon}=D_n^k$ (with $E_b$ as before). Moreover, $T^{-1}(D_n^k)=D_n^k$ because $\bar{f}\circ T=\bar{f}$. Therefore, the second version of the maximal ergodic theorem shows that
\[\begin{eqnarray}\int_{D_n^k}fd\mu\geq \left(\frac{k}{n}-\varepsilon\right)\mu(D_n^k)\to \frac{k}{n}\mu(D_n^k)\end{eqnarray}\]
as $\varepsilon\to 0.$ The previous inequality and a trivial estimate give
\[\begin{eqnarray}\int_{D_n^k}\bar{f} d \mu\leq \frac{k+1}{n}\mu(D_n^k)\leq \frac{\mu(D_n^k)}{n}+\int_{D_n^k} f d\mu.\end{eqnarray}\]
Summing over $k$ produces
\[\begin{eqnarray}\int_X \bar{f} d \mu\leq \frac{\mu(X)}{n}+\int_X f d \mu.\end{eqnarray}\]
As $n\to \infty$, we get $\int_{X}\bar{f}d\mu\leq \int_X f d\mu$, as $\mu(X)<\infty$. Applying the previous argument to $-f$ (whose time average is $-\bar{f}$), we see that the integrals are equal.

Lastly, let $T$ be ergodic. In this case $\bar{f}\circ T=\bar{f}$ implies by Lemma 1 that $\bar{f}$ is constant almost everywhere. We know that $f$ and $\bar{f}$ have the same integral, so
\[\begin{eqnarray}\bar{f}=\frac{1}{\mu(X)}\int_X fd\mu,\end{eqnarray}\]
as desired.■


Taking as our measure-preserving map one of the examples presented above, Birkhoff's theorem implies many very non-trivial facts, including a strengthening of the classical Weyl's equidistribution theorem. Weyl's theorem states that the sequence $\gamma n \pmod 1$ is equidistributed for $\gamma$ irrational (for rational $\gamma$, this is obviously false).

Theorem 5 (Weyl's equidistribution theorem, strong version). Let $\gamma$ be irrational. Then the sequence $\gamma n\pmod 1$ is equidistributed in the sense that for any Lebesgue measurable $A\subset [0,1)$ we have
\[\begin{eqnarray}\lim_{N\to \infty}\frac{1}{N}|\{1\leq n\leq N:\gamma n\pmod 1 \in (A+x)\pmod 1\}|=m(A)\end{eqnarray}\]
for almost all $x$ ($A+x$ is the translation of $A$ by $x$ and $S\pmod 1$ means the set of elements of $S$ reduced modulo $1$).

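Proof. We know that $T(x)=(x+\gamma)\pmod 1$ is measure-preserving, and also ergodic when $\gamma$ is irrational. We take $f=1_A$ in Birkhoff's theorem. This gives, for almost all $x$,
\[\begin{eqnarray}\frac{1}{N}\sum_{n=0}^{N-1}1_A((x+n\gamma) \pmod 1)\to m(A)\end{eqnarray}\]
as $N\to \infty$. Since $(x+n\gamma)\pmod 1\in A$ means that $n\gamma \pmod 1\in (A-x)\pmod 1$, and the set of exceptional $x$ remains of measure zero under $x\mapsto -x$, this is precisely the claim.■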

We remark that taking almost all translates of $A$ is necessary in the claim, as it may fail for $x=0$ (just take $A$ to be the complement of the sequence $n\gamma \pmod 1$). Next we present the ordinary Weyl's equidistribution theorem and derive it from the previous theorem. In the ordinary version, $A$ is just an interval, but we do not need to take translates.

Theorem 6 (Weyl's equidistribution theorem, ordinary version). Let $\gamma$ be irrational. Then the sequence $n\gamma \pmod 1$ is equidistributed in the sense that
\[\begin{eqnarray}\lim_{N\to \infty}\frac{1}{N}|\{1\leq n\leq N:\gamma n\pmod 1 \in [a,b]\}|=b-a\end{eqnarray}\]
for any $a<b,a,b\in [0,1]$.

Proof. Take $A=[a,b]$ in the previous theorem, and let $(x_j)$ be a sequence of admissible translation parameters for $A$ such that $x_j>0$ and $x_j\to 0$ as $j\to \infty$. The sets $A$ and $A+x_j$ differ only within the short intervals $[a,a+x_j]$ and $[b,b+x_j]$, so it suffices to show that
\[\begin{eqnarray}\limsup_{N\to \infty}\frac{1}{N}|\{1\leq n\leq N:\gamma n\pmod 1 \in [b,b+x_j]\}|\end{eqnarray}\]
converges to $0$ as $j\to \infty$ (and similarly with $a$ in place of $b$). Let $k_j$ be the smallest positive integer such that $\|k_j\gamma\| \leq x_j$, where $\|\cdot\| $ is the distance to the nearest integer. Clearly, $k_j\to \infty$ as $j\to \infty$. Observe that among any $k_j-1$ consecutive integers there can be at most one $n$ for which $n\gamma \pmod 1\in [b,b+x_j]$, since two such integers $n\neq n'$ would satisfy $\|(n-n')\gamma\|\leq x_j$. Hence,
\[\begin{eqnarray}\limsup_{N\to \infty}\frac{1}{N}|\{1\leq n\leq N:\gamma n\pmod 1 \in [b,b+x_j]\}|\leq \frac{1}{k_j-1},\end{eqnarray}\]
which converges to $0$ as $j\to\infty.$■
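The ordinary version of Weyl's theorem is easy to test numerically. The sketch below counts how often $n\gamma \pmod 1$ lands in an interval $[a,b]$ for $\gamma=\sqrt{2}$; the interval endpoints and the values of $N$ are arbitrary choices for illustration, and `visit_frequency` is a name introduced here.

```python
import math

def visit_frequency(gamma, a, b, N):
    """Fraction of n in 1..N with n*gamma (mod 1) in the interval [a, b]."""
    hits = sum(1 for n in range(1, N + 1) if a <= (n * gamma) % 1.0 <= b)
    return hits / N

gamma = math.sqrt(2)
for N in (100, 10_000, 100_000):
    # The frequencies should approach b - a = 0.35 as N grows.
    print(N, visit_frequency(gamma, 0.25, 0.6, N))
```

Replacing `gamma` by a rational number such as `0.25` makes the frequency settle at a value that depends on which of the finitely many orbit points fall in the interval.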

We can also deduce from Birkhoff's ergodic theorem the uniform distribution of exponential sequences.

Theorem 7 (Uniform distribution of exponential sequences). Let $k\geq 2$ be an integer. Then the sequence $k^nx\pmod 1$ is uniformly distributed in the sense that for any Lebesgue measurable $A\subset [0,1)$ we have
\[\begin{eqnarray}\lim_{N\to \infty}\frac{1}{N}|\{1\leq n\leq N:k^n x\pmod 1 \in A\}|=m(A)\end{eqnarray}\]
for almost all $x$.

Proof. Take $T(x)=kx\pmod 1$ as our measure-preserving transformation. We have also proved that it is ergodic, so taking $f=1_A$ in Birkhoff's theorem yields the claim.■

It turns out that the following modification of the previous theorem is false: If $\gamma$ is irrational, then the sequence $\gamma^n\pmod 1$ is equidistributed in the sense of Weyl's ordinary theorem. For example taking $\gamma=1+\sqrt{2}$, it is easy to see that $\gamma^n+(1-\sqrt{2})^n$ is always an integer, so the distance from $\gamma^n$ to the nearest integer converges to $0$ exponentially.
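The counterexample can be made concrete with a few lines of code. Since $1+\sqrt{2}$ and $1-\sqrt{2}$ are the roots of $t^2-2t-1=0$, the integers $s_n=\gamma^n+(1-\sqrt{2})^n$ satisfy $s_n=2s_{n-1}+s_{n-2}$ with $s_0=s_1=2$. The sketch below (with the made-up helper name `companion_integer`) computes $s_n$ exactly and compares it with a floating-point value of $\gamma^n$; the exponents are kept small so that rounding errors stay well below the distances being printed.

```python
import math

gamma = 1 + math.sqrt(2)
conj = 1 - math.sqrt(2)              # |conj| < 1

def companion_integer(n):
    """gamma^n + conj^n, computed exactly via s_n = 2*s_{n-1} + s_{n-2}."""
    s_prev, s = 2, 2                 # s_0 = 2, s_1 = 2
    if n == 0:
        return s_prev
    for _ in range(n - 1):
        s_prev, s = s, 2 * s + s_prev
    return s

for n in range(1, 13):
    # The distance from gamma^n to the integer s_n equals |conj|^n.
    distance = abs(gamma ** n - companion_integer(n))
    print(n, companion_integer(n), distance, abs(conj) ** n)
```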

The previous theorem can be put into a more striking form when one introduces the concept of a normal number. If we take an irrational number whose decimal representation has no obvious structure (for example $\pi$), we expect that each of the ten digits occurs asymptotically $\frac{1}{10}$ of the time. No such result is known for $\pi$, or any other irrational number not specifically constructed for such a purpose, but it turns out that numbers satisfying a much stricter randomness condition, and not just to base $10$, cover almost all of the real numbers.

Definition. A real number $x\in [0,1)$ is called normal to base $b$ if, writing $x=\sum_{k=0}^{\infty}a_kb^{-k-1}$ with $a_k\in \{0,1,...,b-1\}$, for every $\ell\geq 1$ and every sequence $(\alpha_0,...,\alpha_{\ell-1})\in \{0,...,b-1\}^{\ell}$ it holds that
\[\begin{eqnarray}\lim_{N\to \infty}\frac{1}{N}|\{1\leq n\leq N:(a_n,...,a_{n+\ell-1})=(\alpha_0,...,\alpha_{\ell-1})\}|=\frac{1}{b^{\ell}}.\end{eqnarray}\]
We also say that an arbitrary $x\in \mathbb{R}$ is normal to base $b$ if $x\pmod 1$ is normal to base $b$.

In other words, a number is normal to base $b$ if each given digit sequence occurs in it with the same probability as any other sequence of the same length. Borel was the first to consider normal numbers in 1909, and gave a proof of their ubiquity, but this proof turned out to be faulty. Later various correct proofs were found, including an ergodic theoretic proof.

Theorem 8 (Almost all numbers are normal). Let $\mathcal{N}$ be the set of real numbers that are normal to every base $b\geq 2$. Then $m(\mathbb{R}\setminus \mathcal{N})=0$.

Proof. It suffices to show that $\mathcal{N}\cap [0,1]$ has measure $1$. If $\mathcal{N}_b$ denotes the normal numbers in base $b$, it suffices to show that $\mathcal{N}_b\cap [0,1]$ has measure $1$ for each $b$. It even suffices to show that for every sequence $\alpha=(\alpha_0,...,\alpha_{\ell-1})\in \{0,...,b-1\}^{\ell}$ the set $\mathcal{N}_{b,\alpha}\cap [0,1]$ has measure $1$, where $\mathcal{N}_{b,\alpha}$ is the set of numbers whose base $b$ representation contains the sequence $\alpha$ with frequency $b^{-\ell}$. The reason for this being sufficient is that there are countably many such sequences, as each corresponds to a rational number. Recall that the map $T(x)=bx\pmod 1$ is ergodic on $[0,1)$ with respect to the Lebesgue measure. Also notice that if
\[\begin{eqnarray}A=\alpha_0b^{\ell-1}+\alpha_1 b^{\ell-2}+...+\alpha_{\ell-1},\end{eqnarray}\]
then $T^n(x)\in \left[\frac{A}{b^{\ell}},\frac{A+1}{b^{\ell}}\right)$ is the same as saying that $(a_n,...,a_{n+\ell-1})=(\alpha_0,...,\alpha_{\ell-1})$, where $x=\sum_{k=0}^{\infty}a_kb^{-k-1}$. We are now in a position to apply Birkhoff's theorem to $f=1_{ \left[\frac{A}{b^{\ell}},\frac{A+1}{b^{\ell}}\right)}$ and to $T$; we get
\[\begin{eqnarray}&&\lim_{N\to \infty}\frac{1}{N}|\{1\leq n\leq N:(a_n,...,a_{n+\ell-1})=(\alpha_0,...,\alpha_{\ell-1})\}|\\&=&\lim_{N\to \infty}\frac{1}{N}\sum_{n=0}^{N-1}f(T^n(x))\\&=&\int_{0}^{1}f(x)dx=b^{-\ell}.\end{eqnarray}\]
We already discovered that this was enough to prove the theorem.■
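As an informal illustration of what normality predicts, the following sketch counts digit blocks in a long string of pseudo-random base $10$ digits, which stands in for the expansion of a "typical" $x\in[0,1)$. This is an assumption made for the experiment (a pseudo-random generator is not a Lebesgue-random real number), and the helper `block_frequency`, the seed and the sample size are arbitrary choices.

```python
import random

def block_frequency(digits, block):
    """Fraction of starting positions n at which `block` occurs in `digits`."""
    ell = len(block)
    positions = len(digits) - ell + 1
    hits = sum(1 for n in range(positions) if digits[n:n + ell] == block)
    return hits / positions

rng = random.Random(0)                  # fixed seed, for reproducibility
b = 10
digits = [rng.randrange(b) for _ in range(200_000)]

print(block_frequency(digits, [7]))     # should be near 1/10
print(block_frequency(digits, [3, 1]))  # should be near 1/100
```

Simulating the map $T(x)=bx \pmod 1$ directly in floating point is not a good substitute: for $b=2$ a binary float has only finitely many digits, so the orbit collapses to $0$ after a few dozen steps.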

The applications of the ergodic theorem are certainly not limited to number theory, and one can even use ergodicity to prove a version of the law of large numbers assuming only that the random variables (that is, measurable functions) have finite expectation (that is, are integrable). Skipping some technicalities, we give a sketch of the proof.

Theorem 9 (Law of large numbers). Let $(\Omega,\mathcal{F},P)$ be a probability space (measure space with total measure $1$), and let $X_1,X_2,...$ be independent and identically distributed random variables with expectation $E(X_i)=\bar{x}$ finite. Then
\[\begin{eqnarray}\frac{1}{n}(X_1(\omega)+...+X_n(\omega))\to \bar{x}\end{eqnarray}\]
as $n\to \infty$ $P$-almost surely (that is, for almost all $\omega$).

Proof (sketch). Define the map $\pi:\Omega\to \mathbb{R}^{\mathbb{N}}$ by $\pi(\omega)=(X_1(\omega),X_2(\omega),...)$. Let $\mu=P\circ \pi^{-1};$ clearly $\mu$ is a measure on $\mathbb{R}^{\mathbb{N}}$. As $X_1,X_2,...$ are independent, $\mu=\nu_1\times \nu_2\times ...$ (a product measure $\mu_1\times \mu_2$ is simply defined by $\mu_1\times \mu_2(B_1\times B_2)=\mu_1(B_1)\mu_2(B_2)$). As $X_i$ are identically distributed, all $\nu_i$ are equal, so $\mu=\nu^{\mathbb{N}}$. Define the Bernoulli shift $T(x_1,x_2,...)=(x_2,x_3,...)$. One can show that this map is measure-preserving and ergodic on $\mathbb{R}^{\mathbb{N}}$. Finally, let $f(x)=x_1$ and $x=\pi(\omega)$ and apply Birkhoff's theorem to $f$ and $T$. The time average becomes
\[\begin{eqnarray}\frac{1}{n}(X_1(\omega)+...+X_n(\omega)),\end{eqnarray}\]
while the space average is just $\bar{x}$, as wanted.■
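The conclusion of Theorem 9 can be watched numerically. The sketch below draws pseudo-random samples from an exponential distribution with expectation $2$ and prints a few running means; the distribution, the seed and the helper name `running_means` are choices made for this illustration only.

```python
import random

def running_means(samples):
    """Return the list of averages (x_1 + ... + x_n) / n for n = 1, 2, ..."""
    total = 0.0
    means = []
    for n, x in enumerate(samples, start=1):
        total += x
        means.append(total / n)
    return means

rng = random.Random(1)                                    # fixed seed
samples = [rng.expovariate(0.5) for _ in range(200_000)]  # expectation 1/0.5 = 2
means = running_means(samples)
print(means[99], means[9_999], means[-1])                 # should drift towards 2
```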