
644 INFORMATION THEORY AND PORTFOLIO THEORY
16.8 SHANNON–MCMILLAN–BREIMAN THEOREM
(GENERAL AEP)
The AEP for ergodic processes has come to be known as the Shan-
non –McMillan –Breiman theorem. In Chapter 3 we proved the AEP for
i.i.d. processes. In this section we offer a proof of the theorem for a
general ergodic process. We prove the convergence of
1
n
log p(X
n
) by
sandwiching it between two ergodic sequences.
In a sense, an ergodic process is the most general dependent process for
which the strong law of large numbers holds. For finite alphabet processes,
ergodicity is equivalent to the convergence of the kth-order empirical
distributions to their marginals for all k.
The technical definition requires some ideas from probability theory. To
be precise, an ergodic source is defined on a probability space (,
B,P),
where
B is a σ -algebra of subsets of and P is a probability measure.
A random variable X is defined as a function X(ω),ω ∈ , on the prob-
ability space. We also have a transformation T : → , which plays
the role of a time shift. We will say that the transformation is stationary
if P(TA) = P(A) for all A ∈
B. The transformation is called ergodic if
every set A such that TA= A, a.e., satisfies P(A) = 0or1.IfT is station-
ary and ergodic, we say that the process defined by X
n
(ω) = X(T
n
ω) is
stationary and ergodic. For a stationary ergodic source, Birkhoff’s ergodic
theorem states that
1
n
n
i=1
X
i
(ω) → EX =
XdP with probability 1. (16.172)
Thus, the law of large numbers holds for ergodic processes.
We wish to use the ergodic theorem to conclude that
−
1
n
log p(X
0
,X
1
,...,X
n−1
) =−
1
n
n−1
i=0
log p(X
i
|X
i−1
0
)
→ lim
n→∞
E[−log p(X
n
|X
n−1
0
)]. (16.173)
But the stochastic sequence p(X
i
|X
i−1
0
) is not ergodic. However, the
closely related quantities p(X
i
|X
i−1
i−k
) and p(X
i
|X
i−1
−∞
) are ergodic and
have expectations easily identified as entropy rates. We plan to sandwich
p(X
i
|X
i−1
0
) between these two more tractable processes.