84 ENTROPY RATES OF A STOCHASTIC PROCESS
Alternatively, by an application of the data-processing inequality to the Markov chain $X_1 \to X_{n-1} \to X_n$, we have
$$I(X_1; X_{n-1}) \geq I(X_1; X_n). \tag{4.53}$$
Expanding the mutual informations in terms of entropies, we have
$$H(X_{n-1}) - H(X_{n-1} \mid X_1) \geq H(X_n) - H(X_n \mid X_1). \tag{4.54}$$
By stationarity, $H(X_{n-1}) = H(X_n)$, and hence we have
$$H(X_{n-1} \mid X_1) \leq H(X_n \mid X_1). \tag{4.55}$$
[These techniques can also be used to show that $H(X_0 \mid X_n)$ is increasing in $n$ for any Markov chain.]
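The monotonicity in (4.55) can be checked numerically. The sketch below uses a hypothetical two-state chain (the transition matrix is an illustrative choice, not from the text) and computes $H(X_n \mid X_1)$ from the $(n-1)$-step transition matrix, confirming that the sequence is nondecreasing in $n$:

```python
import numpy as np

# Hypothetical two-state Markov chain; P is an illustrative choice.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
mu = np.real(evecs[:, np.argmax(np.real(evals))])
mu = mu / mu.sum()

def cond_entropy_given_x1(n):
    """H(X_n | X_1) in bits, with X_1 ~ mu and n-1 transition steps."""
    Pn = np.linalg.matrix_power(P, n - 1)   # Pn[i, j] = P(X_n = j | X_1 = i)
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = np.where(Pn > 0, Pn * np.log2(Pn), 0.0)
    return -np.sum(mu[:, None] * terms)

hs = [cond_entropy_given_x1(n) for n in range(2, 10)]
# (4.55): H(X_{n-1} | X_1) <= H(X_n | X_1), so hs is nondecreasing.
assert all(a <= b + 1e-12 for a, b in zip(hs, hs[1:]))
print(hs)
```

As $n \to \infty$, $H(X_n \mid X_1)$ climbs toward the entropy of the stationary distribution, since $X_n$ becomes nearly independent of $X_1$.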
5. Shuffles increase entropy. If $T$ is a shuffle (permutation) of a deck of cards and $X$ is the initial (random) position of the cards in the deck, and if the choice of the shuffle $T$ is independent of $X$, then
$$H(TX) \geq H(X), \tag{4.56}$$
where $TX$ is the permutation of the deck induced by the shuffle $T$ on the initial permutation $X$. Problem 4.3 outlines a proof.
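Inequality (4.56) can be illustrated exactly on a tiny deck. In this sketch the non-uniform initial distribution over orderings of a 3-card deck and the two-point shuffle distribution are both hypothetical choices; the distribution of $TX$ is the convolution of the two, and its entropy is at least $H(X)$:

```python
import itertools
import math

def H(p):
    """Shannon entropy in bits of a distribution given as a dict."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def compose(t, x):
    """Permutation t applied after permutation x (tuples as functions)."""
    return tuple(t[i] for i in x)

perms = list(itertools.permutations(range(3)))

# Hypothetical non-uniform initial deck order X over the 6 orderings.
pX = dict(zip(perms, [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]))

# Independent shuffle T: identity w.p. 0.7, swap top two cards w.p. 0.3.
pT = {(0, 1, 2): 0.7, (1, 0, 2): 0.3}

# Distribution of TX: convolution of pT and pX over the permutation group.
pTX = {}
for t, qt in pT.items():
    for x, qx in pX.items():
        g = compose(t, x)
        pTX[g] = pTX.get(g, 0.0) + qt * qx

assert H(pTX) >= H(pX) - 1e-12    # (4.56): H(TX) >= H(X)
print(H(pX), H(pTX))
```

The inequality follows the argument sketched in Problem 4.3: since $T$ is independent of $X$, $H(TX) \geq H(TX \mid T) = H(X \mid T) = H(X)$.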
4.5 FUNCTIONS OF MARKOV CHAINS
Here is an example that can be very difficult if done the wrong
way. It illustrates the power of the techniques developed so far. Let
$X_1, X_2, \ldots, X_n, \ldots$ be a stationary Markov chain, and let $Y_i = \phi(X_i)$ be a process each term of which is a function of the corresponding state in the Markov chain. What is the entropy rate $H(\mathcal{Y})$? Such functions of Markov chains occur often in practice. In many situations, one has only partial information about the state of the system. It would simplify matters greatly if $Y_1, Y_2, \ldots, Y_n$ also formed a Markov chain, but in many cases, this is not true. Since the Markov chain is stationary, so is $Y_1, Y_2, \ldots, Y_n$, and the entropy rate is well defined. However, if we wish to compute $H(\mathcal{Y})$, we might compute $H(Y_n \mid Y_{n-1}, \ldots, Y_1)$ for each $n$ and find the limit. Since the convergence can be arbitrarily slow, we will never know how close we are to the limit. (We can't look at the change between the values at $n$ and $n + 1$, since this difference may be small even when we are far away from the limit; consider, for example, $1/n$.)
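The computation just described can be carried out directly for a small example. In this sketch, the 3-state chain and the map $\phi$ (collapsing states 0 and 1 to one output symbol) are hypothetical illustrative choices; $H(Y_n \mid Y_{n-1}, \ldots, Y_1)$ is obtained as the difference of block entropies $H(Y^n) - H(Y^{n-1})$, where each $P(y^n)$ comes from a forward recursion over the hidden states:

```python
import numpy as np
from itertools import product

# Hypothetical 3-state chain and observation map; illustrative choices.
P = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.7, 0.2],
              [0.2, 0.2, 0.6]])
phi = np.array([0, 0, 1])   # Y = phi(X): states 0,1 -> symbol 0; state 2 -> symbol 1

# Stationary distribution of the chain.
evals, evecs = np.linalg.eig(P.T)
mu = np.real(evecs[:, np.argmax(np.real(evals))])
mu = mu / mu.sum()

def prob_y(y):
    """P(Y_1, ..., Y_n = y) by the forward recursion over hidden states."""
    alpha = mu * (phi == y[0])           # alpha[i] = P(X_1 = i, Y_1 = y_1)
    for s in y[1:]:
        alpha = (alpha @ P) * (phi == s)
    return alpha.sum()

def block_entropy(n):
    """H(Y_1, ..., Y_n) in bits, by enumerating all binary sequences."""
    total = 0.0
    for y in product((0, 1), repeat=n):
        p = prob_y(y)
        if p > 0:
            total -= p * np.log2(p)
    return total

# H(Y_n | Y_{n-1}, ..., Y_1) = H(Y^n) - H(Y^{n-1}): decreasing in n.
h = [block_entropy(n) for n in range(1, 11)]
cond = [b - a for a, b in zip(h, h[1:])]
assert all(c2 <= c1 + 1e-12 for c1, c2 in zip(cond, cond[1:]))
print(cond)
```

The printed sequence decreases toward the entropy rate $H(\mathcal{Y})$, but, as the text warns, the size of each step tells us nothing reliable about how far we still are from the limit.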