There are various other axiomatic formulations which result in the same definition of entropy. See, for example, the book by Csiszár and Körner [149].
2.47 Entropy of a missorted file. A deck of n cards in order 1, 2, ..., n is provided. One card is removed at random, then replaced at random. What is the entropy of the resulting deck?
2.48 Sequence length. How much information does the length of a sequence give about the content of a sequence? Suppose that we consider a Bernoulli(1/2) process {X_i}. Stop the process when the first 1 appears. Let N designate this stopping time. Thus, X^N is an element of the set of all finite-length binary sequences {0, 1}* = {0, 1, 00, 01, 10, 11, 000, ...}.
(a) Find I(N; X^N).
(b) Find H(X^N | N).
(c) Find H(X^N).
Let us now consider a different stopping time. For this part, again assume that X_i ~ Bernoulli(1/2), but stop at time N = 6 with probability 1/3 and at time N = 12 with probability 2/3. Let this stopping time be independent of the sequence X_1 X_2 ... X_12.
(d) Find I(N; X^N).
(e) Find H(X^N | N).
(f) Find H(X^N).
HISTORICAL NOTES
The concept of entropy was introduced in thermodynamics, where it
was used to provide a statement of the second law of thermodynam-
ics. Later, statistical mechanics provided a connection between thermo-
dynamic entropy and the logarithm of the number of microstates in a
macrostate of the system. This work was the crowning achievement of
Boltzmann, who had the equation S = k ln W inscribed as the epitaph on
his gravestone [361].
In the 1930s, Hartley introduced a logarithmic measure of informa-
tion for communication. His measure was essentially the logarithm of the
alphabet size. Shannon [472] was the first to define entropy and mutual
information as defined in this chapter. Relative entropy was first defined
by Kullback and Leibler [339]. It is known under a variety of names,
including the Kullback–Leibler distance, cross entropy, information diver-
gence, and information for discrimination, and has been studied in detail
by Csiszár [138] and Amari [22].