
change of value of any given bit during transmission.⁴ Error correction, which is discussed in Section 4.5, is the process of eliminating such errors.
4.2 Shannon entropy
The traditional measure of classical information is the Shannon entropy,
$$H(X) \equiv H[p_1, p_2, \ldots, p_n] = -\sum_{i=1}^{n} p_i \log_2 p_i , \qquad (4.3)$$
which is a functional of the probability distribution $\{p_i\}_{i=1}^{n}$ associated with events $x_i$ in a sample space $S = \{x_i\}$, given by the probability mass function as $p_i \equiv p_X(x_i) = P(X = x_i)$ for the random variable $X$ over $n$ possible values that characterizes the information.⁵ The Shannon entropy is thus the average information associated with the set of events, in units of bits.⁶ It can also be thought of as the number of bits, on average, needed to describe the random variable $X$.
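As a concrete illustration of Eq. (4.3), the following minimal Python sketch (ours, not from the text; the function name shannon_entropy is an assumption) computes the entropy of a finite distribution, adopting the usual convention $0 \log 0 = 0$:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(X) in bits, per Eq. (4.3)."""
    # Terms with p_i = 0 contribute nothing (convention: 0 log 0 = 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Fair six-sided die (cf. footnote 5): H = log2(6), about 2.585 bits per toss.
print(shannon_entropy([1/6] * 6))   # 2.584962500721156
# A biased coin conveys less than one bit per toss.
print(shannon_entropy([0.9, 0.1]))  # about 0.469 bits
```

For the uniform distribution over $n$ outcomes the entropy attains its maximum value of $\log_2 n$ bits, which is why the fair die is the costliest per-toss case to describe.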
If one considers a sequence of $n$ independent and identically distributed (i.i.d.) random variables, $X^n$, drawn from a probability mass function $p(X = x_i)$, the probability of a typical sequence is of order $2^{-nH(X)}$; there are, accordingly, roughly $2^{nH(X)}$ possible such sequences.⁷
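This concentration can be checked numerically. The sketch below (ours; illustrative only) draws an i.i.d. Bernoulli($p$) sequence and evaluates $-\frac{1}{n}\log_2 p(x_1,\ldots,x_n)$, which by the AEP approaches $H(X)$ as $n$ grows:

```python
import math
import random

def empirical_rate(p, n, seed=0):
    """Return -(1/n) log2 of the probability of an i.i.d. Bernoulli(p)
    sequence of length n; by the AEP this tends to H(X) for large n."""
    rng = random.Random(seed)
    log_prob = sum(
        math.log2(p) if rng.random() < p else math.log2(1 - p)
        for _ in range(n)
    )
    return -log_prob / n

p = 0.2
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # H(X) is about 0.7219 bits
for n in (100, 10_000, 1_000_000):
    print(n, round(empirical_rate(p, n), 4), "vs H(X) =", round(H, 4))
```

Equivalently, the probability of each observed sequence is close to $2^{-nH(X)}$, in line with the counting argument that follows.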
The latter can be seen by noting that typical sequences will contain $p(x_i)n$ instances of $x_i$, so that the number of typical sequences is $n!/\prod_i \big(np(x_i)\big)!$, which, under the Stirling approximation, is approximately $2^{nH(X)}$.
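The algebra behind this step is standard; a sketch of the Stirling-approximation calculation (our reconstruction, using $\log_2 m! \approx m \log_2 m - m \log_2 e$) runs:

$$\log_2 \frac{n!}{\prod_i \big(np(x_i)\big)!} \approx n \log_2 n - \sum_i np(x_i)\log_2\big(np(x_i)\big) = -n\sum_i p(x_i)\log_2 p(x_i) = nH(X),$$

where the $-m \log_2 e$ terms cancel because $\sum_i np(x_i) = n$, and the second equality uses $\log_2\big(np(x_i)\big) = \log_2 n + \log_2 p(x_i)$ together with $\sum_i p(x_i) = 1$.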
⁴ More detailed models of classical communication channels can be found, for example, in [344].
⁵ See Sect. A.2 for pertinent definitions. The events associated with the random variable $X$ here correspond, for example, to different numbers or letters of an alphabet on the sides of a die appearing face-up in a toss.
⁶ Such a measure and a fundamental unit of information were also independently (and previously) introduced by Alan Turing, who chose 10 as a base rather than Shannon’s 2, and so the “ban” rather than the bit as the fundamental information unit, while he was working at Bletchley Park during the Second World War. For a description of Turing’s formulation of information entropy, which involved the “weight of evidence,” see [190].
⁷ This is known as the asymptotic equipartition property (AEP). One can distinguish two sets of sequences, the typical and atypical sequences, being complements of each other, where the typical sequences are those with probability close to $2^{-nH(X)}$; in particular, the typical set is that of sequences $\{x_1, x_2, \ldots, x_n\}$ for the random variable $X$ such that $2^{-nH(X)-\epsilon} \leq p(x_1, x_2, \ldots, x_n) \leq 2^{-nH(X)+\epsilon}$ for all $\epsilon > 0$. The AEP is a consequence of the weak law of large numbers, namely that, for i.i.d. random variables, the average is close to the expected value of $X$ for large $n$. The theorem of typical sequences can also be proven, which supports the notion that in the limit of large sequence length almost all sequences produced by a source belong to this set.