
Thus, the expected number of occurrences of the event $\{D(P_{x^n} \| P) > \epsilon\}$
for all $n$ is finite, which implies that the actual number of such occur-
rences is also finite with probability 1 (Borel–Cantelli lemma). Hence
$D(P_{x^n} \| P) \to 0$ with probability 1.
We now define a stronger version of typicality than in Chapter 3.
Definition   We define the strongly typical set $A_\epsilon^{*(n)}$ to be the set of
sequences in $\mathcal{X}^n$ for which the sample frequencies are close to the true
values:
$$
A_\epsilon^{*(n)} = \left\{ x \in \mathcal{X}^n :
\begin{array}{ll}
\left| \frac{1}{n} N(a|x) - P(a) \right| < \frac{\epsilon}{|\mathcal{X}|} & \text{if } P(a) > 0, \\
N(a|x) = 0 & \text{if } P(a) = 0
\end{array}
\right\}. \qquad (11.71)
$$
Hence, the typical set consists of sequences whose type does not differ
from the true probabilities by more than $\epsilon/|\mathcal{X}|$ in any component. By the
strong law of large numbers, it follows that the probability of the strongly
typical set goes to 1 as n →∞. The additional power afforded by strong
typicality is useful in proving stronger results, particularly in universal
coding, rate distortion theory, and large deviation theory.
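The frequency condition above is easy to check mechanically. The following is a minimal sketch of a membership test for the strongly typical set of (11.71); the function name and the dictionary-based representation of the alphabet are illustrative choices, not from the text:

```python
from collections import Counter

def is_strongly_typical(x, P, eps):
    """Sketch of the membership test in (11.71); P maps symbols to probabilities."""
    n = len(x)
    counts = Counter(x)
    for a, p in P.items():
        if p > 0:
            # sample frequency must be within eps/|X| of the true probability
            if abs(counts[a] / n - p) >= eps / len(P):
                return False
        elif counts[a] != 0:
            # symbols of probability zero must not appear at all
            return False
    return True

P = {'a': 0.5, 'b': 0.5, 'c': 0.0}
print(is_strongly_typical('abab' * 5, P, 0.1))   # exact frequencies: True
print(is_strongly_typical('aaaa' * 5, P, 0.1))   # all-'a' sequence: False
```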
11.3 UNIVERSAL SOURCE CODING
Huffman coding compresses an i.i.d. source with a known distribution
p(x) to its entropy limit H(X). However, if the code is designed for
some incorrect distribution q(x), a penalty of D(p||q) is incurred. Thus,
Huffman coding is sensitive to the assumed distribution.
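The mismatch penalty can be checked numerically. The sketch below, with illustrative distributions p and q (not from the text), compares the expected length of an ideal code designed for q, averaged under the true distribution p, against $H(p) + D(p\|q)$:

```python
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions (not from the text)
p = [0.5, 0.25, 0.25]   # true source distribution
q = [0.25, 0.25, 0.5]   # distribution the code was (wrongly) designed for

# Ideal codeword lengths log2(1/q(x)), averaged under the true p
mismatched_rate = sum(pi * math.log2(1 / qi) for pi, qi in zip(p, q) if pi > 0)

print(entropy(p), kl(p, q), mismatched_rate)   # 1.5 0.25 1.75
```

The mismatched rate, 1.75 bits, exceeds the entropy 1.5 bits by exactly the divergence D(p||q) = 0.25 bits.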
What compression can be achieved if the true distribution p(x) is
unknown? Is there a universal code of rate R, say, that suffices to describe
every i.i.d. source with entropy H(X) < R? The surprising answer is yes.
The idea is based on the method of types. There are $2^{nH(P)}$ sequences of
type $P$. Since there are only a polynomial number of types with denom-
inator $n$, an enumeration of all sequences $x^n$ with type $P_{x^n}$ such that
$H(P_{x^n}) < R$ will require roughly $nR$ bits. Thus, by describing all such
sequences, we are prepared to describe any sequence that is likely to arise
from any distribution Q having entropy H(Q) < R. We begin with a
definition.
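For a binary alphabet this count can be made concrete. The sketch below (the parameter choices n = 20 and R = 0.8 are illustrative, not from the text) enumerates the binary types with denominator n, counts the sequences whose type entropy falls below R, and checks that indexing them takes about nR bits:

```python
import math
from math import comb

def type_entropy(k, n):
    """Entropy in bits of the binary type (k/n, (n-k)/n)."""
    h = 0.0
    for count in (k, n - k):
        if count > 0:
            frac = count / n
            h -= frac * math.log2(frac)
    return h

n, R = 20, 0.8   # illustrative block length and rate
# Number of binary sequences of length n whose type has entropy below R
covered = sum(comb(n, k) for k in range(n + 1) if type_entropy(k, n) < R)
index_bits = math.log2(covered)
# A fixed index into this enumeration fits within roughly nR bits, up to
# an O(log n) correction from the polynomial number of types
print(covered, index_bits <= n * R + math.log2(n + 1))   # 12392 True
```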
Definition   A fixed-rate block code of rate $R$ for a source $X_1, X_2, \ldots,$
$X_n$ which has an unknown distribution $Q$ consists of two mappings: the
encoder,
$$
f_n : \mathcal{X}^n \to \{1, 2, \ldots, 2^{nR}\}, \qquad (11.72)
$$