
11.5 EXAMPLES OF SANOV’S THEOREM 365
From (11.110), it follows that $P^*$ has the form
$$P^*(x) = \frac{2^{\lambda x}}{\sum_{i=1}^{6} 2^{\lambda i}}, \qquad (11.113)$$
with $\lambda$ chosen so that $\sum_i i P^*(i) = 4$. Solving numerically, we obtain $\lambda = 0.2519$, $P^* = (0.1031, 0.1227, 0.1461, 0.1740, 0.2072, 0.2468)$, and therefore $D(P^* \| Q) = 0.0624$ bit. Thus, the probability that the average of 10,000 throws is greater than or equal to 4 is $\approx 2^{-624}$.
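The numerical solution quoted above can be reproduced with a short script. A sketch using only the standard library: it bisects on $\lambda$ (the mean under the tilted distribution is increasing in $\lambda$), then evaluates $D(P^*\|Q)$ in bits against the uniform $Q$; the function names are illustrative.

```python
import math

def mean_under_tilt(lam):
    # P*(x) is proportional to 2^(lam * x) for x = 1..6, as in (11.113);
    # return the expected value of x under this tilted distribution.
    weights = [2 ** (lam * x) for x in range(1, 7)]
    z = sum(weights)
    return sum(x * w for x, w in zip(range(1, 7), weights)) / z

# Bisection: find lambda with mean 4 (at lam=0 the mean is 3.5, and it increases).
lo, hi = 0.0, 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mean_under_tilt(mid) < 4:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2            # ~ 0.2519

weights = [2 ** (lam * x) for x in range(1, 7)]
z = sum(weights)
p_star = [w / z for w in weights]
q = 1 / 6                      # fair die
d = sum(p * math.log2(p / q) for p in p_star)   # D(P*||Q) in bits, ~ 0.0624

print(round(lam, 4), [round(p, 4) for p in p_star], round(d, 4))
```

With these values, $nD(P^*\|Q) = 10{,}000 \times 0.0624 = 624$, giving the quoted estimate $2^{-624}$.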
Example 11.5.2 (Coins) Suppose that we have a fair coin and want to estimate the probability of observing more than 700 heads in a series of 1000 tosses. The problem is like Example 11.5.1. The probability is
$$P\left(\overline{X}_n \ge 0.7\right) \doteq 2^{-nD(P^* \| Q)}, \qquad (11.114)$$
where $P^*$ is the $(0.7, 0.3)$ distribution and $Q$ is the $(0.5, 0.5)$ distribution. In this case, $D(P^* \| Q) = 1 - H(P^*) = 1 - H(0.7) = 0.119$. Thus, the probability of 700 or more heads in 1000 trials is approximately $2^{-119}$.
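The identity $D(P^*\|Q) = 1 - H(P^*)$ for a fair-coin $Q$ is easy to check numerically; a minimal sketch (the helper name is illustrative):

```python
import math

def binary_entropy(p):
    # H(p) in bits.
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# D((p, 1-p) || (0.5, 0.5)) computed directly from the definition...
p = 0.7
d_direct = p * math.log2(p / 0.5) + (1 - p) * math.log2((1 - p) / 0.5)
# ...equals 1 - H(p), since each log2(1/0.5) contributes 1 bit.
d_identity = 1 - binary_entropy(p)

n = 1000
exponent = n * d_identity   # ~ 119, so the probability is roughly 2^(-119)
print(round(d_identity, 3), round(exponent))
```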
Example 11.5.3 (Mutual dependence) Let $Q(x, y)$ be a given joint distribution and let $Q_0(x, y) = Q(x)Q(y)$ be the associated product distribution formed from the marginals of $Q$. We wish to know the likelihood that a sample drawn according to $Q_0$ will "appear" to be jointly distributed according to $Q$. Accordingly, let $(X_i, Y_i)$ be i.i.d. $\sim Q_0(x, y) = Q(x)Q(y)$. We define joint typicality as we did in Section 7.6; that is, $(x^n, y^n)$ is jointly typical with respect to a joint distribution $Q(x, y)$ iff the sample entropies are close to their true values:
$$\left| -\frac{1}{n} \log Q(x^n) - H(X) \right| \le \epsilon, \qquad (11.115)$$
$$\left| -\frac{1}{n} \log Q(y^n) - H(Y) \right| \le \epsilon, \qquad (11.116)$$
and
$$\left| -\frac{1}{n} \log Q(x^n, y^n) - H(X, Y) \right| \le \epsilon. \qquad (11.117)$$
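Conditions (11.115)–(11.117) translate directly into a mechanical test on a sample. A sketch under illustrative assumptions: the joint distribution is represented as a dict mapping $(x, y)$ pairs to probabilities, marginals are derived from it, and the function name is made up for this example.

```python
import math

def is_jointly_typical(xs, ys, q_joint, eps):
    # q_joint: dict mapping (x, y) -> Q(x, y); marginals Q(x), Q(y) are derived.
    n = len(xs)
    qx, qy = {}, {}
    for (x, y), p in q_joint.items():
        qx[x] = qx.get(x, 0.0) + p
        qy[y] = qy.get(y, 0.0) + p
    # True entropies H(X), H(Y), H(X, Y) in bits.
    hx = -sum(p * math.log2(p) for p in qx.values())
    hy = -sum(p * math.log2(p) for p in qy.values())
    hxy = -sum(p * math.log2(p) for p in q_joint.values())
    # Sample entropies -(1/n) log Q(x^n), -(1/n) log Q(y^n), -(1/n) log Q(x^n, y^n).
    sx = -sum(math.log2(qx[x]) for x in xs) / n
    sy = -sum(math.log2(qy[y]) for y in ys) / n
    sxy = -sum(math.log2(q_joint[(x, y)]) for x, y in zip(xs, ys)) / n
    # Conditions (11.115), (11.116), and (11.117).
    return abs(sx - hx) <= eps and abs(sy - hy) <= eps and abs(sxy - hxy) <= eps
```

For instance, with the fully correlated $Q$ placing mass $1/2$ on $(0,0)$ and $1/2$ on $(1,1)$, a sequence that alternates those two pairs is jointly typical for any $\epsilon > 0$, while a pair sequence containing $(0,1)$ is not even in the support of $Q$.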
We wish to calculate the probability (under the product distribution) of seeing a pair $(x^n, y^n)$ that looks jointly typical of $Q$ [i.e., $(x^n, y^n)$