
404 INFORMATION THEORY AND STATISTICS
•  Argue that the term corresponding to $i = np$ is the largest term in the sum on the right-hand side of the last equation.
•  Show that this term is approximately $2^{-nD}$.
•  Prove an upper bound on the probability in Sanov's theorem using the steps above. Use similar arguments to prove a lower bound and complete the proof of Sanov's theorem.
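As a numerical sanity check of the first two steps (an illustration, not part of the proof), the sketch below compares the exponent of the largest binomial term with $D = D(p \| q)$ for i.i.d. Bernoulli($q$) trials and threshold $np$; the values $n = 2000$, $p = 0.7$, $q = 1/2$ are illustrative choices, and the computation is done in log space to avoid floating-point underflow:

```python
import math

def kl_bits(p, q):
    """Relative entropy D(Bern(p) || Bern(q)) in bits."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

n, p, q = 2000, 0.7, 0.5
i = round(n * p)  # index of the (claimed) largest term in the sum

# log2 of the term C(n,i) q^i (1-q)^(n-i), computed in log space
log2_term = math.log2(math.comb(n, i)) + i * math.log2(q) + (n - i) * math.log2(1 - q)

print(-log2_term / n)   # per-symbol exponent of the largest term
print(kl_bits(p, q))    # D(p||q); the two agree to within O((log n)/n)
```

The small discrepancy between the two printed values is the polynomial (Stirling) correction, which vanishes in the exponent as $n \to \infty$.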
11.14  Sanov. Let $X_i$ be i.i.d. $\sim N(0, \sigma^2)$.
(a)  Find the exponent in the behavior of $\Pr\{\frac{1}{n}\sum_{i=1}^{n} X_i^2 \geq \alpha^2\}$. This can be done from first principles (since the normal distribution is nice) or by using Sanov's theorem.
(b)  What do the data look like if $\frac{1}{n}\sum_{i=1}^{n} X_i^2 \geq \alpha^2$? That is, what is the $P^*$ that minimizes $D(P \| Q)$?
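One way to explore part (a) from first principles is the Chernoff bound: the exponent is the Legendre transform of the cumulant generating function of $X^2$. The sketch below (a numerical illustration with assumed values $\sigma^2 = 1$, $\alpha^2 = 2$, exponent in nats) grid-searches that transform and compares it against $D(N(0, \alpha^2) \| N(0, \sigma^2))$, the candidate exponent suggested by Sanov's theorem:

```python
import math

# Chernoff sketch for Pr{(1/n) sum X_i^2 >= a2}, X_i ~ N(0, s2).
# The exponent (in nats) is sup over 0 <= t < 1/(2 s2) of t*a2 - L(t),
# where L(t) = -0.5*ln(1 - 2*t*s2) is the CGF of X^2 (chi-square, 1 dof).
s2, a2 = 1.0, 2.0  # sigma^2 and alpha^2 (alpha^2 > sigma^2); illustrative values

def cgf(t):
    return -0.5 * math.log(1.0 - 2.0 * t * s2)

# crude grid search over the admissible range of t
grid = [k / 100000 * (1 / (2 * s2)) for k in range(100000)]
exponent = max(t * a2 - cgf(t) for t in grid)

# compare with D(N(0,a2) || N(0,s2)) = 0.5*(a2/s2 - 1 - ln(a2/s2))
print(exponent)
print(0.5 * (a2 / s2 - 1 - math.log(a2 / s2)))
```

The two printed values agree, consistent with the Sanov characterization of the exponent via the minimizing distribution $P^*$.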
11.15  Counting states. Suppose that an atom is equally likely to be in each of six states, $X \in \{s_1, s_2, s_3, \ldots, s_6\}$. One observes $n$ atoms $X_1, X_2, \ldots, X_n$ independently drawn according to this uniform distribution. It is observed that the frequency of occurrence of state $s_1$ is twice the frequency of occurrence of state $s_2$.
(a)  To first order in the exponent, what is the probability of observing this event?
(b)  Assuming $n$ large, find the conditional distribution of the state of the first atom $X_1$, given this observation.
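This can be explored numerically (a sketch, not a full solution): restrict to distributions satisfying $p_1 = 2p_2$, assume by symmetry that the remaining mass is split equally over $s_3, \ldots, s_6$, and grid-search the divergence from the uniform distribution $U$. The minimizer approximates the conditional distribution in part (b), and the minimal divergence gives the first-order exponent in part (a):

```python
import math

# Minimize D(P || U) over six-state distributions with p1 = 2*p2,
# where U is uniform on six states. Parametrize p2 = t, p1 = 2t; the
# remaining mass is split equally over s3..s6 (a symmetry assumption
# that reduces the search to one dimension).
def D_bits(t):
    rest = (1 - 3 * t) / 4                           # mass on each of s3..s6
    probs = [2 * t, t] + [rest] * 4
    return sum(p * math.log2(6 * p) for p in probs)  # D(P || U) in bits

ts = [k / 100000 for k in range(1, 33334)]           # t in (0, 1/3)
t_star = min(ts, key=D_bits)
print(t_star, D_bits(t_star))  # minimizing p2 and the exponent for part (a)
```

By the conditional limit theorem, the minimizing $P^*$ found here is also (for large $n$) the conditional distribution of $X_1$ given the observed event.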
11.16  Hypothesis testing. Let $\{X_i\}$ be i.i.d. $\sim p(x)$, $x \in \{1, 2, \ldots\}$. Consider two hypotheses, $H_0: p(x) = p_0(x)$ vs. $H_1: p(x) = p_1(x)$, where $p_0(x) = \left(\frac{1}{2}\right)^x$ and $p_1(x) = q p^{x-1}$, $x = 1, 2, 3, \ldots$.
(a)  Find $D(p_0 \| p_1)$.
(b)  Let $\Pr\{H_0\} = \frac{1}{2}$. Find the minimal probability of error test for $H_0$ vs. $H_1$ given data $X_1, X_2, \ldots, X_n \sim p(x)$.
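A numerical check for part (a): the divergence between the two geometric distributions can be computed by truncating the sum, since both tails decay exponentially. The sketch below assumes the common convention $q = 1 - p$ and uses the illustrative choice $p = 1/4$:

```python
import math

# D(p0 || p1) in bits for p0(x) = (1/2)^x and p1(x) = q p^(x-1),
# with q = 1 - p assumed and p = 1/4 as an illustrative value.
p = 0.25
q = 1 - p

def p0(x):
    return 0.5 ** x

def p1(x):
    return q * p ** (x - 1)

# the p0-tail beyond x = 200 is below 2^-200, negligible in double precision
kl = sum(p0(x) * math.log2(p0(x) / p1(x)) for x in range(1, 201))
print(kl)
```

Since $\mathbb{E}_{p_0}[X] = 2$, the truncated sum can be checked against the closed form $-2 - \log_2 q - \log_2 p$ obtained by taking expectations of the log ratio term by term.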
11.17  Maximum likelihood estimation. Let $\{f_\theta(x)\}$ denote a parametric family of densities with parameter $\theta \in \mathbb{R}$. Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\sim f_\theta(x)$. The function
$$l_\theta(x^n) = \ln \prod_{i=1}^{n} f_\theta(x_i)$$
is known as the log likelihood function. Let $\theta_0$ denote the true parameter value.
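To make the definition concrete, a small sketch with the illustrative family $f_\theta = N(\theta, 1)$ and an assumed true value $\theta_0 = 2$: maximizing $l_\theta(x^n)$ over a grid recovers (up to grid resolution) the sample mean, which is the maximum likelihood estimate for this family:

```python
import math
import random

# Log likelihood l_theta(x^n) = ln prod f_theta(x_i) for the
# illustrative family f_theta = N(theta, 1).
def log_likelihood(theta, xs):
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2 for x in xs)

random.seed(0)
theta0 = 2.0                                    # assumed true parameter value
xs = [random.gauss(theta0, 1.0) for _ in range(1000)]

# coarse grid maximization of the log likelihood over theta in [1, 3]
grid = [k / 100 for k in range(100, 301)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, xs))
print(theta_hat, sum(xs) / len(xs))  # nearly coincide, both near theta0
```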