
8.4 Point Estimation
The posterior mode maximizes the posterior density in the same way that
the MLE maximizes the likelihood. When the posterior mode is used as an
estimator, it is called the maximum a posteriori (MAP) estimator. The MAP
estimator is popular in some Bayesian analyses in part because it is computationally
less demanding than the posterior mean or median. The reason for this is
simple: to find a MAP, the posterior does not need to be fully specified. Since
the posterior is proportional to the product of the likelihood and the prior,
$\arg\max_\theta \pi(\theta|x) = \arg\max_\theta f(x|\theta)\,\pi(\theta)$;
that is, the product of the likelihood and the prior, as well as the posterior
itself, are maximized at the same point.
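To make this concrete, here is a minimal sketch in Python for the binomial-beta model discussed next; the data ($x = 7$ successes in $n = 10$ trials) and the $\mathcal{B}e(2,2)$ prior are illustrative assumptions. The MAP is found by maximizing the unnormalized posterior $f(x|\theta)\pi(\theta)$, so the normalizing constant is never computed.

```python
from scipy.stats import binom, beta
from scipy.optimize import minimize_scalar

# Illustrative (assumed) data and prior: x successes in n trials, Be(a, b) prior
n, x = 10, 7
a, b = 2, 2

# Negative log of the unnormalized posterior, -log[f(x|theta) * pi(theta)];
# the marginal m(x) is omitted because it does not depend on theta.
def neg_log_post(theta):
    return -(binom.logpmf(x, n, theta) + beta.logpdf(theta, a, b))

res = minimize_scalar(neg_log_post, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # numerical MAP, approx. 0.6667

# For this conjugate pair the posterior is Be(x + a, n - x + b), whose mode
# (x + a - 1)/(n + a + b - 2) gives the MAP in closed form.
print((x + a - 1) / (n + a + b - 2))  # 8/12 = 0.6667
```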
Example 8.5. Binomial-Beta Conjugate Pair. In Example 8.4 we argued
that for the likelihood $X|\theta \sim \mathcal{B}in(n,\theta)$ and the prior
$\theta \sim \mathcal{B}e(\alpha,\beta)$, the posterior distribution is
$\mathcal{B}e(x+\alpha,\, n-x+\beta)$. The Bayes estimator of $\theta$ is the
expected value of the posterior,
$$
\hat\theta_B = \frac{\alpha + x}{(\alpha + x) + (\beta + n - x)}
             = \frac{\alpha + x}{\alpha + \beta + n}.
$$
This is actually a weighted average of the MLE, $X/n$, and the prior mean
$\alpha/(\alpha+\beta)$:
$$
\hat\theta_B = \frac{n}{\alpha + \beta + n} \cdot \frac{X}{n}
             + \frac{\alpha + \beta}{\alpha + \beta + n} \cdot \frac{\alpha}{\alpha + \beta}.
$$
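The weighted-average identity is easy to verify numerically; the following sketch uses exact rational arithmetic, with arbitrary illustrative values of $n$, $x$, $\alpha$, and $\beta$.

```python
from fractions import Fraction

# Arbitrary illustrative values
n, x = 10, 7
a, b = Fraction(2), Fraction(3)

posterior_mean = (a + x) / (a + b + n)

mle = Fraction(x, n)
prior_mean = a / (a + b)
w = Fraction(n) / (a + b + n)        # weight on the MLE
weighted = w * mle + (1 - w) * prior_mean

print(posterior_mean, weighted)      # both 3/5
assert posterior_mean == weighted
```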
Notice that, as $n$ becomes large, the posterior mean approaches the MLE,
because the weight $\frac{n}{n+\alpha+\beta}$ tends to 1. On the other hand,
when $\alpha$ or $\beta$ or both are large compared to $n$, the posterior mean
is close to the prior mean. Because of this interplay between $n$ and the prior
parameters, the sum $\alpha+\beta$ is called the prior sample size; it measures
the influence of the prior as if additional experimentation had been performed
and $\alpha+\beta$ trials had been added. This is in the spirit of Wilson's
proposal to "add two failures and two successes" to an estimator of a
proportion (p. 254). Wilson's estimator can be seen as a Bayes estimator with
a beta $\mathcal{B}e(2,2)$ prior.
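Indeed, with a $\mathcal{B}e(2,2)$ prior the posterior mean is $(x+2)/(n+4)$, which is exactly Wilson's adjustment. A quick check, with assumed illustrative data:

```python
# Illustrative (assumed) data: x successes in n trials
n, x = 10, 7

wilson = (x + 2) / (n + 4)       # add two successes and two failures
bayes = (2 + x) / (2 + 2 + n)    # posterior mean under a Be(2, 2) prior
print(wilson, bayes)             # both 0.6428..., the same estimator
```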
Large $\alpha$ indicates a small prior variance (for fixed $\beta$, the
variance of $\mathcal{B}e(\alpha,\beta)$ is proportional to $1/\alpha^2$ for
large $\alpha$) and the prior is concentrated about its mean.
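This $1/\alpha^2$ behavior can be checked directly; a sketch using scipy's beta distribution, where the fixed $\beta = 3$ is an arbitrary choice:

```python
from scipy.stats import beta

b = 3  # fixed beta
for a in (10, 100, 1000):
    v = beta.var(a, b)
    # alpha^2 * var should stabilize near b as alpha grows
    print(a, v, a**2 * v)
```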
In general, the posterior mean will fall between the MLE and the prior
mean. This was demonstrated in Example 8.1. As another example, suppose
we flipped a coin four times and tails showed up on all four occasions. We are
interested in estimating the probability of showing heads, $\theta$, in a Bayesian
fashion. If the prior is $\mathcal{U}(0,1)$, the posterior is proportional to
$\theta^0 (1-\theta)^4$, which is a beta $\mathcal{B}e(1,5)$. The posterior mean
shrinks the MLE (here, 0) toward the expected value of the prior (1/2) to get
$\hat\theta_B = 1/(1+5) = 1/6$, which is a more reasonable estimator of $\theta$
than the MLE. Note that the 3/n rule produces a confidence interval for $p$ of
$[0, 3/4]$, which is too wide to be useful (Sect. 7.4.4).
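A sketch of this coin example, again assuming scipy is available:

```python
from scipy.stats import beta

# Four flips, zero heads; the uniform U(0,1) prior is Be(1, 1)
n, x = 4, 0
a, b = 1, 1

post = beta(a + x, b + n - x)   # posterior Be(1, 5)
print(post.mean())              # 1/6 = 0.1667, pulled from the MLE of 0
                                # toward the prior mean of 1/2
```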