
MAKING STATEMENTS ABOUT A BINOMIAL PARAMETER 107
Box 4.9 Laplace rule of succession and the sunrise problem
If we have n independent observations of a Bernoulli experiment in which the event
has occurred k times, what is the probability that it will occur in the next, independent,
experiment? Conventional reasoning would estimate the unknown probability as k/n,
which therefore is the predicted probability that it will occur in the next test.
However, if we have had success on all occasions, so that k = n, is it obvious that this
is the proper estimate? The prediction would be that we are certain the event will occur
next time. But if we toss a possibly biased coin 10 times getting a head each time, is the
best predictive probability estimate really that we are certain to get a head next time?
Laplace addressed this as the sunrise problem: ‘what is the probability that the sun
will rise tomorrow?’ He argued as follows. Prior to knowing of any sunrises, one is
completely ignorant of the probability p of a sunrise. Laplace takes this ignorance to
mean that at that stage of our knowledge about p, it can be described by the uniform
distribution on the interval (0, 1). He then derived equation (4.6) and concluded that
after the sun has risen on n consecutive days, the updated probability of a sunrise is
p = (n + 1)/(n + 2). The larger the number of days that have begun with a sunrise, the
higher the plausibility of a sunrise tomorrow. His numerical estimate of this was based
on the assumption that Earth was created on October 23, 4004 BC, as the Bible had been
thought to imply.
However, we must not think that Laplace seriously believed in this. Laplace was
the author of the masterpiece Traité de Mécanique Céleste in five volumes, which
described the solar system in deterministic mathematical equations. Concerning the
sunrise problem he reflected: ‘But this number is far greater for him who, seeing in
the totality of phenomena the principle regulating the days and seasons, realizes that
nothing at the present moment can arrest the course of it.’ Or, in other words, the plausibility
of a sunrise depends on how much you know, and this varies from person to person.
This is at the heart of Bayesian statistics: the only good probability is a conditional
probability taking into account what one knows.
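The rule of succession generalizes to k occurrences in n trials, giving the predictive probability (k + 1)/(n + 2). A minimal sketch in Python (the function name is our own, not from the text):

```python
from fractions import Fraction

def rule_of_succession(k, n):
    """Laplace's rule of succession: the predictive probability of the
    event occurring in the next trial, after observing k occurrences
    in n independent Bernoulli trials under a uniform prior on p."""
    return Fraction(k + 1, n + 2)

# Ten heads in ten tosses: the predicted probability of another head
# is 11/12, not 1 -- finite data never yields certainty.
print(rule_of_succession(10, 10))   # 11/12
```

Note that with k = n the formula gives (n + 1)/(n + 2), matching the sunrise calculation above.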
by the β(p; 4, 6) distribution (which has mean 0.4). In such a case the a posteriori distribution
is proportional to

p^x (1 − p)^(n−x) dβ(p; 4, 6) = p^(x+4−1) (1 − p)^(n−x+6−1) dp,

which is the β(p; x + 4, n − x + 6) distribution. (Bayesian statistics is often numerically hard
and often requires computer simulations, but for this particular case the beta distribution makes
it simple.)
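The conjugate update above can be sketched numerically. The counts x = 33 and n = 50 below are illustrative assumptions, not the data of the text, and the 90% credible interval is approximated by Monte Carlo using the standard-library beta sampler:

```python
import random

def beta_update(a, b, x, n):
    """Conjugate update: a Beta(a, b) prior combined with x successes
    in n Bernoulli trials gives a Beta(a + x, b + n - x) posterior."""
    return a + x, b + n - x

# Hypothetical data: x = 33 successes in n = 50 trials,
# with the beta(p; 4, 6) prior of the text.
a_post, b_post = beta_update(4, 6, 33, 50)
print((a_post, b_post))   # (37, 23)

# Posterior mean, and a Monte Carlo approximation of the
# 90% credible interval from sorted posterior draws.
random.seed(1)
draws = sorted(random.betavariate(a_post, b_post) for _ in range(100_000))
mean = a_post / (a_post + b_post)
lo, hi = draws[5_000], draws[95_000]
```

Because the beta family is conjugate to the binomial likelihood, no simulation is strictly needed here; the sampling merely illustrates how the credible interval would be obtained in less tractable cases.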
All this is illustrated in Figure 4.4, using the data above. In this graph we have also
reproduced, as the dashed curve, the confidence function from Figure 4.3. Its median value,
the earlier estimate 0.66, is indicated, together with the 90% confidence interval (0.58, 0.74).
In addition there are two CDFs, both drawn as solid curves. The rightmost of these, which
is very close to the graph of the confidence function, shows the a posteriori distribution for
p when we use the uninformative prior. We have also indicated its median value, together
with what would correspond to a 90% confidence interval. However, in this setting, this
corresponds to a true probability statement – the probability that the value of p lies in this