method we suggest, its true coverage probability is not always the nominal one. For more on
this, see Brown et al. (2001) and/or Agresti (2003).
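To see how far the true coverage can drift from the nominal level, one can compute it exactly for a binomial proportion. The sketch below is only an illustration: it uses the simple Wald interval as a stand-in (not necessarily the method referred to above), and the names `wald_interval` and `coverage` are hypothetical helpers introduced for this example.

```python
from math import comb, sqrt

def wald_interval(x, n, z=1.959964):
    """Wald interval for a binomial proportion (illustrative stand-in only)."""
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(p, n, interval=wald_interval):
    """Exact probability that the interval contains the true proportion p.

    Sums the binomial probabilities of all outcomes x whose interval covers p.
    """
    total = 0.0
    for x in range(n + 1):
        lower, upper = interval(x, n)
        if lower <= p <= upper:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total

# Nominal level is 95%; the exact coverage depends strongly on p and n.
for p in (0.05, 0.2, 0.5):
    print(f"n=20, p={p}: exact coverage = {coverage(p, 20):.3f}")
```

Any other interval can be examined the same way by passing a different function as `interval`, provided it returns a lower and an upper limit for given `x` and `n`.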
Bayesian statistics actually predates the frequentist approach (Fienberg, 2006) in the
practice of statistics. Thomas Bayes may have been the first to formulate the inverse probability
formula, but he had no influence on its future applications (just as James Lind did not really
change the treatment of scurvy). The most influential person when it comes to Bayes’ theorem,
and early probability theory in general, is without doubt Laplace. In his time it became
customary to make statements about a parameter based on the probability of the outcome
given the parameter, which meant they used the uniform prior for the parameter and Bayes’
theorem. This was called the inverse probability method and was introduced by Laplace in
1774. However, it was not without its critics, including Laplace himself, who seems to have
moved away from it when he introduced the CLT. It was realized early that its use involved
a confusing double meaning of probability, which is sometimes taken to be objective and
sometimes taken to be subjective. It was also noted at the time that the application of the rule
of succession leads to a ‘futile and illusory conclusion’: if you toss a coin twice and get heads
both times, the rule gives a probability of 3/4 for heads at the next toss, yet few would bet
three to one on it (Hald, 2007). This, of course, reflects the fact that the uniform prior is
often irrelevant.
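The arithmetic behind the rule of succession is short enough to check directly. With a uniform prior on the success probability, s successes in n trials give a predictive probability of (s + 1)/(n + 2) for success at the next trial. The sketch below is a minimal illustration; the function names are hypothetical and chosen for this example.

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """P(success at the next trial) under a uniform prior: (s + 1) / (n + 2)."""
    return Fraction(successes + 1, trials + 2)

def odds_in_favour(prob):
    """Convert a probability into odds in favour, as a ratio to one."""
    return prob / (1 - prob)

# Two heads in two tosses of a coin.
p_next = rule_of_succession(2, 2)
print(p_next, odds_in_favour(p_next))  # prints: 3/4 3
```

Exact fractions are used so that the odds come out as whole numbers rather than rounded decimals.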
The discussion on Gauss’s justification of the Gaussian distribution is based on the description
in Eisenhart (1982), where the original references can be found. To claim that Gauss
invented the least squares method is controversial; some attribute it to the French mathematician
Legendre. The importance of the Gaussian distribution in statistics is as an approximation
to the distributions for various test statistics, for reasons often traceable to the CLT. It is also
true in biostatistics that the data themselves, or a transformation thereof, often have a distribution
similar to the Gaussian distribution. Sometimes the argument is that what we see is the net
result of many small, random entities that add up to the outcome variable measured, which
is another appeal to the CLT. This does not imply that the Gaussian is appropriate for all kinds
of data. In many areas of human affairs, such as distributions of wages or time to completion
of tasks, it may well be that fundamentally different distributions (Taleb, 2007) are more appropriate,
such as heavy-tailed power law distributions of the form $F(x) = 1 - x^{-D}$ for some
$D > 0$. These are distributions with very different properties than those of the bell-shaped
Gaussian distribution. The key is that they are scale-invariant, which is related to the theory
of fractals in modern mathematics.
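Scale invariance can be made concrete by comparing tail probabilities. For a power law with tail 1 − F(x) = x^(−D), doubling x multiplies the tail probability by the same factor 2^(−D) wherever one starts, whereas for the Gaussian the factor shrinks rapidly as x grows. The sketch below assumes the support x ≥ 1 and an arbitrary illustrative exponent D = 1.5; neither is taken from the text.

```python
from math import erfc, sqrt

def power_law_tail(x, d):
    """Tail probability 1 - F(x) = x**(-d), assuming support x >= 1."""
    return x ** (-d)

def gaussian_tail(x):
    """Tail probability P(Z > x) for a standard Gaussian variable Z."""
    return 0.5 * erfc(x / sqrt(2))

D = 1.5  # illustrative exponent (an assumption for this example)

# Ratio of tail probabilities when x is doubled, at several starting points.
for x in (2.0, 4.0, 8.0):
    power_ratio = power_law_tail(2 * x, D) / power_law_tail(x, D)
    gauss_ratio = gaussian_tail(2 * x) / gaussian_tail(x)
    print(f"x={x}: power law ratio = {power_ratio:.4f}, Gaussian ratio = {gauss_ratio:.3e}")
```

The power law ratio is the same at every starting point, which is what scale invariance means here; the Gaussian ratio has no such constancy.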
References
Agresti, A. (2003) Dealing with discreteness: making ‘exact’ confidence intervals for proportions,
difference of proportions and odds ratios more exact. Statistical Methods in Medical Research, 12,
3–21.
Brown, L.D., Cai, T.T. and DasGupta, A. (2001) Interval estimation for a binomial proportion. Statistical
Science, 16(2), 101–133.
Dawid, A.P. and Mortera, J. (1996) Coherent analysis of forensic identification evidence. Journal of the
Royal Statistical Society, Series B, 58(2), 425–443.
Eisenhart, C. (1982) Laws of Error II: Gaussian Distribution. In Encyclopedia of Statistical Sciences,
vol. 4, pp. 547–560. John Wiley & Sons, Inc.
Fienberg, S.E. (2006) When did Bayesian inference become ‘Bayesian’? Bayesian Analysis, 1(1), 1–40.