harmonize the regulatory requirements all over the world, in a document labeled ICH
E10 (International Conference on Harmonisation, 2000).
There is much more to be said about the nature of probabilities and their implications for
a proper treatment of p-values. Often the need for a probabilistic discussion stems from
a lack of information, as in the game show. If we had complete information, we would
often not need probabilities: if we knew all the initial conditions when we toss
a coin, we could predict the outcome with certainty. In fact, one could argue that there are
few purely random events. The notable exceptions are some deep aspects
of contemporary quantum physics. Deterministic processes may appear probabilistic to us,
simply because we cannot obtain sufficient knowledge to explore the deterministic nature of
the problem, a subject mathematicians discuss in chaos theory. Accepting that we need to
compute probabilities, it becomes important to understand the conditions under which the
computed probability is valid. Seen from a prospective vantage point, a very rare event
occurs with a very small probability; retrospectively, the probability is one. Computing that
probability when we already know the event has occurred is basically meaningless. However, we should
not confuse that with what we do when we compute p-values. These are probabilities for the
outcome, computed under an assumption, and we use the p-value as indirect evidence for or
against that assumption. Note that the p-value is the probability, given that the null
hypothesis is true, of an outcome at least as extreme as the one observed; it is not the
transposed conditional, the probability that the null hypothesis is true given the outcome
we have observed.
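To make the distinction concrete, here is a minimal simulation sketch in Python. All settings are illustrative assumptions, not taken from this chapter: half of the simulated experiments have a true null hypothesis, the rest a true effect of half a standard deviation, with 20 subjects per group. The point is only that P(significant result | H0 true) and P(H0 true | significant result) are different numbers.

```python
# A minimal sketch; the scenario (50% true nulls, effect 0.5 SD, n = 20
# per group, seed) is an illustrative assumption, not taken from the text.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_trials, n = 10_000, 20
h0_true = rng.random(n_trials) < 0.5           # experiments with no real effect
significant = np.empty(n_trials, dtype=bool)

for k in range(n_trials):
    effect = 0.0 if h0_true[k] else 0.5        # true mean difference (in SDs)
    x = rng.normal(0.0, 1.0, n)
    y = rng.normal(effect, 1.0, n)
    significant[k] = ttest_ind(x, y).pvalue < 0.05

# Among experiments with a true null, about 5% come out significant ...
print("P(p < 0.05 | H0 true) =", significant[h0_true].mean())
# ... but among significant experiments, the share with a true null is not 5%:
# the transposed conditional is a different quantity altogether.
print("P(H0 true | p < 0.05) =", h0_true[significant].mean())
```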
When it comes to error control with multiple testing, the original suggestion by Bonferroni,
to allocate parts of the available α (significance level) to the different tests, was
improved upon considerably by Sture Holm in 1979. He showed that the testing could be
done in a stepwise manner, in order of increasing individual p-values, with these p-values
compared with successively larger fractions of α. It then took another 28 years for
the next major step, made independently by Guilbaud and by Strassburger and Bretz: the
development of confidence intervals corresponding to Holm’s and related testing procedures.
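As a concrete illustration, here is a minimal Python sketch of Holm’s step-down scheme (the function name and the example p-values are ours, for illustration only): with m tests, the kth smallest p-value is compared with α/(m − k + 1), and testing stops at the first non-rejection.

```python
# A minimal sketch of Holm's (1979) step-down procedure; the interface
# is illustrative, not from any particular library.
def holm(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # increasing p-values
    reject = [False] * m
    for step, i in enumerate(order):
        # Compare the (step+1)-th smallest p-value with alpha/(m - step),
        # i.e. with successively larger fractions of alpha.
        if p_values[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break  # first non-rejection: retain this and all larger p-values
    return reject

print(holm([0.011, 0.021, 0.060]))  # [True, True, False]
```

Note that the first comparison, against α/m, is exactly the Bonferroni threshold; Holm’s gain comes from the relaxed thresholds at the later steps.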
A modern review of this subject, by Dmitrienko et al. (2010), covers not only basic/traditional
approaches but also novel ones. In connection with the multiplicity problem we also touched
upon one of the present hypes in medical statistics: adaptive designs. We will say no more
about them here, and refer the reader who wants to learn more to the vast literature on the
subject (there are also plenty of conferences he or she can go to). Both Whitehead (1997)
and Chang (2008) offer useful starting points. The illustration in Example 1.7 is an adaptation
of the main result in Calverley et al. (2007).
References
Calverley, P.M., Anderson, J.A., Celli, B., Ferguson, G.T., Jenkins, C., Jones, P.W., Yates, J.C. and
Vestbo, J. (2007) Salmeterol and fluticasone propionate and survival in chronic obstructive pulmonary
disease. New England Journal of Medicine, 356(8), 775–789.
Chang, M. (2008) Adaptive Design Theory and Implementation Using SAS and R, CRC Biostatistics
Series. Boca Raton, FL: Chapman & Hall/CRC.
Dmitrienko, A., Tamhane, A.C. and Bretz, F. (2010) Multiple Testing Problems in Pharmaceutical
Statistics, CRC Biostatistics Series. Boca Raton, FL: Chapman & Hall/CRC.
Feynman, R.P. and Leighton, R. (1992) Surely You’re Joking, Mr Feynman! Adventures of a Curious
Character. London: Vintage.