
312 HAZARDS AND CENSORED DATA
From this we deduce the classical Greenwood formula approximation to the variance
for $F_n(t)$, which is $V(F_n(t)) \approx F_n^c(t)^2 \sigma_n^2(t)$. When there are no censored data we have that
$r_{j+1} = r_j - d_j$, and this variance expression becomes $F_n(t)(1 - F_n(t))/n$.
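The reduction to the binomial variance can be checked numerically. The following is a minimal sketch, not the book's own code. It assumes the usual Greenwood sum, $\sigma_n^2(t) = \sum_{t_j \le t} d_j / (r_j (r_j - d_j))$, where $r_j$ is the number at risk just before the event time $t_j$ and $d_j$ the number of events at $t_j$ (the definition itself lies outside this passage). The function name `km_greenwood` and the sample data are hypothetical.

```python
from fractions import Fraction


def km_greenwood(times, events):
    """Kaplan-Meier survival estimate with Greenwood variance.

    times  -- observed times (event or censoring time)
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (t_j, survival, variance) for each distinct event time.
    """
    data = sorted(zip(times, events))
    n = len(data)
    at_risk = n            # r_j: number still under observation just before t_j
    surv = Fraction(1)     # running product, the estimate of F^c(t)
    sigma2 = Fraction(0)   # running sum, the assumed sigma_n^2(t)
    rows = []
    i = 0
    while i < n:
        t = data[i][0]
        d = 0              # d_j: events observed at time t
        m = 0              # all observations (events and censorings) at time t
        while i < n and data[i][0] == t:
            d += data[i][1]
            m += 1
            i += 1
        if d > 0:
            surv *= Fraction(at_risk - d, at_risk)
            if at_risk > d:
                sigma2 += Fraction(d, at_risk * (at_risk - d))
                rows.append((t, surv, surv**2 * sigma2))
            else:
                # Everyone left at risk had the event: the estimate is 0
                # and the variance approximation is taken to be 0.
                rows.append((t, surv, Fraction(0)))
        at_risk -= m
    return rows


# Hypothetical uncensored sample: the variance should equal F_n (1 - F_n) / n.
times = [2, 3, 3, 5, 8, 9]
for t, s, v in km_greenwood(times, [1] * len(times)):
    assert v == (1 - s) * s / len(times)
```

Exact fractions are used so that the identity holds without rounding error. With censored observations the same function still returns the Greenwood approximation, but the simple binomial form no longer applies.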
When we make inference about $F(t)$ from the Kaplan–Meier estimator $F_n(t)$, we can use
large-sample theory, which is outlined in Appendix 11.A, to deduce that
$$\frac{(F_n^c(t) - F^c(t))^2}{F^c(t)^2 \, \sigma_n^2(t)} \in \mathrm{As}\chi^2(1).$$
Once we have this, we can do the same thing as we did when we described $F(t)$ from complete
data, using the standard e-CDF. From this observation we can obtain confidence regions for
$F(t)$ as well as confidence intervals for percentiles, using the methods discussed in Chapter 6.
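One way to turn the asymptotic $\chi^2(1)$ statement into an interval is to collect all values of $F^c(t)$ for which the statistic stays below the chosen quantile. Writing $z^2$ for that quantile, the inequality $(F_n^c(t) - s)^2 \le z^2 \sigma_n^2(t) s^2$ gives $F_n^c(t)/(1 + z\sigma_n(t)) \le s \le F_n^c(t)/(1 - z\sigma_n(t))$ when $z\sigma_n(t) < 1$. The sketch below illustrates this under those assumptions and is not necessarily the procedure of Chapter 6. The function name `cdf_confidence_interval` and the numerical inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def cdf_confidence_interval(surv_hat, sigma2, level=0.95):
    """Approximate confidence interval for F(t) = 1 - F^c(t).

    surv_hat -- Kaplan-Meier estimate of F^c(t), assumed to lie in (0, 1]
    sigma2   -- the value of sigma_n^2(t) at the same t
    """
    # The chi-square(1) quantile is the square of a standard normal quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    h = z * sqrt(sigma2)
    # Solve (surv_hat - s)^2 <= z^2 * sigma2 * s^2 for the true survival s.
    s_low = surv_hat / (1 + h)
    s_high = surv_hat / (1 - h) if h < 1 else 1.0
    s_high = min(s_high, 1.0)   # a survival probability cannot exceed 1
    return 1 - s_high, 1 - s_low


# Hypothetical values: estimated survival 0.5 with sigma_n^2(t) = 0.01.
low, high = cdf_confidence_interval(0.5, 0.01)
```

Because the true $F^c(t)$ appears in the denominator of the statistic, the resulting interval is not symmetric around the estimate. When $z\sigma_n(t) \ge 1$ the inequality places no upper bound on $F^c(t)$, so the sketch falls back to the trivial bound 1.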
11.9 Comments and further reading
A short introduction to what makes survival data special is given by Hougaard (1999), which
expands on some of the points made above. When it comes to textbooks on the analysis of this
kind of data, there is a choice available for almost any taste – theoretical, example-driven, for
dummies, etc. – and new books appear more or less every year. Some (historically) important
ones can be found among the references below.
The problem of competing risks is a controversial subject (Hougaard, 2000, Chapter 12).
The basic problem with analyzing a particular cause of death in the presence of competing
causes is that there is not sufficient information available to settle the problem, which is why we
need to make additional assumptions. This creates a philosophical dilemma regarding what can
legitimately be done: can we think about cause-specific survival at all, using models under
which it is identifiable (for example by assuming independence, or by using so-called copulas),
or should we restrict ourselves to what we can actually observe? Examples of advocates for
these two positions are Zheng and Klein (1994) and Prentice et al. (1978), respectively.
The paper that started it all, by Daniel Bernoulli, has been republished (Bernoulli, 2004)
in a review which starts with the following quote from the author in 1760: ‘I simply wish
that, in a matter which so closely concerns the well-being of the human race, no decision
shall be made without all the knowledge which a little analysis and calculation can provide.’
The data he used (Halley, 1693) can, at the time of writing, be found on the internet at
http://www.pierre-marteau.com/editions/1693-mortality.html. We should note that Bernoulli
added a ‘guesstimate’ to this table for the number of births.
A more detailed discussion on the smallpox model, its history and some generalizations
can be found in Dietz and Heesterbeek (2002), including a discussion on a dispute Bernoulli
had with the French mathematician d’Alembert. The latter advocated a simpler competing
risk model, in which you either die from smallpox or not. The model he suggested was the
general approach to competing risks we use today, whereas Bernoulli’s model is restricted
to infection-type diseases. Modeling competing risks is a first step toward the wider subject
of multi-state models, to which Andersen and Keiding (2002) provide an introduction, and
about which most reasonably advanced textbooks have something to say.
An introduction to the frailty problem in survival analysis is given by Aalen (1994),
whereas a more extensive discussion is found in the book by Hougaard (2000), who suggests
using a three-parameter family of distributions to describe frailty, as an alternative to the
gamma distribution.