coincide with the original CDF for x>0.5. If we computed summary statistics on such data
we would get a mean that is larger than the true mean. To see how much, recall (see page 156)
that the mean value can be visualized as the area above the CDF curve (up to the level one).
The bias introduced when we use truncated data is therefore given by the gray area in the
graph. To reduce this bias, it is common practice to impute the value LOQ/2 for data below
the LOQ. In our case such imputation would be rather accurate in terms of mean estimation.
In fact, the (white) area above the curve in the small rectangle with base (0.25, 0.5) is about
the same size as the gray area under the curve in the interval (0, 0.25), so these areas more or
less cancel each other out.
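The effect described above can be illustrated with a small simulation. This is a minimal sketch, not the book's own example: the lognormal distribution, the sample size, and the LOQ of 0.5 are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed example: true concentrations from a lognormal distribution,
# with a limit of quantification (LOQ) at 0.5.
LOQ = 0.5
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

true_mean = x.mean()

# Truncation at the LOQ: values below it are effectively recorded at the
# limit itself, which biases the mean upward (the gray area in the graph).
truncated = np.where(x < LOQ, LOQ, x)

# Common practice: impute LOQ/2 for observations below the LOQ.
imputed = np.where(x < LOQ, LOQ / 2, x)

print(f"true mean     : {true_mean:.3f}")
print(f"LOQ-truncated : {truncated.mean():.3f}")  # biased upward
print(f"LOQ/2 imputed : {imputed.mean():.3f}")    # much closer to the truth
```

Under these assumptions the LOQ/2 imputation recovers the mean quite well, mirroring the cancellation of areas discussed above.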
Data below the LOQ are really missing data, the curse of which was addressed in Sec-
tion 3.7. When we replace such data with LOQ/2 in descriptive statistics, we make an impu-
tation for missing values. However, in a statistical analysis we may not need to use imputed
data. If our choice of analysis is to do a rank test, what value we impute does not matter (as
long as the same value, itself below the LOQ, is imputed for all individuals). If we use a
parametric model we can often avoid imputation altogether by using a likelihood method.
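As a sketch of how a likelihood method sidesteps imputation, consider left-censored data under an assumed normal model: quantified observations contribute the density, while observations below the LOQ contribute only the probability of falling below it. The normal model, the LOQ of 1.0, and the simulated parameters are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)

# Assumed data: normal measurements, left-censored at an LOQ of 1.0.
mu_true, sigma_true, LOQ = 2.0, 1.0, 1.0
x = rng.normal(mu_true, sigma_true, size=5_000)
observed = x[x >= LOQ]    # quantified values
n_cens = np.sum(x < LOQ)  # for these, only the count is known

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # parameterize to keep sigma positive
    # Observed values contribute log f(x); each censored observation
    # contributes log F(LOQ) -- no imputed value is needed.
    ll = norm.logpdf(observed, mu, sigma).sum()
    ll += n_cens * norm.logcdf(LOQ, mu, sigma)
    return -ll

fit = minimize(neg_log_lik, x0=[observed.mean(), 0.0])
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {sigma_hat:.2f}")
```

Note that the naive mean of the quantified values alone would overestimate μ, whereas the censored likelihood uses all the information actually available.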
Censored data are most common in the time-to-event context. Right-censored data are
particularly relevant in that context; we may follow some patients after initiation of a new
treatment to see how long they live, but since the trial itself is usually restricted in time, we
may not be able to follow them all to their death. Instead we stop the trial at some specific
point in time, and for those still alive at that point, we only know that the survival time is
longer than the time we have observed them, not the exact value. There are a number of other
reasons why follow-up of a subject may cease before the event of interest has occurred. As
we will discuss in some detail later, for a proper analysis to be conducted, the reason why the
event has not been observed in an individual must be independent of that individual's underlying risk for the event (non-informative censoring); otherwise the interpretation of the results becomes problematic, because a systematic withdrawal of either high- or low-risk patients will bias the results.
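The structure of right-censored trial data can be sketched as a pair (follow-up time, event indicator). In this illustrative simulation (the exponential survival times and the 12-month trial duration are assumptions, not from the text), treating censored follow-up times as if they were death times visibly underestimates survival.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed trial: exponential survival times with true mean 10 months,
# administratively right-censored when the trial ends at 12 months.
true_times = rng.exponential(scale=10.0, size=10_000)
trial_end = 12.0

time = np.minimum(true_times, trial_end)  # follow-up actually observed
event = true_times <= trial_end           # True = death observed, False = censored

print(f"censored fraction : {1 - event.mean():.2f}")
# Naively averaging the observed follow-up times is biased downward,
# since censored subjects survived *longer* than their recorded time:
print(f"naive mean        : {time.mean():.2f}  (true mean is 10)")
```

Proper time-to-event methods use the event indicator so that censored subjects contribute exactly the information we have about them: survival beyond their observed time.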
Left-censored data may also appear in this context, but not for randomized studies in which
the ‘clock starts’ at the point of randomization. In observational studies, however, when we
analyze the natural history of a disease, we might be interested in having birth as the origin
of time, and, depending on how subjects present themselves in the study, we may have left-
censored data; we may know that an event has happened prior to the first investigation, but
not when.
Interval-censored data also occur in some clinical trials. We may want to determine the
time to a particular event, but the determination of whether or not this event has occurred
can only be made at visits to the clinic, where the appropriate measurement can be obtained.
When to schedule visits to the clinic is defined in the protocol, and we may for a particular
individual only know that the event had not occurred at visit 3 after 23 days of treatment,
but had occurred at visit 4 after 56 days of treatment. This means that we know it occurred
somewhere in the time interval (23, 56) days, but not on which day.
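A likelihood treatment of such data is straightforward to sketch: each interval-censored record contributes P(left < T ≤ right) = F(right) − F(left). The exponential model and the invented visit intervals below (apart from the (23, 56) interval from the text) are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import expon

# Assumed interval-censored records (days): the event is only known to
# have occurred between two clinic visits, e.g. (23, 56) as in the text.
intervals = np.array([[23, 56], [10, 30], [40, 90], [5, 23], [30, 56]])

def neg_log_lik(scale):
    # Each record contributes log[F(right) - F(left)] under an
    # assumed exponential time-to-event model with the given mean.
    lo = expon.cdf(intervals[:, 0], scale=scale)
    hi = expon.cdf(intervals[:, 1], scale=scale)
    return -np.log(hi - lo).sum()

fit = minimize_scalar(neg_log_lik, bounds=(1, 500), method="bounded")
print(f"estimated mean time to event: {fit.x:.1f} days")
```

No single event day is ever imputed; the likelihood works directly with the interval probabilities.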
11.3 Hazard models from a population perspective
To describe data for events that occur with a particular intensity, we first need to define
what we mean by an intensity, or hazard (these two words will be used interchangeably;