influenced by the heterogeneity. This explains why, for the second Hodgkin’s lymphoma study
in Section 4.5.1, we obtained a smaller estimate from the first analysis than from the second.
The relation is $1.47 = 2.14^{\nu}$, from which we get an indication of the heterogeneity by solving
for ν. However, there is no concomitant increase in precision: the 90% confidence interval
for ψ is (0.88, 2.46), whereas that for θ from the matched analysis is (1.03, 4.47) (derived
from the Wilson interval for a single binomial parameter by transforming to the odds). This
is consistent with the discussion above, and also reflects the fact that fewer observations are
used in the second analysis.
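As a small worked illustration, the relation between the two odds-ratio estimates can be solved for ν by taking logarithms. The sketch below uses only the two point estimates quoted above; the variable names are chosen here for illustration and do not come from the text.

```python
import math

# Odds-ratio point estimates quoted in the text.
psi_hat = 1.47    # estimate of psi (first analysis)
theta_hat = 2.14  # estimate of theta (matched analysis)

# psi = theta**nu  implies  nu = log(psi) / log(theta).
nu_hat = math.log(psi_hat) / math.log(theta_hat)

print(f"nu is approximately {nu_hat:.2f}")  # prints: nu is approximately 0.51
```

This is a point calculation only; it says nothing about the uncertainty in ν, which inherits the wide intervals of both estimates.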
The difference between $\beta$ and $\beta^*$ may be considered some kind of misspecification bias,
because we analyze the wrong model. This may not always be a good use of the word ‘bias’,
because it really only reflects the fact that the overall effect seen in a population depends on
how much heterogeneity is left unaccounted for. The more predictive covariates we include,
the smaller is this residual heterogeneity. The general observation is that the larger the
heterogeneity, the smaller is the effect we see in the population (we have $|\beta^*| \le |\beta|$ in the notation
above, with the difference increasing with the heterogeneity). The exceptions here are the
identity and exponential response functions. A simple regression model in a heterogeneous
environment provides estimates, for example treatment estimates, that may be reasonably
accurate from a population perspective, but wrong when interpreted as individual effects. This
distinction between the population perspective and the individual perspective will play a major
role in our discussion on survival data and the Cox model later. It is also further discussed in
the next chapter, where we discuss the difference between a subject-specific and a population
averaged approach to the description of dose response.
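The attenuation of the population effect, and the exception for the exponential response function, can be illustrated numerically. The sketch below assumes, purely for illustration, that the unaccounted heterogeneity enters as a normally distributed term Z added to the linear predictor; the coefficient values, the normal distribution, and the helper names are assumptions made here, not taken from the text. The population-level response is obtained by averaging the individual response over Z with a simple trapezoidal rule.

```python
import math

def expit(t):
    """Logistic response function."""
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_mean(response, eta, sigma, n=4001, width=8.0):
    """E[response(eta + Z)] for Z ~ N(0, sigma^2), by the trapezoidal rule."""
    if sigma == 0:
        return response(eta)
    lo = -width * sigma
    h = 2.0 * width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        z = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        density = math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * response(eta + z) * density
    return total * h

alpha, beta = -0.5, 1.0    # hypothetical individual-level coefficients
sigmas = (0.0, 1.0, 2.0)   # hypothetical amounts of heterogeneity

beta_star_logit = []  # population effect on the logit scale
beta_star_exp = []    # population effect on the log scale

for sigma in sigmas:
    # Logistic response: compare population log odds at x = 1 and x = 0.
    p0 = marginal_mean(expit, alpha, sigma)
    p1 = marginal_mean(expit, alpha + beta, sigma)
    beta_star_logit.append(logit(p1) - logit(p0))

    # Exponential response: compare population log means at x = 1 and x = 0.
    m0 = marginal_mean(math.exp, alpha, sigma)
    m1 = marginal_mean(math.exp, alpha + beta, sigma)
    beta_star_exp.append(math.log(m1) - math.log(m0))

for sigma, b_logit, b_exp in zip(sigmas, beta_star_logit, beta_star_exp):
    print(f"sigma={sigma}: logistic beta*={b_logit:.3f}, exponential beta*={b_exp:.3f}")
```

Under these assumptions the logistic population effect shrinks toward zero as sigma grows, whereas for the exponential response the heterogeneity only rescales the mean by a constant factor, so the population effect stays equal to the individual one.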
We may also note that, because the variance can always be decomposed as
$V(Y) = E(V(Y \mid Z)) + V(E(Y \mid Z))$ (applied to the conditional distribution of Y
given that X = x), the (conditional) variance of Y always increases with omitted covariates,
including the otherwise harmless case with the identity response function. However, this does
not imply that the precision of the estimated regression coefficients has to increase when we
include more predictive covariates, if the coefficients thereby take on a new meaning.
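The variance decomposition can be checked exactly on a toy example. The sketch below assumes a hypothetical binary omitted covariate Z and a binary outcome Y at some fixed X = x; all probabilities are invented for illustration. Exact fractions are used so that the identity holds without rounding error.

```python
from fractions import Fraction as F

# Hypothetical distribution of the omitted covariate Z (at a fixed X = x).
p_z = {0: F(1, 2), 1: F(1, 2)}            # P(Z = z)
# Hypothetical conditional outcome probabilities.
p_y1_given_z = {0: F(1, 5), 1: F(3, 5)}   # P(Y = 1 | Z = z)

# For a binary Y: E(Y | Z = z) = p and V(Y | Z = z) = p (1 - p).
cond_mean = p_y1_given_z
cond_var = {z: p * (1 - p) for z, p in p_y1_given_z.items()}

mean_y = sum(p_z[z] * cond_mean[z] for z in p_z)                    # E(Y)
within = sum(p_z[z] * cond_var[z] for z in p_z)                     # E(V(Y | Z))
between = sum(p_z[z] * (cond_mean[z] - mean_y) ** 2 for z in p_z)   # V(E(Y | Z))
total = mean_y * (1 - mean_y)                                       # V(Y), Y is Bernoulli

assert total == within + between
print(total, within, between)  # prints: 6/25 1/5 1/25
```

Here the variance of Y with Z omitted (6/25) exceeds the average variance given Z (1/5) by exactly the variance of the conditional mean (1/25).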
9.7 The exponential family of distributions
The rest of this chapter is more mathematical in character. It is about a particular drive in
mathematics – the wish to generalize and systematize, to see what is common to a number
of particular cases and find a general formulation which treats these as special cases.
We seek a general theory, including all the proofs necessary, which we can apply to the
particular cases, without the need for individual proofs. This is something that appeals to
mathematicians, and statistics is a subdiscipline of mathematics. We therefore wish to take
this opportunity to formulate as part of a general framework the regression theories so far
encountered. This will give us the tools to find variations of these, applicable to specific
problems (not that we will make much use of these tools, but at least we will be able to if
we wish).
We will use the following definition. A distribution is said to belong to the exponential
family of distributions if its CDF can be written in the form
$$dF_\phi(x, \theta) = e^{(x\theta - \kappa(\theta))/\phi}\, dF_\phi(x) \tag{9.5}$$