
214 HOW TO COMPARE THE OUTCOME IN TWO GROUPS
Box 8.2 Analysis of a percentile difference for two groups
Here we outline how we can obtain confidence claims for the difference for a particular
percentile for two independent distributions $F(x)$ and $G(x)$. Consider any particular
percentile, $x_p$, of $F(x)$ (so that $F(x_p) = p$). Let the corresponding percentile for $G(x)$
be written $x_p + \theta$, so that $G(x_p + \theta) = p$. In order to obtain knowledge about $\theta$ we use
the approximation
\[
\frac{(F_n(x_p) - p)^2}{V(F_n(x_p))} \in \chi^2(1), \quad \text{where } V(F_n(x_p)) = p(1 - p)/n,
\]
for $F(x)$, and similarly for $G(x)$. Since the test statistics for $F(x_p)$ and $G(x_p + \theta)$ are
independent, we can sum two quadratic forms to get
\[
\frac{(F_n(x_p) - p)^2}{V(F_n(x_p))} + \frac{(G_m(x_p + \theta) - p)^2}{V(G_m(x_p + \theta))} \in \chi^2(2).
\]
From this we derive a confidence function for $(x_p, \theta)$, and in order to obtain knowledge
about $\theta$ we now profile $x_p$ out of this, as in Section 7.7. This means that for given $\theta$
we estimate $x_p = x_p(\theta)$ by minimization, which gives us a function $P_{mn}(\theta)$ of $\theta$ alone.
The end result is the two-sided confidence function $C(\theta) = \chi_1(P_{mn}(\theta))$, from which we
can obtain asymptotically correct knowledge about $\theta$. Properly modified, this approach
also allows us to compare the two percentiles in other ways, such as their ratio.
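The profiling step in the box can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation behind the numbers quoted in this chapter: the function names are hypothetical, the empirical distribution functions are taken as right-continuous step functions, the minimization over x_p is done by trying every point where either step function can change, and χ_1 is read as the distribution function of χ²(1).

```python
import math
from bisect import bisect_right


def ecdf(sorted_sample, t):
    """Empirical distribution function: fraction of observations <= t."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)


def profile_statistic(x, y, p, theta):
    """P_mn(theta): the summed quadratic form, minimized over x_p."""
    xs = sorted(x)
    # G_m(t + theta) equals the ECDF of the shifted sample y - theta at t.
    ys_shifted = sorted(v - theta for v in y)
    var_f = p * (1 - p) / len(xs)
    var_g = p * (1 - p) / len(ys_shifted)
    # Both step functions change only at observed points, so it suffices
    # to try those points plus one value below all of them.
    candidates = xs + ys_shifted
    candidates.append(min(candidates) - 1.0)
    return min(
        (ecdf(xs, t) - p) ** 2 / var_f + (ecdf(ys_shifted, t) - p) ** 2 / var_g
        for t in candidates
    )


def confidence(x, y, p, theta):
    """C(theta), with chi_1 assumed to be the chi-square(1) CDF."""
    # For one degree of freedom the CDF is erf(sqrt(q / 2)).
    return math.erf(math.sqrt(profile_statistic(x, y, p, theta) / 2.0))


def confidence_interval(x, y, p, level, thetas):
    """Range of grid values theta with C(theta) <= level, or None if empty."""
    inside = [th for th in thetas if confidence(x, y, p, th) <= level]
    return (min(inside), max(inside)) if inside else None


if __name__ == "__main__":
    # Made-up data: the second sample is the first shifted by 2.
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = [v + 2 for v in x]
    grid = [k / 10 for k in range(-100, 141)]
    print(confidence(x, y, 0.5, 2.0))
    print(confidence_interval(x, y, 0.5, 0.95, grid))
```

The grid scan reports only the smallest and largest grid value with confidence at most the chosen level. Because the profiled function is a step function in θ, the result depends on how fine the grid is.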
a priori distribution for the mean difference, but we will follow the biostatistical tradition and
not discuss this any further.
In order to apply the discussion above to the data in Figure 8.1 we estimate the group
mean values of the logarithmic data to be 4.99 and 3.00, respectively, which gives us a mean
difference of 1.99 with 95% confidence interval (0.80, 3.17). This is the (estimated) size of
the shift in Figure 8.1. However, to get the shift back to the original measurement scale we
back-transform by exponentiation. This gives us a ratio of geometric means for treated versus
placebo of $e^{-1.99}$, which is 14%, with 95% confidence interval (4.2, 45)%.
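The back-transformation can be checked with a short Python sketch. The function name is hypothetical. The inputs are the log-scale mean difference and interval limits quoted above. Because the ratio is for treated versus placebo, the sign of the difference is reversed before exponentiating, which also swaps the interval limits.

```python
import math


def ratio_from_log_difference(diff, lower, upper):
    """Back-transform a log-scale difference d and its interval to exp(-d).

    Negating the difference reverses the interval, so the upper log-scale
    limit gives the lower limit of the ratio.
    """
    return math.exp(-diff), math.exp(-upper), math.exp(-lower)


ratio, ratio_low, ratio_high = ratio_from_log_difference(1.99, 0.80, 3.17)
# prints: 14% (4.2%, 45%)
print(f"{100 * ratio:.0f}% ({100 * ratio_low:.1f}%, {100 * ratio_high:.0f}%)")
```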
Next we consider the median. A key difference between the mean and the median is that
whereas the difference of two means is the mean of the difference, this need not be true for
medians. So the approach is more complicated, building on ideas used in Section 7.7 when
we analyzed two independent binomial parameters. An outline is given in Box 8.2. When
we carry out this analysis for our sputum data on the log scale, we get an estimated median
difference of 1.83 with 95% confidence interval (0.26, 3.39), which we can back-transform
to a statement about the ratio of the medians for the two distributions as being 16% with 95%
confidence interval (3.4, 77)%. For these data there is therefore a close agreement between
the mean and median estimates of a possible horizontal shift. The confidence interval for the
median is wider than that for the mean, because the mean value analysis uses more of the
information in the data than the median analysis does.
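A small numerical sketch shows why medians need separate treatment. The values below are made up and are not the sputum data. For two independent groups every observation in one group can be paired with every observation in the other. The mean of all these pairwise differences always equals the difference of the group means, but the same need not hold for medians.

```python
from statistics import mean, median

# Made-up values for two independent groups (not the sputum data).
x = [1, 2, 9]
y = [0, 4, 5]

# With independent groups, every x can be paired with every y.
pairwise = [a - b for a in x for b in y]

mean_of_differences = mean(pairwise)
difference_of_means = mean(x) - mean(y)

median_of_differences = median(pairwise)
difference_of_medians = median(x) - median(y)

print(mean_of_differences, difference_of_means)
print(median_of_differences, difference_of_medians)
```

In this example the two mean-based quantities coincide, whereas the median of the pairwise differences and the difference of the medians do not.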
Now we return to the original cell count scale, rather than the log values. For the mean
values we have the estimates 752 and 93 for the two groups, giving a mean difference estimate
of 659 with 95% confidence interval (−34, 1351). In this computation we have assumed equal