
Example 10.16 As an example of the bootstrap for two samples, consider data from a study of
children talking to themselves (private speech), introduced in Example 1.2. The
children were each observed in many 10-s intervals (about 100) and the researchers
computed the percentage of intervals in which private speech occurred. Because
private speech tends to occur when there is a challenging task, the students were
observed when they were doing arithmetic. The private speech is classified as on
task if it is about arithmetic, off task if it is about something else, and mumbling if
the subject is not clear.
Each child was observed in the first, second, and third grades, but we will
consider here just the first grade off-task private speech. For the 18 boys and
15 girls here are the percentages:
B: 4.9, 5.5, 6.5, 0.0, 0.0, 3.0, 2.8, 6.4, 1.0, 0.9, 0.0, 28.1, 8.7, 1.6, 5.1, 17.0, 4.7, 28.1
G: 0.0, 1.3, 2.2, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 3.9, 0.0, 10.1, 5.2, 3.2, 0.0.
With the large number of zeroes, a majority for the girls, the normality assumption
of Section 10.2 does not apply here. Also, the sample sizes for the two groups are
not very large, so the two-sample z methods of Section 10.1 might not work for this
data set. Nevertheless, it is useful to give the t CI for comparison purposes. The
95% interval is
\[
\begin{aligned}
\bar{x} - \bar{y} \pm t_{.025,\nu}\sqrt{\frac{s_1^2}{18} + \frac{s_2^2}{15}}
&= 6.906 - 1.813 \pm 2.080\sqrt{\frac{8.719^2}{18} + \frac{2.846^2}{15}} \\
&= 5.093 \pm 2.080(2.1825) = 5.093 \pm 4.540 = (.55,\ 9.63)
\end{aligned}
\]
The degrees of freedom $\nu = 21$ come from the messy formula in the theorem of
Section 10.2. The confidence interval does not include 0, which implies that we would
reject the hypothesis $\mu_1 = \mu_2$ against a two-tailed alternative at the .05 level. This is in
agreement with what we get in testing this hypothesis directly: t = 2.33, P-value .030.
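The t interval and test statistic can be checked numerically from the raw percentages. The following is a minimal sketch in Python using only the standard library. It assumes that the "messy formula" of Section 10.2 is the usual Welch-Satterthwaite approximation for the degrees of freedom, and it takes the critical value 2.080 from the text rather than computing it. The function name `welch_summary` is hypothetical.

```python
import math
import statistics

# First-grade off-task private speech percentages, as listed in the example.
boys = [4.9, 5.5, 6.5, 0.0, 0.0, 3.0, 2.8, 6.4, 1.0, 0.9, 0.0, 28.1,
        8.7, 1.6, 5.1, 17.0, 4.7, 28.1]
girls = [0.0, 1.3, 2.2, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 3.9, 0.0, 10.1,
         5.2, 3.2, 0.0]


def welch_summary(x, y):
    """Return (difference of means, standard error, approximate df)."""
    vx = statistics.variance(x) / len(x)  # s1^2 / n1
    vy = statistics.variance(y) / len(y)  # s2^2 / n2
    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation (assumed to be the Section 10.2 formula).
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return diff, se, df


diff, se, df = welch_summary(boys, girls)
T_CRIT = 2.080  # t_{.025,21}, copied from the text, not computed here
lower, upper = diff - T_CRIT * se, diff + T_CRIT * se
t_stat = diff / se

print(f"difference = {diff:.3f}, SE = {se:.4f}, df = {df:.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f}), t = {t_stat:.2f}")
```

The computed degrees of freedom are not an integer; rounding down gives the 21 used for the critical value.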
The t method is of questionable validity, because of sample sizes that might not
be enough to compensate for the nonnormality. The bootstrap method involves
drawing a random sample of size 18 with replacement from the 18 boys, drawing a
random sample of size 15 with replacement from the 15 girls, and calculating the
difference of means. Then this process is repeated to give a total of 999 differences of
means. The distribution of these 999 differences of means is the bootstrap distribution.
To help clarify the procedure, here are random samples from the boys and girls:
B: 0.0, 3.0, 2.8, 0.9, 3.0, 0.0, 0.0, 6.5, 6.4, 8.7, 6.4, 1.0, 0.9, 5.5, 17.0, 17.0, 0.0, 3.0
G: 1.3, 0.0, 0.0, 0.0, 0.0, 1.3, 1.3, 0.0, 3.2, 0.0, 1.3, 5.2, 0.0, 0.0, 0.0.
Of course, in sampling with replacement some values will occur more than once
and some will not occur at all. For these two samples, the difference of means is
4.56 - .91 = 3.65. Doing this 999 times (using the R package boot) gives the
bootstrap distribution displayed in Figure 10.12.
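The resampling procedure just described can be sketched in a few lines. The text used the R package boot; the version below is a stand-in written with Python's standard library, not the authors' code. The function name `bootstrap_diff_means` and the seed are hypothetical choices, so the resamples it draws will differ from those behind Figure 10.12. It also recomputes the difference of means for the single resample shown above.

```python
import random
import statistics

# Original samples from the example.
boys = [4.9, 5.5, 6.5, 0.0, 0.0, 3.0, 2.8, 6.4, 1.0, 0.9, 0.0, 28.1,
        8.7, 1.6, 5.1, 17.0, 4.7, 28.1]
girls = [0.0, 1.3, 2.2, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 3.9, 0.0, 10.1,
         5.2, 3.2, 0.0]

# The one pair of resamples printed in the text.
boys_star = [0.0, 3.0, 2.8, 0.9, 3.0, 0.0, 0.0, 6.5, 6.4, 8.7, 6.4, 1.0,
             0.9, 5.5, 17.0, 17.0, 0.0, 3.0]
girls_star = [1.3, 0.0, 0.0, 0.0, 0.0, 1.3, 1.3, 0.0, 3.2, 0.0, 1.3, 5.2,
              0.0, 0.0, 0.0]


def bootstrap_diff_means(x, y, n_boot=999, seed=0):
    """Return n_boot bootstrap differences of means, mean(x*) - mean(y*)."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    diffs = []
    for _ in range(n_boot):
        x_star = rng.choices(x, k=len(x))  # size n1, with replacement
        y_star = rng.choices(y, k=len(y))  # size n2, with replacement
        diffs.append(statistics.mean(x_star) - statistics.mean(y_star))
    return diffs


example_diff = statistics.mean(boys_star) - statistics.mean(girls_star)
diffs = bootstrap_diff_means(boys, girls)

print(f"difference for the resample in the text: {example_diff:.2f}")
print(f"{len(diffs)} bootstrap differences, "
      f"mean {statistics.mean(diffs):.2f}, SD {statistics.stdev(diffs):.2f}")
```

Each group is resampled separately and at its own sample size, so every bootstrap difference is built from 18 boys' values and 15 girls' values, just as the observed difference was.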
The distribution looks almost normal, but with some positive skewness. The
idea of the bootstrap, with its resamples drawn from the original samples of boys and
girls, is for this histogram to resemble the true sampling distribution of the
difference of sample means. If the original samples of boys and girls are
representative of their populations, then our histogram should be a reasonable
imitation of that sampling distribution.
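The mild positive skewness can be quantified without any simulation. When resamples are drawn independently from the two observed samples, the variance and third central moment of the difference of resample means follow directly from the moments of the original data. The sketch below (the function names are hypothetical) computes the skewness that a histogram would approach as the number of resamples grows. A histogram from only 999 resamples will vary somewhat around this value.

```python
import statistics

boys = [4.9, 5.5, 6.5, 0.0, 0.0, 3.0, 2.8, 6.4, 1.0, 0.9, 0.0, 28.1,
        8.7, 1.6, 5.1, 17.0, 4.7, 28.1]
girls = [0.0, 1.3, 2.2, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 3.9, 0.0, 10.1,
         5.2, 3.2, 0.0]


def central_moment(data, k):
    """k-th central moment of the empirical distribution (divisor n)."""
    m = statistics.fmean(data)
    return sum((v - m) ** k for v in data) / len(data)


def ideal_bootstrap_skewness(x, y):
    """Skewness of mean(x*) - mean(y*) under independent resampling."""
    n1, n2 = len(x), len(y)
    # The variance of a mean of n iid draws is m2 / n; the two groups add.
    var = central_moment(x, 2) / n1 + central_moment(y, 2) / n2
    # The third central moment of a mean of n iid draws is m3 / n^2;
    # the sign flips for the group that is subtracted.
    third = central_moment(x, 3) / n1 ** 2 - central_moment(y, 3) / n2 ** 2
    return third / var ** 1.5


skew = ideal_bootstrap_skewness(boys, girls)
print(f"skewness of the ideal bootstrap distribution: {skew:.2f}")
```

The value is positive but well below the skewness of the boys' raw data, because averaging 18 observations pulls the distribution of the mean toward normality.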