the placebo. The burden of proof is on the alternative hypothesis; a real
difference must exist between the two groups of patients—one that is unlikely
to have occurred by chance alone. To take an absurd case, suppose that only
one patient was in each of the control and the experimental groups; the
patient given the drug reduced his cholesterol by more points than the
patient given the placebo did. Clearly, the difference could easily arise
for chance reasons totally unrelated to the drug. If many people had been
in each group, however, and all in the drug treatment had reduced their
cholesterol levels by more points than all in the placebo
treatment, then we could be very sure that the differences were real (due to
the drug). The alternative hypothesis met the burden of proof; thus, we could
reject the null hypothesis of no difference between the drug and the placebo
treatment. Note that failure to reject the null hypothesis doesn’t mean that we
accept the null hypothesis. In the first case, in which only one person had
been in each treatment group, that difference could have been real. We just
lacked proof that it was. The alternative hypothesis did not meet the burden
of proof.
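The intuition above can be checked with a small Monte Carlo sketch (a Python illustration of my own, not from the text; the function name and the unit-normal "cholesterol change" distribution are assumptions). Under the null hypothesis that the drug does nothing, it estimates how often every drug patient happens to beat every placebo patient by chance alone:

```python
import random

random.seed(1)

def prob_all_drug_beat_all_placebo(n, trials=10_000):
    """Under the null hypothesis (the drug does nothing), estimate the
    probability that every patient in a drug group of size n reduces
    cholesterol more than every patient in a placebo group of size n."""
    hits = 0
    for _ in range(trials):
        drug = [random.gauss(0, 1) for _ in range(n)]
        placebo = [random.gauss(0, 1) for _ in range(n)]
        if min(drug) > max(placebo):
            hits += 1
    return hits / trials

# One patient per group: chance alone produces the pattern about half the time.
print(prob_all_drug_beat_all_placebo(1))   # ~0.5
# Ten per group: the pattern essentially never arises by chance alone.
print(prob_all_drug_beat_all_placebo(10))
```

With one patient per group the "drug wins" pattern is as likely as a coin flip, so it proves nothing; with ten per group it is already so improbable under the null that observing it would meet the burden of proof.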
Biologists, like many other researchers, rely on statistical tests to determine
whether differences between groups represent differences that are unlikely to
have arisen by chance alone. Two types of errors can be made in these tests.
The first is rejecting the null hypothesis—saying that there is a difference
between the groups when, in fact, no real difference exists. Sometimes, these
errors are called “false positives” (note that positive here doesn’t mean posi-
tive selection). The other type of error is not rejecting the null hypothesis
when there really is a difference. These errors are sometimes called “false
negatives.” By convention, scientists are willing to accept a 5 percent rate
of making false positives. That is to say, the convention allows that the null
hypothesis is to be rejected when the probability that the differences among
the groups are due to chance is less than 0.05. (This level of 0.05 is somewhat arbitrary and
can be adjusted.) Differences that are unlikely to come about by chance alone
are sometimes called statistically significant. Differences that are extremely
unlikely to come about by chance are sometimes called highly significant.
Note that for any particular result deemed statistically significant, a small
probability exists that it is a false positive. If many tests are performed,
chances are good that at least one of the statistically significant results is a
false positive. Further testing can mitigate that problem.
An important property of a statistical test is its ability to reject the null
hypothesis—that is, not to make a “false negative” error—at a given level of
making “false positive” errors. This property, known as power, is a measure
of the discriminatory ability of the test, and it generally increases with
sample size. For instance, the likelihood of making a false negative is
lower if one tested many individuals with the placebo and many with the
drug than it would be if one tested only a few from each group. Testing
larger samples allows for greater power. For a given amount of data,
certain tests and experimental designs have more power than do others.
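The effect of sample size on power can be seen in a crude simulation (a Python sketch of my own; the one-sided z-test, the effect size of 0.5, and the unit variances are all illustrative assumptions). It counts how often a real effect is detected at the 0.05 level for small versus large groups:

```python
import random
import statistics

random.seed(2)

def power(n, effect=0.5, alpha_z=1.645, sims=2000):
    """Fraction of simulated experiments that reject the null when a real
    effect of the given size exists (a crude one-sided z-test sketch,
    assuming known unit variance in both groups)."""
    rejections = 0
    for _ in range(sims):
        drug = [random.gauss(effect, 1) for _ in range(n)]
        placebo = [random.gauss(0, 1) for _ in range(n)]
        diff = statistics.mean(drug) - statistics.mean(placebo)
        se = (2 / n) ** 0.5          # standard error of the difference
        if diff / se > alpha_z:      # one-sided test at alpha = 0.05
            rejections += 1
    return rejections / sims

print(power(10))   # low power: the real effect is often missed
print(power(100))  # high power: the same effect is almost always found
```

With 10 patients per group the real effect is missed most of the time (a false negative), while with 100 per group it is detected in nearly every simulated experiment, at the same false-positive rate.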
Negative Selection and the Neutral Theory of Molecular Evolution