
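For example, if the three observations from the treatment sample have ranks 1, 3, and 5, this leads to a value of $T = 9$. The theoretical expectation of $T$ under the hypothesis that no expression difference is present is (compare Eq. (3-71) in Chapter 3)

$$E_{H_0}(T) = \frac{n(n+m+1)}{2} = 10.5,$$

with $n = m = 3$. Thus, the P-value of the observation according to Eq. (9-8) is

$$p = 2 \cdot \frac{1 + 1 + 2 + 3}{20} = 0.7,$$

where the denominator is the number $\binom{6}{3} = 20$ of equally likely rank sets for the treatment sample, and the numerator adds the numbers of rank sets with rank sums 6, 7, 8, and 9, respectively (i.e., $T \le 9$). The observation is therefore not significant, and the null hypothesis cannot be rejected. Note that the distribution of $T$ is symmetric around its expectation value. This example illustrates a disadvantage of the test: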
with small sample sizes, hardly any result can be significant at the 0.05 level. For $n = m = 3$, even the most extreme outcome, $T = 6$, gives only $p = 2 \cdot 1/20 = 0.1$. This is a consequence of the combinatorial nature of the P-value computation, in contrast to t-tests, which assume a theoretical probability distribution. However, the number of possible rank assignments increases rapidly with the sample size. For example, there are 70, 924, and 12,870 possible combinations of rank orderings for sample sizes of $n = m = 4$, 6, and 8, respectively.
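The exact null distribution of the rank sum T can be obtained by brute-force enumeration, which makes the combinatorial argument above concrete. The following Python sketch enumerates all C(n+m, n) equally likely rank sets for the treatment sample. The function names are hypothetical, and the sketch assumes that Eq. (9-8) amounts to doubling the smaller tail probability (capped at 1), which reproduces the value p = 0.7 from the example.

```python
from itertools import combinations
from math import comb


def rank_sum_null_counts(n, m):
    """Count how many treatment rank sets give each rank sum T.

    Under H0, every choice of n ranks out of 1..n+m is equally likely,
    so enumerating all comb(n + m, n) subsets gives the exact null
    distribution of T.
    """
    counts = {}
    for ranks in combinations(range(1, n + m + 1), n):
        t = sum(ranks)
        counts[t] = counts.get(t, 0) + 1
    return counts


def exact_two_sided_p(t_obs, n, m):
    """Exact two-sided P-value: twice the smaller tail, capped at 1.

    Assumption: this doubling rule stands in for Eq. (9-8); it relies on
    the symmetry of T around its expectation n(n+m+1)/2.
    """
    counts = rank_sum_null_counts(n, m)
    total = comb(n + m, n)
    lower = sum(c for t, c in counts.items() if t <= t_obs)
    upper = sum(c for t, c in counts.items() if t >= t_obs)
    return min(1.0, 2 * min(lower, upper) / total)


n, m = 3, 3
t_obs = 1 + 3 + 5                       # treatment ranks 1, 3, 5
print(t_obs, n * (n + m + 1) / 2)       # 9 10.5
print(exact_two_sided_p(t_obs, n, m))   # 0.7
print(exact_two_sided_p(6, n, m))       # 0.1, the smallest attainable value
print([comb(2 * k, k) for k in (4, 6, 8)])  # [70, 924, 12870]
```

Because the number of rank sets grows as a binomial coefficient, brute-force enumeration of this kind is practical only for small samples.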
Example 9.2: Comparison of tests
In a microarray study incorporating approximately 15,000 different cDNAs and four independent hybridization experiments, we investigated the early differentiation event in human blastocysts, i.e., the formation of the trophectoderm and the inner cell mass (ICM).
HMGB1 is a gene of particular interest because it has been published as a potential “stemness” gene in human stem cell lines, i.e., a gene that is relevant for maintaining the pluripotency of cells. HMGB1 is a member of the high-mobility group of transcription factor proteins, which act primarily as architectural facilitators in the assembly of nucleoprotein complexes, e.g., during the initiation of transcription of target genes.
The four measurements for the trophectoderm and the ICM are, respectively:
Trophectoderm: 32,612; 46,741; 29,238; 32,671
ICM: 49,966; 58,037; 94,785; 122,044
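The P-values are 3.7E-02 for Student’s t-test, 6.8E-02 for the Welch test, and 2.9E-02 for the Wilcoxon test. The ANOVA test results in a non-significant P-value. This example shows how a high variance in one group (here, the ICM sample) can mislead the Gaussian-based tests, whereas the rank-based test is fairly robust. Note that ranking separates the groups perfectly: all four trophectoderm values lie below all four ICM values, so the trophectoderm ranks are 1, 2, 3, and 4, and the exact two-sided P-value of 2/70 ≈ 2.9E-02 is the smallest attainable for $n = m = 4$.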
9.2.3 Multiple Testing
The single-gene analysis described above has a major statistical drawback. We cannot view each test in isolation but must take into account that we perform thousands of tests in parallel (one for each gene on the array). Thus, a global significance