A very simple approach would be to calculate the averages of the two series and
compare their ratio. However, this would not allow judging whether the ratio has any
significance. If we make some additional assumptions, we can describe the problem
using an appropriate probability distribution. We regard the two series as realizations of random variables $x_1, \dots, x_n$ and $y_1, \dots, y_m$. Statistical tests typically have two
constraints: it is assumed that (1) repetitions are independent and (2) the random
variables are identically distributed within each sample. Test decisions are based
upon a reasonable test statistic, a real-valued function $T$ of both samples. For specific functions, and under the distributional assumptions, it has been shown that they follow a quantifiable probability law given the null hypothesis $H_0$. Suppose that we observe a value of the test statistic $T(x_1, \dots, x_n, y_1, \dots, y_m) = \hat{t}$. If $T$ can be described with a probability law, we can then judge the significance of the observation by $\mathrm{prob}(T \text{ more extreme than } \hat{t} \mid H_0)$. This probability is called a P-value. Thus, if one assigns a P-value of 0.05 to a certain observation, this means that under the distributional assumptions, the probability of observing an outcome more extreme than the observed one is 0.05 given the null hypothesis. Observations with a small P-value typically give evidence that the null hypothesis should be rejected. This makes it possible to quantify statistically, using a probability distribution, whether the result is significant. In practice, significance levels of 0.01, 0.05, and 0.1 are used as upper bounds for significant results.
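The idea of a P-value can be illustrated with a minimal sketch. Instead of a parametric probability law, the sketch below uses a permutation test, which is a swapped-in technique and not the method described in this section: under the null hypothesis that both series come from the same distribution, the group labels are exchangeable, so the null distribution of $T$ can be approximated by reshuffling the pooled data. The function name `permutation_p_value`, the choice of $T$ as the difference of the two averages, and the sample values are all assumptions made for illustration.

```python
import random


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Estimate prob(|T| at least as extreme as |t_hat| | H0) by permutation.

    T is taken to be the difference of the two sample averages
    (an illustrative choice, not prescribed by the text).
    """
    rng = random.Random(seed)
    n = len(x)
    pooled = list(x) + list(y)

    def statistic(values):
        first, second = values[:n], values[n:]
        return sum(first) / len(first) - sum(second) / len(second)

    t_hat = statistic(pooled)  # observed value of the test statistic
    tol = 1e-12                # guards against floating-point round-off
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)    # relabel the observations at random
        if abs(statistic(pooled)) >= abs(t_hat) - tol:
            extreme += 1
    # The add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical measurement series, invented for this example.
series_x = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
series_y = [12.0, 11.8, 12.3, 12.1, 11.9, 12.2]
p = permutation_p_value(series_x, series_y)
alpha = 0.05  # one of the commonly used significance levels
print("reject H0" if p < alpha else "do not reject H0")
```

The estimate counts outcomes at least as extreme as the observed one, which is the usual convention for permutation tests; comparing the result with a significance level such as 0.05 gives the test decision.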
In such a test setup, two types of error occur: error of the first kind and error of
the second kind.
                              $H_0$ is true                   $H_1$ is true
Test does not reject $H_0$    No error (TN)                   Error of the second kind (FN)
Test rejects $H_0$            Error of the first kind (FP)    No error (TP)
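The four cells of this table can also be read as conditional probabilities. The following sketch writes them out in symbols; $\alpha$ denotes the significance level, while the symbol $\beta$ for the probability of an error of the second kind is a notational assumption introduced here.

```latex
\begin{align*}
\alpha    &= \mathrm{prob}(\text{test rejects } H_0 \mid H_0 \text{ is true})
          && \text{(error of the first kind, FP)} \\
\beta     &= \mathrm{prob}(\text{test does not reject } H_0 \mid H_1 \text{ is true})
          && \text{(error of the second kind, FN)} \\
1 - \beta &= \mathrm{prob}(\text{test rejects } H_0 \mid H_1 \text{ is true})
          && \text{(power, TP)}
\end{align*}
```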
The error of the first kind is the false-positive (FP) rate of the test. Usually, this error can be controlled by the analysis by fixing a significance level $\alpha$ and judging only those results whose P-value is lower than $\alpha$ as significant. The error of the second kind is the false-negative (FN) rate of the test. The power of a test (TP) (given a significance level $\alpha$) is defined as the probability of rejecting $H_0$ across the parameter space that is under consideration. It should be low in the subset of the parameter space that belongs to $H_0$ and high in the subset $H_1$. The quantities $\frac{TP}{TP + FN}$ and $\frac{TN}{FP + TN}$ are called sensitivity and specificity, respectively. An optimal test procedure would yield a value of 1 for both quantities.
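As a small worked example, sensitivity and specificity can be computed directly from the four counts TP, FN, TN, and FP. The sketch below assumes the counts come from repeated applications of a test to cases where the truth is known; the function names and the numbers are hypothetical and serve only to illustrate the two ratios.

```python
def sensitivity(tp, fn):
    """Fraction of true H1 cases that the test rejects: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no cases in which H1 is true")
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true H0 cases that the test does not reject: TN / (FP + TN)."""
    if fp + tn == 0:
        raise ValueError("no cases in which H0 is true")
    return tn / (fp + tn)


# Hypothetical counts, invented for this example:
# 50 cases where H1 is true, 50 cases where H0 is true.
tp, fn = 40, 10
tn, fp = 45, 5
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
```

An optimal test would have FN = 0 and FP = 0, so that both ratios equal 1.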
3.4.3.2 Two-sample Location Tests
Assume that both series are independently Gaussian distributed, $N(\mu_x, \sigma^2)$ and $N(\mu_y, \sigma^2)$ respectively, with equal variances. Thus we interpret each series value $x_i$ as an outcome of independent random variables that are Gaussian distributed with the respective parameters ($y_i$ likewise). We want to test the hypothesis of whether the population means are equal, i.e.,