
304 Measurement and Data Analysis for Engineering and Science
Historically, regression was originally called reversion. Reversion referred
to the tendency of a variable to revert to the average of the population from
which it came. It was Francis Galton who first elucidated the property of
reversion ([14]) by demonstrating how certain characteristics of a progeny
revert more to the population average than to those of the parents. So, in
general terms, regression analysis relates variables to their mean quantities.
8.6 Confidence Intervals
Thus far it has been shown how measurement uncertainties and those intro-
duced by assuming an incorrect order of the fit can contribute to differences
between the measured and calculated y values. There are additional uncer-
tainties that must be considered. These arise from the finite acquisition of
data in an experiment. The presence of these additional uncertainties af-
fects the confidence associated with various estimates related to the fit. For
example, in some situations, the inverse of the best-fit relation established
through calibration is used to determine unknown values of the indepen-
dent variable and its associated uncertainty. A typical example would be to
determine the value and uncertainty of an unknown force from a voltage
measurement using an established voltage-versus-force calibration curve. To
arrive at such estimates, the sources of these additional uncertainties must
be examined first.
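As a sketch of the inverse-calibration use just described, the following assumes a hypothetical linear voltage-versus-force calibration (the values a = 0.1 V and b = 0.02 V/N, the scatter level, and the measured voltage are all invented for illustration). A least-squares fit is established from calibration data and then inverted to estimate an unknown force from a measured voltage:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical calibration relation: V = a + b*F (a in V, b in V/N).
a_true, b_true = 0.1, 0.02
F_cal = np.linspace(0.0, 100.0, 11)        # applied calibration forces (N)
V_cal = a_true + b_true * F_cal + rng.normal(scale=0.005, size=F_cal.size)

# Least-squares linear fit of the calibration data; polyfit returns
# the coefficients highest degree first: [slope, intercept].
b, a = np.polyfit(F_cal, V_cal, 1)

# Invert the best-fit relation to estimate an unknown force
# from a single measured voltage.
V_meas = 1.1                               # measured voltage (V)
F_est = (V_meas - a) / b
print(f"estimated force: {F_est:.1f} N")
```

Because the fitted a and b carry uncertainty from the finite calibration data, the inverted estimate F_est does as well; quantifying that uncertainty is the subject of this section.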
For simplicity, focus on the situation where the correct order of the fit is
assumed and there is no measurement error in x. Here, σ_{E_y} = σ_y. That is,
the uncertainty in determining a value of y from the regression fit is solely
due to the measurement error in y.
Consider the following situation, as illustrated in Figure 8.3, in which
best fits for two sets of data obtained under the same experimental condi-
tions are plotted along with the data. Observe that different values of y_i are
obtained for the same value of x_i each time the measurement is repeated
(in this case there are two values of y_i for each x_i). This is because y is a
random variable drawn from a normally distributed population. Because x
is not a random variable, it is assumed to have no uncertainty. So, in all
likelihood, the best-fit expression of the first set of data, y = a_1 + b_1 x, will be
different from the second best-fit expression, y = a_2 + b_2 x, having different
values for the intercepts (a_1 ≠ a_2) and for the slopes (b_1 ≠ b_2).
The true-mean regression line is given by Equation 8.45 in which x = x_0.
The true intercept and true slope values are those of the underlying popu-
lation from which the finite samples are drawn. From another perspective,
the true-mean regression line would be that found from the least-squares
linear regression analysis of a very large set of data (N >> 1).
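This large-N perspective can be illustrated with the same hypothetical population line used above (y = 2.0 + 0.5x with normally distributed scatter in y, values chosen for illustration). Repeating each fixed x level more and more times drives the fitted coefficients toward the true-mean values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population line and scatter (illustrative values).
a_true, b_true, sigma_y = 2.0, 0.5, 0.5
x_levels = np.linspace(0.0, 10.0, 11)    # fixed x values (no error in x)

for reps in (1, 10, 10000):
    x = np.tile(x_levels, reps)          # each x level measured 'reps' times
    y = a_true + b_true * x + rng.normal(scale=sigma_y, size=x.size)
    b, a = np.polyfit(x, y, 1)           # polyfit returns [slope, intercept]
    print(f"N = {x.size:6d}: a = {a:.4f}, b = {b:.4f}")
# As N grows, the fitted (a, b) approach the true-mean values (2.0, 0.5),
# consistent with the true-mean regression line being the N >> 1 limit.
```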