
Caution should be exercised in interpreting various values of the linear correlation coefficient. For example, a value of r ≈ 0 simply means that the two variables are not linearly correlated; they could still be highly correlated nonlinearly. Further, a value of r ≈ ±1 implies that there is a strong linear correlation, but the correlation need not be causal. An example is a correlation between the number of cars sold and the pints of Guinness consumed in Ireland: both are related to Ireland's population, but not directly to each other. Also, even if the linear correlation coefficient is close to unity, that does not necessarily imply that a linear fit is the most appropriate one. Although the spring's energy is related fundamentally to the square of its extension, a linear correlation coefficient value of 0.979 results for Case 2 in Section 8.9 when correlating a spring's energy with its extension. This high value implies a strong linear correlation between energy and extension, but it does not imply that a linear relation is the most appropriate one.
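The three cautions in the paragraph above can be checked numerically. The sketch below, assuming only Python's standard library, computes r from its usual definition for three small made-up data sets: y = x² on x values symmetric about zero, two series that each follow a common hypothetical "population" variable plus independent noise, and a hypothetical spring with an assumed stiffness k = 100 N/m. None of these numbers are the book's data; in particular, the spring values are not those of Case 2 in Section 8.9, and the function name `pearson_r` is illustrative.

```python
import math
import random


def pearson_r(x, y):
    """Linear correlation coefficient r for paired samples x and y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# (a) r = 0 despite an exact nonlinear relation: y = x^2 with x symmetric about 0.
x_sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_nonlinear = pearson_r(x_sym, [v ** 2 for v in x_sym])

# (b) r near 1 without a direct link: both series are driven by a common,
# made-up "population" variable (millions) plus their own independent noise.
rng = random.Random(1)
population = [3.0 + 2.0 * year / 29 for year in range(30)]
cars = [40.0 * p + rng.gauss(0.0, 2.0) for p in population]
pints = [100.0 * p + rng.gauss(0.0, 5.0) for p in population]
r_common_cause = pearson_r(cars, pints)

# (c) r near 1 for a relation that is not linear: spring energy E = k x^2 / 2,
# with an assumed k = 100 N/m and extensions from 0.01 m to 0.10 m.
k = 100.0
extension = [0.01 * i for i in range(1, 11)]
energy = [0.5 * k * e ** 2 for e in extension]
r_spring = pearson_r(extension, energy)

print(f"(a) y = x^2, symmetric x:        r = {r_nonlinear:.3f}")
print(f"(b) cars vs. pints:              r = {r_common_cause:.3f}")
print(f"(c) spring energy vs. extension: r = {r_spring:.3f}")
```

Case (a) gives r = 0 even though y is completely determined by x; case (b) gives r above 0.9 even though neither series is computed from the other; case (c) gives r of about 0.97 even though the energy is quadratic in the extension.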
Finally, when attempting to establish a correlation between two variables, it is important to recognize that two uncorrelated variables can appear to be correlated simply by chance. This makes it imperative to go one step beyond simply calculating the value of r. One must also determine the probability that N measurements of two uncorrelated variables will give a value of r equal to or larger in magnitude than any particular value $r_o$. This probability is determined by

$$P_N(|r| \ge |r_o|) = \frac{2\,\Gamma[(N-1)/2]}{\sqrt{\pi}\,\Gamma[(N-2)/2]} \int_{|r_o|}^{1} (1 - r^2)^{(N-4)/2}\, dr = f(N, r_o), \qquad (8.44)$$
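A minimal numerical sketch of Eq. (8.44) follows, assuming Python's standard library only. The integral is evaluated with Simpson's rule (a choice made here, not one specified in the text), and the function names `p_uncorrelated` and `sample_r` are hypothetical. As a check, a Monte Carlo run draws pairs of independent, normally distributed values (the setting in which this distribution of r holds) and counts how often chance alone produces |r| ≥ 0.6 with N = 4. The sketch requires N ≥ 4 so that the integrand stays finite at r = 1.

```python
import math
import random


def p_uncorrelated(n, r_o, steps=2000):
    """P_N(|r| >= |r_o|) from Eq. (8.44), evaluated with Simpson's rule.

    Assumes n >= 4 so that (1 - r^2)^((n - 4)/2) stays finite at r = 1.
    """
    if n < 4:
        raise ValueError("this sketch requires n >= 4")
    a = abs(r_o)
    if a >= 1.0:
        return 0.0
    # Prefactor 2 Gamma[(N-1)/2] / (sqrt(pi) Gamma[(N-2)/2]), via log-gamma to avoid overflow.
    c = 2.0 * math.exp(math.lgamma((n - 1) / 2) - math.lgamma((n - 2) / 2)) / math.sqrt(math.pi)
    e = (n - 4) / 2
    h = (1.0 - a) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        r = a + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        # max() guards against 1 - r*r rounding slightly below zero at r = 1.
        total += weight * max(0.0, 1.0 - r * r) ** e
    return c * total * h / 3.0


def sample_r(n, rng):
    """r for n pairs of independent standard normal values (truly uncorrelated)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Confidence of correlation, 1 - P_N, for r_o = 0.6.
conf_4 = 1.0 - p_uncorrelated(4, 0.6)
conf_25 = 1.0 - p_uncorrelated(25, 0.6)

# Monte Carlo check: fraction of uncorrelated N = 4 samples with |r| >= 0.6.
rng = random.Random(0)
trials = 20000
hits = sum(abs(sample_r(4, rng)) >= 0.6 for _ in range(trials))
frac_4 = hits / trials

print(f"N = 4:  1 - P = {conf_4:.3f}")
print(f"N = 25: 1 - P = {conf_25:.4f}")
print(f"Simulated P_4(|r| >= 0.6) = {frac_4:.3f} (Eq. 8.44 gives {1.0 - conf_4:.3f})")
```

For $r_o = 0.6$ the computed values of $1 - P_N$ are 0.600 for N = 4 and about 0.998 for N = 25, consistent with the figures quoted in this section, and the simulated fraction of chance correlations for N = 4 lands near 0.4.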
where Γ denotes the gamma function. If $P_N(|r| \ge |r_o|)$ is small, then it is unlikely that the variables are uncorrelated; that is, it is likely that they are correlated. Thus, $1 - P_N(|r| \ge |r_o|)$ is the probability that two variables are correlated given $|r| \ge |r_o|$. If $1 - P_N(|r| \ge |r_o|) > 0.95$, then there is a significant correlation, and if $1 - P_N(|r| \ge |r_o|) > 0.99$, then there is a very significant correlation. Values of $1 - P_N(|r| \ge |r_o|)$ versus the number of measurements, N, are shown in Figure 8.12. For example, a value of $r_o = 0.6$ gives a 60 % chance of correlation for N = 4 and a 99.8 % chance of correlation for N = 25. Thus, whenever citing a value of r, it is imperative to present the percent confidence of the correlation and the number of data points upon which it is based. Reporting a value of r alone is ambiguous.
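The N = 4 value quoted above can be checked by hand from Eq. (8.44), because the exponent (N − 4)/2 vanishes and the prefactor reduces to unity:

$$\frac{2\,\Gamma(3/2)}{\sqrt{\pi}\,\Gamma(1)} = \frac{2\,(\sqrt{\pi}/2)}{\sqrt{\pi}} = 1, \qquad (1 - r^2)^{(4-4)/2} = 1,$$

$$P_4(|r| \ge 0.6) = \int_{0.6}^{1} dr = 1 - 0.6 = 0.4, \qquad 1 - P_4(|r| \ge 0.6) = 0.6.$$

More generally, for N = 4 the confidence is simply $1 - P_4 = |r_o|$, so with only four data points a 95 % confidence of correlation would require $|r_o| > 0.95$.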
8.8 Uncertainty from Measurement Error
One of the major contributors to the differences between the measured and calculated y values in a regression analysis is measurement error. This is best understood by examining the linear case.