
as LinRegEx1.r. Note that no formulas are available for general nonlinear
regression equations, as discussed in Section 2.4.
As Figure 2.2b shows, the regression line captures the tendency in the data. It is
thus reasonable to use the regression line for prediction, that is, to extrapolate
the tendency in the data. Formally, this is done by inserting x values
into Equation 2.32. For example, to predict y for x = 60, we would compute as
follows:
\[
\hat{y}(60) = 0.33 \cdot 60 - 0.5 = 19.3 \tag{2.33}
\]
Looking at the data in Figure 2.2b, you see that $\hat{y}(60) = 19.3$ indeed is a reasonable
extrapolation of the data. Of course, predictions of this kind can be expected to be
useful only if the model fits the data sufficiently well, and if the predictions are
computed ‘‘close to the data’’. For example, our results would be questionable if we
used Equation 2.32 to predict y for x = 600, since this x-value would be far away
from the data in
spring.csv. The requirement that predictions should be made
close to the data that have been used to construct the model applies very generally to
phenomenological models, including the phenomenological approaches discussed
in the next sections.
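
The computation in Equation 2.33 can be reproduced with a few lines of R. This is only a minimal sketch, not the book's LinRegEx1.r: the names slope, intercept and y_hat are hypothetical, and the coefficients are the rounded values printed in Equation 2.33 rather than values estimated from spring.csv.

```r
# Rounded coefficients as printed in Equation 2.33
# (assumption: they are the rounded coefficients of Equation 2.32)
slope <- 0.33
intercept <- -0.5

# Hypothetical helper: evaluate the regression line at one or more x values
y_hat <- function(x) slope * x + intercept

# Prediction for x = 60, as in Equation 2.33
y_60 <- y_hat(60)
print(y_60)

# The formula also returns a number for x = 600, but nothing in the
# computation signals that this x-value lies far away from the data
y_600 <- y_hat(600)
print(y_600)
```

The second call illustrates why the "close to the data" requirement must be checked by the user: the regression line can be evaluated anywhere, so the formula itself gives no indication of whether a prediction is trustworthy.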
Note 2.2.3 (Prediction) Regression functions such as Equation 2.27 can be
used to predict values of the response variable for given values of the explanatory
variable(s). Good predictions can be expected only if the regression function fits
the data sufficiently well, and if the given values of the explanatory variable lie
sufficiently close to the data.
2.2.3 The Coefficient of Determination
As to the quality of the fit between the model and the data, the simplest approach
is to look at appropriate graphical comparisons of the model with the data such
as Figure 2.2b. Based on that figure, you do not need to be a regression expert to
conclude that the model matches the data well, and that reasonable
predictions can be expected using the regression line. A second approach is the
coefficient of determination, which is denoted as $R^2$. Roughly speaking, the
coefficient of determination measures the quality of the fit between the model and
the data on a scale between 0 and 100%, where 0% refers to very poor fits and 100%
refers to a perfect match between the model and the data. $R^2$ thus expresses the
quality of a regression model in terms of a single number, which is useful, e.g., when
you want to compare the quality of several regression models, or when you evaluate
multiple linear regression models (Section 2.3) which involve higher-dimensional
regression functions $\hat{y}(x_1, x_2, \ldots, x_n)$ that cannot be plotted similar to Figure 2.2b