would simply be the sum of all the individual r²'s from the K different SLRs of Y on each X_k. However, unless the data were gathered under controlled conditions (e.g., via an experiment), the regressors in social data analysis will rarely be orthogonal. Most of the time they are correlated—sometimes highly so.
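To make the point concrete, here is a minimal simulation sketch (not drawn from the text; the sample size, coefficients, and use of NumPy's least-squares routine are illustrative assumptions) that computes R² for the multiple regression and the sum of the r²'s from the separate SLRs, first with regressors that are exactly uncorrelated in the sample and then with correlated ones:

```python
# Illustrative sketch only: simulated data, arbitrary coefficients.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

def r_squared(y, X):
    """R-squared from OLS of y on the columns of X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Case 1: regressors uncorrelated in the sample
x1 = rng.normal(size=n); x1 -= x1.mean()
x2 = rng.normal(size=n); x2 -= x2.mean()
x2 -= x1 * (x1 @ x2) / (x1 @ x1)          # force exact sample orthogonality
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
sum_slr = r_squared(y, x1[:, None]) + r_squared(y, x2[:, None])
print(r_squared(y, np.column_stack([x1, x2])), sum_slr)        # equal (up to rounding)

# Case 2: correlated regressors -- the equality breaks down
x2c = 0.8 * x1 + 0.6 * rng.normal(size=n)
yc = 1.0 * x1 + 1.0 * x2c + rng.normal(size=n)
sum_slr_c = r_squared(yc, x1[:, None]) + r_squared(yc, x2c[:, None])
print(r_squared(yc, np.column_stack([x1, x2c])), sum_slr_c)    # no longer equal
```

In the correlated case here, the sum of the separate r²'s overstates the joint discriminatory power (in this setup it can even exceed 1); with a suppressor variable the inequality can run the other way, so the multiple R² can be either smaller or larger than the sum.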
With this in mind, there are four primary advantages of MULR over SLR. First, by including several regressors in the same model, we can counteract omitted-variable bias in the coefficient for any given X_k. Such bias would be present were we to omit a regressor that is correlated with both X_k and a determinant of Y. Second, we can examine the discriminatory power achieved when employing the collection of regressors simultaneously to model Y. When the X's are correlated, R² is no longer the simple sum of r²'s from the SLRs of Y on each X_k. R² can be either smaller or larger than that sum. The first two advantages are germane only when regressors are correlated.
However, the next two advantages apply even if the regressors are all mutually orthogonal. The third advantage of MULR is its ability to model relationships between Y and X_k that are nonlinear, or to model statistical interaction among two or more regressors. Although interaction is discussed below, a consideration of nonlinear relationships between the X's and Y is postponed until Chapter 5. The final advantage is that in employing MULR, we are able to obtain a much more precise estimate of the disturbance variance than is the case with SLR. By “precise” I mean an estimate that is as free as possible from systematic influences and that comes as close as possible to representing purely random error. The importance of this, as we shall see, is that it makes tests of individual slope coefficients much more sensitive to real regressor effects than would otherwise be the case.
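The fourth advantage lends itself to a quick demonstration. The hedged sketch below (simulated data with arbitrary coefficients, not an example from the text) includes a second determinant of Y, labeled Z, that is uncorrelated with X. Omitting Z does not bias the SLR slope for X, but it inflates the residual-variance estimate well above the true disturbance variance, whereas the MULR estimate recovers it closely. It is this cleaner estimate that sharpens tests on the individual slopes.

```python
# Illustrative sketch only: simulated data, arbitrary coefficients.
import numpy as np

rng = np.random.default_rng(1)
n, true_error_var = 500, 1.0
x = rng.normal(size=n)
z = rng.normal(size=n)                               # second determinant of Y, uncorrelated with X
y = 2.0 + 1.0 * x + 1.5 * z + rng.normal(scale=true_error_var ** 0.5, size=n)

def error_var_estimate(y, X):
    """SSE / (n - k - 1) from OLS of y on the columns of X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return resid @ resid / (len(y) - X1.shape[1])

print("SLR, Z omitted :", error_var_estimate(y, x[:, None]))                # roughly 1 + 1.5**2
print("MULR, X and Z  :", error_var_estimate(y, np.column_stack([x, z])))   # close to 1.0
print("true value     :", true_error_var)
```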
Example
Figure 3.1 presents a scatterplot of Y against X for 26 cases, along with the OLS fitted line for the linear regression of Y on X. It appears that there is a strong linear impact of X on Y, with a slope of 1.438 that is highly significant (p < .0001). For this regression, the r² is .835, and the estimate of the error variance is 3.003. However, the plot is somewhat deceptive in suggesting that X has such a strong impact on Y. In truth, this relationship is driven largely by a third, omitted variable, Z. Z is strongly related to both Y (r_zy = .943) and X (r_zx = .891). If Z is a cause of both Y and X, or even if Z is a cause of Y but only a correlate of X, the SLR of Y on X leads us to overestimate the true impact of X on Y, perhaps by a considerable amount. What is needed here is to control for Z and then reassess the impact of X on Y.
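That reassessment can be mimicked with simulated data. The sketch below does not use the 26 cases behind Figure 3.1; it generates a hypothetical X and Z with a correlation of roughly .89, lets Y depend mostly on Z, and then compares the SLR slope of Y on X with the partial slope for X from the MULR on X and Z:

```python
# Illustrative sketch only: simulated data chosen to mimic the correlations reported above.
import numpy as np

rng = np.random.default_rng(2)
n = 500                                              # simulated cases, not the book's 26
z = rng.normal(size=n)
x = 0.9 * z + 0.45 * rng.normal(size=n)              # corr(X, Z) close to .89
y = 0.2 * x + 1.5 * z + 0.5 * rng.normal(size=n)     # Y depends mostly on Z

def ols_coefs(y, X):
    """Intercept and slopes from OLS of y on the columns of X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

print("SLR slope of Y on X        :", ols_coefs(y, x[:, None])[1])                # badly inflated
print("partial slope of X, given Z:", ols_coefs(y, np.column_stack([x, z]))[1])   # near the true 0.2
```

In this setup the SLR slope comes out well above the direct effect of 0.2 built into the simulation, while the partial slope, with Z controlled, lands close to it.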
Controlling for a Third Variable
What is meant in this instance by controlling for Z or holding Z constant? I begin
with a mechanical analogy. Figure 3.2 illustrates a three-variable system involving
X, Y, and Z. Actually, there are four variables—counting ε, the disturbance—but only
three that are observed. Suppose that this system is the true model for Y. Further,
imagine that the circles enveloping X, Y, Z, and ε are gears, and the arrows and