Chapter 9 More on Specification and Data Problems 317
Adding the log of the crime rate from five years earlier has a large effect on the expenditures
coefficient. The elasticity of the crime rate with respect to expenditures becomes -.14,
with t = -1.28. This is not strongly significant, but it suggests that a more sophisticated model
with more cities in the sample could produce significant results.
Not surprisingly, the current crime rate is strongly related to the past crime rate. The esti-
mate indicates that if the crime rate in 1982 was 1% higher, then the crime rate in 1987 is
predicted to be about 1.19% higher. We cannot reject the hypothesis that the elasticity of
current crime with respect to past crime is unity [t = (1.194 - 1)/.132 ≈ 1.47]. Adding the
past crime rate increases the explanatory power of the regression markedly, but this is no sur-
prise. The primary reason for including the lagged crime rate is to obtain a better estimate of
the ceteris paribus effect of log(lawexpc_87) on log(crmrte_87).
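The test of a unit elasticity quoted above can be reproduced with a few lines of arithmetic. The sketch below uses only the estimate (1.194) and standard error (.132) reported in the text. The function names are hypothetical, and the p-value relies on a standard normal approximation, which is an assumption made here because the degrees of freedom are not stated in this passage.

```python
import math

def t_stat(estimate, hypothesized, std_error):
    """t statistic for H0: coefficient equals the hypothesized value."""
    return (estimate - hypothesized) / std_error

def two_sided_p_normal(t):
    """Two-sided p-value using a standard normal approximation (an assumption)."""
    return math.erfc(abs(t) / math.sqrt(2))

# Estimate and standard error on the lagged log crime rate, as reported in the text.
t_unity = t_stat(1.194, 1.0, 0.132)
p_approx = two_sided_p_normal(t_unity)

print(f"t for H0: elasticity = 1: {t_unity:.2f}")
print(f"approximate two-sided p-value: {p_approx:.3f}")
```

Under this approximation the p-value is well above conventional significance levels, which is consistent with the statement that a unit elasticity cannot be rejected.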
The practice of putting in a lagged y as a general way of controlling for unobserved
variables is hardly perfect. But it can aid in getting a better estimate of the effects of pol-
icy variables on various outcomes.
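A small simulation can illustrate why a lagged outcome helps without being a complete fix. The sketch below is purely hypothetical: the data-generating process, the true effect of -0.5, and all variable names are assumptions chosen for illustration, not estimates from the crime data. An unobserved factor `a` raises both the policy variable `x` and the outcome, and the lagged outcome is a noisy measure of `a`.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (n - 1)

def slope_simple(y, x):
    """OLS slope from regressing y on x (intercept included)."""
    return cov(x, y) / cov(x, x)

def slope_with_control(y, x, z):
    """OLS coefficient on x from regressing y on x and z (intercept included)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)

rng = random.Random(0)
n = 5000
beta = -0.5  # hypothetical true effect of the policy variable

a = [rng.gauss(0, 1) for _ in range(n)]              # unobserved factor
x = [0.8 * ai + rng.gauss(0, 1) for ai in a]         # policy correlated with a
y_lag = [ai + rng.gauss(0, 0.5) for ai in a]         # past outcome, noisy signal of a
y = [beta * xi + ai + rng.gauss(0, 1) for xi, ai in zip(x, a)]

b_naive = slope_simple(y, x)                 # omits a entirely
b_lagged = slope_with_control(y, x, y_lag)   # controls for lagged y

print(f"true effect: {beta}")
print(f"without lagged y: {b_naive:.3f}")
print(f"with lagged y:    {b_lagged:.3f}")
```

In this setup the estimate that controls for the lagged outcome lies closer to the true effect than the simple regression does, but it does not recover it exactly, because the lagged outcome measures the unobserved factor with noise.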
Adding a lagged value of y is not the only way to use two years of data to control for
omitted factors. When we discuss panel data methods in Chapters 13 and 14, we will cover
other ways to use repeated data on the same cross-sectional units at different points in time.
A Different Slant on Multiple Regression
The discussion of proxy variables in this section suggests an alternative way of interpret-
ing a multiple regression analysis when we do not necessarily observe all relevant explana-
tory variables. Until now, we have specified the population model of interest with an addi-
tive error, as in equation (9.9). Our discussion of that example hinged upon whether we
have a suitable proxy variable (IQ score in this case, other test scores more generally) for
the unobserved explanatory variable, which we called “ability.”
A less structured, more general approach to multiple regression is to forgo specifying
models with unobservables. Rather, we begin with the premise that we have access to
a set of observable explanatory variables—which includes the variable of primary inter-
est, such as years of schooling, and controls, such as observable test scores. We then model
the mean of y conditional on the observed explanatory variables. For example, in the wage
example with lwage denoting log(wage), we can estimate E(lwage | educ, exper, tenure,
south, urban, black, IQ), which is exactly what is reported in Table 9.2. The difference now is that
we set our goals more modestly. Namely, rather than introduce the nebulous concept of
“ability” in equation (9.9), we state from the outset that we will estimate the ceteris paribus
effect of education holding IQ (and the other observed factors) fixed. There is no need to
discuss whether IQ is a suitable proxy for ability. Consequently, while we may not be
answering the question underlying equation (9.9), we are answering a question of inter-
est: if two people have the same IQ levels (and same values of experience, tenure, and so
on), yet they differ in education levels by a year, what is the expected difference in their
log wages?
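To make the question concrete, suppose the conditional mean is linear in the observed variables. This is an assumption made here for illustration, and the coefficient symbols are hypothetical labels rather than notation taken from the text. The expected difference in log wages between two people who differ by one year of education, with everything else equal, is then the coefficient on educ:

```latex
\begin{align*}
E(\mathit{lwage} \mid \mathit{educ}, \mathit{exper}, \ldots, \mathit{IQ})
  &= \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + \cdots + \beta_7\,\mathit{IQ}, \\
E(\mathit{lwage} \mid \mathit{educ}+1, \mathit{exper}, \ldots, \mathit{IQ})
  - E(\mathit{lwage} \mid \mathit{educ}, \mathit{exper}, \ldots, \mathit{IQ})
  &= \beta_1 .
\end{align*}
```

No statement about unobserved ability is needed for this interpretation; the coefficient describes only how the conditional mean changes with education when IQ and the other observed factors are held fixed.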
As another example, if we include as an explanatory variable the poverty rate in a
school-level regression to assess the effects of spending on standardized test scores, we