Chapter 13 Pooling Cross Sections across Time: Simple Panel Data Methods 449
randomly selecting people from a population at a given point in time. Then, these same
people are reinterviewed at several subsequent points in time. This gives us data on
wages, hours, education, and so on, for the same group of people in different years.
Panel data sets are fairly easy to collect for school districts, cities, counties, states,
and countries, and policy analysis is greatly enhanced by using panel data sets; we will
see some examples in the following discussion. For the econometric analysis of panel
data, we cannot assume that the observations are independently distributed across time.
For example, unobserved factors (such as ability) that affect someone’s wage in 1990 will
also affect that person’s wage in 1991; unobserved factors that affect a city’s crime rate
in 1985 will also affect that city’s crime rate in 1990. For this reason, special models and
methods have been developed to analyze panel data. In Sections 13.3, 13.4, and 13.5, we
describe the straightforward method of differencing to remove time-constant, unobserved
attributes of the units being studied. Because panel data methods are somewhat more
advanced, we will rely mostly on intuition in describing the statistical properties of the
estimation procedures, leaving detailed assumptions to the chapter appendix. We follow
the same strategy in Chapter 14, which covers more complicated panel data methods.
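To see why differencing helps, here is a minimal Python sketch on simulated data. It assumes a two-period model y_it = β1·x_it + a_i + u_it, where a_i is an unobserved, time-constant factor (think "ability") that is also correlated with x_it. The variable names, the sample size, and β1 = 1 are illustrative choices, not taken from the text. In this setup, pooled OLS that ignores a_i should give a slope near 1.5. OLS on the first differences, where a_i cancels, should land near the true value of 1.

```python
import random


def ols_slope(xs, ys):
    """OLS slope from a simple regression of ys on xs, with an intercept."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den


rng = random.Random(1)
beta1 = 1.0  # true effect of x on y in this simulation (an assumption)
pooled_x, pooled_y = [], []  # both periods stacked, ignoring the panel structure
dx, dy = [], []  # one differenced observation per unit

for _ in range(2000):  # 2,000 hypothetical units, each observed in two periods
    a_i = rng.gauss(0, 1)  # unobserved, time-constant factor
    x = [a_i + rng.gauss(0, 1) for _ in range(2)]  # x is correlated with a_i
    y = [beta1 * x[t] + a_i + rng.gauss(0, 1) for t in range(2)]
    pooled_x += x
    pooled_y += y
    dx.append(x[1] - x[0])  # a_i cancels when we difference
    dy.append(y[1] - y[0])

b_pooled = ols_slope(pooled_x, pooled_y)  # absorbs part of the effect of a_i
b_fd = ols_slope(dx, dy)  # a_i has been differenced away
print(f"pooled OLS slope:       {b_pooled:.3f}")
print(f"first-difference slope: {b_fd:.3f}")
```

Because a_i is correlated with x_it, pooled OLS attributes part of its effect to x. The differenced regression never sees a_i, which is the sense in which differencing "removes" a time-constant unobserved attribute.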
13.1 Pooling Independent Cross Sections across Time
Many surveys of individuals, families, and firms are repeated at regular intervals, often
each year. An example is the Current Population Survey (or CPS), which randomly samples
households each year. (See, for example, CPS78_85.RAW, which contains data from
the 1978 and 1985 CPS.) If a random sample is drawn at each time period, pooling the
resulting random samples gives us an independently pooled cross section.
One reason for using independently pooled cross sections is to increase the sample
size. By pooling random samples drawn from the same population, but at different points
in time, we can get more precise estimators and test statistics with more power. Pooling
is helpful in this regard only insofar as the relationship between the dependent variable
and at least some of the independent variables remains constant over time.
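The precision gain, and its limit, can be illustrated with a small simulation. The sketch below assumes hypothetical years drawn from a model y = 1 + slope·x + u with the same intercept in every year. It first compares the usual OLS standard error of the slope from one year's sample with the standard error from two pooled years that share the same slope. It then pools a second year whose slope has changed. The sample sizes and slope values are arbitrary assumptions chosen for illustration.

```python
import math
import random


def simple_ols(xs, ys):
    """Slope and its usual OLS standard error for y = b0 + b1*x + u."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sst_x = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sst_x
    b0 = ybar - b1 * xbar
    ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = ssr / (n - 2)  # estimate of the error variance
    return b1, math.sqrt(sigma2_hat / sst_x)


def draw_year(n, slope, rng):
    """One simulated random sample from y = 1 + slope*x + u."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [1.0 + slope * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


rng = random.Random(42)
x1, y1 = draw_year(500, 0.8, rng)  # hypothetical year 1
x2, y2 = draw_year(500, 0.8, rng)  # hypothetical year 2, same relationship
x3, y3 = draw_year(500, 0.2, rng)  # alternative year 2 in which the slope changed

b_one, se_one = simple_ols(x1, y1)
b_pool, se_pool = simple_ols(x1 + x2, y1 + y2)
b_mixed, _ = simple_ols(x1 + x3, y1 + y3)

print(f"year 1 only:           slope={b_one:.3f}, se={se_one:.3f}")
print(f"pooled, stable slope:  slope={b_pool:.3f}, se={se_pool:.3f}")
print(f"pooled, slope changed: slope={b_mixed:.3f}")
```

With twice as many observations, the pooled standard error should be roughly 1/√2 as large as the single-year one. When the slope differs between years, the single pooled slope falls somewhere between the two year-specific values and describes neither year. This is why pooling helps only when the relationship stays constant.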
As mentioned in the introduction, using pooled cross sections raises only minor
statistical complications. Typically, to reflect the fact that the population may have
different distributions in different time periods, we allow the intercept to differ across
periods, usually years. This is easily accomplished by including dummy variables for all but
one year, where the earliest year in the sample is usually chosen as the base year. It is also
possible that the error variance changes over time, something we discuss later.
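Here is a minimal sketch of the year-dummy setup in plain Python on simulated data. Three hypothetical survey years, each with its own intercept but a common slope, are stacked into one pooled cross section, and the earliest year serves as the base. The helper names, years, and parameter values are assumptions for illustration only. The regression includes an intercept and one dummy for each non-base year, so each dummy coefficient estimates how far that year's intercept lies from the base-year intercept.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def simulate_pooled(years, n_per_year, intercepts, slope, seed=0):
    """Stack one independent random sample per year (simulated data)."""
    rng = random.Random(seed)
    rows = []
    for year in years:
        for _ in range(n_per_year):
            x = rng.gauss(0, 1)
            y = intercepts[year] + slope * x + rng.gauss(0, 1)
            rows.append((year, x, y))
    return rows


def year_dummy_design(rows, years):
    """Intercept, a dummy for every year except the earliest (the base), and x."""
    base = min(years)
    others = [yr for yr in sorted(years) if yr != base]
    X = [[1.0] + [1.0 if year == yr else 0.0 for yr in others] + [x]
         for year, x, _ in rows]
    y = [yv for _, _, yv in rows]
    return X, y, others


years = [2000, 2005, 2010]                       # hypothetical survey years
intercepts = {2000: 1.0, 2005: 1.5, 2010: 0.7}   # intercept differs by year
rows = simulate_pooled(years, 1000, intercepts, slope=2.0)
X, y, others = year_dummy_design(rows, years)
b = ols(X, y)

print(f"base-year (2000) intercept: {b[0]:.2f}")
for yr, coef in zip(others, b[1:-1]):
    print(f"{yr} dummy (shift from 2000): {coef:.2f}")
print(f"common slope on x: {b[-1]:.2f}")
```

With these settings, the dummy coefficients should come out near 0.5 and -0.3, the differences between each year's intercept and the 2000 intercept, and the slope should stay near 2. Choosing a different base year changes the dummy coefficients but not the fitted values.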
Sometimes, the pattern of coefficients on the year dummy variables is itself of
interest. For example, a demographer may be interested in the following question:
After controlling for education, has the pattern of fertility among women over
age 35 changed between 1972 and 1984? The following example illustrates how this
question can be answered simply by using multiple regression analysis with year
dummy variables.