
PART IV ✦ Cross Sections, Panel Data, and Microeconometrics
model that contains R in addition to x can provide the means for a simple test of
endogeneity. JKR (and Verbeek and Nijman) suggest using the number of waves at
which the individual is present as the measure of R. Thus, after adding R to the pooled
model, we can use a simple t test of the hypothesis that the coefficient on R is zero.
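The variable addition test can be sketched in a few lines. The example below is an illustration under stated assumptions, not JKR's specification: it uses a linear pooled regression rather than their discrete choice model, simulated data in which attrition depends on an unobserved individual effect `u`, and conventional (not cluster-robust) standard errors. The function name `variable_addition_test` and all parameter values are hypothetical.

```python
import numpy as np

def variable_addition_test(y, x, waves_present):
    """Pooled OLS of y on [1, x, R]; return the coefficient on R and its t ratio."""
    X = np.column_stack([np.ones_like(y), x, waves_present])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])   # conventional variance estimate
    se = np.sqrt(np.diag(s2 * XtX_inv))
    return b[-1], b[-1] / se[-1]

# Simulated unbalanced panel (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n, T = 500, 5
u = rng.normal(size=n)                            # unobserved individual effect
x = rng.normal(size=(n, T))
y = 1.0 + 0.5 * x + u[:, None] + rng.normal(size=(n, T))

# Nonignorable attrition: individuals with low u are more likely to drop out,
# and once out they stay out (monotone attrition). Everyone is present at wave 1.
stay = (u[:, None] + rng.normal(size=(n, T))) > -0.5
stay[:, 0] = True
present = np.cumprod(stay, axis=1).astype(bool)

R = present.sum(axis=1)                           # number of waves present
R_obs = np.repeat(R[:, None], T, axis=1)[present].astype(float)

coef, t_stat = variable_addition_test(y[present], x[present], R_obs)
print(f"coefficient on R: {coef:.3f}, t ratio: {t_stat:.2f}")
```

Because attrition here is built to depend on `u`, the t ratio on R should be large. In applied work the standard errors would normally be clustered by individual, since R is constant within an individual.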
Devising an estimator given that (non)response is nonignorable requires a more
detailed understanding of the process generating the response pattern. The crucial
issue is whether the sample selection is based “on unobservables” or “on observables.”
Selection on unobservables results when, after conditioning on the relevant variables,
x and other information, z, the sampling mechanism is still nonrandom with respect to
the disturbances in the models. Selection on unobservables is at the heart of the sample
selectivity methodology pioneered by Heckman (1979) that we will study in Chapter 19.
(Some applications of the role of unobservables in biased estimation are discussed in
Chapter 8, where we examine sources of endogeneity in regression models.) If selection
is on observables, then conditioning on an appropriate specification involving the
observable information, (x, z), makes a consistent estimator of the model parameters
available by “purging” the estimator of the endogeneity of the sampling mechanism.
JKR adopt an inverse probability weighted (IPW) estimator devised by Robins,
Rotnitzky, and Zhao (1995); Fitzgerald, Gottschalk, and Moffitt (1998); Moffitt,
Fitzgerald, and Gottschalk (1999); and Wooldridge (2002). The estimator is based on the general
MAR (missing at random) assumption that P(R = 1 | h, x, z) = P(R = 1 | x, z). That is, the observable
covariates convey all the information that determines the response pattern; the
probability of nonresponse does not vary systematically with the outcome variable once the
exogenous information is accounted for. Implementing this idea in an estimator would
require that x and z be observable when R = 0, that is, that the exogenous data be
available for the nonresponders. This will typically not be the case; in an unbalanced panel,
the entire observation is missing. Wooldridge (2002) proposed a somewhat stronger
assumption that makes estimation feasible: P(R = 1 | h, x, z) = P(R = 1 | z), where z is
a set of covariates available at wave 1 (entry to the study). To compute Wooldridge’s
IPW estimator, we will begin with the sample of all individuals who are present at wave
1 of the study. (In our Example 17.17, based on the GSOEP data, not all individuals
are present at the first wave.) At wave 1, $(x_{i1}, z_{i1})$ are observed for all individuals to be
studied; $z_{i1}$ contains information on observables that are not included in the outcome
equation and that predict the response pattern at subsequent waves, including the
response variable at the first wave. At wave 1, then, $P(R_{i1} = 1 \mid x_{i1}, z_{i1}) = 1$. Wooldridge
suggests using a probit model for $P(R_{it} = 1 \mid x_{i1}, z_{i1})$, $t = 2, \ldots, T$, for the
remaining waves to obtain predicted probabilities of response, $\hat{p}_{it}$. The IPW estimator then
maximizes the weighted log likelihood

$$\ln L_{IPW} = \sum_{i=1}^{n} \sum_{t=1}^{T} \frac{R_{it}}{\hat{p}_{it}} \ln L_{it}.$$
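The two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not Wooldridge's or JKR's implementation: the data are simulated, the outcome model is taken to be a binary probit, the response probits are fit separately for each wave t = 2, ..., T on wave-1 covariates, and the helper names (`norm_cdf`, `norm_pdf`, `weighted_probit`) and all parameter values are hypothetical. The probit is fit by Fisher scoring with NumPy only.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(v):
    return 0.5 * (1.0 + _erf(v / np.sqrt(2.0)))

def norm_pdf(v):
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)

def weighted_probit(y, X, w=None, iters=50, tol=1e-8):
    """Maximize sum_i w_i * ln L_i for a probit model by Fisher scoring."""
    w = np.ones(len(y)) if w is None else w
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        xb = X @ b
        P = np.clip(norm_cdf(xb), 1e-10, 1.0 - 1e-10)
        f = norm_pdf(xb)
        score = X.T @ (w * f * (y - P) / (P * (1.0 - P)))
        info = (X * (w * f ** 2 / (P * (1.0 - P)))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

# Simulated panel (illustrative values, not from the text).
rng = np.random.default_rng(1)
n, T = 2000, 4
z1 = rng.normal(size=n)                      # wave-1 covariate z_i1
x = rng.normal(size=(n, T))                  # regressor x_it
h = (0.3 + 0.8 * x + rng.normal(size=(n, T)) > 0).astype(float)   # outcome
R = (0.5 + 0.7 * z1[:, None] + rng.normal(size=(n, T)) > 0).astype(float)
R[:, 0] = 1.0                                # everyone is present at wave 1

# Step 1: probit for R_it on (1, x_i1, z_i1), wave by wave; p_hat is 1 at wave 1.
W1 = np.column_stack([np.ones(n), x[:, 0], z1])
p_hat = np.ones((n, T))
for t in range(1, T):
    gamma_t = weighted_probit(R[:, t], W1)
    p_hat[:, t] = norm_cdf(W1 @ gamma_t)

# Step 2: weighted log likelihood over observed cells, with weights R_it / p_hat_it.
obs = R == 1
X = np.column_stack([np.ones(obs.sum()), x[obs]])
beta_ipw = weighted_probit(h[obs], X, w=1.0 / p_hat[obs])
print("IPW probit estimates:", beta_ipw)
```

Cells with R_it = 0 contribute nothing to the sum, so only observed cells enter the second step; each is weighted up by the inverse of its estimated response probability.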
Inference based on the weighted log-likelihood function can proceed as in Section 17.3.
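The reason the weighting works can be seen from a short iterated-expectations argument. The sketch below treats the response probability as known, writes $p_{it} = P(R_{it} = 1 \mid x_{i1}, z_{i1})$, and assumes that the response indicator is independent of $(h_{it}, x_{it})$ given the wave-1 information; the symbol $\theta$ for the parameter vector is introduced here for illustration.

```latex
\begin{align*}
E\!\left[\frac{R_{it}}{p_{it}}\,\ln L_{it}(\theta)\right]
 &= E\!\left[\frac{E[R_{it}\mid h_{it}, x_{it}, x_{i1}, z_{i1}]}{p_{it}}\,\ln L_{it}(\theta)\right] \\
 &= E\!\left[\frac{p_{it}}{p_{it}}\,\ln L_{it}(\theta)\right]
  = E\!\left[\ln L_{it}(\theta)\right].
\end{align*}
```

Under these assumptions, the weighted criterion has the same expectation as the log likelihood for the full, unselected sample, so both are maximized at the same parameter value.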
A remaining detail concerns whether the use of the predicted probabilities in the
weighted log-likelihood function makes it necessary to correct the standard errors for
two-step estimation. The case here is not an application of the two-step estimators we
considered in Section 14.7, since the first step is not used to produce an estimated
parameter vector in the second. Wooldridge (2002) shows that the standard errors computed