PART IV ✦ Cross Sections, Panel Data, and Microeconometrics
country B with the same health status would report only “Fair.” A simple frequency of
the distribution of self-assessments of health status in the two countries would suggest
that people in country A are much healthier than those in country B when, in fact, the
opposite is true. Correcting for the influences of DIF in such a situation would be
essential to obtaining a meaningful comparison of the two countries. The impact of DIF is an
accepted feature of the model within a population but could be strongly distortionary
when comparing very disparate groups, such as across countries, as in KMST (political
groups), Murray, Tandon, Mathers, and Sadana (2002) (health outcomes), Tandon et al.
(2004), and KSV (work disability), Sirven, Santos-Eggimann, and Spagnoli (2008), and
Gupta, Kristensen, and Pozzoli (2008) (health), Angelini et al. (2008) (life satisfaction),
Kristensen and Johansson (2008), and Bago d’Uva et al. (2008), all of whom used the
ordered probit model to make cross-group comparisons.
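The distortion described above can be illustrated with a small simulation. This is only a sketch under assumed values: the latent health distributions, the threshold vectors `cuts_a` and `cuts_b`, and the helper `report` are hypothetical and are not taken from any of the cited studies. Country B is made truly healthier on average, but country A uses more lenient cut points when mapping latent health into the four reported categories.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Assumed latent health: country B is truly healthier on average.
health_a = rng.normal(loc=0.0, scale=1.0, size=n)
health_b = rng.normal(loc=0.3, scale=1.0, size=n)

# Assumed reporting thresholds separating Poor / Fair / Good / Excellent.
# Country A's cut points are lower, so the same latent health earns a
# better label in A than in B (this shift is the DIF).
cuts_a = np.array([-1.5, -0.5, 0.5])
cuts_b = np.array([-0.5, 0.5, 1.5])
labels = ["Poor", "Fair", "Good", "Excellent"]

def report(latent, cuts):
    """Return the category index 0..3: the number of cut points below latent."""
    return np.searchsorted(cuts, latent)

y_a = report(health_a, cuts_a)
y_b = report(health_b, cuts_b)

# Raw frequencies of "Good" or better in each country.
share_good_a = np.mean(y_a >= 2)
share_good_b = np.mean(y_b >= 2)

# The same latent health (0.0) is labeled differently in the two countries.
same_person_a = labels[report(np.array([0.0]), cuts_a)[0]]
same_person_b = labels[report(np.array([0.0]), cuts_b)[0]]

print(f"mean latent health  A: {health_a.mean():.3f}  B: {health_b.mean():.3f}")
print(f"share Good or better A: {share_good_a:.3f}  B: {share_good_b:.3f}")
print(f"latent health 0.0 is reported as {same_person_a} in A, {same_person_b} in B")
```

In this setup the raw share reporting "Good" or better is larger in country A even though mean latent health is larger in country B, which is the reversal that an uncorrected cross-country comparison would produce.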
KMST proposed the use of anchoring vignettes to resolve this difference in per-
ceptions across groups. The essential approach is to use a series of examples that, it
is believed, all respondents will agree on to estimate each respondent’s DIF and cor-
rect for it. The idea of using vignettes to anchor perceptions in survey questions is not
itself new; KMST cite a number of earlier uses. The innovation is their method for
incorporating the approach in a formal model for the ordered choices. The bivariate
and multivariate probit models that they develop combine the elements described in
Sections 18.3.1–18.3.3 and the HOPIT model in Section 18.3.5.
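The logic of anchoring can be sketched in a few lines. This is a simplified illustration and not the formal ordered-choice model referred to above: the thresholds, the vignette level, the perception noise, and the crude relative recoding (own rating compared with the rating given to the vignette) are all assumptions of the sketch. Because every respondent rates the same described person, systematic differences in vignette ratings across groups reflect differences in reporting scales, not differences in health.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Assumed group-specific cut points (A lenient, B strict).
cuts_a = np.array([-1.5, -0.5, 0.5])
cuts_b = np.array([-0.5, 0.5, 1.5])

def report(latent, cuts):
    """Category index 0..3: the number of cut points below latent."""
    return np.searchsorted(cuts, latent)

# Self-assessments: country B is truly healthier (assumed means 0.0 and 0.3).
self_a = report(rng.normal(0.0, 1.0, n), cuts_a)
self_b = report(rng.normal(0.3, 1.0, n), cuts_b)

# Vignette: the same hypothetical person (latent level 0.0) is shown to
# everyone and perceived with a little noise, then rated on the
# respondent's own scale.
vignette_level = 0.0
vig_a = report(vignette_level + rng.normal(0.0, 0.3, n), cuts_a)
vig_b = report(vignette_level + rng.normal(0.0, 0.3, n), cuts_b)

# Raw comparison of self-assessments (contaminated by DIF).
raw_gap = self_a.mean() - self_b.mean()

# The vignette describes the same person, so a gap here signals DIF.
vignette_gap = vig_a.mean() - vig_b.mean()

# Crude relative recoding: share rating own health above the vignette.
above_a = np.mean(self_a > vig_a)
above_b = np.mean(self_b > vig_b)

print(f"raw gap in mean self-rating (A - B): {raw_gap:.3f}")
print(f"gap in mean vignette rating (A - B): {vignette_gap:.3f}")
print(f"share rating self above vignette  A: {above_a:.3f}  B: {above_b:.3f}")
```

Here the raw self-ratings favor country A, the vignette ratings reveal that A simply uses a more generous scale, and the comparison relative to the vignette restores the ordering built into the simulation (B healthier).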
18.4 MODELS FOR COUNTS OF EVENTS
We have encountered behavioral variables that involve counts of events at several
points in this text. In Examples 14.10 and 17.20, we examined the number of times an
individual visited the physician using the GSOEP data. The credit default data that we
used in Examples 7.10 and 17.22 also include another behavioral variable, the number
of derogatory reports in an individual’s credit history. Finally, in Example 17.23, we ana-
lyzed data on firm innovation. Innovation is often analyzed [for example, by Hausman,
Hall, and Griliches (1984) and many others] in terms of the number of patents that the
firm obtains (or applies for). In each of these cases, the variable of interest is a count
of events. This obviously differs from the discrete dependent variables we analyzed in
the previous two sections. A count is a quantitative measure that is, at least in principle,
amenable to analysis using multiple linear regression. However, the typical preponder-
ance of zeros and small values and the discrete nature of the outcome variable suggest
that the regression approach can be improved by a method that explicitly accounts for
these aspects.
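A short simulation suggests why a linear regression can sit awkwardly with count data. It is a sketch under assumed values only: the data-generating process (counts drawn with a mean that is exponential in a single regressor `x`) and all coefficients are hypothetical choices made for illustration, not estimates from the examples cited above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Assumed regressor and assumed conditional mean (always positive).
x = rng.uniform(-2.0, 2.0, size=n)
mean_count = np.exp(-0.5 + 1.0 * x)

# Simulated counts: nonnegative integers with many zeros and small values.
y = rng.poisson(mean_count)

share_zero = np.mean(y == 0)
share_small = np.mean(y <= 2)

# Ordinary least squares of y on a constant and x.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y.astype(float), rcond=None)
fitted = X @ beta

# A linear fit is not constrained to be nonnegative.
share_negative = np.mean(fitted < 0)

print(f"share of zeros:            {share_zero:.3f}")
print(f"share of counts <= 2:      {share_small:.3f}")
print(f"OLS coefficients:          {beta}")
print(f"share of negative fits:    {share_negative:.3f}")
```

In this setup a sizable fraction of the fitted values from the linear regression are negative, which is impossible for a count, while the outcome itself is concentrated on zero and a few small integers.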
Like the basic multinomial logit model for unordered data in Section 18.2 and the
simple probit and logit models for binary and ordered data in Sections 17.2 and 18.3,
the Poisson regression model is the fundamental starting point for the analysis of count
data. We will develop the elements of modeling for count data in this framework in
Sections 18.4.1–18.4.3, and then turn to more elaborate, flexible specifications in subse-
quent sections. Sections 18.4.4 and 18.4.5 will present the negative binomial and other
alternatives to the Poisson functional form. Section 18.4.6 will describe the implications
for the model specification of two complicating features of observed data, truncation
and censoring. Truncation arises when certain values, such as zero, are absent from