
CHAPTER 18 ✦ Discrete Choices and Event Counts
TABLE 18.11  Marginal Effect of a Binary Variable

            $-\hat{\beta}'x$   $\hat{\mu} - \hat{\beta}'x$   Prob[y = 0]   Prob[y = 1]   Prob[y = 2]
MARR = 0    −0.8863            0.9037                        0.187         0.629         0.184
MARR = 1    −0.4063            1.3837                        0.342         0.574         0.084
Change                                                       0.155         −0.055        −0.100
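
As a check on Table 18.11, the cell probabilities can be reproduced from the two index columns. The sketch below assumes the entries come from an ordered probit with three outcomes, so that Prob[y = 0] = Φ(−β̂′x), Prob[y = 1] = Φ(μ̂ − β̂′x) − Φ(−β̂′x), and Prob[y = 2] = 1 − Φ(μ̂ − β̂′x). The function names are illustrative and do not come from the text.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probs(neg_bx, mu_minus_bx):
    """Three-outcome ordered probit probabilities from the two index values.

    neg_bx      : -b'x       (first numeric column of the table)
    mu_minus_bx : mu - b'x   (second numeric column of the table)
    Assumes a normal CDF; this is an assumption of the sketch.
    """
    p0 = normal_cdf(neg_bx)
    p1 = normal_cdf(mu_minus_bx) - normal_cdf(neg_bx)
    p2 = 1.0 - normal_cdf(mu_minus_bx)
    return p0, p1, p2


# Index values copied from Table 18.11.
probs_marr0 = ordered_probs(-0.8863, 0.9037)
probs_marr1 = ordered_probs(-0.4063, 1.3837)

# Effect of switching MARR from 0 to 1 on each probability.
change = tuple(b - a for a, b in zip(probs_marr0, probs_marr1))

rows = [("MARR = 0", probs_marr0), ("MARR = 1", probs_marr1), ("Change", change)]
for label, row in rows:
    print(label, " ".join(f"{p:7.3f}" for p in row))
```

Because each row of probabilities sums to one, the three changes necessarily sum to zero.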
18.3.2 A SPECIFICATION TEST FOR THE ORDERED CHOICE MODEL
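The basic formulation of the ordered choice model implies that for constructed binary
variables,

$$
\begin{aligned}
w_{ij} &= 1 \text{ if } y_i \le j,\ 0 \text{ otherwise}, \quad j = 1, 2, \ldots, J-1, \\
\operatorname{Prob}(w_{ij} = 1 \mid x_i) &= F(x_i'\beta - \mu_j).
\end{aligned}
\tag{18-16}
$$
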
The first of these, when j = 1, is the binary choice model of Section 17.2. One implication
is that we could estimate the slopes, but not the threshold parameters, in the ordered
choice model just by using $w_{i1}$ and $x_i$ in a binary probit or logit model. (Note that this
result also implies the validity of combining adjacent cells in the ordered choice model.)
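
The constructed variables in (18-16) are simple to build from the observed outcome. The following is a minimal sketch, assuming y is coded 0, 1, ..., J; the function name and the toy data are hypothetical.

```python
def cumulative_indicators(y, J):
    """Return {j: [w_ij for each i]} with w_ij = 1 if y_i <= j, else 0,
    for j = 1, ..., J - 1, following the definition in (18-16)."""
    return {j: [1 if yi <= j else 0 for yi in y] for j in range(1, J)}


# Hypothetical outcomes coded 0, 1, 2, 3 (so J = 3).
y = [0, 1, 2, 3, 1, 0]
w = cumulative_indicators(y, J=3)

for j, w_j in w.items():
    print(f"j = {j}: {w_j}")
```

Each list `w[j]` would serve as the dependent variable in one binary probit or logit fit on $x_i$. It is also what results from collapsing outcomes 0 through j into one cell and outcomes j + 1 through J into the other.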
But (18-16) also defines a set of $J-1$ binary choice models with different constants but
common slope vector, β. This equality of the parameter vectors in (18-16) has been
labeled the parallel regression assumption. Although it is merely an implication of the
model specification, this has been viewed as an implicit restriction on the model. [See,
for example, Long (1997, p. 141).] Brant (1990) suggests a test of the parallel regression
assumption based on (18-16). One can, in principle, fit $J-1$ such binary choice models
separately. Each will produce its own constant term and a consistent estimator of the
common $\beta$. Brant’s Wald test examines the linear restrictions $\beta_1 = \beta_2 = \cdots = \beta_{J-1}$, or
$H_0\colon \beta_q - \beta_1 = 0$, $q = 2, \ldots, J-1$. The Wald statistic will be

$$
\chi^2[(J-2)K] = (R\hat{\beta}^*)' \left[ R \times \text{Asy.Var}[\hat{\beta}^*] \times R' \right]^{-1} (R\hat{\beta}^*),
$$

where $\hat{\beta}^*$ is obtained by stacking the individual binary logit or probit estimates of $\beta$
(without the constant terms). [See Brant (1990), Long (1997), or Greene and Hensher
(2010, page 187) for details on computing the statistic.]
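
Once the stacked estimate $\hat{\beta}^*$ and an estimate of its asymptotic covariance matrix are in hand, the Wald statistic is a routine quadratic form. The sketch below shows only that last step, in plain Python. It assumes the user supplies the full covariance matrix, including the cross-model blocks, whose computation is described in the references cited above and is not reproduced here. The function names and all numbers are hypothetical illustrations, not estimates from the text.

```python
def restriction_matrix(n_models, k):
    """R such that R * beta_star stacks (beta_q - beta_1), q = 2..n_models.

    beta_star stacks n_models slope vectors of length k, so R has
    (n_models - 1) * k rows and n_models * k columns.
    """
    rows = []
    for q in range(1, n_models):
        for c in range(k):
            row = [0.0] * (n_models * k)
            row[c] = -1.0           # picks out -beta_1[c]
            row[q * k + c] = 1.0    # picks out +beta_q[c]
            rows.append(row)
    return rows


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def wald_statistic(beta_star, cov, n_models, k):
    """Return (W, degrees of freedom) for H0: beta_q - beta_1 = 0."""
    R = restriction_matrix(n_models, k)
    r_beta = [sum(rij * bj for rij, bj in zip(row, beta_star)) for row in R]
    R_t = [list(col) for col in zip(*R)]
    middle = matmul(matmul(R, cov), R_t)   # R * Asy.Var * R'
    z = solve(middle, r_beta)              # [R * Asy.Var * R']^{-1} (R beta)
    W = sum(u * v for u, v in zip(r_beta, z))
    return W, len(R)


# Hypothetical inputs: J - 1 = 2 binary models with K = 1 slope each.
beta_star = [0.5, 0.8]
cov = [[0.04, 0.01],
       [0.01, 0.05]]
W, df = wald_statistic(beta_star, cov, n_models=2, k=1)
print(f"Wald = {W:.4f}, df = {df}")
```

Under the null hypothesis, W would be compared with a chi-squared critical value with $(J-2)K$ degrees of freedom, which is the number of rows of R.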
Rejection of the null hypothesis calls the model specification into question. An
alternative model in which there is a different β for each value of y has two problems:
it does not force the probabilities to be positive and it is internally inconsistent. On the
latter point, consider the suggested latent regression, $y^* = x'\beta_j + \varepsilon$. If the “$\beta$” is different
for each $j$, then it is not possible to construct a data generating mechanism for $y^*$ (or,
for example, simulate it); the realized value of $y^*$ cannot be defined without knowing
$y$ (that is, the realized $j$), since the applicable $\beta$ depends on $j$, but $y$ is supposed to be
determined from $y^*$ through, for example, (18-16). There is no parametric restriction
other than the one we seek to avoid that will preserve the ordering of the probabilities
for all values of the data and maintain the coherency of the model. This still leaves
the question of what specification failure would logically explain the finding. Some
suggestions in Brant (1990) include (1) misspecification of the latent regression, $x'\beta$;