the statistical significance of $x_j$ is determined by whether we can reject $H_0\colon \beta_j = 0$ at a sufficiently small significance level.
As we briefly discussed in Section 7.5 for the linear probability model, we can compute a goodness-of-fit measure called the percent correctly predicted. As before, we define a binary predictor of $y_i$ to be one if the predicted probability is at least .5, and zero otherwise. Mathematically, $\tilde{y}_i = 1$ if $G(\hat{\beta}_0 + \mathbf{x}_i\hat{\boldsymbol{\beta}}) \geq .5$ and $\tilde{y}_i = 0$ if $G(\hat{\beta}_0 + \mathbf{x}_i\hat{\boldsymbol{\beta}}) < .5$.
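For concreteness, in the logit case $G$ is the logistic cdf, so the fitted probability is $G$ evaluated at the estimated index. The following sketch applies the .5 prediction rule to one observation; the coefficient values and regressor value are made up for illustration, not estimated from data:

```python
import math

# Logit case: G is the logistic cdf, G(z) = exp(z)/(1 + exp(z)).
def G(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative (made-up) estimates and one observation's regressor value.
beta0_hat, beta1_hat = -1.0, 2.0
x_i = 0.9

p_i = G(beta0_hat + beta1_hat * x_i)   # fitted probability for observation i
y_tilde_i = 1 if p_i >= 0.5 else 0     # prediction rule with the .5 threshold
print(p_i, y_tilde_i)
```

For the probit model the same rule applies with $G$ replaced by the standard normal cdf.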
Given $\{\tilde{y}_i : i = 1, 2, \ldots, n\}$, we can see how well $\tilde{y}_i$ predicts $y_i$ across all observations. There are four possible outcomes on each pair, $(y_i, \tilde{y}_i)$; when both are zero or both are one, we make the correct prediction. In the two cases where one of the pair is zero and the other is one, we make the incorrect prediction. The percent correctly predicted is the percentage of times that $\tilde{y}_i = y_i$.
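The computation takes only a few lines. In this sketch, `p_hat` stands in for the fitted probabilities $G(\hat{\beta}_0 + \mathbf{x}_i\hat{\boldsymbol{\beta}})$; the values shown are made up for illustration rather than estimated from a sample:

```python
# Percent correctly predicted with the .5 threshold.
# p_hat holds fitted probabilities; the values here are illustrative only.
p_hat = [0.12, 0.64, 0.55, 0.31, 0.82, 0.47]
y     = [0,    1,    0,    0,    1,    1]

y_tilde = [1 if p >= 0.5 else 0 for p in p_hat]   # binary predictor
correct = sum(yt == yi for yt, yi in zip(y_tilde, y))
pct_correct = 100 * correct / len(y)
print(y_tilde, pct_correct)
```

Each pair $(y_i, \tilde{y}_i)$ falls into one of the four cells described above; the two diagonal cells (both zero or both one) are counted as correct.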
Although the percent correctly predicted is useful as a goodness-of-fit measure, it can be misleading. In particular, it is possible to get rather high percentages correctly predicted even when the least likely outcome is very poorly predicted. For example, suppose that $n = 200$, 160 observations have $y_i = 0$, and, out of these 160 observations, 140 of the $\tilde{y}_i$ are also zero (so we correctly predict 87.5% of the zero outcomes). Even if none of the predictions is correct when $y_i = 1$, we still correctly predict 70% of all outcomes $(140/200 = .70)$. Often, we hope to have some ability to predict the least likely outcome (such as whether someone is arrested for committing a crime), and so we should be up front about how well we do in predicting each outcome. Therefore, it makes sense to also compute the percent correctly predicted for each of the outcomes. Problem 17.1 asks you to show that the overall percent correctly predicted is a weighted average of $\hat{q}_0$ (the percent correctly predicted for $y_i = 0$) and $\hat{q}_1$ (the percent correctly predicted for $y_i = 1$), where the weights are the fractions of zeros and ones in the sample, respectively.
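The weighted-average decomposition can be checked numerically with the counts from the example above (160 zeros, of which 140 are predicted correctly, and 40 ones, of which none are):

```python
# Overall percent correctly predicted as a weighted average of the
# per-outcome percentages, using the n = 200 example from the text.
n, n0, n1 = 200, 160, 40
q0_hat = 140 / 160           # fraction of zeros predicted correctly (87.5%)
q1_hat = 0 / 40              # fraction of ones predicted correctly (0%)

# Weights are the sample fractions of zeros and ones.
overall = (n0 / n) * q0_hat + (n1 / n) * q1_hat
print(overall)               # matches 140/200, i.e., 70% overall
```

Because the zeros dominate the sample, the overall measure is pulled toward $\hat{q}_0$ even though $\hat{q}_1 = 0$.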
Some have criticized the prediction rule just described for using a threshold value of .5, especially when one of the outcomes is unlikely. For example, if $\bar{y} = .08$ (only 8% “successes” in the sample), it could be that we never predict $y_i = 1$ because the estimated probability of success is never greater than .5. One alternative is to use the fraction of successes in the sample as the threshold (.08 in the previous example). In other words, define $\tilde{y}_i = 1$ when $G(\hat{\beta}_0 + \mathbf{x}_i\hat{\boldsymbol{\beta}}) \geq .08$ and zero otherwise. Using this rule will certainly increase the number of predicted successes, but not without cost: we will necessarily make more mistakes, perhaps many more, in predicting zeros (“failures”). In terms of the overall percent correctly predicted, we may do worse than using the .5 threshold.
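This alternative rule amounts to replacing .5 with $\bar{y}$ in the prediction rule. A small sketch with made-up fitted probabilities shows the effect on the number of predicted successes:

```python
# Predict y_i = 1 whenever the fitted probability is at least y_bar,
# the sample fraction of successes; probabilities are illustrative only.
p_hat = [0.05, 0.12, 0.03, 0.09, 0.20, 0.02, 0.07, 0.06, 0.04, 0.10]
y     = [0,    1,    0,    0,    1,    0,    0,    0,    0,    0]

y_bar = sum(y) / len(y)                          # sample fraction of successes
pred_half = [1 if p >= 0.5 else 0 for p in p_hat]    # .5 threshold
pred_ybar = [1 if p >= y_bar else 0 for p in p_hat]  # y_bar threshold
print(sum(pred_half), sum(pred_ybar))
```

Here the .5 rule predicts no successes at all, while the $\bar{y}$ rule predicts at least one, illustrating the trade-off described in the text.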
A third possibility is to choose the threshold such that the fraction of $\tilde{y}_i = 1$ in the sample is the same as (or very close to) $\bar{y}$. In other words, search over threshold values $t$, $0 < t < 1$, such that if we define $\tilde{y}_i = 1$ when $G(\hat{\beta}_0 + \mathbf{x}_i\hat{\boldsymbol{\beta}}) \geq t$, then

$$\sum_{i=1}^{n} \tilde{y}_i = \sum_{i=1}^{n} y_i.$$
(The trial and error required to find the desired value of $t$ can be tedious, but it is feasible. In some cases, it will not be possible to make the number of predicted successes exactly the same as the number of successes in the sample.) Now, given this set of $\tilde{y}_i$, we can compute the percent correctly predicted for each of the two outcomes as well as the overall percent correctly predicted.
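The search over $t$ can be sketched as a simple grid loop that stops at the threshold whose number of predicted successes is closest to the number of actual successes (made-up probabilities; as noted above, an exact match may not exist):

```python
# Search over thresholds t so that sum of y_tilde is as close as possible
# to the number of actual successes; p_hat values are illustrative only.
p_hat = [0.05, 0.12, 0.03, 0.09, 0.20, 0.02, 0.07, 0.06, 0.04, 0.10]
y     = [0,    1,    0,    0,    1,    0,    0,    0,    0,    0]
n_success = sum(y)

best_t, best_gap = None, None
for t in [k / 100 for k in range(1, 100)]:        # grid of candidate thresholds
    n_pred = sum(1 for p in p_hat if p >= t)      # predicted successes at t
    gap = abs(n_pred - n_success)
    if best_gap is None or gap < best_gap:
        best_t, best_gap = t, gap

y_tilde = [1 if p >= best_t else 0 for p in p_hat]
print(best_t, sum(y_tilde), n_success)
```

With these illustrative values an exact match exists, so the chosen threshold makes the number of predicted successes equal the number of sample successes.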
There are also various pseudo R-squared measures for binary response. McFadden (1974) suggests the measure $1 - \mathcal{L}_{ur}/\mathcal{L}_o$, where $\mathcal{L}_{ur}$ is the log-likelihood function for the estimated model, and $\mathcal{L}_o$ is the log-likelihood function in the model with only an intercept. Why does this measure make sense? Recall that the log-likelihoods are negative,
Chapter 17 Limited Dependent Variable Models and Sample Selection Corrections 589