that we have examined so far, the log-likelihood and these results are all based on the sum of squared residuals, and as we have seen, imposing restrictions never reduces the sum of squares.) The limiting distribution of the LR statistic under the null hypothesis is chi squared, with degrees of freedom equal to the reduction in the number of dimensions of the parameter space of the alternative hypothesis that results from imposing the restrictions.
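To make the mechanics concrete, here is a minimal sketch of an LR test, assuming a hypothetical Gaussian linear model in which a single exclusion restriction is tested. The data-generating process, sample size, and the helper max_loglike are illustrative, not from the text; only numpy and scipy are assumed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)  # x2 is truly irrelevant, so H0 holds

def max_loglike(y, X):
    # Maximized Gaussian log-likelihood of a linear model (MLE sigma^2 = SSR/n).
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ b) ** 2)
    m = len(y)
    return -0.5 * m * (np.log(2.0 * np.pi) + np.log(ssr / m) + 1.0)

X_u = np.column_stack([np.ones(n), x1, x2])  # unrestricted model
X_r = np.column_stack([np.ones(n), x1])      # restriction imposed: coefficient on x2 = 0

lr = 2.0 * (max_loglike(y, X_u) - max_loglike(y, X_r))  # nonnegative by construction
df = 1                                                  # one dimension removed from the parameter space
print(f"LR = {lr:.3f}, chi-squared({df}) p-value = {stats.chi2.sf(lr, df):.4f}")
```

Because the restricted sum of squares can never be smaller than the unrestricted one, the statistic is nonnegative here, in contrast to the nonnested case discussed next.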
Vuong’s analysis is concerned with nonnested models for which ∑_i m_i need not be positive. Formalizing the test requires us to look more closely at what is meant
by the “right” model (and provides a convenient departure point for the discussion
in the next two sections). In the context of nonnested models, Vuong allows for the
possibility that neither model is “true” in the absolute sense. We maintain the classical assumption that there does exist a “true” model, h(y_i | Z_i, α), where α is the “true” parameter vector, but possibly neither hypothesized model is that true model.
The Kullback–Leibler Information Criterion (KLIC) measures the distance between
the true model (distribution) and a hypothesized model in terms of the likelihood
function. Loosely, the KLIC is the log-likelihood function under the hypothesis of
the true model minus the log-likelihood function for the (misspecified) hypothesized
model under the assumption of the true model. Formally, for the model of the null
hypothesis,
KLIC = E[ln h(y_i | Z_i, α) | h is true] − E[ln f(y_i | Z_i, θ) | h is true].
The first term on the right-hand side is what we would estimate with (1/n) ln L if we maximized the log-likelihood for the true model, h(y_i | Z_i, α). The second term is what is estimated by (1/n) ln L assuming (incorrectly) that f(y_i | Z_i, θ) is the correct model.
Notice that f(y_i | Z_i, θ) is written in terms of a parameter vector, θ. Because α is the “true” parameter vector, it is perhaps ambiguous what is meant by the parameterization, θ. Vuong (p. 310) calls this the “pseudotrue” parameter vector. It is the vector of constants that the estimator converges to when one uses the estimator implied by f(y_i | Z_i, θ). In Example 5.7, if H_0 gives the correct model, this formulation assumes that the least squares estimator in H_1 would converge to some vector of pseudo-true parameters. But these are not the parameters of the correct model; they would be the slopes in the population linear projection of C_t on [1, Y_t, C_{t−1}].
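A small simulation can make the pseudo-true parameter vector tangible. The sketch below uses a hypothetical data-generating process (not the data of Example 5.7): a misspecified regression omits a relevant, correlated regressor, and its OLS slope settles at the slope of the population linear projection rather than at the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
# Hypothetical true model: y = 1 + 0.5*x1 + 0.8*x2 + eps, with x2 correlated with x1.
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=n)

# Misspecified candidate omits x2: regress y on [1, x1] only.
X = np.column_stack([np.ones(n), x1])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Population linear projection slope: 0.5 + 0.8 * Cov(x1, x2)/Var(x1) = 0.98.
print(f"OLS slope on x1: {b[1]:.3f}  (pseudo-true value 0.98, true beta_1 = 0.5)")
```

The estimator is perfectly well behaved; it simply converges to the parameters of the projection implied by the misspecified model, not to those of the true model.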
Suppose the “true” model is y = Xβ + ε, with normally distributed disturbances, and that y = Zδ + w is the proposed competing model. The KLIC would be the expected log-likelihood function for the true model minus the expected log-likelihood function for the second model, still assuming that the first one is the truth. By construction, the KLIC is positive. We will now say that one model is “better” than another if it is closer to the “truth” based on the KLIC. If we take the difference of the two KLICs for the two models, the true log-likelihood function falls out, and we are left with
KLIC_1 − KLIC_0 = E[ln f(y_i | Z_i, θ) | h is true] − E[ln g(y_i | Z_i, γ) | h is true].
To compute this using a sample, we would simply compute the likelihood ratio statistic, n m̄ (without multiplying by 2), again. Thus, this provides an interpretation of the LR statistic. But in this context, the statistic can be negative; we do not know which competing model is closer to the truth.
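A minimal sketch of the computation, under hypothetical data and two nonnested Gaussian linear candidates: the per-observation terms m_i = ln f_i − ln g_i average to m̄, so n m̄ is the LR statistic without the factor of 2, and the standardized statistic √n m̄/s_m is asymptotically standard normal when the two models are equally close to the truth. The data-generating process and the helper obs_loglikes are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1.0 + 0.7 * x + 0.2 * z + rng.normal(size=n)  # truth uses both; each candidate is misspecified

def obs_loglikes(y, X):
    # Per-observation Gaussian log densities, evaluated at the model's MLE.
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = np.mean(e ** 2)
    return -0.5 * (np.log(2.0 * np.pi * s2) + e ** 2 / s2)

# m_i = ln f(y_i | x_i, theta_hat) - ln g(y_i | z_i, gamma_hat); can be of either sign.
m = (obs_loglikes(y, np.column_stack([np.ones(n), x]))
     - obs_loglikes(y, np.column_stack([np.ones(n), z])))

lr = n * m.mean()                          # the LR statistic without multiplying by 2
v = np.sqrt(n) * m.mean() / m.std(ddof=1)  # Vuong statistic, asymptotically N(0,1) under the null
print(f"n*mbar = {lr:.3f}, Vuong z = {v:.3f}, p = {2 * stats.norm.sf(abs(v)):.4f}")
```

A large positive value favors the model in f, a large negative value the model in g, and a value near zero leaves the comparison undecided.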