
298 PART II ✦ Generalized Regression Model and Equation Systems
where $S_1$ is the residual covariance matrix defined in (10-9) (without a degrees-of-freedom correction). The residuals are computed using maximum likelihood estimates of the parameters, not FGLS.$^{15}$
Under the null hypothesis, the model would be efficiently estimated by individual equation OLS, so
$$\ln|S_0| = \sum_{i=1}^{M} \ln\left(e_i'e_i/T\right),$$
where $e_i = y_i - X_i b_i$. The limiting distribution of the likelihood ratio statistic under the null hypothesis would be chi-squared with $M(M-1)/2$ degrees of freedom.
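The restricted log-determinant can be assembled directly from the equation-by-equation OLS residuals. A minimal sketch follows; the function name `lr_statistic` is illustrative, and the form $\lambda_{LR} = T(\ln|S_0| - \ln|S_1|)$ is assumed, with $S_1$ supplied from the unrestricted MLE:

```python
import numpy as np

def lr_statistic(E_ols, S1):
    """Likelihood ratio test of a diagonal disturbance covariance in SUR.

    E_ols : (T, M) array of OLS residuals, one column per equation.
    S1    : (M, M) residual covariance matrix from the unrestricted MLE.
    Assumes lambda_LR = T * (ln|S0| - ln|S1|), which under H0 is
    chi-squared with M(M-1)/2 degrees of freedom.
    """
    T, M = E_ols.shape
    # ln|S0| = sum_i ln(e_i'e_i / T): the restricted (diagonal) determinant
    ln_S0 = np.sum(np.log(np.sum(E_ols**2, axis=0) / T))
    _, ln_S1 = np.linalg.slogdet(S1)   # stable log-determinant
    lam = T * (ln_S0 - ln_S1)
    return lam, M * (M - 1) // 2
```

By Hadamard's inequality, $|S_1|$ never exceeds the product of its diagonal elements, so the statistic is nonnegative by construction.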
The likelihood ratio statistic requires the unrestricted MLE to compute the residual covariance matrix under the alternative, so it can be cumbersome to compute. A simpler alternative is the Lagrange multiplier statistic developed by Breusch and Pagan (1980), which is
$$\lambda_{LM} = T \sum_{i=2}^{M} \sum_{j=1}^{i-1} r_{ij}^2 \tag{10-17}$$
$$\phantom{\lambda_{LM}} = (T/2)\left[\mathrm{trace}(R'R) - M\right],$$
where R is the sample correlation matrix of the M sets of T OLS residuals. This has the same large-sample distribution under the null hypothesis as the likelihood ratio statistic but is obviously easier to compute, as it requires only the OLS residuals.
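Computing (10-17) is a few lines once the OLS residuals are stacked in a matrix. A minimal sketch (the function name `breusch_pagan_lm` is illustrative), using the trace form, which is equivalent to the double sum because R is symmetric with unit diagonal:

```python
import numpy as np

def breusch_pagan_lm(E):
    """Breusch-Pagan (1980) LM statistic for a diagonal covariance in SUR.

    E : (T, M) array of OLS residuals, one column per equation.
    Returns (lambda_LM, df); under H0, lambda_LM is asymptotically
    chi-squared with M(M-1)/2 degrees of freedom.
    """
    T, M = E.shape
    S = E.T @ E / T                 # raw-moment covariance, no df correction
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)          # residual correlation matrix
    # (T/2)[trace(R'R) - M] = T * sum_{i>j} r_ij^2
    lam = 0.5 * T * (np.trace(R.T @ R) - M)
    return lam, M * (M - 1) // 2
```

The correlations are formed from raw cross-products $e_i'e_j$, matching the text's definition, rather than from demeaned residuals; with an intercept in each equation the two coincide.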
The third test statistic in the trio is the Wald statistic. In principle, the Wald statistic for the SUR model would be computed using
$$W = \hat{\sigma}'\left[\text{Asy. Var}(\hat{\sigma})\right]^{-1}\hat{\sigma},$$
where $\hat{\sigma}$ is the $M(M-1)/2$ length vector containing the estimates of the off-diagonal (lower triangle) elements of $\Sigma$, and the asymptotic covariance matrix of the estimator appears in the brackets. Under normality, the asymptotic covariance matrix contains the corresponding elements of $2\Sigma \otimes \Sigma/T$. It would be possible to estimate the covariance term more generally using a moment-based estimator. Because
$$\hat{\sigma}_{ij} = \frac{1}{T}\sum_{t=1}^{T} e_{it}e_{jt}$$
is a mean of $T$ observations, one might use the conventional estimator of its variance and its covariance with $\hat{\sigma}_{lm}$, which would be
$$f_{ij,lm} = \frac{1}{T}\,\frac{1}{T-1}\sum_{t=1}^{T}\left(e_{it}e_{jt} - \hat{\sigma}_{ij}\right)\left(e_{lt}e_{mt} - \hat{\sigma}_{lm}\right). \tag{10-18}$$
The modified Wald statistic would then be
$$W^* = \hat{\sigma}'[F]^{-1}\hat{\sigma}.$$
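The moment-based estimator (10-18) and the modified Wald statistic can be sketched as follows; the function name `modified_wald` is illustrative, and the off-diagonal pairs are stacked in a fixed (lower-triangle) order:

```python
import numpy as np
from itertools import combinations

def modified_wald(E):
    """Modified Wald test of a diagonal covariance in SUR, using the
    moment-based covariance estimator of equation (10-18).

    E : (T, M) array of OLS residuals, one column per equation.
    Returns (W_star, df); df = M(M-1)/2, the number of off-diagonal pairs.
    """
    T, M = E.shape
    pairs = list(combinations(range(M), 2))          # off-diagonal (i, j) pairs
    # sigma_hat_ij = (1/T) sum_t e_it e_jt, one column of products per pair
    prods = np.column_stack([E[:, i] * E[:, j] for i, j in pairs])
    sig = prods.mean(axis=0)                          # vector sigma_hat
    dev = prods - sig                                 # e_it e_jt - sigma_hat_ij
    F = dev.T @ dev / (T * (T - 1))                   # equation (10-18)
    W = sig @ np.linalg.solve(F, sig)                 # sigma' F^{-1} sigma
    return W, len(pairs)
```

Because F is positive definite for generic residuals, the quadratic form is nonnegative, and under the null hypothesis the statistic is compared with the chi-squared distribution on $M(M-1)/2$ degrees of freedom.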
$^{15}$In the SUR model of this chapter, the MLE for normally distributed disturbances can be computed by iterating the FGLS procedure, back and forth between (10-7) and (10-9), until the estimates are no longer changing. We note that this procedure produces the MLE when it converges, but it is not guaranteed to converge, nor is it assured that there is a unique MLE. The Oberhofer–Kmenta (1974) result implies that if the iteration converges, it reaches the MLE; it does not guarantee that the iteration will converge, however. For our regional data set, the iterated FGLS procedure does not converge after 1,000 iterations. The problem with this application may be the very small sample size, 17 observations. One would not normally use the technique of maximum likelihood with a sample this small.