Hayashi ch3 4 - GMM
Motivation: In our previous linear model, the most important assumption we made was the orthogonality between the error term and the regressors (i.e. strict exogeneity or predetermined regressors), without which the OLS estimator is not even consistent for the coefficient vector of interest (i.e. from our model $y_i = x_i'\beta + \varepsilon_i$): Endogeneity Bias!
Since in economics the orthogonality condition is often not satisfied, we develop methods here to deal with endogenous regressors, called the Generalized Method of Moments (GMM), which includes OLS as a special case.
Single-Equation GMM: Here we relax the assumptions even further and drop the predetermined-regressor assumption. Instead, we have orthogonality conditions coming from instruments.
I. Assumptions:
3.1 Linearity: The data we observe come from underlying RVs $\{y_i\ (1\times1),\ x_i\ (d\times1)\}$ with
$y_i = x_i'\beta + \varepsilon_i \qquad (i = 1, 2, \dots, n)$
(this is the equation we want to estimate).
3.2 Ergodic Stationarity:
Let zi be an M-dimensional vector of instruments, and let wi be the unique and nonconstant elements of (yi, xi, zi).
{wi} is jointly stationary and ergodic.
3.3 Orthogonality Condition
All the M variables in zi are predetermined in the sense that they are all orthogonal to the current error term:
$E(z_{im}\varepsilon_i) = 0$ for all $i, m$, i.e. $E[z_i(y_i - x_i'\beta)] = 0$, i.e. $E(g_i) = 0$ where $g_i = z_i(y_i - x_i'\beta) = z_i\varepsilon_i$.
Note on Moment Conditions: $E[(y_i - x_i'\beta)_{1\times1}\, z_{i\,(M\times1)}]_{M\times1} = 0$ — these are the M moment conditions.
Note on Instruments vs. Regressors: Even though we denote regressors by xi and instruments by zi, this does not mean they cannot share variables. Regressors that are predetermined are instruments, and regressors that are not predetermined are endogenous regressors.
Note on 1 as an Instrument: Typically we include 1 as an instrument, so that $E(\varepsilon_i) = 0$!
3.4 Rank Condition for Identification: Guarantees there is a unique solution to the system of moment equations.¹
The $M\times d$ matrix $E(z_ix_i')_{M\times d}$ is of full column rank (equivalently, $E(x_iz_i')$ is of full row rank); in particular $M \ge d$ (# of equations ≥ # of unknowns).
We denote this matrix by $\Sigma_{ZX}$.
3.5 Martingale Difference with Finite Second Moments: Assumption for asymptotic normality.
$g_i = z_i\varepsilon_i$ is a martingale difference sequence with finite second moments ($g_i$ is the vector of moment conditions!):
$\{g_i\}$ is a martingale difference sequence (so $E(g_i) = 0$ with $E(g_i\mid g_{i-1}, g_{i-2},\dots,g_1) = 0$ for $i\ge2$) — no serial correlation in $g_i$.
The $M\times M$ matrix of cross moments $S = E(g_ig_i')$ is nonsingular.
So, with $\bar g = \frac{1}{n}\sum_{i=1}^n g_i$, we have $\sqrt{n}\,\bar g \xrightarrow{d} N(0, S)$, i.e. $\mathrm{Avar}(\bar g) = E(g_ig_i')$, by the ergodic stationary martingale difference CLT.²
This is called the rank condition for identification for the following reason (see the proof below: the condition guarantees a unique solution).
We can rewrite the moment/orthogonality condition as a system of M simultaneous equations:
$E[g(w_i;\beta)] = 0_{M\times1}$, where $g(w_i; b) = z_i(y_i - x_i'b)$, $w_i$ is the unique and nonconstant elements of $(y_i, x_i, z_i)$, and $\beta$ is the true coefficient vector.
The moment condition means that the true value of the coefficient vector $\beta$ is a solution to this system of M simultaneous equations. Assumptions 3.1–3.3 guarantee that there exists a solution to the moment conditions, but the coefficient vector (or the equation) is identified only if that solution is unique. A necessary and sufficient condition for a unique solution to the system of simultaneous equations is that $\Sigma_{ZX} = E(z_ix_i')$ has full column rank.
Derivation: We want a unique solution of $E[g(w_i;b)] = 0$. Let $\beta$ be a solution. It is unique iff for all $b \ne \beta$, $E_P[g(w_i;b)] \ne E_P[g(w_i;\beta)]$, where P is the underlying distribution that generated the data:
$E_P[z_i(y_i - x_i'b)] \ne E_P[z_i(y_i - x_i'\beta)] \iff E_P[z_ix_i'b] \ne E_P[z_ix_i'\beta] \iff E_P[z_ix_i'](b - \beta) \ne 0$,
which holds for every $b \ne \beta$ iff $E_P[z_ix_i']_{M\times d} = \Sigma_{ZX}$ has full column rank (with $M \ge d$).
If not, there exists a nonzero vector $(b - \beta)$ in $\mathrm{Ker}(E_P[z_ix_i'])$, i.e. the columns are linearly dependent, so there is a nontrivial linear combination $(b - \beta)$ such that $E_P[z_ix_i'](b - \beta) = 0$ (the solution is not unique!).
Order Condition for Identification: $M \ge d$
A necessary condition (embedded in the proof above) is $M \ge d$ (# of equations ≥ # of unknowns) — the order condition for identification.¹
We can state this as: # of predetermined variables (instruments) ≥ # of regressors, or # of orthogonality conditions ≥ # of parameters.
If the order condition is not satisfied, then the equation (the parameter vector) is not identified.
We say that the equation is
1. Overidentified if the rank condition is satisfied and m > d
2. Exactly identified / just identified if the rank condition is satisfied and m = d
3. Underidentified / not identified if the rank condition is not satisfied or m < d
2 IID is a special case: if the data are iid, we only need the Lindeberg–Lévy CLT.
If the instruments include a constant, then the error term is a martingale difference sequence (and a fortiori serially uncorrelated).
The assumption is hard to interpret, so we use an easier, sufficient condition: $E(\varepsilon_i\mid\varepsilon_{i-1},\varepsilon_{i-2},\dots,\varepsilon_1, z_i, z_{i-1},\dots,z_1) = 0$.
Besides being a martingale difference sequence and therefore serially uncorrelated, the error term is orthogonal not only to the current but also to the past instruments.
Since $g_ig_i' = \varepsilon_i^2z_iz_i'$, $S = E(g_ig_i')$ is a matrix of fourth moments, so consistent estimation of S will require a fourth-moment assumption (Assumption 3.6).
Example (failure of the rank condition): consider $A = E(z_ix_i')$ whose entries are built from $E(z_i)$, $E(z_i^2)$, $E(x_i)$, $E(z_ix_i)$, $E(z_i^2x_i)$. If $\mathrm{cov}(z_i^2, x_i) = 0$, then the 1st column of A times $E(x_i)$ equals the 2nd column of A, so the columns are linearly dependent and A is not of full column rank.
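The following is a minimal numerical sketch (not from the notes) of checking the order and rank conditions from sample moments. The function name, the simulated data, and the tolerance are our own illustrative choices; `Z` and `X` are assumed to be numpy arrays of instruments and regressors.

```python
import numpy as np

def identification_status(Z, X, tol=1e-8):
    """Classify identification using the sample analogue of Sigma_zx = E(z_i x_i')."""
    n, m = Z.shape            # m instruments (orthogonality conditions)
    _, d = X.shape            # d regressors (parameters)
    S_zx = Z.T @ X / n        # sample moment matrix
    rank = np.linalg.matrix_rank(S_zx, tol=tol)
    if m < d or rank < d:     # order condition fails, or columns are dependent
        return "underidentified / not identified"
    return "exactly identified" if m == d else "overidentified"

# Illustration with simulated data (hypothetical DGP, for the sketch only):
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))
X = np.column_stack([Z[:, 0] + rng.normal(size=500),
                     Z[:, 2] + rng.normal(size=500)])
print(identification_status(Z, X))   # 3 instruments, 2 regressors -> overidentified
```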
IV. Generalized Method of Moments Defined: We'll show that the IV estimator is a special GMM estimator (i.e. the exactly identified case).
1. General Setup: The true parameter of interest is the solution to the moment conditions:
$\beta$ s.t. $E[g(w_i,\beta)] = 0 \;\iff\; \beta = \arg\min_b\; E[g(w_i,b)]'\,W\,E[g(w_i,b)]$ for some p.d. weighting matrix W.
By the Analogy Principle,
$b_{GMM} = \arg\min_b\; \Big(\tfrac{1}{n}\sum_{i=1}^n g(w_i,b)\Big)'\,W_n\,\Big(\tfrac{1}{n}\sum_{i=1}^n g(w_i,b)\Big).$
Applied to the Linear Model: Our model is $y_i = x_i'\beta + \varepsilon_i$ with the moment condition $E[z_i(y_i - x_i'\beta)] = 0$.
o Expression for sample moment condition in linear model
$g_n(b) = \frac{1}{n}\sum_{i=1}^n z_i(y_i - x_i'b) = \frac{1}{n}\sum_{i=1}^n z_iy_i - \Big(\frac{1}{n}\sum_{i=1}^n z_ix_i'\Big)b = s_{ZY} - S_{ZX}\,b.$
Method of Moments: If the equation is exactly identified ($m = d$), there exists a unique b such that $g_n(b) = 0$ whenever $S_{ZX}$ is invertible; and since $S_{ZX}\xrightarrow{p}\Sigma_{ZX}$ by the ergodic theorem, $S_{ZX}$ is invertible with probability approaching 1 when $\Sigma_{ZX}$ is invertible. So, with a large sample, the system of simultaneous equations has the unique solution given by the MM estimator
$b_{IV} = S_{ZX}^{-1}\,s_{ZY} = \Big(\frac{1}{n}\sum_{i=1}^n z_ix_i'\Big)^{-1}\Big(\frac{1}{n}\sum_{i=1}^n z_iy_i\Big).$
Note: If the equation is just identified, then regardless of the weighting matrix, the GMM estimator = the IV estimator numerically
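A minimal sketch (not from the notes) of the MM/IV estimator above for the exactly identified case; `y`, `X`, `Z` are assumed numpy arrays with as many instruments as regressors.

```python
import numpy as np

def iv_estimator(y, X, Z):
    """b_IV = S_zx^{-1} s_zy, valid only when Z has as many columns as X."""
    n = len(y)
    S_zx = Z.T @ X / n        # (1/n) sum z_i x_i'
    s_zy = Z.T @ y / n        # (1/n) sum z_i y_i
    return np.linalg.solve(S_zx, s_zy)
```

If Z = X (all regressors predetermined), this is numerically the OLS estimator, consistent with the footnote below.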
3. The IV estimator is defined for the EXACTLY IDENTIFIED case (i.e. the case where there are as many instruments as regressors). If zi = xi, i.e. all the regressors are predetermined/orthogonal to the contemporaneous error term, then this boils down to the OLS estimator. So OLS is a special case of the MM estimator, and IV and OLS are both special cases of GMM.
5. The quadratic-form formulation $g_n(b)'W_n\,g_n(b)$ gives us a 1×1 real number over which we can define the minimization problem. Otherwise it would be impossible to minimize over an m-dimensional vector of moment conditions $g_n(b)\in\mathbb{R}^m$.
GMM estimator: $b_{GMM} = \arg\min_{b\in\mathbb{R}^d}\ \tfrac{1}{2}\,g_n(b)'W_n\,g_n(b) = \arg\min_{b\in\mathbb{R}^d}\ \tfrac{1}{2}\,(s_{ZY} - S_{ZX}b)'W_n(s_{ZY} - S_{ZX}b).$⁶
FOC (assume interiority): $\frac{\partial}{\partial b}\,\tfrac{1}{2}(s_{ZY} - S_{ZX}b)'W_n(s_{ZY} - S_{ZX}b) = 0 \;\Rightarrow\; -S_{ZX}'W_n(s_{ZY} - S_{ZX}b) = 0 \;\Rightarrow\; S_{ZX}'W_n\,s_{ZY} = S_{ZX}'W_nS_{ZX}\,b.$
By Assumptions 3.2 and 3.4, $S_{ZX}$ is of full column rank for sufficiently large n with probability approaching 1, so $S_{ZX}'W_nS_{ZX}$ is invertible ($W_n$ p.d.), and hence
$b_{GMM}(W_n) = (S_{ZX}'W_nS_{ZX})^{-1}S_{ZX}'W_n\,s_{ZY}.$
Claim: if A ($m\times n$, $m \ge n$) has full column rank and W ($m\times m$) is p.d., then $A'WA$ is invertible.
Proof: Suppose not. Then there exists a nonzero $n\times1$ vector c s.t. $c'A'WAc = 0$. Let $d = Ac$ ($m\times1$); d is nonzero (since A has full column rank, no nontrivial linear combination of its columns gives the zero vector), and $d'Wd = 0$ — a contradiction to the assumption that W is p.d.!
If the equation is just identified ($m = d$), $S_{ZX}$ is square and invertible (with probability approaching 1), so
$b_{GMM} = (S_{ZX}'W_nS_{ZX})^{-1}S_{ZX}'W_n\,s_{ZY} = S_{ZX}^{-1}W_n^{-1}(S_{ZX}')^{-1}S_{ZX}'W_n\,s_{ZY} = S_{ZX}^{-1}\,s_{ZY} = b_{IV}.$
Note: $(S_{ZX}'W_nS_{ZX})^{-1} = S_{ZX}^{-1}W_n^{-1}(S_{ZX}')^{-1}$ since $S_{ZX}'W_nS_{ZX}\cdot S_{ZX}^{-1}W_n^{-1}(S_{ZX}')^{-1} = I$.
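A minimal sketch (our own notation) of the one-step linear GMM estimator $b_{GMM}(W_n) = (S_{ZX}'W_nS_{ZX})^{-1}S_{ZX}'W_ns_{ZY}$ for an arbitrary p.d. weighting matrix; `y`, `X`, `Z`, `W` are assumed numpy arrays.

```python
import numpy as np

def gmm_linear(y, X, Z, W):
    """One-step linear GMM with a given (m x m) positive definite weighting matrix W."""
    n = len(y)
    S_zx = Z.T @ X / n
    s_zy = Z.T @ y / n
    A = S_zx.T @ W @ S_zx
    return np.linalg.solve(A, S_zx.T @ W @ s_zy)

# With W = np.eye(m) (or any p.d. matrix) this is consistent but generally
# inefficient; when m = d it collapses numerically to the IV estimator.
```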
2. Asymptotic Normality:
$\sqrt{n}\,\big(b_{GMM}(W_n) - \beta\big)\xrightarrow{d}N(0, V)$, where
$V = (\Sigma_{XZ}W\Sigma_{XZ}')^{-1}\,\Sigma_{XZ}W\,S\,W\Sigma_{XZ}'\,(\Sigma_{XZ}W\Sigma_{XZ}')^{-1}$
$= \big(E(x_iz_i')\,W\,E(z_ix_i')\big)^{-1}E(x_iz_i')\,W\,E(g_ig_i')\,W\,E(z_ix_i')\big(E(x_iz_i')\,W\,E(z_ix_i')\big)^{-1}$, where $W = \operatorname{plim}W_n$.¹¹
3. Consistent Estimation of Avar($b_{GMM}$):
$\widehat{\mathrm{Avar}}\big(b_{GMM}(W_n)\big) = (S_{ZX}'W_nS_{ZX})^{-1}\,S_{ZX}'W_n\,\hat S\,W_nS_{ZX}\,(S_{ZX}'W_nS_{ZX})^{-1}\xrightarrow{p}\mathrm{Avar}\big(b_{GMM}(W)\big)$,
provided $\hat S\xrightarrow{p}S$, $S_{ZX}\xrightarrow{p}\Sigma_{ZX}$, and $W_n\xrightarrow{p}W$.¹²
4. Consistent Estimation of the Error Variance: for any consistent estimator b of $\beta$, with $\hat\varepsilon_i = y_i - x_i'b$,
$\frac{1}{n}\sum_{i=1}^n\hat\varepsilon_i^2\xrightarrow{p}E(\varepsilon_i^2)$ (provided $E(\varepsilon_i^2)$ exists and is finite).¹³
5. Consistent Estimation of S: We've assumed that S exists thus far — how do we obtain a consistent estimator $\hat S$ of $S_{M\times M}$ from the sample? Suppose the coefficient estimate b used for calculating the residuals $\hat\varepsilon_i = y_i - x_i'b$ is consistent for $\beta$, and suppose $S = E(g_ig_i')$ exists and is finite. Then, under Assumptions 3.1, 3.2, and 3.6,
$\hat S = \frac{1}{n}\sum_{i=1}^n\hat\varepsilon_i^2\,z_iz_i'$
is consistent for S.¹⁴
1
1
1
1
1
bGMM = ( S ZX 'Wn S ZX ) S ZX 'Wn sZY = ( S ZX 'Wn SZX ) S ZX 'Wn
zi yi = ( S ZX 'Wn S ZX ) S ZX 'Wn
zi ( xi ' + i ) by 3.1
n
i =1
i =1
= ( S ZX 'Wn S ZX )
10
n
n
1
1
1
1
S ZX 'Wn
zi ( xi ' ) + ( S ZX 'Wn S ZX ) S ZX 'Wn
zi i = + ( S ZX 'Wn S ZX ) S ZX 'Wn s z
n
i =1
i =1
P
z
i =1
11. Asymptotic normality: Continuing from above,
$b_{GMM} - \beta = (S_{ZX}'W_nS_{ZX})^{-1}S_{ZX}'W_n\,\bar g \;\Rightarrow\; \sqrt{n}\,(b_{GMM} - \beta) = (S_{ZX}'W_nS_{ZX})^{-1}S_{ZX}'W_n\,\sqrt{n}\,\bar g.$
$\sqrt{n}\,\bar g = \frac{1}{\sqrt n}\sum_{i=1}^n z_i\varepsilon_i\xrightarrow{d}N\big(0, E(g_ig_i')\big)$ by the ergodic martingale difference CLT, $S_{ZX}\xrightarrow{p}\Sigma_{ZX}$ by the ergodic theorem, and $W_n\xrightarrow{p}W$ by construction. Hence
$\sqrt{n}\,(b_{GMM} - \beta)\xrightarrow{d}N\big(0,\ (\Sigma_{ZX}'W\Sigma_{ZX})^{-1}\Sigma_{ZX}'W\,E(g_ig_i')\,W\Sigma_{ZX}(\Sigma_{ZX}'W\Sigma_{ZX})^{-1}\big)$ by the CMT and Slutsky's theorem.
12
This follows from above. Standard asymptotic tools.
13
This proof is very similar to 3D from previous notes
$\frac{1}{n}\sum_{i=1}^n e_i^2 = \frac{1}{n}\sum_{i=1}^n\varepsilon_i^2 - 2(b-\beta)'\,\frac{1}{n}\sum_{i=1}^n x_i\varepsilon_i + (b-\beta)'\Big(\frac{1}{n}\sum_{i=1}^n x_ix_i'\Big)(b-\beta)\xrightarrow{p}E(\varepsilon_i^2)$,
since $\frac{1}{n}\sum_i x_i\varepsilon_i\xrightarrow{p}$ some finite vector, $\frac{1}{n}\sum_i x_ix_i'\xrightarrow{p}E(x_ix_i')$ (finite), and $b - \beta\xrightarrow{p}0$, so the last two terms vanish.
VI. Efficient GMM: How do we choose $W_n$ to minimize Avar($b_{GMM}(W_n)$)? Choose $W_n\xrightarrow{p}S^{-1} = E(g_ig_i')^{-1}$, the inverse of the variance of the moment conditions.
1. The efficient weighting matrix is $W^* = S^{-1} = E(g_ig_i')^{-1}$:
for any weighting matrix W, $\mathrm{Avar}(b_{GMM}(W))\ \ge\ \mathrm{Avar}(b_{GMM}(W^*)) = [\Sigma_{ZX}'S^{-1}\Sigma_{ZX}]^{-1} = [E(x_iz_i')\,E(g_ig_i')^{-1}\,E(z_ix_i')]^{-1}$.
2. The efficient GMM estimator (using a consistent $\hat S$ for S):
$b^{\,eff}_{GMM}(\hat S^{-1}) = (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}S_{ZX}'\hat S^{-1}\,s_{ZY}$, with $\mathrm{Avar}\big(b^{\,eff}_{GMM}(S^{-1})\big) = (\Sigma_{ZX}'S^{-1}\Sigma_{ZX})^{-1}$.
3. Estimated asymptotic variance and robust standard errors:
$\widehat{\mathrm{Avar}}\big(b(\hat S^{-1})\big) = (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}$, and the robust standard error of the l-th coefficient is
$SE^*_l = \sqrt{\tfrac{1}{n}\big[(S_{ZX}'\hat S^{-1}S_{ZX})^{-1}\big]_{ll}}$.
4. Wald statistic for $H_0: a(\beta) = 0$:
$W = n\,a\big(b(\hat S^{-1})\big)'\Big[A\big(b(\hat S^{-1})\big)\,(S_{ZX}'\hat S^{-1}S_{ZX})^{-1}\,A\big(b(\hat S^{-1})\big)'\Big]^{-1}a\big(b(\hat S^{-1})\big)$.
Pick some arbitrary weighting matrix (e.g. I) that converges in probability to a symmetric p.d. W and obtain a preliminary consistent GMM estimator $b_{GMM} = (S_{ZX}'W_nS_{ZX})^{-1}S_{ZX}'W_n\,s_{ZY}$, which we will use to construct the optimal weighting matrix.
Note: Usually we set $W_n = S_{ZZ}^{-1}$. Then $b_{GMM}(S_{ZZ}^{-1}) = (S_{ZX}'S_{ZZ}^{-1}S_{ZX})^{-1}S_{ZX}'S_{ZZ}^{-1}\,s_{ZY}$ — this is 2SLS.
Then, using the preliminary estimator $b_{GMM}(S_{ZZ}^{-1})$, we construct $\hat S^{-1}$ with $\hat S = \frac{1}{n}\sum_{i=1}^n\hat\varepsilon_i^2\,z_iz_i'$ and compute the efficient two-step GMM estimator
$b(\hat S^{-1}) = (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}S_{ZX}'\hat S^{-1}\,s_{ZY}$, with $\widehat{\mathrm{Avar}}\big(b^{\,eff}_{n,GMM}\big) = (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}$.
Or in matrix notation: $b(\hat S^{-1}) = [X'Z(Z'\hat BZ)^{-1}Z'X]^{-1}X'Z(Z'\hat BZ)^{-1}Z'y$ with $\hat B = \mathrm{diag}(\hat\varepsilon_1^2,\dots,\hat\varepsilon_n^2)$ (in 2SLS, $\hat B = I$).
Note: With this notation we can see that the efficient GMM estimator is a GLS-type estimator!
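A minimal sketch (our own names and layout) of the two-step efficient GMM procedure just described, assuming conditionally heteroskedastic errors and numpy arrays `y`, `X`, `Z`.

```python
import numpy as np

def two_step_gmm(y, X, Z):
    n = len(y)
    S_zx, s_zy, S_zz = Z.T @ X / n, Z.T @ y / n, Z.T @ Z / n
    # Step 1: preliminary estimator with W_n = S_zz^{-1} (this is 2SLS)
    W1 = np.linalg.inv(S_zz)
    b1 = np.linalg.solve(S_zx.T @ W1 @ S_zx, S_zx.T @ W1 @ s_zy)
    # Step 2: S_hat = (1/n) sum e_i^2 z_i z_i', then re-estimate with W_n = S_hat^{-1}
    e = y - X @ b1
    S_hat = (Z * e[:, None] ** 2).T @ Z / n
    W2 = np.linalg.inv(S_hat)
    b2 = np.linalg.solve(S_zx.T @ W2 @ S_zx, S_zx.T @ W2 @ s_zy)
    # Estimated Avar = (S_zx' S_hat^{-1} S_zx)^{-1}; robust SE_l = sqrt(Avar_ll / n)
    avar = np.linalg.inv(S_zx.T @ W2 @ S_zx)
    se = np.sqrt(np.diag(avar) / n)
    return b2, se
```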
5. Hypothesis Testing: by the above, use $\widehat{\mathrm{Avar}}\big(b(W_n)\big) = (S_{ZX}'W_nS_{ZX})^{-1}S_{ZX}'W_n\,\hat S\,W_nS_{ZX}(S_{ZX}'W_nS_{ZX})^{-1}$ (which, for the efficient choice $W_n = \hat S^{-1}$, collapses to $(S_{ZX}'\hat S^{-1}S_{ZX})^{-1}$):
(a) Under the null hypothesis $H_0: \beta_k = \bar\beta_k$,
$t_k = \dfrac{b_k(W) - \bar\beta_k}{\sqrt{\tfrac{1}{n}\widehat{\mathrm{Avar}}(b(W))_{kk}}}\xrightarrow{d}N(0,1).$¹⁵
This t-ratio is the robust t-ratio because it uses the S.E. that is robust to errors that can be conditionally heteroskedastic.
(b) Under the null hypothesis $H_0: R_{\#r\times d}\,\beta_{d\times1} = r_{\#r\times1}$, where R is a $\#r\times d$ matrix with full row rank ($\#r$ = number of restrictions on $\beta$),
$W = n\,(Rb(W) - r)'\big[R\,\widehat{\mathrm{Avar}}(b(W))\,R'\big]^{-1}(Rb(W) - r)\xrightarrow{d}\chi^2(\#r).$¹⁶
(c) Under the null hypothesis $H_0: a(\beta) = 0$, where $a(\cdot): \mathbb{R}^d\to\mathbb{R}^{\#a}$ has continuous derivatives and its Jacobian $A(\beta)_{\#a\times d}$ has full row rank,¹⁷
$W = n\,a(b(W))'_{1\times\#a}\big[A(b(W))_{\#a\times d}\,\widehat{\mathrm{Avar}}(b(W))_{d\times d}\,A(b(W))'_{d\times\#a}\big]^{-1}a(b(W))_{\#a\times1}\xrightarrow{d}\chi^2(\#a).$¹⁸
15. $\dfrac{\sqrt{n}\,(b_k - \beta_k)}{\sqrt{\widehat{\mathrm{Avar}}(b)_{kk}}}\xrightarrow{d}N(0,1).$
16. Under $H_0$, $\sqrt{n}\,(Rb - r) = \sqrt{n}\,R(b - \beta)\xrightarrow{d}N(0, R\,\mathrm{Avar}(b)\,R')$, so $W = c_n'Q_n^{-1}c_n$ with $c_n = \sqrt{n}(Rb - r)$ and $Q_n = R\,\widehat{\mathrm{Avar}}(b)\,R'$ is asymptotically $\chi^2(\#r)$.
17. The full-row-rank condition is there so that the hypothesis is well-defined. This is the generalization of the requirement for linear restrictions $R\beta = r$ that R have full row rank.
18. By asymptotic normality, $\sqrt{n}\,(b - \beta)\xrightarrow{d}N(0, \mathrm{Avar}(b))$; by the Delta Method, $\sqrt{n}\,(a(b) - a(\beta))\xrightarrow{d}c$, $c\sim N\big(0, A(\beta)\,\mathrm{Avar}(b)\,A(\beta)'\big)$.
VIII.
1. Test for Overidentifying Restrictions: If the equation is exactly identified, then it is possible to choose $b^*$ s.t. $g_n(b^*) = 0$ and $J(b^*, W_n) = n\,g_n(b^*)'W_n\,g_n(b^*) = 0$ (we call $b^*$ the IV estimator). If the equation is overidentified, then the distance cannot be set to 0 exactly (there are more moment conditions than parameters), though we expect the minimized distance to be close to 0. If we choose the efficient weighting matrix $W^*$ s.t. $\operatorname{plim}W^* = S^{-1}$, then the minimized distance is asymptotically chi-squared.
Hansen's Test of Overidentifying Restrictions:
Suppose there is available a consistent estimator $\hat S$ of $S\ (= E(g_ig_i'))$. Under Assumptions 3.1–3.5,
$J\big(b_{GMM}(\hat S^{-1}),\,\hat S^{-1}\big) = n\;g_n\big(b_{GMM}(\hat S^{-1})\big)'\,\hat S^{-1}\,g_n\big(b_{GMM}(\hat S^{-1})\big)\xrightarrow{d}\chi^2(m - d).$
Note:
This says that the objective function evaluated at the estimator, i.e. the minimum distance, is asymptotically chi-squared.
This is a specification test, testing whether all the restrictions of the model (i.e. 3.1 3.5) are satisfied. Given a large enough sample,
if the J statistic is surprisingly large, then either the orthogonality condition (3.3) or the other assumptions (or both) are likely to be
false.
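A minimal sketch (our own function, names assumed) of computing Hansen's J statistic and its p-value at an efficient GMM estimate `b` with weighting-matrix input `S_hat`.

```python
import numpy as np
from scipy.stats import chi2

def hansen_j(y, X, Z, b, S_hat):
    """J = n * g_n(b)' S_hat^{-1} g_n(b), asymptotically chi2(m - d) under 3.1-3.5."""
    n, m = Z.shape
    d = X.shape[1]
    g_n = Z.T @ (y - X @ b) / n                  # sample moment vector at b
    J = n * g_n @ np.linalg.solve(S_hat, g_n)
    return J, chi2.sf(J, df=m - d)               # statistic and asymptotic p-value
```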
2. Testing a Subset of Orthogonality Conditions: partition $g_n(b)$ so that $g_{2n}(b)_{d_2\times1}$ collects the suspect orthogonality conditions, and partition S conformably into blocks $S_{11}, S_{12}, S_{21(d_2\times d_1)}, S_{22(d_2\times d_2)}$; the test compares the J statistics computed with and without the suspect conditions.
Under conditional homoskedasticity ($E(\varepsilon_i^2\mid z_i) = \sigma^2$), $S = E(g_ig_i') = \sigma^2E(z_iz_i')$,¹⁹ so a natural estimator is $\hat S = \hat\sigma^2S_{ZZ}$ with $S_{ZZ} = \frac{1}{n}\sum_{i=1}^n z_iz_i'$. The efficient GMM estimator then becomes
$b_{GMM}\big((\hat\sigma^2S_{ZZ})^{-1}\big) = \big(S_{ZX}'(\hat\sigma^2S_{ZZ})^{-1}S_{ZX}\big)^{-1}S_{ZX}'(\hat\sigma^2S_{ZZ})^{-1}s_{ZY} = (S_{ZX}'S_{ZZ}^{-1}S_{ZX})^{-1}S_{ZX}'S_{ZZ}^{-1}s_{ZY} = b_{GMM}(S_{ZZ}^{-1}) = b_{2SLS}$
(does not depend on $\hat\sigma^2$!).
X. 2SLS: 2SLS is a (special case) GMM estimator, i.e. GMM with the particular weighting matrix $(\hat\sigma^2S_{ZZ})^{-1}$.
It's also the efficient GMM estimator obtained under conditional homoskedasticity.²⁰
A. Alternative Derivations of 2SLS: 2SLS as IV estimator and 2SLS as 2 Regressions
Let $X_{n\times d} = \begin{pmatrix}x_1'\\ \vdots\\ x_n'\end{pmatrix}$, $Z_{n\times m} = \begin{pmatrix}z_1'\\ \vdots\\ z_n'\end{pmatrix}$, $y_{n\times1} = \begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}$. Then
$b_{2SLS} = (S_{ZX}'S_{ZZ}^{-1}S_{ZX})^{-1}S_{ZX}'S_{ZZ}^{-1}s_{ZY} = \big(X'Z(Z'Z)^{-1}Z'X\big)^{-1}X'Z(Z'Z)^{-1}Z'y = (X'PX)^{-1}X'Py$, where $P = Z(Z'Z)^{-1}Z'$.
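A minimal sketch of the 2SLS formula above written with the projection matrix $P$; variable names are ours, and `y`, `X`, `Z` are assumed numpy arrays.

```python
import numpy as np

def two_sls(y, X, Z):
    """b_2SLS = (X'PX)^{-1} X'Py with P = Z (Z'Z)^{-1} Z'."""
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    return np.linalg.solve(X.T @ P @ X, X.T @ P @ y)
```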
With $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n(y_i - x_i'b_{2SLS})^2$ (i.e. using the residual $y - Xb_{2SLS}$),
$\widehat{\mathrm{Avar}}(b_{2SLS}) = n\hat\sigma^2\big(X'Z(Z'Z)^{-1}Z'X\big)^{-1} = n\hat\sigma^2\,[X'PX]^{-1}.$
t-statistic: $t_l = \dfrac{b_{2SLS,l} - \bar\beta_l}{\sqrt{\hat\sigma^2\,[(X'PX)^{-1}]_{ll}}}\xrightarrow{d}N(0,1).$
Wald statistic: $W = a(b_{2SLS})'\big[A(b_{2SLS})\,\hat\sigma^2\big(X'Z(Z'Z)^{-1}Z'X\big)^{-1}A(b_{2SLS})'\big]^{-1}a(b_{2SLS})\xrightarrow{d}\chi^2(\#a).$
J statistic (Sargan's statistic): $J\big(b_{2SLS},(\hat\sigma^2S_{ZZ})^{-1}\big) = \dfrac{(y - Xb_{2SLS})'P(y - Xb_{2SLS})}{\hat\sigma^2}\xrightarrow{d}\chi^2(m - d).$
19. $S = E(g_ig_i') = E[(z_i\varepsilon_i)(z_i\varepsilon_i)'] = E[\varepsilon_i^2z_iz_i'] = E[E(\varepsilon_i^2z_iz_i'\mid z_i)] = E[z_iz_i'E(\varepsilon_i^2\mid z_i)] = \sigma^2E(z_iz_i') = \sigma^2\Sigma_{ZZ}$.
20. Note: In the efficient two-step GMM estimation, the first step is to obtain a consistent estimator of S. Under conditional homoskedasticity, we don't need to perform the first step, because the weighting matrix $(\hat\sigma^2S_{ZZ})^{-1}$, with $S_{ZZ} = \frac{1}{n}\sum_{i=1}^n z_iz_i'$, yields the same estimator as $S_{ZZ}^{-1}$. So the second-step estimator collapses to the GMM estimator with $S_{ZZ}^{-1}$ as the weighting matrix. This estimator is called the 2SLS estimator because it can be computed by two OLS regressions.
a. 2SLS as IV estimator: Recall, if $Z_{n\times d}$ is the data matrix of d instruments for the d regressors $X_{n\times d}$ (exact identification), then
$b_{IV} = S_{ZX}^{-1}s_{ZY} = \Big(\frac{1}{n}\sum_{i=1}^n z_ix_i'\Big)^{-1}\Big(\frac{1}{n}\sum_{i=1}^n z_iy_i\Big) = (Z'X)^{-1}Z'y = b_{2SLS}$;
and if in addition $z_i = x_i$ (all regressors predetermined), this becomes $(X'X)^{-1}X'y = b_{OLS}$.
(Verify that the instruments are indeed instruments: uncorrelated with the errors and correlated with the endogenous regressors.)
b. 2SLS as two regressions: Since P is symmetric and idempotent,
$b_{2SLS} = (X'PX)^{-1}X'Py = (X'P'PX)^{-1}X'P'y = (\hat X'\hat X)^{-1}\hat X'y$, where $\hat X = PX$ is the matrix of fitted values from the first-stage regression of X on Z; the second stage is OLS of y on $\hat X$.
Note: OLS packages return second-stage SEs based on the residual vector $y - \hat Xb_{2SLS}$. This is NOT the same as $y - Xb_{2SLS}$ (i.e. the estimated residual of interest). Therefore the estimated asymptotic variance from the second stage cannot be used for statistical inference.
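A minimal sketch (our own names) of the "two OLS regressions" computation and of the residual caveat just noted: the second-stage residual $y - \hat Xb$ is not the residual $y - Xb$ needed for correct standard errors.

```python
import numpy as np

def two_sls_two_stages(y, X, Z):
    # Stage 1: regress each column of X on Z to get fitted values Xhat = P X
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Stage 2: OLS of y on Xhat gives the 2SLS coefficients
    b = np.linalg.lstsq(Xhat, y, rcond=None)[0]
    resid_correct = y - X @ b      # residual to use for sigma^2 and SEs
    resid_stage2 = y - Xhat @ b    # what a naive second-stage OLS would report
    return b, resid_correct, resid_stage2
```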
C. Asymptotic Properties of 2SLS: These results follow from the fact that 2SLS is special case of GMM with Wn = (Szz)-1
a.
b.
Consistency 21 : Under Assumptions 3.1 3.4, the 2SLS estimator b2SLS = (Szx(Szz)-1Szx)-1 Szx (Szz)-1sZY is consistent.
Asymptotic Normality: If we add Assumption 3.5 to 3.1 3.4, then the 2SLS estimator is asymptotically normal
$\sqrt{n}\,\big(b_{GMM}(S_{ZZ}^{-1}) - \beta\big)\xrightarrow{d}N(0, V)$, where
$V = \mathrm{Avar}\big(b_{GMM}(\Sigma_{ZZ}^{-1})\big) = (\Sigma_{XZ}\Sigma_{ZZ}^{-1}\Sigma_{XZ}')^{-1}\,\Sigma_{XZ}\Sigma_{ZZ}^{-1}\,E(g_ig_i')\,\Sigma_{ZZ}^{-1}\Sigma_{XZ}'\,(\Sigma_{XZ}\Sigma_{ZZ}^{-1}\Sigma_{XZ}')^{-1}$
$= \big(E(x_iz_i')E(z_iz_i')^{-1}E(z_ix_i')\big)^{-1}E(x_iz_i')E(z_iz_i')^{-1}E(g_ig_i')E(z_iz_i')^{-1}E(z_ix_i')\big(E(x_iz_i')E(z_iz_i')^{-1}E(z_ix_i')\big)^{-1}$,
because $S_{ZZ} = \frac{1}{n}\sum_{i=1}^n z_iz_i'\xrightarrow{p}E(z_iz_i') = \Sigma_{ZZ}$.
21. Since the 2SLS estimator is a special case of GMM, consistency follows from the general case.
c. Conditional Homoskedasticity: If we add Assumption 3.7 to 3.1–3.5, then 2SLS is the efficient GMM estimator, with asymptotic variance
$\mathrm{Avar}(b_{2SLS}) = \mathrm{Avar}\big(b^{\,eff}_{GMM}(S^{-1})\big) = \sigma^2\,(\Sigma_{ZX}'\Sigma_{ZZ}^{-1}\Sigma_{ZX})^{-1}$ (under conditional homoskedasticity),
estimated by $\widehat{\mathrm{Avar}}(b_{2SLS}) = \hat\sigma^2\,(S_{ZX}'S_{ZZ}^{-1}S_{ZX})^{-1}$.
GMM Estimator under Conditional Homoskedasticity: setting $\hat S^* = \hat\sigma^2\,\frac{1}{n}\sum_{i=1}^n z_iz_i'$,
$b_{GMM}\big((\hat\sigma^2S_{ZZ})^{-1}\big) = \big(S_{ZX}'(\hat\sigma^2S_{ZZ})^{-1}S_{ZX}\big)^{-1}S_{ZX}'(\hat\sigma^2S_{ZZ})^{-1}s_{ZY} = (S_{ZX}'S_{ZZ}^{-1}S_{ZX})^{-1}S_{ZX}'S_{ZZ}^{-1}s_{ZY} = b_{GMM}(S_{ZZ}^{-1}) = b_{2SLS}$.
d. $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n(y_i - x_i'b_{2SLS})^2\xrightarrow{p}\sigma^2$, so
$\widehat{\mathrm{Avar}}(b_{2SLS}) = \hat\sigma^2\,(S_{ZX}'S_{ZZ}^{-1}S_{ZX})^{-1} = n\hat\sigma^2\big(X'Z(Z'Z)^{-1}Z'X\big)^{-1} = n\hat\sigma^2\,(X'P_ZX)^{-1}$.
e. Standard errors and test statistics: $SE_l = \sqrt{\tfrac{1}{n}\widehat{\mathrm{Avar}}(b_{2SLS})_{ll}}$, $t_l = \dfrac{(b_{2SLS})_l - \bar\beta_l}{SE_l}\xrightarrow{d}N(0,1)$, and
$W = a(b_{2SLS})'\big[A(b_{2SLS})\,\hat\sigma^2\big(X'Z(Z'Z)^{-1}Z'X\big)^{-1}A(b_{2SLS})'\big]^{-1}a(b_{2SLS})\xrightarrow{d}\chi^2(\#a)$.
f. Sargan's statistic: $J\big(b_{2SLS},(\hat\sigma^2S_{ZZ})^{-1}\big) = \dfrac{(y - Xb_{2SLS})'P(y - Xb_{2SLS})}{\hat\sigma^2}\xrightarrow{d}\chi^2(m - d)$.
E. When Regressors are Predetermined and Errors are Conditionally Homoskedastic: Efficient GMM = OLS!
When all regressors are predetermined and errors are conditionally homoskedastic, the objective function (J statistic) for the
efficient GMM estimator/2SLS is:
$J\big(b,(\hat\sigma^2S_{ZZ})^{-1}\big) = n\,g_n(b)'(\hat\sigma^2S_{ZZ})^{-1}g_n(b) = \dfrac{(y - Xb)'P(y - Xb)}{\hat\sigma^2}$
$= \dfrac{y'Py - 2b'X'Py + b'X'PXb}{\hat\sigma^2}$
$= \dfrac{y'Py - 2b'X'y + b'X'Xb}{\hat\sigma^2}$ (since $P = P'$ and $PX = X$ when $x_i\subset z_i$, i.e. the regressors are instruments)
$= \dfrac{(y - Xb)'(y - Xb) - y'y + y'Py}{\hat\sigma^2} = \dfrac{(y - Xb)'(y - Xb) - (y - \hat y)'(y - \hat y)}{\hat\sigma^2}$, where $\hat y = Py$.
Since the last term does not depend on b, minimizing J amounts to minimizing SSR = $(y - Xb)'(y - Xb)$.
Implications:
i. The efficient GMM estimator is OLS (this is true as long as $z_i = x_i$, i.e. the regressors are predetermined).
ii. The restricted efficient GMM estimator subject to the constraints of the null hypothesis is the restricted OLS (whose objective function is not J but SSR).
iii. The Wald statistic, which is numerically equal to the LR statistic, can be calculated as the difference in SSR with and without the imposition of the null, normalized by $\hat\sigma^2$. (This confirms the derivation in 2.6 that the LR principle can derive the Wald.)
(Note: This is why we can fit OLS into the GMM framework, treating the x's as instruments.)
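A minimal numerical check (our own simulated data, not from the notes) that when the regressors serve as their own instruments ($z_i = x_i$) the GMM/2SLS formula collapses to OLS, as stated in implication i.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # hypothetical regressors
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)       # hypothetical DGP
Z = X                                                    # regressors are predetermined
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
b_gmm = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)        # 2SLS / efficient GMM formula
b_ols = np.linalg.solve(X.T @ X, X.T @ y)                # OLS
print(np.allclose(b_gmm, b_ols))                         # True: PX = X when Z = X
```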
F. Limited Information Maximum Likelihood Estimator (LIML): This is the ML counterpart of 2SLS.
They're both k-class estimators; 2SLS is a k-class estimator with k = 1 (p. 541), and when the equation is just identified the LIML value of k equals 1, so LIML = 2SLS numerically.
Summary: Under conditional homoskedasticity, multiple equation GMM reduces to the full-information instrumental variable efficient
estimator (FIVE), which reduces to the 3SLS if the set of instruments is common to all equations. If we further assume that all regressors are
predetermined, the 3SLS reduces to seemingly unrelated regressions (SUR), which in turn reduces to the multivariate regression when all the
equations have the same regressors.
I. Assumptions: There are M equations, each of which is a linear equation like the one in single-equation GMM.
4.1 Linearity: There are M linear equations,
$y_{im} = x_{im}'\beta_m + \varepsilon_{im}$  (m = 1, 2, …, M; i = 1, 2, …, n) (this is the system of equations we want to estimate),
where $x_{im}$ is the $d_m$-dimensional vector of regressors, $\beta_m$ is the coefficient vector, and $\varepsilon_{im}$ is the unobservable error term for the m-th equation.
Note on interequation correlation and cross-equation restriction 22
Cross-Equation restrictions often occur in panel data models, where the same relationship can be estimated for different points in
time. 23
4.3 Orthogonality Conditions: all the instruments are predetermined in their own equation,
$E(z_{im}\varepsilon_{im}) = 0$ for all i and m = 1, 2, …, M, i.e. $E(g_i) = 0$, where
$g_i = \begin{pmatrix} z_{i1(K_1\times1)}(y_{i1} - x_{i1}'\beta_1) \\ \vdots \\ z_{iM(K_M\times1)}(y_{iM} - x_{iM}'\beta_M)\end{pmatrix} = \begin{pmatrix} z_{i1}\varepsilon_{i1} \\ \vdots \\ z_{iM}\varepsilon_{iM}\end{pmatrix}$ is the $\big(\sum_{m=1}^M K_m\big)\times1$ stacked vector of moment conditions.
Note on Cross Orthogonalities: The model assumes no cross orthogonalities, e.g. $z_{i1}$ and $\varepsilon_{i2}$ do not have to be orthogonal. However, if a variable is included in both $z_{i1}$ and $z_{i2}$ (a shared instrument), then 4.3 implies that the variable is orthogonal to both $\varepsilon_{i1}$ and $\varepsilon_{i2}$.
22. The model makes no assumptions about the interequation (or contemporaneous) correlation between the errors $(\varepsilon_{i1},\dots,\varepsilon_{iM})$. Also, there are no a priori restrictions on the coefficients from different equations: i.e. the model assumes no cross-equation restrictions on the coefficients.
23. Example: Suppose we want to estimate the wage equation (à la Griliches) and add to it the equation for KWW (score on the Knowledge of the World of Work test).
4.4 Rank Condition for Identification: Guarantees there is a unique solution to the system of equations.²⁵
For each equation m (= 1, 2, …, M), the $K_m\times d_m$ matrix $E(z_{im}x_{im}')$ is of full column rank (or $E(x_{im}z_{im}')$ is of full row rank), with $K_m\ge d_m$ (# of equations ≥ # of unknowns) for all m.
4.5 $g_i$ is a Martingale Difference with Finite Second Moments: Assumption for asymptotic normality.
$\{g_i\}$ is a joint martingale difference sequence with finite second moments (so $E(g_i) = 0$ with $E(g_i\mid g_{i-1},\dots,g_1) = 0$ for $i\ge2$) — no serial correlation in $g_i$.
The $\big(\sum_mK_m\big)\times\big(\sum_mK_m\big)$ matrix of cross moments,
$S = E(g_ig_i') = \begin{pmatrix} E(\varepsilon_{i1}\varepsilon_{i1}z_{i1}z_{i1}') & \cdots & E(\varepsilon_{i1}\varepsilon_{iM}z_{i1}z_{iM}') \\ \vdots & & \vdots \\ E(\varepsilon_{iM}\varepsilon_{i1}z_{iM}z_{i1}') & \cdots & E(\varepsilon_{iM}\varepsilon_{iM}z_{iM}z_{iM}')\end{pmatrix}$, is nonsingular.
Note: This is stronger than assuming that $g_{im} = z_{im}\varepsilon_{im}$ is an MDS in each equation m separately.
Add'l:
4.6 Finite Fourth Moments (for consistent estimation of S):
$E[(z_{imk}x_{ihj})^2]$ exists and is finite for all k = 1, …, $K_m$, j = 1, …, $d_h$, and m, h = 1, 2, …, M, where $z_{imk}$ is the k-th element of $z_{im}$ and $x_{ihj}$ is the j-th element of $x_{ih}$.
4.7 Conditional Homoskedasticity (constant cross moments):
$E(\varepsilon_{im}\varepsilon_{ih}\mid z_{im}, z_{ih}) = \sigma_{mh}$ for all m, h = 1, 2, …, M; or $E(\varepsilon_i\varepsilon_i'\mid Z_i) = \Sigma$.
Note on Complete System of Simultaneous Equations: the Complete system adds more assumptions to our model, assumptions which are
unnecessary for development of ME GMM. They are covered in 8.5.
II. Multiple-Equation GMM Defined: This is the same as single-equation GMM but with re-defined matrices.
1. General Setup
The parameter of interest $\beta = (\beta_1',\dots,\beta_M')'$ is defined implicitly as the solution to the moment conditions
$E[g_i(w_i,\beta)] = E\begin{pmatrix} z_{i1}(y_{i1} - x_{i1}'\beta_1) \\ \vdots \\ z_{iM}(y_{iM} - x_{iM}'\beta_M)\end{pmatrix} = \begin{pmatrix} E(z_{i1}y_{i1}) \\ \vdots \\ E(z_{iM}y_{iM})\end{pmatrix} - \begin{pmatrix} E(z_{i1}x_{i1}') & & 0 \\ & \ddots & \\ 0 & & E(z_{iM}x_{iM}')\end{pmatrix}\begin{pmatrix}\beta_1 \\ \vdots \\ \beta_M\end{pmatrix} = \sigma_{ZY} - \Sigma_{ZX}\,\beta = 0_{\left(\sum_mK_m\right)\times1}$,
where $\sigma_{ZY}$ is the stacked $\big(\sum_mK_m\big)\times1$ vector and $\Sigma_{ZX}$ is the block-diagonal $\big(\sum_mK_m\big)\times\big(\sum_md_m\big)$ matrix shown above.
Sample Analogue:
$g_n(\beta) = \begin{pmatrix} \frac{1}{n}\sum_{i=1}^n z_{i1}(y_{i1} - x_{i1}'\beta_1) \\ \vdots \\ \frac{1}{n}\sum_{i=1}^n z_{iM}(y_{iM} - x_{iM}'\beta_M)\end{pmatrix} = s_{ZY} - S_{ZX}\,\beta = 0$,
where $\sigma_{ZY} = E(Z_iY_i)$ and $\Sigma_{ZX} = E(Z_iX_i)$ (and $s_{ZY}, S_{ZX}$ are the corresponding sample averages), with
$Z_i = \begin{pmatrix} z_{i1} & & 0 \\ & \ddots & \\ 0 & & z_{iM}\end{pmatrix}_{\left(\sum_mK_m\right)\times M}$, $X_i = \begin{pmatrix} x_{i1}' & & 0 \\ & \ddots & \\ 0 & & x_{iM}'\end{pmatrix}_{M\times\left(\sum_md_m\right)}$, $Y_i = \begin{pmatrix} y_{i1} \\ \vdots \\ y_{iM}\end{pmatrix}$.
25. We can uniquely determine all the coefficient vectors $\beta_1,\dots,\beta_M$ iff each coefficient vector $\beta_m$ is uniquely determined, which occurs iff Assumption 3.4 holds for each equation. The rank condition is simple here because there are no cross-equation restrictions; when the coefficients are assumed to be the same across all equations, we will have a different identification condition.
4 Special Features of M.E. GMM (we substitute these into single-equation GMM and get the same results!):
i. $s_{ZY}$ is a stacked vector;
ii. $S_{ZX}$ is a block-diagonal matrix;
iii. by ii, $W_n$ is a $\big(\sum_mK_m\big)\times\big(\sum_mK_m\big)$ matrix;
iv. $\bar g = g_n(\beta) = \frac{1}{n}\sum_{i=1}^n g_i$ is a stacked vector, with $g_i = \begin{pmatrix} z_{i1}\varepsilon_{i1} \\ \vdots \\ z_{iM}\varepsilon_{iM}\end{pmatrix}$, so $\bar g = \begin{pmatrix} \frac{1}{n}\sum_{i=1}^n z_{i1}\varepsilon_{i1} \\ \vdots \\ \frac{1}{n}\sum_{i=1}^n z_{iM}\varepsilon_{iM}\end{pmatrix}$.
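A minimal sketch (our own helper, names assumed) of building the stacked vector $s_{ZY}$ and block-diagonal matrix $S_{ZX}$ from per-equation arrays, following features i and ii above.

```python
import numpy as np
from scipy.linalg import block_diag

def stacked_moments(ys, Xs, Zs):
    """ys, Xs, Zs are lists over the M equations (each with n observations)."""
    n = len(ys[0])
    s_zy = np.concatenate([Z.T @ y / n for y, Z in zip(ys, Zs)])    # stacked vector
    S_zx = block_diag(*[Z.T @ X / n for X, Z in zip(Xs, Zs)])       # block diagonal
    return s_zy, S_zx
```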
2. Applied to the Linear Model: Our model is $y_{im} = x_{im}'\beta_m + \varepsilon_{im}$ with the moment conditions $E[z_{im}(y_{im} - x_{im}'\beta_m)] = 0$ for each m.
o Expression for the sample moment condition in the linear model:
$g_n(b) = s_{ZY\,\left(\sum_mK_m\right)\times1} - S_{ZX\,\left(\sum_mK_m\right)\times\left(\sum_md_m\right)}\,b_{\left(\sum_md_m\right)\times1}$.
o Method of Moments: If the system is exactly identified ($K_m = d_m$ for all m), there exists a unique b such that $g_n(b) = 0$ when $\Sigma_{ZX}$ is invertible; and for sufficiently large n, $S_{ZX}\xrightarrow{p}\Sigma_{ZX}$ by the ergodic theorem and is invertible with probability approaching 1. So, with a large sample, the system of simultaneous equations has the unique solution given by the MM estimator (CHECK THIS!!!)
$\hat\beta_{IV} = S_{ZX}^{-1}\,s_{ZY}$.²⁶
o GMM: in population, $\beta_{GMM} = \arg\min_b\,[E(g_i(w_i,b))]'\,W\,[E(g_i(w_i,b))] = \arg\min_b\,[E(Z_i(Y_i - X_ib))]'\,W\,[E(Z_i(Y_i - X_ib))]$; by the analogy principle,²⁷
$\hat\beta_{GMM}(W_n) = (S_{ZX}'W_nS_{ZX})^{-1}S_{ZX}'W_n\,s_{ZY} = \Big[\Big(\sum_{i=1}^n Z_iX_i\Big)'W_n\Big(\sum_{i=1}^n Z_iX_i\Big)\Big]^{-1}\Big(\sum_{i=1}^n Z_iX_i\Big)'W_n\Big(\sum_{i=1}^n Z_iY_i\Big)$.
26. The IV estimator is defined for the EXACTLY IDENTIFIED case (i.e. there are as many instruments as regressors in each equation). If $z_{im} = x_{im}$, i.e. all the regressors are predetermined/orthogonal to the contemporaneous error term, then this boils down to the OLS estimator. So OLS is a special case of the MM estimator, and IV and OLS are both special cases of GMM.
27. The quadratic-form formulation $g_n(b)'W_n\,g_n(b)$ gives us a 1×1 real number over which we can define the minimization problem. Otherwise it would be impossible to minimize over the $\big(\sum_mK_m\big)$-dimensional vector of moment conditions $g_n(b)$.
XI. The Multiple-Equation GMM Estimator in Partitioned Form:
$\hat\beta_{GMM}(W) = \begin{pmatrix}\hat\beta_1(W) \\ \vdots \\ \hat\beta_M(W)\end{pmatrix} = (S_{ZX}'WS_{ZX})^{-1}S_{ZX}'W\,s_{ZY}$,
with $S_{ZX} = \mathrm{diag}\Big(\frac{1}{n}\sum_i z_{i1}x_{i1}',\ \dots,\ \frac{1}{n}\sum_i z_{iM}x_{iM}'\Big)$, $s_{ZY} = \Big(\big(\frac{1}{n}\sum_i z_{i1}y_{i1}\big)',\dots,\big(\frac{1}{n}\sum_i z_{iM}y_{iM}\big)'\Big)'$, and W partitioned into blocks $W_{mh\,(K_m\times K_h)}$.
Writing the products out block by block (using the rules for multiplying a block-diagonal matrix by a partitioned matrix):
the (m, h) block of $S_{ZX}'WS_{ZX}$ is $\Big(\frac{1}{n}\sum_i x_{im}z_{im}'\Big)\,W_{mh}\,\Big(\frac{1}{n}\sum_i z_{ih}x_{ih}'\Big)$ ($d_m\times d_h$), and
the m-th block of $S_{ZX}'W\,s_{ZY}$ is $\sum_{h=1}^M\Big(\frac{1}{n}\sum_i x_{im}z_{im}'\Big)\,W_{mh}\,\Big(\frac{1}{n}\sum_i z_{ih}y_{ih}\Big)$ ($d_m\times1$).
III.
Large Sample Theory: As seen above, all the theory/formulas for the multiple-equation GMM is a matter of substitution of the newly
defined matrices into the equations. (See summary)
GMM Summary
IV.
Population orthogonality conditions: $E[g(w_i,\beta)] = E(Z_iY_i) - E(Z_iX_i)\beta = \sigma_{ZY} - \Sigma_{ZX}\beta = 0$; sample analogue: $g_n(\beta) = s_{ZY} - S_{ZX}\beta = 0$.
GMM estimator: $b_{n,GMM}(W_n) = (S_{ZX}'W_nS_{ZX})^{-1}S_{ZX}'W_n\,s_{ZY}$, with
$\mathrm{Avar}(b_{GMM}(W)) = (\Sigma_{XZ}W\Sigma_{XZ}')^{-1}\Sigma_{XZ}W\,S\,W\Sigma_{XZ}'(\Sigma_{XZ}W\Sigma_{XZ}')^{-1}$, and for the efficient choice $\widehat{\mathrm{Avar}}\big(b_{GMM}(\hat S^{-1})\big) = (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}$.
J statistic: $J\big(b_{GMM}(\hat S^{-1}),\hat S^{-1}\big) = n\,g_n\big(b_{GMM}(\hat S^{-1})\big)'\hat S^{-1}g_n\big(b_{GMM}(\hat S^{-1})\big)$.
Single-equation GMM: $g_i = z_i\varepsilon_i$; $s_{ZY} = \frac{1}{n}\sum_i z_iy_i$; $S_{ZX} = \frac{1}{n}\sum_i z_ix_i'$; $\Sigma_{ZX} = E(z_ix_i')$; $S = E(g_ig_i') = E(\varepsilon_i^2z_iz_i')$ ($K\times K$); $\hat S = \frac{1}{n}\sum_i\hat\varepsilon_i^2z_iz_i'$; assumptions 3.1–3.4 (plus 3.5 for asymptotic normality, 3.6 for $\hat S$); J is asymptotically $\chi^2(K - d)$; $W_n$ is $K\times K$.
Multiple-equation GMM: $g_i$ stacks the $z_{im}\varepsilon_{im}$; $s_{ZY}$ stacks the $\frac{1}{n}\sum_i z_{im}y_{im}$; $S_{ZX} = \mathrm{diag}\big(\frac{1}{n}\sum_i z_{im}x_{im}'\big)$; $\Sigma_{ZX} = \mathrm{diag}\big(E(z_{im}x_{im}')\big)$; $S = E(g_ig_i')$ has (m, h) block $E(\varepsilon_{im}\varepsilon_{ih}z_{im}z_{ih}')$ ($\sum_mK_m\times\sum_mK_m$); $\hat S$ has (m, h) block $\frac{1}{n}\sum_i\hat\varepsilon_{im}\hat\varepsilon_{ih}z_{im}z_{ih}'$; assumptions 4.1–4.4 (plus 4.5, 4.6); J is asymptotically $\chi^2\big(\sum_m(K_m - d_m)\big)$; $W_n$ is $\sum_mK_m\times\sum_mK_m$, partitioned into blocks $W_{mh}$.
VI. Special Cases of Multiple Equation GMM under Conditional Homoskedasticity: FIVE, 3SLS, and SUR
0. S under conditional homoskedasticity:
$S = E(g_ig_i') = \begin{pmatrix} E(\varepsilon_{i1}\varepsilon_{i1}z_{i1}z_{i1}') & \cdots & E(\varepsilon_{i1}\varepsilon_{iM}z_{i1}z_{iM}') \\ \vdots & & \vdots \\ E(\varepsilon_{iM}\varepsilon_{i1}z_{iM}z_{i1}') & \cdots & E(\varepsilon_{iM}\varepsilon_{iM}z_{iM}z_{iM}')\end{pmatrix} = \begin{pmatrix}\sigma_{11}E(z_{i1}z_{i1}') & \cdots & \sigma_{1M}E(z_{i1}z_{iM}') \\ \vdots & & \vdots \\ \sigma_{M1}E(z_{iM}z_{i1}') & \cdots & \sigma_{MM}E(z_{iM}z_{iM}')\end{pmatrix}$³⁰
(compactly, $S = E(Z_i\,\Sigma\,Z_i')$ with $\Sigma = E(\varepsilon_i\varepsilon_i')$).
1. FIVE: estimate S by $\hat S$ with (m, h) block $\hat\sigma_{mh}\,\frac{1}{n}\sum_{i=1}^n z_{im}z_{ih}'$, where
$\hat\sigma_{mh} = \frac{1}{n}\sum_{i=1}^n\hat\varepsilon_{im}\hat\varepsilon_{ih}$, $\hat\varepsilon_{im} = y_{im} - x_{im}'\hat\beta_m$
($\hat\beta_m$ is usually the 2SLS estimator for equation m, so for each cross moment we need two 2SLS estimates). Then
$\hat\beta_{FIVE}(\hat S^{-1}) = (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}S_{ZX}'\hat S^{-1}\,s_{ZY}$.
Large Sample Properties of FIVE (these follow from the large-sample properties of M.E. GMM estimators):
Suppose Assumptions 4.1–4.5 and 4.7 hold, and suppose further that $E(z_{ih}z_{im}')$ exists and is finite for all m, h = 1, 2, …, M. Let $\hat S$ be defined as above. Then,
(a) $\hat S\xrightarrow{p}S$;
(b) $\hat\beta_{FIVE}(\hat S^{-1})$ is consistent, asymptotically normal, and efficient, with $\mathrm{Avar}(\hat\beta_{FIVE}) = (\Sigma_{ZX}'S^{-1}\Sigma_{ZX})^{-1}$;
(c) the estimated asymptotic variance $\widehat{\mathrm{Avar}}(\hat\beta_{FIVE}) = (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}$ is consistent for $\mathrm{Avar}(\hat\beta_{FIVE})$;
(d) Sargan's statistic / J statistic: $J(\hat\beta_{FIVE},\hat S^{-1}) = n\,g_n(\hat\beta_{FIVE})'\hat S^{-1}g_n(\hat\beta_{FIVE})\xrightarrow{d}\chi^2\big(\sum_mK_m - \sum_md_m\big)$.²⁹
29. The degrees of freedom are $\sum_{m=1}^M(K_m - d_m)$.
30. The (m, h) block of $E(g_ig_i')$ is $E(\varepsilon_{im}\varepsilon_{ih}z_{im}z_{ih}') = E\big(E(\varepsilon_{im}\varepsilon_{ih}\mid z_{im}, z_{ih})\,z_{im}z_{ih}'\big)$ by the Law of Iterated Expectations and linearity of conditional expectations, $= E(\sigma_{mh}z_{im}z_{ih}')$ by conditional homoskedasticity $= \sigma_{mh}E(z_{im}z_{ih}')$. Moreover, $\hat\sigma_{mh} = \frac{1}{n}\sum_{i=1}^n\hat\varepsilon_{im}\hat\varepsilon_{ih}$ (m, h = 1, 2, …, M), with $\hat\varepsilon_{im} = y_{im} - x_{im}'\hat\beta_m$ for some consistent estimator $\hat\beta_m$ of $\beta_m$, satisfies $\hat\sigma_{mh}\xrightarrow{p}\sigma_{mh}$ (Prop. 4.1, p. 269), and by (joint) ergodic stationarity $\frac{1}{n}\sum_{i=1}^n z_{im}z_{ih}'\xrightarrow{p}E(z_{im}z_{ih}')$, which exists and is finite by assumption.
2.
Three-Stage Least Squares (3SLS): When the set of instruments is same across equations, FIVE can be simplified to 3SLS
Simplification of gi, S, and S : If zi ( = zi1 = zi2 = zi3 = = ziM) is the common set of instruments (for all M equations) with
dimension K, then gi, S, and S can be written compactly using the Kronecker product 31 as follows:
$g_i = \begin{pmatrix} z_{i1}\varepsilon_{i1} \\ \vdots \\ z_{iM}\varepsilon_{iM}\end{pmatrix} = \begin{pmatrix} z_i\varepsilon_{i1} \\ \vdots \\ z_i\varepsilon_{iM}\end{pmatrix}$ (since the instruments are common) $= \varepsilon_i\otimes z_i$ ($MK\times1$).
$S = E(g_ig_i') = \begin{pmatrix}\sigma_{11}E(z_iz_i') & \cdots & \sigma_{1M}E(z_iz_i') \\ \vdots & & \vdots \\ \sigma_{M1}E(z_iz_i') & \cdots & \sigma_{MM}E(z_iz_i')\end{pmatrix} = \Sigma\otimes E(z_iz_i')$ ($MK\times MK$), where $\Sigma = E(\varepsilon_i\varepsilon_i')$ for $\varepsilon_i = (\varepsilon_{i1},\dots,\varepsilon_{iM})'$; hence $S^{-1} = \Sigma^{-1}\otimes E(z_iz_i')^{-1}$.
$\hat S = \hat\Sigma\otimes\frac{1}{n}\sum_{i=1}^n z_iz_i' = \hat\Sigma\otimes\frac{1}{n}Z'Z$, so $\hat S^{-1} = \hat\Sigma^{-1}\otimes n(Z'Z)^{-1}$, where $\hat\Sigma = [\hat\sigma_{mh}]$.³²
$\hat\beta_{3SLS} = \hat\beta_{FIVE}(\hat S^{-1}) = (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}S_{ZX}'\hat S^{-1}s_{ZY} = \big[X'(\hat\Sigma^{-1}\otimes P_Z)X\big]^{-1}X'(\hat\Sigma^{-1}\otimes P_Z)Y$,³³ where $P_Z = Z(Z'Z)^{-1}Z'$ (and X, Y are the stacked data matrices defined in the SUR section below).
31. Kronecker product: for $A_{M\times N} = [a_{mn}]$ and $B_{K\times L}$,
$A\otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1N}B \\ \vdots & & \vdots \\ a_{M1}B & \cdots & a_{MN}B\end{pmatrix}_{MK\times NL}$, and for vectors $a_{M\times1}$, $b_{N\times1}$, $a\otimes b = \begin{pmatrix} a_1b \\ \vdots \\ a_Mb\end{pmatrix}_{MN\times1}$.
Useful properties: $(A\otimes B)(C\otimes D) = AC\otimes BD$ (provided that A and C are conformable and B and D are conformable); $(A\otimes B)' = A'\otimes B'$; $(A\otimes B)^{-1} = A^{-1}\otimes B^{-1}$.
32. Just as before, $\hat\sigma_{mh} = \frac{1}{n}\sum_{i=1}^n\hat\varepsilon_{im}\hat\varepsilon_{ih}$ (m, h = 1, 2, …, M) for some consistent estimator $\hat\beta_m$ of $\beta_m$, with $\hat\varepsilon_{im} = y_{im} - x_{im}'\hat\beta_m$ ($\hat\beta_m$ is usually the 2SLS estimator for equation m, so for the cross moments we need two 2SLS estimates).
33. This is because $\frac{1}{n}\sum_{i=1}^n z_iz_i' = \frac{1}{n}Z'Z$, so $\hat S^{-1} = \hat\Sigma^{-1}\otimes n(Z'Z)^{-1}$.
Large-sample properties of 3SLS: as for FIVE, $\hat\beta_{3SLS}$ is consistent, asymptotically normal, and efficient, with consistent estimated asymptotic variance $(S_{ZX}'\hat S^{-1}S_{ZX})^{-1}$, and
(c) Sargan's statistic / J statistic: $J(\hat\beta_{3SLS},\hat S^{-1}) = n\,g_n(\hat\beta_{3SLS})'\hat S^{-1}g_n(\hat\beta_{3SLS})\xrightarrow{d}\chi^2\big(MK - \sum_md_m\big)$.
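A minimal sketch (our own function and names) of 3SLS with a common instrument set, using the Kronecker structure derived above; the first step uses equation-by-equation 2SLS residuals for $\hat\Sigma$. The dense Kronecker product is for illustration only (in practice one would exploit the structure).

```python
import numpy as np
from scipy.linalg import block_diag

def three_sls(ys, Xs, Z):
    """ys, Xs: lists over M equations; Z: common n x K instrument matrix."""
    n = len(ys[0])
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)              # projection onto instruments
    # First step: 2SLS equation by equation, collect residuals for Sigma_hat
    E = np.column_stack([y - X @ np.linalg.solve(X.T @ P @ X, X.T @ P @ y)
                         for y, X in zip(ys, Xs)])
    Sigma_hat = E.T @ E / n
    # Second step: bhat = [X'(Sigma^-1 kron P)X]^{-1} X'(Sigma^-1 kron P)Y
    X_stack = block_diag(*Xs)                          # nM x (sum of d_m)
    Y_stack = np.concatenate(ys)                       # nM x 1, equation-major
    A = np.kron(np.linalg.inv(Sigma_hat), P)
    return np.linalg.solve(X_stack.T @ A @ X_stack, X_stack.T @ A @ Y_stack)
```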
3. Seemingly Unrelated Regressions (SUR): Suppose that in addition to common instruments, $z_i$ = the union of $(x_{i1},\dots,x_{iM})$ (all regressors are instruments!).
The SUR cross-orthogonality condition is equivalent to $E(x_{im}\varepsilon_{ih}) = 0$ (m, h = 1, 2, …, M).
Predetermined regressors satisfy the cross orthogonalities: not only are the regressors predetermined in each equation ($E(x_{im}\varepsilon_{im}) = 0$), but they are also predetermined in the other equations (so a regressor in any equation is an instrument for all equations).
This simplification produces the SUR estimator. WHAT IS THE IMPORTANCE OF CROSS ORTHOG?
With the above (S and g still defined the same) we get (because the $X_m$ are in the column space of $P_Z$):
$\hat\beta_{SUR} = \hat\beta_{3SLS} = \hat\beta_{FIVE}(\hat S^{-1}) = (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}S_{ZX}'\hat S^{-1}s_{ZY} = \big[X'(\hat\Sigma^{-1}\otimes I_n)X\big]^{-1}X'(\hat\Sigma^{-1}\otimes I_n)Y$,³⁴
where
$S_{ZX} = \frac{1}{n}\sum_{i=1}^n Z_iX_i = \frac{1}{n}(I_M\otimes Z)'X$, $s_{ZY} = \frac{1}{n}(I_M\otimes Z)'Y$,
$Z = \begin{pmatrix} z_1' \\ \vdots \\ z_n'\end{pmatrix}_{n\times K}$, $X = \begin{pmatrix} X_1 & & 0 \\ & \ddots & \\ 0 & & X_M\end{pmatrix}_{nM\times\sum_md_m}$ with $X_m = \begin{pmatrix} x_{1m}' \\ \vdots \\ x_{nm}'\end{pmatrix}_{n\times d_m}$ (data matrix of the m-th equation's regressors), and $Y = \begin{pmatrix} y_1 \\ \vdots \\ y_M\end{pmatrix}_{nM\times1}$ with $y_m = \begin{pmatrix} y_{1m} \\ \vdots \\ y_{nm}\end{pmatrix}_{n\times1}$ (the m-th equation's dependent variable).
Indeed,
$S_{ZX}'\hat S^{-1}s_{ZY} = \frac{1}{n}X'(I_M\otimes Z)\big(\hat\Sigma^{-1}\otimes n(Z'Z)^{-1}\big)\frac{1}{n}(I_M\otimes Z)'Y = \frac{1}{n}X'\big(\hat\Sigma^{-1}\otimes Z(Z'Z)^{-1}Z'\big)Y = \frac{1}{n}X'(\hat\Sigma^{-1}\otimes P_Z)Y$,
and similarly $S_{ZX}'\hat S^{-1}S_{ZX} = \frac{1}{n}X'(\hat\Sigma^{-1}\otimes P_Z)X$; by footnote 34, $P_Z$ can be replaced by $I_n$.
Large-sample properties (as for FIVE/3SLS): $\widehat{\mathrm{Avar}}(\hat\beta_{SUR}) = n\big[X'(\hat\Sigma^{-1}\otimes I_n)X\big]^{-1}$, and
(c) Sargan's statistic / J statistic: $J(\hat\beta_{SUR},\hat S^{-1}) = n\,g_n(\hat\beta_{SUR})'\hat S^{-1}g_n(\hat\beta_{SUR})\xrightarrow{d}\chi^2\big(MK - \sum_md_m\big)$.
Two cases to consider:
(i) Each equation is just identified: since the common instrument set is the union of all the regressors, this is possible only if the regressors are the same for all equations, i.e. $x_{im} = z_i$ for all m ⇒ Multivariate Regression ⇒ equation-by-equation OLS!
(ii) At least one equation is overidentified: then SUR is more efficient than equation-by-equation OLS, unless the equations are "unrelated" to each other in the sense that $E(\varepsilon_{im}\varepsilon_{ih}x_{im}x_{ih}') = \sigma_{mh}E(x_{im}x_{ih}') = 0$ (the first equality holds by conditional homoskedasticity) for all m ≠ h (recall, in this case the ME GMM is asymptotically equivalent to SE GMM).
Since $E(x_{im}x_{ih}')$ is assumed to be non-zero (by the rank condition), $\sigma_{mh}E(x_{im}x_{ih}') = 0$ iff $\sigma_{mh} = 0$ (the covariance of the error terms is 0).
Therefore, SUR is more efficient than OLS if $\sigma_{mh}\ne0$ for some pair (m, h), and they are asymptotically equivalent if $\sigma_{mh} = 0$ for all m ≠ h.
34. Writing $\hat\Sigma^{-1} = [\hat\sigma^{mh}]$,
$X'(\hat\Sigma^{-1}\otimes P_Z)X = \begin{pmatrix}\hat\sigma^{11}X_1'P_ZX_1 & \cdots & \hat\sigma^{1M}X_1'P_ZX_M \\ \vdots & & \vdots \\ \hat\sigma^{M1}X_M'P_ZX_1 & \cdots & \hat\sigma^{MM}X_M'P_ZX_M\end{pmatrix}$,
and since each $X_m$ is in the column space of $P_Z$ ($P_Z$ is the projection onto the space spanned by the union of the x's), $P_ZX_m = X_m$ and $X_m'P_Z'P_ZX_h = X_m'X_h$. Hence $X'(\hat\Sigma^{-1}\otimes P_Z)X = X'(\hat\Sigma^{-1}\otimes I_n)X$. Similarly, $S_{ZX}'\hat S^{-1}s_{ZY} = \frac{1}{n}X'\big(\hat\Sigma^{-1}\otimes Z(Z'Z)^{-1}Z'\big)Y = \frac{1}{n}X'(\hat\Sigma^{-1}\otimes I_n)Y$.
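A minimal sketch (our own names) of the SUR (feasible GLS) estimator: with all regressors predetermined, $P_Z$ drops out and $\hat\beta = [X'(\hat\Sigma^{-1}\otimes I_n)X]^{-1}X'(\hat\Sigma^{-1}\otimes I_n)Y$, with $\hat\Sigma$ from equation-by-equation OLS residuals. The dense Kronecker product is again for illustration only.

```python
import numpy as np
from scipy.linalg import block_diag

def sur(ys, Xs):
    """ys, Xs: lists over the M equations, each with n observations."""
    n = len(ys[0])
    # First step: OLS equation by equation, collect residuals for Sigma_hat
    E = np.column_stack([y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
                         for y, X in zip(ys, Xs)])
    Sigma_hat = E.T @ E / n
    X_stack = block_diag(*Xs)
    Y_stack = np.concatenate(ys)
    A = np.kron(np.linalg.inv(Sigma_hat), np.eye(n))
    return np.linalg.solve(X_stack.T @ A @ X_stack, X_stack.T @ A @ Y_stack)
```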
4. Multivariate Regression: Suppose that in addition to common instruments, $z_i$ = the union of $(x_{i1},\dots,x_{iM})$, and all equations are just identified. This condition implies $x_{im} = z_i$ for all m (same regressors for all equations, all of them exogenous).³⁵
Simplification of X: with this assumption, we get that $X = I_M\otimes Z$.
With the above (S and g still defined the same) we get:
$\hat\beta_{MVR} = \hat\beta_{SUR} = \hat\beta_{3SLS} = \hat\beta_{FIVE}(\hat S^{-1}) = (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}S_{ZX}'\hat S^{-1}s_{ZY} = \big(I_M\otimes(Z'Z)^{-1}Z'\big)Y$,³⁶
i.e. equation-by-equation OLS of each $y_m$ on Z.
Multivariate Regression Interpretation of SUR: We can think of SUR model as a multivariate regression model with a priori
exclusion restrictions. (i.e. a system with same regressors but with restrictions on certain coefficients)
Multiple Equation GMM with Common Coefficients: We modify the ME GMM model to allow this restriction. We get RE and Pooled OLS.
Background:
In many applications (e.g. panel data) we deal with a special case of the ME GMM model where the number of regressors is the same across equations, with the same coefficients. How do we apply ME GMM while imposing the common-coefficient restriction? ⇒ the Random Effects estimator and Pooled OLS.
35. This is true because $z_i$ instruments all M equations: if not, i.e. if $x_{im}\subsetneq z_i$ for some m, then $\dim(z_i) > \dim(x_{im})$ and the m-th equation is overidentified.
36. $X'(\hat\Sigma^{-1}\otimes I_n)X = (I_M\otimes Z)'(\hat\Sigma^{-1}\otimes I_n)(I_M\otimes Z) = \hat\Sigma^{-1}\otimes Z'Z$ and $X'(\hat\Sigma^{-1}\otimes I_n)Y = (I_M\otimes Z')(\hat\Sigma^{-1}\otimes I_n)Y = (\hat\Sigma^{-1}\otimes Z')Y$, so
$\hat\beta_{MVR} = \big[X'(\hat\Sigma^{-1}\otimes I_n)X\big]^{-1}X'(\hat\Sigma^{-1}\otimes I_n)Y = (\hat\Sigma^{-1}\otimes Z'Z)^{-1}(\hat\Sigma^{-1}\otimes Z')Y = \big(\hat\Sigma\otimes(Z'Z)^{-1}\big)(\hat\Sigma^{-1}\otimes Z')Y = \big(I_M\otimes(Z'Z)^{-1}Z'\big)Y$.
Summary of the special cases under conditional homoskedasticity (FIVE, 3SLS, SUR, Multivariate Regression):
Assumptions — FIVE: 4.1–4.5, 4.7, and $E(z_{im}z_{ih}')$ finite; 3SLS: in addition, $z_{im} = z_i$ for all m (common instruments); SUR: in addition, $z_i$ = the union of the regressors; Multivariate Regression: $x_{im} = z_i$ for all m.
$S = E(g_ig_i')$ — FIVE: the $\sum_mK_m\times\sum_mK_m$ matrix with (m, h) block $\sigma_{mh}E(z_{im}z_{ih}')$; 3SLS and SUR: $\Sigma\otimes E(z_iz_i')$ ($MK\times MK$); Multivariate Regression: irrelevant.
$\hat S$ — FIVE: (m, h) block $\hat\sigma_{mh}\frac{1}{n}\sum_iz_{im}z_{ih}'$; 3SLS: $\hat\Sigma\otimes\frac{Z'Z}{n}$ with $\hat\Sigma$ from 2SLS residuals; SUR: $\hat\Sigma\otimes\frac{Z'Z}{n}$ with $\hat\Sigma$ from OLS residuals; Multivariate Regression: irrelevant (no first step needed).
Estimator — all are $(S_{ZX}'\hat S^{-1}S_{ZX})^{-1}S_{ZX}'\hat S^{-1}s_{ZY}$; for 3SLS this equals $\big[X'(\hat\Sigma^{-1}\otimes P_Z)X\big]^{-1}X'(\hat\Sigma^{-1}\otimes P_Z)Y$; for SUR, $\big[X'(\hat\Sigma^{-1}\otimes I_n)X\big]^{-1}X'(\hat\Sigma^{-1}\otimes I_n)Y$; for Multivariate Regression, equation-by-equation OLS.
Estimated Avar — $(S_{ZX}'\hat S^{-1}S_{ZX})^{-1}$; e.g. for 3SLS, $n\big[X'(\hat\Sigma^{-1}\otimes P_Z)X\big]^{-1}$; for SUR, $n\big[X'(\hat\Sigma^{-1}\otimes I_n)X\big]^{-1}$; for Multivariate Regression, the OLS formula.
Notation: $Z = \begin{pmatrix}z_1' \\ \vdots \\ z_n'\end{pmatrix}_{n\times K}$; $X = \mathrm{diag}(X_1,\dots,X_M)_{nM\times\sum_md_m}$ with $X_m = \begin{pmatrix}x_{1m}' \\ \vdots \\ x_{nm}'\end{pmatrix}_{n\times d_m}$; $Y = (y_1',\dots,y_M')'_{nM\times1}$ with $y_m = (y_{1m},\dots,y_{nm})'$; $\varepsilon = (\varepsilon_1',\dots,\varepsilon_M')'$; $\beta = (\beta_1',\dots,\beta_M')'$.
VII. Simultaneous Equations, FIML (ML counterpart to 3SLS) and LIML (ML counterpart to 2SLS)
Background: Given that we're going to estimate a simultaneous-equations system (with the same instruments across all equations) via maximum likelihood, we will assume iid data and normality. But first, we need to complete the system (# of endogenous variables = # of equations).³⁷
1.
B. The square matrix $\Gamma_0$ ($M\times M$) is nonsingular: this implies that the structural equations can be solved for the endogenous variables:
$y_t\,(M\times1) = -\Gamma_0^{-1}B_0\,x_t\,(K\times1) + \Gamma_0^{-1}\varepsilon_t\,(M\times1) \equiv \Pi_0'\,x_t + v_t$.
D. Log-likelihood function for the sample (i.e. our objective function) (pp. 531–532, Hayashi):
$Q_n(\Gamma, B, \Sigma) = -\frac{M}{2}\log(2\pi) + \frac{1}{2}\log\big(|\Gamma|^2\big) - \frac{1}{2}\log\big(|\Sigma|\big) - \frac{1}{2n}\sum_{t=1}^n(\Gamma y_t + Bx_t)'\,\Sigma^{-1}(\Gamma y_t + Bx_t)$.
Therefore, the FIML estimate of $(\Gamma_0, B_0, \Sigma_0)$ (i.e. the coefficients in the $B_0$ and $\Gamma_0$ matrices and the variance of the errors) is the $(\Gamma, B, \Sigma)$ that maximizes the objective function.
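A minimal sketch (our own notation and function name) of evaluating the FIML average log-likelihood $Q_n$ above, for data matrices `Y` (n × M endogenous variables) and `X` (n × K predetermined variables).

```python
import numpy as np

def fiml_loglik(Gamma, B, Sigma, Y, X):
    """Average log-likelihood Q_n(Gamma, B, Sigma) of the complete system."""
    n, M = Y.shape
    U = Y @ Gamma.T + X @ B.T                       # row t is (Gamma y_t + B x_t)'
    quad = np.einsum('ti,ij,tj->', U, np.linalg.inv(Sigma), U) / (2 * n)
    _, logabsdet_G = np.linalg.slogdet(Gamma)       # (1/2) log|Gamma|^2 = log|det Gamma|
    _, logdet_S = np.linalg.slogdet(Sigma)
    return -M / 2 * np.log(2 * np.pi) + logabsdet_G - logdet_S / 2 - quad
```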
37. How does this relate to instruments? If the system is not complete, then we can add equations involving the instruments that we take to be true and that are orthogonal to the error terms (see the LIML example later).
Properties of FIML
A. Identification of the FIML estimator: the identification condition is equivalent to the rank condition being satisfied for each equation:
$E(z_tx_{tm}')$ is of full column rank for all m = 1, 2, …, M.
B. Invariance: Since FIML is an ML estimator, the invariance property holds (see HW2 and HW3 on why this is useful).
C. Asymptotic Properties of FIML: Consider the M-equation system $y_{im} = x_{im}'\beta_m + \varepsilon_{im}$ (m = 1, 2, …, M; i = 1, 2, …, n), and let $\beta_0$ be the stacked vector collecting all the coefficients in the M-equation system. Suppose the following assumptions are satisfied:
A1: the rank condition for identification holds: $E(z_tx_{tm}')$ is of full column rank for all m = 1, 2, …, M.
Then:
(b) $\hat\beta_{FIML}$ is consistent and asymptotically normal with the same asymptotic variance as 3SLS; a consistent estimator of the asymptotic variance is $n\big[X'\big(\hat\Sigma^{-1}\otimes Z(Z'Z)^{-1}Z'\big)X\big]^{-1}$ (same as 3SLS).
(c) The likelihood-ratio statistic for testing the overidentifying restrictions is asymptotically $\chi^2\big(MK - \sum_{m=1}^Md_m\big)$.
Furthermore, these asymptotic results hold even without the normality assumption.
2. Limited Information Maximum Likelihood (LIML; ML counterpart of 2SLS)
LIML vs. 3SLS: they are asymptotically equivalent. However, LIML has the invariance property that 3SLS (and 2SLS) don't.
Setup: The difference here is that we are estimating only one equation, instead of a whole system, and we have an endogeneity problem. The only trick is that we need to complete the system by adding one more equation relating the endogenous variable to the set of predetermined variables (just take something you know to be true):
$Y_1 = \beta_0 + \beta_1Y_2 + \varepsilon$, completed with $Y_2 = \gamma_0 + \gamma_1'Z_{(m\times1)} + u$, so that
$\begin{pmatrix}1 & -\beta_1 \\ 0 & 1\end{pmatrix}\begin{pmatrix}Y_1 \\ Y_2\end{pmatrix} = \begin{pmatrix}\beta_0 \\ \gamma_0\end{pmatrix} + \begin{pmatrix}0' \\ \gamma_1'\end{pmatrix}Z + \begin{pmatrix}\varepsilon \\ u\end{pmatrix}$
is a complete two-equation system.
Example of LIML (271A PS2 Empirical Ex. #2): single-equation system, completed by a given equation.
Suppose the population model of interest is $Y_1 = \beta_0 + \beta_1Y_2 + \varepsilon$.
We suspect endogeneity, and we complete the system with $Y_2 = \gamma_0 + \gamma_1Z + u$,
such that $E\big[(\varepsilon, u)'\big] = 0$ and $(\varepsilon, u)'\mid Z\sim N(0,\Sigma)$ with $\Sigma = \begin{pmatrix}\sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22}\end{pmatrix}$, and we observe an iid sample of $(Y_1, Y_2, Z)$.
So the orthogonality condition holds, $E(ZZ')$ is assumed to be invertible, and we have iid data and Gaussian errors; therefore we can estimate the equation consistently by OLS, and OLS is the same as the MLE estimate (under the Gaussian-errors assumption). Similarly, we can estimate the second equation consistently via OLS to obtain its MLE (under Gaussian errors). We obtain the MLE estimates $\hat\beta_0, \hat\beta_1, \hat\gamma_0, \hat\gamma_1$ from OLS.
Example of FIML (271A PS3 #5): multiple (2)-equation system, completed by a given equation.
The structural model is:
$Y_1 = \beta_{12}Y_2 + \beta_{13}Y_3 + \gamma_{13}Z_3 + \gamma_{14}Z_4 + u_1$
$Y_2 = \beta_{22}Y_1 + \gamma_{21}Z_1 + u_2$
$Y_3 = \gamma_{31}Z_1 + \gamma_{32}Z_2 + \gamma_{33}Z_3 + \gamma_{34}Z_4 + u_3$
where $Z_1 = 1$, $E(u_s) = 0$ for s = 1, 2, 3, and $E(Z_ju_s) = 0$ for j = 1, …, 4, s = 1, 2, 3.
Assume in addition that $\gamma_{13} + \gamma_{14} = 1$.
1. Endogeneity bias in the supply-and-demand example: demand $q_i = \alpha_0 + \alpha_1p_i + u_i$; supply $q_i = \beta_0 + \beta_1p_i + v_i$; market equilibrium $q_i^d = q_i^s$.
Asymptotic bias of OLS in the demand equation: $\operatorname{plim}\hat\alpha_{1,OLS} - \alpha_1 = \dfrac{\operatorname{Cov}(u_i, p_i)}{\operatorname{Var}(p_i)}$; in the supply equation: $\operatorname{plim}\hat\beta_{1,OLS} - \beta_1 = \dfrac{\operatorname{Cov}(v_i, p_i)}{\operatorname{Var}(p_i)}$.
Since $\operatorname{Cov}(p_i, u_i)\ne0$ and $\operatorname{Cov}(p_i, v_i)\ne0$, endogeneity bias / simultaneous-equations bias / simultaneity bias exists (because the regressor and the error term are related to each other through a system of simultaneous equations).
So the OLS estimator is not consistent for either $\alpha_1$ or $\beta_1$.
Solution: Instrumental Variables and Two-Stage Least Squares
The reason neither the demand curve nor the supply curve can be consistently estimated is that we cannot infer from the data whether an observed change in price and quantity is due to a shift in demand or in supply. Therefore, we might be able to estimate the demand/supply curve if some of the factors that shift the supply/demand curve are observable.
Def: A predetermined variable (predetermined in the system) that is correlated with the endogenous regressor is called an instrumental variable or instrument. Sometimes we call it a valid instrument to emphasize that the correlation with the endogenous regressor is not 0.
To see the endogeneity, treat the equations as a system of simultaneous equations and solve for $p_i$ and $q_i$: since $q_i^d = q_i^s$, $\alpha_0 + \alpha_1p_i + u_i = \beta_0 + \beta_1p_i + v_i$, so
$(\alpha_1 - \beta_1)p_i = (\beta_0 - \alpha_0) + (v_i - u_i) \;\Rightarrow\; p_i = \dfrac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} + \dfrac{v_i - u_i}{\alpha_1 - \beta_1}$.
So (using $\operatorname{Cov}(v_i, u_i) = 0$ by assumption):
$\operatorname{Cov}(p_i, u_i) = \dfrac{1}{\alpha_1 - \beta_1}\operatorname{Cov}(v_i - u_i,\,u_i) = \dfrac{\operatorname{Cov}(v_i, u_i) - \operatorname{Var}(u_i)}{\alpha_1 - \beta_1} = -\dfrac{\operatorname{Var}(u_i)}{\alpha_1 - \beta_1} \ne 0$,
$\operatorname{Cov}(p_i, v_i) = \dfrac{1}{\alpha_1 - \beta_1}\operatorname{Cov}(v_i - u_i,\,v_i) = \dfrac{\operatorname{Var}(v_i) - \operatorname{Cov}(v_i, u_i)}{\alpha_1 - \beta_1} = \dfrac{\operatorname{Var}(v_i)}{\alpha_1 - \beta_1} \ne 0$.
Suppose further that the observed supply shifter xi is predetermined in the demand equation, i.e. uncorrelated with the error term ui
(e.g. think of xi is the temperature in coffee growing regions). If the temperature (xi) is uncorrelated with the unobserved factors that
shift demand (ui), i.e. temperature (xi) is an instrument (for the demand equation), it would be possible to extract from observed price
movements a component that is related to the temperature (i.e. the observed supply shifter) but uncorrelated with the demand shifter.
Then, we can estimate the demand curve by examining the relationship between coffee consumption and that component of price.
Write the supply shifter into the supply equation: $v_i = \beta_2x_i + \zeta_i$ with $\zeta_i$ orthogonal to $x_i$ (see footnote 39). Solving the system again,
$p_i = \dfrac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} + \dfrac{\beta_2}{\alpha_1 - \beta_1}x_i + \dfrac{\zeta_i - u_i}{\alpha_1 - \beta_1}$,
so
$\operatorname{Cov}(p_i, x_i) = \dfrac{\beta_2}{\alpha_1 - \beta_1}\operatorname{Var}(x_i) + \dfrac{1}{\alpha_1 - \beta_1}\big(\operatorname{Cov}(\zeta_i, x_i) - \operatorname{Cov}(u_i, x_i)\big) = \dfrac{\beta_2}{\alpha_1 - \beta_1}\operatorname{Var}(x_i) \ne 0$ (so $x_i$ is a valid instrument).
With a valid instrument, we can estimate the price coefficient $\alpha_1$ of the demand curve consistently:
$\operatorname{Cov}(q_i, x_i) = \operatorname{Cov}(\alpha_0 + \alpha_1p_i + u_i,\,x_i) = \alpha_1\operatorname{Cov}(p_i, x_i) + \operatorname{Cov}(u_i, x_i) = \alpha_1\operatorname{Cov}(p_i, x_i)$, since $\operatorname{Cov}(u_i, x_i) = 0$ by assumption,
$\Rightarrow\;\alpha_1 = \dfrac{\operatorname{Cov}(q_i, x_i)}{\operatorname{Cov}(p_i, x_i)}$.
If we observe an iid sample $(q_i, p_i, x_i)$, then by the analogy principle, the natural (consistent) estimator is
$\hat\alpha_{1,IV} = \dfrac{\widehat{\operatorname{Cov}}(q_i, x_i)}{\widehat{\operatorname{Cov}}(p_i, x_i)}$
(we say the endogenous regressor $p_i$ is instrumented by $x_i$). Substituting the demand equation $q_i = \alpha_0 + \alpha_1p_i + u_i$,⁴⁰
$\hat\alpha_{1,IV} = \alpha_1 + \dfrac{\widehat{\operatorname{Cov}}(u_i, x_i)}{\widehat{\operatorname{Cov}}(p_i, x_i)}\xrightarrow{p}\alpha_1$, since $\widehat{\operatorname{Cov}}(u_i, x_i)\xrightarrow{p}\operatorname{Cov}(u_i, x_i) = 0$ and $\widehat{\operatorname{Cov}}(p_i, x_i)\xrightarrow{p}\operatorname{Cov}(p_i, x_i)\ne0$.
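A minimal sketch (our own function; `q`, `p`, `x` are assumed 1-D numpy arrays) of the analogy-principle IV estimator just derived, $\hat\alpha_{1,IV} = \widehat{\operatorname{Cov}}(q, x)/\widehat{\operatorname{Cov}}(p, x)$.

```python
import numpy as np

def iv_slope(q, p, x):
    """Simple IV estimator of the demand slope with p instrumented by x."""
    cov_qx = np.cov(x, q, ddof=0)[0, 1]
    cov_px = np.cov(x, p, ddof=0)[0, 1]
    return cov_qx / cov_px
```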
When the instrument and the error term are uncorrelated, the final term vanishes in the limit, providing a consistent estimator. Note that when x is uncorrelated with the error term, x is itself an instrument; in that case the OLS estimator is a type of IV estimator.
The approach above generalizes in a straightforward way to a regression with multiple explanatory variables. Suppose X is the T × K matrix of explanatory variables resulting from T observations on K variables, and let Z be a T × K matrix of instruments. Then the IV estimator is $\hat\beta_{IV} = (Z'X)^{-1}Z'y$.
39. This decomposition is always possible by the projection theorem: $v_i$ can be expressed as its projection onto the space spanned by a constant and $x_i$ plus an orthogonal remainder (remember, $v_i$ includes all factors that affect supply, so by definition it has at least as many dimensions as $x_i$). If the least-squares projection of $v_i$ on a constant and $x_i$ is $E^*(v_i\mid1, x_i) = c_0 + \beta_2x_i$, define $\zeta_i = v_i - c_0 - \beta_2x_i$. By definition, $\zeta_i$ is orthogonal to $x_i$ and $E(\zeta_i) = 0$, so $\zeta_i$ and $x_i$ are uncorrelated. Substituting this into the original supply equation and combining the intercept terms, we get the resulting expression.
40
Recall, sample covariance can be expressed as
Here, average of
i i
i i
i i
i i
One computational method often used for implementing the technique is two-stage least-squares (2SLS). One
advantage of this approach is that it can efficiently combine information from multiple instruments for over-identified
regressions: where there are fewer covariates than instruments. Under the 2SLS approach, in a first stage, each
endogenous covariate (predictor variable) is regressed on all valid instruments, including the full set of exogenous
covariates in the main regression. Since the instruments are exogenous, these approximations of the endogenous
covariates will not be correlated with the error term. So, intuitively they provide a way to analyze the relationship
between the outcome variable and the endogenous covariates. In the second stage, the regression of interest is estimated
as usual, except that in this stage each endogenous covariate is replaced with its approximation estimated in the first stage. The slope estimator thus obtained is consistent. A small correction must be made to the sum of squared residuals in the second-stage fitted model in order that the associated standard errors be computed correctly.
Stage 1: regress the endogenous covariates on all instruments (including the exogenous covariates) to form the fitted values $\hat X = Z(Z'Z)^{-1}Z'X$.
Stage 2: estimate the regression of interest with the endogenous covariates replaced by their fitted values: $\hat\beta_{2SLS} = (\hat X'\hat X)^{-1}\hat X'y$.
Mathematically, this estimator is identical to the single-stage estimator presented above when the number of instruments is the same as the number of covariates.
Two-Stage Least Squares (2SLS) Estimator for $\alpha_1$: This is another procedure for consistently estimating $\alpha_1$, named thusly because it consists of running two least-squares (OLS) regressions.
First stage: the endogenous regressor $p_i$ is regressed on a constant and the predetermined variable $x_i$ to obtain fitted values $\hat p_i$ (the OLS coefficient on $x_i$ is the sample covariance between $p_i$ and $x_i$ divided by the sample variance of $x_i$).
Second stage: regress the dependent variable $q_i$ on a constant and $\hat p_i$ (the OLS coefficient on $\hat p_i$ is the sample covariance between $q_i$ and $\hat p_i$ divided by the sample variance of $\hat p_i$).
The second stage estimates the equation (the bracketed term is the error): $q_i = \alpha_0 + \alpha_1\hat p_i + [u_i + \alpha_1(p_i - \hat p_i)]$.
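A minimal sketch (our own names; `q`, `p`, `x` are assumed 1-D numpy arrays) of the two OLS regressions just described for the demand equation $q_i = \alpha_0 + \alpha_1p_i + u_i$ with instrument $x_i$.

```python
import numpy as np

def two_sls_demand(q, p, x):
    n = len(q)
    Zc = np.column_stack([np.ones(n), x])
    # First stage: regress p on a constant and x, keep the fitted values p_hat
    p_hat = Zc @ np.linalg.lstsq(Zc, p, rcond=None)[0]
    # Second stage: regress q on a constant and p_hat
    Xc = np.column_stack([np.ones(n), p_hat])
    a0, a1 = np.linalg.lstsq(Xc, q, rcond=None)[0]
    return a0, a1
```

Remember that the second-stage residuals use $\hat p_i$ rather than $p_i$, so the standard errors reported by a naive second-stage OLS are not the correct 2SLS standard errors.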