Generalized Method of Moments (GMM) Estimation
Outline
(1) Introduction and motivation
(2) Moment Conditions and Identification
(3) A Model Class: Instrumental Variables (IV) Estimation
(4) Method of Moments (MM) Estimation
(5) Generalized Method of Moments (GMM) Estimation
(6) Efficient GMM Estimation
(7) GMM and Maximum Likelihood: Pseudo-ML Estimation
(8) Empirical Example: The C-CAPM Model
Introduction
Generalized method of moments (GMM) is a general estimation principle.
Estimators are derived from so-called moment conditions.
Three main motivations:
(1) Many estimators can be seen as special cases of GMM. This provides a unifying framework for comparing estimators.
(2) Maximum likelihood estimators have the smallest variance in the class of consistent and asymptotically normal estimators, but they require a complete specification of the model and its distribution. GMM estimation is possible under weaker assumptions.
(3) Economic theory often delivers moment conditions directly, e.g. Euler equations under rational expectations, so GMM applies even when the full likelihood is unknown; see the C-CAPM example below.
Moment Conditions
A moment condition is a statement involving the data and the parameters:

g(\theta_0) = E[f(w_t, z_t, \theta_0)] = 0,    (R \times 1)

where \theta is a K \times 1 vector of parameters; f(\cdot) is an R-dimensional vector of (nonlinear) functions; w_t contains model variables; and z_t contains instruments.
Example (rational expectations): suppose that r_t is the optimal predictor of x_{t+1}:

r_t = E[x_{t+1} \mid I_t].

Under rational expectations, the expectation error, u_t = r_t - x_{t+1}, should be orthogonal to the
information set, I_t, and for z_t \in I_t we have the moment condition

E[u_t z_t] = E[(r_t - x_{t+1}) z_t] = 0.

This is enough to identify the parameters of the model.
Method of Moments (MM) Estimation
We can derive an estimator, \hat{\theta}_{MM}, as the solution to the sample moment conditions g_T(\hat{\theta}_{MM}) = 0.

Example (the mean): consider the single moment condition E[y_t - \mu] = 0. For a sample, y_1, y_2, ..., y_T, we state the corresponding sample moment condition:

g_T(\hat{\mu}) = \frac{1}{T} \sum_{t=1}^{T} (y_t - \hat{\mu}) = 0,

which has the solution

\hat{\mu}_{MM} = \frac{1}{T} \sum_{t=1}^{T} y_t,

the sample mean.
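To make the MM principle concrete, here is a minimal sketch in Python (numpy assumed available; the data and names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=500)   # illustrative sample

# Sample moment condition: g_T(mu) = (1/T) * sum_t (y_t - mu)
def g_T(mu):
    return np.mean(y - mu)

# Solving g_T(mu) = 0 gives the sample mean as the MM estimator
mu_mm = np.mean(y)
print(mu_mm, g_T(mu_mm))   # the moment condition is (numerically) zero at mu_mm
```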
Example (OLS): consider the linear regression model

y_t = x_t' \beta_0 + \epsilon_t,

where x_t is K \times 1, and assume that the regressors are predetermined:

E[x_t \epsilon_t] = E[x_t (y_t - x_t' \beta_0)] = 0.

The corresponding sample moment conditions are

g_T(\hat{\beta}) = \frac{1}{T} \sum_{t=1}^{T} x_t (y_t - x_t' \hat{\beta}) = \frac{1}{T} \sum_{t=1}^{T} x_t y_t - \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \hat{\beta} = 0,

and the MM estimator is the OLS estimator,

\hat{\beta}_{MM} = \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \sum_{t=1}^{T} x_t y_t,

provided that \sum_{t=1}^{T} x_t x_t' is non-singular.
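The same logic in code: a sketch with simulated data (all names illustrative) showing that solving the K sample moment conditions reproduces OLS:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])   # K = 2 regressors
beta0 = np.array([1.0, 0.5])
y = X @ beta0 + rng.normal(size=T)

# Solve (1/T) sum_t x_t (y_t - x_t' b) = 0  <=>  (X'X) b = X'y
beta_mm = np.linalg.solve(X.T @ X, X.T @ y)

g = X.T @ (y - X @ beta_mm) / T    # sample moments at the estimate
print(beta_mm, np.abs(g).max())    # moments are zero up to rounding error
```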
Example: Under-Identification and Simple IV
Consider again a regression model

y_t = x_t' \beta_0 + \epsilon_t = x_{1t}' \beta_{10} + x_{2t}' \beta_{20} + \epsilon_t.

Suppose that only the K_1 variables in x_{1t} are predetermined, E[x_{1t} \epsilon_t] = 0, while the K_2 = K - K_1 variables in x_{2t} are endogenous. Then there are only K_1 < K moment conditions, and \beta is not identified. Assume instead that K_2 new instruments, z_{2t}, are available, with E[z_{2t} \epsilon_t] = 0, and define

x_t = \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} \quad \text{and} \quad z_t = \begin{pmatrix} x_{1t} \\ z_{2t} \end{pmatrix},    (K \times 1)

where x_t are model variables, z_2t are new instruments, and z_t are instruments.
We say that x_{1t} are instruments for themselves.
Combining the two sets of orthogonality conditions we have K moment conditions:

g(\beta_0) = \begin{pmatrix} E[x_{1t} \epsilon_t] \\ E[z_{2t} \epsilon_t] \end{pmatrix} = E[z_t \epsilon_t] = E[z_t (y_t - x_t' \beta_0)] = 0,

which are sufficient to identify the K parameters in \beta.
The corresponding sample moment conditions are

g_T(\hat{\beta}) = \frac{1}{T} \sum_{t=1}^{T} z_t (y_t - x_t' \hat{\beta}) = 0,

and the MM estimator is the simple IV estimator,

\hat{\beta}_{MM} = \left( \sum_{t=1}^{T} z_t x_t' \right)^{-1} \sum_{t=1}^{T} z_t y_t,

provided that \sum_{t=1}^{T} z_t x_t' is non-singular.
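A corresponding sketch for the just-identified IV case (simulated data with one endogenous regressor; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
eps = rng.normal(size=T)
z2 = rng.normal(size=T)                          # new instrument
x2 = 0.8 * z2 + 0.5 * eps + rng.normal(size=T)   # endogenous regressor
X = np.column_stack([np.ones(T), x2])            # x_t = (1, x2_t)'
Z = np.column_stack([np.ones(T), z2])            # z_t = (1, z2_t)'
y = X @ np.array([1.0, 0.5]) + eps

# Solve (1/T) sum_t z_t (y_t - x_t' b) = 0  <=>  (Z'X) b = Z'y
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
print(beta_iv)   # consistent despite the endogeneity; OLS would be biased here
```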
Generalized Method of Moments (GMM) Estimation
If R > K, there are more moment conditions than parameters, and in general no solution to g_T(\theta) = 0. Instead, the GMM estimator minimizes the criterion function

Q_T(\theta) = g_T(\theta)' W_T g_T(\theta),

where W_T is a positive definite R \times R weight matrix.

Example (the role of the weight matrix): with R = 2 moment conditions, write

g_T(\theta) = \begin{pmatrix} g_a \\ g_b \end{pmatrix},

where the dependence on T and \theta is suppressed. With an identity weight matrix,

Q_T(\theta) = g_T(\theta)' W_T g_T(\theta) = \begin{pmatrix} g_a & g_b \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} g_a \\ g_b \end{pmatrix} = g_a^2 + g_b^2,

which is the square of the simple distance from g_T(\theta) to zero. Here the coordinates are equally important. With the alternative weight matrix

W_T = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \quad Q_T(\theta) = g_T(\theta)' W_T g_T(\theta) = 2 g_a^2 + g_b^2,

which attaches more weight to the first coordinate in the distance.
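Numerically, the weighting works as follows (the moment values are made up for illustration):

```python
import numpy as np

g = np.array([0.3, -0.2])          # stacked sample moments (g_a, g_b)

W_equal = np.eye(2)                 # both coordinates equally important
W_first = np.diag([2.0, 1.0])       # extra weight on the first moment

print(g @ W_equal @ g)              # g_a^2 + g_b^2   = 0.13
print(g @ W_first @ g)              # 2 g_a^2 + g_b^2 = 0.22
```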
Asymptotic Distribution
Assume a central limit theorem for f(w_t, z_t, \theta), i.e.

\sqrt{T} g_T(\theta_0) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} f(w_t, z_t, \theta_0) \to N(0, S),

where S is the asymptotic variance of the moments. Then it holds that for any positive definite weight matrix, W, the asymptotic distribution of the GMM estimator is given by

\sqrt{T} (\hat{\theta}_{GMM} - \theta_0) \to N(0, V).

The asymptotic variance is given by

V = (D' W D)^{-1} D' W S W D (D' W D)^{-1},

where

D = E\left[ \frac{\partial f(w_t, z_t, \theta)}{\partial \theta'} \right]

is the expected value of the R \times K matrix of first derivatives of the moments.
Efficient GMM Estimation
The efficient GMM estimator uses the weight matrix W^{opt} = S^{-1}, for which the asymptotic variance simplifies:

V = (D' S^{-1} D)^{-1} D' S^{-1} S S^{-1} D (D' S^{-1} D)^{-1} = (D' S^{-1} D)^{-1}.

The variance is estimated by

\hat{V} = (D_T' S_T^{-1} D_T)^{-1},

where

D_T = \frac{\partial g_T(\theta)}{\partial \theta'} = \frac{1}{T} \sum_{t=1}^{T} \frac{\partial f(w_t, z_t, \theta)}{\partial \theta'}    (R \times K)

and S_T is a consistent estimator of S. Estimation of the weight matrix is typically the trickiest part of GMM.
Test of Overidentifying Restrictions
If R > K, the validity of the R - K overidentifying moment conditions can be tested with the Hansen J-test,

J = T g_T(\hat{\theta}_{GMM})' W_T^{opt} g_T(\hat{\theta}_{GMM}) = T Q_T(\hat{\theta}_{GMM}) \to \chi^2(R - K).
Computational Issues
The estimator is defined by minimizing Q_T(\theta). Minimization can be done by solving the K first order conditions

\frac{\partial Q_T(\theta)}{\partial \theta} = \frac{\partial (g_T(\theta)' W_T g_T(\theta))}{\partial \theta} = 0.    (K \times 1)

In practice, efficient GMM is implemented in two steps:
(1) Choose an initial weight matrix, e.g. W_{[1]} = I_R, and find a consistent but inefficient first-step estimator

\hat{\theta}_{[1]} = \arg\min_{\theta} g_T(\theta)' W_{[1]} g_T(\theta).

(2) Find the optimal weight matrix, W_{[2]}^{opt}, based on \hat{\theta}_{[1]}, and find the efficient estimator

\hat{\theta}_{[2]} = \arg\min_{\theta} g_T(\theta)' W_{[2]}^{opt} g_T(\theta).

The estimator is not unique as it depends on the initial weight matrix W_{[1]}.
The procedure can be iterated until the weight matrix converges (the iterated GMM estimator), or the weight matrix can be updated continuously as a function of \theta during the minimization (the continuously updated, CU, GMM estimator); both variants appear in the empirical example below.
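A generic sketch of the two-step procedure in Python (scipy assumed available; `f_moments` is a hypothetical user-supplied function returning the T-by-R matrix of moment contributions f(w_t, z_t, theta)):

```python
import numpy as np
from scipy.optimize import minimize

def gmm_two_step(f_moments, theta_init, data):
    """Two-step efficient GMM; f_moments(theta, data) returns a (T, R) array."""
    def Q(theta, W):
        g = f_moments(theta, data).mean(axis=0)   # g_T(theta), shape (R,)
        return g @ W @ g                          # criterion g'Wg

    R = f_moments(theta_init, data).shape[1]

    # Step 1: consistent but inefficient estimator with W_[1] = I_R
    step1 = minimize(Q, theta_init, args=(np.eye(R),))

    # Optimal weight matrix W_[2] = S_T^{-1}, with S_T = (1/T) sum_t f_t f_t'
    f1 = f_moments(step1.x, data)
    W_opt = np.linalg.inv(f1.T @ f1 / len(f1))

    # Step 2: efficient estimator
    step2 = minimize(Q, step1.x, args=(W_opt,))

    # Hansen J statistic: T * Q_T(theta_hat) ~ chi^2(R - K) under valid moments
    J = len(f1) * Q(step2.x, W_opt)
    return step2.x, J
```

The returned J statistic is the overidentification test from the previous slide.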
Example: 2SLS
Consider again a regression model

y_t = x_t' \beta_0 + \epsilon_t = x_{1t}' \beta_{10} + x_{2t}' \beta_{20} + \epsilon_t,

now with R > K instruments z_t, so that the model is over-identified. We derive the GMM estimator by minimizing the criterion function

Q_T(\beta) = g_T(\beta)' W_T g_T(\beta) = \left( T^{-1} Z'(Y - X\beta) \right)' W_T \left( T^{-1} Z'(Y - X\beta) \right).

To estimate the optimal weight matrix, W_T^{opt} = S_T^{-1}, we use the estimator

S_T = \frac{1}{T} \sum_{t=1}^{T} f(w_t, z_t, \hat{\beta}) f(w_t, z_t, \hat{\beta})' = \frac{1}{T} \sum_{t=1}^{T} \hat{\epsilon}_t^2 z_t z_t',

which allows for heteroskedasticity.
Asymptotically,

\hat{\beta}_{GMM} \overset{a}{\sim} N\left( \beta_0, \; T^{-1} (D' S^{-1} D)^{-1} \right).

The derivative is given by

D_T = \frac{\partial g_T(\beta)}{\partial \beta'} = \frac{\partial \, T^{-1} \sum_{t=1}^{T} z_t (y_t - x_t' \beta)}{\partial \beta'} = -T^{-1} \sum_{t=1}^{T} z_t x_t',    (R \times K)

so the variance of the estimator can be estimated by

\hat{V}_{GMM} = T^{-1} \left( D_T' W_T^{opt} D_T \right)^{-1} = \left( \sum_{t=1}^{T} x_t z_t' \left( \sum_{t=1}^{T} \hat{\epsilon}_t^2 z_t z_t' \right)^{-1} \sum_{t=1}^{T} z_t x_t' \right)^{-1}.
Note that this is the heteroskedasticity consistent (HC) variance estimator of White.
GMM with allowance for heteroskedastic errors automatically produces heteroskedasticity consistent standard errors!
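For the linear IV case both steps are available in closed form. A sketch (function and variable names are illustrative):

```python
import numpy as np

def linear_gmm_hc(y, X, Z):
    """Efficient GMM for y = X b + e with instruments Z and an HC weight matrix."""
    T = len(y)
    # Step 1: 2SLS as the initial consistent estimator
    PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)     # first-stage fitted X
    b1 = np.linalg.solve(PZX.T @ X, PZX.T @ y)
    e = y - X @ b1                                  # first-step residuals
    # Optimal weights: W = S_T^{-1}, S_T = (1/T) sum_t e_t^2 z_t z_t'
    S = (Z * (e ** 2)[:, None]).T @ Z / T
    W = np.linalg.inv(S)
    # Step 2: b = (X'Z W Z'X)^{-1} X'Z W Z'y
    XZ = X.T @ Z
    b2 = np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))
    # White/HC variance: (sum x z' (sum e^2 z z')^{-1} sum z x')^{-1}
    e2 = y - X @ b2
    S2 = (Z * (e2 ** 2)[:, None]).T @ Z
    V = np.linalg.inv(XZ @ np.linalg.inv(S2) @ XZ.T)
    return b2, V
```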
If we assume that the error terms are IID, the optimal weight matrix simplifies to

S_T = \frac{\hat{\sigma}^2}{T} \sum_{t=1}^{T} z_t z_t' = T^{-1} \hat{\sigma}^2 Z'Z,

where \hat{\sigma}^2 is a consistent estimator for \sigma^2. The GMM estimator then becomes

\hat{\beta}_{GMM} = \left( X'Z \left( T^{-1} \hat{\sigma}^2 Z'Z \right)^{-1} Z'X \right)^{-1} X'Z \left( T^{-1} \hat{\sigma}^2 Z'Z \right)^{-1} Z'Y = \left( X'Z (Z'Z)^{-1} Z'X \right)^{-1} X'Z (Z'Z)^{-1} Z'Y,

which is identical to the two stage least squares (2SLS) estimator. The variance becomes

\hat{V}_{GMM} = T^{-1} \left( D_T' S_T^{-1} D_T \right)^{-1} = \hat{\sigma}^2 \left( X'Z (Z'Z)^{-1} Z'X \right)^{-1},

which again coincides with the 2SLS variance.
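Under the IID assumption the GMM formula collapses to textbook 2SLS; a sketch of the closed form:

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS = GMM with weight matrix proportional to (Z'Z)^{-1} (IID errors)."""
    ZZinv_ZX = np.linalg.solve(Z.T @ Z, Z.T @ X)    # (Z'Z)^{-1} Z'X
    ZZinv_Zy = np.linalg.solve(Z.T @ Z, Z.T @ y)    # (Z'Z)^{-1} Z'y
    A = X.T @ Z @ ZZinv_ZX                          # X'Z (Z'Z)^{-1} Z'X
    b = np.linalg.solve(A, X.T @ Z @ ZZinv_Zy)
    e = y - X @ b
    sigma2 = e @ e / len(y)                         # consistent estimator of sigma^2
    V = sigma2 * np.linalg.inv(A)                   # 2SLS variance
    return b, V
```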
Pseudo-ML Estimation
The ML estimator, \hat{\theta}_{ML}, can be consistent under weaker assumptions than those maintained by ML. The first order condition for a normal regression model corresponds to the moment condition

E[x_t (y_t - x_t' \beta)] = 0,

which is weaker than the assumption that the entire distribution is correctly specified:
OLS is consistent even if \epsilon_t is not normal.

An ML estimation that maximizes a likelihood function different from the true model's
likelihood is referred to as a pseudo-ML or a quasi-ML estimator.
Note that the variance matrix is no longer the inverse information; the sandwich form from the GMM formula applies instead.
GMM and ML
Efficiency: ML exploits the full distribution and is asymptotically efficient, while GMM uses only the information in the stated moment conditions.
Typical approach: derive moment conditions from economic theory and use GMM when the full likelihood is unknown or hard to specify.
Robustness: GMM is consistent under weaker assumptions, while ML requires a correctly specified likelihood (or a pseudo-ML variance correction).
Empirical Example: The C-CAPM Model
Consider a consumer maximizing the discounted value of expected lifetime utility,

u(c_t) + \sum_{s=1}^{\infty} \delta^s E[u(c_{t+s}) \mid I_t],

subject to a budget constraint. The first order condition is the Euler equation

u'(c_t) = \delta E[R_{t+1} u'(c_{t+1}) \mid I_t],

where u'(\cdot) is the derivative, and R_{t+1} = 1 + r_{t+1} is the return factor.
Assume a constant relative risk aversion (CRRA) utility function with discount factor 0 < \delta < 1,

u(c_t) = \frac{c_t^{1-\gamma}}{1-\gamma},

so that u'(c_t) = c_t^{-\gamma}. That gives the explicit Euler equation:

E\left[ \delta \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} R_{t+1} - 1 \,\middle|\, I_t \right] = 0,

which is a conditional moment condition. For estimation we use the unconditional moment conditions

E[f(c_{t+1}, c_t, R_{t+1}; z_t; \gamma, \delta)] = E\left[ \left( \delta \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} R_{t+1} - 1 \right) z_t \right] = 0,

for all variables z_t \in I_t included in the information set.
As instruments we can use a constant, lagged consumption growth, and the lagged return factor:

z_t = \left( 1, \; \frac{c_t}{c_{t-1}}, \; R_t \right)'.

That produces the moment conditions

E\left[ \delta \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} R_{t+1} - 1 \right] = 0
E\left[ \left( \delta \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} R_{t+1} - 1 \right) \frac{c_t}{c_{t-1}} \right] = 0
E\left[ \left( \delta \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} R_{t+1} - 1 \right) R_t \right] = 0,

for t = 1, 2, ..., T.
GMM estimates of the C-CAPM (T = 237). Standard errors in parentheses; p-val is the p-value of the J-test of the DF = R - K overidentifying restrictions; entries marked with a dot are not available.

DF = 1
Estimator   W     delta             gamma             p-val
2-Step      HC    0.9987 (0.0086)   0.8770 (3.6792)   0.510
Iterated    HC    0.9982 (0.0044)   .      (1.8614)   0.301
CU          HC    0.9981 (0.0044)   .      (1.8629)   0.302
2-Step      HAC   0.9987 (0.0092)   .      (4.0228)   0.513
Iterated    HAC   0.9980 (0.0045)   .      (1.8757)   0.296
CU          HAC   0.9977 (0.0045)   .      (1.8815)   0.297

DF = 2
2-Step      HC    0.9975 (0.0066)   .      (2.6415)   0.660
Iterated    HC    .      (0.0045)   .      (1.7925)   0.311
CU          HC    .      (0.0046)   .      (1.8267)   0.321
2-Step      HAC   .      (0.0068)   .      (2.7476)   0.643
Iterated    HAC   .      (0.0047)   .      (1.8571)   0.298
CU          HAC   .      (0.0048)   .      (1.9108)   0.309

For the 2-Step HC estimator with DF = 1, the J statistic is 0.434 (p-val 0.510).