Lecture Notes Lectures 1 8
ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 1
Ralf Becker
February 4, 2016
Table of contents
State of Play
Model assumptions and parameter properties
Example: Basic Econometrics Grades (1)
Testing multiple restrictions
Example 1: Basic Econometrics Grades (2)
p-values - Revision
Overview Semester 2
Auxiliary regressions
What next
Assumptions
Assumption 1
Linearity in parameters.
y_i = β_0 + β_1 x_1i + ... + β_k x_ki + u_i  (1)
or
y = Xβ + u  (2)
Assumption 2
Random samples {yi , x1i , ..., xki }. n observations.
Assumption 3
There is variation in the explanatory variables. Absence of perfect
multicollinearity (full rank of X).
Distributing prohibited | Downloaded by Elia Aile (yypieesp@abyssmail.com)
lOMoARcPSD|1057622
Assumptions
or
E [u|X] = 0 (4)
Assumptions
Assumption 5
Homoskedasticity. Constant residual variance.
or
Var[u|X] = σ²I  (7)
Assumptions
Assumption 6
Normality.
u_i ~ N(0, σ²)  (8)
or
u ~ N(0, σ²I).  (9)
This assumption implies A4 and A5.
Gauss-Markov assumptions + A6 (= Classical linear regression
assumptions) guarantee that inference on β can be based on t and
F tests in samples of any size.
(b_i − β_i)/se(b_i) ~ t_{n−k−1}  (10)
F ~ F_{r,n−k−1} (see below)
If A6 is not valid then the above inference is justified in large
samples (and the associated theory is called asymptotic theory).
(b_i − β_i)/se(b_i) ~ᵃ t_{n−k−1} = N(0, 1)  (11)
F ~ᵃ F_{r,n−k−1}
Semester 1 and Semester 2 grades (somewhat randomised) for
Econometrics (gradeexample.csv). 137 observations.
H_0: β = 1
H_A: β < 1
t_calc = (b − 1)/se(b) = (0.9337 − 1)/0.0697 = −0.9507
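The slide's calculation can be reproduced directly (a minimal sketch: the estimate and standard error are taken from the slide, the critical value quoted in the comment is approximate):

```python
# One-sided t-test of H0: beta = 1 against HA: beta < 1,
# using the estimate and standard error quoted on the slide.
b, se_b, beta_0 = 0.9337, 0.0697, 1.0

t_calc = (b - beta_0) / se_b  # approx -0.95

# With n - k - 1 = 135 degrees of freedom the 5% one-sided
# critical value is about -1.66, so H0 is not rejected.
print(round(t_calc, 4))
```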
F = [(SSR_r − SSR_u)/r] / [SSR_u/(n − k − 1)]  (13)
F ~ F_{r,n−k−1} if A1 to A6
F ~ᵃ F_{r,n−k−1} if A1 to A5
H_0: β_2 = β_3 = 0
H_A: β_2 and/or β_3 ≠ 0
The test statistic to be used is the F -test
F = [(SSR_r − SSR_u)/r] / [SSR_u/(n − k − 1)] ~ F_{r,n−k−1}
We are testing two restrictions, hence r = 2, and n − k − 1 = 133.
The decision rule is to reject H_0 if F > F_{2,133,α} (3.00 at α = 0.05).
F = [(26786.75 − 25437.09)/2] / [25437.09/133] = 3.5284
(Get SSR_r and SSR_u yourself!) We reject H_0 at α = 0.05. The
Semester 1 / Semester 2 grade relationship (marginally) varies
between Year 2 and Year 3 students.
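The F statistic above can be reproduced from the two SSRs (values as quoted on the slides):

```python
# F test of two restrictions using the SSRs quoted on the slide
# (SSR_r from the restricted, SSR_u from the unrestricted model).
ssr_r, ssr_u = 26786.75, 25437.09
r, df_u = 2, 133  # r restrictions, n - k - 1 = 133

F = ((ssr_r - ssr_u) / r) / (ssr_u / df_u)
print(round(F, 4))  # about 3.53, above the 5% critical value of ~3.0
```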
p-values
p-values - Examples
For the above examples the p-values are
I t-test, left-tailed (one sided)
H_0: β = 1
H_A: β < 1
I F-test, two restrictions
H_0: β_s1t = β_s1e = 0
H_A: β_s1t and/or β_s1e ≠ 0
Overview Semester 2
Auxiliary Regressions
Small Sample Parameter Properties
The Matrix Form
Asymptotic Parameter Properties
Introduction to Time-Series Data
Multicollinearity - Breach of A3
Heteroskedasticity - Breach of A5
Autocorrelation - Breach of A5 for time series data
Specification testing - Breach of A1
Forecasting
Maximum Likelihood
Bayesian Econometrics
Assessment
Auxiliary regressions
Reading: (Wooldridge p176-178)
Later in the course we will encounter helper or auxiliary
regressions.
Some multiple restrictions can easily be tested by auxiliary
regressions. Example:
y = Xβ + u  (14)
β = (β_0 β_1 ... β_4)′  (15)
H_0: β_2 = β_3 = β_4 = 0  (16)
1. Estimate restricted model
y_i = β_0 + β_1 x_1i + u_i  (17)
and obtain estimated residuals
ũ_i = y_i − β̃_0 − β̃_1 x_1i  (18)
where the β̃_j are OLS estimates from the restricted model.
Auxiliary regressions
y_i = β_0 + β_1 x_i + u_i  (19)
Parameter Properties
Unbiasedness
Formally
E(b_1) = β_1  (22)
Assume that there is a population with 100,000 members. In this
population the true (but unknown) relationship is (where β_1 = 1.5)
Parameter Properties
Unbiasedness
But note: In practice you have only one of these b_1 s! And that
may happen to be somewhere in the tail!
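A small Monte Carlo sketch of this point, with an assumed data design (not the lecture's population file): the average of many OLS slope estimates sits close to β_1 = 1.5, yet any single estimate can land in the tail.

```python
import numpy as np

# Monte Carlo sketch of unbiasedness: with beta_1 = 1.5 (as on the
# slide) the OLS slope estimates scatter around 1.5. The sample
# design (normal x, normal errors, n = 50) is assumed.
rng = np.random.default_rng(42)
beta0, beta1, n, reps = 2.0, 1.5, 50, 2000

slopes = []
for _ in range(reps):
    x = rng.normal(0, 1, n)
    y = beta0 + beta1 * x + rng.normal(0, 1, n)
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # OLS slope
    slopes.append(b1)

print(round(float(np.mean(slopes)), 2))  # close to 1.5
```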
Parameter Properties
Efficiency
Parameter Properties
Why are they important?
What to do next:
ECON20110/30370 Econometrics
2016/17 - Semester 2 - Week 2
Ralf Becker
February 1, 2016
Table of contents
Asymptotic Preliminaries
Random Regressors
Matrix form:
y = Xβ + u  (1)
b = (X′X)⁻¹X′y  (2)
Var(b) = σ²(X′X)⁻¹  (3)
Scalar form:
y_i = β_0 + β_1 x_i + u_i  (4)
b_1 = Cov(y_i, x_i)/Var(x_i),  b_0 = ȳ − b_1 x̄  (5)
Var(b_j) = σ² / [SST_j (1 − R_j²)]  (6)
The Matrix form is much more general as it leaves the number of
columns in X unspecified
We will use the form that makes life easier (depends on the issue)
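A quick numerical check (simulated data, assumed design) that the matrix and scalar formulas agree:

```python
import numpy as np

# Sketch: the matrix formula b = (X'X)^{-1} X'y reproduces the
# scalar formulas b1 = Cov(y, x)/Var(x) and b0 = ybar - b1*xbar.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(0, 2, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])   # first column of ones: constant
b = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
print(np.allclose(b, [b0, b1]))  # True
```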
Matrix Form
b = (b_1, b_2, ..., b_q)′, a (q × 1) vector  (8)
If the first column of X is a vector of ones (representing the
constant) then b1 is the estimated constant parameter.
Matrix Form
Let X be an (n × q) matrix, y an (n × 1) vector and u an (n × 1)
vector of error terms.
Var(b) = σ²(X′X)⁻¹  (9)
where (X′X) is (q×n)(n×q), hence (q × q). Written out:
Var(b) =
[ Var(b_1)                                          ]
[ Cov(b_1, b_2)   Var(b_2)                          ]
[ ...             ...           ...                 ]
[ Cov(b_1, b_q)   Cov(b_2, b_q) ...   Var(b_q)      ]  (10)
This (q × q) matrix is symmetric.
Equally:
(1/n) X′y = [ (1/n) Σ y_i ;  (1/n) Σ (x_i y_i) ]′
Asymptotic Preliminaries
In Semester 1:
I We needed Assumption 6 (error normality) to derive the
distributions of t and F tests.
I If A6 holds we know the distributions of t and F tests at
any sample size.
In Semester 2:
I As we relax some assumptions (e.g. the homoskedasticity
assumption) we lose the ability to derive small-sample
distributions. But we will be able to derive asymptotic
properties (i.e. as the sample size goes to infinity).
I Means that we can do without A6!
Recall that, as u is a r.v., b is also a r.v.
In Semester 1:
In Semester 2:
A1 to A4 are sufficient to derive b ~ᵃ N(β, MPM′). A6 is not
necessary. A5 can be relaxed (using different LLNs and CLTs).
Random Regressors
Random Regressors
If X is fixed:
E(b) = β  (19)
if E(u) = 0.
If X is random:
E(b|X) = β  (20)
if E(u|X) = 0,
hence the result is conditioned on the particular set of observations X
we used.
Random Regressors
E(b) = E_X(E(b|X)) = E_X(β) = β  (21)
Random Regressors
What about our variance formula?
If X is fixed:
Var(b) = σ²(X′X)⁻¹  (22)
If X is random:
Var(b|X) = σ²(X′X)⁻¹  (23)
i.e. at this stage the variance formula is valid for the particular X
only.
Var(b) = E_X(Var(b|X)) = E_X(σ²(X′X)⁻¹) = σ² E_X[(X′X)⁻¹]  (24)
Random Regressors
Variance implementation
we established that
Var(b) = σ² E_X[(X′X)⁻¹]  (25)
Data may be sampled across time. (W. chps 10.1, 10.2 and 11)
Example: Phillips curve. Is there a relationship between inflation
(π) and unemployment (un)?

obs  CS (all in 2003)      obs  TS (all UK data)
1    (π_UK, un_UK)         1    (π_1990, un_1990)
2    (π_Ch, un_Ch)         2    (π_1991, un_1991)
3    (π_Jap, un_Jap)       3    (π_1992, un_1992)
...  ...                   ...  ...
Assumption 2
Random samples {yi , x1i , ..., xki }. n observations.
Often Data will trend for the entire or part of the series.
Example
Relationship between CO2 emissions (thousand metric tons of
carbon) and global temperature (deviation from 1961-1990
average).
ECON20110/30370 Econometrics
2015/16 - Semester 2 - Lecture Week 3
Ralf Becker
Table of contents
Model Setup
y_t = x_t β + u_t  (1)
Assumptions
Assumption TS1 (as in Wooldridge chapter 11) Assume that
the model is as in (1) and that the draws of (yt , xt ) for t = 1, ..., T
are stationary and weakly dependent.
Assumption TS2
No perfect correlation between variables in xt .
Assumption TS3
Zero conditional mean.
E [ut |xt ] = 0. (2)
Assumption TS4
Homoskedasticity. Constant residual variance.
Var[u_t|x_t] = σ²  (3)
Assumption TS5
No autocorrelation (serial correlation).
Consider the following four time series (all annual from 1961 to
2007, TS_Data_SpuriousRegression.wf1)
I Life Expectancy at Birth in Belgium (lifeexp)
I Agriculture, value added (% of GDP) in China (agrval)
I ODA aid per person (constant 2007 US$) in Norway (aid)
I CO2 emission per person (metric tons) in Australia (co2em)
Which ones should be least related? Let's run a regression and
see what we get.
y_t = β_0 + β_1 x_t + ε_t
and get
ŷ_t = 16.0 + 0.1697 x_t
      (0.220)  (0.011)
The t-stat is around 16. Can we trust this result? No! TS1 is
breached if data behave like in (5) and (6).
Spurious Regression!
Using nonstationary series in standard regression analysis will cause
problems.
The issue is the availability of LLNs and CLTs.
They are straightforward for iid data; they are available for weakly
dependent (stationary) data; but they are not available for
nonstationary data.
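A sketch of the spurious-regression problem on simulated data: two independent random walks (nonstationary, so TS1 fails) reject the no-relationship null far more often than the nominal 5%.

```python
import numpy as np

# Spurious regression sketch: regress one random walk on another,
# independent one, and count how often the nominal 5% t-test on the
# slope rejects. With iid stationary data this rate would be ~5%.
rng = np.random.default_rng(0)
T, reps, rejections = 100, 500, 0

for _ in range(reps):
    y = np.cumsum(rng.normal(size=T))   # random walk 1
    x = np.cumsum(rng.normal(size=T))   # independent random walk 2
    X = np.column_stack([np.ones(T), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b
    s2 = u @ u / (T - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    if abs(b[1] / se) > 1.96:           # nominal 5% two-sided test
        rejections += 1

print(rejections / reps)  # far above 0.05
```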
Need to understand more about the behaviour of TS
Features of TS data:
I persistence
I trending
I seasonality
Models of the type (7) can capture all these features.
Time Series for which the autocorrelation starts with values very
close to 1 (for lag h = 1) and only decays very slowly are likely to
not be weakly dependent. The dependence is too strong
E[y_t] = μ_t  (9)
Var[y_t] = σ_t².  (10)
For a stationary AR(1):
E[y_t] = β_0 / (1 − β_1)  (11)
Var[y_t] = σ² = σ_u² / (1 − β_1²)  (12)
E[y_t] = μ_t = β_0 / (1 − β_1)
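A quick simulation check of the stationary AR(1) mean formula (parameter values assumed for illustration):

```python
import numpy as np

# Simulate a stationary AR(1), y_t = b0 + b1*y_{t-1} + u_t, and
# check the sample mean against E[y_t] = b0/(1 - b1).
rng = np.random.default_rng(7)
b0, b1, T = 1.0, 0.5, 100_000

y = np.empty(T)
y[0] = b0 / (1 - b1)              # start at the theoretical mean
for t in range(1, T):
    y[t] = b0 + b1 * y[t - 1] + rng.normal()

print(round(float(y.mean()), 1))  # close to b0/(1 - b1) = 2.0
```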
Does not use all info, in particular y_{t−1}, y_{t−2} etc. (persistence!)
From elementary statistics: P(A) ≠ P(A|B)
Using all info will lead to an improved expectation.
Today: t
which is unequal to
Implementation in R
Before we stated the AR(1) model
y_t = β_0 + β_1 y_{t−1} + u_t.  (21)
E(y_t) = β_0 / (1 − β_1)  (22)
The model that R actually estimates is
ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 4
Ralf Becker
Table of contents
Heteroskedasticity
What is it?
Consequences of Heteroskedasticity
Detection
Robust standard errors
Generalised Least Squares (GLS)
Weighted LS and Feasible GLS
Assumptions
y = Xβ + u  (1)
Assumption 5
Homoskedasticity. Constant residual variance.
or
Var[u|X] = σ²I  (3)
Example
House Price Sales - US data, Stockton3.csv
Model house sales price (spricei ) as being dependent on the
number of bedrooms (bedsi )
Consequences of Heteroskedasticity
... for the OLS estimator
Consequences of Heteroskedasticity
... for inference using OLS
Var(b) = (X′X)⁻¹ X′ΩX (X′X)⁻¹  (5)
       ≠ σ²(X′X)⁻¹  (6)
t-stat = (b_k − β_k)/se(b_k)
we need to use the correct variance estimator from which to obtain
se(b_k).
Consequences of Heteroskedasticity
... for inference using OLS
What next?
Detection
Graphical Tools
Regress:
sprice_i = β_0 + β_1 livarea_i + β_2 age_i + u_i
Detection
Graphical Tools
Detection
Using hypothesis tests
û_i = α_0 + α_1 livarea_i + α_2 age_i + ε_i ?
α̂_1 is an estimate of corr(û_i, livarea_i). Hence α̂_1 = 0 (try
yourself!). This does not express the relationship we could see in the
above scatter.
Detection
Using hypothesis tests
û_i is not a good measure of residual variance; û_i² is. Hence regress:
û_i² = α_0 + α_1 livarea_i + ε_i
Detection
Using hypothesis tests
û_i² = α_0 + z_i′ α_1 + ε_i  (8)
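The idea behind this auxiliary regression can be sketched on simulated data (not the Stockton file): regress the squared OLS residuals on z_i and use an LM = n·R² statistic, which is asymptotically χ² under homoskedasticity.

```python
import numpy as np

# Breusch-Pagan-style LM test by hand on simulated data: regress
# squared OLS residuals on z_i and use LM = n * R^2, asymptotically
# chi-squared with 1 df here (one variable in z_i).
rng = np.random.default_rng(3)
n = 500
z = rng.uniform(1, 5, n)
u = rng.normal(0, z)                 # error sd rises with z: heteroskedastic
y = 2.0 + 0.5 * z + u

X = np.column_stack([np.ones(n), z])
b = np.linalg.solve(X.T @ X, X.T @ y)
uhat2 = (y - X @ b) ** 2

# auxiliary regression: uhat^2 on a constant and z
a = np.linalg.solve(X.T @ X, X.T @ uhat2)
fit = X @ a
r2 = 1 - ((uhat2 - fit) ** 2).sum() / ((uhat2 - uhat2.mean()) ** 2).sum()
LM = n * r2
print(LM > 3.84)  # rejects homoskedasticity at 5% (chi2(1) cv = 3.84)
```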
Detection
Using hypothesis tests
Detection
Which explanatory variables to use?
In the above procedure it was left unspecified what and how many
variables should be in z_i. Let's refer to this model:
Detection
Heteroskedasticity test examples in R
reg1 <- lm(SPRICE ~ LIVAREA + AGE, data = hs_data)
Breusch-Pagan Test, the default version
> bptest(reg1)
studentized Breusch-Pagan test
data: reg1, BP = 192.5768, df = 2, p-value < 2.2e-16
White Test, without cross-products
> bptest(reg1, ~ LIVAREA + I(LIVAREA^2) + AGE + I(AGE^2),
data = hs_data)
studentized Breusch-Pagan test
data: reg1, BP = 278.4171, df = 4, p-value < 2.2e-16
White Test, with cross-products
> bptest(reg1, ~ LIVAREA + I(LIVAREA^2) + AGE + I(AGE^2) +
I(AGE * LIVAREA), data = hs_data)
studentized Breusch-Pagan test
data: reg1, BP = 282.2782, df = 5, p-value < 2.2e-16
Detection
Test for time dependence of residual variance - ARCH
Consider daily exchange rate changes (USD/UKP), dxt (=100*
log-difference). (4 Jan 1971 to 7 Feb 2014) in
usdukp.xls/usdukp.wf1.
Detection
Test for time dependence of residual variance - ARCH
We do not know the values for σ²_{t+j} but have proxies û²_{t+j}
Detection
Test for time dependence of residual variance - ARCH
û_t² = α_0 + α_1 û²_{t−1} + α_2 û²_{t−2} + ... + α_k û²_{t−k} + ε_t
H_0: α_1 = ... = α_k = 0  homoskedastic residuals
H_A: any α_j ≠ 0 for j = 1, ..., k  heteroskedastic
(ARCH) residuals
will deliver the test statistic LM = T·R² ~ᵃ χ²_k under the null
hypothesis. Here T is the number of observations in the auxiliary
regression.
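The ARCH LM recipe can be sketched by hand on a simulated ARCH(1) series (parameter values assumed for illustration):

```python
import numpy as np

# ARCH LM test by hand on a simulated ARCH(1) series: regress u_t^2
# on its own lags and use LM = T * R^2 ~ chi2(k) under the null.
rng = np.random.default_rng(5)
n, k = 2000, 2
u = np.zeros(n)
for t in range(1, n):
    sigma2 = 0.2 + 0.5 * u[t - 1] ** 2      # ARCH(1) variance
    u[t] = np.sqrt(sigma2) * rng.normal()

u2 = u ** 2
Y = u2[k:]                                   # u_t^2
X = np.column_stack([np.ones(n - k)] +
                    [u2[k - j:n - j] for j in range(1, k + 1)])  # lags
a = np.linalg.solve(X.T @ X, X.T @ Y)
r2 = 1 - ((Y - X @ a) ** 2).sum() / ((Y - Y.mean()) ** 2).sum()
LM = len(Y) * r2
print(LM > 5.99)  # chi2(2) 5% critical value; ARCH effects detected
```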
ARCH LM Test in R
> ArchTest(reg2$residuals, lags = 12)
ARCH LM-test; Null hypothesis: no ARCH effects
data: reg2$residuals
Chi-squared = 1004.562, df = 12, p-value < 2.2e-16
See RStudio work for how this is implemented in R.
Asymptotically
b ~ᵃ N(β, (X′X)⁻¹ X′Ω̂X (X′X)⁻¹).
Example
resulting in
v ~ N(0, I).  (26)
P⁻¹y = P⁻¹Xβ + P⁻¹u
ỹ = X̃β + v,  Var(v) = I  (27)
GLS estimator of β:
b_GLS = (X̃′X̃)⁻¹ X̃′ỹ  (28)
      = (X′(P⁻¹)′P⁻¹X)⁻¹ X′(P⁻¹)′P⁻¹y
      = (X′Ω⁻¹X)⁻¹ X′Ω⁻¹y
Var(b_GLS) = (X′Ω⁻¹X)⁻¹.  (29)
This is then essentially what is sometimes (see Wooldridge chapter
8.4) called weighted least squares.
ỹ is merely a re-weighted version of y, and X̃ is re-weighted in the
same fashion. If the first column in X is a column of ones, the first
column of X̃ will be a column with the reciprocals of all z_i s.
Hence, in such a case there would be no constant in (27).
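A weighted-least-squares sketch under an assumed variance model Var(u_i) = σ² z_i: dividing each row of y and X by √z_i restores homoskedasticity, and OLS on the transformed data is the GLS/WLS estimator.

```python
import numpy as np

# WLS sketch (assumed variance model Var(u_i) = sigma^2 * z_i):
# re-weight y and X by 1/sqrt(z_i) and run OLS on the result.
rng = np.random.default_rng(9)
n = 1000
z = rng.uniform(1, 10, n)
x = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(z))

X = np.column_stack([np.ones(n), x])
w = 1 / np.sqrt(z)
Xt, yt = X * w[:, None], y * w          # re-weighted X-tilde, y-tilde

b_wls = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)
print(np.allclose(b_wls, [1.0, 2.0], atol=0.3))  # estimates near the truth
```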
Note:
Sometimes one suspects that more than one variable drives the
residual variance, Var (ui ).
Reading: Wooldridge (pp 282-284, 4th ed) (pp 276-278, 5th ed)
ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 5
Ralf Becker
Table of contents
Autocorrelation
What is it?
Consequence
Detection
LM test
Extra notes on detection
How to deal with autocorrelation
Newey-West standard errors
Estimation in differences
Empirical Example
Specification Testing
Overview
RESET Test
Assumptions
Assumption TS5
No autocorrelation (serial correlation).
A simple model
The setup
A simple regression set-up but now the error terms are not iid, but
dependent.
We specify them as following the one process we know that
induced dependence, an AR(1) process:
y_t = x_t β + u_t  (2)
u_t = ρ u_{t−1} + v_t  (3)
v_t ~ N(0, σ_v²)  (4)
A simple model
The error term dynamics
If equation (3) is true for u_t, it is also true for u_{t−1}:
A simple model
The error term properties
E(u_t) = 0  (8)
Var(u_t) = σ² = σ_v² / (1 − ρ²)  (9)
Corr(u_t, u_{t−1}) = ρ  (10)
Cov(u_t, u_{t−1}) = σ²ρ  (11)
Corr(u_t, u_{t−k}) = ρ^k  (12)
Cov(u_t, u_{t−k}) = σ²ρ^k.  (13)
Consequence
Returning to the matrix representation of the model, DGP (2) and
(3) can be written as follows:
y = Xβ + u  (14)
Var(u) = Ω ≠ σ²I  (15)
Var(u) = Ω  (16)
= σ² ×
[ 1         ρ         ρ²   ...  ρ^{T−1} ]
[ ρ         1         ρ    ...  ρ^{T−2} ]
[ ρ²        ρ         1    ...          ]
[ ...                            ...    ]
[ ρ^{T−1}   ρ^{T−2}   ...        1      ]
where T = sample size.
Consequence
It is clearly not possible to simplify this to σ²I.
Residual processes other than the AR(1) will result in a different
setup for the variance-covariance matrix Ω.
Is b, when estimated by means of OLS, still consistent and
efficient?
Similar to heteroskedasticity: b is still consistent.
The derivation of Var(b) remains unchanged:
Var(b) = (X′X)⁻¹ X′ΩX (X′X)⁻¹  (17)
which simplifies to
Var(b) = σ²(X′X)⁻¹  (18)
only when Ω = σ²I.
Consequence
ATS1 to ATS5:
b ~ᵃ N(β, σ²(X′X)⁻¹).  (19)
ATS1 to ATS3:
b ~ᵃ N(β, (X′X)⁻¹ X′ΩX (X′X)⁻¹).  (20)
t = (b_i − 0.5)/s_{b_i} ~ᵃ N(0, 1).  (21)
Detection of Autocorrelation
Informal tools - Time Series Plots
Detection of Autocorrelation
Informal tools - Time Series Plots
A: Residual from regressing agrval_t on aid_t.  B: Residual from
regressing log(usdukp)_t on log(usdukp)_{t−1}.
The left clearly has runs of observations above and below the
mean (of zero).
Residual u_t is correlated with its predecessor observation u_{t−1}.
Residuals on the right appear more random.
How could we quantify this in a statistic?
Detection of Autocorrelation
Testing for autocorrelation - LM test
y_t = x_t β + u_t  (22)
u_t = ρ_1 u_{t−1} + ρ_2 u_{t−2} + ... + ρ_k u_{t−k} + v_t
Detection of Autocorrelation
Testing for autocorrelation - LM test
Detection of Autocorrelation
Extra notes on detection
From the OLS normal equations, X′û = 0. The Newey-West estimator
therefore replaces the unknown middle term X′ΩX of the sandwich
variance with an estimate Ω̂_nw constructed from the residuals û_t
and their autocovariances.
Note:
- The exposition here differs from Wooldridge, although the structure
is similar (chapter 12.5). (Reading: Hamilton, Time Series
Analysis, p. 219.)
- It also caters for heteroskedastic residuals.
- The parameter estimate remains unchanged.
Inference using Var_nw(b) is valid asymptotically only. In the
presence of AC and/or HS:
t-test_OLS = (b_i − 0.5)/s_{b_i,OLS} ~ᵃ ?  (30)
t-test_NW = (b_i − 0.5)/s_{b_i,nw} ~ᵃ N(0, 1)  (31)
y_t = α + x_t β + u_t  (32)
u_t = ρ u_{t−1} + v_t where v_t is iid.  (33)
First note that one cannot estimate ρ from (34).
The idea behind this is that perhaps (u_t − ρ u_{t−1}) reduces to the
iid process v_t.
A worked example
Wooldridge Ex 11.7
Specification Testing
Overview
Potential problem:
I Heteroskedasticity
I Autocorrelation
I Omitted variable (specific suspicion about a missing variable)
I Functional form (RESET test)
I Structural change (Chow test)
For heteroskedasticity and autocorrelation the alternatives were
well defined. Formulating a test was straightforward.
Specification Testing
Overview
Omitted variable:
Consider:
y_t = α + β x_t + u_t  (36)
y_t = α + β x_t + γ z_t + u_t  (37)
A simple t-test of H_0: γ = 0 will do the trick.
I Quadratic rather than a linear relationship between x_t and y_t:
y_t = α + β x_t + γ x_t² + u_t  (38)
Specification Testing
RESET Test
ŷ_t = α̂ + β̂ x_t.  (39)
Specification Testing
RESET Test
ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 6
Ralf Becker
April 6, 2016
Table of contents
Specification Testing
Overview
RESET Test
Specification Testing
Overview
Potential problem:
I Heteroskedasticity
I Autocorrelation
I Omitted variable (specific suspicion about a missing variable)
I Functional form (RESET test)
I Structural change (Chow test)
For heteroskedasticity and autocorrelation the alternatives were
well defined. Formulating a test was straightforward.
Specification Testing
Overview
Omitted variable:
Consider:
y_t = α + β x_t + u_t  (1)
y_t = α + β x_t + γ z_t + u_t  (2)
A simple t-test of H_0: γ = 0 will do the trick.
I Quadratic rather than a linear relationship between x_t and y_t:
y_t = α + β x_t + γ x_t² + u_t  (3)
Specification Testing
RESET Test
ŷ_t = α̂ + β̂ x_t.  (4)
Specification Testing
RESET Test
What will we do
1. How can we detect this?
2. What do we do about it?
Structural Change
An example
ŷ_t = 0.542 + 0.041 y_{t−1} + 0.122 y_{t−2} − 0.054 y_{t−3} + 0.690 y_{t−4}
Structural Change
An example - Forecasts from full sample estimation
Figure: The data series from 1988Q2 to 2011Q4 and then 8 quarters of
forecasts
Structural Change
An example - Forecasts from full sample estimation
Figure: The forecasts and realisations from 2012 and 2013
Structural Change
An example - Forecasts from full sample estimation
E(y_t) = 0.542 / (1 − 0.041 − 0.122 + 0.054 − 0.690) = 2.689  (6)
to which we would expect this stationary process to converge.
I Also RESET test has p-value of 0.0019
I If the early observations are from a regime that may not be
relevant any more, then we may want to exclude these
observations.
I Let's re-estimate the model with observations starting from
1992Q1 instead (exclude the first 4 years of data).
Structural Change
An example - Forecasts from 92+ sample estimation
ŷ_t = 1.291 − 0.047 y_{t−1} + 0.052 y_{t−2} − 0.118 y_{t−3} + 0.499 y_{t−4}
      (0.425)  (0.092)      (0.089)       (0.072)       (0.074)
RSS = 223.233; n = 80  (7)
Structural Change
An example - Forecasts from 92+ sample estimation
Structural Change
The Chow Test
Structural Change
The Chow Test
k = # of parameters estimated
F = [(RSS_r − RSS_u)/k] / [RSS_u / dof_u]  (8)
F ~ F_{k,T−2k}.  (9)
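The Chow statistic in (8) can be computed from three RSS values; the numbers below are hypothetical, not from the lecture's example:

```python
# Chow test sketch with hypothetical numbers: RSS_r from the pooled
# (restricted) regression, RSS_u = RSS_1 + RSS_2 from the two
# subsample regressions, k parameters per regime, T observations.
rss_r = 250.0                  # pooled regression (assumed value)
rss_1, rss_2 = 120.0, 110.0    # subsample regressions (assumed values)
k, T = 5, 100

rss_u = rss_1 + rss_2
dof_u = T - 2 * k
F = ((rss_r - rss_u) / k) / (rss_u / dof_u)
print(round(F, 3))  # compare with the F(k, T - 2k) critical value
```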
Structural Change
The Chow Test - Extra Notes
Dummy Variables
An Introduction
Reading: Semester 1
I A different approach to the previous example.
I A dummy variable is a variable that takes values of 0 and 1.
I The criterion that decides between 0 and 1 depends on the
problem,
I male - female
I pre - post 1981
I pre - post EMU etc.
Dummy Variables
An Example
Dummy Variables
ŝprice_i = 118210 − 69142 pool_i  (12)
           (1345)   (6026)
Dummy Variables
Two strategies:
1. Estimate two models, one for houses with pool and another
for houses without pool
2. Estimate one model but use the pooli dummy variable
Dummy Variables
Strategy 1:
Pool: ŝprice_i = b̃_0 + b̃_1 livarea_i  (15)
R²_p = 0.658; RSS = 4.51 × 10¹¹, n = 130
No Pool: ŝprice_i = b̄_0 + b̄_1 livarea_i  (16)
R²_np = 0.647; RSS = 3.67 × 10¹², n = 2480
Strategy 2:
ŝprice_i = b_0 + b_1 livarea_i + b_2 pool_i + b_3 (livarea_i × pool_i)  (17)
R²_dum = 0.665; RSS = 4.12 × 10¹², n = 2610
Dummy Variables
Dummy Variables
Testing for significance of Dummy Variables
F = [(RSS_r − RSS_u)/k] / [RSS_u / dof_u]  (18)
k = number of restrictions
dof_u = dof in the unrestricted model
F ~ F_{k,dof_u}.
Dummy Variables
Testing for significance of Dummy Variables
F = [(4.14 − 4.12)/2] / [4.12/(2610 − 4)] = 6.325  (19)
(RSS in units of 10¹²)
F ~ F_{2,∞};  F_cv,0.01 = 4.61  (20)
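The slide's F value can be reproduced from the RSS numbers (in units of 10¹², as quoted above):

```python
# F test for the pool dummy terms, using the RSS values from the
# slides (in units of 10^12): RSS_r = 4.14, RSS_u = 4.12, two
# restrictions, dof_u = 2610 - 4.
rss_r, rss_u = 4.14, 4.12
k, dof_u = 2, 2610 - 4

F = ((rss_r - rss_u) / k) / (rss_u / dof_u)
print(round(F, 1))  # about 6.3, above the 1% critical value of 4.61
```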
Dummy Variables
Additional Notes
Dummy Variables
Additional Notes
Dummy Variables
Additional Notes
ECON20110/30370 Econometrics
2014/15 - Semester 2 - Week 8
Ralf Becker
Table of contents
Where does the OLS estimator b come from?
1. Minimisation of residual sum of squares (Least Squares - LS):
min_b (y − Xb)′(y − Xb) = û′û
or
2. Minimisation of sample moments (Method of Moments - MM):
X′(y − Xb) = X′û = 0.
Example: Let's say you want to model the number of goals scored
in an English Premier League match. Data from all matches from
August 2012 to 24 March 2014. 681 matches. (EPL 2012to14.csv)
A straightforward way to model this would be to use OLS for
(g_i = goals):
g_i = μ + u_i  (3)
with u_i ~ N(0, σ²) and hence
E(g_i) = μ  (4)
P(g_i ≥ 1) = 0.846
P(g_i ≥ 3) = 0.450
The Poisson density is
f(g_i) = λ^{g_i} e^{−λ} / g_i!
The parameter to be estimated here is λ.
The parameter to be estimated here is .
Parameter Estimation
Compare the empirical histogram with an arbitrary Poisson
distribution (λ = 3)
Parameter Estimation
How do we find the optimal, the ML parameter estimate?
The density of a Poisson r.v. is
f(g_i; λ) = λ^{g_i} e^{−λ} / g_i!  (5)
The parameter to be estimated here is λ.
What would be the probability of the two first outcomes (say 0 and
5 goals), given a certain value of λ?
L(λ; g_1, g_2) = f(g_1; λ) f(g_2; λ)  (using iid)
This is called the likelihood function.
Here we have a product; it is often more convenient to work with
summations, hence take logs:
ln L(λ; g_1, g_2) = ln f(g_1; λ) + ln f(g_2; λ)  (using iid)
Parameter Estimation
The first few observations are: 0, 5, 3, 5, 2, ...
Let's assume the parameter was λ = 2 or 5 or 8.
λ = 2:
ln L(λ = 2; g_1, g_2) = −2 + (−3.322) = −5.322
λ = 5:
ln L(λ = 5; g_1, g_2) = −6.740
λ = 8:
ln L(λ = 8; g_1, g_2) = −10.390
The larger the value of ln L(λ; g_1, g_2), the more likely it is that the
data were drawn from the respective distribution.
From which distribution did the data most likely come? Here from
the Poisson with λ = 2.
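These log-likelihood values can be verified directly (observations 0 and 5 as on the slide):

```python
import math

# Log-likelihood of the first two observations (0 and 5 goals) under
# a Poisson with parameter lam, reproducing the slide's numbers.
def log_lik(lam, goals=(0, 5)):
    return sum(g * math.log(lam) - lam - math.log(math.factorial(g))
               for g in goals)

for lam in (2, 5, 8):
    print(lam, round(log_lik(lam), 3))
# lam = 2 gives about -5.322, lam = 5 about -6.740, lam = 8 about
# -10.390: of the three, the data are most likely under lam = 2.
```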
We only used the first two datapoints and we did not search over
all possible parameter values.
Parameter Estimation
We only used the first two datapoints and we did not search over
all possible parameter values.
P(g_i ≥ 1) = 0.9379
P(g_i ≥ 3) = 0.5256
P(g_i < 0) = 0
Parameter Estimation
Conditional models
Say you want to recognise that the number of goals may depend
on a number of explanatory variables: e.g. whether the match is a
home match for a top team or not, perhaps the table position
of the teams, the temperature on the day, etc.
g_i = β_0 + β_1 top_i + u_i  (7)
E(g_i|top_i) = β_0 + β_1 top_i  (8)
Parameter Estimation
Conditional models
How would we adjust the Poisson model?
We need to adjust the density in (5) as follows:
f(g_i | top_i; β) = λ_i^{g_i} e^{−λ_i} / g_i!
λ now changes, meaning we get varying conditional expectations
E(g_i|top_i) = λ_i.
λ_i is specified as follows
λ_i = exp(β_0 + β_1 top_i)
We use the exp() function to ensure that λ_i is positive.
We then find the ML parameter estimates
Outlook
ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 8
Ralf Becker
Table of contents
Bayesian Econometrics
Introduction
The Basics
Summary Comparison
An Example
Summary
Bayesian Econometrics
Introduction
Frequentist Econometrics
This is what we have done so far.
I Data, y , are observed.
I Model, we choose a particular model, say M, e.g.
y = X + u (1)
Frequentist Econometrics
Bayesian Econometrics
The crucial difference
This implies that Bayesians are really looking for p(θ|y), i.e. the
probability distribution of θ conditional on the observed data, y
(and the assumed model, M, with associated error distribution).
The observed data (potentially a vector of data) may well be
random as well and be characterised by p(y ).
Bayesian Econometrics
The Basics
Recall the following basic probability rule (where A and B are
events):
P(A|B) = P(A, B) / P(B)  (4)
The same is valid for random variables, a and b, rather than events
p(a|b) = p(a, b) / p(b)  (5)
p(b|a) = p(a, b) / p(a)  ⇒  p(a, b) = p(b|a) p(a)  (6)
Now substitute the second line into the first and obtain
p(a|b) = p(b|a) p(a) / p(b)  (7)
Bayesian Econometrics
The Basics
Why is this useful?
p(a|b) = p(b|a) p(a) / p(b)  (8)
Think of our two random variables, the parameter vector θ and
the data y.
p(θ|y) = p(y|θ) p(θ) / p(y)  (9)
The left hand side is the object of desire for Bayesians, the
posterior distribution of θ conditional on the observed data.
You could look at the mean of that distribution as your one best
estimate of θ.
Bayesian Econometrics
The Basics
Bayesian Econometrics
Summary comparison
Frequentists:
Bayesians:
An Example
Setup
To get a flavour for the
calculations required we
will work through an
example.
An Example
Setup
An Example
Frequentist Approach
An Example
Bayesian Approach
We apply
An Example
Bayesian Approach
For simplicity, discretise the problem and consider the 101 possible
values,
θ_1 = 0.00, θ_2 = 0.01, θ_3 = 0.02, ..., θ_100 = 0.99, θ_101 = 1.00
where the second line is just a rescaling to ensure that p(θ_i|y_t) is a
probability distribution and Σ_{i=1}^{101} p(θ_i|y_t) = 1. Before we can start
we need a prior distribution p(θ_i).
An Example
The prior distribution
1. θ ~ N[0.4, 0.05²]
2. θ ~ N[0.6, 0.05²]
3. θ ~ U[0, 1]
An Example
The updating mechanism
Recall, this is what we do.
p(y_t|θ_i) = θ_i if y_t = 1  (20)
p(y_t|θ_i) = (1 − θ_i) if y_t = 0  (21)
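The discrete updating scheme can be sketched over the 101-point grid; the prior here is uniform and the data sequence is hypothetical:

```python
import numpy as np

# Discrete Bayesian updating over theta_i = 0.00, 0.01, ..., 1.00
# with a uniform prior and the Bernoulli likelihood from (20)-(21).
theta = np.linspace(0, 1, 101)
posterior = np.full(101, 1 / 101)      # uniform prior

for y in [1, 1, 0, 1]:                 # a hypothetical data sequence
    lik = theta if y == 1 else 1 - theta
    posterior = lik * posterior
    posterior /= posterior.sum()       # rescale so it sums to one

print(round(float(theta @ posterior), 2))  # posterior mean, about 0.67
```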
An Example
The posterior distribution
after 163 annual updates:
An Example
The prior distribution
Once we have the posteriors after all updating we can calculate the
following probabilities:
Bayesian Econometrics
Summary
CONTRAs
I Arbitrary choice of priors
I With uninformative priors (sometimes) same results as
frequentists
I When allowing for continuous parameter distributions we need
to use numerical integration (computationally intensive!)
PROs
I Ability to find probabilities for parameters of interest
I Ability to deal with latent/unobserved random variables
I Computational issues become less important