
UNIT 3: LARGE SAMPLE PROPERTIES

This unit introduces the notion of large sample properties. These are properties that
estimators possess as the sample size hypothetically increases to infinity.

Objectives

• Students are introduced to the optimal properties that estimators possess in large samples.
• Students appreciate the importance of large samples in regression analysis.

Under the classical assumptions, the Least Squares estimators have some optimal properties in finite samples. In addition
to the BLUE properties, the Least Squares estimators are also assumed to be normally
distributed. However, it has been shown that when some assumptions are relaxed the
estimators cease to possess some of these desirable properties. For instance, when the error
terms are not normally distributed the Least Squares estimators cannot be said to be normal
(recall that this assumption is essential for the purposes of hypothesis testing). It can also be
shown that if the explanatory variables are not distributed independently of the error terms,
the Least Squares estimators are no longer unbiased.

As the sample size hypothetically increases to infinity, the Least Squares estimators come to possess certain
optimal properties known as large sample (asymptotic) properties. Below we briefly discuss some of these properties.

a) Asymptotic Unbiasedness

An estimator $\hat{\theta}$ is said to be an asymptotically unbiased estimator of $\theta$ if

$$\lim_{n \to \infty} E(\hat{\theta}_n) = \theta$$

where $\hat{\theta}_n$ means that the estimator is based on a sample of size $n$.

Consider the variance of a random variable $X$, given as:

$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n}$$

It can be shown that

$$E(S^2) = \sigma^2 \left(1 - \frac{1}{n}\right)$$

In a small sample the estimator $S^2$ of the true population variance $\sigma^2$ is biased, since $E(S^2) \neq \sigma^2$.


However, in large samples it can be shown that the estimator is unbiased. That is:

$$\lim_{n \to \infty} E(S^2) = \sigma^2$$
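A minimal simulation sketch in Stata illustrates the point (all names and numbers here are our own, with true $\sigma^2 = 4$): the average of $S^2$ over repeated samples tracks $\sigma^2(1 - 1/n)$ and approaches 4 as $n$ grows.

* Sketch: average of S^2 (divisor n) over 1,000 samples for several n
clear all
set seed 12345
foreach n in 5 50 500 {
    local sum = 0
    forvalues r = 1/1000 {
        clear
        quietly set obs `n'
        quietly gen x = rnormal(0, 2)           // true sigma^2 = 4
        quietly summarize x
        local sum = `sum' + r(Var)*(`n'-1)/`n'  // convert divisor n-1 to n
    }
    display "n = `n': mean S^2 = " %6.4f `sum'/1000 ///
            "   theory: " %6.4f 4*(1 - 1/`n')
}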

b) Consistency

An estimator $\hat{\theta}$ is said to be a consistent estimator of $\theta$ if it approaches the true value $\theta$ as the
sample size gets larger and larger (approaches infinity). The estimator is said to be
consistent if the probability that the absolute value of the difference between $\hat{\theta}$ and $\theta$ is less
than $\delta$ (an arbitrarily small positive quantity) approaches unity. Symbolically:

$$\lim_{n \to \infty} P\{|\hat{\theta} - \theta| < \delta\} = 1, \qquad \delta > 0$$

This can also be written as

$$\operatorname*{plim}_{n \to \infty} \hat{\theta} = \theta$$
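As a simulation sketch (made-up numbers, with true mean 5), the sample mean, a consistent estimator of the population mean, closes in on the true value as $n$ grows:

* Sketch: consistency of the sample mean
clear
set seed 2024
foreach n in 10 1000 100000 {
    clear
    quietly set obs `n'
    quietly gen x = 5 + rnormal(0, 3)   // true mean = 5
    quietly summarize x
    display "n = `n': sample mean = " %8.5f r(mean)
}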

c) Asymptotic Efficiency

Let $\hat{\theta}$ be an estimator of $\theta$. The variance of the asymptotic distribution of $\hat{\theta}$ is called the
asymptotic variance of $\hat{\theta}$. If $\hat{\theta}$ is consistent and its asymptotic variance is smaller than the
asymptotic variance of all other consistent estimators of $\theta$, then $\hat{\theta}$ is said to be asymptotically
efficient.

d) Asymptotic Normality

An estimator $\hat{\theta}$ is said to be asymptotically normally distributed if its sampling distribution tends
to approach the normal distribution as the sample size $n$ increases indefinitely.

UNIT 4: AUTOCORRELATION
In this unit we examine in detail one of the violations of the assumptions of the Classical Linear
Regression Model.

Objectives:

• Students understand the notion of serial correlation and why violation of the assumption
of no serial correlation has serious implications for hypothesis testing.

Recall that one of the assumptions of the classical linear regression model is that the error
terms are not correlated; that is to say, the errors are not interdependent. This assumption may
be tenable in cross-sectional data but not in time series data. Recall that the assumption of no
autocorrelation is simply specified as:

$$E(U_t U_{t+s}) = 0 \quad \text{for } s \neq 0$$

This means that two errors $s$ periods apart are not interdependent. As mentioned, this
assumption may be violated in time series data, implying that $E(U_t U_{t+s}) \neq 0$ for $s \neq 0$. There are
several reasons why autocorrelation may occur:

• Inertia. Most economic time series exhibit inertia or sluggishness. Data such
as GNP, price indexes, production and employment exhibit cycles. Starting at the bottom
of a recession, when economic recovery begins, the series start to move upward. Momentum
is thus built into them, and it continues until something happens to slow them
down. Therefore, in regressions involving time series data, successive observations are
likely to be interdependent.
• Specification Bias: Excluded Variables Case. Once a relevant variable is omitted
from the estimated model, the omission may show up as autocorrelation in the error terms.
• Specification Bias: Incorrect Functional Form. Sometimes the researcher may
estimate the econometric model using the wrong functional form, for instance
estimating a linear cost function instead of a quadratic cost function.
• Cobweb Phenomenon
• Lags. Some time series models require that a lagged regressand be included among the
regressors. Omission of the lagged regressand from the regressors may show up as
autocorrelation in the error terms.
• "Manipulation" of Data
• Data Transformation
Nature of Autocorrelation

Having studied the causes of autocorrelation it is important to know the precise form that the
autocorrelation takes. In other words, we want to impose a structure on how the error terms are
related.

The most commonly assumed structure of autocorrelation is the first-order autoregressive
scheme. To illustrate this structure, we assume the simple linear regression model,
$Y_t = \beta_1 + \beta_2 X_t + U_t$.

Given this model it is assumed that $E(U_t U_{t+s}) \neq 0$ for $s \neq 0$.

The first-order autoregressive scheme assumes that the error terms are related as¹:

$$U_t = \rho U_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1$$

$\varepsilon_t$ is a stochastic error term that obeys the following assumptions:

$$E(\varepsilon_t) = 0$$

$$\operatorname{var}(\varepsilon_t) = \sigma^2_{\varepsilon}$$

$$\operatorname{cov}(\varepsilon_t, \varepsilon_{t+s}) = 0 \quad \text{for } s \neq 0$$

Given the AR(1) scheme, it can be shown that

$$\operatorname{var}(U_t) = E(U_t^2) = \frac{\sigma^2_{\varepsilon}}{1 - \rho^2}$$

$$\operatorname{cov}(U_t, U_{t+s}) = E(U_t U_{t+s}) = \rho^s \frac{\sigma^2_{\varepsilon}}{1 - \rho^2}$$

$$\operatorname{cor}(U_t, U_{t+s}) = \rho^s$$

Since $\rho$ is a constant between $-1$ and $+1$, under the AR(1) scheme the error term $U_t$
is still homoscedastic. However, $U_t$ is correlated not only with its immediate past value but also with its
values several periods in the past.

¹ The error terms may be correlated through higher-order autoregressive schemes, such as
$U_t = \rho_1 U_{t-1} + \rho_2 U_{t-2} + \varepsilon_t$ (a second-order autoregressive scheme) or, more generally,
$U_t = \rho_1 U_{t-1} + \rho_2 U_{t-2} + \cdots + \rho_n U_{t-n} + \varepsilon_t$. For expository purposes the first-order autoregressive scheme will
suffice.

It must be noted that since $|\rho| < 1$, the mean, variance and covariance of $U_t$ are
stationary; that is, they do not change over time. It can also be seen that the value of the
covariance declines as we go into the distant past.
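These results can be checked by simulation. The Stata sketch below (our own setup, with $\rho = 0.8$ and $\sigma^2_{\varepsilon} = 1$) generates a long AR(1) error series and compares its sample variance with the theoretical value $\sigma^2_{\varepsilon}/(1 - \rho^2) \approx 2.78$:

* Sketch: simulate u_t = 0.8*u_{t-1} + eps_t and check var(u)
clear
set seed 12345
quietly set obs 50000
gen t = _n
tsset t
gen eps = rnormal(0, 1)                    // sigma^2_eps = 1
gen u = eps                                // u_1 initialised at eps_1
quietly replace u = 0.8*L.u + eps in 2/L   // sequential replace builds the AR(1)
quietly summarize u
display "sample var(u) = " %6.4f r(Var) "   theory: " %6.4f 1/(1 - 0.8^2)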

OLS Estimation in the presence of Autocorrelation

Consider the regression model $Y_t = \beta_1 + \beta_2 X_t + U_t$. The least squares estimator

$$\hat{\beta}_2 = \frac{\sum x_t y_t}{\sum x_t^2}$$

under the first-order autoregressive scheme has the variance:

$$\operatorname{var}(\hat{\beta}_2)_{AR1} = \frac{\sigma^2}{\sum x_t^2}\left[1 + 2\rho \frac{\sum x_t x_{t-1}}{\sum x_t^2} + 2\rho^2 \frac{\sum x_t x_{t-2}}{\sum x_t^2} + \cdots + 2\rho^{n-1} \frac{x_1 x_n}{\sum x_t^2}\right]$$

Therefore, under autocorrelation, the variance of $\hat{\beta}_2$ depends on the value of $\rho$ and the
sample correlations between the values taken by the regressor $X$ at various lags.

Recall that under the assumption of no autocorrelation the variance of $\hat{\beta}_2$ is $\operatorname{var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_t^2}$.
Hence it can be seen that the variance of $\hat{\beta}_2$ under the first-order autoregressive scheme is the
usual variance $\operatorname{var}(\hat{\beta}_2)$ times a term that depends on $\rho$ and the sample correlations
between the values taken by the regressor $X$ at various lags.

In the presence of autocorrelation, the least squares estimator is still linear and unbiased.
However, if we obtain the estimator without adjusting for autocorrelation and merely
adjust the variance of the estimator for autocorrelation as shown above, the least squares
estimator computed under such conditions is not efficient; that is, in the class of linear
unbiased estimators it does not have minimum variance.

Therefore, we have to find an estimator that is BLUE.

THE BLUE ESTIMATOR IN THE PRESENCE OF AUTOCORRELATION

To obtain the BLUE when there is autocorrelation in the error terms, we resort to the Generalized
Least Squares (GLS) method. The BLUE estimator for the regression model above is given as:

$$\hat{\beta}_2^{GLS} = \frac{\sum_{t=2}^{n} (x_t - \rho x_{t-1})(y_t - \rho y_{t-1})}{\sum_{t=2}^{n} (x_t - \rho x_{t-1})^2} + C,$$

where $C$ is a correction factor that is usually disregarded in practice. Its variance is given as:

$$\operatorname{var}(\hat{\beta}_2^{GLS}) = \frac{\sigma^2}{\sum_{t=2}^{n} (x_t - \rho x_{t-1})^2} + D,$$

where $D$ is a correction factor that may be disregarded in practice.

Note that the Generalized Least Squares method utilizes the information given by the
autocorrelation structure, as captured by the value of $\rho$. When $\rho = 0$, the Generalized Least Squares
method is identical to the Least Squares method.

Consequences of Using OLS in the presence of Autocorrelation

To determine the consequences of autocorrelation we consider two cases:

i) OLS estimation allowing for autocorrelation

It has been shown that the confidence interval of the estimator computed using the variance
$\operatorname{var}(\hat{\beta}_2)_{AR1}$ is much wider than that computed using the variance $\operatorname{var}(\hat{\beta}_2^{GLS})$. Even as the sample
size increases the confidence intervals remain much wider; hence the estimator $\hat{\beta}_2$ is not
asymptotically efficient even after allowing for autocorrelation. The implication for hypothesis
testing is that we are likely to declare a coefficient statistically insignificant even though in fact it
may be significant.

ii) OLS estimation disregarding autocorrelation

Disregarding autocorrelation entails that we implement our estimation and hypothesis testing as though
there were no autocorrelation in the model. In this instance we continue to use the OLS estimator $\hat{\beta}_2$
with its variance estimated by the usual formula $\operatorname{var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_t^2}$. A number of errors will arise from this omission:

a. The residual variance $\hat{\sigma}^2 = \dfrac{\sum e_t^2}{n-2}$ is likely to underestimate the true $\sigma^2$.
b. As a result, we are likely to overestimate $R^2$.
c. Even if $\sigma^2$ is not underestimated, $\operatorname{var}(\hat{\beta}_2)$ may underestimate $\operatorname{var}(\hat{\beta}_2)_{AR1}$, its variance
under (first-order) autocorrelation, even though the latter is inefficient compared to
$\operatorname{var}(\hat{\beta}_2^{GLS})$.

d. Therefore, the usual t and F tests of significance are no longer valid, and if applied, are
likely to give seriously misleading conclusions about the statistical significance of the
estimated regression coefficients.
DETECTING AUTOCORRELATION

a. Graphical Method
Plot the residuals against time. Any systematic pattern in the residuals over time is a good
indicator that the model may be suffering from autocorrelation in the error terms.

b. The Runs Test
This is a nonparametric test: one examines the sequence of signs of the residuals over time and counts the number of runs (uninterrupted sequences of the same sign); too few or too many runs relative to what randomness would produce is evidence of serial correlation.


c. Durbin-Watson d Test
The Durbin-Watson test is a test of first-order autocorrelation. It is implemented by computing the
Durbin-Watson statistic, which is then compared against critical values to
determine whether positive or negative serial correlation is present in the model. In some
instances the test may be inconclusive as to whether there is negative or positive serial
correlation in the model.

The Durbin-Watson statistic is computed as follows:

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

The following assumptions underlie the Durbin-Watson statistic:

i) The regression model includes an intercept term.
ii) The explanatory variables, the $X$'s, are nonstochastic, or fixed in repeated sampling.
iii) The disturbances $U_t$ are generated by the first-order autoregressive scheme
$U_t = \rho U_{t-1} + \varepsilon_t$. Therefore, the test cannot be used to detect higher-order autoregressive
schemes.
iv) The error term $U_t$ is assumed to be normally distributed.
v) The regression model does not include lagged value(s) of the dependent variable
among the explanatory variables. Thus, the test is inapplicable in models of the following
type:
$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + \gamma Y_{t-1} + U_t$$
vi) There are no missing observations in the data.


The exact sampling (probability) distribution of the d statistic is difficult to derive. The d statistic
itself lies between 0 and 4. The critical values are
given in the Durbin-Watson tables, which report the lower and upper bound
values, $d_L$ and $d_U$, for given sample sizes at the five percent level of significance.

The test involves two null hypotheses: the null hypothesis that there is no positive
autocorrelation and the null hypothesis that there is no negative serial correlation. The critical values $d_L$ and $d_U$ divide the range 0 to 4 into five zones:

Zone A: $0 \le d < d_L$
Zone B: $d_L \le d \le d_U$
Zone C: $d_U < d < 4 - d_U$
Zone D: $4 - d_U \le d \le 4 - d_L$
Zone E: $4 - d_L < d \le 4$

The decision rules are as follows. If the Durbin-Watson statistic lies in:

• Zone A – Reject $H_0$ and conclude that there is evidence of positive autocorrelation.
• Zone B – This is a zone of indecision.
• Zone C – Do not reject either of the null hypotheses.
• Zone D – This is a zone of indecision.
• Zone E – Reject $H_0$ and conclude that there is evidence of negative serial correlation.
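In practice the d statistic is obtained from statistical software rather than computed by hand. A minimal Stata sketch, assuming hypothetical variables y and x and a time variable year:

* Sketch: Durbin-Watson d statistic after OLS (hypothetical names)
tsset year                 // declare the data as time series
quietly regress y x
estat dwatson              // reports the d statistic

The reported d is then compared with the tabulated $d_L$ and $d_U$ for the given sample size and number of regressors, using the zones above.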

d. A General Test of Autocorrelation: The Breusch-Godfrey (BG) Test


This test improves upon the Durbin-Watson test. The test is general because it accommodates the
following elements: i) stochastic regressors, such as the lagged values of the regressand; ii) higher-order
autoregressive schemes; and iii) simple or higher-order moving averages of white noise
error terms. (See Gujarati for details.)
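A minimal Stata sketch of the test, again with hypothetical variables y and x on tsset data:

* Sketch: Breusch-Godfrey LM test for serial correlation up to order 3
quietly regress y x
estat bgodfrey, lags(1 2 3)    // LM tests at orders 1, 2 and 3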

REMEDIAL MEASURES

Autocorrelation may be pure or a result of model mis-specification. If the autocorrelation is
due to misspecification, the remedy is simply to correct the misspecification. However, if the
autocorrelation is pure, the following remedies are proposed:

The Method of Generalized Least Squares (GLS)

To understand this method, assume a simple regression model:

$$Y_t = \beta_1 + \beta_2 X_t + U_t \qquad (1)$$

plagued by autocorrelation of the first-order autoregressive form:

$$U_t = \rho U_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1$$

The researcher is typically faced with two cases: when $\rho$ is known and when $\rho$ is unknown.

Case I: when ρ is known

If equation (1) holds at time $t$, then it also holds at time $(t-1)$:

$$Y_{t-1} = \beta_1 + \beta_2 X_{t-1} + U_{t-1} \qquad (2)$$

Multiplying equation (2) on both sides by $\rho$ yields:

$$\rho Y_{t-1} = \rho \beta_1 + \rho \beta_2 X_{t-1} + \rho U_{t-1} \qquad (3)$$

Subtracting equation (3) from equation (1) yields:

$$(Y_t - \rho Y_{t-1}) = \beta_1 (1 - \rho) + \beta_2 (X_t - \rho X_{t-1}) + \varepsilon_t \qquad (4)$$

where $\varepsilon_t = (U_t - \rho U_{t-1})$.

Equation (4) can be expressed as:

$$Y_t^* = \beta_1^* + \beta_2^* X_t^* + \varepsilon_t \qquad (5)$$

where $\beta_1^* = \beta_1 (1 - \rho)$, $Y_t^* = (Y_t - \rho Y_{t-1})$, $X_t^* = (X_t - \rho X_{t-1})$, and $\beta_2^* = \beta_2$.

The error term in equation (5) satisfies all the usual OLS assumptions. Least Squares
estimation is then applied to the transformed variables $Y_t^*$ and $X_t^*$, and the estimators obtained
satisfy all the optimum properties; that is, they are BLUE.
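As an illustrative Stata sketch of this transformation, take $\rho = 0.6$ as the assumed known value (y, x and year are hypothetical names):

* Sketch: generalized (quasi-) difference transformation with known rho
tsset year
gen ystar = y - 0.6*L.y       // Y*_t = Y_t - rho*Y_{t-1}; the first obs is lost
gen xstar = x - 0.6*L.x       // X*_t = X_t - rho*X_{t-1}
regress ystar xstar           // slope estimates beta2; _cons estimates beta1*(1-rho)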

Case II: when ρ is unknown

In practice the value of $\rho$ is rarely known. Therefore econometricians have to find a way of
estimating $\rho$.

• The First-Difference Method


$\rho$ lies between $-1$ and $+1$. At one extreme, $\rho = 0$, there is no serial correlation; at the other,
$\rho = \pm 1$, there is perfect positive or negative correlation. If $\rho = +1$, the generalized difference
equation (5) reduces to the first-difference equation:

$$Y_t - Y_{t-1} = \beta_2 (X_t - X_{t-1}) + (U_t - U_{t-1}) \qquad (7)$$

$$\Delta Y_t = \beta_2 \Delta X_t + \varepsilon_t \qquad (8)$$

where $\Delta$ is the first-difference operator.

The first-difference transformation may be appropriate if the coefficient of autocorrelation is very
high, in excess of 0.8, or the Durbin-Watson d is quite low. Maddala proposed the rule of thumb: use
the first-difference form whenever $d < R^2$.
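In Stata the first-difference regression is a one-liner on tsset data (y and x hypothetical):

* Sketch: first-difference regression, imposing rho = +1
regress D.y D.x, noconstant   // the intercept drops out after differencing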

• ρ Based on the Durbin-Watson d Statistic

$\rho$ can be estimated from the relationship between $\rho$ and the Durbin-Watson d statistic:

$$\hat{\rho} \approx 1 - \frac{d}{2}$$
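For instance, using the Durbin-Watson statistic $d = 0.2847$ obtained in the worked example at the end of this unit:

$$\hat{\rho} \approx 1 - \frac{0.2847}{2} = 0.8577,$$

which points to strong positive first-order autocorrelation.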

• ρ Estimated from the Residuals

If the autocorrelation structure follows the AR(1) scheme $U_t = \rho U_{t-1} + \varepsilon_t$, then $\rho$ can be estimated by
regressing $e_t$ on $e_{t-1}$; that is, by running the regression

$$e_t = \rho e_{t-1} + v_t$$
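In Stata this two-step idea can be sketched as follows (hypothetical y and x on tsset data); the slope of the second regression is $\hat{\rho}$, which can then be fed into the generalized difference transformation:

* Sketch: estimating rho from the OLS residuals
quietly regress y x
predict e, residuals          // save the residuals e_t
regress e L.e, noconstant     // the slope coefficient is rho-hat

Stata's prais command (with the corc option for Cochrane-Orcutt) automates this feasible GLS procedure iteratively.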

The Newey-West Method of Correcting the OLS Standard Errors

Instead of using the Feasible Generalized Least Squares approach discussed above, the Least Squares
estimators can still be used, but the standard errors are corrected by implementing a procedure
developed by Newey and West. The corrected standard errors are known as Newey-West
standard errors or heteroscedasticity and autocorrelation consistent (HAC) standard errors.
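A minimal Stata sketch (hypothetical variables; the truncation lag, here 4, is a choice the researcher must make):

* Sketch: OLS point estimates with Newey-West (HAC) standard errors
tsset year
newey y x, lag(4)             // same coefficients as OLS, corrected std. errors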

Example

In macroeconomics you learnt that output is a function of labour and capital inputs, and hence the
production function can be specified as:

$$Y = f(K, L)$$

Using econometric methods, we can specify a simple production function as:

$$Y_t = \hat{\beta}_1 + \hat{\beta}_2 K_t + \hat{\beta}_3 L_t + e_t$$

where $Y$ is a measure of output, $K$ is some measure of capital input, and $L$ is labour input.

We estimate a simple production function for Zambia using the Stata software and data
obtained from the Penn World Tables.

As a measure of output we utilize the variable Expenditure-side real GDP at chained PPPs (in
mil. 2005 US$) (rgdpe).

As a measure of labour input we use the variable Human capital index, based on years of
schooling (Barro/Lee, 2010) (hc).

As a measure of capital input we utilize the variable Capital stock at current PPPs (in mil.
2005 US$) (ck).

We utilize time series data from 1955 to 2011.

The model we estimate is:

$$rgdpe_t = \hat{\beta}_1 + \hat{\beta}_2 hc_t + \hat{\beta}_3 ck_t + e_t$$

The following output was obtained:

. reg rgdpe hc ck

      Source |       SS       df       MS              Number of obs =      57
-------------+------------------------------           F(  2,    54) =   28.48
       Model |   614334150     2   307167075           Prob > F      =  0.0000
    Residual |   582343042    54  10784130.4           R-squared     =  0.5134
-------------+------------------------------           Adj R-squared =  0.4953
       Total |  1.1967e+09    56  21369235.6           Root MSE      =  3283.9

------------------------------------------------------------------------------
       rgdpe |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          hc |   -16193.1   3201.031    -5.06   0.000    -22610.78   -9775.419
          ck |   .5292661   .0753843     7.02   0.000     .3781297    .6804025
       _cons |   29673.93   4357.548     6.81   0.000     20937.58    38410.29
------------------------------------------------------------------------------

The estimated model can be expressed as:

$$\widehat{rgdpe}_t = 29673.93 - 16193.1\, hc_t + 0.5293\, ck_t + e_t$$

As usual, after estimating the model we have to test whether it is statistically significant.
Using the F test and inspecting the p-value for the F statistic, we conclude that at the 5%
level of significance our model is statistically significant.

An inspection of the partial regression coefficients shows that at the 5% level of significance
all our partial regression coefficients are statistically significant.

However, before we proceed to the interpretation of the model, we carry out diagnostic tests.
The first test we carry out is a test to determine whether the error terms are serially correlated.
Recall that there are a number of tests that we can employ, but as a first approximation we plot the
residuals (obtained from the regression) against time. Such a plot yields the graph below:

[Figure: plot of the regression residuals (vertical axis, roughly −5000 to 10000) against Year.]

The graph shows that the residuals display a systematic pattern; hence there is strong
evidence that the error terms may be serially correlated, or perhaps our model is
mis-specified. To determine whether there is serial correlation we use the Durbin-Watson test.

Implementing the Durbin-Watson test yields a Durbin-Watson statistic equal to:

Durbin-Watson d-statistic (3, 57) = 0.2846755

To make use of the Durbin-Watson statistic in deciding whether the error terms are serially
correlated of order 1, we have to determine the critical values (I leave this to the students to do).
After determining the critical values, the test statistic above, being close to 0 and well below the
lower bound, indicates that the error terms are positively serially correlated.

Hence we have to find a remedy for the serial correlation in the error terms. To do so, we
re-estimate the regression and obtain robust standard errors. The resulting model is
given as:

. reg rgdpe hc ck, vce(robust)

Linear regression                                      Number of obs =      57
                                                       F(  2,    54) =   24.98
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5134
                                                       Root MSE      =  3283.9

------------------------------------------------------------------------------
             |               Robust
       rgdpe |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          hc |   -16193.1   3351.359    -4.83   0.000    -22912.17   -9474.028
          ck |   .5292661    .079302     6.67   0.000     .3702752    .6882571
       _cons |   29673.93   4738.529     6.26   0.000     20173.75    39174.11
------------------------------------------------------------------------------

After estimating the model, we use the Durbin-Watson test to determine whether the error terms are
still serially correlated. The Durbin-Watson test statistic obtained is:

Durbin-Watson d-statistic (3, 57) = 0.2846755

The statistic is unchanged: robust standard errors correct the inference but leave the coefficient
estimates, and hence the residuals, exactly as before, so they do not remove the serial correlation
itself. This indicates that there is still serial correlation in our error terms. Most likely the serial
correlation detected in the error terms is due to model misspecification and not pure serial
correlation.
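Under the assumption that the data have been tsset on year, the remedies discussed in this unit could be sketched for this model as follows (the lag length 4 is an arbitrary choice):

* Sketch: possible follow-ups for the production-function model
quietly regress rgdpe hc ck
estat bgodfrey, lags(1 2)     // check for higher-order serial correlation
newey rgdpe hc ck, lag(4)     // Newey-West (HAC) standard errors
prais rgdpe hc ck, corc       // Cochrane-Orcutt feasible GLS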
