
CHAPTER FOUR
VIOLATING THE ASSUMPTIONS OF THE CLASSICAL LINEAR REGRESSION MODEL (CLRM)
4.1 Introduction

The estimates derived using OLS techniques, and the inferences based on those estimates, are valid only under certain conditions.
In general, these conditions amount to the
regression model being "well-specified".
A regression model is statistically well-specified for
an estimator (say, OLS) if all assumptions needed
for optimality of the estimator are satisfied.
Before we relax the assumptions of the CLRM, let
us recall: (i) basic steps in a scientific enquiry &
(ii) the assumptions made.
4.1 Introduction

I. The Major Steps Followed in a Scientific Study:
1. Specifying a statistical model consistent with theory
(model representing the theoretical relationship
between a set of variables).
This involves at least two choices to be made:
A. choice of variables to be included in the model,
B. choice of the functional form of the link (linear in
variables, linear in logs of the variables, polynomial
in regressors, etc.)
2. Selecting an estimator with certain desirable
properties (provided that the regression model in
question satisfies a given set of conditions).
4.1 Introduction

3. Estimating the model. When can one estimate a model? (sample size? perfect multicollinearity?)
4. Testing for the validity of assumptions made.
5. a) If there is no evidence of misspecification, go on
to conducting statistical inferences.
5. b) If the tests show evidence of misspecification in
one/more relevant forms, two possible courses of
action:
If the precise form of misspecification is known, then
find an alternative estimator.
Regard statistical misspecification as an indication of a defective model. Then, search for an alternative, well-specified model & start over (return to Step 1).
4.1 Introduction
II. The Assumptions of the CLRM:
A1: n ≥ K+1. Otherwise, estimation is not possible. (n > K+1 is needed for inference!)
A2: No perfect multicollinearity among the X's.
A3: ɛi|Xj ~ IID(0, σ²), i.e., E(εsεt|Xj) = σ² for s = t and 0 for s ≠ t.
A3.1: var(ɛi|Xj) = σ² (0 < σ² < ∞).
A3.2: cov(ɛi,ɛs|Xj) = 0, for all i ≠ s; i, s = 1, …, n.


A4: ɛi's are normally distributed: ɛi|Xj ~ N(0, σ²).
A5: E(ɛi|Xj) = E(ɛi) = 0; i = 1, …, n & j = 1, …, K.
A5.1: E(ɛi) = 0 and X’s are non-stochastic, or
A5.2: E(ɛiXji)=0 or E(ɛi|Xj)=E(ɛi) with stochastic X’s
Implication: ɛ is mean-independent of Xj & thus cov(ɛ, Xj) = 0.
4.1 Introduction
The several tests for violations of the assumptions
of the CLRM are tests of model misspecification.

Test statistics for particular H0's tend to lead to rejection of these H0's when the model is misspecified in some way.
e.g., tests for heteroskedasticity or autocorrelation
are sensitive to omission of relevant variables.
A significant test statistic may indicate heteroskedasticity or autocorrelation, but it may also reflect omission of relevant variables.
4.1 Introduction
Outline of the Chapter:
1. Small Samples (A1?)
2. Multicollinearity (A2?)
3. Non-Normal Error terms (A4?)
4. Non-IID Error terms (A3?):
A. Heteroskedasticity (A3.1?)
B. Autocorrelation (A3.2?)
5. Endogeneity (A5?):
A. Stochastic Regressors and Measurement Errors
B. Model Specification Errors:
a. Omission of Relevant Variables
b. Wrong Functional Form
c. Inclusion of Irrelevant Variables
d. Stability of Parameters
C. Simultaneity (or Reverse Causality)
4.2 Sample Size: Problems with Few Data Points

Requirement for estimation: n ≥ K+1.


If n is small, it may be difficult to detect violations
of assumptions.
With small n, it is hard to detect heteroskedast-
icity or non-normality of ɛi's even when present.
Even when no assumption is violated, a regression with small n may not have sufficient power to reject H0: βj = 0, even if βj ≠ 0.
If [(K+1)/n] > 0.4, it will often be difficult to fit a
reliable model.
Rule of thumb: aim to have n at least 6 times the number of regressors, and ideally 10 times.
4.3 Multicollinearity

Many social research studies use a large number of predictors. Problems arise when the various predictors are highly & linearly related.
Recall that, in a MLR, only the independent
variation in a regressor is used in estimating the
coefficient of that regressor.
If X1 & X2 are highly correlated, the coefficients of
X1 & X2 will be determined by the minority of cases
where they don’t overlap.
Perfect multicollinearity: occurs when one (or
more) of the regressors in a model (e.g., XK) is a
linear function of other/s (Xi, i = 1, 2, …, K-1).
4.3 Multicollinearity

For instance, if X2 = 2X1, then there is perfect (exact) multicollinearity between X1 & X2.
Suppose the PRF is Yi = β0 + β1X1i + β2X2i + ui, with X2 = 2X1.
The OLS technique yields 3 normal equations:
ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i
ΣYiX1i = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i
ΣYiX2i = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i²

But, substituting 2X1 for X2 in the 3rd equation yields twice the 2nd equation; i.e., one of the normal equations is in fact redundant.
Thus, we have only 2 independent equations but 3
unknowns (β's) to estimate.
4.3 Multicollinearity

As a result, the normal equations reduce to:
ΣYi = nβ̂0 + [β̂1 + 2β̂2]ΣX1i
ΣYiX1i = β̂0ΣX1i + [β̂1 + 2β̂2]ΣX1i²
The number of β's to be estimated is greater than
the number of independent equations.
So, if two or more X's are perfectly correlated, it is
not possible to find the estimates for all β's.
i.e., we cannot find β̂1 & β̂2 separately, but only α̂ = β̂1 + 2β̂2:
α̂ = β̂1 + 2β̂2 = [ΣYiX1i − nX̄1Ȳ] / [ΣX1i² − nX̄1²], and β̂0 = Ȳ − [β̂1 + 2β̂2]X̄1.
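As an illustration, a minimal Stata sketch (with hypothetical variables y and x1) shows how software reacts to an exactly collinear regressor:

* generate a regressor that is an exact linear function of x1
gen x2 = 2*x1
reg y x1 x2
* Stata drops one of the collinear regressors (reporting it as omitted
* because of collinearity); only a combination of their coefficients is identified.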
4.3 Multicollinearity

High, but not perfect, multicollinearity: two/more regressors in a model are highly (but imperfectly) correlated. e.g. X1 = 3 – 5XK + ui.
This makes it difficult to isolate the effect of each of
the highly collinear X's on Y.
If there is inexact but strong multicollinearity:
collinear regressors explain the same variation in Y.
estimated coefficients change radically depending on
inclusion/exclusion of other predictor/s.
β̂'s tend to be very shaky from one sample to another.
standard errors of the β̂'s will be inflated, and as a result, t-tests will be insignificant & CIs become wide (rejecting H0: βj = 0 becomes very rare).
4.3 Multicollinearity

low t-ratios but high R² (or F): i.e., not much individual variation in the X's, but a lot of common variation.
Yet, the OLS estimators are still BLUE.
BLUE – a repeated-sampling property – says nothing about the estimates from a single sample.
But, multicollinearity is not a problem if the
principal aim is prediction, given that the same
pattern of multicollinearity persists into the
forecast period.
Sources of Multicollinearity:
Improper use of dummy variables. (Later!)
Including (almost) the same variable twice.
4.3 Multicollinearity
Method of data collection used (e.g. sampling over
a limited range of X values).
Including a variable computed from other
variables in the model (e.g. using family income,
mother’s income & father’s income together).
Adding many polynomial terms to a model,
especially if the range of the X variable is small.
Or, it may just happen that variables are highly
correlated (chance!).
Detecting Multicollinearity:
The classic case of multicollinearity occurs when R2
is high (& significant), but none of the X's is
significant (some may even have wrong sign).
4.3 Multicollinearity

Detecting the presence of multicollinearity is more difficult in the less clear-cut cases.
Sometimes, simple or partial coefficients of
correlation among regressors are used.
However, serious multicollinearity may exist even if
these correlation coefficients are low.
A statistic commonly used for detecting multicollinearity is the VIF (Variance Inflation Factor).
From a SLR of Y on Xj, we have: var(β̂j) = σ²/Σx²ji.
From a MLR of Y on all the X's: var(β̂j) = σ²/[Σx²ji(1 − R²j)] = VIFj · σ²/Σx²ji, where R²j is the R² from regressing Xj on all other X's.
4.3 Multicollinearity

The difference between the variances of β̂j in the 2 cases arises from the correlation between Xj & the other X's, and is captured by: VIFj = 1/(1 − R²j).

If Xj is not correlated with the other X's, R²j = 0, VIFj = 1, and the two variances will be identical.
As R²j increases, VIFj rises.
If Xj is perfectly correlated with other X's, VIFj=∞.
Implication for precision (or CIs)???
So, large VIF is a sign of serious or “intolerable”
multicollinearity.
4.3 Multicollinearity

There is no cutoff point on VIF beyond which multicollinearity is taken as intolerable.
A rule of thumb: VIF ≥ 10 is a sign of severe
multicollinearity.
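A minimal Stata sketch (with hypothetical variables y, x1, x2, x3) for computing VIFs after an OLS regression:

reg y x1 x2 x3
* VIF for each regressor; values of 10 or more signal severe multicollinearity
estat vif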
Solutions to Multicollinearity:
Solutions depend on the sources of the problem.
The formula below is indicative of some solutions:
vâr(β̂j) = σ̂²/[Σx²ji(1 − R²j)] = Σe²i / [(n − K − 1)Σx²ji(1 − R²j)]

More precision (or lower vâr(β̂j)) may result from:
a) smaller RSS – less noise, ceteris paribus (cp);
b) larger sample size (n) relative to the number of β's (K+1), cp;
4.3 Multicollinearity
c) greater variation in values of each Xj, cp;
d) less correlation between regressors, cp.
Thus, serious multicollinearity may be solved by
using one/more of the following:
1. Increasing sample size (if possible). ???
2. Utilizing a priori information on parameters (from
theory or prior research).
3. Transforming variables or functional form:
a) Using ΔX instead of X (where the cause may be X's
moving in the same direction over time).
b) In polynomial regressions, using deviations from the mean (Xj − X̄j) instead of Xj tends to reduce collinearity.
c) Usually, logs are less collinear than levels.
4.3 Multicollinearity

4. Pooling cross-sectional and time-series data.
5. Dropping one of the collinear predictors. ???
But, this may lead to omitted-variable bias.
6. Being aware of its existence and interpreting the results cautiously.
4.4 Non-normality of the Error Term

Normality is not required to get BLUE of β's.


The CLRM merely requires errors to be IID.
Normality of errors is required only for valid
hypothesis testing, i.e., validity of t- and F-tests.
In small samples, if the errors are not normally distributed, the estimated coefficients will not follow a normal distribution, which complicates inference.
NB: No obligation on X's to be normally
distributed!
A formal test of normality is Shapiro-Wilk test [H0:
errors are normally distributed].
Large p-value shows that H0 cannot be rejected.
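A minimal Stata sketch (hypothetical variables y, x1, x2) for applying the Shapiro–Wilk test to the OLS residuals:

reg y x1 x2
predict uhat, residuals
* H0: residuals are normally distributed; a large p-value means H0 is not rejected
swilk uhat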
4.4 Non-normality of the Error Term

If H0 is rejected, transforming the regressand or re-specifying the functional form of the model may help.

With large samples (due to the CLT), hypothesis testing is (asymptotically) valid even if the distribution of errors deviates from normality.
4.5 Non-IID Errors

The assumption of IID errors is violated if (simple) random sampling cannot be assumed.
Specifically, the assumption of IID errors fails if:
1) errors are not identically distributed, i.e., if
var(εi) varies with observation, heteroscedasticity
2) errors are not independently distributed, i.e., if
εi's are correlated to each other, serial correlation
3)errors are both heteroscedastic & auto-correlated
Common in panel & time series data.
4.5.1 Heteroskedasticity

One assumption of the CLRM is homoskedasticity, i.e., var(εi|X) = var(εi) = σ².
This holds if the observations of the error term are
drawn from identical distributions.
Heteroskedasticity is present if var(εi) = σi2 ≠ σ2:
different variances for different segments of the
population (segments by the values of the X's).
e.g.: Variability of consumption rises with rise in
income, i.e., people with higher incomes display
greater variability in consumption.
Heteroskedasticity is more likely in cross-sectional
than time-series data.
One consequence of heteroskedasticity is that the default OLS standard errors are incorrect.
4.5.1 Heteroskedasticity

With a correctly specified model (in every other aspect), but heteroskedastic errors, OLS estimators are unbiased & consistent but inefficient.
Reason: the OLS estimator for σ² (and thus the estimated standard errors of the coefficients) is biased.
Hence, CIs based on biased standard errors will be
wrong, and the t & F tests will be invalid.
NB: Heteroskedasticity could be a symptom of other
problems (e.g. omitted variables).
If heteroskedasticity is a result of specification
error (say, omitted variables), OLS estimators will
be biased & inconsistent.
4.5.1 Heteroskedasticity

With heteroskedasticity, OLS is not optimal: it gives equal weight to all observations; actually, observations with larger error variances (σi²) contain less information than those with smaller σi².
To correct, give less weight to data points with
greater σi2 and more weight to those with smaller
σi2. [i.e., use GLS (WLS or FGLS)].
Detecting Heteroskedasticity:
A. Graphical Method
Run OLS & plot squared residuals vs. Ŷ or each X.
The graph may show some relationship (linear, quadratic, …), providing clues as to the nature of the problem and a possible remedy.
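A minimal Stata sketch (hypothetical variables y, x1, x2) for a quick graphical check:

reg y x1 x2
* residuals-versus-fitted plot: look for a fan or funnel pattern
rvfplot
* or plot squared residuals against the fitted values
predict uhat, residuals
predict yhat, xb
gen uhat2 = uhat^2
scatter uhat2 yhat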
4.5.1 Heteroskedasticity

e.g. suppose the plot of ũ² (from Y = α + βX + u) vs. X signifies that var(ui) increases in proportion to X², i.e., var(ui) = σi² = cXi². What is the solution?
Transform the model by dividing throughout by X.
Y/X = α(1/X) + β + u/X  ⇒  y* = αx* + β + u*
u* is homoskedastic: var(ui*) = var(ui/Xi) = (1/Xi²)var(ui) = (1/Xi²)·cXi² = c; i.e., WLS solves heteroskedasticity!
WLS yields BLUE for the transformed model.
If the pattern of heteroskedasticity is unknown, log
transformation of both sides may solve the problem
But, this cannot be used with 0 or negative values.
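A minimal Stata sketch of the WLS idea for the case var(ui) = cXi² (hypothetical variables y and x): weight each observation by the inverse of its error variance.

* analytic weights proportional to 1/variance, here 1/x^2
reg y x [aweight = 1/(x^2)]
* the coefficients match OLS on the transformed model y/x = a*(1/x) + b + u/x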
4.5.1 Heteroskedasticity

B. A Formal Test:
The most-often used test for heteroskedasticity is
the Breusch-Pagan (BP) test.
H0: homoskedasticity vs. Ha: heteroskedasticity
Regress ũ2 on Ŷ or ũ2 on the original X's, X2's and,
if enough data, cross-products of the X's.
H0 will be rejected for high values of the test
statistic [n*R2~χ2q] or for low p-values.
n & R² are obtained from the auxiliary regression of ũ² on the q predictors.
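A minimal Stata sketch (hypothetical variables y, x1, x2) of the Breusch–Pagan test after OLS:

reg y x1 x2
* Breusch-Pagan/Cook-Weisberg test; H0: constant variance (homoskedasticity)
estat hettest
* version using the regressors in the auxiliary regression
estat hettest x1 x2
* White's test (adds squares & cross-products of the X's)
estat imtest, white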
4.5.1 Heteroskedasticity

Solutions to (or Estimation with) Heteroskedasticity


If heteroskedasticity is detected, first check for
some other specification error (omitted variables,
wrong functional form, …).
If it persists even after correcting for other
specification errors, use one of the following:
1. Use a better method of estimation (WLS/FGLS);
2. Stick to OLS but use robust (heteroskedasticity
consistent) standard errors.
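A minimal Stata sketch of the second option (hypothetical variables y, x1, x2):

* OLS coefficients with heteroskedasticity-robust (White) standard errors
reg y x1 x2, vce(robust)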
4.5.2 Autocorrelation

Error terms are autocorrelated if errors from different (usually adjacent) time periods (or cross-sectional units) are correlated: E(εiεj) ≠ 0.
Autocorrelation in cross-sectional data is called
spatial autocorrelation (in space, not over time).
But, spatial autocorrelation is uncommon since
cross-sectional data do not usually have some
ordering logic, or economic interest.
Serial correlation occurs in time-series when errors
associated with a given time period carry over into
future time periods.
et are correlated with lagged values: et-1, et-2, …
4.5.2 Autocorrelation

Effects of autocorrelation are similar to those of heteroskedasticity: OLS coefficients are unbiased and consistent, but inefficient; the estimate of σ² is biased, and thus inferences are invalid.
Detecting Autocorrelation
Plotting OLS residuals against the time variable,
or a formal test could be used.
The Breusch-Godfrey Test
Commonly-used general test of autocorrelation.
Steps:
1. Regress OLS residuals on X's and lagged residuals:
et = f(X1t, ..., XKt, et-1, …, et-j)
4.5.2 Autocorrelation

2. Test the joint hypothesis that all the estimated coefficients on the lagged residuals are zero, using the test statistic j·Fcal ~ χ²(j).
3. Reject H0: no serial correlation for high values of the test statistic or for small p-values.
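A minimal Stata sketch (hypothetical time variable year and variables y, x1, x2) of the Breusch–Godfrey test:

tsset year
reg y x1 x2
* H0: no serial correlation up to the chosen lag order
estat bgodfrey, lags(2)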
Estimation in the Presence of Serial Correlation:
Solutions depend on the sources of the problem.
Autocorrelation may result from:
Model misspecification (omitted variables, wrong
functional form, …)
Misspecified dynamics (e.g. static model estimated
when dependence is dynamic), …
4.5.2 Autocorrelation

If autocorrelation is significant, check for model specification errors, & consider re-specification.
If the revised model passes other specification tests,
but still fails tests of autocorrelation, consider the
following key solutions:
1. Use FGLS,
2. Use OLS with robust standard errors.
THE NEWEY–WEST METHOD OF CORRECTING THE OLS STANDARD ERRORS
Instead of using the FGLS methods, we can still use OLS but correct the standard errors for autocorrelation by a procedure developed by Newey and West. This is an extension of White's heteroskedasticity-consistent standard errors.
The corrected standard errors are known as
HAC (heteroscedasticity- and
autocorrelation-consistent) standard errors
or simply as Newey–West standard errors.
We will not present the mathematics
behind the Newey–West procedure, for it is
involved.
But most modern computer packages now
calculate the Newey–West standard errors.
But it is important to point out that the Newey–
West procedure is strictly speaking valid in large
samples and may not be appropriate in small
samples.
But in large samples we now have a method that produces autocorrelation-corrected standard errors, so that we do not have to worry about the FGLS transformations.
Therefore, if a sample is reasonably large, one should use the Newey–West procedure to correct OLS standard errors not only in situations of autocorrelation but also in cases of heteroskedasticity, for the HAC method can handle both, unlike the White method, which was designed specifically for heteroskedasticity.
Once again, consider the wages–productivity regression (12.5.1), which we know suffers from autocorrelation. Our sample of 40 observations is reasonably large, so the HAC procedure can be used (e.g., in EViews) to obtain Newey–West corrected standard errors.
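A minimal Stata sketch of the same idea (hypothetical time variable year and variables y, x1, x2; the lag length is an illustrative choice):

tsset year
* OLS coefficients with Newey-West HAC standard errors (here up to 2 lags)
newey y x1 x2, lag(2)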
4.6 Endogenous Regressors: E(ɛi|Xj) ≠ 0

A key assumption maintained in the previous lessons is that the model, E(Y|X) = Xβ, or E(Y|X) = β0 + Σ(i=1 to K) βiXi, was correctly specified.

The model Y = Xβ + ε is correctly specified if:


1. ε is orthogonal to X's, enters the model with an
additively separable effect on Y & this effect equals
zero on average; and,
2. E(Y|X) is linear in stable parameters (β's).
If the assumption E(εi|Xj) = 0 is violated, the OLS
estimators will be biased & inconsistent.
Assuming exogenous regressors is unrealistic in
many situations.
4.6 Endogenous Regressors: E(ɛi|Xj) ≠ 0
The possible sources of endogeneity are:
1. stochastic regressors & measurement error;
2. specification errors: omission of relevant
variables or a wrong functional form;
3. nonlinearity in & instability of parameters; and
4. bidirectional link between the X's and Y.
Recall two versions of exogeneity assumption:
1. E(ɛi) = 0 and X’s are fixed (non-stochastic),
2. E(ɛiXj) = 0 or E(ɛi|Xj) = 0 with stochastic X’s.
The assumption E(εi) = 0 amounts to: “We do not
systematically over- or under-estimate the PRF,”
or the overall impact of all the excluded variables is
random/unpredictable.
4.6 Endogenous Regressors: E(ɛi|Xj) ≠ 0

This assumption cannot be tested, as the residuals will always have zero mean if the model has an intercept.
If there is no intercept, some information can be
obtained by plotting the residuals.
If E(ɛi) = μ (a constant but ≠ 0) & the X's are fixed, the estimators of all β's, except β0, will be OK!
But, can we assume non-stochastic regressors?
4.6.1 Stochastic Regressors and Measurement Error
A. Stochastic Regressors
Many economic variables are stochastic, and it is
only for ease that we assumed fixed X's.
For instance, the set of regressors may include:
a lagged dependent variable (Yt-1), or
an X characterized by a measurement error.
In both cases, it is unreasonable to assume fixed X's
If no other assumption is violated, OLS retains its
desirable properties even if X's are stochastic.
In general, stochastic regressors may or may not be
correlated with the model error term.
1. If X & ɛ are independently distributed, E(ɛ|X) = 0,
OLS retains all its desirable properties.
4.6.1 Stochastic Regressors and Measurement Error
2. If X & ɛ are not independent but are either
contemporaneously uncorrelated, [E(ɛi|Xi±s)≠0 for s =
1, 2,… but E(ɛi|Xi)=0], or ɛ & X are asymptotically
uncorrelated, OLS retains its large sample
properties: estimators are biased, but consistent and
asymptotically efficient.
The basis for valid statistical inference remains but
inferences must be based on large samples.
3. If X & ɛ are not independent & are correlated even
asymptotically, then OLS estimators are biased &
inconsistent.
SOLUTION: IV/2SLS REGRESSION!
It is not whether X's are stochastic or fixed that
matters, but the nature of correlation b/n X's & ɛ.
4.6.1 Stochastic Regressors and Measurement Error
B. Measurement Error
Measurement error in the regressand only does
not cause bias in OLS estimators as long as the
measurement error is not systematically related to
one or more of the regressors.
If the measurement error in Y is uncorrelated with
X's, OLS is perfectly applicable (though with less
precision or higher variances).
If there is a measurement error in a regressor & if
this error is correlated with the measured variable,
then OLS estimators will be biased & inconsistent.
SOLUTION: IV/2SLS REGRESSION!
4.6.2 Specification Errors

Model misspecification may result from:


omission of relevant variable/s,
using a wrong functional form, or
inclusion of irrelevant variable/s.
1. Omission of relevant variables: when one/more
relevant variables are omitted from a model.
Omitted-variable bias: bias in parameter estimates
when the assumed specification is incorrect in that
it omits a regressor that must be in the model.
e.g. estimating Y=β0+β1X1+β2X2+u when the correct
model is Y=β0+β1X1+β2X2+β3Z+u.
Wrongly omitting a variable (Z) is equivalent to
imposing β3 = 0 when in fact β3 ≠ 0.
4.6.2 Specification Errors

If a relevant regressor (Z) is missing from a model, the OLS estimators of the β's (β0, β1 & β2) will be biased, except if cov(Z,X1) = cov(Z,X2) = 0.
Even if cov(Z,X1) = cov(Z,X2) = 0, the estimator for
β0 is biased.
The OLS estimators for σ2 & for the standard
errors of the β̂ 's are also biased.
Consequently, t- and F-tests will not be valid.
Generally, OLS estimators will be biased,
inconsistent and the inferences will be invalid.
The decision to include/exclude variables should be
guided by economic theory and reasoning.
4.6.2 Specification Errors

2. Error in the algebraic form of the relationship: a model that includes all relevant regressors may still be misspecified due to an error in the functional form relating Y to the X's.
e.g. using a linear functional form when the true relationship is logarithmic (log-log) or semi-logarithmic (lin-log or log-lin).
The effects of functional form misspecification are
the same as those of omitting relevant variables.
Testing for OVs & Functional Form Misspecification
1. Examination of Residuals
Often, the plot of residuals vs fitted values is used to
have a quick glance at problems like nonlinearity.
4.6.2 Specification Errors

Ideally, we would like to see residuals rather randomly scattered around zero.
If there are such errors as OVs or incorrect
functional form, the plot exhibits distinct patterns.
2. Ramsey’s Regression Equation Specification Error
Test (RESET)
It tests for misspecification due to omitted variables
or a wrong functional form.
Steps:
1. Regress Y on X's, and get Ŷ & ũ.
2. Regress Y on the X's, Ŷ² & Ŷ³.
3. If Ŷ2 & Ŷ3 are significant (using F test), then reject
H0 & conclude that there is misspecification.
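A minimal Stata sketch (hypothetical variables y, x1, x2); Stata's built-in RESET adds powers of the fitted values:

reg y x1 x2
* Ramsey RESET; H0: no omitted variables / no wrong functional form
estat ovtest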
4.6.2 Specification Errors

If the model is misspecified, then try another model: look for some variables which are left out and/or try a different functional form like log-linear (but based on some theory).
The test (by rejecting the null) does not suggest an
alternative specification.
3. Inclusion of irrelevant variables: when one/more
irrelevant variables are wrongly included in the
model. e.g. estimating Y=β0+β1X1+β2X2+β3X3+u
when the correct model is Y=β0+β1X1+β2X2+u.
Consequence: OLS estimators remain unbiased &
consistent but inefficient.
4.6.2 Specification Errors

σ² is correctly estimated & conventional hypothesis-testing methods are still valid, but the estimated variances of the coefficients are larger.
As a result, our probability inferences about the
parameters are less precise, i.e., precision is lost if
the correct restriction β3 = 0 is not imposed.
To test for irrelevant variables, use F-tests (based
on RRSS & URSS).
Do not eliminate variables from a model based on
insignificance implied by t-tests.
In particular, do not drop a variable with |t| > 1.
Do not drop 2 or more variables at once (based on
t-tests) even if each has |t| < 1.
4.6.2 Specification Errors

The t-statistic corresponding to a regressor (Xj) may radically change once another (Xi) is dropped.
In general, model misspecification due to the
inclusion of irrelevant variables is less serious than
that due to omission of relevant variable/s.
4.6.3 Stability of Parameters and Dummy Variables Regression (DVR)

So far we assumed that the intercept and all the slope coefficients (βj's) are the same/stable for the whole set of observations: Y = Xβ + ε.
But, structural shifts and/or group differences are common in the real world. It may be that:
the intercept differs/changes, or
the (partial) slope differs/changes, or
both differ/change across categories or time period.
Two methods for testing parameter stability:
(i) Using Chow tests, or (ii) Using DVR.
A. The Chow Tests
Using an F-test to determine whether a single
regression is more efficient than two/more separate
regressions on sub-samples.
4.6.3 Stability of Parameters and Dummy Variables Regression (DVR)

The stages in running the Chow test are:


1. Run 2 separate regressions (say, before & after war
or policy reform, …) & save RSS's: RSS1 & RSS2.
RSS1 has n1–(K+1) df & RSS2 has n2–(K+1) df.
RSS1 + RSS2 = URSS with n1+n2–2(K+1) df.
2. Estimate pooled model (under H0: β's are stable).
RSS from this model is RRSS with n–(K+1) df
where n = n1+n2.
3. The test-statistic (under H0): Fcal = [(RRSS − URSS)/(K+1)] / [URSS/(n − 2(K+1))]
4. Find the critical value: FK+1,n-2(K+1) from table.
5. If Fcal>Ftab, reject H0 of stable parameters (and
favour Ha: there is structural break).
4.6.3 Stability of Parameters and Dummy Variables Regression (DVR)

e.g.: we have the following results from the estimation of real consumption on real disposable income:
i. For the period 1974-1991: consi = α1+β1*inci+ui
Consumption = 153.95 + 0.75*Income
p-value: (0.000) (0.000)
RSS = 4340.26114; R2 = 0.9982
ii. For the period 1992-2006: consi = α2+ β2*inci+ui
Consumption = 1.95 + 0.806*Income
p-value: (0.975) (0.000)
RSS = 10706.2127; R2 = 0.9949
iii. For the period 1974-2006: consi = α+ β*inci+ui
Consumption = 77.64 + 0.79*Income
t-ratio: (4.96) (155.56)
RSS = 22064.6663; R2 = 0.9987
4.6.3 Stability of Parameters and Dummy Variables Regression (DVR)

1. URSS = RSS1 + RSS2 = 15046.474
2. RRSS = 22064.6663
K = 1 and K + 1 = 2; n1 = 18, n2 = 15, n = 33.
3. Thus, Fcal = [(22064.6663 − 15046.474)/2] / [15046.474/29] = 6.7632981
4. p-value = Prob(F(2, 29) > 6.7632981) = 0.003883
5. Reject H0 at α=1%. Thus, there is structural break.
The pooled consumption model is an inadequate
specification; we should run separate regressions.
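A minimal Stata sketch of this calculation (hypothetical variables cons, inc and a year variable), assuming the break point after 1991:

reg cons inc if year <= 1991
scalar rss1 = e(rss)
reg cons inc if year >= 1992
scalar rss2 = e(rss)
reg cons inc
scalar rrss = e(rss)
scalar urss = rss1 + rss2
* F = [(RRSS-URSS)/(K+1)] / [URSS/(n-2(K+1))]; here K+1 = 2 and n-2(K+1) = e(N)-4
scalar fchow = ((rrss - urss)/2) / (urss/(e(N) - 4))
display fchow
* p-value from the F(2, n-4) distribution
display Ftail(2, e(N) - 4, fchow)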
The above method of calculating the Chow test
breaks down if either n1 < K+1 or n2 < K+1.
Solution: use Chow’s second (predictive) test!
4.6.3 Stability of Parameters and Dummy Variables Regression (DVR)

If, for instance, n2 < K+1, then the F-statistic will be altered as follows:
Fcal = [(RRSS − RSS1)/n2] / [RSS1/(n1 − (K+1))]
The Chow test tells if the parameters differ on
average, but not which parameters differ.
Also, it requires that all groups have the same σ2.
This assumption is questionable: if parameters can
be different, then so can the variances be.
One way of correcting for unequal σ2 is to use
dummy variable regression with robust standard
errors.
4.6.3 Stability of Parameters and Dummy Variables Regression (DVR)

B. The Dummy Variables Regression
I. Introduction:
Not all information can easily be quantified.
So, need to incorporate qualitative information.
e.g. 1. Effect of belonging to a certain group:
Gender, location, marital status, occupation
Beneficiary of a program/policy
2. Ordinal variables:
Answers to yes/no (or scaled) questions...
Effect of some quantitative variable may differ
between groups/categories:
Returns to education may differ between sexes
or between ethnic groups …
4.6.3 Stability of Parameters and Dummy Variables Regression (DVR)

Interest in determinants of belonging to a group


Determinants of being poor …
Dummy dependent variable (logit, probit…)
Dummy Variable: a variable devised to use
qualitative information in regression analysis.
A dummy variable takes 2 values: usually 0/1.
e.g. Yi = β0 + β1D + ui, where D = 1 for all i ∈ group 1, and D = 0 for all i ∉ group 1.
If D = 0, E(Y) = E(Y|D = 0) = β0
If D = 1, E(Y) = E(Y|D = 1) = β0 + β1
Thus, the difference between the two groups (in
mean values of Y) is: E(Y|D=1) – E(Y|D=0) = β1.
The significance of this difference is tested by a t-
test of β1 = 0.
4.6.3 Stability of Parameters and Dummy Variables Regression (DVR)

e.g.: Wage differential between male and female


Two possible ways: a male or a female dummy.
1. Define a male dummy (male = 1 & female = 0).
reg wage male
Result: Yi = 9.45 + 172.84*D + ûi
p-value: (0.000) (0.000)
Interpretation: the monthly wage of a male worker is, on average, $172.84 higher than that of a female worker.
This difference is significant at 1% level.
2. Define a female dummy (female = 1 & male = 0)
reg wage female
Result: Yi = 182.29 – 172.84*D + ûi
p-value: (0.000) (0.000) Interpretation ??
4.6.3 Stability of Parameters and Dummy Variables Regression (DVR)

II. Using the DVR to Test for Structural Break:


Recall the example of consumption function:
period 1: consi = α1+ β1*inci+ui vs.
period 2: consi = α2+ β2*inci+ui
Let’s define a dummy variable D1, where D1 = 1 for the period 1974-1991, and D1 = 0 for the period 1992-2006.
Then, consi = α0 + α1D1 + β0·inci + β1(D1·inci) + ui
For period 1: consi = (α0 + α1) + (β0 + β1)inci + ui
For period 2 (base category): consi = α0 + β0·inci + ui
Regressing cons on inc, D1 and (D1*inc) gives:
cons = 1.95 + 152D1 + 0.806*inc – 0.056(D1*inc)
p-value: (0.968) (0.010) (0.000) (0.002)
4.6.3 Stability of Parameters and Dummy Variables Regression (DVR)

D1 = 1 for i ∈ period 1 & D1 = 0 for i ∈ period 2:
period 1 (1974-1991): cons = 153.95 + 0.75*inc
period 2 (1992-2006): cons = 1.95 + 0.806*inc
The Chow test is equivalent to testing α1=β1=0 in:
cons=1.95+152D1+0.806*inc – 0.056(D1*inc)
This gives: F(2, 29) = 6.76; p-value = 0.0039.
Then, reject H0! There is a structural break!
Comparing the two methods, it is preferable to use
the method of dummy variables regression.
This is because with the method of DVR:
1. we run only one regression.
2. we can test whether the change is in the intercept
only, in the slope only, or in both.
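A minimal Stata sketch of the dummy-variable version (same hypothetical cons, inc and year variables):

gen d1 = (year <= 1991)
gen d1inc = d1*inc
reg cons inc d1 d1inc
* joint F-test of a stable intercept and slope (alpha1 = beta1 = 0)
test d1 d1inc
* for unequal error variances across the two periods, add vce(robust)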
4.6.3 Stability of Parameters and Dummy Variables Regression (DVR)

For a total of m categories, use m–1 dummies!


Including m dummies (1 for each group) results in
perfect multicollinearity (dummy variable trap).
e.g.: 2 groups & 2 dummies: constant = D1 + D2 !!!
X = [constant  X1  D1  D2], e.g.:
1  X11  1  0
1  X12  1  0
1  X13  0  1
(the constant column equals D1 + D2, so the columns are perfectly collinear)
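In Stata, factor-variable notation avoids the dummy variable trap automatically by creating m−1 dummies (hypothetical variables y, x and a categorical variable group):

* i.group expands into m-1 dummy variables, omitting one base category
reg y x i.group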
4.6.4 Simultaneity Bias
Simultaneity occurs when an equation is part of a
simultaneous equations system, such that causation
runs from Y to X as well as from X to Y.
In such a case, cov(X, ε) ≠ 0 and OLS estimators
are biased and inconsistent.
e.g. The Simple Keynesian Consumption Function
Structural form model: consists of the national income accounting identity & a basic consumption function:
Yt = Ct + It
Ct = α + βYt + Ut
Yt & Ct are endogenous (simultaneously determined)
and It is exogenous.
4.6.4 Simultaneity Bias
Reduced form: expresses each endogenous variable
as a function of exogenous variables (and/or
predetermined variables – lagged values of
endogenous variables, if present) & random term/s.
The reduced form is:
Yt = [1/(1−β)]·[α + It + Ut]
Ct = [1/(1−β)]·[α + βIt + Ut]
The reduced form equation for Yt shows that:
cov(Yt, Ut) = cov{[1/(1−β)]·(α + It + Ut), Ut}
= [1/(1−β)]·[cov(α, Ut) + cov(It, Ut) + cov(Ut, Ut)] = [1/(1−β)]·var(Ut) = σ²U/(1−β) ≠ 0
4.6.4 Simultaneity Bias

In Ct=α+βYt+Ut, Yt is correlated with Ut.


OLS estimators for β (=MPC) & α (=autonomous
consumption) are biased and inconsistent.
Solution: IV/2SLS (using the exogenous It as an instrument for Yt).
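A minimal Stata sketch (hypothetical variables cons, income, invest), treating investment as the instrument for income in the consumption function:

* 2SLS: income is endogenous, instrumented by investment
ivregress 2sls cons (income = invest)
* first-stage diagnostics (instrument relevance)
estat firststage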
… THE END …
GOOD LUCK!
Time series regression model
Panel Data Regression model