Lecture 3

The document discusses model misspecification and its implications in regression analysis, particularly focusing on heteroskedasticity, serial correlation, and multicollinearity. It explains the consequences of these violations on statistical inference and the reliability of regression coefficients, emphasizing the importance of correcting for these issues through robust standard errors and other methods. Additionally, it outlines testing methods for detecting these violations and suggests potential solutions for multicollinearity.


Model Misspecification
• Model specification refers to the set of variables included in the regression and
the regression equation’s functional form.
• When estimating a regression, we assume it has the correct functional form, an
assumption that can fail in different ways, as shown in Exhibit 2.
Violations of Regression Assumptions: Heteroskedasticity
• An important assumption underlying linear regression is that the variance of
errors is constant across observations (errors are homoskedastic).
• Residuals in financial model estimations, however, are often heteroskedastic,
meaning the variance of the residuals differs across observations.
• Heteroskedasticity may arise from model misspecification, including omitted
variables, incorrect functional form, and incorrect data transformations,
as well as from extreme values of independent variables.
Consequences of Heteroskedasticity

• There are two broad types of heteroskedasticity: unconditional and conditional.


• Unconditional heteroskedasticity occurs when the error variance is not
correlated with the regression’s independent variables.
• Although it violates a linear regression assumption, this form of
heteroskedasticity creates no major problems for statistical inference.
• Conditional heteroskedasticity occurs when the error variance is correlated with (conditional on) the values of the independent variables.
• This type of heteroskedasticity is more problematic and may lead to mistakes in statistical inference.
• When errors are conditionally heteroskedastic, the F-test for the overall regression significance is unreliable because the MSE becomes a biased estimator of the true population variance.
• Moreover, t-tests of individual regression coefficients are unreliable because
heteroskedasticity introduces bias into estimators of the standard error of
regression coefficients.
• Thus, in regressions with financial data, the most likely impacts of conditional
heteroskedasticity are that standard errors will be underestimated, so t-statistics
will be inflated.
• If there is conditional heteroskedasticity in the estimated model, we tend to find significant relationships where none actually exist and commit more Type I errors (rejecting the null hypothesis when it is actually true).
Testing for Conditional Heteroskedasticity
• The Breusch–Pagan (BP) test is widely used in financial analysis to diagnose
potential conditional heteroskedasticity and is best understood via the three-
step process shown in Exhibit 4

If conditional heteroskedasticity is present in the initial regression, the independent variables will explain a significant portion of the variation in the squared residuals in Step 2.
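A minimal sketch of the BP test is shown below; the simulated data, variable names, and the use of Python's statsmodels library are illustrative assumptions, not part of the lecture.

```python
# Illustrative sketch only: simulated data and statsmodels' het_breuschpagan.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=(n, 2))
# Error variance depends on the first regressor -> conditional heteroskedasticity.
errors = rng.normal(scale=np.exp(0.5 * x[:, 0]))
y = 1.0 + 0.8 * x[:, 0] - 0.3 * x[:, 1] + errors

X = sm.add_constant(x)
results = sm.OLS(y, X).fit()        # Step 1: estimate the initial regression

# Steps 2-3: regress the squared residuals on the independent variables and
# compare n*R^2 with a chi-square distribution (k degrees of freedom).
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, X)
print(f"BP LM statistic = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")
```

A small p-value for the LM statistic suggests rejecting the null of homoskedasticity.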
Correcting for Heteroskedasticity

• It is important to note that in efficient markets, heteroskedasticity should generally not be observed in financial data.
• However, if heteroskedasticity is detected, for example, in the form of
volatility clustering—where large (small) changes tend to be followed by large
(small) changes—then it presents an opportunity to forecast asset returns
that should be exploited to generate alpha.
• So, analysts should not only correct problems in their models due to
heteroskedasticity but also understand the underlying processes in their
data and capitalize on them.
• The easiest method to correct for the effects of conditional heteroskedasticity in
linear regression is to compute robust standard errors, which adjust the
standard errors of the regression’s estimated coefficients to account for the
heteroskedasticity.
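As a minimal sketch (assuming statsmodels and simulated heteroskedastic data like the example above, neither of which comes from the lecture), robust standard errors can be requested when fitting the regression; the coefficient estimates are unchanged, only the standard errors and t-statistics are adjusted.

```python
# Illustrative sketch only: robust (White-type) standard errors in statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))
errors = rng.normal(scale=np.exp(0.5 * x[:, 0]))    # conditionally heteroskedastic
y = 1.0 + 0.8 * x[:, 0] - 0.3 * x[:, 1] + errors
X = sm.add_constant(x)

plain_fit = sm.OLS(y, X).fit()                   # conventional standard errors
robust_fit = sm.OLS(y, X).fit(cov_type="HC1")    # heteroskedasticity-robust SEs

print(plain_fit.bse)     # standard errors, likely understated here
print(robust_fit.bse)    # robust standard errors
```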
VIOLATIONS OF REGRESSION ASSUMPTIONS: SERIAL
CORRELATION

• A common and serious problem in multiple linear regression is violation of the assumption that regression errors are uncorrelated across observations.
• When regression errors are correlated across observations, they are serially
correlated.
The Consequences of Serial Correlation
• The main problem caused by serial correlation in linear regression is an
incorrect estimate of the regression coefficients’ standard errors.
• If none of the regressors is a previous value—a lagged value—of the dependent
variable, then the estimated parameters themselves will be consistent and need
not be adjusted for the effects of serial correlation.
• But if one of the independent variables is a lagged value of the dependent
variable, serial correlation in the error term causes all parameter estimates to be
inconsistent—that is, invalid estimates of the true parameters.
• Positive serial correlation is present when a positive residual for one
observation increases the chance of a positive residual in a subsequent
observation, resulting in a stable pattern of residuals over time.

• Positive serial correlation also means a negative residual for one observation
increases the chance of a negative residual for another observation.

• Conversely, negative serial correlation has the opposite effect, so a positive residual for one observation increases the chance of a negative residual for another observation, and so on.

• We examine positive serial correlation because it is the most common type and assume first-order serial correlation, or correlation between adjacent observations. In a time series, this means the sign of the residual tends to persist from one period to the next (a short simulation after this list illustrates the pattern).
• Positive serial correlation does not affect the consistency of regression
coefficients, but it does affect statistical tests.

• First, the F-statistic may be inflated because the MSE will tend to
underestimate the population error variance.

• Second, positive serial correlation typically causes standard errors to be underestimated, so t-statistics are inflated, which (as with heteroskedasticity) leads to more Type I errors.
• Importantly, if a time series exhibits serial correlation, this means that there
is some degree of predictability to it.

• In the case of asset prices, if these prices were to exhibit a pattern, investors
would likely discern this pattern and exploit it to capture alpha, thereby
eliminating such a pattern.

• This idea follows directly from the efficient market hypothesis.


• Consequently, assuming market efficiency (even weak form), we should not observe serial correlation in financial market data.
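The short simulation below (illustrative only; the AR(1) coefficient of 0.8 and the variable names are assumptions, not from the lecture) shows positive first-order serial correlation, where the sign of one residual tends to carry over to the next period.

```python
# Illustrative sketch only: AR(1) errors with positive serial correlation.
import numpy as np

rng = np.random.default_rng(7)
n, rho = 200, 0.8
shocks = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + shocks[t]    # e_t = rho * e_(t-1) + shock_t

# With positive first-order serial correlation, adjacent residuals tend to
# share the same sign and the lag-1 autocorrelation is well above zero.
same_sign = np.mean(np.sign(e[1:]) == np.sign(e[:-1]))
lag1_corr = np.corrcoef(e[1:], e[:-1])[0, 1]
print(f"share of adjacent residuals with the same sign: {same_sign:.2f}")
print(f"lag-1 autocorrelation of the residuals: {lag1_corr:.2f}")
```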
Testing for Serial Correlation
• There are a variety of tests for serial correlation, but the most common are the Durbin–Watson (DW) test and the Breusch–Godfrey (BG) test.

• The DW test is a measure of autocorrelation and compares the squared differences of successive residuals with the sum of the squared residuals.

• However, the DW test is limited because it applies only to testing for first-order serial correlation.

• The BG test is more robust because it can detect autocorrelation up to a pre-designated order p, where the error in period t is correlated with the error in period t – p.
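A minimal sketch of both tests follows (assuming statsmodels and a simulated regression with AR(1) errors; none of these choices comes from the lecture).

```python
# Illustrative sketch only: Durbin-Watson and Breusch-Godfrey tests.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(1)
n, rho = 300, 0.7
x = rng.normal(size=n)
shocks = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + shocks[t]        # serially correlated errors
y = 2.0 + 0.5 * x + e

X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

# DW = sum((e_t - e_(t-1))^2) / sum(e_t^2); values near 2 suggest no
# first-order serial correlation, values well below 2 suggest positive.
print(f"Durbin-Watson statistic: {durbin_watson(results.resid):.2f}")

# BG test for autocorrelation up to a pre-designated order p (here p = 4).
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(results, nlags=4)
print(f"BG LM statistic = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")
```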
Correcting for Serial Correlation
• The most common “fix” for a regression with significant serial correlation is to
adjust the coefficient standard errors to account for the serial correlation.

• Methods for adjusting standard errors are standard in many software packages.
The corrections are known by various names, including serial-correlation
consistent standard errors, serial correlation and heteroskedasticity adjusted
standard errors, Newey–West standard errors, and robust standard errors.

• An advantage of these methods is that they also correct for conditional heteroskedasticity.

• The robust standard errors, for example, use heteroskedasticity- and autocorrelation-consistent (HAC) estimators of the variance–covariance matrix in the regression estimation.
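A minimal sketch of Newey–West (HAC) standard errors is given below; statsmodels and the lag choice of 4 are illustrative assumptions, and many other software packages offer equivalent corrections.

```python
# Illustrative sketch only: Newey-West (HAC) standard errors in statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, rho = 300, 0.7
x = rng.normal(size=n)
shocks = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + shocks[t]        # serially correlated errors
y = 2.0 + 0.5 * x + e
X = sm.add_constant(x)

plain_fit = sm.OLS(y, X).fit()
# HAC covariance adjusts the standard errors for serial correlation and
# conditional heteroskedasticity; coefficient estimates are unchanged.
hac_fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})

print(plain_fit.bse)    # conventional standard errors (often too small here)
print(hac_fit.bse)      # Newey-West (HAC) standard errors
```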
VIOLATIONS OF REGRESSION ASSUMPTIONS:
MULTICOLLINEARITY
• An assumption of multiple linear regression is that there is no exact linear
relationship between two or more independent variables.

• However, multicollinearity may occur when two or more independent variables are highly correlated or when there is an approximate linear relationship among independent variables.

• With multicollinearity, the regression can be estimated, but interpretation of the role and significance of the independent variables is problematic.

• Multicollinearity is a serious concern because approximate linear relationships among economic and financial variables are common.
Consequences of Multicollinearity

• Multicollinearity does not affect the consistency of regression coefficient estimates, but it makes these estimates imprecise and unreliable.

• Moreover, it becomes impossible to distinguish the individual impacts of the independent variables on the dependent variable.

• These consequences are reflected in inflated standard errors and diminished t-statistics, so t-tests of coefficients have little power (ability to reject the null hypothesis).
Detecting Multicollinearity
• Except in the case of exactly two independent variables, using the magnitude
of pairwise correlations among the independent variables to assess
multicollinearity is generally inadequate.

• With more than two independent variables, high pairwise correlations are
not a necessary condition for multicollinearity.

• For example, despite low pairwise correlations, there may be approximate linear combinations among several independent variables (combinations that are not directly observable) that are themselves highly correlated.

• The classic symptom of multicollinearity is a high R² and significant F-statistic but t-statistics for the individual estimated slope coefficients that are not significant due to inflated standard errors.
• Fortunately, we can use the variance inflation factor (VIF) to quantify
multicollinearity issues.

• In a multiple regression, a VIF exists for each independent variable.

• Suppose we have k independent variables, X1, ..., Xk.
• By regressing one independent variable (Xj) on the remaining k – 1 independent variables, we obtain Rj² for the regression—the variation in Xj explained by the other k – 1 independent variables—from which the VIF for Xj is VIFj = 1/(1 – Rj²).
• For a given independent variable, Xj, the minimum VIFj is 1, which occurs when Rj² is 0, that is, when there is no correlation between Xj and the remaining independent variables.

• VIF increases as the correlation increases; the higher the VIF, the more likely a
given independent variable can be accurately predicted from the remaining
independent variables, making it increasingly redundant.

• The following are useful rules of thumb:

• VIFj > 5 warrants further investigation of the given independent variable.

• VIFj > 10 indicates serious multicollinearity requiring correction.
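A minimal sketch of the VIF calculation and the rules of thumb above is shown below; the simulated collinear regressors and the use of statsmodels' variance_inflation_factor are illustrative assumptions, not part of the lecture.

```python
# Illustrative sketch only: VIF_j = 1 / (1 - R_j^2) for each regressor.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
n = 400
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.10 * rng.normal(size=n)   # x2 is nearly a linear function of x1
x3 = rng.normal(size=n)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# Column 0 is the intercept, so VIFs are computed for columns 1..3.
for j, name in enumerate(["x1", "x2", "x3"], start=1):
    vif = variance_inflation_factor(X, j)
    flag = "serious" if vif > 10 else ("investigate" if vif > 5 else "ok")
    print(f"VIF({name}) = {vif:6.2f} -> {flag}")
```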


Correcting for Multicollinearity

• Possible solutions to multicollinearity include

• excluding one or more of the regression variables,

• using a different proxy for one of the variables,

• increasing the sample size.
