Heteroscedasticity: Testing and Correcting in SPSS
1) Introduction
Recall that for estimation of coefficients and for regression inference to be correct we have to
assume that:
1. Equation is correctly specified
2. Error Term has zero mean
3. Error Term has constant variance
4. Error Term is not autocorrelated
5. Explanatory variables are fixed
6. No linear relationship between RHS variables
When assumption 3 holds, the errors ui in the regression equation have common variance, and then we have what is called homoscedasticity, or a scalar error covariance matrix (assuming also that there is no autocorrelation), where scalar is another word for constant. When assumption 3 breaks down, we have the opposite of homoscedasticity: heteroscedasticity, or a non-scalar error covariance matrix.
[Figure: conditional distributions of y for given values of x.]
The variability of b across samples is measured by the standard error of b, which is an estimate of
the variation of b across regressions run on repeated samples. Although we don't know SE(b) for
sure (unless we run all possible repeated samples), we can estimate it from within the current
sample because the variability of the slope parameter estimate will be linked to the variability of
the y-values about the hypothesised line of best fit within the current sample. In particular, it is
likely that the greater the variability of y for each given value of x, the greater the variability of
estimates of a and b in repeated samples and so we can work backwards from the variability of y
for a given value of x in our sample to provide an estimate of the sampling variability of b.
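In symbols (a standard result, stated here for reference): for the bivariate model yi = a + b xi + ui, the within-sample estimate is

SE(b) = s / sqrt( Σ(xi − x̄)² ),   where s² = Σ ei² / (n − 2)

so the more the y-values vary about the fitted line (the larger s²), the larger the estimated sampling variability of b.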
We can apply a similar logic to the variability of the residuals across samples. Recall that the value of the residual for each observation i is the vertical distance between the observed value of the dependent variable and the predicted value of the dependent variable (i.e. the difference between the observed value of the dependent variable and the line-of-best-fit value). Assume in the following figure that this is a plot from a single sample, this time with multiple observations of y for each given value of x. As with SE(b), we can estimate how the residual varies between samples by looking at how it varies within the current sample.
[Figure: scatter of Purchase price for a single sample; each residual is the vertical distance, +ive or −ive, between an observation and the fitted line.]
2) Causes
What might cause the variance of the residuals to change over the course of the sample? The error term may be correlated with either the dependent variable and/or the explanatory variables in the model. The main causes are as follows.
[Figure: Unstandardized Residual plotted against Number of rooms (0 to 14); the spread of the residuals increases with the number of rooms.]
a) Non-constant coefficient
Suppose that the slope coefficient varies across observations i:
yi = a + bi xi + ui
and suppose that it varies randomly around some fixed value b*:
bi = b* + ei
then the regression actually estimated by SPSS will be:
yi = a + b* xi + vi,   where vi = ei xi + ui
so the variance of the error term vi depends on xi (it rises with xi²) and is therefore non-constant.
b) Omitted variables
Suppose the true model of y is:
yi = a + b xi + c zi + ui
but the model we estimate fails to include z:
yi = a + b xi + vi
then the error term in the model estimated by SPSS (vi) will be capturing the effect of the omitted
variable, and so it will be correlated with z:
vi = c zi + ui
and so the variance of vi will be non-scalar.
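To see why, note that for given xi the error is vi = c zi + ui, so that

Var(vi | xi) = c² Var(zi | xi) + σu²

which will change across observations whenever the spread of the omitted variable z is related to x (a plausible situation in most applications).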
c) Non-linearities
If the true relationship is non-linear:
yi = a + b xi² + ui
but the regression we attempt to estimate is linear:
yi = a + b xi + vi
then the residual in this estimated regression will capture the non-linearity and its variance will be
affected accordingly:
vi = f(xi², ui)
d) Aggregation
Sometimes we aggregate our data across groups. For example, we might use quarterly time series
data on income which is calculated as the average income of a group of households in a given
quarter. If this is so, and the size of the groups used to calculate the averages varies, then the variation of the mean will not be constant (larger groups will have a smaller standard error of the mean, since the variance of a group mean is the underlying variance divided by the group size). This means that the measurement errors of each value of our variable will be correlated with the sample size of the groups used.
Since measurement errors will be captured by the regression residual, the implication is that the regression residual will vary with the sample size of the underlying groups on which the data is based.
3) Consequences
Heteroscedasticity by itself does not cause OLS estimators to be biased or inconsistent (for the difference between these two concepts see the graphs below), since neither property is determined by the covariance matrix of the error term. However, if heteroscedasticity is a symptom of omitted variables, measurement errors, or non-constant parameters, then OLS estimators will be biased and inconsistent. Note that in such cases heteroscedasticity does not cause the bias: it is merely one of the side effects of a failure of one of the other assumptions that also causes bias and inconsistency.
[Figures: simulated sampling distributions of the estimated coefficient (b-hat) for sample sizes n = 150, 200, 300 and 500, illustrating the difference between bias and consistency: as n grows, a consistent estimator's distribution concentrates around the true parameter value.]
4) Detection
Testing for heteroscedasticity is closely related to testing for misspecification generally, and many of the tests for heteroscedasticity end up being general misspecification tests. Unfortunately, there is no straightforward way to identify the cause of heteroscedasticity. A useful first step is visual inspection:
Histogram of the standardised residuals (to check that the residuals are approximately normal).
Normal probability plot of the residuals (the points should lie close to the diagonal if the residuals are normal).
Scatter plot of the standardised residuals on the standardised predicted values (ZRESID as the Y variable, and ZPRED as the X variable). This plot will allow you to detect outliers and non-linearities, since well-behaved residuals will be spherical, i.e. scattered randomly in an approximately circular pattern. If the plot fans out (or fans in) in a funnel shape, this is a sign of heteroscedasticity. If the residuals follow a curved pattern, then this is a sign that non-linearities have not been accounted for in the model.
These can all be included as part of the regression output by clicking on Plots in the Linear Regression window, checking the Histogram and Normal Probability Plot boxes, and selecting the ZRESID on ZPRED scatter plot. Alternatively, you can add the following two subcommands to your regression syntax:
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS HIST(ZRESID) NORM(ZRESID) .
For example:
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT purchase
/METHOD=ENTER floorare
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS HIST(ZRESID) NORM(ZRESID) .
[Output: histogram of the regression standardised residuals (Dependent Variable: Purchase Price; N = 556), normal P-P plot of the standardised residuals (Expected Cum Prob against Observed Cum Prob), and scatterplot of the standardised residuals against the standardised predicted values.]
The residuals are pretty much normally distributed, but there is evidence of heteroscedasticity since the residual plot fans out. If we re-run the regression using the log of purchase price as the dependent variable, we find that the residuals become spherical again (one should check whether taking logs has a detrimental effect on other diagnostics such as the adjusted R² and t-values; in this case the impact is negligible):
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT price_l
/METHOD=ENTER floorare
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS HIST(ZRESID) NORM(ZRESID) .
[Output: scatterplot of the regression standardised residuals against the standardised predicted values for the PRICE_L regression; the residuals are now scattered randomly and the funnel pattern has gone.]
Use Levene's test to test for heteroscedasticity caused by age of dwelling in a regression of floor area on age of dwelling, bedrooms and bathrooms. Also test for heteroscedasticity caused by floor area (e.g. variance of the residuals increases with floor area).
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT floorare
/METHOD=ENTER age_dwel bedrooms bathroom
/save resid(res_1).
T-TEST
GROUPS=age_dwel(62.5)
/MISSING=ANALYSIS
/VARIABLES=res_1
/CRITERIA=CIN(.95) .
Independent Samples Test (extract)
                                         Levene's Test for Equality of Variances
                                         F          Sig.
Unstandardized Residual
   Equal variances assumed               .110       .740
   Equal variances not assumed
H0: equal variances for age of dwelling < 62.5 and age of dwelling > 62.5. Since the significance level is so high, we cannot reject the null of equal variances. In other words, Levene's test is telling us that the variance of the residual term does not vary by age of dwelling. This seems surprising given the residual plots we did earlier, but the standard deviations of the residual across the two groups reported in the Group Statistics table seem to confirm this (i.e. the standard deviations are very similar).
However, it may be that it is only at the extremes of age that the heteroscedasticity occurs. We should try running Levene's test on the first and last quartiles (i.e. group age of dwelling as below the 25th percentile and above the 75th percentile). You can find the percentiles by going to Analyze, Custom Tables, Basic Tables, entering Age of dwelling into the Summary box, clicking Statistics and selecting the relevant percentiles from the list available. This gives you the following syntax and output:
* Basic Tables.
TABLES
/FORMAT BLANK MISSING('.')
/OBSERVATION age_dwel
/TABLES age_dwel
BY (STATISTICS)
/STATISTICS
mean( )
ptile 25( 'Percentile 25')
ptile 75( 'Percentile 75')
median( ).
Now run Levene's test again, but this time screen out the middle two quartiles from the sample by placing the TEMPORARY. and SELECT IF age_dwel le 21 or age_dwel ge 99. commands before the T-TEST syntax (le means less than or equal to, and ge means greater than or equal to). Note that you must run the TEMPORARY., SELECT IF and T-TEST syntax all in one go (i.e. block off all seven lines and run):
TEMPORARY.
SELECT IF age_dwel le 21 or age_dwel ge 99.
T-TEST
GROUPS=age_dwel(62.5)
/MISSING=ANALYSIS
/VARIABLES=res_1
/CRITERIA=CIN(.95) .
Independent Samples Test (extract)
                                         Levene's Test for Equality of Variances
                                         F          Sig.
Unstandardized Residual
   Equal variances assumed               .789       .375
   Equal variances not assumed
Again the significance level is well above 0.05, so even in the extreme quartiles we cannot reject the null of equal variances.
Given that the G-Q test is very similar to Levene's test considered above, we shall not spend any time on it here.
a) Breusch-Pagan Test
Step 0: Test for non-normality in the errors. If they are normal, proceed. If not, see the Koenker (1981) version below.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT floorare
/METHOD=ENTER age_dwel bedrooms bathroom
/RESIDUALS HIST(ZRESID) NORM(ZRESID)
/save resid(res_4).
DESCRIPTIVES
VARIABLES=res_4
/STATISTICS=MEAN KURTOSIS SKEWNESS .
[Output: histogram of the regression standardised residuals (Dependent Variable: Floor Area (sq meters); Std. Dev = 1.00, Mean = 0.00, N = 556) and normal P-P plot of the residuals.]
The histogram and normal probability plot suggest that the errors are fairly normal. The positive value of the skewness statistic suggests that the distribution is skewed to the right (long right tail), and since this is more than twice its standard error, this suggests a degree of non-normality. The positive kurtosis suggests that the distribution is more clustered than the normal distribution. I would say this was a borderline case, so I shall present both the B-P statistic and the Koenker version. It is worth noting that the Koenker version is probably more reliable anyway, so there is a case for dropping the B-P version entirely (the only reason to continue with it is that more people are familiar with it).
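The next step is to square the saved residuals; the DESCRIPTIVES syntax below refers to the squared residual as res_4sq, so the omitted COMPUTE step would be something like:

COMPUTE res_4sq = res_4 ** 2.
EXECUTE.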
DESCRIPTIVES
VARIABLES=res_4sq
/STATISTICS= sum .
Descriptive Statistics
N Sum
RES_4SQ 556 322168.4
Valid N (listwise) 556
Note that the sum of squared residuals = RSS = the figure reported in the ANOVA table, so you might want to check it against your ANOVA table to make sure you've calculated the squared residuals correctly.
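The B-P auxiliary regression uses as its dependent variable the squared residuals scaled by the maximum-likelihood estimate of the error variance, RSS/n; the syntax below calls this variable g, so the omitted step would be along these lines (using the RSS and n just reported):

* Scale the squared residuals by RSS/n = 322168.4 / 556.
COMPUTE g = res_4sq / (322168.4 / 556).
EXECUTE.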
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT g
/METHOD=ENTER age_dwel bedrooms bathroom
agedw_sq agedw_cu agedw_4
bedrm_sq bedrm_cu bedrm_4
bath_sq bath_cu bath_4.
The ANOVA table from this regression will give you the explained (or regression) sum of squares, REGSS = 218.293:
ANOVA
Model              Sum of Squares     df
1   Regression          218.293        9
    Residual           1892.280      546
    Total              2110.573      555
a. Predictors: (Constant), BATH_4, AGEDW_SQ, BATHROOM, AGE_DWEL, BEDRM_SQ, AGE... (list truncated in the output)
b. Dependent Variable: G
The B-P test statistic is REGSS/2 = 218.293/2 = 109.1465, distributed under the null as chi-square with df equal to the number of regressors retained in the auxiliary regression (here df = 9, as shown in the ANOVA table). You could use chi-square tables, which will give you the critical chi-square value for a particular significance level and df: for df = 9 and a sig. level of 0.05, the critical value is 16.92. Since our test statistic value of 109.1465 is way beyond this, we can confidently reject the null of homoscedasticity (i.e. we have a problem with heteroscedasticity).
b) White Test
The most general test of heteroscedasticity: no specification of the form of the heteroscedasticity is required.
Procedure for White's test:
Step 1: run an OLS regression and use it to calculate uhat² (i.e. the square of the residual).
Step 2: use uhat² as the dependent variable in another regression, in which the regressors are: (a) all k original independent variables, and (b) the square of each independent variable (excluding dummy variables), and all 2-way interactions (or cross products) between the independent variables.
Step 3: compute the test statistic n·R², where R² is taken from this auxiliary regression.
The statistic is asymptotically (i.e. in large samples) distributed as chi-squared with P degrees of freedom, where P is the number of regressors in the auxiliary regression, not including the constant.
Problems: the test consumes many degrees of freedom when the model contains more than a few regressors, and rejection gives no guidance on the form the heteroscedasticity takes.
Example:
Run a regression of the log of floor area on terrace semidet garage1 age_dwel bathroom bedrooms and use the White test to investigate the existence of heteroscedasticity.
One could calculate this test manually. The only problem is that it can be quite time-consuming constructing all the cross products.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT flarea_l
/METHOD=ENTER terrace semidet garage1 age_dwel bathroom
bedrooms
/SAVE RESID(RES_1) .
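Steps 2 to 4 (not reproduced above) compute the squared residual esq and the cross products cp1 to cp3 that appear in the auxiliary regression below; a minimal sketch, assuming cp1 to cp3 are the two-way cross products of the three continuous regressors (the dummies and squared terms having been left out of this simplified version of the test):

* Squared residual from the first regression.
COMPUTE esq = res_1 ** 2.
* Two-way cross products of the continuous regressors (assumed definitions).
COMPUTE cp1 = age_dwel * bathroom.
COMPUTE cp2 = age_dwel * bedrooms.
COMPUTE cp3 = bathroom * bedrooms.
EXECUTE.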
* 5th step: run a regression of the squared residual on the original explanatory variables plus all cross products.
* Note that SPSS will automatically drop variables that are perfectly correlated with variables already in the regression.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/NOORIGIN
/DEPENDENT esq
/METHOD=ENTER age_dwel bathroom bedrooms cp1 cp2 cp3
/ SAVE RESID(RES_2) .
* 6th step: calculate the test statistic as n·R² ~ chi-square with degrees of
freedom equal to P = the total number of regressors actually entered in this last
regression (i.e. not screened out because of perfect collinearity), not including the
constant term. You can do this by hand, or run the following syntax, which will also
calculate the significance level of the chi-square test statistic (the only thing you
will need to do is enter the value for P at the top of the MATRIX syntax).
MATRIX.
COMPUTE P = 6.
GET ESQ / VARIABLES = ESQ.
GET RES_2 / VARIABLES = RES_2.
COMPUTE RES2_SQ = RES_2 &**2.
COMPUTE N = NROW(ESQ).
COMPUTE RSS = MSUM(RES2_SQ).
COMPUTE ii_1 = MAKE(N, N, 1).
COMPUTE I = IDENT(N).
COMPUTE M0 = I - ((1/N) * ii_1).
COMPUTE TSS = TRANSPOS(ESQ)*M0*ESQ .
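The MATRIX block is cut off above; the remaining lines compute R², the White statistic n·R², and its significance level (CHICDF is the MATRIX chi-square distribution function). A sketch consistent with the output that follows:

COMPUTE R_SQ = 1 - (RSS / TSS).
COMPUTE WHITE = N * R_SQ.
PRINT RSS.
PRINT TSS.
PRINT R_SQ.
PRINT N.
PRINT WHITE
/ TITLE = "White's General Test for Heterosced (CHI-SQUARE df = P)".
COMPUTE SIG = 1 - CHICDF(WHITE, P).
PRINT SIG
/ TITLE = "SIGNIFICANCE LEVEL OF CHI-SQUARE df = P (H0 = homoscedasticity)".
END MATRIX.

Running the completed block produces: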
RSS 2.385128E+00
TSS 2.487222E+00
R_SQ 4.104736E-02
N 5.560000E+02
White's General Test for Heterosced (CHI-SQUARE df = P)
2.282233E+01
SIGNIFICANCE LEVEL OF CHI-SQUARE df = P (H0 = homoscedasticity)
8.582205E-04
So we reject the null (i.e. we have a problem with heteroscedasticity).
5) Solutions
a) Weighted Least Squares
If the differences in variability of the error term can be predicted from another variable within the
model, the Weight Estimation procedure (available in SPSS) can be used. The procedure
computes the coefficients of a linear regression model using weighted least squares (WLS), such
that the more precise observations (that is, those with less variability) are given greater weight in
determining the regression coefficients. The Weight Estimation procedure tests a range of weight
transformations and indicates which will give the best fit to the data.
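If you prefer to impose the weights yourself, WLS can also be run directly through the REGRESSION procedure's /REGWGT subcommand, which weights each case by the named variable (the weight should be the reciprocal of the error variance). A minimal sketch, assuming purely for illustration that the error variance is proportional to floor area:

* Weight = reciprocal of the assumed error variance.
COMPUTE wt = 1 / floorare.
EXECUTE.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/DEPENDENT purchase
/METHOD=ENTER floorare
/REGWGT=wt .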
i) HC0: Matrix Procedure for White's Standard Errors in SPSS when the sample is > 500:
* 1st step: open up your data file and save it under a new name, since the
following procedure will alter it.
* 2nd step: run your OLS regression and save the UNSTANDARDISED residuals
as RES_1:
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT mp_pc
/METHOD=ENTER xp_pc gdp_pc
/SAVE RESID(RES_1) .
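The 3rd step, omitted above, computes the squared residuals, which the later syntax refers to as ESQ; a sketch:

COMPUTE ESQ = RES_1 ** 2.
EXECUTE.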
* 4th step: create a variable called CONSTANT = constant of value 1 for all
observations in the sample.
FILTER OFF.
USE ALL.
EXECUTE .
COMPUTE CONSTANT = 1.
EXECUTE.
* 5th step: Filter out missing values and Enter Matrix syntax mode .
FILTER OFF.
USE ALL.
SELECT IF(MISSING(ESQ) = 0).
EXECUTE .
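The MATRIX block itself is not reproduced above; a sketch of the HC0 computation, following the same pattern as the HC2/HC3 syntax presented later (the variable names flarea_l, age_dwel, bathroom and bedrooms are taken from that syntax; substitute your own):

MATRIX.
GET Y / VARIABLES = flarea_l.
GET X / VARIABLES = CONSTANT, age_dwel, bathroom, bedrooms
/ NAMES = XTITLES.
GET ESQ / VARIABLES = ESQ.
COMPUTE XRTITLES = TRANSPOS(XTITLES).
COMPUTE N = NROW(ESQ).
COMPUTE X_1 = TRANSPOS(X).
COMPUTE XX_1 = INV(X_1 * X).
COMPUTE O = MDIAG(ESQ).
/* HC0 sandwich: (X'X)^-1 X' diag(esq) X (X'X)^-1 */.
COMPUTE HC0 = XX_1 * X_1 * O * X * XX_1.
COMPUTE WHITE_SE = SQRT(DIAG(HC0)).
COMPUTE B = XX_1 * X_1 * Y.
COMPUTE WT_VAL = B &/ WHITE_SE.
COMPUTE SIG_WT = 2*(1 - TCDF(ABS(WT_VAL), N)).
PRINT B / FORMAT = "E13" / TITLE = "OLS Coefficients" / RNAMES = XRTITLES.
PRINT WHITE_SE / FORMAT = "E13" / TITLE = "White's Standard Errors" / RNAMES = XRTITLES.
PRINT WT_VAL / FORMAT = "E13" / TITLE = "White's t-values (WT_VAL)" / RNAMES = XRTITLES.
PRINT SIG_WT / FORMAT = "E13" / TITLE = "Significance of White's t-values (SIG_WT)" / RNAMES = XRTITLES.
END MATRIX.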
Notes:
Don't save your data file under the same name, since the above procedure has
removed from the data all observations with missing values.
If you already have a variable called res_1, you will need to delete or rename it before you run
the syntax. This means that if you run the procedure on several regressions, you will need to
delete the newly created res_1 and ESQ variables after each run.
Note that the output will use scientific notation, so 20.7 will be written as 2.07E+01, and
0.00043 will be written as 4.3E-04.
Note that the last table just collects together the results of five of the other tables.
WT_VAL is an abbreviation for White's t-values and SIG_WT is the significance
level of these t-values.
OLS Coefficients
CONSTANT 3.536550E+00
AGE_DWEL 1.584464E-03
BATHROOM 2.258710E-01
BEDROOMS 2.721069E-01
OLS t-values
CONSTANT 1.006304E+02
AGE_DWEL 9.661319E+00
BATHROOM 9.034130E+00
BEDROOMS 2.354899E+01
WESTIM
B SE WHITE_SE WT_VAL SIG_WT
CONSTANT 3.536550E+00 3.514394E-02 4.043030E-02 8.747276E+01 0.000000E+00
AGE_DWEL 1.584464E-03 1.640008E-04 1.715285E-04 9.237322E+00 0.000000E+00
BATHROOM 2.258710E-01 2.500197E-02 2.735781E-02 8.256180E+00 2.220446E-16
BEDROOMS 2.721069E-01 1.155493E-02 1.284207E-02 2.118870E+01 0.000000E+00
If we compare the adjusted t-values with those from OLS, then we will see that they are
marginally lower but all still highly significant in this case. The greater the
heteroscedasticity, the larger the difference between the OLS t values and WT_VAL.
When the sample size is small, it has been found that White's standard errors are not reliable.
MacKinnon and White (1985) proposed three alternative estimators for use when the sample size is small. Long
and Ervin (1999) found that the third of these, which they call HC3, is the most reliable, but
unless you have a great deal of RAM on your computer, you may run into difficulties if your
sample size is greater than 250. As a result, I would recommend the following:
n < 250        use HC3, irrespective of whether your tests for heteroscedasticity
               prove positive (Long and Ervin found that the tests are not very
               powerful in small samples).
250 < n < 500  use HC2, since this is more reliable than HC0 (HC0 = White's original
               SE as computed above).
n > 500        use HC0, White's original standard errors as computed in the matrix
               procedure above.
Syntax for computing HC2 is presented below. Follow the first 5 steps as before, and then run the
following:
*HC2.
MATRIX.
GET Y / VARIABLES = flarea_l.
GET X / VARIABLES = CONSTANT, age_dwel, bathroom, bedrooms
/ NAMES = XTITLES.
GET RESIDUAL / VARIABLES = RES_1.
GET ESQ / VARIABLES = ESQ.
COMPUTE XRTITLES = TRANSPOS(XTITLES).
COMPUTE N = NROW(ESQ).
COMPUTE K = NCOL(X).
COMPUTE O = MDIAG(ESQ).
/*Computing HC2*/.
COMPUTE XX = TRANSPOS(X) * X.
COMPUTE XX_1 = INV(XX).
COMPUTE X_1 = TRANSPOS(X).
COMPUTE H = X*XX_1*X_1.
COMPUTE H_MONE = h * -1.
COMPUTE ONE_H = H_MONE + 1.
COMPUTE O_HC2 = O &/ ONE_H.
COMPUTE HC2_a = XX_1 * X_1 *O_HC2.
COMPUTE HC2 = HC2_a * X*XX_1.
COMPUTE HC2DIAG = DIAG(HC2).
COMPUTE HC2_SE = SQRT(HC2DIAG).
PRINT HC2_SE
/ FORMAT = "E13"
/ TITLE = "HC2 Small Sample Corrected Standard Errors"
/ RNAMES = XRTITLES.
END MATRIX.
OLS Coefficients
CONSTANT 3.536550E+00
AGE_DWEL 1.584464E-03
BATHROOM 2.258710E-01
BEDROOMS 2.721069E-01
For HC3, you need to make sure that your sample is not too large, otherwise the computer may
crash. You can temporarily draw a random sub-sample by using the TEMPORARY. and SAMPLE commands:
*HC3.
/*when Computing HC3 make sure n is < 250 (e.g. use TEMPORARY.
SAMPLE 0.4.) */.
TEMPORARY.
SAMPLE 0.4.
MATRIX.
GET Y / VARIABLES = flarea_l.
GET X / VARIABLES = CONSTANT, age_dwel, bathroom, bedrooms
/ NAMES = XTITLES.
GET RESIDUAL / VARIABLES = RES_1.
GET ESQ / VARIABLES = ESQ.
COMPUTE XRTITLES = TRANSPOS(XTITLES).
COMPUTE N = NROW(ESQ).
COMPUTE K = NCOL(X).
COMPUTE O = MDIAG(ESQ).
COMPUTE XX = TRANSPOS(X) * X.
COMPUTE XX_1 = INV(XX).
COMPUTE X_1 = TRANSPOS(X).
COMPUTE H = X*XX_1*X_1.
COMPUTE H_MONE = h * -1.
COMPUTE ONE_H = H_MONE + 1.
/*Computing HC3*/.
COMPUTE ONE_H_SQ = ONE_H &** 2.
COMPUTE O_HC3 = O &/ ONE_H_SQ.
COMPUTE HC3_a = XX_1 * X_1 *O_HC3.
COMPUTE HC3 = HC3_a * X*XX_1.
COMPUTE HC3DIAG = DIAG(HC3).
COMPUTE HC3_SE = SQRT(HC3DIAG).
COMPUTE B = XX_1 * X_1 * Y.
PRINT B
/ FORMAT = "E13"
/TITLE = "OLS Coefficients".
PRINT HC3_SE
/ FORMAT = "E13"
/ TITLE = "HC3 Small Sample Corrected Standard Errors"
/ RNAMES = XRTITLES.
COMPUTE HC3_TVAL = B / HC3_SE.
PRINT HC3_TVAL
/ FORMAT = "E13"
/ TITLE = "t-values based on HC3 corrected SEs"
/ RNAMES = XRTITLES.
COMPUTE SIG_HC3T = 2*(1- TCDF(ABS(HC3_TVAL), N)) .
PRINT SIG_HC3T
/ FORMAT = "E13"
/ TITLE = "Significance levels of the HC3 t-values"
/ RNAMES = XRTITLES.
END MATRIX.
OLS Coefficients
3.530325E+00
1.546620E-03
2.213146E-01
2.745376E-01
(The coefficients differ slightly from those reported earlier because they are estimated on the random 40 per cent sub-sample.)
6) Conclusions
In conclusion, it is worth quoting Greene (1990):
"It is rarely possible to be certain about the nature of the heteroscedasticity in a regression model.
In one respect, this is only a minor problem. The weighted least squares estimator is
consistent regardless of the weights used, as long as the weights are uncorrelated with the
disturbances... But using the wrong set of weights has two other consequences which may be
less benign. First, the improperly weighted least squares estimator is inefficient. This might be a
moot point if the correct weights are unknown, but the GLS standard errors will also be incorrect.
The asymptotic covariance matrix of the estimator may not resemble the usual estimator. This
underscores the usefulness of the White estimator... Finally, if the form of the heteroscedasticity
is known but involves unknown parameters, it remains uncertain whether FGLS corrections are
better than OLS. Asymptotically, the comparison is clear, but in small or moderate-sized
samples, the additional variation incorporated by the estimated variance parameters may offset
the gains to GLS." (W. H. Greene, 1990, p. 407)
Reading
Kennedy (1998) A Guide to Econometrics, Chapters 5, 6, 7 and 9
Maddala, G.S. (1992) Introduction to Econometrics chapter 12
Field, A. (2000) chapter 4, particularly pages 141-162.
Greene, W. H. (1990) Econometric Analysis, 2nd edition.
Further References:
i) Original Papers for test statistics:
Goldfeld, S.M. and Quandt, R.E. (1965) "Some Tests for Homoscedasticity", Journal of the
American Statistical Association, 60, 539-547.
Breusch, T.S. and Pagan, A.R. (1979) "A Simple Test for Heteroscedasticity and Random
Coefficient Variation", Econometrica, 47, 1287-1294.
White, H. (1980) "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test
for Heteroskedasticity", Econometrica, 48, 817-838.
Koenker, R. (1981) "A Note on Studentizing a Test for Heteroscedasticity", Journal of
Econometrics, 17, 107-112.
MacKinnon, J.G. and White, H. (1985) "Some Heteroskedasticity Consistent Covariance
Matrix Estimators with Improved Finite Sample Properties", Journal of Econometrics,
29, 305-325.
Gwilym Pryce,
University of Glasgow
14th March 2002