Robust Regression Modeling With STATA: Lecture Notes
Outline
1. Theory of regression analysis
2. Regression assumptions and misspecification tests
3. Outlier detection and influence: leverage, Cook's D, DFbetas
4. Corrections: transformations, nonlinear regression, robust (White) standard errors, Prais-Winsten and Newey-West regression
5. Robust regression (rreg) and quantile/median regression (qreg, bsqreg)
6. Bootstrapping and model validation
Theory of Regression
Analysis
What is linear regression analysis?
Finding the relationship between a
dependent and an independent
variable.
Y = a + bx + e
The Simple
Regression Formula
Y = a + bx + e
Y is the dependent variable
a is the intercept
b is the regression coefficient
x is the predictor variable
e is the error term
Graphical Decomposition of Effects
[Figure: scatterplot with fitted line y = a + bx, decomposing each observation's deviation into a total effect (y_i − ȳ), an error (y_i − ŷ_i), and a regression effect (ŷ_i − ȳ).]
Derivation of the Intercept
$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} a - b\sum_{i=1}^{n} x_i$$
Because by definition $\sum_{i=1}^{n} e_i = 0$,
$$0 = \sum_{i=1}^{n} y_i - na - b\sum_{i=1}^{n} x_i$$
so that
$$na = \sum_{i=1}^{n} y_i - b\sum_{i=1}^{n} x_i$$
$$a = \bar{y} - b\bar{x}$$
Derivation of the
Regression Coefficient
Given: $y_i = a + bx_i + e_i$, so $e_i = y_i - a - bx_i$.
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2$$
Setting the derivative with respect to b equal to zero,
$$\frac{\partial \sum e_i^2}{\partial b} = -2\sum_{i=1}^{n} x_i y_i + 2b\sum_{i=1}^{n} x_i^2 = 0$$
so that, with x and y in deviation (mean-centered) form,
$$b = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$
The Correlation Coefficient
$$r = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\left(\sum_{i=1}^{n} x_i^2\right)\left(\sum_{i=1}^{n} y_i^2\right)}}$$
where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are deviation scores.
The regression coefficient is the correlation rescaled by the standard deviations:
$$b = r \cdot \frac{sd_y}{sd_x}$$
Partial Regression Coefficients
For two predictors, the coefficients can be computed from the zero-order correlations:
$$b_{yx_1.x_2} = \frac{r_{yx_1} - r_{yx_2}\,r_{x_1x_2}}{1 - r_{x_1x_2}^2}\cdot\frac{sd_y}{sd_{x_1}} \qquad (6)$$
$$b_{yx_2.x_1} = \frac{r_{yx_2} - r_{yx_1}\,r_{x_1x_2}}{1 - r_{x_1x_2}^2}\cdot\frac{sd_y}{sd_{x_2}} \qquad (7)$$
$$a = \bar{Y} - b_1\bar{x}_1 - b_2\bar{x}_2 \qquad (8)$$
Linear Multiple
Regression
Suppose that we have the following data set.
[Data set and regression output not reproduced in these notes.]
Regression modeling
and the assumptions
1. What are the assumptions?
1. Linearity
2. Homoskedasticity (constant error variance)
3. No influential outliers in small samples
4. No multicollinearity
5. No autocorrelation of residuals
6. Fixed independent variables (no measurement error)
7. Normality of residuals
Misspecification tests
Heteroskedasticity tests: rvfplot, hettest
Residual autocorrelation tests: corrgram
Outlier detection: tabulation of standardized residuals; influence assessment
Residual normality tests: sktest
Specification tests (not covered in this lecture)
Misspecification tests
We need to test the residuals for normality. We can save the residuals in Stata by issuing a command that creates them, after we have run the regression command. The command to generate the residuals is:
predict resid, residuals
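As a minimal sketch of the whole sequence (using Stata's bundled auto data; the model and variable names are illustrative, not the lecture's data set):

    * run the regression, save the residuals, and test them for normality
    sysuse auto, clear
    regress price mpg weight
    predict resid, residuals
    sktest resid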
Generation of
standardized residuals
predict rstd, rstandard
Generation of
studentized residuals
predict rstud, rstudent
A Graphical test of
heteroskedasticity:
rvfplot, border yline(0)
Cook-Weisberg Test
$$Var(e_i) = \sigma^2 \exp(z_i t)$$
where
$e_i$ = error in the regression model
$z_i$ = x, or a variable list supplied by the user
The test is whether t = 0. hettest estimates the model
$$e_i^2 = \alpha + z_i t + v_i$$
and refers
$$\frac{\text{SS of model}}{2} \sim \chi^2(p)$$
to a chi-square distribution, where p = the number of parameters.
Cook-Weisberg test
syntax
1. The command for this test is:
hettest resid
Create a time
dependent series
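The slide's screen shot is not reproduced here; a minimal sketch of the usual commands (the variable name time is an assumption):

    * create a time index and declare the data to be a time series
    generate time = _n
    tsset time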
corrgram resid
Outlier detection
Outlier detection involves determining whether the residual (error = actual − predicted) is an extreme negative or positive value.
We may plot the residuals against the fitted values to determine which errors are large, after running the regression.
The command syntax was already demonstrated with the graph shown earlier: rvfplot, border yline(0)
Create Standardized Residuals
A standardized residual is one divided by its standard deviation:
$$resid_{standardized} = \frac{y_i - \hat{y}_i}{s}$$
where s = the standard deviation of the residuals.
Standardized residuals
predict residstd, rstandard
list residstd
tabulate residstd
Limits of Standardized Residuals
If the standardized residuals have values in excess of 3.5 or below −3.5, they are outliers.
If the absolute values are less than 3.5, as these are, then there are no outliers.
While outliers by themselves distort mean prediction only when the sample size is small enough, it is important to gauge their influence in any case.
Outlier Influence
Suppose we had a different data set with two outliers. We tabulate the standardized residuals and obtain the following output:
[Tabulation of standardized residuals and a scatterplot with the fitted line Y = a + bx not reproduced in these notes.]
In this data set, we have two outliers. One is negative and the other is positive.
Studentized Residuals
Alternatively, we could form
studentized residuals. These are
distributed as a t distribution with
df=n-p-1, though they are not
quite independent. Therefore, we
can approximately determine if
they are statistically significant or
not.
Belsley et al. (1980)
recommended the use of
studentized residuals.
Studentized Residual
$$e_i^s = \frac{e_i}{\sqrt{s_{(i)}^2\,(1 - h_i)}}$$
where
$e_i^s$ = studentized residual
$s_{(i)}$ = standard deviation of the residuals with the ith observation deleted
$h_i$ = leverage statistic
These are useful in estimating the statistical significance of a particular observation, for which a dummy variable indicator can be formed. The t value of the studentized residual will indicate whether or not that observation is a significant outlier.
The command to generate studentized residuals, here called rstudt, is:
predict rstudt, rstudent
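As a minimal follow-up sketch, the studentized residuals just generated can be used to flag candidate outliers (the ±2 cutoff is an illustrative rule of thumb, not the lecture's):

    * list observations whose studentized residual is large in absolute value
    list if abs(rstudt) > 2 & !missing(rstudt)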
Influence of Outliers
1. Leverage is measured by the diagonal components of the hat matrix.
2. The hat matrix comes from the formula for the prediction of Y:
$$\hat{Y} = X\hat{\beta} = X(X'X)^{-1}X'Y$$
where $X(X'X)^{-1}X'$ = the hat matrix, H. Therefore,
$$\hat{Y} = HY$$
Cook's D
1. Another measure of influence.
2. This is a popular one. The formula for it is:
$$D_i = \frac{e_i^2}{p\,s^2}\cdot\frac{h_i}{(1-h_i)^2}$$
Cook and Weisberg (1982) suggested that values of D that exceed the 50th percentile of the F distribution (df = p, n − p) are large.
Using Cook's D in STATA
predict cook, cooksd
Finding the influential outliers:
list cook if cook > 4/_N
(4/n is a common cutoff; in Stata the sample size n is the built-in _N. Belsley suggests 4/(n−k−1) as a cutoff.)
Graphical Exploration of Outlier Influence
graph cook residstd, xlab ylab
DFbeta
One can use the DFbetas to
ascertain the magnitude of
influence that an observation has
on a particular parameter estimate
if that observation is deleted.
$$DFBETA_j = \frac{b_j - b_{(i)j}}{\sqrt{\sum u_j^2}\,(1 - h_j)}$$
where $u_j$ = the residuals from the regression of $x_j$ on the remaining x's.
Obtaining DFbetas in
STATA
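The slide's screen shot is not reproduced here; a minimal sketch of the standard postestimation command (the model and the 2/sqrt(n) cutoff are illustrative):

    * after regress, dfbeta creates one DFBETA variable per predictor
    regress y x1 x2
    dfbeta
    list _dfbeta_1 _dfbeta_2 if abs(_dfbeta_1) > 2/sqrt(_N)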
Remedies for assumption violations
1. Nonlinearity: transformation to linearity; nonlinear regression
2. Influential outliers: robust regression (rreg); quantile/median regression (qreg)
3. Heteroskedasticity of residuals: White's heteroskedasticity-consistent (robust) standard errors
4. Autocorrelation of residuals: autoregression with prais y x1 x2, robust; newey-west regression
5. Nonnormality of residuals: bootstrapping; median regression
Nonlinearity:
Transformations to linearity
1. When the equation is not intrinsically nonlinear, the dependent variable or independent variable may be transformed to effect a linearization of the relationship.
2. Semi-log, translog, Box-Cox, or power transformations may be used for these purposes. Box-Cox regression can determine the optimal parameters for many of these transformations, as sketched below.
nl exp2 y x estimates $y = b_1 b_2^x$
nl exp3 y x estimates $y = b_0 + b_1 b_2^x$
Nonlinear Regression in Stata

    . nl exp2 y x
    (obs = 15)

    Iteration 0:  residual SS = 56.08297
    Iteration 1:  residual SS = 49.46372
    Iteration 2:  residual SS = 49.4593
    Iteration 3:  residual SS = 49.4593

          Source |       SS       df       MS        Number of obs =       15
    -------------+------------------------------     F(  2,    13) =  1585.01
           Model |  12060.5407     2  6030.27035     Prob > F      =   0.0000
        Residual |  49.4592999    13  3.80456153     R-squared     =   0.9959
    -------------+------------------------------     Adj R-squared =   0.9953
           Total |       12110    15  807.333333     Root MSE      = 1.950529
                                                     Res. dev.     = 60.46465

    2-param. exp. growth curve, y=b1*b2^x
               y |      Coef.
    -------------+-----------
              b1 |   58.60656
              b2 |   .9611869

[Standard errors, t statistics, and confidence intervals not reproduced in these notes.]
Heteroskedasticity correction
1. Prof. Halbert White showed that heteroskedasticity could be handled in a regression with a heteroskedasticity-consistent covariance matrix estimator (Davidson & MacKinnon (1993), Estimation and Inference in Econometrics, Oxford University Press, p. 552).
2. This variance-covariance matrix under ordinary least squares is shown below.
Under homoskedasticity, the OLS variance-covariance matrix of the coefficients is
$$Var(b) = s^2\,(X'X)^{-1}$$
Heteroskedastically consistent covariance matrix: the sandwich estimator (H. White)
$$\hat{V}(b) = \underbrace{(X'X)^{-1}}_{\text{Bread}}\;\underbrace{\left(X'\hat{\Omega}X\right)}_{\text{Meat (tofu)}}\;\underbrace{(X'X)^{-1}}_{\text{Bread}}$$
For example, $HC1:\ \hat{\omega}_t = \frac{n}{n-k}\,e_t^2$ and $HC3:\ \hat{\omega}_t = \frac{e_t^2}{(1-h_t)^2}$; the full set of versions appears in the Newey-West section below.
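In Stata, White's sandwich standard errors are requested with the robust option; a minimal sketch (the model is illustrative):

    * OLS point estimates with heteroskedasticity-consistent standard errors
    regress y x1 x2, robust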
Problems with Autoregressive Errors
1. The OLS coefficient estimates remain unbiased but are no longer efficient.
2. The OLS standard error estimates are biased, so t tests and confidence intervals based on them are unreliable.
Sources of Autocorrelation
1. Lagged endogenous variables
2. Misspecification of the model
3. Simultaneity, feedback, or reciprocal
relationships
4. Seasonality or trend in the model
Prais-Winsten Transformation (contd.)
$$e_t^2 = \frac{v_t^2}{1-\rho^2}\,, \qquad \text{therefore} \qquad e_t = \frac{v_t}{\sqrt{1-\rho^2}}$$
It follows that
$$Y_t = a + b x_t + \frac{v_t}{\sqrt{1-\rho^2}}$$
$$\sqrt{1-\rho^2}\;Y_t = \sqrt{1-\rho^2}\;a + \sqrt{1-\rho^2}\;b x_t + v_t$$
$$Y_t^* = a^* + b x_t^* + v_t$$
Autocorrelation of the residuals: prais & newey regression
To test whether the variable is autocorrelated:
tsset time
corrgram y
prais y x1 x2, robust
newey y x1 x2, lag(1) t(time)
Prais-Winsten Regression for AR(1) errors
Using the robust option here guarantees that the White heteroskedasticity-consistent sandwich variance-covariance estimator will be used in the autoregression procedure.
Newey-West Robust Standard Errors
An autocorrelation correction is added to the meat (or tofu) in the White sandwich estimator by Newey-West.
$$\hat{V}(b) = n^{-1}\left(n^{-1}X'X\right)^{-1}\hat{\Omega}\left(n^{-1}X'X\right)^{-1}, \qquad \hat{\Omega} = n^{-1}\sum_t \hat{\omega}_t\,x_t'x_t$$
However, there are different versions of $\hat{\omega}_t$:
$$HC0:\ \hat{\omega}_t = e_t^2$$
$$HC1:\ \hat{\omega}_t = \frac{n}{n-k}\,e_t^2$$
$$HC2:\ \hat{\omega}_t = \frac{e_t^2}{1-h_t}$$
$$HC3:\ \hat{\omega}_t = \frac{e_t^2}{(1-h_t)^2}$$
The autocorrelation correction adds weighted lagged cross-product terms:
$$X'\hat{\Omega}X = X'\hat{\Omega}_0 X + \frac{n}{n-k}\sum_{l=1}^{m}\left(1-\frac{l}{m+1}\right)\sum_{i=l+1}^{n} e_i\,e_{i-l}\left(x_i'x_{i-l} + x_{i-l}'x_i\right)$$
where k = number of predictors, l = time lag, and m = maximum time lag.
Newey-West Robust Standard Errors
Newey-West standard errors are robust to autocorrelation as well as heteroskedasticity in time series regression models.
Assume OLS regression
We regress y on x1 x2 x3 and obtain the following output.
[Regression output not reproduced in these notes.]
Residual Assessment
[Residual plots not reproduced in these notes.]
Robust regression algorithm: rreg
1. A regression is performed and absolute residuals are computed:
$$r_i = |y_i - x_i b|$$
$$u_i = \frac{r_i}{s} = \frac{y_i - x_i b}{s}$$
2. The scale s is the (rescaled) median absolute deviation of the residuals from the median residual:
$$s = \frac{M}{0.6745}, \qquad M = \mathrm{med}\big(\,|r_i - \mathrm{med}(r_i)|\,\big)$$
Essential Algorithm
The estimator of the parameter b minimizes the sum of a less rapidly increasing function ρ of the residuals (SAS Institute, The Robustreg Procedure, draft copy, p. 3505, forthcoming):
$$Q(b) = \sum_{i=1}^{n} \rho\!\left(\frac{r_i}{\sigma}\right)$$
where $r_i = y_i - x_i b$ and the scale $\sigma$ is estimated by s.
Essential algorithm (contd.)
1. If this were OLS, ρ would be a quadratic function.
2. If we can ascertain s, we can, by taking the derivatives with respect to b, find a first-order solution to
$$\sum_{i=1}^{n} \psi\!\left(\frac{r_i}{s}\right) x_{ij} = 0, \qquad j = 1,\dots,p$$
where $\psi = \rho'$.
Iteratively reweighted least squares
The case weight w(x) is defined as:
$$w(x) = \frac{\psi(x)}{x}$$
Weight Functions
Tukey biweight (bisquare):
$$w(x) = \begin{cases}\left(1 - (x/c)^2\right)^2, & |x| \le c\\ 0, & |x| > c\end{cases}$$
where c is the tuning constant.
Tuning Constants
When the residuals are normally
distributed and the tuning
constants are set at the default,
they give the procedure about
95% of the efficiency of OLS.
The tuning constants may be
adjusted to provide
downweighting of the outliers at
the expense of Gaussian
efficiency.
Higher tuning constants cause the
estimator to more closely
approximate OLS.
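A minimal sketch of adjusting the constant (rreg's biweight tuning constant defaults to 7; the data are illustrative):

    * a higher tune() downweights outliers less, moving rreg toward OLS
    sysuse auto, clear
    rreg price mpg weight, tune(9)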
Robust Regression algorithm (contd.)
3. WLS regression is performed using those case weights.
4. Iterations cease when changes in the case weights drop below a tolerance level.
5. Weights are based initially on Huber weights. Then Beaton and Tukey biweights are used.
6. Caveat: M estimation is not that robust with regard to leverage points.
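A minimal end-to-end sketch with the bundled auto data (the 0.5 cutoff is illustrative):

    * Huber/biweight robust regression; genwt() saves the final case weights
    sysuse auto, clear
    rreg price mpg weight, genwt(w)
    list w if w < .5 & !missing(w)   // observations heavily downweighted as outliers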
Quantile Regression
qreg in STATA estimates least absolute value regression (LAV, MAD, or L1-norm regression).
The algorithm minimizes the sum of the absolute deviations about the median.
The fitted equation estimates the median, rather than the mean as rreg does:
$$Y_{median} = \text{constant} + bx$$
Median regression
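The slide's output is not reproduced here; a minimal sketch (the model is illustrative):

    * median (50th-percentile) regression
    sysuse auto, clear
    qreg price mpg weight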
Bootstrapping
Bootstrapping may be used to
obtain empirical regression
coefficients, standard errors,
confidence intervals, etc. when
the distribution is non-normal.
Bootstrapping may be applied
to qreg with bsqreg
Bootstrapping quantile or
median regression
standard errors
qreg y x1 x2 x3
bsqreg y x1 x2 x3, reps(1000)
Methods of Model
Validation
These methods may be
necessary where the sampling
distributions of the parameters
of interest are nonnormal or
unknown.
Bootstrapping
Cross-validation
Data-splitting
Bootstrapping
When the distribution of the
residuals is nonnormal or the
distribution is unknown,
bootstrapping can provide
proper regression coefficients,
standard errors, and
confidence intervals.
Stata Bootstrapping
Syntax
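The slide's screen shot is not reproduced here. A minimal sketch using the modern bootstrap prefix (older Stata releases used the bs command; the model and seed are illustrative):

    * bootstrap the regression coefficients with 1000 replications
    bootstrap _b, reps(1000) seed(12345): regress y x1 x2 x3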
Internal Validation
R2 and adjusted R2
1. Plot Y against Ŷ (the predicted values). Compute an R2 and an adjusted R2.
Cross-validation
Jackknifing
This is repeated sampling, where one group or observation is left out. The analysis is reiterated and the results are averaged to obtain a validation, as sketched below.
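A minimal sketch using Stata's jackknife prefix (assuming the leave-one-out form described above; the model is illustrative):

    * leave-one-out jackknife of the regression coefficients
    jackknife _b: regress y x1 x2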
Resampling
[List content not recovered from the original slide.]
Bootstrapped Formulae
For each bootstrap sample b, the statistic of interest (here, a mean) is recomputed:
$$\bar{x}_b = \sum_i x_i^{(b)}/n$$
and the bootstrap variance is taken over the B replications:
$$Var(\bar{x})_{boot} = \frac{\sum_{b=1}^{B}\big(\bar{x}_b - \mathrm{avg}(\bar{x}_b)\big)^2}{B-1}$$
Data-splitting
1. Sample Splitting
1. Subset the sample into a training
and a validation subsample. One
has to be careful about the tail
wagging the dog, as David Reilly
is wont to say.
2. This results in poorer accuracy and
loss of power unless there is plenty
of data.
3. Tests for parameter constancy
Comparison of STATA, SAS, and S-PLUS
Stata has rreg, qreg, and bsqreg:
rreg is M estimation with Huber and Tukey bisquare weight functions
qreg is quantile regression
bsqreg is bootstrapped quantile regression
Stata also supports bootstrapping.