
7SSMM700 Lecture 8


Regularization techniques and shrinkage estimators

7SSMM700 Quantitative Methods for Finance and Data Analytics

Leone Leonida and Marina Dolfin


M.Sc. Banking and Finance
King’s Business School

King’s College London

Week 8

The big data era is creating plenty of opportunities for new developments
in econometrics, economics, and finance.

Recent improvements in computer science are making data collection easier.

Meanwhile, the analysis of large data sets poses methodological challenges (for example, high-frequency observations, unstructured data, and new large datasets).

We explore recent advancements in macroeconometrics and empirical financial analysis for dealing with big data, and see how machine learning techniques can be used to estimate models of macro and financial data for forecasting and structural analysis.

Tools to manage Big Data
Introduction to Regularization

Theoretical approach

Ridge Regression

Lasso Regression

Elastic Net Regression

Variable selection in time series applications

CLRM vs. ML: an example in predicting returns on the Forex with high-frequency data.
Introduction to Regularization
Consider the standard model of ordinary least squares (OLS) for multiple
linear regression

y = Xβ + e  (1)

where y ∈ R^{N}, β ∈ R^{k}, and X ∈ R^{N×k}.

Multicollinearity and Overfitting

The assumptions of the Gauss-Markov theorem often hold approximately in smaller datasets, which makes OLS a very powerful tool for statisticians and scientists.

However, these assumptions tend to fail in sufficiently large datasets, and the OLS method can then produce problematic estimates.
Multicollinearity and Overfitting

One of the most common issues with the OLS method is the tendency for
the model to overfit the data when there is too much noise caused by
correlated variables. This can happen in many different situations.

The most extreme case occurs when the number of regressors exceeds the number of observations (p > n). In this case there exists no unique solution to the system, and linear regression fails to produce accurate coefficient values.

In less extreme situations, multicollinearity can cause the model to be overly sensitive to small changes in parameter values, and coefficients can have the "wrong" sign or an incorrect order of magnitude.

When overfitting, the coefficients can have high standard errors and low levels of significance despite a high R² value.

Ridge regression and LASSO (Least Absolute Shrinkage and Selection Operator) improve the overall accuracy of ordinary least squares regression by adding a bias that imposes shrinkage on the model, greatly reducing the variance of the coefficient estimates.

Regularization or shrinkage

Both LASSO and ridge regression minimize the sum of squares plus a penalty. The penalty encourages the estimated coefficients to be small, in particular closer to zero.

The penalty imposes shrinkage on the ordinary least squares coefficient estimates, and its particular form yields different types of shrinkage.

Generally, for the Lp regularization term we have

L_p = ( ∑_i |β_i|^p )^{1/p}

Ridge and LASSO deal with the L2 and L1 penalties respectively.

Some notation for LASSO and Ridge penalties

L1 = ∑_i |β_i|

L2 = √( ∑_i β_i² )

Regularization is used in preference to other common methods of determining the best linear model, such as best subset selection and stepwise subset selection.
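As a quick illustration, a minimal NumPy sketch of the L1, L2, and general Lp penalties above (the coefficient vector is made up purely for illustration):

```python
import numpy as np

# Hypothetical coefficient vector, for illustration only
beta = np.array([2.0, -0.5, 0.0, 1.5])

l1 = np.sum(np.abs(beta))         # L1 penalty used by LASSO
l2 = np.sqrt(np.sum(beta ** 2))   # L2 penalty used by ridge

def lp_norm(b, p):
    """General Lp regularization term: (sum_i |b_i|^p)^(1/p)."""
    return np.sum(np.abs(b) ** p) ** (1.0 / p)

print(l1, l2, lp_norm(beta, 1), lp_norm(beta, 2))
```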

Ridge Regression

Ridge regression adds the L2 penalty to the residual sum of squares (RSS), so that we minimize

(y − Xβ)^T (y − Xβ) + λ β^T β  (2)

Ridge estimator
From equation (2) we can derive the ridge coefficient estimate:

(y − Xβ)^T (y − Xβ) + λ β^T β
= (y^T − β^T X^T)(y − Xβ) + λ β^T β
= y^T y − y^T Xβ − β^T X^T y + β^T X^T Xβ + λ β^T β
= y^T y − 2 y^T Xβ + β^T X^T Xβ + λ β^T β

Setting the derivative with respect to β equal to zero,

∂/∂β : −2 X^T y + 2 X^T Xβ + 2λβ = 0

β̂_ridge = (X^T X + λI)^{−1} X^T y  (3)
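A minimal NumPy sketch of equation (3) on simulated data; the data-generating process and variable names are illustrative assumptions, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 200, 10
X = rng.standard_normal((N, k))              # regressors
beta_true = rng.standard_normal(k)
y = X @ beta_true + rng.standard_normal(N)   # y = X beta + e

lam = 1.0                                    # tuning parameter lambda >= 0

# Ridge estimate: (X'X + lambda I)^(-1) X'y, as in equation (3)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# lambda = 0 recovers the OLS estimate
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```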


Discussion of the Ridge estimator

In equations (2) and (3), λ ≥ 0 is a tuning parameter for the penalty, which is determined separately.

When λ = 0 we recover the ordinary least squares estimate, and as λ → ∞ all the coefficients approach zero.

The selection of λ, and thus of the optimal model, will not be discussed here. However, there is an existence theorem stating that there always exists a λ > 0 such that the MSE of the ridge estimator is less than that of the least squares estimate (λ = 0). A proof of the theorem can be found in Hoerl and Kennard (1970).

Advantages of Ridge

Ridge regression's advantage over ordinary least squares lies in its bias-variance trade-off. As λ increases, the flexibility of the model fit decreases. This leads to increased bias but decreased variance.

Disadvantages of Ridge

Ridge regression has a major disadvantage relative to other methods for dealing with ill-posed problems and overfitting: it does not perform feature selection. While ridge shrinks coefficients towards zero, the final model will always include all of the predictors (unless λ → ∞, in which case all coefficients are shrunk to zero).

LASSO estimator

The LASSO model can be written in a form similar to that of the ridge model:

(y − Xβ)^T (y − Xβ) + λ ∑_{j=1}^{p} |β_j|  (4)

Advantages The largest benefit of LASSO is its ability to produce sparse models, i.e. to perform feature selection.

Disadvantages The main disadvantage with respect to ridge is that, because the L1 penalty contains absolute values, it is much more difficult to handle analytically. As with ridge, the correct choice of λ, and thus of the optimal model, is very important.
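A hedged sketch of LASSO's feature selection using scikit-learn's Lasso on simulated data; the penalty value alpha is arbitrary here and would normally be chosen by cross-validation:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
N, p = 200, 15
X = rng.standard_normal((N, p))
# Only the first three regressors matter in this toy example
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.standard_normal(N)

lasso = Lasso(alpha=0.1)   # alpha plays the role of lambda
lasso.fit(X, y)

# Many coefficients are exactly zero: LASSO performs feature selection
print("non-zero coefficients:", np.sum(lasso.coef_ != 0))
```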

Comparing Ridge and LASSO estimators
Alternatively, one can show that the Ridge and LASSO models solve these
equations respectively:
Ridge equation

(y − Xβ)^T (y − Xβ) such that β^T β ≤ t  (5)

LASSO equation

(y − Xβ)^T (y − Xβ) such that ∑_j |β_j| ≤ t  (6)

This means that for every value of λ in ridge and LASSO, there exists a t such that (2) and (5) give the same coefficient estimates, and (4) and (6) give the same estimates. When p = 2, (5) shows that ridge regression has the smallest RSS among all points that lie within the circle defined by β_1² + β_2² ≤ t, while (6) shows that LASSO does the same among the points that lie within the diamond defined by |β_1| + |β_2| ≤ t.
Graphical representation

Figure: Ridge (circle) and LASSO (diamond) constraint regions with contours of constant RSS. Image taken from Tibshirani (1996).
Graphical representation: comment

In the above figure, β̂ is the least squares solution, and the circle and diamond portray the ridge and LASSO constraints given in (5) and (6). The ellipses around β̂ are contours of constant RSS.

Equations (5) and (6) imply that the ridge and LASSO coefficient estimates lie where the ellipses first touch the constraint regions.

Since the constraint region of ridge is a circle, the probability that this intersection occurs on an axis is zero. In contrast, the diamond constraint region of LASSO has corners on the axes, so the intersection of the ellipses and the constraint region will often occur on an axis, setting some coefficients exactly to zero.

Instabilities in the Ridge regression

Figure: Moving along the "ridge", large changes in the parameter estimates cause only small changes in the error, which creates instability.

Elastic Net Regression

General penalty term


λ ∑_{j=1}^{p} [ (1 − α) |β_j| + α β_j² ]

λ = 0: OLS.

α = 1: Ridge regression.

α = 0: LASSO regression.
All regressors must be standardized, i.e. they must have zero mean and variance equal to 1.
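A minimal sketch, assuming scikit-learn is available, that standardizes simulated regressors and fits an elastic net. Note that scikit-learn's l1_ratio weights the L1 term, so it corresponds (up to constant factors) to 1 − α in the penalty above; this mapping is an assumption about how the slide's notation lines up with the library, not part of the lecture:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 15))
y = X[:, 0] - 0.5 * X[:, 1] + rng.standard_normal(200)

# Standardize regressors: zero mean, unit variance
X_std = StandardScaler().fit_transform(X)

# scikit-learn's penalty is
#   alpha * ( l1_ratio * ||beta||_1 + 0.5 * (1 - l1_ratio) * ||beta||_2^2 ),
# so l1_ratio plays the role of (1 - alpha) in the slide's notation
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X_std, y)
print(enet.coef_)
```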

Application to returns in the Forex market
Data Intra-day data: five years' worth (January 2007 - January 2011) of one-minute returns on the currency pair EUR/USD.

Application to returns in the Forex market

Model Linear regression model.

Forecasting Forecast future returns (60 minutes ahead), using short-term returns and technical indicators (15 regressors).

Tools Comparison of CLRM vs. stepwise selection vs. ML by cross-validation, on in-sample and out-of-sample data.

Set of regressors: recent returns and technical indicators

Recent returns N-minute returns over window lengths of 5, 10, 15, 20, 25, 30, and 60 minutes.

MACD Moving Average Convergence/Divergence.

RSI Relative Strength Index, over window lengths of 5, 10, 15, 20, 25, 30, and 60 minutes.
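A rough pandas sketch of how such regressors might be constructed from a one-minute price series. The column names and the exact indicator definitions (MACD as the difference of 12- and 26-period exponential moving averages, RSI from average gains and losses) are assumptions for illustration, not the lecture's actual code:

```python
import pandas as pd

def make_features(price: pd.Series) -> pd.DataFrame:
    """Build lagged-return, MACD, and RSI features from a one-minute price series."""
    feats = {}

    # N-minute past returns over several window lengths
    for n in (5, 10, 15, 20, 25, 30, 60):
        feats[f"ret_{n}"] = price.pct_change(n)

    # MACD: fast EMA minus slow EMA
    feats["macd"] = price.ewm(span=12).mean() - price.ewm(span=26).mean()

    # RSI over several window lengths
    delta = price.diff()
    for n in (5, 10, 15, 20, 25, 30, 60):
        gain = delta.clip(lower=0).rolling(n).mean()
        loss = (-delta.clip(upper=0)).rolling(n).mean()
        feats[f"rsi_{n}"] = 100 - 100 / (1 + gain / loss)

    return pd.DataFrame(feats)
```

With these choices the sketch yields 7 + 1 + 7 = 15 candidate regressors, matching the count mentioned earlier.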

Regressand

Future returns (60 mins in advance)

OLS estimate

Back-testing

We assess the effectiveness of the prediction by adopting a back-testing strategy.

We calculate the strategy returns by simply multiplying the actual returns by the sign of the predicted returns, then apply a cumulative product (cumprod) to generate a returns curve.

Trading with the model Take a very simple strategy (realistically, we would require some minimum predicted return before we actually trade): whenever the returns prediction is greater than 0, go long, and whenever the returns prediction is less than 0, go short.
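A minimal sketch of this back-test, assuming aligned pandas Series of actual and predicted 60-minute returns; the function and variable names are illustrative:

```python
import numpy as np
import pandas as pd

def backtest(actual: pd.Series, predicted: pd.Series) -> pd.Series:
    """Go long when the predicted return is positive, short when it is negative."""
    position = np.sign(predicted)            # +1 long, -1 short, 0 flat
    strategy_returns = position * actual     # realized strategy returns
    equity_curve = (1 + strategy_returns).cumprod()   # cumulative returns curve
    return equity_curve
```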

Figure: Backtesting the OLS estimate

We repeat this process using a stepwise algorithm to discard terms and retain only those predictors with the most predictive power.
The algorithm removes the factors that are not statistically significant, returning a more compact model.
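The lecture's stepwise fit was presumably produced with a built-in stepwise routine; purely as an illustration, here is a backward-elimination sketch based on p-values using statsmodels. The threshold, the procedure, and the assumption that X is a pandas DataFrame are mine, not the lecture's:

```python
import statsmodels.api as sm

def backward_elimination(X, y, threshold=0.05):
    """Iteratively drop the least significant regressor until all p-values pass."""
    cols = list(X.columns)
    model = None
    while cols:
        model = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = model.pvalues.drop("const")
        worst = pvals.idxmax()               # least significant predictor
        if pvals[worst] <= threshold:
            break
        cols.remove(worst)                   # discard it and refit
    return cols, model
```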

Stepwise estimate

Backtesting the OLS and stepwise estimates

Comparison with the Elastic Net regularization

Conclusions

The elastic net regularization outperforms both the OLS and the stepwise estimates out-of-sample, although robustness with respect to the splitting of the data should be checked before reaching a definite conclusion; a sketch of such a check follows.
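One way such a robustness check might be sketched, assuming a feature matrix X and target y built as above, is to repeat the out-of-sample comparison over several chronological splits. This is an illustrative sketch, not the lecture's actual procedure:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import LinearRegression, ElasticNetCV

def compare_over_splits(X, y, n_splits=5):
    """Out-of-sample R^2 of OLS vs. cross-validated elastic net over rolling splits."""
    results = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        X_tr, X_te = X[train_idx], X[test_idx]
        y_tr, y_te = y[train_idx], y[test_idx]

        ols = LinearRegression().fit(X_tr, y_tr)
        enet = ElasticNetCV(cv=5).fit(X_tr, y_tr)   # penalty strength chosen by CV

        results.append((ols.score(X_te, y_te), enet.score(X_te, y_te)))
    return np.array(results)   # one (OLS, elastic net) R^2 pair per split
```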

Readings
Tibshirani, R. (1996) "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, 267-288.

Zou, H., and T. Hastie (2005) "Regularization and Variable Selection via the Elastic Net." Journal of the Royal Statistical Society, Series B, Vol. 67, No. 2, 301-320.

Friedman, J., R. Tibshirani, and T. Hastie (2010) "Regularization Paths for Generalized Linear Models via Coordinate Descent." Journal of Statistical Software, Vol. 33, No. 1. https://www.jstatsoft.org/v33/i01

Hastie, T., R. Tibshirani, and J. Friedman (2008) The Elements of Statistical Learning. 2nd edition. New York: Springer.

THANK YOU

