Modeling The S&P 500 Index Using The Kalman Filter and The Laglasso
Nicolas Mahler ENS Cachan & UniverSud CMLA UMR CNRS 8536 and Telecom Paristech (TSI) LTCI UMR Institut Telecom/CNRS 5141 nico.mahler@gmail.com
Abstract
This article introduces a method to predict upward and downward monthly variations of the S&P 500 index by using a pool of macro-economic and financial explanatory variables. The method combines a denoising step, performed by Kalman filtering, with a variable selection step, performed by a Lasso-type procedure. In particular, we propose an implementation of the Lasso method called LagLasso, which includes the selection of lags for individual factors. We provide promising backtesting results for the prediction model, based on a naive trading rule.
Introduction
We consider the problem of predicting monthly movements of the S&P 500 index, and assume that a small subset of macro-economic and financial predictors can efficiently represent the exogenous influence on the S&P 500. The influence of each of these predictors can change over time, and it can be lagged. Additionally, according to economists (cf. [8]), the S&P 500 is sensitive to the variations of those predictors around their own trend rather than to the variations themselves. Therefore, we need to filter the predictors: a linear state-space model is first proposed for each of them, and their innovation residuals are computed with the Kalman algorithm (cf. [4], [5], [6], [7]). These residuals are then used to predict S&P 500 variations: the most informative residuals are identified with the Lasso method, a procedure that minimizes an L2 regression fit under an L1 penalty. This constraint yields a sparse selection, which is not only a gain in terms of interpretability but also reduces variance, leading to more accurate predictions.
The issue of lagged influence between variables is addressed by slightly modifying the Lasso. Indeed, as shown in [1] and [2], the Lasso is intimately connected to the LARS algorithm, an iterative variable selection procedure that generalizes the concept of bisector to a multidimensional framework. The basic idea consists in writing a variant of the Lasso in which both a variable and a lag are selected at each step, all the other lags of that variable then being eliminated from further selection: we call it the LagLasso procedure. Finally, this approach is prospective, in the sense that we would like not only to build a competitive prediction method for the S&P 500 but also to clearly state the problem of lag identification.
This is different from [3], which introduces a LARS algorithm adapted to time series, where each variable can be represented by a matrix made of its lagged realisations: that algorithm iteratively selects blocks of lags corresponding to a single variable, instead of single lags each corresponding to a variable. In the next section, a mathematical framework for this approach is given. We explain how linear state-space models are used to denoise the variables, as well as how S&P 500 variations are modeled through the LagLasso procedure. In the last section, a backtesting of this method is provided: it is built on a sample period of 20 years and uses a sliding window of 5 years and a small number of macro-economic and financial indicators.
More formally, we observe the predictors $x_t = (x_{i,t})_{1 \le i \le d}$, where $x_{\cdot,t} \in \mathbb{R}^d$ for all $t$, and we want to forecast the real variable $y_t$ at horizon $h$. The following linear forecasting model is proposed:

$$\tilde{y}_{t+h} = \sum_{i=1}^{d} \beta_i \, \tilde{x}_{i,\,t-\tau(i)}, \qquad (*)$$

where $\tilde{y}_{t+h}$ and $\tilde{x}_{i,t}$ are the innovation residuals of the variables $y_{t+h}$ and $x_{i,t}$ corresponding, for each $i$, to a given linear state-space model, where $\tau(i)$ is the lag corresponding to the $i$th variable, and where $\beta = (\beta_1, \ldots, \beta_d)$ is a real vector.
2.1 First step: Kalman filtering
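We propose the following linear state-space model:

$$z_t = H_t \theta_t + v_t, \qquad \theta_t = F_t \theta_{t-1} + w_t,$$
$$\theta_0 \sim \mathcal{N}(m_0, V_0), \qquad v_t \sim \mathcal{N}(0, V_t), \qquad w_t \sim \mathcal{N}(0, W_t).$$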
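As an illustration of the filtering step, the following Python sketch runs the standard Kalman recursions on a model of this form and returns the innovation residuals. It is a minimal sketch under stated assumptions, not the implementation used for the results reported here: the matrices are taken to be time-invariant, the variances `V` and `W` are fixed by hand instead of being estimated by maximum likelihood, and the function name `kalman_innovations` and the simulated data are hypothetical.

```python
import numpy as np

def kalman_innovations(z, H, F, V, W, m0, P0):
    """Run a time-invariant Kalman filter and return innovation residuals.

    z: (T,) or (T, m) observations; H: (m, p); F: (p, p);
    V: (m, m) observation variance; W: (p, p) evolution variance;
    m0: (p,) prior mean; P0: (p, p) prior covariance.
    """
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    theta = np.asarray(m0, dtype=float)
    P = np.asarray(P0, dtype=float)
    residuals = np.empty_like(z)
    for t in range(z.shape[0]):
        # Prediction step: propagate the state estimate and its covariance.
        theta_pred = F @ theta
        P_pred = F @ P @ F.T + W
        # Innovation residual and its covariance.
        r = z[t] - H @ theta_pred
        S = H @ P_pred @ H.T + V
        # Kalman gain, then update of the state estimate and covariance.
        K = P_pred @ H.T @ np.linalg.inv(S)
        theta = theta_pred + K @ r
        P = (np.eye(len(theta)) - K @ H) @ P_pred
        residuals[t] = r
    return residuals

# Local level model (assumed example): a random walk observed with noise.
H = np.array([[1.0]])
F = np.array([[1.0]])
V = np.array([[1.0]])  # assumed observation variance
W = np.array([[0.1]])  # assumed evolution variance

rng = np.random.default_rng(0)
level = np.cumsum(rng.normal(scale=np.sqrt(0.1), size=200))
z = level + rng.normal(scale=1.0, size=200)
res = kalman_innovations(z, H, F, V, W, m0=np.zeros(1), P0=10.0 * np.eye(1))
```

With `H = F = 1` this is a local level model; under the usual textbook convention, a local linear trend model would instead use `H = [[1, 0]]` and `F = [[1, 1], [0, 1]]`.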
In this model, $z_t \in \mathbb{R}^m$ stands for the observation vector, $\theta_t \in \mathbb{R}^p$ is a hidden random vector, and $H_t$ and $F_t$ are real matrices of size $m \times p$ and $p \times p$ respectively, which are to be specified. The only parameters of the model are the observation and evolution variances $V$ (a matrix of size $m \times m$) and $W$ (a matrix of size $p \times p$), which we estimate from the available data by maximum likelihood. The Kalman filter recursively estimates the internal state $\theta_t$ of the process given the sequence of noisy observations $z_t$. We denote by $\hat{\theta}_{t|t}$ the estimate of the state at time $t$ given observations up to and including time $t$, and by $P_{t|t}$ the associated error covariance matrix. This can be summed up by the system of equations:

$$\begin{aligned}
\hat{\theta}_{t|t-1} &= F_t \hat{\theta}_{t-1|t-1}, \qquad (**) \\
P_{t|t-1} &= F_t P_{t-1|t-1} F_t^{\top} + W_t, \\
r_t &= z_t - H_t \hat{\theta}_{t|t-1}, \qquad (***) \\
S_t &= H_t P_{t|t-1} H_t^{\top} + V_t, \\
K_t &= P_{t|t-1} H_t^{\top} S_t^{-1}, \\
\hat{\theta}_{t|t} &= \hat{\theta}_{t|t-1} + K_t r_t, \\
P_{t|t} &= (I - K_t H_t) P_{t|t-1}.
\end{aligned}$$

Equation (**) gives the predicted state at step $t$ and equation (***) the innovation residual: this is how we compute the quantities $\tilde{x}_t$ and $\tilde{y}_t$ used in (*). Finally, in our implementation, we use one such model for the response $y_t$ and a single such model for all predictors $x_{i,t}$, for simplicity of use.
2.2 Second step: selecting variables and lags with LagLasso
Predicting $y_{t+h}$ is achieved by selecting the most significant variables and lags, knowing that only one lag can be chosen per variable. We implemented a Lasso-type procedure, the LagLasso, which aims at building the vector $\beta$ given in (*). From now on, we use the notation $x_i$ for $\tilde{x}_{i,t}$ and $y$ for $\tilde{y}_t$, and we use a double index for $\beta$ to account for both the variables and the lags. In addition, as for the Lasso, a criterion is needed to choose a single step of this iterative process, and hence a single vector $\beta$: both $C_p$-type and cross-validation stopping criteria were considered.
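To make the one-lag-per-variable constraint concrete, here is a minimal Python sketch. It is not the LARS-type coefficient path of the LagLasso: it swaps in a plain greedy forward selection with a least-squares refit, and keeps only the rule that, once a variable enters with a given lag, all its other lags are removed from the candidates. The function names, the synthetic data and the fixed number of selected variables `n_select` (used here in place of a $C_p$ or cross-validation criterion) are assumptions made for illustration.

```python
import numpy as np

def lagged_columns(X, lag_min, lag_max):
    """Build standardized lagged copies of each column of X.

    Returns (cols, Z): cols[c] = (variable index, lag), and row t of Z
    holds x_{i, t - lag} for t = lag_max, ..., T - 1.
    """
    T, d = X.shape
    cols, blocks = [], []
    for i in range(d):
        for lag in range(lag_min, lag_max + 1):
            cols.append((i, lag))
            blocks.append(X[lag_max - lag : T - lag, i])
    Z = np.column_stack(blocks)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    return cols, Z

def greedy_lag_selection(X, y, lag_min, lag_max, n_select):
    """Greedy selection of (variable, lag) pairs, one lag per variable."""
    cols, Z = lagged_columns(X, lag_min, lag_max)
    target = y[lag_max:] - y[lag_max:].mean()
    active, used_vars = [], set()
    coef = np.zeros(0)
    r = target.copy()
    for _ in range(min(n_select, X.shape[1])):
        corr = np.abs(Z.T @ r)
        # Exclude every lag of a variable that has already been selected.
        for c, (i, _lag) in enumerate(cols):
            if i in used_vars:
                corr[c] = -np.inf
        best = int(np.argmax(corr))
        active.append(best)
        used_vars.add(cols[best][0])
        # Refit least squares on the active set and update the residual.
        coef, *_ = np.linalg.lstsq(Z[:, active], target, rcond=None)
        r = target - Z[:, active] @ coef
    return [cols[c] for c in active], coef

# Synthetic check: y depends on variable 1 with a lag of 3 periods.
rng = np.random.default_rng(0)
T = 300
X = rng.normal(size=(T, 3))
y = np.zeros(T)
y[3:] = 2.0 * X[:-3, 1] + 0.1 * rng.normal(size=T - 3)
selected, coef = greedy_lag_selection(X, y, lag_min=1, lag_max=6, n_select=3)
print(selected)
```

In this sketch the target in row t is regressed on predictors lagged by at least `lag_min` periods, so choosing `lag_min` at least equal to the horizon keeps the selection usable for forecasting.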
LagLasso steps.
0. Choose lag max and lag min: $\tau_i \in [\text{lag min}, \text{lag max}]$ for all $i$.
1. Standardize the predictors $x_{i,\tau}$ to have mean 0 and variance 1. Initialisation: $r = y - \bar{y}$, $\beta_{i,\tau} = 0$ for all $i, \tau$.
2. Find the predictor $x_{j,\tau_j}$ most correlated with $r$.
3. Move $\beta_{j,\tau_j}$ from 0 towards its least-squares coefficient $\langle x_{j,\tau_j}, r \rangle$, until some other competitor $x_{k,\tau_k}$, $k \neq j$, has as much correlation with the current residual as does $x_{j,\tau_j}$.
4. Move $(\beta_{j,\tau_j}, \beta_{k,\tau_k})$ in the direction defined by the joint least-squares coefficient of the current residual on $(x_{j,\tau_j}, x_{k,\tau_k})$, until some other competitor $x_{l,\tau_l}$ has as much correlation with the current residual, i.e. $\langle x_{l,\tau_l}, r \rangle = \langle x_{k,\tau_k}, r \rangle = \langle x_{j,\tau_j}, r \rangle$.
5a. If a non-zero coefficient hits zero, drop it from the active set, reinclude the variable and all its lags in the inactive set, and recompute the current joint least-squares direction.
5b. Eliminate all the lags corresponding to variable $j$ from the inactive set. Continue until $d$ variables are entered.

In order to question the validity of this method and to explore possible refinements, several simple test methods are given. All of them are based on the same principle: considering the last 20 years of S&P 500 monthly variations, we use a sliding window containing a sufficient and constant number of points to predict the variation of the S&P 500 index over the next month. A number of successive predictions at horizon h = 1 month are obtained and compared with those computed with other methods, in particular linear state-space models and regression. Obviously, some explanatory variables are needed, and they have been chosen carefully: we checked that having too many correlated variables in the database is usually very counterproductive, which ultimately limits the number of explanatory variables drastically. With the help of an economic expert, we chose PER, OIL, NAPM, INCOME and CORP PROFIT, which are all available on the website of the Federal Reserve Bank of St. Louis. A first backtest of this method consists in computing a recognition rate of upward and downward movements of the S&P 500 as a function of the amplitude of the variation of the index. Results are provided for different maximal lags and for some other methods (cf. Table 1). In addition, the following naive trading rule is proposed. Imagine a trader who decides to sell or buy one unit of the S&P 500 index every month. If the model's prediction for the next month is positive, the trader buys; if it is negative, the trader sells. At the end of the backtesting period, the profit and loss accounts, computed with different maximal lags and with similar strategies derived from other methods, are compared (cf. Figure 1).
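The two backtesting measures described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code behind Table 1 or Figure 1: the function names and the toy numbers are hypothetical, the amplitude filter is assumed to apply to the realized variation (the text does not say whether it applies to the realized or the predicted one), and the profit and loss is assumed to be the position times the realized monthly variation.

```python
import numpy as np

def recognition_rate(pred, actual, amplitude=0.0):
    """Share of months whose direction is correctly predicted, among the
    months where the realized |variation| is at least `amplitude`."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    mask = np.abs(actual) >= amplitude  # assumption: filter on realized moves
    if not mask.any():
        return float("nan")
    return float(np.mean(np.sign(pred[mask]) == np.sign(actual[mask])))

def naive_pnl(pred, actual):
    """Cumulative profit and loss of the naive rule: hold +1 unit when the
    prediction is positive and -1 unit when it is negative."""
    position = np.sign(np.asarray(pred, dtype=float))
    return np.cumsum(position * np.asarray(actual, dtype=float))

# Toy monthly variations (made-up numbers, for illustration only).
pred = [0.01, -0.02, 0.03, -0.01]
actual = [0.02, 0.01, 0.04, -0.03]
rate_all = recognition_rate(pred, actual)
rate_large = recognition_rate(pred, actual, amplitude=0.02)
pnl = naive_pnl(pred, actual)
```

In a full backtest, `pred` would hold the successive one-month-ahead predictions produced on each position of the sliding window, and `actual` the corresponding realized variations.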
Conclusion
A first conclusion is that a multidimensional framework is usually more interesting than a unidimensional one. Furthermore, combining a filtering method with a selection method gives promising results: a simple state-space model combined with the Lasso outperforms all the other backtested methods. This has to be tempered by the delicate calibration of the model: if the database contains too many correlated variables, the results (recognition rate and profit and loss account) are clearly worse. And since macro-economic and financial variables usually depend strongly on each other, this limits the number of predictors in the database. Unfortunately, taking lags into account significantly improves neither the recognition rate nor the profit and loss accounts, although lagged influence is a phenomenon highlighted by economists. We believe further improvements can be reached through a better indexing of data.
Table 1: recognition rate of the S&P 500's upward and downward movements.
Amplitude of the variation            0       0.01    0.02    0.03    0.04    0.05
Kalman and LagLasso, lag max = 1      61.6%   62.5%   64.4%   64.6%   66%     66.6%
Kalman and LagLasso, lag max = 6      60%     59.8%   58.8%   62.1%   66%     71.7%
Kalman and LagLasso, lag max = 12     62.2%   61.8%   60.7%   64.6%   66%     71.7%
Level Model                           53.8%   55.2%   53.2%   53.6%   50.9%   51.2%
Local Trend Model                     56.1%   56.5%   58.8%   58.5%   64.1%   71.7%
Lasso                                 61.6%   61.8%   60.7%   60.7%   59.7%   58.4%
Regression                            61.6%   61.8%   62.6%   62.1%   62.2%   56.4%
Figure 1: Backtesting. KL stands for Kalman/LagLasso-type methods (lag max = 1, 6, 12), SSM for linear State-Space Models (Level Model and Local Trend Model), and Reg for both Regression and Lasso.
References
[1] Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. (2004), "Least Angle Regression", The Annals of Statistics, 32, 407-451.
[2] Hastie, T.; Taylor, J.; Tibshirani, R.; Walther, G. (2007), "Forward stagewise regression and the monotone lasso", Electron. J. Statist., 1, 1-29.
[3] Croux, C.; Gelper, S. (2008), "Least Angle Regression for Time Series Forecasting with Many Predictors", 1-36 pp., Leuven: K.U.Leuven.
[4] Kalman, R. (1960), "A New Approach to Linear Filtering and Prediction Problems", Trans. ASME, Journal of Basic Engineering, 35-45.
[5] Tsay, R. S. (2005), "Analysis of Financial Time Series", Wiley.
[6] Welch, G.; Bishop, G. (1995), "An Introduction to the Kalman Filter", University of North Carolina, Department of Computer Science, TR 95-041.
[7] Cappé, O.; Moulines, E.; Rydén, T. (2005), "Inference in Hidden Markov Models", Springer Series in Statistics, Springer-Verlag New York, Inc., Secaucus, NJ.
[8] Shleifer, A. (2000), "Inefficient Markets: An Introduction to Behavioral Finance", Oxford University Press.