
Time Series Analysis and Forecasting Using ARIMA Modeling, Neural Network and Hybrid Model Using ELM

Puneet Singh, Ved Prakash Gupta


Department of Mathematics and Scientific Computing, Indian Institute of Technology, Kanpur

Abstract

For the past three decades, ARIMA modeling of time series data has been the most widely used method for forecasting. Recent research has shown that artificial neural networks (ANNs) can improve model fit significantly, giving better predictions. On some data sets, however, ARIMA models and ANNs give mixed results in terms of model superiority. In this paper, we apply a hybrid model consisting of an ARIMA model for the linear part of the time series and an ANN for the nonlinear part. We show that the result obtained by this hybrid model is better than that of either model used alone. Further, we have implemented the Extreme Learning Machine technique as a substitute for the ANN and identified some interesting results. Mean squared error is used as the measure of model strength.

Introduction

Time series forecasting is one of the most important areas of forecasting, in which future values of a variable are predicted based on past values of the same variable. A model is estimated from the properties of the observed data, and the fitted model is used to extrapolate the time series into the future. This approach works even when there is no significant information about the origin or nature of the data. A great deal of research is aimed at improving time series forecasting models. One of the most important and widely used time series models is the autoregressive integrated moving average (ARIMA) model. The popularity of the ARIMA model is due to its statistical properties as well as the well-known Box-Jenkins methodology for model building. In addition, various exponential smoothing models can be implemented as ARIMA models. Although ARIMA models are quite flexible in that they can represent several different types of time series, i.e., pure autoregressive (AR), pure moving average (MA) and combined AR and MA (ARMA) series, their major limitation is the pre-assumed linear form of the model. That is, a linear correlation structure is assumed among the time series values, and therefore no nonlinear patterns can be captured by an ARIMA model. The approximation of linear models to complex real-world problems is not always satisfactory. Recently, artificial neural networks (ANNs) have been extensively studied and used in time series forecasting; Zhang et al. present a recent review of this area. ANNs are among the most accurate and widely used forecasting models and have enjoyed fruitful applications in forecasting social, economic, engineering, foreign exchange and stock problems, among others. Several distinguishing features of artificial neural networks make them valuable and attractive for forecasting tasks.
First, as opposed to traditional model-based methods, artificial neural networks are data-driven, self-adaptive methods that require few a priori assumptions about the models for the problems under study. Second, artificial neural networks can generalize: after learning from the data presented to them (a sample), ANNs can often correctly infer the unseen part of a population even if the sample data contain noise. Third, ANNs are universal function approximators: it has been shown that a network can approximate any continuous function to any desired accuracy. Finally, artificial neural networks are nonlinear. Traditional approaches to time series prediction, such as the Box-Jenkins or ARIMA methodology, assume that the time series under study is generated by a linear process; they may be inappropriate if the underlying mechanism is nonlinear. In fact, real-world systems are often nonlinear [Zhang et al.]. In this paper, auto-regressive integrated moving average models are used to construct a new hybrid model in order to yield a more accurate model than artificial neural networks alone. In our proposed model, the future value of a time series is considered a nonlinear function of several past observations and random errors. Therefore, in the first phase, an auto-regressive integrated moving average model is used to generate the necessary data from the time series under study. Then, in the second phase, a neural network is used to model the data generated by the ARIMA model and to predict the future values of the time series. Two well-known data sets, the Wolf's sunspot data and the Canadian lynx data, are used in this paper to show the appropriateness and effectiveness of the proposed model for time series forecasting.

Time Series Forecasting Methods

For more than half a century, auto-regressive integrated moving average (ARIMA) models have dominated many areas of time series forecasting. In an ARIMA(p,d,q) model, the future value of a variable is assumed to be a linear function of several past observations and random errors. That is, the underlying process that generates the time series y_t with mean mu has the form

    phi(B) (1 - B)^d (y_t - mu) = theta(B) eps_t,    (1)

where y_t and eps_t are the actual value and the random error at time period t, respectively, and

    phi(B) = 1 - phi_1 B - phi_2 B^2 - ... - phi_p B^p,
    theta(B) = 1 - theta_1 B - theta_2 B^2 - ... - theta_q B^q    (2)

are polynomials in B of degree p and q. Here phi_i (i = 1, 2, ..., p) and theta_j (j = 1, 2, ..., q) are model parameters, B is the backward shift operator (B y_t = y_{t-1}), p and q are integers often referred to as the orders of the model, and d is an integer often referred to as the order of differencing. The random errors eps_t are assumed to be independently and identically distributed with a mean of zero and a constant variance sigma^2.

The Box and Jenkins (1976) methodology includes three iterative steps: model identification, parameter estimation, and diagnostic checking. The basic idea of model identification is that if a time series is generated by an ARIMA process, it should have certain theoretical autocorrelation properties. By matching the empirical autocorrelation patterns with the theoretical ones, it is often possible to identify one or several potential models for the given time series. Box and Jenkins (1976) proposed using the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the sample data as the basic tools to identify the order of the ARIMA model. Other order selection methods have been proposed based on validity criteria and information-theoretic approaches, such as the Akaike information criterion (AIC) (Shibata, 1976) and the minimum description length (MDL) (Hurvich & Tsai, 1989; Jones, 1975; Ljung, 1987). In addition, in recent years approaches based on intelligent paradigms, such as neural networks (Hwang, 2001), genetic algorithms (Minerva & Poli, 2001; Ong, Huang, & Tzeng, 2005) or fuzzy systems (Haseyama & Kitajima, 2001), have been proposed to improve the accuracy of order selection for ARIMA models. In the identification step, data transformation is often required to make the time series stationary. Stationarity is a necessary condition for building an ARIMA model used for forecasting: a stationary time series is characterized by statistical properties such as the mean and the autocorrelation structure being constant over time.
When the observed time series exhibits trend and heteroscedasticity, differencing and power transformations are applied to the data to remove the trend and stabilize the variance before an ARIMA model is fitted. Once a tentative model is identified, estimation of the model parameters is straightforward. The parameters are estimated such that an overall measure of error is minimized, which can be accomplished with a nonlinear optimization procedure. The last step in model building is diagnostic checking of model adequacy: basically, checking whether the model assumptions about the errors are satisfied. Several diagnostic statistics and plots of the residuals can be used to examine the goodness of fit of the tentatively entertained model to the historical data. If the model is not adequate, a new tentative model should be identified, which is again followed by parameter estimation and model verification.

Diagnostic information may help suggest alternative model(s). This three-step model building process is typically repeated several times until a satisfactory model is finally selected. The final selected model can then be used for prediction purposes.
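As a concrete illustration of the estimation step, the sketch below fits a pure AR(p) model by ordinary least squares. This is a simplification: full ARIMA estimation also handles the MA terms via nonlinear optimization, as noted above. The function name `fit_ar` and the example are ours, for illustration only.

```python
import numpy as np

def fit_ar(y, p):
    """Estimate an AR(p) model y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} + e_t
    by ordinary least squares. Returns (intercept, phi coefficients, residuals)."""
    y = np.asarray(y, dtype=float)
    # Design matrix: row for time t holds [1, y_{t-1}, y_{t-2}, ..., y_{t-p}]
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - k:len(y) - k] for k in range(1, p + 1)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return coef[0], coef[1:], resid

# Illustrative check: recover the coefficients of a simulated AR(2) process
np.random.seed(0)
n = 5000
y = np.zeros(n)
eps = np.random.randn(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + eps[t]
c, phi, resid = fit_ar(y, 2)
```

On a long simulated series the OLS estimates land close to the true coefficients (0.6, -0.3), which is the sanity check one would run before trusting the estimation step on real data.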

The ANN Approach to Time Series Modeling

Recently, computational intelligence systems, and among them artificial neural networks (ANNs), which are in fact model-free dynamic systems, have been widely used for function approximation and forecasting. One of the most significant advantages of ANN models over other classes of nonlinear models is that ANNs are universal approximators that can approximate a large class of functions with a high degree of accuracy (Chen, Leung, & Hazem, 2003; Zhang & Min Qi, 2005). Their power comes from the parallel processing of the information in the data. No prior assumption about the model form is required in the model building process; instead, the network model is largely determined by the characteristics of the data. The single hidden layer feed-forward network is the most widely used form for time series modeling and forecasting (Zhang et al., 1998). The model is characterized by a network of three layers of simple processing units connected by acyclic links (Fig. 1). The relationship between the output y_t and the inputs (y_{t-1}, ..., y_{t-p}) has the following mathematical representation:

    y_t = w_0 + sum_{j=1..q} w_j g( w_{0j} + sum_{i=1..p} w_{ij} y_{t-i} ) + eps_t,    (3)

where w_{ij} (i = 0, 1, ..., p; j = 1, ..., q) and w_j (j = 0, 1, ..., q) are model parameters, often called connection weights; p is the number of input nodes; and q is the number of hidden nodes. The activation function g can take several forms, and its type depends on the position of the neuron within the network. In the majority of cases input layer neurons do not have an activation function, as their role is to transfer the inputs to the hidden layer. The most widely used activation function for the output layer is the linear function, since a nonlinear output activation may introduce distortion in the predicted output. The logistic and hyperbolic tangent functions, shown in Eqs. (4) and (5) respectively, are often used as hidden layer transfer functions; other activation functions, such as linear and quadratic, can also be used, each with a variety of modeling applications.

    g(x) = 1 / (1 + exp(-x)),    (4)

    g(x) = (1 - exp(-2x)) / (1 + exp(-2x)).    (5)

Hence, the ANN model of Eq. (3) in fact performs a nonlinear functional mapping from past observations to the future value y_t, i.e.,

    y_t = f(y_{t-1}, y_{t-2}, ..., y_{t-p}; w) + eps_t,    (6)
where w is a vector of all parameters and f(.) is a function determined by the network structure and the connection weights. Thus, the neural network is equivalent to a nonlinear auto-regressive model. The simple network given by Eq. (3) is surprisingly powerful in that it can approximate arbitrary functions when the number of hidden nodes q is sufficiently large. In practice, a simple network structure with a small number of hidden nodes often works well in out-of-sample forecasting, which may be due to the overfitting effect typically found in the neural network modeling process. An overfitted model fits the sample used for model building well but generalizes poorly to data outside the sample (Demuth & Beale, 2004). There are some similarities between ARIMA and ANN models. Both include a rich class of different models with different model orders. Data transformation is often necessary to get the best results, and a relatively large sample is required to build a successful model. The iterative, experimental nature is common to both modeling processes, and subjective judgment is sometimes needed in implementing the model. Because of the potential overfitting effect with both models, parsimony is often a guiding principle in choosing an appropriate model for forecasting.

Overview of Extreme Learning Machine

Gradient descent based algorithms require all the weights to be updated after every iteration, so they are generally slow and may easily converge to local minima. In contrast, ELM, proposed by Huang et al. in 2004, randomly assigns the weights connecting the input and hidden layers, as well as the hidden biases, and then analytically determines the output weights using the Moore-Penrose generalized inverse. It has been proved that, given randomly assigned input weights and hidden biases with almost any nonzero activation function, such a network can approximate any continuous function on compact sets.
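The single-hidden-layer mapping of Eq. (3), which both the ANN and the ELM share, can be written down directly. The sketch below is a literal transcription with a logistic hidden layer and a linear output neuron; the function names are ours, for illustration.

```python
import numpy as np

def logistic(x):
    # Eq. (4): the logistic hidden-layer activation
    return 1.0 / (1.0 + np.exp(-x))

def slfn_forecast(history, W_in, b_hidden, w_out, b_out):
    """One-step forecast of y_t from the last p observations, per Eq. (3).

    history:  array of the p most recent values
    W_in:     (q, p) input-to-hidden connection weights
    b_hidden: (q,) hidden-node biases
    w_out:    (q,) hidden-to-output connection weights
    b_out:    scalar output bias (linear output neuron)"""
    hidden = logistic(W_in @ history + b_hidden)
    return float(w_out @ hidden + b_out)
```

With all weights set to zero the hidden layer outputs logistic(0) = 0.5 everywhere and the forecast collapses to the output bias, a quick way to verify the wiring before training.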
Unlike the traditional algorithms, ELM not only achieves the minimum error but also assigns the smallest norm to the output weights. The reason for using the Moore-Penrose inverse is that, according to Bartlett's theory, a smaller norm of the weights results in better generalization of the feedforward neural network. The advantages of ELM over traditional algorithms are as follows: ELM can be up to 1000 times faster than the traditional algorithms; ELM has better generalization performance, as it not only reaches the smallest error but also assigns the smallest norm of weights; and non-differentiable functions can also be used to train SLFNs with the ELM learning algorithm.

Given a training set of N samples (x_j, t_j), an activation function g(x), and hidden node number N~, the output of the network can be written as

    sum_{i=1..N~} beta_i g(w_i . x_j + b_i) = o_j,    j = 1, ..., N,

where w_i = [w_{i1}, ..., w_{in}]^T is the weight vector connecting the ith hidden neuron and the input neurons, beta_i = [beta_{i1}, ..., beta_{im}]^T is the weight vector connecting the ith hidden neuron and the output neurons, and b_i is the bias of the ith hidden neuron. Here t_j denotes the target vector of the input x_j, o_j denotes the output vector obtained from the neural network, and w_i . x_j denotes the inner product of w_i and x_j. The output neurons are chosen to be linear.

That standard SLFNs with N~ hidden neurons and activation function g(x) can approximate these N samples with zero error means that sum_j ||o_j - t_j|| = 0, i.e., there exist beta_i, w_i and b_i such that

    sum_{i=1..N~} beta_i g(w_i . x_j + b_i) = t_j,    j = 1, ..., N.

The above N equations can be written compactly as

    H beta = T,

where H is the N x N~ hidden layer output matrix

    H = [ g(w_1 . x_1 + b_1)  ...  g(w_N~ . x_1 + b_N~)
          ...                 ...  ...
          g(w_1 . x_N + b_1)  ...  g(w_N~ . x_N + b_N~) ],

    beta = [beta_1^T; ...; beta_N~^T]  and  T = [t_1^T; ...; t_N^T].

The smallest-norm least-squares solution of the above linear system is

    beta_hat = H† T,

where H† is the Moore-Penrose generalized inverse of H. The algorithm for the ELM, with the architecture shown in Fig. 1, can be summarized as follows. Given N training samples, an activation function g(x) and a number of hidden neurons N~:
1. Assign random input weights w_i and biases b_i.
2. Calculate the hidden layer output matrix H.
3. Calculate the output weights beta_hat = H† T.
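The three steps above can be sketched numerically in a few lines, assuming a logistic activation and using NumPy's `pinv` for the Moore-Penrose inverse. The function names are illustrative, not from any library described in the paper.

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=None):
    """Train an ELM on inputs X (N, n) and targets T (N, m).

    Step 1: random input weights W and hidden biases b.
    Step 2: hidden layer output matrix H (logistic activation).
    Step 3: output weights beta = pinv(H) @ T (Moore-Penrose inverse)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, X.shape[1]))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Linear output neurons applied to the hidden-layer activations."""
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return H @ beta

# Illustrative usage: regress a smooth function with 30 random hidden nodes
X = np.linspace(0.0, np.pi, 200).reshape(-1, 1)
T = np.sin(X)
W, b, beta = elm_train(X, T, n_hidden=30, seed=0)
pred = elm_predict(X, W, b, beta)
```

Note that nothing is iterated: the only "training" is one pseudo-inverse solve, which is where the claimed speed advantage over gradient descent comes from.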

The Hybrid Methodology

Both ARIMA and ANN models have achieved success in their own linear or nonlinear domains, but neither is a universal model suitable for all circumstances. The approximation of ARIMA models to complex nonlinear problems may not be adequate. On the other hand, using ANNs to model linear problems has yielded mixed results. For example, using simulated data, Denton showed that when there are outliers or multicollinearity in the data, neural networks can significantly outperform linear regression models. Markham and Rakes also found that the performance of ANNs on linear regression problems depends on the sample size and the noise level. Hence, it is not wise to apply ANNs blindly to any type of data. Since it is difficult to completely know the characteristics of the data in a real problem, a hybrid methodology that has both linear and nonlinear modeling capabilities can be a good strategy for practical use. By combining different models, different aspects of the underlying patterns may be captured. It may be reasonable to consider a time series to be composed of a linear autocorrelation structure and a nonlinear component, that is,

    y_t = L_t + N_t,

where L_t denotes the linear component and N_t denotes the nonlinear component. These two components have to be estimated from the data. First, we let an ARIMA model capture the linear component; the residuals from the linear model will then contain only the nonlinear relationship. Let e_t denote the residual at time t from the linear model:

    e_t = y_t - L_hat_t,

where L_hat_t is the forecast value for time t from the estimated relationship (1). Residuals are important in diagnosing the sufficiency of linear models: a linear model is not sufficient if linear correlation structures remain in the residuals. However, residual analysis is not able to detect nonlinear patterns in the data, and in fact there is currently no general diagnostic statistic for nonlinear autocorrelation relationships. Therefore, even if a model has passed diagnostic checking, it may still be inadequate in that nonlinear relationships have not been appropriately modeled. Any significant nonlinear pattern in the residuals indicates a limitation of the ARIMA model. By modeling the residuals using ANNs, nonlinear relationships can be discovered. With n input nodes, the ANN model for the residuals is

    e_t = f(e_{t-1}, e_{t-2}, ..., e_{t-n}) + eps_t,    (7)

where f is a nonlinear function determined by the neural network and eps_t is the random error. Note that if the model f is not appropriate, the error term is not necessarily random, so correct model identification is critical. Denoting the forecast from (7) by N_hat_t, the combined forecast is

    y_hat_t = L_hat_t + N_hat_t.

In summary, the proposed hybrid methodology consists of two steps. In the first step, an ARIMA model is used to analyze the linear part of the problem. In the second step, a neural network model is developed to model the residuals from the ARIMA model. Since the ARIMA model cannot capture the nonlinear structure of the data, the residuals of the linear model will contain information about the nonlinearity, and the results from the neural network can be used as predictions of the error terms of the ARIMA model. The hybrid model thus exploits the unique features and strengths of both the ARIMA model and the ANN model in capturing different patterns. It can therefore be advantageous to model the linear and nonlinear patterns separately using different models and then combine the forecasts to improve overall modeling and forecasting performance. As previously mentioned, in building ARIMA as well as ANN models, subjective judgment of the model order and the model adequacy is often needed, so it is possible that suboptimal models are used in the hybrid method. For example, the current practice of the Box-Jenkins methodology focuses on low-order autocorrelation: a model is considered adequate if the low-order autocorrelations are not significant, even though significant autocorrelations of higher order may still exist. This suboptimality need not affect the usefulness of the hybrid model; Granger has pointed out that for a hybrid model to produce superior forecasts, the component models should be suboptimal. In general, it has been observed that it is more effective to combine individual forecasts that are based on different information sets.
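The two-step procedure above can be sketched end-to-end. In this sketch the linear stage is a pure AR model fitted by least squares and the nonlinear stage is an ELM on the residuals; both are simplifications of the paper's full ARIMA and network models, and all names (`lagged`, `hybrid_forecast`) are ours, for illustration.

```python
import numpy as np

def lagged(v, p):
    """Rows of p consecutive values and the following value as the target."""
    X = np.column_stack([v[i:len(v) - p + i] for i in range(p)])
    return X, v[p:]

def hybrid_forecast(y, p_lin=4, p_res=3, n_hidden=20, seed=0):
    """Two-stage hybrid: AR(p_lin) for the linear part L_t, then an ELM on the
    AR residuals for the nonlinear part N_t. Returns in-sample combined
    one-step forecasts aligned with y[p_lin + p_res:]."""
    y = np.asarray(y, dtype=float)
    # Stage 1: linear forecasts L_hat_t from an AR model fitted by OLS
    Xl, tl = lagged(y, p_lin)
    A = np.column_stack([np.ones(len(tl)), Xl])
    cl, *_ = np.linalg.lstsq(A, tl, rcond=None)
    L_hat = A @ cl
    e = tl - L_hat                          # residuals e_t = y_t - L_hat_t
    # Stage 2: ELM models e_t from lagged residuals, as in Eq. (7)
    Xe, te = lagged(e, p_res)
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (n_hidden, p_res))
    b = rng.uniform(-1, 1, n_hidden)
    H = 1.0 / (1.0 + np.exp(-(Xe @ W.T + b)))
    N_hat = H @ (np.linalg.pinv(H) @ te)    # nonlinear forecasts N_hat_t
    # Combined forecast y_hat_t = L_hat_t + N_hat_t
    return L_hat[p_res:] + N_hat

# Illustrative usage on a synthetic series with a mild nonlinear term
np.random.seed(1)
n = 800
y = np.zeros(n)
noise = np.random.randn(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.2 * y[t - 2] + 0.3 * np.sin(2 * y[t - 1]) + 0.1 * noise[t]
f = hybrid_forecast(y)
```

The point of the sketch is the wiring, not the accuracy figures: the linear stage sees the raw series, the nonlinear stage sees only what the linear stage left behind.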

Empirical Results

1) Data Sets. Two well-known data sets, the Wolf's sunspot data and the Canadian lynx data, are used in this study to demonstrate the effectiveness of the hybrid method. These time series come from different areas and have different statistical characteristics. They have been widely studied in the statistical as well as the neural network literature; both linear and nonlinear models have been applied to these data sets, and varying degrees of nonlinearity have been found in these series.

The sunspot data we consider contain the annual number of sunspots from 1700 to 2012, giving a total of 313 observations. The study of sunspot activity has practical importance for geophysicists, environmental scientists, and climatologists. The series is regarded as nonlinear and non-Gaussian and is often used to evaluate the effectiveness of nonlinear models. The plot of this time series (see Fig. 1) also suggests a cyclical pattern with a mean cycle of about 11 years. The sunspot data have been extensively studied with a vast variety of linear and nonlinear time series models, including ARIMA and ANNs. The lynx series contains the number of lynx trapped per year in the Mackenzie River district of Northern Canada. It shows a periodicity of approximately 10 years. The data set has 114 observations,

corresponding to the period 1821-1934. It has also been extensively analyzed in the time series literature, with a focus on nonlinear modeling. Following other studies, the logarithms (to base 10) of the data are used in the analysis. To assess the forecasting performance of the different models, each data set is divided into a training sample and a test sample. The training set is used exclusively for model development, and the test sample is then used to evaluate the established model. The data compositions for the two data sets are given in Table 1.

2) Results. Only one-step-ahead forecasting is considered, and the mean squared error (MSE) is selected as the forecasting accuracy measure. Table 1 gives the forecasting results for the sunspot data. A subset autoregressive model of order 9 was found to be the most parsimonious among all ARIMA models judged adequate by residual analysis. The neural network and the Extreme Learning Machine used have a 2-3-1 architecture, chosen experimentally by comparing mean squared errors. A forecast horizon of 67 periods is used. The results show that applying neural networks alone does not improve forecasting accuracy over the ARIMA model in the 67-period horizon, whereas the hybrid model shows that by combining the two models the overall forecasting errors can be significantly reduced. The comparison between the actual values and the forecast values for the 67 out-of-sample points is given in the figure below. Although at some data points the hybrid model gives worse predictions than either the ARIMA or the ELM forecasts, its overall forecasting capability is improved.

Table 1: Sunspot data, 67 points ahead

Model                   MSE
ARIMA                   421.73
ANN                     512.02
ELM                     517.64
Hybrid (ARIMA+ELM)      403.57
Hybrid (ARIMA+ANN)      401.69
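The accuracy measure used in both tables is straightforward to reproduce from a set of forecasts; a trivial but explicit sketch:

```python
import numpy as np

def mse(actual, forecast):
    """Mean squared error, the accuracy measure used for all models here."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean((actual - forecast) ** 2))
```

For example, `mse([1, 2, 3], [1, 2, 5])` averages the squared errors (0, 0, 4) to give 4/3.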

In a similar fashion, we have fitted the Canadian lynx data with a subset AR model of order 12. This is a parsimonious model also used by Subba Rao and Gabr, among others. The overall forecasting results for the last 14 years are summarized in Table 2. A neural network / Extreme Learning Machine structure of 2-2-1 is not able to improve on the results of the ARIMA model. Applying the hybrid method, we find a significant decrease in the MSE on this data. The figure below shows the actual vs. forecast values for the individual Extreme Learning Machine and ARIMA models as well as for the combined model.

Table 2: Lynx data, 14 points ahead

Model                   MSE
ARIMA                   0.128
ANN                     0.148
ELM                     0.154
Hybrid (ARIMA+ELM)      0.118
Hybrid (ARIMA+ANN)      0.123

Conclusion

Applying quantitative methods to forecasting and to assisting investment decision making has become more indispensable in business practice than ever before. Time series forecasting is one of the most important quantitative approaches and has received a considerable amount of attention in the literature. Artificial neural networks (ANNs) have been shown to be an effective, general-purpose approach for pattern recognition, classification, clustering, and especially time series prediction with a high degree of accuracy. Nevertheless, their performance is not always satisfactory. Theoretical as well as empirical evidence in the literature suggests that by using dissimilar models, or models that disagree with each other strongly, a hybrid model will have lower generalization variance or error. Additionally, because of possibly unstable or changing patterns in the data, the hybrid method can reduce the model uncertainty that typically occurs in statistical inference and time series forecasting. In this paper, auto-regressive integrated moving average models and ELM models are combined into a new hybrid method for improving the performance of artificial neural networks in time series forecasting. In our proposed model, based on the Box-Jenkins methodology for linear modeling, a time series is considered a nonlinear function of several past observations and random errors. Therefore, in the first stage, an auto-regressive integrated moving average model is used to generate the necessary data, and then an Extreme Learning Machine is used to determine a model that captures the underlying data generating process and predicts the future, using the preprocessed data. Empirical results with two well-known real data sets indicate that the proposed model can be an effective way to yield a more accurate model than traditional artificial neural networks, and thus it can be used as an appropriate alternative to artificial neural networks.

References
[1] J.M. Bates, C.W.J. Granger, The combination of forecasts, Oper. Res. Q., 20 (1969), pp. 451-468.
[2] G.E.P. Box, G. Jenkins, Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco, CA (1970).
[3] M.J. Campbell, A.M. Walker, A survey of statistical work on the MacKenzie River series of annual Canadian lynx trappings for the years 1821-1934, and a new analysis, J. R. Statist. Soc. Ser. A, 140 (1977), pp. 411-431.
[4] F.C. Palm, A. Zellner, To combine or not to combine? Issues of combining forecasts, J. Forecasting, 11 (1992), pp. 687-701.
[5] Z. Tang, C. Almeida, P.A. Fishwick, Time series forecasting using neural networks vs Box-Jenkins methodology, Simulation, 57 (1991), pp. 303-310.
[6] G.B. Huang, Q. Zhu, C. Siew, Extreme Learning Machine: a new learning scheme of feedforward neural networks, International Joint Conference on Neural Networks (2004), Vol. 2, pp. 985-990.
[7] G.B. Huang, Q. Zhu, C. Siew, Extreme Learning Machine: theory and applications, Neurocomputing, 70 (2006), pp. 489-501.
[8] G.B. Huang, H. Zhou, X. Ding, R. Zhang, Extreme Learning Machine for regression and multiclass classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 42 (2012), No. 2.
