
SYNOPSIS

1 Group Id
Mention Group ID

2 Project Title
Stock Prediction using ARIMA Model

3 Project Option
Please mention the type: industry sponsored, entrepreneur, or internal project

4 Internal Guide
Prof. Internal Guide Name

5 Sponsorship and External Guide


Please mention sponsorship details, if any

6 Technical Keywords (As per ACM Keywords)


Please note: ACM keywords can be found at
http://www.acm.org/about/class/ccs98-html
An example is given below:

1. C. Computer Systems Organization

   (a) C.2 COMPUTER-COMMUNICATION NETWORKS

       i. C.2.4 Distributed Systems
          A. Client/server
          B. Distributed applications
          C. Distributed databases
          D. Network operating systems
          E. Distributed file systems
          F. Security and reliability issues in distributed applications
7 Problem Statement
To predict stock prices through time series analysis using an ARIMA model

8 Abstract
 As ever more activity moves online and becomes automated, this project applies the same trend to stock prediction, drawing on the knowledge we have and the support of our guide.
 Stock price prediction is an important topic in finance and economics that has spurred the interest of researchers over the years to develop better predictive models. Autoregressive integrated moving average (ARIMA) models have been explored in the literature for time series prediction.

9 Goals and Objectives


• Objectives

10 Relevant mathematics associated with the Project


System Description:
• Input:
• Output:

Identify data structures, classes, divide and conquer strategies to exploit
distributed/parallel/concurrent processing, constraints.
Functions:
• Identify objects, morphisms, overloading in functions, functional relations
• Mathematical formulation if possible
• Success Conditions:
• Failure Conditions:

11 Names of Conferences / Journals where papers can be published

 Kalid Yunus, Torbjorn Thiringer, and Peiyuan Chen, "ARIMA-Based Frequency-Decomposed Modeling of Wind Speed Time Series", IEEE Transactions on Power Systems, 31(4), 2546-2556, 2016.

 Y. S. Lee and L. I. Tong, "Forecasting Time Series Using a Methodology Based on Autoregressive Integrated Moving Average and Genetic Programming", Knowledge-Based Systems, 24, 66-72, 2011.

 G. Liu, D. Zhang, and T. Zhang, "Software Reliability Forecasting: Singular Spectrum Analysis and ARIMA Hybrid Model", 2015 International Symposium on Theoretical Aspects of Software Engineering (TASE), Nanjing, 111-118, 2015.

 A. Vaccaro, T. H. M. EL-Fouly, C. A. Cañizares, and K. Bhattacharya, "Local Learning-ARIMA Adaptive Hybrid Architecture for Hourly Electricity Price Forecasting", 2015 IEEE Eindhoven PowerTech, 1-6, 2015.

 Takaomi Hirata, Takashi Kuremoto, Masanao Obayashi, and Shingo Mabu, "Time Series Prediction Using DBN and ARIMA", International Conference on Computer Application Technologies, IEEE, 2015.

 Sornpon Wichaidit and Surin Kittitornkun, "Predicting SET50 Stock Prices Using CARIMA", International Computer Science and Engineering Conference (ICSEC), 2015.

 W. Jacobs, A. M. Souza, and R. R. Zanini, "Combination of Box-Jenkins and MLP/RNA Models for Forecasting", IEEE Latin America Transactions, 14(4), 2016.

 S. Li, H. Wang, Y. Tian, Y. Shen, and A. Aitouche, "Wind Speed Forecasting Based on Fuzzy-Neural Network Combination Method", The 27th Chinese Control and Decision Conference (2015 CCDC), Qingdao, 4811-4816, 2015.

 http://shodhganga.inflibnet.ac.in/bitstream/10603/661/14/08_chapter%203.pdf, Chapter 3, Review of Literature.

 https://www.quora.com/What-is-the-importance-and-usage-of-Time-Series

 Sheikh Mohammad Idrees, M. Afshar Alam, and Parul Agarwal, "A Prediction Approach for Stock Market Volatility Based on Time Series Data", IEEE Access, 7, 17287-17298, 2019.

 Manuel Romano Barbosa and Antonio M. Lopes, "Temperature Time Series: Pattern Analysis and Forecasting", 2017 4th Experiment@ International Conference (exp.at'17), IEEE, 2017.

 Ksenzovets Dmytro, Telenyk Sergii, and Pysarenko Andiy, "ARIMA Forecast Models for Scheduling Usage of Resources in IT-Infrastructure", 2017 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), Vol. 1, IEEE, 2017.

 Sima Siami-Namini, Neda Tavakoli, and Akbar Siami Namin, "A Comparison of ARIMA and LSTM in Forecasting Time Series", 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, 2018.

 Bo Wang et al., "The Forecast on the Customers of the Member Point Platform Built on the Blockchain Technology by ARIMA and LSTM", 2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), IEEE, 2018.

 Tian Ye, "Stock Forecasting Method Based on Wavelet Analysis and ARIMA-SVR Model", 2017 3rd International Conference on Information Management (ICIM), IEEE, 2017.

 www.wikipedia.com

12 Review of Conference/Journal Papers supporting Project idea

INTRODUCTION TO TIME SERIES DATA ANALYSIS

1.1 Introduction to Time Series Data:

A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a
time series is a sequence taken at successive equally spaced points in time. Thus, it is a sequence of discrete-
time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of
the Dow Jones Industrial Average.

Time series are very frequently plotted via line charts. Time series are used in statistics, signal
processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake
prediction, electroencephalography, control engineering, astronomy, communications engineering, and largely
in any domain of applied science and engineering which involves temporal measurements.

Time series analysis comprises methods for analyzing time series data in order to extract meaningful
statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future
values based on previously observed values. While regression analysis is often employed in such a way as to
test theories that the current values of one or more independent time series affect the current value of another
time series, this type of analysis of time series is not called "time series analysis", which focuses on comparing
values of a single time series or multiple dependent time series at different points in time. Interrupted time
series analysis is the analysis of interventions on a single time series.

Time series data have a natural temporal ordering. This makes time series analysis distinct from cross-
sectional studies, in which there is no natural ordering of the observations (e.g. explaining people's wages by
reference to their respective education levels, where the individuals' data could be entered in any order). Time
series analysis is also distinct from spatial data analysis where the observations typically relate to geographical
locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses).
A stochastic model for a time series will generally reflect the fact that observations close together in time will
be more closely related than observations further apart. In addition, time series models will often make use of
the natural one-way ordering of time so that values for a given period will be expressed as deriving in some
way from past values, rather than from future values (see time reversibility.)

Time series analysis can be applied to real-valued, continuous data, discrete numeric data, or discrete symbolic data (i.e. sequences of characters, such as letters and words in the English language).
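As a small, self-contained illustration of the definition above, a discrete-time series can be represented as values indexed by equally spaced time stamps. The closing values below are invented, not real market data:

```python
from datetime import date, timedelta

# A minimal discrete-time series: daily "closing values" listed in time order.
# The numbers are illustrative only.
start = date(2020, 1, 1)
closes = [100.0, 101.5, 101.2, 102.8, 103.1]

# Index each value by its (equally spaced) time stamp.
series = {start + timedelta(days=i): v for i, v in enumerate(closes)}

# The natural temporal ordering lets us walk the data in time order.
for day, value in sorted(series.items()):
    print(day, value)
```

Unlike cross-sectional data, the order of these observations carries information, which is what time series analysis exploits.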

1.2 Motivation behind Time Series Analysis:


We come across many instances in real life where we try to forecast what will happen in the future (tomorrow, next week, next month, next year, or perhaps years ahead). A few common examples are:

1. What will the stock price be after a month?

2. What revenue will the business make next year?

3. What will the air temperature be tomorrow? ...and many more.

If you analyze these questions, the one factor common to all of them is TIME. Thus, when data is recorded at regular points in time it is called a time series, and analysis of this data is known as time series analysis.

1.3 Time Series Data Analysis Using ARIMA:

ARIMA models are, in theory, the most general class of models for forecasting a time series
which can be made to be “stationary” by differencing (if necessary), perhaps in conjunction with nonlinear
transformations such as logging or deflating (if necessary). A random variable that is a time series is stationary
if its statistical properties are all constant over time. A stationary series has no trend, its variations around its
mean have a constant amplitude, and it wiggles in a consistent fashion, i.e., its short-term random time patterns
always look the same in a statistical sense. The latter condition means that its autocorrelations (correlations
with its own prior deviations from the mean) remain constant over time, or equivalently, that its power
spectrum remains constant over time. A random variable of this form can be viewed (as usual) as a
combination of signal and noise, and the signal (if one is apparent) could be a pattern of fast or slow mean
reversion, or sinusoidal oscillation, or rapid alternation in sign, and it could also have a seasonal component.
An ARIMA model can be viewed as a “filter” that tries to separate the signal from the noise, and the signal is
then extrapolated into the future to obtain forecasts.
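The idea that a stationary series has constant statistical properties can be checked crudely in code. The sketch below (pure Python, with made-up series) compares the mean and variance of the two halves of a series; in practice one would use a formal unit-root test such as the Augmented Dickey-Fuller test:

```python
import statistics

def halves_stats(series):
    """Return (mean, variance) for the first and second halves of a series.

    For a stationary series both halves should look alike; a trending
    series fails this crude check.
    """
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return ((statistics.mean(first), statistics.pvariance(first)),
            (statistics.mean(second), statistics.pvariance(second)))

# A mean-reverting, oscillating series: stationary in mean and variance.
oscillating = [(-1) ** t * 1.0 for t in range(100)]
# A linear trend: the mean keeps growing, so the series is non-stationary.
trending = [0.5 * t for t in range(100)]

osc_a, osc_b = halves_stats(oscillating)
tr_a, tr_b = halves_stats(trending)
print("oscillating halves:", osc_a, osc_b)   # nearly identical
print("trending halves:", tr_a, tr_b)        # means differ sharply
```

The oscillating series "wiggles in a consistent fashion" in the sense described above, while the trending series would need differencing before ARIMA modelling.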

The ARIMA forecasting equation for a stationary time series is a linear (i.e., regression-type)
equation in which the predictors consist of lags of the dependent variable and/or lags of the forecast errors.
That is:

Predicted value of Y = a constant and/or a weighted sum of one or more recent values of Y and/or
a weighted sum of one or more recent values of the errors.
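This regression-type equation can be sketched in its simplest form, an AR(1) model where the only predictor is one lag of the dependent variable. The snippet below fits it by ordinary least squares on synthetic, noise-free data (the coefficients c = 2.0 and phi = 0.5 are invented for illustration); a real project would use a library such as statsmodels, which also handles the moving-average and differencing parts of ARIMA:

```python
def fit_ar1(y):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t."""
    x, t = y[:-1], y[1:]           # lagged predictors and current targets
    n = len(x)
    mx, mt = sum(x) / n, sum(t) / n
    phi = (sum((a - mx) * (b - mt) for a, b in zip(x, t))
           / sum((a - mx) ** 2 for a in x))
    c = mt - phi * mx
    return c, phi

# Generate data from y_t = 2.0 + 0.5 * y_{t-1}, starting away from the
# fixed point (4.0) so the series actually varies.
y = [10.0]
for _ in range(50):
    y.append(2.0 + 0.5 * y[-1])

c, phi = fit_ar1(y)
next_forecast = c + phi * y[-1]    # one-step-ahead prediction
print(round(c, 4), round(phi, 4), round(next_forecast, 4))
```

On noise-free data the fit recovers the generating coefficients almost exactly; with real, noisy series the estimates only approximate them, and lags of the forecast errors (the MA terms) enter the equation as well.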

Introduction to Time Series Data Analysis:

Time series analysis is a statistical technique that deals with time series data, or trend analysis. Time series data means that the data is in a series of particular time periods or intervals. A time series can be described as:

 A set of data depending on time

 A series of values over a period of time

 A collection of magnitudes belonging to different time periods of some variable, or a composite of variables such as production of steel, per capita income, gross national income, price of tobacco, or an index of industrial production.

Prediction of time series data is a challenging task, mainly due to unprecedented changes in economic trends and conditions on the one hand and incomplete information on the other. Market volatility in recent years has raised serious concerns for economic and financial time series forecasting. Therefore, assessing the accuracy of forecasts is necessary when employing various forecasting methods, and more specifically when forecasting using regression analysis, as such methods have several limitations in application.

Data is considered to be of three types:

 Time series data: A set of observations on the values that a variable takes at different times.

 Cross-sectional data: Data of one or more variables, collected at the same point in time.

 Pooled data: A combination of time series data and cross-sectional data.

Terms and concepts:

 Dependence: Dependence refers to the association of two observations with the same variable,
at prior time points.

 Stationarity: Shows the mean value of the series that remains constant over a time period; if
past effects accumulate and the values increase toward infinity, then stationarity is not met.

 Differencing: Used to make the series stationary, to de-trend, and to control the autocorrelations; however, some time series analyses do not require differencing, and over-differenced series can produce inaccurate estimates.

 Specification: May involve the testing of linear or non-linear relationships of dependent variables by using models such as ARIMA, ARCH, GARCH, VAR, Co-integration, etc.

 Exponential smoothing in time series analysis: This method predicts the next period's value based on past and current values. It involves averaging the data such that the non-systematic components of each individual case or observation cancel each other out. The exponential smoothing method is used for short-term prediction. Alpha, Gamma, Phi, and Delta are the parameters that estimate the effect of the time series data: Alpha is used when seasonality is not present in the data, Gamma when a series has a trend, and Delta when seasonality cycles are present. A model is applied according to the pattern of the data.

 Curve fitting in time series analysis: Curve-fitting regression is used when the data is in a non-linear relationship, e.g. Y = f(case), where Y is the dependent variable and case is the sequential case number. Curve fitting can be performed by selecting "Regression" from the analysis menu and then "Curve Estimation" from the regression options, and choosing the desired curve: linear, power, quadratic, cubic, inverse, logistic, exponential, or other.
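The differencing concept listed among the terms above can be sketched in a few lines of Python (the trending series is invented for illustration):

```python
def difference(series, lag=1):
    """First differencing (the 'I' in ARIMA): replace each value with its
    change from `lag` periods earlier. Returns a series shorter by `lag`."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# A linear trend: non-stationary in the mean.
trend = [5.0 + 2.0 * t for t in range(10)]

diffed = difference(trend)
print(diffed)   # every element equals the slope, 2.0
```

After one difference the trend is removed and the series is constant (stationary); differencing it again would over-difference, which, as noted above, can produce inaccurate estimates.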

Chapter 2

Literature survey of Time Series Data Analysis

Two important aspects of time series data mining can be identified as forecasting and classification. Time-series forecasting has been performed predominantly using statistics-based methods, for example the linear autoregressive (AR) models, because of their flexibility to model many stationary processes. These include the well-known ARIMA (autoregressive integrated moving average) model and its extensions by Weron and Misiorek for short-term time series forecasting. The ARIMA model assumes a linear relationship between the lagged variables and produces only a coarse approximation to real-world complex systems, and it generally fails to accurately predict the evolution of nonlinear and non-stationary processes.[11]

ARIMA model performance frequently degrades considerably whenever time trends and seasonality features are present in highly fluctuating time series data. Methods such as the linear autoregressive integrated moving average (ARIMA), which are based on the evolution of the increments, are used at times to remove or reduce first-order (moment) non-stationarity. However, differencing generally amplifies the high-frequency noise in the time series, and great effort is thus required to determine the order of an ARIMA model. Also, ARIMA models are largely limited to capturing the first-order non-stationarity in time series data. Engle introduced the ARCH (autoregressive conditional heteroskedasticity) model to capture the second-order (moment) non-stationarity, i.e., time-varying conditional variance or volatility.[11]

The GARCH (generalized autoregressive conditional heteroskedasticity) model, developed later, represents the variance of the error term as a function of its autoregressive terms, thereby allowing a more parsimonious representation of the time series. Further, threshold nonlinear ARIMA models (TAR) were proposed by Tong and successfully applied for time series forecasting in economics and neuroscience, among other fields. Krishnamurthy and Yin combined a hidden Markov model and AR models under a Markov regime, where AR parameters switch in time according to the realization of a finite-state Markov chain, for nonlinear time series forecasting. However, most of these methods tend to be limited for nonlinear and non-stationary time series forecasting by the local linearity assumption implicit in an AR-type structure. Over the past few decades, artificial neural networks (ANNs), which exhibit superior performance on classification and regression problems in the machine learning domain, have attracted tremendous attention in the time series forecasting community. Compared to statistics-based forecasting techniques, neural network approaches have
several unique characteristics, including: 1) being both nonlinear and data driven; 2) having no requirement for
an explicit underlying model (nonparametric); and 3) being more flexible and universal, thus applicable to
more complicated models. Thus, neural networks have been used extensively for a wide range of applications
in time series forecasting varying from financial, economic, to energy systems, earthquakes, and weather.
These models do not need prior assumptions on the form of nonlinearity and are known as universal
approximators since they can approximate any continuous function to an arbitrary precision. A recent review of
NN models for time series forecasting has been provided by Zhang. Feed-forward Neural Network models
(FNNs) parameterized with a back-propagation algorithm have been employed for nonlinear time series
forecasting. They are known to outperform traditional statistical methods such as regression and Box–Jenkins
approaches in functional approximation, but they assume the dynamics underlying time series are time-
invariant. FNNs with recurrent feedback connections have also been attempted for time series forecasting. Such
dynamic Recurrent NN (RNN) models allow forecasting of nonlinear time series occurring in various fields.
Menezes and Barreto built a recurrent network structure of nonlinear AR models with exogenous input for
multi-step forecasting of chaotic time series. Various types of Radial Basis Function (RBF) NN models, such
as those employing dynamic regularization, orthogonal least squares learning, and recursion have been
investigated to capture different forms of trends and volatility in the time series. Barreto et al reviewed time
series forecasting approaches using Self-Organizing Map (SOM) NN models. The local approximation
property inherent in these models can improve the forecasting accuracy of nonlinear time series compared to
global models such as FNN. Additionally, they obviate the need to specify the number of neurons in advance
by allowing the network architecture to grow based on the data. Ensemble or hybrid NN models, such as
wavelet NN models, have also been attempted for nonlinear time series forecasting. SVM-based forecasting
methods use a class of generalized regression models, such as Support Vector Regression (SVR) and Least-
Squares Support Vector Machines (LSSVMs), that are parameterized using convex quadratic programming
methods. SVMs are categorized into linear, Gaussian or RBF, polynomial, and multilayer perceptron
classifiers. A linear regressor is then constructed by minimizing the structural risk minimization (the upper
bound of the generalization error), leading to better forecasting performance than conventional techniques.
Recently, extreme learning machine (ELM), a new type of neural network has been introduced for regression
and classification problems. Though ANN has been found to be a successful forecasting tool in a large number of applications, it suffers from limitations such as being a black-box technique, overfitting, and getting trapped in local minima. To overcome these limitations, researchers have preferred hybrid techniques to develop efficient
forecasting model. A combination of wavelet and Takagi Sugeno Kang (TSK) fuzzy rules-based system is
applied to predict financial time series data of Taiwan stock market. Fuzzy logic theory is preferred by many
researchers because it is an efficient tool to handle uncertainties. A fuzzy time series method based on a
multiple period modified equation derived from adaptive expectation model is used to forecast the Taiwan
stock exchange. A fuzzy neural network is used to forecast financial time series, where a genetic algorithm and a gradient descent learning algorithm are used alternately in an iterative manner to adjust the parameters until the error is less than the required value. A hybrid neuro-fuzzy architecture based on the Kalman filter has been
applied to predict financial time series taking Mackey glass time series as experimental data. A combination of
improved particle swarm optimization (PSO) algorithm and fuzzy neural network has been adopted to predict
Shanghai stock market indices. He has also applied genetic fuzzy neural network to forecast Shenzhen stock
indices. A neural fuzzy model has been applied to forecast sales data of a well-known convenience store
Franchise Company in Taiwan where weights are generated by GA. Interval type-2 fuzzy neural networks have
been used to forecast financial time series data. Both PSO and differential evolution (DE) algorithms are used
for training the weights of the network. Most of the models reviewed above involve batch processing,
where the model is fit and updated intermittently using batches of historic data. However, the curse of
dimensionality due to the prohibitive computational effort, memory requirements, and large data sizes hampers
their applicability to many real-world problems, especially for online process monitoring. A variety of
sequential (also known as online or recursive) forecasting models, such as Hidden Markov Models (HMMs),
are investigated to surmount this limitation. An HMM is a special class of mixture models, where the observed
time series y(t) is treated as a function of the underlying, unobserved states vector. A state vector may be
reconstructed from autoregressive terms of y(t). Generally speaking, state space models such as Kalman Filter
(KF) and Particle Filter (PF) can be classified as HMMs.[11]

In order to relax restrictive Gaussian and linearity assumptions in KF, Extended Kalman filters (EKFs)
have been attempted for nonlinear time series forecasting. An EKF model still assumes a Gaussian posterior
and uses a first-order Taylor series expansion to approximate state dynamics. Therefore, Unscented KFs
(UKFs) have been introduced to overcome the limitations of EKFs. Instead of local linearization and to avoid
the Jacobian matrix calculation inherent in EKFs, UKFs choose a small sample of points to achieve a more
accurate estimate of local dynamics, and the evolution of these sample points is propagated at each estimation
step. All the above methods can be applied to classification of nonlinear and nonstationary time series
databases, which are also an important aspect of data mining.[11]

1. Kalid Yunus et al. [1] present a modified autoregressive integrated moving average (ARIMA) modelling technique that can capture the time correlation and probability distribution of measured wind-speed time series records. The technique introduces frequency decomposition (splitting the wind-speed data into high-frequency (HF) and low-frequency (LF) components), shifting, and limiting, in addition to the differencing and power transformation used in the standard ARIMA modelling procedure.

2. Yi-Shian Lee et al. [2] note that ARIMA, a conventional statistical method, is employed in many fields to build models for forecasting time series. Although ARIMA can be adopted to obtain a highly accurate linear forecasting model, it cannot accurately forecast nonlinear time series. An artificial neural network (ANN) can be used to build a more accurate forecasting model than ARIMA for nonlinear time series, but explaining the meaning of the hidden layers of an ANN is hard and, moreover, it does not yield a mathematical equation. This study proposes a hybrid forecasting model for nonlinear time series that combines ARIMA with genetic programming (GP) to improve upon both the ANN and the ARIMA forecasting models. Finally, some real data sets are used to show the effectiveness of the proposed forecasting model.

3. Guoqiang Liu et al. [3] observe that in the software reliability growth phase, the character of the failure records is, to an extent, determined by the software testing method. A hybrid model is proposed in this paper for medium- and long-term software failure time forecasting. The hybrid model combines two techniques, Singular Spectrum Analysis (SSA) and ARIMA. In this model, the time series of software failure times is first decomposed into several sub-series corresponding to tendentious and oscillatory (periodic or quasi-periodic) components and noise using SSA; each sub-series is then predicted by the best-fitting ARIMA model, and finally a correction procedure is applied to the sum of the prediction results to ensure the residual is a purely random sequence.

4. A. Vaccaro et al. [4] propose a hybrid architecture for electricity price forecasting. The proposed architecture combines the advantages of the easy-to-use and comparatively simple-to-tune autoregressive integrated moving average (ARIMA) model with the approximation power of local learning techniques. The architecture is more robust and accurate than the individual forecasting methodologies on which it is based, since it combines a reliable built-in linear model (ARIMA) with an adaptive dynamic corrector (a Lazy Learning algorithm). The corrector model is sequentially updated in order to adjust the whole architecture to varying market conditions. Detailed simulation studies show the effectiveness of the proposed hybrid learning methods for forecasting the volatile Hourly Ontario Energy Prices (HOEPs) of the Ontario, Canada, electricity market.

5. Takaomi Hirata et al. [5] note that time series analysis and prediction are essential to the study of nonlinear phenomena. Studies of time series prediction have a long history: linear models such as ARIMA and nonlinear models such as the multi-layer perceptron (MLP) are well known. As a state-of-the-art approach, a deep belief net (DBN) using multiple Restricted Boltzmann Machines (RBMs) was recently proposed. This study proposes a novel prediction method that composes not only a form of DBN with RBM and MLP but also ARIMA. Prediction experiments on real-data time series and chaotic time series were performed, and the results confirmed the effectiveness of the proposed method.

6. Ling Wang et al. [6] propose, based on analysis of the measured wear data and the wear characteristics of the wheels of Guangzhou Metro Line 1, an accumulative wear prediction approach for metro wheels based on the ARIMA (p, d, q) model. Following the time series modelling procedure of the ARIMA (p, d, q) model, the stationarity analysis and transformation of the metro wheel wear data are described first. Then, with the application of the AIC criterion and the maximum likelihood estimation method, the model order is determined and the model parameters are derived for the ARIMA (p, d, q) model. Finally, the flange thickness and the diameter of the metro wheels are predicted by means of this ARIMA (p, d, q) model. The results show that the proposed prediction method is straightforward and effective for short-term prediction of metro wheel wear.

7. Theresa Hoang Diem Ngo [7] explains that a time series is a set of values of a particular variable that occur over a period of time in a certain pattern. The most common patterns are rising or falling trends, cycles, seasonality, and irregular fluctuations. To model a time series event as a function of its past values, analysts identify the pattern under the assumption that the pattern will persist in the future. Applying the Box-Jenkins methodology, this paper emphasizes how to identify a suitable time series model by matching behaviors of the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) to the theoretical autocorrelation functions. In addition to model identification, the paper examines the significance of the parameter estimates, checks the diagnostics, and validates the forecasts.

8. Eliete Nascimento Pereira et al. [8] apply wavelet decomposition to improve time series forecasts. Combinations of forecasting methods such as the autoregressive integrated moving average (ARIMA) and artificial neural networks have been used to achieve higher-quality time series forecasts. This paper proposes a hybrid model composed of wavelet decomposition, ARIMA, and a multilayer perceptron neural network; these models are combined linearly to yield the time series forecast. The series studied are the Wolf sunspot numbers and British pound/US dollar exchange rate data. Comparison of the proposed model with the literature indicates an effective way to improve forecasting.

9. W. Jacobs et al. [9] aim to predict the values of the time series of milk demand in a dairy industry by combining forecasts of Box-Jenkins and artificial neural network models, comparing the results to the individual models and illustrating the combined forecast for production planning. Eight forecast-combining techniques were used and, after the use of statistical methods, the results obtained by fitting the Box-Jenkins and artificial neural network models were compared with the results obtained from the combinations. The results showed that the combination of a seasonal Box-Jenkins model and a deseasonalized artificial neural network model by the inverse mean square method provided a forecast performance six months ahead 66.5% higher than the individual models, with a root mean square error of 1.43 and a mean absolute percentage error of 2.16%. For the forecast 12 months ahead, the performance of the combination was 56.5% higher compared to the individual models, with a root mean square error of 2.86 and a mean absolute percentage error of 3.70%. In both cases, the combination of forecasts showed superior results.

10. Ratnadip Adhikari [10] notes that improving time series forecasting accuracy by combining multiple
models is an important and active area of research. As a result, a variety of forecast combination methods have
been developed in the literature. However, most of them are based on straightforward linear ensemble strategies and
therefore ignore possible relationships between two or more participating models. This paper proposes a robust weighted
nonlinear ensemble technique that considers the individual forecasts from different models as well as the correlations
among them while combining. The proposed ensemble is constructed using three well-known forecasting models and is
tested on three real-world time series. A comparison is made between the proposed scheme and three other
widely used linear combination methods, in terms of the resulting forecast errors. This comparison shows that the
ensemble scheme provides significantly lower forecast errors than each individual model as well as each of the linear
combination methods.

Table showing the Name of the Data Scientist, Year, Field and Method of Forecasting.

Name                               Year   Field                          Method
Gera                               1959   Egg Industry                   Econometric Model
Suits                              1962   U.S. Economy                   Econometric Model
Bluestone                          1963   Broiler Industry               Seasonal Adjustment and Data Filtering
A. Vaccaro and S. I. Vagropolous   2016   Electricity Price              ARIMA
Takaomi Hirata                     2015   Restricted Boltzmann Machine   ARIMA
Ling Wang                          2015   Metro Wheels                   ARIMA
Theresa Hoang Diem Ngo             2015   Enhancements in ARIMA          ARIMA
Eliete Nascimento Pereira          2015   Economy of USD                 ARIMA
W. Jacobs                          2016   Dairy Industry                 ARIMA
Ratnadip Adhikari                  2015   Enhancement in ARIMA           ARIMA

Table 1.1

13 Plan of Project Execution
Chapter 3

Importance of Time Series Data Analysis:

A time series is a sequence of data points measured at regular time intervals over a period of
time. Irregularly sampled data does not form a time series. [12]

Time series analysis uses statistical methods to analyse such data and extract meaningful insights
from it.

The data points are collected over a period of time, and these past values are analysed to
forecast future values. The analysis is therefore inherently time-dependent.

Time series analysis helps us to recognize the major components in time series data.

It has four main components:

 Trend: The increase or decrease in the series over a period of time; it persists
over a long period of time.

 Seasonality: A regular pattern of up-and-down fluctuations; a short-term variation
occurring due to seasonal factors.

 Cyclicity: A medium-term variation caused by circumstances that repeat at
irregular intervals.

 Irregularity: Variations that occur due to unpredictable factors and do not repeat
in particular patterns.

Benefits & Applications of Time Series:

 It helps to achieve various objectives:

 1. Descriptive Analysis: identifies trends and patterns in the data using graphs and other
tools.

 2. Forecasting: used extensively in financial and business forecasting based on historical
trends and patterns.

 3. Explanative Analysis: studies the cross-correlation/relationship between two time series
and their dependency on one another.

 The biggest advantage of time series analysis is that it can be used to understand the
past as well as predict the future.

Time Series Plot:

A typical time series plot with trend and seasonal components looks like the one below:

Fig 1.1 [12]
Here, you can see that the data points are spread across four years and the trend increases
over time.

A time series is important in business and policy planning:

 It studies the past behaviour of the phenomenon.

 It compares the current trends with that in the past or the expected trends.

 It is used in forecasting sales, profit etc. and policy planning by various organizations.

 The cyclic variation helps us understand the business cycles.

 The seasonal variation is very useful for businesses and retailers as they earn more in
certain seasons.

The factors that are responsible for bringing about changes in a time series are called the
components of the time series.

You may have heard people saying that the price of a particular commodity has increased or
decreased with time. This commodity can be anything: gold, silver, food items, petrol,
diesel, etc. You may also have heard that the rate of interest in banks has increased, or that
the rate of interest for home loans has decreased. What are all these? How are they useful to
us? These types of data are time series data.

Considering the patterns in a time series is an important aspect of time series analysis, as
it helps in the selection of models. However, the patterns in the data can best be analyzed
when the underlying components of the time series are examined [3].

3.1 Different components of time series


Any time series is composed of the following components [4].

 “Trend component (T)”


 “Cyclic component (C)”
 “Seasonal component (S)”
 “Irregular component (I)”
All these components can be combined in many ways. However, it is commonly assumed
that they are either multiplied or added, as below:

“ Y(t) = T(t) x C(t) x S(t) x I(t) ”

“ Y(t) = T(t) + C(t) + S(t) + I(t) ”

Where:

Y(t) = the time series observation. In the multiplicative model, the assumption is that
the constituents of the time series are not necessarily independent; they are interrelated and
are assumed to influence each other. The additive model, by contrast, assumes
that the four components are independent of one another.
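The practical difference between the two forms can be illustrated numerically (a hypothetical example, not from the source): in the additive model the seasonal swing has a fixed size, while in the multiplicative model it scales with the level of the series:

```python
import numpy as np

t = np.arange(48)                      # four years of monthly data
trend = 100 + 2.0 * t                  # rising trend T(t)

# Additive: Y = T + S, seasonal swing has constant amplitude
y_add = trend + 10 * np.sin(2 * np.pi * t / 12)

# Multiplicative: Y = T x S, seasonal swing grows with the trend level
y_mult = trend * (1 + 0.1 * np.sin(2 * np.pi * t / 12))

# Compare the peak-to-trough seasonal swing in the first vs the last year
amp = lambda y: y.max() - y.min()
first, last = slice(0, 12), slice(36, 48)
print(amp(y_add[first] - trend[first]), amp(y_add[last] - trend[last]))
print(amp(y_mult[first] - trend[first]), amp(y_mult[last] - trend[last]))
```

The additive swing is identical in both years, while the multiplicative swing widens as the trend rises.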

Figure1: different variations [4].

3. Trend component
The trend component, an outcome of the long-term movements of various factors,
forms the chief component of a time series. A time series may exhibit an upward or
downward movement, or may show steady movement, over an extensively long
duration of time. This overall movement is referred to as the trend component of the time
series [5]. Based on the pattern exhibited by a time series, a trend may be positive
or negative in nature. However, there are cases when a time series shows neither an
upward nor a downward pattern; such series are stationary and have a constant mean.
Example: population growth time series display an ascending trend, while series relating
to epidemics display a descending trend.
The trend component can be represented graphically as in the following figures:

Figure2: Trend component of Time Series[6].

Linear and Non-Linear Trend

If we plot the time series values on a graph against time t, the pattern of the data
clustering shows the type of trend. If the data cluster more or less around a straight
line, the trend is linear; otherwise it is non-linear (curvilinear) [10].

Figure 3: Linear and Non-Linear Trends[6].

Graph 1: Fashion trend for a year (linear increasing trend). [Plot omitted: monthly values
rise roughly linearly from about 10 in January to about 100 in December; x-axis: time (months).]

4. Cyclic component
A cyclic component is present in a time series when the series displays rises
and falls over irregular time periods. The length of a cycle depends on the type of
business or industry being examined. The cyclic component usually stretches over longer
intervals, which may range over two or more years. Cyclical variation is displayed by
almost all categories of economic and financial time series. Illustration: the four phases
of a business cycle [4], shown in the figure:
i) Prosperity
ii) Recession
iii) Depression
iv) Recovery

Figure 4: Phases of a business cycle
Cyclical unemployment is unemployment due to a period of negative economic growth, or
economic slowdown. In a recession, cyclical unemployment tends to rise sharply.

Peaks in unemployment correspond with swings in the economic cycle.

Figure 5: UK unemployment

5. Seasonal component
Seasonality occurs when a time series is influenced by seasonal factors and repeats at
regular periodic intervals: weekly, fortnightly, monthly, or in the same quarter of
each year.

These variations come into play either because of natural forces or man-made conventions.
The various seasons and climatic conditions play an important role in seasonal variations.

There are many factors, such as climate and weather conditions and traditional customs and
habits, responsible for causing seasonal variations. For instance, sales of woollen
clothes increase in winter, sales of ice-cream and cold drinks increase in summer, and
sales rise during festival seasons. Businessmen, shopkeepers and producers monitor seasonal
variation very closely in order to make proper future plans and earn the best returns on
their investments.

The effect of man-made conventions such as festivals, customs, habits, fashions, and
occasions like marriages is easily noticeable; they recur year after year. An
upswing in one season should not be taken as an indicator of better business conditions overall.

Figure 6: Demand for movies on weekends [2]

Graph 2: Sale of warm clothes (Y) by month (X). [Plot omitted: sales climb through
October-December, peaking around 200-220 in December, and fall to their lowest levels
from March through the summer months.]

6. Irregular component
The irregular or random variation component of a time series is unpredictable. It is
caused by unpredictable influences that are neither regular nor repeat in any pattern
[6].

These are sudden changes occurring in a time series which are unlikely to be
repeated. They are the part of a time series which cannot be explained by trend,
seasonal or cyclic movements.
These variations are sometimes called residual or random components. Though
accidental in nature, they can cause a continual change in the trend, seasonal and
cyclical oscillations during the forthcoming period.

Random variations may be caused by external factors such as earthquakes, war,
floods, etc. There is no statistical procedure to measure random fluctuations in a time
series. When making predictions, the aim is to model all the components of a time
series until only one unexplained component remains, called the random component.

Data mining is an analytical approach designed to explore data (market-related or business-oriented)
in search of reliable patterns and/or logical relationships between variables. Appropriate validation
is used to confirm the patterns detected in the data set. It is also called data discovery or
knowledge discovery. Data mining enhances revenue and reduces the cost incurred in exploring data.
General research associated with the stock market focuses largely on when to buy or sell, but fails
to address the dimensionality and expectations of a naive investor. The general perception of the
stock market in society is that it is highly risky for investment or unsuitable for trading. The
seasonal variance and steady flow of an index help both existing and naive investors to understand
the market and make investment decisions. To solve these types of problems, time series analysis is
the best tool for forecasting and for predicting the trend. A trend chart provides adequate guidance
for the investor; however, it may not capture the variations or steady flow of the market, and a
forecast for a particular season is not adequate for long-term decision making. Investors are very
much interested in knowing the past trend or flow, seasonal growth, and variations of a stock, and a
general expectation is that the analysis must give a holistic view of the stock market. It is
therefore essential to identify a model that shows the trend with adequate information for the
investor to make a decision. It is recommended that the ARIMA approach of transforming the series is
better than forecasting directly, and it also gives more accurate results []. This approach has not
included any testing criteria for model estimation. Analysis of price causality and forecasting in
the Nifty futures market has been employed to investigate the short run; such a short-run
investigation brings little significance to the naive investor or the market penetrator. In this
paper we focus on a real-world problem in the stock market. The seasonal trend and flow is the
highlight of the stock market. Investors as well as stock-broking companies observe and capture the
variations and constant growth of the index. This aids new as well as existing investors in making
strategic decisions, which otherwise can only be achieved through experience and constant observation
[3]. In order to overcome the above issues, we have suggested the ARIMA algorithm in three steps:

Step 1: Model identification

Step 2: Model estimation

Step 3: Forecasting

3.1.1 Proposed Approach:

The Box-Jenkins methodology is a five-step process for identifying, selecting, and assessing
conditional mean models for discrete, univariate time series data.

Phase 1: Identification

Step 1: Data Preparation

• Transform the data to stabilize the variance.

• If the series is not stationary, difference it; difference successively until
stationarity is attained.


Step 2: Model Selection

• Examine the data; plot the ACF and PACF to identify potential models.

Phase 2: Estimation and Testing

Step 1: Estimation

• Estimate parameters in potential models.

• Select the best model using the AIC/BIC criteria.

Step 2: Diagnostics

• Check the ACF/PACF of the residuals.

• Test the residuals: are they white noise?

Phase 3: Application

• Forecast the trend.

• The model is used to forecast the future [3].

3.2 How Stocks Are Predicted Using ARIMA:

3.2.1 Methodology:
The method used in this study to develop an ARIMA model for stock price forecasting is
explained in detail in the subsections below. The tool used for implementation is EViews software
version 5. The stock data used in this research are historical daily stock prices obtained from two
countries' stock exchanges. The data consist of four elements, namely: open price, low price, high
price and close price. In this research the closing price is chosen to represent the price of
the index to be predicted, because it reflects all the activities of the index in a
trading day. To determine the best ARIMA model among the several experiments performed, the
following criteria are used in this study for each stock index:

• Relatively small BIC (Bayesian or Schwarz Information Criterion)

• Relatively small standard error of regression (S.E. of regression)

• Relatively high adjusted R2

• Q-statistics and the correlogram show that there is no significant pattern left in the
autocorrelation functions (ACFs) and partial autocorrelation functions (PACFs) of the residuals,
meaning the residuals of the selected model are white noise. [4]

3.2.2 ARIMA Model for Nokia Stock Index:

Nokia stock data used in this study covers the period from 25th April, 1995 to 25th February,
2011, a total of 3990 observations. Figure 1 depicts the original pattern of the series
to give a general overview of whether the time series is stationary or not. From the graph below, the time
series has a random walk pattern. Figure 2 is the correlogram of the Nokia time series. From the graph,
the ACF dies down extremely slowly, which means that the time series is nonstationary. If the
series is not stationary, it is converted to a stationary series by differencing. After the first difference,
the series “DCLOSE” of the Nokia stock index becomes stationary, as shown in figures 3 and 4 (the
line graph and correlogram respectively). [4]

Figure 1:- Graphical Representation of the Nokia Stock closing price

Figure 2:-The correlogram of Nokia stock price index


Figure 3:- Graphical representation of the Nokia stock price index after differencing

Figure 4:- The correlogram of Nokia stock price index after first differencing

In figure 5, model checking was done with the Augmented Dickey-Fuller (ADF) unit root test on
“DCLOSE” of the Nokia stock index. The result confirms that the series becomes stationary after the
first difference of the series.

Table 2 shows the different parameters of the autoregressive (p) and moving average (q) terms among the
several ARIMA models experimented upon. ARIMA (2, 1, 0) is considered the best for the Nokia stock
index, as shown in figure 6. The model returned the smallest Bayesian (Schwarz) information
criterion of 5.3927 and a relatively small standard error of regression of 3.5808, as shown in figure 6.

Figure 7 shows the residuals of the series. If the model is good, the residuals (differences between actual
and predicted values) of the model are a series of random errors. Since there are no significant spikes
in the ACFs and PACFs, the residuals of the selected ARIMA model are white noise, with no
other significant patterns left in the time series. Therefore, there is no need to consider any
further AR(p) and MA(q) terms. [4]

Figure 5:- ADF unit root test for DCLOSE of Nokia stock index.

Figure 6:- ARIMA estimation output with DCLOSE of Nokia index

Figure 7: Correlogram of residuals of the Nokia stock index

In forecasting form, the best model selected can be expressed as follows:

Table 2 :- Statistical Results of Different ARIMA Parameters For Nokia Stock Index.

3.2.3 Result of ARIMA Model for Nokia Stock Price Prediction:


Table 3 shows the predicted values of ARIMA (2, 1, 0), the model considered
best for the Nokia stock index. Figure 8 gives a graphical illustration of the accuracy of the
predicted price against the actual stock price, showing the performance of the selected ARIMA model.
From the graph, it is obvious that the performance is satisfactory. [4]

Table 3:- Sample Of Empirical Results of ARIMA of Nokia Stock Index.

Figure 8:- Graph of Actual Stock Price vs Predicted values of Nokia Stock Index

CHAPTER 4

IMPLEMENTATION OF ARIMA USING PYTHON


Adopting an ARIMA model for a time series assumes that the underlying process that
generated the observations is an ARIMA process. This may seem obvious, but it helps to motivate the
need to confirm the assumptions of the model in the raw observations and in the residual errors of
forecasts from the model. [5]

Next, let’s take a look at how we can use the ARIMA model in Python. We will
start with loading a simple univariate time series.

This dataset describes the monthly number of sales of shampoo over a 3-year period. The
units are a sales count and there are 36 observations. The original dataset is credited to Makridakis,
Wheelwright, and Hyndman (1998). The diagram below shows the sale of shampoo over the three-year
period.

Below is an example of loading the Shampoo Sales dataset with Pandas, using a
custom function to parse the date-time field. The dataset is baselined in an arbitrary year, in this case
1900.

We can see that the Shampoo Sales dataset has a clear trend. This suggests that
the time series is not stationary and will require differencing to make it stationary, of at least
order 1.

Figure 9: Time vs Bottles of Shampoo sold per month


Let’s also take a quick look at an autocorrelation plot of the time series. This
is built into Pandas. The example below plots the autocorrelation for a large number of lags in
the time series.

Python Code:

from pandas import read_csv
from datetime import datetime
from matplotlib import pyplot
from pandas.plotting import autocorrelation_plot

def parser(x):
    # dates in the CSV look like "1-01"; prefix the century to get "1901-01"
    return datetime.strptime('190' + x, '%Y-%m')

series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0,
                  date_parser=parser).squeeze('columns')
autocorrelation_plot(series)
pyplot.show()

Running the example, we can see that there is a positive correlation with the first 10-to-12
lags that is perhaps significant for the first 5 lags.

A good starting point for the AR parameter of the model may be 5.

Figure 10:- Autocorrelation Plot of Shampoo Sales Data

The statsmodels library provides the capability to fit an ARIMA model.

An ARIMA model can be created using the statsmodels library as follows:

1. Define the model by calling ARIMA() and passing in the p, d, and q parameters.
2. The model is prepared on the training data by calling the fit() function.
3. Predictions can be made by calling the predict() function and specifying the index of the time
or times to be predicted.

Let’s start off with something simple. We will fit an ARIMA model to the entire Shampoo Sales
dataset and review the residual errors.

First, we fit an ARIMA(5,1,0) model. This sets the lag value to 5 for autoregression, uses a
difference order of 1 to make the time series stationary, and uses a moving average order of 0.

In older versions of statsmodels, fitting printed verbose debug information about the fit, which
could be turned off by setting the disp argument to 0; the current ARIMA implementation in
statsmodels.tsa.arima.model no longer requires this.

from pandas import read_csv, DataFrame
from datetime import datetime
from statsmodels.tsa.arima.model import ARIMA
from matplotlib import pyplot

def parser(x):
    return datetime.strptime('190' + x, '%Y-%m')

series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0,
                  date_parser=parser).squeeze('columns')
# fit an ARIMA(5,1,0) model
model = ARIMA(series, order=(5, 1, 0))
model_fit = model.fit()
print(model_fit.summary())
# plot residual errors
residuals = DataFrame(model_fit.resid)
residuals.plot()
pyplot.show()
residuals.plot(kind='kde')
pyplot.show()
print(residuals.describe())

Running the example prints a summary of the fitted model. This summarizes the coefficient values used
as well as the skill of the fit on the in-sample observations. [5]

Table 4:- Arima Model Results

First, we get a line plot of the residual errors, suggesting that there may still be
some trend information not captured by the model.


Figure 11:- ARIMA Residual Error Line Plot


Next, we get a density plot of the residual error values, suggesting the errors
are Gaussian but may not be centered on zero. The distribution of the residual
errors is displayed. The results show that there is indeed a bias in the prediction (a
non-zero mean in the residuals).

Figure 12: Density Plot of Residual Error Value

Note that although we used the entire dataset for time series analysis above,
ideally we would perform this analysis on just the training dataset
when developing a predictive model. [5]

