Time Series and Forecasting Using R

Last Updated : 21 Mar, 2024

Time series forecasting is the process of using historical data to make predictions about future events. It is commonly used in fields such as finance, economics, and weather forecasting. R is a powerful programming language and software environment for statistical computing and graphics that is widely used for time series forecasting.

What is Time Series Forecasting?

Time series forecasting focuses on making predictions about future events or values using past and present data points. Data points are gathered over time, and this technique is used in a variety of fields, such as sales, finance, weather forecasting, and economics. The following are some important ideas and methods to consider when carrying out time series forecasting.

Many classical forecasting models, such as ARMA, assume that the series is stationary: its mean, variance, and autocorrelation structure stay constant over time. Trend and seasonality violate this assumption, so they are usually removed or modeled explicitly before fitting. Seasonality can be additive or multiplicative. Transformations such as taking logarithms or differencing are commonly used to remove trend and seasonality. Whether a series is stationary can be checked with a unit-root test such as the Dickey-Fuller test.
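
As a rough illustration of the idea, the sketch below compares the mean and variance of the first and second half of a series before and after a log transform and differencing. It uses the built-in AirPassengers data and base R only. The helper name half_stats is made up for this example. PP.test() (the Phillips-Perron unit-root test from the stats package) is used here as a stand-in for the Dickey-Fuller test because it ships with base R; the augmented Dickey-Fuller test itself is available as adf.test() in the tseries package.

```r
# Hypothetical helper: mean and variance of each half of a series
half_stats <- function(x) {
  x <- as.numeric(x)
  n <- length(x)
  first  <- x[1:(n %/% 2)]
  second <- x[(n %/% 2 + 1):n]
  c(mean_first = mean(first), mean_second = mean(second),
    var_first  = var(first),  var_second  = var(second))
}

raw <- AirPassengers
# Log stabilizes the growing variance; differencing removes the trend
stationary_ish <- diff(log(raw))

raw_stats  <- half_stats(raw)
diff_stats <- half_stats(stationary_ish)
print(round(raw_stats, 2))
print(round(diff_stats, 4))

# Unit-root test from base R (assumption: used in place of Dickey-Fuller)
pp <- PP.test(stationary_ish)
print(pp$p.value)
```

If the two halves of the transformed series have similar means and variances while the raw halves do not, that is informal evidence that the transformation moved the series closer to stationarity. A formal test is still the safer check.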

There are several ways to make a time series stationary. Most real-world time series contain both trend and seasonality, and models that assume stationarity forecast better once these components are removed or accounted for. To estimate and remove a trend, we can take a moving average, which can be computed with window functions. We can also fit a line to the data with linear regression and subtract it. Removing seasonality is harder and often requires domain knowledge. For example, drug sales may show seasonality because of winter each year. In some cases, locally weighted scatterplot smoothing (LOESS) can be used to estimate and remove the seasonal component.
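
The two detrending approaches mentioned above can be sketched in base R as follows. This is a minimal example on the built-in AirPassengers data; the variable names are chosen for illustration, and the centered 12-month window is an assumption that suits monthly data.

```r
x <- AirPassengers

# Centered 2x12 moving average: averages over a full year so the
# seasonal pattern cancels out and mostly the trend remains
weights <- c(0.5, rep(1, 11), 0.5) / 12
trend_ma <- stats::filter(x, filter = weights, sides = 2)
detrended_ma <- x - trend_ma   # NA for the first and last 6 months

# Linear trend: regress the values on time and keep the residuals
t_idx <- as.numeric(time(x))
fit <- lm(as.numeric(x) ~ t_idx)
detrended_lm <- ts(residuals(fit), start = start(x), frequency = frequency(x))

print(head(detrended_ma, 12))
print(head(detrended_lm, 12))
```

stats::filter is written with its namespace because other packages (for example dplyr) also export a function called filter.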

Time Series Data: Time series data consists of observations or measurements collected at regular time intervals. These data points are typically plotted over time, and the goal of time series forecasting is to predict future values in this sequence.

Components of Time Series:

  • Trend: The long-term movement or direction in the data. Trends can be upward (increasing), downward (decreasing), or flat (constant).
  • Seasonality: Repeating patterns or fluctuations that occur at fixed intervals. For example, sales of winter clothing may exhibit a yearly seasonality pattern.
  • Cyclic Patterns: Longer-term, non-seasonal patterns that may not have fixed intervals. Cyclic patterns represent oscillations in the data that are not tied to a specific season.
  • Irregularity (Noise): Random, unpredictable fluctuations in the data.
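
To make the components concrete, the sketch below builds a synthetic monthly series by adding a trend, a seasonal pattern, and noise. All numbers (slope, amplitude, noise level, start date) are arbitrary values chosen for illustration, and the cyclic component is left out to keep the example short.

```r
set.seed(42)                      # make the random noise reproducible
n <- 120                          # 10 years of monthly observations
t_idx <- seq_len(n)

trend    <- 100 + 0.5 * t_idx                 # steady upward movement
seasonal <- 10 * sin(2 * pi * t_idx / 12)     # repeats every 12 months
noise    <- rnorm(n, mean = 0, sd = 2)        # irregular fluctuations

# Additive combination of the three components
y <- ts(trend + seasonal + noise, start = c(2000, 1), frequency = 12)
print(window(y, end = c(2000, 12)))
```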

Time Series Forecasting Methods

Time series forecasting methods are techniques used to make predictions about future values in a time series based on historical and current data. There are several well-established methods for time series forecasting, each with its own strengths and weaknesses. Here are some of the most commonly used time series forecasting methods.

  1. Autoregressive Integrated Moving Average (ARIMA)
  2. Seasonal and Trend decomposition using Loess (STL)
  3. Seasonal Autoregressive Integrated Moving-Average (SARIMA)

Many other methods are available, but these are among the most common for time series forecasting.
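
As one concrete example from the list, STL is available in base R as stl(). The sketch below applies it to the log of the built-in AirPassengers data; taking logs and using s.window = "periodic" are choices made for this illustration, not requirements of the method.

```r
# Log transform so that the seasonal swings are roughly constant in size
log_ap <- log(AirPassengers)

# s.window = "periodic" assumes the seasonal pattern is the same every year
fit <- stl(log_ap, s.window = "periodic")

components <- fit$time.series      # columns: seasonal, trend, remainder
print(head(components))

# The three components add back up to the (log) series
recon <- rowSums(components)
print(max(abs(recon - as.numeric(log_ap))))
```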

Packages for Time Series Forecasting in R

Several R packages are available for time series forecasting, including:

  1. “forecast”: This package provides a wide range of methods for time series forecasting, including exponential smoothing, ARIMA, and neural networks.
  2. “tseries”: This package provides functions for time series analysis and computational finance, including stationarity tests such as the augmented Dickey-Fuller test and functions for fitting ARMA and GARCH models.
  3. “prophet”: Developed by Facebook, this package provides a simple and fast way to forecast time series using additive models. It is designed for business time series data and is easy to use and interpret.
  4. “rugarch”: This package provides a flexible and powerful framework for fitting and forecasting volatility models, including GARCH and its variants.
  5. “stlplus”: This package provides functions for decomposing time series data using the STL algorithm, which is useful for removing seasonal and trend components from time series data.

To use these packages, first, they need to be installed and loaded into R. Then, the time series data must be prepared and cleaned, and the appropriate forecasting method can be applied. The forecasted values can then be plotted, evaluated and compared to the actual values.
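
The preparation step can be sketched as follows: raw numbers are turned into a ts object and a missing value is filled before any model is fitted. The sales figures are made up for illustration, and linear interpolation is only one possible cleaning choice.

```r
# Made-up monthly sales figures for two years (illustrative values only)
sales_raw <- c(120, 115, 130, NA, 150, 160, 170, 165, 155, 140, 135, 180,
               125, 120, 138, 145, 158, 166, 178, 170, 160, 148, 140, 190)

# Build a monthly time series starting in January 2022
sales <- ts(sales_raw, start = c(2022, 1), frequency = 12)

# Simple cleaning step: fill the missing month by linear interpolation
filled <- approx(x = seq_along(sales_raw), y = sales_raw,
                 xout = seq_along(sales_raw))$y
sales_clean <- ts(filled, start = start(sales), frequency = frequency(sales))

print(sales_clean)
```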

Forecasting means analyzing time series data, finding the patterns in it, and using them to predict future values. Some real-world applications of time series forecasting are as follows:

| Application | Time series data | Forecasting results |
|---|---|---|
| Weather prediction | The temperature of a place collected for one month | Forecast weather for the next few months in that place |
| Stock market price prediction | Stock market price data for one day and patterns | Predict the stock market price for the next day |
| E-commerce and Retail | Sales data of a company for one year | Predict revenue and number of sales for next year |
| Industrial management | The raw material used and available in 3-5 years | Raw materials requirements prediction, profit prediction |

Here we will use the AutoRegressive Integrated Moving Average (ARIMA) method to forecast time series data. We will use the AirPassengers dataset, which contains monthly totals of international airline passengers (in thousands) from 1949 to 1960, and forecast passenger numbers for the next 10 years, that is, from 1961 to 1970.

R
# Install the forecast package (only needed once)
install.packages('forecast')

# Load the package into the current session
library('forecast')

To check which class the “AirPassengers” dataset belongs to, we can use the class() function.

R
class(AirPassengers)

Output:

[1] "ts"

“ts” means it is time series data.

We can also see the content of the dataset.

R
AirPassengers

Output:

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432

Box Plot for AirPassengers monthly count

R
# Create a color palette for the box plot
my_colors <- rainbow(12)

# Box plot by month with customizations
boxplot(split(AirPassengers, cycle(AirPassengers)),
        xlab = "Month", ylab = "Number of Passengers",
        col = my_colors,  # Assign colors to each box
        border = "black",  # Set the border color
        main = "Monthly Air Passenger Counts by Month",
        names = month.abb,  # Use abbreviated month names as labels
        outline = FALSE)  # Do not draw outlier points

Output:


[Figure: Monthly Air Passenger Counts by Month]


Plot the dataset to observe how the values have been changing from 1949 to 1960.

R
plot(AirPassengers)

Output:

[Figure: Time v/s Passengers monthly count]

Time series data can be decomposed into three components:

  1. Seasonal: A pattern that repeats over a fixed period of time. For example, a clothing e-commerce website has heavy traffic during festive seasons and less traffic at other times. This is a seasonal pattern because the value rises only during a certain period.
  2. Trend: The long-term direction in which the values are changing. For example, if a website is doing well overall, its traffic trend goes up; if not, the trend goes down.
  3. Random: What remains of the time series after the seasonal and trend components are removed. This is also known as noise.

R
# AirPassengers is already a monthly ts object (frequency = 12)
d <- decompose(AirPassengers, type = "multiplicative")
plot(d)

Output:

[Figure: Patterns in the time series data]

The "multiplicative" option is used because the size of the seasonal swings grows with the level of the trend. When the seasonal swings stay roughly the same size regardless of the level, the series is called “additive”.
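
A quick way to see what a multiplicative decomposition means is to multiply the components back together. The sketch below is a minimal check in base R and assumes nothing beyond the built-in AirPassengers data.

```r
d <- decompose(AirPassengers, type = "multiplicative")

# In a multiplicative decomposition the components are multiplied together
recon <- d$trend * d$seasonal * d$random

# trend (and so random) is NA for the first and last 6 months
print(max(abs(recon - AirPassengers), na.rm = TRUE))

# Seasonal factors for the 12 months, centered around 1
print(round(d$figure, 3))
```

A seasonal factor above 1 means that month typically sits above the trend, and a factor below 1 means it sits below.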

Now we forecast 10 years of data by fitting a model with the auto.arima() function and passing it to forecast().

R
# Let auto.arima() select the ARIMA orders automatically
model <- auto.arima(AirPassengers)
summary(model)

# h = 10*12 because the forecast covers 10 years of 12 months each
f <- forecast(model, level = c(95), h = 10 * 12)
plot(f)

Output:

Series: AirPassengers
ARIMA(2,1,1)(0,1,0)[12]

Coefficients:
         ar1     ar2      ma1
      0.5960  0.2143  -0.9819
s.e.  0.0888  0.0880   0.0292

sigma^2 = 132.3:  log likelihood = -504.92
AIC=1017.85   AICc=1018.17   BIC=1029.35

Training set error measures:
                 ME     RMSE     MAE      MPE     MAPE     MASE        ACF1
Training set 1.3423 10.84619 7.86754 0.420698 2.800458 0.245628 -0.00124847

[Figure: Forecast for the next 10 years]

The fitted ARIMA(2,1,1)(0,1,0)[12] model is a seasonal ARIMA model with a 12-month seasonal period. The non-seasonal part (2,1,1) includes a second-order autoregressive (AR) component, first-order differencing (I) to help make the series stationary, and a first-order moving average (MA) term. The seasonal part (0,1,0) applies one seasonal difference at lag 12 and has no seasonal AR or MA terms.
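
To show what the orders mean in practice, the sketch below applies the differencing by hand and then fits the same orders directly with arima() from the stats package. Using method = "ML" is a choice made for this sketch, so the estimates may differ slightly from the output above.

```r
x <- AirPassengers

# d = 1 and D = 1: one seasonal difference at lag 12, then one regular difference
x_diff <- diff(diff(x, lag = 12), lag = 1)
print(length(x_diff))   # 144 - 12 - 1 observations remain

# Same orders fitted directly with stats::arima (method = "ML" is an
# assumption of this sketch, not something the summary output specifies)
fit <- arima(x, order = c(2, 1, 1),
             seasonal = list(order = c(0, 1, 0), period = 12),
             method = "ML")
print(coef(fit))
```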

The model estimates the coefficients for these components and reports error measures. The AIC, AICc, and BIC values help compare candidate models fitted to the same data, with lower values indicating a better balance between fit and complexity. The error measures, including RMSE and MAPE, evaluate the model’s accuracy on the training data, while the log likelihood measures how well the model fits the data. Further evaluation on new data is needed to confirm its forecasting performance.
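
One way to carry out that further evaluation is to hold out the last part of the series, fit on the rest, and compare forecasts with the held-out values. The sketch below does this in base R with the last two years as the test set; the split point, the reuse of the same orders, and method = "ML" are all assumptions made for illustration. The forecast package also provides an accuracy() function for this kind of comparison.

```r
train <- window(AirPassengers, end = c(1958, 12))    # 1949-1958 for fitting
test  <- window(AirPassengers, start = c(1959, 1))   # 1959-1960 held out

# Refit the same orders on the training window only
fit <- arima(train, order = c(2, 1, 1),
             seasonal = list(order = c(0, 1, 0), period = 12),
             method = "ML")

# Forecast as many months as there are in the test set
pred <- predict(fit, n.ahead = length(test))$pred

rmse <- sqrt(mean((test - pred)^2))
mape <- mean(abs((test - pred) / test)) * 100
cat("Hold-out RMSE:", round(rmse, 2), " MAPE (%):", round(mape, 2), "\n")
```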

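The shaded region is the 95% prediction interval: under the model's assumptions, each future value over the next 10 years is expected to fall inside it with 95% probability. The blue line is the point forecast, that is, the mean of the forecast distribution at each time step. The same steps can be applied to forecast other time series datasets.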
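
An interval band like the shaded region can be approximated by hand as the point forecast plus or minus about 1.96 standard errors, assuming normally distributed forecast errors. The sketch below uses arima() and predict() from base R with the same orders as the fitted model; method = "ML" is an assumption of this sketch. In the forecast package, the corresponding values are stored in the forecast object as f$mean, f$lower, and f$upper.

```r
fit <- arima(AirPassengers, order = c(2, 1, 1),
             seasonal = list(order = c(0, 1, 0), period = 12),
             method = "ML")

p <- predict(fit, n.ahead = 10 * 12)   # point forecasts and standard errors

z <- qnorm(0.975)                      # about 1.96 for a 95% interval
lower <- p$pred - z * p$se
upper <- p$pred + z * p$se

# The band widens as the forecast horizon grows
width <- upper - lower
print(round(c(first_month = width[1], last_month = width[120]), 1))
```

The widening band reflects growing uncertainty: a forecast 10 years ahead is much less certain than a forecast one month ahead.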

Advantages of using R for Time Series Forecasting:

  1. Large community: R has a large and active community of users and developers, which means that there are many resources and packages available for time series forecasting, and it also allows for easy collaboration and sharing of knowledge.
  2. Flexibility: R provides a wide range of tools and packages for time series forecasting, which allows for flexibility in selecting the appropriate method for a given dataset.
  3. Open-source: R is an open-source programming language, which means that it is free to use and can be modified to fit specific needs.
  4. Concise for statistical work: Common time series tasks, such as fitting a model and plotting its forecast, take only a few lines of code.
  5. High-quality visualization: R has powerful data visualization capabilities, which allows for easy interpretation and analysis of time series data.

Disadvantages of using R for Time Series Forecasting:

  1. Speed: R is an interpreted language, which can make it slower than compiled languages such as C or C++ for large datasets.
  2. Memory usage: R can be memory-intensive, which can be a problem for large datasets.
  3. Limited scalability: R is not designed for large-scale parallel computing, so it may not be suitable for large-scale time series forecasting tasks.
  4. Steep learning curve: R is a powerful programming language, but it has a steep learning curve, which can make it difficult for beginners.
  5. Lack of standardization: R provides a wide range of tools and packages for time series forecasting, which can lead to a lack of standardization in how forecasting tasks are performed. This can make it difficult to compare results across different studies.

