Autocorrelation

Last Updated : 19 Mar, 2024

Autocorrelation is a fundamental concept in time series analysis. It is a statistical measure of the degree of correlation between the values of a variable at different time points. This article discusses the fundamentals of autocorrelation and how it works.

What is Autocorrelation?

Autocorrelation measures the degree of similarity between a time series and a lagged version of itself over successive time periods. It is similar to calculating the correlation between two different variables, except that autocorrelation correlates two versions of the same time series, [Tex]X_t [/Tex] and [Tex]X_{t-k} [/Tex].

Calculation of Autocorrelation

Mathematically, the autocorrelation coefficient is denoted by the symbol ρ (rho) and written ρ(k), where k is the time lag, that is, the number of intervals between the observations being compared. It is the Pearson correlation between the series and its lagged copy: their covariance divided by the product of their standard deviations.

For a time series dataset, the autocorrelation at lag ‘k’ (ρ(k)) is determined by comparing the values of the variable at time ‘t’ with the values at time ‘t-k’.

[Tex]\rho(k) = \frac{\mathrm{Cov}(X_t, X_{t-k})}{\sigma(X_t) \cdot \sigma(X_{t-k})} [/Tex]

Here,

  • Cov is the covariance
  • [Tex]\sigma [/Tex] is the standard deviation
  • [Tex]X_t [/Tex] is the value of the variable at time t, and [Tex]X_{t-k} [/Tex] is its value k periods earlier
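A minimal NumPy sketch of the formula above, assuming the series is a 1-D array. The function names `lagged_pearson` and `sample_acf` are hypothetical. `lagged_pearson` follows the formula literally, correlating the aligned slices X_t and X_{t-k}; `sample_acf` is the common textbook estimator, which uses one overall mean and divides by the lag-0 sum of squares.

```python
import numpy as np


def lagged_pearson(x, k):
    """rho(k) as in the formula above: Pearson correlation of X_t with X_{t-k}."""
    x = np.asarray(x, dtype=float)
    if k == 0:
        return 1.0
    current, lagged = x[k:], x[:-k]  # aligned X_t and X_{t-k}
    cov = np.mean((current - current.mean()) * (lagged - lagged.mean()))
    return cov / (current.std() * lagged.std())


def sample_acf(x, k):
    """Textbook estimator: overall mean, normalised by the lag-0 sum of squares."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.sum(d[k:] * d[:len(d) - k]) / np.sum(d * d)


series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(lagged_pearson(series, 1))  # approximately 1.0: a straight line matches its shifted copy
print(sample_acf(series, 1))      # 0.625
```

On short or trending series the two estimators can differ noticeably, as here. To our understanding, the second form is what statsmodels' `acf` reports with its default settings.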

Interpretation of Autocorrelation

  • A positive autocorrelation (ρ > 0) indicates that values tend to move in the same direction as the values k periods later: high values tend to follow high values, and low values tend to follow low values. The closer ρ(k) is to 1, the stronger the linear relationship between the variable's current values and its past values at that lag.
  • A negative autocorrelation (ρ < 0) suggests an inverse relationship between values at different time intervals: high values tend to be followed, k periods later, by low values, and vice versa.
  • An autocorrelation near zero indicates a lack of linear dependence between the variable's current and past values at that lag.
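To see the three cases side by side, the sketch below simulates a hypothetical first-order autoregressive process X_t = φ·X_{t-1} + e_t (an illustrative assumption, not data from this article) with φ positive, zero and negative. It prints the lag-1 sample autocorrelation, which should land close to φ in each case. The helper names are hypothetical.

```python
import numpy as np


def simulate_ar1(phi, n, rng):
    """Simulate X_t = phi * X_{t-1} + e_t with standard normal noise e_t."""
    x = np.zeros(n)
    noise = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation using the overall mean."""
    d = x - x.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d * d)


rng = np.random.default_rng(0)
for phi in (0.8, 0.0, -0.8):
    r1 = lag1_autocorrelation(simulate_ar1(phi, 5000, rng))
    print(f"phi = {phi:+.1f}: lag-1 autocorrelation = {r1:+.2f}")
```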

Use of Autocorrelation

  • Autocorrelation detects repeating patterns and trends in time series data. Positive autocorrelation at specific lags may indicate the presence of seasonality.
  • Autocorrelation guides the choice of the order of MA and ARIMA models by indicating how many lag terms to include.
  • Autocorrelation helps to check whether a time series is stationary or exhibits trends and non-stationary behavior.
  • Sudden spikes or drops in autocorrelation at certain lags may indicate the presence of anomalies and outliers.
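As an illustration of the seasonality point, the sketch below builds a hypothetical monthly series (a 12-month sine cycle plus random noise, not real data) and computes its sample autocorrelation. For a cycle of period 12 the autocorrelation should be strongly negative at lag 6 (half a cycle apart) and strongly positive at lags 12 and 24 (whole cycles apart). `sample_acf` is a hypothetical helper.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations r(0), ..., r(max_lag) using the overall mean."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[:len(d) - k]) / denom for k in range(max_lag + 1)])


# Hypothetical monthly series: a 12-month cycle plus random noise
rng = np.random.default_rng(42)
months = np.arange(240)
series = 2.0 * np.sin(2 * np.pi * months / 12) + rng.normal(scale=0.5, size=months.size)

r = sample_acf(series, 24)
print(f"lag 6: {r[6]:+.2f}, lag 12: {r[12]:+.2f}, lag 24: {r[24]:+.2f}")
```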

What is Partial Autocorrelation?

In time series analysis, the partial autocorrelation function (PACF) gives the partial correlation of a stationary time series with its own lagged values, after regressing out the values of the time series at all shorter lags. This differs from the autocorrelation function, which does not control for the other lags.

Partial autocorrelation quantifies the direct relationship between an observation and a specific lagged value. It lets us examine the direct influence of a past time point on the current time point, excluding the indirect influence that passes through the intermediate lagged values. In other words, it measures the correlation between two time points that remains after accounting for the time points in between.

[Tex]PACF(T_i, k) = \frac{\mathrm{Cov}\left([T_i \mid T_{i-1}, T_{i-2}, \ldots, T_{i-k+1}],\ [T_{i-k} \mid T_{i-1}, T_{i-2}, \ldots, T_{i-k+1}]\right)}{\sigma_{[T_i \mid T_{i-1}, T_{i-2}, \ldots, T_{i-k+1}]} \cdot \sigma_{[T_{i-k} \mid T_{i-1}, T_{i-2}, \ldots, T_{i-k+1}]}} [/Tex]

Here,

  • [Tex]T_i \mid T_{i-1}, T_{i-2}, \ldots, T_{i-k+1} [/Tex] is the time series of residuals obtained from fitting a multiple linear regression of [Tex]T_i [/Tex] on [Tex]T_{i-1}, T_{i-2}, \ldots, T_{i-k+1} [/Tex].
  • [Tex]T_{i-k} \mid T_{i-1}, T_{i-2}, \ldots, T_{i-k+1} [/Tex] is the time series of residuals obtained from fitting a multiple linear regression of [Tex]T_{i-k} [/Tex] on [Tex]T_{i-1}, T_{i-2}, \ldots, T_{i-k+1} [/Tex].
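The residual-based definition of the PACF can be computed directly. The sketch below regresses both T_i and T_{i-k} on an intercept and the intermediate lags with `numpy.linalg.lstsq`, then correlates the two residual series. It is a sketch of the definition, not of any particular library's estimator, and the function name is hypothetical. For a hypothetical AR(1) series with φ = 0.7, the PACF should be near 0.7 at lag 1 and near 0 at lag 2, while the ordinary autocorrelation at lag 2 stays near 0.7² = 0.49.

```python
import numpy as np


def pacf_via_residuals(x, k):
    """Correlate the residuals of T_i and T_{i-k} after regressing each
    on an intercept and T_{i-1}, ..., T_{i-k+1}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    target = x[k:]       # T_i
    lag_k = x[:n - k]    # T_{i-k}
    # Regressors: intercept plus T_{i-1}, ..., T_{i-k+1} (intercept only when k == 1)
    Z = np.column_stack([np.ones(n - k)] + [x[k - j:n - j] for j in range(1, k)])

    def residuals(y):
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return y - Z @ beta

    return np.corrcoef(residuals(target), residuals(lag_k))[0, 1]


# Hypothetical AR(1) series: X_t = 0.7 * X_{t-1} + e_t
rng = np.random.default_rng(1)
e = rng.standard_normal(5000)
x = np.zeros(5000)
for t in range(1, 5000):
    x[t] = 0.7 * x[t - 1] + e[t]

print(f"PACF lag 1: {pacf_via_residuals(x, 1):+.2f}")
print(f"PACF lag 2: {pacf_via_residuals(x, 2):+.2f}")
print(f"ACF  lag 2: {np.corrcoef(x[2:], x[:-2])[0, 1]:+.2f}")
```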

Testing For Autocorrelation – Durbin-Watson Test

The Durbin-Watson (DW) test is a statistical test used to detect the presence of first-order autocorrelation in the residuals of a regression analysis. The DW statistic always lies between 0 and 4.

In the stock market, positive autocorrelation (DW < 2) in stock prices suggests that price movements have a persistent trend: if the price increased or decreased on the previous day, it tends to move in the same direction on the current day. For example, if the stock fell yesterday, there is a higher likelihood it will fall today. Negative autocorrelation (DW > 2) indicates the opposite: if the price increased or decreased on the previous day, it tends to move in the opposite direction on the current day. For example, if the stock fell yesterday, there is a greater likelihood it will rise today.

Assumptions for the Durbin-Watson Test:

  • The errors are normally distributed with mean 0.
  • The errors are stationary.

Calculation of DW Statistics

The DW statistic is computed from the residuals [Tex]e_t [/Tex] of a regression fitted with the Ordinary Least Squares (OLS) method.

The null hypothesis and alternate hypothesis for the Durbin-Watson Test are:

  • H0: There is no first-order autocorrelation in the residuals (ρ = 0).
  • HA: First-order autocorrelation is present (ρ ≠ 0).

Formula of DW Statistics

[Tex]d = \frac{\sum_{t=2}^{T}(e_t - e_{t-1})^2}{\sum_{t=1}^{T}e_{t}^{2}} [/Tex]

Here,

  • et is the residual at time t
  • T is the number of observations.
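The formula translates directly into NumPy. In the sketch below, `durbin_watson_stat` is a hypothetical name; statsmodels' `durbin_watson` should give the same value because it evaluates the same ratio. The two toy residual sequences show the extremes: alternating signs push d toward 4, and identical residuals give d = 0.

```python
import numpy as np


def durbin_watson_stat(residuals):
    """d = sum_{t=2}^{T} (e_t - e_{t-1})^2 / sum_{t=1}^{T} e_t^2"""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


print(durbin_watson_stat([1.0, -1.0, 1.0, -1.0]))  # 3.0 (alternating signs)
print(durbin_watson_stat([1.0, 1.0, 1.0, 1.0]))    # 0.0 (no change between steps)
```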

Interpretation of DW Statistics

  • A DW statistic of about 2.0 suggests that no autocorrelation is detected in the sample.
  • A value between 0 and 2 suggests positive autocorrelation; the closer to 0, the stronger it is.
  • A value between 2 and 4 suggests negative autocorrelation; the closer to 4, the stronger it is.
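Why is 2 the "no autocorrelation" value? Expanding the numerator of d gives the exact identity d = 2(1 - r₁) - (e₁² + e_T²)/Σe_t², where r₁ = Σe_t·e_{t-1}/Σe_t². The last term is negligible for long series, so d ≈ 2(1 - r₁): r₁ = 0 gives d ≈ 2, r₁ → 1 gives d → 0 and r₁ → -1 gives d → 4. The sketch below checks this on hypothetical AR(1) residuals with ρ = 0.9, 0 and -0.9 (simulated, not from the article's data). Note that r₁ here is not demeaned, which is appropriate for OLS residuals from a model with an intercept, since those have mean zero.

```python
import numpy as np


def dw_and_lag1(e):
    """Return the DW statistic d and r1 = sum(e_t * e_{t-1}) / sum(e_t^2)."""
    e = np.asarray(e, dtype=float)
    ss = np.sum(e ** 2)
    d = np.sum(np.diff(e) ** 2) / ss
    r1 = np.sum(e[1:] * e[:-1]) / ss
    return d, r1


rng = np.random.default_rng(7)
for rho in (0.9, 0.0, -0.9):
    # Hypothetical AR(1) residuals: e_t = rho * e_{t-1} + u_t
    u = rng.standard_normal(2000)
    e = np.zeros(2000)
    for t in range(1, 2000):
        e[t] = rho * e[t - 1] + u[t]
    d, r1 = dw_and_lag1(e)
    print(f"rho = {rho:+.1f}: d = {d:.2f}, 2 * (1 - r1) = {2 * (1 - r1):.2f}")
```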

Decision Rule

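  • A Durbin-Watson statistic far from 2 suggests the presence of autocorrelation.
  • Whether to reject the null hypothesis is decided with lower and upper critical values [Tex]d_L [/Tex] and [Tex]d_U [/Tex], which are tabulated for each sample size, number of regressors and significance level. For positive autocorrelation, reject H0 if [Tex]d < d_L [/Tex], do not reject it if [Tex]d > d_U [/Tex], and treat values between [Tex]d_L [/Tex] and [Tex]d_U [/Tex] as inconclusive. For negative autocorrelation, apply the same rule to 4 - d.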
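The classic Durbin-Watson bounds test compares d with a lower bound d_L and an upper bound d_U from a DW table. Below d_L, H0 is rejected in favour of positive autocorrelation. Above 4 - d_L, it is rejected in favour of negative autocorrelation. Between d_U and 4 - d_U, H0 is not rejected, and the remaining gaps are inconclusive. The sketch below encodes that rule. `durbin_watson_decision` is a hypothetical helper, and the bounds in the usage lines are placeholders for illustration only, not real table values.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Bounds test for the DW statistic d.

    d_lower and d_upper must be looked up in a Durbin-Watson table for the
    sample size, number of regressors and significance level being used."""
    if not 0.0 <= d <= 4.0:
        raise ValueError("the DW statistic lies between 0 and 4")
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d > 4.0 - d_lower:
        return "reject H0: negative autocorrelation"
    if d_upper <= d <= 4.0 - d_upper:
        return "do not reject H0"
    return "inconclusive"


# Placeholder bounds for illustration only; look up real values in a DW table
print(durbin_watson_decision(0.14, d_lower=1.50, d_upper=1.60))  # reject H0: positive autocorrelation
print(durbin_watson_decision(2.05, d_lower=1.50, d_upper=1.60))  # do not reject H0
print(durbin_watson_decision(1.55, d_lower=1.50, d_upper=1.60))  # inconclusive
```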

Need For Autocorrelation in Time Series

Autocorrelation is important in time series as:

  1. Autocorrelation helps reveal repeating patterns or trends within a time series. By analyzing how a variable correlates with its past values at different lags, analysts can identify the presence of cyclic or seasonal patterns in the data. For example, in economic data, autocorrelation may reveal whether certain economic indicators exhibit regular patterns over specific time intervals, such as monthly or quarterly cycles.
  2. Financial analysts and traders often use autocorrelation to analyze historical price movements in financial markets. By identifying autocorrelation patterns in past price changes, they may attempt to predict future price movements. For instance, if there is a positive autocorrelation at a specific lag, indicating a trend in price movements, traders might use this information to inform their predictions and trading strategies.
  3. The Autocorrelation Function (ACF) is a crucial tool for modeling time series data. It shows which lags are significantly correlated with the current observation, and understanding this autocorrelation structure is essential for selecting an appropriate model. For instance, an ACF that decays gradually over many lags may suggest an autoregressive (AR) component, in which the current value depends on past values, whereas an ACF that cuts off sharply after lag q suggests a moving-average component of order q. Observing how the autocorrelation decays over lags therefore guides the choice of lag terms to include in the model.
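The two ACF shapes can be compared on simulated data. The sketch below uses hypothetical processes with coefficient 0.8. For an MA(1) process X_t = e_t + 0.8·e_{t-1}, the theoretical ACF is 0.8/(1 + 0.8²) ≈ 0.49 at lag 1 and 0 afterwards, so it cuts off. For an AR(1) process X_t = 0.8·X_{t-1} + e_t, it is 0.8^k, which decays gradually as 0.8, 0.64, 0.51, 0.41. The sample values printed should be close to these.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations r(0), ..., r(max_lag) using the overall mean."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[:len(d) - k]) / denom for k in range(max_lag + 1)])


rng = np.random.default_rng(11)
n = 5000
e = rng.standard_normal(n + 1)

# Hypothetical MA(1): X_t = e_t + 0.8 * e_{t-1}
ma1 = e[1:] + 0.8 * e[:-1]

# Hypothetical AR(1): X_t = 0.8 * X_{t-1} + e_t
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + e[t]

print("MA(1) ACF, lags 1-4:", np.round(sample_acf(ma1, 4)[1:], 2))
print("AR(1) ACF, lags 1-4:", np.round(sample_acf(ar1, 4)[1:], 2))
```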

Autocorrelation Vs Correlation

  1. Autocorrelation refers to the correlation between a variable and its own past values at different lags in a time series; it focuses on the temporal patterns within a single variable. Correlation represents the statistical association between two distinct variables; it focuses on assessing the strength and direction of the relationship between separate variables.
  2. Autocorrelation is measured with the ACF and PACF, which quantify the correlation between a variable and its lagged values. Correlation is measured with coefficients such as the Pearson correlation coefficient for linear relationships or Spearman's rank correlation for monotonic (not necessarily linear) relationships, each giving a single value between -1 and 1.

Difference Between Autocorrelation and Multicollinearity

| Feature | Autocorrelation | Multicollinearity |
|---|---|---|
| Definition | Correlation between a variable and its own lagged values | Correlation among the independent variables in a model |
| Focus | Relationship within a single variable over time | Relationship among multiple independent variables |
| Purpose | Identifying temporal patterns in time series data | Detecting interdependence among predictor variables |
| Nature of relationship | Examines correlation between a variable and its past values | Investigates correlation between independent variables |
| Impact on the model | Autocorrelated errors make OLS estimates inefficient and their standard errors unreliable; estimates become biased when lagged dependent variables are used as regressors | Inflated standard errors and difficulty in isolating individual variable effects |
| Statistical test | Ljung-Box test, Durbin-Watson test | Variance Inflation Factor (VIF), correlation matrix, condition indices |
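The Ljung-Box test listed above checks several lags at once with the statistic Q = n(n + 2) Σ_{k=1}^{h} r_k² / (n - k). Under the null hypothesis of no autocorrelation up to lag h, Q approximately follows a chi-square distribution with h degrees of freedom, whose 95% critical value for h = 10 is about 18.31. The sketch below computes only Q, on hypothetical white-noise and AR(1) series; `ljung_box_q` is a hypothetical name. For p-values, statsmodels provides `acorr_ljungbox` in `statsmodels.stats.diagnostic`.

```python
import numpy as np


def ljung_box_q(x, h):
    """Ljung-Box Q = n(n+2) * sum_{k=1}^{h} r_k^2 / (n - k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = np.sum(d * d)
    q = 0.0
    for k in range(1, h + 1):
        r_k = np.sum(d[k:] * d[:n - k]) / denom  # sample autocorrelation at lag k
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q


rng = np.random.default_rng(3)
white = rng.standard_normal(500)      # hypothetical series with no autocorrelation
ar = np.zeros(500)                    # hypothetical AR(1) series with phi = 0.5
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + white[t]

print(f"white noise: Q = {ljung_box_q(white, 10):.1f}")
print(f"AR(1):       Q = {ljung_box_q(ar, 10):.1f}")
```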

How to calculate Autocorrelation in Python?

This section demonstrates how to calculate autocorrelation in Python and how to interpret the resulting graphs. We will use a Google stock price dataset; you can download the dataset from here.

Importing Libraries and Dataset

We use pandas, NumPy, Matplotlib and statsmodels, including its OLS linear regression model, its durbin_watson function and the plot_acf function from its tsaplots module.

Python3

# Importing necessary dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.regression.linear_model import OLS
from statsmodels.graphics.tsaplots import plot_acf

# Load the data; the first column becomes the index
goog_stock_Data = pd.read_csv('GOOG.csv', header=0, index_col=0)

# Plot the adjusted close price
goog_stock_Data['Adj Close'].plot()
plt.show()

Output:




Here, we have plotted the adjusted close price of the Google stock.

Plotting Autocorrelation Function

Python3

# Plot the autocorrelation of the stock price with a 95% confidence band (alpha=0.05)
plot_acf(goog_stock_Data['Adj Close'], alpha=0.05)
plt.show()

Output:



The graph above shows the autocorrelation coefficients of the time series at different lags. In an ACF plot, the x-axis represents the lag (the time gap between observations) and the y-axis represents the autocorrelation coefficient. The shaded region is the confidence band for the 0.05 significance level, so bars that extend beyond it indicate autocorrelation that is statistically significant at that level. Bars above the horizontal axis indicate positive autocorrelation at the corresponding lag. For a price series, significant positive autocorrelation that decays only slowly across many lags usually reflects a persistent trend rather than a repeating pattern.
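The significance band in an ACF plot can also be checked numerically. As we understand it, statsmodels computes its band with Bartlett's formula by default, so the band widens with the lag. A rougher rule of thumb, used in the sketch below, treats ±1.96/√n as the 95% band for white noise. Since GOOG.csv is not reproduced here, the sketch uses a hypothetical random walk as a stand-in for a price series, and the helper name `significant_lags` is also hypothetical.

```python
import numpy as np


def significant_lags(x, max_lag, z=1.96):
    """Return (lags whose |ACF| exceeds z / sqrt(n), ACF values for lags 0..max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = np.sum(d * d)
    acf = np.array([np.sum(d[k:] * d[:n - k]) / denom for k in range(max_lag + 1)])
    bound = z / np.sqrt(n)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound], acf


# Hypothetical price-like series: a random walk (cumulative sum of random steps)
rng = np.random.default_rng(2024)
prices = 100 + np.cumsum(rng.standard_normal(1000))

lags, acf = significant_lags(prices, 20)
print("lags outside the band:", lags)
print("ACF at lags 1, 10, 20:", np.round(acf[[1, 10, 20]], 2))
```

A random walk typically shows autocorrelation close to 1 at lag 1 that fades only slowly, which is the trend-like pattern described above.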

Performing Durbin-Watson Test

Python3

# Durbin-Watson test on the residuals of a linear trend fitted to the price
y = goog_stock_Data['Adj Close'].to_numpy()
# Regressors: an intercept plus a time index 0, 1, ..., n-1
X = sm.add_constant(np.arange(len(y)))
# Fit the model by ordinary least squares
ols_res = OLS(y, X).fit()
# Apply the Durbin-Watson statistic to the OLS residuals
print(durbin_watson(ols_res.resid))

Output:

0.13568583561262496

The DW statistic is about 0.14, close to 0, which indicates strong positive autocorrelation in the residuals of the fitted trend.

How to Handle Autocorrelation?

To handle autocorrelation in a model:

  • For positive serial correlation
    • Include lagged values of the dependent variable or relevant independent variables in the model. This helps capture the autocorrelation patterns in the data.
    • For example, if dealing with time series data, consider using lagged values in an autoregressive (AR) model.
  • For negative serial correlation
    • Ensure that differencing (if applied) is not excessive. Over-differencing can introduce negative autocorrelation.
    • If differencing is used to achieve stationarity, consider adjusting the differencing order or exploring alternative methods like seasonal differencing.
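The effect of over-differencing can be checked numerically. Differencing white noise e_t gives e_t - e_{t-1}, whose lag-1 covariance is -σ² and whose variance is 2σ², so its theoretical lag-1 autocorrelation is -1/2. In the sketch below, a hypothetical random walk is differenced once (the right amount, which recovers white noise) and twice (one difference too many). The helper name is hypothetical.

```python
import numpy as np


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation using the overall mean."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(d[1:] * d[:-1]) / np.sum(d * d)


rng = np.random.default_rng(99)
steps = rng.standard_normal(5000)   # white noise
walk = np.cumsum(steps)             # random walk: needs exactly one difference

once = np.diff(walk)                # appropriate differencing: white noise again
twice = np.diff(walk, n=2)          # over-differenced

print(f"differenced once:  r1 = {lag1_autocorrelation(once):+.2f}")   # near 0
print(f"differenced twice: r1 = {lag1_autocorrelation(twice):+.2f}")  # near -0.5
```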


Frequently Asked Questions (FAQs)

Q. What is autocorrelation vs. correlation?

Correlation looks at how two things are connected, while autocorrelation checks how a thing is linked to its own earlier versions over time.

Q. Why is autocorrelation a problem?

Many statistical tests and models, such as OLS regression, assume that observations or errors are independent. Autocorrelation violates this assumption, which can make standard errors and test results misleading.

Q. What are the types of autocorrelations?

Types of Autocorrelations:

  • Positive Autocorrelation
  • Negative Autocorrelation
  • Zero Autocorrelation
  • Cross-Lag Autocorrelation

Q. What is the principle of autocorrelation?

The principle of autocorrelation is rooted in the idea that the values of a variable in a time series are correlated with their own past values. Autocorrelation measures the strength and direction of this relationship at different time lags.

Q. What is the difference between cross-correlation and autocorrelation?

Autocorrelation measures the correlation of a variable with its own past values, while cross-correlation measures the correlation between two different variables at various time lags. Autocorrelation focuses on the internal relationship within a single time series, while cross-correlation assesses the association between two distinct time series.


