Time Series Forecasting With Python Cheat Sheet
ACF plot
The autocorrelation function (ACF) plot shows the autocorrelation coefficients as a function of the lag.
• Use it to determine the order q of a stationary MA(q) process
• A stationary MA(q) process has significant coefficients up until lag q
Output for an MA(2) process (i.e., q = 2): significant coefficients up to lag 2 [plot not reproduced here]
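A minimal sketch of producing such a plot with statsmodels; the simulated series and its MA coefficients are illustrative assumptions:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima_process import ArmaProcess

# Simulate an MA(2) process (the lag-0 coefficient is 1 by convention)
np.random.seed(42)
series = ArmaProcess(ar=[1], ma=[1, 0.9, 0.3]).generate_sample(nsample=1000)

plot_acf(series, lags=20)  # expect significant spikes up to lag 2 only
plt.show()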
PACF plot
The partial autocorrelation function (PACF) plot shows the partial autocorrelation coefficients as a function of the lag.
• Use it to determine the order p of a stationary AR(p) process
• A stationary AR(p) process has significant coefficients up until lag p
Output for an AR(2) process (i.e., p = 2): significant coefficients up to lag 2 [plot not reproduced here]
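The same sketch for the PACF; the simulated AR(2) series is again an illustrative assumption:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.tsa.arima_process import ArmaProcess

# Simulate an AR(2) process (note the sign convention of the lag polynomial)
np.random.seed(42)
series = ArmaProcess(ar=[1, -0.6, -0.2], ma=[1]).generate_sample(nsample=1000)

plot_pacf(series, lags=20, method="ywm")  # expect significant spikes up to lag 2 only
plt.show()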
Decomposition
Separate the series into 3 components: trend, seasonality, and residuals.
• Trend: long-term changes in the series
• Seasonality: periodical variations in the series
• Residuals: what is not explained by trend and seasonality
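A minimal sketch using statsmodels' seasonal_decompose; the synthetic monthly series is an illustrative assumption:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series: linear trend + yearly seasonality + noise
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
rng = np.random.default_rng(1)
y = pd.Series(0.5 * np.arange(96)
              + 10 * np.sin(2 * np.pi * np.arange(96) / 12)
              + rng.normal(size=96), index=idx)

result = seasonal_decompose(y, model="additive", period=12)
result.plot()  # one panel each for observed, trend, seasonal, and residuals
plt.show()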
Statistical tests
ADF test – Test for stationarity
A series is stationary if its mean, variance, and autocorrelation are constant over time. Test for stationarity with the augmented Dickey-Fuller (ADF) test.
• Null hypothesis: a unit root is present (i.e., the series is not stationary)
• We want a p-value < 0.05
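A minimal sketch with statsmodels' adfuller; the random walk is an illustrative assumption:

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
random_walk = rng.normal(size=500).cumsum()  # non-stationary by construction

adf_stat, p_value, *_ = adfuller(random_walk)
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")  # expect p > 0.05

adf_stat, p_value, *_ = adfuller(np.diff(random_walk))
print(f"p-value after differencing: {p_value:.3f}")  # expect p < 0.05: stationary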
Ljung-Box test – Residuals analysis
Used to determine if the autocorrelation of a group of data is significantly different from 0. Use it on the residuals to check if they are independent.
• Null hypothesis: the data is independently distributed (i.e., there is no autocorrelation)
• We want a p-value > 0.05
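A minimal sketch with statsmodels' acorr_ljungbox; the white-noise array is a stand-in for the residuals of a fitted model (e.g., fit.resid):

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

# Stand-in for model residuals; white noise should pass the test
residuals = np.random.default_rng(0).normal(size=500)

lb = acorr_ljungbox(residuals, lags=[10])
print(lb)  # recent statsmodels returns a DataFrame with lb_stat and lb_pvalue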
Granger causality – Multivariate forecasting
Determine if one time series is useful in predicting the other.
Use it to validate the VAR model. If the Granger causality test fails, then the VAR model is invalid.
• Null hypothesis: 𝑦2,𝑡 does not Granger-cause 𝑦1,𝑡
• Works for predictive causality
• Tests causality in one direction only (i.e., must run the test twice)
• We want a p-value < 0.05
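A minimal sketch with statsmodels' grangercausalitytests; the two synthetic series (y2 leading y1 by two steps) are illustrative assumptions:

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(42)
y2 = rng.normal(size=500)
y1 = 0.8 * np.concatenate(([0.0, 0.0], y2[:-2])) + rng.normal(scale=0.5, size=500)
df = pd.DataFrame({"y1": y1, "y2": y2})

# Tests whether the SECOND column Granger-causes the FIRST
grangercausalitytests(df[["y1", "y2"]], maxlag=4)
# Swap the columns to test the other direction
grangercausalitytests(df[["y2", "y1"]], maxlag=4)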
Moving average model – MA(q)
The moving average model: the current value depends on the mean of the series, the current error term, and past error terms.
• Denoted as MA(q) where q is the order
• Use the ACF plot to find q
• Assumes stationarity. Use only on stationary data
Equation
𝑦𝑡 = 𝜇 + 𝜖𝑡 + 𝜃1 𝜖𝑡−1 + 𝜃2 𝜖𝑡−2 + ⋯ + 𝜃𝑞 𝜖𝑡−𝑞
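A minimal sketch of fitting an MA(2) model with statsmodels' ARIMA class (p = 0 and d = 0 leave a pure MA model); the simulated series is an illustrative assumption:

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(42)
series = ArmaProcess(ar=[1], ma=[1, 0.9, 0.3]).generate_sample(nsample=1000)

fit = ARIMA(series, order=(0, 0, 2)).fit()  # order=(p, d, q)
print(fit.params)       # estimated mean and theta coefficients
print(fit.forecast(5))  # next five predicted values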
Autoregressive model – AR(p)
The autoregressive model is a regression against itself. This means that the present value depends on past values.
• Denoted as AR(p) where p is the order
• Use the PACF plot to find p
• Assumes stationarity. Use only on stationary data
Equation
𝑦𝑡 = 𝐶 + 𝜙1 𝑦𝑡−1 + 𝜙2 𝑦𝑡−2 + ⋯ + 𝜙𝑝 𝑦𝑡−𝑝 + 𝜖𝑡
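The same sketch for an AR(2) model (q = 0, d = 0); the simulated series is again an assumption:

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(42)
series = ArmaProcess(ar=[1, -0.6, -0.2], ma=[1]).generate_sample(nsample=1000)

fit = ARIMA(series, order=(2, 0, 0)).fit()  # pure AR(2)
print(fit.params)  # estimated constant and phi coefficients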
Autoregressive moving average model – ARMA(p,q)
The autoregressive moving average model (ARMA) is the combination of the autoregressive model AR(p) and the moving average model MA(q).
• Denoted as ARMA(p,q) where p is the order of the autoregressive portion, and q is the order of the moving average portion
• Cannot use the ACF or PACF plots to find the orders p and q. Must try different (p,q) values and select the model with the lowest AIC (Akaike’s Information Criterion), as in the sketch below
• Assumes stationarity. Use only on stationary data
Equation
𝑦𝑡 = 𝐶 + 𝜙1 𝑦𝑡−1 + ⋯ + 𝜙𝑝 𝑦𝑡−𝑝 + 𝜃1 𝜖𝑡−1 + ⋯ + 𝜃𝑞 𝜖𝑡−𝑞 + 𝜖𝑡
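A minimal sketch of the (p,q) grid search by AIC; the search range and simulated series are illustrative assumptions:

import numpy as np
from itertools import product
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(42)
series = ArmaProcess(ar=[1, -0.5], ma=[1, 0.4]).generate_sample(nsample=500)

best_aic, best_order = float("inf"), None
for p, q in product(range(4), repeat=2):  # try all (p, q) with p, q < 4
    try:
        fit = ARIMA(series, order=(p, 0, q)).fit()
    except Exception:
        continue  # some orders may fail to converge
    if fit.aic < best_aic:
        best_aic, best_order = fit.aic, (p, q)
print(best_order, best_aic)  # keep the (p, q) with the lowest AIC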
Autoregressive integrated moving average model – ARIMA(p,d,q)
The autoregressive integrated moving average (ARIMA) model is the combination of the autoregressive model AR(p) and the moving average model MA(q), but in terms of the differenced series.
• Denoted as ARIMA(p,d,q), where p is the order of the autoregressive portion, d is the order of integration, and q is the order of the moving average portion
• Can be used on non-stationary data
Equation
𝑦′𝑡 = 𝐶 + 𝜙1 𝑦′𝑡−1 + ⋯ + 𝜙𝑝 𝑦′𝑡−𝑝 + 𝜃1 𝜖𝑡−1 + ⋯ + 𝜃𝑞 𝜖𝑡−𝑞 + 𝜖𝑡
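A minimal sketch; d = 1 lets the model difference the random walk internally (the series and orders are illustrative assumptions):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
series = (0.1 + rng.normal(size=500)).cumsum()  # random walk with drift

fit = ARIMA(series, order=(1, 1, 1)).fit()  # in practice, pick (p, d, q) by lowest AIC
print(fit.forecast(10))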
Seasonal ARIMA model – SARIMA(p,d,q)(P,D,Q)m
The seasonal autoregressive integrated moving average (SARIMA) model includes a seasonal component on top of the ARIMA model.
• Denoted as SARIMA(p,d,q)(P,D,Q)m. Here, p, d, and q have the same meaning as in the ARIMA model
• P is the seasonal order of the autoregressive portion
• D is the seasonal order of integration
• Q is the seasonal order of the moving average portion
• m is the frequency of the data (i.e., the number of data points in one season)
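A minimal sketch with statsmodels' SARIMAX class (which also fits plain SARIMA models); the synthetic monthly series and the orders are illustrative assumptions:

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

idx = pd.date_range("2015-01-01", periods=96, freq="MS")
rng = np.random.default_rng(2)
y = pd.Series(0.5 * np.arange(96)
              + 10 * np.sin(2 * np.pi * np.arange(96) / 12)
              + rng.normal(size=96), index=idx)

# (p,d,q)(P,D,Q)m with m = 12 for monthly data
fit = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print(fit.forecast(12))  # one full season ahead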
SARIMAX
SARIMAX is the most general model. It combines seasonality, a moving average portion, an autoregressive portion, and exogenous variables.
• Can use external variables to forecast a series
• Caveat: SARIMAX predicts the next timestep. If your horizon is longer than one timestep, then you must forecast your exogenous variables too, which can amplify the error in your model
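A minimal sketch with one exogenous regressor; the synthetic data and orders are illustrative assumptions:

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(3)
exog = rng.normal(size=(200, 1))  # one exogenous variable
y = 2.0 * exog[:, 0] + rng.normal(size=200).cumsum()

fit = SARIMAX(y, exog=exog, order=(1, 1, 1)).fit(disp=False)

# Forecasting several steps ahead requires FUTURE values of the exogenous
# variable, which usually must themselves be forecast (the caveat above)
future_exog = rng.normal(size=(5, 1))
print(fit.forecast(steps=5, exog=future_exog))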
VARMAX – Multivariate forecasting
The vector autoregressive moving average with exogenous variables (VARMAX) model is used for multivariate forecasting (i.e., predicting two time series at the same time).
• Assumes Granger causality. Must run the Granger causality test; if the test fails, the VARMAX model cannot be used
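A minimal sketch with statsmodels' VARMAX on two stationary series; the synthetic data and the VAR(2) order are illustrative assumptions:

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.varmax import VARMAX

rng = np.random.default_rng(4)
y2 = rng.normal(size=300)
y1 = 0.8 * np.concatenate(([0.0], y2[:-1])) + rng.normal(scale=0.5, size=300)
df = pd.DataFrame({"y1": y1, "y2": y2})

fit = VARMAX(df, order=(2, 0)).fit(disp=False)  # order=(p, q): VAR(2), no MA terms
print(fit.forecast(steps=5))  # forecasts both series at once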
BATS and TBATS – Multiple seasonal periods
BATS and TBATS are used when the series has more than one seasonal period. This can happen when we have high-frequency data, such as daily data.
• When there is more than one seasonal period, SARIMA cannot be used. Use BATS or TBATS
• BATS: Box-Cox transformation, ARMA errors, Trend and Seasonal components
• TBATS: Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components
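A minimal sketch using the third-party tbats package (pip install tbats); the daily series with weekly and yearly seasonality is an illustrative assumption:

import numpy as np
from tbats import TBATS

rng = np.random.default_rng(5)
t = np.arange(730)  # two years of daily data
y = (10 * np.sin(2 * np.pi * t / 7)         # weekly seasonality
     + 20 * np.sin(2 * np.pi * t / 365.25)  # yearly seasonality
     + rng.normal(size=730))

estimator = TBATS(seasonal_periods=[7, 365.25])
fitted = estimator.fit(y)
print(fitted.forecast(steps=14))  # two weeks ahead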
Exponential smoothing
Exponential smoothing uses past values to predict the future, but the weights decay exponentially as the values go further back in time.
• Simple exponential smoothing: returns flat forecasts
• Double exponential smoothing: adds a trend component. Forecasts are a straight line (increasing or decreasing)
• Triple exponential smoothing: adds a seasonal component
• Trend can be “additive” or “exponential”
• Seasonality can be “additive” or “multiplicative”
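A minimal sketch of triple (Holt-Winters) exponential smoothing with statsmodels, which spells the options "add" and "mul"; the synthetic monthly series is an illustrative assumption:

import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

idx = pd.date_range("2015-01-01", periods=96, freq="MS")
rng = np.random.default_rng(6)
y = pd.Series(20 + 0.5 * np.arange(96)
              + 10 * np.sin(2 * np.pi * np.arange(96) / 12)
              + rng.normal(size=96), index=idx)

# Additive trend and additive seasonality with a 12-month season
fit = ExponentialSmoothing(y, trend="add", seasonal="add",
                           seasonal_periods=12).fit()
print(fit.forecast(12))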
Deep neural network (DNN)
A deep neural network stacks fully connected layers and can model non-linear relationships in the time series if the activation function is non-linear.
• Start with a simple model with few hidden layers. Experiment with training for more epochs before adding layers
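A minimal Keras sketch; the window length, layer sizes, and the names X_train and y_train are illustrative assumptions:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

WINDOW = 24  # number of past timesteps used as input features (assumption)

model = Sequential([
    Dense(32, activation="relu", input_shape=(WINDOW,)),
    Dense(32, activation="relu"),
    Dense(1),  # next value of the series
])
model.compile(loss="mse", optimizer="adam")
# model.fit(X_train, y_train, epochs=20)  # X_train shape: (samples, WINDOW)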
Long short-term memory - LSTM
An LSTM is great at processing sequences of data, such as text and time series. Its architecture allows past information to still be used for later predictions.
• You can stack many LSTM layers in your model
• You can try combining an LSTM with a CNN
• An LSTM takes longer to train, since the data is processed in sequence
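A minimal Keras sketch of stacked LSTM layers; the window length and sizes are assumptions:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

WINDOW, N_FEATURES = 24, 1  # input window and features per timestep (assumptions)

model = Sequential([
    # return_sequences=True lets a second LSTM layer be stacked on top
    LSTM(32, return_sequences=True, input_shape=(WINDOW, N_FEATURES)),
    LSTM(32),
    Dense(1),
])
model.compile(loss="mse", optimizer="adam")
# model.fit(X_train, y_train, epochs=20)  # X_train: (samples, WINDOW, N_FEATURES)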
Convolutional neural network - CNN
A CNN can act as a filter for our time series, due to the convolution operation which reduces the feature space.
• A CNN trains faster than an LSTM
• Can be combined with an LSTM. Place the CNN layer before the LSTM
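A minimal Keras sketch of a CNN-LSTM combination, with the convolution placed before the LSTM; the sizes are assumptions, as above:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, LSTM, Dense

WINDOW, N_FEATURES = 24, 1  # assumptions, as above

model = Sequential([
    # The convolution filters the sequence and reduces the feature space
    Conv1D(filters=32, kernel_size=3, activation="relu",
           input_shape=(WINDOW, N_FEATURES)),
    LSTM(32),  # CNN layer placed before the LSTM
    Dense(1),
])
model.compile(loss="mse", optimizer="adam")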
Data Science with Marco