Probabilistic forecasting with machine learning
  Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

If you like  Skforecast ,  help us giving a star on   GitHub! ⭐️

Probabilistic forecasting with machine learning

Joaquín Amat Rodrigo, Javier Escobar Ortiz
April, 2022 (last update August 2024)

Introduction

When trying to anticipate future values, most forecasting models try to predict what will be the most likely value. This is called point-forecasting. Although knowing in advance the expected value of a time series is useful in almost every business case, this kind of prediction does not provide any information about the confidence of the model nor the prediction uncertainty.

Probabilistic forecasting, as opposed to point-forecasting, is a family of techniques that allow for predicting the expected distribution of the outcome instead of a single future value. This type of forecasting provides much rich information since it allows for creating prediction intervals, the range of likely values where the true value may fall. More formally, a prediction interval defines the interval within which the true value of the response variable is expected to be found with a given probability.

There are multiple ways to estimate prediction intervals, most of which require that the residuals (errors) of the model follow a normal distribution. When this property cannot be assumed, two alternatives commonly used are bootstrapping and quantile regression. In order to illustrate how skforecast allows estimating prediction intervals for multi-step forecasting, the following examples are shown:

Warning

As Rob J Hyndman explains in his blog, in real-world problems, almost all prediction intervals are too narrow. For example, nominal 95% intervals may only provide coverage between 71% and 87%. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. With forecasting models, there are at least four sources of uncertainty:
  • The random error term
  • The parameter estimates
  • The choice of model for the historical data
  • The continuation of the historical data generating process into the future
When producing prediction intervals for time series models, generally only the first of these sources is taken into account. Therefore, it is advisable to use test data to validate the empirical coverage of the interval and not only rely on the expected one.

✎ Note

If this is the first time using skforecast, please visit Skforecast: time series forecasting with Python and Scikit-learn for a quick introduction.

✎ Note

Conformal prediction is a relatively new framework that allows for the creation of confidence measures for predictions made by machine learning models. This method is on the roadmap of skforecast, but not yet available.

Prediction intervals using bootstrapped residuals

The error of one-step-ahead forecast is defined as $e_t = y_t - \hat{y}_{t|t-1}$. Assuming future errors will be like past errors, it is possible to simulate different predictions by sampling from the collection of errors previously seen in the past (i.e., the residuals) and adding them to the predictions.

Doing this repeatedly, a collection of slightly different predictions is created (possible future paths), that represent the expected variance in the forecasting process.

Finally, prediction intervals can be computed by calculating the $α/2$ and $1 − α/2$ percentiles of the simulated data at each forecasting horizon.


The main advantage of this strategy is that it only requires a single model to estimate any interval. The drawback is that, running hundreds or thousands of bootstrapping iterations, is computationally very expensive and not always workable.

Libraries

In [126]:
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd
from skforecast.datasets import fetch_dataset

# Plots
# ==============================================================================
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
pio.renderers.default = 'notebook' 
poff.init_notebook_mode(connected=True)
plt.style.use('seaborn-v0_8-darkgrid')
from skforecast.plot import plot_residuals
from skforecast.plot import plot_prediction_distribution
from pprint import pprint

# Modelling and Forecasting
# ==============================================================================
import skforecast
import sklearn
import lightgbm
from lightgbm import LGBMRegressor
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.ForecasterAutoregDirect import ForecasterAutoregDirect
from skforecast.model_selection import bayesian_search_forecaster
from skforecast.model_selection import backtesting_forecaster
from sklearn.metrics import mean_pinball_loss
from scipy.stats import norm

# Configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')

color = '\033[1m\033[38;5;208m'
print(f"{color}Version skforecast: {skforecast.__version__}")
print(f"{color}Version scikit-learn: {sklearn.__version__}")
print(f"{color}Version lightgbm: {lightgbm.__version__}")
print(f"{color}Version pandas: {pd.__version__}")
print(f"{color}Version numpy: {np.__version__}")
Version skforecast: 0.13.0
Version scikit-learn: 1.5.1
Version lightgbm: 4.4.0
Version pandas: 2.2.2
Version numpy: 2.0.1

Data

In [127]:
# Data download
# ==============================================================================
data = fetch_dataset(name='bike_sharing_extended_features')
data.head(2)
bike_sharing_extended_features
------------------------------
Hourly usage of the bike share system in the city of Washington D.C. during the
years 2011 and 2012. In addition to the number of users per hour, the dataset
was enriched by introducing supplementary features. Addition includes calendar-
based variables (day of the week, hour of the day, month, etc.), indicators for
sunlight, incorporation of rolling temperature averages, and the creation of
polynomial features generated from variable pairs. All cyclic variables are
encoded using sine and cosine functions to ensure accurate representation.
Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository.
https://doi.org/10.24432/C5W894.
Shape of the dataset: (17352, 90)
Out[127]:
users weather month_sin month_cos week_of_year_sin week_of_year_cos week_day_sin week_day_cos hour_day_sin hour_day_cos ... temp_roll_mean_1_day temp_roll_mean_7_day temp_roll_max_1_day temp_roll_min_1_day temp_roll_max_7_day temp_roll_min_7_day holiday_previous_day holiday_next_day temp holiday
date_time
2011-01-08 00:00:00 25.0 mist 0.5 0.866025 0.120537 0.992709 -0.781832 0.62349 0.258819 0.965926 ... 8.063334 10.127976 9.02 6.56 18.86 4.92 0.0 0.0 7.38 0.0
2011-01-08 01:00:00 16.0 mist 0.5 0.866025 0.120537 0.992709 -0.781832 0.62349 0.500000 0.866025 ... 8.029166 10.113334 9.02 6.56 18.86 4.92 0.0 0.0 7.38 0.0

2 rows × 90 columns

In [128]:
# One hot encoding of categorical variables
# ==============================================================================
encoder = ColumnTransformer(
              [('one_hot_encoder', OneHotEncoder(sparse_output=False), ['weather'])],
              remainder='passthrough',
              verbose_feature_names_out=False
          ).set_output(transform="pandas")
data = encoder.fit_transform(data)
In [129]:
# Selección de las variables exógenas
# ==============================================================================
exog_features = [
    'weather_clear', 'weather_mist', 'weather_rain', 'month_sin', 'month_cos',
    'week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos',
    'hour_day_sin', 'hour_day_cos', 'sunrise_hour_sin', 'sunrise_hour_cos',
    'sunset_hour_sin', 'sunset_hour_cos', 'temp'
]
data = data[['users'] + exog_features]

To facilitate the training of the models, the search for optimal hyperparameters and the evaluation of their predictive accuracy, the data are divided into three separate sets: training, validation and test.

In [130]:
# Split train-validation-test
# ==============================================================================
data = data.loc['2011-05-30 23:59:00':, :]
end_train = '2012-08-30 23:59:00'
end_validation = '2012-11-15 23:59:00'
data_train = data.loc[: end_train, :]
data_val   = data.loc[end_train:end_validation, :]
data_test  = data.loc[end_validation:, :]

print(f"Dates train      : {data_train.index.min()} --- {data_train.index.max()}  (n={len(data_train)})")
print(f"Dates validacion : {data_val.index.min()} --- {data_val.index.max()}  (n={len(data_val)})")
print(f"Dates test       : {data_test.index.min()} --- {data_test.index.max()}  (n={len(data_test)})")
Dates train      : 2011-05-31 00:00:00 --- 2012-08-30 23:00:00  (n=10992)
Dates validacion : 2012-08-31 00:00:00 --- 2012-11-15 23:00:00  (n=1848)
Dates test       : 2012-11-16 00:00:00 --- 2012-12-30 23:00:00  (n=1080)

Graphic exploration

Graphical exploration of time series can be an effective way of identifying trends, patterns, and seasonal variations. This, in turn, helps to guide the selection of the most appropriate forecasting model.

In [131]:
# Interactive plot of time series
# ==============================================================================
fig = go.Figure()
fig.add_trace(go.Scatter(x=data_train.index, y=data_train['users'], mode='lines', name='Train'))
fig.add_trace(go.Scatter(x=data_val.index, y=data_val['users'], mode='lines', name='Validation'))
fig.add_trace(go.Scatter(x=data_test.index, y=data_test['users'], mode='lines', name='Test'))
fig.update_layout(
    title  = 'Number of users',
    xaxis_title="Time",
    yaxis_title="Users",
    legend_title="Partition:",
    width=900,
    height=400,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1,
        xanchor="left",
        x=0.001
    )
)
fig.show()

Auto-correlation plots are a useful tool for identifying the order of an autoregressive model. The autocorrelation function (ACF) is a measure of the correlation between the time series and a lagged version of itself. The partial autocorrelation function (PACF) is a measure of the correlation between the time series and a lagged version of itself, controlling for the values of the time series at all shorter lags. These plots are useful for identifying the lags to be included in the autoregressive model.

In [132]:
# Autocorrelation plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 2))
plot_acf(data.users, ax=ax, lags=24*3)
plt.show()
In [133]:
# Partial autocorrelation plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 2))
plot_pacf(data.users, ax=ax, lags=24*3)
plt.show()

The autocorrelation plot demonstrates a strong correlation between the number of users in one hour and prior hours, as well as between users in one hour and the corresponding hour in preceding days. This observed correlation suggests that autoregressive models may be effective in this scenario.

In sample residuals

A recursive-multi-step forecaster is trained and its hyperparameters optimized. Then, prediction intervals based on bootstrapped residuals are estimated.

In [134]:
# Create forecaster and hyperparameters search
# ==============================================================================
# Forecaster
forecaster = ForecasterAutoreg(
                 regressor = LGBMRegressor(random_state=15926, verbose=-1),
                 lags      = 7 # Place holder that will be replaced during the search
             )

# Lags used as predictors
lags_grid = [24, 48, (1, 2, 3, 23, 24, 25, 47, 48, 49, 71, 72, 73, 364*24, 365*24)]

# Regressor hyperparameters search space
def search_space(trial):
    search_space  = {
        'lags'            : trial.suggest_categorical('lags', lags_grid),
        'n_estimators'    : trial.suggest_int('n_estimators', 200, 800, step=100),
        'max_depth'       : trial.suggest_int('max_depth', 3, 8, step=1),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 25, 500),
        'learning_rate'   : trial.suggest_float('learning_rate', 0.01, 0.5),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 0.8, step=0.1),
        'max_bin'         : trial.suggest_int('max_bin', 50, 100, step=25),
        'reg_alpha'       : trial.suggest_float('reg_alpha', 0, 1, step=0.1),
        'reg_lambda'      : trial.suggest_float('reg_lambda', 0, 1, step=0.1)
    }

    return search_space

results_search, frozen_trial = bayesian_search_forecaster(
                                   forecaster         = forecaster,
                                   y                  = data.loc[:end_validation, 'users'],
                                   exog               = data.loc[:end_validation, exog_features],
                                   steps              = 24,
                                   metric             = 'mean_absolute_error',
                                   search_space       = search_space,
                                   initial_train_size = len(data[:end_train]),
                                   refit              = False,
                                   n_trials           = 20, # Increase for more exhaustive search
                                   random_state       = 123,
                                   return_best        = True,
                                   n_jobs             = 'auto',
                                   verbose            = False,
                                   show_progress      = True
                               )

best_params = results_search['params'].iloc[0]
best_lags   = results_search['lags'].iloc[0]
`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [   1    2    3   23   24   25   47   48   49   71   72   73 8736 8760] 
  Parameters: {'n_estimators': 300, 'max_depth': 5, 'min_data_in_leaf': 25, 'learning_rate': 0.10794173876361574, 'feature_fraction': 0.6, 'max_bin': 100, 'reg_alpha': 1.0, 'reg_lambda': 1.0}
  Backtesting metric: 55.26098214742841

Once the best hyperparameters have been selected, the backtesting_forecaster() function is used to generate the prediction intervals for the entire test set.

  • The interval argument indicates the desired coverage probability of the prediction intervals. In this case, interval is set to [10, 90], which means that the prediction intervals are calculated for the 10th and 90th percentiles, resulting in a theoretical coverage probability of 80%.

  • The n_boot argument is used to specify the number of bootstrap samples to be used in estimating the prediction intervals. The larger the number of samples, the more accurate the prediction intervals will be, but the longer the calculation will take.

By default, intervals are calculated using in-sample residuals (residuals from the training set). However, this can result in intervals that are too narrow (overly optimistic).

In [135]:
# Backtesting with prediction intervals in test data using in-sample residuals
# ==============================================================================
metric, predictions = backtesting_forecaster(
                          forecaster          = forecaster,
                          y                   = data['users'],
                          exog                = data[exog_features],
                          steps               = 24,
                          metric              = 'mean_absolute_error',
                          initial_train_size  = len(data.loc[:end_validation]),
                          refit               = False,
                          interval            = [10, 90], # 80% prediction interval
                          n_boot              = 250,
                          in_sample_residuals = True, # Use in-sample residuals
                          binned_residuals    = False,
                          n_jobs              = 'auto',
                          verbose             = False,
                          show_progress       = True
                      )
display(metric)
predictions.head(5)
mean_absolute_error
0 50.462723
Out[135]:
pred lower_bound upper_bound
2012-11-16 00:00:00 75.957987 47.819542 105.837195
2012-11-16 01:00:00 22.895924 -3.148551 55.610378
2012-11-16 02:00:00 11.105227 -19.112222 35.062099
2012-11-16 03:00:00 7.465197 -13.938146 30.810502
2012-11-16 04:00:00 6.216360 -17.572476 34.432383
In [136]:
# Function to plot predicted intervals
# ======================================================================================
def plot_predicted_intervals(
    predictions: pd.DataFrame,
    y_true: pd.DataFrame,
    target_variable: str,
    initial_x_zoom: list=None,
    title: str=None,
    xaxis_title: str=None,
    yaxis_title: str=None,
):
    """
    Plot predicted intervals vs real values

    Parameters
    ----------
    predictions : pandas DataFrame
        Predicted values and intervals.
    y_true : pandas DataFrame
        Real values of target variable.
    target_variable : str
        Name of target variable.
    initial_x_zoom : list, default `None`
        Initial zoom of x-axis, by default None.
    title : str, default `None`
        Title of the plot, by default None.
    xaxis_title : str, default `None`
        Title of x-axis, by default None.
    yaxis_title : str, default `None`
        Title of y-axis, by default None.
    
    """

    fig = go.Figure([
        go.Scatter(name='Prediction', x=predictions.index, y=predictions['pred'], mode='lines'),
        go.Scatter(name='Real value', x=y_true.index, y=y_true[target_variable], mode='lines'),
        go.Scatter(
            name='Upper Bound', x=predictions.index, y=predictions['upper_bound'],
            mode='lines', marker=dict(color="#444"), line=dict(width=0), showlegend=False
        ),
        go.Scatter(
            name='Lower Bound', x=predictions.index, y=predictions['lower_bound'],
            marker=dict(color="#444"), line=dict(width=0), mode='lines',
            fillcolor='rgba(68, 68, 68, 0.3)', fill='tonexty', showlegend=False
        )
    ])
    fig.update_layout(
        title=title,
        xaxis_title=xaxis_title,
        yaxis_title=yaxis_title,
        width=900,
        height=400,
        margin=dict(l=20, r=20, t=35, b=20),
        hovermode="x",
        xaxis=dict(range=initial_x_zoom),
        legend=dict(
            orientation="h",
            yanchor="top",
            y=1.1,
            xanchor="left",
            x=0.001
        )
    )
    fig.show()


def empirical_coverage(y, lower_bound, upper_bound):
    """
    Calculate coverage of a given interval
    """
    return np.mean(np.logical_and(y >= lower_bound, y <= upper_bound))
In [137]:
# Plot intervals (with zoom ['2012-12-01', '2012-12-20'])
# ==============================================================================
plot_predicted_intervals(
    predictions     = predictions,
    y_true          = data_test,
    target_variable = "users",
    initial_x_zoom  = ['2012-12-01', '2012-12-20'],
    title           = "Real value vs predicted in test data",
    xaxis_title     = "Date time",
    yaxis_title     = "users",
)

# Predicted interval coverage (on test data)
# ==============================================================================
coverage = empirical_coverage(
    y = data.loc[end_validation:, 'users'],
    lower_bound = predictions["lower_bound"], 
    upper_bound = predictions["upper_bound"]
)
print(f"Predicted interval coverage: {round(100*coverage, 2)} %")

# Area of the interval
# ==============================================================================
area = (predictions["upper_bound"] - predictions["lower_bound"]).sum()
print(f"Area of the interval: {round(area, 2)}")
Predicted interval coverage: 56.76 %
Area of the interval: 81872.55

The prediction intervals exhibit overconfidence as they tend to be excessively narrow, resulting in a true coverage that falls below the nominal coverage. This phenomenon arises from the tendency of in-sample residuals to often overestimate the predictive capacity of the model.

Out sample residuals (non-conditioned on predicted values)

The set_out_sample_residuals() method is used to specify out-sample residuals computed with a validation set through backtesting. Once the new residuals have been added to the forecaster, set in_sample_residuals to False use them.

In [138]:
# Backtesting on validation data to obtain out-sample residuals
# ==============================================================================
_, predictions_val = backtesting_forecaster(
                         forecaster         = forecaster,
                         y                  = data.loc[:end_validation, 'users'],
                         exog               = data.loc[:end_validation, exog_features],
                         steps              = 24,
                         metric             = 'mean_absolute_error',
                         initial_train_size = len(data.loc[:end_train]),
                         refit              = False,
                         n_jobs             = 'auto',
                         verbose            = False,
                         show_progress      = True
                     )

residuals = data.loc[predictions_val.index, 'users'] - predictions_val['pred']
residuals = residuals.dropna()
In [139]:
# Out-sample residuals distribution
# ==============================================================================
print(pd.Series(np.where(residuals < 0, 'negative', 'positive')).value_counts())
plt.rcParams.update({'font.size': 8})
_ = plot_residuals(residuals=residuals, figsize=(7, 4))
positive    1029
negative     819
Name: count, dtype: int64
In [140]:
# Store out-sample residuals in the forecaster
# ==============================================================================
forecaster.set_out_sample_residuals(residuals=residuals)
In [141]:
# Backtesting with prediction intervals in test data using out-sample residuals
# ==============================================================================
metric, predictions = backtesting_forecaster(
                          forecaster          = forecaster,
                          y                   = data['users'],
                          exog                = data[exog_features],
                          steps               = 24,
                          metric              = 'mean_absolute_error',
                          initial_train_size  = len(data.loc[:end_validation]),
                          refit               = False,
                          interval            = [10, 90], # 80% prediction interval
                          n_boot              = 250,
                          in_sample_residuals = False, # Use out-sample residuals
                          binned_residuals    = False,
                          n_jobs              = 'auto',
                          verbose             = False,
                          show_progress       = True
                      )
predictions.head(5)
Out[141]:
pred lower_bound upper_bound
2012-11-16 00:00:00 75.957987 18.931677 172.339542
2012-11-16 01:00:00 22.895924 -50.801788 148.575646
2012-11-16 02:00:00 11.105227 -48.341104 125.591789
2012-11-16 03:00:00 7.465197 -41.571523 159.773331
2012-11-16 04:00:00 6.216360 -43.391596 169.835398
In [142]:
# Plot intervals (with zoom ['2012-12-01', '2012-12-20'])
# ==============================================================================
plot_predicted_intervals(
    predictions     = predictions,
    y_true          = data_test,
    target_variable = "users",
    initial_x_zoom  = ['2012-12-01', '2012-12-20'],
    title           = "Real value vs predicted in test data",
    xaxis_title     = "Date time",
    yaxis_title     = "users",
)

# Predicted interval coverage (on test data)
# ==============================================================================
coverage = empirical_coverage(
    y = data.loc[end_validation:, 'users'],
    lower_bound = predictions["lower_bound"], 
    upper_bound = predictions["upper_bound"]
)
print(f"Predicted interval coverage: {round(100*coverage, 2)} %")

# Area of the interval
# ==============================================================================
area = (predictions["upper_bound"] - predictions["lower_bound"]).sum()
print(f"Area of the interval: {round(area, 2)}")
Predicted interval coverage: 79.54 %
Area of the interval: 239790.0

The resulting prediction intervals derived from the out-of-sample residuals are wider than those generated using the in-sample residuals. This results in an empirical coverage that is closer to the nominal coverage, although still lower. Examining the plot, it is easy to see that the intervals are particularly wide when the predicted values are low, indicating that the model is not able to properly locate the uncertainty of its predictions.

Out sample residuals (conditioned on predicted values)

The bootstrapping process assumes that the residuals are independently distributed so that they can be used independently of the predicted value. In reality, this is rarely true; in most cases, the magnitude of the residuals is correlated with the magnitude of the predicted value. In this case, for example, one would hardly expect the error to be the same when the predicted number of users is close to zero as when it is in the hundreds.

To account for the dependence between the residuals and the predicted values, skforecast allows to partition the residuals into K bins, where each bin is associated with a range of predicted values. Using this strategy, the bootstrapping process samples the residuals from different bins depending on the predicted value, which can improve the coverage of the interval while adjusting the width if necessary, allowing the model to better distribute the uncertainty of its predictions.

To enable the forecaster to bin the out-sample residuals, the predicted values are passed to the set_out_sample_residuals() method in addition to the residuals. Internally, skforecast uses a sklearn.preprocessing.KBinsDiscretizer with the parameters: n_bins=15, encode=ordinal, strategy=quantile, subsample=10000, random_state=789654, dtype=np.float64. The binning process can be adjusted using the argument binner_kwargs of the Forecaster object.

In [143]:
# Create and train forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor     = LGBMRegressor(random_state=15926, verbose=-1, **best_params),
                 lags          = best_lags,
                 binner_kwargs = {'n_bins': 15}   
             )

forecaster.fit(
    y     = data.loc[:end_validation, 'users'],
    exog  = data.loc[:end_validation, exog_features]
)

During the training process, the forecaster uses the in-sample predictions to define the intervals at which the residuals are stored, depending on the predicted value to which they are related. Although not used in this example, the in-sample residuals are divided into bins and stored in the in_sample_residuals_by_bin attribute.

In [144]:
# Intervals of the residual bins
# ==============================================================================
pprint(forecaster.binner_intervals)
{0: (np.float64(-1.741708189819649), np.float64(10.317040266602518)),
 1: (np.float64(10.317040266602518), np.float64(23.397712684935296)),
 2: (np.float64(23.397712684935296), np.float64(42.81143589390928)),
 3: (np.float64(42.81143589390928), np.float64(93.70601153260277)),
 4: (np.float64(93.70601153260277), np.float64(146.00472540188062)),
 5: (np.float64(146.00472540188062), np.float64(185.98114181351224)),
 6: (np.float64(185.98114181351224), np.float64(227.94516357708295)),
 7: (np.float64(227.94516357708295), np.float64(263.17131420814536)),
 8: (np.float64(263.17131420814536), np.float64(298.7107366143719)),
 9: (np.float64(298.7107366143719), np.float64(339.1431337029709)),
 10: (np.float64(339.1431337029709), np.float64(399.04627830804475)),
 11: (np.float64(399.04627830804475), np.float64(472.88850088361374)),
 12: (np.float64(472.88850088361374), np.float64(553.7082345948482)),
 13: (np.float64(553.7082345948482), np.float64(682.5049823044178)),
 14: (np.float64(682.5049823044178), np.float64(994.5807380527069))}

Next, the out-of-sample residuals are stored in the predictor. Since the predicted values are also provided, they are binned according to the intervals learned during the fit. To avoid using too much memory, a maximum of 200 residuals are stored per bin.

In [145]:
# Store out-sample residuals in the forecaster
# ==============================================================================
forecaster.set_out_sample_residuals(residuals=residuals, y_pred=predictions_val['pred'])
In [146]:
# Number of residuals by bin
# ==============================================================================
for k, v in forecaster.out_sample_residuals_by_bin.items():
    print(f" Bin {k}: n={len(v)}")
 Bin 0: n=148
 Bin 1: n=118
 Bin 2: n=107
 Bin 3: n=117
 Bin 4: n=121
 Bin 5: n=133
 Bin 6: n=116
 Bin 7: n=106
 Bin 8: n=161
 Bin 9: n=138
 Bin 10: n=126
 Bin 11: n=125
 Bin 12: n=134
 Bin 13: n=118
 Bin 14: n=80
In [147]:
# Distribution of the residual by bin
# ==============================================================================
out_sample_residuals_by_bin_df = pd.DataFrame(
    dict([(k, pd.Series(v)) for k,v in forecaster.out_sample_residuals_by_bin.items()])
)
fig, ax = plt.subplots(figsize=(6, 3))
out_sample_residuals_by_bin_df.boxplot(ax=ax)
ax.set_title("Distribution of residuals by bin")
ax.set_xlabel("Bin")
ax.set_ylabel("Residuals");

The box plots show how the spread and magnitude of the residuals differ depending on the predicted value. For example, for bin 0, whose predicted value is in the interval (-8.2, 11.1), the residuals never exceed an absolute value of 100, while for bin 9, for predicted values in the interval (486.2, 970.5), they often do.

Finally, the prediction intervals for the test data are estimated using the backtesting process, with out-of-sample residuals conditioned on the predicted values.

In [148]:
# Backtesting with prediction intervals in test data using out-sample residuals
# ==============================================================================
metric, predictions = backtesting_forecaster(
                          forecaster          = forecaster,
                          y                   = data['users'],
                          exog                = data[exog_features],
                          steps               = 24,
                          metric              = 'mean_absolute_error',
                          initial_train_size  = len(data.loc[:end_validation]),
                          refit               = False,
                          interval            = [10, 90], # 80% prediction interval
                          n_boot              = 250,
                          in_sample_residuals = False, # Use out-sample residuals
                          binned_residuals    = True,  # Use binned residuals
                          n_jobs              = 'auto',
                          verbose             = False,
                          show_progress       = True
                      )
predictions.head(3)
Out[148]:
pred lower_bound upper_bound
2012-11-16 00:00:00 75.957987 42.852151 148.387477
2012-11-16 01:00:00 22.895924 12.497842 84.588488
2012-11-16 02:00:00 11.105227 5.109920 33.501403
In [149]:
# Plot intervals (with zoom ['2012-12-01', '2012-12-20'])
# ==============================================================================
plot_predicted_intervals(
    predictions     = predictions,
    y_true          = data_test,
    target_variable = "users",
    initial_x_zoom  = ['2012-12-01', '2012-12-20'],
    title           = "Real value vs predicted in test data",
    xaxis_title     = "Date time",
    yaxis_title     = "users",
)

# Predicted interval coverage (on test data)
# ==============================================================================
coverage = empirical_coverage(
    y = data.loc[end_validation:, 'users'],
    lower_bound = predictions["lower_bound"], 
    upper_bound = predictions["upper_bound"]
)
print(f"Predicted interval coverage: {round(100*coverage, 2)} %")

# Area of the interval
# ==============================================================================
area = (predictions["upper_bound"] - predictions["lower_bound"]).sum()
print(f"Area of the interval: {round(area, 2)}")
Predicted interval coverage: 78.52 %
Area of the interval: 212122.55

When using out-of-sample residuals conditioned on the predicted value, the area of the interval is significantly reduced and the uncertainty is mainly allocated to the predictions with high values. The empirical coverage is slightly above the expected coverage, which means that the estimated intervals are conservative.

Predict bootstraping, quantile and distribution

The previous sections have demonstrated the use of the backtesting process to estimate the prediction interval over a given period of time. The goal is to mimic the behavior of the model in production by running predictions at regular intervals, incrementally updating the input data.

Alternatively, it is possible to run a single prediction that forecasts N steps ahead without going through the entire backtesting process. In such cases, skforecast provides four different methods: predict_bootstrapping, predict_interval, predict_quantile and predict_distribution.

Predict Bootstraping

The predict_bootstrapping method performs the n_boot bootstrapping iterations that generate the alternative prediction paths. These are the underlying values used to compute the intervals, quantiles, and distributions.

In [150]:
# Fit forecaster
# ==============================================================================
forecaster.fit(
    y     = data.loc[:end_validation, 'users'],
    exog  = data.loc[:end_validation, exog_features]
)
In [151]:
# Predict 10 different forecasting sequences of 7 steps each using bootstrapping
# ==============================================================================
boot_predictions = forecaster.predict_bootstrapping(
                       exog   = data_test[exog_features],
                       steps  = 7,
                       n_boot = 25
                   )
boot_predictions
Out[151]:
pred_boot_0 pred_boot_1 pred_boot_2 pred_boot_3 pred_boot_4 pred_boot_5 pred_boot_6 pred_boot_7 pred_boot_8 pred_boot_9 ... pred_boot_15 pred_boot_16 pred_boot_17 pred_boot_18 pred_boot_19 pred_boot_20 pred_boot_21 pred_boot_22 pred_boot_23 pred_boot_24
2012-11-16 00:00:00 56.446654 78.445434 61.043958 71.459856 9.317830 64.332017 64.050459 76.361132 65.391584 84.685244 ... 87.526800 57.170664 72.480408 95.528115 17.025536 61.520101 66.864733 82.503459 86.648542 30.886569
2012-11-16 01:00:00 44.796911 2.923846 11.778865 29.072737 15.084071 12.776094 18.477441 -13.875616 18.371827 -89.436960 ... 0.908435 49.211016 34.131547 36.231169 56.130703 14.115231 30.541057 18.533885 76.172637 16.193008
2012-11-16 02:00:00 20.416541 4.158834 -23.223879 12.158382 1.075786 7.250440 18.567831 8.250957 14.241982 0.752238 ... -1.085292 -5.054992 5.628579 13.979976 58.768310 25.870401 10.443847 -2.442668 18.724291 11.647707
2012-11-16 03:00:00 -38.207220 45.561446 -4.192694 38.291202 27.971438 -8.517371 -31.841692 -18.697897 35.900958 6.917370 ... 6.926916 14.078069 10.533025 21.515378 -7.615915 9.654914 21.113748 61.384655 -14.405346 0.178372
2012-11-16 04:00:00 -72.194262 14.797662 -24.063171 23.826356 37.685130 14.918799 -34.693072 18.138418 0.042308 -9.205498 ... 13.952389 -26.256247 -26.405837 13.231683 13.755999 11.069402 34.319534 0.044407 15.768496 -53.377396
2012-11-16 05:00:00 24.985400 -4.953175 12.111519 -4.859893 47.206273 38.533189 48.216362 44.090328 -3.100227 43.584801 ... 30.629482 53.532216 47.671314 157.364984 -0.199830 21.288378 99.802875 24.241801 36.099815 42.725157
2012-11-16 06:00:00 151.234534 79.329648 51.657951 76.256826 143.545622 162.381012 189.234897 154.496473 47.031632 168.917666 ... 142.194798 138.115698 151.705814 280.713932 105.594056 153.644263 175.658375 69.230538 190.391641 170.676641

7 rows × 25 columns

A ridge plot is a useful way to visualize the uncertainty of a forecasting model. This plot estimates a kernel density for each step by using the bootstrapped predictions.

In [152]:
# Ridge plot of bootstrapping predictions
# ==============================================================================
_ = plot_prediction_distribution(boot_predictions, figsize=(7, 4))

Predict Interval

In most cases, the user is interested in a specific interval rather than the entire bootstrapping simulation matrix. To address this need, skforecast provides the predict_interval method. This method internally uses predict_bootstrapping to obtain the bootstrapping matrix and estimates the upper and lower quantiles for each step, thus providing the user with the desired prediction intervals.

In [153]:
# Predict intervals for next 7 steps, quantiles 10th and 90th
# ==============================================================================
predictions = forecaster.predict_interval(
                  exog     = data_test[exog_features],
                  steps    = 7,
                  interval = [10, 90],
                  n_boot   = 150
              )
predictions
Out[153]:
pred lower_bound upper_bound
2012-11-16 00:00:00 75.957987 44.263714 97.276697
2012-11-16 01:00:00 22.895924 -5.114410 57.302635
2012-11-16 02:00:00 11.105227 -9.493630 41.790914
2012-11-16 03:00:00 7.465197 -26.936476 43.340331
2012-11-16 04:00:00 6.216360 -24.203510 39.367809
2012-11-16 05:00:00 31.850292 3.253671 62.886696
2012-11-16 06:00:00 145.218401 99.102526 177.571879

Predict Quantile

This method operates identically to predict_interval, with the added feature of enabling users to define a specific list of quantiles for estimation at each step. It's important to remember that these quantiles should be specified within the range of 0 to 1.

In [154]:
# Predict quantiles for next 7 steps, quantiles 5th, 25th, 75th and 95th
# ==============================================================================
predictions = forecaster.predict_quantiles(
                  exog      = data_test[exog_features],
                  steps     = 7,
                  n_boot    = 150,
                  quantiles = [0.05, 0.25, 0.75, 0.95],
              )
predictions
Out[154]:
q_0.05 q_0.25 q_0.75 q_0.95
2012-11-16 00:00:00 26.697026 59.945667 84.290311 103.314988
2012-11-16 01:00:00 -12.617032 15.646876 39.329589 68.914888
2012-11-16 02:00:00 -19.912987 1.411855 24.329655 58.154870
2012-11-16 03:00:00 -41.897336 -1.289290 28.780368 60.202145
2012-11-16 04:00:00 -33.126274 -3.550934 21.619017 48.812445
2012-11-16 05:00:00 -4.013282 21.614429 48.093790 72.628595
2012-11-16 06:00:00 76.470323 115.885365 160.820446 191.414876

Predict Distribution

The intervals estimated so far are distribution-free, which means that no assumptions are made about a particular distribution. The predict_dist method in skforecast allows fitting a parametric distribution to the bootstrapped prediction samples obtained with predict_bootstrapping. This is useful when there is reason to believe that the forecast errors follow a particular distribution, such as the normal distribution or the student's t-distribution. The predict_dist method allows the user to specify any continuous distribution from the scipy.stats module.

In [155]:
# Predict the parameters of a normal distribution for the next 7 steps
# ==============================================================================
predictions = forecaster.predict_dist(
                  exog         = data_test[exog_features],
                  steps        = 7,
                  n_boot       = 150,
                  distribution = norm
              )
predictions
Out[155]:
loc scale
2012-11-16 00:00:00 71.063846 23.131844
2012-11-16 01:00:00 26.095015 27.171060
2012-11-16 02:00:00 14.122650 24.917369
2012-11-16 03:00:00 10.727670 28.392185
2012-11-16 04:00:00 8.927261 27.192044
2012-11-16 05:00:00 35.598823 25.568744
2012-11-16 06:00:00 138.121327 35.306129

Prediction intervals using quantile regression models

As opposed to ordinal linear regression, which is intended to estimate the conditional mean of the response variable given certain values of the predictor variables, quantile regression aims at estimating the conditional quantiles of the response variable. For a continuous distribution function, the $\alpha$-quantile $Q_{\alpha}(x)$ is defined such that the probability of $Y$ being smaller than $Q_{\alpha}(x)$ is, for a given $X=x$, equal to $\alpha$. For example, 36% of the population values are lower than the quantile $Q=0.36$. The most known quantile is the 50%-quantile, more commonly called the median.

By combining the predictions of two quantile regressors, it is possible to build an interval. Each model estimates one of the limits of the interval. For example, the models obtained for $Q = 0.1$ and $Q = 0.9$ produce an 80% prediction interval (90% - 10% = 80%).

Several machine learning algorithms are capable of modeling quantiles. Some of them are:

Just as the squared-error loss function is used to train models that predict the mean value, a specific loss function is needed in order to train models that predict quantiles. The most common metric used for quantile regression is calles quantile loss or pinball loss:

$$\text{pinball}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \alpha \max(y_i - \hat{y}_i, 0) + (1 - \alpha) \max(\hat{y}_i - y_i, 0)$$

where $\alpha$ is the target quantile, $y$ the real value and $\hat{y}$ the quantile prediction.

It can be seen that loss differs depending on the evaluated quantile. The higher the quantile, the more the loss function penalizes underestimates, and the less it penalizes overestimates. As with MSE and MAE, the goal is to minimize its values (the lower loss, the better).

Two disadvantages of quantile regression, compared to the bootstrap approach to prediction intervals, are that each quantile needs its regressor and quantile regression is not available for all types of regression models. However, once the models are trained, the inference is much faster since no iterative process is needed.

This type of prediction intervals can be easily estimated using quantile regressor inside a Forecaster object.

Warning

Forecasters of type ForecasterAutoregDirect are slower than ForecasterAutoregRecursive because they require training one model per step. Although they can achieve better performance, their scalability is an important limitation when many steps need to be predicted. To limit the time required to run the following examples, the data is aggregated from hourly frequency to daily frequency and only 7 steps ahead (one week) are predicted.

Data

In [156]:
# Data download
# ==============================================================================
data = fetch_dataset(name='bike_sharing_extended_features', verbose=False)
In [157]:
# Aggregate data to daily frequency
# ==============================================================================
data = (
    data
    .resample(rule="D", closed="left", label="right")
    .agg({"users": "sum"})
)
In [158]:
# Split train-validation-test
# ==============================================================================
end_train = '2012-05-31 23:59:00'
end_validation = '2012-09-15 23:59:00'
data_train = data.loc[: end_train, :]
data_val   = data.loc[end_train:end_validation, :]
data_test  = data.loc[end_validation:, :]
print(f"Dates train      : {data_train.index.min()} --- {data_train.index.max()}  (n={len(data_train)})")
print(f"Dates validacion : {data_val.index.min()} --- {data_val.index.max()}  (n={len(data_val)})")
print(f"Dates test       : {data_test.index.min()} --- {data_test.index.max()}  (n={len(data_test)})")
Dates train      : 2011-01-09 00:00:00 --- 2012-05-31 00:00:00  (n=509)
Dates validacion : 2012-06-01 00:00:00 --- 2012-09-15 00:00:00  (n=107)
Dates test       : 2012-09-16 00:00:00 --- 2012-12-31 00:00:00  (n=107)

Quantile regresion models

An 80% prediction interval is estimated for 7 steps-ahead predictions using quantile regression. A LightGBM gradient boosting model is trained in this example, however, the reader may use any other model just replacing the definition of the regressor.

In [159]:
# Create forecasters: one for each limit of the interval
# ==============================================================================
# The forecasters obtained for alpha=0.1 and alpha=0.9 produce a 80% confidence
# interval (90% - 10% = 80%).

# Forecaster for quantile 10%
forecaster_q10 = ForecasterAutoregDirect(
                     regressor = LGBMRegressor(
                                     objective    = 'quantile',
                                     metric       = 'quantile',
                                     alpha        = 0.1,
                                     random_state = 15926,
                                     verbose      = -1
                                     
                                 ),
                     lags  = 7,
                     steps = 7
                 )
# Forecaster for quantile 90%
forecaster_q90 = ForecasterAutoregDirect(
                     regressor = LGBMRegressor(
                                     objective    = 'quantile',
                                     metric       = 'quantile',
                                     alpha        = 0.9,
                                     random_state = 15926,
                                     verbose      = -1
                                     
                                 ),
                     lags  = 7,
                     steps = 7
                 )

When validating a quantile regression model, a custom metric must be provided depending on the quantile being estimated.

In [160]:
# Loss function for each quantile (pinball_loss)
# ==============================================================================
def mean_pinball_loss_q10(y_true, y_pred):
    """
    Pinball loss for quantile 10.
    """
    return mean_pinball_loss(y_true, y_pred, alpha=0.1)


def mean_pinball_loss_q90(y_true, y_pred):
    """
    Pinball loss for quantile 90.
    """
    return mean_pinball_loss(y_true, y_pred, alpha=0.9)
In [161]:
# Bayesian search of hyper-parameters and lags for each quantile forecaster
# ==============================================================================
def search_space(trial):
    search_space  = {
        'n_estimators'  : trial.suggest_int('n_estimators', 100, 500, step=50),
        'max_depth'     : trial.suggest_int('max_depth', 3, 10, step=1),
        'learning_rate' : trial.suggest_float('learning_rate', 0.01, 0.1)
    }

    return search_space

results_grid_q10 = bayesian_search_forecaster(
                       forecaster         = forecaster_q10,
                       y                  = data.loc[:end_validation, 'users'],
                       steps              = 7,
                       metric             = mean_pinball_loss_q10,
                       search_space       = search_space,
                       initial_train_size = len(data[:end_train]),
                       refit              = False,
                       n_trials           = 10, # Increase for more exhaustive search
                       random_state       = 123,
                       return_best        = True,
                       n_jobs             = 'auto',
                       verbose            = False,
                       show_progress      = True
                   )

results_grid_q90 = bayesian_search_forecaster(
                       forecaster         = forecaster_q90,
                       y                  = data.loc[:end_validation, 'users'],
                       steps              = 7,
                       metric             = mean_pinball_loss_q90,
                       search_space       = search_space,
                       initial_train_size = len(data[:end_train]),
                       refit              = False,
                       n_trials           = 10, # Increase for more exhaustive search
                       random_state       = 123,
                       return_best        = True,
                       n_jobs             = 'auto',
                       verbose            = False,
                       show_progress      = True
                   )
`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [1 2 3 4 5 6 7] 
  Parameters: {'n_estimators': 250, 'max_depth': 3, 'learning_rate': 0.04582398297973883}
  Backtesting metric: 222.8916557556898

`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [1 2 3 4 5 6 7] 
  Parameters: {'n_estimators': 400, 'max_depth': 4, 'learning_rate': 0.02579065805327433}
  Backtesting metric: 164.5817884709446

Once the best hyper-parameters have been found for each forecaster, a backtesting process is applied again using the test data.

In [162]:
# Backtesting on test data
# ==============================================================================
metric_q10, predictions_q10 = backtesting_forecaster(
                                  forecaster          = forecaster_q10,
                                  y                   = data['users'],
                                  steps               = 7,
                                  metric              = mean_pinball_loss_q10,
                                  initial_train_size  = len(data.loc[:end_validation]),
                                  refit               = False,
                                  n_jobs              = 'auto',
                                  verbose             = False,
                                  show_progress       = True
                              )

metric_q90, predictions_q90 = backtesting_forecaster(
                                  forecaster          = forecaster_q90,
                                  y                   = data['users'],
                                  steps               = 7,
                                  metric              = mean_pinball_loss_q90,
                                  initial_train_size  = len(data.loc[:end_validation]),
                                  refit               = False,
                                  n_jobs              = 'auto',
                                  verbose             = False,
                                  show_progress       = True
                              )
In [163]:
# Plot
# ==============================================================================
fig = go.Figure([
    go.Scatter(name='Real value', x=data_test.index, y=data_test['users'], mode='lines'),
    go.Scatter(
        name='Upper Bound', x=predictions_q90.index, y=predictions_q90['pred'],
        mode='lines', marker=dict(color="#444"), line=dict(width=0), showlegend=False
    ),
    go.Scatter(
        name='Lower Bound', x=predictions_q10.index, y=predictions_q10['pred'],
        marker=dict(color="#444"), line=dict(width=0), mode='lines',
        fillcolor='rgba(68, 68, 68, 0.3)', fill='tonexty', showlegend=False
    )
])
fig.update_layout(
    title="Real value vs predicted in test data",
    xaxis_title="Date time",
    yaxis_title="users",
    width=900,
    height=400,
    margin=dict(l=20, r=20, t=35, b=20),
    hovermode="x",
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1.1,
        xanchor="left",
        x=0.001
    )
)
fig.show()
In [164]:
# Predicted interval coverage (on test data)
# ==============================================================================
coverage = empirical_coverage(
    y = data.loc[end_validation:, 'users'],
    lower_bound = predictions_q10["pred"], 
    upper_bound = predictions_q90["pred"]
)
print(f"Predicted interval coverage: {round(100*coverage, 2)} %")

# Area of the interval
# ==============================================================================
area = (predictions_q90["pred"] - predictions_q10["pred"]).sum()
print(f"Area of the interval: {round(area, 2)}")
Predicted interval coverage: 49.53 %
Area of the interval: 207151.73

In this use case, the quantile forecasting strategy does not achieve empirical coverage close to the expected coverage (80 percent).

Session information

In [165]:
import session_info
session_info.show(html=False)
-----
lightgbm            4.4.0
matplotlib          3.9.1
numpy               2.0.1
optuna              3.6.1
pandas              2.2.2
plotly              5.23.0
scipy               1.14.0
session_info        1.0.0
skforecast          0.13.0
sklearn             1.5.1
statsmodels         0.14.2
-----
IPython             8.26.0
jupyter_client      8.6.2
jupyter_core        5.7.2
notebook            6.4.12
-----
Python 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0]
Linux-5.15.0-1066-aws-x86_64-with-glibc2.31
-----
Session information updated at 2024-08-05 20:08

Bibliography

Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia.

Forecasting: theory and practice PDF

Citation

How to cite this document

If you use this document or any part of it, please acknowledge the source, thank you!

Probabilistic forecasting with machine learning by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) at https://cienciadedatos.net/documentos/py42-probabilistic-forecasting.html

How to cite skforecast

If you use skforecast for a publication, we would appreciate it if you cite the published software.

Zenodo:

Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2024). skforecast (v0.13.0). Zenodo. https://doi.org/10.5281/zenodo.8382788

APA:

Amat Rodrigo, J., & Escobar Ortiz, J. (2024). skforecast (Version 0.13.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788

BibTeX:

@software{skforecast, author = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier}, title = {skforecast}, version = {0.13.0}, month = {8}, year = {2024}, license = {BSD-3-Clause}, url = {https://skforecast.org/}, doi = {10.5281/zenodo.8382788} }


Did you like the article? Your support is important

Website maintenance has high cost, your contribution will help me to continue generating free educational content. Many thanks! 😊


Creative Commons Licence
This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under a Attribution-NonCommercial-ShareAlike 4.0 International.