Forecasting Time Series - Model Zoo

Note

This documentation is intended for advanced users and may not be comprehensive.

For a stable public API, refer to the documentation for TimeSeriesPredictor.

This page contains the list of time series forecasting models available in AutoGluon. The available hyperparameters for each model are listed under Other Parameters.

This list is useful if you want to override the default hyperparameters (Manually configuring models) or define custom hyperparameter search spaces (Hyperparameter tuning), as described in the In-depth Tutorial. For example, the following code will train a TimeSeriesPredictor with DeepAR and ETS models with default hyperparameters (and a weighted ensemble on top of them):

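from autogluon.timeseries import TimeSeriesPredictor

# train_data is assumed to be a TimeSeriesDataFrame prepared beforehand
predictor = TimeSeriesPredictor().fit(
    train_data,
    hyperparameters={
        "DeepAR": {},  # empty dict = default DeepAR hyperparameters
        "ETS": {},     # empty dict = default ETS hyperparameters
    },
)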

The model names in the hyperparameters dictionary don’t have to include the "Model" suffix (e.g., both "DeepAR" and "DeepARModel" correspond to DeepARModel).

Note that some of the models’ hyperparameters have names and default values that are different from the original libraries.
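As a sketch of overriding defaults, the dictionary below sets a few of the hyperparameters documented on this page. The parameter names are taken from the model sections that follow; the values are arbitrary choices made for illustration, not recommendations. The dictionary would be passed as the hyperparameters argument of fit, in the same way as the example above.

```python
# Illustrative overrides only: names come from this page, values are assumptions.
hyperparameters = {
    "DeepAR": {"num_layers": 3, "hidden_size": 64, "max_epochs": 20},
    "ETS": {"model": "ANN"},                  # simple exponential smoothing
    "SeasonalNaive": {"seasonal_period": 7},  # weekly cycle on daily data
}

# Keys may be written with or without the "Model" suffix.
for name, overrides in hyperparameters.items():
    print(f"{name}: {overrides}")
```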

Overview

`NaiveModel`	Baseline model that sets the forecast equal to the last observed value.
`SeasonalNaiveModel`	Baseline model that sets the forecast equal to the last observed value from the same season.
`AverageModel`	Baseline model that sets the forecast equal to the historic average or quantile.
`SeasonalAverageModel`	Baseline model that sets the forecast equal to the historic average or quantile in the same season.
`ZeroModel`	A naive forecaster that always returns 0 forecasts across the prediction horizon, where the prediction intervals are computed using conformal prediction.
`ETSModel`	Exponential smoothing with trend and seasonality.
`AutoARIMAModel`	Automatically tuned ARIMA model.
`AutoETSModel`	Automatically tuned exponential smoothing with trend and seasonality.
`AutoCESModel`	Forecasting with a Complex Exponential Smoothing model where the model selection is performed using the Akaike Information Criterion.
`ThetaModel`	Theta forecasting model [Assimakopoulos2000].
`ADIDAModel`	Intermittent demand forecasting model using the Aggregate-Disaggregate Intermittent Demand Approach [Nikolopoulos2011].
`CrostonClassicModel`	Intermittent demand forecasting model using Croston's model where the smoothing parameter is fixed to 0.1 [Croston1972].
`CrostonOptimizedModel`	Intermittent demand forecasting model using Croston's model where the smoothing parameter is optimized [Croston1972].
`CrostonSBAModel`	Intermittent demand forecasting model using Croston's model with the Syntetos-Boylan bias correction approach [SyntetosBoylan2001].
`IMAPAModel`	Intermittent demand forecasting model using the Intermittent Multiple Aggregation Prediction Algorithm [Petropoulos2015].
`NPTSModel`	Non-Parametric Time Series Forecaster.
`DeepARModel`	Autoregressive forecasting model based on a recurrent neural network [Salinas2020].
`DLinearModel`	Simple feedforward neural network that subtracts trend before forecasting [Zeng2023].
`PatchTSTModel`	Transformer-based forecaster that segments each time series into patches [Nie2023].
`SimpleFeedForwardModel`	Simple feedforward neural network that simultaneously predicts all future values.
`TemporalFusionTransformerModel`	Combines LSTM with a transformer layer to predict the quantiles of all future target values [Lim2021].
`TiDEModel`	Time series dense encoder model from [Das2023].
`WaveNetModel`	WaveNet estimator that uses the architecture proposed in [Oord2016] with quantized targets.
`DirectTabularModel`	Predict all future time series values simultaneously using TabularPredictor from AutoGluon-Tabular.
`RecursiveTabularModel`	Predict future time series values one by one using TabularPredictor from AutoGluon-Tabular.
`ChronosModel`	Chronos pretrained time series forecasting models, based on the original ChronosModel implementation.

Baseline models

Baseline models are simple approaches that use minimal historical data to make predictions. They serve as benchmarks for evaluating more complex methods.

Baseline model that sets the forecast equal to the last observed value.

Quantiles are obtained by assuming that the residuals follow a zero-mean normal distribution whose scale is estimated from the empirical distribution of the residuals, as described in https://otexts.com/fpp3/prediction-intervals.html
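The following is a minimal sketch of that idea, not AutoGluon's implementation. It assumes the textbook result from the linked chapter that the h-step standard deviation of the naive method is the one-step residual scale times sqrt(h); the function name and the output layout are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def naive_forecast(series, prediction_length, quantile_levels=(0.1, 0.5, 0.9)):
    """Naive point forecast plus normal-theory quantiles (illustrative sketch)."""
    last_value = series[-1]
    # One-step residuals of the naive method: y[t] - y[t-1].
    residuals = [b - a for a, b in zip(series, series[1:])]
    # Zero-mean assumption: the scale is the root mean squared residual.
    sigma = sqrt(sum(r * r for r in residuals) / len(residuals))
    normal = NormalDist()
    forecast = []
    for h in range(1, prediction_length + 1):
        scale_h = sigma * sqrt(h)  # uncertainty grows with the horizon
        row = {"mean": last_value}
        for q in quantile_levels:
            row[q] = last_value + scale_h * normal.inv_cdf(q)
        forecast.append(row)
    return forecast


forecast = naive_forecast([10.0, 12.0, 11.0, 13.0], prediction_length=2)
```

Each entry of forecast holds the point forecast and the requested quantiles for one step of the horizon; the interval widens as the horizon grows.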

Parameters:

n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
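The n_jobs convention, which recurs for most models on this page, can be read as the small function below. This is a sketch of the documented rules only; the function name is hypothetical, and the rounding of fractional values is an assumption.

```python
import os


def resolve_n_jobs(n_jobs, cpu_count=None):
    """Translate an n_jobs setting into a concrete number of CPU cores."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if n_jobs == -1:
        return cpu_count  # use all cores
    if isinstance(n_jobs, float) and 0.0 < n_jobs <= 1.0:
        # Rounding down (but never below 1) is an assumption of this sketch.
        return max(1, int(cpu_count * n_jobs))
    if isinstance(n_jobs, int) and n_jobs > 0:
        return n_jobs  # explicit core count
    raise ValueError(f"Unsupported n_jobs value: {n_jobs!r}")


print(resolve_n_jobs(0.5, cpu_count=8))
```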

class autogluon.timeseries.models.SeasonalNaiveModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)

Baseline model that sets the forecast equal to the last observed value from the same season.

Parameters:

seasonal_period (int or None, default = None) – Number of time steps in a complete seasonal cycle for seasonal models. For example, 7 for daily data with a weekly cycle or 12 for monthly data with an annual cycle. When set to None, seasonal_period will be inferred from the frequency of the training data. Can also be specified manually by providing an integer > 1. If seasonal_period (inferred or provided) is equal to 1, the model falls back to the Naive forecast. Seasonality is also disabled if the length of the time series is < seasonal_period.
n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
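A minimal sketch of the seasonal naive rule, including the two fallback cases described for seasonal_period, might look as follows. The function name is hypothetical and quantile forecasts are omitted.

```python
def seasonal_naive_forecast(series, prediction_length, seasonal_period):
    """Repeat the last observed season across the forecast horizon."""
    # Fall back to the plain naive forecast when seasonality cannot be used.
    if seasonal_period <= 1 or len(series) < seasonal_period:
        return [series[-1]] * prediction_length
    last_season = series[-seasonal_period:]
    return [last_season[h % seasonal_period] for h in range(prediction_length)]


print(seasonal_naive_forecast([1, 2, 3, 4, 5, 6], prediction_length=5, seasonal_period=3))
```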

class autogluon.timeseries.models.AverageModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)

Baseline model that sets the forecast equal to the historic average or quantile.

Parameters:

n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (Optional[int], default = None) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.
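The effect of max_ts_length can be illustrated with the average forecast itself: when it is set, only the tail of each series contributes. This is a sketch with a hypothetical function name, covering the mean forecast only.

```python
from statistics import mean


def average_forecast(series, prediction_length, max_ts_length=None):
    """Forecast the historic mean, optionally using only the most recent values."""
    if max_ts_length is not None:
        series = series[-max_ts_length:]  # keep the last max_ts_length steps
    return [mean(series)] * prediction_length


print(average_forecast([1, 2, 3, 4], prediction_length=2))
print(average_forecast([1, 2, 3, 4], prediction_length=1, max_ts_length=2))
```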

class autogluon.timeseries.models.SeasonalAverageModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)

Baseline model that sets the forecast equal to the historic average or quantile in the same season.

Parameters:

seasonal_period (int or None, default = None) – Number of time steps in a complete seasonal cycle for seasonal models. For example, 7 for daily data with a weekly cycle or 12 for monthly data with an annual cycle. When set to None, seasonal_period will be inferred from the frequency of the training data. Can also be specified manually by providing an integer > 1. If seasonal_period (inferred or provided) is equal to 1, will fall back to Naive forecast. Seasonality will also be disabled, if the length of the time series is < seasonal_period.
n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (Optional[int], default = None) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.

A naive forecaster that always returns 0 forecasts across the prediction horizon, where the prediction intervals are computed using conformal prediction.

Parameters:

n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (int, default = 2500) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.

Statistical models

Statistical models capture simple patterns in the data like trends and seasonality.

Exponential smoothing with trend and seasonality.

The E (error), T (trend) and S (seasonal) components are fixed and provided by the user.

This is an alias for statsforecast.models.AutoETS.

Parameters:

model (str, default = "AAA") – Model string describing the configuration of the E (error), T (trend) and S (seasonal) model components. Each component can be one of "M" (multiplicative), "A" (additive), "N" (omitted). For example, when model="ANN" (additive error, no trend, and no seasonality), ETS will fit only simple exponential smoothing.
seasonal_period (int or None, default = None) – Number of time steps in a complete seasonal cycle for seasonal models. For example, 7 for daily data with a weekly cycle or 12 for monthly data with an annual cycle. When set to None, seasonal_period will be inferred from the frequency of the training data. Can also be specified manually by providing an integer > 1. If seasonal_period (inferred or provided) is equal to 1, seasonality will be disabled.
damped (bool, default = False) – Whether to dampen the trend.
n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (int, default = 2500) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.
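To make the model="ANN" case concrete, the sketch below implements simple exponential smoothing with a fixed smoothing parameter. It is illustrative only: the fitted model estimates the smoothing parameter and initial level from the data, whereas here alpha is supplied by the caller and the level starts at the first observation.

```python
def simple_exponential_smoothing(series, alpha, prediction_length):
    """ETS(A,N,N) point forecast with a fixed smoothing parameter."""
    level = series[0]  # assumption: initialise the level with the first value
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    # With no trend and no seasonality, the forecast is flat at the last level.
    return [level] * prediction_length


print(simple_exponential_smoothing([10.0, 20.0, 30.0], alpha=0.5, prediction_length=2))
```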

class autogluon.timeseries.models.AutoARIMAModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)

Automatically tuned ARIMA model.

Automatically selects the best (p,d,q,P,D,Q) model parameters using an information criterion.

Based on statsforecast.models.AutoARIMA.

Parameters:

d (int, optional) – Order of first differencing. If None, will be determined automatically using a statistical test.
D (int, optional) – Order of seasonal differencing. If None, will be determined automatically using a statistical test.
max_p (int, default = 5) – Maximum number of autoregressive terms.
max_q (int, default = 5) – Maximum order of moving average.
max_P (int, default = 2) – Maximum number of seasonal autoregressive terms.
max_Q (int, default = 2) – Maximum order of seasonal moving average.
max_d (int, default = 2) – Maximum order of first differencing.
max_D (int, default = 1) – Maximum order of seasonal differencing.
start_p (int, default = 2) – Starting value of p in stepwise procedure.
start_q (int, default = 2) – Starting value of q in stepwise procedure.
start_P (int, default = 1) – Starting value of P in stepwise procedure.
start_Q (int, default = 1) – Starting value of Q in stepwise procedure.
stationary (bool, default = False) – Restrict search to stationary models.
seasonal (bool, default = True) – Whether to consider seasonal models.
approximation (bool, default = True) – Approximate optimization for faster convergence.
allowdrift (bool, default = False) – If True, drift term is allowed.
allowmean (bool, default = True) – If True, non-zero mean is allowed.
seasonal_period (int or None, default = None) – Number of time steps in a complete seasonal cycle for seasonal models. For example, 7 for daily data with a weekly cycle or 12 for monthly data with an annual cycle. When set to None, seasonal_period will be inferred from the frequency of the training data. Can also be specified manually by providing an integer > 1. If seasonal_period (inferred or provided) is equal to 1, seasonality will be disabled.
n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (int, default = 2500) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.
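The d and D parameters count how many times the series is differenced before the ARMA part is fitted: d uses a lag of 1, D uses a lag of seasonal_period. A small sketch of the operation itself (the helper name is hypothetical):

```python
def difference(series, order=1, lag=1):
    """Apply lag-differencing `order` times: y[t] - y[t - lag]."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[lag:])]
    return series


print(difference([1, 4, 9, 16], order=1))                # first differencing, d=1
print(difference([1, 4, 9, 16], order=2))                # d=2
print(difference([1, 2, 3, 11, 12, 13], order=1, lag=3)) # seasonal, D=1, period 3
```

Each round of differencing shortens the series by lag observations, which is one reason the search bounds max_d and max_D are kept small.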

class autogluon.timeseries.models.AutoETSModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)

Automatically tuned exponential smoothing with trend and seasonality.

Automatically selects the best ETS (Error, Trend, Seasonality) model using an information criterion.

Based on statsforecast.models.AutoETS.

Parameters:

model (str, default = "ZZZ") – Model string describing the configuration of the E (error), T (trend) and S (seasonal) model components. Each component can be one of "M" (multiplicative), "A" (additive), "N" (omitted), or "Z" (selected automatically). For example, when model="ANN" (additive error, no trend, and no seasonality), ETS will fit only simple exponential smoothing.
seasonal_period (int or None, default = None) – Number of time steps in a complete seasonal cycle for seasonal models. For example, 7 for daily data with a weekly cycle or 12 for monthly data with an annual cycle. When set to None, seasonal_period will be inferred from the frequency of the training data. Can also be specified manually by providing an integer > 1. If seasonal_period (inferred or provided) is equal to 1, seasonality will be disabled.
damped (bool, default = False) – Whether to dampen the trend.
n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (int, default = 2500) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.

class autogluon.timeseries.models.AutoCESModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)

Forecasting with a Complex Exponential Smoothing model where the model selection is performed using the Akaike Information Criterion.

Based on statsforecast.models.AutoCES.

References

[Svetunkov2022]

Svetunkov, Ivan, Nikolaos Kourentzes, and John Keith Ord. “Complex exponential smoothing.” Naval Research Logistics (NRL) 69.8 (2022): 1108-1123.

Parameters:

model ({"Z", "N", "S", "P", "F"}, default = "Z") – Defines type of CES model, “N” for simple CES, “S” for simple seasonality, “P” for partial seasonality (without complex part), “F” for full seasonality. When “Z” is selected, the best model is selected using Akaike Information Criterion (AIC).
seasonal_period (int or None, default = None) – Number of time steps in a complete seasonal cycle for seasonal models. For example, 7 for daily data with a weekly cycle or 12 for monthly data with an annual cycle. When set to None, seasonal_period will be inferred from the frequency of the training data. Can also be specified manually by providing an integer > 1. If seasonal_period (inferred or provided) is equal to 1, seasonality will be disabled.
n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (int, default = 2500) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.
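Selection by AIC, as used when model="Z", amounts to computing AIC = 2k - 2 log L for each candidate and keeping the smallest. The sketch below shows only that final step; the log-likelihoods and parameter counts are made-up numbers for illustration, not output of any fitted model.

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical (log-likelihood, number of parameters) pairs per CES variant.
candidates = {
    "N": (-130.2, 2),
    "S": (-118.5, 4),
    "P": (-117.9, 5),
    "F": (-117.6, 6),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores[best])
```

In this made-up example the richer variants fit slightly better but not by enough to offset their extra parameters, so the simpler seasonal variant wins.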

Theta forecasting model [Assimakopoulos2000].

Based on statsforecast.models.Theta.

References

[Assimakopoulos2000]

Assimakopoulos, Vassilis, and Konstantinos Nikolopoulos. “The theta model: a decomposition approach to forecasting.” International journal of forecasting 16.4 (2000): 521-530.

Parameters:

decomposition_type ({"multiplicative", "additive"}, default = "multiplicative") – Seasonal decomposition type.
seasonal_period (int or None, default = None) – Number of time steps in a complete seasonal cycle for seasonal models. For example, 7 for daily data with a weekly cycle or 12 for monthly data with an annual cycle. When set to None, seasonal_period will be inferred from the frequency of the training data. Can also be specified manually by providing an integer > 1. If seasonal_period (inferred or provided) is equal to 1, seasonality will be disabled.
n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (int, default = 2500) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.

Non-Parametric Time Series Forecaster.

This model is especially well suited for forecasting sparse or intermittent time series with many zero values.

Based on gluonts.model.npts.NPTSPredictor. See GluonTS documentation for more information about the model.

Parameters:

kernel_type ({"exponential", "uniform"}, default = "exponential") – Kernel used by the model.
exp_kernel_weights (float, default = 1.0) – Scaling factor used in the exponential kernel.
use_seasonal_model (bool, default = True) – Whether to use the seasonal variant of the model.
num_samples (int, default = 100) – Number of samples generated by the forecast.
num_default_time_features (int, default = 1) – Number of time features used by seasonal model.
n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (Optional[int], default = 2500) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.
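The general idea behind a non-parametric forecaster of this kind is to draw forecast samples from past observations, with more weight on recent ones under the exponential kernel and equal weight under the uniform kernel. The sketch below is a simplified one-step illustration under that assumption. It is not the GluonTS implementation, and the exact weighting formula and function name are hypothetical.

```python
import math
import random


def sample_next_value(series, num_samples=100, kernel_type="exponential",
                      exp_kernel_weights=1.0, seed=0):
    """Draw forecast samples for the next step from past observations."""
    rng = random.Random(seed)
    n = len(series)
    if kernel_type == "exponential":
        # Assumed form: weight decays exponentially with distance from step n.
        weights = [math.exp(-exp_kernel_weights * (n - i)) for i in range(n)]
    elif kernel_type == "uniform":
        weights = [1.0] * n
    else:
        raise ValueError(f"Unknown kernel_type: {kernel_type!r}")
    return rng.choices(series, weights=weights, k=num_samples)


samples = sample_next_value([0, 0, 5, 0, 0, 2, 0], num_samples=100)
```

Because every sample is an observed value, a series with many zeros produces forecasts with many zeros, which is why this approach suits intermittent data.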

Statistical models for sparse data

Statistical models that are built specifically for sparse and nonnegative data, especially for use in intermittent demand forecasting.

Intermittent demand forecasting model using the Aggregate-Disaggregate Intermittent Demand Approach [Nikolopoulos2011].

Based on statsforecast.models.ADIDA.

References

[Nikolopoulos2011]

Nikolopoulos, K., Syntetos, A., Boylan, J. et al. An aggregate–disaggregate intermittent demand approach (ADIDA) to forecasting: an empirical proposition and analysis. J Oper Res Soc 62, 544–554 (2011). https://doi.org/10.1057/jors.2010.32

Parameters:

n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (int, default = 2500) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.
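A rough sketch of the aggregate-disaggregate idea: sum the series into non-overlapping buckets whose size is the mean inter-demand interval, smooth the bucket totals, then spread the result evenly over the bucket. The version below uses simple exponential smoothing with a fixed alpha, which is an assumption made to keep the example short; the underlying library may optimize the smoothing parameter. The function name is hypothetical.

```python
def adida_forecast(series, alpha=0.1):
    """Per-period demand forecast via aggregation and disaggregation."""
    nonzero_idx = [i for i, v in enumerate(series) if v != 0]
    if not nonzero_idx:
        return 0.0
    # Inter-demand intervals; the first is measured from the series start.
    intervals = [nonzero_idx[0] + 1] + [
        b - a for a, b in zip(nonzero_idx, nonzero_idx[1:])
    ]
    level = max(1, round(sum(intervals) / len(intervals)))
    # Drop leading values so the series splits into whole buckets.
    trimmed = series[len(series) % level:]
    bucket_sums = [sum(trimmed[i:i + level]) for i in range(0, len(trimmed), level)]
    smoothed = float(bucket_sums[0])
    for total in bucket_sums[1:]:
        smoothed = alpha * total + (1 - alpha) * smoothed
    return smoothed / level  # disaggregate evenly across the bucket


print(adida_forecast([0, 3, 0, 3, 0, 3]))
```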

class autogluon.timeseries.models.CrostonClassicModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)

Intermittent demand forecasting model using Croston’s model where the smoothing parameter is fixed to 0.1 [Croston1972].

Based on statsforecast.models.CrostonClassic.

References

[Croston1972]

Croston, John D. “Forecasting and stock control for intermittent demands.” Journal of the Operational Research Society 23.3 (1972): 289-303.

Parameters:

n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (int, default = 2500) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.
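Croston's method smooths the non-zero demand sizes and the intervals between them separately, then forecasts their ratio. The sketch below follows that textbook description with the smoothing parameter fixed to 0.1; initialisation details are an assumption and may differ from the underlying library. The function name is hypothetical.

```python
def croston_classic(series, alpha=0.1):
    """Per-period demand rate: smoothed demand size / smoothed interval."""
    demand = None
    interval = None
    periods_since_last = 0
    for value in series:
        periods_since_last += 1
        if value != 0:
            if demand is None:
                # Assumption: initialise with the first observed size and interval.
                demand, interval = float(value), float(periods_since_last)
            else:
                demand = alpha * value + (1 - alpha) * demand
                interval = alpha * periods_since_last + (1 - alpha) * interval
            periods_since_last = 0
    if demand is None:
        return 0.0  # no demand observed at all
    return demand / interval


print(croston_classic([0, 4, 0, 4]))
```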

class autogluon.timeseries.models.CrostonOptimizedModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)

Intermittent demand forecasting model using Croston’s model where the smoothing parameter is optimized [Croston1972].

Based on statsforecast.models.CrostonOptimized.

References

[Croston1972]

Croston, John D. “Forecasting and stock control for intermittent demands.” Journal of the Operational Research Society 23.3 (1972): 289-303.

Parameters:

n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (int, default = 2500) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.

class autogluon.timeseries.models.CrostonSBAModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)

Intermittent demand forecasting model using Croston’s model with the Syntetos-Boylan bias correction approach [SyntetosBoylan2001].

Based on statsforecast.models.CrostonSBA.

References

[SyntetosBoylan2001]

Syntetos, Aris A., and John E. Boylan. “On the bias of intermittent demand estimates.” International journal of production economics 71.1-3 (2001): 457-466.

Parameters:

n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (int, default = 2500) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.
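The Syntetos-Boylan correction addresses the upward bias of Croston's ratio estimator by multiplying it by (1 - alpha / 2), where alpha is the smoothing parameter. A one-line sketch, assuming alpha = 0.1 as in the classic variant (the function name is hypothetical):

```python
def sba_correct(croston_forecast, alpha=0.1):
    """Apply the Syntetos-Boylan bias correction to a Croston forecast."""
    return (1 - alpha / 2) * croston_forecast


print(sba_correct(2.0))
```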

Intermittent demand forecasting model using the Intermittent Multiple Aggregation Prediction Algorithm [Petropoulos2015].

Based on statsforecast.models.IMAPA.

References

[Petropoulos2015]

Petropoulos, Fotios, and Nikolaos Kourentzes. “Forecast combinations for intermittent demand.” Journal of the Operational Research Society 66.6 (2015): 914-924.

Parameters:

n_jobs (int or float, default = 0.5) – Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
max_ts_length (int, default = 2500) – If not None, only the last max_ts_length time steps of each time series will be used to train the model. This significantly speeds up fitting and usually leads to no change in accuracy.

Deep learning models

Deep learning models use neural networks to capture complex patterns in the data.

Autoregressive forecasting model based on a recurrent neural network [Salinas2020].

Based on gluonts.torch.model.deepar.DeepAREstimator. See GluonTS documentation for additional hyperparameters.

References

[Salinas2020]

Salinas, David, et al. “DeepAR: Probabilistic forecasting with autoregressive recurrent networks.” International Journal of Forecasting. 2020.

Parameters:

context_length (int, default = max(10, 2 * prediction_length)) – Number of steps to unroll the RNN for before computing predictions
disable_static_features (bool, default = False) – If True, static features won’t be used by the model even if they are present in the dataset. If False, static features will be used by the model if they are present in the dataset.
disable_known_covariates (bool, default = False) – If True, known covariates won’t be used by the model even if they are present in the dataset. If False, known covariates will be used by the model if they are present in the dataset.
num_layers (int, default = 2) – Number of RNN layers
hidden_size (int, default = 40) – Number of RNN cells for each layer
dropout_rate (float, default = 0.1) – Dropout regularization parameter
embedding_dimension (int, optional) – Dimension of the embeddings for categorical features (if None, defaults to [min(50, (cat+1)//2) for cat in cardinality])
max_cat_cardinality (int, default = 100) – Maximum number of dimensions to use when one-hot-encoding categorical known_covariates.
distr_output (gluonts.torch.distributions.DistributionOutput, default = StudentTOutput()) – Distribution to use to evaluate observations and sample predictions
scaling (bool, default = True) – Whether to automatically scale the target values
max_epochs (int, default = 100) – Number of epochs the model will be trained for
batch_size (int, default = 64) – Size of batches used during training
predict_batch_size (int, default = 500) – Size of batches used during prediction.
num_batches_per_epoch (int, default = 50) – Number of batches processed every epoch
lr (float, default = 1e-3) – Learning rate used during training
trainer_kwargs (dict, optional) – Optional keyword arguments passed to lightning.Trainer.
early_stopping_patience (int or None, default = 20) – Early stop training if the validation loss doesn’t improve for this many epochs.
keep_lightning_logs (bool, default = False) – If True, lightning_logs directory will NOT be removed after the model finished training.
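Two of the defaults above are formulas rather than constants. Written out as plain Python (the helper names are hypothetical; the formulas are the ones stated in the parameter list):

```python
def default_context_length(prediction_length):
    """Default context_length for DeepAR as documented above."""
    return max(10, 2 * prediction_length)


def default_embedding_dimension(cardinality):
    """Default embedding size per categorical feature, given its cardinality."""
    return [min(50, (cat + 1) // 2) for cat in cardinality]


print(default_context_length(24))
print(default_embedding_dimension([3, 10, 500]))
```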

class autogluon.timeseries.models.DLinearModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)

Simple feedforward neural network that subtracts trend before forecasting [Zeng2023].

Based on gluonts.torch.model.d_linear.DLinearEstimator. See GluonTS documentation for additional hyperparameters.

References

[Zeng2023]

Zeng, Ailing, et al. “Are transformers effective for time series forecasting?” AAAI Conference on Artificial Intelligence. 2023.

Parameters:

context_length (int, default = 96) – Number of time units that condition the predictions
hidden_dimension (int, default = 20) – Size of hidden layers in the feedforward network
distr_output (gluonts.torch.distributions.DistributionOutput, default = StudentTOutput()) – Distribution to fit.
scaling ({"mean", "std", None}, default = "mean") – Scaling applied to the inputs. One of "mean" (mean absolute scaling), "std" (standardization), None (no scaling).
max_epochs (int, default = 100) – Number of epochs the model will be trained for
batch_size (int, default = 64) – Size of batches used during training
predict_batch_size (int, default = 500) – Size of batches used during prediction.
num_batches_per_epoch (int, default = 50) – Number of batches processed every epoch
lr (float, default = 1e-3) – Learning rate used during training
trainer_kwargs (dict, optional) – Optional keyword arguments passed to lightning.Trainer.
early_stopping_patience (int or None, default = 20) – Early stop training if the validation loss doesn’t improve for this many epochs.
weight_decay (float, default = 1e-8) – Weight decay regularization parameter.
keep_lightning_logs (bool, default = False) – If True, lightning_logs directory will NOT be removed after the model finished training.
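"Subtracts trend before forecasting" refers to a decomposition step: a moving average of the context window is taken as the trend, and the remainder is what is left after subtracting it. The sketch below shows only that step, with edge values repeated as padding so the trend has the same length as the input. The kernel size of 3 is an arbitrary choice for this example, and the function names are hypothetical.

```python
def moving_average_trend(series, kernel_size):
    """Moving-average trend with edge padding, same length as the input."""
    front = (kernel_size - 1) // 2
    back = kernel_size - 1 - front
    padded = [series[0]] * front + list(series) + [series[-1]] * back
    return [sum(padded[i:i + kernel_size]) / kernel_size for i in range(len(series))]


def decompose(series, kernel_size=3):
    """Split a series into a smooth trend and a remainder."""
    trend = moving_average_trend(series, kernel_size)
    remainder = [x - t for x, t in zip(series, trend)]
    return trend, remainder


trend, remainder = decompose([1.0, 2.0, 3.0, 4.0, 5.0])
```

The model then forecasts from the two components rather than from the raw window.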

class autogluon.timeseries.models.PatchTSTModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)

Transformer-based forecaster that segments each time series into patches [Nie2023].

Based on gluonts.torch.model.patch_tst.PatchTSTEstimator. See GluonTS documentation for additional hyperparameters.

References

[Nie2023]

Nie, Yuqi, et al. “A Time Series is Worth 64 Words: Long-term Forecasting with Transformers.” International Conference on Learning Representations. 2023.

Parameters:

context_length (int, default = 96) – Number of time units that condition the predictions
patch_len (int, default = 16) – Length of the patch.
stride (int, default = 8) – Stride of the patch.
d_model (int, default = 32) – Size of hidden layers in the Transformer encoder.
nhead (int, default = 4) – Number of attention heads in the Transformer encoder which must divide d_model.
num_encoder_layers (int, default = 2) – Number of layers in the Transformer encoder.
distr_output (gluonts.torch.distributions.DistributionOutput, default = StudentTOutput()) – Distribution to fit.
scaling ({"mean", "std", None}, default = "mean") – Scaling applied to the inputs. One of "mean" (mean absolute scaling), "std" (standardization), None (no scaling).
max_epochs (int, default = 100) – Number of epochs the model will be trained for
batch_size (int, default = 64) – Size of batches used during training
num_batches_per_epoch (int, default = 50) – Number of batches processed every epoch
lr (float, default = 1e-3) – Learning rate used during training
weight_decay (float, default = 1e-8) – Weight decay regularization parameter.
keep_lightning_logs (bool, default = False) – If True, lightning_logs directory will NOT be removed after the model finished training.
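To see how context_length, patch_len and stride interact, the sketch below cuts a context window into overlapping patches with a plain sliding window. It assumes no padding; an implementation that pads the end of the window would produce an extra patch. The function name is hypothetical.

```python
def make_patches(series, patch_len, stride):
    """Split a window into overlapping patches of length patch_len."""
    last_start = len(series) - patch_len
    return [series[start:start + patch_len] for start in range(0, last_start + 1, stride)]


context = list(range(96))  # stand-in for a context window with the default length
patches = make_patches(context, patch_len=16, stride=8)
print(len(patches))

# nhead must divide d_model; with the defaults each head gets 32 // 4 dimensions.
d_model, nhead = 32, 4
assert d_model % nhead == 0
```

With the default values, the Transformer encoder therefore attends over a short sequence of patches instead of 96 individual time steps.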

class autogluon.timeseries.models.SimpleFeedForwardModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)

Simple feedforward neural network that simultaneously predicts all future values.

Based on gluonts.torch.model.simple_feedforward.SimpleFeedForwardEstimator. See GluonTS documentation for additional hyperparameters.

Parameters:

context_length (int, default = max(10, 2 * prediction_length)) – Number of time units that condition the predictions
hidden_dimensions (List[int], default = [20, 20]) – Size of hidden layers in the feedforward network
distr_output (gluonts.torch.distributions.DistributionOutput, default = StudentTOutput()) – Distribution to fit.
batch_normalization (bool, default = False) – Whether to use batch normalization
mean_scaling (bool, default = True) – Scale the network input by the data mean and the network output by its inverse
max_epochs (int, default = 100) – Number of epochs the model will be trained for
batch_size (int, default = 64) – Size of batches used during training
predict_batch_size (int, default = 500) – Size of batches used during prediction.
num_batches_per_epoch (int, default = 50) – Number of batches processed every epoch
lr (float, default = 1e-3) – Learning rate used during training
trainer_kwargs (dict, optional) – Optional keyword arguments passed to lightning.Trainer.
early_stopping_patience (int or None, default = 20) – Early stop training if the validation loss doesn’t improve for this many epochs.
keep_lightning_logs (bool, default = False) – If True, lightning_logs directory will NOT be removed after the model finished training.
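The effect of early_stopping_patience can be sketched in plain Python. This is only an illustration of patience-based stopping, not the callback the model uses internally; the function name epochs_until_stop and the treatment of None as "never stop early" are assumptions.

```python
def epochs_until_stop(val_losses, patience=20):
    """Return how many epochs run before patience-based early stopping triggers."""
    if patience is None:
        return len(val_losses)  # assumption: None disables early stopping
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# The loss improves for three epochs and then plateaus.
losses = [1.0, 0.8, 0.7] + [0.75] * 30
print(epochs_until_stop(losses, patience=20))
```

With a long plateau, training stops well before max_epochs is reached. A smaller patience shortens training further at the risk of stopping during a temporary plateau.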

class autogluon.timeseries.models.TemporalFusionTransformerModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)[source]¶

Combines LSTM with a transformer layer to predict the quantiles of all future target values [Lim2021].

Based on gluonts.torch.model.tft.TemporalFusionTransformerEstimator. See GluonTS documentation for additional hyperparameters.

References

[Lim2021]

Lim, Bryan, et al. “Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting.” International Journal of Forecasting. 2021.

Parameters:

context_length (int, default = max(64, 2 * prediction_length)) – Number of past values used for prediction.
disable_static_features (bool, default = False) – If True, static features won’t be used by the model even if they are present in the dataset. If False, static features will be used by the model if they are present in the dataset.
disable_known_covariates (bool, default = False) – If True, known covariates won’t be used by the model even if they are present in the dataset. If False, known covariates will be used by the model if they are present in the dataset.
disable_past_covariates (bool, default = False) – If True, past covariates won’t be used by the model even if they are present in the dataset. If False, past covariates will be used by the model if they are present in the dataset.
hidden_dim (int, default = 32) – Size of the LSTM & transformer hidden states.
variable_dim (int, default = 32) – Size of the feature embeddings.
num_heads (int, default = 4) – Number of attention heads in self-attention layer in the decoder.
dropout_rate (float, default = 0.1) – Dropout regularization parameter
max_epochs (int, default = 100) – Number of epochs the model will be trained for
batch_size (int, default = 64) – Size of batches used during training
predict_batch_size (int, default = 500) – Size of batches used during prediction.
num_batches_per_epoch (int, default = 50) – Number of batches processed every epoch
lr (float, default = 1e-3) – Learning rate used during training
trainer_kwargs (dict, optional) – Optional keyword arguments passed to lightning.Trainer.
early_stopping_patience (int or None, default = 20) – Early stop training if the validation loss doesn’t improve for this many epochs.
keep_lightning_logs (bool, default = False) – If True, lightning_logs directory will NOT be removed after the model finished training.
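As a hedged example, the parameters above could be overridden through the hyperparameters dictionary described at the top of this page. The specific values are arbitrary illustrations, the helper default_tft_context_length is hypothetical and only restates the documented default, and the fit call is left as a comment because it assumes an existing train_data object.

```python
def default_tft_context_length(prediction_length):
    # Restates the documented default: max(64, 2 * prediction_length).
    return max(64, 2 * prediction_length)


prediction_length = 48  # illustrative forecast horizon

hyperparameters = {
    "TemporalFusionTransformer": {
        "context_length": default_tft_context_length(prediction_length),
        "disable_past_covariates": True,  # ignore past covariates even if present
        "hidden_dim": 64,                 # larger than the default of 32
        "dropout_rate": 0.2,
    },
}

# Assumes `train_data` is an existing TimeSeriesDataFrame:
# predictor = TimeSeriesPredictor(prediction_length=prediction_length).fit(
#     train_data, hyperparameters=hyperparameters
# )
print(hyperparameters["TemporalFusionTransformer"]["context_length"])
```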

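class autogluon.timeseries.models.TiDEModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)[source]¶

Time series dense encoder model from [Das2023].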

Based on gluonts.torch.model.tide.TiDEEstimator. See GluonTS documentation for additional hyperparameters.

References

[Das2023]

Das, Abhimanyu, et al. “Long-term Forecasting with TiDE: Time-series Dense Encoder.” Transactions of Machine Learning Research. 2023.

Parameters:

context_length (int, default = max(64, 2 * prediction_length)) – Number of past values used for prediction.
disable_static_features (bool, default = False) – If True, static features won’t be used by the model even if they are present in the dataset. If False, static features will be used by the model if they are present in the dataset.
disable_known_covariates (bool, default = False) – If True, known covariates won’t be used by the model even if they are present in the dataset. If False, known covariates will be used by the model if they are present in the dataset.
disable_past_covariates (bool, default = False) – If True, past covariates won’t be used by the model even if they are present in the dataset. If False, past covariates will be used by the model if they are present in the dataset.
feat_proj_hidden_dim (int, default = 4) – Size of the feature projection layer.
encoder_hidden_dim (int, default = 4) – Size of the dense encoder layer.
decoder_hidden_dim (int, default = 4) – Size of the dense decoder layer.
temporal_hidden_dim (int, default = 4) – Size of the temporal decoder layer.
distr_hidden_dim (int, default = 4) – Size of the distribution projection layer.
num_layers_encoder (int, default = 1) – Number of layers in dense encoder.
num_layers_decoder (int, default = 1) – Number of layers in dense decoder.
decoder_output_dim (int, default = 4) – Output size of the dense decoder.
dropout_rate (float, default = 0.3) – Dropout regularization parameter.
num_feat_dynamic_proj (int, default = 2) – Output size of feature projection layer.
embedding_dimension (int, default = [16] * num_feat_static_cat) – Dimension of the embeddings for categorical features
layer_norm (bool, default = False) – Should layer normalization be enabled?
scaling ({"mean", "std", None}, default = "mean") – Scaling applied to the inputs. One of "mean" (mean absolute scaling), "std" (standardization), None (no scaling).
max_epochs (int, default = 100) – Number of epochs the model will be trained for
batch_size (int, default = 64) – Size of batches used during training
predict_batch_size (int, default = 500) – Size of batches used during prediction.
num_batches_per_epoch (int, default = 50) – Number of batches processed every epoch
lr (float, default = 1e-3) – Learning rate used during training
trainer_kwargs (dict, optional) – Optional keyword arguments passed to lightning.Trainer.
early_stopping_patience (int or None, default = 20) – Early stop training if the validation loss doesn’t improve for this many epochs.
keep_lightning_logs (bool, default = False) – If True, lightning_logs directory will NOT be removed after the model finished training.

class autogluon.timeseries.models.WaveNetModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)[source]¶

WaveNet estimator that uses the architecture proposed in [Oord2016] with quantized targets.

The model is based on a CNN architecture with dilated convolutions. Time series values are quantized into buckets and the model is trained using the cross-entropy loss.

Based on gluonts.torch.model.wavenet.WaveNetEstimator. See GluonTS documentation for additional hyperparameters.

References

[Oord2016]

Oord, Aaron van den, et al. “Wavenet: A generative model for raw audio” arXiv preprint arXiv:1609.03499. 2016.

Parameters:

num_bins (int, default = 1024) – Number of bins used for quantization of the time series.
num_residual_channels (int, default = 24) – Number of residual channels in WaveNet architecture.
num_skip_channels (int, default = 32) – Number of skip channels in WaveNet architecture.
dilation_depth (int or None, default = None) – Number of dilation layers in WaveNet architecture. If set to None (default), dilation_depth is set such that the receptive length is at least as long as the seasonality and at least 2 * prediction_length.
num_stacks (int, default = 1) – Number of dilation stacks in WaveNet architecture.
temperature (float, default = 1.0) – Temperature used for sampling from softmax distribution.
seasonality (int, optional) – The seasonality of the time series. By default, it is set based on the freq of the data.
embedding_dimension (int, default = 5) – The dimension of the embeddings for categorical features.
use_log_scale_feature (bool, default = True) – If True, logarithm of the scale of the past data will be used as an additional static feature.
negative_data (bool, default = True) – Flag indicating whether the time series take negative values.
max_cat_cardinality (int, default = 100) – Maximum number of dimensions to use when one-hot-encoding categorical known_covariates.
max_epochs (int, default = 100) – Number of epochs the model will be trained for
batch_size (int, default = 64) – Size of batches used during training
predict_batch_size (int, default = 500) – Size of batches used during prediction.
num_batches_per_epoch (int, default = 50) – Number of batches processed every epoch
lr (float, default = 1e-3) – Learning rate used during training
trainer_kwargs (dict, optional) – Optional keyword arguments passed to lightning.Trainer.
early_stopping_patience (int or None, default = 20) – Early stop training if the validation loss doesn’t improve for this many epochs.
weight_decay (float, default = 1e-8) – Weight decay regularization parameter.
keep_lightning_logs (bool, default = False) – If True, lightning_logs directory will NOT be removed after the model finished training.
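The automatic choice of dilation_depth can be sketched as follows. The sketch assumes convolutions with kernel size 2 and dilations that double in each layer (1, 2, 4, ...), repeated num_stacks times. The exact rule used by GluonTS may differ, and both function names are hypothetical.

```python
def receptive_field(dilation_depth, num_stacks=1):
    # Assumption: kernel size 2 with dilations 1, 2, ..., 2**(dilation_depth - 1)
    # per stack, so each stack adds (2**dilation_depth - 1) past steps.
    return num_stacks * (2 ** dilation_depth - 1) + 1


def minimal_dilation_depth(prediction_length, seasonality, num_stacks=1):
    """Smallest depth whose receptive field covers the documented target length."""
    target = max(seasonality, 2 * prediction_length)
    depth = 1
    while receptive_field(depth, num_stacks) < target:
        depth += 1
    return depth


# Hourly data with daily seasonality and a one-day forecast horizon.
print(minimal_dilation_depth(prediction_length=24, seasonality=24))
```

Because the receptive field grows exponentially with depth, covering a longer horizon or seasonality typically costs only one or two extra layers.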

Tabular models¶

Tabular models convert time series forecasting into a tabular regression problem.

class autogluon.timeseries.models.DirectTabularModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)[source]¶

Predict all future time series values simultaneously using TabularPredictor from AutoGluon-Tabular.

A single TabularPredictor is used to forecast all future time series values using the following features:

lag features (observed time series values) based on freq of the data
time features (e.g., day of the week) based on the timestamp of the measurement
known covariates (if available)
static features of each item (if available)

Features not known during the forecast horizon (e.g., future target values) are replaced by NaNs.
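The NaN masking can be illustrated with a small sketch. It assumes that the lag-k feature for forecast step h refers to the target value k steps before that step, so it is only available when k >= h. The function direct_lag_features is hypothetical and is not part of AutoGluon or mlforecast.

```python
import math


def direct_lag_features(history, horizon_step, lags):
    """Lag features for forecast step `horizon_step` (1-indexed) of a direct model."""
    features = {}
    last_index = len(history) - 1
    for lag in lags:
        index = last_index + horizon_step - lag
        if index > last_index or index < 0:
            # The lag points at a future (or nonexistent) value: mask it with NaN.
            features[f"lag{lag}"] = math.nan
        else:
            features[f"lag{lag}"] = history[index]
    return features


history = [10, 20, 30, 40]
print(direct_lag_features(history, horizon_step=1, lags=[1, 2, 3]))
print(direct_lag_features(history, horizon_step=3, lags=[1, 2, 3]))
```

For the first forecast step all three lags are observed, while for the third step only lag 3 is available. The tabular model has to handle the missing entries, which is why short lags contribute less to predictions deep into the horizon.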

If eval_metric.needs_quantile, the TabularPredictor will be trained with the "quantile" problem type. Otherwise, the TabularPredictor will be trained with the "regression" problem type, and dummy quantiles will be obtained by assuming that the residuals follow a zero-mean normal distribution.
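The dummy quantiles for the "regression" case can be sketched with the standard library. The sketch assumes that a residual standard deviation is already available (how it is estimated is not specified here), and the function name gaussian_quantiles is hypothetical.

```python
from statistics import NormalDist


def gaussian_quantiles(point_forecast, residual_std, quantile_levels):
    """Quantiles of N(point_forecast, residual_std**2), i.e. zero-mean normal residuals."""
    standard_normal = NormalDist()
    return {
        q: point_forecast + residual_std * standard_normal.inv_cdf(q)
        for q in quantile_levels
    }


quantiles = gaussian_quantiles(point_forecast=100.0, residual_std=10.0,
                               quantile_levels=[0.1, 0.5, 0.9])
print(quantiles)
```

Quantiles obtained this way are always symmetric around the point forecast, which is a reasonable default but can be a poor fit for skewed or strictly positive data.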

Based on the mlforecast library.

Parameters:

lags (List[int], default = None) – Lags of the target that will be used as features for predictions. If None, will be determined automatically based on the frequency of the data.
date_features (List[Union[str, Callable]], default = None) – Features computed from the dates. Can be pandas date attributes or functions that will take the dates as input. If None, will be determined automatically based on the frequency of the data.
differences (List[int], default = []) – Differences to take of the target before computing the features. These are restored at the forecasting step. If None, will be set to [seasonal_period], where seasonal_period is determined based on the data frequency. Defaults to no differencing.
scaler ({"standard", "mean_abs", None}, default = "mean_abs") – Scaling applied to each time series.
tabular_hyperparameters (Dict[str, Dict[str, Any]], optional) – Hyperparameters dictionary passed to TabularPredictor.fit. Contains the names of models that should be fit. Defaults to {"GBM": {}}.
tabular_fit_kwargs (Dict[str, Any], optional) – Additional keyword arguments passed to TabularPredictor.fit. Defaults to an empty dict.
max_num_items (int or None, default = 20_000) – If not None, the model will randomly select this many time series for training and validation.
max_num_samples (int or None, default = 1_000_000) – If not None, training dataset passed to TabularPredictor will contain at most this many rows (starting from the end of each time series).
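The two scaler options can be sketched as follows. This is an illustration only: the use of the population standard deviation and the fallback to 1.0 for constant or all-zero series are assumptions, and the scalers actually used by the underlying library may differ in such details.

```python
from statistics import fmean, pstdev


def mean_abs_scale(values):
    """Divide each value by the mean absolute value of the series."""
    scale = fmean([abs(v) for v in values]) or 1.0  # assumption: avoid dividing by 0
    return [v / scale for v in values], scale


def standard_scale(values):
    """Subtract the mean and divide by the standard deviation of the series."""
    mean = fmean(values)
    std = pstdev(values) or 1.0  # assumption: population std, 1.0 for constant series
    return [(v - mean) / std for v in values], (mean, std)


scaled_abs, scale = mean_abs_scale([2.0, -4.0, 6.0])
scaled_std, (mean, std) = standard_scale([1.0, 2.0, 3.0])
print(scaled_abs, scale)
print(scaled_std, mean, std)
```

Scaling each series separately lets a single tabular model learn from items whose magnitudes differ by orders of magnitude. Forecasts are mapped back to the original scale by inverting the same transformation.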

class autogluon.timeseries.models.RecursiveTabularModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)[source]¶

Predict future time series values one by one using TabularPredictor from AutoGluon-Tabular.

A single TabularPredictor is used to forecast the future time series values using the following features:

lag features (observed time series values) based on freq of the data
time features (e.g., day of the week) based on the timestamp of the measurement
known covariates (if available)
static features of each item (if available)
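The one-by-one forecasting strategy can be sketched in plain Python. The mean_of_lags function is a hypothetical stand-in for the trained TabularPredictor, and only lag features are used to keep the example short.

```python
def recursive_forecast(history, prediction_length, one_step_model, lags):
    """Forecast step by step, feeding each prediction back in as a new observation."""
    extended = list(history)
    forecast = []
    for _ in range(prediction_length):
        # Lag features are computed from observed values and earlier predictions.
        features = [extended[-lag] for lag in lags]
        next_value = one_step_model(features)
        forecast.append(next_value)
        extended.append(next_value)
    return forecast


def mean_of_lags(features):
    # Hypothetical stand-in for a trained one-step-ahead regression model.
    return sum(features) / len(features)


forecast = recursive_forecast([1.0, 2.0, 3.0], prediction_length=3,
                              one_step_model=mean_of_lags, lags=[1, 2])
print(forecast)
```

Unlike the direct approach, no lag feature is ever missing, but prediction errors can accumulate because later steps depend on earlier predictions.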

TabularPredictor will always be trained with the "regression" problem type, and dummy quantiles will be obtained by assuming that the residuals follow a zero-mean normal distribution.

Based on the mlforecast library.

Parameters:

lags (List[int], default = None) – Lags of the target that will be used as features for predictions. If None, will be determined automatically based on the frequency of the data.
date_features (List[Union[str, Callable]], default = None) – Features computed from the dates. Can be pandas date attributes or functions that will take the dates as input. If None, will be determined automatically based on the frequency of the data.
differences (List[int], default = None) – Differences to take of the target before computing the features. These are restored at the forecasting step. If None, will be set to [seasonal_period], where seasonal_period is determined based on the data frequency.
scaler ({"standard", "mean_abs", None}, default = "standard") – Scaling applied to each time series.
tabular_hyperparameters (Dict[str, Dict[str, Any]], optional) – Hyperparameters dictionary passed to TabularPredictor.fit. Contains the names of models that should be fit. Defaults to {"GBM": {}}.
tabular_fit_kwargs (Dict[str, Any], optional) – Additional keyword arguments passed to TabularPredictor.fit. Defaults to an empty dict.
max_num_items (int or None, default = 20_000) – If not None, the model will randomly select this many time series for training and validation.
max_num_samples (int or None, default = 1_000_000) – If not None, training dataset passed to TabularPredictor will contain at most this many rows (starting from the end of each time series).
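The differences option can be illustrated with a single seasonal difference. The sketch shows how the target is differenced before feature computation and how a forecast made on the differenced scale is restored; both function names are hypothetical, and only one difference is handled.

```python
def seasonal_difference(values, period):
    """Return values[t] - values[t - period] for all t where both are observed."""
    return [values[i] - values[i - period] for i in range(period, len(values))]


def restore_forecast(history, diff_forecast, period):
    """Undo the differencing by adding back the value from one period earlier."""
    extended = list(history)
    for diff in diff_forecast:
        extended.append(extended[-period] + diff)
    return extended[len(history):]


history = [10, 20, 12, 22, 14, 24]   # pattern with seasonal_period = 2 and a trend
diffs = seasonal_difference(history, period=2)
restored = restore_forecast(history, diff_forecast=[2, 2, 2], period=2)
print(diffs)
print(restored)
```

After differencing, the series in this example is constant, so the model only has to predict the seasonal change. The seasonal pattern itself is reintroduced when the forecast is restored.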

Pretrained models¶

Deep learning models pretrained on large time series datasets, able to perform zero-shot forecasting.

class autogluon.timeseries.models.ChronosModel(freq: str | None = None, prediction_length: int = 1, path: str | None = None, name: str | None = None, eval_metric: str | None = None, hyperparameters: Dict[str, Any] | None = None, **kwargs)[source]¶

Chronos pretrained time series forecasting models, based on the original ChronosModel implementation.

Chronos is a family of pretrained models based on the T5 family, with the number of parameters ranging from 8M to 710M. The full collection of Chronos models is available on Hugging Face. For the Chronos small, base, and large variants, a GPU is required to perform inference efficiently.

Chronos takes a minimalistic approach to pretraining time series models, by discretizing time series data directly into bins which are treated as tokens, effectively performing regression by classification. This results in a simple and flexible framework for using any language model in the context of time series forecasting. See [Ansari2024] for more information.

References

[Ansari2024]

Ansari, Abdul Fatir, Stella, Lorenzo et al. “Chronos: Learning the Language of Time Series.” http://arxiv.org/abs/2403.07815

Parameters:

model_path (str, default = "autogluon/chronos-t5-small") – Model path used for the model, i.e., a HuggingFace transformers name_or_path. Can be a compatible model name on HuggingFace Hub or a local path to a model directory. Original Chronos models (i.e., autogluon/chronos-t5-{model_size}) can be specified with aliases tiny, mini, small, base, and large.
batch_size (int, default = 16) – Size of batches used during inference
num_samples (int, default = 20) – Number of samples used during inference
device (str, default = None) – Device to use for inference. If None, the model will use the GPU if available. For the larger model sizes (small, base, and large), inference will fail if no GPU is available.
context_length (int or None, default = None) – The context length to use in the model. Shorter context lengths will decrease model accuracy, but result in faster inference. If None, the model will infer context length from the data set length at inference time, but set it to a maximum of 512.
optimization_strategy ({None, "onnx", "openvino"}, default = None) – Optimization strategy to use for inference on CPUs. If None, the model will use the default implementation. If onnx, the model will be converted to ONNX and the inference will be performed using ONNX. If openvino, inference will be performed with the model compiled to OpenVINO.
torch_dtype (torch.dtype or {"auto", "bfloat16", "float32", "float64"}, default = "auto") – Torch data type for model weights, provided to from_pretrained method of Hugging Face AutoModels. If original Chronos models are specified and the model size is small, base, or large, the torch_dtype will be set to bfloat16 to enable inference on GPUs.
data_loader_num_workers (int, default = 0) – Number of worker processes to be used in the data loader. See documentation on torch.utils.data.DataLoader for more information.
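As a hedged example, a small Chronos variant could be configured for CPU inference through the hyperparameters dictionary. The helper effective_context_length is hypothetical and only restates the documented behavior for context_length = None. The fit call is left as a comment because it assumes an existing train_data object.

```python
def effective_context_length(history_length, context_length=None, max_context=512):
    # Documented behavior for None: inferred from the data, capped at 512.
    if context_length is None:
        return min(history_length, max_context)
    return context_length


hyperparameters = {
    "Chronos": {
        "model_path": "tiny",   # alias for autogluon/chronos-t5-tiny
        "batch_size": 16,
        "num_samples": 20,
        "device": "cpu",        # the tiny variant is not among the GPU-only sizes
    },
}

# Assumes `train_data` is an existing TimeSeriesDataFrame:
# predictor = TimeSeriesPredictor(prediction_length=24).fit(
#     train_data, hyperparameters=hyperparameters
# )
print(effective_context_length(1000), effective_context_length(100))
```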

MXNet Models¶

MXNet models from GluonTS have been deprecated because of dependency conflicts caused by MXNet.

Additional features¶

Overview of the additional features and covariates supported by different models. Models not included in this table currently do not support any additional features.

Model	Static features (continuous + categorical)	Known covariates (continuous + categorical)	Past covariates (continuous + categorical)
`DirectTabularModel`	✅	✅
`RecursiveTabularModel`	✅	✅
`DeepARModel`	✅	✅
`TemporalFusionTransformerModel`	✅	✅	✅
`TiDEModel`	✅	✅
`WaveNetModel`	✅	✅