A primer on GluonTS
Quick prototyping of Deep Learning models for Time Series
In 2019, at the ICML Workshop on Time Series, a team of researchers from Amazon’s AWS division presented GluonTS, a Python library for quick prototyping of Deep Learning models for Time Series applications.
In this article we’ll give it a go, trying to gauge its potential to enter the proverbial Data Science toolbox.
Working with Time Series
Time Series are extremely common in Data Science, perhaps because their definition is so general, encompassing basically anything that is both measurable and not constant in time.
More precisely, a Time Series is a random outcome of a stochastic process evolving in time; it consists of an array of (possibly multi-dimensional) values, each associated with its timestamp.
In other words, a Time Series is a path through time made of consecutive data points drawn from different “local” distributions.
This is in contrast with other instances of structured data, where we don’t really care or account for time dependence.
Indeed, with tabular data for example (think customer profiles from a retail firm) we implicitly assume either that all records were collected in a short window or that time doesn’t affect our data distribution too much.
Statistically, this means we are treating our dataset as a sample from a single population, i.e. a set of realizations from a single (however complex and multi-dimensional) probability distribution.
Still, even with structured, time-independent data, one can periodically reassess time effects as more data becomes available, studying separate subsets and data drift to check whether the assumption of time independence was, and remains, legitimate.
Forecasting and Machine Learning
By far the most common task we want to perform on a Time Series is forecasting: gaining information by creating a suitable model that retains the main features of the series, and then extrapolating future values from it.
Among the various model families proposed for this task, we’ve recently witnessed the rise of neural networks: the availability of ever-larger datasets of time series, advances in dedicated computing infrastructure, and the growing specialization of practitioners and researchers have led to really complex Deep Learning architectures.
These are usually tailored to excel on sub-classes of forecasting problems, depending on the features of their Time Series [2][3][4].
Forecasting is an example of a sub-discipline in the machine learning community where the comparatively modest attention it receives in published research is in stark contrast to a tremendous business impact [3].
Introducing: GluonTS
GluonTS (documentation, tutorials) springs from this recent wave of interest in Deep Learning for Time Series Forecasting, aiming to become an all-in-one Python modeling tool akin to darts or sktime.
We’ll let the creators of this library give us an introduction in a few sentences:
[GluonTS is] a deep learning library that bundles components, models and tools for time series applications such as forecasting or anomaly detection. […] It includes components such as distributions, neural network architectures for sequences, and feature processing steps which can be used to quickly assemble and train new models [1].
More technically, it is a Gluon toolkit for probabilistic time series modeling, supporting classical probabilistic and state-space models as well as Deep Learning models, which are of course its main focus.
Being based on the Gluon API, it relies on the Apache MXNet framework for neural networks, while also supporting a PyTorch backend.
Although conceived mainly as a prototyping tool for experimenting with new models, GluonTS comes with implementations of several established architectures.
GluonTS was developed with three principles in mind:
- Modularity: Each component has its own interface and can be used on its own.
- Scalability: The goal is to work equally well regardless of dataset size and the number of series. To achieve this, datasets are streamed rather than loaded into memory.
- Reproducibility: Each component can be serialized and stored in a way that favours human-readable logging. This allows for later inspection, retrieval and reproduction of configured models and experiments.
From a data standpoint, GluonTS contains both a dataset repository and a synthetic data generator, facilitating quick prototyping and benchmarking.
Evaluation and visualization are also made really easy thanks to the presence of specific classes that streamline these phases, producing quick comparable outputs for each model.
Taking a ride
Please note that an executable Python notebook with the same code and explanations as below can be found here.
Let’s go through the process of exploring GluonTS and its features, starting with a dataset and playing with it. Eventually, we’ll get to the modeling phase and try a few of the models implemented in the library.
Here are the main imports we are going to require: other than a few GluonTS objects, we’ll only need plotly, matplotlib and ipywidgets for interactive visualization of outcomes.
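Something along these lines (module paths have shifted a bit across GluonTS releases, so adjust to your installed version):

```python
# GluonTS: datasets, models, training and evaluation utilities
from gluonts.dataset.repository.datasets import get_dataset, dataset_recipes
from gluonts.dataset.util import to_pandas
from gluonts.mx.trainer import Trainer
from gluonts.model.deepar import DeepAREstimator
from gluonts.model.tft import TemporalFusionTransformerEstimator
from gluonts.evaluation import Evaluator
from gluonts.evaluation.backtest import make_evaluation_predictions

# Visualization
import matplotlib.pyplot as plt
import plotly.express as px
import ipywidgets as widgets

# General-purpose helpers
import numpy as np
import pandas as pd
```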
Data
For simplicity’s sake, we’ll take advantage of the suite of pre-prepared datasets available in GluonTS.
Keep in mind that, although we won’t show it here, you can instead turn any dataset into the conventional form required by GluonTS.
Anyway, we can check the available datasets in gluonts.dataset.repository.datasets:
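For example (dataset_recipes is the registry that get_dataset reads from; the exact list depends on your GluonTS version):

```python
from gluonts.dataset.repository.datasets import dataset_recipes

# Each key is a name accepted by get_dataset()
print(f"Available datasets: {list(dataset_recipes.keys())}")
```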
Let’s pick one; for no particular reason, we’ll go with tourism_monthly, made of 366 monthly time series of tourist presences.
This is a heterogeneous set of series, collected through various tourism bodies and institutions: the dataset originates from the tourism forecasting competition [5].
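Loading it is a one-liner; the first call downloads the data and caches it locally (a minimal sketch):

```python
from gluonts.dataset.repository.datasets import get_dataset

# Returns a TrainDatasets object with train/test splits and metadata
dataset = get_dataset("tourism_monthly")

print(dataset.metadata.freq)               # monthly frequency
print(dataset.metadata.prediction_length)  # default forecast horizon: 24 months
```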
Here are some descriptive statistics about the dataset:
- Number of series: 366
- Mean length: 298
- Median length: 330
- Min length: 91
- Max length: 333
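These can be computed with something like the following sketch, reading the “target” field of each entry:

```python
import numpy as np

lengths = [len(entry["target"]) for entry in dataset.train]

print("Number of series:", len(lengths))
print("Mean length:  ", round(np.mean(lengths)))
print("Median length:", int(np.median(lengths)))
print("Min length:   ", min(lengths))
print("Max length:   ", max(lengths))
```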
A TrainDatasets object like this contains three components: two FileDataset objects, for training and test respectively, and a MetaData object storing relevant information about the dataset. Notice that a FileDataset object like dataset.train is iterable (but not subscriptable), so we can turn it into a list to inspect the whole of it.
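For example:

```python
# FileDataset is iterable but not subscriptable: materialize it to index entries
train_entries = list(dataset.train)

print(len(train_entries))      # number of series in the training split
print(type(train_entries[0]))  # each entry is a dictionary
```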
Now let’s instead go deeper into one of the time series:
As is conventional in GluonTS, a time series is represented as a dictionary: the information is stored in the “start” and “target” fields, separating the time index from the actual values.
Other fields are reserved for useful metadata such as the id of the series and the physical location of the information.
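Printing one entry shows this layout (extra field names, such as item_id or feat_static_cat, may vary by version):

```python
entry = next(iter(dataset.train))

print(entry.keys())         # e.g. "target", "start", plus id/metadata fields
print(entry["start"])       # timestamp of the first observation
print(entry["target"][:5])  # first few observed values
```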
In GluonTS’s built-in datasets, the “test” dataset contains all “train” values of each time series, plus a number of subsequent realizations equal to the prediction_length metadata attribute. If we want to try and apply some external model to the dataset, we can make use of to_pandas to turn a time series dictionary into a pandas Series.
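A minimal sketch:

```python
import matplotlib.pyplot as plt
from gluonts.dataset.util import to_pandas

# Combines "start" and "target" into a pandas Series with a proper time index
series = to_pandas(entry)
series.plot(figsize=(12, 4))
plt.title("One of the tourism_monthly series")
plt.show()
```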
Exploration
We can build a small widget for on-demand plotting of single series, in case we find one that stands out in some respect and we want to just visualize it:
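One possible sketch, using ipywidgets.interact with a slider over the series index:

```python
import matplotlib.pyplot as plt
import ipywidgets as widgets
from gluonts.dataset.util import to_pandas

train_entries = list(dataset.train)

def plot_series(idx):
    """Plot the idx-th training series on demand."""
    to_pandas(train_entries[idx]).plot(figsize=(12, 4))
    plt.title(f"Series #{idx}")
    plt.show()

widgets.interact(plot_series, idx=widgets.IntSlider(min=0, max=len(train_entries) - 1))
```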
Just out of curiosity, we can check correlations between our series:
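One way to do it (a sketch, surely not the only option) is to align the series on a shared time index and compute a pairwise correlation matrix on the overlapping periods:

```python
import pandas as pd
import plotly.express as px
from gluonts.dataset.util import to_pandas

# Align all series on a common time index; non-overlapping periods become NaN
df = pd.concat(
    {i: to_pandas(entry) for i, entry in enumerate(dataset.train)}, axis=1
)

# Pairwise Pearson correlations, ignoring the NaNs
corr = df.corr()
px.imshow(corr, title="Pairwise correlations between series").show()
```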
Most of them are positively correlated, some even strongly; a couple of smaller groups form distinct clusters, being negatively correlated or uncorrelated with the rest, yet still positively correlated among themselves.
Modeling
We’ve just observed that most of the series in the dataset are positively correlated; this suggests it will probably be beneficial to use algorithms that learn global models, i.e. models that learn a shared representation across many series instead of one for each of them.
The underlying assumption is indeed that the extra information brought by series other than the target one will compensate for the noise and variance introduced by using them.
GluonTS supports PyTorch and MXNet model backends.
In this tutorial, we’ll use a Trainer object from MXNet, properly instantiated with training hyperparameters:
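Here is a minimal sketch; the values are placeholders, not tuned:

```python
from gluonts.mx.trainer import Trainer

trainer = Trainer(
    epochs=10,           # placeholder values, not tuned
    learning_rate=1e-3,
)
```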
After training, Evaluator will conveniently compute various performance metrics, both single-series and aggregate.
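We configure it with the quantile levels used for quantile losses and coverage metrics:

```python
from gluonts.evaluation import Evaluator

evaluator = Evaluator(quantiles=[0.1, 0.5, 0.9])
```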
DeepAR
DeepAR [6] is a probabilistic auto-regressive model based on a Recurrent Neural Network architecture, introduced by Amazon Research in 2018.
It natively makes one-step-ahead predictions, but it can recursively produce n-step-ahead Monte Carlo samples, allowing for probabilistic (quantile) forecasts.
Moreover, it trains on a whole set of related series (it is a global model) and optionally allows for features that label each series: this helps identify patterns that may be shared by some subset of the whole set of series.
It also automatically incorporates covariates such as calendar features.
This bundle of series gets chopped into training samples: each sample consists of automatically selected lagged values (in red in the following figure), immediate past values (in green, regulated by the context_length parameter), and values to be predicted (in blue, regulated by the prediction_length parameter):
The RNN architecture relies on a latent recurrent state h that propagates forward, feeding on lagged target values and covariates; it produces one-step-ahead estimates of the target series by means of a likelihood node (for instance Gaussian for real-valued data or negative binomial for counts), whose parameters are optimized during training.
At inference time, the model generates one-step-ahead predictions through the same likelihood nodes, and by feeding those predictions back into itself it can effectively produce Monte Carlo sample paths of any desired length n.
Let’s instantiate a DeepAR estimator with GluonTS and train it on our tourism dataset:
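A minimal sketch (the estimator exposes many more hyperparameters, left at their defaults here):

```python
from gluonts.model.deepar import DeepAREstimator

estimator = DeepAREstimator(
    freq=dataset.metadata.freq,                            # monthly series
    prediction_length=dataset.metadata.prediction_length,  # 24 months
    trainer=trainer,
)

# DeepAR is a global model: it trains on all series at once
predictor = estimator.train(dataset.train)
```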
The estimator object asks us for the frequency of the series (since we stored only the values in the dataset dictionary) and for the prediction length, for which we keep the default 24 months indicated in the dataset metadata.
For testing, we take advantage of a built-in method that directly gives us a specified number of Monte Carlo sample paths.
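That method is make_evaluation_predictions, which holds out the last prediction_length points of each test series and returns paired iterators of forecasts and ground-truth series:

```python
from gluonts.evaluation.backtest import make_evaluation_predictions

forecast_it, ts_it = make_evaluation_predictions(
    dataset=dataset.test,   # test series = train series + 24 extra months
    predictor=predictor,
    num_samples=100,        # Monte Carlo sample paths per series
)

forecasts = list(forecast_it)
tss = list(ts_it)
```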
With the Evaluator we defined above we get a plethora of metrics, both global and local (single-series). You can find more details about them in the documentation.
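Something along these lines:

```python
# agg_metrics: aggregate metrics over all series (MASE, sMAPE, quantile losses, ...)
# item_metrics: a DataFrame with one row of metrics per series
agg_metrics, item_metrics = evaluator(iter(tss), iter(forecasts), num_series=len(tss))

print(agg_metrics["MASE"], agg_metrics["sMAPE"])
item_metrics.head()
```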
We can define a function for visualizing single-series forecasts and local metrics:
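One possible sketch, plotting the observed values next to the median forecast and a quantile band (the small helper below handles the PeriodIndex used by recent GluonTS versions):

```python
import matplotlib.pyplot as plt

def _plottable(index):
    # PeriodIndex (newer GluonTS versions) must be converted before matplotlib can plot it
    return index.to_timestamp() if hasattr(index, "to_timestamp") else index

def plot_forecast(idx, lo=0.1, hi=0.9):
    """Plot observed values, the median forecast and a quantile band for one series."""
    ts, forecast = tss[idx], forecasts[idx]
    context = 4 * dataset.metadata.prediction_length  # amount of history to show

    plt.figure(figsize=(12, 4))
    plt.plot(_plottable(ts.index)[-context:], ts.values[-context:], label="observed")
    plt.plot(_plottable(forecast.index), forecast.quantile(0.5), label="median forecast")
    plt.fill_between(
        _plottable(forecast.index),
        forecast.quantile(lo),
        forecast.quantile(hi),
        alpha=0.3,
        label=f"{lo}-{hi} quantile band",
    )
    plt.title(f"Series #{idx} - MASE: {item_metrics['MASE'].iloc[idx]:.2f}")
    plt.legend()
    plt.show()

plot_forecast(0)
```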
Temporal Fusion Transformer (TFT)
TFT is a recent architecture proposed by Google [8][9]. It is a very complex global model, using static and future-known covariates (like DeepAR).
The main difference from DeepAR, without looking under the hood of the model, is that TFT natively outputs multi-step-ahead forecasts in the form of point predictions accompanied by prediction intervals.
This is because the end nodes of the model are trained to learn pre-determined quantile levels.
We won’t go into details regarding the architecture; instead, here are the main blocks and brief descriptions of how they work:
- Variable selection blocks: used both on static features and on each time step of each Time Series.
- LSTM Encoder/Decoder layer: sequence-to-sequence layer (seq2seq for short) that learns short-range correlations along each Time Series. This structure is well-known and often used in NLP, especially in machine translation [10].
- Temporal fusion decoder: the core and main novelty of the model; it accepts all encoded states coming from the previous blocks and learns long-range and cross-series correlations. It does so with multiple self-attention mechanisms, each looking for its own pattern of interest.
- Residual connections: extensively used in all blocks, they have been proven to add expressivity and improve gradient propagation [11].
In GluonTS we can instantiate a TFT estimator by passing a number of hyperparameters controlling the encodings, the attention heads, dropout, and so on. In particular, we can specify the quantiles we want it to output:
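A sketch of the setup; note that the keyword controlling the output quantiles differs between GluonTS versions and backends (the PyTorch implementation, for instance, exposes a quantiles argument), so check the signature of the class you are using:

```python
from gluonts.model.tft import TemporalFusionTransformerEstimator

tft_estimator = TemporalFusionTransformerEstimator(
    freq=dataset.metadata.freq,
    prediction_length=dataset.metadata.prediction_length,
    context_length=3 * dataset.metadata.prediction_length,  # arbitrary choice
    trainer=trainer,
    # further hyperparameters (attention heads, dropout, quantile levels, ...)
    # are available; their exact names depend on the GluonTS version
)
```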
The syntax for training and inference is basically the same we saw in DeepAR:
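For instance:

```python
tft_predictor = tft_estimator.train(dataset.train)

tft_forecast_it, tft_ts_it = make_evaluation_predictions(
    dataset=dataset.test,
    predictor=tft_predictor,
)

tft_forecasts = list(tft_forecast_it)
tft_tss = list(tft_ts_it)
```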
Again, we can evaluate the model by computing all standard metrics and visualising the forecast.
Notice that, because of the structural differences between the two models, the Forecast object coming from TFT needs to be manipulated quite a bit before we can plot prediction intervals with the same formatting as DeepAR.
It is indeed quite instructive to compare the raw inference output of the two models:
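For example (attribute names as in recent GluonTS versions):

```python
deepar_fc = forecasts[0]     # a SampleForecast from DeepAR
tft_fc = tft_forecasts[0]    # a QuantileForecast from TFT

print(type(deepar_fc).__name__, deepar_fc.samples.shape)  # e.g. SampleForecast (100, 24)
print(type(tft_fc).__name__, tft_fc.forecast_keys)        # e.g. QuantileForecast ['0.1', '0.5', ...]
```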
As you can see, the first one gives separate sample paths (as we expect from DeepAR’s Monte Carlo approach to n-step-ahead inference), while the second only has predictions for the specified quantiles (consistent with TFT’s quantile output nodes).
[1] V. Flunkert et al., GluonTS: Probabilistic Time Series Models in Python (2019).
[2] B. Lim et al., Time Series Forecasting With Deep Learning: A Survey (2020).
[3] K. Benidis et al., Deep learning for time series forecasting: Tutorial and literature survey (2021).
[4] R. Masini et al., Machine Learning Advances for Time Series Forecasting (2021).
[5] G. Athanasopoulos et al., The tourism forecasting competition (2011).
[6] D. Salinas et al., DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks (2019).
[7] Amazon Sagemaker developer guide, How the DeepAR Algorithm Works.
[8] B. Lim et al., Temporal Fusion Transformers for interpretable multi-horizon time series forecasting (2021).
[9] Google AI Blog, Interpretable Deep Learning for Time Series Forecasting.
[10] I. Sutskever et al., Sequence to Sequence Learning with Neural Networks (2014).
[11] K. He et al., Identity Mappings in Deep Residual Networks (2016).