
Time series forecasting (Part 2 of 3): Selecting algorithms

Yasmin Bokobza
Data Science at Microsoft
8 min read · Apr 13, 2021

By Yasmin Bokobza and Siddharth Kumar

This is the second article of a series focusing on time series forecasting methods and applications. In Part 1, we discussed how to choose the right forecasting method and how to make the forecasting task simpler. We also provided a list of some popular Python packages for performing forecasting and time series analysis. In Part 3, we discuss approaches to time series forecasting with an emphasis on what led us to develop the Adaptive Univariate Time Series (AUTS) algorithm for the forecasting tasks we have encountered, and delve into details of the AUTS methodology, including how we deployed the model using Microsoft Azure.

In this article, we discuss our approach to evaluating the accuracy of several forecasting models and choosing the most accurate one, using the capabilities of the Univariate Forecast Engine that we developed. In addition, we review some metrics for evaluating the performance of the models. We also describe how we help stakeholders who are not necessarily data scientists move quickly on their time series problems and make high-quality forecasts. We hope that this article helps you with your own business problems.

Algorithm selection

As discussed in Part 1, defining the problem and then exploring and preparing the data enable us to simplify the prediction problem and focus on a set of relevant algorithms. In order to fit the most accurate model for our forecasting task we must compare different algorithms. In this section we discuss how we did this for the prediction problems we encountered in the hope that it will provide some guidance.

Univariate Forecast Engine

For our univariate prediction problems we leverage the training and validation framework provided by our Univariate Forecast Engine. Figure 1 presents the high-level architecture of the engine. The input into the engine is historical data, which can be a customer's purchase history, usage history, and so on. In addition, related parameters should be provided, such as the forecast horizon size, data granularity, presence of low-latency constraints, and so on.

Figure 1: High-level architecture of our team’s Univariate Time Series Forecast Engine

After feeding in the input, the engine trains different machine learning models such as AUTS (Adaptive Univariate Time Series, which we developed and describe in the next article in this series), Auto ARIMA [see Reference Note 1 at the end of the article], Facebook Prophet [Reference Note 2], and more. Model evaluation is performed using the rolling-window validation approach. That is, one training record is generated for each selected cutoff date in the customer's history, and the target variables correspond to N points into the future from that date. The same procedure is applied for validation as well. Figure 2 depicts the rolling-window validation approach. Based on the cutoff date, a training input of the desired size is defined, and the resulting forecasts are evaluated against the actual values in the forecast horizon. The number of cutoff dates and the lag between them can be configured.

Figure 2: Rolling-window validation approach
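To make the procedure concrete, here is a minimal Python sketch of rolling-window validation. It is not the engine's code: the model wrapper (fit_and_forecast) and the error metric are assumptions supplied by the caller, and for simplicity the sketch uses an expanding training window rather than one of a fixed size.

```python
import numpy as np

def rolling_window_evaluation(series, fit_and_forecast, metric, horizon, n_cutoffs, lag):
    """Score one candidate model with rolling-window validation.

    series           -- time-indexed pandas Series of historical values
    fit_and_forecast -- callable(history, horizon) -> sequence of `horizon` forecasts
    metric           -- callable(actuals, forecasts) -> error score (for example, SMAPE)
    horizon          -- number of future points to forecast from each cutoff date
    n_cutoffs        -- number of cutoff dates to evaluate
    lag              -- spacing, in time steps, between consecutive cutoff dates
    """
    last_cutoff = len(series) - horizon                  # latest cutoff that still has a full horizon of actuals
    cutoffs = [last_cutoff - i * lag for i in range(n_cutoffs)]
    scores = []
    for cutoff in cutoffs:
        history = series.iloc[:cutoff]                   # training window ends at the cutoff date
        actuals = series.iloc[cutoff:cutoff + horizon]   # actual values in the forecast horizon
        forecasts = fit_and_forecast(history, horizon)
        scores.append(metric(actuals.to_numpy(), np.asarray(forecasts)))
    return float(np.mean(scores))                        # average error across all cutoff dates
```

Running this once per candidate model (AUTS, Auto ARIMA, Prophet, and so on) and keeping the model with the lowest average error is the essence of the selection step.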

The engine outputs the forecasts of the most accurate model in a fixed schema. Figure 3 presents sample output of the engine, which gives information about the customers, the product, and the model that produced the predictions, as well as the predictions themselves.

Figure 3: Sample output of our team’s Univariate Time Series Forecast Engine

Evaluation metric

To choose the most accurate model among the ones in the engine, we must compare their performance. At each forecast level or granularity, we aim to evaluate each model using a metric for goodness of fit. Several metrics are useful for evaluating model performance, and it is important to choose the right one because different metrics suit different use cases.

In this article we review four common metrics for prediction problems. The first is mean absolute error (MAE):

Equation 1: MAE metric
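Written out in the notation used here, with Fₜ the predicted value, Aₜ the actual value, and n the number of fitted points, this is the standard definition:

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\bigl|A_t - F_t\bigr|$$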

This metric calculates the average of the errors for the n fitted points; namely, the average of the absolute differences between the predicted value Fₜ and the actual value Aₜ. The MAE is easy to use and understand, and it is resistant to anomalies. That is, if there is an anomaly in the data, the MAE gives the same weight to each of the points. The second metric is Root Mean Squared Error (RMSE):

Equation 2: RMSE metric
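In the same notation, the standard definition is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\bigl(A_t - F_t\bigr)^2}$$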

This metric calculates the square root of the average of the squared errors, and unlike MAE it is not resistant to anomalies. For example, suppose there are two points: for the first point the error is one unit and for the second point the error is nine units. By using MAE, the error is:

(1 + 9) / 2 = 5

Conversely, by using RMSE, the error is:

√((1² + 9²) / 2) = √41 ≈ 6.4

That is, the RMSE is higher for data with anomalies. Therefore, for cases in which it is necessary to predict anomalies more efficiently, the RMSE is a better metric. In other cases in which there is a need, for example, to obtain a higher accuracy for users or customers with average performance, MAE is the better metric to use. This means the choice of metric depends on the business use case.

Both the RMSE and the MAE are not unit-free, i.e., they depend on the scale of the data. In contrast, the following two metrics measure percentage errors, which makes them unit-free, so they can be used to compare prediction performance across various datasets.

The next two are Mean Absolute Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error (SMAPE):

Equation 3: MAPE and SMAPE metrics
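In their commonly used forms (taking the SMAPE variant that is bounded by 200 percent, consistent with the discussion below), these are:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right| \qquad \mathrm{SMAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\frac{2\,\bigl|F_t - A_t\bigr|}{\bigl|A_t\bigr| + \bigl|F_t\bigr|}$$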

Here Aₜ is the actual value, Fₜ is the forecast value, and n is the number of fitted points. One challenge in our forecasting scenarios is that cost or revenue depends on several factors, such as the size of the customer or the industry to which the customer belongs. Therefore, there is strong variation in cost and revenue across our time series, which adds a certain bias depending on which metric is used. Hence, we should compare the models' performance using scale-independent metrics like MAPE and SMAPE [Reference Note 3].

MAPE is asymmetric and puts a heavier penalty on negative errors (over-forecasting) than on positive errors (under-forecasting). This is because the percentage error cannot exceed 100 percent for forecasts that are too low, while there is no upper limit for forecasts that are too high. As a result, MAPE favors models that under-forecast rather than over-forecast. SMAPE, however, fixes this asymmetry in boundedness because it has both a lower (0 percent) and an upper (200 percent) bound. This observation led us to use SMAPE, because in our business forecast scenarios we care about both types of errors [Reference Note 3]. You can read more on evaluating forecast accuracy in the Hyndman book listed in the recommended reading list below.
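The asymmetry is easy to verify numerically. The short Python sketch below uses an invented constant series of 100 units; the metric implementations follow the formulas above and are illustrative rather than the engine's code.

```python
import numpy as np

def mape(actual, forecast):
    # Mean absolute percentage error; no upper bound when forecasts overshoot the actuals.
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100 * np.mean(np.abs(actual - forecast) / np.abs(actual))

def smape(actual, forecast):
    # Symmetric MAPE; bounded between 0 percent and 200 percent.
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100 * np.mean(2 * np.abs(forecast - actual) / (np.abs(actual) + np.abs(forecast)))

actual   = np.array([100.0, 100.0, 100.0])   # hypothetical actual values
too_low  = np.zeros(3)                       # the lowest possible (non-negative) forecast
too_high = np.full(3, 400.0)                 # an over-forecast of the same series

print(mape(actual, too_low), mape(actual, too_high))    # 100.0 vs 300.0: over-forecasts are penalized without limit
print(smape(actual, too_low), smape(actual, too_high))  # 200.0 vs 120.0: both stay within the 0 to 200 percent bounds
```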

Univariate Forecast Engine as an Azure Web Service

A variation of our Univariate Forecast Engine has been deployed as a Microsoft-internal Azure Web Service and serves internal stakeholders who are not necessarily data scientists, helping them move quickly on their time series problems and make high-quality forecasts. Figure 4 presents the UI of the engine, which enables input of the following parameters:

Forecast algorithm: The algorithm used to train a model and produce forecasts. If no algorithm is selected, the engine evaluates the different models and returns forecasts from the most accurate one.

Granularity: The frequency or interval at which the data are recorded. For example, every week.

Forecast horizon: The number of time-steps to forecast; namely, how far into the future to make predictions.

Seasonality type: Seasonality consists of the regular patterns of variability within certain time periods, such as a week. By default, seasonality is determined by the granularity of the time series. For example, for a weekly time series, to predict for week t the patterns of variability are learned from the observations corresponding to week t in previous months.

Lower quantile: The percentage X of the prediction interval; in other words, the actual value is expected to fall below this quantile X percent of the time. By default, the value is 20 percent.

Upper quantile: The percentage X of the prediction interval; in other words, the actual value is expected to fall below this quantile X percent of the time. By default, the value is 80 percent.

Metric: This parameter is optional and describes the quantity predicted.

Amount of historical data: This parameter is optional and specifies the number of time-steps in the historical data that are used to train the model.

File: Provides data for training the predictor. The file should follow the schema below (an illustrative sample appears after Figure 4):

  • Item: A reference to the associated scenario.
  • KeyType: The identifier type, such as customer ID.
  • KeyValue: An identifier for the forecast.
  • DateTime: A time-step in the historical data.
  • Target: The feature for which the predictor produces forecasts.

Figure 4: The Univariate Forecast Engine as an Azure Web Service
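For illustration, a hypothetical input file following this schema might look like the snippet below. The item name, key values, dates, and targets are invented, weekly granularity is assumed, and the file is shown as CSV even though the service's accepted file format may differ.

```
Item,KeyType,KeyValue,DateTime,Target
UsageForecast,CustomerId,C-001,2021-01-03,1050.0
UsageForecast,CustomerId,C-001,2021-01-10,980.5
UsageForecast,CustomerId,C-001,2021-01-17,1120.0
```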

Conclusions

In this article, we reviewed details of our approach to algorithm selection for forecasting problems we have encountered. In addition, we presented how we leverage our approach to enable stakeholders who are not necessarily data scientists to move quickly on their time series problems and make high-quality forecasts. We must keep in mind that business goals and available data are the main drivers of algorithm selection. We can then compare multiple algorithms and methods.

There are different metrics for evaluating model accuracy, and the one selected should be based on the business use case. In the next article in this series we discuss a few approaches to time series forecasting, with an emphasis on what led us to develop the AUTS algorithm. In addition, we discuss details of the AUTS methodology, including how the model is deployed using Microsoft Azure. We hope this series provides you with guidelines to help you conquer your own business problems.

We would like to thank Casey Doyle for helping review the work.

Recommended reading

Hyndman book: Hyndman, R. J., & Athanasopoulos, G., Forecasting: Principles and Practice, https://otexts.com/fpp2/

References

[1]. Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(1), 1–22.

[2]. Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician.

[3]. Makridakis, S. (1993). Accuracy measures: Theoretical and practical concerns. International Journal of Forecasting, 9(4), 527–529.

