1. Introduction
Forecasting electricity demand is currently among the most important challenges for the energy industry. Due to the increasingly high level of electricity consumption, electrical companies need to manage the production of energy efficiently. Sustainable production plans are required to meet demand and to account for important challenges of this century such as global warming and the energy crisis. Smart meters now provide useful data that can help to understand consumption patterns and monitor power demand more efficiently. Data mining techniques can use this information to learn from historical data and predict the expected demand, so that decisions can be made accordingly. Obtaining accurate forecasts can be essential for the future electricity market considering the increasing penetration of renewable energies. However, forecasting power demand is a complex task that involves many factors and requires sophisticated machine learning models to produce high-quality predictions.
Statistical models, such as the Box–Jenkins ARIMA model, were for many years the state of the art for electricity time series forecasting [1,2]. However, machine learning models have proven to provide better performance for problems in this domain. Artificial neural networks (ANNs) [3], support vector machines (SVMs) [4,5], and regression trees [6] have been applied successfully to diverse power demand prediction tasks. More recently, deep learning (DL) has emerged as a very powerful approach for time series forecasting. DL models are especially suitable for big-data temporal sequences due to their capacity to extract complex patterns automatically without feature extraction preprocessing steps [7]. As an evolution from simple ANNs, deep, fully connected networks have been applied to load forecasting problems [8]. However, fully connected networks are unable to capture the temporal dependencies of a time series. Consequently, more specialised DL models such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) started to gain importance in the time series forecasting field. These networks can efficiently encode the underlying patterns of time series by transforming the temporal problem into a spatial architecture [9].
The recent literature contains a significant number of studies that apply RNNs to energy-related time series forecasting [10,11]. Among all existing RNN architectures, long short-term memory (LSTM) networks have been the most popular due to their capacity to alleviate problems of earlier RNNs such as exploding and vanishing gradients [12]. They have been considered a standard forecasting model for several tasks such as traffic prediction [13], solar power forecasting [14], financial market prediction [15], and electricity price prediction [16]. Although CNNs were originally designed for computer vision tasks, they are also suitable for time series data since they can extract high-level features from data with a grid topology. Despite the popularity of RNNs, several works using convolutional networks can be found. In both [17,18], the authors proposed CNN models for short-term load forecasting that provide results comparable to those of LSTM models. Other works have built deep convolutional networks that outperform LSTM networks for electricity demand [19] and solar power data problems [20]. Furthermore, in all these works, the CNN models proved to be more suitable for real-time applications given their faster training and testing execution times. The local connectivity and parameter sharing of convolutional networks reduce the number of trainable parameters compared to RNNs; hence, they can be trained more efficiently. There have also been proposals using hybrid models that combine convolutional and LSTM layers. In [21], the output feature maps of a CNN are fed to an RNN that provides the prediction. Other approaches combine the features extracted in parallel by a CNN and an LSTM to improve forecasting on electricity demand data [22] or financial data [23]. These ensemble proposals can enhance the predictive performance by fusing the long-term patterns captured by the LSTM and the local trend features obtained with the CNN.
More recently, a specialised CNN architecture known as the temporal convolutional network (TCN) has gained popularity due to its suitability for time series data. TCNs were first proposed in [24], in which they were compared to several RNNs over sequence modelling tasks. TCNs use dilated causal convolutions in order to capture longer-term dependencies and prevent information loss. Furthermore, they present other advantages over RNNs such as lower memory requirements, parallel processing of long sequences as opposed to the sequential approach of RNNs, and a more stable training scheme. Several works have already successfully used TCNs for time series forecasting tasks: the original architecture using stacked dilated convolutions was proposed in [25] to improve on the performance of LSTM networks for financial domain problems; Ref. [26] designed a deep TCN for multiple related time series with an encoder–decoder scheme, evaluated on data from the sales domain; and the study in [27] proposed a multivariate time series forecasting model for meteorological data, which outperformed several popular deep learning models. However, to the best of our knowledge, the potential of TCNs has not yet been explored for univariate time series forecasting problems related to electricity demand data.
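To make the notion of a dilated causal convolution concrete, the following is a minimal NumPy sketch rather than the implementation used in this work. The function name `causal_dilated_conv1d`, the zero padding on the left, and the single-channel setting are assumptions made for illustration. Each output value depends only on the current and earlier inputs, spaced `dilation` steps apart.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """Illustrative single-channel dilated causal convolution (hypothetical helper).

    Computes y[t] = sum_i kernel[i] * x[t - i * dilation], treating values
    before the start of the series as zero (left zero padding), so that
    y[t] never depends on inputs later than t.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    pad = (len(kernel) - 1) * dilation          # how far back the filter looks
    padded = np.concatenate([np.zeros(pad), x]) # padded[pad + t] == x[t]
    y = np.zeros(len(x))
    for i, w in enumerate(kernel):
        start = pad - i * dilation              # shift back by i * dilation steps
        y += w * padded[start:start + len(x)]
    return y
```

For example, with `x = [1, 2, 3, 4, 5]`, `kernel = [1, 1]`, and `dilation = 2`, each output is `x[t] + x[t - 2]`, giving `[1, 2, 4, 6, 8]`. Modifying the last input changes only the last output, which is the causality property described above.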
In this work, we study the applicability and performance of TCNs for multistep time series forecasting over two energy-related datasets. With the first dataset, we build a deep learning model to forecast the electricity demand in Spain based on historical consumption data covering five years. With the second dataset, the problem is to forecast the expected energy consumption of charging stations for electric vehicles in Spain. Our aim in this study is to present a deep learning model that uses a TCN to obtain high accuracy in time series forecasting. We present the results obtained with several TCN architectures and perform an extensive comparison with different LSTM models, which have so far been the most widespread approach for these types of problems. In the experimental study, we carry out an extensive parameter search process which involves 1998 different network architectures.
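As an illustration of how a univariate series can be framed for multistep forecasting, the sketch below builds pairs of past-history windows and future targets with a sliding window. This is a common framing and is given here under assumptions: the function name `make_windows` and the parameters `past_history` and `horizon` are hypothetical, and the exact windowing used in the experiments is described in the methodology rather than in this section.

```python
import numpy as np

def make_windows(series, past_history, horizon):
    """Split a 1-D series into (input window, multistep target) pairs.

    X[i] holds `past_history` consecutive values and Y[i] holds the
    `horizon` values that immediately follow them. Both names are
    illustrative and not taken from the paper.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - past_history - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for the requested window sizes")
    X = np.stack([series[i:i + past_history] for i in range(n)])
    Y = np.stack([series[i + past_history:i + past_history + horizon]
                  for i in range(n)])
    return X, Y

# Usage: a toy series of 10 values, 4 past steps, 2 future steps.
X, Y = make_windows(np.arange(10), past_history=4, horizon=2)
# X has shape (5, 4) and Y has shape (5, 2); X[0] is [0, 1, 2, 3], Y[0] is [4, 5].
```

A model trained on such pairs predicts all `horizon` steps at once, which is one possible multistep strategy among several.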
In summary, the main scientific contributions of this paper can be condensed as follows:
A temporal convolutional neural network model to achieve high accuracy in forecasting over energy demand time series;
A thorough experimental study, comparing the performance of temporal convolutional networks with that of long short-term memory networks for time series forecasting.
The rest of the paper is organised as follows: Section 2 describes the materials used, the methodology, and the experiments carried out; in Section 3, the experimental results obtained are reported and discussed; Section 4 presents the conclusions and future work.
4. Conclusions
In this paper, we proposed a deep learning model based on temporal convolutional networks (TCNs) to perform forecasting over two energy-related time series. The experimental study considered two real-world time series from Spain: the national electricity demand and the power demand at charging stations for electric vehicles. An extensive parameter search was conducted in order to obtain the best architecture configuration, testing more than 1200 different TCN models for both datasets. Furthermore, the performance of these convolutional networks was compared in terms of accuracy and efficiency with long short-term memory (LSTM) recurrent networks, which have so far been considered the state of the art for forecasting tasks.
The results of the experimental study showed that TCNs outperformed LSTM models in forecasting accuracy for both datasets. The dilated causal convolutions used by TCNs were more effective at capturing temporal dependencies than the recurrent LSTM units. Furthermore, TCNs proved to be less sensitive to parameter selection than LSTM models: regardless of the chosen values, the convolutional approach provided more reliable performance. We also aimed to illustrate the importance of the size of the past history input window. Thanks to the use of residual connections, TCNs provided better results when using longer input sequences. In contrast, LSTM models were more accurate at encoding patterns when using smaller windows.
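The relationship between the input window and the depth of a TCN can be made explicit through the receptive field, that is, the number of past time steps that influence one output. The sketch below uses the standard formula for stacked dilated convolutions. The helper names, the doubling dilation schedule (1, 2, 4, ...), and the `convs_per_level` parameter are assumptions for illustration and do not describe the configurations evaluated in this study.

```python
def receptive_field(kernel_size, dilations, convs_per_level=1):
    """Receptive field of stacked dilated causal convolutions.

    Each convolution with dilation d extends the field by (kernel_size - 1) * d
    time steps; `convs_per_level` covers blocks that repeat the convolution
    (an assumption, for example two convolutions per residual block).
    """
    return 1 + convs_per_level * (kernel_size - 1) * sum(dilations)

def min_levels(kernel_size, window, convs_per_level=1):
    """Smallest number of levels, with dilations 1, 2, 4, ..., covering `window`."""
    levels = 0
    while receptive_field(kernel_size,
                          [2 ** i for i in range(levels)],
                          convs_per_level) < window:
        levels += 1
    return levels

# Usage: kernel size 2 with dilations 1, 2, 4, 8 sees 16 past steps.
rf = receptive_field(2, [1, 2, 4, 8])
# Covering a hypothetical window of 168 steps with kernel size 2 needs 8 levels.
levels = min_levels(2, 168)
```

Under these assumptions, a longer past history window requires either more levels, larger kernels, or larger dilations for the network to see the whole input.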
Regarding computational efficiency, the TCN models had deeper architectures with many more trainable parameters. As a result, training a TCN was slightly more costly. However, once trained, TCNs provided significantly faster predictions than recurrent networks due to the use of parallel convolutions to process the input sequences. In conclusion, our study showed that TCNs are a very powerful alternative to LSTM networks. They can provide more accurate predictions and are more suitable for real-time applications given their faster prediction speed.
Future efforts in this direction will focus on analysing the use of ensembles of TCN blocks with different receptive fields and on using techniques such as evolutionary algorithms for the parameter search process. Another interesting line of future work is the application of TCNs in an online environment for real-time forecasting over data streams. Moreover, further research should also study the suitability of TCNs for other problems such as multivariate time series forecasting or time series classification.