Article

ELFNet: An Effective Electricity Load Forecasting Model Based on a Deep Convolutional Neural Network with a Double-Attention Mechanism

Pei Zhao, Guang Ling and Xiangxiang Song
1 Teachers’ College, Beijing Union University, Beijing 100023, China
2 School of Science, Wuhan University of Technology, Wuhan 430070, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(14), 6270; https://doi.org/10.3390/app14146270
Submission received: 13 June 2024 / Revised: 12 July 2024 / Accepted: 15 July 2024 / Published: 18 July 2024

Abstract
Forecasting energy demand is critical to ensure the steady operation of the power system. However, present approaches to estimating power load remain unsatisfactory in terms of accuracy, precision, and efficiency. In this paper, we propose a novel method, named ELFNet, for estimating short-term electricity consumption, based on a deep convolutional neural network with a double-attention mechanism. The Gramian Angular Field method is utilized to convert the electrical load time series into 2D image data for input into the proposed model. The prediction accuracy is greatly improved through the use of a convolutional neural network to extract the intrinsic characteristics of the input data, along with channel attention and spatial attention modules that enhance the crucial features and suppress the irrelevant ones. The proposed ELFNet method is compared to several classic deep learning networks across different prediction horizons using publicly available data on real power demands from the Belgian grid operator Elia. The results show that the suggested approach is competitive and effective for short-term power load forecasting.

1. Introduction

The rapid development of society has led to a significant rise in energy consumption and the depletion of traditional energy sources. This has created a growing demand for the efficient use of energy, particularly in our daily lives. The absence of a consistent pattern in power demand could lead to an imbalance between supply and demand, resulting in energy losses [1]. Electricity load forecasting plays a crucial role in informing long-term policy decisions aimed at addressing these challenges. It also provides valuable information to the electrical sector, suppliers, and market regulators [2].
Numerous researchers have proposed various forecasting approaches to enhance the accuracy of electricity load forecasting [3,4]. These approaches can be broadly categorized into two groups. The first group comprises classic statistical models, including linear regression models [5,6], autoregressive moving average models [7,8], and autoregressive integrated moving average models [9].
Despite their long-standing use, speed of calculation, and practicality, traditional methods often struggle with accuracy and effectiveness when analyzing electrical load data that contain significant random elements and high levels of nonlinearity [10]. The second category of prediction techniques involves machine learning algorithms, such as artificial neural networks, support vector regression, decision trees, and XGBoost.
With the continuing improvement in available computational resources, deep learning (DL) has become a highly successful data-driven technology in the field of electricity load forecasting [11]. The ability of deep neural networks (DNNs) to forecast electricity loads with strong nonlinearity has been exploited [12]. Talaat et al. introduced a medium- to short-term load forecasting (MTLF; STLF) model that can forecast load at different times of the month and on different days [13]. However, DNNs struggle to model some delicate changes in time series, and their training process is vulnerable to vanishing gradients and prone to converging to local optima [14]. Fekri et al. proposed an online adaptive RNN, a load forecasting method that continuously learns from newly arrived data and adapts to new patterns [15]. Jagait et al. used RNN and ARIMA for load forecasting under concept drift [16]. Bui et al. proposed a multi-scale RNN model with short-term and long-term memory for load forecasting [17]. While RNNs struggle with long-term time dependencies and cannot effectively represent long time series, LSTMs have emerged as a solution to this issue [18]. Zang et al. combined LSTM with the self-attention mechanism (SAM) to develop a hybrid model with two input channels [19]. Bashir et al. proposed a hybrid approach using Prophet and LSTM models to predict loads accurately: the Prophet model predicts the raw load data using both linear and nonlinear components, and the nonlinear data are trained using LSTM [20]. Memarzadeh et al. proposed a new hybrid forecasting model for short-term power load and price forecasting, consisting of three modules: a wavelet transform to remove the fluctuating behavior of the power load and price time series, feature selection based on entropy and mutual information, and finally an LSTM to train the model [21]. The incorporation of gating units, such as forget gates and memory gates, has significantly enhanced the ability of these models to address long-term time dependencies and spatial complexity. Historically, LSTM and its variants have been widely utilized in power load forecasting models.
Due to the advantages of CNNs in processing image data with strong nonlinearity and their ability to extract more intrinsic data features, convolutional neural networks (CNNs) have been adopted to process electricity load data [22]. Singh et al. proposed a novel STLF model based on a 2D CNN, discussing an overview of the available prediction techniques, the implemented CNN architecture, the feature selection process, and the performance of the model on a test dataset [23]. Imani used a CNN to extract nonlinear relationships between load values; in addition, a load-temperature cube was composed of hourly load and temperature values for a week, another CNN was trained on this cube to learn the hidden nonlinear load-temperature features, and finally SVR was used for load forecasting [24]. Wang and Oates first encoded univariate time series into images using the Gramian angular field (GAF) and Markov transition field (MTF) methods; the images were then utilized as inputs for a CNN. This image-based framework pioneered a new class of deep learning algorithms for time series analysis [25,26]. Since then, many techniques have been introduced to convert time series data into images for use as CNN inputs. A relative position matrix (RPM) time-series encoding approach that converts raw time series into 2D images has been investigated to develop an efficient CNN architecture for autonomously learning a higher-level representation of raw time-series data [27]. A multi-resolution imaging technique based on Gramian angular fields has been utilized to feed each CNN, allowing the analysis of diverse time-related periods for a single observation [28]. A conversion approach for time series based on the hue-saturation-value color space has also been applied, which makes it simple to compare colors since it conveys the brightness, hue, and vividness of a color very naturally [29]. Hong et al. proposed a new method for predicting solar radiation by encoding time series data into images using Gramian angular fields and convolutional LSTM (ConvLSTM) networks; the preprocessed data become a five-dimensional input tensor that is well suited to ConvLSTM, which uses convolution operations in its input-to-state and state-to-state transitions [30]. A local phase binary encoding operation was executed to create the histogram of a 2D phase encoding of power signals incorporating neighborhood information; the suggested encoding method greatly reduced the dimensions of the appliance signals and improved the discriminating capacity of classifier models [31]. Multivariate time series data have been converted from 1D signals to 2D images using a variety of encoding approaches, including the Gramian Angular Summation Field (GASF), Gramian Angular Difference Field (GADF), Markov Transition Field (MTF), and Recurrence Plot (RP) [32].
In order to further leverage the learning capabilities of CNNs, many improvements have been developed. Well-known improved forms or variants of CNNs include the AlexNet algorithm [33], the first deep convolutional application; the Inception structure [34], which expands on classical CNNs; the ResNet strategy [35,36], proposed to solve the vanishing-gradient problem of deep convolutional networks; and the ResNeXt method [37], which unifies the concepts of group convolution and residual networks. The Transformer model put forth by Vaswani et al. [38] utilized the attention mechanism to process natural language; the attention mechanism offers an advantage in natural language processing due to its simplicity and relatively small number of model parameters. The channel attention and spatial attention modules were proposed by Woo et al. [39] and implemented in CNNs. In the channel attention module, each channel serves as a feature detector; the module concentrates on the most significant parts of the input and enhances or suppresses different channels for different tasks to optimize the network's representational capacity. The spatial attention module makes use of the spatial structure of the feature map: as a useful complement to channel attention, it creates a spatial attention map that identifies the most crucial regions for the network to process and can be utilized for adaptive feature refinement of the input feature map.
Inspired by previous works, this article presents a new deep prediction network for forecasting the electricity load. Taking advantage of the powerful image-processing capabilities of CNNs, the proposed model first converts the time series into images via the statistically interpretable GAF, which substantially increases the prediction performance on power load data with strong nonlinear features and environmental fluctuations, and then uses a deep CNN to extract nonlinear features from the electricity load data. Two types of attention mechanisms, the channel attention (CA) and spatial attention (SA) modules, are applied to extract the data's hierarchical features and minimize information loss, and residual connections are adopted in the appropriate convolutional layers to ensure the proper convergence of model training and parameter updates. The novel concepts and main contributions of this paper are as follows:
(1) A novel deep convolutional attention mechanism model is proposed to solve the issue of electricity load forecasting with strong nonlinear features, which optimizes the deep characteristics of the power load data using convolution layers, residual connections, and the CA and SA mechanisms;
(2) The proposed deep learning structure is designed to reduce the randomness of the data and guarantee the robustness of learning; it can also easily extract the intrinsic properties of the data thanks to the Gram matrix principle, which is used to convert time series data into image data;
(3) The proposed model can directly output multi-step prediction results at different time scales, reducing the prediction error and achieving better prediction performance.
The remainder of the paper is structured as follows: the background theory of the proposed prediction approach is provided in Section 2, the fundamental structure of the proposed model is described in Section 3, the experimental data, evaluation metrics, and results are presented in Section 4, and the final conclusions are given in Section 5.

2. Background Theory

ELFNet is a deep convolutional residual network proposed for electricity load prediction. It works by first converting time series into images and then applying a deep convolutional residual network with a double-attention mechanism. The next subsections provide a detailed introduction to the theory underlying the network structure.

2.1. Gramian Angular Field

Typically, we present the time series in a Cartesian coordinate system, with the vertical axis representing the observed values and the horizontal axis representing the timestamp. In order to reduce information loss, we create a bidirectional mapping between the one-dimensional time series and the two-dimensional space, as proposed by Wang et al. [25], replacing the traditional Cartesian coordinate system with a polar coordinate system. Given a time series of actual observations $X = \{x_1, x_2, \ldots, x_n\}$, the series is first normalized into the interval $[\tfrac{1}{2}, 1]$:

$$\tilde{x}_i = \frac{1}{2} \times \frac{x_i - \min(X)}{\max(X) - \min(X)} + \frac{1}{2} \tag{1}$$
Then, we present the normalized time series in the polar coordinate system using the following mathematical formula, encoding the values of the time series as angles in polar coordinates and the time stamps as polar radii in the polar coordinate system:
$$\theta_i = \arccos(\tilde{x}_i), \quad \tfrac{1}{2} \le \tilde{x}_i \le 1, \quad \tilde{x}_i \in \tilde{X} \tag{2}$$

$$r_i = \frac{t_i}{N}, \quad t_i \in \mathbb{N} \tag{3}$$
where $t_i$ represents the time stamp of the time series, and $N$ is a constant factor used to standardize the range of the polar coordinate system. The mapping transformations (2) and (3) provide a unique result in the polar coordinate system. Unlike other methods of converting time series to images, this mapping has a uniquely accurate inverse and, unlike the Cartesian coordinate system, the polar coordinate system preserves the absolute temporal relationships of the time series.
When we rescale the time series to the interval $[\tfrac{1}{2}, 1]$, the corresponding inverse cosine values fall in the interval $[0, \tfrac{\pi}{3}]$. After converting the time series to the polar coordinate system, we identify the temporal correlation at different time intervals by considering the angular sum between each pair of points. The Gram summation angular field (GASF) is defined as follows:
$$GASF_{ij} = \cos(\theta_i + \theta_j) \tag{4}$$
From the time series $X$, the corresponding Gram matrix can be obtained via the Gram summation angular field, and the elements of the Gram matrix $G_X$ are obtained via Equation (4):

$$G(x_1, x_2, \ldots, x_n) = \begin{pmatrix} G_{11} & \cdots & G_{1n} \\ \vdots & \ddots & \vdots \\ G_{n1} & \cdots & G_{nn} \end{pmatrix} \tag{5}$$
where $G_X$ contains the time dependency, which increases successively from the upper-left to the lower-right corner of the matrix. $G_{ij}$ indicates the relative correlation of the time series between points $i$ and $j$, and the diagonal elements $G_{ii}$ contain the angular information corresponding to the original values of the time series. Meanwhile, the actual values of the original time series can be reconstructed from the principal diagonal of the matrix by the inverse transformation. The length of the time series to be converted determines the size of the transformed Gram matrix, as shown in Figure 1, and from there we can determine the size of the converted image. The image training set used in this study was set to 64 × 64 pixels.
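To make the conversion concrete, the following Python sketch implements Equations (1), (2), and (4) with NumPy. The 64-point window length matches the 64 × 64 images used in this study; the random input series is only a stand-in for an actual load window.

```python
import numpy as np

def gasf(x: np.ndarray) -> np.ndarray:
    """Convert a 1D series into a Gramian Angular Summation Field image,
    following Equations (1), (2), and (4)."""
    # Equation (1): min-max rescaling into the interval [1/2, 1]
    x_tilde = 0.5 * (x - x.min()) / (x.max() - x.min()) + 0.5
    # Equation (2): angles fall in [0, pi/3] for inputs in [1/2, 1]
    theta = np.arccos(np.clip(x_tilde, 0.5, 1.0))
    # Equation (4): pairwise angular sums via broadcasting
    return np.cos(theta[:, None] + theta[None, :])

# A 64-point load window yields one 64 x 64 training image.
window = np.random.rand(64)   # stand-in for a real load window
image = gasf(window)          # shape (64, 64)
```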

2.2. Residual Convolutional Structure

The concept of residual networks was developed to strengthen the nonlinear representation capabilities of deep CNNs: deep sub-convolutional networks combined with residual connections can efficiently avoid degradation issues in the network [40]. The basic residual convolutional module used in this study is shown in Figure 2. To achieve the desired output feature map size and downsampling accuracy, we can adjust the size of the convolution kernel and the convolution stride. After extracting the image's feature information using a 2D CNN, we improve the nonlinear expression of the convolution using a ReLU activation function, and then apply a MaxPool layer to reduce the network's computation and obtain the desired feature map. In this procedure, the following mathematical formula represents the 2D-CNN convolutional module:
$$F_{out} = Conv.unit(F_{in}) + Residual = MaxPool(\sigma(Conv2d(F_{in}))) + Residual \tag{6}$$
where $F_{in}$ denotes the output feature of the previous layer, $Conv2d$ denotes the 2D convolution operation, $\sigma$ denotes the ReLU activation function, $Residual$ denotes the residual connection, and $MaxPool$ is the maximum pooling operation.
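A minimal PyTorch sketch of this residual convolutional unit is given below. The 3 × 3 kernel, the stride-2 pooling, and the 1 × 1 strided projection on the skip path are assumptions chosen so that the residual matches the shape of the pooled feature map; the paper leaves these hyperparameters unspecified.

```python
import torch
import torch.nn as nn

class ConvUnit(nn.Module):
    """Residual convolutional unit of Equation (6):
    F_out = MaxPool(ReLU(Conv2d(F_in))) + Residual."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(kernel_size=2)  # halves H and W
        # Assumed 1x1 strided projection so the skip path matches
        # the pooled feature map's channels and resolution.
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.act(self.conv(x))) + self.skip(x)
```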

2.3. Channel Attention Mechanism

The channel attention mechanism was introduced to refine the features extracted by convolution; the refined features keep the valuable features and suppress the non-valuable ones [41]. The main idea of the mechanism is to use a small network structure to calculate attention weights, which are combined with the feature map to build an improved attention feature map. The channel attention module has a dual-channel design that uses global average pooling and global maximum pooling, respectively, to obtain different feature information through the two pooling channels. The obtained features are then input into the same MLP and used to generate channel attention weights via a sigmoid, which are finally multiplied by the input feature map to produce the enhanced attention. It is crucial to highlight that the CA module does not change the input data's dimensions; the input and the output have the same shape. Figure 3 depicts the whole process of the channel attention mechanism, where $F_{in}$ denotes the input feature matrix, $\sigma$ denotes the sigmoid activation function, MLP denotes the multilayer perceptron, $CA(F_{in})$ denotes the attention weight matrix of the input features, $F_{out}$ denotes the output feature map after attention enhancement, and ⊗ denotes the multiplication of the attention weights with the input feature map.
$$CA(F_{in}) = \sigma(MLP(Avgpool(F_{in})) + MLP(Maxpool(F_{in}))) \tag{7}$$

$$F_{out} = CA(F_{in}) \otimes F_{in} \tag{8}$$
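The channel attention module of Equations (7) and (8) can be sketched in PyTorch as follows. The shared MLP and dual pooling branches follow the description above; the reduction ratio of 16 is an assumption borrowed from common practice, not a value reported in the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention of Equations (7) and (8): a shared MLP scores
    the average- and max-pooled channel descriptors, and their
    sigmoid-activated sum reweights the input channels."""

    def __init__(self, channels: int, reduction: int = 16):  # assumed ratio
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))  # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))   # global max pooling branch
        weights = torch.sigmoid(avg + mx).view(b, c, 1, 1)  # Equation (7)
        return weights * x                                  # Equation (8)
```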

2.4. Spatial Attention Mechanism

Using the attention mechanism, Wang et al. [42] transform the spatial information in the original image into a different space while maintaining the key information. A spatial converter module performs the necessary spatial transformation of the spatial-domain information in order to extract the key information. The input feature map is passed through a maximum pooling layer and an average pooling layer to generate two feature maps. These are then concatenated and convolved to construct a feature map with a single channel, and the spatial attention map is obtained via the sigmoid activation function. Like the channel attention module, the spatial attention mechanism does not alter the dimensions of the data.
According to the flow chart in Figure 4, the spatial attention is calculated using Equations (9) and (10), where $conv$ is the two-dimensional convolution operation, $\sigma$ denotes the sigmoid activation function, $[A; B]$ stands for the concatenation of matrix $A$ and matrix $B$, $SA(F_{in})$ indicates the attention weight matrix of the input features, $F_{out}$ represents the output feature map after attention enhancement, and ⊗ denotes the multiplication of the attention weights with the input feature map.
$$SA(F_{in}) = \sigma(conv([Avgpool(F_{in}); Maxpool(F_{in})])) \tag{9}$$

$$F_{out} = SA(F_{in}) \otimes F_{in} \tag{10}$$
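A corresponding PyTorch sketch of Equations (9) and (10) is shown below. The 7 × 7 convolution kernel is an assumption carried over from the CBAM design [39]; the paper does not state the kernel size.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention of Equations (9) and (10): channel-wise average
    and max maps are concatenated, convolved down to one channel, and
    passed through a sigmoid to weight every spatial location."""

    def __init__(self, kernel_size: int = 7):  # assumed, as in CBAM
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)   # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)    # (B, 1, H, W)
        weights = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return weights * x                  # Equation (10)
```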

3. Structure of the Proposed Models

Traditional time series forecasting methods have a limited ability to extract nonlinear features, while the majority of current electricity load data contain nonlinear characteristics. We examine the nonlinear properties of electricity load data using the 2D-CNN model, which performs well in a variety of fields. Before feeding the model, we transform the time series data into image data using the statistically interpretable GAF, since the 2D-CNN model performs particularly well when the input data are images and has a very strong ability to extract features. In GAF, the Gram matrix represents the covariance of the series at various time points; the matrix includes both the time series data and the relationships between the data at various time periods. The proposed model receives the transformed time series data as its input, and, to further simplify the training procedure, we add a double-attention mechanism to the model to enhance the extracted features.
Based on the above discussion, a novel deep convolutional attention mechanism model for electricity load forecasting is proposed. The deep features of the input data are extracted using a deep convolutional network, and the extracted features are then filtered using channel attention and spatial attention, giving more weight to the valuable features and less weight to the worthless ones. Multi-step-ahead forecasting can be expressed as the prediction of $\{X_{t+k}\}$ from a given time series $\{X_t\}$, where $t = 1, 2, \ldots, T$, $k = 1, 2, \ldots, K$, $k$ is the forecast horizon, and $T$ is the total number of samples in the time series. The proposed method therefore forecasts the electricity load horizons $\{X_{t+1}, X_{t+2}, X_{t+3}\}$. Figure 5 illustrates the fundamental flowchart of the proposed model. In the preprocessing stage, the time series is first rescaled to the interval $[\tfrac{1}{2}, 1]$, the rescaled series is transformed into images via GAF, and the resulting image dataset is used as the input of the deep convolutional attention network. The weight parameters are then continuously adjusted via backpropagation to obtain the final prediction results. The pipeline can be expressed by the following equations:
$$Y_1(t, t) = GAF(X_t) \tag{11}$$

$$Y_2(\tfrac{t}{2}, \tfrac{t}{2}) = \sigma(Conv2d(Y_1(t, t))) \tag{12}$$

$$Y_3(\tfrac{t}{32}, \tfrac{t}{32}) = \left(SA\left(CA\left(Conv.unit\left(Y_2(\tfrac{t}{2}, \tfrac{t}{2})\right) + Residual\right)\right)\right)^{\times 4} \tag{13}$$

$$Y_{pre} = Linear(Y_3) \tag{14}$$
Here, $X_t$ denotes the time series observations; $GAF(\cdot)$ denotes the transformation of the input series $X_t$ into a Gram matrix of size $t \times t$; $\sigma$ denotes the activation function; $Conv.unit(\cdot)$ indicates the convolution operation of Section 2.2; $CA$ denotes the channel attention operation of Section 2.3, and, similarly, $SA$ is the spatial attention operation. Finally, Equation (13) repeats the same structure four times, corresponding to the four identical tandem structures in Figure 5.
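Putting the pieces together, the following sketch assembles Equations (11) to (14) from the ConvUnit, ChannelAttention, and SpatialAttention classes sketched in Section 2. The per-stage channel widths and the default output horizon are assumptions; the 64 × 64 input size and the four tandem stages follow the text.

```python
import torch
import torch.nn as nn

class ELFNetSketch(nn.Module):
    """Minimal sketch of Equations (11)-(14), reusing the ConvUnit,
    ChannelAttention, and SpatialAttention classes sketched above."""

    def __init__(self, horizon: int = 4):  # e.g. four 15 min steps = 1 h
        super().__init__()
        # Equation (12): initial strided convolution, 64x64 -> 32x32
        self.stem = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        widths = [16, 32, 64, 128, 256]  # assumed per-stage channel widths
        # Equation (13): four tandem [ConvUnit -> CA -> SA] stages
        self.stages = nn.ModuleList(
            nn.Sequential(
                ConvUnit(widths[i], widths[i + 1]),
                ChannelAttention(widths[i + 1]),
                SpatialAttention(),
            )
            for i in range(4)
        )
        # Equation (14): linear head on the flattened 2x2 feature map
        self.head = nn.Linear(256 * 2 * 2, horizon)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        y = self.stem(img)      # (B, 16, 32, 32)
        for stage in self.stages:
            y = stage(y)        # spatial size halves at each stage
        return self.head(y.flatten(1))

# y_pre = ELFNetSketch()(torch.randn(8, 1, 64, 64))  # -> shape (8, 4)
```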

4. Experiment Study

In this section, we present the dataset used in this paper, the evaluation metrics, and the experimental results in detail; the remainder of this section contains evaluations of the obtained results, a performance improvement analysis, and a comparison of the deep learning methods.

4.1. Experiment Data

All models in this research were implemented in PyTorch, using real-time electricity load data gathered by Elia at 15 min intervals. The data comprise 8828 time points and span three months of electricity load from 1 March 2022 to 1 June 2022. The general information and statistical properties of the dataset are shown in Table 1. The model was trained using the first 6180 data points as the training set, and its performance was tested using the last 2648 data points (a ratio of nearly 7:3). The proposed model can estimate electricity load at multiple scales, including 1 h, 2 h, and 3 h time intervals. Throughout the training process, MSELoss was chosen as the loss function and was optimized using the Adam optimizer. The training batch size was set to 32 and the learning rate to 0.001. All models were trained on an AMD Ryzen 7 5800H CPU with an NVIDIA (Santa Clara, CA, USA) GeForce GTX 1060 6G GPU.
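A minimal training loop reflecting the reported setup (MSELoss, Adam, batch size 32, learning rate 0.001) might look as follows. The epoch count is not reported in the paper, and the tensors standing in for the GAF image dataset are placeholders.

```python
import torch
import torch.nn as nn

# Placeholder tensors standing in for the GAF image dataset built from
# the first 6180 points of the Elia series; shapes follow Section 3.
train_images = torch.randn(256, 1, 64, 64)
train_targets = torch.randn(256, 4)

model = ELFNetSketch()                   # sketch from Section 3
criterion = nn.MSELoss()                 # loss function used in the paper
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(train_images, train_targets),
    batch_size=32, shuffle=True)

for epoch in range(100):                 # epoch count is assumed
    for images, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()                  # backpropagation
        optimizer.step()
```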

4.2. Evaluation Metrics

A series of evaluation metrics were chosen, including root mean squared error (RMSE), mean absolute error (MAE), symmetric mean absolute percentage error (SMAPE), mean absolute percentage error (MAPE), and correlation coefficient (R), to test the suggested model and better assess its effectiveness. The related mathematical expressions are shown below:
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(Y_i - X_i)^2} \tag{15}$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}|Y_i - X_i| \tag{16}$$

$$SMAPE = \frac{1}{N}\sum_{i=1}^{N}\frac{|Y_i - X_i|}{(|Y_i| + |X_i|)/2} \tag{17}$$

$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{Y_i - X_i}{X_i}\right| \tag{18}$$

$$R = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}\sqrt{\sum_{i=1}^{N}(X_i - \bar{X})^2}} \tag{19}$$
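For reference, a direct NumPy implementation of Equations (15) to (19), with Y the predicted and X the actual values, is sketched below.

```python
import numpy as np

def evaluate(y: np.ndarray, x: np.ndarray) -> dict:
    """Equations (15)-(19), with y the predicted and x the actual values."""
    err = y - x
    return {
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.mean(np.abs(err)),
        "SMAPE": np.mean(np.abs(err) / ((np.abs(y) + np.abs(x)) / 2)),
        "MAPE": np.mean(np.abs(err / x)),
        "R": np.corrcoef(y, x)[0, 1],
    }
```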

4.3. Ablation Study

In this paper, we enumerate different module combinations and compare the experimental results at prediction scales of 1 h, 2 h, and 3 h. We confirm that each module used in ELFNet has a positive impact on the final experimental results by comparing the errors obtained by the following four structures:
  • CNN: the convolutional structure proposed in Section 2.2, without CA and SA modules;
  • CNN-CA: the CNN structure with only the CA module added;
  • CNN-SA: the CNN structure with only the SA module added;
  • ELFNet: the proposed final model in this paper (shown by Figure 5 in Section 3).
The prediction results of the above structures with different modules are shown in Table 2. Figure 6 shows that, for the proposed ELFNet model, the five evaluation metric values of RMSE, MAE, SMAPE, MAPE, and R are 0.0316, 0.0254, 0.2246, 0.3323, and 0.9920, respectively, for the 1 h forecast. The values of the evaluation metrics when utilizing only the CNN structure without the CA and SA modules were 0.0465, 0.0373, 0.2806, 0.3769, and 0.9899. The evaluation metric values for the CNN-CA model were 0.0349, 0.0274, 0.2259, 0.3179, and 0.9916, and those for the CNN-SA model were 0.0409, 0.0321, 0.2652, 0.4021, and 0.9892. Based on these metrics, the proposed ELFNet delivers the best prediction results on the four evaluation metrics of RMSE, MAE, SMAPE, and R, and provides accurate and effective predictions. The results also show that the CA module has a significant impact on the optimization of intrinsic features: the CNN-CA structure alone achieves the best result on the MAPE metric.
The results of the 2 h ahead electricity load forecast for different deep learning structures are shown in Figure 7. The values of the evaluation metrics RMSE, MAE, SMAPE, MAPE, and R of the proposed ELFNet model for the 2 h ahead forecast are shown in Table 2 and are 0.0346, 0.0270, 0.2402, 0.3689, and 0.9911, respectively; the corresponding error metrics increase when the CA and SA modules are not used. The RMSE drops from 0.0442 to 0.0402 when the CA module is applied to the CNN framework, increasing forecast accuracy by 9%; the other evaluation indicators also show varying degrees of improvement, demonstrating the beneficial effects of the CA and SA modules on electricity load forecasting accuracy.
The results for the 3 h ahead forecasting are shown in Figure 8. As can be seen in Table 2, the values of the RMSE, MAE, SMAPE, MAPE, and R of the proposed ELFNet model are 0.0417, 0.0333, 0.2813, 0.4118, and 0.9892, respectively, for the 3 h ahead forecast. The MAE of the plain CNN is 0.0440; it is reduced to 0.0377 after using the CA module, an improvement of 14.32%, and to 0.0379 after using the SA module, an improvement of 13.86%. After using both the CA and SA modules, the improvement reaches 24.32%.
The aforementioned analysis and experimental data show that both the CA and SA modules are crucial to increasing the model's forecast accuracy at various prediction horizons.

4.4. Comparative Experiment Results and Analysis

In this subsection, we compare our proposed ELFNet model with several prevalent deep learning algorithms for tackling image data in order to further validate the multi-scale prediction performance of our model. The deep learning methods selected for this comparison are ResNet-18, ResNeXt-50, and GoogLeNet. Table 3 summarizes the comprehensive results for the following evaluation metrics: RMSE, MAE, MAPE, SMAPE, and R.
The results for the 1 h ahead forecasting and the discrepancies between the predicted results and actual values are shown in Figure 9. Table 3 shows that, for the proposed forecasting model ELFNet, the values of the evaluation metrics are 0.0316, 0.0254, 0.2246, 0.3323, and 0.9920, respectively, for the 1 h ahead forecast. The evaluation metrics for ResNeXt-50 are 0.0514, 0.0404, 0.3070, 0.4520, and 0.9812, respectively, whereas those for ResNet-18 are 0.0569, 0.0445, 0.3322, 0.4498, and 0.9752. The evaluation metrics for GoogLeNet are 0.0506, 0.0395, 0.3013, 0.3812, and 0.9806. A visualization of the experimental data can be seen in Figure 10. It is apparent from these evaluation metrics that ResNet-18 has the poorest prediction results and ELFNet has the best prediction performance on all evaluation metrics. The scatter plot composed of the actual and predicted value pairs is presented in Figure 9c. The black dashed line in the figure, which runs from the lower left corner to the upper right corner, is where all points would fall in the ideal case, so the nearer the scatter points are to this line, the more accurate the model's predictions. It is obvious that ELFNet performs admirably in this regard compared to all other models.
The results for the 2 h ahead forecasting are shown in Figure 11 and Figure 12. Table 3 shows that the five evaluation indicators for the proposed ELFNet model are 0.0346, 0.0270, 0.2402, 0.3689, and 0.9911, respectively, at the 2 h forecast scale. Compared to the other deep learning techniques, ELFNet produced the best result on the RMSE metric, with GoogLeNet coming in second at 0.0464. The least accurate prediction was made by ResNet-18, which shows that relying only on residual connections to mine features is insufficient to capture all of the available characteristics of the experimental data. On the evaluation index R, ELFNet differed significantly from the other three deep learning methods: their R values were nearly identical to one another, while ELFNet's R value was as high as 0.9911, suggesting that the model proposed in this paper has excellent nonlinear representation and robust data-fitting capabilities.
Figure 12c shows that ELFNet has the minimum degree of dispersion, with its actual and predicted values lying closest to the middle black dotted line. This again shows how effective and precise ELFNet's prediction capabilities are.
The results for the 3 h ahead forecasting are shown in Figure 13 and Figure 14. For the prediction scale of 3 h, the values of the evaluation metrics RMSE, MAE, SMAPE, MAPE, and R of the proposed ELFNet model are 0.0417, 0.0333, 0.2813, 0.4118, and 0.9892, respectively. ELFNet outperforms the other deep learning techniques on the SMAPE criterion, with ResNet-18 coming in last at 0.4996. Figure 13c illustrates that, at the 3 h prediction scale, ELFNet still maintains a high prediction accuracy, while among the other three methods the prediction errors on some metrics are highest for ResNeXt-50. This reflects that, on some evaluation indicators, increasing the depth of the network does not further improve the prediction accuracy. The errors of ResNet-18 fluctuate the most and grow progressively larger as the horizon increases. This convincingly demonstrates that the ELFNet model put forward in this research has great prediction stability and continues to produce reliable forecasts at various prediction scales.
A histogram of prediction deviations (Figure 15) was created to show the estimated error margin and distribution properties for all electricity prediction horizons. The fitted normal distribution, characterized by its mean and variance, is shown by the black dashed line. In general, the position of the normal distribution curve is determined by the mean, while its shape is determined by the variance: the closer the mean is to 0 and the smaller the variance, the more accurate the predictions. In this experiment, for the 1 h prediction horizon, the mean is 21.4271 and the variance is 133.7763; for the 2 h prediction horizon, the mean is −14.9310 and the variance is 140.0519; and for the 3 h prediction horizon, the mean is 33.3051 and the variance is 171.8637. These figures all indicate the accuracy of ELFNet's predictions.

4.5. Robustness Experiment

The robustness of time series forecasting typically refers to the accuracy and stability of predictions when the original time series data are subject to noise or other signal perturbations [43]. The data are supplemented with random Gaussian noise at SNRs of 20, 30, 40, 50, 60, and 70, where $P_s$ denotes the effective power of the time series and $P_n$ the effective power of the noise; Equation (20) gives the calculation formula for the SNR.
The comparison between the time series after adding Gaussian noise and the original time series is depicted in Figure 16. The RMSE is used to evaluate how accurately the model forecasts the future after noise is added. Figure 17 and Table 4 show that the RMSE values of the electricity load data at the 1 h prediction horizon are extremely close across noise levels. For the 2 h prediction horizon, the RMSE of the prediction with an SNR of 20 is 0.05657, a significant increase over the 0.0346 obtained on the original noise-free data. When the prediction horizon is 3 h, the errors at SNRs of 20, 30, and 40 increase to varying degrees, with a maximum increase of 41.87%. These experimental results demonstrate that ELFNet has strong robustness and a strong anti-noise ability at various prediction horizons.
$$SNR = 10 \lg(P_s / P_n) \tag{20}$$
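The noise injection used in the robustness experiment can be sketched as follows: given a target SNR in decibels, the noise power is derived from Equation (20) and zero-mean Gaussian noise of that power is added to the series. The `load_series` name is a placeholder for the Elia load data.

```python
import numpy as np

def add_noise(x: np.ndarray, snr_db: float) -> np.ndarray:
    """Add zero-mean Gaussian noise at a target SNR, per Equation (20)."""
    p_signal = np.mean(x ** 2)                  # effective power P_s
    p_noise = p_signal / (10 ** (snr_db / 10))  # P_n from the target SNR
    return x + np.random.normal(0.0, np.sqrt(p_noise), size=x.shape)

# One noisy copy per SNR level used in the experiment:
# noisy = {snr: add_noise(load_series, snr) for snr in (20, 30, 40, 50, 60, 70)}
```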

5. Conclusions

Accurate electricity load predictions are crucial to determine the supply and demand relationship of electric energy. In order to guarantee the sensible and efficient use of electric energy, high-efficiency and high-accuracy electricity load forecasts can provide valuable information to the electric industry, suppliers, and other stakeholders. In this study, we propose a novel model for forecasting electricity loads, called ELFNet, based on a deep convolutional attention mechanism. The electricity load time series is converted into 2D images by GAF, and the inherent features of the image input are extracted by deep convolutional blocks. CA and SA modules are added after the convolutional structure to further optimize the intrinsic features extracted by the convolutional blocks.
The results show that the prediction accuracy is improved to varying degrees by the addition of the CA and SA modules at prediction horizons of 1 h, 2 h, and 3 h, and the greatest improvement is achieved by adding the CA and SA modules simultaneously. We evaluated the prediction performance of the network when different attention modules were added to the deep convolution. Additionally, compared with the classical deep learning networks ResNet-18, ResNeXt-50, and GoogLeNet, ELFNet achieves the highest prediction accuracy at multiple prediction horizons. For the 1 h ahead forecast, the RMSE of ELFNet is 0.0316, which is lower by 44.38%, 38.42%, and 37.49% than the 0.0569 obtained by ResNet-18, the 0.0514 obtained by ResNeXt-50, and the 0.0506 obtained by GoogLeNet, respectively. For the 2 h ahead forecast, the RMSE of ELFNet is 0.0346, which is lower by 50.49%, 40.86%, and 25.27% than the 0.0700 of ResNet-18, the 0.0586 of ResNeXt-50, and the 0.0464 of GoogLeNet. ELFNet's RMSE for the 3 h ahead forecast is 0.0417, lower than the 0.0836 obtained by ResNet-18, the 0.0661 obtained by ResNeXt-50, and the 0.0541 obtained by GoogLeNet, improvements of 50.10%, 36.85%, and 22.85%, respectively. Future work will focus on optimizing the network structure of ELFNet, tuning its hyperparameters, and adapting the network structure to the data's inherent features, in order to increase prediction efficiency while maintaining accuracy.

Author Contributions

Conceptualization, P.Z., G.L. and X.S.; Methodology, P.Z., G.L. and X.S.; Formal analysis, X.S.; Investigation, P.Z. and G.L.; Writing—original draft, P.Z. and X.S.; Writing—review & editing, G.L.; Visualization, X.S.; Supervision, G.L.; Funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Natural Science Foundation of China under Grants 61503282 and 62073301, the Fundamental Research Funds for the Central Universities (WUT: 2021III062JC).

Data Availability Statement

The datasets generated and analyzed in the current study are available in the OpenDataElia repository, [https://opendata.elia.be/pages/home (accessed on 1 March 2022)].

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

Adam: the optimizer used for training
Avgpool: average pooling layer
CA: channel attention
CNN: convolutional neural network
Conv2d: the 2D convolution operation
CV: coefficient of variation
DL: deep learning
DNN: deep neural network
ELFNet: electricity load forecasting network
GAF: Gramian angular field
GASF: Gram summation angular field
Kurt: kurtosis value
LSTM: long short-term memory neural network
MAE: mean absolute error
MAPE: mean absolute percentage error
Max: maximum
MaxPool: max pooling layer
Med: median
Min: minimum
MLP: multilayer perceptron
MSELoss: error function used for training
NN: neural network
ReLU: rectified linear unit
ResNet: residual network
RMSE: root mean square error
RNN: recurrent neural network
SA: spatial attention
Skew: skewness value
SMAPE: symmetric mean absolute percentage error
SNR: signal–noise ratio
Std: standard deviation
σ: activation function
k: the forecast horizon
X: time series
N: constant normalization factor
R: correlation coefficient
θ_i: polar angle in the polar coordinate system
x̃_i: the normalized time series
P_n: the effective power of the noise
P_s: the effective power of the time series
r_i: polar radius in the polar coordinate system
t_i: the time stamp of the time series
T: the total number of samples in the time series
Y_1(t, t): time series after GAF processing
Y_2(t/2, t/2): feature map obtained after one 2D convolution
Y_3(t/32, t/32): feature map obtained after four 2D convolution units
Y_pre: forecast results

References

  1. Xi, K.; Dubbeldam, J.; Lin, H.; Schuppen, J. Power-imbalance allocation control for secondary frequency control of power systems. IFAC-PapersOnLine 2017, 50, 4382–4387.
  2. Deryugina, T.; MacKay, A.; Reif, J. The long-run dynamics of electricity demand: Evidence from municipal aggregation. Am. Econ. J. Appl. Econ. 2020, 12, 86–114.
  3. Guo, W.; Che, L.; Shahidehpour, M.; Wan, X. Machine-Learning based methods in short-term load forecasting. Electr. J. 2021, 34, 106884.
  4. Pan, L.; Feng, X.; Sang, F.; Li, L.; Leng, M.; Chen, X. An improved back propagation neural network based on complexity decomposition technology and modified flower pollination optimization for short-term load forecasting. Neural Comput. Appl. 2019, 31, 2679–2697.
  5. Jiao, J.; Tang, Z.; Zhang, P.; Yue, M.; Yan, J. Cyberattack-resilient load forecasting with adaptive robust regression. Int. J. Forecast. 2022, 38, 910–919.
  6. Dudek, G. Pattern-based local linear regression models for short-term load forecasting. Electr. Power Syst. Res. 2016, 130, 139–147.
  7. Matrenin, P.; Safaraliev, M.; Dmitriev, S.; Kokin, S.; Ghulomzoda, A.; Mitrofanov, S. Medium-term load forecasting in isolated power systems based on ensemble machine learning models. Energy Rep. 2022, 8, 612–618.
  8. Alhmoud, L.; Nawafleh, Q. Short-term load forecasting for Jordan power system based on narx-elman neural network and ARMA model. IEEE Can. J. Electr. Comput. Eng. 2021, 44, 356–363.
  9. Pedregal, D.; Trapero, J. Adjusted combination of moving averages: A forecasting system for medium-term solar irradiance. Appl. Energy 2021, 298, 117155.
  10. Bu, X.; Wu, Q.; Zhou, B.; Li, C. Hybrid short-term load forecasting using CGAN with CNN and semi-supervised regression. Appl. Energy 2023, 338, 120920.
  11. Zhang, Q.; Chen, J.; Xiao, G.; He, S.; Deng, K. TransformGraph: A novel short-term electricity net load forecasting model. Energy Rep. 2023, 9, 2705–2717.
  12. Liao, Z.; Huang, J.; Cheng, Y.; Li, C.; Liu, P. A novel decomposition-based ensemble model for short-term load forecasting using hybrid artificial neural networks. Appl. Intell. 2022, 52, 11043–11057.
  13. Talaat, M.; Farahat, M.; Mansour, N.; Hatata, A. Load forecasting based on grasshopper optimization and a multilayer feed-forward neural network using regressive approach. Energy 2020, 196, 117087.
  14. Mohammed, N.; Al-Bazi, A. An adaptive backpropagation algorithm for long-term electricity load forecasting. Neural Comput. Appl. 2022, 34, 477–491.
  15. Fekri, M.; Patel, H.; Grolinger, K.; Sharma, V. Deep learning for load forecasting with smart meter data: Online adaptive recurrent neural network. Appl. Energy 2021, 282, 116177.
  16. Jagait, R.; Fekri, M.; Grolinger, K.; Mir, S. Load forecasting under concept drift: Online ensemble learning with recurrent neural network and ARIMA. IEEE Access 2021, 9, 98992–99008.
  17. Bui, V.; Pham, T.; Kim, J.; Jang, Y. RNN-based deep learning for one-hour ahead load forecasting. In Proceedings of the 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 19–21 February 2020; pp. 587–589.
  18. Huang, Y.; Huang, Z.; Yu, J.; Dai, X.; Li, Y. Short-term load forecasting based on IPSO-DBiLSTM network with variational mode decomposition and attention mechanism. Appl. Intell. 2023, 53, 12701–12718.
  19. Zang, H.; Xu, R.; Cheng, L.; Ding, T.; Liu, L.; Wei, Z.; Sun, G. Residential load forecasting based on LSTM fusing self-attention mechanism with pooling. Energy 2021, 229, 120682.
  20. Bashir, T.; Haoyong, C.; Tahir, M.F.; Liqiang, Z. Short term electricity load forecasting using hybrid prophet-LSTM model optimized by BPNN. Energy Rep. 2022, 8, 1678–1686.
  21. Memarzadeh, G.; Keynia, F. Short-term electricity load and price forecasting by a new optimal LSTM-NN based prediction algorithm. Electr. Power Syst. Res. 2021, 192, 106995.
  22. Khan, S.; Javaid, N.; Chand, A.; Khan, A.; Rashid, F.; Afridi, I. Electricity load forecasting for each day of week using deep CNN. In Proceedings of the Workshops of the International Conference on Advanced Information Networking and Applications, Matsue, Japan, 27–29 March 2019; pp. 1107–1119.
  23. Singh, N.; Vyjayanthi, C.; Modi, C. Multi-step short-term electric load forecasting using 2D convolutional neural networks. In Proceedings of the 2020 IEEE-HYDCON, Hyderabad, India, 11–12 September 2020; IEEE: New York, NY, USA, 2020; pp. 1–5.
  24. Imani, M. Electrical load-temperature CNN for residential load forecasting. Energy 2021, 227, 120480.
  25. Wang, Z.; Oates, T. Imaging time-series to improve classification and imputation. In Proceedings of the 24th International Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 3939–3945.
  26. Wang, Z.; Oates, T. Spatially encoding temporal correlations to classify temporal data using convolutional neural networks. arXiv 2015, arXiv:1509.07481.
  27. Chen, W.; Shi, K. A deep learning framework for time series classification using relative position matrix and convolutional neural network. Neurocomputing 2019, 359, 384–394.
  28. Barra, S.; Carta, S.; Corriga, A.; Podda, A.; Recupero, D. Deep learning and time series-to-image encoding for financial forecasting. IEEE/CAA J. Autom. Sin. 2020, 7, 683–692.
  29. Acikgoz, H.; Budak, U.; Korkmaz, D.; Yildiz, C. WSFNet: An efficient wind speed forecasting model using channel attention-based densely connected convolutional neural network. Energy 2021, 233, 121121.
  30. Hong, Y.Y.; Martinez, J.J.F.; Fajardo, A.C. Day-ahead solar irradiation forecasting utilizing gramian angular field and convolutional long short-term memory. IEEE Access 2020, 8, 18741–18753.
  31. Himeur, Y.; Alsalemi, A.; Bensaali, F.; Amira, A. An intelligent nonintrusive load monitoring scheme based on 2D phase encoding of power signals. Int. J. Intell. Syst. 2021, 36, 72–93.
  32. Azza, A.; Ienco, D.; Abbes, A.B.; Farah, I.R. Combining 2D Encoding and Convolutional Neural Network to Enhance Land Cover Mapping from Satellite Image Time Series. Eng. Appl. Artif. Intell. 2023, 122, 106152.
  33. Lan, D.; Zhu, H.; Wu, J.; Li, F.; Zhang, J.; Sun, H. A short-term load forecasting method of distribution network based on improved AlexNet-GRU deep learning network. In Proceedings of the 2021 IEEE 5th Conference on Energy Internet and Energy System Integration (EI2), Taiyuan, China, 22–24 October 2021; pp. 3004–3010.
  34. Zheng, X.; Wang, B.; Du, X.; Lu, X. Mutual attention inception network for remote sensing visual question answering. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5606514.
  35. Zhang, W.; Quan, H.; Gandhi, O.; Rajagopal, R.; Tan, C.W.; Srinivasan, D. Improving probabilistic load forecasting using quantile regression NN with skip connections. IEEE Trans. Smart Grid 2020, 11, 5442–5450.
  36. Sheng, Z.; Wang, H.; Chen, G.; Zhou, B.; Sun, J. Convolutional residual network to short-term load forecasting. Appl. Intell. 2021, 51, 2485–2499.
  37. Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500.
  38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
  39. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
  40. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  41. Tang, X.; Chen, H.; Xiang, W.; Yang, J.; Zou, M. Short-term load forecasting using channel and temporal attention based temporal convolutional network. Electr. Power Syst. Res. 2022, 205, 107761.
  42. Wang, S.; Li, R.; Wang, X.; Shen, S.; Zhou, B.; Wang, Z. Multiscale residual network based on channel spatial attention mechanism for multilabel ECG classification. J. Healthc. Eng. 2021, 2021, 6630643.
  43. Zhao, S.; Wu, Q.; Zhang, Y.; Wu, J.; Li, X.A. An asymmetric bisquare regression for mixed cyberattack-resilient load forecasting. Expert Syst. Appl. 2022, 210, 118467.
Figure 1. Time series conversion principle.
Figure 2. CNN structure.
Figure 3. Channel attention structure.
Figure 4. Spatial attention structure.
Figure 5. Flowchart of the proposed model.
Figure 6. ELFNet forecasting results for 1 h ahead. (a) is the prediction result, and (b) is the local enlargement.
Figure 7. ELFNet forecasting results for 2 h ahead. (a) is the prediction result, and (b) is the local enlargement.
Figure 8. ELFNet forecasting results for 3 h ahead. (a) is the prediction result, and (b) is the local enlargement.
Figure 9. Forecasting results of ELFNet and deep learning methods for 1 h ahead. (a) is the prediction result, (b) is the local enlargement, and (c) is the correlation diagram between the true value and the predicted value.
Figure 10. Evaluation metrics for 1 h ahead.
Figure 11. Evaluation metrics for 2 h ahead.
Figure 12. Forecasting results of ELFNet and deep learning methods for 2 h ahead. (a) is the prediction result, (b) is the local enlargement, and (c) is the correlation diagram between the true value and the predicted value.
Figure 13. Forecasting results of ELFNet and deep learning methods for 3 h ahead. (a) is the prediction result, (b) is the local enlargement, and (c) is the correlation diagram between the true value and the predicted value.
Figure 14. Evaluation metrics for 3 h ahead.
Figure 15. Error distributions of ELFNet and other methods.
Figure 16. Data distribution with different SNRs.
Figure 17. RMSE histograms with different SNRs.
Table 1. Statistical properties of the dataset.

           Max (MW)    Min (MW)   Med (MW)   Mean (MW)   Std (MW)   CV (%)   Skew      Kurt
Dataset    12591.59    6823.94    9391.1     9335.93     1068.91    0.1145   −0.0571   −0.858
Training   12591.59    6883.69    9551.95    6568.26     1053.28    0.1114   −0.0725   −0.782
Testing    11390.69    6823.94    8995.8     9065.98     1056.53    0.1165   −0.0114   −1.131
Table 2. Comparison of the evaluation metrics of CNN with different combinations of modules.

Metric   Horizon   CNN      CNN-CA   CNN-SA   ELFNet
RMSE     1 h       0.0465   0.0349   0.0409   0.0316
         2 h       0.0442   0.0402   0.0441   0.0346
         3 h       0.0542   0.0479   0.0482   0.0417
MAE      1 h       0.0373   0.0274   0.0321   0.0254
         2 h       0.0354   0.0317   0.0348   0.0270
         3 h       0.0440   0.0377   0.0379   0.0333
SMAPE    1 h       0.2806   0.2259   0.2652   0.2246
         2 h       0.2777   0.2525   0.2753   0.2402
         3 h       0.3225   0.3050   0.2983   0.2813
MAPE     1 h       0.3769   0.3179   0.4021   0.3323
         2 h       0.3939   0.3846   0.4941   0.3689
         3 h       0.5631   0.4151   0.4495   0.4118
R        1 h       0.9899   0.9916   0.9892   0.9920
         2 h       0.9895   0.9888   0.9869   0.9911
         3 h       0.9779   0.9840   0.9836   0.9892
Table 3. Performance comparison of the evaluation metrics for ELFNet and deep learning methods.

Metric   Horizon   ResNet-18   ResNeXt-50   GoogLeNet   ELFNet
RMSE     1 h       0.0569      0.0514       0.0506      0.0316
         2 h       0.0700      0.0586       0.0464      0.0346
         3 h       0.0836      0.0661       0.0541      0.0417
MAE      1 h       0.0445      0.0404       0.0395      0.0254
         2 h       0.0563      0.0460       0.0353      0.0270
         3 h       0.0671      0.0522       0.0415      0.0333
SMAPE    1 h       0.3322      0.3070       0.3013      0.2246
         2 h       0.4161      0.3612       0.2697      0.2402
         3 h       0.4996      0.4186       0.3009      0.2813
MAPE     1 h       0.4498      0.4520       0.3812      0.3323
         2 h       0.5449      0.6055       0.3762      0.3689
         3 h       0.6703      0.6727       0.4141      0.4118
R        1 h       0.9752      0.9812       0.9806      0.9920
         2 h       0.9738      0.9792       0.9841      0.9911
         3 h       0.9657      0.9754       0.9784      0.9892
Table 4. Comparison of the RMSE of ELFNet under different SNRs.

Metric   Horizon   SNR 20    SNR 30    SNR 40    SNR 50    SNR 60    SNR 70    No Noise
RMSE     1 h       0.03742   0.03317   0.03162   0.03317   0.03461   0.03606   0.0316
         2 h       0.05657   0.03873   0.03606   0.03873   0.03742   0.0400    0.0346
         3 h       0.05916   0.05521   0.05477   0.04359   0.04583   0.04472   0.0417
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
