Energies 2023, 16, 2283
Article
Electrical Load Forecasting Using LSTM, GRU, and RNN Algorithms
Mobarak Abumohsen 1 , Amani Yousef Owda 1, * and Majdi Owda 2
1 Department of Natural, Engineering and Technology Sciences, Arab American University, Ramallah P600, Palestine
2 Faculty of Data Science, Arab American University, Ramallah P600, Palestine
* Correspondence: amani.owda@aaup.edu
Abstract: Forecasting the electrical load is essential in power system design and growth. It is critical
from both a technical and a financial standpoint as it improves the power system performance,
reliability, safety, and stability as well as lowers operating costs. The main aim of this paper is to
build forecasting models that accurately estimate the electrical load based on the measurements of
current electrical loads of the electricity company. The importance of having forecasting models is in
predicting the future electrical loads, which will lead to reducing costs and resources, as well as better
electric load distribution for electric companies. In this paper, deep learning algorithms are used to
forecast the electrical loads; namely: (1) Long Short-Term Memory (LSTM), (2) Gated Recurrent Units
(GRU), and (3) Recurrent Neural Networks (RNN). The models were tested, and the GRU model
achieved the best performance in terms of accuracy and the lowest error. Results show that the GRU
model achieved an R-squared of 90.228%, Mean Square Error (MSE) of 0.00215, and Mean Absolute
Error (MAE) of 0.03266.
Keywords: load forecasting; machine learning; deep learning models; electric power system; short-term
load forecasting
Citation: Abumohsen, M.; Owda, A.Y.; Owda, M. Electrical Load Forecasting Using LSTM, GRU, and RNN Algorithms. Energies 2023, 16, 2283. https://doi.org/10.3390/en16052283
Academic Editor: Surender Reddy Salkuti
Received: 1 February 2023; Revised: 22 February 2023; Accepted: 23 February 2023; Published: 27 February 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
The last era in the world was generally characterized by the rapid and large expansion of electricity networks, especially electrical loads, as they swelled dramatically and new types of electrical loads appeared that need special study [1]. The increase in electrical loads causes complexity in the design of the electrical system components. The reorganization of the energy system also led to the formation of institutionalized generation, transmission, and distribution companies. These entities are challenged by the increasing requirements for the reliable operation of power system networks [2]. The main concern of every electrical company is to provide reliable and continuous service to its customers. It has become difficult to predict electrical loads using traditional methods, since many factors affect electrical loads directly and indirectly: population census, temperatures, climatic changes, rainwater, underground basins, the economic system in each country, human behavior, global epidemics, and the evolution of industries [3]. Electricity in Palestine is taken from the Israeli side through connection points between the two sides; some of these points have high electricity consumption and others low consumption, which causes malfunctions in high-load transformers and leads to problems in energy outputs and infrastructure. Electrical load forecasting is critical in establishing and improving power system efficiency because it ensures reliable and economic planning, control, and operation of the power system. It helps the electricity companies make critical choices, such as the acquisition and generation of electrical power, as well as the establishment of the infrastructure for the transmission and distribution system.
With the rapid and dramatic increase in energy consumption, developing reliable models to predict electrical loads is becoming increasingly demanded and complicated [4,5]. The problem of rapid and sharp growth in energy consumption in Palestine has led to the need to create reliable models to predict electricity loads, as these models will help the
electricity companies in managing and planning energy transmission and ensuring reliable
and uninterrupted service to their customers. Predicting electrical loads is very important
for electricity companies in Palestine to prepare short, medium, and long-term plans, and
for the energy authority and government agencies to secure energy in the coming years.
Forecasting the electrical loads does not serve only the power sector in Palestine, but can
instead feed all economic sectors with feedback that may benefit them in preparing
plans for the future development of these sectors. Globally, the importance of predicting loads
comes from the difficulty of predicting them, as loads are the missing and most ambiguous
link for many countries that seek to develop strategies for the electrical system. Therefore,
load forecasting is useful in designing electrical networks and developing strategic plans
that ensure a stronger economy, a cleaner environment, and energy sustainability.
In this research, we are going to forecast the short-term electrical loads in Palestine
based on real data and deep learning algorithms, namely: long short-term memory (LSTM),
gated recurrent unit (GRU), and recurrent neural network (RNN). The objectives of this
work can be illustrated as follows:
• Forecasting electrical loads with the highest accuracy to simulate the real development
of electrical loads.
• Assisting electrical companies in developing short and medium-term plans for design-
ing electrical networks and estimating infrastructure needs.
• Improving the electricity service in Palestine and solving the problem of power outages
in Palestine.
• Helping the electricity companies in securing sources of energy that are suitable for
the loads rather than exceeding them, as this excess is considered a waste that cannot
be used.
The main contribution of this study is to develop models using deep learning algo-
rithms (RNN, LSTM, and GRU) to forecast electricity load in Palestine based on a novel real
dataset. This dataset is the first to come to light for this specific area (Palestine). In addition,
tuning was performed using different types of hyperparameters (optimizer,
activation function, learning rate, number of epochs, batch size, number of hidden layers,
and dropout). To the best of the authors' knowledge, there are no studies in the open literature
conducted to forecast the electrical loads using seven types of hyperparameters.
The proposed forecasting models presented in this research can be applied to any electricity
company dataset. The forecasting models will help electricity companies to introduce
reliable and uninterrupted services to their customers, assist them in developing short and
medium-term plans for designing electrical networks and estimating infrastructure needs,
and help them in securing sources of energy that are suitable for the loads. Moreover, the
proposed models will help the electric companies to make critical decisions, such as the
development of transmission and distribution systems infrastructure to guarantee the best
electrical services for the customers.
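To illustrate the size of the tuning space spanned by the seven hyperparameter types listed above, they can be enumerated as a search grid. The candidate values below are hypothetical placeholders for illustration, not the values used in the paper:

```python
from itertools import product

# Hypothetical candidate values for the seven tuned hyperparameter types;
# the actual ranges explored in the paper may differ.
grid = {
    "optimizer": ["adam", "rmsprop", "sgd"],
    "activation": ["tanh", "relu"],
    "learning_rate": [0.01, 0.001],
    "epochs": [50, 100],
    "batch_size": [32, 64],
    "hidden_layers": [1, 2],
    "dropout": [0.0, 0.2],
}

# Enumerate every combination in the search space.
combos = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(combos))  # 3 * 2 * 2 * 2 * 2 * 2 * 2 = 192 candidate configurations
```

Even with only two or three candidates per hyperparameter, the grid grows multiplicatively, which is why tuning all seven types at once is rarely attempted.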
This section provided an overview of electrical load forecasting; the following sections of
the paper are structured as follows: Section 2 describes the literature review and previous
studies, Section 3 presents the methodology used in building the deep learning models to
forecast the electric load, Section 4 illustrates the experimental results and compares them
with previous studies, and Section 5 provides conclusions and plans for future work.
2. Literature Review
This section presents the state of the art and an analysis of the relevant literature on the
forecasting of electrical loads and demands.
2.1. Background
Machine learning (ML) and deep learning algorithms are widely used in the field
of forecasting energy demand and the amount of electricity consumption [6]. Engineers
and data scientists depend on these approaches to deal with temporal data in terms of
exploration, explanation, and analysis. Deep learning algorithms are used to optimally
manage the competitive markets of electricity, heat, and hydrogen by tapping into the
potential of intelligent consumers. Through using the capabilities of data-driven customers,
deep learning algorithms are employed to efficiently control the dynamic electricity, heat,
and hydrogen markets [7]. At a high level, the use of ML in power demand analysis
in the literature is separated into two types: (1) unsupervised learning
techniques, which are primarily used to give descriptive analytics or as pre-processing
stages and discover the behavior of electricity consumption [8–11], and (2) supervised
learning approaches, which are mostly used for predictive modeling [12–16].
One of the main obstacles facing the Palestinian electricity companies is forecasting
the electricity loads, since the forecasting process helps these companies guarantee the
electrical services to their customers, reduce power outages, and manage their
electrical networks; there is therefore a need to build a reliable electricity load forecasting
system to predict future power loads in Palestine. It is worth mentioning that we cannot
depend on the power load forecasting systems that have been proposed in the literature
by previous researchers, since the weather factors and the power sources differ from
country to country.
2.3.1. Short-Term Load Forecasting for Medium and Large Electrical Networks
The model proposed in [29] is based on neural networks and particle swarm optimization
(PSO) to evaluate the Iranian power system. The neural network-based solutions
resulted in fewer prediction mistakes due to their capacity to adapt effectively to the hidden
properties of the consumed load. The accuracy of the proposed model was assessed based
on the mean absolute percentage error (MAPE), which does not exceed 0.0338, and the mean
absolute error (MAE), which was found to be 0.02191.
The studies in [30,31] used empirical mode decomposition (EMD), in which the
original electricity consumption data are first decomposed into several intrinsic mode
functions (IMFs) with different frequencies and amplitudes. Researchers in [30] suggested
empirical mode decomposition gated recurrent units with feature selection for short-term
load forecasting (EMD-GRU-FS). The Pearson correlation is used as the prediction model's
input feature to determine the correlation between the subseries and the original series. The
experimental findings revealed that the suggested method’s average prediction accuracy
on four data sets was 96.9%, 95.31%, 95.72%, and 97.17%, respectively. Moreover, the authors
in [31] presented a combination of integrated empirical modal decomposition (EMD) and
long short-term memory (LSTM) networks for short-term load power consumption
forecasting. The LSTM is used to extract features and make temporal predictions.
Finally, on the end-user side, short-term electricity consumption prediction results were
obtained by accumulating multiple target prediction results. The proposed EMD–LSTM
method achieved a MAPE of 2.6249% in the winter and 2.3047% in the summer.
Moreover, in China, a hybrid short-term load forecasting system based on variational mode
decomposition (VMD) and long short-term memory (LSTM) networks, optimized using
the Bayesian optimization algorithm (BOA), has been developed [32]. The proposed
method was compared with SVR, multi-layered perceptron regression, LR, RF, and EMD-LSTM;
the results show that the MAPE is 0.4186% and the R-squared is 0.9945.
In [33], a hybrid prediction model combining variational mode decomposition (VMD), a
temporal convolutional network (TCN), and an error correction approach is suggested, where
the prediction error on the training set is used to adjust the model's prediction accuracy. The hybrid model
beats the contrast models in prediction; the MAPE for 6-, 12-, and 24-step forecasting is 0.274%,
0.326%, and 0.405%, respectively. The authors in [34] employed the VMD-MFRFNN and DCT-
MFRFNN algorithms to predict historical data, reducing volatility in the time series and
simplifying its structure. They also compared them based on RMSE. The results indicated
that the VMD-MFRFNN model was the best in predicting the historical data.
The researchers in [35–37] used artificial neural network (ANN) algorithms in building
models for short-term electrical load forecasting, since ANN algorithms deal with non-linear
data. Ref. [35] proposed an ANN algorithm to make a robust computation with vast
and dynamic data to cope with the difficulty of the non-linearity of historical
load data for short-term load forecasting of building energy consumption. The authors of [35]
created and confirmed their results on a testbed home, which was supposed to be a real test
facility. Their model was based on the Levenberg–Marquardt and Newton algorithms and
achieved a coefficient of determination (R²) of 0.91, which means the model fits well, with
about 91% of the variance in the power consumption variable predicted
from the independent variable. Furthermore, researchers in [36,37] investigated the use of
certain types of neural networks such as non-linear autoregressive exogenous (NARX) and
convolutional neural networks (CNN) to improve the performance of standard ANN in
handling time-series data. Ref. [36] suggested a novel version of CNN for short-term load
(one day ahead) forecasting employing a two-dimensional input layer (consumptions
from past states in one layer and meteorological and contextual inputs in the second layer).
The model was used in an Algerian case study, and the performance metrics indicated that
the MAPE and RMSE are 3.16% and 270.60 MW, respectively. Ref. [37] proposed a model for
load forecasting based on a non-linear autoregressive model with exogenous input (NARX)
neural network and support vector regression (SVR) to forecast power consumption for the
day ahead, a week ahead, and a month ahead at 15-min granularity, and they compared
SVR and NARX neural network methods. Then, they evaluated the models with varied
time horizons after training them with genuine data from three real commercial buildings.
The SVR outperformed the NARX neural network model, according to their findings. For
the day ahead, a week ahead, and a month ahead forecasting, the average predicting
accuracy is approximately 93%, 88–90%, and 85–87%, respectively. In [38] a novel multi-
functional recurrent fuzzy neural network (MFRFNN) is proposed for developing chaotic
time series forecasting approaches. They validated the efficacy of MFRFNN on real datasets
to forecast wind speed.
learning-based techniques. Ref. [48] built a model for short-term load forecasting
for individual residential households. Researchers in [48] contrasted the LSTM model's
performance with the extreme learning machine (ELM), back-propagation neural network
(BPNN), and k-nearest neighbor regression to show the considerable prediction error reduction
obtained by employing the LSTM structure, and obtained an average MAPE of 8.18% for
aggregated forecasts and an average MAPE of 44.39% for individual forecasts.
Overall, predicting electrical loads in the short term helps in predicting loads for a few
minutes, hours, a day, and sometimes a week, which helps in controlling the distribution
of loads and evaluating the safety of the electrical network. However, based on the
mentioned works, some researchers encountered difficulty in obtaining accurate data on
the consumption behavior of consumers. In this section, we have reviewed some of the
previous studies related to electric load forecasting. We started by reviewing some works
related to the forecasting of short-term electrical loads, whether at the network level in
regions as a whole or at the level of residential buildings, and then we discussed different
algorithms used in forecasting, such as LSTM, SVM, RF, CNN, ANN, and SVR.
To the best of the authors' knowledge, there is no forecasting of electrical loads in the
State of Palestine, and the prediction differs from one country to another because the terrain
and climatic conditions differ, as do the population density and the power consumption.
This research will focus on predicting short-term electrical loads based on a real dataset
from Palestine. Using deep learning algorithms (LSTM, GRU, RNN) with the highest
accuracy and least error rate will help to solve the problem of power outages in Palestine
and save time and cost.
3. Methodology
The first step of the methodology is data collection and preparation; the second
step is data exploration; the third step is data preprocessing for machine learning; the
fourth step is to use different deep learning algorithms (LSTM, RNN, GRU) for
short-term electrical load forecasting, together with different performance
metrics to compare the algorithms' performance and select the best
approach. Finally, the best model for electric load forecasting is selected based on the
optimization and tuning process applied to the models. Figure 1 summarizes the methodology
as illustrated below:
Figure 1. Basic workflow for electric load forecasting models.
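The evaluation step of this workflow relies on the error metrics reported in the paper (MSE, MAE, and R-squared). As a minimal pure-Python sketch (not the authors' code), they can be computed as:

```python
def mse(actual, predicted):
    # Mean Square Error: average of squared residuals.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    # Mean Absolute Error: average of absolute residuals.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Illustrative loads in kWh, not the paper's data.
actual = [200.0, 210.0, 190.0, 205.0]
predicted = [198.0, 212.0, 193.0, 204.0]
print(mse(actual, predicted), mae(actual, predicted), r_squared(actual, predicted))
```

A lower MSE/MAE and an R-squared closer to 1 indicate a better model, which is the criterion used later to rank the LSTM, GRU, and RNN models.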
Table 1. The first five records in the dataset before data preprocessing.
Table 2 shows the descriptive statistics for the electrical load data on a daily, weekly,
and monthly basis; the mean, median, and standard deviation were computed. The mean of the daily
electrical loads is 199.013 kWh, the weekly average is 200.51 kWh, and the monthly average
is approximately 202.18 kWh, so the three means are close to one another. As for the standard
deviation, it is clear that the daily loads are the most dispersed from the arithmetic mean,
at 35.59 kWh away from the average, while the lowest is the standard deviation of the
monthly loads, at 25.13 kWh. As for the median, the daily electrical load is 200.36 kWh,
the weekly is 198.31 kWh, and the monthly is 202.25 kWh. Therefore, it is clear from the
previous explanation that the dispersion of the daily electrical loads from the arithmetic
mean is the highest.
The ability to visualize and investigate the interrelationship of different variables and to
unearth previously unseen patterns is a key feature of EDA that is crucial to the creation
of time series forecasting models [49].
3.2.1. Correlation
The linear link between two or more variables is measured using a statistical technique
called correlation. One variable may be predicted from another via the use of correlation.
The theory behind utilizing correlation to select features is that useful variables will have a
strong correlation with the result. The heat map assists in understanding the correlation
ratio between the features, to know whether there is enough connection to construct a deep
learning model to predict the short-term load forecasts and to examine the link between the
dataset components. The heat map was created with the Python matplotlib and seaborn
modules, which calculate the correlation coefficient (r) between the components using
Equation (1) [50].
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \qquad (1)$$
where $r$ is the correlation coefficient, $x_i$ the values of the x-variable in a sample, $\bar{x}$ the mean of the values of the x-variable, $y_i$ the values of the y-variable in a sample, and $\bar{y}$ the mean of the values of the y-variable.
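Equation (1) can be implemented directly. The sketch below uses plain Python rather than the seaborn/matplotlib pipeline described above, with illustrative data:

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation coefficient, Equation (1):
    # r = sum((x - x_mean)(y - y_mean)) / sqrt(sum((x - x_mean)^2) * sum((y - y_mean)^2))
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_mean) ** 2 for x in xs)
                    * sum((y - y_mean) ** 2 for y in ys))
    return num / den

# A perfectly linear relationship gives r = 1.0 (illustrative values).
hours = [0, 6, 12, 18]
load = [100.0, 160.0, 220.0, 280.0]
print(pearson_r(hours, load))
```

In practice, `pandas.DataFrame.corr()` computes the same coefficient for every pair of features at once, which is what the heat map in Figure 2 visualizes.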
Figure 2 shows the correlation between the features within the dataset, and it can be
seen that there are positive relationships between some features (such as week and month,
and also the week and the number of days of the year) and negative relationships between
others (year and month, year and day of the year). In addition, the correlation between
electricity consumption and the hour is 0.47, since the loads are highest in the morning
during the working hours of the institutions and drop to their lowest at night.
Figure 2. Correlation coefficient for all features in the dataset.
Figure 3. The demand for electric load in kilowatt hours (kWh) from 2021 to 2022.
Figure 3 shows the electricity consumption from September 2021 to June 2022. It can
be seen that the consumption in September was the highest (350 kWh) among the months;
during this month, the temperatures are high. November is the lowest, with a value of
50 kWh in consumption. This helps us in dealing with loads and predicting electric loads,
especially in the summer, when electricity can be generated from alternative sources to
avoid problems due to overloads.
Figure 4. Distribution of electric load demand based on the days of the week.
Figure 4 shows the electricity consumption from September 2021 to June 2022 grouped
by the days of the week. It can be seen that the consumption on Saturday is the highest,
with a peak value of 397 kWh among the days of the week, whereas Friday is the lowest
day of the week. This is due to the weekend holiday in Palestine, as it takes place on Friday,
and companies and factories are closed on this day. This analysis helps us to focus on
forecasting electrical loads during the working days for employees and schools, reinforced
by focusing on forecasting loads on days with excessive loads, to avoid electrical faults
and to work on the correct distribution of electricity based on days.
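A day-of-week aggregation of this kind can be sketched with the standard library alone; the timestamps and loads below are illustrative, not the actual dataset:

```python
from collections import defaultdict
from datetime import date

# Illustrative (date, load in kWh) records; not the actual dataset.
records = [
    (date(2021, 9, 4), 397.0),   # Saturday
    (date(2021, 9, 3), 120.0),   # Friday
    (date(2021, 9, 11), 380.0),  # Saturday
    (date(2021, 9, 10), 110.0),  # Friday
]

# Group loads by weekday name, then average each group.
by_day = defaultdict(list)
for d, load in records:
    by_day[d.strftime("%A")].append(load)

averages = {day: sum(vals) / len(vals) for day, vals in by_day.items()}
print(averages)
```

With the real minute-level data, the same grouping (e.g. via `pandas.groupby`) produces the per-day distribution plotted in Figure 4.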
Figure 5. Distribution of electric load demand based on the hours of the day.
Figure 5 shows the consumption of electricity based on the hours of each day. It can
be seen that from 6 a.m. the consumption of electricity begins to increase gradually until
8 p.m., because this is roughly when factories, companies, and agricultural wells begin
their work. Then, as noticed, the consumption begins to decrease gradually in the late
night hours and the early morning hours, because this is the sleeping period for families
and the shops are closed.
Figure 6 shows the electricity consumption from the beginning of September 2021 to
June 2022 every minute. It can be seen that the consumption is variable; the highest value
is 400 kWh and the lowest value is zero, which indicates a disconnection in the electric
current at that time.
Figure 7 shows the electricity consumption from the beginning of September 2021 to
June 2022 in the form of (daily, weekly, and monthly averages). It can be seen from the
monthly average that consumption from September 2021 starts decreasing until the end
of January 2022, because it is the exit period from summer to winter. During the period
between February 2022 and April 2022, consumption was virtually constant, after which
consumption began to rise with the beginning of the summer season.
Figure 7. Tubas electricity demand over the time period (Sep-2021 to Jun-2022).
Figure 8. The relationship between temperature in degrees and the demand power in (kWh).
Figure 8 shows daily temperatures and daily electricity consumption. It can be seen
that from September to November the temperatures ranged between 25 °C and 35 °C,
and the demand for electricity was high from the beginning of September to mid-October
because the temperature was high. From mid-October to late December, the consumption
was low because the temperature was in the range of 18 °C to 25 °C. In the period between
November and the beginning of January, temperatures ranged between 15 °C and 25 °C,
and the demand for electricity during this period was minimal. Moreover, during the
period between January and February, the temperatures were the lowest, in the range of
5 °C to 20 °C; in this period, the consumption rises again into the range of 153 kWh to
289 kWh. Then, from March to June, temperatures begin to rise gradually, and accordingly,
the demand for electricity increases or decreases according to the temperature.
In this section, exploratory data analysis was performed, which allows us to focus on
data patterns and decide how to utilize machine learning to extract knowledge from the
data. After visualizing the data, it can be seen that when the temperatures are high, the
demand for electricity increases; when the temperatures are moderate (15 °C–25 °C), the
electricity demand drops; and at low temperatures (less than 15 °C), the demand for
electricity increases again. In addition, the peak electricity consumption is from 6:00 a.m. to
8:00 p.m. Further, the lowest consumption is on the weekends (on Fridays). Following
this exploration, the next section presents and discusses the methodology used to forecast
the electric load.
Figure 9. Methodology of building the machine learning algorithms for the electric load consumption.
3.3.1. Data Preprocessing
Data preprocessing is a key stage in training the machine-learning model to use the
ideal data structure; without preprocessed data, the machine-learning models may not
operate as effectively as necessary, resulting in poor outcomes. Depending on the nature
of the raw data, different preprocessing sub-steps may be used [31]; data normalization
and removing highly correlated features were used in this paper, as well as removing
features with little correlation with the targeted feature, outlier removal, and an evaluation
to check the null values in the original data. The techniques will be described in full in the
next section.

Data Normalization
Data normalization is a preprocessing technique used to prevent some features from
dominating all other features; data normalization aims for features with the same scale to
be of equal importance. There are many types of data normalization methods, including
standardization and max-min normalization [51]. The range of all features was standardized
to be between [0–1] in this study, and the max-min normalization technique was used to
conduct a linear transformation on the data, with the max-min normalization method
determined using Equation (2) [52].

x′ = (x − xmin)/(xmax − xmin)    (2)
where x′ is the normalized value, x is the original feature value, xmin is the minimum value
of the feature, and xmax is the maximum value of the feature.
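Equation (2) is applied per feature column; a minimal NumPy sketch (the function name and the guard for constant columns are ours, not from the paper):

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature (column) of X into [0, 1] using Equation (2):
    x' = (x - x_min) / (x_max - x_min)."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # Guard against division by zero for constant columns
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span

# Toy feature matrix: load in kWh, temperature in degrees C
X = np.array([[153.0, 5.0],
              [221.0, 15.0],
              [289.0, 25.0]])
X_norm = min_max_normalize(X)
print(X_norm)  # each non-constant column now spans [0, 1]
```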
Feature Selection
Feature selection is a technique where we choose those features in our data that
contribute most to the target variable; it also yields a good set of characteristics to achieve
excellent identification rates in challenging situations [53]. The correlation between features
was calculated; this statistical technique determines how one variable moves/changes
with respect to the other variable. When we have highly correlated features in the dataset,
it increases the variance and is unreliable [54]. To isolate the highly correlated features and
the features with little correlation with the target, we used the filter method of feature
selection, which depends on the correlation coefficient to apply a threshold to remove the
features with a correlation higher than 90%.
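The filter method described above can be sketched as a pairwise-correlation threshold; a minimal NumPy illustration (the function name, toy data, and the keep-the-first tie-breaking rule are ours):

```python
import numpy as np

def drop_highly_correlated(X, threshold=0.9):
    """Filter-method feature selection: drop a feature whenever its absolute
    Pearson correlation with an already-kept feature exceeds `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], keep

# Toy data: the third column is (almost) a scaled copy of the first
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, b, 2.0 * a + 1e-3 * rng.normal(size=100)])
X_sel, kept = drop_highly_correlated(X, threshold=0.9)
print(kept)  # the near-duplicate third column is removed
```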
Figure 10. A long short-term memory block diagram structure.
Figure 12. Gated Recurrent Unit structure.
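To make the block diagram concrete, a single GRU time step can be sketched with the standard update-gate/reset-gate equations in NumPy; the weights below are random placeholders, not the paper's trained parameters:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h_prev, W, U, b):
    """One GRU time step. W, U, b hold the weights of the update gate (z),
    reset gate (r), and candidate state (h~)."""
    z = sigmoid(W["z"] @ x + U["z"] @ h_prev + b["z"])                # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h_prev + b["r"])                # reset gate
    h_tilde = np.tanh(W["h"] @ x + U["h"] @ (r * h_prev) + b["h"])    # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                           # new hidden state

rng = np.random.default_rng(1)
n_in, n_hid = 4, 8  # e.g. 4 input features, 8 hidden units (illustrative sizes)
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in)) for k in "zrh"}
U = {k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in "zrh"}
b = {k: np.zeros(n_hid) for k in "zrh"}
h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):  # run five time steps
    h = gru_step(x, h, W, U, b)
print(h.shape)  # (8,)
```

Because the new state is a convex combination of the previous state and a tanh candidate, the hidden activations stay bounded in (-1, 1).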
3.3.3. Hyperparameter Tuning for Machine Learning Models
This section will show the hyperparameters that were used in this research to obtain
the best results for the models that have been applied. In this section, we investigate the
best parameters that determine the structure of the models used to predict electrical loads,
a process called hyperparameter tuning.
Tuning is the process of choosing an optimal set of hyperparameters for the learning
algorithm [63]. The parameters used in this research are listed as follows:
• Best optimizer.
Energies 2023, 16, 2283 15 of 31
• Activation function.
• Learning rate.
• The number of epochs.
• Batch size.
• The number of hidden layers.
• Dropout.
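The seven hyperparameters listed above define a search grid that can be enumerated exhaustively; a minimal stdlib sketch (the candidate values below are illustrative, echoing the learning rates and layer counts explored in this paper rather than reproducing its exact search space):

```python
from itertools import product

# Candidate values for each tuned hyperparameter (illustrative)
grid = {
    "optimizer": ["Adam", "AdaGrad", "RMSprop", "AdaDelta"],
    "learning_rate": [0.1, 0.01, 0.001],
    "hidden_layers": [1, 2, 3],
    "dropout": [0.2],
    "batch_size": [32],
    "epochs": [50],
    "activation": ["tanh"],
}

# Every combination of the candidate values, as one config dict each
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 4 * 3 * 3 = 36 combinations to evaluate
```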
where yt is the actual data value and ytP is the predicted data value.
2. Root Mean Square Error (RMSE) is equal to the square root of the average squared
error. Equation (4) shows how to calculate RMSE.

RMSE = √((1/n) ∑ᵢ₌₁ⁿ (yt − ytP)²)    (4)
3. Mean Absolute Error (MAE) is the mean of the absolute value of the errors.
Equation (5) shows how to calculate MAE.

MAE = (1/n) ∑ᵢ₌₁ⁿ |yi − ŷi|    (5)
where:
• SSregression —The regression sum of squares (explained sum of squares).
• SStotal —The total sum of squares.
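The error metrics above translate directly into NumPy; a minimal sketch (the function name is ours, and R-squared is computed in the common 1 − SSresidual/SStotal form):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R-squared for a forecast against actual values."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
m = regression_metrics(y_true, y_pred)
print(round(m["MAE"], 3))  # 0.15
```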
4.1.1. Forecasting Using LSTM, RNN, and GRU Algorithms with Adam Optimizer
In this section, the results obtained from the LSTM, RNN, and GRU algorithms are
discussed using one and multiple hidden layers, in addition to the input and output layers,
on each model with dropout and the Adam optimizer.
Figure 13 shows the actual (blue) and predicted (orange) results in forecasting the
STLF using the LSTM, GRU, and RNN models, where the test result values were taken for
each learning rate. After comparing the test results for each learning rate in the LSTM
model, it was found that the LSTM model achieved the highest value at a learning rate of
0.1, where R-squared = 0.87239; the error rate in forecasting the loads at the peak is minimal.
The GRU model achieved the highest value at a learning rate of 0.01, where R-squared
= 0.8732; the error rate in forecasting the loads at the peak is minimal, and the results are
very close to the LSTM model. After comparing the test results for each learning rate in
the RNN model, it was found that the RNN model achieved the highest value at a learning
rate of 0.01, where R-squared = 0.86647.
Figure 13. Electricity load forecasting results for each model are based on the Adam optimizer and
one hidden layer.
Figure 14 shows the actual and predicted results of forecasting the STLF using the
LSTM, GRU, and RNN models, where the test result values were taken for each learning
rate. After comparing the test results for each learning rate in the LSTM model, it was
found that the LSTM model achieved the highest value at a learning rate of 0.01, where
R-squared = 0.8672. The GRU model achieved the highest value at a learning rate of 0.01,
where R-squared = 0.90228. The RNN model achieved the highest value at a learning rate
of 0.001, where R-squared = 0.8275. It is concluded that the GRU model achieved the
highest accuracy and the lowest error rate.
Figure 14. Electricity load forecasting results for each model are based on the Adam optimizer and
two hidden layers.
Table 3 shows the test results after applying each model based on the hyperparameters
(learning rate, number of hidden layers). The LSTM model achieved the best result when
one hidden layer and a learning rate of 0.01 were applied, where the R-squared is 0.87239.
The best result of R-squared is 0.87239 in the GRU model at two hidden layers and a
learning rate of 0.01. The RNN model obtains the best R-squared = 0.86647 at one hidden
layer and a learning rate of 0.01. All in all, the best model that achieved the lowest error
rate is GRU with a learning rate equal to 0.01 and two hidden layers.
After applying the Adam optimizer in more than one way (one hidden layer, two
hidden layers, and three hidden layers) with the machine learning models LSTM, RNN,
and GRU, it was concluded that this optimizer, applied with two hidden layers with the
GRU model, gave the best results, as the R-squared was 90.228%, the RMSE was 0.04647,
and the MAE was 0.03266.

Table 3. Result from Adam optimizer for each model.

Learning Rate   Model   MSE       R-Squared   RMSE      MAE
One hidden layer
0.01            LSTM    0.00282   0.87239     0.05310   0.03937
0.001           LSTM    0.00400   0.81900     0.06324   0.04786
0.01            GRU     0.00374   0.83063     0.06118   0.04731
0.001           GRU     0.00280   0.87323     0.05293   0.03790
4.1.2. Forecasting Using LSTM, RNN, and GRU Algorithms with AdaGrad Optimizer
In this section, the results obtained from the LSTM, RNN, and GRU algorithms are
discussed using one and multiple hidden layers, in addition to the input and output layers,
on each model with dropout and the AdaGrad optimizer.
Figure 15 shows the actual and predicted results in forecasting the STLF using the
LSTM, GRU, and RNN models, where the test result values were taken for each learning
rate. After comparing the test results for each learning rate in the LSTM model, it was
found that the LSTM model achieved the highest value at a learning rate of 0.1, where
R-squared = 0.86627. It can be seen from Figure 15 that the error rate in forecasting the
loads at the peak is minimal. The GRU model achieved the highest value at a learning rate
of 0.01, where R-squared = 0.86413; the results are very close to the LSTM model. The RNN
model achieved the highest value at a learning rate of 0.01, where R-squared = 0.86399;
the results are very close to the LSTM and GRU models.
Figure 16 shows the actual and predicted results in forecasting the STLF using the
LSTM, GRU, and RNN models, where the test result values were taken for each learning
rate. After comparing the test results for each learning rate in the LSTM model, it was
found that the LSTM model achieved the highest value at a learning rate of 0.1, where
R-squared = 0.8600. It can be seen from Figure 16 that the error rate in forecasting the loads
at the peak is very small, but there is a clear difference between the actual and forecasted
values at the bottom (where the electricity demand is minimal) in this model. The GRU
model achieved the highest value at a learning rate of 0.01, where R-squared = 0.8672. The
RNN model achieved the highest value at a learning rate of 0.01, where R-squared = 0.8587.
It can be seen from Figure 16 that there is a clear difference between the actual and
forecasted values at the bottom (where the electricity demand is minimal) in the RNN
model, and the same is the case at the peak values.
Table 4 shows the test results after applying each model based on the hyperparameters
(learning rate, number of hidden layers). The LSTM model achieved the best result when
one hidden layer and a learning rate of 0.01 were applied, where the R-squared is 0.87239.
The best result of R-squared is 0.87239 in the GRU model at two hidden layers and a
learning rate of 0.01. The RNN model obtains the best R-squared = 0.86647 at one hidden
layer and a learning rate of 0.01. All in all, the best model that achieved the lowest error
rate is GRU.
Table 4. Result from AdaGrad optimizer for each model.

Learning Rate   Model   MSE       R-Squared   RMSE      MAE
One hidden layer
0.01            LSTM    0.00295   0.86627     0.05436   0.04305
0.001           LSTM    0.00822   0.62783     0.09069   0.07237
0.01            GRU     0.00319   0.85533     0.05654   0.04119
0.001           GRU     0.00300   0.86413     0.05479   0.04042
0.01            RNN     0.00303   0.86273     0.05508   0.04251
0.001           RNN     0.00320   0.86399     0.05489   0.04171
Figure 16. Electricity load forecasting results for each model are based on the AdaGrad optimizer
and two hidden layers.
4.1.3. Forecasting Using LSTM, RNN, and GRU Algorithms with RMSprop Optimizer
In this section, the results obtained from the LSTM, RNN, and GRU algorithms are
discussed using one and multiple hidden layers, in addition to the input and output layers,
on each model with dropout and the RMSprop optimizer.
Figure 17 shows the actual and predicted results in forecasting the STLF using the
LSTM, GRU, and RNN models, where the test result values were taken for each learning
rate. After comparing the test results for each learning rate in the LSTM model, it was
found that the LSTM model achieved the highest value at a learning rate of 0.01, where
R-squared = 0.84209. It can be seen from Figure 17 that there is an error rate in the difference
between the real and forecasted loads, where the accuracy in the small loads (minimum
load) is not high. The GRU model achieved the highest value at a learning rate of 0.01,
where R-squared = 0.87749; the error rate in forecasting the loads at the peak is very low,
but there is a very small difference in the minimum loads. The RNN model achieved the
highest value at a learning rate of 0.01, where R-squared = 0.85114; the results are very
close to the GRU model.
Figure 18 shows the actual and predicted results in forecasting the STLF using the
LSTM, GRU, and RNN models, where the test result values were taken for each learning
rate. After comparing the test results for each learning rate in the LSTM model, it was
found that the LSTM model achieved the highest value at a learning rate of 0.001, where
R-squared = 0.8216. It can be seen from Figure 18 that the error rate in forecasting the loads
at the peak is high, and there is a clear difference between the actual and forecasted values
at the bottom and the peak loads (where the electricity demand is minimal) in the LSTM
model. The LSTM model was able to forecast the average loads in this case, but for the
electrical loads at the top and the small loads, the model was not able to forecast the loads,
and this led to a high error rate and a lack of accuracy for this model in this case. The GRU
model achieved the highest value at a learning rate of 0.01, where R-squared = 0.8804, and
the error rate in forecasting the loads at the peak and minimum loads is very low. The RNN
model achieved the highest value at a learning rate of 0.01, where R-squared = 0.7915;
there is a clear difference between the actual and forecasted values at the bottom (where
the electricity demand is minimal) in the RNN model, and in the last test period there is a
fluctuation in the difference between the two values.
Figure 18. Electricity load forecasting results for each model are based on the RMSprop optimizer
and two hidden layers.
Table 5 shows the test results after applying each model based on the hyperparameters
(learning rate, number of hidden layers). The LSTM model achieved the best result when
one hidden layer and a learning rate of 0.01 were applied, where the R-squared is 0.84209.
The best result of R-squared is 0.8804 in the GRU model at two hidden layers and a learning
rate of 0.01. The RNN model obtains the best R-squared = 0.85114 at one hidden layer and
a learning rate of 0.01. All in all, the best model that achieved the lowest error rate is GRU
with a learning rate equal to 0.01 and two hidden layers, and the results of the LSTM and
RNN algorithms were convergent when applying a single hidden layer.
After applying the RMSprop optimizer in more than one way (one hidden layer, two
hidden layers, three hidden layers, and different learning rates) with the machine learning
models LSTM, RNN, and GRU, it was concluded that this optimizer, applied with two
hidden layers with the GRU model and a learning rate of 0.01, gave the best results, as the
R-squared was 88.02%, the RMSE was 0.0513, and the MAE was 0.0378.
4.1.4. Forecasting Using LSTM, RNN, and GRU Algorithms with Adadelta Optimizer
In this section, the results obtained from the LSTM, RNN, and GRU algorithms are
discussed using one and multiple hidden layers, in addition to the input and output layers,
on each model with dropout and the Adadelta optimizer.
Figure 19 shows the actual and predicted results in forecasting the STLF using the
LSTM, GRU, and RNN models, where the test result values were taken for each learning
rate. After comparing the test results for each learning rate in the LSTM model, it was
found that the LSTM model achieved the highest value at a learning rate of 0.1, where
R-squared = 0.81143. It can be seen from Figure 19 that there is a difference between the
actual and expected values, especially at the peak values where the loads are the highest;
there is a large and clear difference between the expected and actual values, which led to
an increase in the error rate in this case when the loads are minimal. The GRU model
achieved the highest value at a learning rate of 0.1, where R-squared = 0.86781, and the
error rate in forecasting the loads at the peak is very small; the GRU model obtained the
best results compared to the other models. The RNN model achieved the highest value at a
learning rate of 0.1, where R-squared = 0.86348, and its results in this case are close to the
results of the GRU model.
Figure 19. Electricity load forecasting results for each model are based on Adadelta optimizer and
one hidden layer.
Figure 20 shows the actual and predicted results in forecasting the STLF using the
LSTM, GRU, and RNN models, where the test result values were taken for each learning
rate. After comparing the test results for each learning rate in the LSTM model, it was
found that the LSTM model achieved the highest value at a learning rate of 0.1, where
R-squared = 0.7262. It can be seen from Figure 20 that the error rate in forecasting the loads
at the peak is high, and there is a clear difference between the actual and forecasted values
at the bottom and the peak loads (where the electricity demand is minimal) in the LSTM
model. The LSTM model was able to forecast the average loads in this case, but for the
electrical loads at the top and the small loads, the model was not able to forecast the loads,
and this led to a high error rate and a lack of accuracy for this model in this case. The
LSTM model in this case is considered a failure, and it cannot be relied upon for forecasting
electrical loads with a good estimate because it has a large error rate and low accuracy. The
GRU model achieved the highest value at a learning rate of 0.1, where R-squared = 0.8676,
and the error rate in forecasting the loads at the peak and minimum loads is very small.
The RNN model achieved the highest value at a learning rate of 0.1, where R-squared
= 0.8426, with a slight difference between the actual and expected values in the peak loads;
especially at the end of the testing period, the difference was clear.
Table 6 shows the test results after applying each model based on the hyperparameters
(learning rate, number of hidden layers). The LSTM model achieved the best result when
one hidden layer and a learning rate of 0.01 were applied, where the R-squared is 0.81143.
The best result of R-squared is 0.86781 in the GRU model at one hidden layer and a
learning rate of 0.01. The RNN model obtains the best R-squared = 0.86348 at one hidden
layer and a learning rate of 0.01. All in all, the best model that achieved the lowest error
rate is GRU with a learning rate equal to 0.01 and one hidden layer, and the results of the
LSTM and RNN algorithms were convergent when applying a single hidden layer and a
learning rate of 0.01.
After applying the AdaDelta optimizer in more than one way (one hidden layer, two
hidden layers, three hidden layers, and different learning rates) with the machine learning
models LSTM, RNN, and GRU, it was concluded that this optimizer, applied with one
hidden layer with the GRU model and a learning rate of 0.1, gave the best results. In other
words, the results were close between the different classes with a learning rate of 0.1, and it
gave the best results, as the R-squared was 86.781%, the RMSE was 0.05405, and the MAE
was 0.04006.

Table 6. Result from AdaDelta optimizer for each model.

Learning Rate   Model   MSE       R-Squared   RMSE      MAE
One hidden layer
0.01            LSTM    0.00416   0.81143     0.06455   0.05274
0.001           LSTM    0.01577   0.28612     0.12561   0.10147
0.01            GRU     0.00292   0.86781     0.05405   0.04006
0.001           GRU     0.00959   0.56599     0.09794   0.07986
0.01            RNN     0.00301   0.86348     0.05492   0.04120
0.001           RNN     0.00586   0.73461     0.07658   0.06180
Two hidden layers

Figure 21 shows the results obtained by the machine learning models LSTM, RNN,
and GRU on two hidden layers with a learning rate of 0.01. Here we only take the best
results for comparison, as applied to the four optimizers (Adam, AdaGrad, RMSprop,
and AdaDelta). In general, the best optimizer gave the lowest percentage of MAE (the
Adam optimizer using GRU). However, the AdaDelta optimizer was the worst in terms of
the high error rate compared to the others. Figure 21 also shows that the Adam optimizer
using the GRU model was the best, where MAE = 0.03266.
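The comparison summarized in Figure 21 amounts to selecting the optimizer whose GRU model reports the lowest MAE; a trivial sketch using the figures quoted in this section (the dictionary layout is ours):

```python
# Best GRU MAE reported in this section for each optimizer
mae_by_optimizer = {
    "Adam": 0.03266,
    "AdaGrad": 0.04042,
    "RMSprop": 0.0378,
    "AdaDelta": 0.04006,
}
best = min(mae_by_optimizer, key=mae_by_optimizer.get)
print(best)  # Adam
```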
Figure 21. MAE results were obtained by LSTM, RNN, and GRU models with more than
one optimizer.
In this section, all the results obtained from the proposed deep-learning methods for
forecasting short-term electrical loads are discussed and explained. Performance metrics
were based on R-squared, MAE, RMSE, and MSE to compare the machine-learning
algorithms (LSTM, RNN, and GRU) and choose the best among them. In addition, we
applied more than one hidden layer, more than one learning rate, and four optimizers to
choose the best and obtain the lowest error rate. The GRU model obtained the best results,
where the R-squared value = 90.2%, MAE = 0.03266, and RMSE = 0.04647, when applying
the Adam optimizer and two hidden layers with a learning rate of 0.01. Many batch sizes
and numbers of epochs were tested, and the best batch size was 32 and the best number
of epochs was 50, with a training split of 70% and a test split of 30%. In Table 7 we discuss
and present the results obtained from previous studies and compare them with the results
obtained in our study, as illustrated below:
From Table 7 it can be seen that the outcomes from each study differ, based on many
factors, namely: the dataset, the features, the algorithms used, and the tuning parameters.
Overall, our study shows better results when the GRU model is applied, with an MSE of
0.00215, an RMSE of 0.04647, and an MAE of 0.03266. This is due to the use of seven
hyperparameters in tuning the models in order to obtain the best results and avoid
overfitting.
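The training setup described above (70% training, 30% testing, batch size 32) can be sketched as a chronological split; the function name and placeholder series are ours, and the split is kept time-ordered on the assumption that shuffling would leak future load values into training:

```python
import numpy as np

def chronological_split(series, train_frac=0.7):
    """Split a time-ordered load series into train/test without shuffling."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

loads = np.arange(1000, dtype=float)  # placeholder hourly load values
train, test = chronological_split(loads)
batches_per_epoch = int(np.ceil(len(train) / 32))  # batch size 32
print(len(train), len(test), batches_per_epoch)  # 700 300 22
```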
Author Contributions: Conceptualization, M.A. and A.Y.O.; methodology, M.A., M.O. and A.Y.O.;
software, M.A. and A.Y.O.; formal analysis, M.A., M.O. and A.Y.O.; investigation, M.A. and A.Y.O.;
resources, M.A.; data curation, M.A.; writing—original draft preparation, M.A.; writing—review and
editing, A.Y.O. and M.O. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding. The paper is part of the research and
collaboration activities for the UNESCO Chair in Data Science for Sustainable Development at the
Arab American University—Palestine, Chairholder Dr. Majdi Owda.
Data Availability Statement: The data and source code used in this paper can be shared with other
researchers upon a reasonable request.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Yohanandhan, R.V.; Elavarasan, R.M.; Pugazhendhi, R.; Premkumar, M.; Mihet-Popa, L.; Zhao, J.; Terzija, V. A specialized review
on outlook of future Cyber-Physical Power System (CPPS) testbeds for securing electric power grid. Int. J. Electr. Power Energy
Syst. 2022, 136, 107720. [CrossRef]
2. Azarpour, A.; Mohammadzadeh, O.; Rezaei, N.; Zendehboudi, S. Current status and future prospects of renewable and sustainable
energy in North America: Progress and challenges. Energy Convers. Manag. 2022, 269, 115945. [CrossRef]
3. Huang, N.; Wang, S.; Wang, R.; Cai, G.; Liu, Y.; Dai, Q. Gated spatial-temporal graph neural network based short-term load
forecasting for wide-area multiple buses. Int. J. Electr. Power Energy Syst. 2023, 145, 108651. [CrossRef]
4. Liu, C.-L.; Tseng, C.-J.; Huang, T.-H.; Yang, J.-S.; Huang, K.-B. A multi-task learning model for building electrical load prediction.
Energy Build. 2023, 278, 112601. [CrossRef]
5. Xia, Y.; Wang, J.; Wei, D.; Zhang, Z. Combined framework based on data preprocessing and multi-objective optimizer for
electricity load forecasting. Eng. Appl. Artif. Intell. 2023, 119, 105776. [CrossRef]
6. Jena, T.R.; Barik, S.S.; Nayak, S.K. Electricity Consumption & Prediction using Machine Learning Models. Acta Tech. Corviniensis-
Bull. Eng. 2020, 9, 2804–2818.
7. Mansouri, S.A.; Jordehi, A.R.; Marzband, M.; Tostado-Véliz, M.; Jurado, F.; Aguado, J.A. An IoT-enabled hierarchical decentralized
framework for multi-energy microgrids market management in the presence of smart prosumers using a deep learning-based
forecaster. Appl. Energy 2023, 333, 120560. [CrossRef]
8. Oprea, S.-V.; Bâra, A.; Puican, F.C.; Radu, I.C. Anomaly Detection with Machine Learning Algorithms and Big Data in Electricity
Consumption. Sustainability 2021, 13, 10963. [CrossRef]
9. Lei, L.; Chen, W.; Wu, B.; Chen, C.; Liu, W. A building energy consumption prediction model based on rough set theory and deep
learning algorithms. Energy Build. 2021, 240, 110886. [CrossRef]
10. Liu, T.; Xu, C.; Guo, Y.; Chen, H. A novel deep reinforcement learning based methodology for short-term HVAC system energy
consumption prediction. Int. J. Refrig. 2019, 107, 39–51. [CrossRef]
11. Al-Bayaty, H.; Mohammed, T.; Wang, W.; Ghareeb, A. City scale energy demand forecasting using machine learning based
models: A comparative study. ACM Int. Conf. Proceeding Ser. 2019, 28, 1–9.
12. Ahmad, T.; Chen, H.; Huang, R.; Yabin, G.; Wang, J.; Shair, J.; Akram, H.M.A.; Mohsan, S.A.H.; Kazim, M. Supervised based
machine learning models for short, medium and long-term energy prediction in distinct building environment. Energy 2018, 158,
17–32. [CrossRef]
13. Geetha, R.; Ramyadevi, K.; Balasubramanian, M. Prediction of domestic power peak demand and consumption using supervised
machine learning with smart meter dataset. Multimedia Tools Appl. 2021, 80, 19675–19693. [CrossRef]
14. Chen, C.; Liu, Y.; Kumar, M.; Qin, J.; Ren, Y. Energy consumption modelling using deep learning embedded semi-supervised
learning. Comput. Ind. Eng. 2019, 135, 757–765. [CrossRef]
15. Khan, Z.; Adil, M.; Javaid, N.; Saqib, M.; Shafiq, M.; Choi, J.-G. Electricity Theft Detection Using Supervised Learning Techniques
on Smart Meter Data. Sustainability 2020, 12, 8023. [CrossRef]
16. Kaur, H.; Kumari, V. Predictive modelling and analytics for diabetes using a machine learning approach. Appl. Comput. Inform.
2022, 18, 90–100. [CrossRef]
17. Kim, T.-Y.; Cho, S.-B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81.
[CrossRef]
18. Wang, Z.; Srinivasan, R.S. A Review of Artificial Intelligence Based Building Energy Use Prediction: Contrasting the Capabilities
of single and Ensemble Prediction Models. Renew. Sustain. Energy Rev. 2017, 75, 796–808. [CrossRef]
19. Ivanov, D.; Tsipoulanidis, A.; Schönberger, J. Global Supply Chain and Operations Management; Springer International Publishing:
Cham, Switzerland, 2017.
20. Kuster, C.; Rezgui, Y.; Mourshed, M. Electrical load forecasting models: A critical systematic review. Sustain. Cities Soc. 2017, 35,
257–270. [CrossRef]
21. Arora, S.; Taylor, J.W. Rule-based autoregressive moving average models for forecasting load on special days: A case study for
France. Eur. J. Oper. Res. 2018, 266, 259–268. [CrossRef]
22. Takeda, H.; Tamura, Y.; Sato, S. Using the ensemble Kalman filter for electricity load forecasting and analysis. Energy 2016, 104,
184–198. [CrossRef]
23. Maldonado, S.; González, A.; Crone, S. Automatic time series analysis for electric load forecasting via support vector regression.
Appl. Soft Comput. 2019, 83, 105616. [CrossRef]
24. Rendon-Sanchez, J.F.; de Menezes, L.M. Structural combination of seasonal exponential smoothing forecasts applied to load
forecasting. Eur. J. Oper. Res. 2019, 275, 916–924. [CrossRef]
25. Lindberg, K.; Seljom, P.; Madsen, H.; Fischer, D.; Korpås, M. Long-term electricity load forecasting: Current and future trends.
Util. Policy 2019, 58, 102–119. [CrossRef]
26. Hong, T.; Fan, S. Probabilistic electric load forecasting: A tutorial review. Int. J. Forecast. 2016, 32, 914–938. [CrossRef]
27. Kloker, S.; Straub, T.; Weinhardt, C.; Maedche, A.; Brocke, J.V.; Hevner, A. Designing a Crowd Forecasting Tool to Combine
Prediction Markets and Real-Time Delphi. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2017; Volume 10243,
pp. 468–473. [CrossRef]
28. Goehry, B.; Goude, Y.; Massart, P.; Poggi, J.-M. Aggregation of Multi-Scale Experts for Bottom-Up Load Forecasting. IEEE Trans.
Smart Grid 2019, 11, 1895–1904. [CrossRef]
29. Chafi, Z.S.; Afrakhte, H. Short-Term Load Forecasting Using Neural Network and Particle Swarm Optimization (PSO) Algorithm.
Math. Probl. Eng. 2021, 2021, 5598267. [CrossRef]
30. Gao, X.; Li, X.; Zhao, B.; Ji, W.; Jing, X.; He, Y. Short-Term Electricity Load Forecasting Model Based on EMD-GRU with Feature
Selection. Energies 2019, 12, 1140. [CrossRef]
31. Yuan, B.; He, B.; Yan, J.; Jiang, J.; Wei, Z.; Shen, X. Short-term electricity consumption forecasting method based on empirical
mode decomposition of long-short term memory network. IOP Conf. Ser. Earth Environ. Sci. 2022, 983, 12004. [CrossRef]
32. He, F.; Zhou, J.; Feng, Z.-K.; Liu, G.; Yang, Y. A hybrid short-term load forecasting model based on variational mode decomposition
and long short-term memory networks considering relevant factors with Bayesian optimization algorithm. Appl. Energy 2019,
237, 103–116. [CrossRef]
33. Zhou, F.; Zhou, H.; Li, Z.; Zhao, K. Multi-Step Ahead Short-Term Electricity Load Forecasting Using VMD-TCN and Error
Correction Strategy. Energies 2022, 15, 5375. [CrossRef]
34. Nasiri, H.; Ebadzadeh, M.M. Multi-step-ahead Stock Price Prediction Using Recurrent Fuzzy Neural Network and Variational
Mode Decomposition. arXiv 2022, arXiv:2212.14687.
35. Biswas, M.R.; Robinson, M.D.; Fumo, N. Prediction of residential building energy consumption: A neural network approach.
Energy 2016, 117, 84–92. [CrossRef]
36. Bendaoud, N.M.M.; Farah, N. Using deep learning for short-term load forecasting. Neural Comput. Appl. 2020, 32, 15029–15041.
[CrossRef]
37. Thokala, N.K.; Bapna, A.; Chandra, M.G. A deployable electrical load forecasting solution for commercial buildings. In
Proceedings of the 2018 IEEE International Conference on Industrial Technology (ICIT), Lyon, France, 20–22 February 2018; pp.
1101–1106.
38. Nasiri, H.; Ebadzadeh, M.M. MFRFNN: Multi-Functional Recurrent Fuzzy Neural Network for Chaotic Time Series Prediction.
Neurocomputing 2022, 507, 292–310. [CrossRef]
39. Alobaidi, M.H.; Chebana, F.; Meguid, M.A. Robust ensemble learning framework for day-ahead forecasting of household based
energy consumption. Appl. Energy 2018, 212, 997–1012. [CrossRef]
40. Fekri, M.N.; Patel, H.; Grolinger, K.; Sharma, V. Deep learning for load forecasting with smart meter data: Online Adaptive
Recurrent Neural Network. Appl. Energy 2020, 282, 116177. [CrossRef]
41. Somu, N.; MR, G.R.; Ramamritham, K. A hybrid model for building energy consumption forecasting using long short term
memory networks. Appl. Energy 2020, 261, 114131. [CrossRef]
42. Li, L.; Ota, K.; Dong, M. Everything is Image: CNN-based Short-Term Electrical Load Forecasting for Smart Grid. In Proceedings of
the 2017 14th International Symposium on Pervasive Systems, Algorithms and Networks & 2017 11th International Conference on
Frontier of Computer Science and Technology & 2017 Third International Symposium of Creative Computing (ISPAN-FCST-ISCC),
Exeter, UK, 21–23 June 2017; Volume 99, pp. 344–351. [CrossRef]
43. Shi, H.; Xu, M.; Li, R. Deep learning for household load forecasting—A novel pooling deep RNN. IEEE Trans. Smart Grid
2018, 9, 5271–5280. [CrossRef]
44. Amarasinghe, K.; Marino, D.L.; Manic, M. Deep neural networks for energy load forecasting. In Proceedings of the 2017 IEEE
26th International Symposium on Industrial Electronics (ISIE), Edinburgh, UK, 19–21 June 2017; Volume 14, pp. 1483–1488.
45. Bache, K.; Lichman, M. UCI Machine Learning Repository; University of California, School of Information and Computer Science: Irvine, CA, USA, 2013.
46. Bessani, M.; Massignan, J.A.; Santos, T.; London, J.B.; Maciel, C.D. Multiple households very short-term load forecasting using
bayesian networks. Electr. Power Syst. Res. 2020, 189, 106733. [CrossRef]
47. Gong, L.; Yu, M.; Jiang, S.; Cutsuridis, V.; Pearson, S. Deep Learning Based Prediction on Greenhouse Crop Yield Combined TCN
and RNN. Sensors 2021, 21, 4537. [CrossRef] [PubMed]
48. Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-Term Residential Load Forecasting Based on LSTM Recurrent
Neural Network. IEEE Trans. Smart Grid 2017, 10, 841–851. [CrossRef]
49. Javed, U.; Ijaz, K.; Jawad, M.; Ansari, E.A.; Shabbir, N.; Kütt, L.; Husev, O. Exploratory Data Analysis Based Short-Term Electrical
Load Forecasting: A Comprehensive Analysis. Energies 2021, 14, 5510. [CrossRef]
50. Zhang, J.; Xu, Z.; Wei, Z. Absolute logarithmic calibration for correlation coefficient with multiplicative distortion. Commun. Stat.
Comput. 2023, 52, 482–505. [CrossRef]
51. Aggarwal, C.C. Data Mining: The Textbook; Springer: Berlin/Heidelberg, Germany, 2015; Volume 1.
52. Punyani, P.; Gupta, R.; Kumar, A. A multimodal biometric system using match score and decision level fusion. Int. J. Inf. Technol.
2022, 14, 725–730. [CrossRef]
53. Vafaie, H.; De Jong, K. Genetic algorithms as a tool for feature selection in machine learning. In Proceedings of the Fourth International Conference on Tools with Artificial Intelligence (ICTAI), Arlington, VA, USA, 10–13 November 1992; pp. 200–203. [CrossRef]
54. Norouzi, A.; Aliramezani, M.; Koch, C.R. A correlation-based model order reduction approach for a diesel engine NOx and brake
mean effective pressure dynamic model using machine learning. Int. J. Engine Res. 2020, 22, 2654–2672. [CrossRef]
55. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to Forget: Continual Prediction with LSTM. Neural Comput. 2000, 12, 2451–2471.
[CrossRef]
56. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
57. Fan, C.; Wang, J.; Gang, W.; Li, S. Assessment of deep recurrent neural network-based strategies for short-term building energy
predictions. Appl. Energy 2019, 236, 700–710. [CrossRef]
58. Cho, K.; Van Merrienboer, B.; Bahdanau, D.; Bengio, Y. On the properties of neural machine translation: Encoder-decoder
approaches. arXiv 2014, arXiv:1409.1259. [CrossRef]
59. Britz, D. Recurrent Neural Network Tutorial, Part 4: Implementing a GRU/LSTM RNN with Python and Theano. WildML Blog, 2015.
60. Ravanelli, M.; Brakel, P.; Omologo, M.; Bengio, Y. Light Gated Recurrent Units for Speech Recognition. IEEE Trans. Emerg. Top.
Comput. Intell. 2018, 2, 92–102. [CrossRef]
61. Su, Y.; Kuo, C.-C.J. On extended long short-term memory and dependent bidirectional recurrent neural network. Neurocomputing
2019, 356, 151–161. [CrossRef]
62. Gruber, N.; Jockisch, A. Are GRU Cells More Specific and LSTM Cells More Sensitive in Motive Classification of Text? Front. Artif.
Intell. 2020, 3, 40. [CrossRef]
63. Veloso, B.; Gama, J.; Malheiro, B.; Vinagre, J. Hyperparameter self-tuning for data streams. Inf. Fusion 2021, 76, 75–86. [CrossRef]
64. Plevris, V.; Solorzano, G.; Bakas, N.; Ben Seghier, M.E.A. Investigation of performance metrics in regression analysis and
machine learning-based prediction models. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 13, 1–40. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.