Machine-Learning Based Methods in Short-Term Load Forecasting
ARTICLE INFO

Keywords:
Short-term load forecasting
Machine learning
Support vector machine
Long short-term memory

ABSTRACT

Short-term load forecasting is of great significance to the secure and efficient operation of power systems. However, loads can be affected by a variety of external impact factors and thus involve high levels of uncertainty, so achieving an accurate load forecast is a challenging task. This paper discusses three commonly used machine-learning methods for load forecasting, i.e., the support vector machine method, the random forest regression method, and the long short-term memory neural network method. The features and applications of these methods are analyzed and compared. By integrating the advantages of these methods, a fusion forecasting approach and a data preprocessing technique are proposed for improving the forecasting accuracy. A comparative study based on real load data is performed to verify that the proposed approach is capable of achieving a relatively higher forecasting accuracy.
1. Introduction

Nowadays, power systems incorporate an increasing amount of renewable energy (e.g., solar and wind power). The fast growth in the penetration of renewable energy has the potential to enhance the energy efficiency and economics of power systems. However, the high penetration of renewable energy also introduces additional challenges in power system scheduling and dispatch. Short-term load forecasting (STLF), which is the basis for an optimal and secure dispatch, plays a critical role in ensuring the secure and economic operation of power systems. Therefore, research efforts in recent years have focused on improving the accuracy of STLF. Past research has applied a variety of traditional methods, e.g., time-series analysis and regression analysis. But these methods typically have limitations, e.g., low accuracy, weak capability of integrating impact factors such as weather data, and low sensitivity concerning input data.

Recently, with the fast development of computer science and artificial intelligence, machine-learning technologies have been introduced into STLF. These include the artificial neural network (ANN), the support vector machine (SVM), the random forest (RF), and the long short-term memory (LSTM). Chen, the winner of the EUNITE load forecasting competition in 2001, was the first to apply the SVM method in load forecasting (Chen et al., 2004). Since then, researchers have proposed various SVM-based forecasting methods to achieve more accurate forecast results (Jiang et al., 2017; Hou et al., 2018; Chen et al., 2017; Barman et al., 2018; Lu et al., 2019; Chen et al., 2019; Hoori et al., 2020). Jiang proposed a support vector regression (SVR)-based predictor and a forecasting method based on hybrid parameter optimization to provide a high-precision and high-resolution STLF (Jiang et al., 2017). Barman proposed a regional hybrid short-term load forecasting model by considering regional climate conditions and using the SVR model and the grasshopper optimization algorithm (Barman et al., 2018). The results show that the proposed model has better accuracy than the traditional STLF model, which uses temperature as the only climate factor. The least-squares SVM, which is an extension of the standard SVM, uses nonlinear mapping to transform the quadratic optimization problem with inequality constraints in the original space into a linear system with equality constraints in the feature space, improving the convergence speed and accuracy. In 2001, Breiman was the first to introduce RF, a supervised learning algorithm based on bagging ensemble learning and random attribute subspace theory (Breiman, 2001). Later, RF algorithms have been adapted and applied in load forecasting problems, e.g., the RF
⋆
W. Guo and L. Che are with the College of Electrical and Information Engineering at Hunan University, Changsha, Hunan 410082, China. M. Shahidehpour is
with the Galvin Center for Electricity Innovation at Illinois Institute of Technology, Chicago, IL 60616, USA. Xin Wan is with the Dadu River Hydro Power
Development Company of China Guodian Corporation, Chengdu, Sichuan 610041, China.
* Corresponding author.
E-mail address: cheliang1213@163.com (L. Che).
https://doi.org/10.1016/j.tej.2020.106884
integrating multiple decision trees to form a classifier which has the advantages of strong generalization ability, fewer parameters and high prediction accuracy (Wu et al., 2015; Lahouar and Ben, 2015; Jurado et al., 2015; Uriarte et al., 2016; Li et al., 2019). Wu improved the RF method by using gray projection to propose a splitting algorithm (Wu et al., 2015); the experimental results showed that the proposed method has higher prediction accuracy and robustness than the original algorithms. Kong proposed a forecasting method based on a deep belief network while considering the impacts of temperature and humidity. The proposed method was demonstrated to have improved prediction accuracy, especially with large training sample sizes and complex impact factors (Kong et al., 2018). LSTM, first proposed by Hochreiter in 1997, has been widely used in NLP applications such as text, speech, and handwriting recognition, and machine translation (Hochreiter and Schmidhuber, 1997). LSTM is also applied in load forecasting. Kong proposed an LSTM-based load forecasting framework which is tested on a publicly available set of real residential smart-meter data (Kong et al., 2019). It showed that the proposed method has higher accuracy compared to benchmarks in the field of load forecasting.

In this paper, we review three commonly used machine-learning methods (SVM, RF and LSTM) and compare their advantages, disadvantages and applications in STLF. Then, considering the fact that a single model may not achieve a satisfying accuracy level in STLF, we propose a fusion forecasting approach together with a data preprocessing method. The proposed fusion approach integrates the merits of the aforementioned three machine-learning methods for enhancing the forecasting accuracy.

The rest of this paper is arranged as follows. Section 2 introduces data preprocessing methods. Section 3 discusses the advantages and disadvantages of SVM, RF and LSTM. Section 4 proposes a fusion prediction algorithm and evaluation indices for the accuracy of prediction results based on the analysis of Sections 2 and 3. Section 5 verifies the performance of the proposed method based on simulations on real load data. Finally, Section 6 concludes this work and discusses future research directions.

2. Data preprocessing

Due to the complex power consumption behaviors of users on the demand side, the telemetered load data may include abnormal data, which can be generally divided into bad data and distorted data. Bad data is mainly caused by the failure of meters, while distorted load data refers to a sudden drop of load, typically resulting from a change of a large industrial load in the grid. To deal with the problems caused by abnormal data, it is necessary to use proper methods to detect the abnormal data in the telemetered raw load data, and then process the identified abnormal data based on actual situations. In this section, we review and compare the commonly used abnormal data processing methods and then discuss the data preprocessing approach that will be used in this paper.

2.1. Traditional data preprocessing methods

Presently, the commonly used traditional methods for abnormal data detection include the data horizontal comparison method and the time-window mean-standard-deviation comparison method.

In general, the load data tends to be relatively stable within a short period of time and the load variation curves are similar across different time cycles. Based on this feature, the data horizontal comparison method compares the loads at adjacent moments, and the load data at a certain time instant is considered abnormal if the difference exceeds a given threshold. This method is easy to implement. However, it is difficult to set a threshold for abnormal data identification, and abnormal data at the previous or next time instant may cause a misjudgment of the data at the current instant.

To overcome this problem, the time-window mean-standard-deviation comparison method comprehensively processes the data in a broader time horizon. It avoids the burden of preselecting the detection threshold and improves the overall detection performance. After the abnormal data is detected, this method makes a further correction that first sets the abnormal data to zero and then processes the missing data using suitable data filling methods.

The commonly used data correction and filling methods are briefly introduced as follows:

(1) Longitudinal filling method: This method uses correlation analysis or the degree of correlation to apply the principle of similar-day substitution. If two or more days have similar impact factors regarding electric load, they are referred to as similar days. For a day with missing data, if there is one similar day which has data, the data in this similar day is used to fill the data for the day with missing data, and if there is more than one similar day, the average value of the data in these days is used for the filling.
(2) Regression prediction filling method: In this method, the missing data is replaced by the data that is predicted by performing regression on historical data whose time is prior to that of the missing data. Commonly used methods of this type include parametric regression, non-parametric regression and linear regression.
(3) Normal interval value filling method: This method first obtains a normal range of the available load data and then uses the average or median value of this range to replace the missing data.

Moreover, if the number of missing data samples is much smaller than the number of total samples and at the same time the dependence between the data points is significant, then it would be difficult to fill and correct the missing data. This may affect the accuracy of short-term load forecasting.

2.2. Intelligent data preprocessing methods

In recent years, with the fast development of artificial intelligence technology, researchers have adopted machine-learning algorithms in data preprocessing, e.g., the self-organizing map (SOM) neural network (Diaz et al., 2019), the empirical mode decomposition (EMD) (Huang et al., 1999), the ensemble empirical mode decomposition (EEMD) (Hong et al., 2012), the variational mode decomposition (VMD) (Ali et al., 2018), the set pair analysis, and the isolation forest method. This paper mainly focuses on the EEMD data preprocessing method, which is an improvement to the EMD method.

EMD, originally proposed by Huang in 1998, decomposes the data into a number of intrinsic mode function components in terms of their intrinsic characteristic scales (Huang et al., 1999). EMD allows the load data to be well preprocessed and thus improves the accuracy of load forecasting. However, research efforts in recent years have revealed some limitations when using the EMD method, e.g., mode aliasing, over-envelope, and under-envelope. To deal with the above limitations, Hong proposed an improved decomposition method, namely EEMD, for data preprocessing (Hong et al., 2012). Compared with the EMD, the EEMD introduces two important parameters: Gaussian white noise amplitude and primary pitch frequency. On the one hand, the Gaussian white noise has continuity across different scales and different frequencies, which can be effectively utilized to suppress the aforementioned mode aliasing. On the other hand, the introduced primary pitch frequency can prevent the decomposition from falling into over-envelope or under-envelope. Thus, EEMD is deemed a more adaptable data preprocessing method and can be used to improve the accuracy of load forecasting.

3. Common machine learning algorithms

Currently, a variety of forecasting algorithms have been used in the
W. Guo et al. The Electricity Journal 34 (2021) 106884
Table 1
Comparison of the advantages and disadvantages of machine-learning
algorithms.
SVM RF LSTM
(1) At each time instant, the LSTM unit receives the current state (x_t) and the hidden state at the previous time instant (h_{t-1}) through its three gates;
(2) Each gate receives an internal information input and the memory unit's state (c_{t-1});
(3) These gates, upon receiving the input information, handle the inputs from respective information sources (e.g., h_{t-1}, c_{t-1}) while the gates' logic function determines their activeness;
(4) Once the input information is processed by the nonlinear function at the input gate, the state of the memory cell associated with the forgetting gate is superimposed to form a new memory cell state c_t.
(5) Finally, the memory cell state (c_t) will form the LSTM unit's output (h_t) based on the nonlinear function and the dynamic control of the output gate.

The variables mentioned above interact with each other based on the following model:

$$
\begin{cases}
f_t = \sigma_g\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t = \sigma_g\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t = \sigma_g\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
c_t = f_t \cdot c_{t-1} + i_t \cdot \sigma_c\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
h_t = o_t \cdot \sigma_c\left(c_t\right)
\end{cases}
\tag{1}
$$

where x_t denotes the input to the memory cell layer at time instant t, W_f, W_i, W_o, W_c, U_f, U_i, U_o and U_c are the weight matrices, and b_f, b_i, b_o and b_c are bias vectors.

Fig. 4. Model prediction algorithm for frame fusion.

3.4. Algorithm advantages and disadvantages

The three machine-learning algorithms used for load forecasting discussed in this section, SVM, RF and LSTM, are compared based on their advantages and disadvantages in Table 1.
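Of the three algorithms, the LSTM update in Eq. (1) is the most involved, so a minimal sketch of a single step may help make the gate interactions concrete. It assumes that the gate activation σ_g is the logistic sigmoid and that σ_c is the hyperbolic tangent (a common choice, but not stated explicitly here). The function names, the parameter dictionary and the toy weights are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, assumed here for the gate activation sigma_g."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h_prev):
    """Return W x + U h_prev + b as a list with one entry per hidden unit."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h_prev))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, p):
    """Apply Eq. (1) once; p maps names such as "Wf" to parameter values."""
    f = [sigmoid(z) for z in affine(p["Wf"], p["Uf"], p["bf"], x, h_prev)]
    i = [sigmoid(z) for z in affine(p["Wi"], p["Ui"], p["bi"], x, h_prev)]
    o = [sigmoid(z) for z in affine(p["Wo"], p["Uo"], p["bo"], x, h_prev)]
    # Candidate cell content; tanh is assumed for sigma_c.
    g = [math.tanh(z) for z in affine(p["Wc"], p["Uc"], p["bc"], x, h_prev)]
    # New cell state: forget part of the old state, add the gated candidate.
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    # Hidden output: the output gate scales the squashed cell state.
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


# Toy usage: 2 input features, 1 hidden unit, arbitrary illustrative weights.
params = {
    "Wf": [[0.1, -0.2]], "Uf": [[0.3]], "bf": [0.0],
    "Wi": [[0.4, 0.1]], "Ui": [[-0.1]], "bi": [0.0],
    "Wo": [[0.2, 0.2]], "Uo": [[0.1]], "bo": [0.0],
    "Wc": [[0.5, -0.3]], "Uc": [[0.2]], "bc": [0.0],
}
h, c = lstm_step([0.5, 1.0], [0.0], [0.0], params)
```

In a forecasting setting, x_t would hold the load features at time instant t, and the returned (h, c) pair would be fed back as (h_prev, c_prev) at the next instant.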
4.1. Framework fusion model algorithm

In general, the generalization ability of SVM can alleviate, to a certain extent, the errors caused by the over-fitting of neural networks and avoid the insufficient model learning caused by inadequate input data. However, the application of SVM has some limitations. For example, SVM is only suitable for calculations with relatively small sample sizes; its forecasting accuracy may be affected when model parameters are not well defined or the load involves various types of changes. The functions of RF and LSTM can be used to overcome the limitations of the SVM model. Apparently, it is impossible to have a single type of algorithm suitable for all scenarios. Hence, the idea of a fusion method, which makes full use of the advantages of different types of algorithms, emerges naturally. The principle of the fusion is to integrate the advantages of different algorithms to construct a more adaptable prediction method. With such an idea in mind, this paper proposes a load forecasting method based on the fusion model of SVM, RF and LSTM frameworks, together with a data preprocessing method, which is shown in Fig. 4. As shown in the figure, the input data is preprocessed by the EEMD algorithm to linearize the variation of the load; then the fusion method built on SVM, RF and LSTM is used to perform the load forecast. The purpose is to use the proposed fusion method to improve the forecasting accuracy.

In this paper, a set of indices is used to assess the forecasting accuracy and to comprehensively benchmark the performance of the proposed approach against those of existing approaches. These indices include the root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE), which are formulated as follows:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_p\right)^2}
\tag{2}
$$

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - y_p\right|
\tag{3}
$$

$$
\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - y_p}{y_i}\right|
\tag{4}
$$

where n is the number of samples in the test set, and y_p and y_i are the forecasted and actual short-term loads, respectively.

5. Illustrative example

5.1. Experimental data and parameter settings

In this section, the effectiveness of the proposed forecasting approach is verified by numerical simulations based on real load data. The data set includes samples with 15-min interval and a total of 5760
sample points. The first 5664 points in the data sequence are used for training the model while the last 96 points are used for testing. The experimental environment is an Intel(R) Core(TM) i5-6500 CPU @ 3.20 GHz processor, 8 GB RAM, and the MATLAB and TensorFlow platforms.

In the decomposition of the raw input data, the parameter is set as r = 100 and the standard deviation as a = 0.2 according to the reference (Lee et al., 2012). The basics of parameter selection have been discussed in Section 2.2.

Regarding the LSTM network, the number of neurons in the input layer is equal to the number of attributes of the input data. The hidden part consists of three layers with 10, 15 and 5 neurons, respectively, and uses the rectified linear unit (ReLU) function as the activation function. The output layer is a fully-connected layer with one neuron.

5.2. Result analysis

The decomposition result is shown in Fig. 5. In this figure, IMF stands for intrinsic mode function; IMF1 through IMF5, with large changes, represent the high-frequency variation components of loads; IMF6 through IMF8 represent the low-frequency variation components; and R is an index that denotes the trend of the load variation. The process of convergence during the training is shown in Fig. 6.

Table 2
The forecasting accuracy of each algorithm.
Methods   SVM        RF         LSTM       FA
RMSE      106.4186   39.33402   20.2323    10.07455
MAPE      0.234434   0.036709   0.053074   0.028489
MAE       95.71316   36.45278   15.18708   7.599294

6. Conclusions

Load forecasting plays a critical role in power system scheduling and dispatch. Many research efforts have been put into improving the performance of power load forecasting. High-precision forecasting models can produce large economic, social and environmental benefits and ensure the operational security of power systems. Therefore, there is an urgent need to develop an adaptable and high-accuracy prediction algorithm.

Different forecasting methods, as reviewed in this paper, have different advantages when performing the forecasting. However, it is most likely that a single model cannot achieve a satisfying level of forecasting accuracy. To address this issue, a fusion forecasting model is proposed in this paper. It includes a data preprocessing method and a multi-step forecasting strategy that integrates the advantages of SVM, RF and LSTM, and thus comprehensively improves the forecasting accuracy.

Further research can focus on the following problems:

(1) This paper ignores the conflicts between using different indices to evaluate the forecasting accuracy. Future work can address the case in which the forecasting process involves multiple objectives or conflicting constraints.
(2) The forecasting can incorporate more impact factors (such as weather information, holidays, intermittent renewable energy, and electric vehicles).

Declaration of Competing Interest

The authors report no declarations of interest.
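As a supplementary illustration of the evaluation procedure, the accuracy indices of Eqs. (2)-(4) and the split of the 5760-point sequence into 5664 training points and 96 test points can be sketched as follows. This is a minimal sketch rather than the code used in the experiments: the function names and the toy load values are hypothetical, and MAPE is returned as a fraction (not multiplied by 100), which appears consistent with the magnitudes reported in Table 2.

```python
import math


def rmse(actual, forecast):
    """Root mean square error, Eq. (2)."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, forecast)) / n)


def mae(actual, forecast):
    """Mean absolute error, Eq. (3)."""
    n = len(actual)
    return sum(abs(y - p) for y, p in zip(actual, forecast)) / n


def mape(actual, forecast):
    """Mean absolute percentage error, Eq. (4), as a fraction.

    Raises ZeroDivisionError if any actual value is zero.
    """
    n = len(actual)
    return sum(abs((y - p) / y) for y, p in zip(actual, forecast)) / n


def split_train_test(series, n_test=96):
    """Hold out the last n_test points (one day at 15-min resolution)."""
    return series[:-n_test], series[-n_test:]


# Toy usage with made-up load values (not the data set of Section 5).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
scores = {
    "RMSE": rmse(actual, forecast),
    "MAE": mae(actual, forecast),
    "MAPE": mape(actual, forecast),
}

# A placeholder sequence of 5760 points, split as described in Section 5.1.
train, test = split_train_test(list(range(5760)))
```

Because MAPE divides by the actual load, it is undefined when a test sample is zero; a sequence that was zeroed during abnormal-data correction would need to be filled before this index is computed.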
References

Ali, M., Khan, A., Rehman, U., 2018. Hybrid multiscale wind speed forecasting based on variational mode decomposition. Int. Trans. Electr. Energy Syst. 28, 1–21.
Barman, M., Choudhury, D., Sutradhar, S., 2018. A regional hybrid GOA-SVM model based on similar day approach for short-term load forecasting in Assam. Energy 145 (February 15), 710–720.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32.
Chen, J., Chang, W., Lin, J., 2004. Load forecasting using support vector machines: a study on EUNITE competition 2001. IEEE Trans. Power Syst. 19 (4), 1821–1830.
Chen, Y., Xu, P., Chu, Y., et al., 2017. Short-term electrical load forecasting using the Support Vector Regression (SVR) model to calculate the demand response baseline for office buildings. Energy 195, 659–670.
Chen, H., Zhang, J., Tao, Y., Tan, F., 2019. Asymmetric GARCH type models for asymmetric volatility characteristics analysis and wind power forecasting. Prot. Control. Mod. Power Syst. 4 (4), 356–366.
Diaz, A., Lopez-Rubio, E., Palomo, J., 2019. The forbidden region self-organizing map neural network. IEEE Trans. Neural Netw. Learn. Syst. 31 (1), 201–211.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9 (8), 1735–1780.
Hong, H., Zhu, X., Su, W., et al., 2012. Detection of time varying pitch in tonal languages: an approach based on ensemble empirical mode decomposition. Journal of Zhejiang University Science C. Computers & Electronics 13 (2), 139–145.
Hoori, O., Kazzaz, A., Khimani, R., et al., 2020. Electric load forecasting model using a multi-column deep neural networks. IEEE Trans. Ind. Electron. 67 (8), 6473–6482.
Hou, K., Shao, G., Wang, H., Zheng, L., Zhang, Q., Wu, S., Hu, W., 2018. Research on practical power system stability analysis algorithm based on modified SVM. Prot. Control. Mod. Power Syst. 3 (2), 119–125.
Huang, N., et al., 1999. A new view of nonlinear water waves: the Hilbert spectrum. Annu. Rev. Fluid Mech. 31, 417–457.
Jahangir, H., et al., 2020. Deep learning-based forecasting approach in smart grids with micro-clustering and Bi-directional LSTM network. IEEE Trans. Ind. Electron. (1982), 1.
Jiang, H., Zhang, Y., Muljadi, E., et al., 2017. A short-term and high-resolution distribution system load forecasting approach using support vector regression with hybrid parameters optimization. IEEE Trans. Smart Grid 9 (4), 1-1.
Jurado, S., Nebot, À., Mugica, F., et al., 2015. Hybrid methodologies for electricity load forecasting: entropy-based feature selection with machine learning and soft computing techniques. Energy 86, 276–291.
Kong, X., Zheng, F., Cao, J., et al., 2018. Short-term load forecasting based on deep belief network. Automation Electr. Power Syst. 42 (5), 133–139.
Kong, W., et al., 2019. Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans. Smart Grid 10 (1), 841–851.
Lahouar, A., Ben, J., 2015. Day-ahead load forecast using random forest and expert input selection. Energy Convers. Manage. 103, 1040–1051.
Lee, L., Chang, C., Hsieh, Y., et al., 2012. A brain-wave-actuated small robot car using ensemble empirical mode decomposition-based approach. IEEE Trans. Syst. Man Cybern. A. Syst. Hum. 42 (5), 1053–1064.
Li, Z., Shahidehpour, M., Alabdulwahab, A., Al-Turki, Y., 2019. Valuation of distributed energy resources in active distribution networks. Electricity 32 (4), 27–36.
Lu, L., Azimi, M., Iseley, T., 2019. Short-term load forecasting of urban gas using a hybrid model based on improved fruit fly optimization algorithm and support vector machine. Energy 19 (5), 666–677.
Ronald, W., David, Z., 1989. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1 (2), 270–280.
Uriarte, R., Tiezzi, F., Tsaftaris, A., 2016. Supporting autonomic management of clouds: service clustering with random forest. IEEE Trans. Netw. Serv. Manag. 13 (3), 595–607.
Wu, X., He, J., Zhang, P., et al., 2015. Power system short-term load forecasting based on improved random forest with grey relation projection. Automation Electr. Power Syst. 39 (12), 50–55.
Xiao, B., Zhou, C., Mu, G., 2013. Review and prospect of the spatial load forecasting methods. Proc. Chinese Soc. Electr. Eng. 000 (025), 78–92.

Weilin Guo received the B.S. degree from the College of Electrical Engineering, Tibet University, Lin Zhi, Tibet, China in 2017. He is currently pursuing the Ph.D. degree with the College of Electrical and Information Engineering at Hunan University, Changsha, Hunan, China. His research interests include power system operation and planning, and applications of machine learning methods to power systems.

Liang Che (M'15) received the B.S. degree from Shanghai Jiaotong University, China, in 2006, and the Ph.D. degree from the Illinois Institute of Technology, Chicago, IL, in 2015, all in electrical engineering. He is currently a Professor with the College of Electrical and Information Engineering, Hunan University, Changsha, China. He was with the Midcontinent Independent System Operator (MISO), Carmel, IN, USA from 2016 to 2019, and with Siemens PTI, Minnetonka, MN, USA from 2015 to 2016. His research interests include power system operation and planning, and applications of machine learning methods to power systems.

Mohammad Shahidehpour (F'01) received an Honorary Doctorate degree from the Polytechnic University of Bucharest, Bucharest, Romania. He is a University Distinguished Professor, Bodine Chair Professor, and Director of the Robert W. Galvin Center for Electricity Innovation at Illinois Institute of Technology. Dr. Shahidehpour was the recipient of the IEEE PES Ramakumar Family Renewable Energy Excellence Award, IEEE PES Douglas M. Staszesky Distribution Automation Award, IEEE PES Outstanding Power Engineering Educator Award, and IEEE PES T. Burke Hayes Faculty Recognition Award for his contributions to hydrokinetics. He is a member of the US National Academy of Engineering, and Fellow of IEEE, the American Association for the Advancement of Science (AAAS), and the National Academy of Inventors (NAI).

Xin Wan is with the Guodian Dadu River Drainage Area Hydroelectricity Development Co., Ltd.