
Power System Fault Classification and Prediction Based On A Three-Layer Data Mining Structure

This document proposes a hybrid data mining method for power system fault classification and prediction based on a three-layer structure. The first layer uses K-means clustering to preprocess fault data and simplify data forms. The second layer uses association rules to eliminate low-impact data and mine highly correlated data for regression training. The third layer uses cross-validation to obtain optimal parameters for each fault model, then trains models using stochastic gradient descent for classification and prediction of each fault type. A case study shows this method improves accuracy over single algorithm models and provides a feasible approach for online fault prediction.


Received October 20, 2020, accepted October 21, 2020, date of publication October 28, 2020, date of current version November 16, 2020.


Digital Object Identifier 10.1109/ACCESS.2020.3034365

Power System Fault Classification and Prediction Based on a Three-Layer Data Mining Structure
YUNLIANG WANG 1, XIAODONG WANG 1, YANJUAN WU 1, AND YANNAN GUO 2


1 Tianjin Key Laboratory for Control Theory and Applications in Complicated Systems, Tianjin University of Technology, Tianjin 300384, China
2 Tianjin Tianda Qiushi Electric Power High Technology Company Ltd., Tianjin 300392, China
Corresponding author: Yanjuan Wu (wuyanjuan12@126.com)
This work was supported in part by the Tianjin Science and Technology Plan Project under Grant 18ZXYENC00100, and in part by the
State Grid Chongqing Electric Power Company Science and Technology Project Funding under Grant SGTYHT/17-JS-199.

ABSTRACT In traditional fault diagnosis methods in power systems, it is difficult to accurately classify
and predict the types of faults. With the emergence of big data technology, the fault classification and
prediction methods based on big data analysis and processing have been applied in power systems.
To make the classification and prediction of the fault types more accurate, this paper proposes a hybrid
data mining method for power system fault classification and prediction based on clustering, association
rules and stochastic gradient descent. This method uses a three-layer data mining model: The first layer
uses the K-means clustering algorithm to preprocess the original fault data source, and it proposes to use
self-encoding to simplify the data form. The second layer effectively eliminates the data that have little
impact on the prediction results by using association rules, and the highly correlated data are mined to
become the regression training data. The third layer first uses the cross-validation method to obtain the
optimal parameters of each fault model, and then, it uses stochastic gradient descent for data regression
training to obtain a classification and prediction model for each fault type. Finally, a verification example
shows that, compared with a single data mining algorithm model, the proposed method is more competitive
in terms of data mining, and the established power system fault classification and prediction model
has global optimality and higher prediction accuracy, which makes it feasible for real-time online
power system fault classification and prediction. By mining the fault data three times, this method reduces
the disturbances from low-impact or irrelevant data, and it uses cross-validation to optimize the multiple
regression parameters of the regression model, which addresses the problems of low accuracy, large errors
and easily falling into a local optimum in fault classification and prediction.

INDEX TERMS Association rules, data mining, K-means, machine learning, power system fault, stochastic
gradient descent algorithm.

The associate editor coordinating the review of this manuscript and approving it for publication was Jihwan P. Choi.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 200897
Y. Wang et al.: Power System Fault Classification and Prediction Based on a Three-Layer Data Mining Structure

I. INTRODUCTION
To ensure the reliability and stability of the power system, predicting power faults in advance and making the corresponding preventive measures can effectively prevent the occurrence of power accidents and reduce economic losses. Short-circuit faults are relatively common faults in the distribution lines of power systems, and they can easily cause other corresponding electrical faults; therefore, their hazards are large. To prevent the occurrence of short-circuit faults, the following steps can be taken: (1) combining big data knowledge with machine learning algorithms, (2) mining fault historical data to find the correlation and potential laws, and then (3) building a predictive model through training data. These steps demonstrate an important and valuable research direction.
Early power fault diagnosis methods have mainly relied on the operation of protection devices at all levels. The staff can determine the fault location based on the real-time voltage data and the status of the alarm device by patrolling and inspecting the electrical equipment. The shortcomings of this traditional diagnostic method are lower efficiency and higher cost. Thus, a new mathematical analysis model has drawn the attention of researchers, and through the improvement of equipment
functions, the control and protection ability of the power system was improved [1], [2]. Although the performance of these new mathematical analysis models and equipment is enhanced, the intelligence, interaction and automation of the equipment are not sufficient. The equipment can judge the occurrence of a fault and take protective actions for the power system in time, but it cannot predict the type of the fault, and thus, the adopted protective measures could cause protection failure due to inappropriate choices, and even enlarge the fault loss. Therefore, it is necessary to further study the prediction of the power system fault types, which will help the operators to take correct protection and remedial measures in time to minimize the fault loss.
Compared with early fault diagnosis methods, artificial intelligence diagnosis methods have been applied in the fields of fault diagnosis and prediction, such as fuzzy diagnosis methods [3], diagnosis methods based on genetic algorithms [4], [5], fault diagnosis methods using expert systems [6], [7], methods based on neural networks [8]–[10], and diagnosis methods using the support vector machine (SVM) [11], [12]. The effective use of these artificial intelligence methods has been superior to early diagnosis methods to a certain extent.
However, with a power system that generates massive amounts of data every moment, the traditional artificial intelligence diagnosis method cannot process the big data systematically, and the accuracy of the system fault diagnosis results cannot be further improved, which affects the efficiency of the diagnosis. The emergence of the data mining methods [13], [14] improved the performance of the fault diagnosis to a large extent. Data mining is a cutting-edge technology of data analysis, which can quickly obtain valuable information from various types of data. The functions are mainly the following: 1) Automatically predicting trends and behaviors. 2) Association analysis can find hidden associations in the data. 3) Clustering can enhance people’s understanding of the similarities among things. 4) Deviation detection can look for meaningful differences between the observations and the reference values. However, most of the data mining diagnosis methods are implemented using a single algorithm model. For example, one study [15] proposed a fault diagnosis method based on decision trees for vehicle test data mining. Since the decision tree ignores the correlations of the attributes in the vehicle test data set, overfitting is prone to occur. Another study [16] developed a social network analysis management framework for the industry environmental risks using association rules based on frequent patterns, which is suitable for discrete data, but it is more difficult to implement, and its performance will decrease on some data sets. Therefore, the models achieved by a single algorithm are not ideal.
Some researchers began to pay attention to the improvement and optimization of the selected algorithms [17]. Some optimization algorithms used to solve the optimal solution problem of the algorithm model have been applied, which mainly include the gradient descent method [18], Newton method [19], and the meta-heuristic algorithm [20]–[25]. Gradient descent is one of the most commonly used methods when solving for the model parameters of machine learning algorithms, especially in unconstrained optimization problems. Newton’s method solves nonlinear optimization problems with a fast convergence rate, but each iteration requires solving a complex Hessian matrix. Meta-heuristic algorithms are based on an intuitive or empirical construction, which can give a feasible solution to the problem within an acceptable calculation time and space, although the degree of deviation of the feasible solution from the optimal solution cannot necessarily be predicted in advance. However, such an algorithm cannot guarantee that the global optimal solution will be obtained, and it often falls into a local optimum on some problems. As a result, the hybrid data mining method, which combines multiple algorithms, has emerged. One study [26] proposed a power system line trip fault prediction method based on a long short-term memory (LSTM) network and SVM. Another study [27] proposed an optimized neural network fault diagnosis strategy for heating systems based on data mining, which used an association rule mining method to optimize the selection of the feature sets. A data-driven modeling method for an aeroengine aerodynamic model that combined stochastic gradient descent (SGD) and support vector regression was proposed [28]. In addition, one study [29] proposed a port cargo throughput prediction method based on an empirical mode decomposition (EMD) recurrent neural network and an adaptive grouping algorithm. Another study [30] proposed a similarity grouping-guided neural network modeling method for maritime time series prediction. The experiments on both port cargo throughput and vessel traffic flow have illustrated its superior performance in terms of prediction accuracy and robustness. It can be seen that fault diagnosis and prediction models built with hybrid data mining methods perform well and can exceed other methods.
Cluster analysis is one of the most important research branches in the field of data mining, and it classifies the clustered objects according to their own characteristics. Cluster analysis has been widely used in software engineering, machine learning, statistics, image analysis, web clustering engines and text mining. Association rules, as an inductive learning algorithm, have a strong ability to discover certain rules and associations in the data. As the representative algorithm of association rules, Apriori [31] uses a layer-by-layer search strategy to traverse the solution space. SGD is often used to train various machine learning models due to its fast learning speed and online updating [32]. When addressing big data, SGD has a small number of calculations in a single iteration, and thus, the convergence speed is significantly higher than that of other algorithms. The optimization efficiency is better than that of the classic algorithm, and therefore, the application of SGD in data regression training is extended to many different fields.
Based on the above-mentioned considerations, this paper proposes a hybrid data mining algorithm based on K-means clustering, Apriori association rules and SGD to classify


and predict power system faults. The hybrid algorithm performs three-layer mining on the fault data to establish different fault prediction models: Firstly, K-means clustering and self-encoding are used to preprocess the raw data. Then, the association rules filter the samples for the second layer of the data mining. Finally, SGD is used for the data regression training and completes the third layer of the data mining. This mining mode solves some of the current problems faced by data mining. Firstly, it reduces the interference between the complex data and avoids obtaining results from local optimization. Secondly, the complementary functions between the algorithms ensure the integrity of the data mining. Thirdly, the method adjusts parameters according to the different fault prediction models, in such a way that the fault prediction model has good robustness and fault tolerance, which can be applied to various actual fault prediction scenarios. Compared with the single algorithm model, the proposed method has greatly improved the accuracy and reliability of power system fault classification and prediction, which can be used to optimize parameters online and can be applied to different operating states.
This paper proposes an original three-layer data mining structure in which each layer has a specific data mining function and the layers cooperate with each other to complete the classification and prediction of the power system fault types. The main contributions are outlined as follows:
1) The clustering algorithm and self-encoding are used to preprocess complex source data, which classifies the source data and simplifies the form of the classified data.
2) The method uses association rules to filter the samples in advance and classifies them according to the type of fault, which increases the correlations in the data.
3) The cross-validation method finds the optimal parameters that correspond to different fault models, and then, stochastic gradient descent is used to train the fault models, which improves the accuracy of the power system’s fault prediction.
4) A multi-layer data mining model based on K-means, association rules and stochastic gradient descent is built, which improves the completeness of the data mining.
The remainder of this paper is organized as follows: the description of the problem is presented in Section II. The proposed algorithm model framework and the theory of each part are explained in Section III. Then, in the fourth section, the whole test example is introduced, and the results are verified. Finally, the fifth section concludes the study.

II. PROBLEM STATEMENT
Short-circuit faults are very common faults in power systems, which can cause large-scale power outages. When faults occur, the power protection components can decide only whether to act according to the current operating conditions, but they can fail to determine what type of fault has occurred, which affects the timely handling of the fault. Therefore, this paper uses three layers of data mining on the original data of the power system short-circuit faults, and it establishes a fault classification and prediction model (FCPM) to predict whether a fault is about to occur and to predict the type of fault that will occur.

III. FAULT CLASSIFICATION AND PREDICTION METHOD
This section introduces the structure and implementation process of the proposed method, and the mathematical model of each algorithm is introduced in detail.

A. OVERALL METHOD ARCHITECTURE
This paper proposes a fault classification and prediction method based on K-means clustering, association rules and SGD. The source samples are the node voltage data after a certain fault occurs in the power system. The fault types are mainly single-phase ground fault (SPGF), two-phase phase-phase fault (TPPF), two-phase ground fault (TPGF) and three-phase fault (TPF). After the data collection is completed, the source sample library is shown as follows: $\{AQ, G_i\}$, where $AQ = \{X_1, X_2, \ldots, X_i\}$ is the voltage data set in the source sample library, and $\{G_i\}$ is the fault type of the fault node, $G_i \in \{1, 2, 3, 4\}$, where $G_i = 1$ is SPGF, $G_i = 2$ is TPPF, $G_i = 3$ is TPGF, and $G_i = 4$ is TPF.
The overall architecture of the three-layer data mining method is shown in Fig. 1. The proposed method integrates three data mining algorithms: K-means clustering, Apriori association rules and SGD. In the process of three-layer data mining, the K-means method and self-encoding method are used to preprocess the raw data, simplify the data form, reduce the complexity of the data set, and accelerate the data processing speed. After using the Apriori algorithm to mine the data for the second time, the relevant samples are sorted out according to the fault type for regression training, which can prevent the SGD from falling into a local optimum due to using random data samples and improves the accuracy of the regression training.

B. THE FIRST LAYER OF THE DATA MINING PROCESSING METHODS AND RULES
After obtaining the source samples, the K-means clustering algorithm clusters the source samples and preprocesses the data. Moreover, a data encoding rule is proposed to encode the clustered data samples and simplify the data form, which cooperates with K-means clustering to conduct first-layer data mining and the sorting of samples to obtain sample library I. The specific methods and rules are as follows:

1) K-MEANS CLUSTERING METHOD
The K-means clustering method in this paper includes three main aspects: the Euclidean distance is used to classify the data samples; the criterion function is used to judge whether the sample clustering is completed; and the best number of clusters is determined by comparing contour coefficients.


FIGURE 1. The three-layer data mining process of the proposed method.
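
As a reading aid for Fig. 1, the source sample library {AQ, Gi} that enters the first layer could be represented in Python roughly as follows. This is a minimal sketch, not the authors' implementation: the names `FAULT_TYPES`, `build_source_library` and `group_by_fault_type`, as well as the voltage values in the usage lines, are hypothetical and invented for illustration; only the fault-type codes 1 to 4 come from the text.

```python
from collections import defaultdict

# Fault-type codes G_i as defined in the text.
FAULT_TYPES = {1: "SPGF", 2: "TPPF", 3: "TPGF", 4: "TPF"}


def build_source_library(voltage_samples, fault_labels):
    """Pair every voltage sample X_i with its fault-type code G_i."""
    if len(voltage_samples) != len(fault_labels):
        raise ValueError("each voltage sample needs exactly one fault label")
    for label in fault_labels:
        if label not in FAULT_TYPES:
            raise ValueError(f"unknown fault-type code: {label}")
    return list(zip(voltage_samples, fault_labels))


def group_by_fault_type(library):
    """Sort the samples out by fault type (hypothetical helper)."""
    groups = defaultdict(list)
    for sample, label in library:
        groups[FAULT_TYPES[label]].append(sample)
    return dict(groups)


# Hypothetical per-unit node voltages, invented for illustration only.
library = build_source_library(
    [(0.42, 0.97, 0.98), (0.40, 0.41, 0.99), (0.05, 0.06, 0.04)],
    [1, 2, 4],
)
groups = group_by_fault_type(library)
```

Keeping the label next to each sample makes it straightforward to sort the samples by fault type later, which the method does before regression training.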

a: EUCLIDEAN DISTANCE JUDGMENT METHOD
The K-means clustering method classifies the samples according to the Euclidean distance between the data sample and the center of each cluster, and each sample is classified into the cluster with the minimum Euclidean distance. The Euclidean distance is calculated as in formula (1):

$$d(X, Y^j) = \sqrt{(x_1 - y_1^j)^2 + (x_2 - y_2^j)^2 + \cdots + (x_n - y_n^j)^2} = \sqrt{\sum_{i=1}^{n} (x_i - y_i^j)^2} \tag{1}$$

where $X = (x_1, x_2, \ldots, x_n)$ is any unclassified sample in n-dimensional space that corresponds to the elements (the voltage data on the non-faulty node) in the AQ of the source sample library. $Y^j = (y_1^j, y_2^j, \ldots, y_n^j)$ is the center of the jth cluster. When classifying the samples for the first time, any sample can be randomly selected as the cluster center.

b: CRITERION FUNCTION
The average of all samples in each cluster is used to update the cluster center, and the criterion function is used to determine whether the cluster center stops updating. The criterion function is to minimize the sum of the squared errors between the samples in the cluster and the cluster center, which is shown in formula (2):

$$\min \sum_{j=1}^{K} \sum_{x_i^j \in X^j,\; y_i^j \in Y^j} \left(x_i^j - y_i^j\right)^2 \tag{2}$$

where $Y^j$ is the jth cluster center, $y_i^j$ is the ith element of $Y^j$, $K$ is the number of clusters, $X^j$ is any sample in the jth cluster, and $x_i^j$ is the ith element of $X^j$.
When the criterion function of formula (2) converges, which is when the cluster center does not change significantly, the cluster center stops updating. At this time, the sample classification into K clusters is completed.

c: THE CONTOUR COEFFICIENT
To obtain the optimal number of clusters in K-means clustering in the first-layer data mining, the contour coefficients (commonly known as silhouette coefficients) of different numbers of clusters are calculated. Then, by comparing those contour coefficients, the number of clusters with the largest contour coefficient is found to be the optimal number of clusters. For each sample x of a cluster, the contour coefficient is calculated as follows, ending with formula (3):
(1) First, the cluster cohesion $\alpha_k$ is calculated (the average distance from x to all other points in the cluster to which it belongs).
(2) Then, the separation degree $b_k$ between the cluster and the other clusters is calculated (the average distance between x and all points that are not in the same cluster).
(3) Lastly, the contour coefficient $S_k$ is calculated (the difference between $b_k$ and $\alpha_k$ is divided by the larger of the two).

$$S_k = \frac{b_k - \alpha_k}{\max(b_k, \alpha_k)} \tag{3}$$

The value of the contour coefficient is in the range [−1, 1]. The closer $S_k$ is to 1, the better the sample matches its own cluster. The average value of the contour coefficients of all samples is used as the contour coefficient under the current cluster number K. The larger the contour coefficient is, the farther the distance between the clusters, and the better the classification effect. Therefore, the K value with the largest contour coefficient


is taken to be the optimal number of clusters for the source sample library.
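
Formulas (1) to (3) can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions rather than the authors' code: the function names and the toy points are hypothetical, and samples for which formula (3) is undefined (a cluster with a single member, or no points outside the cluster) are assumed to score 0. The separation degree follows the wording used here, averaging over all points outside the sample's cluster, whereas the standard silhouette coefficient uses only the nearest neighbouring cluster.

```python
import math


def euclidean(x, y):
    """Formula (1): Euclidean distance between a sample and a cluster center."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def criterion(clusters, centers):
    """Formula (2): sum of squared errors between samples and their center."""
    return sum(
        euclidean(x, center) ** 2
        for members, center in zip(clusters, centers)
        for x in members
    )


def contour_coefficient(clusters):
    """Average of S_k from formula (3) over all samples."""
    scores = []
    for k, members in enumerate(clusters):
        # All points that are not in the same cluster as the sample.
        outside = [p for j, c in enumerate(clusters) if j != k for p in c]
        for idx, x in enumerate(members):
            inside = [p for i, p in enumerate(members) if i != idx]
            if not inside or not outside:
                scores.append(0.0)  # assumption: undefined cases score 0
                continue
            alpha = sum(euclidean(x, p) for p in inside) / len(inside)
            b = sum(euclidean(x, p) for p in outside) / len(outside)
            larger = max(alpha, b)
            scores.append(0.0 if larger == 0 else (b - alpha) / larger)
    return sum(scores) / len(scores)


# Hypothetical toy data: the same four points grouped well and badly.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
good_score = contour_coefficient(good)
bad_score = contour_coefficient(bad)
```

On this toy data the compact grouping receives a contour coefficient close to 1, while the grouping that mixes distant points receives a negative one, which is the behaviour the selection of K relies on.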

2) K-MEANS CLUSTERING RULES
The rules for clustering AQ in a source sample library
using the K-means clustering method are shown in Fig. 2:
To obtain the optimal number of clusters, the enumeration
method is used to increase K from 2. When the number of
clusters is K , the clustering rules are described as follows:
1) Firstly, K samples are randomly selected as the initial
cluster center.
2) According to formula (1), the distance between each
sample and the center of each cluster is calculated,
and each sample is classified into the cluster with the
minimum Euclidean distance.
3) The average of all samples in each cluster is taken
as the new cluster center, and the criterion function
is calculated to determine whether the minimum is
reached. If the minimum of the criterion function is not
reached, then return to 2). This process will be repeated
until the criterion function of formula (2) reaches the
minimum.
4) The contour coefficient under the current value of K is
calculated according to formula (3) and compared with
the coefficients of the clusterings already completed.
If a maximum of the contour coefficient has appeared,
the value of K that corresponds to it is taken as the
cluster number, and the clustering of AQ is completed.
Otherwise, the value of K is updated, and the process
returns to 1) to continue the clustering.
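
The four rules above can be sketched as a short, self-contained Python routine. This is an illustrative sketch only, with several assumptions that are not in the text: K is enumerated from 2 up to a hypothetical upper bound `k_max` instead of stopping as soon as a maximum appears, a value of K that leaves a cluster empty is skipped, samples for which formula (3) is undefined score 0, and the six one-dimensional samples are invented.

```python
import math
import random


def dist(x, y):
    """Euclidean distance of formula (1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(samples, k, rng, max_iter=100, tol=1e-9):
    """Rules 1) to 3): cluster the samples for one value of K."""
    centers = rng.sample(samples, k)  # rule 1): random initial centers
    previous = float("inf")
    clusters = []
    for _ in range(max_iter):
        # Rule 2): assign each sample to the nearest center.
        clusters = [[] for _ in range(k)]
        for x in samples:
            nearest = min(range(k), key=lambda j: dist(x, centers[j]))
            clusters[nearest].append(x)
        # Rule 3): the cluster mean becomes the new center.
        for j, members in enumerate(clusters):
            if members:
                centers[j] = tuple(sum(v) / len(members) for v in zip(*members))
        sse = sum(dist(x, centers[j]) ** 2
                  for j, members in enumerate(clusters) for x in members)
        if previous - sse <= tol:  # criterion of formula (2) stopped decreasing
            break
        previous = sse
    return clusters


def contour(clusters):
    """Mean contour coefficient of formula (3), as worded in the text."""
    scores = []
    for k, members in enumerate(clusters):
        outside = [p for j, c in enumerate(clusters) if j != k for p in c]
        for idx, x in enumerate(members):
            inside = [p for i, p in enumerate(members) if i != idx]
            if not inside or not outside:
                scores.append(0.0)  # assumption: undefined cases score 0
                continue
            alpha = sum(dist(x, p) for p in inside) / len(inside)
            b = sum(dist(x, p) for p in outside) / len(outside)
            larger = max(alpha, b)
            scores.append(0.0 if larger == 0 else (b - alpha) / larger)
    return sum(scores) / len(scores)


def best_k(samples, k_max, seed=0):
    """Rule 4): enumerate K from 2 and keep the largest contour coefficient."""
    rng = random.Random(seed)
    best = None
    for k in range(2, k_max + 1):
        clusters = kmeans(samples, k, rng)
        if any(not members for members in clusters):
            continue  # assumption: skip a K that leaves a cluster empty
        score = contour(clusters)
        if best is None or score > best[1]:
            best = (k, score, clusters)
    return best


# Hypothetical one-dimensional samples forming two obvious groups.
samples = [(0.0,), (1.0,), (2.0,), (100.0,), (101.0,), (102.0,)]
k, score, clusters = best_k(samples, k_max=4)
```

Because K-means starts from random centers, a practical implementation would usually repeat the clustering several times for each K and keep the run with the smallest criterion value; the sketch omits this for brevity.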
When the source samples are clustered in the case of the different fault types, the optimal K values of the clusters on the different nodes are different. All of the source samples on the different nodes must go through the above process to determine their respective optimal K and complete the clustering of the source samples on every node. After the source samples are clustered, the samples in each cluster have some similarities.

FIGURE 2. The clustering preprocessing process of the samples.

3) SAMPLE SELF-CODING AND CODING LIBRARY FORMULATION RULES
Although the samples of each cluster after clustering have a certain similarity, the data form is not simple enough to handle. Therefore, after the AQ of the source sample library is clustered, self-encoding is performed on the classified samples to simplify the data form. To keep the important attribute information of the encoded data, such as the node that the sample belongs to and the cluster that the sample belongs to, the rules are formulated as follows: For each sample in the AQ after clustering, the node (T) to which it belongs is queried first, and then, the cluster (W) that it belongs to is queried, and the final coding form is T0W. For example, if T0W is 103, the sample is the voltage data sample classified into the third cluster on the first node, where T ∈ N (N is the set of natural numbers, and T is the number of a node other than the faulty node), 0 < W ≤ K and W ∈ N. The coding rules are shown in Fig. 3:

FIGURE 3. Self-encoding rules.

AQ is recorded as BQ after the clustering and the self-encoding. After the source samples are clustered and self-encoded, the sample data have a concise form, which is easier to manage.
After processing the source samples through K-means clustering and self-encoding, the source sample library $\{AQ, G_i\}$ is transformed into the sample library I $\{BQ, G_i\}$, and the first-layer data mining is completed. It digs out the inner connections of the unlabeled different data samples in


the source samples, makes the data samples in the same cluster as relevant as possible, and prepares for the second-layer data mining.

C. THE SECOND LAYER OF THE DATA MINING PROCESSING METHODS AND RULES
Because there are some potential laws between the voltage at the node and the fault types in the power system, the association rules are used in the second-layer data mining to find the samples that are highly correlated with a certain fault type. Training the FCPM with these highly correlated samples will greatly improve the accuracy of the FCPM.

1) APRIORI ASSOCIATION RULE METHOD
The Apriori algorithm is an association rule algorithm that is based on mining frequent item sets: the elements in BQ and $G_i$ in sample library I are correspondingly combined into a whole sample library M: $\{Z_1, Z_2, \ldots, Z_i\}$, and each row of the sample library M is taken as a sample group. The association rules for frequent item sets are used to find the association between two or more samples in the sample group. By calculating the support, the confidence, and the lift of these frequent item sets, the correlation degree between the samples is measured, and the non-empty sets that meet the requirements of the support, the confidence and the lift are selected.
Assuming that $Z_x$ and $Z_y$ are non-empty sets of M, the support, the confidence and the lift are calculated as follows:

a: SUPPORT
Support is the probability of $Z_x$ and $Z_y$ appearing simultaneously.

$$\mathrm{Support}(Z_x \rightarrow Z_y) = P(Z_x \cap Z_y) \tag{4}$$

b: CONFIDENCE
Confidence is the probability that $Z_y$ appears at the same time when $Z_x$ appears.

$$\mathrm{Confidence}(Z_x \rightarrow Z_y) = \frac{P(Z_x \cap Z_y)}{P(Z_x)} \tag{5}$$

c: LIFT
Lift is the ratio of the probability that $Z_y$ appears given that $Z_x$ appears to the overall probability that $Z_y$ appears.

$$\mathrm{Lift}(Z_x \rightarrow Z_y) = \frac{P(Z_x \cap Z_y)}{P(Z_x)\,P(Z_y)} \tag{6}$$

2) THE SECOND-LAYER MINING RULES BASED ON APRIORI ASSOCIATION RULE
The Apriori association rule method is used to conduct the second-layer data mining of BQ in the sample library I:
1) Firstly, the minimum support and the minimum confidence are set, and the sample library M is scanned to find all of the frequent N item sets. (N increases from 1.)
2) The candidate N+1 item sets are found by connecting and pruning based on the frequent N item sets (N + 1 = 2, 3, ...).
3) By scanning the sample library M, all of the non-empty sets in the candidate N+1 item sets whose support exceeds the minimum support are found as the frequent N+1 item sets.
4) If the frequent N+1 item sets are empty sets, then the confidence and the lift of the rules composed of all of the frequent item sets are calculated, and the rules that meet the minimum confidence and that have a lift greater than 1 are found to be the strong association rules. Otherwise, return to 2) to search the higher order frequent item sets.
The sample sets that satisfy the strong association rules constitute the association library; then, all of the sample sets related to $G_i$ are extracted, and the samples are sorted out according to the fault types. These samples form the sample library II: $\{CQ_j, G_j\}$, where $G_j$ is the jth fault type, and $CQ_j$ is the set of strong association samples that corresponds to $G_j$. The difference between the source sample library, the sample library I and the sample library II is as follows: the source sample library and the sample library I are the same in their dimension and in the number of samples, and the form of the samples in the source sample library is standardized through clustering preprocessing and self-encoding to form sample library I. After association mining, the association library obtained from sample library I is very large, but only the samples related to $G_i$ are extracted to form sample library II, and thus, the data size of sample library II is much smaller than that of the complete association library.
After the association rules mining, the samples are highly correlated in their attributes, and the information associated with the fault types is stored, which is helpful for mining valuable results during the SGD data regression training. In this way, the result deviation caused by data redundancy is avoided, and the performance and accuracy of the regression analysis are improved.

D. THE THIRD LAYER OF THE DATA MINING PROCESSING METHODS AND RULES
After the first two layers of data mining, K-means clustering and Apriori association rules have mined the strong correlation samples that correspond to the different types of power system faults. The third layer of the data mining uses these strong association samples of sample library II to establish the FCPM for each fault type, and it achieves the goal of fault classification and prediction. To accelerate the prediction speed and further improve the prediction accuracy, the cross-validation method is used to obtain the optimal parameters in each fault prediction model. Then, the SGD obtains the solution of the optimal parameters for each fault prediction model by performing regression training on the strong association samples. The specific description is as follows:


1) FAULT CLASSIFICATION AND PREDICTION MODEL BASED ON STOCHASTIC GRADIENT DESCENT
SGD is an iterative optimization algorithm that is often used to solve and optimize the model parameters of machine learning algorithms. SGD is a variant of the gradient descent algorithm, which has been successfully applied to text classification [33] and large-scale sparse machine learning problems in natural language processing [34], [35]. The gradient is the vector composed of the partial derivatives of a multivariate function with respect to its unknown parameters. When all of the partial derivatives in the gradient are 0, the optimal solution of the model parameters can be obtained. SGD uses only one sample per iteration. When processing large-volume samples, only a small number of samples need to be used to iterate the model parameters to obtain the optimal solution. Therefore, SGD has the advantage of a fast training speed.

a: PREDICTION MODEL FUNCTION
Given sample library II: $\{CQ_j, G_j\}$, assuming that the weight coefficients of the samples at each node are linear, a linear model function is obtained:
$$f(CQ_j) = w^{T} CQ_j + b \tag{7}$$
where w is the model parameter vector, and b is the intercept. $w^{T} CQ_j$ is the inner product of $CQ_j$ and w.

b: THE PARAMETER OPTIMIZATION METHOD BASED ON SGD OF THE FAULT PREDICTION MODEL
i) LOSS FUNCTION
The loss function is used to estimate the difference between the actual value $G_j$ and the model predicted value $f(CQ_j)$ that corresponds to the sample, which is expressed by $L(G_j, f(CQ_j))$. This article uses the following two loss functions: the SVM type loss function is shown in formula (8), and the logistic regression type loss function is shown in formula (9):
Hinge: equivalent to SVM classification:
$$L(G_j, f(CQ_j)) = \max\bigl(0,\ 1 - G_j f(CQ_j)\bigr) \tag{8}$$
Log: equivalent to logistic regression:
$$L(G_j, f(CQ_j)) = \log\bigl(1 + \exp(-G_j f(CQ_j))\bigr) \tag{9}$$

ii) RISK FUNCTION
The risk function is the expectation of the loss function, and it is also called the empirical risk:
$$E_r = \frac{1}{n} \sum_{j=1}^{n} L(G_j, f(CQ_j)) \tag{10}$$
Although the objective is to minimize the empirical risk, learning from historical data with complex functions could lead to overfitting of the prediction results. Therefore, the structural risk is used to avoid overfitting:
$$S_r = \alpha R(w) \tag{11}$$
where α is a hyperparameter. By setting α to reduce the parameter scale, the purpose of model simplification is achieved, which means that the model has better generalization ability. The regular term R(w) is used to measure the complexity of the model, and it limits the parameters of the model. The regular terms R(w) mainly include L1 regularization and L2 regularization:
$$L1 = \sum_{j=1}^{m} |w_j| = \|w\|_1 \tag{12}$$
$$L2 = \frac{1}{2} \sum_{j=1}^{m} w_j^{2} = \frac{1}{2}\|w\|_2^{2} \tag{13}$$
where L1 regularization can produce a sparse weight matrix, which can be used for feature selection. L2 regularization can prevent the model from overfitting by reducing the weight coefficients. To a certain extent, L1 can also prevent overfitting, but the effect is not as good as L2.

c: THE OPTIMIZED OBJECTIVE FUNCTION
The smaller the empirical risk and structural risk are, the better the model fit; as a result, the final objective optimization function is
$$\min:\ E(w, b) = \frac{1}{n} \sum_{j=1}^{n} L(G_j, f(CQ_j)) + \alpha R(w) \tag{14}$$
SGD considers one set of training samples at a time to approximate the true gradient of the objective optimization function. For each set of samples, the model parameters are updated by the update rule given by formula (15):
$$w \leftarrow w - \eta \left( \alpha \frac{\partial R(w)}{\partial w} + \frac{\partial L(G_j,\ w^{T} CQ_j + b)}{\partial w} \right) \tag{15}$$
where η is the learning rate, which controls the step size in the parameter space. To prevent the parameter w from oscillating near the solution, η is decreased according to the following formula (16):
$$\eta(t) = \frac{1}{\alpha (t_0 + t)} \tag{16}$$
where t is the time step, and $t_0$ is the initial step size, which is the same as the initial value of the weight by default; additionally, α and t jointly affect the learning rate.

d: K-FOLD CROSS-VALIDATION PARAMETER OPTIMIZATION METHOD
The K-fold cross-validation method is used to find the optimal parameter group (loss function L, hyperparameter α, regular term R(w) and iteration number N); then, the optimal model parameter w is solved by the iterative calculation of SGD. The strong correlation samples that correspond to a certain fault type in sample library II are used to train the parameter group, and the solution with the highest cross-validation score under the fault type is regarded as the optimal solution of the parameter group. The optimal value of the parameter group and its cross-validation scores that correspond to different faults are different. The method steps are as follows:
1) Firstly, all of the samples in the jth sample set (CQj , Gj )
are divided into K parts in equal proportion, and each
part is used as the cross-validation set; the other K -1
parts are used as the training set.
2) After completing the cross-validation K times, the
average of the correct rate over K times for the
cross-validation results is used as the cross-validation
score.
3) By comparing the cross-validation scores, the parame-
ter group (L, R, α, N ) with the highest score is selected
as the optimal parameter set.
4) The optimal model parameter w is solved by substituting the optimal parameters (L, R, α, N ) into formulas (14) and (15) and iterating.
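The four steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `fit` and `predict` callables, the helper names, and the toy threshold classifier that stands in for an SGD model are all hypothetical, and the samples are split in their given order without shuffling.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for K nearly equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def cross_val_score(fit, predict, X, y, k):
    """Average correct rate over the K validation folds (steps 1 and 2)."""
    scores = []
    for train, val in k_fold_splits(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in val)
        scores.append(correct / len(val))
    return sum(scores) / k


def select_best(param_groups, make_fit, predict, X, y, k):
    """Return (score, parameter group) with the highest score (step 3)."""
    scored = [(cross_val_score(make_fit(p), predict, X, y, k), p)
              for p in param_groups]
    return max(scored, key=lambda pair: pair[0])


# Toy stand-in model: the "parameter group" is a single decision threshold.
def make_fit(threshold):
    def fit(X_train, y_train):
        return threshold  # nothing is learned in this toy example
    return fit


def predict(model, x):
    return 1 if x > model else -1


X = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.15, 0.85]
y = [-1, -1, -1, -1, 1, 1, 1, 1, -1, 1]
best_score, best_params = select_best([0.2, 0.5, 0.8], make_fit, predict, X, y, k=5)
print(best_score, best_params)
```

In the setting of the paper, each parameter group would be a tuple (L, R, α, N) and `fit` would run SGD on the training folds; step 4 then retrains with the selected group.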

2) THE THIRD-LAYER DATA MINING RULES BASED ON THE STOCHASTIC GRADIENT DESCENT METHOD
The SGD optimization algorithm performs third-layer data
mining on sample library II, as shown in Fig. 4:
1) Firstly, a prediction model function is established.
2) The K -fold cross-validation method optimizes the four
parameters (L, R, α, N ) of the SGD under the prediction
model function.
3) By comparing the cross-validation scores, the optimal
parameter set is determined.
4) The training set is retrained under the optimal parame-
ter group to obtain the FCPM.
5) Finally, the test set is used to test the performance of
the FCPM.
The optimal model parameter w can be obtained through
the optimal loss function and the optimal regular terms; then,
the fitting law of the samples in CQj to the fault result in Gj is
found, in such a way that the optimization model can classify
and predict the faults from the new data.
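To make the update rule of formula (15) and the learning rate of formula (16) concrete, the sketch below trains the linear model of formula (7) with the log loss of formula (9) and L2 regularization. It is an illustration rather than the authors' implementation: the toy samples, the default values of `alpha` and `t0`, the unregularized intercept and the function names are all assumptions.

```python
import math
import random


def learning_rate(alpha, t0, t):
    """Decaying step size of formula (16): eta(t) = 1 / (alpha * (t0 + t))."""
    return 1.0 / (alpha * (t0 + t))


def log_loss_slope(g, f):
    """Derivative of log(1 + exp(-g * f)) with respect to f, computed stably."""
    z = g * f
    if z > 0:
        e = math.exp(-z)
        return -g * e / (1.0 + e)
    return -g / (1.0 + math.exp(z))


def sgd_train(samples, labels, alpha=0.1, t0=10.0, n_epochs=50, seed=0):
    """One-sample-at-a-time SGD for the log loss with an L2 regular term."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    t = 0
    order = list(range(len(samples)))
    for _ in range(n_epochs):
        rng.shuffle(order)
        for j in order:
            x, g = samples[j], labels[j]
            eta = learning_rate(alpha, t0, t)
            f = sum(wi * xi for wi, xi in zip(w, x)) + b  # formula (7)
            slope = log_loss_slope(g, f)
            # Formula (15) with R(w) = 0.5 * ||w||^2, so dR/dw = w.
            w = [wi - eta * (alpha * wi + slope * xi) for wi, xi in zip(w, x)]
            b -= eta * slope  # assumption: the intercept is not regularized
            t += 1
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Made-up, linearly separable samples with labels in {-1, +1}.
X = [[2.0, 1.0], [1.0, 2.0], [2.0, 2.0], [-1.0, -2.0], [-2.0, -1.0], [-2.0, -2.0]]
G = [1, 1, 1, -1, -1, -1]
w, b = sgd_train(X, G)
print(w, b, [predict(w, b, x) for x in X])
```

Swapping in the hinge loss of formula (8) or an L1 regular term only changes the two derivative terms inside the update line.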

E. ALGORITHMIC MODEL EVALUATION: CONFUSION MATRIX AND ROC CURVE
Model evaluation makes the quality of the model easier to judge from the corresponding indicators. The confusion matrix and the ROC curve are used in this article to evaluate the results.

FIGURE 4. The third-layer data mining based on SGD.

1) CONFUSION MATRIX
The confusion matrix is also called the probability table or the error matrix. It is a specific matrix that is used to visualize the performance of the algorithm. The calculation formulas of the overall model accuracy of the FCPM, the precision of each fault type, the recall rate, and the F1 score are as follows:
Assuming that the test sample set has a total of S samples:
1) Accuracy: the ratio between the number of correct predictions and the total number of predictions:
$$\mathrm{Accuracy} = \frac{TP + TN}{S} \tag{17}$$
2) Precision: the ratio of the number of true positives to the total number of true and false positives:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{18}$$
3) Recall: the ratio of the number of true positives to the total number of true positives and false negatives:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{19}$$
4) F1: the harmonic mean of the precision and the recall:
$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{20}$$
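Formulas (17) to (20) can be evaluated for one fault type by treating that type as positive and all other types as negative. The sketch below assumes a confusion matrix whose rows are the actual fault types and whose columns are the predicted ones; the matrix values and the function name are made up for illustration and are not results from the paper.

```python
def one_vs_rest_metrics(cm, k):
    """Accuracy, precision, recall and F1 for class index k.

    cm[i][j] is the number of samples of actual class i predicted as class j.
    Assumes class k has at least one true positive, so no division by zero.
    """
    total = sum(sum(row) for row in cm)      # S in formula (17)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                     # actual k, predicted as another class
    fp = sum(row[k] for row in cm) - tp      # another class, predicted as k
    tn = total - tp - fn - fp
    accuracy = (tp + tn) / total             # formula (17)
    precision = tp / (tp + fp)               # formula (18)
    recall = tp / (tp + fn)                  # formula (19)
    f1 = 2 * precision * recall / (precision + recall)  # formula (20)
    return accuracy, precision, recall, f1


# Hypothetical 4-class confusion matrix (four fault types, 10 samples each).
cm = [
    [8, 1, 1, 0],
    [1, 7, 1, 1],
    [0, 2, 8, 0],
    [1, 0, 0, 9],
]
acc, prec, rec, f1 = one_vs_rest_metrics(cm, 3)
print(acc, prec, rec, f1)
```

For the last class in this made-up matrix, TP = 9, FN = 1, FP = 1 and TN = 29, which gives an accuracy of 0.95 and a precision, recall and F1 of 0.9 each.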

The multi-class classification confusion matrix of the model is converted into a binary classification confusion matrix to calculate the above indicators. Each type of fault is considered separately from the other three types of fault. The three-phase fault ($G_j = 4$) is taken as an example:

TABLE 1. The meaning of TP, TN, FN and FP in the confusion matrix.

where TP, TN, FN, and FP in formulas (17), (18) and (19) are the numbers of samples that meet the above conditions.
According to these performance indicators of the FCPM, it can be compared with other methods to find the advantages and disadvantages of the method's performance.

2) ROC CURVE
The Receiver Operating Characteristic (ROC) curve is an important and common model evaluation method to judge the classification results. The ROC space defines the false positive rate (FPR) as the X axis and the true positive rate (TPR) as the Y axis.
TPR: the rate of being correctly judged to be positive among all of the actually positive samples.
$$TPR = \frac{TP}{TP + FN} \tag{21}$$
FPR: the rate of being falsely judged to be positive among all of the actually negative samples.
$$FPR = \frac{FP}{FP + TN} \tag{22}$$
Given a classification model, a coordinate point (X=FPR, Y=TPR) can be calculated from the true and predicted values of all of the samples. In a model, the coordinates (FPR, TPR) under different thresholds are drawn in the ROC space, which becomes the ROC curve of the specific model.

F. STATISTICAL TEST AND ALGORITHM TIME COMPLEXITY
To judge the significance of the results, a statistical test is added to the discussion. In addition, to assess the effectiveness of the proposed method, the time complexity and the computational running time are also discussed.

1) F TEST
The statistical test method used in this paper is the F test, which tests the overall significance of the linear regression equation. The multiple variables in the model are used to judge the significance of the impact, and the following hypotheses are constructed:
$$H_0:\ \beta_1 = \beta_2 = \ldots = \beta_n = 0 \tag{23}$$
$$H_1:\ \exists\, i \in \{1, 2, \ldots, n\},\ \text{s.t.}\ \beta_i \neq 0 \tag{24}$$
Then, the test statistics are constructed as follows:
$$y_i = \beta x_i + \varepsilon_i \tag{25}$$
where $x_i$ is the sample vector, $y_i$ is the predicted value vector, β is the variable coefficient, and $\varepsilon_i$ is the difference between the average value of a single sample and the average value of the overall sample.
Regression sum of squares:
$$SSR = \sum_{i=1}^{n} (y_i - y_a)^2 \tag{26}$$
Sum of squared residuals for regression:
$$SSE = \sum_{i=1}^{n} (y_i - y)^2 \tag{27}$$
Then, the F statistic is constructed:
$$F = \frac{SSR / p}{SSE / (n - p - 1)} \quad (F \geq 0) \tag{28}$$
where y is the actual value that corresponds to the sample vector, $y_a$ is the average value of y, p is the degree of freedom, and n is the number of samples in a small set extracted from the sample library.
The F value is used to test and measure the overall significance level of the model. When the F statistic is close to zero, it indicates that the null hypothesis $H_0$ holds, which means that the overall significance level of the model is low. The larger the F statistic is, the higher the significance level of the model, which indicates that the model fits well and the model is built successfully.

2) BIG O NOTATION
The more statements that are executed in the algorithm, the more time it takes for the computation. The number of executions of a statement in an algorithm is called the time frequency, which is denoted as V(n), where n is the number of samples. If there is an auxiliary function f(n) such that when n approaches infinity, the limit value of V(n)/f(n) is a constant that is not equal to zero, then f(n) is said to be a function of the same magnitude as V(n), and thus, it is denoted as V(n) = O(f(n)), which is called the time complexity.
The calculation method is called Big O notation, whose derivation rules are as follows: 1) O(1) represents the time complexity of all constant functions. 2) The time complexity of other functions retains only the highest order, and its coefficient is 1.

G. REALIZATION PROCESS OF HYBRID ALGORITHM PREDICTION METHOD
The flowchart of the method's implementation is shown in Fig. 5:

IV. EXAMPLES
The calculation examples in this section are compiled and run in the Jupyter Notebook of Anaconda with the help of some scikit-learn toolkit functions in the Windows 7 environment.


FIGURE 5. Overall method flow.

A. SIMULATION ESTABLISHMENT
The U.S. Western Power Grid WSCC 9 bus system is taken as an example, and some parameters were modified to establish a simulation model of the power system. The classification and prediction of common short-circuit faults in power grids is studied. As shown in Fig. 6, the fault simulation model is established using PSCAD, which is electromagnetic transient simulation software [36]. Firstly, the node load and network parameters of the IEEE 9 bus system were set, where node 8 is set as the faulty node, and a universal meter on each node was installed to obtain the real-time voltage data. Then, the occurrence of the faults was controlled through time fault logic, to ensure the timeliness of addressing faults in practice, and the time for the occurrence of a fault is set to 0.2 s. The control type is external control: a dial was set that can change the fault type by manual interactive control, and the number on the dial corresponds to a certain fault. For example, if the value of the dial is 1, it corresponds to SPGF. The dial was linked to the control panel user interface, which changes the fault type by changing the digital position on the control panel. When a type of fault occurs, the voltage data are collected separately. This simulation mainly collects the data within the visible range of the waveform graph. Faults in multiple time periods are set, and the time of the voltage sample collection is the same after all of the faults.

B. DATA COLLECTION AND PREPROCESSING
When different fault types occur at 8 nodes, the voltage data on the other nodes are collected. After the simulation is completed, the waveform is obtained, and the voltage data are recorded in the data table. The calculation example is shown in Fig. 6, where more than 20,000 sets of data on the different fault types are used as source sample data to train the different fault models. When the three-phase short-circuit fault occurs at node 8, the voltage waveforms of node 7 and node 8 are shown in Fig. 7. The horizontal axis of the figure is the time, and the vertical axis is the voltage value.
The first-layer clustering preprocessing randomly selects K as the initial number of the classification clusters, and after the criterion function converges, the contour (silhouette) coefficient method is used to find the best K value of the samples on each node. When four types of short-circuit faults occur at 8 nodes, all of the voltage data samples at node 2 in the voltage data source sample library are extracted, and then, they are clustered according to the clustering process mentioned in section III.B. Finally, the contour coefficient method is used to match the best K value of the voltage data samples at node 2, and the result is shown in Fig. 8. It can be seen that when K = 10, the contour coefficient Sk is the local maximum, and therefore, the K value of the cluster number of the voltage data samples at node 2 is set to 10, and all of the voltage data samples are divided into 10 clusters. The voltage data samples at other nodes are also divided into the most suitable number of clusters according to the above method, where the data samples in each cluster have large similarity after clustering.
After the data preprocessing of all nodes is completed, the classified data samples are encoded through encoding rules. For example, when the amplitude of the voltage data at node 2 ranges from 108.26 to 226.82, the optimal K value is 10, and thus, the voltage data samples are divided into 10 clusters. The sample value ranges of each cluster are [108.26,113.45], [113.46,151.40], [151.41,164.36], [164.37,177.85], [177.86,182.45], [182.46,185.66], [185.67,188.49], [188.50,192.15], [192.16,206.17], and [206.18,226.82]. When the voltage of a certain classified data sample is 158.60, the value falls in the third cluster of the voltage range at node 2, and thus, the data sample is recorded as 203. The voltage data samples at the other nodes are also encoded in the same way to obtain sample library I. The self-encoding form of some samples in sample library I is shown in Table 2:

C. ASSOCIATION MINING AND REGRESSION TRAINING
After the source samples are preprocessed by clustering, the Apriori algorithm is used to mine the association rules for the data samples of sample library I: Firstly, the minimum support is set to 0.3, and the minimum confidence is set to 0.7. The sample sets in sample library I whose support degree is greater than 0.3 are determined to be the frequent itemsets. Then, all of the frequent itemsets that meet the conditions of the confidence being greater than 0.7 and the lift being greater

than 1 are screened out as samples of the strong association rules. The mining result of the frequent itemsets is partially listed in Table 3:

FIGURE 6. IEEE 9 bus system fault simulation model.

FIGURE 7. The voltage waveform of node 7 and node 8.

FIGURE 8. The best K of the voltage samples of node 2 (three-phase fault).

The brackets and arrows in the second column of Table 3 represent that the preceding event could cause the subsequent event to occur. The corresponding values of the support degree, the confidence degree, and the lift degree are given in the third, fourth and fifth columns, respectively. For example, the strong association rule of {101,206,906}→{3} in the seventh row and second column of Table 3 is that when the voltage data at node 1 are in the first cluster, the voltage at node 2 is in the sixth cluster, and the voltage at node 9 is in the sixth cluster, a two-phase ground fault could occur at the faulty node. All of the sample groups with strong association rules are sorted from sample library II according to the type of faults. The self-encoding form of those samples in sample library II that are highly related to single-phase ground faults is shown in Table 4:
Before the training on sample library II, to avoid the sensitivity of the proposed method to the parameter value, the 10-fold cross-validation method is used to optimize the parameters of L, R, α in formula (14) and hyperparameter N, and the model is retrained by the training set under the optimal parameters to obtain the optimal model; finally, the model result is verified by the test set. The specific implementation is as follows:
1) All of the samples of a certain fault in sample library II are subjected to 10-fold cross-validation. All of the

samples are divided into 10 parts, each of which is used as a cross-validation set in turn, and the other 9 parts are used as a training set. The samples were trained 10 times in total. The parameters with the highest cross-validation score are taken as the optimal parameters.
2) Sample library II is divided into training set A and test set B.
3) Training set A retrains the model under the optimal parameters.
4) Finally, test set B tests the model and obtains the results.

TABLE 2. The self-encoding form of some samples in sample library I.

TABLE 3. Partial rules mined from the frequent itemsets.

TABLE 4. The self-encoding form of some samples of sample library II.

To prove that the regression training accuracy of the sample library after the clustering and association rule mining is higher than after only clustering (without mining by association rules), sample library I is also subjected to 10-fold cross-validation. For example, in the parameter optimization process of the single-phase short-circuit fault model, after each group of parameters is substituted into the SGD algorithm program, the cross-validation scores of sample library I and sample library II that correspond to these solutions are listed, as shown in Table 5. Then, the cross-validation scores are compared, and the optimal solution of the parameters (L, R, α, N ) of each fault model is selected. It can be seen that the cross-validation score is the highest at 0.556 during the regression of sample library I, and the corresponding optimal parameters group is (Log, L1, 0.1, 1000). During the regression of sample library II, the cross-validation score is the highest at 0.788, and the corresponding optimal parameters group is (Log, L2, 0.1, 500). In addition, the cross-validation scores of sample library II are higher than those of sample library I under the same parameter groups.
It can be seen from Table 5 that under the mathematical model of SGD, the optimal loss functions of sample library I and sample library II both selected logistic regression. Because logistic regression requires less classification computation and fewer storage resources, the training data can be quickly integrated into the model. Compared with SVM, it is easy to obtain the probability scores of the samples. For the optimal regularization items, sample library I chooses L1, because the features between the samples are not obvious in sample library I, and thus, the features must be sparse, which reduces the number of weight parameters and the complexity of the model. Sample library II selects L2, which reflects that the features between the samples in sample library II have a certain similarity after the association rules, and therefore, the complexity of the model is reduced only by reducing the value of the weight. In addition, L2 can also be combined with logistic regression to solve multicollinearity problems. Both choices for α are 0.1, which reflects the same degree of simplification of the parameter scale. For the optimal number of iterations N, sample library I iterated 1000 times, while sample library II iterated only 500 times. It can be seen that the cross-validation training of sample library II has a short convergence time, fast fitting speed, and lower model complexity.
To further explain that the data mining process using the clustering and association rules is more accurate than that using only clustering, training set A in sample library I and training set A in sample library II are used to retrain the model separately under each group of parameters, and the SGD test

scores of each group of parameters are compared. As shown in Fig. 9(a), it can be seen that sample library I obtains the highest SGD test score at the 20th group of the parameters (Log, L1, 0.1, 1000), which is 0.71. Sample library II obtains the highest SGD test score at the 13th group of the parameters (Log, L2, 0.1, 500), which is 0.82. Furthermore, the parameters of the other fault models have also been optimized, and the SGD test scores of sample library I and sample library II are obtained; the results are shown in Fig. 9(b), 9(c) and 9(d). It can be seen from Fig. 9 that under the same parameters, the SGD test score of sample library II is always higher than that of sample library I. This finding occurs because the parameter optimization directly uses SVM or SGD or other algorithms in sample library I, which is slow and can easily fall into a local optimum. However, the samples in sample library II, which were processed by the clustering and association rules, are more closely related to each other, and thus, the parameter optimization speed is faster, and the best classification point or the best classification line or the best classification surface will be found accurately. Moreover, the proposed method optimizes the unknown parameters in advance by adopting the cross-validation method, which greatly accelerates the training speed of the fault prediction model and improves the accuracy of the model.

TABLE 5. Cross-validation scores of the sample library I and the sample library II in the single-phase short-circuit fault model.

Through the obtained optimal parameter set (L, R, α, N ), the optimal solutions of the model parameters w of different fault types are solved by SGD iterations according to formulas (14) and (15), which are shown in Table 6, where a positive value indicates a positive correlation that makes the variable and the dependent variable change in the same direction; a negative value indicates a negative correlation that makes the variable and the dependent variable change in the opposite direction.

TABLE 6. The model parameter w of the different fault types.

D. TEST SET VERIFICATION
To measure the performance of the fault prediction model obtained by the proposed method, the test set of the source sample library is selected to test the model. There are 2658 groups of samples, which are randomly selected from the source sample library to participate in the test, and the confusion matrix is used to evaluate the accuracy of the test results and the precision of each fault type. The result of the confusion matrix of the test is shown in Fig. 10, in which the predicted and actual value distributions of each fault type can be found, where each row represents the real fault type, and each column represents the fault type predicted by the model. The darker the color is, the larger the number of samples.
The test results of the test set show that the overall accuracy of the model is 93.8%. The precision of each fault model is the following: the single-phase ground fault is 94.7%, the two-phase phase-phase fault is 91%, the two-phase ground fault is 95.4% and the three-phase fault is 93%. Therefore, the accuracy of the FCPM of sample library II under the optimal parameters is high, and the prediction precision of each fault model is also high.

E. STATISTICAL TEST AND COST EFFECTIVENESS OF THE PROPOSED METHOD
Because the classification prediction model is a multiple linear regression model, the F test mentioned in section III.F.1) is used to test the significant difference of the model and whether the selection of multiple parameters in the model is appropriate. 100 sets of samples were selected for the F test, and 8 non-faulty nodes decided the degree of freedom p = 8.


FIGURE 9. SGD test scores of the different fault models. (a) Single-phase ground fault model. (b) two-phase phase-phase fault
model. (c) two-phase ground fault model. (d) three-phase fault model.

TABLE 7. The variance analysis of the regression equation.

FIGURE 10. Confusion matrix for various types of faults.

The variance analysis of the regression equation is shown in Table 7:
It is calculated that F = 87.58. Referring to the F-value table with a significance level of 0.05 and a confidence level of 95%, when p = 8, Fp = 1.94, so F > Fp. Therefore, there are significant differences between the variables and dependent variables in this model, and the model is constructed reasonably.
To measure the computational complexity of the model algorithm, the Big O notation method mentioned in section III.F.2) is used to calculate the time complexity of the algorithm. The time complexity of each method is shown in Table 8:

TABLE 8. The time complexity of each method.

The time complexity of the overall model is $O(n^2)$, and the commonly used time complexities from small to large are $O(1) < O(n) < O(n \log n) < O(n^2) < O(n^3) < O(2^n) < O(n!)$; thus, the time complexity of the overall model is feasible. In addition, the overall running time of the program is 3.7 s. The overall computational cost of the proposed method is not large, which guarantees the timeliness of the fault classification and prediction.

F. METHODS COMPARISON AND VERIFICATION
The proposed method is compared with other methods to
accomplish its algorithm verification, which further illus-
trates the effectiveness and scalability of the proposed
method. The regression algorithms involved in the compari-
son include logistic regression [37], SVM [38], [39], random
forest [40], [41], SGD and the proposed method. The con-
structed model is shown in Fig. 11(a), and the test scores of
the performance indexes are shown in Fig. 11(b).
To compare the performances of several algorithms, the
ROC curve of each algorithm is shown in Fig. 12. The larger
the area under the ROC curve (AUC) is, the better the per-
formances of the algorithms. In addition, the F1 scores, the
precision and the recall rate of the algorithms are shown in
Fig. 13.
FIGURE 11. Model building and the scores of each evaluation index.

As can be seen from Fig. 12, the AUC values of the
proposed method, logistic regression, random forest, SVM,
and SGD are 0.921, 0.822, 0.782, 0.753, 0.633, respectively,
which proves that the proposed method has the best classifi-
cation and prediction effect.
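For readers who want to see how an AUC value of this kind is obtained, the sketch below builds the ROC points (FPR, TPR) of formulas (21) and (22) by sweeping a threshold over classifier scores and integrates them with the trapezoidal rule. The labels and scores are invented for illustration, the function names are assumptions, and nothing here reproduces the values reported in Fig. 12.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from (0, 0) to (1, 1), one per distinct threshold.

    labels use 1 for the positive class and any other value for the negative class.
    """
    pos = sum(1 for g in labels if g == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for g, s in zip(labels, scores) if s >= thr and g == 1)
        fp = sum(1 for g, s in zip(labels, scores) if s >= thr and g != 1)
        points.append((fp / neg, tp / pos))  # formulas (22) and (21)
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores: a higher score means "more likely positive".
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
area = auc(points)
print(points, area)
```

In this toy case 8 of the 9 positive-negative pairs are ranked correctly, so the area is 8/9, which matches the pairwise interpretation of the AUC.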
In addition, it can be seen from Fig. 13 that the perfor-
mance index scores of all of the algorithms sorting from the
largest to the smallest are as follows: the proposed method,
logistic regression, random forest, SVM and SGD. Before
the parameter optimization and the association rules, the F1
value, the precision rate and the recall rate of the model are
64.5%, 66.6%, and 67.2%, respectively. After the parameter
optimization and the association rules, the F1 value, the pre-
cision rate and the recall rate of the model are 90.9%, 93.8%,
and 91.2%, respectively. The experimental results show that
compared with logistic regression, SVM and random forest,
the performance indexes of the proposed method are significantly better. The reason is that there is a common problem with SVM, random forest and logistic regression: When faced with a small amount of data and ambiguous features, they cannot classify or perform regression very well, and thus, the performance will be defective. However, the proposed method clarifies the features in advance through clustering and association rules. The parameter optimization makes the model results better, and thus, the performance indexes are better than the above three algorithms. In addition, the proposed method also proves that the performance index scores of the SGD when directly used for regression are not high. However, after the source samples are processed by the clustering preprocessing and association rules, SGD after parameter optimization is used to train the processed samples again, and the performance index scores are significantly improved. The reason is that the loss function used by SGD each time is only determined by a small batch of data, and the loss function is different from the real complete set loss function; thus, the gradient of its solution also contains a certain degree of randomness. At a saddle point or local minimum point, it will oscillate and jump, and thus, the result is that the prediction accuracy is not high. Applying clustering and association rules to filter the samples in advance can reduce the irrelevance between small batches of data and reduce the oscillation. Therefore, the performance indexes of SGD are improved based on the fast data training speed of the clustering preprocessing and association rules.

FIGURE 12. Comparison of the ROC curve of the different regression algorithms.

FIGURE 13. The F1 score, precision and recall of the algorithm model.

G. PRACTICAL APPLICATIONS
To prove the scalability of the proposed method, a short-circuit fault classification and prediction test is carried out with the help of a power grid cooperation project in a certain urban area distribution network. The distribution network has 8 generators, 12 transformers, and 56 data collection points. According to the method in the article, in the distribution network, more than 20,000 sets of voltage data on the different fault types are used as source sample data; the amount of the source sample data and the types of the fault are the same as in the WSCC 9 bus system. In this experiment, the voltage data on the 56 collection points in the distribution network affect the fault types together, the data at each

test is completed, the accuracy of the fault classification, the runtime and the computational cost of two systems are shown in Table 9:

TABLE 9. Comparison of fault classification accuracy, runtime and computational cost in the different systems.

Among them, B stands for billion, which is the unit of the number of the times the computer runs. It can be seen from the table that, compared with the WSCC 9 bus system, the accuracy of the fault classification results has little change, being 93.8% and 89.2% respectively. The computational cost is 0.235B, which indicates that the proposed method is computationally efficient enough to be used in practical applications. It is concluded that with the increase of the system nodes, the runtime and the computational cost will increase, which is because the runtime and the computational cost are related to the complexity of the actual system, such as the number of nodes, the number of line branches, etc. Therefore, it can be predicted that in a more complex actual system, the computational cost will be greater.

V. CONCLUSION
Through multiple data mining methods, including clustering, association rules, cross-validation optimization and SGD, the proposed method identifies the data samples that are strongly related to the specific faults and determines the potential laws for building a more accurate fault classification and prediction model. Moreover, through the existing operating data, the proposed method can predict which type of fault will occur soon, and it plays an important role in the classification and prediction of fault types. For the specific fault models, the proposed method uses an optimization algorithm to determine the optimal parameters of the fault models after clustering and association mining of the training samples; then, the fault models are obtained from training samples under the optimal parameters, and thus, the effect of the classification and prediction is better than other methods mentioned in this paper.
The proposed method processes the source data in advance, avoiding the low accuracy of the fault classification and prediction model due to the low-impact or irrelevant data,
collection point have been processed by the algorithm model and it can realize the fault classification and prediction of
for a total of 210 times, therefore, the computational cost the power system in time and accurately. Otherwise, the pro-
of the classification and prediction model in the distribution posed method can be widely applied to the fault classification
network is 56 × 20000 × 210 = 0.235B. In addition, the and prediction of various busbars, transformers, transmis-
computational cost of the classification and prediction model sion lines in the power system and the classification and
in the WSCC 9 bus system is 9 × 20000 × 78 ≈ 0.014B. This prediction of the other systems that involve multi-attribute
experiment was also done on a personal computer. After the classification. In addition, it can also be extended to medical

200912 VOLUME 8, 2020


Y. Wang et al.: Power System Fault Classification and Prediction Based on a Three-Layer Data Mining Structure

disease prediction, electronic communication fault detection and other fields.

However, the proposed method has some limitations. The test on the distribution network in an urban area shows that the proposed method cannot achieve rapid, real-time fault prediction in practical applications, and its timeliness needs to be further improved. In addition, this paper considers only the voltage data of each node in the same time period under different fault conditions, without further analyzing the high-dimensional data samples composed of the voltage data of each node, the current data of each branch, and time. Therefore, future research will focus on practical engineering applications of multi-dimensional power system data for fault classification and prediction.
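The computational-cost figures quoted for the two test systems (56 × 20000 × 210 and 9 × 20000 × 78) can be reproduced with a short sketch. The function and parameter names below are illustrative assumptions, not names from the paper; the sketch only restates the product of collection points, sample sets, and processing passes per point, and reports it in billions (B).

```python
BILLION = 1_000_000_000


def computational_cost(points: int, samples: int, passes: int) -> int:
    """Return points x samples x passes, the product used in the text.

    `points`, `samples`, and `passes` are hypothetical parameter names for
    the number of collection points (or buses), the number of sample sets,
    and the number of times each point's data is processed by the model.
    """
    return points * samples * passes


# Figures quoted in the text for the two systems.
distribution = computational_cost(points=56, samples=20_000, passes=210)
wscc9 = computational_cost(points=9, samples=20_000, passes=78)

print(f"distribution network: {distribution / BILLION:.3f}B")
print(f"WSCC 9 bus system:    {wscc9 / BILLION:.3f}B")
print(f"cost ratio:           {distribution / wscc9:.1f}x")
```

Under this counting, the cost grows with both the number of collection points and the number of passes per point, which is consistent with the statement that a more complex system incurs a greater computational cost.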
APPENDIX

The nomenclature of the article:

Terms                                                    Abbreviations
support vector machine                                   SVM
long-short term memory                                   LSTM
stochastic gradient descent                              SGD
empirical mode decomposition                             EMD
fault classification and prediction model                FCPM
single-phase ground fault                                SPGF
two-phase phase-phase fault                              TPPF
two-phase ground fault                                   TPGF
three-phase fault                                        TPF
structural risks                                         SR
empirical risk                                           ER
loss function                                            L
receiver operating characteristic                        ROC
true positive                                            TP
true negative                                            TN
false positive                                           FP
false negative                                           FN
true positive rate                                       TPR
false positive rate                                      FPR
regression sum of squares                                SSR
sum of squared residuals for regression                  SSE
area under receiver operating characteristic curve       AUC
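The confusion-matrix abbreviations in the nomenclature (TP, TN, FP, FN, TPR, FPR) are the building blocks of the precision, recall and F1 score reported for the algorithm model. The sketch below uses the standard textbook definitions, which are assumed here to match the ones used in the paper. The class name, function names and example counts are hypothetical and are not taken from the paper's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts for one fault type (one-vs-rest)."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def precision(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Recall, identical to the true positive rate (TPR)."""
    return _ratio(c.tp, c.tp + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    return _ratio(c.fp, c.fp + c.tn)


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return _ratio(2 * p * r, p + r)


# Hypothetical counts for illustration only.
example = ConfusionCounts(tp=90, tn=880, fp=10, fn=20)
print(f"precision={precision(example):.3f}")
print(f"recall (TPR)={recall(example):.3f}")
print(f"FPR={false_positive_rate(example):.3f}")
print(f"F1={f1_score(example):.3f}")
```

Sweeping a decision threshold and plotting TPR against FPR at each setting gives the ROC curve, and the area under that curve is the AUC listed in the nomenclature.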


YUNLIANG WANG received the M.S. degree in power systems and automation from Tianjin University, Tianjin, China, in 1988. He is currently a Professor with the School of Electrical and Electronic Engineering, Tianjin University of Technology. His current research interests include intelligent control, data mining, multi-motor coordinated control, microcomputer control, and power electronics technology.

XIAODONG WANG was born in Handan, Hebei, China. In 2018, he joined the School of Electrical and Electronic Engineering, Tianjin University of Technology, where he is currently a Graduate Student. His major is electrical engineering. His main research interests include smart grid, artificial intelligence, data mining, and power system fault prediction.

YANJUAN WU received the M.S. and Ph.D. degrees in power systems and automation from Tianjin University, Tianjin, China, in 2005 and 2013, respectively. She is currently an Associate Professor with the School of Electrical and Electronic Engineering, Tianjin University of Technology. Her current research interests include intelligent control, data mining, smart grids, and grid optimization and control.

YANNAN GUO was born in Chifeng, Inner Mongolia. He received the master's degree in electrical and electronic engineering from the Tianjin University of Technology. He currently works with Tianjin Tianda Qiushi Electric Power High Technology Company Ltd. His main research interests include power system automation, smart grid, and power fault detection.
