Analysis and Comparison of Machine Learning Approaches For Transmission Line Fault Prediction in Power Systems
Abstract
Transmission lines suffer from various faults arising from numerous natural as well as man-made causes. This paper presents a proposed MATLAB-Simulink model for the generation of such random disturbances. The output of the system is fed to a Python-based model in order to detect and predict the exact nature of the disturbances using various machine learning methods, each with its respective accuracy score. This paper provides a brief comparison between Decision Tree Classifier, Random Forest Classifier, Support Vector Machines, K-Nearest Neighbors, and Multi-Layer Perceptron methodologies for the detection of a line-to-ground fault, taken as an example in this model-based approach.
Key Words – Transmission Line Faults, K-Nearest Neighbors, Multi-Layer Perceptron, Support Vector Machines,
Decision Tree, Random Forest
ISSN (Print): 2456-6411 | ISSN (Online): 2456-6403 25 JREAS, Vol. 06, Issue 01, Jan 2021
iterative weights in a feed-forward neural network, so that after training and testing, the MLP captures the inherent characteristics of the training data and can act as a non-linear model of the actual system, in this case, a fault classifier. In this paper, the MLP is used to separate non-linearly separable data with a non-linear activation function, using the sklearn.neural_network module running in the Anaconda Python IDE.

1.2.4. Decision Tree (DT)

The Decision Tree is a supervised, non-parametric method used to classify feature sets by traversing decision rules across multiple nodes. In this paper, the decision tree is imported from the sklearn.tree module in the Anaconda Python IDE for experimentation.

1.2.5. Random Forest (RF)

RF is a supervised learning technique that consists of multiple decision trees with the same nodes, where every node leads to a different leaf node. A Random Forest is, in general, an ensemble of decision trees whose averaged prediction is taken as the output. Here, the Random Forest classifier is implemented using the sklearn.ensemble.RandomForestClassifier module in the Anaconda Python IDE.

Among all the power system equipment, the transmission line is the most exposed to the environment. Hence, the transmission line is more prone to faults than any other equipment, which affects its stability and operating limits. The parameters of the transmission line that vary during fault conditions are the voltage, current, and impedance of the line. Within safe operating limits, the transmission line carries the rated voltage and current. As a fault occurs in the line, the voltage and current deviate from their nominal values. These values follow a specific pattern depicting the nature of the fault when compared to the standard operating waveforms of voltage and current.

Major causes of faults include open-circuit faults and short-circuit faults. An open-circuit fault results from the uneven breakage of conductors or the false opening of circuit breakers. A short-circuit fault occurs due to physical breakage of a transmission line, loss of insulation on the line, or improper installation. Over-loading is also a catalytic factor, which leads to insulation breakdown at an early stage. In this paper, the line-to-ground fault is taken as the experimental case for the predictive models, mostly because the majority of faults occurring in transmission lines are line-to-ground in nature. Physical damage to the conductor, often due to natural causes, results in the contact of one of the three phases with the ground.
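The classifiers named in section 1.2 all come from scikit-learn. As a minimal sketch of how they could be instantiated (hyperparameters are library defaults here, since the paper does not state the values it used):

```python
# Sketch: instantiating the five classifiers compared in this paper.
# All hyperparameters are scikit-learn defaults; the paper's settings are not given.
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

classifiers = {
    "MLP": MLPClassifier(),                      # feed-forward net trained by backpropagation
    "Decision Tree": DecisionTreeClassifier(),   # supervised, non-parametric rule traversal
    "Random Forest": RandomForestClassifier(),   # ensemble of decision trees
    "SVM": SVC(),                                # hyperplane-based separation
    "KNN": KNeighborsClassifier(),               # distance-based lazy learner
}

for name, clf in classifiers.items():
    print(name, "->", type(clf).__name__)
```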
Further sections of this paper cover the simulation of a transmission line using MATLAB-Simulink under normal conditions as well as under line-to-ground fault conditions, to generate specific datasets in CSV file format. The dataset acts as the experience feature set for the respective predictive algorithms, each of which produces an accuracy score and a Root-Mean-Square Error value.
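Both evaluation quantities mentioned here can be computed directly from the true and predicted labels. A small sketch (the label vectors below are invented for illustration, not taken from the paper's dataset):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Illustrative true and predicted labels (0 = healthy, 1 = faulty).
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 1, 0, 0])

acc = accuracy_score(y_true, y_pred)              # fraction of matching labels
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # root-mean-square error

print(f"accuracy = {acc:.4f}, RMSE = {rmse:.4f}")
```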
In a faulty line the current rises remarkably above the normal current. The waveform in Fig. 5 shows the remarkable rise of current on the occurrence of a single line-to-ground fault in the transmission line. The waveform in Fig. 4 is the output of the simulation of the power system model in the no-fault condition.

A single line-to-ground fault takes place when one of the phases of the three-phase line is shorted to ground. At the time of occurrence of the fault, the impedance need not be zero, but can be a very small value relative to the line impedance.

Fig. 7: Snapshot of Training Dataset

The numeric quantities of the three-phase voltages Va, Vb, Vc and currents Ia, Ib, Ic are recorded after having been generated in both the normal and the faulty condition. The data is then tabulated and exported as a CSV file from the workspace. Snapshots of a CSV sheet holding the data in the normal and faulty conditions are shown in Fig. 6 and Fig. 7. In the training and testing datasets, the label zero signifies a healthy network and one signifies a faulty network. The data is subsequently fed into the machine-learning algorithms for training.
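The tabulate-and-export step can be sketched with pandas. The column names Va…Ic and the label column are assumptions inferred from the text (the paper's exact CSV layout appears only in Fig. 6 and Fig. 7), and synthetic values stand in for the simulated waveform samples:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10  # illustrative number of samples

# Synthetic stand-ins for the simulated three-phase voltages and currents.
data = pd.DataFrame({
    "Va": rng.normal(size=n), "Vb": rng.normal(size=n), "Vc": rng.normal(size=n),
    "Ia": rng.normal(size=n), "Ib": rng.normal(size=n), "Ic": rng.normal(size=n),
    "label": rng.integers(0, 2, size=n),  # 0 = healthy network, 1 = faulty network
})

data.to_csv("fault_dataset.csv", index=False)   # export, mirroring the workspace-to-CSV step
reloaded = pd.read_csv("fault_dataset.csv")
print(reloaded.shape)
```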
capability of self-acquisition of knowledge in real time with little supervision.

In this paper, the evaluation of the different algorithms is done by the accuracy score and the mean squared error, metrics commonly used with multi-label results, and the result is measured in percentage.
Here the accuracy can be represented as

$\mathrm{accuracy}(y, \hat{y}) = \frac{1}{n_{\mathrm{samples}}} \sum_{i=0}^{n_{\mathrm{samples}}-1} 1(\hat{y}_i = y_i)$
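This is the quantity scikit-learn's accuracy_score computes. A quick check of the formula against the library, on invented example labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y = np.array([1, 0, 1, 1, 0])      # true labels
y_hat = np.array([1, 0, 0, 1, 0])  # predicted labels

# Direct implementation of the formula: mean of the indicator 1(y_hat_i == y_i).
manual = np.sum(y_hat == y) / y.shape[0]

assert manual == accuracy_score(y, y_hat)  # matches the library implementation
print(manual)  # 4 of 5 labels agree
```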
Fig. 9: Predicted (a) and Testing (b) Labels of SVM

Fig. 9 shows the predicted and testing labels of the support vector machine, plotted after separation of the data points into different classes by a hyperplane. Prediction points for the datasets are shown as a red line and all testing data points as a blue line. Points at level '0' indicate training and testing data sets for the normal operating condition; '1' indicates training and testing datasets for fault conditions. The training dataset is fed into the support vector machine classifier, and the testing dataset was predicted by the classifier with accuracy up to 75.94%.

K Nearest Neighbor is a supervised, lazy, non-parametric learning algorithm used for predictive classification problems, producing a class membership as its output, and it uses distance for classification.

Fig. 10: Predicted (a) and Testing (b) Labels of KNN

Fig. 10 shows the predicted and testing labels of K Nearest Neighbor, plotted by assigning weights to the contributions of the neighbors, where the nearest neighbors have a greater contribution. Prediction points for the datasets are shown as a red line and all testing data points as a blue line. Points at level '0' indicate training and testing data sets for the normal operating condition; '1' indicates training and testing datasets for fault conditions. The training dataset is fed into the K Nearest Neighbor classifier, and the testing dataset was predicted by the classifier with accuracy up to 88.89%.

The Multi-Layer Perceptron (MLP) provides a nonlinear mapping between an input vector and an output vector and uses a nonlinear activation function. It employs a supervised learning technique called backpropagation for training.
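The neighbor weighting described for the KNN results in Fig. 10 corresponds to scikit-learn's weights="distance" option, under which closer neighbors contribute more to the vote. A small sketch on toy data (the single feature, the class layout, and k=3 are illustrative, not the paper's settings):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D feature (e.g. a current magnitude): small values healthy, large values faulty.
X = np.array([[0.1], [0.2], [0.3], [2.0], [2.1], [2.2]])
y = np.array([0, 0, 0, 1, 1, 1])

# weights="distance": the nearest neighbors have more contribution to the prediction.
knn = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X, y)
pred = knn.predict([[0.15], [2.05]])
print(pred)
```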
Fig. 11: Predicted (a) and Testing (b) Labels of MLP

Fig. 11 shows the predicted and testing labels of the Multi-Layer Perceptron, plotted by utilizing a nonlinear activation function and backpropagation for training. Prediction points for the datasets are shown as a red line and all testing data points as a blue line. Points at level '0' indicate training and testing data sets for the normal operating condition; '1' indicates training and testing datasets for fault conditions. The training dataset is fed into the Multi-Layer Perceptron classifier, and the testing dataset was predicted by the classifier with accuracy up to 78.53%.

3.5 Implementation of Random Forest Classifier

The Random Forest Classifier is also a supervised learning algorithm. It creates many decision trees, takes the prediction value from each of them, and selects the best result among them by voting.

Fig. 12: Predicted (a) and Testing (b) Labels of Random Forest Classifier

Fig. 12 shows the predicted and testing labels of the Random Forest Classifier, plotted by taking the mean prediction of each of the trees. Prediction points for the datasets are shown as a red line and all testing data points as a blue line. Points at level '0' indicate training and testing data sets for the normal operating condition; '1' indicates training and testing datasets for fault conditions. The training dataset is fed into the Random Forest Classifier, and the testing dataset was predicted by the classifier with accuracy up to 85.55%.

4. Analysis of Results

The five algorithms, namely the Decision Tree Classifier, Support Vector Machines classifier, K Nearest Neighbors classifier, Multi-Layer Perceptron, and Random Forest Classifier, were applied to the whole dataset by splitting it into training and testing parts. The comparison is done on the basis of the accuracy score, where K Nearest Neighbors gave the best accuracy, close to 89 percent, whereas Support Vector Machines did not perform well, producing accuracy close to 76 percent.
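The comparison described in this section can be reproduced as a loop over the five classifiers. Synthetic data and default hyperparameters stand in for the paper's dataset and settings here, so the scores from this sketch will not match the reported 76-89% range:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Synthetic stand-in for the six V/I features with a 0/1 fault label.
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

# Split the whole dataset into training and testing parts, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "MLP": MLPClassifier(max_iter=2000, random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: {scores[name]:.2%}")
```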
Table 1: Comparison of Different Machine Learning Algorithms