Analysis and Comparison of Machine Learning Approaches For Transmission Line Fault Prediction in Power Systems
Abstract
Transmission lines suffer from various faults arising from numerous natural as well as man-made causes. This paper presents a proposed MATLAB-Simulink model for the generation of such random disturbances. The output of the system is fed to a Python-based model in order to detect and predict the exact nature of the disturbances using various machine learning methods, each with its respective accuracy score. This paper provides a brief comparison between Decision Tree Classifier, Random Forest Classifier, Support Vector Machines, K-Nearest Neighbors, and Multi-Layer Perceptron methodologies for the detection of a line-to-ground fault, taken as an example in this model-based approach.
Key Words – Transmission Line Faults, K-Nearest Neighbors, Multi-Layer Perceptron, Support Vector Machines,
Decision Tree, Random Forest
ISSN (Print): 2456-6411 | ISSN (Online): 2456-6403 25 JREAS, Vol. 06, Issue 01, Jan 2021
iterative weights in a feed-forward neural network, so that after training and testing, the MLP captures the inherent characteristics of the training data and can act as a non-linear model of the actual system, in this case, a fault classifier. In this paper, the MLP is used to separate non-linearly separable data with a non-linear activation function, using the sklearn.neural_network module running in the Anaconda Python IDE.

1.2.4. Decision Tree (DT)

The Decision Tree is a supervised, non-parametric method used to classify feature sets by traversing decision rules across multiple nodes. In this paper, the decision tree is imported from the sklearn.tree module in the Anaconda Python IDE for experimentation.

1.2.5. Random Forest (RF)

RF is a supervised learning technique that consists of multiple decision trees with the same nodes, where every node leads to a different leaf node. A Random Forest is, in general, an ensemble of decision trees whose averaged prediction is taken as the output. Here, the Random Forest classifier is implemented using the sklearn.ensemble.RandomForestClassifier module in the Anaconda Python IDE.

Among all the power system equipment, the transmission line is the most exposed to the environment. Hence, the transmission line is more prone to faults than any other equipment, which affects its stability and operating limits. The parameters of the transmission line that vary during fault conditions are the voltage, current, and impedance of the line. Within safe operating limits, the transmission line carries the rated voltage and current. As a fault occurs in the line, the voltage and current deviate from their nominal values. These values follow a specific pattern depicting the nature of the fault when compared to the standard operating waveforms of voltage and current.

Major causes of faults include open-circuit faults and short-circuit faults. An open-circuit fault results from the uneven breakage of conductors or the false opening of circuit breakers. A short-circuit fault occurs due to physical breakage of a transmission line, loss of insulation on the line, or improper installation. Over-loading is also a catalytic factor, which leads to insulation breakdown at an early stage. In this paper, the line-to-ground fault is taken as the experimental case for the predictive models, mostly because the majority of faults occurring in transmission lines are line-to-ground in nature. Physical damage to the conductor, often due to natural causes, results in the contact of one of the three phases with the ground.
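The classifiers named in section 1.2 all come from scikit-learn. As a minimal sketch of how they could be instantiated (hyperparameters are library defaults here, since the paper does not state the values it used):

```python
# Sketch: instantiating the five classifiers compared in this paper.
# All hyperparameters are scikit-learn defaults; the paper's settings are not given.
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

classifiers = {
    "MLP": MLPClassifier(),                      # feed-forward net trained by backpropagation
    "Decision Tree": DecisionTreeClassifier(),   # supervised, non-parametric rule traversal
    "Random Forest": RandomForestClassifier(),   # ensemble of decision trees
    "SVM": SVC(),                                # hyperplane-based separation
    "KNN": KNeighborsClassifier(),               # distance-based lazy learner
}

for name, clf in classifiers.items():
    print(name, "->", type(clf).__name__)
```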
Further sections of this paper cover the simulation of a transmission line using MATLAB-Simulink under normal conditions as well as under line-to-ground fault conditions, to generate specific datasets in CSV file format. The dataset acts as the experience feature set for the respective predictive algorithms, each of which produces an accuracy score and a Root-Mean-Square Error value.
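Both evaluation quantities mentioned here can be computed directly from the true and predicted labels. A small sketch (the label vectors below are invented for illustration, not taken from the paper's dataset):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Illustrative true and predicted labels (0 = healthy, 1 = faulty).
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 1, 0, 0])

acc = accuracy_score(y_true, y_pred)              # fraction of matching labels
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # root-mean-square error

print(f"accuracy = {acc:.4f}, RMSE = {rmse:.4f}")
```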
In a faulty line the current rises remarkably above the normal current. The waveform in Fig. 5 shows the remarkable rise of current on the occurrence of a single line-to-ground fault in the transmission line. The waveform in Fig. 4 is the output of the simulation of the power system model in the no-fault condition.

A single line-to-ground fault takes place when one of the phases of the three-phase line is shorted to ground. At the time of occurrence of the fault, the impedance need not be zero, but can be a very small value relative to the line impedance.

Fig. 7: Snapshot of Training Dataset

The numeric quantities of the three-phase voltages Va, Vb, Vc and currents Ia, Ib, Ic are recorded after having been generated in both the normal and the faulty condition. The data is then tabulated and exported as a CSV file from the workspace. Snapshots of a CSV sheet holding the data in the normal and faulty conditions are shown in Fig. 6 and Fig. 7. In the training and testing datasets, the label zero signifies a healthy network and one signifies a faulty network. The data is subsequently fed into the machine-learning algorithms for training.
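The tabulate-and-export step can be sketched with pandas. The column names Va…Ic and the label column are assumptions inferred from the text (the paper's exact CSV layout appears only in Fig. 6 and Fig. 7), and synthetic values stand in for the simulated waveform samples:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10  # illustrative number of samples

# Synthetic stand-ins for the simulated three-phase voltages and currents.
data = pd.DataFrame({
    "Va": rng.normal(size=n), "Vb": rng.normal(size=n), "Vc": rng.normal(size=n),
    "Ia": rng.normal(size=n), "Ib": rng.normal(size=n), "Ic": rng.normal(size=n),
    "label": rng.integers(0, 2, size=n),  # 0 = healthy network, 1 = faulty network
})

data.to_csv("fault_dataset.csv", index=False)   # export, mirroring the workspace-to-CSV step
reloaded = pd.read_csv("fault_dataset.csv")
print(reloaded.shape)
```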
capability of self-acquisition of knowledge in real time with little supervision.

In this paper, the evaluation of the different algorithms is done by the accuracy score and the mean squared error, metrics commonly used with multi-label results, and the result is measured in percentage.
Here the accuracy can be represented as

$\mathrm{accuracy}(y, \hat{y}) = \frac{1}{n_{\mathrm{samples}}} \sum_{i=0}^{n_{\mathrm{samples}}-1} 1(\hat{y}_i = y_i)$
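This is the quantity scikit-learn's accuracy_score computes. A quick check of the formula against the library, on invented example labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y = np.array([1, 0, 1, 1, 0])      # true labels
y_hat = np.array([1, 0, 0, 1, 0])  # predicted labels

# Direct implementation of the formula: mean of the indicator 1(y_hat_i == y_i).
manual = np.sum(y_hat == y) / y.shape[0]

assert manual == accuracy_score(y, y_hat)  # matches the library implementation
print(manual)  # 4 of 5 labels agree
```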
Fig. 9: Predicted (a) and Testing (b) Labels of SVM

Fig. 9 shows the predicted and testing labels of the support vector machine, plotted after separation of the data points into different classes by a hyperplane. Prediction points for the datasets are shown as a red line and all testing data points as a blue line. Points at level '0' indicate training and testing data sets for the normal operating condition; '1' indicates training and testing datasets for fault conditions. The training dataset is fed into the support vector machine classifier, and the testing dataset was predicted by the classifier with accuracy up to 75.94%.

K Nearest Neighbor is a supervised, lazy, non-parametric learning algorithm used for predictive classification problems, producing a class membership as its output, and it uses distance for classification.

Fig. 10: Predicted (a) and Testing (b) Labels of KNN

Fig. 10 shows the predicted and testing labels of K Nearest Neighbor, plotted by assigning weights to the contributions of the neighbors, where the nearest neighbors have a greater contribution. Prediction points for the datasets are shown as a red line and all testing data points as a blue line. Points at level '0' indicate training and testing data sets for the normal operating condition; '1' indicates training and testing datasets for fault conditions. The training dataset is fed into the K Nearest Neighbor classifier, and the testing dataset was predicted by the classifier with accuracy up to 88.89%.

The Multi-Layer Perceptron (MLP) provides a nonlinear mapping between an input vector and an output vector and uses a nonlinear activation function. It employs a supervised learning technique called backpropagation for training.
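The neighbor weighting described for the KNN results in Fig. 10 corresponds to scikit-learn's weights="distance" option, under which closer neighbors contribute more to the vote. A small sketch on toy data (the single feature, the class layout, and k=3 are illustrative, not the paper's settings):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D feature (e.g. a current magnitude): small values healthy, large values faulty.
X = np.array([[0.1], [0.2], [0.3], [2.0], [2.1], [2.2]])
y = np.array([0, 0, 0, 1, 1, 1])

# weights="distance": the nearest neighbors have more contribution to the prediction.
knn = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X, y)
pred = knn.predict([[0.15], [2.05]])
print(pred)
```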
Fig. 11: Predicted (a) and Testing (b) Labels of MLP

Fig. 11 shows the predicted and testing labels of the Multi-Layer Perceptron, plotted by utilizing a nonlinear activation function and backpropagation for training. Prediction points for the datasets are shown as a red line and all testing data points as a blue line. Points at level '0' indicate training and testing data sets for the normal operating condition; '1' indicates training and testing datasets for fault conditions. The training dataset is fed into the Multi-Layer Perceptron classifier, and the testing dataset was predicted by the classifier with accuracy up to 78.53%.

3.5 Implementation of Random Forest Classifier

The Random Forest Classifier is also a supervised learning algorithm. It creates many decision trees, takes the prediction value from each of them, and selects the best result among them by voting.

Fig. 12: Predicted (a) and Testing (b) Labels of Random Forest Classifier

Fig. 12 shows the predicted and testing labels of the Random Forest Classifier, plotted by taking the mean prediction of each of the trees. Prediction points for the datasets are shown as a red line and all testing data points as a blue line. Points at level '0' indicate training and testing data sets for the normal operating condition; '1' indicates training and testing datasets for fault conditions. The training dataset is fed into the Random Forest Classifier, and the testing dataset was predicted by the classifier with accuracy up to 85.55%.

4. Analysis of Results

The five algorithms, namely the Decision Tree Classifier, Support Vector Machines classifier, K Nearest Neighbors classifier, Multi-Layer Perceptron, and Random Forest Classifier, were applied to the whole dataset by splitting it into training and testing parts. The comparison is done on the basis of the accuracy score, where K Nearest Neighbors gave the best accuracy, close to 89 percent, whereas Support Vector Machines did not perform well, producing accuracy close to 76 percent.
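The comparison described in this section can be reproduced as a loop over the five classifiers. Synthetic data and default hyperparameters stand in for the paper's dataset and settings here, so the scores from this sketch will not match the reported 76-89% range:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Synthetic stand-in for the six V/I features with a 0/1 fault label.
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

# Split the whole dataset into training and testing parts, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "MLP": MLPClassifier(max_iter=2000, random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: {scores[name]:.2%}")
```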
Table 1: Comparison of Different Machine Learning Algorithms