Support Vector Machine Based Data Hacking Prediction Using PMU Data
Abstract:- As global reliance on power systems grows due to increasing energy demands and modern consumption patterns, maintaining the stability and reliability of the power grid has become crucial. Power systems are complex and nonlinear, and their operations are continuously evolving, making it difficult and expensive to ensure stability. Traditionally, power systems are designed to handle a single outage at a time. However, recent years have seen several significant blackouts, each originating from a single failure, which have been extensively reported. These reports are vital for mitigating operational risks by strengthening systems against identified high-risk scenarios. While extensive research has been conducted on these blackouts, cyber-attacks introduce a new dimension of risk. The advent of Phasor Measurement Units (PMUs) has enabled centralized monitoring of power system data, allowing for more effective fault and cyber-attack detection. This paper proposes a machine learning-based approach to detecting cyber-attacks using PMU data. Given the complexity and volume of power system data, traditional mathematical and statistical methods are challenging to implement. Instead, a Support Vector Classification (SVC) algorithm is used for binary classification, distinguishing between 'attack' and 'normal' states. The algorithm is trained on PMU data and evaluated using metrics such as the AUC-ROC curve and confusion matrix, achieving an AUC-ROC score of 0.82, demonstrating its effectiveness in identifying cyber-attacks.

Keywords:- Cyber Attack; Support Vector Machine; AUC-ROC; Support Vector Classification.

I. INTRODUCTION

The data transmitted from Phasor Measurement Units (PMUs) to Phasor Data Concentrators (PDCs) can be easily accessed and modified, posing significant security risks. Although previous attacks have been confined to local area networks (LANs), similar vulnerabilities can be exploited over wide area networks (WANs) such as the Internet. Research has highlighted weaknesses in the Border Gateway Protocol (BGP), which can allow attackers to reroute data packets to unintended destinations [1]. To address these risks, implementing a unique network architecture, despite its cost, is crucial. Additionally, enforcing mandatory updates for default passwords can help prevent unauthorized access.

To counter these security challenges, several methods have been proposed. Principal Component Analysis (PCA) and Support Vector Machines (SVM) can be used to identify fraudulent data entries. A data-driven approach utilizing spatiotemporal relationships in PMU measurements has been suggested to differentiate between real and fake power grid events [2]. Enhancing security through bit masking has been proposed to ensure data integrity and confidentiality [3]. Work has also progressed on a cybersecurity research simulation testbed that operates within the PMU's allotted time frame. The simulation application was created by the University of Illinois at Urbana-Champaign and is both interactive and extensible. There are three customizable simulators included in this package: a PMU, a PDC, and a control center.

Moreover, artificial neural networks (ANNs) are widely used for classification and prediction, in addition to the previously mentioned methodologies [4]. An ANN model can be represented as either a simple feedforward neural network (FNN) or a more intricate deep neural network (DNN) [5]. The model is obtained by solving an optimization problem, which can be tackled efficiently with various local and global methods such as gradient-based search techniques [6], genetic methods [7], and others. Unsupervised learning (UL) refers to the extraction of significant patterns from unlabeled data. This process entails extracting pertinent attributes, classifications, and frameworks straight from the unprocessed data, without any manual intervention such as labeling or input.

Artificial neural networks (ANNs), including both simple feedforward neural networks (FNNs) and more complex deep neural networks (DNNs), are widely used for classification and prediction. Optimization techniques such as gradient-based searches and genetic algorithms are employed to refine ANN models. Unsupervised learning (UL) methods like Isolation Forests (IF) and Autoencoders (AE) are used to detect anomalies such as false data injection attacks (FDIA) and Denial of Service (DoS) attacks [8], [9], [10], [11]. Dynamic Bayesian Networks (DBN) are also utilized for attack detection [12]. Semi-supervised learning (SSL) combines labeled and unlabeled data to enhance detection capabilities. Techniques like semi-supervised adversarial autoencoders (SSAA) and generative-adversarial frameworks are proposed for improved FDIA detection, with new models such as SS-deep-ID and robust semi-supervised prototypical networks (RSSPN) offering advanced detection methods [13], [14], [15], [16].

II. METHODOLOGY

Data preprocessing involves several automated steps, including anomaly detection, data cleaning, and the organization of data into balanced and unbalanced datasets. This process establishes the framework for the fault prediction model. Automated procedures address data impurity and missing values, with mean values used to replace missing entries. Given the critical role of fault prediction in electrical systems, ensuring the reliability of the prediction algorithm is paramount.

To handle large volumes of data effectively, the method must offer strong generalization and utilize highly orthogonal inputs.
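The preprocessing and classification steps described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the paper does not state which library, kernel, or hyperparameters the SVC uses, so a plain linear soft-margin SVM trained by subgradient descent on the hinge loss stands in for it here. The function names, the toy two-feature data, and the values of lr, lam, and epochs are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def impute_with_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def class_balance(labels):
    """Return per-class counts and the minority/majority ratio (1.0 means balanced)."""
    counts = Counter(labels)
    return counts, min(counts.values()) / max(counts.values())


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=50):
    """Linear soft-margin SVM; labels must be +1 (attack) or -1 (normal)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wj * (1 - lr * lam) for wj in w]
            # Hinge-loss subgradient step only for samples inside the margin.
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, X):
    """Classify each row by the sign of the decision function."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]


# Hypothetical toy measurements (not PMU data); None marks a missing entry.
raw = [[2.0, 2.0], [3.0, None], [2.0, 3.0],
       [-2.0, -2.0], [-3.0, -3.0], [None, -3.0]]
labels = [1, 1, 1, -1, -1, -1]

X = impute_with_column_means(raw)
counts, ratio = class_balance(labels)
w, b = train_linear_svm(X, labels)
predictions = predict(w, b, X)
```

A kernelised SVC, as provided by common machine learning libraries, would replace train_linear_svm when the classes are not linearly separable; the imputation and class-balance checks are unaffected by that choice.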
III. RESULTS AND DISCUSSION

Box plots of the first 14 numerical columns of the Phasor Measurement Unit (PMU) dataset were generated with the Python seaborn library. Because the task is classification, the sizes of the majority and minority classes must be checked to determine whether the dataset is balanced or imbalanced. The class distribution for the PMU considered is given in Figure 1.

The performance of the cyber-attack detection implementation is found to be satisfactory, with an area under the ROC curve (AUC) of 0.82. The AUC measures how well the classifier ranks attack samples above normal samples across all decision thresholds: a value of 0.82 means that a randomly chosen attack sample receives a higher score than a randomly chosen normal sample about 82% of the time. Further tuning of the algorithm may improve performance. The performance metrics obtained from the analysis are given in Table 2.

Table 2 Performance Metrics
Metric                  Value
Accuracy                0.76
Precision               0.69
Recall (Sensitivity)    0.94
F1 Score                0.80
Specificity             0.58
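The metrics in Table 2 all derive from the four cells of a binary confusion matrix, and the AUC can be read as a pairwise ranking rate. The sketch below shows those definitions under stated assumptions: the confusion-matrix counts and the scores are hypothetical values chosen for illustration, not the paper's data, and the function names are not from the paper. The only figures taken from the paper are the precision and recall from Table 2, used to check that the reported F1 score is consistent with them.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
    }


def auc_from_scores(scores, labels):
    """AUC as the fraction of (attack, normal) pairs ranked correctly.

    Labels are 1 for attack and 0 for normal; tied scores count as half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts, for illustration only (not the paper's confusion matrix).
demo = confusion_metrics(tp=8, fp=4, tn=6, fn=2)

# Hypothetical classifier scores, for illustration only.
demo_auc = auc_from_scores([0.9, 0.8, 0.4, 0.6, 0.3, 0.2], [1, 1, 1, 0, 0, 0])

# Consistency check using the precision and recall reported in Table 2.
f1_from_table = 2 * 0.69 * 0.94 / (0.69 + 0.94)
```

Rounded to two decimals, f1_from_table agrees with the F1 score of 0.80 listed in Table 2. The high recall paired with lower specificity indicates that the classifier catches most attacks at the cost of more false alarms on normal data.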
IV. CONCLUSION