Received August 24, 2021, accepted November 4, 2021, date of publication November 29, 2021,
date of current version December 17, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3131274
Data-Driven Correlation of Cyber and Physical
Anomalies for Holistic System Health Monitoring
DANIEL L. MARINO 1 , (Graduate Student Member, IEEE),
CHATHURIKA S. WICKRAMASINGHE 1 , (Member, IEEE), BILLY TSOUVALAS 1 ,
CRAIG RIEGER 2 , (Senior Member, IEEE), AND MILOS MANIC 1 , (Fellow, IEEE)
1 Virginia Commonwealth University, Richmond, VA 23220, USA
2 Idaho National Laboratory, Idaho Falls, ID 83415, USA
Corresponding author: Daniel L. Marino (marinodl@vcu.edu)
This work was supported in part by the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) through
the Solar Energy Technology Office under Award DE-0008775, in part by the Department of Energy through the U.S. DOE Idaho
Operations Office under Contract DE-AC07-05ID14517, in part by the Resilient Control and Instrumentation Systems (ReCIS) Program of
Idaho National Laboratory, and in part by the Commonwealth Cyber Initiative, an Investment in the Advancement of Cyber Research and
Development, Innovation and Workforce Development (cyberinitiative.org).
ABSTRACT Concerns of cyber-security threats are increasingly becoming a part of everyday operations
of cyber-physical systems, especially in the context of critical infrastructures. However, despite the tight
integration of cyber and physical components in modern critical infrastructures, the monitoring of cyber
and physical subsystems is still done separately. For successful health monitoring of such systems, a holistic
approach is needed. This paper presents an approach for holistic health monitoring of cyber-physical systems
based on cyber and physical anomaly detection and correlation. We provide a data-driven approach for the
detection of cyber and physical anomalies based on machine learning. The benefits of the presented approach
are: 1) an integrated architecture that supports the acquisition and real-time analysis of both cyber and physical
data; 2) a metric for holistic health monitoring that allows for differentiation between physical faults, cyber
intrusion, and cyber-physical attacks. We present experimental analysis on a power-grid use case using the
IEEE-33 bus model. The system was tested on several types of attacks such as network scan, Denial of
Service (DOS), and malicious command injections.
INDEX TERMS Anomaly detection, cyber-physical systems, system health monitoring, cyber and physical
anomalies.
I. INTRODUCTION
Cyber-Physical Systems (CPSs) are nowadays ubiquitous in
the core of mission-critical infrastructure due to their significant competitive advantages, such as adaptability, scalability, and usability [1]. Such systems typically take the
form of a collection of interconnected physical and computing resources to accomplish a specific task. They integrate computational resources, communication, control, and
physical processes into a single system [2], [3]. Although
the integration of cyber and physical systems has led to
improved efficiency, this integration makes CPS vulnerable to cyber-security threats, leading to a degradation of
resiliency [4]. In order to address this issue, maintaining
situational awareness becomes essential in the effort to ensure
both efficiency and resiliency.
The associate editor coordinating the review of this manuscript and
approving it for publication was Michael Lyu.
Maintaining situational awareness is critical for successful
decision-making [5], [6]. Providing relevant and well-timed
information to domain experts and system users is essential
to understand the state of the system [7]. To this end, health
monitoring of CPSs aims to provide a human-recognizable
measure of misbehavior of a system caused by internal
anomalies and external intrusions [8]. Health monitoring
provides a compelling approach to inform operators of the
presence and type of anomalies, serving as support for the
execution of well-informed decisions to recover from disturbances (benign and malicious). Given the interdisciplinary
nature of CPSs, characterizing the health of such systems
requires holistic monitoring of both cyber and physical components. However, despite the tight integration of cyber and
physical components in CPSs, monitoring of these systems
is traditionally performed by monitoring cyber and physical
subsystems independently. Currently, there is limited work on
the development of a metric that characterizes the health of a
CPS by considering both cyber and physical components in
the same context.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021
D. L. Marino et al.: Data-Driven Correlation of Cyber and Physical Anomalies for Holistic System Health Monitoring
This paper presents a data-driven approach for cyber-physical health monitoring. Integrating data from physical
as well as cyber elements allows for a comprehensive and
holistic assessment of the health of the CPS. We employ
data-driven Anomaly Detection Systems (ADSs), which
combine anomaly detection on a physical and cyber level.
The physical ADS performs analysis on data acquired by
sensors in the physical system, while the cyber ADS employs
cyber-sensors to capture and analyze network packets in real-time. Three unsupervised algorithms were used for comparative analysis: one-class support vector machines (OCSVMs),
Local Outlier Factor (LOF), and Autoencoders (AEs).
Cyber-physical health monitoring is a complex process
that must be performed accurately in real-time, minimizing the time needed to detect and restore the system to a
healthy state [9]. Regarding the state-of-the-art, there is limited related work that focuses on holistic real-time monitoring
of CPS health, where both cyber and physical components
are evaluated together in real-time [10]. Even though the
importance of integrating the two is underlined in previous efforts, existing literature relevant to anomaly detection
in CPSs mainly considers cyber and physical data separately [11]–[13]. Some efforts perform anomaly detection by
using the physical process dynamics [10]. Certain attempts
establish correlation schemes between cyber and physical
data using Bayesian networks for anomaly detection and
root-cause isolation [14]. In [15], a cross-correlation between
anomaly detection on network flow data and parsed Supervisory Control and Data Acquisition (SCADA) log files is
used to increase the confidence of security alerts. Unlike
similar efforts that examine the temporal correlation of different anomaly detection outputs [15], we present a health
monitoring metric that indicates the source of the disturbance
(cyber or physical). The metric serves as a holistic health
monitoring mechanism that provides a human-recognizable
measure of misbehavior caused either by internal anomalies or external intrusions. The metric is presented along with a
full pipeline to acquire cyber and physical data for analysis
using ADSs. Furthermore, we provide a comparative analysis
of anomaly detection algorithms for distinctions between
normal, physical faults, cyber intrusions, and cyber-physical
attack scenarios.
Contributions of the Presented Approach:
• We present an integrated cyber and physical data-driven
architecture that performs data acquisition, management, and analysis of both cyber and physical
data.
• We introduce a health monitoring metric based on the
correlation of cyber and physical Anomaly Detection
Systems (ADS).
We validate the presented approach by evaluating its capability to distinguish between physical faults, cyber intrusion, and cyber-physical attacks. We evaluate the presented
metric using different combinations of three machine learning algorithms: One-Class-SVM, Local Outlier Factor, and
Autoencoders.
The rest of the paper is organized as follows: Section II
presents the related work; Section III presents the proposed
data-driven cyber-physical health monitoring; Section IV discusses the experiments and results; and finally, Section V
presents the conclusions of the paper.
II. RELATED WORK
CPSs are a prime target for many cyber-attack vectors such
as Denial-of-Service (DOS), data injection, and interception
schemes [3], [16]. Anomaly Detection Systems (ADSs) are
becoming a common component of CPSs, where they are implemented to detect anomalies in CPS networks [17]. These
anomalies may be signs that an intruder is attempting unauthorized access to the system [18]. ADSs that are implemented to
detect cyber-attacks and intrusion attempts are referred to as
anomaly-based Intrusion Detection Systems (IDSs) [18]. Due
to the complex nature of CPSs, various ADS strategies have
been exploited in recent literature [3], [10], [15].
Some efforts have attempted to exploit the procedural
constraints of CPS to detect anomalies and to identify
specific cyber attacks. For example, in [19], researchers
have employed pattern matching methodologies over specific
communication protocols to build ADSs. The network traffic
characteristics and their importance have been used in [20] to
detect anomalies. Other efforts take a more attack-centric
approach, evaluating a system's cybersecurity by
investigating specific attacks such
as malware attacks [21]–[23], attacks on communication protocols [24]–[26], DOS attacks [23], [27], Man-In-the-Middle
attacks [28], false sequential [29], data or code injection
attacks [30], and other integrity attacks [31]. Some recent
attempts have used artificial neural network based approaches
to build ADSs. In [32], researchers have trained Autoencoders (AEs) and Generative Adversarial Networks (GANs)
using GAF images to identify and detect anomalies in system components. In [33], authors have converted time-series
data from sensors into time-frequency images for detecting
anomalies using Convolutional Neural Networks (CNN).
Proposed solutions to recognizing anomalies have also
been targeted towards Cyber-Physical System Health Monitoring and Management (HMM) systems [34]. They consist
of ADS implementations that employ different types of modern Machine Learning and Neural Network methodologies to
perform real-time health monitoring of CPSs. Some efforts
identify the faulty components by implementing a Fault Signature Matrix (FSM), which associates the sensors and target
system components with the rules that describe the normal
behavior of the system [34]. Other system health implementations, such as Hackmann et al. [35], propose a Structural
Health Monitoring (SHM) system that focuses on structural
deficiencies (environmental corrosion, persistent traffic, and
wind loading) that occur during the lifetime of CPSs. Other
approaches use the dynamic constraints of physical systems
to detect anomalous behavior in physical data [10].
Cross-correlation of several anomaly detection systems has
been recognized in the literature as an approach to provide
more accurate detection of anomalies in CPSs [15]. In [15],
a combination of Autoencoders and parsers is used to detect
anomalies by analyzing network flows and SCADA log files.
In [14], Bayesian networks are used to identify anomalous
behavior using network events and physical data. In [3], physical data is extracted by parsing captured packet data. One
Class SVMs and PCA are used to perform anomaly detection
on cyber and physical data. The detection is performed in
parallel, but results are presented independently.
In our approach, we used three machine learning
algorithms, namely One-Class Support Vector Machines
(OCSVMs), Local Outlier Factor (LOF), and Autoencoders
(AEs). These algorithms were selected as they represent a set
of frequently used unsupervised machine learning algorithms
for anomaly detection in the recent literature [36]–[40]. Because
these algorithms are unsupervised, they do not require any labeled data for
training. In many real-world systems, collecting labeled data is a
challenge because the data labeling process is
very expensive [41]. In addition, it is a time-consuming task
that requires domain experts to manually analyze data [42].
Further, Cyber-physical Systems generate large volumes of
data rapidly, making the labeling process inefficient and
impractical. Therefore, these unsupervised algorithms are a
viable solution for benefiting from the abundance of unlabeled data generated in these CPSs. Below, a brief description of the anomaly detection algorithms is presented:
• One-class Support Vector Machines: OCSVMs are
widely used unsupervised machine learning algorithms
for anomaly detection. They are trained using the
system’s normal behavior, and any unseen behavior
is identified as an anomaly or an attack. OCSVMs
are extensions of Support Vector Machines (SVMs)
[43], [44]. They can learn a decision boundary of a single
class. In the case of anomaly detection in CPS, they learn
the decision boundary of the normal behavior class. Any
behavior which is different from the learned normal
behavior will be detected as an outlier [45], [46]. There
have been several proposed solutions and enhancements
based on OCSVMs [45], [47], [48].
• Local Outlier Factor: The Local Outlier Factor (LOF)
algorithm is a density-based unsupervised anomaly
detection method that computes the local density deviation of a given data point with respect to its neighbors [49]. It identifies outliers or anomalies as data
records that have a significantly lower density compared
to their neighboring data points [50]. This has been widely
employed for anomaly detection in CPSs [51]–[53].
• Autoencoder: Autoencoders (AEs) are widely used neural network architectures which have the capability to
learn the encoding of input data. The architecture of
AEs consists of two parts: encoder and decoder. The
encoder converts input data into an abstract representation which is then reconstructed using the decoder
[16], [42]. In anomaly detection, the difference between
input and the reconstruction indicates whether the data
record is an anomaly or not. AEs have been successfully
used in CPSs for malicious code detection, malware
detection, and anomaly detection [16], [54].
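To make the density comparison behind LOF concrete, the following is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the implementation used in the experiments: the function name `lof_scores`, the choice of `k`, and the simplified tie handling (exactly `k` neighbors are kept) are all hypothetical. The model is "trained" on normal points only, and new points are scored against them.

```python
import math

def lof_scores(train, queries, k=3):
    """Score query points against `train` (normal data only).

    Scores near 1 suggest an inlier; scores well above 1 suggest an outlier.
    Simplified sketch: ties at the k-th distance are not expanded.
    """
    def knn(point, exclude=None):
        # k nearest training points as (distance, index) pairs
        cands = [(math.dist(point, t), j)
                 for j, t in enumerate(train) if j != exclude]
        return sorted(cands)[:k]

    # Neighbors and k-distance of every training point
    neigh = [knn(t, exclude=i) for i, t in enumerate(train)]
    k_dist = [n[-1][0] for n in neigh]

    def lrd(neighbors):
        # Local reachability density: inverse of the mean reachability distance
        reach = [max(k_dist[j], d) for d, j in neighbors]
        return len(reach) / max(sum(reach), 1e-12)

    train_lrd = [lrd(n) for n in neigh]

    scores = []
    for q in queries:
        nq = knn(q)
        # LOF: mean density of the neighbors divided by the density of q
        scores.append(sum(train_lrd[j] for _, j in nq) / (len(nq) * lrd(nq)))
    return scores

# Hypothetical normal data: a small 5x5 grid of points
normal = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
inlier_score, outlier_score = lof_scores(normal, [(0.25, 0.25), (5.0, 5.0)])
```

A point in the middle of the grid receives a score close to 1, while the distant point receives a much larger score because its local density is far lower than that of its nearest neighbors.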
III. PRESENTED DATA-DRIVEN CYBER-PHYSICAL
HEALTH MONITORING
This section describes the presented approach for data-driven
cyber-physical health monitoring. The presented approach
monitors cyber and physical data in order to identify internal
anomalies and external intrusions through the use of anomaly
detection algorithms. The information is consolidated in a
metric that uses the correlation of cyber and physical anomalies as the basis for the characterization of the health of the
system.
The overall architecture is presented in Figure 1. The architecture provides support for managing data streams of cyber
and physical data through a publisher/subscriber model. The
approach consolidates data coming from different sensors,
communication media, and protocols. Physical data is collected from measurements obtained by field devices and
communicated to a master using industrial protocols. Cyber
data is provided by a cyber-sensor that collects and analyzes
packet data. Cyber and physical data streams are managed
by Kafka, which collects, stores, and serves the data to any
process that requires it.
FIGURE 1. Architecture of presented data-driven cyber-physical health
monitoring.
Kafka was chosen as the publisher/subscriber architecture
to manage cyber and physical data. Kafka is an open-source
distributed streaming platform based on the publish/subscribe
architecture. The main benefits of Kafka include high scalability, fault tolerance, and support for most common programming languages. DNP3 was chosen as the industrial
protocol for collecting physical data. The protocol supports
communication between different field devices and master
stations in SCADA systems via TCP/IP encapsulation. In the
presented application, the master in Figure 1 uses the DNP3
protocol to collect data from DNP3 outstations located in the
field devices. To the best of our knowledge, the presented
framework can be adapted to use other industrial protocols; DNP3 was chosen for demonstration purposes.
DNP3 is one of the primary industrial protocols for SCADA
systems, and it is commonly found in electric power grid
systems [55], [56].
In order to evaluate the presence of internal anomalies and
external intrusion, the cyber and physical data is analyzed
using data-driven Anomaly Detection Systems (ADSs). Physical ADS analyzes the data provided by the field devices.
Cyber ADS analyzes the packet data collected by the cyber
sensor. The result from the physical ADS and the cyber ADS
is fed into a cyber-physical metric that provides a quantitative
value of the health of the system. The following sections
describe the cyber sensor, anomaly detection, and cyber-physical metric.
A. CYBER ANOMALY DETECTION
A cyber sensor was designed and implemented to capture and
analyze packet data in real-time in order to detect anomalous
behavior. As presented in Figure 1, the cyber-sensor is connected to a network switch to monitor the communication
between devices in the network. The sensor uses Scapy to
capture and analyze the network traffic. The sensor is connected to a switch port analyzer (SPAN port). All incoming
and outgoing communication passing through the switch is
mirrored to the SPAN port, allowing the cyber-sensor to
have access to all packets communicated through the switch.
The data acquired through the cyber-sensor is processed in
a multi-processing pipeline which is presented in Figure 2.
A rolling window of one second is used to analyze sections
of data in the communication. To increase the throughput,
TCP/UDP packet dissection is performed in parallel. Packet
level and window level features are extracted from dissected
TCP/UDP packets to train data-driven anomaly detection
algorithms. PCAP data is stored for further analysis. Alarm
notifications are delivered to the Kafka database, which is
used to generate notification alerts.
The anomaly detection system of the cyber-sensor was
developed to identify any anomalous behaviors in the communication network. To achieve that, the system’s normal
behavior is learned by the machine learning algorithm so that
any behaviors that are different from previously seen data are
flagged as anomalies. The anomalous events are sent to the
Kafka database for generating alert notifications.
FIGURE 2. Cyber-sensor architecture.
To train the machine learning models, a set of features were
extracted from the stream of raw packet data. The machine
learning models are trained using only data from normal
operations. As presented in Figure 2, raw data is grouped in
windows of one second. TCP/UDP packets are then dissected
in parallel. A set of packet features is obtained from the
dissection of TCP/UDP packet headers. Then, packet features
within each window are used to obtain a set of window
features that capture a series of statistical behaviors within the
one-second window. The Appendix presents the window features
extracted from the packet data. During training, the window
features characterize the behavior of communications and are
used to define the baseline behavior of the system. During
deployment, the extracted features are fed into the trained
machine learning algorithms (OCSVMs, LOFs, or AEs) for
detecting anomalies.
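The two-stage feature extraction described above (packet features first, then per-window statistics) can be sketched as follows. This is an illustrative example only: the packet tuple layout and the four feature names are assumptions, since the actual feature list is given in the Appendix, and the sketch uses fixed, non-overlapping one-second windows rather than a rolling window.

```python
from collections import defaultdict
from statistics import mean, pstdev

def window_features(packets, window=1.0):
    """Aggregate packet-level features into per-window statistics.

    packets: iterable of (timestamp_s, size_bytes, protocol) tuples.
    The tuple layout and the feature names are hypothetical.
    """
    buckets = defaultdict(list)
    for ts, size, proto in packets:
        # Index of the window that contains this packet
        buckets[int(ts // window)].append((size, proto))

    features = {}
    for w, pkts in sorted(buckets.items()):
        sizes = [s for s, _ in pkts]
        features[w] = {
            "n_packets": len(pkts),
            "mean_size": mean(sizes),
            "std_size": pstdev(sizes),
            "tcp_fraction": sum(p == "TCP" for _, p in pkts) / len(pkts),
        }
    return features

# Hypothetical packets: two in the first second, one in the next
example = [(0.1, 100, "TCP"), (0.5, 300, "UDP"), (1.2, 60, "TCP")]
feats = window_features(example)
```

Each resulting dictionary would correspond to one feature vector passed to the anomaly detection algorithm; during training only windows recorded under normal operation would be used.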
B. PHYSICAL ANOMALY DETECTION
Physical anomaly detection analyzes the physical data collected from a distributed network of sensors to identify situations when the physical state of the system has considerably
deviated from expected behavior. Physical data include voltage, current, and power measurements. Data is measured by
field devices and sent to a master using the DNP3 protocol.
Each field device hosts a DNP3 outstation that serves the
measured data. The master collects the data from all DNP3
outstations and publishes it to a Kafka topic. Once the data is
located in Kafka, any process with access to the Kafka broker
can read the data. In particular, the presented approach uses a
process to store the data for offline analysis while a different
process analyzes the data in real-time.
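The decoupling between the master that publishes the data and the processes that consume it can be illustrated with a toy, in-memory stand-in for a topic-based log. This sketch does not use or imitate the Kafka client API; the class `ToyBroker`, the topic name `dnp3`, and the consumer names are hypothetical and serve only to show that each consumer keeps its own read position on the same topic.

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a topic-based log (NOT the Kafka API)."""

    def __init__(self):
        self._log = defaultdict(list)      # topic -> append-only list of records
        self._offsets = defaultdict(int)   # (topic, consumer) -> next index to read

    def publish(self, topic, record):
        self._log[topic].append(record)

    def poll(self, topic, consumer):
        """Return the records on `topic` that `consumer` has not seen yet."""
        start = self._offsets[(topic, consumer)]
        records = self._log[topic][start:]
        self._offsets[(topic, consumer)] = len(self._log[topic])
        return records

broker = ToyBroker()
# The master publishes measurements collected from the outstations
broker.publish("dnp3", {"line": 1, "current": 0.42})
broker.publish("dnp3", {"line": 2, "current": 0.40})

# Two independent consumers read the same topic at their own pace
stored = broker.poll("dnp3", consumer="archiver")
analyzed = broker.poll("dnp3", consumer="physical_ads")

broker.publish("dnp3", {"line": 1, "current": 0.90})
new_for_ads = broker.poll("dnp3", consumer="physical_ads")
```

Because read positions are tracked per consumer, the storage process and the real-time analysis process receive the same records without coordinating with each other.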
The physical ADS accesses the physical data by subscribing to the DNP3 Kafka topic. The physical ADS uses one
of the previously discussed anomaly detection systems: LOF,
OCSVM, or Autoencoders. The physical ADS analyzes all
the data collected from the DNP3 outstations. It uses the
data directly as it is provided, which in this case consists
of voltages, currents, and power measurements. The only
pre-processing performed on the data is normalization to zero
mean and unit variance when using the Autoencoder model.
The ADS is trained exclusively with data recorded during
the normal operation of the system. Deviations from the
expected normal behavior are flagged by the physical ADS
and reported to the cyber-physical metric (see Fig. 1).
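The normalization step mentioned above can be sketched as follows. The sketch assumes that the mean and standard deviation are estimated from the normal-operation training data and then reused unchanged at deployment time; the function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def fit_standardizer(normal_rows):
    """Estimate per-feature mean and standard deviation from normal data."""
    cols = list(zip(*normal_rows))
    mu = [mean(c) for c in cols]
    # Guard against constant features, which would give a zero deviation
    sd = [pstdev(c) or 1.0 for c in cols]
    return mu, sd

def standardize(row, mu, sd):
    """Scale one measurement vector to zero mean and unit variance."""
    return [(x - m) / s for x, m, s in zip(row, mu, sd)]

# Hypothetical training rows: [voltage, current] under normal operation
mu, sd = fit_standardizer([[1.0, 10.0], [3.0, 10.0]])
scaled = standardize([5.0, 12.0], mu, sd)
```

Fitting the statistics on normal data only keeps a fault from shifting the scale that the detector relies on.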
C. METRIC FOR CYBER-PHYSICAL HEALTH MONITORING
The output of the cyber and physical ADS is used to construct
a metric that informs the operator of the cyber-physical health
of the system. The objective of the metric is to provide an
intuitive set of numbers in the range 0-1 that informs an
operator of the cyber and physical state of the system. The
cyber-physical metric consists of a tuple of two elements. The
first element indicates the cyber health. The second element
indicates the physical health of the system. A tuple (0,0)
indicates normal operation while (1, 1) indicates a cyber
threat and a physical fault. Together, they provide a holistic
view of the state of the system.
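As an illustration of how the tuple could be read by an operator-facing tool, the sketch below maps the two components to a coarse textual state. The 0.5 threshold, the function name, and the labels for the mixed cases are assumptions made for this example; the text above only fixes the meaning of (0, 0) and (1, 1).

```python
def interpret_metric(m_cps, threshold=0.5):
    """Map the tuple (M_c, M_p) to a coarse, human-readable state.

    The threshold is a hypothetical choice, not a value from the paper.
    """
    m_c, m_p = m_cps
    cyber = m_c >= threshold       # cyber component raised
    physical = m_p >= threshold    # physical component raised
    if cyber and physical:
        return "cyber threat and physical fault"
    if cyber:
        return "cyber threat"
    if physical:
        return "physical fault"
    return "normal"

state = interpret_metric((0.93, 0.08))
```

Keeping the two components separate, rather than collapsing them into a single score, is what lets the same physical symptom be reported differently depending on whether cyber anomalies accompany it.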
As previously mentioned, the CPS metric ($M_{cps} \in \mathbb{R}^2$) consists of a cyber ($M_c \in \mathbb{R}$) and a physical component ($M_p \in \mathbb{R}$):
$$M_{cps} = (M_c, M_p) \tag{1}$$
Each component is extracted by filtering the output of the cyber ADS and the physical ADS. The cyber component is computed as follows:
$$M_c = \sigma\left(k_c \left(A_c^T w_c\right) + b_c\right) \tag{2}$$
where:
• $A_c \in \mathbb{R}^N$ is a vector that contains the output of the cyber ADS for a window of $N$ seconds.
• $w_c \in \mathbb{R}^N$ are a set of weights that perform a weighted average of the elements of $A_c$. Hence, elements of $w_c$ are between 0-1 and sum up to 1.
• $\sigma$ is the sigmoid function:
$$\sigma(x) = \frac{1}{1 + \exp(-x)} \tag{3}$$
We use the sigmoid function in order to ensure the output of the cyber component ($M_c$) is constrained to the range of 0-1.
• $k_c \in \mathbb{R}$ and $b_c \in \mathbb{R}$ are parameters used to control the sensitivity and the activation position of the sigmoid function.
The physical component is computed using the same approach as the cyber component, but using the output of the physical ADS and a different set of weights:
$$M_p = \sigma\left(k_p \left(A_p^T w_p\right) + b_p\right) \tag{4}$$
where $M_p \in \mathbb{R}$, $w_p \in \mathbb{R}^N$, $k_p \in \mathbb{R}$, $b_p \in \mathbb{R}$. $A_p \in \mathbb{R}^N$ is the vector that contains the output of the physical ADS for a window of $N$ seconds.
The parameters of the metric are obtained by minimizing the cross entropy between the output of the metric and a set of labeled data used for tuning: Given a dataset of labels $Y$ and a set of vectors $A$ representing the output of the cyber ADS and physical ADS, the cross-entropy loss is defined as follows:
$$L(Y, A) = -\frac{1}{D} \sum_{i=1}^{D} H\left(Y_i, M_{cps}(A_i)\right)$$
$$H(y, M_{cps}) = y^T \log M_{cps} + (1 - y)^T \log\left(1 - M_{cps}\right)$$
where the label $y \in \mathbb{R}^2$ is a tuple of two elements (cyber and physical respectively) where 0 indicates normal behavior and 1 abnormal behavior. $D \in \mathbb{R}$ is the number of samples in the training dataset. $Y_i$ and $A_i$ are the $i$th sample from the dataset, where $Y_i \in \mathbb{R}^2$. $A_i$ is the set of cyber and physical anomaly vectors $(A_c, A_p)$ used to compute the metric $M_{cps}$, where $A_c \in \mathbb{R}^N$ and $A_p \in \mathbb{R}^N$. The parameters obtained by minimizing the cross-entropy include the weights of the weighted average $(w_c, w_p)$, the sensitivity of the sigmoids $(k_c, k_p)$, and the shift of the sigmoids $(b_c, b_p)$. The minimization is performed using stochastic gradient descent (SGD). A softmax is used in order to ensure that weights $(w_c, w_p)$ meet the constraints of a weighted average. This results in a parameterization of the weights as $w_c = \mathrm{Softmax}(\hat{w}_c)$ and $w_p = \mathrm{Softmax}(\hat{w}_p)$, where $(\hat{w}_c, \hat{w}_p)$ are a set of free parameters that can be directly optimized with SGD. This parameterization ensures that the elements of the weights $(w_c, w_p)$ are in the range of 0-1 and the sum is equal to 1. Figure 3 shows the overview of the metric calculation.
FIGURE 3. Cyber-Physical health metric.
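The metric computation and the per-sample cross-entropy term can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the parameter values (k = 10, b = -5, uniform weights) are hypothetical and chosen by hand rather than obtained with SGD, and the function names are not from the paper.

```python
import math

def softmax(v):
    """Map free parameters to weights in 0-1 that sum to 1."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def sigmoid(x):
    # Numerically stable form of 1 / (1 + exp(-x))
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def component(a, w_hat, k, b):
    """sigma(k * (a^T w) + b) with w = softmax(w_hat)."""
    w = softmax(w_hat)
    return sigmoid(k * sum(ai * wi for ai, wi in zip(a, w)) + b)

def metric(a_c, a_p, params):
    """Return the tuple (M_c, M_p) for one window of ADS outputs."""
    return (component(a_c, *params["cyber"]),
            component(a_p, *params["physical"]))

def h_term(y, m, eps=1e-12):
    """H(y, M) = y^T log M + (1 - y)^T log(1 - M); the loss averages -H."""
    return sum(yi * math.log(mi + eps) + (1 - yi) * math.log(1 - mi + eps)
               for yi, mi in zip(y, m))

# Hypothetical parameters: (w_hat, k, b) for each component, N = 3
params = {"cyber": ([0.0, 0.0, 0.0], 10.0, -5.0),
          "physical": ([0.0, 0.0, 0.0], 10.0, -5.0)}

# No cyber anomalies in the window, physical anomalies in every second
m_cps = metric([0, 0, 0], [1, 1, 1], params)
loss_if_label_physical = -h_term((0, 1), m_cps)
loss_if_label_cyber = -h_term((1, 0), m_cps)
```

With these hand-picked values the cyber component stays close to 0 and the physical component close to 1, so the loss is small when the label marks only a physical anomaly and large when it marks only a cyber anomaly. Tuning would adjust the free parameters to reduce this loss over the labeled set.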
IV. EXPERIMENTS
This section presents the experimental procedure and results
for cyber-physical health monitoring. For experimental
evaluation, we chose the IEEE 33 bus distribution system,
shown in Figure 4. The original version of the IEEE 33 bus
distribution system was proposed by Baran & Wu [57].
The IEEE 33 Bus system is a generic model which facilitates customization for more specific studies. It consists of
33 buses and 32 lines, with a voltage of 12.66 kV and a load
size of 3.715 MW and 2.3 MVAr [58]. For our study, the
33 bus model is divided into six ASRs (Aggregated System
Resources [59]), which are logically grouped sets of assets
shown in Figure 4. The ASRs are connected by lines with
breakers that provide protection in case of voltage unbalance
or over-current.
FIGURE 4. IEEE 33 bus model.
Figure 5 shows the configuration of the cyber components
in the IEEE 33 bus model. The cyber architecture is composed of field devices, an attack PC, a cyber-sensor, and the
cyber-physical health monitoring. All devices are connected
to a single switch. The cyber-sensor is connected as shown
in Figure 1. We consider two types of field devices: 1) ASR
outstations, 2) Line (LN) outstations. Field devices interact
directly with the physical system, collecting sensor data and
executing control actions. ASR outstations collect voltage,
current, power, and reactive power data from all lines in their
respective ASR. Line outstations collect data and implement
a protection algorithm that checks for over-currents, voltage
unbalances, and low voltage. Line outstations also open/close
a breaker when commanded by a remote master. All outstations communicate sensor and control data using DNP3.
For experimental analysis, the physical model is simulated
using Simulink, while the communication network is emulated using Mininet [60].
FIGURE 5. Cyber architecture for the IEEE 33 bus model case study.
In order to test the presented cyber-physical health monitoring approach, we consider the following scenarios for
experimentation and analysis:
• Normal: Under this scenario, the system performs
under normal operating conditions. The physical devices
exhibit normal operating behavior, and the collection of
data leads to the establishment of the expected behavior
baseline for anomaly detection. All cyber communication follows the normal behavior pattern.
• Physical fault: Under this scenario, the normal operating
behavior of physical devices is interrupted due to a fault
in the physical system. For the experiments, we simulate
line faults (e.g., line-to-ground fault) that trip the protection breakers causing loss of power.
• Cyber intrusion: Under this scenario, the normal cyber
communication of the system is disrupted due to various
cyber-attacks. We executed a series of cyber attack scenarios such as IP scan, ping sweep, port scan, and DOS
flood. Physical behavior is not affected during these
attacks.
• Cyber-physical attack: In this scenario, a cyber attack is
executed to disrupt the normal operating behavior of a
physical component of the system. A DNP3 command
injection is used to close the breakers, causing a loss of
power in the corresponding ASRs.
A. CYBER ANOMALY DETECTION
This section presents the analysis of the cyber data collected
from the cyber sensor and the results from trained anomaly
detection algorithms.
As described in the previous section, the algorithms are
trained only on normal network communication data collected using the windowing technique. In order to test the performance of trained algorithms, a collection of cyber attacks
was executed. The executed attacks are IP scan, ping sweep,
port scan, DNP3 data injection, and DOS flood.
Figure 6 shows measurements obtained from the cyber sensor. It shows the average number of packets communicated
between two devices during normal communication and during attack communication (IP scan, port scan, ping sweep).
The figure uses the IP addresses to represent the devices
in the network. The figure shows that the average number
of packets communicated between IP addresses is higher
during attack communications. Further, during the attack,
abnormal communication between the system components
and the attacker’s IP address (30.2.2.151) can be observed.
These figures are useful to identify active communications
and possible unexpected devices that should not be in the
network. The right side color bar represents the rate of packets
communicated between two devices. Changes in these color
bars also act as an indication for possible abnormal behaviors
in communication.
FIGURE 6. Average number of packets communicated between two devices during normal communication (left) and during attack
communication (right).
FIGURE 7. T-SNE embeddings of cyber data for normal, cyber intrusion,
and cyber-physical attack.
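The per-pair packet rates visualized in Figure 6 can be computed with a simple counter over source and destination addresses. The sketch below is illustrative only: the function names and the addresses of the legitimate devices are hypothetical, and only the attacker address 30.2.2.151 is taken from the text.

```python
from collections import Counter

def pair_rates(packets, duration_s):
    """Average packets per second for each ordered (src, dst) pair.

    packets: iterable of (src_ip, dst_ip) tuples captured over duration_s.
    """
    counts = Counter((src, dst) for src, dst in packets)
    return {pair: n / duration_s for pair, n in counts.items()}

def unexpected_hosts(rates, known_hosts):
    """Addresses seen in the traffic that are not in the known inventory."""
    seen = {ip for pair in rates for ip in pair}
    return sorted(seen - set(known_hosts))

# Hypothetical two-second capture; device addresses are made up
capture = ([("30.2.2.1", "30.2.2.2")] * 4
           + [("30.2.2.151", "30.2.2.1")] * 6)
rates = pair_rates(capture, duration_s=2.0)
intruders = unexpected_hosts(rates, known_hosts={"30.2.2.1", "30.2.2.2"})
```

Such a matrix is useful for visual inspection, although a check against a list of known addresses would not by itself be robust to spoofed source addresses.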
Figure 7 shows the T-SNE embeddings of cyber features
for Normal, cyber intrusion, and cyber-physical attack scenarios. T-SNE is an algorithm useful for visualization of high
dimensional data in a low dimensional embedded space [61].
The combined view in the figure shows the embeddings
for three scenarios in a single plot: normal, cyber intrusion,
and cyber-physical attack. We observe a high overlap in
the embeddings for the scenarios, especially between normal and cyber-physical attack scenarios. Data from cyber
intrusion scenarios also have considerable overlap; however,
we observe clusters of data from the cyber intrusion scenario,
which are considerably separated from data in normal and
cyber-physical attack scenarios.
Figure 8 presents an example of how the extracted window
feature values change over time during normal communication and attack communication. The figure shows the number of packets over time for normal and attack scenarios.
Figure 8b shows the labels of the attacks that were executed during the experimental scenarios. In Figure 8b we can
clearly observe surges in features as a consequence of the
attacks. These peaks provide an indication of attack/abnormal
behaviors of the system. IP information can also be used to
identify anomalous behavior (as shown in Figure 6); however, to ensure generalization of the approach and because
IP addresses are easy to spoof, IP address information is
not directly used as part of the cyber features for the ADS.
Figure 8c shows the output of the cyber ADS during the attack
scenarios. The results were obtained using an Autoencoder
ADS algorithm. The figure shows that the cyber ADS reports
anomalies during all attack scenarios, providing evidence that
a cyberattack is being executed.
B. PHYSICAL ANOMALY DETECTION
Figure 9 shows the output of the Physical ADS during a physical fault scenario and a cyber-physical attack scenario. The
physical ADS uses Voltages, Currents, Power, and Reactive
Power from all lines in the system as data features. The figure
shows the output of the ADS along with the value of the
current in line 1 for illustration purposes. We observe that the
physical ADS is able to detect changes during physical fault
and cyber-physical attack scenarios. Although an operator
can carefully select threshold values to detect this specific
set of anomalies, a data-driven ADS allows automating the
detection process. The data used to train and test the physical
ADS included random variations in the power loads within
±10% of the nominal value specified by the IEEE 33 Bus
model.
FIGURE 8. Window feature values and cyber ADS output over time during normal and attack scenarios. Results were obtained using Autoencoder ADS.
FIGURE 9. Physical ADS during physical fault scenarios and cyber-physical
attack scenarios. Results were obtained using Autoencoder ADS.
Figure 10 shows a visualization of the physical data for
normal, physical fault, and cyber-physical attack scenarios.
The visualization uses T-SNE embeddings to visualize the
data in two dimensions. The visualization helps us see that
physical data from the normal scenario is clearly separated
from data that belongs to physical faults or cyber-physical
attacks. However, the physical data from physical faults and
cyber-physical attacks have considerable overlap. This figure
shows that cyber data is necessary in order to distinguish
physical faults from cyber-physical attacks.
FIGURE 10. T-SNE embeddings of physical data for normal, physical fault,
and cyber-physical attack. Physical data includes Voltages, Currents, and
Power. The figure shows an overlap between physical fault and
cyber-physical attacks, precluding the distinction between faults and
attacks.
C. METRIC FOR CYBER-PHYSICAL HEALTH MONITORING
Figure 11 shows the results of cyber and physical anomaly
detection in different scenarios. Results of the cyber ADS
are shown above the results of the physical ADS in each
scenario. The figure shows the output of the cyber and physical
ADSs, where 0 means no anomaly and 1 means anomaly
detected. We plot one of the cyber and physical features
alongside each ADS output for illustration purposes. For the
cyber plot, the figure shows the value of packets per second
over time with the corresponding output of the cyber ADS.
For the physical plot, the figure shows the value of the current
Ia in line 1 with the corresponding output of the physical
ADS. Figure 11a shows the cyber and physical ADS output
from the normal scenario. The physical ADS does not report
any anomaly during the normal scenario. The cyber ADS
reports only one false positive during the normal scenario.
Such false positives are filtered later on by the cyber-physical
metric. The metric uses its weighted average vector
to filter the output of the anomaly detection system, reducing
the number of false positives. Figure 11b shows the result for
the physical fault scenario. The cyber ADS does not report
any anomaly, while the physical ADS reports anomalies after
the first fault occurs. Figure 11c shows the result for the cyber
intrusion scenario. As expected, the physical ADS reports no
anomaly. The cyber ADS reports several anomalies when the
scan attacks are executed. Figure 11d shows the result for the
cyber-physical attack scenario. As expected, both cyber and
physical ADS report anomalies when the DNP3 command
injection attacks are introduced.
Figure 12 shows a visualization of the cyber-physical
metric for normal, physical fault, cyber intrusion, and
cyber-physical attack scenarios. A value of 1 represents an
anomaly, whereas a value of 0 represents normal behavior.
The figure displays the value of the metric computed for each
rolling window in the experimental scenarios. The metric values
are displayed in a 2D plot, where the x-axis corresponds
to the cyber component of the metric, while the y-axis corresponds
to the physical component of the metric. We observe
that for normal scenarios, the cyber-physical metric reports
values close to (0, 0). Pure physical faults are also clearly
distinguished, with the metric output close to (0, 1). Cyber
intrusions are characterized by most metric values being close
to (1, 0). Although a few segments of the cyber intrusion
have metric values between (0, 0) and (0.75, 0), the majority
of the intrusion scenarios have high values of Mc, with the
maximum value of the metric being (1, 0), demonstrating that
the metric is able to identify the cyber intrusion. Considering
the cyber-physical attack scenario, we clearly detect several
cyber communication anomalies along with a disruption in
the physical system, leading to metric values approaching
(1, 1). The figure shows that the metric successfully differentiates
between normal, physical fault, cyber intrusion, and
cyber-physical attack.
FIGURE 11. Cyber and Physical Anomaly Detection (Ac, Ap) before the
Cyber-Physical Health metric is computed.
FIGURE 12. Cyber-Physical Health Monitoring for Normal scenario,
physical fault scenario, cyber intrusion, and cyber-physical attack. Blue
shows the metric for all windows in a scenario. Red shows the maximum
value of the metric across a single scenario.
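The way a windowed metric suppresses isolated false positives and separates the four scenarios can be sketched as follows. This is only an illustration under stated assumptions: it uses a uniform moving average over the last n ADS outputs, whereas the actual weighting is the one defined by the paper's metric equations; the 0.5 threshold follows the example given in the comparative analysis, and its extension to the other three quadrants is an assumption. The names `rolling_mean` and `label_scenario` are hypothetical.

```python
from collections import deque


def rolling_mean(flags, n):
    """Average of the last n binary ADS outputs at each time step.

    Uniform weights are an assumption made for this sketch.
    """
    window = deque(maxlen=n)
    out = []
    for flag in flags:
        window.append(flag)
        out.append(sum(window) / len(window))
    return out


def label_scenario(mc, mp, threshold=0.5):
    """Map a (cyber, physical) metric value to one of four scenarios."""
    cyber = mc >= threshold
    physical = mp >= threshold
    if cyber and physical:
        return "cyber-physical attack"
    if cyber:
        return "cyber intrusion"
    if physical:
        return "physical fault"
    return "normal"


# A single false positive from the cyber ADS in an otherwise normal run.
cyber_flags = [0, 0, 0, 1, 0, 0]
physical_flags = [0, 0, 0, 0, 0, 0]
mc = rolling_mean(cyber_flags, n=4)
mp = rolling_mean(physical_flags, n=4)
labels = [label_scenario(c, p) for c, p in zip(mc, mp)]
print(max(mc))      # 0.25
print(set(labels))  # {'normal'}
```

With a window of four samples the lone detection never raises the cyber component above 0.25, so every window is still labeled normal. A longer window filters more aggressively at the cost of a slower response.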
D. COMPARATIVE ANALYSIS
For comparative analysis, we considered three types of
anomaly detection algorithms: Local Outlier Factor (LOF),
one-class SVM (OCSVM), and Autoencoder (AE). Table 1
shows the performance of each anomaly detection algorithm
when used for the cyber ADS and the physical ADS. The
table shows the False Positive Rate (FPR) and True Positive
Rate (TPR). We report performance on training/testing data,
with results evaluated using k-fold cross-validation. The table
shows the results of individual cyber and physical anomaly
detection before the metric calculation. The results show that
AE provided the lowest FPR for both cyber and physical
ADSs.
TABLE 1. Performance of individual anomaly detection algorithms
(before the metric calculation). Results show performance on
Training/Testing data, measured using k-fold cross-validation.
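For reference, the two rates reported in Table 1 follow the standard definitions FPR = FP / (FP + TN) and TPR = TP / (TP + FN). The sketch below computes them from binary labels and ADS outputs; the function name `fpr_tpr` and the sample data are hypothetical and do not reproduce any value from the paper.

```python
def fpr_tpr(y_true, y_pred):
    """Return (FPR, TPR) for binary labels, where 1 marks an anomaly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if fp + tn == 0 or tp + fn == 0:
        raise ValueError("both normal and anomalous samples are required")
    return fp / (fp + tn), tp / (tp + fn)


# Made-up example: four normal samples followed by four anomalous ones.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]
fpr, tpr = fpr_tpr(y_true, y_pred)
print(fpr, tpr)  # 0.25 0.5
```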
We tested the performance of the presented metric with
several combinations of anomaly detection algorithms for
cyber and physical ADS. We evaluated the performance of
the metric in distinguishing between the four experimental scenarios: normal, physical fault, cyber intrusion, and cyber-physical attack. Table 2 shows a comparative analysis for
different combinations of ADS algorithms used with the
cyber-physical health monitoring metric. The accuracy measures the ability of the metric with different ADS algorithms
to differentiate between the four types of experimental scenarios. For example, if the metric values fall below
(0.5, 0.5), they are considered part of a normal scenario.
The accuracy measures the ratio of outputs correctly mapped
to the respective scenario category. The accuracy is measured
using k-fold validation, and we report average accuracy values for training and testing datasets. The data for each fold is
selected by keeping data from individual runs of the scenarios
together, i.e., data from a contiguous run of a scenario is not
split, ensuring that contiguous data reside either completely
in training or completely in testing folds. Table 2 shows the
accuracy of the metric when using different combinations
of algorithms for cyber ADS and physical ADS. Compared
with the results in Table 1, which evaluates the ADS models
separately, Table 2 shows the accuracy obtained with the
presented metric, which uses the output of both cyber and
physical ADS over a window of time. For Table 2, we use
the output of the cyber ADS and physical ADS in a window
of N=120 seconds to compute the metric (see Eq. 2 and 4).
Table 2 shows that OCSVM provided better performance
when used for physical ADS, while LOF performed better
when used for cyber ADS. AE performed well for both cyber
and physical ADS. The two combinations with the highest
accuracy were (AE, AE) and (AE, OCSVM).
FIGURE 13. Confusion matrix on test data after k-fold cross-validation.
Results show the average across the k-fold test data.
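The fold construction described above, where data from a contiguous run of a scenario is never split between training and testing, can be sketched as a group-aware k-fold split. This is a minimal illustration, not the authors' code: the helper `group_kfold` and the run identifiers are hypothetical, and runs are assigned to folds round-robin for simplicity.

```python
def group_kfold(run_ids, k):
    """Yield (train_idx, test_idx) pairs that keep each run in one piece.

    run_ids: one run identifier per sample, in sample order.
    k:       number of folds (at most the number of distinct runs).
    """
    runs = sorted(set(run_ids))
    if k > len(runs):
        raise ValueError("k cannot exceed the number of distinct runs")
    for fold in range(k):
        test_runs = set(runs[fold::k])  # round-robin assignment of runs
        test = [i for i, r in enumerate(run_ids) if r in test_runs]
        train = [i for i, r in enumerate(run_ids) if r not in test_runs]
        yield train, test


# Hypothetical dataset: eight samples that come from four scenario runs.
run_ids = ["a", "a", "b", "b", "b", "c", "d", "d"]
folds = list(group_kfold(run_ids, k=2))
for train, test in folds:
    print(train, test)
```

Splitting by run avoids leaking near-identical neighboring samples from the same run into both sides of a fold. scikit-learn, which the paper cites, offers `GroupKFold` in `sklearn.model_selection` for the same purpose.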
Table 2 shows that a metric that uses AE for cyber and
physical ADS provides the highest accuracy. When compared
with the results in Table 1, we observe that AE also provided
the lowest FPR when evaluating cyber and physical ADS
independently. Although OCSVM and LOF provided higher
TPR than AE, the higher accuracy of the metric obtained with
AE (Table 2) suggests that a lower FPR is preferable to a higher
TPR for this application. By combining the outputs of the individual ADSs over the last N seconds,
the metric is able to identify scenarios with
high accuracy (Table 2), even when the TPR of the individual ADSs is
relatively low.
Figure 13 shows the confusion matrix obtained after averaging
the results of k-fold validation on the testing data.
The figure shows the results for the cyber-physical metric
with AE for both cyber and physical ADS, which is the best
performing configuration in the comparative analysis. The
figure shows that the metric correctly characterizes all normal
and fault samples as part of normal and fault scenarios,
respectively. For cyber intrusion samples, 97% of samples
are correctly characterized as part of the intrusion scenarios,
with the remaining characterized as normal. According to
the obtained results, the most challenging scenario is the
cyber-physical attack, with 95% accuracy and a small percentage
of samples characterized as part of fault and attack
scenarios.
As shown in Figures 7 and 10, there is a larger overlap in
cyber features between scenarios than in physical data, which
illustrates why cyber intrusion and cyber-physical attack
scenarios are more difficult to characterize. The results in
Figure 13 demonstrate that the presented approach is able to
use the output of both cyber and physical ADS to successfully
differentiate between scenarios that otherwise have overlapping
representations, as illustrated in Figures 7 and 10.
TABLE 2. Training/Testing accuracy obtained with different anomaly detection algorithms used for the cyber-physical Health Monitoring Metric. The
metric is evaluated using the output of the cyber and physical ADS in a window of 120 seconds. Training and testing accuracy reflect the ability of the
metric to differentiate between the four experimental scenarios: normal, physical fault, cyber intrusion, and cyber-physical attack. Results are measured
using k-fold cross-validation. ± values represent the standard deviation across folds.
V. CONCLUSION
This paper presented an approach for data-driven correlation
of cyber and physical anomalies for holistic system
health monitoring of cyber-physical systems. We performed
real-time data acquisition and management of both cyber and
physical data using a publisher/subscriber model, which can
consolidate data coming from different protocols, communication
media, and sensors. The collected cyber and physical
data were analyzed using data-driven anomaly detection
systems that employed machine learning algorithms to identify
anomalous data. We used three unsupervised machine
learning algorithms and performed a comparative analysis
of anomaly detection between them. The best performance
for anomaly detection was obtained using Autoencoders for
both cyber and physical ADSs. Cyber and physical ADSs
were used to introduce a metric for health monitoring that
provides a holistic view of the state of the system. We tested
our approach on the IEEE-33 bus model under four scenarios:
normal, physical faults, cyber intrusions, and cyber-physical
attacks. The presented approach was able to distinguish
between normal state, physical faults, cyber intrusion, and
cyber-physical attacks. Future work will explore more elaborate
models for combining cyber and physical ADSs based
on the presented foundational work, with a special focus
on maintaining interpretability and limited use of labeled
data.
APPENDIX. WINDOW BASED TCP PACKET FEATURES
Table 3 presents the set of cyber features used for the cyber
ADS presented in section III-A. Features are computed using
packet windows of 1 second.
TABLE 3. Window based TCP packet features [3].
REFERENCES
[1] Cyber-Physical Systems (CPS). Accessed: Mar. 22, 2021. [Online]. Available: https://www.nsf.gov/pubs/2021/nsf21551/nsf21551.htm
[2] K. Amarasinghe, C. Wickramasinghe, D. Marino, C. Rieger, and
M. Manic, ‘‘Framework for data driven health monitoring of cyber-physical systems,’’ in Proc. Resilience Week (RWS), Aug. 2018, pp. 25–30,
doi: 10.1109/RWEEK.2018.8473535.
[3] D. L. Marino, C. S. Wickramasinghe, K. Amarasinghe, H. Challa,
P. Richardson, A. A. Jillepalli, B. K. Johnson, C. Rieger, and
M. Manic, ‘‘Cyber and physical anomaly detection in smart-grids,’’
in Proc. Resilience Week (RWS), vol. 1, Nov. 2019, pp. 187–193, doi:
10.1109/RWS47064.2019.8972003.
[4] Y. Guan and X. Ge, ‘‘Distributed attack detection and secure estimation of
networked cyber-physical systems against false data injection attacks and
jamming attacks,’’ IEEE Trans. Signal Inf. Process. Netw., vol. 4, no. 1,
pp. 48–59, Mar. 2018, doi: 10.1109/TSIPN.2017.2749959.
[5] B.-M. Cho, M.-S. Jang, and K.-J. Park, ‘‘Channel-aware congestion
control in vehicular cyber-physical systems,’’ IEEE Access, vol. 8,
pp. 73193–73203, 2020, doi: 10.1109/ACCESS.2020.2987416.
[6] M. Eckhart, A. Ekelhart, and E. Weippl, ‘‘Enhancing cyber situational
awareness for cyber-physical systems through digital twins,’’ in Proc.
24th IEEE Int. Conf. Emerg. Technol. Factory Autom. (ETFA), Sep. 2019,
pp. 1222–1225, doi: 10.1109/ETFA.2019.8869197.
[7] H. Muccini and M. Sharaf, ‘‘CAPS: Architecture description of situational
aware cyber physical systems,’’ in Proc. IEEE Int. Conf. Softw. Architecture
(ICSA), Apr. 2017, pp. 211–220, doi: 10.1109/ICSA.2017.21.
[8] L. Shangguan and S. Gopalswamy, ‘‘Health monitoring for cyber physical
systems,’’ IEEE Syst. J., vol. 14, no. 1, pp. 1457–1467, Mar. 2020, doi:
10.1109/JSYST.2019.2922982.
[9] W. U. Guangyu, J. Sun, and J. Chen, ‘‘A survey on the security of cyber-physical systems,’’ Control Theory Technol., vol. 14, no. 1, pp. 2–10,
2016.
[10] J. Yang, C. Zhou, S. Yang, H. Xu, and B. Hu, ‘‘Anomaly detection based on
zone partition for security protection of industrial cyber-physical systems,’’
IEEE Trans. Ind. Electron., vol. 65, no. 5, pp. 4257–4267, May 2018, doi:
10.1109/TIE.2017.2772190.
[11] M. Keshk, E. Sitnikova, N. Moustafa, J. Hu, and I. Khalil, ‘‘An integrated
framework for privacy-preserving based anomaly detection for cyber-physical systems,’’ IEEE Trans. Sustain. Comput., vol. 6, no. 1, pp. 66–79,
Jan. 2021.
[12] A. Bezemskij, G. Loukas, R. J. Anthony, and D. Gan, ‘‘Behaviour-based
anomaly detection of cyber-physical attacks on a robotic vehicle,’’ in Proc.
15th Int. Conf. Ubiquitous Comput. Commun. Int. Symp. Cyberspace Secur.
(IUCC-CSS), Dec. 2016, pp. 61–68.
[13] M. Raciti and S. Nadjm-Tehrani, ‘‘Embedded cyber-physical anomaly
detection in smart meters,’’ in Critical Information Infrastructures Security, B. M. Hämmerli, N. K. Svendsen, and J. Lopez, Eds. Berlin, Germany:
Springer, 2013, pp. 34–45.
[14] S. Krishnamurthy, S. Sarkar, and A. Tewari, ‘‘Scalable anomaly detection
and isolation in cyber-physical systems using Bayesian networks,’’ in Proc.
Dyn. Syst. Control Conf., vol. 46193. New York, NY, USA: American
Society of Mechanical Engineers, 2014, Art. no. V002T26A006.
[15] F. Skopik, M. Landauer, M. Wurzenberger, G. Vormayr, J. Milosevic,
J. Fabini, W. Prüggler, O. Kruschitz, B. Widmann, K. Truckenthanner,
S. Rass, M. Simmer, and C. Zauner, ‘‘SynERGY: Cross-correlation of
operational and contextual data to timely detect and mitigate attacks
to cyber-physical systems,’’ J. Inf. Secur. Appl., vol. 54, Oct. 2020,
Art. no. 102544.
[16] C. S. Wickramasinghe, D. L. Marino, K. Amarasinghe, and M. Manic,
‘‘Generalization of deep learning for cyber-physical system security: A
survey,’’ in Proc. 44th Annu. Conf. IEEE Ind. Electron. Soc. (IECON),
Oct. 2018, pp. 745–751.
[17] Y. Luo, Y. Xiao, L. Cheng, G. Peng, and D. Yao, ‘‘Deep learning-based
anomaly detection in cyber-physical systems: Progress and opportunities,’’ ACM Comput. Surv., vol. 54, no. 5, pp. 1–36, Jun. 2021, doi:
10.1145/3453155.
[18] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Maciá-Fernández, and E. Vázquez,
‘‘Anomaly-based network intrusion detection: Techniques, systems and
challenges,’’ Comput. Secur., vol. 28, nos. 1–2, pp. 18–28, 2009.
[19] D. Yang, A. Usynin, and J. W. Hines, ‘‘Anomaly-based intrusion detection
for SCADA systems,’’ in Proc. 5th Int. Top. Meeting Nucl. Plant Instrum.,
Control Hum. Mach. Interface Technol. (NPIC HMIT), 2006, pp. 12–16.
[20] M. Mantere, M. Sailio, and S. Noponen, ‘‘Network traffic features for
anomaly detection in specific industrial control system network,’’ Future
Internet, vol. 5, no. 4, pp. 460–473, Sep. 2013.
[21] I. N. Fovino, A. Carcano, M. Masera, and A. Trombetta,
‘‘An experimental investigation of malware attacks on SCADA
systems,’’ Int. J. Crit. Infrastruct. Protect., vol. 2, no. 4, pp. 139–145,
Dec. 2009. [Online]. Available: https://www.sciencedirect.com/science/
article/pii/S1874548209000419, doi: 10.1016/j.ijcip.2009.10.001.
[22] R. Leszczyna, I. N. Fovino, and M. Masera, ‘‘Simulating malware with
MAlSim,’’ J. Comput. Virol., vol. 6, no. 1, pp. 65–75, Feb. 2010.
[23] E. Ciancamerla, M. Minichino, and S. Palmieri, ‘‘Modeling cyber attacks
on a critical infrastructure scenario,’’ in Proc. IISA, Jul. 2013, pp. 1–6, doi:
10.1109/IISA.2013.6623699.
[24] J. M. Moya, A. Araujo, Z. Banković, J.-M. De Goyeneche, J. C. Vallejo,
P. Malagón, D. Villanueva, D. Fraga, E. Romero, and J. Blesa, ‘‘Improving security for SCADA sensor networks with reputation systems and
self-organizing maps,’’ Sensors, vol. 9, no. 11, pp. 9380–9397, 2009.
[Online]. Available: https://www.mdpi.com/1424-8220/9/11/9380, doi:
10.3390/s91109380.
[25] D. Jin, D. M. Nicol, and G. Yan, ‘‘An event buffer flooding attack in
DNP3 controlled SCADA systems,’’ in Proc. Winter Simul. Conf. (WSC),
Dec. 2011, pp. 2614–2626, doi: 10.1109/WSC.2011.6147969.
[26] I. N. Fovino, A. Coletta, A. Carcano, and M. Masera, ‘‘Critical state-based filtering system for securing SCADA network protocols,’’ IEEE
Trans. Ind. Electron., vol. 59, no. 10, pp. 3943–3950, Oct. 2012, doi:
10.1109/TIE.2011.2181132.
[27] J. D. Markovic-Petrovic and M. D. Stojanovic, ‘‘Analysis of SCADA
system vulnerabilities to DDoS attacks,’’ in Proc. 11th Int. Conf. Telecommun. Modern Satell., Cable Broadcast. Services (TELSIKS), Oct. 2013,
pp. 591–594, doi: 10.1109/TELSKS.2013.6704448.
[28] P. Maynard, K. McLaughlin, and B. Haberler, ‘‘Towards understanding
man-in-the-middle attacks on IEC 60870-5-104 SCADA networks,’’ in
Proc. 2nd Int. Symp. ICS SCADA Cyber Secur. Res., Sep. 2014, pp. 30–42.
[29] W. Li, L. Xie, Z. Deng, and Z. Wang, ‘‘False sequential logic attack
on SCADA system and its physical impact analysis,’’ Comput.
Secur., vol. 58, pp. 149–159, May 2016. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0167404816000031,
doi: 10.1016/j.cose.2016.01.001.
[30] G. Hug and J. A. Giampapa, ‘‘Vulnerability assessment of AC state
estimation with respect to false data injection cyber-attacks,’’ IEEE
Trans. Smart Grid, vol. 3, no. 3, pp. 1362–1370, Sep. 2012, doi:
10.1109/TSG.2012.2195338.
[31] Y. Mo, R. Chabukswar, and B. Sinopoli, ‘‘Detecting integrity attacks on
SCADA systems,’’ IEEE Trans. Control Syst. Technol., vol. 22, no. 4,
pp. 1396–1407, Jul. 2014, doi: 10.1109/TCST.2013.2280899.
[32] J. Mao, H. Wang, and B. F. Spencer, Jr., ‘‘Toward data anomaly detection
for automated structural health monitoring: Exploiting generative adversarial nets and autoencoders,’’ Struct. Health Monit., vol. 20, no. 4, 2020,
Art. no. 1475921720924601.
[33] Z. Tang, Z. Chen, Y. Bao, and H. Li, ‘‘Convolutional neural network-based
data anomaly detection method using multiple information for structural
health monitoring,’’ Struct. Control Health Monit., vol. 26, no. 1, p. e2296,
Jan. 2019.
[34] Y. Zhang, I.-L. Yen, F. B. Bastani, A. T. Tai, and S. Chau, ‘‘Optimal adaptive system health monitoring and diagnosis for resource constrained cyber-physical systems,’’ in Proc. 20th Int. Symp. Softw. Rel. Eng.,
Nov. 2009, pp. 51–60, doi: 10.1109/ISSRE.2009.21.
[35] G. Hackmann, W. Guo, G. Yan, Z. Sun, C. Lu, and S. Dyke, ‘‘Cyber-physical codesign of distributed structural health monitoring with wireless
sensor networks,’’ IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 1,
pp. 63–72, Jan. 2014, doi: 10.1109/TPDS.2013.30.
[36] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel,
M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas,
A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay,
‘‘Scikit-learn: Machine learning in Python,’’ J. Mach. Learn. Res., vol. 12,
pp. 2825–2830, Nov. 2011.
[37] D. Kwon, H. Kim, J. Kim, S. C. Suh, I. Kim, and K. J. Kim, ‘‘A survey
of deep learning-based network anomaly detection,’’ Cluster Comput.,
vol. 22, no. S1, pp. 949–961, Jan. 2019, doi: 10.1007/s10586-017-1117-8.
[38] S. Agrawal and J. Agrawal, ‘‘Survey on anomaly detection using
data mining techniques,’’ Proc. Comput. Sci., vol. 60, pp. 708–713,
2015. [Online]. Available: https://www.sciencedirect.com/science/
article/pii/S1877050915023479, doi: 10.1016/j.procs.2015.08.220.
[39] S. Eltanbouly, M. Bashendy, N. AlNaimi, Z. Chkirbene, and A. Erbad,
‘‘Machine learning techniques for network anomaly detection: A survey,’’ in Proc. IEEE Int. Conf. Informat., IoT, Enabling Technol. (ICIoT),
Feb. 2020, pp. 156–162, doi: 10.1109/ICIoT48696.2020.9089465.
[40] O. Alghushairy, R. Alsini, T. Soule, and X. Ma, ‘‘A review of local
outlier factor algorithms for outlier detection in big data streams,’’ Big
Data Cogn. Comput., vol. 5, no. 1, p. 1, Dec. 2020. [Online]. Available:
https://www.mdpi.com/2504-2289/5/1/1, doi: 10.3390/bdcc5010001.
[41] M. Längkvist, L. Karlsson, and A. Loutfi, ‘‘A review of unsupervised
feature learning and deep learning for time-series modeling,’’ Pattern
Recognit. Lett., vol. 42, pp. 11–24, Jun. 2014.
[42] C. S. Wickramasinghe, D. L. Marino, and M. Manic, ‘‘ResNet autoencoders for unsupervised feature learning from high-dimensional data:
Deep models resistant to performance degradation,’’ IEEE Access, vol. 9,
pp. 40511–40520, 2021.
[43] B. Schölkopf, R. C. Williamson, A. J. Smola, J. Shawe-Taylor, and
J. C. Platt, ‘‘Support vector method for novelty detection,’’ in Proc. NIPS,
vol. 12. Princeton, NJ, USA: Citeseer, 1999, pp. 582–588.
[44] B. Schölkopf, J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and
R. C. Williamson, ‘‘Estimating the support of a high-dimensional distribution,’’ Neural Comput., vol. 13, no. 7, pp. 1443–1471, 2001.
[45] M. Amer, M. Goldstein, and S. Abdennadher, ‘‘Enhancing one-class support vector machines for unsupervised anomaly detection,’’ in Proc. ACM
SIGKDD Workshop Outlier Detection Description (ODD). New York,
NY, USA: Association for Computing Machinery, 2013, pp. 8–15, doi:
10.1145/2500853.2500857.
[46] S. M. Erfani, S. Rajasegarar, S. Karunasekera, and C. Leckie,
‘‘High-dimensional and large-scale anomaly detection using
a linear one-class SVM with deep learning,’’ Pattern Recognit., vol. 58, pp. 121–134, Oct. 2016. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0031320316300267,
doi: 10.1016/j.patcog.2016.03.028.
[47] K.-L. Li, H.-K. Huang, S.-F. Tian, and W. Xu, ‘‘Improving one-class SVM
for anomaly detection,’’ in Proc. Int. Conf. Mach. Learn. Cybern., vol. 5,
2003, pp. 3077–3081.
[48] R. Perdisci, G. Gu, and W. Lee, ‘‘Using an ensemble of one-class SVM
classifiers to harden payload-based anomaly detection systems,’’ in Proc.
6th Int. Conf. Data Mining (ICDM), Dec. 2006, pp. 488–498.
[49] M. Alshawabkeh, B. Jang, and D. Kaeli, ‘‘Accelerating the local outlier factor algorithm on a GPU for intrusion detection systems,’’ in Proc. 3rd Workshop Gen.-Purpose Comput. Graph. Process. Units (GPGPU). New York,
NY, USA: Association for Computing Machinery, 2010, pp. 104–110, doi:
10.1145/1735688.1735707.
[50] W. Wang and P. Lu, ‘‘An efficient switching median filter based on local
outlier factor,’’ IEEE Signal Process. Lett., vol. 18, no. 10, pp. 551–554,
Oct. 2011, doi: 10.1109/LSP.2011.2162583.
[51] R. Sandhya, J. Prakash, and B. V. Kumar, ‘‘Comparative analysis of
clustering techniques in anomaly detection wind turbine data,’’ J. Xi’an
Univ. Archit. Technol., vol. 12, no. 3, pp. 5684–5694, 2020.
[52] M. Ahmed, A. Anwar, A. N. Mahmood, Z. Shah, and M. J. Maher,
‘‘An investigation of performance analysis of anomaly detection techniques for big data in SCADA systems,’’ EAI Endorsed Trans. Ind. Netw.
Intell. Syst., vol. 2, no. 3, p. e5, May 2015.
[53] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, ‘‘LOF: Identifying
density-based local outliers,’’ in Proc. ACM SIGMOD Int. Conf. Manage.
Data, 2000, pp. 93–104.
[54] C. Zhou and R. C. Paffenroth, ‘‘Anomaly detection with robust deep
autoencoders,’’ in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery
Data Mining (KDD). Halifax, NS, Canada: Association for Computing
Machinery, Aug. 2017, pp. 665–674.
[55] R. Amoah, S. Camtepe, and E. Foo, ‘‘Securing DNP3 broadcast communications in SCADA systems,’’ IEEE Trans. Ind. Informat., vol. 12, no. 4,
pp. 1474–1485, Aug. 2016.
[56] S. East, J. Butts, M. Papa, and S. Shenoi, ‘‘A taxonomy of attacks on the
DNP3 protocol,’’ in Critical Infrastructure Protection III, C. Palmer and
S. Shenoi, Eds. Berlin, Germany: Springer, 2009, pp. 67–81.
[57] S. H. Dolatabadi, M. Ghorbanian, P. Siano, and N. D. Hatziargyriou,
‘‘An enhanced IEEE 33 bus benchmark test system for distribution system studies,’’ IEEE Trans. Power Syst., vol. 36, no. 3, pp. 2565–2572,
May 2021, doi: 10.1109/TPWRS.2020.3038030.
[58] V. Vita, ‘‘Development of a decision-making algorithm for the optimum
size and placement of distributed generation units in distribution networks,’’ Energies, vol. 10, no. 9, p. 1433, Sep. 2017. [Online]. Available:
https://www.mdpi.com/1996-1073/10/9/1433, doi: 10.3390/en10091433.
[59] B. Vaagensmith, V. K. Singh, R. Ivans, D. L. Marino,
C. S. Wickramasinghe, J. Lehmer, T. Phillips, C. Rieger, and M. Manic,
‘‘Review of design elements within power infrastructure cyber–physical
test beds as threat analysis environments,’’ Energies, vol. 14, no. 5,
p. 1409, Mar. 2021. [Online]. Available: https://www.mdpi.com/1996-1073/14/5/1409, doi: 10.3390/en14051409.
[60] B. Lantz, B. Heller, and N. McKeown, ‘‘A network in a laptop: Rapid
prototyping for software-defined networks,’’ in Proc. 9th ACM SIGCOMM
Workshop Hot Topics Netw. (HotNets). New York, NY, USA: Association
for Computing Machinery, 2010, pp. 1–6, doi: 10.1145/1868447.1868466.
[61] L. Van der Maaten and G. Hinton, ‘‘Visualizing data using t-SNE,’’
J. Mach. Learn. Res., vol. 9, no. 11, pp. 2579–2605, 2008.
DANIEL L. MARINO (Graduate Student Member,
IEEE) received the B.Eng. degree in automation
engineering from La Salle University, Colombia,
in 2015. He is currently pursuing the Ph.D. degree
with Virginia Commonwealth University. He is
a Research Assistant at Virginia Commonwealth
University, with over six years of research and
development experience, collaborating with US
DOE National Labs, universities, and industry
partners. He has authored over 27 articles in peer
reviewed journals and conferences. His research interests include stochastic
modeling, deep learning, and explainable AI with applications in cyber-physical systems, energy, and robotics. He received the IEEE IES Student
Paper Travel Award, in 2016 and 2019, the VCU CS Outstanding Paper
Award in 2020, the VCU CS Outstanding Early-Career Student Researcher
Award, in 2017, and the Honor Scholarship Granted by La Salle University,
from 2010 to 2013.
CHATHURIKA S. WICKRAMASINGHE (Member, IEEE) received the B.Sc. degree in computer science from the University of Peradeniya,
Sri Lanka, in 2016. She is currently pursuing
the Ph.D. degree in computer science with Virginia Commonwealth University, Richmond. She
is a Research Assistant at Virginia Commonwealth
University. Her research interests include machine
learning, unsupervised learning, explainable AI,
generalization, and visual data mining.
BILLY TSOUVALAS received the B.Sc. degree
in electrical and computer engineering and the
M.Sc. degree in computer science and electronics
from the University of Patras, Greece, with a concentration on computer and information security.
He is currently pursuing the Ph.D. degree with Virginia Commonwealth University. He is a Research
Assistant at Virginia Commonwealth University.
He is an IBM Certified Cybersecurity Analyst.
His research interests include malware analysis,
detection, and classification, critical infrastructure, cybersecurity, machine
learning, generative adversarial learning, network security, intrusion detection, and forensics.
CRAIG RIEGER (Senior Member, IEEE) received
the B.S. and M.S. degrees in chemical engineering from Montana State University, Bozeman,
MT, USA, in 1983 and 1985, respectively, and
the Ph.D. degree in engineering and applied science from Idaho State University, Pocatello, ID,
USA, in 2008. He is currently the Chief Control Systems Research Engineer and a Directorate
Fellow with Idaho National Laboratory (INL),
Idaho Falls, ID, USA, pioneering interdisciplinary
research in the area of next-generation resilient control systems. In addition,
he has organized and chaired 11 Institute of Electrical and Electronics Engineers technically cosponsored symposia and one National Science
Foundation Workshop in this new research area, and authored more than
50 peer-reviewed publications. He has 20 years of software and hardware
design experience for process control system upgrades and new installations.
He has also been a Supervisor and a Technical Lead for control systems
engineering groups having design, configuration management, and security
responsibilities for several INL nuclear facilities and various control system
architectures.
MILOS MANIC (Fellow, IEEE) is currently a
Professor with the Computer Science Department
and the Director of the VCU Cybersecurity Center, Virginia Commonwealth University. He has completed over 40 research grants in data mining
and machine learning applied to cyber security,
critical infrastructure protection, energy security,
and resilient intelligent control. He has given
over 40 invited talks around the world, has authored
over 200 refereed articles in international journals,
books, and conferences, and holds several U.S. patents. He is an Inductee of
the U.S. National Academy of Inventors (class of 2019) and a fellow of
Commonwealth Cyber Initiative (specialty in AI and Cybersecurity). He is
the IES Officer and a Senior AdCom Member. He was the Founding Chair of
Technical Committee, IEEE IES. He has won the 2018 R&D 100 Award for
Autonomic Intelligent Cyber Sensor (AICS), one of the top 100 Science
and Technology Worldwide Innovations in 2018. He was a recipient of
the IEEE IES 2019 Anthony J. Hornfeck Service Award, the 2012 J. David
Irwin Early Career Award, and the 2017 IEM Best Paper Award. He serves
as an Associate Editor for IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS,
Open Journal of Industrial Electronics Society, and IEEE TRANSACTIONS ON
INDUSTRIAL ELECTRONICS.