Moving-Target Defense Against Cyber-Physical Attacks in Power Grids Via Game Theory

Subhash Lakshminarayana, Senior Member, IEEE, E. Veronica Belmega, Senior Member, IEEE,
and H. Vincent Poor, Fellow, IEEE
arXiv:2006.07697v2 [cs.CR] 27 Jul 2021

S. Lakshminarayana is with the University of Warwick, Coventry, UK (subhash.lakshminarayana@warwick.ac.uk). E.V. Belmega is with ETIS, UMR 8051, CY Cergy Paris Université, ENSEA, CNRS, F-95000, France (belmega@ensea.fr). H. V. Poor is with the Department of Electrical Engineering, Princeton University, USA (poor@princeton.edu). The work was partially presented at IEEE SmartGridComm 2019 [1]. This research was supported in part by a Startup grant at the University of Warwick, Cergy ENSEA (SRV), Paris Seine EUTOPIA international fellowship, COST Action CA16228 “European Network for Game Theory” (GAMENET), and by the U.S. National Science Foundation under Grants ECCS-1824710 and ECCS-203971.

Abstract—This work proposes a moving target defense (MTD) strategy to detect coordinated cyber-physical attacks (CCPAs) against power grids. The main idea of the proposed approach is to invalidate the knowledge that the attackers use to mask the effects of their physical attack by actively perturbing the grid’s transmission line reactances via distributed flexible AC transmission system (D-FACTS) devices. The proposed MTD design consists of two parts. First, we identify the subset of links for D-FACTS device deployment that enables the defender to detect CCPAs against any link in the system. Then, in order to minimize the defense cost during the system’s operational time, we formulate a zero-sum game to identify the best subset of links to perturb (which will provide adequate protection) against a strategic attacker. The Nash equilibrium robust solution is computed via exponential weights, which does not require complete knowledge of the game but only the observed payoff at each iteration. Extensive simulations performed using the MATPOWER simulator on IEEE bus systems verify the effectiveness of our approach in detecting CCPAs and reducing the operator’s defense cost.

Index Terms—moving-target defense, coordinated cyber-physical attacks, zero-sum non-cooperative games, Nash equilibrium, exponential weights algorithm

I. INTRODUCTION

Cyber threats against power grids are of increasing concern due to the deep integration of information and communication technologies (ICT) into grid operations. In particular, the so-called coordinated cyber-physical attacks (CCPAs) can be dangerously covert. Indeed, while the physical attack involves disconnecting a transmission line, generator or transformer, the simultaneous cyber attack masks the physical attack by manipulating the sensor measurements that are conveyed from the field devices to the control center. CCPAs can have severe effects on the grid, since undetected line/generator outages may trigger cascading failures, and they have received significant attention [2], [3], [4], [5].

To defend against CCPAs, recent studies [3] and [5] have proposed strategies based either on securing a set of measurements (e.g., by encryption) or on relying on measurements from known-secure phasor measurement units (PMUs) deployed in the grid. However, power grids consist of many legacy devices whose life cycles can last several decades, and incorporating major security upgrades in these devices can be quite expensive. Moreover, extensive research has shown that PMUs themselves are vulnerable to false data injection (FDI) attacks, which can be launched by spoofing their GPS receivers [6]. For a general class of false data injection attacks against state estimation, recent works [7], [8] have proposed machine learning (ML) based methods to detect the attacks. References [9], [10] propose ML-based defenses for FDI attacks against DC micro-grids. However, recent research has shown that ML-based algorithms can be vulnerable in adversarial scenarios, which can significantly reduce their efficacy [11]. Thus, existing defense mechanisms are not foolproof.

In this paper, we propose a novel defense strategy to detect CCPAs based on the moving target defense (MTD) technique. As opposed to traditional static defense, MTD is a dynamic strategy that has the potential to increase the complexity and the cost for potential attackers. As in prior works [2], [3], [4], [5], we consider physical attacks that disconnect transmission lines. We note that to craft an undetectable CCPA, the attacker must obtain accurate knowledge of the transmission line reactances [3], [5]. The main idea of the proposed MTD defense in this context is to invalidate the attacker’s previously acquired knowledge by actively perturbing the grid’s line reactance settings. This can be accomplished using distributed flexible AC transmission system (D-FACTS) devices, which are capable of performing active impedance injection and are being increasingly deployed in power grids [12]. The proposed MTD defense strategy has the potential to make it extremely difficult for the attacker to track the system’s dynamics and gather sufficient information to craft covert CCPAs.

Recent related works: [13], [14], [15], [16] have studied MTD to defend the power grid’s state estimation against false data injection (FDI) attacks. References [13] and [14] proposed design criteria to compute MTD perturbations that can detect FDI attacks effectively. They also showed that effective MTD perturbations incur a non-trivial operational cost, evaluated in terms of the grid’s optimal power flow (OPF) cost [13] and the transmission power losses [14]. The problem of hiding MTD’s activation from the attacker was considered in [15], which proposed the so-called hidden MTD. Reference [16] analyzed how the topology of the power grid impacts the completeness of MTD’s detection capability. Finally, [17] has studied the deployment of D-FACTS devices to maximize MTD’s attack detection capability against FDI attacks.

Compared to the above references, the novelty of our work is two-fold. First, with the exception of our preliminary work [1], none of them have considered defense against CCPAs by exploiting MTD. The solution requires the formulation of
novel design criteria both in terms of D-FACTS placement as well as D-FACTS perturbation selection. Second, existing works seek to design MTD that can defend against all potential threats (specifically, FDI attacks in these papers). However, as we will show in this work, this is not always necessary. In fact, MTD’s operational cost can be significantly reduced by identifying and defending against only the most likely threats posed by an active and strategic attacker. We formalize this idea in the context of MTD for CCPAs using a game-theoretic framework. We note that although game theory has been used in the context of defense against FDI attacks [18], [19], our work is the first to apply it in the context of MTD.

Our main contributions: Our MTD design against CCPAs concerns two different aspects: (i) D-FACTS deployment, and (ii) D-FACTS operation. First, in the D-FACTS deployment problem, we seek to find the best subset of links for this purpose, combining the dual criteria of minimizing the system’s OPF cost and detecting CCPAs. We identify a graph-theoretic criterion to characterize MTD’s effectiveness against CCPAs. The optimal subset of links for D-FACTS deployment is found by solving a feedback edge set problem [20] on the graph associated with the power grid. Our proposed D-FACTS deployment solution provides the defender with the ability to detect CCPAs against any transmission line of the grid.

Second, to reduce the prohibitive MTD operational cost [13], we focus on D-FACTS operation that minimizes the defense cost while accounting for a strategic and intelligent attacker.¹ For this, we study the interaction between the grid’s defender and attacker via zero-sum non-cooperative games. This enables us to anticipate the attacker’s strategic behaviour and to develop robust defense policies against CCPAs. Then, in order to compute the Nash equilibrium (NE) minimax robust solution, we propose to exploit a reinforcement learning algorithm, namely, the exponential weights algorithm, or EXP3 [21]. The latter has two major advantages compared to the Lemke-Howson algorithm: (1) EXP3 does not require perfect and complete knowledge of the game, which may be problematic in adversarial settings, and relies solely on the past observed payoffs; (2) Lemke-Howson is a combinatorial algorithm and, although efficient in practice, it has exponential complexity in the worst case. In contrast, EXP3 is a simple iterative procedure converging to the solution as $O(T^{-1/2})$, with $T$ being the horizon of play.

¹ A key observation that enables the cost reduction is that, during system operations, it is not necessary to defend against CCPAs on all transmission lines in the grid. This is because many of these attacks are not harmful enough and, hence, the attacker is unlikely to target those transmission lines.

Finally, our extensive simulations conducted using the MATPOWER simulator show the effectiveness of the proposed MTD solution in detecting CCPAs. Our results also demonstrate that the game-theoretic robust solution significantly reduces the operator’s defense cost. Moreover, the simulation results demonstrate that MTD designed according to the DC power flow model remains effective in detecting CCPAs under an AC power flow model as well.

Compared to our preliminary work [1], the novel contributions of this paper are significant, as follows: (i) we provide a new D-FACTS deployment scheme that, in addition to defending against CCPAs (as in [1]), also minimizes the OPF cost; (ii) EXP3, which requires less information and comes with performance guarantees, is proposed to compute the NE instead of the Lemke-Howson approach used in [1]; (iii) multi-line attacks are considered here, while only single line disconnections were considered in [1]; (iv) as opposed to the full placement of power flow/injection measurement sensors at all links/buses assumed in [1], here we also consider partial sensor placement; and (v) we show via numerical experiments that our defense strategy remains effective in the AC power flow case.

Finally, we note that although in this paper we focus specifically on strengthening the bad data detector (BDD) to detect CCPAs, the proposed MTD method can be broadly applied to other data-driven methods for line outage detection/classification, such as those proposed in [22], [23], [24], [25]. These data-driven approaches, which leverage the measurements provided by the supervisory control and data acquisition (SCADA) system and PMUs, are being increasingly adopted in power grids, and an attacker aiming to cause line outages must mask the effect of the physical attack on the power grid measurements to remain stealthy. Thus, a similar MTD strategy can also be developed to strengthen the aforementioned data-driven line outage detection/classification approaches.

II. SYSTEM MODEL

We consider a power grid characterized by a graph $\mathcal{G} = (\mathcal{N}, \mathcal{L})$, where $\mathcal{N} = \{1, \ldots, N\}$ is the set of buses and $\mathcal{L} = \{1, \ldots, L\}$ is the set of transmission lines. Without loss of generality, we assume throughout that bus 1 is the slack bus. At bus $i$, we denote the amount of generation and load by $G_i$ and $D_i$, respectively. We let $l = \{i, j\}$ denote a transmission line $l \in \mathcal{L}$ that connects bus $i$ and bus $j$, and we denote its reactance by $x_l$. The power flowing on line $l$ is denoted by $F_l$, which under the DC power flow model is given by $F_l = \frac{1}{x_l}(\theta_i - \theta_j)$, where $\theta_i$ and $\theta_j$ are the voltage phase angles at buses $i, j \in \mathcal{N}$, respectively. Note that for the slack bus $\theta_1 = 0$. In vector form, the power flow vector $\mathbf{f} = [F_1, \ldots, F_L]^T$ is related to the voltage phase angle vector $\boldsymbol{\theta} = [\theta_2, \ldots, \theta_N]^T$ as $\mathbf{f} = \mathbf{D}\mathbf{A}^T\boldsymbol{\theta}$, where the matrix $\mathbf{A} \in \mathbb{R}^{(N-1) \times L}$ is the reduced branch-bus incidence matrix obtained by removing the row of the slack bus, and $\mathbf{D} \in \mathbb{R}^{L \times L}$ is a diagonal matrix of the reciprocals of the link reactances. We denote the set of links on which D-FACTS devices are deployed by $\mathcal{L}_D$, where $\mathcal{L}_D \subseteq \mathcal{L}$. D-FACTS devices enable the reactances of these lines to be varied within a pre-defined range $[x_l^{\min}, x_l^{\max}]$, where $x_l^{\min}$ and $x_l^{\max}$ are the reactance limits achievable by the D-FACTS devices. Note that $x_l^{\min} = x_l^{\max} = x_l, \ \forall l \in \mathcal{L} \setminus \mathcal{L}_D$.

State Estimation & Bad Data Detection: The system state, i.e., the vector of voltage phase angles $\boldsymbol{\theta}$, is estimated from the noisy sensor measurements using the state estimation (SE) technique. The sensor measurements, which we denote by $\mathbf{z} \in \mathbb{R}^M$, correspond to the nodal power injections and the forward and reverse branch power flows, i.e., $\mathbf{z} = [\tilde{\mathbf{p}}^T, \tilde{\mathbf{f}}^T, -\tilde{\mathbf{f}}^T]^T$, where $M = N + 2L$ is the total number of measurements.²

² We will generalize our results to the case of partial placement of sensors in Sec. V-B.

We
denote the sensor measurement noises by a vector $\mathbf{n} \in \mathbb{R}^M$, which is assumed to follow a Gaussian distribution. Under the DC power flow model, the relationship between $\boldsymbol{\theta}$ and $\mathbf{z}$ is given by $\mathbf{z} = \mathbf{H}\boldsymbol{\theta} + \mathbf{n}$, where $\mathbf{H} \in \mathbb{R}^{M \times N}$ is the system’s measurement matrix given by $\mathbf{H} = [\mathbf{A}\mathbf{D}\mathbf{A}^T; \mathbf{D}\mathbf{A}^T; -\mathbf{D}\mathbf{A}^T]$.

The maximum likelihood technique is used for system state estimation. Under maximum likelihood estimation, the estimate $\hat{\boldsymbol{\theta}}$ is related to the measurements $\mathbf{z}$ as $\hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{W}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{W}\mathbf{z}$, where $\mathbf{W}$ is a diagonal weighting matrix whose elements are the reciprocals of the variances of the sensor measurement noise components.

After state estimation, a bad data detector (BDD) computes the residual, which we denote by $r$, as $r = \|\mathbf{z} - \mathbf{H}\hat{\boldsymbol{\theta}}\|$. A bad data alarm is flagged if the residual exceeds a predefined threshold $\tau$. The threshold is adjusted to ensure that the false positive (FP) rate does not exceed $\alpha$, where $\alpha > 0$ is usually a small value close to zero.

Optimal Power Flow: For any given load condition $\mathbf{d} = [D_1, \ldots, D_N]^T$, the system operator sets the generation dispatch and line reactance settings by solving the optimal power flow (OPF) problem, stated as follows:

$$C_{\mathrm{OPF}} = \min_{\mathbf{g}, \mathbf{x}} \sum_{i \in \mathcal{N}} C_i(G_i) \tag{1a}$$
$$\text{s.t.} \quad \mathbf{g} - \mathbf{d} = \mathbf{B}\boldsymbol{\theta}, \tag{1b}$$
$$|\mathbf{g} - \mathbf{g}^0| \le \Delta\mathbf{g}^{\max}, \tag{1c}$$
$$\mathbf{f} \in \mathcal{F}, \ \mathbf{g} \in \mathcal{Z}, \ \mathbf{x} \in \mathcal{X}, \tag{1d}$$

where $C_i(\cdot)$ is the generation cost at bus $i \in \mathcal{N}$ and $\mathbf{g} = [G_1, \ldots, G_N]^T$ is the power generation vector at all buses. Equation (1b) is the nodal power balance constraint, where the matrix $\mathbf{B} = \mathbf{A}\mathbf{D}\mathbf{A}^T$. Constraint (1c) is the generator ramp rate constraint, where $\mathbf{g}^0 = [G_1^0, \ldots, G_N^0]^T$ is the vector of generations at the previous decision instant and $\Delta\mathbf{g}^{\max}$ is the permissible change in generation between two decision instants. Constraints (1d) correspond to the branch power flow, generator, and D-FACTS limits, respectively, where $\mathcal{F} = [-\mathbf{f}^{\max}, \mathbf{f}^{\max}]$, $\mathcal{Z} = [\mathbf{g}^{\min}, \mathbf{g}^{\max}]$ and $\mathcal{X} = [\mathbf{x}^{\min}, \mathbf{x}^{\max}]$; $\mathbf{f}^{\max}$ is the maximum permissible line power flow (i.e., the thermal limit), and $\mathbf{g}^{\min}, \mathbf{g}^{\max}$ are the generator limits. We note that in the absence of D-FACTS, the OPF optimizes over the generator dispatch values only.

Fig. 1: An example of a 4 bus power grid (buses 1 to 4, generators G1 and G2, lines L1 to L5).

III. COORDINATED CYBER AND PHYSICAL ATTACKS

The focus of this work is the power grid SCADA system. Existing SCADA communication standards, such as the IEEE C37.118 and IEC 61850 frameworks, are known to have poor security features (e.g., lack of encryption) [26]. Their vulnerabilities can be exploited by an attacker to obtain unauthorized access to the field devices and alter their status, and/or to modify the data packets that convey the field measurements and the control commands.

Undetectable False Data Injection Attacks: We denote the false data injection (FDI) attack vector by $\mathbf{a} \in \mathbb{R}^M$, which the attacker injects into the sensor measurements by exploiting the aforementioned SCADA vulnerabilities. We denote the measurement vector with the FDI attack by $\mathbf{z}_a$, given by $\mathbf{z}_a = \mathbf{z} + \mathbf{a}$. It has been shown [27] that an FDI attack of the form $\mathbf{a} = \mathbf{H}\mathbf{c}$, where $\mathbf{c} \in \mathbb{R}^N$, remains undetected by the BDD: the state estimate simply shifts to $\hat{\boldsymbol{\theta}} + \mathbf{c}$, so the residual $r$ is unchanged. Specifically, the probability of detection for such attacks is equal to the FP rate $\alpha$. We call these attacks undetectable FDI attacks.

A Coordinated Cyber and Physical Attack: While an FDI attack only modifies the sensor measurements, a CCPA physically harms the grid and is accompanied by a coordinated FDI attack on the sensor measurements, as described above. In particular, we consider physical attacks that disconnect a set of transmission lines, e.g., by remotely opening the line circuit breakers through SCADA vulnerabilities. The physical attack will alter the power grid’s topology and power flows, and the mismatch between the pre-attack and post-attack (i.e., after the line disconnections) measurements can generally be detected by the BDD. However, it has been shown that if the attacker injects a carefully constructed coordinated FDI attack on the sensor measurements, then the effect of the physical attack on the BDD residual can be completely masked [5]. Hence, the attack remains undetected by the BDD.

Denote the set of links disconnected by the attacker under a physical attack by $\mathcal{L}_A$. We use the subscript “p” to denote the power grid parameters following the physical attack. It can be shown that the grid measurements after the physical attack are related to the pre-attack measurements by $\mathbf{z}_p = \mathbf{z} + \mathbf{a}_p$, where $\mathbf{a}_p = \mathbf{H}\Delta\boldsymbol{\theta} - \Delta\mathbf{H}\boldsymbol{\theta}_p$, with $\Delta\boldsymbol{\theta} = \boldsymbol{\theta}_p - \boldsymbol{\theta}$ and $\Delta\mathbf{H}$ the change in the measurement matrix before and after the physical attack, given by $\Delta\mathbf{H} = \mathbf{H} - \mathbf{H}_p$. As shown in [5], in order to mask the effect of the physical attack and remain undetected by the BDD, the attacker must inject a coordinated FDI attack of the form $\mathbf{a} = \Delta\mathbf{H}\boldsymbol{\theta}_p$, so that the compromised measurements $\mathbf{z}_p + \mathbf{a} = \mathbf{H}\boldsymbol{\theta}_p + \mathbf{n}$ are consistent with the pre-attack measurement matrix $\mathbf{H}$ used by the BDD.

Knowledge Required to Launch a CCPA: Next, we list the knowledge required by the attacker to construct an FDI attack of the form $\mathbf{a} = \Delta\mathbf{H}\boldsymbol{\theta}_p$. Assume that the attacker disconnects a single branch $\mathcal{L}_A = \{l\}$ that connects buses $i$ and $j$. It can be easily verified that $\Delta\mathbf{H}$ depends only on the tripped branch reactance $x_l$. Therefore, to construct the attack $\mathbf{a} = \Delta\mathbf{H}\boldsymbol{\theta}_p$, the attacker must obtain knowledge of the branch reactance $x_l$ and of the difference in phase angles of buses $i$ and $j$ following the physical attack, i.e., $\theta_{i,p} - \theta_{j,p}$ [5]. The knowledge of $\theta_{i,p} - \theta_{j,p}$ can be obtained by monitoring the line power flows following the physical attack as follows:

$$\theta_{i,p} - \theta_{j,p} = -\sum_{l_m \in p_l^k} x_{l_m} F_{l_m,p}, \tag{2}$$

where $p_l^k$ is any alternative path between nodes $i$ and $j$ in the residual power network following the physical disconnections, i.e., $\mathcal{L} \setminus \mathcal{L}_A$. Each path $p_l^k$ in turn is a collection of links $p_l^k = \{l_{k_1}, l_{k_2}, \ldots, l_{k_M}\}$ such that $\mathrm{src}(l_{k_1}) = i$ and $\mathrm{dst}(l_{k_M}) = j$, where $k_M$ is the number of links in the path $p_l^k$. We denote by $\mathcal{P}_l = \{p_l^1, p_l^2, \ldots, p_l^{K_l}\}$ a collection of
all alternative paths between buses $i$ and $j$, where $K_l$ is the number of such alternative paths. Note that the subscript $l$ denotes the disconnected link.

In the 4-bus example shown in Fig. 1, assume that the attacker disconnects link 1, which connects buses 1 and 4. After the disconnection, there are two alternative paths between buses 1 and 4, and hence, $K_1 = 2$. These paths are given by $p_1^1 = \{2, 3, 4\}$, which consists of three links, and $p_1^2 = \{5, 4\}$, which consists of two links. The attacker can compute the phase angle difference between nodes 1 and 4 using (2) as $\theta_{1,p} - \theta_{4,p} = -(x_2 F_{2,p} + x_3 F_{3,p} + x_4 F_{4,p})$ or $\theta_{1,p} - \theta_{4,p} = -(x_5 F_{5,p} + x_4 F_{4,p})$.

The attacker can obtain the knowledge of the power flows $F_{l_m,p}$ in (2) by monitoring the line flow sensor measurements. The line reactances $x_{l_m}$ can be learned by monitoring the grid power flows over a period of time using existing techniques [28]. The attacker can also learn the reactance $x_l$ of the disconnected branch similarly.

CCPA with Multiple Line Disconnections: While the description above considers CCPAs with single line disconnections, an attacker who has access to the line circuit breakers can simultaneously disconnect multiple transmission lines. In this case, to launch an undetectable CCPA, the attacker would need to acquire the knowledge of the reactances of all tripped branches, and the phase angles of all buses connecting to the tripped branches after the physical attack [5].

IV. MOVING-TARGET DEFENSE FOR CCPAS

In this work, we propose a solution to defend the system against CCPAs based on the MTD technique. In the following, we first describe MTD for power grids and then formalize the MTD design problem to defend against CCPAs.

A. MTD Description and Practical Implementation

The main idea behind this approach is to periodically perturb the branch reactances of certain transmission lines to invalidate the attacker’s acquired knowledge. Hence, an attack constructed using outdated knowledge of the system can be detected by the BDD. Two important considerations in MTD design are the MTD cost and the perturbation frequency, which we explain in the following.

MTD Cost: Note that in the absence of MTD considerations, the D-FACTS device settings are optimized to minimize the OPF cost as in (1). Let us denote $\mathbf{g}^*, \mathbf{x}^* = \arg\min_{\mathbf{g},\mathbf{x}} C_{\mathrm{OPF}}$. However, under MTD, to invalidate the attacker’s knowledge, the reactance settings are set to be different from $\mathbf{x}^*$. Thus, MTD perturbations will incur an operational cost [13]. Let us denote the line reactance settings with MTD as $\mathbf{x}' = \mathbf{x}^* + \Delta\mathbf{x}$, where $\Delta\mathbf{x}$ is the reactance perturbation. The OPF cost with MTD is given by

$$C'_{\mathrm{OPF}} = \min_{\mathbf{g}} \sum_{i \in \mathcal{N}} C_i(G_i) \tag{3a}$$
$$\text{s.t.} \quad \mathbf{g} - \mathbf{d} = \mathbf{B}\boldsymbol{\theta}, \tag{3b}$$
$$|\mathbf{g} - \mathbf{g}^0| \le \Delta\mathbf{g}^{\max}, \tag{3c}$$
$$\mathbf{x} = \mathbf{x}^* + \Delta\mathbf{x}, \tag{3d}$$
$$\mathbf{f} \in \mathcal{F}, \ \mathbf{g} \in \mathcal{Z}. \tag{3e}$$

Fig. 2: A time line of the proposed MTD scheme.

The cost of MTD is the difference between the OPF cost with MTD and the OPF cost without MTD, i.e., $C_{\mathrm{MTD}} = C'_{\mathrm{OPF}} - C_{\mathrm{OPF}}$. Note that the MTD cost is always non-negative: since $\mathbf{x}^*$ is optimal for (1), moving the reactances away from it cannot decrease the OPF cost. Moreover, the magnitude of $\Delta\mathbf{x}$ affects the MTD cost. In general, a large perturbation effectively invalidates the attacker’s knowledge, but also incurs a higher operational cost. Thus, there exists a trade-off between MTD’s effectiveness and cost [13].

MTD Perturbation Frequency: For MTD to be effective, the system settings must be changed before the attacker gathers sufficient information to conduct a successful attack. The attacker can acquire the knowledge of the system topology and branch reactances (required to launch an undetectable CCPA) by monitoring the power grid’s measurement data over a period of time [28]. Experimental evidence suggests that hourly perturbations are sufficient for practical systems (see the discussion in Section IV of [13]).

Practical Implementation: A time line showing the practical implementation of the proposed MTD scheme is presented in Fig. 2. As shown in the figure, the MTD perturbation interval is on the order of hours, whereas measurements arrive every 4 to 6 seconds with traditional SCADA and at about 50 measurements per second with advanced PMUs. When the reactances of the transmission lines are perturbed, the attacker’s acquired knowledge of the power system becomes invalid. Thus, any CCPA executed by the attacker with outdated knowledge of the system will be detected by the BDD once the next set of field measurements from the SCADA system reaches the control center. Hence, a BDD strengthened with MTD will be able to detect CCPAs within one SCADA measurement interval.

B. Defending Against CCPAs

Next, we formalize the MTD design problem to defend against CCPAs. For clarity, we set up the problem for CCPAs involving single line disconnections only. However, in Section V-A, we present arguments to show that the proposed MTD design remains effective against CCPAs with multiple line disconnections.

Recall from (2) that to construct an undetectable CCPA, the attacker must acquire the following: (i) knowledge of the reactance of the tripped branch, $x_l$, and (ii) knowledge of the branch reactances in at least one alternative path $p_l^k$ between
the nodes $i$ and $j$. Therefore, under MTD, the defender can thwart the CCPA by invalidating either of the two:

C1. Invalidate the attacker’s knowledge of the tripped branch’s reactance $x_l$.
C2. Invalidate the attacker’s knowledge of at least one of the branches in the path $p_l^k$ between nodes $i$ and $j$.

Note that the defender cannot have prior knowledge of which link the attacker chooses to disconnect. Moreover, for a disconnected link $l$, the defender has no way of knowing which path $p_l^k \in \mathcal{P}_l$ the attacker may have used to compute the phase angle difference $\theta_{i,p} - \theta_{j,p}$ as in (2). Thus, unless C1 holds, the defender must invalidate the attacker’s knowledge of the reactance of at least one link in every path $p_l^k \in \mathcal{P}_l$. The defender must do so for every link $l \in \mathcal{L}$ (such that the attacker cannot launch a CCPA by disconnecting any link in the grid).

To sum up, the MTD perturbation selection problem can be stated as follows:

Problem 1 (MTD design). For each branch $l \in \mathcal{L}$ and each path $p_l^k$, $k = 1, \ldots, K_l$, invalidate the knowledge of at least one of the branches in $\{l\} \cup p_l^k$.

This problem poses constraints on the D-FACTS deployment set $\mathcal{L}_D$, since a prerequisite for invalidating the attacker’s knowledge of a branch reactance is the presence of a D-FACTS device on that link. Thus, $\mathcal{L}_D$ must be chosen in a way that gives the defender the ability to protect every link $l \in \mathcal{L}$. A trivial solution is to deploy a D-FACTS device on every link of the power grid. However, a system operator may wish to minimize the number of D-FACTS devices installed in order to minimize the device deployment cost.

On the other hand, MTD perturbations also incur an operational cost for the defender, as shown in [13]. Indeed, perturbing the reactances of a large number of links may be expensive. Thus, at the system’s operational time, the defender may wish to perturb the reactances of only a subset of links, which we denote by $\mathcal{L}_{D_w}$, where $\mathcal{L}_{D_w} \subseteq \mathcal{L}_D$, such that the attacker cannot launch CCPAs against some specific links that are perceived to be critical and vulnerable.

In what follows, we provide solutions to both of the aforementioned aspects of the MTD design problem. Specifically, in Section V, we first present an algorithm, based on a graph-theoretic approach, to find the D-FACTS deployment set $\mathcal{L}_D$ that satisfies the MTD design problem with a minimum number of devices. Subsequently, in Section VI, we present a solution, based on a game-theoretic approach, to the problem of selecting a subset of links $\mathcal{L}_{D_w}$ for reactance perturbation at the operational time.

V. D-FACTS DEPLOYMENT SOLUTION

In the absence of defense considerations, D-FACTS devices are traditionally deployed in the grid with the objective of minimizing the cost of operation [29]. In this work, we seek to find the D-FACTS deployment set $\mathcal{L}_D$ that balances the dual objectives of defending against CCPAs and minimizing the OPF cost. Specifically, $\mathcal{L}_D$ must satisfy the following criteria: (i) $\mathcal{L}_D$ must meet the CCPA detection conditions listed in Section IV, and the number of links in $\mathcal{L}_D$, denoted by $|\mathcal{L}_D|$, must be minimal; and (ii) $\mathcal{L}_D$ must achieve the least OPF cost.

First, we consider the problem of finding $\mathcal{L}_D$ that satisfies (i) above. For this, the key observation is that each set of links $\{l\} \cup p_l^k$, $k = 1, \ldots, K_l$, forms a loop in the graph $\mathcal{G}$. In Fig. 1, assuming that the attacker disconnects link 1, the links $\{1\} \cup \{2, 3, 4\}$ and $\{1\} \cup \{4, 5\}$ form loops in the corresponding graph. If D-FACTS devices are installed on a subset of links in the graph such that every loop contains at least one D-FACTS link, then the attacker cannot launch an undetectable CCPA. In graph-theoretic terms, the problem is equivalent to removing a subset of links from the network such that the residual graph has no loops. For an optimized deployment, $\mathcal{L}_D$ must contain the minimum number of links. The set $\mathcal{L}_D$ can be found by solving the feedback edge set problem in an undirected graph [20]. The solution proceeds by finding the spanning trees of the graph $\mathcal{G}$. Let $\mathcal{L}_{\mathrm{sptr}}$ denote a spanning tree of $\mathcal{G}$. If D-FACTS devices are installed on the links $\mathcal{L} \setminus \mathcal{L}_{\mathrm{sptr}}$, then the attacker cannot find a loop within the graph whose branches do not have a D-FACTS device. Equivalently, the attacker cannot launch an undetectable CCPA. Further, by definition, every spanning tree of a connected graph contains precisely $N - 1$ links. Since any loop-free subgraph on $N$ buses has at most $N - 1$ links, at least $L - N + 1$ links must be removed, and $\mathcal{L} \setminus \mathcal{L}_{\mathrm{sptr}}$ attains this minimum while satisfying (i). Thus, the D-FACTS deployment set is $\mathcal{L}_D = \mathcal{L} \setminus \mathcal{L}_{\mathrm{sptr}}$.

The graph $\mathcal{G}$ might have multiple spanning trees, implying that multiple subsets of links $\mathcal{L}_D$ satisfy (i). A natural question is: which of these $\mathcal{L}_D$ should be chosen for D-FACTS deployment? To answer this question, we consider the secondary problem of cost minimization described in (ii).

In the absence of defense considerations, reference [29] proposed an approach to find $\mathcal{L}_D$ that minimizes the transmission losses by choosing the set of links for D-FACTS deployment that have the highest power loss to impedance sensitivity factors. We adopt a similar approach in this work. Since the primary interest of this work is the OPF cost, we compute the OPF cost to impedance (OCI) sensitivity factor for each link, i.e., $dC_{\mathrm{OPF}}/dx_l$. Then, installing D-FACTS devices on the links with the greatest absolute values of $dC_{\mathrm{OPF}}/dx_l$ will achieve the least OPF cost.

With the MTD consideration, this approach must be modified to select the set of links that have the highest OCI sensitivity values while satisfying the attack detection conditions. This can be accomplished by converting $\mathcal{G}$ to a weighted graph, where the link weights are their OCI sensitivity factors. Then, among all sets $\mathcal{L}_D$ that satisfy (i), we must select the set that has the greatest sum of link weights.

Although the procedure stated above satisfies conditions (i) and (ii) optimally, listing all the spanning trees of $\mathcal{G}$ can be computationally expensive, since their number can grow exponentially with the size of the grid. However, this issue can be addressed easily. Specifically, we first choose the minimum-weight spanning tree of the weighted graph, denoted by $\mathcal{L}_{\mathrm{minsptr}}$. Then, setting $\mathcal{L}_D = \mathcal{L} \setminus \mathcal{L}_{\mathrm{minsptr}}$ solves both (i) and (ii) optimally, because the total link weight is fixed, so minimizing the weight of the tree maximizes the weight of its complement. The D-FACTS deployment algorithm is summarized in Algorithm 1.

Complexity of Algorithm 1: The D-FACTS placement algorithm requires solving a Minimum Spanning Tree (MST) problem, which has a complexity of $O(L \log N)$ (e.g., by using Kruskal’s algorithm) [20].
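To make Algorithm 1 concrete, the Python sketch below computes $\mathcal{L}_D$ as the complement of a minimum-weight spanning tree using Kruskal’s algorithm (the MST method named in the complexity remark above), and checks the loop condition behind criterion (i). Several pieces are assumptions rather than part of the paper: the OCI values are made-up numbers standing in for $dC_{\mathrm{OPF}}/dx_l$, links are ranked by the absolute OCI value as the prose suggests, the function names are hypothetical, and the 4-bus wiring is only one topology consistent with the paths listed for Fig. 1 (link 1 joins buses 1 and 4, with alternative paths {2, 3, 4} and {5, 4}).

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]  # (bus i, bus j) joined by a transmission line


class DisjointSet:
    """Union-find over buses; detects when adding a link would close a loop."""

    def __init__(self, nodes: Iterable[int]) -> None:
        self.parent = {v: v for v in nodes}

    def find(self, v: int) -> int:
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, a: int, b: int) -> bool:
        """Merge the components of a and b; return False if already connected."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def dfacts_deployment_set(buses: Iterable[int], lines: Dict[int, Edge],
                          oci: Dict[int, float]) -> Set[int]:
    """Hypothetical helper: L_D = L minus a minimum-weight spanning tree.

    oci[l] stands in for dC_OPF/dx_l (assumed precomputed); links are
    weighted by its absolute value, and Kruskal's algorithm builds the tree.
    """
    dsu = DisjointSet(buses)
    tree: Set[int] = set()
    for l in sorted(lines, key=lambda k: abs(oci[k])):  # lightest links first
        i, j = lines[l]
        if dsu.union(i, j):  # link joins two components, so it enters the tree
            tree.add(l)
    return set(lines) - tree


def has_unprotected_loop(buses: Iterable[int], lines: Dict[int, Edge],
                         protected: Set[int]) -> bool:
    """Hypothetical check: True if some loop contains no link in `protected`."""
    dsu = DisjointSet(buses)
    return any(not dsu.union(*lines[l]) for l in lines if l not in protected)


# Assumed 4-bus topology and made-up OCI sensitivities (illustration only).
buses = [1, 2, 3, 4]
lines = {1: (1, 4), 2: (1, 2), 3: (2, 3), 4: (3, 4), 5: (1, 3)}
oci = {1: 0.8, 2: -0.1, 3: 0.5, 4: 0.3, 5: -0.9}

L_D = dfacts_deployment_set(buses, lines, oci)
print(sorted(L_D))                                 # [1, 5]
print(has_unprotected_loop(buses, lines, L_D))     # False
print(has_unprotected_loop(buses, lines, {5}))     # True
```

With these inputs the tree keeps links 2, 3 and 4, so $\mathcal{L}_D = \{1, 5\}$ contains $L - N + 1 = 2$ links and every loop includes a D-FACTS link; dropping link 1 from the deployment leaves the loop {1, 2, 3, 4} unprotected. A union-find structure is used because the same primitive serves both Kruskal’s algorithm and the loop check.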
ALGORITHM 1: D-FACTS Deployment Set
Data: Power grid graph $\mathcal{G} = (\mathcal{N}, \mathcal{L})$
Result: $\mathcal{L}_D$
1 Set the weight of link $l \in \mathcal{L}$ as $dC_{\mathrm{OPF}}/dx_l$.
2 Compute the minimum weight spanning tree $\mathcal{L}_{\mathrm{minsptr}}$ of $\mathcal{G}$.
3 Set $\mathcal{L}_D = \mathcal{L} \setminus \mathcal{L}_{\mathrm{minsptr}}$.

To conclude, consider the D-FACTS deployment set $\mathcal{L}_D$ chosen according to Algorithm 1. Assume that the defender perturbs the reactances of the set of links $\mathcal{L}_{D_w} \subseteq \mathcal{L}_D$. Then, we have the following:
• A physical attack against a link $l$ can be detected by the BDD if the links in $\mathcal{L}_{D_w}$ ensure that the conditions listed in Problem 1 are satisfied for that link. We will henceforth refer to such a link as being “protected” under the MTD link perturbation set $\mathcal{L}_{D_w}$.
• Naturally, based on the arguments stated in this section, if $\mathcal{L}_{D_w} = \mathcal{L}_D$, then all the links $l \in \mathcal{L}$ are protected from physical attacks.

We now provide extensions of the D-FACTS placement solution under some generalized conditions.

A. Multiple Line Disconnections

In the D-FACTS placement algorithm presented above, we only considered single line disconnections. However, if the attacker is able to remotely access multiple line circuit breakers, then s/he can disconnect multiple transmission lines at once. The MTD solution must be able to detect such multiple line disconnections.

The proposed D-FACTS placement algorithm remains effective in this generalized setting. Note that if the attacker disconnects multiple links, then s/he must obtain the knowledge of the susceptances of all tripped branches, and the phase angles of all buses connecting to the tripped branches after the

this subsection, we investigate how the D-FACTS deployment solution is affected by these two factors.

We assume that the system operator deploys the power flow/injection sensors to satisfy the basic observability condition, i.e., to ensure that the observability matrix has full rank [30]. Under this assumption, if the attacker has access to all the deployed sensors, then the proposed D-FACTS deployment in Algorithm 1 remains valid and optimal. Indeed, in this scenario, the attacker can derive the power flows on all branches of the power grid (fully observable system). Thus, the D-FACTS deployment solution will not change.

However, if the attacker can access only a subset of the deployed sensors, then s/he cannot derive the power flows on all the branches. Thus, the attacker will not be able to use the paths that contain unobservable branches to derive the phase angle difference between the disconnected nodes. In this scenario, the number of D-FACTS devices that need to be deployed (for MTD) can be significantly reduced. The system operator can first determine the unobservable branches and the observable islands [30] from the attacker’s perspective, based on the measurement sensors that s/he can access (e.g., knowledge of unprotected sensors). Note that observable islands in a power system can be determined efficiently using existing numerical methods (see, e.g., [31]). Let us denote these observable islands by $\mathcal{G}_i = \{\mathcal{N}_i, \mathcal{L}_i\}$, $i = 1, \ldots, K$, where $K$ denotes the number of islands.

Then, the D-FACTS deployment set can be determined by ensuring that there are no loops within any observable island $\mathcal{G}_i$, $i = 1, \ldots, K$. The rationale is that, if the attacker disconnects a link between buses that belong to different observable islands, then every alternative path between the disconnected nodes will necessarily involve an unobservable link. Therefore, s/he cannot launch an undetectable CCPA against such links. In other words, the attacker can launch undetectable CCPAs only against the links that connect two nodes within the same observable island. Therefore, to defend
physical attack to launch an undetectable CCPA [5]. This in against these CCPAs, it suffices to ensure that each observable
turn would require the attacker to obtain the knowledge of the island is loopless. The overall solution is summarized in
line reactances of multiple loops within the power grid. Since Algorithm 2.
our placement algorithm is designed to ensure that every loop
in the graph contains at least one link with a D-FACTS device ALGORITHM 2: D-FACTS Deployment Set With Partial
installed, the proposed algorithm can successfully invalidate Sensor Placement
the attacker’s knowledge under multiple line disconnections. Data: Power grid graph G = (N , L), Attacker’s sensor
access set PA
Result: LD
B. Partial Placement of Measurement Sensors and Attacker’s 1 Set the weight of link l ∈ L as dCOPF /dxl .
2 Using PA , compute the ubobservable branches and enlist the
Access
set of observable islands Gi = {Ni , Li }, i = 1, . . . , K.
The system model considered thus far assumes full place- 3 For each observable island, compute the minimum weight
ment of power flow/injection measurements. However, in spanning tree Li,minsptr of the corresponding graph.
K
4 Set LD = ∪i=1 Li \ Li,minsptr .
practice, the sensors may only be deployed partially. Moreover,
the attacker may have access to only a subset of the deployed
sensors, either due to his/her limited resources or due to Complexity of Algorithm 2: If the attacker has access to
the protection measures deployed at some of the sensors. M measurements (where M ≤ 2L + N , then the complexity
The sensor placement and the attacker’s accessed measure- of computing the observable islands using numerical methods
ments affects the D-FACTS deployment solution, since they is O(M N ) [31]. The overall complexity of finding MSTs in
determine the knowledge of branch power flows that the all the observable islands is O(N L).
attacker can obtain (which is essential for the attacker to Note that in the general case where the there are no
craft an undetectable CCPA as argued in Section III). In protected sensors or if the system operator has no knowledge
7
of the sensors that can be accessed by the attacker, our original MTD deployment (Algorithm 1) remains valid. An illustration of D-FACTS deployment under partial sensor placement is presented in Section VII.

C. Attack Detection Under the AC Power Flow Model
The proposed MTD design is also effective under the AC power flow model, for the following reason. Recall that, in general, an undetectable CCPA can be constructed as $\mathbf{a} = \mathbf{z} - \mathbf{z}_p$. Here, $\mathbf{a}$ is the attack vector to be injected, $\mathbf{z}$ are the original measurements, and $\mathbf{z}_p$ are the measurements following the physical attack (line disconnection). Under the DC power flow model, as shown in [5], the attack vector $\mathbf{a}$ can be computed in an optimized manner. This requires knowledge of the reactances of only a few links within the grid, as explained in Section III. In contrast, no optimized method is known for computing $\mathbf{a}$ under the AC power flow model. Thus, the attacker would first have to solve the AC power flow equations to obtain $\mathbf{z}_p$, and subsequently obtain the attack vector $\mathbf{a}$. This in turn would require knowledge of the reactances of all the links in the power grid. Thus, our proposed MTD design remains effective, because it invalidates the attacker's knowledge of the reactances of the links within $\mathcal{L}_D$.

Alternatively, the recent work [32] shows that the attacker can use line outage distribution factors (LODFs) to compute $\mathbf{z}_p$. However, computing LODFs also requires knowledge of the line reactances of all the links in the grid [33]. Note that [32] also assumes that the attacker knows the topology of the entire system. Our extensive simulation results in Section VII verify that the proposed MTD design remains effective under the AC power flow model. This holds both when the power flows are fully recomputed and when LODFs are used.

VI. GAME THEORETIC MTD ROBUST STRATEGIES
MTD perturbations incur an operational cost, and perturbing the reactances of a large set of links may not be cost effective. Instead, we propose to protect only a subset of links from physical attacks. The choice depends on the operational system state, as well as on the perceived threat to those links. This is approached using the game-theoretic formulation presented next.

1) Zero-sum Game Formulation: We model the strategic interactions between the attacker and the defender as a two-player zero-sum game. Formally, the game is a triplet $\Gamma \triangleq (\{D, A\}, \{\mathcal{S}_D, \mathcal{S}_A\}, \{u_D, u_A\})$ with the following components:
(i) the set of players $\{D, A\}$;
(ii) $\mathcal{S}_D$ and $\mathcal{S}_A$, the sets of actions that the defender and the attacker can take, respectively;
(iii) the payoffs of the players, $u_k : \mathcal{S}_D \times \mathcal{S}_A \to \mathbb{R}$ for $k \in \{D, A\}$.
Here, $u_k(s_D, s_A)$ measures the benefit obtained by player $k$ when the action profile $s = (s_D, s_A)$ has been played. In a zero-sum game, the payoffs are opposite, $u_A(s_D, s_A) = -u_D(s_D, s_A)$. Hence a cost for the defender is a benefit for the attacker, and vice versa.

We denote the attacker's and the defender's action sets by $\mathcal{S}_A = \{a_0, a_1, \ldots, a_{N_A-1}\}$ and $\mathcal{S}_D = \{d_0, d_1, \ldots, d_{N_D-1}\}$, respectively. Here, $N_A$ and $N_D$ are the cardinalities of $\mathcal{S}_A$ and $\mathcal{S}_D$. Each attacker action corresponds to the subset of links that the attacker disconnects physically. We denote the set of links disconnected under action $a_i$ by $\mathcal{L}_{a_i} \subseteq \mathcal{L}$, $i = 0, 1, \ldots, N_A - 1$. The action $a_0$ corresponds to the case when the attacker does not attack any link. Note that $\mathcal{S}_A$ may also include multiple line disconnections. The defender's action is to select a subset of links within $\mathcal{L}_D$ whose reactances will be perturbed. We denote the set of links chosen by the defender under action $d_i$ by $\mathcal{L}_{d_i} \subseteq \mathcal{L}_D$, $i = 1, \ldots, N_D - 1$. The action $d_0$ corresponds to the case when the defender does not perturb the reactance of any link.

Next, we characterize the payoff $u_D(s_D, s_A)$. Suppose the attacker's chosen action $s_A$ includes a link $l$ that is protected by the defender via MTD. Then the CCPA will be detected by the BDD, and the operator can quickly restore the link to avoid any further damage. For instance, the defender can quickly return the circuit breaker of the disconnected link to the closed position. On the other hand, if the attacker disconnects a subset of links that are not protected, then the CCPA will go undetected. The link disconnection will result in a redistribution of power flows. Consequently, the power flow on some of the links may exceed the corresponding thermal limits. The system operator will notice the power flow violations and initiate generator redispatch/load shedding. This in turn rebalances the power flows and relieves the overloads. We denote the cost of load shedding at bus $i$ by $C_{i,s}(D_i^s)$. Here, $D_i^s$ is the quantity of load shed at bus $i$ (at most the load at that bus), and $\mathbf{d}^s = [D_1^s, \ldots, D_N^s]$.

Let $C_{\mathrm{OPF}}(a_m, d_n)$ denote the OPF cost when the attacker takes action $a_m$ and the defender takes action $d_n$. It can be computed as follows:
$$
\begin{aligned}
C_{\mathrm{OPF}}(a_m, d_n) = \min_{\mathbf{g}, \mathbf{d}^s} \;& \sum_{i \in \mathcal{N}} \left[ C_{i,g}(G_i) + C_{i,s}(D_i^s) \right] \\
\text{s.t. } \;& g^P_{a_m,d_n}(\mathbf{p}_G, \boldsymbol{\theta}, \mathbf{v}, \mathbf{x}) = 0, \\
& g^Q_{a_m,d_n}(\mathbf{p}_G, \boldsymbol{\theta}, \mathbf{v}, \mathbf{x}) = 0, \\
& |\mathbf{g} - \mathbf{g}_0| \le \Delta_g^{\max}, \\
& \mathbf{f} \in \mathcal{F}, \; \mathbf{g} \in \mathcal{G},
\end{aligned} \tag{4}
$$
where $\mathbf{B}_{a_m,d_n}$ is given by $\mathbf{B}_{a_m,d_n} = \mathbf{A}_{a_m,d_n} \mathbf{D}_{a_m,d_n} \mathbf{A}_{a_m,d_n}^T$. Here, $\mathbf{A}_{a_m,d_n}$ is the bus-branch connectivity matrix when the attacker and the defender choose actions $a_m$ and $d_n$, respectively. These quantities are computed as in Algorithm 3.

ALGORITHM 3: Payoff Computation
Data: $a_m$, $d_n$
Result: $C_{\mathrm{OPF}}(a_m, d_n)$
1 Set branch reactances to $\mathbf{x}_{d_n}$.
2 Set $\mathbf{A}_{a_m,d_n} = \mathbf{A}_{a_0,d_0}$.
3 Solve (4) to obtain $C_{\mathrm{OPF}}(a_0, d_n)$.
4 if attack is successful then
5   Set $\mathbf{A}_{a_m,d_n}$, $\mathbf{D}_{a_m,d_n}$ by removing the branches $\mathcal{L}_{a_m}$. Solve (4) to compute $C_{\mathrm{OPF}}(a_m, d_n)$.
6 else
7   Set $C_{\mathrm{OPF}}(a_m, d_n) = C_{\mathrm{OPF}}(a_0, d_n)$.
8 end
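Algorithm 3 rebuilds the network model for every action pair $(a_m, d_n)$ before solving (4). The Python sketch below illustrates only that network-update step. It is a simplified illustration, not the OPF (4) itself, and rests on these assumptions:
- The defender scales the reactance of every link in $\mathcal{L}_{d_n}$ by the same factor $(1+\eta)$.
- The attacker removes the links in $\mathcal{L}_{a_m}$.
- The resulting flows come from a lossless DC power flow using $\mathbf{B} = \mathbf{A}\mathbf{D}\mathbf{A}^T$, with no generator costs or limits.

The function names and the 3-bus data are hypothetical.

```python
import numpy as np


def apply_actions(lines, x, perturbed, eta, removed):
    """Hypothetical helper for steps 1 and 5 of Algorithm 3.

    d_n: scale the reactance of every link in `perturbed` by (1 + eta).
    a_m: drop every link in `removed` (only relevant when the attack succeeds).
    """
    x_dn = [xi * (1.0 + eta) if l in perturbed else xi for l, xi in zip(lines, x)]
    kept = [(l, xi) for l, xi in zip(lines, x_dn) if l not in removed]
    return [l for l, _ in kept], [xi for _, xi in kept]


def dc_power_flow(n_bus, lines, x, p, slack=0):
    """Lossless DC power flow with B = A D A^T; returns the flow on each branch."""
    A = np.zeros((n_bus, len(lines)))  # bus-branch incidence matrix
    for k, (i, j) in enumerate(lines):
        A[i, k], A[j, k] = 1.0, -1.0
    D = np.diag(1.0 / np.asarray(x, dtype=float))  # branch susceptances
    B = A @ D @ A.T
    keep = [b for b in range(n_bus) if b != slack]  # slack angle fixed to zero
    theta = np.zeros(n_bus)
    theta[keep] = np.linalg.solve(B[np.ix_(keep, keep)], np.asarray(p, dtype=float)[keep])
    return D @ A.T @ theta


lines = [(0, 1), (1, 2), (0, 2)]
x = [0.1, 0.1, 0.1]
p = [1.0, 0.0, -1.0]  # net injections (p.u.), summing to zero
print(dc_power_flow(3, lines, x, p))  # approx [0.333, 0.333, 0.667]
new_lines, new_x = apply_actions(lines, x, perturbed={(0, 1)}, eta=0.15, removed={(0, 2)})
print(dc_power_flow(3, new_lines, new_x, p))  # approx [1.0, 1.0]
```

In the second call, the tripped line forces all the power through the remaining path. In the full method, this redistribution is what drives the OPF cost $C_{\mathrm{OPF}}(a_m, d_n)$ up when an attack goes undetected.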
Complexity of Algorithm 3: The payoff computation algorithm requires solving $N_A N_D$ DC OPF problems. Note that DC OPF problems can be solved efficiently, e.g., by interior point methods in polynomial time.

Based on the above, the defender's payoff is given by
$$
u_D(s_D, s_A) =
\begin{cases}
C_{\mathrm{OPF}}(a_0, d_0) - C_{\mathrm{OPF}}(a_0, s_D), & \text{if } I_S = 0,\\
C_{\mathrm{OPF}}(a_0, d_0) - C_{\mathrm{OPF}}(s_A, s_D), & \text{if } I_S = 1.
\end{cases}
$$
The term $I_S$ is an indicator variable representing the success ($I_S = 1$) or failure ($I_S = 0$) of an attack. Both players aim to choose their actions such that their own payoff is maximized, so the two players have conflicting objectives.

The above payoffs can be interpreted as negative costs:
- $C_{\mathrm{OPF}}(a_0, d_0)$ is the benchmark operating cost of the defender when neither player acts to disrupt or defend the system.
- $C_{\mathrm{OPF}}(s_A, s_D) - C_{\mathrm{OPF}}(a_0, d_0)$ is the additional cost incurred by the defender due to a successful attack, when the attacker chooses $s_A$ and the defender chooses $s_D$.
- $C_{\mathrm{OPF}}(a_0, s_D) - C_{\mathrm{OPF}}(a_0, d_0)$ is the additional cost incurred by the defender for choosing action $s_D$ when the attack $s_A$ is unsuccessful.

Hence, the defender aims to minimize these costs, whereas the attacker seeks to maximize them.

2) Nash Equilibrium Solution: In such an interactive situation, the natural solution concept is the Nash equilibrium (NE). However, the game $\Gamma$ above is discrete and finite and may not have an NE in pure strategies. Instead, it always has at least one mixed-strategy NE [34], which is the NE of its extension to mixed strategies. This extension is defined as $\tilde{\Gamma} \triangleq (\{D, A\}, \{\Delta_D, \Delta_A\}, \{\tilde{u}_D, \tilde{u}_A\})$. The action sets of $\tilde{\Gamma}$ are the probability simplices $\Delta_k = \{\mathbf{p}_k \in \mathbb{R}_+^{N_k} \mid \sum_{j=0}^{N_k-1} p_{k,j} = 1\}$, $k \in \{D, A\}$. Here, $\mathbf{p}_k = (p_{k,0}, \ldots, p_{k,N_k-1})$ is the discrete probability vector of player $k$. Thus $p_{D,j}$ is the probability that the defender chooses action $d_j$, and $p_{A,j}$ is the probability that the attacker chooses action $a_j$. The modified payoffs are simply the expected payoffs under randomized play:
$$
\tilde{u}_k(\mathbf{p}_D, \mathbf{p}_A) = \sum_{j=0}^{N_D-1} \sum_{i=0}^{N_A-1} u_k(d_j, a_i)\, p_{D,j}\, p_{A,i}. \tag{5}
$$
The mixed NE is stable against unilateral deviations: no player can benefit by deviating from its NE strategy individually.

Definition 1. A strategy profile $(\mathbf{p}_D^*, \mathbf{p}_A^*)$ is a mixed NE of the game $\Gamma$ iff the following conditions are met: $\tilde{u}_D(\mathbf{p}_D^*, \mathbf{p}_A^*) \ge \tilde{u}_D(\mathbf{p}_D, \mathbf{p}_A^*)$, $\forall \mathbf{p}_D \in \Delta_D$, and $\tilde{u}_A(\mathbf{p}_D^*, \mathbf{p}_A^*) \ge \tilde{u}_A(\mathbf{p}_D^*, \mathbf{p}_A)$, $\forall \mathbf{p}_A \in \Delta_A$.

The mixed NE can also be characterized by the von Neumann indifference principle [34]. It requires that:
i) the choice $\mathbf{p}_{-k}$ of the other player renders player $k$ indifferent between its pure actions played at the NE (those with strictly positive probability), for all $k \in \{D, A\}$;
ii) for both players, the actions not played at the NE give payoffs no higher than the actions that are played.

This characterization allows one to compute the mixed NE in a straightforward manner, by solving a linear system of equations, if the exact face of the simplex $\Delta_D \times \Delta_A$ on which the NE $(\mathbf{p}_D^*, \mathbf{p}_A^*)$ lies is known. Finding this face among the $2^{N_D + N_A}$ possible faces is a well-known difficult problem. The Lemke-Howson algorithm is the best combinatorial algorithm [35], and it is PSPACE-complete³. In the worst case, it performs as poorly as exhaustive search, with complexity growing exponentially with the dimension of the action profile set. In general, however, it can be quite efficient.

³ PSPACE-complete problems are the hardest problems in PSPACE, i.e., the class of problems solvable with an amount of memory polynomial in the input length. Every other PSPACE problem can be transformed into them in polynomial time. They are suspected to lie outside of NP, but this remains to be proven.

Aside from complexity, the major drawback of the Lemke-Howson algorithm is that it requires perfect and complete information about the game $\Gamma$ at both players. In an adversarial setting, it is not realistic to assume that the players know precisely the opponent's action set and the payoff function. Instead, we propose to exploit iterative learning processes that allow the players to compute their NE strategies by learning from their past interactions.

3) Machine Learning to Solve the Game: We focus here on the exponential weights for exploration and exploitation (EXP3) algorithm. EXP3 belongs to the family of multiplicative weights methods. Multiplicative weights is a ubiquitous iterative decision process that has been repeatedly rediscovered in machine learning, optimization and game theory [21], [36]. Its applications include:
- AdaBoost for classification and prediction [37];
- pooling problems for blending industries [38];
- graphical model learning [39];
- learning the NE in certain types of games [21], [40] (e.g., potential games and two-player zero-sum games).

The main idea of EXP3 is as follows. At each iteration $t$, decision agent $k$ chooses a random action $s_{k,t}$ following the probability distribution $\mathbf{p}_{k,t} = [p_{k,t}(s)]_{s \in \mathcal{S}_k}$. The agent then observes the value of its payoff $u_k(s_{k,t}, s_{-k,t})$, based on which the cumulative scores of all actions are updated. Since only the payoff of the realized action can be observed, the payoffs of the unplayed actions have to be estimated. For this, the following pseudo-estimator can be built for any $s \in \mathcal{S}_k$:
$$
\hat{u}_{k,t}(s) = \frac{u_k(s_{k,t}, s_{-k,t})\,\mathbb{I}\{s = s_{k,t}\} + \beta_t}{p_{k,t}(s)}, \tag{6}
$$
The first term, $u_k(s, s_{-k,t})\,\mathbb{I}\{s = s_{k,t}\}/p_{k,t}(s)$, is known as importance sampling. It is an unbiased estimator of $u_k(s, s_{-k,t})$ for any $s$ at time $t$. The bias $\beta_t > 0$ is introduced to control the variance of this estimator [21].

The cumulative score of action $s$ is defined as $G_{k,t}(s) = \sum_{\tau=1}^{t} \eta_\tau \hat{u}_{k,\tau}(s)$, which measures how well the explored actions have performed in the past. These cumulative scores are then mapped onto the probability simplex via a well-chosen exponential map:
$$
p_{k,t+1}(s) = \frac{\gamma_t}{|\mathcal{S}_k|} + (1 - \gamma_t)\,\frac{\exp(G_{k,t}(s))}{\sum_{r \in \mathcal{S}_k} \exp(G_{k,t}(r))}, \quad \forall s, \tag{7}
$$
where $\gamma_t \in (0, 1]$ and $\eta_t > 0$ are learning parameters and $|\mathcal{S}_k|$ denotes the cardinality of the set $\mathcal{S}_k$. Intuitively, actions that have performed well in the past are played with relatively higher probability. Poor or unexplored actions, which may perform well in the future, are not completely discarded. This illustrates the exploration vs. exploitation tradeoff. Player $k$ then uses the updated distribution $\mathbf{p}_{k,t+1}$ to generate the random action at the next iteration $t + 1$, and so on. The details are provided in Algorithm 4 below.

ALGORITHM 4: EXP3 to compute the NE at player $k$
Data: $\gamma_t$, $\beta_t$, $\eta_t$, $u_k(s_{k,t}, s_{-k,t})$
Result: $\bar{\mathbf{p}}_{k,t}$
1 Initialize $t = 1$, $p_{k,1}(s) = 1/|\mathcal{S}_k|$, $G_{k,0}(s) = 0$ for all $s \in \mathcal{S}_k$
2 while $\bar{\mathbf{p}}_{k,t}$ has not converged do
3   Draw random action $s_{k,t}$ following distribution $\mathbf{p}_{k,t}$
4   Observe payoff $u_k(s_{k,t}, s_{-k,t})$
5   Compute pseudo-estimations $\hat{u}_{k,t}(s)$, $\forall s \in \mathcal{S}_k$, as in eq. (6)
6   Update cumulative scores: $G_{k,t}(s) = G_{k,t-1}(s) + \eta_t \hat{u}_{k,t}(s)$, $\forall s \in \mathcal{S}_k$
7   Update probability distribution $\mathbf{p}_{k,t+1}$ as in eq. (7)
8   Compute empirical frequency $\bar{\mathbf{p}}_{k,t}$ as in eq. (8)
9   Step $t \leftarrow t + 1$
10 end

The three parameters of the algorithm, $\beta_t$, $\gamma_t$, and $\eta_t$, have to be carefully tuned:
- $\beta_t > 0$ controls the bias vs. variance tradeoff of the payoff estimator $\hat{u}_{k,t}(s)$, $\forall s$. The estimator is needed because player $k$ has limited information: only the value of the payoff $u_k(s_{k,t}, s_{-k,t})$ is observed, while the opponent's action $s_{-k,t}$ is unknown.
- $\gamma_t \in (0, 1]$ and $\eta_t > 0$ govern the exploration vs. exploitation tradeoff. For $\gamma_t = 1$ (or $\eta_t \ll 1$), there is no exploitation, only pure exploration with no learning from past results, since actions are drawn from the uniform distribution. For $\gamma_t \ll 1$, the exploration term is reduced in favour of the exponential weights. For $\eta_t \gg 1$, the algorithm stops exploring at the first chosen action.

In [21], it was shown that the above EXP3 algorithm converges to the mixed Nash equilibria. For the sake of simplicity, the theoretical result is reported below as stated in [41].

Proposition 1. Suppose each player of the zero-sum game $\Gamma$ runs the EXP3 algorithm with parameters $\gamma_t = \sqrt{|\mathcal{S}_k| \log |\mathcal{S}_k| / t}$ and $\eta_t \approx \beta_t = \sqrt{2 \log |\mathcal{S}_k| / (t\,|\mathcal{S}_k|)}$. Then their empirical frequencies
$$
\bar{\mathbf{p}}_{k,t} = \frac{1}{\sum_{\tau=1}^{t} \eta_\tau} \sum_{\tau=1}^{t} \eta_\tau\, \mathbf{p}_{k,\tau} \tag{8}
$$
converge asymptotically to the set of mixed NEs. The convergence rate is $O(T^{-1/2})$, where $T$ is the horizon of play.

Information requirement of EXP3: The major advantage of the EXP3 algorithm is that the players need neither complete and perfect knowledge of the game, as the Lemke-Howson algorithm requires, nor knowledge of the actions chosen by their adversary. Both players only observe the payoff value resulting from their chosen actions at each iteration. Moreover, unlike the Lemke-Howson algorithm, EXP3 has been shown to be robust to estimation errors in the payoff observation [40]. Under mild assumptions on the error process (zero mean and finite variance), EXP3 retains its $O(T^{-1/2})$ convergence speed even when only a noisy observation of the payoff is available at each iteration.

Complexity of EXP3: One iteration of EXP3 has a relatively low complexity, increasing only linearly with the number of possible actions, i.e., $O(|\mathcal{S}_k|)$ for player $k$. The convergence speed of EXP3 is $O(T^{-1/2})$. This implies that reaching an $\varepsilon$-neighborhood of the Nash equilibrium solution requires on the order of $1/\varepsilon^2$ iterations. In the adversarial setting, this convergence speed is optimal and cannot be improved, because of the limited available information [21].

Some comments on the convergence time of the EXP3 algorithm are in order. First, the convergence time depends on the defender's action set $\mathcal{S}_D$ and on the attacker's action set $\mathcal{S}_A$. The latter can be exponentially large when multiple line disconnections are considered. Second, the $O(T^{-1/2})$ convergence rate may seem slow. Nevertheless, these factors will not be a major issue for the proposed MTD application, for two reasons:
(i) As explained in Section IV-A, the time period between successive MTD perturbations can be reasonably long. For example, in [13] we show that hourly MTD perturbations might be realistic for practical systems.
(ii) For several combinations of multiple line disconnections, the net measurements following the CCPA, i.e., $\mathbf{z}_p + \Delta\mathbf{H}\boldsymbol{\theta}_p$, turn out to be significantly different from the original measurements $\mathbf{z}$ (before the CCPA). The system operator can easily detect such attacks by monitoring the change in the measurements over successive monitoring periods. Under normal system operation, the measurements change only due to fluctuations in the system load, which tend to be smooth. Thus, any abrupt change in the system measurements can trigger an alarm.
Consequently, many of the multiple line disconnections need not be considered in the NE computation. We verify this using the simulations presented in Section VII (Fig. 7).

Finally, considering AC OPF would only change the computation of $C_{\mathrm{OPF}}(a_m, d_n)$ in (4). Algorithm 3 would then require solving $N_A N_D$ AC OPF problems instead of $N_A N_D$ DC OPF problems. This would change the values of the payoffs $u_D(s_D, s_A)$ and $u_A(s_D, s_A)$. However, it would not change the structure of the game $\Gamma$ under study, which remains a two-player zero-sum game with discrete and finite action sets. Once the new payoff values are computed, the entire analysis of the Nash equilibrium solution provided above still holds. So does the machine learning approach for finding the Nash equilibrium.

VII. SIMULATION RESULTS
Below, we present our simulation results, performed with the MATPOWER toolbox, to show the effectiveness of the proposed defense.
a) MTD Attack Detection: First, we examine the MTD's capability in detecting CCPAs using the IEEE 14-, 39- and 118-bus systems. As proposed in Section V, we solve the minimum weight feedback edge set problem for the graph corresponding to each IEEE bus system and determine $\mathcal{L}_D$. Since there is no closed-form expression for the weights $dC_{\mathrm{OPF}}/dx_l$, we compute them by simulations. We then perturb the reactances of all the links in the set $\mathcal{L}_D$.

We begin by simulating physical attacks against a single randomly chosen link in the bus system. We inject a corresponding CCPA of the form $\mathbf{a} = \Delta\mathbf{H}\boldsymbol{\theta}_p$, where both $\Delta\mathbf{H}$ and $\boldsymbol{\theta}_p$ are computed using outdated knowledge of the system. Following the arguments presented in Section V, any random value of reactance perturbation ensures that the CCPAs no longer remain undetectable. Their detection rate will exceed the false positive rate, since the BDD residual will necessarily be non-zero. However, the system operator may also want the CCPAs to be detected with high probability (i.e., a detection rate close to 1). Fig. 3 plots the BDD's attack detection probability for each case as a function of the percentage change in line reactances. A 10% to 20% perturbation in the line reactances ensures that CCPAs are detected with very high probability for most IEEE bus systems.

Fig. 3: Attack detection probability for single line disconnections as a function of the percentage change (η) in the link reactance. [Plot: detection probability (0 to 1) versus η (0.05 to 0.2).]
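The detection mechanism behind these results can be illustrated on a toy example. The Python sketch below is our own noiseless DC construction, not the MATPOWER setup used for Fig. 3. It proceeds in three steps:
1. Trip one line on a grid whose reactances have already been changed by MTD.
2. Build the masking attack $\mathbf{a} = \Delta\mathbf{H}\boldsymbol{\theta}_p$ from the attacker's outdated reactances.
3. Return the residual norm of the operator's least-squares state estimate.

The following are assumptions:
- The helper names and the 3-bus data.
- The modelling of a tripped line by an infinite reactance, which keeps the measurement vector the same size.
- The attacker's knowledge of the bus injections.

A real BDD compares the residual against a noise-dependent threshold, which is omitted here.

```python
import numpy as np


def incidence(n_bus, lines):
    """Bus-branch incidence matrix A (n_bus x n_lines)."""
    A = np.zeros((n_bus, len(lines)))
    for k, (i, j) in enumerate(lines):
        A[i, k], A[j, k] = 1.0, -1.0
    return A


def meas_matrix(A, x, slack=0):
    """DC measurement matrix: branch-flow rows, then bus-injection rows.

    The slack-bus angle column is removed. A tripped line is modelled with
    x = inf (zero susceptance), so the matrix keeps the same shape.
    """
    D = np.diag(1.0 / np.asarray(x, dtype=float))
    H = np.vstack([D @ A.T, A @ D @ A.T])
    return np.delete(H, slack, axis=1)


def angles(A, x, p, slack=0):
    """Non-slack bus angles from the reduced DC power flow equations."""
    n_lines = A.shape[1]
    B = np.delete(meas_matrix(A, x, slack)[n_lines:], slack, axis=0)
    return np.linalg.solve(B, np.delete(np.asarray(p, dtype=float), slack))


def bdd_residual(lines, x_old, x_mtd, tripped, p, slack=0):
    """Residual norm seen by the operator after a CCPA that trips `tripped`."""
    A = incidence(len(p), lines)

    def cut(x):
        return [np.inf if l == tripped else xi for l, xi in zip(lines, x)]

    # True post-attack measurements (the grid already uses the MTD reactances).
    z_p = meas_matrix(A, cut(x_mtd), slack) @ angles(A, cut(x_mtd), p, slack)
    # Attacker masks the outage with outdated reactances: a = dH @ theta_p.
    dH = meas_matrix(A, x_old, slack) - meas_matrix(A, cut(x_old), slack)
    a = dH @ angles(A, cut(x_old), p, slack)
    # Operator: least-squares state estimation with the pre-outage MTD model.
    H = meas_matrix(A, x_mtd, slack)
    z = z_p + a
    theta_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
    return float(np.linalg.norm(z - H @ theta_hat))


lines = [(0, 1), (1, 2), (0, 2)]
x_old = [0.1, 0.1, 0.1]
p = [1.0, 0.0, -1.0]
print(bdd_residual(lines, x_old, x_old, (0, 2), p))             # ~0: attack stays hidden
print(bdd_residual(lines, x_old, [0.12, 0.1, 0.1], (0, 2), p))  # > 0 after a 20% change on link (0, 1)
```

Without MTD, $\mathbf{z}_p + \mathbf{a} = \mathbf{H}\boldsymbol{\theta}_p$ lies exactly in the range of $\mathbf{H}$, so the residual vanishes. Perturbing a link on the loop breaks this consistency, which is the effect the detection curves quantify.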
Next, we verify the effectiveness of the proposed approach in detecting CCPAs against multiple link disconnections. We disconnect links {2, 5} and {2, 5, 6} in the IEEE 14-bus system. We then inject the corresponding cyber attacks, constructed from the outdated reactance knowledge, to mask the effect of the physical attack. The results are plotted in Fig. 4. The proposed MTD effectively detects CCPAs in this case as well.

Fig. 4: Attack detection probability for multiple line disconnections in the IEEE-14 bus system. [Plot: one curve each for disconnected links {2,5} and {2,5,6}.]

We also compare the attack detection rate of the proposed D-FACTS placement strategy against two other placement strategies:
(i) D-FACTS placement that minimizes the OPF cost only, obtained by choosing only the links with the highest $dC_{\mathrm{OPF}}/dx_l$ values;
(ii) random D-FACTS placement, in which the D-FACTS links are chosen at random.
We perform simulations using the IEEE 14-bus system. For optimal D-FACTS placement (i.e., obtained by Algorithm 1), D-FACTS devices must be placed on 7 links, namely {1, 3, 5, 8, 9, 18, 19}. For a fair comparison, we also place 7 D-FACTS devices for the minimum cost and random placements. Fig. 5 plots the detection rate of the three placement strategies for CCPAs against different links of the grid. The optimal D-FACTS placement effectively detects all physical attacks, whereas the other two strategies are effective against only a few link disconnections.

Fig. 5: Attack detection probability for different D-FACTS placement strategies (Optimal Placement, Minimum Cost, Random Placement). The x-axis label denotes the index of the disconnected link under the CCPA.

We also list the size of the D-FACTS deployment sets (that can detect any CCPA) in Table II. The proposed approach enables the defender to protect the power grid with relatively few D-FACTS devices. We can also conclude that $|\mathcal{L}_D|$ depends on the grid's actual topology and not just on its size (e.g., $|\mathcal{L}_D| = 15$ for the 24-bus system, whereas $|\mathcal{L}_D| = 9$ for the 39-bus system).

We also investigate the D-FACTS deployment set under partial sensor placement and limited access of the attacker. We conduct simulations using the IEEE 24-bus system, with the settings listed in Table I. The sensor deployment set is selected to ensure full system observability. However, the attacker has access to only a subset of the measurements, due to which several branches become unobservable. Among all the observable islands, only the island {1, 2, 4, 5} contains a loop. Deploying a D-FACTS device on link 1-2 ensures that this island is loopless.

b) NE-efficiency in Reducing the MTD Cost: We show the efficiency of the game-theoretic solution in reducing the operator's defense cost. The simulations are done on the IEEE 14-bus system. The generation cost is assumed to be linear, i.e., $C_i(G_{i,t}) = c_i G_{i,t}$. The generators at buses 1, 2, 3, 6 and 8 have capacities $G^{\max} = 300, 50, 30, 50, 20$ MW and costs $c_i = 20, 30, 40, 50, 35$ \$/MWh, respectively. $f^{\max}$
is chosen to be 160 MW for link 1 and 60 MW for all other links. We consider two scenarios:
(1) a heavily loaded system, with loads at Buses 1 to 14 of 0, 21.7, 94.2, 47.8, 7.6, 11.2, 0, 0, 29.5, 9, 3.5, 6.1, 13.5 and 14.9 MW, respectively;
(2) a lightly loaded system, with loads at Buses 1 to 6 of 0, 80, 44.2, 47.8, 30 and 11.2 MW, respectively, and zero loads at Buses 7 to 14.
We consider five MTD perturbation strategies for the defender: $d_1 = \{1\}$, $d_2 = \{1, 3\}$, $d_3 = \{1, 3, 5\}$, $d_4 = \{1, 3, 5, 8\}$ and $d_5 = \{1, 3, 5, 8, 9, 18, 19\}$. Note that $d_5 = \mathcal{L}_D$, which protects all the links of the system from CCPAs. In each case, we perturb the link reactances by 15% of their original values. The attacker in turn launches a CCPA by disconnecting one link at a time. Under this setup, we compute the NE solution using the EXP3 algorithm, with $\gamma_t = \beta_t = 0$ and $\eta_t = 0.01$.

Measurements | Index
Power flow | Links {1, 3, 4, 7, 13, 15, 17, 23, 32, 34, 6, 18, 29, 33}
Power injection | Nodes {2, 7, 10, 13, 15, 19, 22, 24, 3}
$\mathcal{L}_D$ (full access) | Links {1, 4, 6, 7, 12, 14, 18, 19, 22, 25, 31}
$\mathcal{L}_D$ (partial access) | Link {1}
TABLE I: D-FACTS placement under partial placement of sensors for the IEEE-24 bus system. Underline indicates sensors that are inaccessible to the attacker. Observable islands with the attacker's accessed measurements: {1,2,4,5}, {3,15,24}, {6}, {7,8,9,10,12}, {11}, {13}, {14,16,19,20}, {17,21,22}, {18} and {23}.

Bus system | $|\mathcal{L}|$ | $|\mathcal{L}_D|$
IEEE 9-bus system | 9 | 1
IEEE 14-bus system | 20 | 7
IEEE 39-bus system | 36 | 8
TABLE II: Size of the D-FACTS deployment set $|\mathcal{L}_D|$.

The evolution of $\bar{\mathbf{p}}_{k,t}$ in the two scenarios is shown in Fig. 6. First, we have verified that EXP3 and the Lemke-Howson algorithm (based on complete and perfect game knowledge) lead to the same NE solution, which validates our approach. Second, EXP3 converges reasonably fast (within $10^4$ iterations). Furthermore, the NE robust solution depends on the system load. In the heavily loaded scenario, all the links in $\mathcal{L}_D$ need to be perturbed. In the lightly loaded scenario, it is sufficient to perturb the reactance of link 1 only. The rationale is that in the lightly loaded scenario, only a subset of links needs to be protected from physical attacks. The attacker is unlikely to target unimportant links, i.e., links that carry very little power flow.

Fig. 6: Convergence of $\bar{\mathbf{p}}_{k,t}$ to the NE for the IEEE-14 bus system. Left: heavily loaded system. Right: lightly loaded system. [Plots: horizontal axis in iterations, 0 to $2 \times 10^4$.]

Load scenario | NE D-FACTS perturbation set (Lemke-Howson) | NE defense cost | Full defense cost
Scenario 1 | {1, 3, 5, 8, 9, 18, 19} | 9.85% | 9.85%
Scenario 2 | {1} | 0.8% | 4.37%
TABLE III: D-FACTS perturbation set and defense cost (the % increase in OPF cost) at the NE for different system loads.

Table III also lists the defense cost, expressed as the percentage increase in the OPF cost over its optimal value (obtained by solving (1)). The NE solution of Scenario 2 incurs a much lower defense cost, since only a subset of links is perturbed. These experiments show that the MTD perturbation set depends on the operational state of the system. Moreover, by exploiting the NE solution, the operator can reduce its defense cost.

Fig. 7: Cumulative distribution function (CDF) of $\Lambda$ over 100 combinations of multiple line outages and measurements derived from normal load fluctuations. [Plot: CDF (0 to 1) versus $\Lambda$ (0 to 50).]

We also examine how much CCPAs against multiple line disconnections change the measurements with respect to the original measurements. We conduct simulations using the IEEE 14-bus system and record $\Lambda$, defined as
$$
\Lambda = \max_{i=1,\ldots,M} \frac{(\mathbf{z}_p + \Delta\mathbf{H}\boldsymbol{\theta}_p)_i - z_i}{z_i},
$$
for 100 different combinations of multiple line outages (2, 3 and 4 simultaneous line outages). We plot the cumulative distribution function (CDF) $F(\Lambda)$ of $\Lambda$ obtained over these 100 line outage combinations. We also plot the CDF of the difference in successive measurements due to normal load changes (the load data is obtained from New
York state, available online [42]). We observe that the value of $\Lambda$ is very high for almost all combinations of multi-line outages as compared to the fluctuations due to normal load changes, thus verifying that such attacks can be easily detected by the system operator. Thus, in practice, the action set of the attacker required to compute the NE can be limited to single line disconnections only.
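Once the attacker's action set is reduced in this way, the NE is computed with EXP3. The Python sketch below runs Algorithm 4 in self-play, implementing eqs. (6)-(8) with the parameter schedules of Proposition 1. It is our illustration, not the authors' code, and rests on these assumptions:
- A payoff matrix $U$ (rows: defender actions, columns: attacker actions) is already available, e.g., from Algorithm 3.
- $U$ is rescaled to $[0, 1]$. This affine change leaves the NE unchanged.
- $\gamma_t$ is clipped at 1.

The function names and the toy matrix are hypothetical. Note that the experiments above instead used constant parameters ($\gamma_t = \beta_t = 0$, $\eta_t = 0.01$).

```python
import numpy as np


def exp3_distribution(G, gamma):
    """Eq. (7): softmax of the cumulative scores mixed with uniform exploration."""
    w = np.exp(G - G.max())  # shifting by the max leaves the softmax unchanged
    return gamma / G.size + (1.0 - gamma) * w / w.sum()


def exp3_estimate(payoff, played, p, beta):
    """Eq. (6): importance-sampling payoff estimate plus the bias beta."""
    indicator = np.zeros_like(p)
    indicator[played] = 1.0
    return (payoff * indicator + beta) / p


def exp3_self_play(U, T, seed=0):
    """Both players run EXP3 on the zero-sum game with defender payoffs U[j, i].

    Rows are defender actions d_j, columns attacker actions a_i (at least two each,
    and U must not be constant). Payoffs are rescaled to [0, 1]; the attacker's
    gain is 1 minus the defender's. Returns the empirical frequencies of eq. (8).
    """
    rng = np.random.default_rng(seed)
    U = (U - U.min()) / (U.max() - U.min())
    sizes = U.shape
    G = [np.zeros(n) for n in sizes]
    num = [np.zeros(n) for n in sizes]  # running sum of eta_t * p_t
    den = [0.0, 0.0]                    # running sum of eta_t
    for t in range(1, T + 1):
        # p_1 is uniform; afterwards p_t uses gamma_{t-1}, as in eq. (7).
        gammas = [1.0 if t == 1 else min(1.0, np.sqrt(n * np.log(n) / (t - 1))) for n in sizes]
        p = [exp3_distribution(G[k], gammas[k]) for k in range(2)]
        d = rng.choice(sizes[0], p=p[0])
        a = rng.choice(sizes[1], p=p[1])
        gains = (U[d, a], 1.0 - U[d, a])  # bandit feedback: only the realized payoff
        for k, (n, s) in enumerate(zip(sizes, (d, a))):
            eta = np.sqrt(2.0 * np.log(n) / (t * n))  # eta_t = beta_t
            num[k] += eta * p[k]
            den[k] += eta
            G[k] += eta * exp3_estimate(gains[k], s, p[k], eta)
    return num[0] / den[0], num[1] / den[1]


p_D, p_A = exp3_self_play(np.array([[2.0, 0.0], [0.0, 1.0]]), T=5000)
print(p_D, p_A)
```

For this 2×2 toy game, the mixed NE is $(1/3, 2/3)$ for both players. By Proposition 1, the returned frequencies should move toward it as $T$ grows, although any single bandit run is noisy.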
c) Effectiveness of the Proposed MTD under the AC Power Flow Model: We now investigate the effectiveness of the proposed MTD design in detecting CCPAs under an AC power flow model. First, note that under the AC model, the measurements are given by $z = h(v) + n$, where the function $h(v)$ is a non-linear mapping between the system state (which includes both the voltage magnitudes and the phase angles) and the measurements. The measurements following the physical attack are denoted by $z_p = h_p(v_p) + n$, where $h_p$ and $v_p$ are the non-linear measurement function and the bus voltages following the physical attack.⁴ As in the DC case, we seek a cyber attack that completely masks the effect of the physical attack. Under the AC power flow model, this attack is given by $a = z - z_p$. We compute $z_p$ using two methods: (i) fully recomputing the power flows and (ii) using the LODFs.

Fig. 8: Attack detection probability as a function of the percentage change (η) in the link reactance, for an AC power flow model. The power flows following the physical attack are obtained using two methods: (i) full recomputation and (ii) using LODFs.

TABLE IV: Time (in seconds) required to compute the $N_A \times N_D$ OPF problems under the DC and AC models involved in the game-theoretic payoff computation, using the Matpower simulator.

Bus system            N_A × N_D           OPF time, DC power flow (s)   OPF time, AC power flow (s)
IEEE 14-bus system    20 × 5 = 100        1.29                          2.03
IEEE 39-bus system    41 × 20 = 820       17.4                          44.36
IEEE 118-bus system   186 × 50 ≈ 10000    270                           600

To evaluate the efficacy of our proposed MTD design, we conduct the following experiments. First, we compute the MTD reactance perturbation according to Algorithm 1 (i.e., based on the DC power flow model). Then, we compute the
attacker’s undetectable CCPA vector a based on the AC power
flow (described above) using an outdated system model, and
we inject it into the measurements $z_p'$ (after MTD). We then implement state estimation and bad data detection based on the AC power flow model (with the new reactance values after MTD) and examine the attack detection rate. The results are reported in Fig. 8. They show that the proposed MTD is effective in detecting CCPAs under an AC power flow model, i.e., our MTD design, although derived from the DC power flow model, remains effective under the AC power flow model as well.

Finally, Table IV reports the time required to solve the $N_A \times N_D$ OPF problems under the DC and AC models that are involved in the game-theoretic payoff computation in Sec. VI. The simulations are conducted on a Windows PC with a 3.2 GHz Intel Core i7 processor and 16 GB of RAM using the Matpower simulator. Computing the payoffs takes between about one second (IEEE 14-bus system, DC model) and ten minutes (IEEE 118-bus system, AC model). As explained earlier, the time between successive MTD perturbations can be reasonably long (e.g., in [13], we show that hourly MTD perturbations may be realistic for practical systems). Within this interval, it is therefore entirely feasible to solve the AC OPF problems needed to compute the payoffs.

⁴ Note that $h_p$ reflects the modified system topology after the line disconnections.

VIII. CONCLUSIONS AND FUTURE WORK

In this work, we have proposed a novel strategy to detect CCPAs based on MTD and presented MTD design criteria in this context. We have identified the subset of links for D-FACTS device deployment that enables the defender to detect physical attacks against any link in the system. Further, to reduce the operator's defense cost, we have identified, using a game-theoretic approach, the optimal set of links whose reactances must be perturbed during operation. We have shown that the robust solution against a strategic attacker can be computed efficiently by exploiting a well-known algorithm in reinforcement learning, which has low complexity and requires little information.

There are several open research directions that follow from this work. First, a concurrent work [17] proposes optimal D-FACTS placement to defend against FDI attacks. Interestingly, their result suggests that the optimal D-FACTS placement against FDI attacks must ensure that the residual graph (obtained by removing the D-FACTS links) has no loops. This criterion is similar to the one derived in our work, which suggests the possibility of obtaining a unified D-FACTS placement algorithm that can defend against both FDI attacks and CCPAs. Second, this work focuses on the attack detection problem. An interesting research direction is to exploit MTD to identify the attack, i.e., to locate the transmission line disconnected by the attacker and identify the compromised sensors. Third, designing MTD against multi-stage CCPAs while accounting for power grid dynamics, via online optimization and learning, is another interesting future research direction. Finally, designing D-FACTS placement algorithms for MTD that account for power grid topology reconfigurations is a challenging problem that must be addressed.
REFERENCES

[1] S. Lakshminarayana, E. V. Belmega, and H. V. Poor, “Moving-target defense for detecting coordinated cyber-physical attacks in power grids,” in Proc. IEEE SmartGridComm, Oct 2019, pp. 1–7.
[2] S. Soltan, M. Yannakakis, and G. Zussman, “Joint cyber and physical attacks on power grids: Graph theoretical approaches for information recovery,” in Proc. ACM Intl. Conf. on Meas. and Modeling of Comp. Sys. (SIGMETRICS), 2015, pp. 361–374.
[3] Z. Li, M. Shahidehpour, A. Alabdulwahab, and A. Abusorrah, “Bilevel model for analyzing coordinated cyber-physical attacks on power systems,” IEEE Trans. Smart Grid, vol. 7, no. 5, pp. 2260–2272, Sep. 2016.
[4] ——, “Analyzing locally coordinated cyber-physical attacks for undetectable line outages,” IEEE Trans. Smart Grid, vol. 9, no. 1, pp. 35–47, Jan. 2018.
[5] R. Deng, P. Zhuang, and H. Liang, “CCPA: Coordinated cyber-physical attacks and countermeasures in smart grid,” IEEE Trans. Smart Grid, vol. 8, no. 5, pp. 2420–2430, Sept. 2017.
[6] D. P. Shepard, T. E. Humphreys, and A. A. Fansler, “Evaluation of the vulnerability of phasor measurement units to GPS spoofing attacks,” Intl. Journ. of Critical Infra. Protection (IJCIP), vol. 5, pp. 146–153, 2012.
[7] M. Ozay, I. Esnaola, F. T. Yarman Vural, S. R. Kulkarni, and H. V. Poor, “Machine learning methods for attack detection in the smart grid,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 8, pp. 1773–1786, 2016.
[8] Y. He, G. J. Mendis, and J. Wei, “Real-time detection of false data injection attacks in smart grid: A deep learning-based intelligent mechanism,” IEEE Transactions on Smart Grid, vol. 8, no. 5, pp. 2505–2516, 2017.
[9] M. R. Habibi, H. R. Baghaee, T. Dragičević, and F. Blaabjerg, “Detection of false data injection cyber-attacks in dc microgrids based on recurrent neural networks,” IEEE Journal of Emerging and Selected Topics in Power Electronics, pp. 1–1, 2020.
[10] ——, “False data injection cyber-attacks mitigation in parallel dc/dc converters based on artificial neural networks,” IEEE Transactions on Circuits and Systems II: Express Briefs, pp. 1–1, 2020.
[11] A. Sayghe, O. M. Anubi, and C. Konstantinou, “Adversarial examples on power systems state estimation,” in Proc. IEEE Power Energy Society Innovative Smart Grid Technologies Conference (ISGT), 2020, pp. 1–5.
[12] D. Divan and H. Johal, “Distributed FACTS; A new concept for realizing grid power flow control,” IEEE Trans. Power Syst., vol. 22, no. 6, pp. 2253–2260, Nov. 2007.
[13] S. Lakshminarayana and D. K. Y. Yau, “Cost-Benefit analysis of moving-target defense in power grids,” in Proc. IEEE/IFIP Dependable Systems and Networks (DSN), June 2018, pp. 139–150.
[14] C. Liu, J. Wu, C. Long, and D. Kundur, “Reactance perturbation for detecting and identifying fdi attacks in power system state estimation,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 4, pp. 763–776, Aug 2018.
[15] J. Tian, R. Tan, X. Guan, and T. Liu, “Enhanced hidden moving target defense in smart grids,” IEEE Trans. Smart Grid, vol. 10, no. 2, pp. 2208–2223, March 2019.
[16] Z. Zhang, R. Deng, D. K. Y. Yau, P. Cheng, and J. Chen, “Analysis of moving target defense against false data injection attacks on power grid,” IEEE Trans. Inf. Forensics Secur., vol. 15, pp. 2320–2335, 2020.
[17] B. Liu and H. Wu, “Optimal D-FACTS placement in moving target defense against false data injection attacks,” IEEE Trans. Smart Grid, 2020.
[18] M. Esmalifalak, G. Shi, Z. Han, and L. Song, “Bad data injection attack and defense in electricity market using game theory study,” IEEE Trans. Smart Grid, vol. 4, no. 1, pp. 160–169, March 2013.
[19] A. Sanjab and W. Saad, “Data injection attacks on smart grids with multiple adversaries: A game-theoretic perspective,” IEEE Trans. Smart Grid, vol. 7, no. 4, pp. 2038–2049, July 2016.
[20] J. A. Bondy and U. S. R. Murty, Graph Theory with Applications. London: Macmillan, 1976.
[21] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, “The nonstochastic multiarmed bandit problem,” SIAM journal on computing, vol. 32, no. 1, pp. 48–77, 2002.
[22] J. E. Tate and T. J. Overbye, “Line outage detection using phasor angle measurements,” IEEE Transactions on Power Systems, vol. 23, no. 4, pp. 1644–1652, 2008.
[23] H. Zhu and G. B. Giannakis, “Sparse overcomplete representations for efficient identification of power line outages,” IEEE Transactions on Power Systems, vol. 27, no. 4, pp. 2215–2224, 2012.
[24] G. Rovatsos, X. Jiang, A. D. Domínguez-García, and V. V. Veeravalli, “Statistical power system line outage detection under transient dynamics,” IEEE Transactions on Signal Processing, vol. 65, no. 11, pp. 2787–2797, 2017.
[25] Y. Zhao, J. Chen, and H. V. Poor, “A learning-to-infer method for real-time power grid multi-line outage identification,” IEEE Transactions on Smart Grid, vol. 11, no. 1, pp. 555–564, 2020.
[26] R. Khan, K. McLaughlin, D. Laverty, and S. Sezer, “Analysis of IEEE C37.118 and IEC 61850-90-5 synchrophasor communication frameworks,” in Proc. IEEE Power and Energy Society General Meeting (PESGM), 2016, pp. 1–5.
[27] Y. Liu, P. Ning, and M. K. Reiter, “False data injection attacks against state estimation in electric power grids,” in Proc. ACM Conf. on Comp. and Commun. Security (CCS), 2009, pp. 21–32.
[28] S. Lakshminarayana, A. Kammoun, M. Debbah, and H. V. Poor, “Data-driven false data injection attacks against power grid: A random matrix approach,” 2020. [Online]. Available: https://arxiv.org/abs/2002.02519
[29] K. M. Rogers and T. J. Overbye, “Some applications of distributed flexible AC transmission system (D-FACTS) devices in power systems,” in Proc. North American Power Symposium (NAPS), Sept 2008, pp. 1–8.
[30] A. Abur and A. G. Exposito, Power System State Estimation: Theory and Implementation. CRC press, 2004.
[31] E. Castillo, A. J. Conejo, R. E. Pruneda, and C. Solares, “Observability analysis in state estimation: a unified numerical approach,” IEEE Transactions on Power Systems, vol. 21, no. 2, pp. 877–886, 2006.
[32] H. Chung, W. Li, C. Yuen, W. Chung, Y. Zhang, and C. Wen, “Local cyber-physical attack for masking line outage and topology attack in smart grid,” IEEE Transactions on Smart Grid, vol. 10, no. 4, pp. 4577–4588, 2019.
[33] J. Guo, Y. Fu, Z. Li, and M. Shahidehpour, “Direct calculation of line outage distribution factors,” IEEE Transactions on Power Systems, vol. 24, no. 3, pp. 1633–1634, 2009.
[34] D. Fudenberg and J. Tirole, Game theory. MIT Press, 1991.
[35] T. Roughgarden, “Algorithmic game theory,” Commun. of the ACM, vol. 53, no. 7, pp. 78–86, 2010.
[36] S. Arora, E. Hazan, and S. Kale, “The multiplicative weights update method: a meta-algorithm and applications,” Theory of Computing, vol. 8, no. 6, pp. 121–164, 2012.
[37] T. Hastie, S. Rosset, J. Zhu, and H. Zou, “Multi-class adaboost,” Statistics and its Interface, vol. 2, no. 3, pp. 349–360, 2009.
[38] L. Mencarelli, Y. Sahraoui, and L. Liberti, “A multiplicative weights update algorithm for minlp,” EURO Journal on Computational Optimization, vol. 5, no. 1-2, pp. 31–86, 2017.
[39] A. Klivans and R. Meka, “Learning graphical models using multiplicative weights,” in 2017 IEEE 58th Symp. on Found. of Comp. Sci. (FOCS). IEEE, 2017, pp. 343–354.
[40] P. Mertikopoulos and Z. Zhou, “Learning in games with continuous action sets and unknown payoff functions,” Mathematical Programming, vol. 173, no. 1-2, pp. 465–507, 2019.
[41] A. Lazaric, “Learning in zero-sum games,” lecture notes, 2017. [Online]. Available: http://chercheurs.lille.inria.fr/∼lazaric/Webpage/MVA-RL Course17.html
[42] “NYISO load data,” https://tinyurl.com/kx3h82t.
