Moving-Target Defense Against Cyber-Physical Attacks in Power Grids Via Game Theory
arXiv:2006.07697v2 [cs.CR] 27 Jul 2021

S. Lakshminarayana is with the University of Warwick, Coventry, UK (subhash.lakshminarayana@warwick.ac.uk). E.V. Belmega is with ETIS, UMR 8051, CY Cergy Paris Université, ENSEA, CNRS, F-95000, France (belmega@ensea.fr). H. V. Poor is with the Department of Electrical Engineering, Princeton University, USA (poor@princeton.edu). The work was partially presented at IEEE Smartgridcomm-2019 [1]. This research was supported in part by a Startup grant at the University of Warwick, Cergy : ENSEA (SRV), Paris Seine EUTOPIA international fellowship, COST Action CA16228 "European Network for Game Theory" (GAMENET), and by the U.S. National Science Foundation under Grants ECCS-1824710 and ECCS-203971.

Abstract: This work proposes a moving target defense (MTD) strategy to detect coordinated cyber-physical attacks (CCPAs) against power grids. The main idea of the proposed approach is to invalidate the knowledge that the attackers use to mask the effects of their physical attack by actively perturbing the grid's transmission line reactances via distributed flexible AC transmission system (D-FACTS) devices. The proposed MTD design consists of two parts. First, we identify the subset of links for D-FACTS device deployment that enables the defender to detect CCPAs against any link in the system. Then, in order to minimize the defense cost during the system's operational time, we formulate a zero-sum game to identify the best subset of links to perturb (which will provide adequate protection) against a strategic attacker. The Nash equilibrium robust solution is computed via exponential weights, which does not require complete knowledge of the game but only the observed payoff at each iteration. Extensive simulations performed using the MATPOWER simulator on IEEE bus systems verify the effectiveness of our approach in detecting CCPAs and reducing the operator's defense cost.

Index Terms: moving-target defense, coordinated cyber-physical attacks, zero-sum non-cooperative games, Nash equilibrium, exponential weights algorithm

I. INTRODUCTION

Cyber threats against power grids are of increasing concern due to the deep integration of information and communication technologies (ICT) into grid operations. In particular, the so-called coordinated cyber-physical attacks (CCPAs) can be dangerously covert. Indeed, while the physical attack involves disconnecting a transmission line, generator or transformer, the simultaneous cyber attack plays the role of masking the physical attack by manipulating the sensor measurements that are conveyed from the field devices to the control center. CCPAs can have severe effects on the grid, since undetected line/generator outages may trigger cascading failures, and have received significant attention [2], [3], [4], [5].

To defend against CCPAs, recent studies [3] and [5] have proposed strategies based either on securing a set of measurements (e.g., by encryption) or relying on measurements from known-secure phasor measurement units (PMUs) deployed in the grid. However, power grids consist of many legacy devices whose life cycles can last several decades, and incorporating major security upgrades in these devices can be quite expensive. Moreover, extensive research has shown that PMUs themselves are vulnerable to false data injection (FDI) attacks, which can be launched by spoofing their GPS receivers [6]. For a general class of false data injection attacks against state estimation, recent works [7], [8] have proposed machine learning (ML) based methods to detect the attacks. References [9], [10] propose ML-based defense for FDI attacks against DC micro-grids. However, recent research has shown that ML-based algorithms can be vulnerable in adversarial scenarios, which can significantly reduce their efficacy [11]. Thus, existing defense mechanisms are not foolproof.

In this paper, we propose a novel defense strategy to detect CCPAs based on the moving target defense (MTD) technique. As opposed to traditional static defense, MTD is a dynamic strategy that has the potential to increase the complexity and the cost for potential attackers. As in prior works [2], [3], [4], [5], we consider physical attacks that disconnect transmission lines. We note that to craft an undetectable CCPA, the attacker must obtain accurate knowledge of the transmission line reactances [3], [5]. The main idea of the proposed MTD defense in this context is to invalidate the attacker's previously acquired knowledge by actively perturbing the grid's line reactance settings. This can be accomplished using distributed flexible AC transmission system (D-FACTS) devices, which are capable of performing active impedance injection and are being increasingly deployed in power grids [12]. The proposed MTD defense strategy has the potential to make it extremely difficult for the attacker to track the system's dynamics and gather sufficient information to craft a covert CCPA.

Recent related works: [13], [14], [15], [16] have studied MTD to defend the power grid's state estimation against false data injection (FDI) attacks. References [13] and [14] proposed design criteria to compute MTD perturbations that can detect FDI attacks effectively. They also showed that effective MTD perturbations incur a non-trivial operational cost, evaluated in terms of the grid's optimal power flow (OPF) cost [13] and the transmission power losses [14]. The problem of hiding MTD's activation from the attacker was considered in [15], which proposed the so-called hidden MTD. Reference [16] analyzed how the topology of the power grid impacts the completeness of MTD's detection capability. Finally, [17] has studied the deployment of D-FACTS devices to maximize MTD's attack detection capability against FDI attacks.

Compared to the above references, the novelty of our work is two-fold. First, with the exception of our preliminary work [1], none of them have considered defense against CCPAs by exploiting MTD. The solution requires the formulation of
novel design criteria both in terms of D-FACTS placement as well as D-FACTS perturbation selection. Second, existing works seek to design MTD that can defend against all potential threats (specifically, FDI attacks in these papers). However, as we will show in this work, this is not always necessary. In fact, MTD's operational cost can be significantly reduced by identifying and defending against only the most likely threats against an active and strategic attacker. We formalize this idea in the context of MTD for CCPAs using a game-theoretic framework. We note that although game theory has been used in the context of defense against FDI attacks [18], [19], our work is the first to apply it in the context of MTD.

Our main contributions in the MTD design against CCPAs concern two different aspects: (i) D-FACTS deployment, and (ii) D-FACTS operation. First, in the D-FACTS deployment problem, we seek to find the best subset of links for this purpose that combines the dual criteria of minimizing the system's OPF cost and detecting CCPAs. We identify a graph-theoretic criterion to characterize MTD's effectiveness against CCPAs. The optimal subset of links for D-FACTS deployment is found by solving a feedback edge set problem [20] on the graph associated with the power grid. Our proposed D-FACTS deployment solution provides the defender with the ability to detect CCPAs against any transmission line of the grid.

Second, to reduce the prohibitive MTD operational cost [13], we focus on D-FACTS operation that minimizes the defense cost while accounting for a strategic and intelligent attacker.¹ For this, we study the interaction between the grid's defender and attacker via zero-sum non-cooperative games. This enables us to anticipate the attacker's strategic behaviour and to develop robust defense policies against CCPAs. Then, in order to compute the Nash equilibrium (NE) minimax robust solution, we propose to exploit a reinforcement learning algorithm, namely, the exponential weights or EXP3 [21]. The latter has two major advantages compared to the Lemke-Howson algorithm: (1) EXP3 does not require perfect and complete knowledge of the game, which may be problematic in adversarial settings, and relies solely on the past observed payoff; (2) Lemke-Howson is a combinatorial algorithm and, although efficient in practice, it has exponential complexity in the worst case. In contrast, EXP3 is a simple iterative procedure converging to the solution as $O(T^{-1/2})$, with $T$ being the horizon of play.

Finally, our extensive simulations conducted using the MATPOWER simulator show the effectiveness of the proposed MTD solution in detecting CCPAs. Our results also demonstrate that the game-theoretic robust solution significantly reduces the operator's defense cost. Moreover, the simulation results demonstrate that MTD designed according to the DC power flow model remains effective in detecting CCPAs under an AC power flow model as well.

Compared to our preliminary work [1], the novel contributions of this paper are significant, as follows: (i) we provide a new D-FACTS deployment scheme that in addition to defending against CCPAs (as in [1]) also minimizes the OPF cost simultaneously; (ii) EXP3 is proposed to compute the NE instead of the Lemke-Howson approach in [1], which requires less information and comes with performance guarantees; (iii) multi-line attacks are considered here, while only single line disconnections were considered in [1]; (iv) as opposed to the full placement of power flow/injection measurement sensors at all links/buses assumed in [1], here we also consider partial sensor placement; and (v) we show via numerical experiments that our defense strategy remains effective in the AC power flow case.

Finally, we note that although in this paper we focus specifically on strengthening the BDD to detect CCPAs, the proposed MTD method can be broadly applied to other data-driven methods for line outage detection/classification such as those proposed in [22], [23], [24], [25]. These data-driven approaches, which leverage the measurements provided by the supervisory control and data acquisition (SCADA) system and PMUs, are being increasingly adopted in power grids, and an attacker aiming to cause line outages must mask the effect of the physical attack on the power grid measurements to remain stealthy. Thus, a similar MTD strategy can also be developed to strengthen the aforementioned data-driven line outage detection/classification approaches.

II. SYSTEM MODEL

We consider a power grid characterized by a graph $\mathcal{G} = (\mathcal{N}, \mathcal{L})$, where $\mathcal{N} = \{1, \dots, N\}$ is the set of buses and $\mathcal{L} = \{1, \dots, L\}$ is the set of transmission lines. Without loss of generality, we assume throughout that bus 1 is the slack bus. At bus $i$, we denote the amount of generation and load by $G_i$ and $L_i$ respectively. We let $l = \{i, j\}$ denote a transmission line $l \in \mathcal{L}$ that connects bus $i$ and bus $j$, and we denote its reactance by $x_l$. The power flowing on the corresponding line $l$ is denoted by $F_l$, which under the DC power flow model is given by $F_l = \frac{1}{x_l}(\theta_i - \theta_j)$, where $\theta_i$ and $\theta_j$ are the voltage phase angles at buses $i, j \in \mathcal{N}$ respectively. Note that for the slack bus $\theta_1 = 0$. In vector form, the power flow vector $\mathbf{f} = [F_1, \dots, F_L]^T$ is related to the voltage phase angle vector $\boldsymbol{\theta} = [\theta_2, \dots, \theta_N]$ as $\mathbf{f} = \mathbf{D}\mathbf{A}^T\boldsymbol{\theta}$, where the matrix $\mathbf{A} \in \mathbb{R}^{(N-1) \times L}$ is the reduced branch-bus incidence matrix obtained by removing the row of the slack bus and $\mathbf{D} \in \mathbb{R}^{L \times L}$ is a diagonal matrix of the reciprocals of link reactances. We denote the set of links on which D-FACTS devices are deployed by $\mathcal{L}_D$, where $\mathcal{L}_D \subseteq \mathcal{L}$. D-FACTS devices enable the reactances of these lines to be varied within a pre-defined range $[x^{\min}, x^{\max}]$, where $x^{\min}, x^{\max}$ are the reactance limits achievable by the D-FACTS devices. Note that $x_l^{\min} = x_l^{\max} = x_l, \ \forall l \in \mathcal{L} \setminus \mathcal{L}_D$.

State Estimation & Bad Data Detection: The system state, i.e., the voltage phase angles $\boldsymbol{\theta}$, is estimated from the noisy sensor measurements using the state estimation (SE) technique. The sensor measurements, which we denote by $\mathbf{z} \in \mathbb{R}^M$, correspond to the nodal power injections, and the forward and reverse branch power flows, i.e. $\mathbf{z} = [\tilde{\mathbf{p}}, \tilde{\mathbf{f}}, -\tilde{\mathbf{f}}]^T$, and $M$ is the total number of measurements², where $M = N + 2L$. We

¹ A key observation that enables the cost reduction is that, during system operations, it is not necessary to defend against CCPAs on all transmission lines in the grid. This is because many of these attacks are not harmful enough and, hence, the attacker is unlikely to target those transmission lines.

² We will generalize our results to the case of partial placement of sensors in Sec. V-B.
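The DC power flow relations of the system model can be made concrete with a small numerical sketch. The 3-bus network, its reactances and its phase angles below are hypothetical values chosen only for illustration, and the measurement vector is assembled without noise; the function names are likewise assumptions rather than anything defined in the paper.

```python
# Minimal DC power flow sketch on a hypothetical 3-bus grid (bus 1 is the slack bus).

def dc_line_flows(lines, reactances, theta):
    """Return F_l = (theta_i - theta_j) / x_l for every line l = (i, j)."""
    return [(theta[i] - theta[j]) / x for (i, j), x in zip(lines, reactances)]


def nodal_injections(num_buses, lines, flows):
    """Net injection at each bus: flow leaving on its lines minus flow arriving."""
    p = [0.0] * (num_buses + 1)          # index 0 unused, buses are 1-based
    for (i, j), f in zip(lines, flows):
        p[i] += f                         # flow leaves bus i
        p[j] -= f                         # and arrives at bus j
    return p[1:]


# Hypothetical data: line list, reactances x_l, and phase angles (slack angle is 0).
lines = [(1, 2), (1, 3), (2, 3)]
reactances = [0.1, 0.2, 0.25]
theta = {1: 0.0, 2: -0.02, 3: -0.05}

flows = dc_line_flows(lines, reactances, theta)
injections = nodal_injections(3, lines, flows)

# Noise-free measurement vector z = [p, f, -f], so M = N + 2L entries.
z = injections + flows + [-f for f in flows]
```

Changing one entry of `reactances` (as a D-FACTS perturbation would) changes the corresponding flow for the same angles, which is the dependence an attacker's stale knowledge fails to capture.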
the nodes $i$ and $j$. Therefore, under MTD, the defender can thwart the CCPA by invalidating one of the two:
C1. Invalidate the attacker's knowledge of the tripped branch's reactance $x_l$.
C2. Invalidate the attacker's knowledge of at least one of the branches in the path $p_l^k$ between nodes $i$ and $j$.

Note that the defender cannot have prior knowledge of which link the attacker chooses to disconnect. Moreover, for a disconnected link $l$, the defender has no way of knowing which path $p_l^k \in \mathcal{P}_l$ the attacker may have used to compute the phase angle difference $\theta_{i,p} - \theta_{j,p}$ as in (2). Thus, the defender must invalidate the attacker's knowledge of the reactance of at least one link in every path $p_l^k \in \mathcal{P}_l$. The defender must do so for every link $l \in \mathcal{L}$ (such that the attacker cannot launch a CCPA by disconnecting any link in the grid).

To sum up, the MTD perturbation selection problem can be stated as follows:

Problem 1 (MTD design). For each branch $l \in \mathcal{L}$, invalidate the knowledge of at least one of the branches in $\{l\} \cup p_l^k$, $k = 1, \dots, K_l$.

This problem poses constraints on the D-FACTS deployment set $\mathcal{L}_D$, since a preliminary requirement to invalidate the attacker's knowledge of a branch reactance is the presence of a D-FACTS device on that link. Thus, $\mathcal{L}_D$ must be chosen in a way that gives the defender the ability to protect every link $l \in \mathcal{L}$. A trivial solution is to deploy a D-FACTS device on every link of the power grid. However, a system operator may wish to minimize the number of D-FACTS devices installed in order to minimize the device deployment cost.

On the other hand, MTD perturbations also incur an operational cost for the defender as shown in [13]. Indeed, perturbing the reactances of a large number of links may be expensive. Thus, at the system's operational time, the defender may wish to perturb the reactances of only a subset of links, which we denote by $\mathcal{L}_{Dw}$, where $\mathcal{L}_{Dw} \subseteq \mathcal{L}_D$, such that the attacker cannot launch CCPAs against some specific links that are perceived to be critical and vulnerable.

In what follows, we provide solutions to both of the aforementioned aspects of the MTD design problem. Specifically, in Section V, we first present an algorithm to find the D-FACTS deployment set $\mathcal{L}_D$ that satisfies the MTD design problem with a minimum number of devices based on a graph-theoretic approach. Subsequently, in Section VI, we present a solution to the problem of selecting a subset of links $\mathcal{L}_{Dw}$ for reactance perturbation at the operational time based on a game-theoretic approach.

V. D-FACTS DEPLOYMENT SOLUTION

In the absence of defense consideration, D-FACTS are traditionally deployed in the grid with the objective of minimizing the cost of operation [29]. In this work, we seek to find the D-FACTS deployment set $\mathcal{L}_D$ that balances the dual objectives of defense against CCPAs and minimizing the OPF cost. Specifically, $\mathcal{L}_D$ must satisfy the following criteria: (i) $\mathcal{L}_D$ must meet the CCPA detection conditions listed in Section IV and the number of links in $\mathcal{L}_D$, denoted by $|\mathcal{L}_D|$, must be minimal; and (ii) $\mathcal{L}_D$ must achieve the least OPF cost.

First, we consider the problem of finding $\mathcal{L}_D$ that satisfies (i) above. For this, the key observation is that each set of links $\{l\} \cup p_l^k$, $k = 1, \dots, k_M$, forms a loop in the graph $\mathcal{G}$. In Figure 1, assuming that the attacker disconnects link 1, the links $\{1\} \cup \{2, 3, 4\}$ and $\{1\} \cup \{4, 5\}$ form loops in the corresponding graph. If D-FACTS devices are installed on a subset of links in the graph such that every loop contains at least one D-FACTS link, then the attacker cannot launch an undetectable CCPA. In graph-theoretic terms, the problem is equivalent to removing a subset of links in the network such that the residual graph has no loops. For optimized deployment, $\mathcal{L}_D$ must contain the minimum number of links. The set $\mathcal{L}_D$ can be found by solving the feedback edge set problem in an undirected graph [20]. The solution proceeds by finding the spanning trees of the graph $\mathcal{G}$. Let $\mathcal{L}_{\mathrm{sptr}}$ denote a spanning tree of $\mathcal{G}$. If D-FACTS devices are installed on the links $\mathcal{L} \setminus \mathcal{L}_{\mathrm{sptr}}$, then the attacker cannot find a loop within the graph whose branches do not have a D-FACTS device. Equivalently, the attacker cannot launch an undetectable CCPA. Further, by definition, every spanning tree in a connected graph contains precisely $N - 1$ links and, hence, $\mathcal{L} \setminus \mathcal{L}_{\mathrm{sptr}}$ contains the minimum number of links that can be removed and satisfy (i). Thus, the D-FACTS deployment set is $\mathcal{L}_D = \mathcal{L} \setminus \mathcal{L}_{\mathrm{sptr}}$.

The graph $\mathcal{G}$ might have multiple spanning trees, which implies that multiple subsets of links $\mathcal{L}_D$ satisfy (i). A natural question is: which of these $\mathcal{L}_D$ must be chosen for D-FACTS deployment? To answer this question, we consider the secondary problem of cost minimization described in (ii).

In the absence of defense consideration, reference [29] proposed an approach to find $\mathcal{L}_D$ that minimizes the transmission losses by choosing the set of links for D-FACTS deployment that have the highest power loss to impedance sensitivity factors. We adopt a similar approach in this work. Since the primary interest of this work is the OPF cost, we compute the OPF cost to impedance (OCI) sensitivity factor for each link, i.e., $dC_{\mathrm{OPF}}/dx_l$. Then, installing a D-FACTS device on the links with the greatest absolute values of $dC_{\mathrm{OPF}}/dx_l$ will achieve the least OPF cost.

With the MTD consideration, this approach must be modified to select the set of links that have the highest OCI sensitivity values while satisfying the attack detection conditions. This can be accomplished by converting $\mathcal{G}$ to a weighted graph, where the link weights are their OCI sensitivity factors. Then, among all sets $\mathcal{L}_D$ that satisfy (i), we must select the set that has the greatest sum of link weights.

Although the procedure stated above satisfies conditions (i) and (ii) optimally, listing all the spanning trees of $\mathcal{G}$ can be computationally complex. However, this issue can be addressed easily. Specifically, we first choose the minimum-weight spanning tree of the weighted graph, denoted by $\mathcal{L}_{\mathrm{minsptr}}$. Then setting $\mathcal{L}_D = \mathcal{L} \setminus \mathcal{L}_{\mathrm{minsptr}}$ solves both (i) and (ii) optimally, since the links left outside a minimum-weight spanning tree have the greatest possible total weight. The D-FACTS deployment algorithm is summarized in Algorithm 1.

Complexity of Algorithm 1: The D-FACTS placement algorithm requires solving a Minimum Spanning Tree (MST) problem, which has a complexity of $O(L \log N)$ (e.g., by using Kruskal's algorithm) [20].
ALGORITHM 1: D-FACTS Deployment Set
Data: Power grid graph $\mathcal{G} = (\mathcal{N}, \mathcal{L})$
Result: $\mathcal{L}_D$
1 Set the weight of link $l \in \mathcal{L}$ as $dC_{\mathrm{OPF}}/dx_l$.
2 Compute the minimum weight spanning tree $\mathcal{L}_{\mathrm{minsptr}}$ of $\mathcal{G}$.
3 Set $\mathcal{L}_D = \mathcal{L} \setminus \mathcal{L}_{\mathrm{minsptr}}$.

To conclude, consider the D-FACTS deployment set $\mathcal{L}_D$ chosen according to Algorithm 1. Assume that the defender perturbs the reactances of the set of links $\mathcal{L}_{Dw} \subseteq \mathcal{L}_D$. Then, we have the following:
• A physical attack against a link $l$ can be detected by the BDD if the links in $\mathcal{L}_{Dw}$ ensure that the conditions listed in Problem 1 are satisfied for that link. We will henceforth refer to such a link as being "protected" under the MTD link perturbation set $\mathcal{L}_{Dw}$.
• Naturally, based on the arguments stated in this section, if $\mathcal{L}_{Dw} = \mathcal{L}_D$, then all the links $l \in \mathcal{L}$ are protected from the physical attacks.

We now provide extensions of the D-FACTS placement solution under some generalized conditions.

A. Multiple Line Disconnections

In the D-FACTS placement algorithm presented above, we only considered single line disconnections. However, if the attacker is able to remotely access multiple line circuit breakers, then s/he can disconnect multiple transmission lines at once. The MTD solution must be able to detect such multiple line disconnections.

The proposed D-FACTS placement algorithm remains effective in this generalized setting. Note that if the attacker disconnects multiple links, then s/he must obtain the knowledge of the susceptances of all tripped branches, and the phase angles of all buses connecting to the tripped branches after the physical attack, to launch an undetectable CCPA [5]. This in turn would require the attacker to obtain the knowledge of the line reactances of multiple loops within the power grid. Since our placement algorithm is designed to ensure that every loop in the graph contains at least one link with a D-FACTS device installed, the proposed algorithm can successfully invalidate the attacker's knowledge under multiple line disconnections.

B. Partial Placement of Measurement Sensors and Attacker's Access

The system model considered thus far assumes full placement of power flow/injection measurements. However, in practice, the sensors may only be deployed partially. Moreover, the attacker may have access to only a subset of the deployed sensors, either due to his/her limited resources or due to the protection measures deployed at some of the sensors. The sensor placement and the attacker's accessed measurements affect the D-FACTS deployment solution, since they determine the knowledge of branch power flows that the attacker can obtain (which is essential for the attacker to craft an undetectable CCPA as argued in Section III). In this subsection, we investigate how the D-FACTS deployment solution is affected by these two factors.

We assume that the system operator deploys the power flow/injection sensors to satisfy the basic observability condition, i.e., to ensure that the observability matrix has full rank [30]. Under this assumption, if the attacker has access to all the deployed sensors, then the proposed D-FACTS deployment in Algorithm 1 remains valid and optimal. Indeed, in this scenario, the attacker can derive the power flows on all branches of the power grid (fully observable system). Thus, the D-FACTS deployment solution will not change.

However, if the attacker can access only a subset of the deployed sensors, then s/he cannot derive the power flows on all the branches. Thus, the attacker will not be able to use the paths that contain the unobservable branches to derive the phase angle difference between the disconnected nodes. In this scenario, the number of D-FACTS devices that need to be deployed (for MTD) can be significantly reduced. The system operator can first determine the unobservable branches and the observable islands [30] from the attacker's perspective based on the measurement sensors that s/he can access (e.g., knowledge of unprotected sensors). Note that observable islands in a power system can be determined in an efficient manner using existing numerical methods (see e.g., [31]). Let us denote these observable islands by $\mathcal{G}_i = \{\mathcal{N}_i, \mathcal{L}_i\}$, $i = 1, \dots, K$, where $K$ denotes the number of islands.

Then, the D-FACTS deployment set can be determined by ensuring that there are no loops within any observable island $\mathcal{G}_i$, $i = 1, \dots, K$. The rationale is that, if the attacker disconnects a link between any buses that belong to different observable islands, then every alternative path between the disconnected nodes will necessarily involve an unobservable link. Therefore, s/he cannot launch an undetectable CCPA against such links. In other words, the attacker can launch undetectable CCPAs only against the links that connect two nodes within the same observable island. Therefore, to defend against these CCPAs, it suffices to ensure that each observable island is loopless. The overall solution is summarized in Algorithm 2.

ALGORITHM 2: D-FACTS Deployment Set With Partial Sensor Placement
Data: Power grid graph $\mathcal{G} = (\mathcal{N}, \mathcal{L})$, Attacker's sensor access set $\mathcal{P}_A$
Result: $\mathcal{L}_D$
1 Set the weight of link $l \in \mathcal{L}$ as $dC_{\mathrm{OPF}}/dx_l$.
2 Using $\mathcal{P}_A$, compute the unobservable branches and list the set of observable islands $\mathcal{G}_i = \{\mathcal{N}_i, \mathcal{L}_i\}$, $i = 1, \dots, K$.
3 For each observable island, compute the minimum weight spanning tree $\mathcal{L}_{i,\mathrm{minsptr}}$ of the corresponding graph.
4 Set $\mathcal{L}_D = \cup_{i=1}^{K} \mathcal{L}_i \setminus \mathcal{L}_{i,\mathrm{minsptr}}$.

Complexity of Algorithm 2: If the attacker has access to $M$ measurements (where $M \le 2L + N$), then the complexity of computing the observable islands using numerical methods is $O(MN)$ [31]. The overall complexity of finding MSTs in all the observable islands is $O(NL)$.

Note that in the general case where there are no protected sensors or if the system operator has no knowledge
of the sensors that can be accessed by the attacker, our original MTD deployment (Algorithm 1) remains valid. An illustration of D-FACTS devices under partial sensor deployment is presented in Section VII.

C. Attack Detection Under the AC Power Flow Model

The proposed MTD design will also be effective under the AC power flow model. This can be explained as follows. Recall that in general, an undetectable CCPA can be constructed as $\mathbf{a} = \mathbf{z} - \mathbf{z}_p$, where $\mathbf{a}$ is the attack vector to be injected, $\mathbf{z}$ are the original measurements and $\mathbf{z}_p$ are the measurements following the physical attack (line disconnection). In the case of the DC power flow model, as shown in [5], the attack vector $\mathbf{a}$ can be computed in an optimized manner using the knowledge of the reactances of only a few links within the grid (as explained in Section III of the paper). In contrast, no optimal methods are known to compute the attack vector $\mathbf{a}$ for the AC power flow model. Thus, the attacker will first have to recompute the AC power flow equations to obtain $\mathbf{z}_p$, and subsequently obtain the attack vector $\mathbf{a}$. This in turn would require the knowledge of the reactances of all the links in the power grid. Thus, our proposed MTD design will remain effective because it invalidates the attacker's knowledge of the branch reactances of the links within $\mathcal{L}_D$.

Alternatively, the recent work [32] shows that the attacker can use line outage distribution factors (LODFs) to compute $\mathbf{z}_p$. However, the computation of LODFs also requires the knowledge of the line reactances of all the links in the grid [33] (note that [32] also assumes that the attacker has knowledge about the topology of the entire system). Our extensive simulation results presented in Section VII verify that the proposed MTD design remains effective under the AC power flow model under both full power flow recomputation as well as using LODFs.

VI. GAME THEORETIC MTD ROBUST STRATEGIES

MTD perturbations incur an operational cost, and perturbing the reactances of a large set of links may not be cost effective. Instead, we propose to protect only a subset of links from physical attacks depending on the operational system state, as well as the perceived threat to those links. This is approached using a game-theoretic formulation presented next.

1) Zero-sum Game Formulation: We define the strategic interactions between the attacker and the defender as a two-player zero-sum game. To formalize this, we define the game as a triplet $\Gamma \triangleq (\{D, A\}, \{\mathcal{S}_D, \mathcal{S}_A\}, \{u_D, u_A\})$ in which the components are: (i) the set of players $\{D, A\}$; (ii) $\mathcal{S}_D$ and $\mathcal{S}_A$, the sets of actions that the defender and the attacker can take respectively; and (iii) the payoffs of the players $u_k : \mathcal{S}_D \times \mathcal{S}_A \to \mathbb{R}$ for $k \in \{D, A\}$, where $u_k(s_D, s_A)$ measures the benefit obtained by player $k$ when the action profile that has been played is $s = (s_D, s_A)$. In a zero-sum game, the payoffs are opposite and $u_A(s_D, s_A) = -u_D(s_D, s_A)$, such that a cost for the defender is a benefit for the attacker and vice-versa.

We denote the attacker's and the defender's action sets by $\mathcal{S}_A = \{a_0, a_1, \dots, a_{N_A-1}\}$ and $\mathcal{S}_D = \{d_0, d_1, \dots, d_{N_D-1}\}$ respectively, where $N_A$ and $N_D$ are the cardinalities of the sets $\mathcal{S}_A$ and $\mathcal{S}_D$ respectively. The attacker's action set is the subset of links it disconnects physically. We denote the set of links disconnected by the attacker under action $a_i$ by $\mathcal{L}_{a_i}$, where $\mathcal{L}_{a_i} \subseteq \mathcal{L}$, $i = 0, 1, \dots, N_A - 1$. The action $a_0$ corresponds to the case when the attacker does not attack any link. Note that the attacker's action set $\mathcal{S}_A$ may also include multiple line disconnections. The defender's action is to select a subset of links within $\mathcal{L}_D$ whose reactances will be perturbed. We denote the set of links chosen by the defender under action $d_i$ by $\mathcal{L}_{d_i}$, where $\mathcal{L}_{d_i} \subseteq \mathcal{L}_D$, $i = 0, 1, \dots, N_D - 1$. The action $d_0$ corresponds to the case when the defender does not perturb the reactance of any link.

Next, we characterize the payoff $u_D(s_D, s_A)$. If the attacker's chosen action $s_A$ includes a link $l$ that is protected by the defender (via MTD), then the CCPA will be detected by the BDD, and the operator can quickly restore the link to avoid any further damage. For instance, the defender can quickly restore the circuit breaker of the disconnected link to a closed position. On the other hand, if the attacker disconnects a subset of links that are not protected, then the CCPA will go undetected. The link disconnection will result in a redistribution of power flows. Consequently, the power flow on some of the links may exceed the corresponding thermal limits. The system operator will notice the power flow violations and initiate a generator dispatch/load shedding to resolve this issue, which in turn rebalances the power flows and rectifies the overflows. We denote the cost of load shedding at bus $i$ by $C_{i,s}(L_i^s)$, where $L_i^s (\le L_i)$ is the quantity of load that is shed, and $\mathbf{d}^s = [D_1^s, \dots, D_N^s]$.

Let $C_{\mathrm{OPF}}(a_m, d_n)$ denote the OPF cost when the attacker takes an action $a_m$ and the defender takes an action $d_n$, which can be computed as follows:

$$
\begin{aligned}
C_{\mathrm{OPF}}(a_m, d_n) = \min_{\mathbf{g},\, \mathbf{d}^s} \ & \sum_{i \in \mathcal{N}} C_{i,g}(G_i) + C_{i,s}(D_i^s) \qquad (4) \\
\text{s.t.} \ & g^{P}_{a_m,d_n}(\mathbf{p}_G, \boldsymbol{\theta}, \mathbf{v}, \mathbf{x}) = 0, \\
& g^{Q}_{a_m,d_n}(\mathbf{p}_G, \boldsymbol{\theta}, \mathbf{v}, \mathbf{x}) = 0, \\
& |\mathbf{g} - \mathbf{g}_0| \le \Delta_g^{\max}, \\
& \mathbf{f} \in \mathcal{F}, \ \mathbf{g} \in \mathcal{G},
\end{aligned}
$$

where $\mathbf{B}_{a_m,d_n}$ is given by $\mathbf{B}_{a_m,d_n} = \mathbf{A}_{a_m,d_n} \mathbf{D}_{a_m,d_n} \mathbf{A}^T_{a_m,d_n}$. Here, $\mathbf{A}_{a_m,d_n}$ is the bus-branch connectivity matrix when the attacker and the defender choose actions $a_m$ and $d_n$ respectively. These quantities are computed as in Algorithm 3.

ALGORITHM 3: Payoff Computation
Data: $a_m$, $d_n$
Result: $C_{\mathrm{OPF}}(a_m, d_n)$
1 Set branch reactances to $\mathbf{x}_{d_n}$.
2 Set $\mathbf{A}_{a_m,d_n} = \mathbf{A}_{a_0,d_0}$.
3 Solve (4) to obtain $C_{\mathrm{OPF}}(a_0, d_n)$.
4 if attack is successful then
5     Set $\mathbf{A}_{a_m,d_n}$, $\mathbf{D}_{a_m,d_n}$ by removing the branches $\mathcal{L}_{a_m}$. Solve (4) to compute $C_{\mathrm{OPF}}(a_m, d_n)$.
6 else
7     Set $C_{\mathrm{OPF}}(a_m, d_n) = C_{\mathrm{OPF}}(a_0, d_n)$.
8 end
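The control flow of the payoff computation (Algorithm 3) can be sketched as a table-building loop over all action pairs. Everything concrete below is a stand-in: `toy_opf_cost` replaces a real OPF solver with a made-up linear cost, and `toy_protected` assumes that a defender action protects exactly the links it perturbs, whereas in the paper protection follows from the loop conditions of Problem 1. Only the branching on attack success mirrors the algorithm.

```python
# Sketch of Algorithm 3's logic: tabulate C_OPF(a_m, d_n) for every action pair.

def opf_cost_table(defender_actions, attacker_actions, protected, opf_cost):
    """Return table[n][m] = C_OPF(a_m, d_n).

    defender_actions -- list of frozensets of perturbed links (index 0: none)
    attacker_actions -- list of frozensets of disconnected links (index 0: none)
    protected        -- hypothetical helper: links protected under a defender action
    opf_cost         -- hypothetical solver: cost given (removed links, perturbed links)
    """
    table = []
    for d in defender_actions:
        covered = protected(d)
        row = []
        for a in attacker_actions:
            # The attack succeeds only if it disconnects something and none of
            # the attacked links is protected; otherwise it is detected and undone.
            success = bool(a) and not (a & covered)
            removed = a if success else frozenset()
            row.append(opf_cost(removed, d))
        table.append(row)
    return table


def toy_opf_cost(removed, perturbed):
    """Made-up cost: base 100, +5 per perturbed link, +40 per line lost."""
    return 100 + 5 * len(perturbed) + 40 * len(removed)


def toy_protected(perturbed):
    """Simplifying assumption: a perturbed link protects only itself."""
    return perturbed


defender_actions = [frozenset(), frozenset({1}), frozenset({1, 3})]
attacker_actions = [frozenset(), frozenset({1}), frozenset({3})]
costs = opf_cost_table(defender_actions, attacker_actions, toy_protected, toy_opf_cost)
```

With these toy numbers, perturbing more links raises the no-attack cost but removes the large penalty of an undetected outage, which is the trade-off the game formulation is meant to balance.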
Complexity of Algorithm 3: The payoff computation algorithm requires solving $N_A N_D$ DC OPF problems. Note that DC OPF problems can be solved efficiently, e.g., by interior point methods in polynomial time.

Based on the above, the defender's payoff is given by

$$u_D(s_D, s_A) = \begin{cases} C_{OPF}(d_0, a_0) - C_{OPF}(s_D, a_0), & \text{if } I_S = 0, \\ C_{OPF}(d_0, a_0) - C_{OPF}(s_D, s_A), & \text{if } I_S = 1. \end{cases}$$

The term $I_S$ is an indicator variable that represents the success ($I_S = 1$) or failure ($I_S = 0$) of an attack. Both players aim to choose their actions such that their own payoff is maximized, and we can see that the two players have contradictory objectives.

The above payoffs can be explained by looking at them as negative costs. First, $C_{OPF}(d_0, a_0)$ denotes the benchmark operating cost of the defender when none of the players takes an action to either disrupt or defend the system. The term $C_{OPF}(s_D, s_A) - C_{OPF}(d_0, a_0)$ denotes the additional cost incurred by the defender and caused by a successful attack, when the attacker chooses $s_A$ and the defender chooses $s_D$. The term $C_{OPF}(s_D, a_0) - C_{OPF}(d_0, a_0)$ represents the additional cost incurred by the defender for choosing an action $s_D$ against an unsuccessful attack $s_A$. Hence, the aim of the defender is to minimize these costs, whereas the attacker will seek to maximize them.

2) Nash Equilibrium Solution: In such an interactive situation, the natural solution is the Nash equilibrium (NE). However, the game $\Gamma$ above is discrete and finite and may not have a NE solution in pure strategies. Instead, it always has at least one mixed-strategy NE [34], which is the NE of its extension to mixed strategies. The latter is defined as follows: $\tilde{\Gamma} \triangleq (\{D, A\}, \{\Delta_D, \Delta_A\}, \{\tilde{u}_D, \tilde{u}_A\})$. The action sets of the extended game $\tilde{\Gamma}$ are the probability simplices of dimension $N_k$, $k \in \{D, A\}$: $\Delta_k = \{ p_k \in \mathbb{R}_+^{N_k} \mid \sum_{j=0}^{N_k-1} p_{k,j} = 1 \}$, where $p_k = (p_{k,0}, \ldots, p_{k,N_k-1})$ is the discrete probability vector of player $k$ such that $p_{D,j}$ and $p_{A,j}$ represent the probability of choosing the action $d_j$ by the defender and the probability of choosing the action $a_j$ by the attacker, respectively. The modified payoffs are simply the resulting expected payoffs following the randomization of play:

$$\tilde{u}_k(p_D, p_A) = \sum_{j=0}^{N_D-1} \sum_{i=0}^{N_A-1} u_k(d_j, a_i)\, p_{D,j}\, p_{A,i}. \tag{5}$$

The mixed NE is a stable state to unilateral deviations, which means that no player can benefit by deviating from their NE strategy individually.

Definition 1. A strategy profile $(p_D^*, p_A^*)$ is a mixed NE for the game $\Gamma$, iff the following conditions are met: $\tilde{u}_D(p_D^*, p_A^*) \geq \tilde{u}_D(p_D, p_A^*)$, $\forall\, p_D \in \Delta_D$, and $\tilde{u}_A(p_D^*, p_A^*) \geq \tilde{u}_A(p_D^*, p_A)$, $\forall\, p_A \in \Delta_A$.

The mixed NE can also be characterized by the Von-Neumann indifference principle [34], which requires that: i) player $k$ is rendered indifferent between its pure actions played at the NE (with strictly positive probability), by the choice of the other player $p_{-k}$, for all $k \in \{D, A\}$; and ii) the actions that are not played at the NE give strictly lower payoffs than the ones that are played, for both players.

This characterisation allows one to compute the mixed NE in a straightforward manner (by solving a linear system of equations), if the exact face of the simplex $\Delta_D \times \Delta_A$ on which the NE $(p_D^*, p_A^*)$ lies is known. Computing this among $2^{N_D+N_A}$ possible faces is a well-known difficult problem: the Lemke-Howson algorithm is the best combinatorial algorithm [35] and is PSPACE-complete.^3 In the worst case scenario, it performs as poorly as exhaustive search (complexity grows exponentially with the dimension of the action profile set), but in general it can be quite efficient.

^3 PSPACE-complete problems are the hardest problems in polynomial space (PSPACE, i.e., problems that can be solved using an amount of memory that is polynomial in the input length) and are such that every other PSPACE problem can be transformed to it in polynomial time. They are suspected to lie outside of NP, but this remains to be proven.

Aside from complexity, the major drawback of the Lemke-Howson algorithm is that it requires perfect and complete information of the game $\mathcal{G}$ at both players. In an adversarial setting, assuming that the players know precisely the action sets of their opponent and the payoff function is not realistic. Instead, we suggest exploiting iterative learning processes that allow the players to compute their NE strategies by learning from their past interactions.

3) Machine Learning to Solve the Game: We focus here on the exponential weights for exploration and exploitation (EXP3) algorithm, also known as multiplicative weights, which is a ubiquitous iterative decision process that has been repeatedly discovered in machine learning, optimization and game theory [21], [36]. It has wide applications, such as AdaBoost for classification and prediction [37], pooling problems for blending industries [38], graphical model learning [39], and learning the NE in certain types of games [21], [40] (e.g., potential games, two-player zero-sum games, etc.).

The main idea of EXP3 is that, at each iteration $t$, the decision agent $k$ chooses a random action $s_{k,t}$ following the probability distribution $p_{k,t} = [p_{k,t}(s)]_{s \in S_k}$. As a result, the agent observes the value of its payoff $u_k(s_{k,t}, s_{-k,t})$, based on which the cumulative scores of all actions are updated. Since only the payoff of the realized action can be observed, we have to estimate the payoffs of the unplayed ones. For this, the following pseudo-estimator can be built, for any $s \in S_k$

$$\hat{u}_{k,t}(s) = \frac{u_k(s_{k,t}, s_{-k,t})\, \mathbb{I}\{s = s_{k,t}\} + \beta_t}{p_{k,t}(s)} \tag{6}$$

where the first term, $u_k(s, s_{-k,t})\, \mathbb{I}\{s = s_{k,t}\} / p_{k,t}(s)$, is known as importance sampling and represents an unbiased estimator of $u_k(s, s_{-k,t})$ for any $s$ at time $t$. To control the variance of this estimator, the bias $\beta_t > 0$ is introduced [21].

The cumulative score of action $s$ is defined as $G_{k,t}(s) = \sum_{\tau=1}^{t} \eta_\tau \hat{u}_{k,\tau}(s)$, which measures how well the explored actions have performed in the past. Then, these cumulative scores are mapped on the probability simplex via a well chosen exponential map

$$p_{k,t+1}(s) = \gamma_t \frac{1}{|S_k|} + (1 - \gamma_t) \frac{\exp(G_{k,t}(s))}{\sum_{r \in S_k} \exp(G_{k,t}(r))}, \quad \forall s, \tag{7}$$
where $\gamma_t \in (0, 1]$ and $\eta_t > 0$ are learning parameters and $|S_k|$ denotes the cardinality of the set $S_k$. Intuitively, this means that the actions that have performed well in the past are played with relatively higher probability, without completely discarding poor or unexplored actions that may perform well in the future; this illustrates the data exploration vs. exploitation tradeoff. The updated probability distribution $p_{k,t+1}$ will hence be used by player $k$ to generate the random action at the next iteration $t + 1$ and so on. The details are provided in Algorithm 4 below.

ALGORITHM 4: EXP3 to compute the NE at player k
Data: $\gamma_t$, $\beta_t$, $\eta_t$, $u_k(s_{k,t}, s_{-k,t})$
Result: $\bar{p}_{k,t}$
1 Initialize $t = 1$, $p_{k,1}(s) = \frac{1}{|S_k|}$, $G_{k,0}(s) = 0$ for all $s \in S_k$
2 while $\bar{p}_{k,t}$ has not converged do
3     Draw random action $s_{k,t}$ following distribution $p_{k,t}$
4     Observe payoff $u_k(s_{k,t}, s_{-k,t})$
5     Compute pseudo-estimations $\hat{u}_{k,t}(s)$, $\forall s \in S_k$ as in eq. (6)
6     Update cumulative scores: $G_{k,t}(s) = G_{k,t-1}(s) + \eta_t \hat{u}_{k,t}(s)$, $\forall s \in S_k$
7     Update probability distribution $p_{k,t+1}$ as in eq. (7)
8     Compute empirical frequency $\bar{p}_{k,t}$ as in (8)
9     Step $t \leftarrow t + 1$
10 end

Notice that the three parameters of the algorithm, $\beta_t$, $\gamma_t$, and $\eta_t$, have to be very carefully tuned. Indeed, $\beta_t > 0$ controls the bias vs. variance tradeoff of the payoff estimator $\hat{u}_{k,t}(s)$, $\forall s$, which is necessary because of the limited information available to player $k$: only the value of the payoff $u_k(s_{k,t}, s_{-k,t})$ is observed, while the opponent's action $s_{-k,t}$ is unknown.

The parameters $\gamma_t \in (0, 1]$ and $\eta_t > 0$ impact the exploration vs. exploitation tradeoff. For $\gamma_t = 1$ (or $\eta_t \ll 1$) there is no data exploitation but pure exploration (no learning from past results), as the actions are drawn following the uniform distribution; whereas for $\gamma_t \ll 1$, the exploration term is reduced in favour of the exponential weights. For $\eta_t \gg 1$, the algorithm stops exploring at the first chosen action.

In [21], it was shown that the EXP3 algorithm above converges to the mixed Nash equilibria. The theoretical result is reported below as in [41], for the sake of simplicity.

Proposition 1. If each player of the zero-sum game $\Gamma$ runs the EXP3 algorithm with parameters $\gamma_t = \sqrt{|S_k| \log |S_k| / t}$, $\eta_t \approx \beta_t = \sqrt{2 \log |S_k| / (t |S_k|)}$, then their empirical frequencies

$$\bar{p}_{k,t} = \frac{1}{\sum_{\tau=1}^{t} \eta_\tau} \sum_{\tau=1}^{t} \eta_\tau\, p_{k,\tau} \tag{8}$$

converge asymptotically to the set of the mixed NEs. The convergence rate is $O(T^{-1/2})$, where $T$ is the horizon of play.

Information requirement of EXP3: The major advantage of the EXP3 algorithm is the fact that it does not require the players to have complete and perfect knowledge of the game (as opposed to the Lemke-Howson algorithm), nor of the actions chosen by their adversary. Both players only observe the payoff value as a result of their chosen actions at each iteration. Moreover, unlike the Lemke-Howson algorithm, EXP3 is shown to be robust to estimation errors of the payoff observation [40]: under mild assumptions on the error process (zero-mean and finite variance), EXP3 retains its $O(T^{-1/2})$ convergence speed even when only a noisy observation of the payoff is available at each iteration.

Complexity of EXP3: One iteration of EXP3 has a relatively low complexity, increasing only linearly with the number of possible actions, i.e., $O(|S_k|)$ for player $k$. Also, the convergence speed of EXP3 is $O(T^{-1/2})$, which implies that to reach an $\varepsilon$-neighborhood of the Nash equilibrium solution, the algorithm requires on the order of $1/\varepsilon^2$ iterations. We note that in the adversarial setting, this convergence speed is optimal and cannot be improved because of the limited available information [21].

At this point, some comments on the convergence time of the EXP3 algorithm are in order. First, the convergence time depends on the defender's action set $S_D$ and also on the attacker's action set $S_A$, which can be exponentially large when considering multiple line disconnections. Second, the $O(T^{-1/2})$ convergence rate may seem slow. Nevertheless, for the proposed MTD application, these factors will not be a major issue, for two reasons: (i) As we have explained in Section IV-A, the time period between successive MTD perturbations can be reasonably long (e.g., in [13], we show that hourly MTD perturbations might be realistic for practical systems). (ii) For several combinations of multiple line disconnections, the net measurements following the CCPA, i.e., $z_p + \Delta H \theta_p$, turn out to be significantly different from the original measurements $z$ (before the CCPA). Such attacks can be easily detected by the system operator, by monitoring the change in the measurements over successive monitoring periods. Note that under normal system operation, the measurements change only due to the fluctuation in the system load, which tends to be smooth. Thus, any abrupt change in the system measurements can trigger an alarm. Hence, many of the multiple line disconnections need not be considered in the NE computation. We verify this using simulations presented in Section VII, Fig. 7.

Finally, if one were to investigate AC OPF considerations, this would only change the computation of $C_{OPF}(a_m, d_n)$ in equation (4). Algorithm 3 would be impacted in the sense that it would require solving $N_A N_D$ AC OPF problems as opposed to $N_A N_D$ DC OPF problems. Although this would imply a change in the values of the payoffs $u_D(s_D, s_A)$ and $u_A(s_D, s_A)$, this would not change the structure of the game $\Gamma$ under study, which remains a two player zero-sum game with discrete and finite sets of actions. Once the new values of the payoffs are computed, the entire analysis of the Nash equilibrium solution provided above still stands. Furthermore, the machine learning approach to find the Nash equilibrium also stands.

VII. SIMULATION RESULTS

Below, we present our simulation results, performed with the MATPOWER toolbox, to show the effectiveness of the proposed defense.
TABLE I: D-FACTS placement under partial placement of sensors for the IEEE-24 bus system. Underline indicates sensors
that are inaccessible to the attacker. Observable islands with attacker’s accessed measurements {1,2,4,5}, {3,15,24}, {6},
{7,8,9,10,12}, {11}, {13}, {14,16,19,20}, {17,21,22}, {18} and {23}.
[Figure 6: two panels, horizontal axes in iterations ($\times 10^4$).]
Fig. 6: Convergence of $\bar{p}_{k,t}$ to the NE for IEEE-14 bus system. Left: Heavily loaded system. Right: Lightly loaded system.

Load scenario    NE D-FACTS perturbation set (Lemke-Howson)    NE defense cost    Full defense cost
Scenario 1       {1,3,5,8,9,18,19}                             9.85 %             9.85 %
Scenario 2       {1}                                           0.8 %              4.37 %

TABLE III: D-FACTS perturbation set and defense cost (the % increase in OPF cost) at the NE for different system loads.

Fig. 7: Cumulative distribution function (CDF) of $\Lambda$ over 100 combinations of multiple line outages and measurements derived from normal load fluctuations.

is chosen to be 160 MW for link 1, and 60 MW for all other links. We consider two scenarios: (1) heavily loaded system, with the load values at Bus 1 to 14 given by 0, 21.7, 94.2, 47.8, 7.6, 11.2, 0, 0, 29.5, 9, 3.5, 6.1, 13.5, 14.9 MW respectively, and (2) lightly loaded system, with the load values at Bus 1 to 6 given by 0, 80, 44.2, 47.8, 30, 11.2 MW respectively and zero loads at Bus 7 to 14. We consider five MTD perturbation strategies for the defender, i.e., $d_1 = \{1\}$, $d_2 = \{1, 3\}$, $d_3 = \{1, 3, 5\}$, $d_4 = \{1, 3, 5, 8\}$, $d_5 = \{1, 3, 5, 8, 9, 18, 19\}$. We note that $d_5 = L_D$, which protects all the links of the system from CCPA. In each case, we perturb the link reactances by 15% of their original values. The attacker in turn launches a CCPA by disconnecting one of the links at a time. Under this set-up, we compute the NE solution according to the EXP3 algorithm, with $\gamma_t = \beta_t = 0$ and $\eta_t = 0.01$.

The evolution of $\bar{p}_{k,t}$ in the two scenarios is shown in Fig. 6. First, we have verified that the EXP3 and the Lemke-Howson algorithm (based on complete and perfect game knowledge) lead to the same NE solution, which validates our approach. Then, the convergence rate of EXP3 is also reasonably fast (within $10^4$ iterations). Furthermore, it can be observed that the NE robust solution depends on the system load. While, in the heavily loaded scenario, all the links in $L_D$ need to be perturbed, in the lightly loaded scenario, it is sufficient to perturb the reactance of link 1 only. The rationale is that in the lightly loaded scenario, only a subset of links need to be protected from physical attacks, since the attacker is unlikely to target the unimportant links (i.e., the links that have very little power flow).

In Table III, we also list the defense cost as the percentage increase in the OPF cost over its optimal value (by solving (1)). The NE solution of scenario 2 incurs much lower defense cost, since only a subset of links are perturbed. The above experiments show that the MTD perturbation set depends on the operational state of the system. Also, by exploiting the NE solution, the operator can reduce its defense cost.

We also examine the change in the measurements due to CCPA with respect to the original measurements due to multiple line disconnections. We conduct simulations using the IEEE-14 bus system and record $\Lambda$ defined as

$$\Lambda = \max_{i=1,\ldots,M} \frac{(z_p + \Delta H \theta_p)_i - z_i}{z_i}$$

for 100 different combinations of multiple line outages (2, 3 and 4 simultaneous line outages). We plot the cumulative distribution function (CDF), $F(\Lambda)$, of $\Lambda$ obtained by the 100 different line outage combinations. We also plot the CDF of the difference in successive measurements due to the normal load changes (the load data is obtained from New