Adaptive MTD Security using Markov Game
Modeling
Ankur Chowdhary, Sailik Sengupta, Adel Alshamrani, Dijiang Huang, and Abdulhakim Sabur
arXiv:1811.00651v1 [cs.CR] 1 Nov 2018
Arizona State University
{achaud16, sailiks, aalsham4, dijiang, asabur}@asu.edu
Abstract—Large scale cloud networks consist of distributed networking and computing elements that process critical information
and thus security is a key requirement for any environment.
Unfortunately, assessing the security state of such networks is
a challenging task, and the tools used in the past by security experts, such as packet filtering, firewalls, Intrusion Detection Systems (IDS), etc., provide a reactive security mechanism. In
this paper, we introduce a Moving Target Defense (MTD) based
proactive security framework for monitoring attacks which lets us
identify and reason about multi-stage attacks that target software
vulnerabilities present in a cloud network. We formulate the
multi-stage attack scenario as a two-player zero-sum Markov
Game (between the attacker and the network administrator)
on attack graphs. The rewards and transition probabilities are
obtained by leveraging the expert knowledge present in the
Common Vulnerability Scoring System (CVSS). Our framework
identifies an attacker’s optimal policy and places countermeasures to ensure that this attack policy is always detected, thus
forcing the attacker to use a sub-optimal policy with higher cost.
I. INTRODUCTION
A cloud data center consists of software and services from
various vendors. Although the security policies of an organization might be up to date, vulnerabilities in software and
presence of untrusted insiders can put sensitive information
and communication in the network at risk.
Traditional defense mechanisms in networks are composed
of distributed elements such as firewalls, Intrusion Detection
Systems (IDS), log monitoring systems, etc. Also, most of the
defense mechanisms are based on reactive, incident-response mechanisms. In the modern era, such an approach can lead to
loss of business. Therefore, we need a pro-active approach
that anticipates potential weak links in security and assesses
the possible behavior of the attacker, in effect providing a
defense mechanism that will lead to increased difficulty for
an attacker to exploit the network.
One such approach that has emerged based on pro-active
defense is known as Moving Target Defense (MTD). The goal
in network-based MTD is to reconfigure the network services
and connectivity in a way that any strategy devised by attacker
based on a static view of the network becomes less effective.
MTD-based adaptive security can increase the difficulty of exploitation and decrease the attack surface compared to a static system.
An ad-hoc approach of switching services and connections in the network, however, can prove to be more catastrophic than useful.
Thus, an intelligent scheme is required to make such key decisions. We need to perform a cost-intrusiveness analysis before taking any decision that can have a cascading effect on dependent system components. Game Theory has proved to be very effective in economics, biology, and other areas for making important decisions. In this research work, we utilize a dynamic game of perfect information between the attacker and the system administrator to create an MTD strategy against attackers targeting software vulnerabilities.
The key contributions of this research work are as follows:
• A dynamic game to perform MTD attack analysis and countermeasure evaluation. Most of the works we came across consider security as a static game. The dynamic game we present mimics a realistic network security scenario where attack and defense are a continuous process.
• Optimal countermeasure selection, which identifies the
critical nodes in an Attack Graph based on CVSS
score [6] as a reward metric, and selectively applies
countermeasure to mitigate security threats in a cloud
network.
II. BACKGROUND
Definition 1. A vulnerability is a security flaw in a software service hosted over a given port that, when exploited by a malicious attacker, can cause loss of Confidentiality, Integrity, or Availability (CIA) for a virtual machine (VM).
A. Threat Model
Consider the cloud system in Figure 1(a), where the attacker
has user level access to the LDAP server, which is the initial
state of our game and the goal state is to compromise the
FTP server. The attacker can perform actions such as exploit-LDAP, exploit-Web, and exploit-FTP. In the scenario above,
the attacker has two possible paths to reach the goal node
priv(attacker, (FTP: root)), i.e.
• Path 1: exploit-LDAP → exploit-FTP
• Path 2: exploit-LDAP → exploit-Web → exploit-FTP
The Admin, on the other hand, can choose to monitor (1) running services and (2) network traffic along both paths, using network and host-based monitoring agents, e.g., monitor-LDAP, monitor-FTP, etc. We assume that the Admin has a limited budget and thus must try to perform monitoring in an optimized fashion. The attacker, on the other hand, should try to perform attacks along a path not monitored by the Admin. For example, if the Admin is monitoring traffic only between the LDAP and FTP servers (mon-LDAP, mon-FTP), the attacker can choose Path 2 to avoid detection.
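The two attack paths above can be enumerated mechanically from the reachability structure; a minimal sketch in Python (the graph encoding and node names below are our own illustrative choices, not the paper's implementation):

```python
# Hypothetical encoding of the threat-model reachability from Figure 1.
GRAPH = {
    "LDAP:user": ["LDAP:root"],             # 1: exploit-LDAP
    "LDAP:root": ["FTP:root", "Web:root"],  # 2(a): exploit-FTP, 2(b): exploit-Web
    "Web:root":  ["FTP:root"],              # 3: exploit-FTP
    "FTP:root":  [],                        # goal node
}

def attack_paths(graph, src, goal, path=None):
    """Depth-first enumeration of all monotonic attack paths."""
    path = (path or []) + [src]
    if src == goal:
        return [path]
    paths = []
    for nxt in graph[src]:
        paths += attack_paths(graph, nxt, goal, path)
    return paths

paths = attack_paths(GRAPH, "LDAP:user", "FTP:root")
# Yields the two paths of the threat model: directly via exploit-FTP,
# or via the Web server.
```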
Table I
VULNERABILITY INFORMATION FOR THE CLOUD NETWORK

VM          | Vulnerability          | CVE           | CIA  | AC
LDAP Server | Local Priv. Escalation | CVE-2016-5195 | 5.0  | MEDIUM
Web Server  | Cross Site Scripting   | CVE-2017-5095 | 7.0  | EASY
FTP Server  | Remote Code Execution  | CVE-2015-3306 | 10.0 | MEDIUM

[Figure 1(a): The attacker on the public network has access to the LDAP server; exploits 1: exploit-LDAP, 2: exploit-Web, and 3: exploit-FTP lead through the private network, while the Admin runs monitor-LDAP, monitor-Web Server, and monitor-FTP.]
(a) Motivating Example multi-stage attack
Table I shows the Access Complexity (AC), which represents how difficult it is to exploit a vulnerability, and the Confidentiality, Integrity, and Availability (CIA) impact gained by exploiting the vulnerabilities in the cloud network above. The AC values are categorical {EASY, MEDIUM, HIGH}, while the CIA values lie in the range [0, 10].
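To make the use of Table I concrete, one way the CVSS fields can be turned into game parameters is sketched below: the CIA score becomes the attacker's reward, and the AC category shapes the exploit success probability. The numeric AC-to-probability mapping is our assumption (chosen so MEDIUM matches the 0.66 used later in Figure 2); it is not specified by the paper.

```python
# Assumed mapping from Access Complexity categories to success probabilities.
AC_SUCCESS_PROB = {"EASY": 0.9, "MEDIUM": 0.66, "HIGH": 0.4}

# Vulnerability data taken from Table I.
VULNS = {
    "exploit-LDAP": {"cia": 5.0,  "ac": "MEDIUM"},  # CVE-2016-5195
    "exploit-Web":  {"cia": 7.0,  "ac": "EASY"},    # CVE-2017-5095
    "exploit-FTP":  {"cia": 10.0, "ac": "MEDIUM"},  # CVE-2015-3306
}

def reward(action):
    """Attacker reward for a successful exploit (defender gets the negative)."""
    return VULNS[action]["cia"]

def success_prob(action):
    """Transition probability derived from Access Complexity."""
    return AC_SUCCESS_PROB[VULNS[action]["ac"]]
```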
B. Game Theoretic Modeling
[Figure 1(b): Attack graph for the multi-stage attack. Fact nodes such as vulExists(LDAP, Local Priv. Escalation), vulExists(Web Server, Cross Site Scripting), and vulExists(FTP, Remote Code Execution) feed rule nodes execCode(LDAP), execCode(Web Server), and execCode(FTP), which produce privilege nodes from priv(attacker, (LDAP: user)) up to the goal node priv(attacker, (FTP: root)). Attack Path 1: 1 -> 2(a); Attack Path 2: 1 -> 2(b) -> 3.]
(b) Attack Graph corresponding to multi-stage attack
Figure 1. An example cloud network scenario.
Attack Graphs (AGs) have proved to be a successful tool for
modeling attack behavior. Sheyner et al. [11] have discussed
a framework for analyzing attacks using formal methods,
in which they model the attacker’s behavior as an MDP.
Unfortunately, the authors have not studied the impact of
deploying MTD countermeasures on normal services of the
system. We utilize attack graphs to define actions of Attacker
and Admin over different stages of the network.
Attack Graph G = {N, E} consists of a set of nodes (N)
and a set of edges (E) where,
• As shown in the Figure 1(b), the nodes (N) of attack graph
can be denoted by N = {Nf ∪ Nc ∪ Nd ∪ Nr }. Here
Nf denotes primitive/fact nodes e.g. vulExists (LDAP,
Local Priv. Escalation), Nc denotes the exploit, e.g.,
execCode(LDAP), Nd denotes the privilege level, e.g.,
priv(attacker, (LDAP :user)) and Nr represents the root
or goal node, e.g., priv(attacker, (FTP: root));
• The edges (E) of the attack graph can be denoted by E = {Epre ∪ Epost }. Here Epre ⊆ (Nf ∪ Nc ) × (Nd ∪ Nr ) encodes that the pre-conditions Nf and Nc must be met to achieve Nd , and Epost ⊆ (Nd ∪ Nr ) × (Nf ∪ Nc ) encodes that an achieved privilege Nd enables further facts Nf and exploits Nc .
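The node and edge sets above can be encoded directly; a minimal sketch for a fragment of Figure 1(b) (the string node names and the particular edges shown are illustrative, not the paper's full graph):

```python
# Node partitions of the attack graph G = (N, E).
N_f = {"vulExists(LDAP, LocalPrivEsc)", "vulExists(FTP, RCE)"}   # fact nodes
N_c = {"execCode(LDAP)", "execCode(FTP)"}                        # exploit nodes
N_d = {"priv(attacker,(LDAP:user))", "priv(attacker,(LDAP:root))"}  # privileges
N_r = {"priv(attacker,(FTP:root))"}                              # goal node
N = N_f | N_c | N_d | N_r

# E_pre: facts/exploits that must hold to achieve a privilege node.
E_pre = {
    ("vulExists(LDAP, LocalPrivEsc)", "priv(attacker,(LDAP:root))"),
    ("execCode(LDAP)", "priv(attacker,(LDAP:root))"),
    ("execCode(FTP)", "priv(attacker,(FTP:root))"),
}
# E_post: privileges that enable further facts/exploits.
E_post = {
    ("priv(attacker,(LDAP:root))", "execCode(FTP)"),
}
E = E_pre | E_post
```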
Players: We consider the cloud system to be a dynamic game
with imperfect information between two players (Attacker:P1
and Admin:P2 ). P1 is located outside the cloud network, or is
a stealthy attacker having user-level access on a particular VM
in the cloud network. P2 has the global view of the network.
Goals: The goal of the attacker is also well defined, which
in this case is to obtain root privileges on critical resource(s)
of the cloud network like the FTP-Server. We consider the
system architecture and a particular use case as shown in
Figure 1(a). Further, the attack mounted by the attacker is
considered monotonic, i.e., once an attacker has reached a
certain state, they do not need to go back to any previous
state, when targeting a specific goal.
States: represent the privileges the attacker/defender currently holds in the network over different resources. We extract information from the network attack graph to define the state, e.g., for the attacker, the initial state is s1 = (LDAP, user); on successful execution of exploit-LDAP, the attacker can transition to another state s2 = (LDAP, root).
Actions and State Transitions: P1 has two possible actions in each state for the example defined in Figure 1, e.g., a11 = no-action, a21 = exploit-LDAP. Similarly, P2 can either monitor LDAP, i.e., a12 = mon-LDAP, or perform no monitoring, i.e., a22 = no-mon.
Figure 2 shows the probabilities of the actions of players P1 and P2 in each state of the attack along Path 1, exploit-LDAP → exploit-FTP. The state transition in the Markov Game is conditioned on the actions of both players in each state, as shown in the figure. Initially, the attacker is present in the state (user, LDAP). The attacker can choose an action from the set {no-action, exploit-LDAP}, and the probability of taking action exploit-LDAP is 0.66. Similarly, the Admin has two possible actions, i.e., {no-mon, mon-LDAP}, and performs mon-LDAP with probability 0.5.
If the attacker is in state s, then τ (s, a1 , a2 , s0 ) gives the probability that the game moves to state s0 when P1 and P2 take actions a1 and a2 in state s. In the example above, the attacker is only able to exploit the LDAP vulnerability if τ (s1 =(LDAP, user), a1 =exploit-LDAP, a2 =no-mon, s2 =(LDAP, root)) > 0.
Definition 2. A Markov game for two players P1 and P2 can be defined by the tuple (S, A1 , A2 , τ, R, γ) where,
• S = {s1 , s2 , s3 , . . . , sk } is the finite set of states of the game,
• A1 = {a1^1 , a1^2 , . . . , a1^m } is the finite action set of P1 ,
• A2 = {a2^1 , a2^2 , . . . , a2^n } is the finite action set of P2 ,
• τ (s, a1 , a2 , s0 ) is the probability of reaching state s0 ∈ S from state s if P1 and P2 take actions a1 and a2 respectively,
• R(s, a1 , a2 ) is the reward obtained by P1 if, in state s, P1 and P2 take the actions a1 and a2 respectively. Note that the reward for P2 is −R(s, a1 , a2 ) by the definition of a zero-sum game, and
• γ ∈ (0, 1] is the discount factor for future rewards.
[Figure 2: Markov game along Path 1. In S1 = (user, LDAP) the attacker plays exploit-LDAP with probability 0.66 and no-action with probability 0.34, while the Admin plays mon-LDAP and no-mon with probability 0.50 each; a successful exploit leads to S2 = (root, LDAP). In S2 the attacker plays 2(a): exploit-FTP with probability 0.90 and no-action with probability 0.10, while the Admin plays mon-FTP and no-mon with probability 0.50 each; a monitored exploit leads to the Blocked state, otherwise to S3 = (root, FTP).]
Figure 2. Actions and State Transitions for Markov Game along Path 1
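The transition function τ from Definition 2 and Figure 2 can be written as a nested lookup; a minimal sketch (the exact outcome of a monitored exploit, staying put in s1 versus moving to Blocked from s2, follows our reading of Figure 2 and is an assumption):

```python
# Sketch of tau(s, a1, a2, s') for Path 1 of the example game.
S = ["s1", "s2", "s3", "blocked"]   # (user,LDAP), (root,LDAP), (root,FTP)
A1 = ["no-action", "exploit"]       # attacker actions
A2 = ["no-mon", "mon"]              # defender actions

# tau[s][(a1, a2)] -> {next_state: probability}
tau = {
    "s1": {
        ("exploit", "no-mon"):   {"s2": 1.0},
        ("exploit", "mon"):      {"s1": 1.0},       # detected exploit fails
        ("no-action", "no-mon"): {"s1": 1.0},
        ("no-action", "mon"):    {"s1": 1.0},
    },
    "s2": {
        ("exploit", "no-mon"):   {"s3": 1.0},
        ("exploit", "mon"):      {"blocked": 1.0},  # attacker is blocked
        ("no-action", "no-mon"): {"s2": 1.0},
        ("no-action", "mon"):    {"s2": 1.0},
    },
}

def transition(s, a1, a2, s_next):
    """tau(s, a1, a2, s') as in Definition 2; 0.0 for unreachable states."""
    return tau.get(s, {}).get((a1, a2), {}).get(s_next, 0.0)
```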
Rewards: The reward function depends on the actions of the attacker and the defender in each state. We refer to the CIA values defined in Table I for the rewards associated with successful/unsuccessful actions of P1 and P2 .
P1 \ P2  | no-mon  | mon-Web   | mon-FTP
no-act   | 0, 0    | 0.5, −0.5 | 0.5, −0.5
exp-Web  | 7, −7   | −7, 7     | 7, −7
exp-FTP  | 10, −10 | 10, −10   | −10, 10

Table II
TABLE FOR STATE s2
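Each per-state reward matrix defines a zero-sum matrix game whose min-max value can be computed with a small linear program. A sketch for the state-s2 matrix above, using scipy (the LP encoding is the standard maximin formulation, not code from the paper):

```python
# Solve the zero-sum matrix game for state s2 via linear programming.
# Rows: attacker actions {no-act, exp-Web, exp-FTP};
# columns: defender actions {no-mon, mon-Web, mon-FTP}.
import numpy as np
from scipy.optimize import linprog

R = np.array([[0.0, 0.5, 0.5],
              [7.0, -7.0, 7.0],
              [10.0, 10.0, -10.0]])

# Maximize v s.t. sum_i x_i * R[i, j] >= v for every defender column j,
# sum_i x_i = 1, x >= 0.  LP variables: [x_0, x_1, x_2, v].
c = [0.0, 0.0, 0.0, -1.0]                  # linprog minimizes, so minimize -v
A_ub = np.hstack([-R.T, np.ones((3, 1))])  # encodes v - x^T R[:, j] <= 0
b_ub = np.zeros(3)
A_eq = [[1.0, 1.0, 1.0, 0.0]]
b_eq = [1.0]
bounds = [(0, None)] * 3 + [(None, None)]  # x >= 0, v unbounded

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, game_value = res.x[:3], -res.fun        # game_value = 140/297 ≈ 0.471
```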
State s0:
P1 \ P2   | no-mon | mon-LDAP
no-act    | 0, 0   | 0.5, −0.5
exp-LDAP  | 5, −5  | −5, 5

State s1:
P1 \ P2   | no-mon  | mon-LDAP
no-act    | 0, 0    | 0.5, −0.5
exp-LDAP  | 10, −10 | −10, 10

Table III
TABLES FOR STATES s0 (TOP) AND s1 (BOTTOM)

Tables II and III show the possible actions in three important states of the attack propagation and the corresponding reward metric. When the attacker is present in the state (user, LDAP) (Table III, top), P1 can take actions {no-act, exp-LDAP}; similarly, P2 can choose to monitor or not monitor the LDAP server, so the actions for P2 are {no-mon, mon-LDAP}. The reward function is defined as R(s, a1 , a2 ), where s is the current state and a1 , a2 are the actions of the two players. For instance, if s1 =(LDAP, user), a1 =exp-LDAP, and a2 =no-mon, then R(s, a1 , a2 ) = 5.0 for the attacker and −5.0 for the Admin. The reward metrics for the other states are defined in the reward tables above.

III. MARKOV GAME

We model the scenario between an attacker and an administrator as a two-player zero-sum Markov game leveraging the knowledge in the Attack Graph (AG) defined in Section II. Besides the obvious Markovian assumption, we further assume that this model has (1) states and actions that are both discrete and finite, and (2) transitions and rewards that, in each state, depend on the action each player decides to take in that state. Definition 2 formalizes the zero-sum Markov Game model; we now highlight how each of these components is obtained in our setting.
We model each VM in the AG as a state in our game. The attacker tries to take actions that help it reach a particular VM, which is the terminal state, while the administrator's action represents placing a monitoring system to detect an attack. Considering that placing a monitoring system incurs a cost (negative reward) and that there might be multiple vulnerabilities in a single VM, both the attacker and the defender have multiple actions depending on the state they are in. If the attacker tries to exploit a vulnerability in a state for which the defender chose to place a detection system, the attacker is detected and blocked with high probability (as defined by τ ) and gets a negative reward (which implies the defender gets a positive reward). Otherwise, the attacker has a higher probability of succeeding in the attack, thus moving closer to the goal state in the AG and obtaining a positive reward. Normal-form zero-sum reward matrices for three states in our simple Markov game are shown in Tables II and III. Note that in this formulation, which is a first step, we assume that the states are visible to both players, which implies that the defender gets to know if an attacker has succeeded and hence moved on to a new state s0 from s. We plan to consider partial observability of the state as future work.
In this game, P1 will try to maximize his expected discounted reward, while P2 will try to select actions that minimize the expected reward for P1 . We consider the min-max strategy for calculating the expected reward of P1 in our Markov game. Due to the underlying stochasticity of the game, the players have to reason about the expected reward that they will get for making a particular decision in a certain state. Going forward, we will reason about the rewards for P1 , i.e., the attacker, and by the property of a zero-sum game, we will show that an optimal policy for the defender seeks to reduce this reward value. We now define the quality of an action, or the Q value, used to represent the expected reward P1 will get for choosing
action a1 ∈ A1 (while P2 chooses a2 ∈ A2 ),

Q(s, a1 , a2 ) = R(s, a1 , a2 ) + γ Σ_{s0} τ (s, a1 , a2 , s0 ) · V (s0 )    (1)

Let us now denote the mixed policy for state s as π(s), which is a probability distribution that P1 can have over the possible actions it can take in state s. With that, we can define the value of state s for P1 using the equation,

V (s) = max_{π(s)} min_{a2} Σ_{a1} Q(s, a1 , a2 ) · π_{a1}    (2)

Note that the max-min strategy of the attacker (and the defender) can be captured by using a modified version of the classic value iteration algorithm (Algorithm 1). Using the min-max strategy implies that the defender is trying to minimize the reward of the attacker by placing detection nodes in the various states that are part of the attack paths present in the attack graph, while the attacker is trying to use a mixed policy π(s) over its possible actions in A1 to maximize its total reward. Thus, the Markov Game framework helps the administrator model the attacker's policy so that they can take the necessary countermeasures (a decision in each state for player P2 ) to minimize the expected utility for the attacker.

Algorithm 1 MDP - VALUE ITERATION
1: procedure EMIT-V*(S)
2:   V*_0(s) = 0 for all s  {Initialize value function}
3:   s ← s0
4:   τ (s, a, s0 )  {Transition probability from state s to s0 }
5:   R(s, a, s0 )  {Reward obtained by taking action a in state s}
6:   γ ∈ [0, 1]  {Discount factor}
7:   i ← 0
8:   loop: if i == k then break
9:   V*(s) ← max_a Σ_{s0} τ (s, a, s0 ) × [R(s, a, s0 ) + γ V*(s0 )]
10:  π(s) = argmax_a Σ_{s0} τ (s, a, s0 ) × [R(s, a, s0 ) + γ V*(s0 )]
11:  i ← i + 1
12:  goto loop

IV. IMPLEMENTATION AND EVALUATION

A. Implementation
Note that in order to implement the actual modified value iteration (as described by Eqs. 1 and 2), one needs to solve a linear program for each state after obtaining the updated Q values. This may become computationally inefficient for large real networks, and thus we make two assumptions that provide us with an approximate strategy for the players. First, we restrict the attacker to select a pure strategy, i.e., π(s) ∀s has the value of 1 for only one action a ∈ A1 that it can execute in state s and zero for all other actions. Second, we consider that the defender has observability of the attacker's action and thus only use action pairs (a1 , a2 ) that respect the min-max condition. With these, we can now use the classic value iteration algorithm, where the action set of the MDP A is restricted to the action pairs mentioned above and the reward and transition functions are subsets of the original ones for the Markov Game, defined only for those action pairs that respect the min-max strategy.

B. Optimal Countermeasure Selection
We consider countermeasure deployment using the equilibrium strategy for the Markov game, which means that for each state of the game, the administrator minimizes the maximum benefit to the attacker. We use an OpenStack based cloud environment to make a small system with three VMs, i.e., a) Ubuntu 16.04: 192.168.1.5, b) Fedora 23: 192.168.1.6, c) metasploitable: 192.168.1.7. In this environment, we assume that the attacker is initially located on the VM with Ubuntu 16.04 and has user-level privileges on it. The attacker conducts an nmap scan to enumerate the vulnerable system and network services that are present on the host and the target VM. The attacker's goal is to obtain 'root' level privileges on the target VM. We further utilized cve-search [8] to obtain host and guest vulnerability information.
We conducted a Markov Game cost-benefit analysis for the two players, Attacker and Admin, on the system (with VM1 , VM2 , VM3 ) defined in our evaluation network. We consider 100 known vulnerabilities, spread across the three VMs, that have not yet been fixed due to resource considerations. The goal state for the attacker is to obtain root level privileges on all VMs using the available system and network exploits. We conducted experiments to evaluate the effectiveness of the strategic MDP Value Iteration based countermeasure vs. a naive strategy that selects only the top x = {10, 50}% of the vulnerabilities to apply countermeasures. The naive strategy assumes that the administrator should only select VMs with vulnerabilities having high CIA for patching. The strategic approach assumes that the administrator has conducted attack analysis in advance and, based on MDP value iteration, knows the high-value targets in the cloud network that may be subjected to attack.
As shown in Figure 3, the reward value for the Attacker decreases as the administrator increases his attack surface coverage. With the naive strategy, the reward for the attacker decreases from 160 to 96 on providing countermeasures for 30 percent of the vulnerable states. The value further decreases from ∼96 to ∼65 on implementing countermeasures for half of the vulnerable states. These countermeasures are deployed naively based on the top vulnerabilities, irrespective of any analysis of asset value to an attacker.
In the case of the strategic approach, the administrator performs a thorough analysis of vulnerable states, high-value targets, and shortest paths to goal states, and strategically implements countermeasures for those states. The reward value for the attacker decreases from ∼133 to 45 when the administrator increases countermeasure coverage from 10 percent to about 30 percent of the vulnerable states. This value further decreases to 30 for the attacker when the administrator has coverage for about 50 percent of the vulnerable states.
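The approximate procedure of Section IV-A (pure attacker strategies, worst-case defender response) reduces each Bellman backup to a max-min over action pairs. A minimal sketch on a two-exploit toy instance patterned after Path 1 (the state names, rewards, and transitions are our illustrative choices, not the evaluation network):

```python
# Approximate min-max value iteration: the attacker is restricted to pure
# strategies, so each backup is max over a1 of the worst case over a2,
# avoiding the per-state linear program of the exact algorithm.
GAMMA = 0.8

S = ["s1", "s2", "s3"]            # (user,LDAP), (root,LDAP), (root,FTP)
A1 = ["no-action", "exploit"]     # attacker
A2 = ["no-mon", "mon"]            # defender

def tau(s, a1, a2):
    """Deterministic toy transitions: only an unmonitored exploit advances."""
    if a1 == "exploit" and a2 == "no-mon":
        return {"s1": {"s2": 1.0}, "s2": {"s3": 1.0}}.get(s, {s: 1.0})
    return {s: 1.0}

def R(s, a1, a2):
    """Toy rewards patterned after Table III: caught exploits are punished."""
    if a1 == "no-action":
        return 0.0
    gain = {"s1": 5.0, "s2": 10.0}.get(s, 0.0)
    return gain if a2 == "no-mon" else -gain

def minmax_value_iteration(n_iters=100):
    V = {s: 0.0 for s in S}
    for _ in range(n_iters):
        for s in S:
            V[s] = max(
                min(R(s, a1, a2)
                    + GAMMA * sum(p * V[s2] for s2, p in tau(s, a1, a2).items())
                    for a2 in A2)
                for a1 in A1)
    policy = {s: max(A1, key=lambda a1: min(
        R(s, a1, a2)
        + GAMMA * sum(p * V[s2] for s2, p in tau(s, a1, a2).items())
        for a2 in A2)) for s in S}
    return V, policy

V, policy = minmax_value_iteration()
```

On this toy instance, the pure-strategy attacker value collapses to 0 in every state (any exploit can be met by monitoring, so the worst-case-optimal pure attacker does nothing), which illustrates the conservatism of the pure-strategy restriction and why Eq. 2 allows mixed policies.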
[Figure 3: Attacker's reward value (y-axis, 0 to 210) vs. percentage of states with countermeasures (x-axis, 10 to 50%), comparing the Naive Strategy and MDP Value Iteration.]
Figure 3. P1 Reward Goal Value vs P2 ’s Naive and Strategic countermeasures
We can observe that the strategic approach affects the attacker's reward significantly, a reduction of almost half: the attacker obtains a reward of 30 under the strategic approach, compared to ∼65 under the naive approach, when the administrator deploys countermeasures for 50% of the vulnerable states. The interdiction of attack paths by the administrator using the Markov Game strategy, being optimal, will strictly dominate any other strategy and thus significantly limit the attacker's capability as the number of vulnerabilities in the cloud environment increases.
V. RELATED WORK
Sheyner et al. [2] present a formal analysis of attacks on a network along with a cost-benefit analysis and security measures to defend against the network attacks. In [1], Chowdhary et al. provide a polynomial-time method for attack graph construction and network reconfiguration using a parallel computing approach, making it possible to leverage information for strategic reasoning about attacks in large-scale systems.
The authors in [3] introduced the idea of moving secret proxies to new network locations using a greedy algorithm, which they show can thwart brute force and DDoS attacks. In [12], Zhuang et al. show that an MTD system designed with intelligent adaptations further improves effectiveness. In [10], the authors show that strategies based on common intuitions can be detrimental to security and highlight how game theoretic reasoning can alleviate the problem. Along those lines, Lye et al. [5] and Sengupta et al. [9] use a game theoretic approach to model the attacker-defender interaction as a two-player game, where they calculate the optimal response for the players using the Nash and the Stackelberg equilibrium concepts, respectively. Although they propose the use of Markov Decision Process (MDP) and attack graph-based approaches, they leave it as future work.
In the context of cloud systems, Peng et al. [7] discuss a risk-aware MTD strategy where they model the attack surface as a non-decreasing probability density function and then estimate the risk of migrating a VM to a replacement node using probabilistic inference. Kampanakis et al. [4] highlight obfuscation as a possible MTD strategy for dealing with attacks like OS fingerprinting and network reconnaissance in the SDN environment. Furthermore, they highlight that the trade-off introduced by such random mutations, which may disrupt active services, requires cost-benefit analysis.
In this paper, we identify an adaptive MTD strategy against multi-hop monotonic attacks for cloud networks which optimizes performance while providing gains in security.
VI. CONCLUSION AND FUTURE WORK
A cloud network is composed of heterogeneous network devices and applications interacting with each other. The interaction of these entities both (1) poses a security risk to the overall cloud infrastructure and (2) makes it difficult to secure.
While traditional security solutions provide reactive security
mechanisms to detect and mitigate a threat, they may fail to
assess the damage to infrastructure due to a cascading security
breach. We presented Markov Game as an assessment tool
to perform a cost-benefit analysis of security vulnerabilities
and corresponding countermeasures in the cloud network.
The assessment shows that a network administrator needs to
proactively identify critical security assets and strategically
deploy available countermeasures. A game theoretic approach helps the administrator quantify and minimize risk given limited resources.
ACKNOWLEDGMENT
This research is based upon work supported by the
NRL N00173-15-G017, NSF Grants 1642031, 1528099, and
1723440, and NSFC Grants 61628201 and 61571375. The
second author is supported by the IBM Ph.D. Fellowship.
REFERENCES
[1] A. Chowdhary, S. Pisharody, and D. Huang. SDN based scalable MTD solution in cloud network. In Proceedings of the 2016 ACM Workshop on Moving Target Defense, pages 27–36. ACM, 2016.
[2] S. Jha, O. Sheyner, and J. Wing. Two formal analyses of attack graphs.
In Computer Security Foundations Workshop, 2002. Proceedings. 15th
IEEE, pages 49–63. IEEE, 2002.
[3] Q. Jia, K. Sun, and A. Stavrou. MOTAG: Moving target defense against internet denial of service attacks. In 2013 22nd International Conference on Computer Communication and Networks, pages 1–9. IEEE, 2013.
[4] P. Kampanakis, H. Perros, and T. Beyene. SDN-based solutions for moving target defense network protection. In World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2014 IEEE 15th International Symposium on a, pages 1–6. IEEE, 2014.
[5] K.-w. Lye and J. M. Wing. Game strategies in network security.
International Journal of Information Security, 4(1-2):71–86, 2005.
[6] NIST. CVSS. https://www.first.org/cvss, 2016. [19-Nov-2016].
[7] W. Peng, F. Li, C.-T. Huang, and X. Zou. A moving-target defense
strategy for cloud-based services with heterogeneous and dynamic attack
surfaces. In 2014 IEEE International Conference on Communications
(ICC), pages 804–809. IEEE, 2014.
[8] P.-J. Moreels and A. Dulaunoy. cve-search. https://github.com/cve-search/cve-search, 2016.
[9] S. Sengupta, A. Chowdhary, D. Huang, and S. Kambhampati. Moving
target defense for the placement of intrusion detection systems in the
cloud. Conference on Decision and Game Theory for Security, 2018.
[10] S. Sengupta, S. G. Vadlamudi, S. Kambhampati, A. Doupé, Z. Zhao,
M. Taguinod, and G.-J. Ahn. A game theoretic approach to strategy
generation for moving target defense in web applications. International
Foundation for Autonomous Agents and Multiagent Systems, 2017.
[11] O. Sheyner, J. Haines, S. Jha, R. Lippmann, and J. M. Wing. Automated
Generation and Analysis of Attack Graphs. In IEEE Symposium on
Security and privacy, pages 273–284, 2002.
[12] R. Zhuang, S. Zhang, A. Bardas, S. A. DeLoach, X. Ou, and A. Singhal.
Investigating the application of moving target defenses to network
security. In Resilient Control Systems (ISRCS), 2013 6th International
Symposium on, pages 162–169. IEEE, 2013.