Adaptive MTD Security using Markov Game Modeling

Ankur Chowdhary, Sailik Sengupta, Adel Alshamrani, Dijiang Huang, and Abdulhakim Sabur
Arizona State University
{achaud16, sailiks, aalsham4, dijiang, asabur}@asu.edu

arXiv:1811.00651v1 [cs.CR] 1 Nov 2018

Abstract—Large scale cloud networks consist of distributed networking and computing elements that process critical information, and thus security is a key requirement for any such environment. Unfortunately, assessing the security state of these networks is a challenging task, and the tools traditionally used by security experts, such as packet filtering, firewalls, Intrusion Detection Systems (IDS), etc., provide only a reactive security mechanism. In this paper, we introduce a Moving Target Defense (MTD) based proactive security framework for monitoring attacks, which lets us identify and reason about multi-stage attacks that target software vulnerabilities present in a cloud network. We formulate the multi-stage attack scenario as a two-player zero-sum Markov Game (between the attacker and the network administrator) on attack graphs. The rewards and transition probabilities are obtained by leveraging the expert knowledge present in the Common Vulnerability Scoring System (CVSS). Our framework identifies an attacker's optimal policy and places countermeasures to ensure that this attack policy is always detected, thus forcing the attacker to use a sub-optimal policy with higher cost.

I. INTRODUCTION

A cloud data center consists of software and services from various vendors. Although the security policies of an organization might be up to date, vulnerabilities in software and the presence of untrusted insiders can put sensitive information and communication in the network at risk. Traditional defense mechanisms in networks are composed of distributed elements such as firewalls, Intrusion Detection Systems (IDS), log monitoring systems, etc. Moreover, most of these defense mechanisms are based on a reactive, incident-response approach. In the modern era, such an approach can lead to loss of business. Therefore, we need a proactive approach that anticipates potential weak links in security and assesses the possible behavior of the attacker, in effect providing a defense mechanism that makes it harder for an attacker to exploit the network.

One such proactive defense approach is known as Moving Target Defense (MTD). The goal in network-based MTD is to reconfigure the network services and connectivity so that any strategy devised by the attacker based on a static view of the network becomes less effective. MTD-based adaptive security can increase the exploration surface for the attacker and decrease the attack surface compared to a static system. However, an ad-hoc approach of switching services and connections in the network can prove to be more catastrophic than useful. Thus, an intelligent scheme is required to make such key decisions, and a cost-intrusiveness analysis is needed before taking a decision that can have a cascading effect on dependent system components. Game theory has proved to be very effective in economics, biology, and other areas for making such decisions. In this research work, we utilize a dynamic game of perfect information between the attacker and the system administrator to create an MTD strategy against attackers targeting software vulnerabilities. The key contributions of this research work are as follows:

• Dynamic Game to perform MTD attack analysis and countermeasure evaluation.
Most of the works we came across consider security as a static game; the dynamic game we present mimics a realistic network security scenario in which attack and defense is a continuous process.
• Optimal countermeasure selection, which identifies the critical nodes in an Attack Graph using the CVSS score [6] as a reward metric, and selectively applies countermeasures to mitigate security threats in a cloud network.

II. BACKGROUND

Definition 1. A vulnerability is a security flaw in a software service hosted over a given port that, when exploited by a malicious attacker, can cause loss of Confidentiality, Integrity or Availability (CIA) for a virtual machine (VM).

A. Threat Model

Consider the cloud system in Figure 1(a), where the attacker has user-level access to the LDAP server, which is the initial state of our game; the goal state is to compromise the FTP server. The attacker can perform actions such as exploit-LDAP, exploit-Web, and exploit-FTP. In this scenario the attacker has two possible paths to reach the goal node priv(attacker, (FTP: root)), i.e.,
• Path 1: exploit-LDAP → exploit-FTP
• Path 2: exploit-LDAP → exploit-Web → exploit-FTP
The Admin, on the other hand, can choose to monitor (1) running services and (2) network traffic along both paths using network and host-based monitoring agents, e.g., monitor-LDAP, monitor-FTP, etc. We assume that the Admin has a limited budget and thus must try to perform monitoring in an optimized fashion. The attacker, in turn, should try to perform attacks along the path not monitored by the Admin. For example, if the Admin is monitoring traffic only between the LDAP and FTP servers (mon-LDAP, mon-FTP), the attacker can choose Path 2 to avoid detection.

Table I
VULNERABILITY INFORMATION FOR THE CLOUD NETWORK

| VM         | Vulnerability          | CVE           | CIA  | AC     |
|------------|------------------------|---------------|------|--------|
| LDAP       | Local Priv. Escalation | CVE-2016-5195 | 5.0  | MEDIUM |
| Web Server | Cross Site Scripting   | CVE-2017-5095 | 7.0  | EASY   |
| FTP        | Remote Code Execution  | CVE-2015-3306 | 10.0 | MEDIUM |

Table I shows the Access Complexity (AC), which represents how difficult it is to exploit a vulnerability, and the Confidentiality, Integrity, and Availability (CIA) value gained by exploiting the vulnerabilities in the cloud network above. The values of AC are categorical {EASY, MEDIUM, HIGH}, while CIA values are in the range [0, 10].

Figure 1. An example cloud network scenario: (a) motivating example of a multi-stage attack, with the attacker in the public network, the LDAP, Web, and FTP servers in the private network, and the Admin's monitor-LDAP, monitor-Web, and monitor-FTP agents; (b) the attack graph corresponding to the multi-stage attack, with Attack Path 1: 1 → 2(a) and Attack Path 2: 1 → 2(b) → 3, from priv(attacker, (LDAP: user)) to the goal node priv(attacker, (FTP: root)).
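The paper only states that the rewards and transition probabilities are derived from CVSS expert knowledge. As a rough illustration, the sketch below encodes Table I in Python and maps Access Complexity to an assumed exploit success probability; the AC_TO_PROB mapping and the game_params helper are our own illustrative assumptions, not part of the authors' framework.

```python
# Sketch: deriving per-exploit game parameters from the CVSS data in Table I.
# The AC -> success-probability mapping below is an illustrative assumption;
# the paper only states that rewards/transitions leverage CVSS expert knowledge.

# exploit action -> CVSS-derived attributes from Table I
VULNS = {
    "exploit-LDAP": {"cve": "CVE-2016-5195", "ac": "MEDIUM", "cia": 5.0},
    "exploit-Web":  {"cve": "CVE-2017-5095", "ac": "EASY",   "cia": 7.0},
    "exploit-FTP":  {"cve": "CVE-2015-3306", "ac": "MEDIUM", "cia": 10.0},
}

# Assumed mapping from categorical Access Complexity to exploit success probability.
AC_TO_PROB = {"EASY": 0.9, "MEDIUM": 0.66, "HIGH": 0.34}

def game_params(exploit):
    """Return (assumed success probability, attacker reward) for an exploit action."""
    v = VULNS[exploit]
    return AC_TO_PROB[v["ac"]], v["cia"]

if __name__ == "__main__":
    for e in VULNS:
        p, r = game_params(e)
        print(f"{e}: success prob ~{p}, attacker reward {r}")
```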
Attack Graphs (AGs) have proved to be a successful tool for modeling attack behavior. Sheyner et al. [11] discuss a framework for analyzing attacks using formal methods, in which they model the attacker's behavior as an MDP. Unfortunately, they do not study the impact of deploying MTD countermeasures on the normal services of the system. We utilize attack graphs to define the actions of the Attacker and the Admin over the different stages of the network.

An Attack Graph G = {N, E} consists of a set of nodes (N) and a set of edges (E) where,
• As shown in Figure 1(b), the nodes (N) of the attack graph can be denoted by N = {Nf ∪ Nc ∪ Nd ∪ Nr}. Here Nf denotes primitive/fact nodes, e.g., vulExists(LDAP, Local Priv. Escalation); Nc denotes exploit nodes, e.g., execCode(LDAP); Nd denotes privilege-level nodes, e.g., priv(attacker, (LDAP: user)); and Nr represents the root or goal node, e.g., priv(attacker, (FTP: root));
• The edges (E) of the attack graph can be denoted by E = {Epre ∪ Epost}. Here Epre ⊆ (Nf ∪ Nc) × (Nd ∪ Nr) ensures that the pre-conditions Nc and Nf must be met to achieve Nd, and Epost ⊆ (Nd ∪ Nr) × (Nf ∪ Nc) means the post-condition Nd is achieved on satisfaction of Nf and Nc.

B. Game Theoretic Modeling

Players: We consider the cloud system to be a dynamic game with imperfect information between two players (Attacker: P1 and Admin: P2). P1 is located outside the cloud network, or is a stealthy attacker with user-level access on a particular VM in the cloud network. P2 has a global view of the network.

Goals: The goal of the attacker is well defined, which in this case is to obtain root privileges on critical resource(s) of the cloud network such as the FTP server. We consider the system architecture and the particular use case shown in Figure 1(a). Further, the attack mounted by the attacker is considered monotonic, i.e., once the attacker has reached a certain state, they do not need to go back to any previous state when targeting a specific goal.

States: States represent the privileges the attacker/defender currently has in the network over different resources. We extract this information from the network attack graph, e.g., for the attacker, the initial state is s1 = (LDAP, user); on successful execution of exploit-LDAP, the attacker can transition to the state s2 = (LDAP, root).

Actions and State Transitions: P1 has two possible actions in each state for the example in Figure 1, e.g., a1^1 = no-action and a1^2 = exploit-LDAP. Similarly, P2 can either monitor LDAP, i.e., a2^1 = mon-LDAP, or perform no monitoring, i.e., a2^2 = no-mon. Figure 2 shows the probabilities of the actions of players P1 and P2 in each state of the attack along Path 1, exploit-LDAP → exploit-FTP. The state transition in the Markov Game is conditioned on the actions of both players in each state. Initially the attacker is in the state (user, LDAP). The attacker can choose an action from the set {no-action, exploit-LDAP}, and the probability of taking the action exploit-LDAP is 0.66. Similarly, the Admin has two possible actions, i.e., {no-mon, mon-LDAP}, and performs mon-LDAP with probability 0.5. If the attacker is in state s, then τ(s, a1, a2, s') gives the next state of the game provided P1 and P2 take actions a1 and a2 in state s. In the example above, the attacker is only able to exploit the LDAP vulnerability if τ(s1 = (LDAP, user), a1 = exploit-LDAP, a2 = no-mon, s2 = (LDAP, root)) > 0.

Figure 2. Actions and state transitions for the Markov Game along Path 1: in S1 = (user, LDAP) the attacker plays exploit-LDAP (0.66) or no-action (0.34) and the Admin plays mon-LDAP (0.50) or no-mon (0.50); in S2 = (root, LDAP) the attacker plays exploit-FTP (0.90) or no-action (0.10) and the Admin plays mon-FTP (0.50) or no-mon (0.50); depending on the joint action, the attack is Blocked or reaches S3 = (root, FTP).
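The transition function τ described above can be sketched as a lookup over joint actions. The structure (an exploit succeeds with high probability only when the corresponding service is unmonitored, and is otherwise blocked) follows the text and Figure 2, but the concrete 0.9/0.1 success and blocking probabilities below are illustrative assumptions.

```python
# Sketch: the joint-action transition function tau for attack Path 1
# (exploit-LDAP -> exploit-FTP).  The 0.9/0.1 values are illustrative
# assumptions; only the structure (exploits succeed w.h.p. when unmonitored,
# are blocked w.h.p. when monitored) is taken from the text and Figure 2.

S1, S2, S3, BLOCKED = "(user, LDAP)", "(root, LDAP)", "(root, FTP)", "Blocked"

# tau[(state, attacker_action, admin_action)] -> {next_state: probability}
TAU = {
    (S1, "exploit-LDAP", "no-mon"):   {S2: 0.9, S1: 0.1},
    (S1, "exploit-LDAP", "mon-LDAP"): {BLOCKED: 0.9, S2: 0.1},
    (S1, "no-action",    "no-mon"):   {S1: 1.0},
    (S1, "no-action",    "mon-LDAP"): {S1: 1.0},
    (S2, "exploit-FTP",  "no-mon"):   {S3: 0.9, S2: 0.1},
    (S2, "exploit-FTP",  "mon-FTP"):  {BLOCKED: 0.9, S3: 0.1},
    (S2, "no-action",    "no-mon"):   {S2: 1.0},
    (S2, "no-action",    "mon-FTP"):  {S2: 1.0},
}

def tau(s, a1, a2, s_next):
    """Probability of reaching s_next from s when P1 plays a1 and P2 plays a2."""
    return TAU.get((s, a1, a2), {}).get(s_next, 0.0)

# The exploit can only succeed against an unmonitored service:
assert tau(S1, "exploit-LDAP", "no-mon", S2) > 0
```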
Rewards: The reward function depends on the actions of the attacker and the defender in each state. We refer to the CIA values defined in Table I for the reward associated with a successful/unsuccessful action of P1 and P2.

Table II
TABLE FOR STATE s2 (rows: P1 actions; columns: P2 actions; entries: P1 reward, P2 reward)

|         | no-mon  | mon-Web   | mon-FTP   |
|---------|---------|-----------|-----------|
| no-act  | 0, 0    | 0.5, −0.5 | 0.5, −0.5 |
| exp-Web | 7, −7   | −7, 7     | 7, −7     |
| exp-FTP | 10, −10 | 10, −10   | −10, 10   |

Table III
TABLE FOR STATE s0 (LEFT) AND s1 (RIGHT)

|          | no-mon | mon-LDAP  |          | no-mon  | mon-LDAP  |
|----------|--------|-----------|----------|---------|-----------|
| no-act   | 0, 0   | 0.5, −0.5 | no-act   | 0, 0    | 0.5, −0.5 |
| exp-LDAP | 5, −5  | −5, 5     | exp-LDAP | 10, −10 | −10, 10   |

Tables II and III show the possible actions in three important states of the attack propagation and the corresponding reward metric. When the attacker is in the state (user, LDAP), Table III (left), P1 can take the actions {no-act, exploit-LDAP}; similarly, P2 can choose to monitor or not monitor the LDAP server, so the actions for P2 are {no-mon, mon-LDAP}. The reward function is R(s, a1, a2), where s is the current state and a1, a2 are the actions of the two players. For instance, if s = (LDAP, user), a1 = exp-LDAP, and a2 = no-mon, then R(s, a1, a2) = 5.0 for the attacker and −5.0 for the Admin. The reward metrics for the other states are defined analogously in the tables above.

III. MARKOV GAME

We model the scenario between an attacker and an administrator as a two-player zero-sum Markov game leveraging the knowledge in the Attack Graph (AG) defined in Section II. Besides the obvious Markovian assumption, we further assume that this model has (1) states and actions that are both discrete and finite, and (2) transitions and rewards that, in each state, depend on the action each player decides to take in that state. We now formally define a zero-sum Markov Game model and then highlight how each of its elements is obtained in our setting.

Definition 2. A Markov game for two players P1 and P2 can be defined by the tuple (S, A1, A2, τ, R, γ) where,
• S = {s1, s2, s3, ..., sk} are the finite states of the game,
• A1 = {a1^1, a1^2, ..., a1^m} is the finite action set for P1,
• A2 = {a2^1, a2^2, ..., a2^n} is the finite action set for P2,
• τ(s, a1, a2, s') is the probability of reaching state s' ∈ S from state s if P1 and P2 take actions a1 and a2 respectively,
• R(s, a1, a2) is the reward obtained by P1 if, in state s, P1 and P2 take the actions a1 and a2 respectively. Note that the reward for P2 is −R(s, a1, a2) by definition of a zero-sum game, and
• γ ∈ (0, 1] is the discount factor for future rewards.

We model each VM in the AG as a state in our game. The attacker tries to take actions that help it reach a particular VM, which is the terminal state, while the administrator's action represents placing a monitoring system to detect an attack. Considering that placing a monitoring system on a VM incurs a cost (negative reward) and that there might be multiple vulnerabilities in a single VM, both the attacker and the defender have multiple actions depending on the state they are in. If the attacker tries to exploit a vulnerability in a state for which the defender chose to place a detection system, the attacker is detected and blocked with high probability (as defined by τ) and gets a negative reward (which implies the defender gets a positive reward). Otherwise, the attacker has a higher probability of succeeding in the attack, thus moving closer to the goal state in the AG and obtaining a positive reward. A normal-form zero-sum reward matrix for three states of our simple Markov game is shown in Tables II and III.
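For concreteness, the normal-form reward matrices of Tables II and III can be written down directly. This is a minimal sketch: rows are the attacker's actions, columns are the defender's, and the defender's payoff is simply the negation of the attacker's.

```python
import numpy as np

# Sketch: attacker reward matrices from Tables II and III
# (rows = attacker actions, columns = defender actions).  In a zero-sum
# game the defender's reward is the negation of these entries.

# State s0: attacker in (user, LDAP), LDAP exploit worth 5.0 (Table III, left)
R_s0 = np.array([[0.0, 0.5],      # no-act       vs (no-mon, mon-LDAP)
                 [5.0, -5.0]])    # exploit-LDAP

# State s1: same structure with a higher-value exploit (Table III, right)
R_s1 = np.array([[0.0, 0.5],
                 [10.0, -10.0]])

# State s2: attacker can target the Web or FTP server (Table II)
R_s2 = np.array([[0.0, 0.5, 0.5],       # no-act   vs (no-mon, mon-Web, mon-FTP)
                 [7.0, -7.0, 7.0],      # exp-Web
                 [10.0, 10.0, -10.0]])  # exp-FTP

# Pure-strategy max-min value for the attacker in s2: best worst-case row.
attacker_maxmin = R_s2.min(axis=1).max()
print("Attacker pure max-min value in s2:", attacker_maxmin)  # 0.0 -> mixing pays
```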
Note that in this formulation, which is a first step, we assume that the states are visible to both players, which implies that the defender gets to know if an attacker has succeeded and hence moved to a new state s' from s. We plan to consider partial observability of the state as future work. In this game, P1 tries to maximize his expected discounted reward, while P2 tries to select actions that minimize the expected reward for P1. We consider the min-max strategy for calculating the expected reward of P1 in our Markov game. Due to the underlying stochasticity of the game, the players have to reason about the expected reward they will get for making a particular decision in a certain state. Going forward, we reason about the rewards for P1, i.e., the attacker, and by the property of a zero-sum game, we show that an optimal policy for the defender seeks to reduce this reward value.

We now define the quality of an action, or Q value, used to represent the expected reward P1 will get for choosing action a1 ∈ A1 (while P2 chooses a2 ∈ A2),

Q(s, a1, a2) = R(s, a1, a2) + γ Σ_{s'} τ(s, a1, a2, s') · V(s')    (1)

Let us now denote the mixed policy for state s as π(s), which is a probability distribution that P1 can have over the possible actions it can take in state s. With that, we can define the value of state s for P1 using the equation,

V(s) = max_{π(s)} min_{a2 ∈ A2} Σ_{a1 ∈ A1} Q(s, a1, a2) · π_{a1}    (2)

Note that the max-min strategy of the attacker (and the defender) can be captured by using a modified version of the classic value iteration algorithm (Algorithm 1). Using the min-max strategy implies that the defender is trying to minimize the reward of the attacker by placing detection nodes in the various states that are part of the attack paths present in the attack graph, while the attacker is trying to use a mixed policy π(s) over its possible actions in A1 to maximize its total reward. Thus, the Markov Game framework helps the administrator model the attacker's policy so that they can take the necessary countermeasures (a decision in each state for player P2) to minimize the expected utility for the attacker.

Algorithm 1 MDP – VALUE ITERATION
1: procedure EMIT-V*(S)
2:   V0*(s) = 0 for all s    {Initialize value function to start}
3:   s ← s0
4:   τ(s, a, s')    {Transition probability from state s to s'}
5:   R(s, a, s')    {Reward obtained by taking action a in state s}
6:   γ ∈ [0, 1]    {Discount factor}
7:   i ← 0
8:   loop: if i == k then break
9:   V*(s) ← max_{a} Σ_{s'} τ(s, a, s') × [R(s, a, s') + γ V*(s')]
10:  π(s) = argmax_{a} Σ_{s'} τ(s, a, s') × [R(s, a, s') + γ V*(s')]
11:  i ← i + 1
12:  goto loop
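A compact sketch of the modified value iteration is given below: the Q values follow Eq. (1), and each state's max-min value of Eq. (2) is obtained by solving a small linear program over the attacker's mixed strategy (the per-state linear program mentioned in Section IV). This is only an illustration under our own naming (matrix_game_value, markov_game_value_iteration) and a SciPy-based LP encoding; it is not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q):
    """Max-min value and mixed strategy for the row player (attacker) of a
    zero-sum matrix game Q (rows: a1 in A1, cols: a2 in A2) -- Eq. (2)."""
    m, n = Q.shape
    # Variables: [pi_1, ..., pi_m, v]; maximize v  <=>  minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every defender action a2:  v - sum_i pi_i * Q[i, a2] <= 0
    A_ub = np.hstack([-Q.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to one.
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.ones(1)
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

def markov_game_value_iteration(states, A1, A2, tau, R, gamma=0.9, iters=100):
    """Value iteration for the zero-sum Markov game: Q from Eq. (1),
    V(s) from the per-state linear program above."""
    V = {s: 0.0 for s in states}
    pi = {}
    for _ in range(iters):
        for s in states:
            Q = np.array([[R(s, a1, a2) +
                           gamma * sum(tau(s, a1, a2, s2) * V[s2] for s2 in states)
                           for a2 in A2] for a1 in A1])
            V[s], pi[s] = matrix_game_value(Q)
    return V, pi
```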
IV. IMPLEMENTATION AND EVALUATION

A. Implementation

Note that in order to implement the actual modified value iteration (as described by Eqs. 1 and 2), one needs to solve a linear program for each state after obtaining the updated Q values. This may become computationally inefficient for large real networks and thus we make two assumptions that provide us with an approximate strategy for the players. First, we restrict the attacker to a pure strategy, i.e., π(s) ∀s assigns the value 1 to only one action a ∈ A1 that it can execute in state s and zero to all other actions. Second, we consider that the defender has observability of the attacker's action and thus only use action pairs (a1, a2) that respect the min-max condition. With these, we can now use the classic value iteration algorithm, where the action set of the MDP is restricted to the action pairs mentioned above and the reward and transition functions are subsets of the original ones for the Markov Game, defined only for those action pairs that respect the min-max strategy.

B. Optimal Countermeasure Selection

We consider countermeasure deployment using the equilibrium strategy for the Markov game, which means that for each state of the game, the administrator minimizes the maximum benefit to the attacker. We use an OpenStack based cloud environment to set up a small system with three VMs, i.e., a) Ubuntu 16.04: 192.168.1.5, b) Fedora 23: 192.168.1.6, and c) Metasploitable: 192.168.1.7. In this environment, we assume that the attacker is initially located on the VM with Ubuntu 16.04 and has user-level privileges on it. The attacker conducts an nmap scan to enumerate the vulnerable system and network services that are present on the host and the target VM. The attacker's goal is to obtain 'root' level privileges on the target VM. We further utilized cve-search [8] to obtain host and guest vulnerability information.

We conducted a Markov Game cost-benefit analysis for the two players – Attacker and Admin – on the system (with VM1, VM2, VM3) defined in our evaluation network. We consider 100 known vulnerabilities spread across the three VMs that have not yet been fixed due to resource considerations. The goal state for the attacker is to obtain root level privileges on all VMs using the available system and network exploits. We conducted experiments to evaluate the effectiveness of the strategic MDP Value Iteration based countermeasures against a naive strategy that selects only the top x = {10, 50}% of the vulnerabilities for countermeasures. The naive strategy assumes that the administrator should only select VMs with vulnerabilities having high CIA scores for patching. The strategic approach assumes that the administrator has conducted attack analysis in advance and, based on MDP value iteration, knows the high-value targets in the cloud network that may be subjected to attack.

As shown in Figure 3, the reward value for the Attacker decreases as the administrator increases his coverage of the attack surface. When the naive strategy is used, the reward for the attacker decreases from 160 to 96 on providing countermeasures for 30 percent of the vulnerable states. The value further decreases from ∼96 to ∼65 on implementing countermeasures for half of the vulnerable states. The countermeasures are deployed naively based on the top vulnerabilities, irrespective of any analysis of asset value to an attacker.

In the case of the strategic approach, the administrator performs a thorough analysis of vulnerable states, high-value targets, and shortest paths to goal states, and strategically implements countermeasures for those states. The reward value for the attacker decreases from ∼133 to 45 when the administrator increases countermeasure coverage from about 10 percent to 30 percent of the vulnerable states. This value further decreases to 30 when the administrator has coverage for about 50 percent of the vulnerable states.

Figure 3. P1's reward value vs. P2's naive and strategic countermeasures (x-axis: percentage of states with countermeasures, 10–50; y-axis: attacker's reward value, for the Naive Strategy and MDP Value Iteration approaches).

We can observe that the strategic approach affects the attacker's reward significantly, a reduction of almost half: the attacker obtains a reward of 30 under the strategic approach, compared to ∼65 under the naive approach, when the administrator deploys countermeasures for 50% of the vulnerable states.
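The two coverage policies compared in Figure 3 can be sketched as simple selection rules. The helper names, the cia score map, and the V value map below are hypothetical; only the ranking criteria (raw CIA for the naive strategy, value-iteration state values for the strategic one) come from the text.

```python
# Sketch: the two countermeasure-selection policies compared in Figure 3.
# `cia` maps each vulnerable state to its CVSS CIA score; `V` maps each state
# to the attacker's value from value iteration.  Both inputs are illustrative.

def naive_coverage(cia, budget_fraction):
    """Naive strategy: cover the top x% of states ranked by raw CIA score."""
    k = max(1, int(len(cia) * budget_fraction))
    return set(sorted(cia, key=cia.get, reverse=True)[:k])

def strategic_coverage(V, budget_fraction):
    """Strategic strategy: cover the top x% of states ranked by the attacker's
    expected value V(s) obtained from MDP/Markov-game value iteration."""
    k = max(1, int(len(V) * budget_fraction))
    return set(sorted(V, key=V.get, reverse=True)[:k])

# Example usage with a 30% budget, as in the Figure 3 comparison:
# covered = strategic_coverage(V, 0.3)
```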
The interdiction of attack paths by the administrator using the Markov Game strategy, being optimal, will strictly dominate any other strategy and thus significantly limit the attacker's capability as the number of vulnerabilities in the cloud environment increases.

V. RELATED WORK

Sheyner et al. [2] present a formal analysis of attacks on a network along with a cost-benefit analysis and security measures to defend against the network attacks. In [1], Chowdhary et al. provide a polynomial time method for attack graph construction and network reconfiguration using a parallel computing approach, making it possible to leverage this information for strategic reasoning about attacks in large-scale systems. The authors in [3] introduced the idea of moving secret proxies to new network locations using a greedy algorithm, which they show can thwart brute force and DDoS attacks. In [12], Zhuang et al. show that an MTD system designed with intelligent adaptations improves effectiveness further. In [10], the authors show that intelligent strategies based on common intuitions can be detrimental to security and highlight how game theoretic reasoning can alleviate the problem. Along those lines, Lye et al. [5] and Sengupta et al. [9] use a game theoretic approach to model the attacker-defender interaction as a two-player game, calculating the optimal response for the players using the Nash and the Stackelberg Equilibrium concepts respectively. Although they propose the use of Markov Decision Process (MDP) and attack graph-based approaches, they leave these as future work. In the context of cloud systems, Peng et al. discuss a risk-aware MTD strategy [7] where they model the attack surface as a non-decreasing probability density function and then estimate the risk of migrating a VM to a replacement node using probabilistic inference. Kampanakis et al. [4] highlight obfuscation as a possible MTD strategy to deal with attacks like OS fingerprinting and network reconnaissance in the SDN environment. Furthermore, they highlight that such random mutations may disrupt active services and therefore require a cost-benefit analysis. In this paper, we identify an adaptive MTD strategy against multi-hop monotonic attacks on cloud networks which optimizes performance while providing gains in security.

VI. CONCLUSION AND FUTURE WORK

A cloud network is composed of heterogeneous network devices and applications interacting with each other. The interaction of these entities both (1) poses a security risk to the overall cloud infrastructure and (2) makes it difficult to secure them. While traditional security solutions provide reactive mechanisms to detect and mitigate a threat, they may fail to assess the damage to the infrastructure caused by a cascading security breach. We presented a Markov Game as an assessment tool to perform a cost-benefit analysis of security vulnerabilities and corresponding countermeasures in the cloud network. The assessment shows that a network administrator needs to proactively identify critical security assets and strategically deploy the available countermeasures. A game theoretic approach helps the administrator quantify and minimize risk given limited resources.

ACKNOWLEDGMENT

This research is based upon work supported by NRL N00173-15-G017, NSF Grants 1642031, 1528099, and 1723440, and NSFC Grants 61628201 and 61571375. The second author is supported by the IBM Ph.D. Fellowship.

REFERENCES

[1] A. Chowdhary, S. Pisharody, and D. Huang.
SDN based scalable MTD solution in cloud network. In Proceedings of the 2016 ACM Workshop on Moving Target Defense, pages 27–36. ACM, 2016.
[2] S. Jha, O. Sheyner, and J. Wing. Two formal analyses of attack graphs. In Proceedings of the 15th IEEE Computer Security Foundations Workshop, pages 49–63. IEEE, 2002.
[3] Q. Jia, K. Sun, and A. Stavrou. MOTAG: Moving target defense against internet denial of service attacks. In 2013 22nd International Conference on Computer Communication and Networks, pages 1–9. IEEE, 2013.
[4] P. Kampanakis, H. Perros, and T. Beyene. SDN-based solutions for moving target defense network protection. In 2014 IEEE 15th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), pages 1–6. IEEE, 2014.
[5] K.-W. Lye and J. M. Wing. Game strategies in network security. International Journal of Information Security, 4(1-2):71–86, 2005.
[6] NIST. CVSS. https://www.first.org/cvss, 2016. [Accessed: 19-Nov-2016].
[7] W. Peng, F. Li, C.-T. Huang, and X. Zou. A moving-target defense strategy for cloud-based services with heterogeneous and dynamic attack surfaces. In 2014 IEEE International Conference on Communications (ICC), pages 804–809. IEEE, 2014.
[8] P.-J. Moreels, A. Dulaunoy, et al. cve-search. https://github.com/cve-search/cve-search, 2016.
[9] S. Sengupta, A. Chowdhary, D. Huang, and S. Kambhampati. Moving target defense for the placement of intrusion detection systems in the cloud. In Conference on Decision and Game Theory for Security, 2018.
[10] S. Sengupta, S. G. Vadlamudi, S. Kambhampati, A. Doupé, Z. Zhao, M. Taguinod, and G.-J. Ahn. A game theoretic approach to strategy generation for moving target defense in web applications. International Foundation for Autonomous Agents and Multiagent Systems, 2017.
[11] O. Sheyner, J. Haines, S. Jha, R. Lippmann, and J. M. Wing. Automated generation and analysis of attack graphs. In IEEE Symposium on Security and Privacy, pages 273–284, 2002.
[12] R. Zhuang, S. Zhang, A. Bardas, S. A. DeLoach, X. Ou, and A. Singhal. Investigating the application of moving target defenses to network security. In 2013 6th International Symposium on Resilient Control Systems (ISRCS), pages 162–169. IEEE, 2013.