Abstract
Intention deception involves computing a strategy that deceives the opponent into a wrong belief about the agent’s intention or objective. This paper studies a class of probabilistic planning problems with intention deception and investigates how a defender’s limited sensing modality can be exploited by an attacker to achieve its attack objective almost surely (with probability one) while hiding its intention. In particular, we model the attack planning in a stochastic system represented as a Markov decision process (MDP). The attacker aims to reach some target states while avoiding unsafe states in the system and knows that its behavior is monitored by a defender with partial observations. Given the defender’s partial state observations, we develop qualitative intention deception planning algorithms that construct attack strategies against an action-visible defender and an action-invisible defender, respectively. The synthesized attack strategy not only ensures that the attack objective is satisfied almost surely but also deceives the defender into believing that the observed behavior is generated by a normal/legitimate user, so that the defender fails to detect the presence of an attack. We show that the proposed algorithms are correct and complete and illustrate the deceptive planning methods with examples.
Research was sponsored by the Army Research Office and was accomplished under Grant Number W911NF-22-1-0034.
Notes
1. In temporal logic, the reachability objective is expressed as \(\textsf{true}\, \textsf{U}\,F\) and the safety objective is expressed as \(\lnot (\textsf{true}\, \textsf{U}\,U)\).
2. The computation is performed on a MacBook Pro with 16 GB of memory and an Apple M1 Pro chip. The computation time is the total time taken to compute the attacker’s intention deception ASW region in the augmented MDP.
3. Videos of the sampled runs for cases \(\text{(c)-B-I}\) and \(\text{(c)-P-I}\) can be found at https://bit.ly/3BiPRb9, where the light green cells are states in the defender’s belief.
Appendices
A Proof of Proposition 2 and the Construction of ASW Region and ASW Strategies
Proof
First, we provide the algorithm to compute the ASW region \(\textsf{ASW}(\varphi )\) and an ASW strategy \(\pi \) for the task \(\varphi {:}{=}\lnot U \, \textsf{U}\,F\), where \(U \cap F = \emptyset \) (a small code sketch of the procedure follows the list below).
1. Initialize \(X_0=F\) and \(Y_0=S {\setminus } U\). Let \(i=j=0\).
2. Let \(X_{i+1} = X_i \cup \{s \in Y_j {\setminus } X_i\mid \exists a\in A(s), \textsf{Post}(s,a) \cap X_i \ne \emptyset \text { and } \textsf{Post}(s,a)\subseteq Y_j\}\).
3. If \(X_{i+1}\ne X_i\), then let \(i=i+1\) and go to step 2; else, let \(n=i\) and go to step 4.
4. Let \(Y_{j+1}=X_n\). If \(Y_{j+1}= Y_j\), then \(\textsf{ASW}(\varphi )=Y_j\); return the level sets \(\{X_i, i=0,\ldots , n\}\) computed in the last iteration. Else, let \(j=j+1\), \(i=0\), and go to step 2.
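For concreteness, the level-set procedure above can be prototyped as a nested fixed-point computation. The following is a minimal sketch in Python, assuming the MDP is given by a state set, an enabled-action map, and a successor-support map; the function names and data representation are illustrative assumptions rather than the paper’s implementation.

```python
def asw_levelsets(states, F, U, actions, post):
    """Compute ASW(phi) for phi = (not U) Until F, together with the level
    sets X_0, ..., X_n from the final inner iteration.

    states: iterable of states; F, U: disjoint sets of states;
    actions(s): set of actions enabled at s; post(s, a): support of P(. | s, a).
    """
    Y = set(states) - set(U)                      # Y_0 = S \ U
    while True:
        levels = [set(F)]                         # X_0 = F
        while True:                               # inner fixed point over X
            X = levels[-1]
            X_next = X | {
                s for s in Y - X
                if any(post(s, a) & X and post(s, a) <= Y for a in actions(s))
            }
            if X_next == X:
                break
            levels.append(X_next)
        Y_next = levels[-1]                       # outer fixed point over Y
        if Y_next == Y:
            return Y, levels                      # ASW(phi) and its level sets
        Y = Y_next
```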
The algorithm returns level sets \(X_i, i=0,\ldots , n\) for some \(n\ge 0\) and the ASW region \(\textsf{ASW}(\varphi )\). Recall that \(\textsf{Allowed}: S\rightarrow 2^A \) is defined by \(\textsf{Allowed}(s) = \{a\in A(s)\mid \textsf{Post}(s,a)\subseteq \textsf{ASW}(\varphi )\}\). The following property holds: for each \(s\in X_i\setminus X_{i-1}\), there exists an action \(a\in \textsf{Allowed}(s)\) that ensures the next state is in \(X_{i-1}\) with a positive probability and in \(\textsf{ASW}(\varphi )\) with probability one. A strategy \(\pi : \textsf{ASW}(\varphi )\rightarrow \mathcal {D}(A)\) is almost-sure winning if for every state \(s\in \textsf{ASW}(\varphi )\), \(\text{ Supp }(\pi (s)) = \textsf{Allowed}(s)\); that is, at every state \(s \in \textsf{ASW}(\varphi )\), \(\pi (s)\) selects each permissible action with a nonzero probability. The ASW strategy may not be unique.
Define a function \(\textsf{Prog}: \textsf{ASW}(\varphi )\rightarrow 2^A\) such that for each \(s\in X_i{\setminus } X_{i-1}\), \(\textsf{Prog}(s)=\{a\in \textsf{Allowed}(s)\mid \textsf{Post}(s,a)\cap X_{i-1}\ne \emptyset \}\). Intuitively, \(\textsf{Prog}(s)\) is the set of actions, each of which ensures that progress to a lower level set is made with a positive probability.
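Continuing the sketch (same assumed interface and hypothetical helper names), \(\textsf{Allowed}\), \(\textsf{Prog}\), and one concrete ASW strategy that puts positive probability on every allowed action could be realized as follows.

```python
import random

def allowed(s, asw, actions, post):
    """Actions at s under which all successors stay inside the ASW region."""
    return [a for a in actions(s) if post(s, a) <= asw]

def prog(s, levels, asw, actions, post):
    """For s in X_i but not X_{i-1} (i >= 1): allowed actions that reach the
    next-lower level set X_{i-1} with positive probability."""
    i = next(k for k, X in enumerate(levels) if s in X)   # smallest level containing s
    if i == 0:
        return allowed(s, asw, actions, post)             # s is already in F = X_0
    return [a for a in allowed(s, asw, actions, post) if post(s, a) & levels[i - 1]]

def asw_policy(s, asw, actions, post):
    """One ASW strategy: sample uniformly over Allowed(s); any distribution
    with full support on Allowed(s) is almost-sure winning."""
    return random.choice(allowed(s, asw, actions, post))
```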
Now consider a policy \(\pi \) that, at every state \(s\in \textsf{ASW}(\varphi )\), selects each action in \(\textsf{Allowed}(s)\) with probability \(> 0\). Let \(p = \min \{\pi (s,a)P(s' \mid s,a) \mid 0<i \le n, s\in X_i{\setminus } X_{i-1}, a\in \textsf{Prog}(s), s'\in \textsf{Post}(s,a)\cap X_{i-1}\}\), which is nonzero because the state and action sets are finite. From any state \(s\in X_i{\setminus } X_{i-1}\), the probability of reaching a state in \(X_0=F\) within \(i\le n\) steps, by making \(i\) consecutive progress transitions, is at least \(p^i \ge p^n\). Moreover, whenever \(F\) has not yet been reached, the current state \(s'\) remains in \(\textsf{ASW}(\varphi )\), from which an action in \(\textsf{Prog}(s')\) is again selected with a nonzero probability. Thus, the probability of not reaching \(F\) within \(kn\) steps is at most \((1-p^n)^k\), and \(\lim _{k\rightarrow \infty } (1-p^n)^k =0\). In other words, the policy \(\pi \) ensures \(F\) is eventually reached with probability one. At the same time, because \(Y_j \cap U =\emptyset \) for all \(j \ge 0\) during the iterations, \(\textsf{ASW}(\varphi )\cap U = \emptyset \) and thus the probability of reaching a state in \(U\) by following the policy \(\pi \) is zero.
B Proof of Theorem 2
Proof
We show that the ASW policy \(\widehat{\pi }_1: S\times 2^S\rightarrow \mathcal {D}(A)\) obtained from the augmented MDP is qualitatively observation-equivalent to an ASW policy for the user.
Consider a history \(h = s_0a_0s_1a_1\ldots s_n \) which is sampled from the stochastic process \(M_{\widehat{\pi }_1}\) and satisfies \(s_i \notin F_1 \cup U_1\) for \(0\le i <n\) and \(s_n \in F_1\). The history is associated with a history in the augmented MDP, \(\widehat{h} = (s_0, B_0)a_0(s_1,B_1)a_1\ldots (s_n,B_n)\) where \(B_0=\textsf{DObs}_S(s_0)\) is the initial belief for the defender. Due to the construction of the augmented MDP, for all \(0\le i \le n\), \(B_i\ne \emptyset \).
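For intuition, the belief sequence \(B_0, B_1, \ldots , B_n\) can be viewed as the result of a subset-construction update on the defender’s observations. The sketch below shows one plausible update for an action-invisible defender, assuming the belief tracks the states a normal user could occupy (i.e., successors under actions in \(\textsf{Allowed}\)) that are consistent with the next observation; the function names and the exact rule are illustrative assumptions, not necessarily the paper’s augmented-MDP construction.

```python
def belief_update(B, obs_next, allowed_actions, post, dobs):
    """One plausible action-invisible belief update.

    B: current belief (states consistent with a normal user so far);
    obs_next: the defender's next state observation;
    allowed_actions(s): actions a legitimate user may take at s (assumption);
    post(s, a): successor support; dobs(s): defender's observation of state s.
    """
    return {
        s2
        for s in B
        for a in allowed_actions(s)
        for s2 in post(s, a)
        if dobs(s2) == obs_next
    }
```

A transition that yields an empty belief would mean that no normal-user explanation of the observations remains; this is the situation the contradiction argument below rules out.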
By the definition of qualitative observation-equivalence, we only need to show that there exists \(h' =s_0'a'_0s_1'a'_1 \ldots s_n'\) with \(s_i'\in B_i\) for all \(i=0,\ldots , n\), such that \(\Pr (h', M_{\pi _0})>0\), where \(M_{\pi _0}\) is the Markov chain induced by a user’s ASW policy \(\pi _0\) from the MDP M. Note that \(a_i\) and \(a'_i\) need not be the same. By way of contradiction, suppose that for every state-action sequence \(h' =s_0'a'_0s_1'a'_1 \ldots s_n' \) with \(s_i'\in B_i\) and \(a'_i \in A(s_i')\) for \(0\le i \le n\), it holds that \(\Pr (h', M_{\pi _0})=0\). Then there are two possible cases: in the first case, there exists some \(i\ge 0\) such that \(\textsf{Allowed}(s)=\emptyset \) for all \(s\in B_i\); in the second case, for some \(i\), there exist no state \(s\in B_i\), action \(a\) enabled from \(s\), and state \(s'\in B_{i+1}\) such that \(P(s'\mid s,a)>0\).
The first case is not possible: if \(\textsf{Allowed}(s)=\emptyset \) for all \(s\in B_i\), then the next state reached is \((s_i,\emptyset )\), which is a sink state, contradicting the fact that \(\widehat{h} \) satisfies the reach-avoid objective. The second case is not possible either: if for every state \(s\in B_i\) and every action \(a\) enabled from \(s\), \(\textsf{Post}(s,a) \cap B_{i+1}=\emptyset \), then \(B_{i+1}=\emptyset \), which again contradicts the fact that \(\widehat{h}\) visits a state in \(\widehat{F}\).
Thus, it holds that there exists \(h'\) such that \(\textsf{DObs}_S(h)=\textsf{DObs}_S(h')\) and \(\Pr (h', M_{\pi _0})>0\).