On Almost-Sure Intention Deception Planning that Exploits Imperfect Observers

  • Conference paper
  • Decision and Game Theory for Security (GameSec 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13727)


Abstract

Intention deception involves computing a strategy that deceives the opponent into a wrong belief about the agent’s intention or objective. This paper studies a class of probabilistic planning problems with intention deception and investigates how a defender’s limited sensing modality can be exploited by an attacker to achieve its attack objective almost surely (with probability one) while hiding its intention. In particular, we model attack planning in a stochastic system described by a Markov decision process (MDP). The attacker is to reach some target states while avoiding unsafe states in the system and knows that its behavior is monitored by a defender with partial observations. Given the defender’s partial state observations, we develop qualitative intention deception planning algorithms that construct attack strategies against an action-visible defender and an action-invisible defender, respectively. The synthesized attack strategy not only ensures that the attack objective is satisfied almost surely but also deceives the defender into believing that the observed behavior is generated by a normal/legitimate user, so that the presence of an attack goes undetected. We show that the proposed algorithms are correct and complete and illustrate the deceptive planning methods with examples.

Research was sponsored by the Army Research Office and was accomplished under Grant Number W911NF-22-1-0034.


Notes

  1.

    In temporal logic, the reachability objective is expressed as \(\textsf{true}\, \textsf{U}\,F\) and the safety objective is expressed as \(\lnot (\textsf{true}\, \textsf{U}\,U)\).

  2.

    The computation is performed on a MacBook Pro with 16 GB of memory and an Apple M1 Pro chip. The computation time is the total time taken to compute the attacker’s intention deception ASW region in the augmented MDP.

  3.

    Videos of the sampled runs for case \(\text{(c)-B-I },\text{(c)-P-I }\) can be found at https://bit.ly/3BiPRb9 where the light green cells are states in the defender’s belief.


Author information


Corresponding author

Correspondence to Jie Fu.


Appendices

A Proof of Proposition 2 and the Construction of ASW Region and ASW Strategies

Proof

First, we provide the algorithm that computes the ASW region \(\textsf{ASW}(\varphi )\) and an ASW strategy \(\pi \) for the task \(\varphi {:}{=}\lnot U \, \textsf{U}\,F\), where \(U \cap F = \emptyset \).

  1.

    Initialize \(X_0=F\) and \(Y_0=S {\setminus } U\). Let \(i=j=0\).

  2.

    Let \(X_{i+1} = X_i \cup \{s \in Y_j {\setminus } X_i\mid \exists a\in A(s), \textsf{Post}(s,a) \cap X_i \ne \emptyset \text { and } \textsf{Post}(s,a)\subseteq Y_j\}\).

  3.

    If \(X_{i+1}\ne X_i\), then let \(i=i+1\) and go to step 2; else, let \(n=i\) and go to step 4.

  4.

    Let \(Y_{j+1}=X_i\). If \(Y_{j+1}= Y_j\), then \(\textsf{ASW}(\varphi )=Y_j\); return the level sets \(\{X_i, i=0,\ldots , n\}\) computed in the last iteration. Else, let \(j=j+1\), reset \(i=0\), and go to step 2.

The algorithm returns a set of level sets \(X_i, i=0,\ldots , n\) for some \(n\ge 0\) and the ASW region \(\textsf{ASW}(\varphi )\). Recall that \(\textsf{Allowed}: S\rightarrow 2^A \) is defined by \(\textsf{Allowed}(s) = \{a\in A(s)\mid \textsf{Post}(s,a)\subseteq \textsf{ASW}(\varphi )\}\). The following property holds: For each \(s\in X_i\setminus X_{i-1}\), there exists an action \(a\in \textsf{Allowed}(s)\) that ensures, with a positive probability, the next state is in \(X_{i-1}\) and with probability one, the next state is in \(\textsf{ASW}(\varphi )\). The strategy \(\pi : \textsf{ASW}(\varphi )\rightarrow \mathcal {D}(A)\) is almost-sure winning if for every state \(s\in \textsf{ASW}(\varphi )\), \(\text{ Supp }(\pi (s)) = \textsf{Allowed}(s)\). That is, for every permissible action a at state \(s \in \textsf{ASW}(\varphi )\), \(\pi (s)\) selects that action with a non-zero probability. The ASW strategy may not be unique.

Define a function \(\textsf{Prog}: \textsf{ASW}(\varphi )\rightarrow 2^A\) such that for each \(s\in X_i{\setminus } X_{i-1}\), \(\textsf{Prog}(s)=\{a\in \textsf{Allowed}(s)\mid \textsf{Post}(s,a)\cap X_{i-1}\ne \emptyset \}\). Intuitively, \(\textsf{Prog}(s)\) is the set of actions, each of which ensures that progress to a lower level set is made with positive probability.
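The following is a minimal Python sketch of this level-set computation together with \(\textsf{Allowed}\) and \(\textsf{Prog}\). It assumes the MDP is given as nested dictionaries \(P[s][a][s']\) of transition probabilities; the function and variable names are illustrative and not from the paper.

```python
# Minimal sketch of the level-set computation for ASW(phi) with phi = "not U Until F":
# reach F while staying out of U. The MDP is assumed to be given as nested dictionaries
# P[s][a][s'] = transition probability; all names here are illustrative, not from the paper.

def post(P, s, a):
    """Successor states of s under action a that have positive probability."""
    return {t for t, pr in P[s][a].items() if pr > 0}

def asw_region(P, S, F, U):
    """Return the ASW region and the level sets X_0, ..., X_n from the last iteration."""
    Y = set(S) - set(U)                       # Y_0 = S \ U
    while True:                               # outer loop over Y_j (step 4)
        X = set(F)                            # X_0 = F (step 1)
        levels = [set(F)]
        while True:                           # inner fixed point over X_i (steps 2-3)
            new_X = X | {s for s in Y - X
                         if any(post(P, s, a) & X and post(P, s, a) <= Y
                                for a in P[s])}
            if new_X == X:
                break
            X = new_X
            levels.append(set(X))
        if X == Y:                            # Y_{j+1} = Y_j: fixed point reached
            return Y, levels
        Y = X                                 # Y_{j+1} = X_n, repeat with the smaller Y

def allowed(P, s, asw):
    """Actions at s whose successors all stay inside the ASW region."""
    return {a for a in P[s] if post(P, s, a) <= asw}

def prog(P, s, asw, levels):
    """Actions in Allowed(s) that reach a strictly lower level set with positive probability."""
    i = next(k for k, X in enumerate(levels) if s in X)   # level index of s
    if i == 0:
        return allowed(P, s, asw)             # s is already in X_0 = F
    return {a for a in allowed(P, s, asw) if post(P, s, a) & levels[i - 1]}
```

Any policy that, at each state of the returned region, randomizes over all of `allowed(P, s, asw)` with positive probability is an ASW strategy in the sense described above.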

Therefore, by following a policy \(\pi \) that selects every action in \(\textsf{Allowed}(s)\) with probability \(> 0\), the probability of starting from a state \(s\in X_i{\setminus } X_{i-1}\) and not reaching a state in \(X_0=F\) within i steps is at most \((1-p)^i\), where \(p = \min _{0<i \le n,\, s\in X_i\setminus X_{i-1},\, a\in \textsf{Prog}(s),\, s'\in \textsf{Post}(s,a)\cap X_{i-1}} \pi (s,a)P(s' \mid s,a)\) is nonzero. If the set \(X_0\) is not reached within i steps, the agent is in some state \(s'\in \textsf{ASW}(\varphi )\), from which an action in \(\textsf{Prog}(s')\) is selected with nonzero probability. Thus, the probability of never reaching a state in F is at most \(\lim _{k\rightarrow \infty } (1-p)^k =0\). In other words, the policy \(\pi \) ensures that F is eventually reached with probability one. At the same time, because \(Y_j \cap U =\emptyset \) for all \(j \ge 0\) during the iterations, \(\textsf{ASW}(\varphi )\cap U = \emptyset \), and thus the probability of reaching a state in U is zero under the policy \(\pi \).
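For readability, the bound argued in this paragraph can be restated as follows (a restatement of the argument above, not an additional claim):

```latex
% From any state in X_i \ X_{i-1}, one step under pi reaches X_{i-1} with probability
% at least p while staying in ASW(phi), hence
\[
  \Pr\bigl(X_0 \text{ not reached within } i \text{ steps} \mid s_0 = s \in X_i \setminus X_{i-1}\bigr) \le (1-p)^i,
  \qquad
  \Pr\bigl(F \text{ never reached}\bigr) \le \lim_{k \to \infty} (1-p)^k = 0 .
\]
```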

B Proof of Theorem 2

Proof

We show that the ASW policy \(\widehat{\pi }_1: S\times 2^S\rightarrow \mathcal {D}(A)\) obtained from the augmented MDP is qualitatively observation-equivalent to an ASW policy for the user.

Consider a history \(h = s_0a_0s_1a_1\ldots s_n \) which is sampled from the stochastic process \(M_{\widehat{\pi }_1}\) and satisfies \(s_i \notin F_1 \cup U_1\) for \(0\le i <n\) and \(s_n \in F_1\). The history is associated with a history in the augmented MDP, \(\widehat{h} = (s_0, B_0)a_0(s_1,B_1)a_1\ldots (s_n,B_n)\) where \(B_0=\textsf{DObs}_S(s_0)\) is the initial belief for the defender. Due to the construction of the augmented MDP, for all \(0\le i \le n\), \(B_i\ne \emptyset \).
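To make the role of the beliefs \(B_i\) concrete, the following is a minimal Python sketch of one belief-update step for an action-invisible defender, in the spirit of the augmented-MDP construction; it is an illustrative assumption, not the paper's exact definition. Here \(P[s][a][s']\) is the transition probability, `allowed_user(s)` returns the actions a normal user's ASW policy may take at s, `dobs(s)` is the defender's state observation \(\textsf{DObs}_S(s)\), and `obs` is the observation of the newly reached state.

```python
# Hedged sketch (an illustrative assumption, not the paper's exact construction): one
# belief-update step for an action-invisible defender. P[s][a][s'] is the transition
# probability, allowed_user(s) gives the actions a normal user's ASW policy may take at s,
# dobs(s) is the defender's state observation DObs_S(s), and obs is the observation
# emitted by the newly reached state.

def belief_update(P, B, obs, allowed_user, dobs):
    """States a normal user could now occupy, consistent with belief B and observation obs."""
    next_B = set()
    for s in B:
        for a in allowed_user(s):             # the defender does not observe the chosen action
            for s2, pr in P[s][a].items():
                if pr > 0 and dobs(s2) == obs:
                    next_B.add(s2)
    return next_B                             # next_B == set() mirrors the sink states (s, {})
```

An empty next belief plays the role of the sink states \((s_i,\emptyset )\) invoked in the case analysis that follows.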

By the definition of qualitative observation-equivalence, we only need to show that there exists \(h' =s_0'a'_0s_1'a'_1 \ldots s_n'\), where \(s_i'\in B_i\) for all \(i=0,\ldots , n\), such that \(\Pr (h', M_{\pi _0})>0\), where \(M_{\pi _0}\) is the Markov chain induced from the MDP M by a user's ASW policy \(\pi _0\). Note that \(a_i\) and \(a'_i\) need not be the same. By way of contradiction, suppose that for every state-action sequence \(h' =s_0'a'_0s_1'a'_1 \ldots s_n' \) with \(s_i'\in B_i\) and \(a'_i \in A(s_i')\) for \(0\le i \le n\), it holds that \(\Pr (h', M_{\pi _0})=0\). If \(\Pr (h', M_{\pi _0})=0\) for all such sequences, there are two possible cases: in the first case, there exists some \(i\ge 0\) such that \(\textsf{Allowed}(s)=\emptyset \) for all \(s\in B_i\); in the second case, there exist no state \(s\in B_i\), action a enabled at s, and state \(s'\in B_{i+1}\) such that \(P(s'|s,a)>0\).

The first case is impossible: if \(\textsf{Allowed}(s)=\emptyset \) for all \(s\in B_i\), then the next state reached is \((s_i,\emptyset )\), a sink state, contradicting the fact that \(\widehat{h} \) satisfies the reach-avoid objective. In the second case, if for every state \(s\in B_i\) and every action a enabled at s, \(\textsf{Post}(s,a) \cap B_{i+1}=\emptyset \), then \(B_{i+1}=\emptyset \), which again contradicts the fact that \(\widehat{h}\) visits a state in \(\widehat{F}\).

Thus, it holds that there exists \(h'\) such that \(\textsf{DObs}_S(h)=\textsf{DObs}_S(h')\) and \(\Pr (h', M_{\pi _0})>0\).


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Fu, J. (2023). On Almost-Sure Intention Deception Planning that Exploits Imperfect Observers. In: Fang, F., Xu, H., Hayel, Y. (eds) Decision and Game Theory for Security. GameSec 2022. Lecture Notes in Computer Science, vol 13727. Springer, Cham. https://doi.org/10.1007/978-3-031-26369-9_4


  • DOI: https://doi.org/10.1007/978-3-031-26369-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26368-2

  • Online ISBN: 978-3-031-26369-9

  • eBook Packages: Computer Science, Computer Science (R0)
