Abstract
Intention deception involves computing a strategy that deceives the opponent into a wrong belief about the agent’s intention or objective. This paper studies a class of probabilistic planning problems with intention deception and investigates how a defender’s limited sensing modality can be exploited by an attacker to achieve its attack objective almost surely (with probability one) while hiding its intention. In particular, we model the attack planning in a stochastic system represented as a Markov decision process (MDP). The attacker aims to reach some target states while avoiding unsafe states in the system and knows that its behavior is monitored by a defender with partial observations. Given the defender’s partial state observations, we develop qualitative intention deception planning algorithms that construct attack strategies against an action-visible defender and an action-invisible defender, respectively. The synthesized attack strategy not only ensures that the attack objective is satisfied almost surely but also deceives the defender into believing that the observed behavior is generated by a normal/legitimate user, so that the defender fails to detect the presence of an attack. We show that the proposed algorithms are correct and complete and illustrate the deceptive planning methods with examples.
Research was sponsored by the Army Research Office and was accomplished under Grant Number W911NF-22-1-0034.
Notes
1. In temporal logic, the reachability objective is expressed as \(\textsf{true}\, \textsf{U}\,F\) and the safety objective is expressed as \(\lnot (\textsf{true}\, \textsf{U}\,U)\).
2. The computation is performed on a MacBook Pro with 16 GB of memory and an Apple M1 Pro chip. The computation time is the total time taken to compute the attacker’s intention deception ASW region in the augmented MDP.
3. Videos of the sampled runs for cases \(\text{(c)-B-I}\) and \(\text{(c)-P-I}\) can be found at https://bit.ly/3BiPRb9, where the light green cells are states in the defender’s belief.
Appendices
A Proof of Proposition 2 and the Construction of ASW Region and ASW Strategies
Proof
First, we provide the algorithm to compute the ASW region \(\textsf{ASW}(\varphi )\) and an ASW strategy \(\pi \) for the task \(\varphi {:}{=}\lnot U \, \textsf{U}\,F\), where \(U \cap F = \emptyset \) (a small code sketch of the procedure follows the list below).
1. Initialize \(X_0=F\) and \(Y_0=S {\setminus } U\). Let \(i=j=0\).
2. Let \(X_{i+1} = X_i \cup \{s \in Y_j {\setminus } X_i\mid \exists a\in A(s), \textsf{Post}(s,a) \cap X_i \ne \emptyset \text { and } \textsf{Post}(s,a)\subseteq Y_j\}\).
3. If \(X_{i+1}\ne X_i\), then let \(i=i+1\) and go to step 2; else, let \(n=i\) and go to step 4.
4. Let \(Y_{j+1}=X_n\). If \(Y_{j+1}= Y_j\), then \(\textsf{ASW}(\varphi )=Y_j\); return the level sets \(\{X_i, i=0,\ldots , n\}\) computed in the last iteration. Else, let \(j=j+1\), \(i=0\), and go to step 2.
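For concreteness, the level-set procedure above can be prototyped as a nested fixed-point computation. The following is a minimal sketch in Python, assuming the MDP is given by a state set, an enabled-action map, and a successor-support map; the function names and data representation are illustrative assumptions rather than the paper’s implementation.

```python
def asw_levelsets(states, F, U, actions, post):
    """Compute ASW(phi) for phi = (not U) Until F, together with the level
    sets X_0, ..., X_n from the final inner iteration.

    states: iterable of states; F, U: disjoint sets of states;
    actions(s): set of actions enabled at s; post(s, a): support of P(. | s, a).
    """
    Y = set(states) - set(U)                      # Y_0 = S \ U
    while True:
        levels = [set(F)]                         # X_0 = F
        while True:                               # inner fixed point over X
            X = levels[-1]
            X_next = X | {
                s for s in Y - X
                if any(post(s, a) & X and post(s, a) <= Y for a in actions(s))
            }
            if X_next == X:
                break
            levels.append(X_next)
        Y_next = levels[-1]                       # outer fixed point over Y
        if Y_next == Y:
            return Y, levels                      # ASW(phi) and its level sets
        Y = Y_next
```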
The algorithm returns level sets \(X_i, i=0,\ldots , n\) for some \(n\ge 0\) and the ASW region \(\textsf{ASW}(\varphi )\). Recall that \(\textsf{Allowed}: S\rightarrow 2^A \) is defined by \(\textsf{Allowed}(s) = \{a\in A(s)\mid \textsf{Post}(s,a)\subseteq \textsf{ASW}(\varphi )\}\). The following property holds: for each \(s\in X_i\setminus X_{i-1}\), there exists an action \(a\in \textsf{Allowed}(s)\) that ensures the next state is in \(X_{i-1}\) with a positive probability and in \(\textsf{ASW}(\varphi )\) with probability one. A strategy \(\pi : \textsf{ASW}(\varphi )\rightarrow \mathcal {D}(A)\) is almost-sure winning if for every state \(s\in \textsf{ASW}(\varphi )\), \(\text{ Supp }(\pi (s)) = \textsf{Allowed}(s)\); that is, at every state \(s \in \textsf{ASW}(\varphi )\), \(\pi (s)\) selects each permissible action with a nonzero probability. The ASW strategy may not be unique.
Define a function \(\textsf{Prog}: \textsf{ASW}(\varphi )\rightarrow 2^A\) such that for each \(s\in X_i{\setminus } X_{i-1}\), \(\textsf{Prog}(s)=\{a\in \textsf{Allowed}(s)\mid \textsf{Post}(s,a)\cap X_{i-1}\ne \emptyset \}\). Intuitively, \(\textsf{Prog}(s)\) is the set of actions, each of which ensures that progress to a lower level set is made with a positive probability.
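Continuing the sketch (same assumed interface and hypothetical helper names), \(\textsf{Allowed}\), \(\textsf{Prog}\), and one concrete ASW strategy that puts positive probability on every allowed action could be realized as follows.

```python
import random

def allowed(s, asw, actions, post):
    """Actions at s under which all successors stay inside the ASW region."""
    return [a for a in actions(s) if post(s, a) <= asw]

def prog(s, levels, asw, actions, post):
    """For s in X_i but not X_{i-1} (i >= 1): allowed actions that reach the
    next-lower level set X_{i-1} with positive probability."""
    i = next(k for k, X in enumerate(levels) if s in X)   # smallest level containing s
    if i == 0:
        return allowed(s, asw, actions, post)             # s is already in F = X_0
    return [a for a in allowed(s, asw, actions, post) if post(s, a) & levels[i - 1]]

def asw_policy(s, asw, actions, post):
    """One ASW strategy: sample uniformly over Allowed(s); any distribution
    with full support on Allowed(s) is almost-sure winning."""
    return random.choice(allowed(s, asw, actions, post))
```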
Now consider a policy \(\pi \) that, at every state \(s\in \textsf{ASW}(\varphi )\), selects each action in \(\textsf{Allowed}(s)\) with probability \(> 0\). Let \(p = \min \{\pi (s,a)P(s' \mid s,a) \mid 0<i \le n, s\in X_i{\setminus } X_{i-1}, a\in \textsf{Prog}(s), s'\in \textsf{Post}(s,a)\cap X_{i-1}\}\), which is nonzero because the state and action sets are finite. From any state \(s\in X_i{\setminus } X_{i-1}\), the probability of reaching a state in \(X_0=F\) within \(i\le n\) steps, by making \(i\) consecutive progress transitions, is at least \(p^i \ge p^n\). Moreover, whenever \(F\) has not yet been reached, the current state \(s'\) remains in \(\textsf{ASW}(\varphi )\), from which an action in \(\textsf{Prog}(s')\) is again selected with a nonzero probability. Thus, the probability of not reaching \(F\) within \(kn\) steps is at most \((1-p^n)^k\), and \(\lim _{k\rightarrow \infty } (1-p^n)^k =0\). In other words, the policy \(\pi \) ensures \(F\) is eventually reached with probability one. At the same time, because \(Y_j \cap U =\emptyset \) for all \(j \ge 0\) during the iterations, \(\textsf{ASW}(\varphi )\cap U = \emptyset \) and thus the probability of reaching a state in \(U\) by following the policy \(\pi \) is zero.
B Proof of Theorem 2
Proof
We show that the ASW policy \(\widehat{\pi }_1: S\times 2^S\rightarrow \mathcal {D}(A)\) obtained from the augmented MDP is qualitatively observation-equivalent to an ASW policy for the user.
Consider a history \(h = s_0a_0s_1a_1\ldots s_n \) which is sampled from the stochastic process \(M_{\widehat{\pi }_1}\) and satisfies \(s_i \notin F_1 \cup U_1\) for \(0\le i <n\) and \(s_n \in F_1\). The history is associated with a history in the augmented MDP, \(\widehat{h} = (s_0, B_0)a_0(s_1,B_1)a_1\ldots (s_n,B_n)\) where \(B_0=\textsf{DObs}_S(s_0)\) is the initial belief for the defender. Due to the construction of the augmented MDP, for all \(0\le i \le n\), \(B_i\ne \emptyset \).
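For intuition, the belief sequence \(B_0, B_1, \ldots , B_n\) can be viewed as the result of a subset-construction update on the defender’s observations. The sketch below shows one plausible update for an action-invisible defender, assuming the belief tracks the states a normal user could occupy (i.e., successors under actions in \(\textsf{Allowed}\)) that are consistent with the next observation; the function names and the exact rule are illustrative assumptions, not necessarily the paper’s augmented-MDP construction.

```python
def belief_update(B, obs_next, allowed_actions, post, dobs):
    """One plausible action-invisible belief update.

    B: current belief (states consistent with a normal user so far);
    obs_next: the defender's next state observation;
    allowed_actions(s): actions a legitimate user may take at s (assumption);
    post(s, a): successor support; dobs(s): defender's observation of state s.
    """
    return {
        s2
        for s in B
        for a in allowed_actions(s)
        for s2 in post(s, a)
        if dobs(s2) == obs_next
    }
```

A transition that yields an empty belief would mean that no normal-user explanation of the observations remains; this is the situation the contradiction argument below rules out.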
By the definition of qualitative observation-equivalence, we only need to show that there exists \(h' =s_0'a'_0s_1'a'_1 \ldots s_n'\) with \(s_i'\in B_i\) for all \(i=0,\ldots , n\), such that \(\Pr (h', M_{\pi _0})>0\), where \(M_{\pi _0}\) is the Markov chain induced by a user’s ASW policy \(\pi _0\) from the MDP M. Note that \(a_i\) and \(a'_i\) need not be the same. By way of contradiction, suppose that for every state-action sequence \(h' =s_0'a'_0s_1'a'_1 \ldots s_n' \) with \(s_i'\in B_i\) and \(a'_i \in A(s_i')\) for \(0\le i \le n\), it holds that \(\Pr (h', M_{\pi _0})=0\). Then there are two possible cases: in the first case, there exists some \(i\ge 0\) such that \(\textsf{Allowed}(s)=\emptyset \) for all \(s\in B_i\); in the second case, for some \(i\), there exist no state \(s\in B_i\), action \(a\) enabled from \(s\), and state \(s'\in B_{i+1}\) such that \(P(s'\mid s,a)>0\).
The first case is not possible: if \(\textsf{Allowed}(s)=\emptyset \) for all \(s\in B_i\), then the next state reached is \((s_i,\emptyset )\), which is a sink state, contradicting the fact that \(\widehat{h} \) satisfies the reach-avoid objective. The second case is not possible either: if for every state \(s\in B_i\) and every action \(a\) enabled from \(s\), \(\textsf{Post}(s,a) \cap B_{i+1}=\emptyset \), then \(B_{i+1}=\emptyset \), which again contradicts the fact that \(\widehat{h}\) visits a state in \(\widehat{F}\).
Thus, it holds that there exists \(h'\) such that \(\textsf{DObs}_S(h)=\textsf{DObs}_S(h')\) and \(\Pr (h', M_{\pi _0})>0\).