Adaptive Honeypot Engagement Through Reinforcement Learning of Semi-Markov Decision Processes

  • Conference paper
Decision and Game Theory for Security (GameSec 2019)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11836)

Abstract

A honeynet is a promising active cyber defense mechanism. It reveals the fundamental Indicators of Compromise (IoCs) by luring attackers to conduct adversarial behaviors in a controlled and monitored environment. The active interaction at the honeynet brings a high reward but also introduces high implementation costs and risks of adversarial honeynet exploitation. In this work, we apply an infinite-horizon Semi-Markov Decision Process (SMDP) to characterize the stochastic transitions and sojourn times of attackers in the honeynet and to quantify the reward-risk trade-off. In particular, we design adaptive long-term engagement policies that are shown to be risk-averse, cost-effective, and time-efficient. Numerical results demonstrate that our adaptive engagement policies can quickly attract attackers to the target honeypot and engage them long enough to obtain valuable threat information, while keeping the penetration probability low. The results also show that the expected utility is robust against attackers across a wide range of persistence and intelligence levels. Finally, we apply reinforcement learning to the SMDP to address the curse of modeling. With a prudent choice of learning rate and exploration policy, we achieve quick and robust convergence to the optimal policy and value.
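
The abstract does not give the model or the learning rule, so the following is only a minimal sketch of how Q-learning can be adapted to an SMDP, where each transition takes a random sojourn time. It uses the standard continuous-time discounted update with a constant reward rate over the sojourn. The three states, two actions, transition probabilities, reward rates, mean sojourn times, and the names `smdp_q_update`, `step`, and `train` are all hypothetical choices made for illustration; none of them are taken from the paper.

```python
import math
import random


def smdp_q_update(q, s, a, reward_rate, tau, s_next, alpha, beta):
    """Apply one SMDP Q-learning step for a transition that lasted tau time units."""
    discount = math.exp(-beta * tau)
    # Reward earned at a constant rate during the sojourn, discounted continuously.
    accrued = reward_rate * (1.0 - discount) / beta
    target = accrued + discount * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


# Toy model: every number below is an illustrative assumption.
N_STATES, N_ACTIONS = 3, 2  # state 2 is absorbing (the attacker has left)
P = {  # P[(state, action)] -> list of (next_state, probability)
    (0, 0): [(0, 0.5), (1, 0.5)],
    (0, 1): [(1, 0.8), (2, 0.2)],
    (1, 0): [(1, 0.6), (2, 0.4)],
    (1, 1): [(1, 0.9), (2, 0.1)],
}
RATE = {(0, 0): 0.0, (0, 1): -0.5, (1, 0): 1.0, (1, 1): 2.0}  # reward per unit time
MEAN_SOJOURN = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 2.0}


def step(rng, s, a):
    """Sample the next state and an exponentially distributed sojourn time."""
    nexts, probs = zip(*P[(s, a)])
    s_next = rng.choices(nexts, weights=probs)[0]
    tau = rng.expovariate(1.0 / MEAN_SOJOURN[(s, a)])
    return s_next, tau


def train(episodes=2000, alpha=0.1, beta=0.5, epsilon=0.1, seed=0):
    """Run epsilon-greedy SMDP Q-learning on the toy model and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != 2:
            if rng.random() < epsilon:
                a = rng.randrange(N_ACTIONS)  # explore
            else:
                a = max(range(N_ACTIONS), key=lambda x: q[s][x])  # exploit
            s_next, tau = step(rng, s, a)
            smdp_q_update(q, s, a, RATE[(s, a)], tau, s_next, alpha, beta)
            s = s_next
    return q
```

Calling `train()` returns a Q-table indexed as `q[state][action]`. The only difference from ordinary Q-learning is that the discount factor `exp(-beta * tau)` and the accrued reward both depend on the sampled sojourn time, which is what lets the learner weigh long engagements against short ones.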

Q. Zhu—This research is supported in part by NSF under grant ECCS-1847056, CNS-1544782, and SES-1541164, and in part by ARO grant W911NF1910041.

Notes

  1. See the demo at the following URL: https://bit.ly/2QUz3Ok.

Author information

Corresponding author

Correspondence to Linan Huang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Huang, L., Zhu, Q. (2019). Adaptive Honeypot Engagement Through Reinforcement Learning of Semi-Markov Decision Processes. In: Alpcan, T., Vorobeychik, Y., Baras, J., Dán, G. (eds) Decision and Game Theory for Security. GameSec 2019. Lecture Notes in Computer Science, vol 11836. Springer, Cham. https://doi.org/10.1007/978-3-030-32430-8_13

  • DOI: https://doi.org/10.1007/978-3-030-32430-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32429-2

  • Online ISBN: 978-3-030-32430-8

  • eBook Packages: Computer Science, Computer Science (R0)
