Abstract
The cybersecurity threat landscape has become increasingly complex. Threat actors exploit weaknesses in network and endpoint security in a highly coordinated manner to perpetrate sophisticated attacks that can bring down an entire network and many of its critical hosts. To defend against such attacks, cybersecurity solutions are moving from traditional mechanisms to advanced machine learning and deep learning based defenses for threat detection and protection. The application of these techniques has been well reviewed in the scientific literature. Deep reinforcement learning has shown great promise in developing AI solutions for areas that previously required advanced human cognition. Its techniques and algorithms have been applied in domains ranging from games to industrial processes, where they are claimed to augment systems with general AI capabilities. These algorithms have recently also been used in cybersecurity, especially in threat detection and protection, where they are showing state-of-the-art results. Unlike supervised machine learning and deep learning, deep reinforcement learning is used in more diverse ways and is enabling many innovative applications in the threat defense landscape. However, no comprehensive review exists of deep reinforcement learning applications in advanced cybersecurity threat detection and protection. In this paper, we fill this gap and provide a comprehensive review of the different applications of deep reinforcement learning in this field.
Funding
No funding or grant was received to assist with the preparation of this manuscript or the conduct of this study.
Ethics declarations
Conflicts of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
This paper is an extended version of the paper “Deep Reinforcement Learning for Cybersecurity Threat Detection and Protection: A Review”, presented at SKM-2021.
About this article
Cite this article
Sewak, M., Sahay, S.K. & Rathore, H. Deep Reinforcement Learning in the Advanced Cybersecurity Threat Detection and Protection. Inf Syst Front 25, 589–611 (2023). https://doi.org/10.1007/s10796-022-10333-x
DOI: https://doi.org/10.1007/s10796-022-10333-x