Robust Multiagent Reinforcement Learning for UAV Systems: Countering Byzantine Attacks †
Abstract
:1. Introduction
- To ensure that the proposed method will perform reliably and consistently under all conditions, the current work provides a theoretical guarantee of robustness against Byzantine attacks during MARL training.
- The present work includes a Byzantine threat model from a multi-UAV system perspective to identify the associated risks. This is particularly relevant in decentralized and distributed multi-UAV systems, where the assumption of malicious or faulty agents is essential for ensuring their integrity.
- Additional studies and experiments are conducted to analyze the impact of key attack parameters on learning performance during a Byzantine attack. The present work also examines whether the proposed method can mitigate Byzantine attacks when attack parameters are varied.
2. Literature Review
2.1. Secure Mechanisms in Multi-UAV Systems
2.2. Byzantine-Tolerant Machine Learning
2.3. Byzantine Fault Tolerance in Multi-UAV Systems
3. Related Background
3.1. Multiagent Reinforcement Learning
3.2. Navigation Model for Multi-UAV Systems
4. Problem Definition
5. Threat Model
6. Methodology
6.1. Proposed Method
- Potential Byzantine estimate: Based on the average rewards and the instances of penalties accrued in each training episode, an estimation is made for the potential number of Byzantine agents present within the MARL environment.
- State consensus model: Each agent observes its neighboring agents and communicates their states with the other agents in the environment. A consensus on all agents’ states is then derived from these observations using a geometric median aggregation method. This consensus method ensures a resilient state estimate, maintaining its reliability as long as the proportion of malicious agents within the MARL environment remains below the breakdown point of .
- State update model: Finally, each agent adjusts the state consensus, depending on the potential presence of Byzantine agents in the environment. In the presence of a Byzantine threat, greater weight is attributed to an agent’s own direct observations, signifying a higher level of self-trust compared with observations from other agents. Conversely, when there is no suspicion of a Byzantine threat, both direct and indirect observations are equally trusted. This allocation of weightage to direct and indirect observations further enhances the robustness of the consensus model.
6.2. Byzantine-Resilient MARL Algorithm
Algorithm 1 Byzantine-resilient multiagent reinforcement learning. |
|
6.3. Robustness Guarantee
7. Experimental Scenarios
- Cooperative navigation: In this task, each UAV collaborates with other agents to navigate toward a designated landmark, ensuring that all landmarks are covered. Agents receive collective rewards based on their proximity to any landmark (building) during each step. Penalties are incurred for collisions with other agents or landmarks. With sufficient training, the agents acquire the ability to reach their respective target landmarks by coordinating with one another.
- Predator–prey navigation: In this task, the MARL environment comprises a set of UAVs working in unison to capture an adversary with superior speed. Agents are rewarded for successfully intercepting the suspect (adversary) while facing penalties for collisions with one another. With sufficient training, agents learn to cooperate with other predator agents to navigate the environment and capture the prey agent.
Simulation Environment
8. Results Analysis
- Average reward: The average cumulative reward obtained by all agents over a set of episodes. It provides a measure of the overall performance of the agents.
- Individual agent reward: The reward achieved by each individual agent. This helps assess the contribution of each agent to the collective goal.
- Learning curve: Plots of cumulative reward over time or episodes, providing insights into how well agents learn and adapt.
8.1. Learning Performance
8.1.1. Cooperative Navigation
8.1.2. Predator–Prey Navigation
8.2. Analysis of Key Attack Parameters
8.2.1. Number of Agents
8.2.2. Position Bias during Attack
8.2.3. Number of Malicious Agents
9. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wise, R.; Rysdyk, R. UAV coordination for autonomous target tracking. In Proceedings of the AIAA Guidance, Navigation, and Control Conference and Exhibit, Honolulu, HI, USA, 18–21 August 2008; p. 6453. [Google Scholar]
- Elloumi, M.; Dhaou, R.; Escrig, B.; Idoudi, H.; Saidane, L.A. Monitoring road traffic with a UAV-based system. In Proceedings of the 2018 IEEE Wireless Communications and Networking Conference (WCNC), Barcelona, Spain, 15–18 April 2018; pp. 1–6. [Google Scholar]
- Merino, L.; Caballero, F.; Martinez-de Dios, J.; Ollero, A. Cooperative fire detection using unmanned aerial vehicles. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005; pp. 1884–1889. [Google Scholar]
- Cuaran, J.; Leon, J. Crop monitoring using unmanned aerial vehicles: A review. Agric. Rev. 2021, 42, 121–132. [Google Scholar] [CrossRef]
- Scherer, J.; Yahyanejad, S.; Hayat, S.; Yanmaz, E.; Andre, T.; Khan, A.; Vukadinovic, V.; Bettstetter, C.; Hellwagner, H.; Rinner, B. An autonomous multi-UAV system for search and rescue. In Proceedings of the First Workshop on Micro Aerial Vehicle Networks, Systems, and Applications for Civilian Use, Florence, Italy, 18 May 2015; pp. 33–38. [Google Scholar]
- Lin, J.; Dzeparoska, K.; Zhang, S.Q.; Leon-Garcia, A.; Papernot, N. On the Robustness of Cooperative Multi-Agent Reinforcement Learning. In Proceedings of the 2020 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 21 May 2020; pp. 62–68. [Google Scholar] [CrossRef]
- Gleave, A.; Dennis, M.; Kant, N.; Wild, C.; Levine, S.; Russsell, S. Adversarial Policies: Attacking Deep Reinforcement Learning. arXiv 2020, arXiv:1905.10615. [Google Scholar]
- Rodday, N.M.; Schmidt, R.d.O.; Pras, A. Exploring security vulnerabilities of unmanned aerial vehicles. In Proceedings of the NOMS 2016–2016 IEEE/IFIP Network Operations and Management Symposium, Istanbul, Turkey, 25–29 April 2016; pp. 993–994. [Google Scholar]
- Lakew Yihunie, F.; Singh, A.K.; Bhatia, S. Assessing and exploiting security vulnerabilities of unmanned aerial vehicles. In Smart Systems and IoT: Innovations in Computing: Proceeding of SSIC 2019; Springer: Singapore, 2020; pp. 701–710. [Google Scholar]
- Dahiya, S.; Garg, M. Unmanned aerial vehicles: Vulnerability to cyber attacks. In Proceedings of the UASG 2019: Unmanned Aerial System in Geomatics 1; Springer: Cham, Switzerland, 2020; pp. 201–211. [Google Scholar]
- Cremonini, M.; Omicini, A.; Zambonelli, F. Multi-agent systems on the Internet: Extending the scope of coordination towards security and topology. In Proceedings of the European Workshop on Modelling Autonomous Agents in a Multi-Agent World, Valencia, Spain, 30 June–2 July 1999; pp. 77–88. [Google Scholar]
- Jung, Y.; Kim, M.; Masoumzadeh, A.; Joshi, J.B. A survey of security issue in multi-agent systems. Artif. Intell. Rev. 2012, 37, 239–260. [Google Scholar] [CrossRef]
- Brust, M.R.; Danoy, G.; Bouvry, P.; Gashi, D.; Pathak, H.; Gonçalves, M.P. Defending against intrusion of malicious uavs with networked uav defense swarms. In Proceedings of the 2017 IEEE 42nd Conference on Local Computer Networks Workshops (LCN Workshops), Singapore, 9 October 2017; pp. 103–111. [Google Scholar]
- Krishna, C.L.; Murphy, R.R. A review on cybersecurity vulnerabilities for unmanned aerial vehicles. In Proceedings of the 2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), Shanghai, China, 11–13 October 2017; pp. 194–199. [Google Scholar]
- Zhi, Y.; Fu, Z.; Sun, X.; Yu, J. Security and privacy issues of UAV: A survey. Mob. Netw. Appl. 2020, 25, 95–101. [Google Scholar] [CrossRef]
- Lamport, L.; Shostak, R.; Pease, M. The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst. 1982, 4, 382–401. [Google Scholar] [CrossRef]
- Medhi, J.K.; Huang, C.; Liu, R.; Chen, X. Byzantine Resilient Reinforcement Learning for Multi-Agent UAV Systems. In Proceedings of the AIAA SCITECH 2023 Forum, National Harbor, MD, USA, 23–27 January 2023; p. 2472. [Google Scholar]
- Altawy, R.; Youssef, A.M. Security, privacy, and safety aspects of civilian drones: A survey. ACM Trans. Cyber-Phys. Syst. 2016, 1, 1–25. [Google Scholar] [CrossRef]
- De Melo, C.F.E.; e Silva, T.D.; Boeira, F.; Stocchero, J.M.; Vinel, A.; Asplund, M.; de Freitas, E.P. UAVouch: A secure identity and location validation scheme for UAV-networks. IEEE Access 2021, 9, 82930–82946. [Google Scholar] [CrossRef]
- Walia, E.; Bhatia, V.; Kaur, G. Detection of malicious nodes in flying ad-hoc networks (FANET). Int. J. Electron. Commun. Eng. 2018, 5, 6–12. [Google Scholar] [CrossRef]
- Ali, Z.; Chaudhry, S.A.; Ramzan, M.S.; Al-Turjman, F. Securing smart city surveillance: A lightweight authentication mechanism for unmanned vehicles. IEEE Access 2020, 8, 43711–43724. [Google Scholar] [CrossRef]
- Keshavarz, M.; Shamsoshoara, A.; Afghah, F.; Ashdown, J. A real-time framework for trust monitoring in a network of unmanned aerial vehicles. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Toronto, ON, Canada, 6–9 July 2020; pp. 677–682. [Google Scholar]
- Bai, J.; Zeng, Z.; Wang, T.; Zhang, S.; Xiong, N.N.; Liu, A. TANTO: An Effective Trust-Based Unmanned Aerial Vehicle Computing System for the Internet of Things. IEEE Internet Things J. 2022, 10, 5644–5661. [Google Scholar] [CrossRef]
- Tangade, S.; Kumaar, R.A.; Malavika, S.; Monisha, S.; Azam, F. Detection of Malicious Nodes in Flying Ad-hoc Network with Supervised Machine Learning. In Proceedings of the 2022 Third International Conference on Smart Technologies in Computing, Electrical and Electronics (ICSTCEE), Bengaluru, India, 16–17 December 2022; pp. 1–5. [Google Scholar]
- Bouhata, D.; Moumen, H. Byzantine Fault Tolerance in Distributed Machine Learning: A Survey. arXiv 2022, arXiv:2205.02572. [Google Scholar]
- Blanchard, P.; El Mhamdi, E.M.; Guerraoui, R.; Stainer, J. Machine learning with adversaries: Byzantine tolerant gradient descent. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Xie, C.; Koyejo, O.; Gupta, I. Generalized byzantine-tolerant sgd. arXiv 2018, arXiv:1802.10116. [Google Scholar]
- Strobel, V.; Castelló Ferrer, E.; Dorigo, M. Blockchain technology secures robot swarms: A comparison of consensus protocols and their resilience to byzantine robots. Front. Robot. AI 2020, 7, 54. [Google Scholar] [CrossRef] [PubMed]
- Bing, Y.; Wang, L.; Chen, Z. A Spectrum Sensing Method for UAV Swarms Under Byzantine Attack. In Proceedings of the International Conference on Autonomous Unmanned Systems, Changsha, China, 24–26 September 2021; pp. 1748–1758. [Google Scholar]
- Hacohen, S.; Medina, O.; Grinshpoun, T.; Shvalb, N. Improved GNSS localization and Byzantine detection in UAV swarms. Sensors 2020, 20, 7239. [Google Scholar] [CrossRef] [PubMed]
- Liao, Z.; Zhang, L.; Dong, Z. UAV Swarm Exploration With Byzantine Fault Tolerance. In Proceedings of the 2021 China Automation Congress (CAC), Beijing, China, 22–24 October 2021; pp. 7150–7154. [Google Scholar]
- Kong, L.; Chen, B.; Hu, F. LAP-BFT: Lightweight Asynchronous Provable Byzantine Fault-Tolerant Consensus Mechanism for UAV Network. Drones 2022, 6, 187. [Google Scholar] [CrossRef]
- Hu, W.; Huo, X.; Zhang, Y. Research on UAV Swarm Technology Based on Practical Byzantine. In Proceedings of the 2022 5th International Conference on Machine Learning and Machine Intelligence, Xiamen, China, 23–25 September 2022; pp. 199–206. [Google Scholar]
- Busoniu, L.; Babuska, R.; De Schutter, B. A Comprehensive Survey of Multiagent Reinforcement Learning. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2008, 38, 156–172. [Google Scholar] [CrossRef]
- Terry, J.; Black, B.; Grammel, N.; Jayakumar, M.; Hari, A.; Sullivan, R.; Santos, L.S.; Dieffendahl, C.; Horsch, C.; Perez-Vicente, R.; et al. Pettingzoo: Gym for multi-agent reinforcement learning. Adv. Neural Inf. Process. Syst. 2021, 34, 15032–15043. [Google Scholar]
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–June 24 2016; pp. 1928–1937. [Google Scholar]
- Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic policy gradient algorithms. In Proceedings of the International Conference on Machine Learning, Beijing, China, 22–24 June 2014; pp. 387–395. [Google Scholar]
- Lowe, R.; Wu, Y.I.; Tamar, A.; Harb, J.; Pieter Abbeel, O.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Park, S.; Martins, N.C. Necessary and sufficient conditions for the stabilizability of a class of LTI distributed observers. In Proceedings of the 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), Maui, HI, USA, 10–13 December 2012; pp. 7431–7436. [Google Scholar]
- Yang, Z.; Gang, A.; Bajwa, W.U. Adversary-resilient distributed and decentralized statistical inference and machine learning: An overview of recent advances under the Byzantine threat model. IEEE Signal Process. Mag. 2020, 37, 146–159. [Google Scholar] [CrossRef]
- Mitra, A.; Sundaram, S. Byzantine-resilient distributed observers for LTI systems. Automatica 2019, 108, 108487. [Google Scholar] [CrossRef]
- Minsker, S. Geometric median and robust estimation in Banach spaces. Bernoulli 2015, 21, 2308–2335. [Google Scholar] [CrossRef]
- Chen, Y.; Su, L.; Xu, J. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proc. ACM Meas. Anal. Comput. Syst. 2017, 1, 1–25. [Google Scholar] [CrossRef]
- Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016, arXiv:1606.01540. [Google Scholar]
Related Works | Application | Computational Cost | Byzantine Attack Detection | Byzantine Attack Mitigation |
---|---|---|---|---|
[19,21] | Surveillance operations | High | No | No |
[23] | Cloud computing | Medium | No | No |
[20,24] | FANETs | High | No | No |
[28] | Disaster response | High | Yes | Yes |
[29] | Spectrum sensing | Low | Yes | Yes |
[30] | Localization of swarm UAVs | High | Yes | No |
[31,32] | Mission-oriented UAVs | High | Yes | Yes |
[33] | Swarm combat | High | No | Yes |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Medhi, J.K.; Liu, R.; Wang, Q.; Chen, X. Robust Multiagent Reinforcement Learning for UAV Systems: Countering Byzantine Attacks. Information 2023, 14, 623. https://doi.org/10.3390/info14110623
Medhi JK, Liu R, Wang Q, Chen X. Robust Multiagent Reinforcement Learning for UAV Systems: Countering Byzantine Attacks. Information. 2023; 14(11):623. https://doi.org/10.3390/info14110623
Chicago/Turabian StyleMedhi, Jishu K., Rui Liu, Qianlong Wang, and Xuhui Chen. 2023. "Robust Multiagent Reinforcement Learning for UAV Systems: Countering Byzantine Attacks" Information 14, no. 11: 623. https://doi.org/10.3390/info14110623
APA StyleMedhi, J. K., Liu, R., Wang, Q., & Chen, X. (2023). Robust Multiagent Reinforcement Learning for UAV Systems: Countering Byzantine Attacks. Information, 14(11), 623. https://doi.org/10.3390/info14110623