A Framework of Recommendation System for Unmanned Aerial Vehicle Autonomous Maneuver Decision
Abstract
1. Introduction
- (1) For the first time, a maneuver decision-making algorithm based on a recommendation system is proposed; this integration provides new ideas for applying recommendation systems to UAV cooperation and confrontation.
- (2) A simulation environment for UAV cooperation and confrontation is constructed. Its key feature is an integrated framework that brings together several strong maneuver decision-making algorithms and supports both continuous and discrete actions. On this basis, two baseline expert algorithms are designed: one using discrete actions with a combined offensive and defensive strategy, and one using an offensive greedy strategy.
- (3) To verify the feasibility and effectiveness of applying recommendation systems to maneuver decision-making, an offline integrated KNN-UserCF recommendation algorithm is implemented, starting from collaborative filtering. It introduces KNN into UserCF to sidestep the costly computation of the full user similarity matrix, and applies the Bagging ensemble method over learners with different numbers of nearest neighbors to improve recommendation robustness (a minimal sketch follows this list).
- (4) To overcome the data dependency of traditional offline recommendation algorithms, online maneuver decision recommendation algorithms are designed with deep reinforcement learning: DDQN for discrete actions and DDPG for continuous actions. Both introduce prioritized experience replay into the standard algorithm to improve sampling efficiency, and both use a dense reward guided by situation assessment that leads the UAV to an advantageous spatial position and completion of its task, accelerating convergence (the second sketch after this list illustrates these two mechanisms).
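To make contribution (3) concrete, here is a minimal Python sketch of the offline recommender: a user-based CF scorer that consults only the K nearest users, wrapped in a Bagging ensemble over bootstrap samples and different K values. The function names, rating-matrix layout, and similarity choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def knn_usercf_scores(ratings, query, k):
    """Score items for a query user via K-nearest-neighbour user-based CF.

    ratings : (n_users, n_items) matrix of observed feedback scores
    query   : (n_items,) feedback vector of the target user (situation)
    k       : number of nearest neighbours to keep

    Restricting similarity to the K nearest users avoids building the
    full user-user similarity matrix, which is what makes plain UserCF
    expensive at scale.
    """
    # Cosine similarity between the query and every stored user.
    norms = np.linalg.norm(ratings, axis=1) * (np.linalg.norm(query) + 1e-12)
    sims = ratings @ query / (norms + 1e-12)
    top = np.argsort(sims)[-k:]                  # indices of the K nearest users
    weights = sims[top]
    # Weighted average of neighbour feedback -> predicted score per item.
    return weights @ ratings[top] / (np.abs(weights).sum() + 1e-12)

def bagged_recommendation(ratings, query, ks=(5, 10, 20), n_bags=10, seed=0):
    """Bagging ensemble: each learner sees a bootstrap sample of users and
    its own neighbourhood size K; predictions are averaged for robustness."""
    rng = np.random.default_rng(seed)
    n_users = ratings.shape[0]
    preds = []
    for b in range(n_bags):
        idx = rng.integers(0, n_users, size=n_users)  # bootstrap resample
        k = ks[b % len(ks)]                           # cycle through K values
        preds.append(knn_usercf_scores(ratings[idx], query, k))
    return int(np.argmax(np.mean(preds, axis=0)))     # recommended maneuver id
```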
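And for contribution (4), a compact sketch of the two mechanisms named there: a proportional prioritized replay buffer and a situation-assessment-style dense reward. The buffer omits the usual sum-tree and importance-sampling weights for brevity, and the reward's terms (`rel_angle`, `rel_dist`, `d_opt`) are assumed stand-ins for the paper's situation-assessment function, not its exact form.

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized experience replay (simplified: list-based,
    no sum-tree, no importance-sampling correction). Transitions with large
    TD error are sampled more often, which is the mechanism used to improve
    sample efficiency in the DDQN/DDPG recommenders."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error=1.0):
        if len(self.buffer) >= self.capacity:   # overwrite the oldest entry
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size, rng):
        p = np.asarray(self.priorities)
        p = p / p.sum()                          # sampling probability ~ priority
        idx = rng.choice(len(self.buffer), size=batch_size, p=p)
        return idx, [self.buffer[i] for i in idx]

    def update(self, idx, td_errors):
        for i, e in zip(idx, td_errors):         # refresh priorities after learning
            self.priorities[i] = (abs(e) + self.eps) ** self.alpha

def dense_reward(rel_angle, rel_dist, d_opt=1000.0):
    """Illustrative situation-assessment-shaped reward: favours pointing at
    the target and closing to a favourable range, giving the UAV a learning
    signal at every step instead of only at episode end."""
    angle_term = np.cos(rel_angle)               # 1 when nose-on to the target
    dist_term = np.exp(-abs(rel_dist - d_opt) / d_opt)
    return 0.5 * angle_term + 0.5 * dist_term
```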
2. Construction of the UAV Cooperation and Confrontation Simulation Environment for the Recommendation System
2.1. Recommendation System Based on Reinforcement Learning
2.2. Flight Dynamics and Control Model Construction
2.3. Mission Victory and Defeat Judgment Model
2.4. Maneuver Action Library
2.5. Integrated Environment Construction
2.6. Baseline Algorithm Design
- (1) Offensive Greedy Strategy Expert Algorithm
- (2) Offensive and Defensive Greedy Strategy Expert Algorithm
3. KNN-UserCF Integrated Recommendation Algorithm
3.1. Framework of KNN-UserCF Integrated Recommendation Algorithm
3.2. KNN-UserCF Module
3.3. Ensemble Learning Module
4. Deep Reinforcement Learning Recommendation Algorithm
4.1. Deep Reinforcement Learning Framework for Maneuver Decision Recommendation
4.2. Prioritized Experience Replay
4.3. Dense Reward
5. Simulation Results and Analysis
5.1. Experimental Setup
- (1) Evaluation Metrics
- (2) Initial State Settings
5.2. Simulation Results Analysis of Integrated KNN-UserCF Maneuver Decision Recommendation
- (1) Determination of K Nearest Neighbors and the Number of Learners
- (2) Results Analysis
5.3. Simulation Results Analysis of Deep Reinforcement Learning Maneuver Decision Recommendation
- (1) Deep Reinforcement Learning Recommendation System with Sparse Rewards
- (2) Dense Reward Maneuver Decision Recommendation System
- (3) Deep Reinforcement Learning and Traditional Recommendation Algorithm Performance Comparison
6. Conclusions
Author Contributions
Funding
Data Availability Statement
DURC Statement
Conflicts of Interest
References
- Huang, Y.; Feng, B.; Tian, A.; Dong, P.; Yu, S.; Zhang, H. An Efficient Differentiated Routing Scheme for MEO/LEO-Based Multi-Layer Satellite Networks. IEEE Trans. Netw. Sci. Eng. 2024, 11, 1041–2024.
- Wang, Y.; Ren, T.; Fan, Z. Unmanned aerial vehicle air combat maneuver decision-making based on guided Minimax-DDQN. Comput. Appl. 2023, 43, 2636–2643.
- Wei, Y.J.; Zhang, H.P.; Huang, C.Q. Maneuver Decision-Making for Autonomous Air Combat through Curriculum Learning and Reinforcement Learning with Sparse Rewards. arXiv 2023, arXiv:2302.05838.
- Yin, S.; Kang, Y.; Zhao, Y.; Xue, J. Air Combat Maneuver Decision Based on Deep Reinforcement Learning and Game Theory. In Proceedings of the 2022 41st Chinese Control Conference (CCC), Hefei, China, 25–27 July 2022; pp. 6939–6943.
- Hu, J.; Wang, L.; Hu, T.; Guo, C.; Wang, Y. Autonomous maneuver decision making of dual-UAV cooperative air combat based on deep reinforcement learning. Electronics 2022, 11, 467.
- Dong, Y.; Ai, J. Decision Making in Autonomous Air Combat: Review and Prospects. Acta Aeronaut. Astronaut. Sin. 2020, 41, 724264.
- Fu, W.; Wang, H.; Gao, S. UAV Decision-Making Expert System in Dynamic Environment Based on Heuristic Algorithm. J. Beijing Univ. Aeronaut. Astronaut. 2015, 41, 1994–1999.
- Fu, L.; Wang, X. Research on differential game modeling for close-range air combat of unmanned combat aircraft. Ordnance J. 2012, 33, 1210–1216.
- Li, K.; Zhang, K.; Zhang, Z.; Liu, Z.; Hua, S.; He, J. A UAV maneuver decision-making algorithm for autonomous airdrop based on deep reinforcement learning. Sensors 2021, 21, 2233.
- Huang, C.; Xie, Q. Cooperative Multi-Objective Attack Decision-Making Method Based on Genetic Algorithm. Fire Control Command Control 2004, 29, 4–8.
- Xie, J.; Yang, Q.; Dai, S.; Wang, W.; Zhang, J. UAV Maneuvering Decision Research Based on Enhanced Genetic Algorithms. J. Northwestern Polytech. Univ. 2020, 6, 38.
- Vicsek, T. Universal Patterns of Collective Motion from Minimal Models of Flocking. In Proceedings of the 2008 Second IEEE International Conference on Self-Adaptive and Self-Organizing Systems, Venice, Italy, 26–29 October 2008; pp. 3–11.
- Guo, H.; Xu, H.-J.; Gu, X.-D.; Liu, D.-Y. Air Combat Decision-Making for Cooperative Multiple Target Attack Based on Improved Particle Swarm Algorithm. Fire Control Command Control 2011, 36, 49–51+55.
- Li, S.Y.; Chen, M.; Wang, Y.H.; Wu, Q.X. Air combat decision-making of multiple UCAVs based on constraint strategy games. Def. Technol. 2022, 18, 368–383.
- Li, S.; Chen, M.; Wang, Y.; Wu, Q. A fast algorithm to solve large-scale matrix games based on dimensionality reduction and its application in multiple unmanned combat air vehicles attack-defense decision-making. Inf. Sci. 2022, 594, 305–321.
- Geng, W.X.; Kong, F.E.; Ma, D.Q. Study on tactical decision of UAV medium range air combat. In Proceedings of the 26th Chinese Control and Decision Conference, Changsha, China, 31 May–2 June 2014; pp. 135–139.
- Park, H.; Lee, B.Y.; Tahk, M.J.; Yoo, D.W. Differential game based air combat maneuver generation using scoring function matrix. Int. J. Aeronaut. Space Sci. 2016, 17, 204–213.
- Zhao, Z.; Wan, Y.; Chen, Y. Deep Reinforcement Learning-Driven Collaborative Rounding-Up for Multiple Unmanned Aerial Vehicles in Obstacle Environments. Drones 2024, 8, 464.
- Zhang, Q.; Yang, R.; Yu, L.; Zhang, T.; Zuo, J. BVR air combat maneuvering decision by using Q-network reinforcement learning. J. Air Force Eng. Univ. (Nat. Sci. Ed.) 2018, 19, 8–14.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Li, F.; Li, Q.; Wang, X. Thinking on the complexity of air combat under the development trend of equipment technology. Military Digest 2024, 1, 31–33.
- Nguyen, L.V. OurSCARA: Awareness-Based Recommendation Services for Sustainable Tourism. World 2024, 5, 471–482.
- Pradeep, N.; Mangalore, K.K.R.; Rajpal, B.; Prasad, N.; Shastri, R. Content Based Movie Recommendation System. Int. J. Res. Ind. Eng. 2020, 9, 337–348.
- Shan, J. Research on Content-Based Personalized Recommendation System. Ph.D. Thesis, Northeast Normal University, Changchun, China, 2015; pp. 3–5.
- Bachiri, K.; Yahyaouy, A.; Gualous, H.; Malek, M.; Bennani, Y.; Makany, P.; Rogovschi, N. Multi-Agent DDPG Based Electric Vehicles Charging Station Recommendation. Energies 2023, 16, 6067.
- Zhou, Q. A novel movies recommendation algorithm based on reinforcement learning with DDPG policy. Int. J. Intell. Comput. Cybern. 2020, 13, 67–79.
- Wang, Y.; Zheng, Y.; Xu, J. Personalized recommendation system for library based on hybrid algorithm. Comput. Inf. Technol. 2023, 31, 39–42+50.
- Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42, 30–37.
- Rendle, S.; Schmidt-Thieme, L. Factorization Models for Collaborative Filtering. ACM Comput. Surv. 2010, 42, 1–23.
- Zhang, S.; Yao, L.; Sun, A.; Tay, Y. Deep Learning Based Recommender System: A Survey and New Perspectives. ACM Comput. Surv. 2019, 52, 1–38.
- Feng, B.; Tian, A.; Yu, S.; Li, J.; Zhou, H.; Zhang, H. Efficient Cache Consistency Management for Transient IoT Data in Content-Centric Networking. IEEE Internet Things J. 2022, 9, 12931–12944.
- Zhao, K.; Liu, S.; Cai, Q.; Zhao, X.; Liu, Z.; Zheng, D.; Jiang, P.; Gai, K. KuaiSim: A comprehensive simulator for recommender systems. Adv. Neural Inf. Process. Syst. 2023, 36, 44880–44897.
- Zhao, X.; Xia, L.; Zou, L.; Liu, H.; Yin, D.; Tang, J. Whole-chain recommendations. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Galway, Ireland, 19–23 October 2020; pp. 1883–1891.
- Zhao, X.; Gu, C.; Zhang, H.; Yang, X.; Liu, X.; Tang, J.; Liu, H. DEAR: Deep reinforcement learning for online advertising impression in recommender systems. Proc. AAAI Conf. Artif. Intell. 2021, 35, 750–758.
- Afsar, M.M.; Crump, T.; Far, B. Reinforcement Learning Based Recommender Systems: A Survey. ACM Comput. Surv. 2022, 55, 1–38.
- Chu, W.T.; Tsai, Y.L. A hybrid recommender system considering visual information for predicting favorite restaurants. World Wide Web 2017, 20, 1313–1331.
- Zhou, S.; Shi, Y.; Yang, W.; Wang, J.; Gao, L.; Gao, Y. Multi-aircraft cooperative air combat maneuver decision-making based on Cook-Seiford group decision-making algorithm. Command Control Simul. 2023, 45, 44–51.
- Feng, B.; Huang, Y.; Tian, A.; Wang, H.; Zhou, H.; Yu, S.; Zhang, H. DR-SDSN: An Elastic Differentiated Routing Framework for Software-Defined Satellite Networks. IEEE Wirel. Commun. 2022, 29, 86–2022.
- Zhao, J.; Gan, Z.; Liang, J.; Wang, C.; Yue, K.; Li, W.; Li, Y.; Li, R. Path Planning Research of a UAV Base Station Searching for Disaster Victims’ Location Information Based on Deep Reinforcement Learning. Entropy 2022, 24, 1767.
- Bao, T.; Syed, A.; Kennedy, W.S.; Kantarcı, M.E. Sustainable Task Offloading in Secure UAV-Assisted Smart Farm Networks: A Multi-Agent DRL with Action Mask Approach. arXiv 2024.
- Yan, C.; Xiang, X.; Wang, C. Towards Real-Time Path Planning through Deep Reinforcement Learning for a UAV in Dynamic Environments. J. Intell. Robot. Syst. 2020, 98, 297–309.
- Yijing, Z.; Zheng, Z.; Xiaoyi, Z.; Yang, L. Q learning algorithm based UAV path learning and obstacle avoidence approach. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; pp. 3397–3402.
- van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI’16), Phoenix, AZ, USA, 12–17 February 2016; pp. 1–7.
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous Methods for Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 1928–1937.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
| Recommendation System Module | Reinforcement Learning Module |
| --- | --- |
| user state | state |
| recommended item | action |
| user feedback | reward |
| recommendation algorithm | policy |
| recommendation context | environment |
| cold start vs. long-term optimization | exploration vs. exploitation |
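Read as code, this mapping amounts to exposing the recommendation loop through a standard reinforcement learning environment interface. The stub below is a hypothetical illustration of that correspondence only; its dynamics, reward, and termination are placeholders, not the paper's simulation environment.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ManeuverRecEnv:
    """Hypothetical stub mapping the table's columns onto an RL interface:
    user state -> state, recommended item -> action,
    user feedback -> reward, recommendation algorithm -> policy."""
    n_maneuvers: int = 7  # size of the maneuver (item) library; assumed value
    rng: np.random.Generator = field(default_factory=np.random.default_rng)

    def reset(self) -> np.ndarray:
        # "User state": the UAV situation vector (placeholder features).
        self.state = self.rng.normal(size=6)
        return self.state

    def step(self, action: int):
        # "Recommended item": an index into the maneuver library.
        assert 0 <= action < self.n_maneuvers
        self.state = self.rng.normal(size=6)  # placeholder dynamics
        reward = float(self.rng.random())     # "user feedback": situation score
        done = reward > 0.95                  # placeholder termination
        return self.state, reward, done
```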
| Algorithm | Advantages | Disadvantages | Applicable Scenarios | Relevance to This Task |
| --- | --- | --- | --- | --- |
| DDQN [39,40] | addresses Q-value overestimation | requires discretization of continuous spaces | discrete decision-making tasks like path selection | ideal for UAV tasks requiring discrete decisions (e.g., waypoint selection or mission mode changes) |
| DDPG [41] | handles continuous action spaces directly | sensitive to hyperparameters; training can be unstable | continuous control tasks like trajectory adjustment | essential for UAV tasks requiring smooth, continuous control (e.g., angle adjustment, speed control) |
| Q-Learning [42] | simple and easy to implement | tabular form scales poorly to large state spaces | small, discrete state-action spaces | for tasks where the UAV’s control actions can be discretized |
| DQN [43] | simple and stable for static or simple environments | overestimates Q-values; limited to discrete actions | simple discrete tasks in static environments | insufficient for dynamic UAV tasks due to lack of adaptability and overestimation issues |
| A3C (asynchronous advantage actor-critic) [44] | asynchronous parallel training accelerates learning | high computational resource demands | multi-task or dynamic environments | resource-intensive and overly complex for real-time UAV tasks focused on efficiency |
| PPO (proximal policy optimization) [45] | stable policy updates | computationally heavy for onboard, real-time use | complex dynamic environments | unsuitable for resource-constrained UAV tasks requiring fast, real-time decisions |
| Name | x | y | z | Speed | Pitch | Heading |
| --- | --- | --- | --- | --- | --- | --- |
| red side | random | random | 5000–10,000 m | 300 m/s | [0, 2π] rad | [0, 2π] rad |
| blue side | random | random | 5000–10,000 m | 300 m/s | [0, 2π] rad | [0, 2π] rad |
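For concreteness, one side's initial state could be sampled from this table as follows; the horizontal x/y bounds are an assumed placeholder, since the table only specifies "random".

```python
import numpy as np

def sample_initial_state(rng: np.random.Generator,
                         xy_range=(-10_000.0, 10_000.0)):
    """Sample one UAV's initial state per the table above.
    The x/y bounds are an assumption; the table only says 'random'."""
    return {
        "x": rng.uniform(*xy_range),             # m (bounds assumed)
        "y": rng.uniform(*xy_range),             # m (bounds assumed)
        "z": rng.uniform(5_000.0, 10_000.0),     # m, altitude band from table
        "speed": 300.0,                          # m/s, fixed in the table
        "pitch": rng.uniform(0.0, 2 * np.pi),    # rad
        "heading": rng.uniform(0.0, 2 * np.pi),  # rad
    }

rng = np.random.default_rng(42)
red_state, blue_state = sample_initial_state(rng), sample_initial_state(rng)
```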
| Algorithm | PER-DDPG | PER-DDQN | DQN | Q-Learning |
| --- | --- | --- | --- | --- |
| success rate | 69% | 63% | 45% | 30% |
| convergence time (iterations) | 1000 | 1000 | 3000 | 5000 |