
Mean policy-based proximal policy optimization for maneuvering decision in multi-UAV air combat

Published: 07 August 2024

Abstract

Autonomous maneuvering decision-making is a crucial technology for unmanned aerial vehicles (UAVs) to achieve air dominance in modern unmanned warfare. By balancing exploration and exploitation, and by producing end-to-end outputs directly when combined with deep neural networks, multi-agent reinforcement learning (MARL) has achieved remarkable results in multi-UAV autonomous air combat maneuvering decision-making (MUAAMD). However, learning effective cooperative policies remains challenging for MARL methods under the centralized training with decentralized execution (CTDE) paradigm. This paper proposes a MARL-based method to improve cooperation in MUAAMD. First, considering the dynamic environment and limited perception of UAVs in realistic air combat scenarios, the MUAAMD problem is formulated as a partially observable Markov game (POMG). Second, a novel, efficient MARL algorithm named mean policy-based proximal policy optimization (MP3O) is introduced. Specifically, a joint policy optimization mechanism is constructed by estimating the policies of neighboring agents in a group with a mean-field approximation during training, which enables both centralized evaluation and centralized improvement of cooperative policies under the CTDE paradigm. Third, by incorporating three improvement techniques, a cooperative decision-making framework for MUAAMD based on MP3O is proposed. Results of simulations and comparative experiments validate the effectiveness of the proposed method in promoting cooperative policy learning for the MUAAMD problem.
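The core mechanism the abstract describes, each agent optimizing a PPO-style clipped objective while summarizing its neighbors' policies with a mean-field approximation, can be sketched as follows. This is an illustrative sketch only: the function names, and the simplification of the mean-field term to an elementwise average of neighbor actions, are assumptions, not the paper's actual MP3O implementation.

```python
def mean_field_action(actions, neighbors):
    """Mean-field approximation (illustrative): summarize a group of
    neighboring agents' behavior by the elementwise average of their
    actions, so the critic's joint input stays fixed-size regardless
    of how many neighbors an agent has."""
    n = len(neighbors)
    dim = len(actions[neighbors[0]])
    return [sum(actions[j][d] for j in neighbors) / n for d in range(dim)]

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate: limit the size of the policy
    update by clipping the probability ratio to [1 - eps, 1 + eps]."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# In a CTDE training loop (sketch): a centralized critic would score each
# agent's action together with the mean neighbor action, and the resulting
# advantage would feed the clipped objective of that agent's actor.
actions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # per-agent action vectors
mean_nbr = mean_field_action(actions, neighbors=[1, 2])  # agent 0's neighbors
```

Under this simplification, each agent's policy update depends on the joint behavior of its neighborhood only through a single averaged quantity, which is what makes the centralized evaluation tractable as the number of UAVs grows.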


Published In

Neural Computing and Applications  Volume 36, Issue 31
Nov 2024
625 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 07 August 2024
Accepted: 23 July 2024
Received: 29 September 2023

Author Tags

  1. Autonomous air combat
  2. Multi-agent system
  3. Mean-field reinforcement learning
  4. Proximal policy optimization

Qualifiers

  • Research-article

Funding Sources

  • National Outstanding Youth Talents Support Program
  • Basic Science Center Programs of NSFC
  • Shanghai Municipal Science and Technology Major Project
  • Shanghai Municipal of Science and Technology Project
