Abstract
In air combat maneuvering decision-making, sparse rewards limit the exploration efficiency of deep reinforcement learning agents. To address this challenge, we propose an auxiliary reward function that accounts for angle, range, and altitude. Furthermore, we investigate how the number of network nodes, the number of layers, and the learning rate influence the decision system, and we provide reasonable parameter ranges that can serve as a guideline. Finally, four typical air combat scenarios demonstrate the adaptability and effectiveness of the proposed scheme: the auxiliary reward significantly improves the learning ability of the deep Q network (DQN) by leading the agents to explore more purposefully. Compared with the original deep deterministic policy gradient and soft actor critic algorithms, the proposed method exhibits superior exploration capability and achieves higher reward, indicating that the trained agent can adapt to different air combat scenarios with good performance.
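To make the idea of a dense auxiliary reward concrete, the following is a minimal sketch of how angle, range, and altitude terms might be combined into a single shaping signal. It is not the reward function of the paper: the functional forms, the weights, the function name `auxiliary_reward`, and the defaults `r_w`, `dh_opt`, and `dh_scale` are all assumptions chosen for illustration.

```python
import math


def auxiliary_reward(ata, aa, r, dh,
                     r_w=1000.0, dh_opt=0.0, dh_scale=1000.0,
                     weights=(0.4, 0.4, 0.2)):
    """Hypothetical dense shaping reward in [0, 1].

    ata, aa : antenna train angle and aspect angle (rad), each in [0, pi]
    r       : distance between UCAV and target (m)
    dh      : altitude difference (m)
    All shapes and constants below are illustrative assumptions.
    """
    # Angle term: 1 when both angles are zero (nose on target, behind it),
    # 0 when both angles equal pi.
    angle_term = 1.0 - (abs(ata) + abs(aa)) / (2.0 * math.pi)

    # Range term: peaks at 1 when the distance equals the assumed
    # weapon range r_w and decays smoothly on either side.
    range_term = math.exp(-((r - r_w) / r_w) ** 2)

    # Altitude term: peaks at 1 at the assumed preferred altitude difference.
    alt_term = math.exp(-((dh - dh_opt) / dh_scale) ** 2)

    w_angle, w_range, w_alt = weights
    return w_angle * angle_term + w_range * range_term + w_alt * alt_term


# A favorable geometry should score higher than an unfavorable one.
good = auxiliary_reward(ata=0.0, aa=0.0, r=1000.0, dh=0.0)
bad = auxiliary_reward(ata=math.pi, aa=math.pi, r=5000.0, dh=-2000.0)
```

Because every step returns a graded value rather than only a terminal win or loss signal, an agent trained on such a term receives feedback even when no engagement outcome has occurred, which is the general motivation for auxiliary rewards under sparse-reward conditions.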
Data availability
No datasets were analyzed during the current study. The data generated during the current study are available from the corresponding author on reasonable request.
Abbreviations
- UCAV: Unmanned combat aerial vehicle
- DQN: Deep Q network
- DDPG: Deep deterministic policy gradient
- SAC: Soft actor critic
- LOS: Line of sight
- AA: Aspect angle (rad)
- ATA: Antenna train angle (rad)
- HTC: Heading crossing angle (rad)
- R: Distance between UCAV and target (m)
- \(\dot{R}\): Change rate of R (m/s)
- \({z_\text{U}}\), \({z_\text{T}}\): Flight altitude of UCAV and target (m)
- \({v_\text{U}}\), \({v_\text{T}}\): Flight speed of UCAV and target (m/s)
- \(\Delta h\): Flight altitude difference (m)
- \(\Delta v\): Flight speed difference (m/s)
- \({R_\text{w}}\): Airborne weapon attack range (m)
- \({\varphi _\text{w}}\): Maximum weapon attack angle (deg)
- \({q_\text{w}}\): Maximum aspect angle
- \({P_\text{U}}\), \({P_\text{T}}\): Position vector of UCAV and target
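As an illustration of how the quantities listed above relate to one another, the sketch below computes R, \(\dot{R}\), ATA, AA, \(\Delta h\), and \(\Delta v\) from position and velocity vectors. The angle conventions are assumptions, since this page does not define them: ATA is taken as the angle between the UCAV velocity and the line of sight, AA as the angle between the target velocity and the line of sight, and the differences are taken as UCAV minus target. The function name `relative_geometry` and the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def _angle(a, b):
    # Angle between two vectors in [0, pi]; the cosine is clamped
    # to guard against rounding slightly outside [-1, 1].
    c = _dot(a, b) / (_norm(a) * _norm(b))
    return math.acos(max(-1.0, min(1.0, c)))


def relative_geometry(p_u, v_u, p_t, v_t):
    """Relative situation from positions (m) and velocities (m/s).

    Assumed conventions (not taken from the source):
    LOS points from UCAV to target; differences are UCAV minus target.
    """
    los = _sub(p_t, p_u)
    r = _norm(los)
    return {
        "R": r,                                   # distance (m)
        "R_dot": _dot(los, _sub(v_t, v_u)) / r,   # time derivative of R (m/s)
        "ATA": _angle(v_u, los),                  # rad
        "AA": _angle(v_t, los),                   # rad
        "delta_h": p_u[2] - p_t[2],               # m
        "delta_v": _norm(v_u) - _norm(v_t),       # m/s
    }


# Tail chase: UCAV 3 km directly behind a slower target at equal altitude.
geo = relative_geometry(
    p_u=(0.0, 0.0, 5000.0), v_u=(200.0, 0.0, 0.0),
    p_t=(3000.0, 0.0, 5000.0), v_t=(150.0, 0.0, 0.0),
)
```

In this tail-chase example the range is closing, so `R_dot` is negative, and both angles are zero under the assumed conventions.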
Acknowledgements
This work was funded by the National Natural Science Foundation of China under Grant Nos. 62073177, 61973175, and 62003351.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This study concerns the improvement and application of reinforcement learning algorithms and therefore does not involve ethical issues.
Informed consent
All authors are aware of this paper and agree to its submission.
About this article
Cite this article
Zhang, T., Wang, Y., Sun, M. et al. Air combat maneuver decision based on deep reinforcement learning with auxiliary reward. Neural Comput & Applic 36, 13341–13356 (2024). https://doi.org/10.1007/s00521-024-09720-z
DOI: https://doi.org/10.1007/s00521-024-09720-z