
Discovering Expert-Level Air Combat Knowledge via Deep Excitatory-Inhibitory Factorized Reinforcement Learning

Published: 18 June 2024

Abstract

Artificial Intelligence (AI) has recently achieved a wide range of successes in autonomous air combat decision-making, and previous research has demonstrated that AI-enabled air combat approaches can even acquire beyond-human-level capabilities. However, two major difficulties remain unaddressed. First, existing methods with fixed decision intervals focus on what action to take but pay little attention to when to take it, and consequently can miss optimal decision opportunities. Second, reliance on an expert-crafted finite maneuver library limits tactical diversity, leaving an agent vulnerable to opponents equipped with novel tactics. In view of this, we propose a novel hybrid autonomous air combat tactics discovery algorithm that combines Deep Reinforcement Learning (DRL) with prior knowledge, namely deep Excitatory-iNhibitory fACTorIzed maneuVEr (ENACTIVE) learning. The algorithm consists of two key modules, ENHANCE and FACTIVE. ENHANCE learns to adjust air combat decision-making intervals so as to seize key opportunities. FACTIVE factorizes maneuvers and jointly optimizes the factors, yielding a significant increase in tactical diversity. Extensive experimental results reveal that the proposed method outperforms state-of-the-art algorithms with a 62% winning rate and achieves a 2.85-fold increase in global tactic-space coverage. A variety of the discovered air combat tactics prove comparable to human expert knowledge.
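To make the two ideas concrete, the sketch below illustrates them in PyTorch: an actor network that factorizes a maneuver into independent per-axis categorical heads (the FACTIVE idea) and adds a separate head that chooses how many simulation steps to hold the chosen maneuver, i.e., a learned decision interval rather than a fixed one (the ENHANCE idea). This is a minimal illustration under our own assumptions, not the authors' implementation; the class name `FactorizedTimedActor`, the discretized control axes, and the interval range are all hypothetical.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class FactorizedTimedActor(nn.Module):
    """Toy actor that (1) factorizes a maneuver into independent control
    heads (e.g., roll / pitch / throttle bins) and (2) emits a decision
    interval: how many simulation steps to hold the sampled maneuver.
    Hypothetical structure, not the paper's network."""

    def __init__(self, obs_dim, bins_per_axis=(5, 5, 3), max_interval=8, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # One categorical head per factorized control axis (FACTIVE-style).
        self.axis_heads = nn.ModuleList(nn.Linear(hidden, b) for b in bins_per_axis)
        # Extra head picks the decision interval in {1, ..., max_interval}
        # (ENHANCE-style "when to act").
        self.interval_head = nn.Linear(hidden, max_interval)

    def forward(self, obs):
        h = self.trunk(obs)
        axis_dists = [Categorical(logits=head(h)) for head in self.axis_heads]
        interval_dist = Categorical(logits=self.interval_head(h))
        return axis_dists, interval_dist

# Usage: sample a factorized maneuver and how long to hold it.
actor = FactorizedTimedActor(obs_dim=12)
obs = torch.randn(1, 12)
axis_dists, interval_dist = actor(obs)
maneuver = [d.sample() for d in axis_dists]   # one bin index per control axis
hold_steps = interval_dist.sample() + 1       # decision interval >= 1 step
# Joint log-probability factorizes across the independent heads.
log_prob = sum(d.log_prob(a) for d, a in zip(axis_dists, maneuver)) \
           + interval_dist.log_prob(hold_steps - 1)
```

The factorized heads keep the joint action space tractable (a sum of small categoricals instead of one combinatorially large one), while the interval head lets the policy commit to a maneuver across several steps instead of re-deciding at a fixed rate.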




    Published In

    ACM Transactions on Intelligent Systems and Technology  Volume 15, Issue 4
    August 2024
    396 pages
    ISSN:2157-6904
    EISSN:2157-6912
    DOI:10.1145/3613644

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 June 2024
    Online AM: 27 March 2024
    Accepted: 06 March 2024
    Revised: 07 October 2023
    Received: 11 June 2022
    Published in TIST Volume 15, Issue 4


    Author Tags

    1. Air combat
    2. Artificial Intelligence (AI)
    3. Deep Reinforcement Learning (DRL)
    4. Excitatory-Inhibitory (E/I) balance

    Qualifiers

    • Research-article


