
Discovering Expert-Level Air Combat Knowledge via Deep Excitatory-Inhibitory Factorized Reinforcement Learning

Published: 18 June 2024

Abstract

Artificial Intelligence (AI) has recently achieved a wide range of successes in autonomous air combat decision-making, and previous research has demonstrated that AI-enabled air combat approaches can even acquire beyond-human-level capabilities. However, two major difficulties remain unaddressed. First, existing methods with fixed decision intervals focus on what action to take but pay little attention to when to take it, and consequently can miss optimal decision opportunities. Second, reliance on an expert-crafted finite maneuver library limits tactical diversity, leaving an agent vulnerable to opponents equipped with novel tactics. In view of this, we propose a novel hybrid autonomous air combat tactics discovery algorithm that combines Deep Reinforcement Learning (DRL) with prior knowledge, namely deep Excitatory-iNhibitory fACTorIzed maneuVEr (ENACTIVE) learning. The algorithm consists of two key modules, ENHANCE and FACTIVE. ENHANCE learns to adjust air combat decision-making intervals so as to seize key opportunities. FACTIVE factorizes maneuvers and jointly optimizes the factors, yielding a significant increase in tactical diversity. Extensive experimental results reveal that the proposed method outperforms state-of-the-art algorithms with a 62% winning rate and achieves a 2.85-fold increase in global tactic-space coverage. A variety of the discovered air combat tactics prove comparable to human expert knowledge.
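To make the two ideas concrete, the sketch below illustrates them in PyTorch: an actor network that factorizes a maneuver into independent per-axis categorical heads (the FACTIVE idea) and adds a separate head that chooses how many simulation steps to hold the chosen maneuver, i.e., a learned decision interval rather than a fixed one (the ENHANCE idea). This is a minimal illustration under our own assumptions, not the authors' implementation; the class name `FactorizedTimedActor`, the discretized control axes, and the interval range are all hypothetical.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class FactorizedTimedActor(nn.Module):
    """Toy actor that (1) factorizes a maneuver into independent control
    heads (e.g., roll / pitch / throttle bins) and (2) emits a decision
    interval: how many simulation steps to hold the sampled maneuver.
    Hypothetical structure, not the paper's network."""

    def __init__(self, obs_dim, bins_per_axis=(5, 5, 3), max_interval=8, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # One categorical head per factorized control axis (FACTIVE-style).
        self.axis_heads = nn.ModuleList(nn.Linear(hidden, b) for b in bins_per_axis)
        # Extra head picks the decision interval in {1, ..., max_interval}
        # (ENHANCE-style "when to act").
        self.interval_head = nn.Linear(hidden, max_interval)

    def forward(self, obs):
        h = self.trunk(obs)
        axis_dists = [Categorical(logits=head(h)) for head in self.axis_heads]
        interval_dist = Categorical(logits=self.interval_head(h))
        return axis_dists, interval_dist

# Usage: sample a factorized maneuver and how long to hold it.
actor = FactorizedTimedActor(obs_dim=12)
obs = torch.randn(1, 12)
axis_dists, interval_dist = actor(obs)
maneuver = [d.sample() for d in axis_dists]   # one bin index per control axis
hold_steps = interval_dist.sample() + 1       # decision interval >= 1 step
# Joint log-probability factorizes across the independent heads.
log_prob = sum(d.log_prob(a) for d, a in zip(axis_dists, maneuver)) \
           + interval_dist.log_prob(hold_steps - 1)
```

The factorized heads keep the joint action space tractable (a sum of small categoricals instead of one combinatorially large one), while the interval head lets the policy commit to a maneuver across several steps instead of re-deciding at a fixed rate.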




    Published In

    ACM Transactions on Intelligent Systems and Technology  Volume 15, Issue 4
    August 2024
    396 pages
    ISSN:2157-6904
    EISSN:2157-6912
    DOI:10.1145/3613644

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 June 2024
    Online AM: 27 March 2024
    Accepted: 06 March 2024
    Revised: 07 October 2023
    Received: 11 June 2022
    Published in TIST Volume 15, Issue 4


    Author Tags

    1. Air combat
    2. Artificial Intelligence (AI)
    3. Deep Reinforcement Learning (DRL)
    4. Excitatory-Inhibitory (E/I) balance

    Qualifiers

    • Research-article


