DOI: 10.1145/3581783.3613759
research-article

M2ATS: A Real-world Multimodal Air Traffic Situation Benchmark Dataset and Beyond

Published: 27 October 2023

Abstract

Air Traffic Control (ATC) is a complicated, time-evolving, real-time procedure for directing flight operations in a safe and orderly manner. Although enormous volumes of data have been stored during more than 40 years of air traffic operations, data-driven intelligent applications in aviation are still emerging because the domain is safety-critical. With the prevalence of the Next Generation ATC system, artificial intelligence (AI)-empowered research topics are attracting increasing attention from both industry and academia, and a high-quality dataset naturally becomes the prerequisite for such work. However, almost all ATC-related datasets are unimodal and built for specific tasks, so they fail to comprehensively describe the traffic situation and cannot fully support real-world studies. To address this gap, a multimodal air traffic situation (M2ATS) dataset is constructed to advance AI-related research in the ATC domain; it includes airspace information, flight plans, trajectories, and speech. M2ATS covers ATC situation data for 10362 flights, involving 110000+ utterances (104 hours) with diverse golden text annotations, 16 intents, and 51 slots. Considering real-world ATC requirements, a total of 10 multimedia-related tasks (24 baselines) are designed to validate the proposed dataset, covering automatic speech recognition, natural language processing, and spatial-temporal data processing. In addition to common metrics, new ATC-related metrics corresponding to ATC applications are proposed to evaluate task performance. Extensive experimental results demonstrate that the selected baselines can accomplish the designed tasks on this new dataset, and that further investigation is required to address task and data specificities. It is believed that the proposed dataset is a new practice for advancing AI applications in an industrial scene, which not only promotes ATC-related applications but also provides diverse research topics for the general multimedia community.

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. air traffic situation
    2. flight trajectory prediction
    3. multimodal fusion
    4. spoken instruction understanding

    Qualifiers

    • Research-article

    Funding Sources

    • Sichuan Science and Technology Program, and Fundamental Research Funds for the Central Universities of China
    • National Natural Science Foundation of China
    • Open Fund of Key Laboratory of Flight Techniques and Flight Safety, CAAC, China

    Conference

    MM '23
    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 995 of 4,171 submissions, 24%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 184
    • Downloads (Last 12 months): 184
    • Downloads (Last 6 weeks): 15
    Reflects downloads up to 04 Oct 2024
