DOI: 10.1145/3556384.3556414
research-article
Open access

Hierarchical Multi-modal Transformer for Automatic Detection of COVID-19

Published: 29 October 2022

Abstract

Automated COVID-19 detection based on the analysis of cough recordings has been an important field of study, as efficient and accurate methods are needed to contain the spread of the global pandemic and relieve the burden on medical facilities. Previous works presented lightweight machine learning models [9], but these models may sacrifice accuracy and interpretability in order to run on mobile devices. Moreover, how to effectively associate indicators from audio signals with inputs from other modalities (i.e., patient information) remains largely unexplored, as previous works have predominantly learned from simply concatenated features. To tackle these issues, this paper proposes a novel Hierarchical Multi-modal Transformer (HMT) that learns more informative multi-modal representations by applying a cross-attention module during feature fusion. In addition, the HMT's block aggregation algorithm offers an efficient improvement over the vanilla Vision Transformer for the limited COVID-19 benchmark datasets. Extensive experiments demonstrate the effectiveness of the proposed model for more accurate COVID-19 detection, yielding state-of-the-art results on two public datasets, Coswara and COUGHVID.
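The abstract contrasts fusion by simple feature concatenation with the HMT's cross-attention module and mentions a block aggregation step, but gives no equations for either. The sketch below is a minimal NumPy illustration under explicit assumptions, not the authors' implementation: single-head scaled dot-product cross-attention (the mechanism of [24]) in which audio tokens act as queries over patient-information tokens, and average pooling of non-overlapping token blocks as a stand-in for block aggregation (a term used by the nested hierarchical transformer [25]; its exact form in the HMT is not stated here). The helper names `cross_attention`, `block_aggregate`, `fuse`, and `concat_baseline`, the residual connection, the mean pooling, and all shapes are hypothetical.

```python
import numpy as np

# Hypothetical sketch: helper names, shapes, residual connection, and pooling
# choices are illustrative assumptions, not the published HMT design.


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d) tokens from one modality (here, audio).
    context: (n_c, d) tokens from the other modality (here, patient information).
    w_q, w_k, w_v: (d, d) projection matrices.
    Returns attended values (n_q, d) and attention weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights


def block_aggregate(tokens, grid_h, grid_w, block):
    """Average-pool non-overlapping block x block groups of a token grid.

    tokens: (grid_h * grid_w, d) in row-major order. Returns
    ((grid_h // block) * (grid_w // block), d). Stand-in for block aggregation.
    """
    assert grid_h % block == 0 and grid_w % block == 0, "grid must tile evenly"
    d = tokens.shape[-1]
    grid = tokens.reshape(grid_h // block, block, grid_w // block, block, d)
    return grid.mean(axis=(1, 3)).reshape(-1, d)


def fuse(audio_tokens, patient_tokens, params):
    """Let audio tokens attend to patient-information tokens, then pool."""
    attended, weights = cross_attention(
        audio_tokens, patient_tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    fused = audio_tokens + attended  # residual keeps the original audio features
    return fused.mean(axis=0), weights  # one clip-level vector


def concat_baseline(audio_tokens, patient_tokens):
    """Simple concatenation of pooled features, the fusion the abstract contrasts with."""
    return np.concatenate([audio_tokens.mean(axis=0), patient_tokens.mean(axis=0)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    params = {n: rng.normal(size=(d, d)) / np.sqrt(d) for n in ("w_q", "w_k", "w_v")}
    spectrogram_tokens = rng.normal(size=(4 * 6, d))  # 4 x 6 grid of patch embeddings
    patient_tokens = rng.normal(size=(3, d))  # e.g. 3 embedded metadata fields
    audio_tokens = block_aggregate(spectrogram_tokens, 4, 6, 2)  # 2 x 3 = 6 tokens
    clip_vector, attn = fuse(audio_tokens, patient_tokens, params)
    print(audio_tokens.shape, clip_vector.shape, attn.shape)  # prints (6, 8) (8,) (6, 3)
```

Because the audio tokens are the queries, each row of `attn` shows how strongly one audio token draws on each patient-information token. `concat_baseline`, by contrast, only places the two pooled vectors side by side, so neither modality can condition the other's representation before pooling.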

References

[1] [n.d.]. World Health Organization. Coronavirus (COVID-19) Cases and Deaths. https://data.humdata.org/dataset/coronavirus-covid-19-cases-and-deaths
[2] Yusuf Amrulloh, Udantha Abeyratne, Vinayak Swarnkar, and Rina Triasih. 2015. Cough Sound Analysis for Pneumonia and Asthma Classification in Pediatric Population. In 2015 6th International Conference on Intelligent Systems, Modelling and Simulation. 127–131. https://doi.org/10.1109/ISMS.2015.41
[3] Gunvant Chaudhari, Xinyi Jiang, Ahmed Fakhry, Asriel Han, Jaclyn Xiao, Sabrina Shen, and Amil Khanzada. 2020. Virufy: Global Applicability of Crowdsourced and Clinical Datasets for AI Detection of COVID-19 from Cough. https://doi.org/10.48550/ARXIV.2011.13320
[4] S. Davis and P. Mermelstein. 1980. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing 28, 4 (1980), 357–366. https://doi.org/10.1109/TASSP.1980.1163420
[5] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2020. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. CoRR abs/2010.11929 (2020). arXiv:2010.11929 https://arxiv.org/abs/2010.11929
[6] Ahmed Fakhry, Xinyi Jiang, Jaclyn Xiao, Gunvant Chaudhari, Asriel Han, and Amil Khanzada. 2021. Virufy: A Multi-Branch Deep Learning Network for Automated Detection of COVID-19. https://doi.org/10.48550/ARXIV.2103.01806
[7] Yuan Gong, Yu-An Chung, and James Glass. 2021. AST: Audio Spectrogram Transformer. In Proc. Interspeech 2021. 571–575. https://doi.org/10.21437/Interspeech.2021-698
[8] Yuan Gong, Yu-An Chung, and James Glass. 2021. PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2021). https://doi.org/10.1109/TASLP.2021.3120633
[9] Esin Darici Haritaoglu, Nicholas Rasmussen, Daniel C. H. Tan, Jennifer Ranjani J., Jaclyn Xiao, Gunvant Chaudhari, Akanksha Rajput, Praveen Govindan, Christian Canham, Wei Chen, Minami Yamaura, Laura Gomezjurado, Aaron Broukhim, Amil Khanzada, and Mert Pilanci. 2022. Using Deep Learning with Large Aggregated Datasets for COVID-19 Classification from Cough. https://doi.org/10.48550/ARXIV.2201.01669
[10] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. 2016. Densely Connected Convolutional Networks. https://doi.org/10.48550/ARXIV.1608.06993
[11] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. https://doi.org/10.48550/ARXIV.1412.6980
[12] J. Korpáš, J. Sadloňová, and M. Vrabec. 1996. Analysis of the Cough Sound: an Overview. Pulmonary Pharmacology 9, 5-6 (Oct 1996), 261–268. https://doi.org/10.1006/pulp.1996.0034
[13] Shervin Minaee, Rahele Kafieh, Milan Sonka, Shakib Yazdani, and Ghazaleh Jamalipour Soufi. 2020. Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning. Medical Image Analysis 65 (2020), 101794. https://doi.org/10.1016/j.media.2020.101794
[14] Lara Orlandic, Tomas Teijeiro, and David Atienza. 2021. The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms. https://www.nature.com/articles/s41597-021-00937-4
[15] Zexu Pan, Zhaojie Luo, Jichen Yang, and Haizhou Li. 2020. Multi-modal Attention for Speech Emotion Recognition. https://doi.org/10.48550/ARXIV.2009.04107
[16] Renard Xaviero Adhi Pramono, Syed Anas Imtiaz, and Esther Rodriguez-Villegas. 2016. A Cough-Based Algorithm for Automatic Diagnosis of Pertussis. PLOS ONE 11, 9 (09 2016), 1–20. https://doi.org/10.1371/journal.pone.0162128
[17] Philip J. Rosenthal. 2020. The Importance of Diagnostic Testing during a Viral Pandemic: Early Lessons from Novel Coronavirus Disease (COVID-19). The American Journal of Tropical Medicine and Hygiene 102, 5 (2020), 915–916. https://doi.org/10.4269/ajtmh.20-0216
[18] Elliot Saba. 2018. Techniques for cough sound analysis. http://hdl.handle.net/1773/43034
[19] Neeraj Sharma, Prashant Krishnan, Rohit Kumar, Shreyas Ramoji, Srikanth Raj Chetupalli, Nirmala R., Prasanta Kumar Ghosh, and Sriram Ganapathy. 2020. Coswara – A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis. In Proc. Interspeech 2020. 4811–4815. https://doi.org/10.21437/Interspeech.2020-2768
[20] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alex Alemi. 2016. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. https://doi.org/10.48550/ARXIV.1602.07261
[21] Shuyun Tang, Zhaojie Luo, Guoshun Nan, Yuichiro Yoshikawa, and Ishiguro Hiroshi. 2021. Fusion with Hierarchical Graphs for Mulitmodal Emotion Recognition. https://doi.org/10.48550/ARXIV.2109.07149
[22] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. 2020. Training data-efficient image transformers & distillation through attention. CoRR abs/2012.12877 (2020). arXiv:2012.12877 https://arxiv.org/abs/2012.12877
[23] Anusua Trivedi, Anthony Ortiz, Jocelyn Desbiens, Caleb Robinson, Marian Blazes, Sunil Gupta, Rahul Dodhia, Pavan Bhatraju, W. Conrad Liles, Aaron Lee, and Juan M. Lavista Ferres. 2020. Effective Deep Learning Approaches for Predicting COVID-19 Outcomes from Chest Computed Tomography Volumes. medRxiv (2020). https://doi.org/10.1101/2020.10.15.20213462 https://www.medrxiv.org/content/early/2020/10/20/2020.10.15.20213462.full.pdf
[24] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. https://doi.org/10.48550/ARXIV.1706.03762
[25] Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan O. Arik, and Tomas Pfister. 2021. Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding. https://doi.org/10.48550/ARXIV.2105.12723

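Cited By

  • (2024) COVID-19 Detection From Respiratory Sounds With Hierarchical Spectrogram Transformers. IEEE Journal of Biomedical and Health Informatics 28:3 (1273–1284). DOI: 10.1109/JBHI.2023.3339700. Online publication date: Mar-2024.
  • (2024) A Systematic Review of Multimodal Deep Learning Approaches for COVID-19 Diagnosis. Image Analysis and Processing - ICIAP 2023 Workshops (140–151). DOI: 10.1007/978-3-031-51026-7_13. Online publication date: 21-Jan-2024.
  • (2023) Automatic Detection of Dyspnea in Real Human–Robot Interaction Scenarios. Sensors 23:17 (7590). DOI: 10.3390/s23177590. Online publication date: 1-Sep-2023.
  • (2023) Dyspnea Severity Assessment Based on Vocalization Behavior with Deep Learning on the Telephone. Sensors 23:5 (2441). DOI: 10.3390/s23052441. Online publication date: 22-Feb-2023.
  • (2023) Privacy-Enhancing Digital Contact Tracing with Machine Learning for Pandemic Response: A Comprehensive Review. Big Data and Cognitive Computing 7:2 (108). DOI: 10.3390/bdcc7020108. Online publication date: 1-Jun-2023.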

        Published In

        ACM Other conferences
        SPML '22: Proceedings of the 2022 5th International Conference on Signal Processing and Machine Learning
        August 2022
        309 pages
        ISBN:9781450396912
        DOI:10.1145/3556384
        This work is licensed under a Creative Commons Attribution 4.0 International License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Accountable Artificial Intelligence
        2. Artificial Intelligence in Health
        3. Machine Learning
        4. Neural Networks
        5. Signal Processing
        6. Transformer

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        SPML 2022

        Article Metrics

        • Downloads (last 12 months): 223
        • Downloads (last 6 weeks): 26
        Reflects downloads up to 04 Oct 2024
