DOI: 10.1007/978-3-030-59713-9_51
Article

Joint Modeling of Chest Radiographs and Radiology Reports for Pulmonary Edema Assessment

Published: 04 October 2020

Abstract

We propose and demonstrate a novel machine learning algorithm that assesses pulmonary edema severity from chest radiographs. While large publicly available datasets of chest radiographs and free-text radiology reports exist, only limited numerical edema severity labels can be extracted from the reports, which poses a significant challenge for learning image classification models in this setting. To take advantage of the rich information in the radiology reports, we develop a neural network model that is trained on both images and free text, yet assesses pulmonary edema severity from chest radiographs alone at inference time. Our experimental results suggest that joint image-text representation learning improves pulmonary edema assessment compared to a supervised model trained on images only. We also show how the text can be used to explain the image classifications made by the joint model. To the best of our knowledge, our approach is the first to leverage free-text radiology reports to improve image model performance in this application. Our code is available at: https://github.com/RayRuizhiLiao/joint_chestxray.
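The core idea of the abstract — learning a shared representation from paired images and reports so that classification can run on images alone at test time — can be illustrated with a small sketch. The NumPy snippet below implements a generic bidirectional max-margin ranking loss that pulls matched image/text embedding pairs together and pushes mismatched pairs apart. This is an illustrative assumption about how such joint embeddings are commonly trained, not the authors' actual architecture or loss; the names (`ranking_loss`, `margin`) and the toy random embeddings are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Normalize each row to unit length so dot products are cosines."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def ranking_loss(img_emb, txt_emb, margin=0.2):
    """Bidirectional max-margin ranking loss over a batch of paired
    image/text embeddings, where row i of each matrix is a matched pair."""
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    sim = img @ txt.T                       # (B, B) cosine similarity matrix
    pos = np.diag(sim)                      # similarities of matched pairs
    # Hinge on every mismatched pair, in both retrieval directions:
    cost_i2t = np.maximum(0.0, margin + sim - pos[:, None])  # image -> text
    cost_t2i = np.maximum(0.0, margin + sim - pos[None, :])  # text -> image
    batch = sim.shape[0]
    mask = 1.0 - np.eye(batch)              # exclude the matched diagonal
    return float(((cost_i2t + cost_t2i) * mask).sum() / batch)

# Toy demo: text embeddings that are noisy copies of the image embeddings
# behave like well-aligned pairs and incur a small loss.
rng = np.random.default_rng(0)
img_emb = rng.standard_normal((4, 32))
txt_emb = img_emb + 0.1 * rng.standard_normal((4, 32))
print(ranking_loss(img_emb, txt_emb))
```

In a joint model of this kind, the text encoder is discarded at inference time: once training has shaped the image encoder's embedding space, severity classification needs only the radiograph.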




            Published In

            Medical Image Computing and Computer Assisted Intervention – MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part II
            Oct 2020
            815 pages
ISBN: 978-3-030-59712-2
DOI: 10.1007/978-3-030-59713-9

            Publisher

            Springer-Verlag

            Berlin, Heidelberg




            Cited By

• Towards Medical Vision-Language Contrastive Pre-training via Study-Oriented Semantic Exploration. Proceedings of the 32nd ACM International Conference on Multimedia (2024), pp. 4861–4870. DOI: 10.1145/3664647.3681531
• Multimodal Machine Learning in Image-Based and Clinical Biomedicine: Survey and Prospects. International Journal of Computer Vision 132(9) (2024), pp. 3753–3769. DOI: 10.1007/s11263-024-02032-8
• A scoping review on multimodal deep learning in biomedical images and texts. Journal of Biomedical Informatics 146 (2023). DOI: 10.1016/j.jbi.2023.104482
• Using Multiple Instance Learning to Build Multimodal Representations. Information Processing in Medical Imaging (2023), pp. 457–470. DOI: 10.1007/978-3-031-34048-2_35
• Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing. Computer Vision – ECCV 2022 (2022), pp. 1–21. DOI: 10.1007/978-3-031-20059-5_1
• Few-Shot Learning Geometric Ensemble for Multi-label Classification of Chest X-Rays. Data Augmentation, Labelling, and Imperfections (2022), pp. 112–122. DOI: 10.1007/978-3-031-17027-0_12
• Multi-transSP: Multimodal Transformer for Survival Prediction of Nasopharyngeal Carcinoma Patients. Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 (2022), pp. 234–243. DOI: 10.1007/978-3-031-16449-1_23
• Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment. Machine Learning in Medical Imaging (2021), pp. 110–119. DOI: 10.1007/978-3-030-87589-3_12
• Predicting Esophageal Fistula Risks Using a Multimodal Self-attention Network. Medical Image Computing and Computer Assisted Intervention – MICCAI 2021 (2021), pp. 721–730. DOI: 10.1007/978-3-030-87240-3_69
• Co-graph Attention Reasoning Based Imaging and Clinical Features Integration for Lymph Node Metastasis Prediction. Medical Image Computing and Computer Assisted Intervention – MICCAI 2021 (2021), pp. 657–666. DOI: 10.1007/978-3-030-87240-3_63
