Visual Grounding of Whole Radiology Reports for 3D CT Images

Ichinose, Akimichi; Hatsutani, Taro; Nakamura, Keigo; Kitamura, Yoshiro; Iizuka, Satoshi; Simo-Serra, Edgar; Kido, Shoji; Tomiyama, Noriyuki

doi:10.1007/978-3-031-43904-9_59

Akimichi Ichinose¹⁴,
Taro Hatsutani¹⁴,
Keigo Nakamura¹⁴,
Yoshiro Kitamura¹⁴,
Satoshi Iizuka¹⁵,
Edgar Simo-Serra¹⁶,
Shoji Kido¹⁷ &
…
Noriyuki Tomiyama¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14224))

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

5108 Accesses

Abstract

Building a large-scale training dataset is an essential problem in the development of medical image recognition systems. Visual grounding techniques, which automatically associate objects in images with corresponding descriptions, can facilitate labeling of large number of images. However, visual grounding of radiology reports for CT images remains challenging, because so many kinds of anomalies are detectable via CT imaging, and resulting report descriptions are long and complex. In this paper, we present the first visual grounding framework designed for CT image and report pairs covering various body parts and diverse anomaly types. Our framework combines two components of 1) anatomical segmentation of images, and 2) report structuring. The anatomical segmentation provides multiple organ masks of given CT images, and helps the grounding model recognize detailed anatomies. The report structuring helps to accurately extract information regarding the presence, location, and type of each anomaly described in corresponding reports. Given the two additional image/report features, the grounding model can achieve better localization. In the verification process, we constructed a large-scale dataset with region-description correspondence annotations for 10,410 studies of 7,321 unique patients. We evaluated our framework using grounding accuracy, the percentage of correctly localized anomalies, as a metric and demonstrated that the combination of the anatomical segmentation and the report structuring improves the performance with a large margin over the baseline model (66.0% vs 77.8%). Comparison with the prior techniques also showed higher performance of our method.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Softcover Book: USD 119.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Annotating Medical Image Data

VISCERAL: Towards Large Data in Medical Imaging — Challenges and Directions

Abstract: Robust Multi-Scale Anatomical Landmark Detection in Incomplete 3D-CT Data

References

Bhalodia, R., et al.: Improving pneumonia localization via cross-attention on medical images and reports. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12902, pp. 571–581. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87196-3_53
Chapter Google Scholar
Dall, T.: The Complexities of Physician Supply and Demand: Projections from 2016 to 2030. IHS Markit Limited (2018)
Google Scholar
Demner-Fushman, D., et al.: Preparing a collection of radiology examinations for distribution and retrieval. J. Am. Med. Inform. Assoc. 23(2), 304–310 (2016)
Article Google Scholar
Deng, J., Yang, Z., Chen, T., Zhou, W., Li, H.: TransVG: end-to-end Visual Grounding With Transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1769–1779 (2021)
Google Scholar
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp. 4171–4186 (2019)
Google Scholar
Ebrahimian, S., et al.: FDA-regulated AI algorithms: trends, strengths, and gaps of validation studies. Acad. Radiol. 29(4), 559–566 (2022)
Article Google Scholar
Hu, R., Rohrbach, M., Darrell, T.: Segmentation from natural language expressions. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 108–124. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_7
Chapter Google Scholar
Irvin, J., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 590–597 (2019)
Google Scholar
Johnson, A.E., et al.: MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci. Data 6(1), 317 (2019)
Google Scholar
Kamath, A., Singh, M., LeCun, Y., Synnaeve, G., Misra, I., Carion, N.: MDETR-modulated detection for end-to-end multi-modal understanding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1780–1790 (2021)
Google Scholar
Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3128–3137 (2015)
Google Scholar
Karpathy, A., Joulin, A., Fei-Fei, L.F.: Deep fragment embeddings for bidirectional image sentence mapping. In: Proceedings of Advances in Neural Information Processing System, pp. 1889–1897 (2014)
Google Scholar
Keshwani, D., Kitamura, Y., Li, Y.: Computation of total kidney volume from ct images in autosomal dominant polycystic kidney disease using multi-task 3D convolutional neural networks. In: Shi, Y., Suk, H.-I., Liu, M. (eds.) MLMI 2018. LNCS, vol. 11046, pp. 380–388. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00919-9_44
Chapter Google Scholar
Lee, K.H., Chen, X., Hua, G., Hu, H., He, X.: Stacked cross attention for image-text matching. In: Proceedings of the European Conference on Computer Vision, pp. 201–216 (2018)
Google Scholar
Li, B., Weng, Y., Sun, B., Li, S.: Towards visual-prompt temporal answering grounding in medical instructional video. arXiv preprint arXiv:2203.06667 (2022)
Li, Y., Wang, H., Luo, Y.: A comparison of pre-trained vision-and-language models for multimodal representation learning across medical images and reports. In: Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, pp. 1999–2004. IEEE (2020)
Google Scholar
Lu, J., Batra, D., Parikh, D., Lee, S.: ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Adv. Neural. Inf. Process. Syst. 32, 13–23 (2019)
Google Scholar
Masuzawa, N., Kitamura, Y., Nakamura, K., Iizuka, S., Simo-Serra, E.: Automatic segmentation, localization, and identification of vertebrae in 3d ct images using cascaded convolutional neural networks. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12266, pp. 681–690. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59725-2_66
Chapter Google Scholar
Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: fully convolutional neural networks for volumetric medical image segmentation. In: Proceedings of the International Conference on 3D Vision, pp. 565–571. IEEE (2016)
Google Scholar
Nakano, N., et al.: Pre-training methods for creating a language model with embedded knowledge of radiology reports. In: Proceedings of the annual meeting of the Association for Natural Language Processing (2022)
Google Scholar
Nishie, A., et al.: Current radiologist workload and the shortages in Japan: how many full-time radiologists are required? Jpn. J. Radiol. 33, 266–272 (2015)
Article Google Scholar
Rimmer, A.: Radiologist shortage leaves patient care at risk, warns royal college. BMJ: British Med. J. (Online) 359 (2017)
Google Scholar
Seibold, C., et al.: Detailed Annotations of Chest X-Rays via CT Projection for Report Understanding. arXiv preprint arXiv:2210.03416 (2022)
Tagawa, Y., et al.: Performance improvement of named entity recognition on noisy data using teacher-student training. In: Proceedings of the annual meeting of the Association for Natural Language Processing (2022)
Google Scholar
Wang, X., Peng, Y., Lu, L., Lu, Z., Summers, R.M.: TieNet: text-image embedding network for common thorax disease classification and reporting in Chest X-rays. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9049–9058 (2018)
Google Scholar
Yan, K., Wang, X., Lu, L., Summers, R.M.: DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. J. Med. Imaging 5(3), 036501–036501 (2018)
Article Google Scholar
Yang, Z., Gong, B., Wang, L., Huang, W., Yu, D., Luo, J.: A fast and accurate one-stage approach to visual grounding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4683–4693 (2019)
Google Scholar
You, D., Liu, F., Ge, S., Xie, X., Zhang, J., Wu, X.: AlignTransformer: hierarchical alignment of visual regions and disease tags for medical report generation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12903, pp. 72–82. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87199-4_7
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Medical Systems Research and Development Center, FUJIFILM Corporation, Tokyo, Japan
Akimichi Ichinose, Taro Hatsutani, Keigo Nakamura & Yoshiro Kitamura
Center for Artificial Intelligence Research, University of Tsukuba, Ibaraki, Japan
Satoshi Iizuka
Department of Computer Science and Engineering, Waseda University, Tokyo, Japan
Edgar Simo-Serra
Graduate School of Medicine, Osaka University, Osaka, Japan
Shoji Kido & Noriyuki Tomiyama

Authors

Akimichi Ichinose
View author publications
You can also search for this author in PubMed Google Scholar
Taro Hatsutani
View author publications
You can also search for this author in PubMed Google Scholar
Keigo Nakamura
View author publications
You can also search for this author in PubMed Google Scholar
Yoshiro Kitamura
View author publications
You can also search for this author in PubMed Google Scholar
Satoshi Iizuka
View author publications
You can also search for this author in PubMed Google Scholar
Edgar Simo-Serra
View author publications
You can also search for this author in PubMed Google Scholar
Shoji Kido
View author publications
You can also search for this author in PubMed Google Scholar
Noriyuki Tomiyama
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Akimichi Ichinose .

Editor information

Editors and Affiliations

Icahn School of Medicine, Mount Sinai, NYC, NY, USA, Tel Aviv University, Tel Aviv, Israel
Hayit Greenspan
Emory University, Atlanta, GA, USA
Anant Madabhushi
Queen’s University, Kingston, ON, Canada
Parvin Mousavi
The University of British Columbia, Vancouver, BC, Canada
Septimiu Salcudean
Yale University, New Haven, CT, USA
James Duncan
IBM Research, San Jose, CA, USA
Tanveer Syeda-Mahmood
Johns Hopkins University, Baltimore, MD, USA
Russell Taylor

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 356 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Ichinose, A. et al. (2023). Visual Grounding of Whole Radiology Reports for 3D CT Images. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14224. Springer, Cham. https://doi.org/10.1007/978-3-031-43904-9_59

Download citation

DOI: https://doi.org/10.1007/978-3-031-43904-9_59
Published: 01 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43903-2
Online ISBN: 978-3-031-43904-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society (opens in a new tab)

Visual Grounding of Whole Radiology Reports for 3D CT Images

Abstract

Access this chapter

Subscribe and save

Buy Now