
GMoD: Graph-Driven Momentum Distillation Framework with Active Perception of Disease Severity for Radiology Report Generation

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 (MICCAI 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15005))


Abstract

Automatic radiology report generation is a challenging task that seeks to produce comprehensive, detailed, and semantically consistent descriptions from radiographs (e.g., X-rays), alleviating the heavy workload of radiologists. Previous work has explored introducing diagnostic information through multi-label classification. However, such methods can only provide a binary positive or negative classification result, omitting critical information about disease severity. We propose a Graph-driven Momentum Distillation (GMoD) approach to guide the model in actively perceiving the apparent disease severity implicitly conveyed in each radiograph. The proposed GMoD introduces two novel modules: a Graph-based Topic Classifier (GTC) and a Momentum Topic-Signal Distiller (MTD). Specifically, the GTC combines symptoms and lung diseases to build topic maps and focuses on potential connections between them. The MTD constrains the GTC to attend to the confidence of each disease being negative or positive by constructing pseudo labels, and then uses the multi-label classification results to help the model perceive joint features and generate a more accurate report. Extensive experiments and analyses on the IU-Xray and MIMIC-CXR benchmark datasets demonstrate that our GMoD outperforms state-of-the-art methods. Our code is available at https://github.com/xzp9999/GMoD-mian.
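The momentum-distillation idea sketched in the abstract (a slowly updated "momentum" teacher whose per-topic confidences are blended with the binary labels to form graded pseudo labels) can be illustrated as follows. This is a minimal sketch of the general technique, not the authors' implementation; the function names, the EMA coefficient `m`, and the blending weight `alpha` are all assumptions for illustration.

```python
import numpy as np

def ema_update(teacher, student, m=0.995):
    """Momentum (EMA) update: the teacher's parameters drift slowly
    toward the student's, giving a smoothed, stable teacher."""
    return {k: m * teacher[k] + (1.0 - m) * student[k] for k in teacher}

def soft_pseudo_labels(teacher_logits, hard_labels, alpha=0.4):
    """Blend the momentum teacher's per-topic confidences with the binary
    positive/negative labels, yielding graded targets instead of 0/1 flags."""
    teacher_probs = 1.0 / (1.0 + np.exp(-np.asarray(teacher_logits)))  # sigmoid per topic
    return alpha * teacher_probs + (1.0 - alpha) * np.asarray(hard_labels)

# Toy example: one scalar parameter and three disease topics.
teacher = {"w": 1.0}
student = {"w": 0.0}
teacher = ema_update(teacher, student, m=0.9)  # w moves slightly toward 0.0

# Topic 0: teacher is uncertain (logit 0); topics 1 and 2: teacher is confident.
targets = soft_pseudo_labels([0.0, 4.0, -4.0], [1.0, 1.0, 0.0], alpha=0.5)
```

In this toy run, the uncertain topic gets a softened target (0.75 instead of 1.0), while confidently classified topics stay close to their hard labels; training against such graded targets is what lets a classifier express severity-like confidence rather than a binary verdict.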



Acknowledgments

This work is supported by the National Natural Science Foundation of China (62306054), the Natural Science Foundation Project of Chongqing Science and Technology Bureau (CSTB2022NSCQ-MSX1206), the Chongqing Postgraduate Scientific Research Innovation Project (CYS23406), the Key Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-K202200510), and the Technology Foresight and System Innovation Project of Chongqing Municipal Science and Technology Bureau (CSTB2022TFII-OFX0042).

Author information


Corresponding author

Correspondence to ShaoGuo Cui.


Ethics declarations

Disclosure of Interests

The authors declare no competing interests.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Xiang, Z., Cui, S., Shang, C., Jiang, J., Zhang, L. (2024). GMoD: Graph-Driven Momentum Distillation Framework with Active Perception of Disease Severity for Radiology Report Generation. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15005. Springer, Cham. https://doi.org/10.1007/978-3-031-72086-4_28


  • DOI: https://doi.org/10.1007/978-3-031-72086-4_28


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72085-7

  • Online ISBN: 978-3-031-72086-4

  • eBook Packages: Computer Science (R0)
