keynote

Automated Medical Report Generation and Visual Question Answering

Author: Luping Zhou

MCHM'24: Proceedings of the 1st International Workshop on Multimedia Computing for Health and Medicine

Pages 3 - 4

https://doi.org/10.1145/3688868.3689189

Published: 31 October 2024

Abstract

The rapid growth of medical imaging data has far outpaced the availability of trained radiologists, significantly increasing their workload. To alleviate this burden, reduce diagnostic errors, and streamline clinical workflows, the need for automated medical diagnostic report generation has become more urgent than ever. However, this task is particularly challenging, as it requires the ability to capture and describe clinically significant fine-grained visual differences in highly similar medical images. Additionally, critical disease-related keywords can easily be overshadowed by the prevalence of similar phrases describing common image content. Moreover, generating comprehensive reports that detail both normal and pathological findings within images adds to the complexity.

In this presentation, I will showcase our latest research on automated medical diagnostic report generation and medical visual question answering, highlighting how we have tackled these challenges. Our work has moved from traditional encoder-decoder models to approaches built on large language models (LLMs). I will also discuss the current limitations of these methods and propose potential future directions.

Specifically, I will present two methods we developed before the advent of pretrained LLMs, which enhance fine-grained recognition for medical report generation from different angles. The first is a self-boosting framework designed to learn highly correlated image and text features, enabling the model to narrate even finer visual changes in the generated reports. The second method is inspired by the 'multi-expert joint diagnosis' scenario and introduces multiple learnable 'expert' tokens into the transformer architecture, with each expert focusing on distinct image regions. These complementary perspectives are then aggregated to produce a final, more accurate report. In addition to report generation, I will also present our efforts in improving medical visual question answering (VQA).
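The expert-token idea described above can be illustrated with a toy sketch. This is not the METransformer implementation: the token values, patch features, and the helper names `expert_views` and `aggregate` are hypothetical, and plain averaging is used as a stand-in assumption for the aggregation step. It only shows how several query tokens can attend to different image regions before their views are combined.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def expert_views(expert_tokens, patch_features):
    """Each expert token attends over all patches and pools its own view."""
    dim = len(patch_features[0])
    views, attn_maps = [], []
    for token in expert_tokens:
        # Scaled dot-product attention of one expert token over the patches.
        weights = softmax([dot(token, p) / math.sqrt(dim) for p in patch_features])
        pooled = [sum(w * p[i] for w, p in zip(weights, patch_features))
                  for i in range(dim)]
        views.append(pooled)
        attn_maps.append(weights)
    return views, attn_maps

def aggregate(views):
    """Stand-in aggregation (assumption): element-wise mean of expert views."""
    n = len(views)
    return [sum(v[i] for v in views) / n for i in range(len(views[0]))]

# Hypothetical toy input: 3 image patches with 2-dimensional features.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Two hypothetical expert tokens (learned in a real model, fixed here).
experts = [[4.0, -4.0], [-4.0, 4.0]]

views, attn = expert_views(experts, patches)
fused = aggregate(views)
```

In this toy setup the two tokens place most of their attention on different patches, which is the "complementary perspectives" behaviour the paragraph describes. In a trained model the tokens would be learned parameters rather than hand-picked values.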

Following this, I will introduce our recent work on integrating LLMs for medical report generation. I will outline two frameworks we developed: the first employs a frozen LLM for report generation, training only a lightweight visual alignment module to achieve state-of-the-art performance. The second framework goes a step further by integrating a knowledge graph to unlock disease-related knowledge within the LLM, thereby enhancing the clinical relevance of the generated reports. Additionally, I will share our latest investigation into GPT-4V's multimodal capabilities in chest X-ray analysis and discuss the limitations of current evaluation metrics for radiology report generation. To address these limitations, I will introduce our recently developed MRScore framework, which guides LLMs in radiology report evaluation to ensure alignment with human expert analysis.
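The first framework mentioned above trains only a small alignment module in front of a frozen language model. The sketch below illustrates that training pattern under strong simplifying assumptions: a fixed 2x2 linear map (`FROZEN_LLM`) stands in for the LLM, a 2x3 matrix (`align`) stands in for the visual alignment module, and the features, target, and learning rate are made up. It is not the R2GenGPT code.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Stand-in for the frozen LLM (assumption): a fixed linear map, never updated.
FROZEN_LLM = [[1.0, 0.5], [-0.5, 1.0]]

def loss(align, image_feat, target):
    """Squared error after: image features -> alignment -> frozen model."""
    out = matvec(FROZEN_LLM, matvec(align, image_feat))
    return sum((o - t) ** 2 for o, t in zip(out, target))

def train_step(align, image_feat, target, lr=0.05):
    """One gradient step that updates the alignment weights only."""
    hidden = matvec(align, image_feat)
    out = matvec(FROZEN_LLM, hidden)
    err = [2.0 * (o - t) for o, t in zip(out, target)]  # dL/d(out)
    # Gradients flow back through the frozen map, but it is not modified.
    grad_hidden = [sum(FROZEN_LLM[k][j] * err[k] for k in range(len(err)))
                   for j in range(len(hidden))]
    # dL/d(align[j][i]) = grad_hidden[j] * image_feat[i]
    return [[align[j][i] - lr * grad_hidden[j] * image_feat[i]
             for i in range(len(image_feat))]
            for j in range(len(align))]

# Hypothetical toy data: one 3-d image feature and a 2-d target.
image_feat = [1.0, 2.0, 0.5]
target = [1.0, -1.0]

align = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # trainable alignment weights
frozen_before = [row[:] for row in FROZEN_LLM]
initial_loss = loss(align, image_feat, target)

for _ in range(50):
    align = train_step(align, image_feat, target)

final_loss = loss(align, image_feat, target)
```

The point of the design is that gradients pass through the frozen component to reach the alignment weights, so the number of trained parameters stays small while the frozen model's behaviour is left untouched.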

References

[1] Yingshu Li, Zhanyu Wang, Yunyi Liu, Lei Wang, Lingqiao Liu, and Luping Zhou. 2024. KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models. In International Conference on Medical Image Computing And Computer Assisted Intervention (MICCAI).

[2] Yunyi Liu, Yingshu Li, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, Leyang Cui, Zhaopeng Tu, Longyue Wang, and Luping Zhou. 2024. A Systematic Evaluation of GPT-4V's Multimodal Capability for Chest X-ray Image Analysis. Meta-Radiology, Vol. 1 (2024). https://doi.org/10.1016/j.metrad.2024.100099

[3] Yunyi Liu, Zhanyu Wang, Dong Xu, and Luping Zhou. 2023. Q2ATransformer: Improving Medical VQA by an Answer Querying Decoder. In Information Processing in Medical Imaging (IPMI).

[4] Yunyi Liu, Zhanyu Wang, Xinyu Liang, Yingshu Li, Lingqiao Liu, Lei Wang, and Luping Zhou. 2024. MRScore: Evaluating Radiology Report Generation with LLM-based Reward System. In International Conference on Medical Image Computing And Computer Assisted Intervention (MICCAI).

[5] Zhanyu Wang, Lingqiao Liu, Lei Wang, and Luping Zhou. 2023. METransformer: Radiology Report Generation by Transformer with Multiple Expert Learners. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Zhanyu Wang, Lingqiao Liu, Lei Wang, and Luping Zhou. 2023. R2GenGPT: Radiology Report Generation with frozen LLMs. Meta-Radiology, Vol. 1, 3 (2023), 41--49.

[7] Zhanyu Wang, Luping Zhou, Lei Wang, and Xiu Li. 2021. A Self-boosting Framework for Automated Radiographic Report Generation. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).

Index Terms

Automated Medical Report Generation and Visual Question Answering
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

MCHM'24: Proceedings of the 1st International Workshop on Multimedia Computing for Health and Medicine

October 2024

85 pages

ISBN: 9798400711954

DOI: 10.1145/3688868

Program Chairs:
Xuequan Lu, La Trobe University, Australia
Wenxi Yue, University of Sydney, Australia
Imran Razzak, University of New South Wales, Australia
Kun Hu, University of Sydney, Australia
Jinglei Lv, University of Sydney, Australia
Sen Zhang, University of Sydney, Australia
Junhui Hou, City University of Hong Kong, China
Zhiyong Wang, University of Sydney, Australia
Jiebo Luo, University of Rochester, USA
Wei Xiang, La Trobe University, Australia

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Keynote

Conference

MM '24

Sponsor:

SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia
