ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

Xu, Shawn; Yang, Lin; Kelly, Christopher; Sieniek, Marcin; Kohlberger, Timo; Ma, Martin; Weng, Wei-Hung; Kiraly, Atilla; Kazemzadeh, Sahar; Melamed, Zakkai; Park, Jungyeon; Strachan, Patricia; Liu, Yun; Lau, Chuck; Singh, Preeti; Chen, Christina; Etemadi, Mozziyar; Kalidindi, Sreenivasa Raju; Matias, Yossi; Chou, Katherine; Corrado, Greg S.; Shetty, Shravya; Tse, Daniel; Prabhakara, Shruthi; Golden, Daniel; Pilgrim, Rory; Eswaran, Krish; Sellergren, Andrew

arXiv:2308.01317 (cs)

[Submitted on 2 Aug 2023 (v1), last revised 7 Sep 2023 (this version, v2)]

Abstract: In this work, we present an approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, that leverages a language-aligned image encoder combined with, or grafted onto, a fixed LLM, PaLM 2, to perform a broad range of chest X-ray tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) when trained on 1% (~2,200 images) and 10% (~22,000 images) of the training data, respectively), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods, including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, achieving overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
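The semantic-search result above is reported as normalized discounted cumulative gain (NDCG). A minimal Python sketch of the standard metric follows, assuming each query yields a ranked list of results with graded (here binary) relevance labels and a cutoff k. The abstract does not state the cutoff, the relevance scale, or how per-query scores were averaged, and the function names are hypothetical, so this illustrates the metric rather than the authors' evaluation code. A query with "perfect retrieval" corresponds to an NDCG of 1.0.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k ranked results.

    relevances[i] is the relevance label of the result at rank i + 1;
    the gain at rank r is discounted by log2(r + 1).
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (relevance-sorted) ranking.

    Assumption: the list holds every judged item for the query, so sorting
    it gives the ideal ranking. Returns 0.0 if no item is relevant.
    """
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def mean_ndcg(per_query: Sequence[Sequence[float]], k: int) -> float:
    """Hypothetical aggregate: unweighted mean of per-query NDCG@k."""
    return sum(ndcg(rels, k) for rels in per_query) / len(per_query)


if __name__ == "__main__":
    # Hypothetical binary relevance judgments for the top-5 images of two queries.
    perfect = [1, 1, 1, 0, 0]    # all relevant images ranked first
    imperfect = [0, 1, 1, 1, 0]  # a non-relevant image ranked first
    print(ndcg(perfect, k=5))    # 1.0
    print(round(ndcg(imperfect, k=5), 4))
    print(round(mean_ndcg([perfect, imperfect], k=5), 4))
```

Because the logarithmic discount shrinks with rank, placing a non-relevant image at rank 1 costs more than placing it lower, which is why a single misordering can pull a query well below 1.0.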

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Cite as:	arXiv:2308.01317 [cs.CV]
	(or arXiv:2308.01317v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.01317

Submission history

From: Andrew Sellergren
[v1] Wed, 2 Aug 2023 17:59:45 UTC (2,530 KB)
[v2] Thu, 7 Sep 2023 23:07:51 UTC (2,513 KB)