-
Advancing Multimodal Medical Capabilities of Gemini
Authors:
Lin Yang,
Shawn Xu,
Andrew Sellergren,
Timo Kohlberger,
Yuchen Zhou,
Ira Ktena,
Atilla Kiraly,
Faruk Ahmed,
Farhad Hormozdiari,
Tiam Jaroensri,
Eric Wang,
Ellery Wulczyn,
Fayaz Jamil,
Theo Guidroz,
Chuck Lau,
Siyuan Qiao,
Yun Liu,
Akshay Goel,
Kendall Park,
Arnav Agharwal,
Nick George,
Yang Wang,
Ryutaro Tanno,
David G. T. Barrett,
Wei-Hung Weng
, et al. (22 additional authors not shown)
Abstract:
Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histop…
▽ More
Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding SoTA or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches task-specific model performance. Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
Authors:
Shawn Xu,
Lin Yang,
Christopher Kelly,
Marcin Sieniek,
Timo Kohlberger,
Martin Ma,
Wei-Hung Weng,
Atilla Kiraly,
Sahar Kazemzadeh,
Zakkai Melamed,
Jungyeon Park,
Patricia Strachan,
Yun Liu,
Chuck Lau,
Preeti Singh,
Christina Chen,
Mozziyar Etemadi,
Sreenivasa Raju Kalidindi,
Yossi Matias,
Katherine Chou,
Greg S. Corrado,
Shravya Shetty,
Daniel Tse,
Shruthi Prabhakara,
Daniel Golden
, et al. (3 additional authors not shown)
Abstract:
In this work, we present an approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, that leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of chest X-ray tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR ach…
▽ More
In this work, we present an approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, that leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of chest X-ray tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
△ Less
Submitted 7 September, 2023; v1 submitted 2 August, 2023;
originally announced August 2023.
-
Robust and Efficient Medical Imaging with Self-Supervision
Authors:
Shekoofeh Azizi,
Laura Culp,
Jan Freyberg,
Basil Mustafa,
Sebastien Baur,
Simon Kornblith,
Ting Chen,
Patricia MacWilliams,
S. Sara Mahdavi,
Ellery Wulczyn,
Boris Babenko,
Megan Wilson,
Aaron Loh,
Po-Hsuan Cameron Chen,
Yuan Liu,
Pinal Bavishi,
Scott Mayer McKinney,
Jim Winkens,
Abhijit Guha Roy,
Zach Beaver,
Fiona Ryan,
Justin Krogue,
Mozziyar Etemadi,
Umesh Telang,
Yun Liu
, et al. (9 additional authors not shown)
Abstract:
Recent progress in Medical Artificial Intelligence (AI) has delivered systems that can reach clinical expert level performance. However, such systems tend to demonstrate sub-optimal "out-of-distribution" performance when evaluated in clinical settings different from the training environment. A common mitigation strategy is to develop separate systems for each clinical setting using site-specific d…
▽ More
Recent progress in Medical Artificial Intelligence (AI) has delivered systems that can reach clinical expert level performance. However, such systems tend to demonstrate sub-optimal "out-of-distribution" performance when evaluated in clinical settings different from the training environment. A common mitigation strategy is to develop separate systems for each clinical setting using site-specific data [1]. However, this quickly becomes impractical as medical data is time-consuming to acquire and expensive to annotate [2]. Thus, the problem of "data-efficient generalization" presents an ongoing difficulty for Medical AI development. Although progress in representation learning shows promise, their benefits have not been rigorously studied, specifically for out-of-distribution settings. To meet these challenges, we present REMEDIS, a unified representation learning strategy to improve robustness and data-efficiency of medical imaging AI. REMEDIS uses a generic combination of large-scale supervised transfer learning with self-supervised learning and requires little task-specific customization. We study a diverse range of medical imaging tasks and simulate three realistic application scenarios using retrospective data. REMEDIS exhibits significantly improved in-distribution performance with up to 11.5% relative improvement in diagnostic accuracy over a strong supervised baseline. More importantly, our strategy leads to strong data-efficient generalization of medical imaging AI, matching strong supervised baselines using between 1% to 33% of retraining data across tasks. These results suggest that REMEDIS can significantly accelerate the life-cycle of medical imaging AI development thereby presenting an important step forward for medical imaging AI to deliver broad impact.
△ Less
Submitted 3 July, 2022; v1 submitted 19 May, 2022;
originally announced May 2022.
-
A multi-stage machine learning model on diagnosis of esophageal manometry
Authors:
Wenjun Kou,
Dustin A. Carlson,
Alexandra J. Baumann,
Erica N. Donnan,
Jacob M. Schauer,
Mozziyar Etemadi,
John E. Pandolfino
Abstract:
High-resolution manometry (HRM) is the primary procedure used to diagnose esophageal motility disorders. Its interpretation and classification includes an initial evaluation of swallow-level outcomes and then derivation of a study-level diagnosis based on Chicago Classification (CC), using a tree-like algorithm. This diagnostic approach on motility disordered using HRM was mirrored using a multi-s…
▽ More
High-resolution manometry (HRM) is the primary procedure used to diagnose esophageal motility disorders. Its interpretation and classification includes an initial evaluation of swallow-level outcomes and then derivation of a study-level diagnosis based on Chicago Classification (CC), using a tree-like algorithm. This diagnostic approach on motility disordered using HRM was mirrored using a multi-stage modeling framework developed using a combination of various machine learning approaches. Specifically, the framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage. In the swallow-level stage, three models based on convolutional neural networks (CNNs) were developed to predict swallow type, swallow pressurization, and integrated relaxation pressure (IRP). At the study-level stage, model selection from families of the expert-knowledge-based rule models, xgboost models and artificial neural network(ANN) models were conducted, with the latter two model designed and augmented with motivation from the export knowledge. A simple model-agnostic strategy of model balancing motivated by Bayesian principles was utilized, which gave rise to model averaging weighted by precision scores. The averaged (blended) models and individual models were compared and evaluated, of which the best performance on test dataset is 0.81 in top-1 prediction, 0.92 in top-2 predictions. This is the first artificial-intelligence-style model to automatically predict CC diagnosis of HRM study from raw multi-swallow data. Moreover, the proposed modeling framework could be easily extended to multi-modal tasks, such as diagnosis of esophageal patients based on clinical data from both HRM and functional luminal imaging probe panometry (FLIP).
△ Less
Submitted 24 May, 2022; v1 submitted 25 June, 2021;
originally announced June 2021.
-
Deep Learning for Distinguishing Normal versus Abnormal Chest Radiographs and Generalization to Unseen Diseases
Authors:
Zaid Nabulsi,
Andrew Sellergren,
Shahar Jamshy,
Charles Lau,
Edward Santos,
Atilla P. Kiraly,
Wenxing Ye,
Jie Yang,
Rory Pilgrim,
Sahar Kazemzadeh,
Jin Yu,
Sreenivasa Raju Kalidindi,
Mozziyar Etemadi,
Florencia Garcia-Vicente,
David Melnick,
Greg S. Corrado,
Lily Peng,
Krish Eswaran,
Daniel Tse,
Neeral Beladia,
Yun Liu,
Po-Hsuan Cameron Chen,
Shravya Shetty
Abstract:
Chest radiography (CXR) is the most widely-used thoracic clinical imaging modality and is crucial for guiding the management of cardiothoracic conditions. The detection of specific CXR findings has been the main focus of several artificial intelligence (AI) systems. However, the wide range of possible CXR abnormalities makes it impractical to build specific systems to detect every possible conditi…
▽ More
Chest radiography (CXR) is the most widely-used thoracic clinical imaging modality and is crucial for guiding the management of cardiothoracic conditions. The detection of specific CXR findings has been the main focus of several artificial intelligence (AI) systems. However, the wide range of possible CXR abnormalities makes it impractical to build specific systems to detect every possible condition. In this work, we developed and evaluated an AI system to classify CXRs as normal or abnormal. For development, we used a de-identified dataset of 248,445 patients from a multi-city hospital network in India. To assess generalizability, we evaluated our system using 6 international datasets from India, China, and the United States. Of these datasets, 4 focused on diseases that the AI was not trained to detect: 2 datasets with tuberculosis and 2 datasets with coronavirus disease 2019. Our results suggest that the AI system generalizes to new patient populations and abnormalities. In a simulated workflow where the AI system prioritized abnormal cases, the turnaround time for abnormal cases reduced by 7-28%. These results represent an important step towards evaluating whether AI can be safely used to flag cases in a general setting where previously unseen abnormalities exist.
△ Less
Submitted 29 October, 2021; v1 submitted 21 October, 2020;
originally announced October 2020.
-
BioInfoBase : A Bioinformatics Resourceome
Authors:
Saeid Kadkhodaei,
Fatemeh Barantalab,
Sima Taheri,
Majid Foroughi,
Farahnaz Golestan Hashemi,
Mahmood Reza Shabanimofrad,
Hossein Hosseinimonfared,
Morvarid Akhavan Rezaei,
Ali Ranjbarfard,
Mahbod Sahebi,
Parisa Azizi,
Maryam Dadar,
Rambod Abiri,
Mohammad Fazel Harighi,
Nahid Kalhori,
Mohammad Reza Etemadi,
Ali Baradaran,
Mahmoud Danaee,
Iman Zare,
Ahmad Ghafarpour,
Zahra Azhdari,
Hamid Rajabi Memari,
Vajiheh Safavi,
Naser Tajabadi,
Faruku Bande
Abstract:
Over the past decade there has been a significant growth in bioinformatics databases, tools and resources. Although, bioinformatics is becoming more specific, increasing the number of bioinformatics-wares has made it difficult for researchers to find the most appropriate databases, tools or methods which match their needs. Our coordinated effort has been planned to establish a reference website in…
▽ More
Over the past decade there has been a significant growth in bioinformatics databases, tools and resources. Although, bioinformatics is becoming more specific, increasing the number of bioinformatics-wares has made it difficult for researchers to find the most appropriate databases, tools or methods which match their needs. Our coordinated effort has been planned to establish a reference website in Bioinformatics as a public repository of tools, databases, directories and resources annotated with contextual information and organized by functional relevance. Within the first phase of BioInfoBase development, 22 experts in different fields of molecular biology contributed and more than 2500 records were registered, which are increasing daily. For each record submitted to the database of website almost all related data (40 features) has been extracted. These include information from the biological category and subcategory to the scientific article and developer information. Searching the query keyword(s) returns links containing the entered keyword(s) found within the different features of the records with more weights on the title, abstract and application fields. The search results simply provide the users with the most informative features of the records to select the most suitable ones. The usefulness of the returned results is ranked according to the matching score based on the Term Frequency-Inverse Document Frequency (TF-IDF) methods. Therefore, this search engine will screen a comprehensive index of bioinformatics tools, databases and resources and provide the best suited records (links) to the researchers need. The BioInfoBase resource is available at www.bioinfobase.info.
△ Less
Submitted 20 November, 2016; v1 submitted 11 July, 2016;
originally announced July 2016.