Search | arXiv e-print repository

Advancing Multimodal Medical Capabilities of Gemini

Authors: Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng, et al. (22 additional authors not shown)

Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding SoTA or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches task-specific model performance.
Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2310.13259 [pdf]

Domain-specific optimization and diverse evaluation of self-supervised models for histopathology

Authors: Jeremy Lai, Faruk Ahmed, Supriya Vijay, Tiam Jaroensri, Jessica Loo, Saurabh Vyawahare, Saloni Agarwal, Fayaz Jamil, Yossi Matias, Greg S. Corrado, Dale R. Webster, Jonathan Krause, Yun Liu, Po-Hsuan Cameron Chen, Ellery Wulczyn, David F. Steiner

Abstract: Task-specific deep learning models in histopathology offer promising opportunities for improving diagnosis, clinical research, and precision medicine. However, development of such models is often limited by availability of high-quality data. Foundation models in histopathology that learn general representations across a wide range of tissue types, diagnoses, and magnifications offer the potential to reduce the data, compute, and technical expertise necessary to develop task-specific deep learning models with the required level of model performance. In this work, we describe the development and evaluation of foundation models for histopathology via self-supervised learning (SSL). We first establish a diverse set of benchmark tasks involving 17 unique tissue types and 12 unique cancer types and spanning different optimal magnifications and task types. Next, we use this benchmark to explore and evaluate histopathology-specific SSL methods followed by further evaluation on held out patch-level and weakly supervised tasks. We found that standard SSL methods thoughtfully applied to histopathology images are performant across our benchmark tasks and that domain-specific methodological improvements can further increase performance. Our findings reinforce the value of using domain-specific SSL methods in pathology, and establish a set of high quality foundation models to enable further research across diverse applications.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 4 main tables, 3 main figures, additional supplemental tables and figures

arXiv:2301.03551 [pdf, other]

doi 10.1016/j.iot.2023.100691

A Lightweight Blockchain and Fog-enabled Secure Remote Patient Monitoring System

Authors: Omar Cheikhrouhou, Khaleel Mershad, Faisal Jamil, Redowan Mahmud, Anis Koubaa, Sanaz Rahimi Moosavi

Abstract: IoT has enabled the rapid growth of smart remote healthcare applications. These IoT-based remote healthcare applications deliver fast and preventive medical services to patients at risk or with chronic diseases. However, ensuring data security and patient privacy while exchanging sensitive medical data among medical IoT devices is still a significant concern in remote healthcare applications. Altered or corrupted medical data may cause wrong treatment and create grave health issues for patients. Moreover, current remote medical applications' efficiency and response time need to be addressed and improved. Considering the need for secure and efficient patient care, this paper proposes a lightweight Blockchain-based and Fog-enabled remote patient monitoring system that provides a high level of security and efficient response time. Simulation results and security analysis show that the proposed lightweight blockchain architecture fits the resource-constrained IoT devices well and is secure against attacks. Moreover, the augmentation of Fog computing improved the responsiveness of the remote patient monitoring system by 40%.

Submitted 9 January, 2023; originally announced January 2023.

Comments: 32 pages, 13 figures, 5 tables, accepted by Elsevier "Internet of Things; Engineering Cyber Physical Human Systems" journal on January 9, 2023

arXiv:2207.11504 [pdf]

Intelligent 3D Network Protocol for Multimedia Data Classification using Deep Learning

Authors: Arslan Syed, Eman A. Aldhahri, Muhammad Munawar Iqbal, Abid Ali, Ammar Muthanna, Harun Jamil, Faisal Jamil

Abstract: In videos, human actions are three-dimensional (3D) signals that carry spatiotemporal information about human behavior. This information has been explored using 3D convolutional neural networks (CNNs). However, 3D CNNs have not yet matched the performance of their well-established two-dimensional (2D) counterparts on still images: difficulties in training 3D convolutional memory and spatiotemporal fusion prevent 3D CNNs from achieving strong results. In this paper, we implement a hybrid deep learning architecture that combines STIP and 3D CNN features to improve performance on 3D video. The implementation yields more detailed and deeper representations for training in each cycle of space-time fusion, and the trained model further improves results on complex evaluations. A video classification model is used within this architecture. The Intelligent 3D Network Protocol for Multimedia Data Classification using Deep Learning is introduced to better capture space-time associations in human activities. The performance of the proposed hybrid technique is evaluated on the well-known UCF101 dataset, where it substantially outperforms the baseline 3D CNNs. The results are also compared with state-of-the-art action recognition frameworks from the literature on UCF101, with an accuracy of 95%.

Submitted 23 July, 2022; originally announced July 2022.

Comments: 21 pages, 10 figures

ACM Class: I.2.11; H.4; C.2.2

arXiv:2109.01202 [pdf, other]

doi 10.1145/3472749.3474768

NavStick: Making Video Games Blind-Accessible via the Ability to Look Around

Authors: Vishnu Nair, Jay L. Karp, Samuel Silverman, Mohar Kalra, Hollis Lehv, Faizan Jamil, Brian A. Smith

Abstract: Video games remain largely inaccessible to visually impaired people (VIPs). Today's blind-accessible games are highly simplified renditions of what sighted players enjoy, and they do not give VIPs the same freedom to look around and explore game worlds on their own terms. In this work, we introduce NavStick, an audio-based tool for looking around within virtual environments, with the aim of making 3D adventure video games more blind-accessible. NavStick repurposes a game controller's thumbstick to allow VIPs to survey what is around them via line-of-sight. In a user study, we compare NavStick with traditional menu-based surveying for different navigation tasks and find that VIPs were able to form more accurate mental maps of their environment with NavStick than with menu-based surveying. In an additional exploratory study, we investigate NavStick in the context of a representative 3D adventure game. Our findings reveal several implications for blind-accessible games, and we close by discussing these.

Submitted 2 September, 2021; originally announced September 2021.

Comments: 14 pages with 13 figures and 2 tables

Journal ref: Proceedings of the 34th Annual ACM Symposium on User Interface Software and Technology (UIST '21), October 2021

Showing 1–5 of 5 results for author: Jamil, F