GatorTron: A Large Clinical Language Model to Unlock Patient Information from Unstructured Electronic Health Records
Authors:
Xi Yang,
Aokun Chen,
Nima PourNejatian,
Hoo Chang Shin,
Kaleb E Smith,
Christopher Parisien,
Colin Compas,
Cheryl Martin,
Mona G Flores,
Ying Zhang,
Tanja Magoc,
Christopher A Harle,
Gloria Lipori,
Duane A Mitchell,
William R Hogan,
Elizabeth A Shenkman,
Jiang Bian,
Yonghui Wu
Abstract:
There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, the largest of which trained in the clinical domain is compar…
▽ More
There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, the largest of which trained in the clinical domain is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model - GatorTron - using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on 5 clinical NLP tasks including clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data could benefit these NLP tasks. GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve 5 clinical NLP tasks (e.g., 9.6% and 9.5% improvement in accuracy for NLI and MQA), which can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og.
△ Less
Submitted 16 December, 2022; v1 submitted 2 February, 2022;
originally announced March 2022.
Federated Learning for Breast Density Classification: A Real-World Implementation
Authors:
Holger R. Roth,
Ken Chang,
Praveer Singh,
Nir Neumark,
Wenqi Li,
Vikash Gupta,
Sharut Gupta,
Liangqiong Qu,
Alvin Ihsani,
Bernardo C. Bizzo,
Yuhong Wen,
Varun Buch,
Meesam Shah,
Felipe Kitamura,
Matheus Mendonça,
Vitor Lavor,
Ahmed Harouni,
Colin Compas,
Jesse Tetreault,
Prerna Dogra,
Yan Cheng,
Selnur Erdal,
Richard White,
Behrooz Hashemian,
Thomas Schultz
, et al. (18 additional authors not shown)
Abstract:
Building robust deep learning-based models requires large quantities of diverse training data. In this study, we investigate the use of federated learning (FL) to build medical imaging classification models in a real-world collaborative setting. Seven clinical institutions from across the world joined this FL effort to train a model for breast density classification based on Breast Imaging, Report…
▽ More
Building robust deep learning-based models requires large quantities of diverse training data. In this study, we investigate the use of federated learning (FL) to build medical imaging classification models in a real-world collaborative setting. Seven clinical institutions from across the world joined this FL effort to train a model for breast density classification based on Breast Imaging, Reporting & Data System (BI-RADS). We show that despite substantial differences among the datasets from all sites (mammography system, class distribution, and data set size) and without centralizing data, we can successfully train AI models in federation. The results show that models trained using FL perform 6.3% on average better than their counterparts trained on an institute's local data alone. Furthermore, we show a 45.8% relative improvement in the models' generalizability when evaluated on the other participating sites' testing data.
△ Less
Submitted 20 October, 2020; v1 submitted 3 September, 2020;
originally announced September 2020.