Search | arXiv e-print repository

Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT

Authors: Jingye Yang, Cong Liu, Wendy Deng, Da Wu, Chunhua Weng, Yunyun Zhou, Kai Wang

Abstract: We hypothesize that large language models (LLMs) based on the transformer architecture can enable automated detection of clinical phenotype terms, including terms not documented in the HPO. In this study, we developed two types of models: PhenoBCBERT, a BERT-based model, utilizing Bio+Clinical BERT as its pre-trained model, and PhenoGPT, a GPT-based model that can be initialized from diverse GPT models, including open-source versions such as GPT-J, Falcon, and LLaMA, as well as closed-source versions such as GPT-3 and GPT-3.5. We compared our methods with PhenoTagger, a recently developed HPO recognition tool that combines rule-based and deep learning methods. We found that our methods can extract more phenotype concepts, including novel ones not characterized by HPO. We also performed case studies on biomedical literature to illustrate how new phenotype information can be recognized and extracted. We compared current BERT-based versus GPT-based models for phenotype tagging, in multiple aspects including model architecture, memory usage, speed, accuracy, and privacy protection. We also discussed the addition of a negation step and an HPO normalization layer to the transformer models for improved HPO term tagging. In conclusion, PhenoBCBERT and PhenoGPT enable the automated discovery of phenotype terms from clinical notes and biomedical literature, facilitating automated downstream tasks to derive new biological insights on human diseases.

Submitted 9 November, 2023; v1 submitted 10 August, 2023; originally announced August 2023.
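To make the tagging and negation steps concrete: a BERT-based tagger labels individual tokens, and those labels must be grouped into phenotype spans before a negation step can act on them. The Python sketch below is illustrative only and is not the authors' code. It assumes a BIO tagging scheme, and the `B-HP`/`I-HP` labels, the cue list, and the five-token look-back window are all hypothetical choices; real negation detectors (e.g. NegEx-style rules) are considerably richer.

```python
from dataclasses import dataclass

# Hypothetical negation cues and look-back window; real detectors use richer rules.
NEGATION_CUES = {"no", "denies", "without", "negative"}
CUE_WINDOW = 5  # tokens to search before the start of a span


@dataclass
class Span:
    start: int  # index of the first token in the span
    end: int    # index one past the last token
    text: str
    negated: bool = False


def decode_bio(tokens: list[str], tags: list[str]) -> list[Span]:
    """Group B-/I- tags into contiguous spans. An I- tag that does not follow a
    span is treated as the start of a new span (a lenient convention)."""
    spans: list[Span] = []
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if start is not None and not tag.startswith("I-"):
            spans.append(Span(start, i, " ".join(tokens[start:i])))
            start = None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start = i
    return spans


def mark_negation(tokens: list[str], spans: list[Span]) -> list[Span]:
    """Flag a span as negated if a cue appears within CUE_WINDOW tokens before it."""
    for span in spans:
        window = tokens[max(0, span.start - CUE_WINDOW):span.start]
        span.negated = any(tok.lower() in NEGATION_CUES for tok in window)
    return spans


tokens = "Patient has short stature but no seizures".split()
tags = ["O", "O", "B-HP", "I-HP", "O", "O", "B-HP"]
for span in mark_negation(tokens, decode_bio(tokens, tags)):
    print(span.text, "negated" if span.negated else "present")
# short stature present
# seizures negated
```

A fixed window is deliberately crude: it ignores scope terminators such as "but", which rule-based negation detectors typically handle explicitly.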

arXiv:2307.10181

Community-Aware Transformer for Autism Prediction in fMRI Connectome

Authors: Anushree Bannadabhavi, Soojin Lee, Wenlong Deng, Xiaoxiao Li

Abstract: Autism spectrum disorder (ASD) is a lifelong neurodevelopmental condition that affects social communication and behavior. Investigating the brain functional connectome derived from functional magnetic resonance imaging (fMRI) can aid in the understanding and diagnosis of ASD, leading to more effective treatments. The brain is modeled as a network of brain Regions of Interest (ROIs), and ROIs form communities; knowledge of these communities is crucial for ASD diagnosis. On the one hand, Transformer-based models have proven to be highly effective across several tasks, including fMRI connectome analysis, where they learn useful representations of ROIs. On the other hand, existing transformer-based models treat all ROIs equally and overlook the impact of community-specific associations when learning node embeddings. To fill this gap, we propose a novel method, Com-BrainTF, a hierarchical local-global transformer architecture that learns intra- and inter-community-aware node embeddings for the ASD prediction task. Furthermore, we avoid over-parameterization by sharing the local transformer parameters across communities while optimizing a unique learnable prompt token for each community. Our model outperforms state-of-the-art (SOTA) architectures on the ABIDE dataset and has high interpretability, as evident from the attention module. Our code is available at https://github.com/ubc-tea/Com-BrainTF.

Submitted 24 June, 2023; originally announced July 2023.

Comments: Accepted by 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023)
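The parameter-sharing idea described in the Com-BrainTF abstract (one local transformer reused by every community, a separate learnable prompt token per community, then global attention across all ROIs) can be sketched as a NumPy forward pass. This is a simplified, hypothetical sketch, not the released code: it assumes single-head attention without layer normalization or feed-forward blocks, treats each row of an ROI connectivity matrix as that ROI's feature vector, and uses a mean-pool plus linear readout. The function names and parameter layout are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(z, wq, wk, wv):
    """Single-head scaled dot-product self-attention plus a residual connection."""
    q, k, v = z @ wq, z @ wk, z @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return z + softmax(scores) @ v


def init_params(d, n_communities, rng):
    s = 1.0 / np.sqrt(d)
    return {
        "local": [rng.normal(0.0, s, (d, d)) for _ in range(3)],   # shared by all communities
        "prompts": rng.normal(0.0, s, (n_communities, d)),         # one prompt token per community
        "global": [rng.normal(0.0, s, (d, d)) for _ in range(3)],  # inter-community attention
        "readout": rng.normal(0.0, s, d),
    }


def forward(x, communities, params):
    """x: (n_rois, d) ROI features; communities: index arrays that partition the ROIs.
    Returns a sigmoid score for the positive (ASD) class."""
    h = np.zeros_like(x)
    for c, idx in enumerate(communities):
        z = np.vstack([params["prompts"][c], x[idx]])     # prompt row + this community's ROIs
        h[idx] = self_attention(z, *params["local"])[1:]  # same weights for every community
    h = self_attention(h, *params["global"])              # attention across all ROIs
    logit = h.mean(axis=0) @ params["readout"]            # mean-pool ROIs, linear readout
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
n_rois = 8
x = rng.normal(size=(n_rois, n_rois))             # stand-in for a connectivity matrix
communities = [np.arange(0, 3), np.arange(3, 8)]  # must partition the ROIs
params = init_params(n_rois, len(communities), rng)
print(forward(x, communities, params))            # a score in (0, 1); weights are untrained
```

In this sketch the shared local weights contribute 3d² parameters no matter how many communities there are; each additional community adds only a d-dimensional prompt vector.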

arXiv:2107.11740

Identifying the fragment structure of the organic compounds by deeply learning the original NMR data

Authors: Chongcan Li, Yong Cong, Weihua Deng

Abstract: We preprocess the raw NMR spectrum and extract key characteristic features using two different methodologies, called equidistant sampling and peak sampling, for subsequent substructure pattern recognition; this may also provide an alternative strategy for addressing the imbalance of NMR datasets frequently encountered when collecting data for statistical modeling. We establish two conventional models, SVM and KNN, to assess the capability of the two feature-selection methods. Our results show that the models using features selected by peak sampling outperform those using equidistant sampling. We then build a Recurrent Neural Network (RNN) model trained on Data B, collected by peak sampling. Furthermore, by detailed comparison with traditional SVM and KNN machine learning models, we show that the RNN deep learning model offers easier hyperparameter optimization and better generalization.

Submitted 25 July, 2021; originally announced July 2021.

Comments: 12 pages, 8 figures
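The two feature-extraction strategies can be illustrated on a synthetic one-dimensional spectrum. The abstract does not define them precisely, so this Python sketch assumes that equidistant sampling keeps intensities at evenly spaced chemical-shift positions, while peak sampling keeps the chemical shifts of the strongest local maxima, zero-padded to a fixed length so every spectrum yields a feature vector of the same size. The function names, the `min_height` threshold, and the synthetic line positions are hypothetical.

```python
import numpy as np


def equidistant_sampling(spectrum: np.ndarray, n_features: int) -> np.ndarray:
    """Intensities at n_features evenly spaced points along the spectrum."""
    idx = np.linspace(0, len(spectrum) - 1, n_features).round().astype(int)
    return spectrum[idx]


def peak_sampling(shifts: np.ndarray, spectrum: np.ndarray, n_features: int,
                  min_height: float = 0.0) -> np.ndarray:
    """Chemical shifts of the n_features most intense local maxima, sorted by
    shift and zero-padded when fewer peaks are found."""
    mid = spectrum[1:-1]
    is_peak = (mid > spectrum[:-2]) & (mid >= spectrum[2:]) & (mid > min_height)
    peaks = np.flatnonzero(is_peak) + 1  # +1 maps back to indices in `spectrum`
    strongest = peaks[np.argsort(spectrum[peaks])[::-1][:n_features]]
    features = np.zeros(n_features)
    features[: len(strongest)] = np.sort(shifts[strongest])
    return features


# Synthetic spectrum: three Gaussian lines (positions and heights are made up).
shifts = np.linspace(0.0, 10.0, 1001)
spectrum = sum(h * np.exp(-((shifts - c) ** 2) / (2 * 0.05 ** 2))
               for c, h in [(1.0, 1.0), (2.5, 0.6), (7.2, 0.8)])
print(equidistant_sampling(spectrum, 11))
print(peak_sampling(shifts, spectrum, 2, min_height=0.1))  # the two strongest lines
```

One practical difference shows up immediately: equidistant sampling can step over a narrow line entirely (here the line at 2.5 falls between grid points), whereas peak sampling records every line that clears the threshold.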

arXiv:2004.09775

doi 10.1103/PhysRevE.101.062127

Lévy walk dynamics in an external harmonic potential

Authors: Pengbo Xu, Tian Zhou, Ralf Metzler, Weihua Deng

Abstract: Lévy walks (LWs) are spatiotemporally coupled random-walk processes describing superdiffusive heat conduction in solids, propagation of light in disordered optical materials, motion of molecular motors in living cells, or motion of animals, humans, robots, and viruses. We here investigate a key feature of LWs, their response to an external harmonic potential. In this generic setting for confined motion we demonstrate that LWs equilibrate exponentially and may assume a bimodal stationary distribution. We also show that the stationary distribution has a horizontal slope next to a reflecting boundary placed at the origin, in contrast to correlated superdiffusive processes. Our results generalize LWs to confining forces and settle some long-standing puzzles around LWs.

Submitted 21 April, 2020; originally announced April 2020.

Comments: 13 pages, 5 figures, RevTeX

Journal ref: Phys. Rev. E 101, 062127 (2020)
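A Lévy walk in a harmonic trap can be simulated exactly, one flight at a time. This sketch assumes one common overdamped formulation, dx/dt = v(t) - kx, where v(t) = +v0 or -v0 is held fixed for a Pareto-distributed flight time with tail P(τ > s) = (τ0/s)^α; this may differ from the model analysed in the paper. Within a flight starting at time t_s and position x_s, the solution is x(t) = v/k + (x_s - v/k)e^{-k(t - t_s)}, so the particle relaxes exponentially toward ±v0/k, and long flights push probability toward those two points. The function name and all parameter values are illustrative.

```python
import numpy as np


def levy_walk_harmonic(t_obs, v0=1.0, k=1.0, alpha=1.5, tau0=1.0, x0=0.0, rng=None):
    """Positions of one trajectory at the sorted, non-negative times t_obs,
    assuming dx/dt = v(t) - k*x with v(t) = +/-v0 held constant during each flight."""
    rng = np.random.default_rng() if rng is None else rng
    t_obs = np.asarray(t_obs, dtype=float)
    out = np.empty_like(t_obs)
    t, x, j = 0.0, x0, 0
    while j < len(t_obs):
        # Pareto flight time: P(tau > s) = (tau0 / s) ** alpha for s >= tau0.
        tau = tau0 * (1.0 - rng.random()) ** (-1.0 / alpha)
        v = v0 if rng.random() < 0.5 else -v0  # random flight direction
        target = v / k  # fixed point of dx/dt = v - k*x during this flight
        # Exact solution inside the flight for every observation time it covers.
        while j < len(t_obs) and t_obs[j] <= t + tau:
            out[j] = target + (x - target) * np.exp(-k * (t_obs[j] - t))
            j += 1
        x = target + (x - target) * np.exp(-k * tau)  # position at the end of the flight
        t += tau
    return out


rng = np.random.default_rng(1)
samples = np.array([levy_walk_harmonic([100.0], rng=rng)[0] for _ in range(1000)])
hist, _ = np.histogram(samples, bins=10, range=(-1.0, 1.0))
print(hist)  # for these parameters, counts tend to concentrate in the outer bins
```

Because every update is a convex combination of the current position and ±v0/k, a trajectory started inside [-v0/k, v0/k] never leaves that interval, which is a convenient sanity check on the simulation.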

Showing 1–4 of 4 results for author: Deng, W