-
pVACview: an interactive visualization tool for efficient neoantigen prioritization and selection
Authors:
Huiming Xia,
My Hoang,
Evelyn Schmidt,
Susanna Kiwala,
Joshua McMichael,
Zachary L. Skidmore,
Bryan Fisk,
Jonathan J. Song,
Jasreet Hundal,
Thomas Mooney,
Jason R. Walker,
S. Peter Goedegebuure,
Christopher A. Miller,
William E. Gillanders,
Obi L. Griffith,
Malachi Griffith
Abstract:
Neoantigen targeting therapies including personalized vaccines have shown promise in the treatment of cancers. Accurate identification/prioritization of neoantigens is highly relevant to designing clinical trials, predicting treatment response, and understanding mechanisms of resistance. With the advent of massively parallel sequencing technologies, it is now possible to predict neoantigens based on patient-specific variant information. However, numerous factors must be considered when prioritizing neoantigens for use in personalized therapies. Complexities such as alternative transcript annotations, various binding, presentation and immunogenicity prediction algorithms, and variable peptide lengths/registers all potentially impact the neoantigen selection process. While computational tools generate numerous algorithmic predictions for neoantigen characterization, results from these pipelines are difficult to navigate and require extensive knowledge of the underlying tools for accurate interpretation. Due to the intricate nature and number of salient neoantigen features, presenting all relevant information to facilitate candidate selection for downstream applications is a difficult challenge that current tools fail to address. We have created pVACview, the first interactive tool designed to aid in the prioritization and selection of neoantigen candidates for personalized neoantigen therapies. pVACview has a user-friendly and intuitive interface where users can upload, explore, select and export their neoantigen candidates. The tool allows users to visualize candidates using variant, transcript and peptide information. pVACview will allow researchers to analyze and prioritize neoantigen candidates with greater efficiency and accuracy in basic and translational settings. The application is available as part of the pVACtools pipeline at pvactools.org and as an online server at pvacview.org.
Submitted 11 June, 2024;
originally announced June 2024.
-
A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding
Authors:
Yiqing Shen,
Zan Chen,
Michail Mamalakis,
Luhan He,
Haiyang Xia,
Tianbin Li,
Yanzhou Su,
Junjun He,
Yu Guang Wang
Abstract:
The parallels between protein sequences and natural language in their sequential structures have inspired the application of large language models (LLMs) to protein understanding. Despite the success of LLMs in NLP, their effectiveness in comprehending protein sequences remains an open question, largely due to the absence of datasets linking protein sequences to descriptive text. Researchers have therefore attempted to adapt LLMs for protein understanding by integrating a protein sequence encoder with a pre-trained LLM. However, this adaptation raises a fundamental question: "Can LLMs, originally designed for NLP, effectively comprehend protein sequences as a form of language?" Current datasets fall short in addressing this question due to the lack of a direct correlation between protein sequences and corresponding text descriptions, limiting the ability to train and evaluate LLMs for protein understanding effectively. To bridge this gap, we introduce ProteinLMDataset, a dataset specifically designed for further self-supervised pretraining and supervised fine-tuning (SFT) of LLMs to enhance their capability for protein sequence comprehension. Specifically, ProteinLMDataset includes 17.46 billion tokens for pretraining and 893,000 instructions for SFT. Additionally, we present ProteinLMBench, the first benchmark dataset consisting of 944 manually verified multiple-choice questions for assessing the protein understanding capabilities of LLMs. ProteinLMBench incorporates protein-related details and sequences in multiple languages, establishing a new standard for evaluating LLMs' abilities in protein comprehension. The large language model InternLM2-7B, pretrained and fine-tuned on the ProteinLMDataset, outperforms GPT-4 on ProteinLMBench, achieving the highest accuracy score.
Submitted 8 July, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
-
Interpretable (not just posthoc-explainable) heterogeneous survivor bias-corrected treatment effects for assignment of postdischarge interventions to prevent readmissions
Authors:
Hongjing Xia,
Joshua C. Chang,
Sarah Nowak,
Sonya Mahajan,
Rohit Mahajan,
Ted L. Chang,
Carson C. Chow
Abstract:
We used survival analysis to quantify the impact of postdischarge evaluation and management (E/M) services in preventing hospital readmission or death. Our approach avoids a specific pitfall of applying machine learning to this problem: an inflated estimate of the effect of interventions due to survivor bias, where the magnitude of inflation may be conditional on heterogeneous confounders in the population. This bias arises simply because, in order to receive an intervention after discharge, a person must not have been readmitted in the intervening period. After deriving an expression for this phantom effect, we controlled for this and other biases within an inherently interpretable Bayesian survival framework. We identified case management services as the most impactful for reducing readmissions overall.
Submitted 3 August, 2023; v1 submitted 19 April, 2023;
originally announced April 2023.
-
Tuning to non-veridical features in attention and perceptual decision-making
Authors:
Stefanie I. Becker,
Zachary Hamblin-Frohman,
Hongfeng Xia,
Zeguo Qiu
Abstract:
When searching for a lost item, we tune attention to the known properties of the object. Previously, it was believed that attention is tuned to the veridical attributes of the search target (e.g., orange), or an attribute that is slightly shifted away from irrelevant features towards a value that can more optimally distinguish the target from the distractors (e.g., red-orange; optimal tuning). However, recent studies showed that attention is often tuned to the relative feature of the search target (e.g., redder), so that all items that match the relative features of the target equally attract attention (e.g., all redder items; relational account). Optimal tuning was shown to occur only at a later stage of identifying the target. However, the evidence for this division mainly relied on eye tracking studies that assessed the first eye movements. The present study tested whether this division can also be observed when the task is completed with covert attention and without moving the eyes. We used the N2pc in the EEG of participants to assess covert attention, and found comparable results: Attention was initially tuned to the relative colour of the target, as shown by a significantly larger N2pc to relatively matching distractors than a target-coloured distractor. However, in the response accuracies, a slightly shifted, "optimal" distractor interfered most strongly with target identification. These results confirm that early (covert) attention is tuned to the relative properties of an item, in line with the relational account, while later decision-making processes may be biased to optimal features.
Submitted 28 February, 2023;
originally announced March 2023.
-
MODMA dataset: a Multi-modal Open Dataset for Mental-disorder Analysis
Authors:
Hanshu Cai,
Yiwen Gao,
Shuting Sun,
Na Li,
Fuze Tian,
Han Xiao,
Jianxiu Li,
Zhengwu Yang,
Xiaowei Li,
Qinglin Zhao,
Zhenyu Liu,
Zhijun Yao,
Minqiang Yang,
Hong Peng,
Jing Zhu,
Xiaowei Zhang,
Guoping Gao,
Fang Zheng,
Rui Li,
Zhihua Guo,
Rong Ma,
Jing Yang,
Lan Zhang,
Xiping Hu,
Yumin Li
et al. (1 additional author not shown)
Abstract:
According to the World Health Organization, the number of patients with mental disorders, especially depression, has grown rapidly, and mental disorders have become a leading contributor to the global burden of disease. However, the current common practice of depression diagnosis relies on interviews and clinical scales administered by doctors, which is both labor-intensive and time-consuming. One important reason is the lack of physiological indicators for mental disorders. With the rise of tools such as data mining and artificial intelligence, using physiological data to explore new physiological indicators of mental disorders and to create new applications for their diagnosis has become an active research topic. However, good-quality physiological data from patients with mental disorders are hard to acquire. We present a multi-modal open dataset for mental-disorder analysis. The dataset includes EEG and audio data from clinically depressed patients and matched normal controls. All patients were carefully diagnosed and selected by professional psychiatrists in hospitals. The EEG data were collected not only with a traditional 128-electrode elastic cap but also with a novel wearable 3-electrode EEG collector for pervasive applications. The 128-electrode EEG signals of 53 subjects were recorded both in the resting state and under stimulation; the 3-electrode EEG signals of 55 subjects were recorded in the resting state; and the audio data of 52 subjects were recorded during interviewing, reading, and picture description. We encourage other researchers in the field to use the dataset to test their methods of mental-disorder analysis.
Submitted 4 March, 2020; v1 submitted 20 February, 2020;
originally announced February 2020.
-
Unifying Modular and Core-Periphery Structure in Functional Brain Networks over Development
Authors:
Shi Gu,
Cedric Huchuan Xia,
Rastko Ciric,
Tyler M. Moore,
Ruben C. Gur,
Raquel E. Gur,
Theodore D. Satterthwaite,
Danielle S. Bassett
Abstract:
At rest, human brain functional networks display striking modular architecture in which coherent clusters of brain regions are activated. The modular account of brain function is pervasive, reliable, and reproducible. Yet, a complementary perspective posits a core-periphery or rich-club account of brain function, where hubs are densely interconnected with one another, allowing for integrative processing. Unifying these two perspectives has remained difficult due to the fact that the methodological tools to identify modules are entirely distinct from the methodological tools to identify core-periphery structure. Here we leverage a recently-developed model-based approach -- the weighted stochastic block model -- that simultaneously uncovers modular and core-periphery structure, and we apply it to fMRI data acquired at rest in 872 youth of the Philadelphia Neurodevelopmental Cohort. We demonstrate that functional brain networks display rich meso-scale organization beyond that sought by modularity maximization techniques. Moreover, we show that this meso-scale organization changes appreciably over the course of neurodevelopment, and that individual differences in this organization predict individual differences in cognition more accurately than module organization alone. Broadly, our study provides a unified assessment of modular and core-periphery structure in functional brain networks, providing novel insights into their development and implications for behavior.
Submitted 4 April, 2019; v1 submitted 30 March, 2019;
originally announced April 2019.
-
Reconstruction of the evolutionary history of gene gains and losses since the last universal common ancestor
Authors:
Haiming Tang,
Paul Thomas,
Haoran Xia
Abstract:
Gene gains and losses have shaped the gene repertoire of species from the last universal common ancestor to the species of today. Genes in extant species were gained at different historical times via de novo creation of new genes, duplication of existing genes, or transfer from another species (horizontal gene transfer, HGT), and were gradually lost. With the increasing number of sequenced genomes, some comparative analyses have quantified the evolutionary history of gene gains and losses in restricted lineages such as vertebrates, insects, fungi, and plants. Here, we have constructed and analyzed over 10,000 gene family trees to reconstruct the gene content of ancestral genomes at an unprecedented scale, covering hundreds of genomes across all domains of life. This is the most comprehensive genome-wide analysis of all events in gene evolutionary histories. We find that our results are largely consistent with earlier, less complete comparative studies of specific lineages such as the vertebrates, but differ significantly, especially in recent evolutionary history. We find that the rate of gene gain varies widely among branches of the species tree, and that some periods of rapid gene duplication are associated with mass extinctions in geological history.
Submitted 16 February, 2018;
originally announced February 2018.
-
Atomistic simulation of the coupled adsorption and unfolding of protein GB1 on the polystyrene nanoparticle surface
Authors:
Huifang Xiao,
Bin Huang,
Ge Yao,
Wenbin Kang,
Sheng Gong,
Hai Pan,
Yi Cao,
Jun Wang,
Jian Zhang,
Wei Wang
Abstract:
Understanding protein adsorption and desorption on nanoparticle surfaces is important for developing new nanotechnologies involving biomaterials, yet an atomistic picture of the process and its coupling with protein conformational change is lacking. Here we report a study of the adsorption of protein GB1 on a polystyrene nanoparticle surface using atomistic molecular dynamics simulations. Enabled by metadynamics, we explored the relevant phase space and identified three protein states, each of which included both adsorbed and desorbed states. We also studied the changes in the secondary and tertiary structure of GB1 during adsorption, and the dominant interactions between the protein and the surface at different adsorption stages. From the simulation results, we obtained an adsorption scenario that is more rational and complete than the conventional one. We believe the new scenario is more appropriate as a theoretical model for understanding and explaining experimental signals.
Submitted 9 November, 2017;
originally announced November 2017.
-
Mechanical analysis of pimple growth and pain level characterization
Authors:
Xiangbiao Liao,
Xiaobin Deng,
LiangLiang Zhu,
Feng Hao,
Hang Xiao,
Xiaoyang Shi,
Xi Chen
Abstract:
Pimples are among the most common human skin conditions, yet mechanical modeling of pimple growth has been very limited. A finite element model is developed to quantify the deformation field as the follicle expands, and the resulting mechanical stimulus is then related to the sensation of pain during pimple development. Using these models, parametric studies show how the mechanical stimulus and pain level depend on the structures surrounding the pimple, the follicle depth, and the mechanical properties of the epidermis. The findings in this paper may provide useful insights into the prevention or pain relief of pimples, as well as into cosmetics and other forms of tissue growth.
Submitted 29 March, 2017;
originally announced March 2017.