Search | arXiv e-print repository

GV-Rep: A Large-Scale Dataset for Genetic Variant Representation Learning

Authors: Zehui Li, Vallijah Subasri, Guy-Bart Stan, Yiren Zhao, Bo Wang

Abstract: Genetic variants (GVs) are defined as differences in the DNA sequences among individuals and play a crucial role in diagnosing and treating genetic diseases. The rapid decrease in next-generation sequencing cost has led to an exponential increase in patient-level GV data. This growth poses a challenge for clinicians who must efficiently prioritize patient-specific GVs and integrate them with existing genomic databases to inform patient management. To address the interpretation of GVs, genomic foundation models (GFMs) have emerged. However, these models lack standardized performance assessments, leading to considerable variability in model evaluations. This poses the question: How effectively do deep learning methods classify unknown GVs and align them with clinically-verified GVs? We argue that representation learning, which transforms raw data into meaningful feature spaces, is an effective approach for addressing both indexing and classification challenges. We introduce a large-scale Genetic Variant dataset, named GV-Rep, featuring variable-length contexts and detailed annotations, designed for deep learning models to learn GV representations across various traits, diseases, tissue types, and experimental contexts.
Our contributions are three-fold: (i) Construction of a comprehensive dataset with 7 million records, each labeled with characteristics of the corresponding variants, alongside additional data from 17,548 gene knockout tests across 1,107 cell types, 1,808 variant combinations, and 156 unique clinically verified GVs from real-world patients. (ii) Analysis of the structure and properties of the dataset. (iii) Experimentation on the dataset with pre-trained GFMs. The results show a significant gap between GFMs' current capabilities and accurate GV representation. We hope this dataset will help advance genomic deep learning to bridge this gap.

Submitted 23 July, 2024; originally announced July 2024.

Comments: Preprint

arXiv:2406.09159 [pdf]

ALPHAGMUT: A Rationale-Guided Alpha Shape Graph Neural Network to Evaluate Mutation Effects

Authors: Boshen Wang, Bowei Ye, Lin Xu, Jie Liang

Abstract: In silico methods evaluating the effects of missense mutations provide an important approach for understanding mutations in personal genomes and identifying disease-relevant biomarkers. However, existing methods, including deep learning methods, heavily rely on sequence-aware information, and do not fully leverage the potential of available 3D structural information. In addition, these methods may be unable to predict mutations in domains for which sequence-based embeddings are difficult to formulate. In this study, we introduce a novel rationale-guided graph neural network AlphaGMut to evaluate mutation effects and to distinguish pathogenic mutations from neutral mutations. We compute the alpha shapes of protein structures to obtain atomic-resolution edge connectivities and map them to an accurate residue-level graph representation. We then compute structural, topological, biophysical, and sequence properties of the mutation sites, which are assigned as node attributes in the graph. These node attributes could effectively guide the graph neural network to learn the difference between pathogenic and neutral mutations using k-hop message passing with a short training period. We demonstrate that AlphaGMut outperforms state-of-the-art methods, including DeepMind's AlphaMissense, in many performance metrics. In addition, AlphaGMut has the advantage of performing well in alignment-free settings, which provides broader prediction coverage and better generalization compared to current methods requiring deep sequence-aware information.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 2 figures, 2 tables

arXiv:2404.02360 [pdf, other]

FraGNNet: A Deep Probabilistic Model for Mass Spectrum Prediction

Authors: Adamo Young, Fei Wang, David Wishart, Bo Wang, Hannes Röst, Russ Greiner

Abstract: The process of identifying a compound from its mass spectrum is a critical step in the analysis of complex mixtures. Typical solutions for the mass spectrum to compound (MS2C) problem involve matching the unknown spectrum against a library of known spectrum-molecule pairs, an approach that is limited by incomplete library coverage. Compound to mass spectrum (C2MS) models can improve retrieval rates by augmenting real libraries with predicted spectra. Unfortunately, many existing C2MS models suffer from problems with prediction resolution, scalability, or interpretability. We develop a new probabilistic method for C2MS prediction, FraGNNet, that can efficiently and accurately predict high-resolution spectra. FraGNNet uses a structured latent space to provide insight into the underlying processes that define the spectrum. Our model achieves state-of-the-art performance in terms of prediction error, and surpasses existing C2MS models as a tool for retrieval-based MS2C.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 21 pages, 4 figures, 9 tables

arXiv:2403.03425 [pdf, other]

Sculpting Molecules in 3D: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization

Authors: Kaiwei Zhang, Yange Lin, Guangcheng Wu, Yuxiang Ren, Xuecang Zhang, Bo Wang, Xiaoyu Zhang, Weitao Du

Abstract: The integration of deep learning, particularly AI-Generated Content, with high-quality data derived from ab initio calculations has emerged as a promising avenue for transforming the landscape of scientific research. However, the challenge of designing molecular drugs or materials that incorporate multi-modality prior knowledge remains a critical and complex undertaking. Specifically, achieving a practical molecular design necessitates not only meeting the diversity requirements but also addressing structural and textual constraints with various symmetries outlined by domain experts. In this article, we present an innovative approach to tackle this inverse design problem by formulating it as a multi-modality guidance generation/optimization task. Our proposed solution involves a textual-structure alignment symmetric diffusion framework for the implementation of molecular generation/optimization tasks, namely 3DToMolo. 3DToMolo aims to harmonize diverse modalities, aligning them seamlessly to produce molecular structures that adhere to the symmetric structural and textual constraints specified by experts in the field. Experimental trials across three guidance generation settings have shown a superior hit generation performance compared to state-of-the-art methodologies. Moreover, 3DToMolo demonstrates the capability to generate novel molecules, incorporating specified target substructures, without the need for prior knowledge.
This work not only holds general significance for the advancement of deep learning methodologies but also paves the way for a transformative shift in molecular design strategies. 3DToMolo creates opportunities for a more nuanced and effective exploration of the vast chemical space, opening new frontiers in the development of molecular entities with tailored properties and functionalities.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2401.07937 [pdf, other]

Integrate Any Omics: Towards genome-wide data integration for patient stratification

Authors: Shihao Ma, Andy G. X. Zeng, Benjamin Haibe-Kains, Anna Goldenberg, John E Dick, Bo Wang

Abstract: High-throughput omics profiling advancements have greatly enhanced cancer patient stratification. However, incomplete data in multi-omics integration presents a significant challenge, as traditional methods like sample exclusion or imputation often compromise biological diversity and dependencies. Furthermore, the critical task of accurately classifying new patients with partial omics data into existing subtypes is commonly overlooked. To address these issues, we introduce IntegrAO (Integrate Any Omics), an unsupervised framework for integrating incomplete multi-omics data and classifying new samples. IntegrAO first combines partially overlapping patient graphs from diverse omics sources and utilizes graph neural networks to produce unified patient embeddings. Our systematic evaluation across five cancer cohorts involving six omics modalities demonstrates IntegrAO's robustness to missing data and its accuracy in classifying new samples with partial profiles. An acute myeloid leukemia case study further validates its capability to uncover biological and clinical heterogeneity in incomplete datasets. IntegrAO's ability to handle heterogeneous and incomplete data makes it an essential tool for precision oncology, offering a holistic approach to patient characterization.

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2401.06199 [pdf, other]

xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein

Authors: Bo Chen, Xingyi Cheng, Pan Li, Yangli-ao Geng, Jing Gong, Shen Li, Zhilei Bei, Xu Tan, Boyan Wang, Xin Zeng, Chiming Liu, Aohan Zeng, Yuxiao Dong, Jie Tang, Le Song

Abstract: Protein language models have shown remarkable success in learning biological information from protein sequences. However, most existing models are limited by either autoencoding or autoregressive pre-training objectives, which makes them struggle to handle protein understanding and generation tasks concurrently. We propose a unified protein language model, xTrimoPGLM, to address these two types of tasks simultaneously through an innovative pre-training framework. Our key technical contribution is an exploration of the compatibility and the potential for joint optimization of the two types of objectives, which has led to a strategy for training xTrimoPGLM at an unprecedented scale of 100 billion parameters and 1 trillion training tokens. Our extensive experiments reveal that 1) xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories. The model also facilitates an atomic-resolution view of protein structures, leading to an advanced 3D structural prediction model that surpasses existing language model-based tools. 2) xTrimoPGLM not only can generate de novo protein sequences following the principles of natural ones, but also can perform programmable generation after supervised fine-tuning (SFT) on curated sequences. These results highlight the substantial capability and versatility of xTrimoPGLM in understanding and generating protein sequences, contributing to the evolving landscape of foundation models in protein science.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2312.00485 [pdf, other]

Backbone-based Dynamic Graph Spatio-Temporal Network for Epidemic Forecasting

Authors: Junkai Mao, Yuexing Han, Gouhei Tanaka, Bing Wang

Abstract: Accurate epidemic forecasting is a critical task in controlling disease transmission. Many deep learning-based models focus only on static or dynamic graphs when constructing spatial information, ignoring their relationship. Additionally, these models often rely on recurrent structures, which can lead to error accumulation and long computation times. To address these problems, we propose a novel model called Backbone-based Dynamic Graph Spatio-Temporal Network (BDGSTN). Intuitively, the continuous and smooth changes in graph structure make adjacent graph structures share a basic pattern. To capture this property, we use adaptive methods to generate static backbone graphs containing the primary information and temporal models to generate dynamic temporal graphs of epidemic data, fusing them to generate a backbone-based dynamic graph. To overcome potential limitations associated with recurrent structures, we introduce a linear model DLinear to handle temporal dependencies and combine it with dynamic graph convolution for epidemic forecasting. Extensive experiments on two datasets demonstrate that BDGSTN outperforms baseline models, and an ablation comparison further verifies the effectiveness of model components. Furthermore, we analyze and measure the significance of backbone and temporal graphs by using information metrics from different aspects. Finally, we compare model parameter volume and training time to confirm the superior complexity and efficiency of BDGSTN.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.07624 [pdf]

Disordered hyperuniformity signals functioning and resilience of self-organized vegetation patterns

Authors: Wensi Hu, Quan-Xing Liu, Bo Wang, Nuo Xu, Lijuan Cui, Chi Xu

Abstract: In harsh environments, organisms may self-organize into spatially patterned systems in various ways. So far, studies of ecosystem spatial self-organization have primarily focused on apparent orders reflected by regular patterns. However, self-organized ecosystems may also have cryptic orders that can be unveiled only through certain quantitative analyses. Here we show that disordered hyperuniformity as a striking class of hidden orders can exist in spatially self-organized vegetation landscapes. By analyzing the high-resolution remotely sensed images across the American drylands, we demonstrate that it is not uncommon to find disordered hyperuniform vegetation states characterized by suppressed density fluctuations at long range. Such long-range hyperuniformity has been documented in a wide range of microscopic systems. Our finding contributes to expanding this domain to accommodate natural landscape ecological systems. We use theoretical modeling to propose that disordered hyperuniform vegetation patterning can arise from three generalized mechanisms prevalent in dryland ecosystems, including (1) critical absorbing states driven by an ecological legacy effect, (2) scale-dependent feedbacks driven by plant-plant facilitation and competition, and (3) density-dependent aggregation driven by plant-sediment feedbacks. Our modeling results also show that disordered hyperuniform patterns can help ecosystems cope with arid conditions with enhanced functioning of soil moisture acquisition.
However, this advantage may come at the cost of slower recovery of ecosystem structure upon perturbations. Our work highlights that disordered hyperuniformity as a distinguishable but underexplored ecosystem self-organization state merits systematic studies to better understand its underlying mechanisms, functioning, and resilience.

Submitted 13 November, 2023; originally announced November 2023.

Comments: 34 pages, 6 figures; Supplementary Materials, 19 pages, 10 figures, 2 tables

arXiv:2311.07621 [pdf, other]

To Transformers and Beyond: Large Language Models for the Genome

Authors: Micaela E. Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, Bo Wang

Abstract: In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based on the transformer architecture, in genomics. Building on the foundation of traditional convolutional neural networks and recurrent neural networks, we explore both the strengths and limitations of transformers and other LLMs for genomics. Additionally, we contemplate the future of genomic modeling beyond the transformer architecture based on current trends in research. The paper aims to serve as a guide for computational biologists and computer scientists interested in LLMs for genomic data. We hope the paper can also serve as an educational introduction and discussion for biologists to a fundamental shift in how we will be analyzing genomic data in the future.

Submitted 12 November, 2023; originally announced November 2023.

arXiv:2310.08801 [pdf]

Neural Dysfunction Underlying Working Memory Processing at Different Stages of the Illness Course in Schizophrenia: A Comparative Meta-analysis

Authors: Yuhao Yao, Shufang Zhang, Boyao Wang, Gaofeng Zhao, Hong Deng, Ying Chen

Abstract: Schizophrenia (SCZ), as a chronic and persistent disorder, exhibits working memory deficits across various stages of the disorder, yet the neural mechanisms underlying these deficits remain elusive with inconsistent neuroimaging findings. We aimed to compare the brain functional changes of working memory in patients at different stages: clinical high risk (CHR), first-episode psychosis (FEP), and long-term SCZ, using meta-analyses of functional magnetic resonance imaging (fMRI) studies. Following a systematic literature search, fifty-six whole-brain task-based fMRI studies (15 for CHR, 16 for FEP, 25 for long-term SCZ) were included. The separate and pooled neurofunctional mechanisms among CHR, FEP and long-term SCZ were generated using the Seed-based d Mapping toolbox. The CHR and FEP groups exhibited overlapping hypoactivation in the right inferior parietal lobule, right middle frontal gyrus, and left superior parietal lobule, indicating key lesion sites in the early phase of SCZ. Individuals with FEP showed lower activation in the left inferior parietal lobule than those with long-term SCZ, reflecting a possible recovery process or more neural inefficiency. We concluded that SCZ presents as a continuum in the early stage of illness progression, while the neural bases change inversely as the illness develops into its long-term course.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.08738 [pdf, other]

Splicing Up Your Predictions with RNA Contrastive Learning

Authors: Philip Fradkin, Ruian Shi, Bo Wang, Brendan Frey, Leo J. Lee

Abstract: In the face of rapidly accumulating genomic data, our understanding of the RNA regulatory code remains incomplete. Recent self-supervised methods in other domains have demonstrated the ability to learn rules underlying the data-generating process such as sentence structure in language. Inspired by this, we extend contrastive learning techniques to genomic data by utilizing functional similarities between sequences generated through alternative splicing and gene duplication. Our novel dataset and contrastive objective enable the learning of generalized RNA isoform representations. We validate their utility on downstream tasks such as RNA half-life and mean ribosome load prediction. Our pre-training strategy yields competitive results using linear probing on both tasks, along with up to a two-fold increase in Pearson correlation in low-data conditions. Importantly, our exploration of the learned latent space reveals that our contrastive objective yields semantically meaningful representations, underscoring its potential as a valuable initialization technique for RNA property prediction.

Submitted 17 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2309.10837 [pdf, other]

Improving Opioid Use Disorder Risk Modelling through Behavioral and Genetic Feature Integration

Authors: Sybille Légitime, Kaustubh Prabhu, Devin McConnell, Bing Wang, Dipak K. Dey, Derek Aguiar

Abstract: Opioids are an effective analgesic for acute and chronic pain, but also carry a considerable risk of addiction leading to millions of opioid use disorder (OUD) cases and tens of thousands of premature deaths in the United States yearly. Estimating OUD risk prior to prescription could improve the efficacy of treatment regimens, monitoring programs, and intervention strategies, but risk estimation is typically based on self-reported data or questionnaires. We develop an experimental design and computational methods that combine genetic variants associated with OUD with behavioral features extracted from GPS and Wi-Fi spatiotemporal coordinates to assess OUD risk. Since OUD mobility and genetic data do not exist for the same cohort, we develop algorithms to (1) generate mobility features from empirical distributions and (2) synthesize mobility and genetic samples assuming an expected level of disease co-occurrence. We show that integrating genetic and mobility modalities improves risk modelling using classification accuracy, area under the precision-recall and receiver operating characteristic curves, and $F_1$ score. Interpreting the fitted models suggests that mobility features have more influence on OUD risk, although the genetic contribution was significant, particularly in linear models.
While there exist concerns with respect to privacy, security, bias, and generalizability that must be evaluated in clinical trials before being implemented in practice, our framework provides preliminary evidence that behavioral and genetic features may improve OUD risk estimation to assist with personalized clinical decision-making.

Submitted 25 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: 32 pages (including References section), 8 figures. Under review by PLOS One

arXiv:2309.07701 [pdf]

Semantic reconstruction of continuous language from MEG signals

Authors: Bo Wang, Xiran Xu, Longxiang Zhang, Boda Xiao, Xihong Wu, Jing Chen

Abstract: Decoding language from neural signals holds considerable theoretical and practical importance. Previous research has indicated the feasibility of decoding text or speech from invasive neural signals. However, when using non-invasive neural signals, significant challenges are encountered due to their low quality. In this study, we proposed a data-driven approach for decoding the semantics of language from magnetoencephalography (MEG) signals recorded while subjects were listening to continuous speech. First, a multi-subject decoding model was trained using contrastive learning to reconstruct continuous word embeddings from MEG data. Subsequently, a beam search algorithm was adopted to generate text sequences based on the reconstructed word embeddings. Given a candidate sentence in the beam, a language model was used to predict the subsequent words. The word embeddings of the subsequent words were correlated with the reconstructed word embedding. These correlations were then used as a measure of the probability for the next word. The results showed that the proposed continuous word embedding model can effectively leverage both subject-specific and subject-shared information. Additionally, the decoded text exhibited significant similarity to the target text, with an average BERTScore of 0.816, a score comparable to that in the previous fMRI study.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2308.05864 [pdf, other]

doi 10.1038/s41592-024-02233-6

The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions

Authors: Jun Ma, Ronald Xie, Shamini Ayyadhury, Cheng Ge, Anubha Gupta, Ritu Gupta, Song Gu, Yao Zhang, Gihun Lee, Joonkee Kim, Wei Lou, Haofeng Li, Eric Upschulte, Timo Dickscheid, José Guilherme de Almeida, Yixin Wang, Lin Han, Xin Yang, Marco Labagnara, Vojislav Gligorovski, Maxime Scheder, Sahand Jamal Rahi, Carly Kempster, Alice Pollitt, Leon Espinosa, et al. (15 additional authors not shown)

Abstract: Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyper-parameters in different experimental settings. Here, we present a multi-modality cell segmentation benchmark, comprising over 1500 labeled images derived from more than 50 diverse biological experiments. The top participants developed a Transformer-based deep-learning algorithm that not only exceeds existing methods but can also be applied to diverse microscopy images across imaging platforms and tissue types without manual parameter adjustments. This benchmark and the improved algorithm offer promising avenues for more accurate and versatile cell analysis in microscopy imaging.

Submitted 1 April, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: NeurIPS22 Cell Segmentation Challenge: https://neurips22-cellseg.grand-challenge.org/ . Nature Methods (2024)

arXiv:2307.13798 [pdf, other]

Estimates of the reproduction ratio from epidemic surveillance may be biased in spatially structured populations

Authors: Piero Birello, Michele Re Fiorentin, Boxuan Wang, Vittoria Colizza, Eugenio Valdano

Abstract: An accurate and timely estimate of the reproduction ratio R of an infectious disease epidemic is crucial to make projections on its evolution and set up the appropriate public health response. Estimates of R routinely come from statistical inference on timelines of cases or their proxies like symptomatic cases, hospitalizations, deaths. Here, however, we prove that these estimates of R may not be accurate if the population is made up of spatially distinct communities, as the interplay between space and mobility may hide the true epidemic evolution from surveillance data. This means that surveillance may underestimate R over long periods, to the point of mistaking a growing epidemic for a subsiding one, misinforming public health response. To overcome this, we propose a correction to be applied to surveillance data that removes this bias and ensures an accurate estimate of R across all epidemic phases. We use COVID-19 as a case study; our results, however, apply to any epidemic where mobility is a driver of circulation, including major challenges of the next decades: respiratory infections (influenza, SARS-CoV-2, emerging pathogens), vector-borne diseases (arboviruses). Our findings will help set up public health response to these threats, by improving epidemic monitoring and surveillance.

Submitted 25 July, 2023; originally announced July 2023.

Comments: 11 pages, 4 figures, plus Supplementary Information

arXiv:2306.00041 [pdf, other]

Causal Intervention for Measuring Confidence in Drug-Target Interaction Prediction

Authors: Wenting Ye, Chen Li, Yang Xie, Wen Zhang, Hong-Yu Zhang, Bowen Wang, Debo Cheng, Zaiwen Feng

Abstract: Identifying and discovering drug-target interactions (DTIs) are vital steps in drug discovery and development. They play a crucial role in assisting scientists in finding new drugs and accelerating the drug development process. Recently, knowledge graph and knowledge graph embedding (KGE) models have made rapid advancements and demonstrated impressive performance in drug discovery. However, such models lack authenticity and accuracy in drug target identification, leading to an increased misjudgment rate and reduced drug development efficiency. To address these issues, we focus on the problem of drug-target interactions, with knowledge mapping as the core technology. Specifically, a causal intervention-based confidence measure is employed to assess the triplet score to improve the accuracy of the drug-target interaction prediction model. Experimental results demonstrate that the developed confidence measurement method based on causal intervention can significantly enhance the accuracy of DTI link prediction, particularly for high-precision models. The predicted results are more valuable in guiding the design and development of subsequent drug development experiments, thereby significantly improving the efficiency of drug development.

Submitted 14 November, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

arXiv:2305.14517 [pdf, other]

CongFu: Conditional Graph Fusion for Drug Synergy Prediction

Authors: Oleksii Tsepa, Bohdan Naida, Anna Goldenberg, Bo Wang

Abstract: Drug synergy, characterized by the amplified combined effect of multiple drugs, is critically important for optimizing therapeutic outcomes. Limited data on drug synergy, arising from the vast number of possible drug combinations and testing costs, motivate the need for predictive methods. In this work, we introduce CongFu, a novel Conditional Graph Fusion Layer, designed to predict drug synergy. CongFu employs an attention mechanism and a bottleneck to extract local graph contexts and conditionally fuse graph data within a global context. Its modular architecture enables flexible replacement of layer modules, including readouts and graph encoders, facilitating customization for diverse applications. To evaluate the performance of CongFu, we conduct comprehensive experiments on four datasets, encompassing three distinct setups for drug synergy prediction. CongFu achieves state-of-the-art results on 11 out of 12 benchmark datasets, demonstrating its ability to capture intricate patterns of drug synergy. Through ablation studies, we validate the significance of individual layer components, affirming their contributions to overall predictive performance. Finally, we propose an explainability strategy for elucidating the effect of drugs on genes. By addressing the challenge of predicting drug synergy in untested drug pairs and utilizing our proposed explainability approach, CongFu opens new avenues for optimizing drug combinations and advancing personalized medicine.

Submitted 6 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.04931 [pdf]

Network pharmacology on the mechanism of Yi Qi Tong Qiao Pill inhibiting allergic rhinitis

Authors: Boyang Wang, DingFan Zhang, Tingyu Zhang, Chayanis Sutcharitchan, Jianlin Hua, Dongfang Hua, Bo Zhang, Shao Li

Abstract: Objective: The purpose of this study is to reveal the mechanism of action of Yi Qi Tong Qiao Pill (YQTQP) in the treatment of allergic rhinitis (AR), as well as to establish a paradigm for research on traditional Chinese medicine (TCM) from a systematic perspective. Methods: Based on the data collected from TCM-related and disease-related databases, target profiles of compounds in YQTQP were calculated through network-based algorithms and holistic targets of YQTQP were constructed. Network target analysis was performed to explore the potential mechanisms of YQTQP in the treatment of AR, and the mechanisms were classified into different modules according to their biological functions. In addition, animal and clinical experiments were conducted to validate the findings inferred from network target analysis. Results: Network target analysis showed that YQTQP targeted 12 main pathways or biological processes related to AR, represented by those related to IL-4, IFN-γ, TNF-α and IL-13. These results could be classified into 3 biological modules: regulation of immunity and inflammation, epithelial barrier disorder, and cell adhesion. Finally, a series of animal and clinical experiments supported our findings and confirmed that YQTQP could improve related symptoms of AR, such as permeability of the nasal mucosa epithelium. Conclusion: A combination of network target analysis and experimental validation indicated that YQTQP was effective in the treatment of AR and might provide new insight into the mechanism of TCM against diseases.

Submitted 21 May, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

Comments: 25 pages, 6 figures

MSC Class: None

arXiv:2302.05450 [pdf]

A network-based biomarkers discovery of Cold/Hot ZHENG chronic gastritis and Cold/Hot herbs of formulae

Authors: Boyang Wang, Pan Chen, Peng Zhang, Shao Li

Abstract: Objective: To discover biomarkers and uncover the mechanism of Cold/Hot ZHENG (syndrome in traditional Chinese medicine) chronic gastritis (CG) and Cold/Hot herbs in traditional Chinese medicine (TCM) formulae from a systems biology perspective. Background: CG is a common inflammatory disease, and the diagnosis of CG in TCM can be classified into Cold ZHENG (Asthenic Cold) and Hot ZHENG (Excess Hot). However, the molecular features of Cold/Hot ZHENG in CG and the mechanism of Cold/Hot herbs in formulae for CG remained unclear. Methods: Based on data from 35 patients with Cold/Hot ZHENG CG and 3 scRNA-seq CG samples, we conducted analysis with transcriptomics datasets and algorithms to discover biomarkers for Cold/Hot ZHENG CG. We also collected 25 formulae (with traditional effects related to Cold/Hot ZHENG) for CG and the corresponding 89 Cold/Hot herbs (including Warm/Cool herbs) to discover features and construct target networks of Cold/Hot herbs on the basis of network target and enrichment analysis. Results: Biomarkers of Cold/Hot ZHENG CG, represented by CCL2 and LEP, suggested that Hot ZHENG CG might be characterized by over-inflammation and exuberant metabolism, while Cold ZHENG CG showed a trend of suppression in immune regulation and energy metabolism. Biomarkers of Cold/Hot ZHENG also showed significant changes in the progression of gastric cancer. Biomarkers and pathways of Hot herbs tend to regulate immune responses and energy metabolism, while those of Cold herbs were likely to participate in anti-inflammatory effects. Conclusion: In this study, we found that the biomarkers and mechanism of Cold/Hot ZHENG CG and those of Cold/Hot herbs were closely related to the regulation of immunity and metabolism. These findings may reflect the mechanism, build bridges between multiple views of Cold/Hot ZHENG and Cold/Hot herbs, and provide a research paradigm for further achieving precision TCM.

Submitted 9 February, 2023; originally announced February 2023.

Comments: 17 pages (references not included), 7 figures

arXiv:2302.03590 [pdf, other]

NodeCoder: a graph-based machine learning platform to predict active sites of modeled protein structures

Authors: Nasim Abdollahi, Seyed Ali Madani Tonekaboni, Jay Huang, Bo Wang, Stephen MacKinnon

Abstract: While accurate protein structure predictions are now available for nearly every observed protein sequence, predicted structures lack much of the functional context offered by experimental structure determination. We address this gap with NodeCoder, a task-independent platform that maps residue-based datasets onto 3D protein structures, embeds the resulting structural feature into a contact network, and models residue classification tasks with a Graph Convolutional Network (GCN). We demonstrate the versatility of this strategy by modeling six separate tasks, with some labels derived from other experimental structure studies (ligand, peptide, ion, and nucleic acid binding sites) and other labels derived from annotation databases (post-translational modification and transmembrane regions). Moreover, a NodeCoder model trained to identify ligand binding site residues was able to outperform P2Rank, a widely used software tool developed specifically for ligand binding site detection. NodeCoder is available as an open-source Python package at https://pypi.org/project/NodeCoder/.

Submitted 7 February, 2023; originally announced February 2023.

Comments: including supplementary materials 22 pages, 6 figures, 4 tables, presented at NeurIPS 2021 and ACS 2022

arXiv:2212.13285 [pdf, other]

On the Level Sets and Invariance of Neural Tuning Landscapes

Authors: Binxu Wang, Carlos R. Ponce

Abstract: Visual representations can be defined as the activations of neuronal populations in response to images. The activation of a neuron as a function over all image space has been described as a "tuning landscape". As a function over a high-dimensional space, what is the structure of this landscape? In this study, we characterize tuning landscapes through the lens of level sets and Morse theory. A recent study measured the in vivo two-dimensional tuning maps of neurons in different brain regions. Here, we developed a statistically reliable signature for these maps based on the change of topology in level sets. We found this topological signature changed progressively throughout the cortical hierarchy, with similar trends found for units in convolutional neural networks (CNNs). Further, we analyzed the geometry of level sets on the tuning landscapes of CNN units. We advanced the hypothesis that higher-order units can be locally regarded as isotropic radial basis functions, but not globally. This shows the power of level sets as a conceptual tool to understand neuronal activations over image space.

Submitted 26 December, 2022; originally announced December 2022.

Comments: 24 pages, 13 figures. Published in NeurIPS 2022 Workshop on Symmetry and Geometry in Neural Representations, and PMLR volume 197

arXiv:2210.05988 [pdf, other]

CLEEGN: A Convolutional Neural Network for Plug-and-Play Automatic EEG Reconstruction

Authors: Pin-Hua Lai, Bo-Shan Wang, Wei-Chun Yang, Hsiang-Chieh Tsou, Chun-Shu Wei

Abstract: Human electroencephalography (EEG) is a brain monitoring modality that senses cortical neuroelectrophysiological activity at high temporal resolution. One of the greatest challenges in applications of EEG is unstable signal quality, which is susceptible to inevitable artifacts during recordings. To date, most existing techniques for EEG artifact removal and reconstruction are applicable only to offline analysis, or require individualized training data to facilitate online reconstruction. We propose CLEEGN, a novel convolutional neural network for plug-and-play automatic EEG reconstruction. CLEEGN is based on a subject-independent pre-trained model using existing data and can operate on a new user without any further calibration. The performance of CLEEGN was validated using multiple evaluations including waveform observation, reconstruction error assessment, and decoding accuracy on well-studied labeled datasets. The results of simulated online validation suggest that, even without any calibration, CLEEGN can largely preserve inherent brain activity and outperforms leading online/offline artifact removal methods in the decoding accuracy of reconstructed EEG data. In addition, visualization of model parameters and latent features exhibits the model behavior and reveals explainable insights related to existing knowledge of neuroscience. We foresee pervasive applications of CLEEGN in prospective works of online plug-and-play EEG decoding and analysis.

Submitted 20 February, 2024; v1 submitted 12 October, 2022; originally announced October 2022.

arXiv:2204.06765 [pdf, other]

doi 10.1145/3512290.3528725

High-performance Evolutionary Algorithms for Online Neuron Control

Authors: Binxu Wang, Carlos R. Ponce

Abstract: Recently, optimization has become an emerging tool for neuroscientists to study neural code. In the visual system, neurons respond to images with graded and noisy responses. Image patterns eliciting the highest responses are diagnostic of the coding content of the neuron. To find these patterns, we have used black-box optimizers to search a 4096d image space, leading to the evolution of images that maximize neuronal responses. Although the genetic algorithm (GA) has been commonly used, there have been no systematic investigations to reveal the best-performing optimizer or the underlying principles necessary to improve it. Here, we conducted a large-scale in silico benchmark of optimizers for activation maximization and found that Covariance Matrix Adaptation (CMA) excelled in its achieved activation. We compared CMA against GA and found that CMA surpassed the maximal activation of GA by 66% in silico and 44% in vivo. We analyzed the structure of evolution trajectories and found that the key to success was not covariance matrix adaptation, but local search towards informative dimensions and an effective step size decay. Guided by these principles and the geometry of the image manifold, we developed the SphereCMA optimizer, which competed well against CMA, proving the validity of the identified principles. Code available at https://github.com/Animadversio/ActMax-Optimizer-Dev

Submitted 14 April, 2022; originally announced April 2022.

Comments: 19 pages, 22 figures, 3 tables. Accepted as full paper to The Genetic and Evolutionary Computation Conference 2022

ACM Class: J.3; F.2.1; G.1.6; I.2.10; I.5.1

arXiv:2202.11551 [pdf, other]

SeqMapPDB: A Standalone Pipeline to Identify Representative Structures of Protein Sequences and Mapping Residue Indices in Real-Time at Proteome Scale

Authors: Boshen Wang, Xue Lei, Wei Tian, Alan Perez-Rathke, Yan-Yuan Tseng, Jie Liang

Abstract: Motivation: 3D structures of proteins provide rich information for understanding their biochemical roles. Identifying the representative protein structures for protein sequences is essential for analysis of proteins at proteome scale. However, there are technical difficulties in identifying the representative structure of a given protein sequence and providing accurate mapping of residue indices. Existing databases of mappings between structures and sequences are usually static and are therefore not suitable for studying proteomes with frequent gene model revisions. They often do not provide reliable and consistent representative structures that maximize sequence coverage. Furthermore, protein isomers are usually not properly resolved. Results: To overcome these difficulties, we have developed a computational pipeline called SeqMapPDB to provide high-quality representative PDB structures of given sequences. It provides mapping to structures that fully cover the sequences when available, or to the set of partial non-overlapping structural domains that maximally cover the query sequence. The residue indices are accurately mapped and isomeric proteins are resolved. SeqMapPDB is efficient and can rapidly carry out proteome-wide mapping to the selected version of reference genomes in real time. Furthermore, SeqMapPDB provides the flexibility of a stand-alone pipeline for large-scale mapping of in-house sequence and structure data. Availability: Our method is available at https://bitbucket.org/lianglabuic/seqmappdb under the GNU GPL license.

Submitted 27 February, 2023; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: 3 pages

arXiv:2112.07173 [pdf, other]

On the use of Cortical Magnification and Saccades as Biological Proxies for Data Augmentation

Authors: Binxu Wang, David Mayo, Arturo Deza, Andrei Barbu, Colin Conwell

Abstract: Self-supervised learning is a powerful way to learn useful representations from natural data. It has also been suggested as one possible means of building visual representation in humans, but the specific objective and algorithm are unknown. Currently, most self-supervised methods encourage the system to learn an invariant representation of different transformations of the same image in contrast to those of other images. However, such transformations are generally not biologically plausible, and often consist of contrived perceptual schemes such as random cropping and color jittering. In this paper, we attempt to reverse-engineer these augmentations to be more biologically or perceptually plausible while still conferring the same benefits for encouraging robust representation. Critically, we find that random cropping can be substituted by cortical magnification, and saccade-like sampling of the image could also assist the representation learning. The feasibility of these transformations suggests a potential way that biological visual systems could implement self-supervision. Further, they break the widely accepted spatially-uniform processing assumption used in many computer vision algorithms, suggesting a role for spatially-adaptive computation in humans and machines alike. Our code and demo can be found here.

Submitted 14 December, 2021; originally announced December 2021.

Comments: 14 pages, 6 figures, 2 tables. Published in NeurIPS 2021 Workshop, Shared Visual Representations in Human & Machine Intelligence (SVRHM). For code, see https://github.com/Animadversio/Foveated_Saccade_SimCLR

ACM Class: I.4.10; I.5.1; I.2.6; I.2.10

arXiv:2111.04824 [pdf, other]

MassFormer: Tandem Mass Spectrum Prediction for Small Molecules using Graph Transformers

Authors: Adamo Young, Bo Wang, Hannes Röst

Abstract: Tandem mass spectra capture fragmentation patterns that provide key structural information about a molecule. Although mass spectrometry is applied in many areas, the vast majority of small molecules lack experimental reference spectra. For over seventy years, spectrum prediction has remained a key challenge in the field. Existing deep learning methods do not leverage global structure in the molecule, potentially resulting in difficulties when generalizing to new data. In this work we propose a new model, MassFormer, for accurately predicting tandem mass spectra. MassFormer uses a graph transformer architecture to model long-distance relationships between atoms in the molecule. The transformer module is initialized with parameters obtained through a chemical pre-training task, then fine-tuned on spectral data. MassFormer outperforms competing approaches for spectrum prediction on multiple datasets, and is able to recover prior knowledge about the effect of collision energy on the spectrum. By employing gradient-based attribution methods, we demonstrate that the model can identify relationships between fragment peaks. To further highlight MassFormer's utility, we show that it can match or exceed existing prediction-based methods on two spectrum identification tasks. We provide open-source implementations of our model and baseline approaches, with the goal of encouraging future research in this area.

Submitted 1 May, 2023; v1 submitted 8 November, 2021; originally announced November 2021.

Comments: 46 pages, 4 main figures, 6 extended data figures, 5 main tables, 1 extended data table

arXiv:2108.04176 [pdf, other]

Adaptive Residue-wise Profile Fusion for Low Homologous Protein Secondary Structure Prediction Using External Knowledge

Authors: Qin Wang, Jun Wei, Boyuan Wang, Zhen Li, Sheng Wang, Shuguang Cu

Abstract: Protein secondary structure prediction (PSSP) is essential for protein function analysis. However, for low homologous proteins, PSSP suffers from insufficient input features. In this paper, we explicitly import external self-supervised knowledge for low homologous PSSP under the guidance of residue-wise profile fusion. In practice, we first demonstrate the superiority of profile over Position-Specific Scoring Matrix (PSSM) for low homologous PSSP. Based on this observation, we introduce novel self-supervised BERT features as the pseudo profile, which implicitly involves the residue distribution in all native discovered sequences as complementary features. Furthermore, a novel residue-wise attention is specially designed to adaptively fuse different features (i.e., the original low-quality profile and the BERT-based pseudo profile), which not only takes full advantage of each feature but also avoids noise disturbance. In addition, a feature consistency loss is proposed to accelerate the model learning from multiple semantic levels. Extensive experiments confirm that our method outperforms state-of-the-art methods (i.e., 4.7% for extremely low homologous cases on the BC40 dataset).

Submitted 5 August, 2021; originally announced August 2021.

Comments: Accepted in IJCAI-21

arXiv:2107.02962 [pdf, other]

Transmission Dynamics of COVID-19 Pandemic Non-pharmaceutical Interventions and Vaccination

Authors: Bin-Guo Wang, Shunxiang Huang, Yongping Xiong, Ming-Zhen Xin, Jing LI, Jiangqian Zhang, Zhihui Ma

Abstract: Non-pharmaceutical interventions (NPIs) play an important role in the early-stage control of the COVID-19 pandemic. Vaccination is considered to be the inevitable course to stop the spread of SARS-CoV-2. Based on this mechanism, an SVEIR COVID-19 model with vaccination and NPIs is proposed. By means of the basic reproduction number $\mathscr{R}_{0}$, it is shown that the disease-free equilibrium is globally attractive if $\mathscr{R}_{0}<1$, and COVID-19 is uniformly persistent if $\mathscr{R}_{0}>1$. Taking Indian data as an example in the numerical simulation, we find that our dynamical results fit well with the statistical data. Consequently, we forecast the spreading trend of the COVID-19 pandemic in India. Furthermore, our results imply that improving the intensity of NPIs will greatly reduce the number of confirmed cases. In particular, NPIs are indispensable even if all people were vaccinated when the efficiency of the vaccine is relatively low. By simulating the relationships between the basic reproduction number $\mathscr{R}_{0}$, the vaccination rate and the efficiency of the vaccine, we find that it is impossible to achieve herd immunity without NPIs when the efficiency of the vaccine is lower than $76.9\%$. Therefore, the herd immunity area is defined by the evolution of relationships between the vaccination rate and the efficiency of the vaccine. In the study of the two-patch model, we give the conditions for India and China to be open to navigation. Furthermore, an appropriate dispersal of population between India and China is obtained. A discussion completes the paper.

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2106.13397 [pdf, other]

Pheno-Mapper: An Interactive Toolbox for the Visual Exploration of Phenomics Data

Authors: Youjia Zhou, Methun Kamruzzaman, Patrick Schnable, Bala Krishnamoorthy, Ananth Kalyanaraman, Bei Wang

Abstract: High-throughput technologies to collect field data have made observations possible at scale in several branches of life sciences. The data collected can range from the molecular level (genotypes) to physiological (phenotypic traits) and environmental observations (e.g., weather, soil conditions). These vast swathes of data, collectively referred to as phenomics data, represent a treasure trove of key scientific knowledge on the dynamics of the underlying biological system. However, extracting information and insights from these complex datasets remains a significant challenge owing to their multidimensionality and lack of prior knowledge about their complex structure. In this paper, we present Pheno-Mapper, an interactive toolbox for the exploratory analysis and visualization of large-scale phenomics data. Our approach uses the mapper framework to perform a topological analysis of the data, and subsequently render visual representations with built-in data analysis and machine learning capabilities. We demonstrate the utility of this new tool on real-world plant (e.g., maize) phenomics datasets. In comparison to existing approaches, the main advantage of Pheno-Mapper is that it provides rich, interactive capabilities in the exploratory analysis of phenomics data, and it integrates visual analytics with data analysis and machine learning in an easily extensible way. In particular, Pheno-Mapper allows the interactive selection of subpopulations guided by a topological summary of the data and applies data mining and machine learning to these selected subpopulations for in-depth exploration.

Submitted 6 July, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

Comments: This is a preprint version. For a published version, please refer to ACM DOI: 10.1145/3459930.3469511

arXiv:2104.01474 [pdf, other]

Thalamocortical contribution to solving credit assignment in neural systems

Authors: Mien Brabeeba Wang, Michael M. Halassa

Abstract: Animal brains evolved to optimize behavior in dynamically changing environments, selecting actions that maximize future rewards. A large body of experimental work indicates that such optimization changes the wiring of neural circuits, appropriately mapping environmental input onto behavioral outputs. A major unsolved scientific question is how optimal wiring adjustments, which must target the connections responsible for rewards, can be accomplished when the relation of sensory inputs, actions taken, and environmental context to rewards is ambiguous. The computational problems of properly targeting cues, contexts and actions that lead to reward are known as structural, contextual and temporal credit assignment, respectively. In this review, we survey prior approaches to these three types of problems and advance the notion that the brain's specialized neural architectures provide efficient solutions. Within this framework, the thalamus with its cortical and basal ganglia interactions serves as a systems-level solution to credit assignment. Specifically, we propose that thalamocortical interaction is the locus of meta-learning where the thalamus provides cortical control functions that parametrize the cortical activity association space. By selecting among these control functions, the basal ganglia hierarchically guide thalamocortical plasticity across two timescales to enable meta-learning. The faster timescale establishes contextual associations to enable rapid behavioral flexibility while the slower one enables generalization to new contexts. Incorporating different thalamic control functions under this framework clarifies how thalamocortical-basal ganglia interactions may simultaneously solve the three credit assignment problems.

Submitted 3 April, 2021; originally announced April 2021.

arXiv:2012.13467 [pdf, other]

Real-Time Optimization of the Current Steering for Visual Prosthesis

Authors: Zhijie Charles Chen, Bing-Yi Wang, Daniel Palanker

Abstract: Current steering on a multi-electrode array is commonly used to shape the electric field in the neural tissue in order to improve selectivity and efficacy of stimulation. Previously, simulations of the electric field in tissue required separate computation for each set of the stimulation parameters. Not only is this approach to modeling time-consuming and very difficult with a large number of electrodes, it is incompatible with real-time optimization of the current steering for practical applications. We present a framework for efficient computation of the electric field in the neural tissue based on superposition of the fields from a pre-calculated basis. Such a linear algebraic framework enables optimization of the current steering for any targeted electric field in real time. For applications to retinal prosthetics, we demonstrate how the stimulation depth can be optimized for each patient based on the retinal thickness and separation from the array, while maximizing the lateral confinement of the electric field essential for spatial resolution.

Submitted 24 December, 2020; originally announced December 2020.

Comments: 5 pages, 2 figures, submitted to IEEE EMBS NER'21

arXiv:2007.14391 [pdf, other]

doi 10.1080/24725854.2020.1856982

A calibration-free method for biosensing in cell manufacturing

Authors: Jialei Chen, Zhaonan Liu, Kan Wang, Chen Jiang, Chuck Zhang, Ben Wang

Abstract: Chimeric antigen receptor T cell therapy has demonstrated innovative therapeutic effectiveness in fighting cancers; however, it is extremely expensive due to the intrinsic patient-to-patient variability in cell manufacturing. We propose in this work a novel calibration-free statistical framework to effectively recover critical quality attributes under the patient-to-patient variability. Specifically, we model this variability via a patient-specific calibration parameter, and use readings from multiple biosensors to construct a patient-invariance statistic, thereby alleviating the effect of the calibration parameter. A carefully formulated optimization problem and an algorithmic framework are presented to find the best patient-invariance statistic and the model parameters. Using the patient-invariance statistic, we can recover the critical quality attribute of interest, free from the calibration parameter. We demonstrate improvements of the proposed calibration-free method in different simulation experiments. In the cell manufacturing case study, our method not only effectively recovers viable cell concentration for monitoring, but also reveals insights for the cell manufacturing process.

Submitted 27 July, 2020; originally announced July 2020.

Journal ref: IISE Transactions, 2020

arXiv:2005.09769 [pdf, ps, other]

Controlling the Hidden Growth of COVID-19

Authors: Xiubin Bruce Wang, Chaolun Ma

Abstract: The COVID-19 pandemic has plagued the world for months. The U.S. has taken measures to counter it. On a daily basis, newly confirmed cases have been reported. In the early days, these numbers showed an increasing trend. Recently, the numbers have generally flattened out. This report tries to estimate the hidden number of currently active infections in the population by using the confirmed cases. A major result puts the estimate of existing infections at about 10-50 times the daily confirmed new cases, with the stringent social distancing policy tipping to the upper end of this range. It clarifies the relationship between the infection rate and the test rate needed to put the epidemic under control: the test rate must keep pace with the infection rate to prevent an outbreak. This relationship is meaningful in the wake of business re-opening in the U.S. and the world. The report also reveals the connections of all the measures taken to the epidemic spread. A stratified sampling method is proposed to add to the current tool kits of epidemic control. Again, this report is a summary of some straightforward observations and thoughts, not a thorough study backed with field data. The results appear obvious and suitable for general education of interested policymakers and the public.

Submitted 19 May, 2020; originally announced May 2020.

Comments: 13 pages, 1 figure

arXiv:2003.02176 [pdf, ps, other]

Annotated-skeleton Biased Motion Planning for Faster Relevant Region Discovery

Authors: Diane Uwacu, Regina Rex, Bonnie Wang, Shawna Thomas, Nancy M. Amato

Abstract: Motion planning algorithms often leverage topological information about the environment to improve planner performance. However, these methods often focus only on the environment's connectivity while ignoring other properties such as obstacle clearance, terrain conditions, and resource accessibility. We present a method that augments a skeleton representing the workspace topology with such information to guide a sampling-based motion planner to rapidly discover regions most relevant to the problem at hand. Our approach decouples guidance and planning, making it possible for basic planning algorithms to find desired paths earlier in the planning process. We demonstrate the efficacy of our approach in both robotics problems and applications in drug design. Our method is able to produce desirable paths quickly with no change to the underlying planner.

Submitted 4 March, 2020; originally announced March 2020.

Comments: 15 pages, 4 figures. Paper under review for WAFR 2020

arXiv:1911.02363 [pdf, other]

ODE-Inspired Analysis for the Biological Version of Oja's Rule in Solving Streaming PCA

Authors: Chi-Ning Chou, Mien Brabeeba Wang

Abstract: Oja's rule [Oja, Journal of mathematical biology 1982] is a well-known biologically-plausible algorithm using a Hebbian-type synaptic update rule to solve streaming principal component analysis (PCA). Computational neuroscientists have known that this biological version of Oja's rule converges to the top eigenvector of the covariance matrix of the input in the limit. However, prior to this work, it was open to prove any convergence rate guarantee. In this work, we give the first convergence rate analysis for the biological version of Oja's rule in solving streaming PCA. Moreover, our convergence rate matches the information theoretical lower bound up to logarithmic factors and outperforms the state-of-the-art upper bound for streaming PCA. Furthermore, we develop a novel framework inspired by ordinary differential equations (ODE) to analyze general stochastic dynamics. The framework abandons the traditional step-by-step analysis and instead analyzes a stochastic dynamic in one-shot by giving a closed-form solution to the entire dynamic. The one-shot framework allows us to apply stopping time and martingale techniques to have a flexible and precise control on the dynamic. We believe that this general framework is powerful and should lead to effective yet simple analysis for a large class of problems with stochastic dynamics.

Submitted 17 June, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2020
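
For readers unfamiliar with the update rule named in the abstract above, here is a minimal sketch of the textbook form of Oja's rule, w ← w + η·y·(x − y·w) with y = w·x, applied to a stream of samples. It is an illustration under stated assumptions, not the authors' analysis or code: the function names `oja_step` and `run_oja`, the two-dimensional Gaussian input, the learning rate, and the step count are all hypothetical choices made here.

```python
import random


def oja_step(w, x, eta):
    """One streaming update of the textbook Oja rule: w += eta * y * (x - y * w)."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # neuron output y = w . x
    return [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]


def run_oja(steps=5000, eta=0.01, seed=0):
    """Run Oja's rule on a synthetic stream (assumed setup, for illustration only).

    Inputs are independent zero-mean Gaussians with standard deviations 2 and 1,
    so the covariance is diag(4, 1) and its top eigenvector is the first axis.
    """
    rng = random.Random(seed)
    w = [0.6, 0.8]  # arbitrary unit-norm starting weights
    for _ in range(steps):
        x = [rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0)]
        w = oja_step(w, x, eta)
    return w


if __name__ == "__main__":
    w = run_oja()
    print("learned weights:", w)
```

In this toy setup the weight vector should end up close to unit norm and aligned (up to sign) with the first coordinate axis, which is the top eigenvector of the assumed covariance.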

arXiv:1807.00123 [pdf, other]

doi 10.1016/j.inffus.2018.09.012

Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities

Authors: Marinka Zitnik, Francis Nguyen, Bo Wang, Jure Leskovec, Anna Goldenberg, Michael M. Hoffman

Abstract: New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include a myriad of properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.

Submitted 10 October, 2018; v1 submitted 30 June, 2018; originally announced July 2018.

Journal ref: Information Fusion 50 (2019) 71-91

arXiv:1805.03327 [pdf, other]

doi 10.1038/s41467-018-05469-x

Network Enhancement: a general method to denoise weighted biological networks

Authors: Bo Wang, Armin Pourshafeie, Marinka Zitnik, Junjie Zhu, Carlos D. Bustamante, Serafim Batzoglou, Jure Leskovec

Abstract: Networks are ubiquitous in biology where they encode connectivity patterns at all scales of organization, from molecular to the biome. However, biological networks are noisy due to the limitations of measurement technology and inherent natural variation, which can hamper discovery of network patterns and dynamics. We propose Network Enhancement (NE), a method for improving the signal-to-noise ratio of undirected, weighted networks. NE uses a doubly stochastic matrix operator that induces sparsity and provides a closed-form solution that increases spectral eigengap of the input network. As a result, NE removes weak edges, enhances real connections, and leads to better downstream performance. Experiments show that NE improves gene function prediction by denoising tissue-specific interaction networks, alleviates interpretation of noisy Hi-C contact maps from the human genome, and boosts fine-grained identification accuracy of species. Our results indicate that NE is widely applicable for denoising biological networks.

Submitted 1 June, 2018; v1 submitted 8 May, 2018; originally announced May 2018.

Journal ref: Nature Communications, 9:3108, 2018

arXiv:1709.04090 [pdf, other]

A Constrained, Weighted-L1 Minimization Approach for Joint Discovery of Heterogeneous Neural Connectivity Graphs

Authors: Chandan Singh, Beilun Wang, Yanjun Qi

Abstract: Determining functional brain connectivity is crucial to understanding the brain and neural differences underlying disorders such as autism. Recent studies have used Gaussian graphical models to learn brain connectivity via statistical dependencies across brain regions from neuroimaging. However, previous studies often fail to properly incorporate priors tailored to neuroscience, such as preferring shorter connections. To remedy this problem, the paper here introduces a novel, weighted-$\ell_1$, multi-task graphical model (W-SIMULE). This model elegantly incorporates a flexible prior, along with a parallelizable formulation. Additionally, W-SIMULE extends the often-used Gaussian assumption, leading to considerable performance increases. Here, applications to fMRI data show that W-SIMULE succeeds in determining functional connectivity in terms of (1) log-likelihood, (2) finding edges that differentiate groups, and (3) classifying different groups based on their connectivity, achieving 58.6% accuracy on the ABIDE dataset. Having established W-SIMULE's effectiveness, it links four key areas to autism, all of which are consistent with the literature. Due to its elegant domain adaptivity, W-SIMULE can be readily applied to various data types to effectively estimate connectivity.

Submitted 21 September, 2017; v1 submitted 12 September, 2017; originally announced September 2017.

Comments: 8 pages

arXiv:1708.01857 [pdf]

dbMPIKT: A web resource for the kinetic and thermodynamic database of mutant protein interactions

Authors: Quanya Liu, Peng Chen, Bing Wang, Jinyan Li

Abstract: Protein-protein interactions (PPIs) play important roles in biological functions. Research on mutants in protein interactions can further our understanding of PPIs. In the past, many researchers have developed databases that store mutants in protein interactions, but these are dated and have not been updated. To address the issue, we developed a kinetic and thermodynamic database of mutant protein interactions (dbMPIKT) that is freely accessible at our website. This database contains 5291 mutants, integrating data from previous databases with data from the literature covering nearly three years. Furthermore, the data were analyzed in terms of mutation number, mutation type, protein pair source and network map construction. On the whole, the database provides new data to further improve the study of PPIs. Website: http://210.45.212.128/lqy/index.php

Submitted 6 August, 2017; originally announced August 2017.

arXiv:1703.10927 [pdf, other]

Feature functional theory - binding predictor (FFT-BP) for the blind prediction of binding free energies

Authors: Bao Wang, Zhixiong Zhao, Duc D. Nguyen, Guo-Wei Wei

Abstract: We present a feature functional theory - binding predictor (FFT-BP) for protein-ligand binding affinity prediction. The underpinning assumptions of FFT-BP are as follows: i) representability: there exists a microscopic feature vector that can uniquely characterize and distinguish one protein-ligand complex from another; ii) feature-function relationship: the macroscopic features of a complex, including binding free energy, are functionals of microscopic feature vectors; and iii) similarity: molecules with similar microscopic features have similar macroscopic features, such as binding affinity. Physical models, such as implicit solvent models and quantum theory, are utilized to extract microscopic features, while machine learning algorithms are employed to rank the similarity among protein-ligand complexes. A large variety of numerical validations and tests confirms the accuracy and robustness of the proposed FFT-BP model. The root mean square errors (RMSEs) of FFT-BP blind predictions of a benchmark set of 100 complexes, the PDBBind v2007 core set of 195 complexes and the PDBBind v2015 core set of 195 complexes are 1.99, 2.02 and 1.92 kcal/mol, respectively. Their corresponding Pearson correlation coefficients are 0.75, 0.80, and 0.78, respectively.

Submitted 31 March, 2017; originally announced March 2017.

Comments: 25 pages, 11 figures

arXiv:1703.07844 [pdf, other]

doi 10.1002/pmic.201700232

SIMLR: A Tool for Large-Scale Genomic Analyses by Multi-Kernel Learning

Authors: Bo Wang, Daniele Ramazzotti, Luca De Sano, Junjie Zhu, Emma Pierson, Serafim Batzoglou

Abstract: We here present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a sample-to-sample similarity measure from expression data observed for heterogeneous samples. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of samples. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly improving clustering performance, as well as providing valuable insights by making the data more interpretable via better visualization. Availability and Implementation: SIMLR is available on GitHub in both R and MATLAB implementations. Furthermore, it is also available as an R package on http://bioconductor.org.

Submitted 18 January, 2018; v1 submitted 21 March, 2017; originally announced March 2017.

arXiv:1610.06945 [pdf]

Decreased aneurysmal subarachnoid hemorrhage incidence rate in elderly population than in middle aged population: a retrospective analysis of 8,144 cases in Mainland China

Authors: Yi Xiang J Wang, Lihong Zhang, Lin Zhao, Jian He, Xian-Jun Zeng, Heng Liu, Yun-jun Yang, Shang-Wei Ding, Zhong-Fei Xu, Yong-Min He, Lin Yang, Lan Sun, Ke-jie Mu, Bai-Song Wang, Xiao-Hong Xu, Zhong-You Ji, Jian-hua Liu, Jin-Zhou Fang, Rui Hou, Feng Fan, Guang Ming Peng, Sheng-Hong Ju

Abstract: Purpose: Rupture of an intracranial aneurysm is the most common cause of subarachnoid haemorrhage (SAH), which is a life-threatening acute cerebrovascular event that typically affects working-age people. This study aims to compare the aneurysmal SAH incidence rate in the elderly population with that in the middle-aged population in China. Materials and methods: Aneurysmal SAH cases were collected retrospectively from the archives of 21 hospitals in Mainland China. All the cases collected were from September 2016 and backward consecutively for a period of time up to 8 years. SAH was initially diagnosed by brain computed tomography, followed by CT angiography (CTA) or digital subtraction angiography (DSA), and SAH was confirmed to be due to cerebral aneurysm. For cases in which multiple bleeding events occurred, the age at the first SAH was used in this study. The total incidence from all hospitals at each age was summed for females and males, then adjusted by the total population number at each age for females and males. The total population data was from the 2010 population census of the People's Republic of China. Results: In total there were 8,144 cases, with 4,861 females and 3,283 males. Our analysis shows that for both females and males the relative aneurysmal SAH rate started to decrease after around 65 years old. For males, the relative aneurysmal SAH rate might have started to decrease after around 55 years old.
Conclusion: In contrast to previous reports, our data demonstrated a decreased aneurysmal subarachnoid hemorrhage incidence rate in the elderly population compared with the middle-aged population. Our data therefore support the hypothesis that aneurysms do not grow progressively once they form but probably either rupture or stabilize and that very elderly patients are at a reduced risk of rupture compared with patients who are younger with the same-sized aneurysms.

Submitted 19 October, 2016; originally announced October 2016.

Comments: Total 16 pages, 3 figures

arXiv:1607.06037 [pdf, other]

doi 10.1063/1.4963193

Automatic parametrization of implicit solvent models for the blind prediction of solvation free energies

Authors: Bao Wang, Chengzhang Wang, Guowei Wei

Abstract: In this work, a systematic protocol is proposed to automatically parametrize implicit solvent models with polar and nonpolar components. The proposed protocol utilizes the classical Poisson model or the Kohn-Sham density functional theory (KSDFT) based polarizable Poisson model for modeling polar solvation free energies. For the nonpolar component, either the standard model of surface area, molecular volume, and van der Waals interactions, or a model with atomic surface areas and molecular volume is employed. Based on the assumption that similar molecules have similar parametrizations, we develop scoring and ranking algorithms to classify solute molecules. Four sets of radius parameters are combined with four sets of charge force fields to arrive at a total of 16 different parametrizations for the Poisson model. A large database with 668 experimental data is utilized to validate the proposed protocol. The lowest leave-one-out root mean square (RMS) error for the database is 1.33 kcal/mol. Additionally, five subsets of the database, i.e., SAMPL0-SAMPL4, are employed to further demonstrate that the proposed protocol offers some of the best solvation predictions. The optimal RMS errors are 0.93, 2.82, 1.90, 0.78, and 1.03 kcal/mol, respectively, for the SAMPL0, SAMPL1, SAMPL2, SAMPL3, and SAMPL4 test sets. These results are some of the best, to the best of our knowledge.

Submitted 9 July, 2016; originally announced July 2016.

Comments: 19 pages, 9 figures

MSC Class: 65-04

arXiv:1607.04594 [pdf, other]

Accurate, robust and reliable calculations of Poisson-Boltzmann solvation energies

Authors: Bao Wang, Guowei Wei

Abstract: Developing accurate solvers for the Poisson Boltzmann (PB) model is the first step to make the PB model suitable for implicit solvent simulation. Reducing the influence of grid size on the performance of the solver helps to increase the speed of the solver and to provide accurate electrostatics analysis for solvated molecules. In this work, we explore an accurate coarse grid PB solver based on the Green's function treatment of the singular charges, the matched interface and boundary (MIB) method for treating the geometric singularities, and posterior electrostatic potential field extension for calculating the reaction field energy. We made our previous PB software, MIBPB, robust, and it provides almost grid size independent reaction field energy calculation. A large number of numerical tests verify the grid size independence of the MIBPB software. The advantage of the MIBPB software is that the acceleration of the PB solver comes directly from the numerical algorithm rather than from the use of advanced computer architectures. Furthermore, the presented MIBPB software is provided as a free online server.

Submitted 9 July, 2016; originally announced July 2016.

Comments: 15 pages, 3 figures

MSC Class: 65-04

arXiv:1603.04054 [pdf, other]

Accurate, robust and reliable calculations of Poisson-Boltzmann binding energies

Authors: Duc D. Nguyen, Bao Wang, Guo-wei Wei

Abstract: The Poisson-Boltzmann (PB) model is one of the most popular implicit solvent models in biophysical modeling and computation. The ability to provide accurate and reliable PB estimation of electrostatic solvation free energy, $ΔG_{\text{el}}$, and binding free energy, $ΔΔG_{\text{el}}$, is of tremendous significance to computational biophysics and biochemistry. Recently, it has been warned in the literature (Journal of Chemical Theory and Computation 2013, 9, 3677-3685) that the widely used grid spacing of $0.5$ Å produces unacceptable errors in $ΔΔG_{\text{el}}$ estimation with the solvent excluded surface (SES). In this work, we investigate the grid dependence of our PB solver (MIBPB) with SESs for estimating both electrostatic solvation free energies and electrostatic binding free energies. It is found that the relative absolute error of $ΔG_{\text{el}}$ obtained at the grid spacing of $1.0$ Å compared to $ΔG_{\text{el}}$ at $0.2$ Å averaged over 153 molecules is less than 0.2%. Our results indicate that the use of grid spacing $0.6$ Å ensures accuracy and reliability in $ΔΔG_{\text{el}}$ calculation. In fact, the grid spacing of $1.1$ Å appears to deliver adequate accuracy for high throughput screening.

Submitted 9 June, 2016; v1 submitted 13 March, 2016; originally announced March 2016.

Comments: 26 pages, 7 figures

arXiv:1412.2368 [pdf, other]

Objective-oriented Persistent Homology

Authors: Bao Wang, Guo-Wei Wei

Abstract: Persistent homology provides a new approach for the topological simplification of big data via measuring the life time of intrinsic topological features in a filtration process and has found its success in scientific and engineering applications. However, such a success is essentially limited to qualitative data characterization, identification and analysis (CIA). In this work, we outline a general protocol to construct objective-oriented persistent homology methods. The minimization of the objective functional leads to a Laplace-Beltrami operator which generates a multiscale representation of the initial data and offers an objective oriented filtration process. The resulting differential geometry based objective-oriented persistent homology is able to preserve desirable geometric features in the evolutionary filtration and enhances the corresponding topological persistence. The consistency between Laplace-Beltrami flow based filtration and Euclidean distance based filtration is confirmed on the Vietoris-Rips complex in a large number of numerical tests. The convergence and reliability of the present Laplace-Beltrami flow based cubical complex filtration approach are analyzed over various spatial and temporal mesh sizes. The efficiency and robustness of the present method are verified by more than 500 fullerene molecules. It is shown that the proposed persistent homology based quantitative model offers good predictions of total curvature energies for ten types of fullerene isomers.
The present work offers the first example to design objective-oriented persistent homology to enhance or preserve desirable features in the original data during the filtration process and then automatically detect or extract the corresponding topological traits from the data.

Submitted 7 December, 2014; originally announced December 2014.

Comments: 13 figures and 96 references

arXiv:1405.1573 [pdf, ps, other]

doi 10.1209/0295-5075/107/58006

Evolutionary dynamics of cooperation on interdependent networks with Prisoner's Dilemma and Snowdrift Game

Authors: Baokui Wang, Zhenhua Pei, Long Wang

Abstract: The world in which we are living is a huge network of networks and should be described by interdependent networks. The interdependence between networks significantly affects the evolutionary dynamics of cooperation on them. Meanwhile, due to the diversity and complexity of social and biological systems, players on different networks may not interact with each other by the same way, which should be described by multiple models in evolutionary game theory, such as the Prisoner's Dilemma and Snowdrift Game. We therefore study the evolutionary dynamics of cooperation on two interdependent networks playing different games respectively. We clearly evidence that, with the increment of network interdependence, the evolution of cooperation is dramatically promoted on the network playing Prisoner's Dilemma. The cooperation level of the network playing Snowdrift Game reduces correspondingly, although it is almost invisible. In particular, there exists an optimal intermediate region of network interdependence maximizing the growth rate of the evolution of cooperation on the network playing Prisoner's Dilemma. Remarkably, players contacting with other network have advantage in the evolution of cooperation than the others on the same network.

Submitted 7 July, 2014; v1 submitted 7 May, 2014; originally announced May 2014.

Comments: 6 pages, 6 figures

arXiv:1307.1898 [pdf]

doi 10.1103/PhysRevLett.111.208102

Bursts of Active Transport in Living Cells

Authors: Bo Wang, James Kuo, Steve Granick

Abstract: We scrutinize the temporally-resolved speed of active cargo transport in living cells, and show intermittent bursting motions. These nonlinear fluctuations follow a scaling law over several decades of time and space, the statistical regularities displaying a time-averaged shape that we interpret to reflect stress buildup followed by rapid release. The power law of scaling is the same as seen in driven jammed colloids, granular, and magnetic systems. The implied regulation of active transport with environmental obstruction extends the classical notion of molecular crowding.

Submitted 7 July, 2013; originally announced July 2013.

arXiv:1306.0505 [pdf]

Diagnosing Heterogeneous Dynamics in Single Molecule/Particle Trajectories with Multiscale Wavelets

Authors: Kejia Chen, Bo Wang, Juan Guan, Steve Granick

Abstract: We describe a simple automated method to extract and quantify transient heterogeneous dynamical changes from large datasets generated in single molecule/particle tracking experiments. Based on wavelet transform, the method transforms raw data to locally match dynamics of interest. This is accomplished using statistically adaptive universal thresholding, whose advantage is to avoid a single arbitrary threshold that might conceal individual variability across populations. How to implement this multiscale method is described, focusing on local confined diffusion separated by transient transport periods or hopping events, with 3 specific examples: in cell biology, biotechnology, and glassy colloid dynamics. This computationally-efficient method can run routinely on hundreds of millions of data points analyzed within an hour on a desktop personal computer.

Submitted 3 June, 2013; originally announced June 2013.

arXiv:1305.0361 [pdf, ps, other]

doi 10.1038/srep03292

Braess's Paradox in Epidemic Game: Better Condition Results in Less Payoff

Authors: Hai-Feng Zhang, Zimo Yang, Zhi-Xi Wu, Bing-Hong Wang, Tao Zhou

Abstract: Facing the threats of infectious diseases, we take various actions to protect ourselves, but few studies considered an evolving system with competing strategies. In view of that, we propose an evolutionary epidemic model coupled with human behaviors, where individuals have three strategies: vaccination, self-protection and laissez faire, and could adjust their strategies according to their neighbors' strategies and payoffs at the beginning of each new season of epidemic spreading. We found a counter-intuitive phenomenon analogous to the well-known Braess's Paradox, namely a better condition may lead to worse performance. Specifically speaking, increasing the successful rate of self-protection does not necessarily reduce the epidemic size or improve the system payoff. This phenomenon is insensitive to the network topologies, and can be well explained by a mean-field approximation. Our study demonstrates an important fact that a better condition for individuals may yield a worse outcome for the society.

Submitted 2 May, 2013; originally announced May 2013.

Comments: 17 pages, 5 figures

Journal ref: Scientific Reports 3, 3292 (2013)

Showing 1–50 of 69 results for author: Wang, B