-
A deep graph model for the signed interaction prediction in biological network
Authors:
Shuyi Jin,
Mengji Zhang,
Meijie Wang,
Lun Yu
Abstract:
In pharmaceutical research, the strategy of drug repurposing accelerates the development of new therapies while reducing R&D costs. Network pharmacology lays the theoretical groundwork for identifying new drug indications, and deep graph models have become essential for their precision in mapping complex biological networks. Our study introduces an advanced graph model that utilizes graph convolutional networks and tensor decomposition to effectively predict signed chemical-gene interactions. This model demonstrates superior predictive performance, especially in handling the polar relations in biological networks. Our research opens new avenues for drug discovery and repurposing, especially in understanding the mechanisms of action of drugs.
Submitted 10 July, 2024;
originally announced July 2024.
-
ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling
Authors:
Kangjie Zheng,
Siyu Long,
Tianyu Lu,
Junwei Yang,
Xinyu Dai,
Ming Zhang,
Zaiqing Nie,
Wei-Ying Ma,
Hao Zhou
Abstract:
Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom), a novel approach that enables atom-scale and residue-scale unified molecular modeling. ESM-AA achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating the full utilization of protein language models. Further investigations reveal that through unified molecular modeling, ESM-AA not only gains molecular knowledge but also retains its understanding of proteins. The source code of ESM-AA is publicly released at https://github.com/zhengkangjie/ESM-AA.
Submitted 12 June, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Toward a Team of AI-made Scientists for Scientific Discovery from Gene Expression Data
Authors:
Haoyang Liu,
Yijiang Li,
Jinglin Jian,
Yuxuan Cheng,
Jianrong Lu,
Shuyi Guo,
Jinglei Zhu,
Mianchen Zhang,
Miantong Zhang,
Haohan Wang
Abstract:
Machine learning has emerged as a powerful tool for scientific discovery, enabling researchers to extract meaningful insights from complex datasets. For instance, it has facilitated the identification of disease-predictive genes from gene expression data, significantly advancing healthcare. However, the traditional process for analyzing such datasets demands substantial human effort and expertise for the data selection, processing, and analysis. To address this challenge, we introduce a novel framework, a Team of AI-made Scientists (TAIS), designed to streamline the scientific discovery pipeline. TAIS comprises simulated roles, including a project manager, data engineer, and domain expert, each represented by a Large Language Model (LLM). These roles collaborate to replicate the tasks typically performed by data scientists, with a specific focus on identifying disease-predictive genes. Furthermore, we have curated a benchmark dataset to assess TAIS's effectiveness in gene identification, demonstrating our system's potential to significantly enhance the efficiency and scope of scientific exploration. Our findings represent a solid step towards automating scientific discovery through large language models.
Submitted 20 February, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Highly Accurate Disease Diagnosis and Highly Reproducible Biomarker Identification with PathFormer
Authors:
Zehao Dong,
Qihang Zhao,
Philip R. O. Payne,
Michael A Province,
Carlos Cruchaga,
Muhan Zhang,
Tianyu Zhao,
Yixin Chen,
Fuhai Li
Abstract:
Biomarker identification, for example through fold-change and regression analysis, is critical for precise disease diagnosis and for understanding disease pathogenesis in omics data analysis. Graph neural networks (GNNs) have been the dominant deep learning model for analyzing graph-structured data. However, we found two major limitations of existing GNNs in omics data analysis: limited prediction (diagnosis) accuracy and limited reproducibility of biomarker identification across multiple datasets. The root of these challenges is the unique graph structure of biological signaling pathways, which consists of a large number of targets and intensive, complex signaling interactions among these targets. To resolve these two challenges, we present a novel GNN model architecture, named PathFormer, which systematically integrates signaling networks, prior knowledge, and omics data to rank biomarkers and predict disease diagnosis. In the comparison results, PathFormer outperformed existing GNN models significantly in terms of highly accurate prediction capability (30% accuracy improvement in disease diagnosis compared with existing GNN models) and high reproducibility of biomarker ranking across different datasets. The improvement was confirmed using two independent Alzheimer's Disease (AD) and cancer transcriptomic datasets. The PathFormer model can be directly applied to other omics data analysis studies.
Submitted 11 February, 2024;
originally announced February 2024.
-
Understanding YTHDF2-mediated mRNA Degradation By m6A-BERT-Deg
Authors:
Ting-He Zhang,
Sumin Jo,
Michelle Zhang,
Kai Wang,
Shou-Jiang Gao,
Yufei Huang
Abstract:
N6-methyladenosine (m6A) is the most abundant mRNA modification within mammalian cells, holding pivotal significance in the regulation of mRNA stability, translation, and splicing. Furthermore, it plays a critical role in the regulation of RNA degradation by primarily recruiting the YTHDF2 reader protein. However, the selective regulation of mRNA decay of the m6A-methylated mRNA through YTHDF2 binding is poorly understood. To improve our understanding, we developed m6A-BERT-Deg, a BERT model adapted for predicting YTHDF2-mediated degradation of m6A-methylated mRNAs. We meticulously assembled a high-quality training dataset by integrating multiple data sources for the HeLa cell line. To overcome the limitation of small training samples, we employed a pre-training-fine-tuning strategy by first performing a self-supervised pre-training of the model on 427,760 unlabeled m6A site sequences. The test results demonstrated the importance of this pre-training strategy in enabling m6A-BERT-Deg to outperform other benchmark models. We further conducted a comprehensive model interpretation and revealed a surprising finding that the presence of co-factors in proximity to m6A sites may disrupt YTHDF2-mediated mRNA degradation, subsequently enhancing mRNA stability. We also extended our analyses to the HEK293 cell line, shedding light on the context-dependent YTHDF2-mediated mRNA degradation.
Submitted 15 January, 2024;
originally announced January 2024.
-
The management of obstructive sleep apnea accompanied with mandibular retrognathia across the lifespan from orthodontic perspective
Authors:
Menghan Zhang,
Yuehua Liu
Abstract:
Obstructive sleep apnea (OSA) is a complex disease with a complex etiology that requires multidisciplinary cooperation in diagnosis and treatment. Mandibular retrognathia is strongly associated with OSA. Orthodontists can either correct the mandibular retrognathia of pediatric OSA patients with various kinds of orthodontic appliances, following adenoidectomy and tonsillectomy, or enlarge the upper airway of adult OSA patients with a mandibular advancement device (MAD), which repositions the mandible and tongue. This mini review examines MAD therapy for adult OSA as well as orthodontic treatment for pediatric OSA.
Submitted 2 November, 2023;
originally announced November 2023.
-
The molecular pathology of genioglossus in obstructive sleep apnea
Authors:
Menghan Zhang,
Yuehua Liu
Abstract:
Obstructive sleep apnea (OSA) is a sleep-related respiratory disease characterized by snoring accompanied by apnea and daytime sleepiness. It is a complex disease with a multifactorial etiology, and its pathology is incompletely understood. The genioglossus (GG) is the largest dilator of the upper airway, and its fatigue is strongly correlated with the onset of OSA. This brief review examines the pathogenesis of OSA with a focus on the GG, considering risk factors such as gender, obesity, and aging, as well as the molecular mechanism of GG injury in OSA pathogenesis. We hope to identify molecular mechanisms in the GG that can be targeted in OSA treatment.
Submitted 2 November, 2023;
originally announced November 2023.
-
Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory
Authors:
Ankur Sikarwar,
Mengmi Zhang
Abstract:
Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks. We also include human behavioral benchmarks as an upper bound for comparison. Our results suggest that AI models replicate some characteristics of WM in the brain, most notably primacy and recency effects, and neural clusters and correlates specialized for different domains and functionalities of WM. In the experiments, we also reveal some limitations in existing models to approximate human behavior. This dataset serves as a valuable resource for communities in cognitive psychology, neuroscience, and AI, offering a standardized framework to compare and enhance WM models, investigate WM's neural underpinnings, and develop WM models with human-like capabilities. Our source code and data are available at https://github.com/ZhangLab-DeepNeuroCogLab/WorM.
Submitted 1 November, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
-
Mol-PECO: a deep learning model to predict human olfactory perception from molecular structures
Authors:
Mengji Zhang,
Yusuke Hiki,
Akira Funahashi,
Tetsuya J. Kobayashi
Abstract:
While visual and auditory information conveyed by wavelength of light and frequency of sound have been decoded, predicting olfactory information encoded by the combination of odorants remains challenging due to the unknown and potentially discontinuous perceptual space of smells and odorants. Herein, we develop a deep learning model called Mol-PECO (Molecular Representation by Positional Encoding of Coulomb Matrix) to predict olfactory perception from molecular structures. Mol-PECO updates the learned atom embedding by directional graph convolutional networks (GCN), which model the Laplacian eigenfunctions as positional encoding, and Coulomb matrix, which encodes atomic coordinates and charges. With a comprehensive dataset of 8,503 molecules, Mol-PECO directly achieves an area-under-the-receiver-operating-characteristic (AUROC) of 0.813 in 118 odor descriptors, superior to the machine learning of molecular fingerprints (AUROC of 0.761) and GCN of adjacency matrix (AUROC of 0.678). The learned embeddings by Mol-PECO also capture a meaningful odor space with global clustering of descriptors and local retrieval of similar odorants. Our work may promote the understanding and decoding of the olfactory sense and mechanisms.
Submitted 21 May, 2023;
originally announced May 2023.
-
GraphVF: Controllable Protein-Specific 3D Molecule Generation with Variational Flow
Authors:
Fang Sun,
Zhihao Zhan,
Hongyu Guo,
Ming Zhang,
Jian Tang
Abstract:
Designing molecules that bind to specific target proteins is a fundamental task in drug discovery. Recent models leverage geometric constraints to generate ligand molecules that bind cohesively with specific protein pockets. However, these models cannot effectively generate 3D molecules with 2D skeletal curtailments and property constraints, which are pivotal to drug potency and development. To tackle this challenge, we propose GraphVF, a variational flow-based framework that combines 2D topology and 3D geometry, for controllable generation of binding 3D molecules. Empirically, our method achieves state-of-the-art binding affinity and realistic sub-structural layouts for protein-specific generation. In particular, GraphVF represents the first controllable geometry-aware, protein-specific molecule generation method, which can generate binding 3D molecules with tailored sub-structures and physio-chemical properties. Our code is available at https://github.com/Franco-Solis/GraphVF-code.
Submitted 23 February, 2023;
originally announced April 2023.
-
Deep radiomic signature with immune cell markers predicts the survival of glioma patients
Authors:
Ahmad Chaddad,
Paul Daniel Mingli Zhang,
Saima Rathore,
Paul Sargos,
Christian Desrosiers,
Tamim Niazi
Abstract:
Imaging biomarkers offer a non-invasive way to predict the response to immunotherapy prior to treatment. In this work, we propose a novel type of deep radiomic features (DRFs) computed from a convolutional neural network (CNN), which capture tumor characteristics related to immune cell markers and overall survival. Our study uses four MRI sequences (T1-weighted, T1-weighted post-contrast, T2-weighted and FLAIR) with corresponding immune cell markers of 151 patients with brain tumors. The proposed method extracts a total of 180 DRFs by aggregating the activation maps of a pre-trained 3D-CNN within labeled tumor regions of MRI scans. These features offer a compact yet powerful representation of regional texture encoding tissue heterogeneity. A comprehensive set of experiments is performed to assess the relationship between the proposed DRFs and immune cell markers, and to measure their association with overall survival. Results show a high correlation between DRFs and various markers, as well as significant differences between patients grouped based on these markers. Moreover, combining DRFs, clinical features and immune cell markers as input to a random forest classifier helps discriminate between short and long survival outcomes, with an AUC of 72% and $p = 2.36 \times 10^{-5}$. These results demonstrate the usefulness of the proposed DRFs as a non-invasive biomarker for predicting treatment response in patients with brain tumors.
Submitted 9 June, 2022;
originally announced June 2022.
-
Modeling of Textures to Predict Immune Cell Status and Survival of Brain Tumour Patients
Authors:
Ahmad Chaddad,
Mingli Zhang,
Lama Hassan,
Tamim Niazi
Abstract:
Radiomics has shown a capability to predict the clinical outcome for different types of cancer, such as glioma. It can provide a non-invasive means of evaluating the immunotherapy response prior to treatment. However, the use of radiomics based on deep convolutional neural networks (CNNs) requires large training image sets. To avoid this problem, we investigate new imaging features that model the distribution of learned 3D CNN features with a Gaussian mixture model (GMM). Using these deep radiomic features (DRFs), we aim to predict the immune marker status (low versus high) and overall survival for glioma patients. We extract the DRFs by aggregating the activation maps of a pre-trained 3D-CNN within labeled tumor regions of MRI scans with corresponding immune markers of 151 patients. Our experiments assess the relationship between the proposed DRFs and three immune cell markers (Macrophage M1, Neutrophils and T Cells Follicular Helper), and measure their association with overall survival. Using the random forest (RF) model, DRFs were able to predict the immune marker status with an area under the ROC curve (AUC) of 78.67%, 83.93% and 75.67% for Macrophage M1, Neutrophils and T Cells Follicular Helper, respectively. When the immune markers were combined with DRFs and clinical variables, the Kaplan-Meier estimator and log-rank test achieved the most significant difference between predicted groups of patients (short-term versus long-term survival), with $p = 4.31 \times 10^{-7}$ compared to $p = 0.03$ for immune cell markers, $p = 0.07$ for clinical variables, and $p = 1.45 \times 10^{-5}$ for DRFs. Our findings indicate that the proposed features (DRFs) used in RF models may be worth considering for prognosticating patients with brain tumour prior to surgery through regularly acquired imaging data.
Submitted 3 June, 2022;
originally announced June 2022.
-
3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker Design
Authors:
Yinan Huang,
Xingang Peng,
Jianzhu Ma,
Muhan Zhang
Abstract:
Deep learning has achieved tremendous success in designing novel chemical compounds with desirable pharmaceutical properties. In this work, we focus on a new type of drug design problem -- generating a small "linker" to physically attach two independent molecules with their distinct functions. The main computational challenges include: 1) the generation of linkers is conditional on the two given molecules, in contrast to generating full molecules from scratch in previous works; 2) linkers heavily depend on the anchor atoms of the two molecules to be connected, which are not known beforehand; 3) 3D structures and orientations of the molecules need to be considered to avoid atom clashes, for which equivariance to the E(3) group is necessary. To address these problems, we propose a conditional generative model, named 3DLinker, which is able to predict anchor atoms and jointly generate linker graphs and their 3D structures based on an E(3) equivariant graph variational autoencoder. So far as we know, there are no previous models that could achieve this task. We compare our model with multiple conditional generative models modified from other molecular design tasks and find that our model has a significantly higher rate of recovering molecular graphs and, more importantly, of accurately predicting the 3D coordinates of all the atoms.
Submitted 15 May, 2022;
originally announced May 2022.
-
Assembly of Model Postsynaptic Densities Involves Interactions Auxiliary to Stoichiometric Binding
Authors:
Yi-Hsuan Lin,
Haowei Wu,
Bowen Jia,
Mingjie Zhang,
Hue Sun Chan
Abstract:
The assembly of functional biomolecular condensates often involves liquid-liquid phase separation (LLPS) of proteins with multiple modular domains, which can be folded or conformationally disordered to various degrees. To understand the LLPS-driving domain-domain interactions, a fundamental question is how readily the interactions in the condensed phase can be inferred from inter-domain interactions in dilute solutions. In particular, are the interactions leading to LLPS exclusively those underlying the formation of discrete inter-domain complexes in homogeneous solutions? We address this question by developing a mean-field LLPS theory of two stoichiometrically constrained solute species. The theory is applied to the neuronal proteins SynGAP and PSD-95, whose complex coacervate serves as a rudimentary model for neuronal postsynaptic densities (PSDs). The predicted phase behaviors are compared with experiments. Previously, a three-SynGAP, two-PSD-95 ratio was determined for SynGAP/PSD-95 complexes in dilute solutions. However, when this 3:2 stoichiometry is uniformly imposed in our theory encompassing both dilute and condensed phases, the tie-line pattern of the predicted SynGAP/PSD-95 phase diagram differs drastically from that obtained experimentally. In contrast, theories embodying alternate scenarios postulating auxiliary SynGAP-PSD-95 as well as SynGAP-SynGAP and PSD-95-PSD-95 interactions in addition to those responsible for stoichiometric SynGAP/PSD-95 complexes produce tie-line patterns consistent with experiment. Hence, our combined theoretical-experimental analysis indicates that weaker interactions or higher-order complexes beyond the 3:2 stoichiometry, but not yet documented, are involved in the formation of SynGAP/PSD-95 condensates, imploring future efforts to ascertain the nature of these auxiliary interactions in PSD-like LLPS.
Submitted 6 October, 2021;
originally announced October 2021.
-
Visual Search Asymmetry: Deep Nets and Humans Share Similar Inherent Biases
Authors:
Shashi Kant Gupta,
Mengmi Zhang,
Chia-Chien Wu,
Jeremy M. Wolfe,
Gabriel Kreiman
Abstract:
Visual search is a ubiquitous and often challenging daily task, exemplified by looking for the car keys at home or a friend in a crowd. An intriguing property of some classical search tasks is an asymmetry such that finding a target A among distractors B can be easier than finding B among A. To elucidate the mechanisms responsible for asymmetry in visual search, we propose a computational model that takes a target and a search image as inputs and produces a sequence of eye movements until the target is found. The model integrates eccentricity-dependent visual recognition with target-dependent top-down cues. We compared the model against human behavior in six paradigmatic search tasks that show asymmetry in humans. Without prior exposure to the stimuli or task-specific training, the model provides a plausible mechanism for search asymmetry. We hypothesized that the polarity of search asymmetry arises from experience with the natural environment. We tested this hypothesis by training the model on augmented versions of ImageNet where the biases of natural images were either removed or reversed. The polarity of search asymmetry disappeared or was altered depending on the training protocol. This study highlights how classical perceptual properties can emerge in neural network models, without the need for task-specific training, but rather as a consequence of the statistical properties of the developmental diet fed to the model. All source code and data are publicly available at https://github.com/kreimanlab/VisualSearchAsymmetry.
Submitted 6 November, 2021; v1 submitted 5 June, 2021;
originally announced June 2021.
-
Fluorescence-Enhanced Mid-Infrared Photothermal Microscopy
Authors:
Yi Zhang,
Haonan Zong,
Cheng Zong,
Yuying Tan,
Meng Zhang,
Yuewei Zhan,
Ji-Xin Cheng
Abstract:
Mid-infrared photothermal microscopy is a new chemical imaging technology in which a visible beam senses the photothermal effect induced by a pulsed infrared laser. This technology provides infrared spectroscopic information at sub-micron spatial resolution and enables infrared spectroscopy and imaging of living cells and organisms. Yet, current mid-infrared photothermal imaging sensitivity suffers from a weak dependence of scattering on temperature, and the image quality is vulnerable to the speckles caused by scattering. Here, we present a novel version of mid-infrared photothermal microscopy in which thermo-sensitive fluorescent probes are harnessed to sense the mid-infrared photothermal effect. The fluorescence intensity can be modulated at the level of 1% per Kelvin, which is 100 times larger than the modulation of scattering intensity. In addition, fluorescence emission is free of speckles, thus much improving the image quality. Moreover, fluorophores can target specific organelles or biomolecules, thus augmenting the specificity of photothermal imaging. Spectral fidelity is confirmed through fingerprinting a single bacterium. Finally, the photobleaching issue is successfully addressed through the development of a wide-field fluorescence-enhanced mid-infrared photothermal microscope which allows video rate bond-selective imaging of biological specimens.
Submitted 6 April, 2021;
originally announced April 2021.
-
Evolutionary games on simplicial complexes
Authors:
H. Guo,
D. Jia,
I. Sendiña-Nadal,
M. Zhang,
Z. Wang,
X. Li,
K. Alfaro-Bittner,
Y. Moreno,
S. Boccaletti
Abstract:
Elucidating the mechanisms that lead to cooperation is still one of the main scientific challenges of current times, as many common cooperative scenarios remain elusive and at odds with Darwin's natural selection theory. Here, we study evolutionary games on populations that are structured beyond pairwise interactions. Specifically, we introduce a general evolutionary approach that allows studying…
▽ More
Elucidating the mechanisms that lead to cooperation is still one of the main scientific challenges of current times, as many common cooperative scenarios remain elusive and at odds with Darwin's natural selection theory. Here, we study evolutionary games on populations that are structured beyond pairwise interactions. Specifically, we introduce a general evolutionary approach that allows studying situations in which indirect interactions via a neighbor other than the direct pairwise connection (or via a group of neighbors), impact the strategy of the focal player. To this end, we consider simplicial graphs that encode two- and three-body interactions, which enables to study competition between all possible pairs of social dilemmas and to scrutinize the role of three-body interactions in all the observed phenomenology. We simultaneously investigate how social dilemma with different Nash equilibria compete in simplicial structures and how such a competition is modulated by the unbalance of 2- and 1-simplices, which in its turn reflects the relative prevalence of pairwise or group interactions among the players. We report a number of results that: (i) support that higher-order games allow for non-dominant strategists to emerge and coexist with dominant ones, a scenario that can't be explained by any pairwise schemes, no matter the network of contacts; (ii) characterize a novel transition from dominant defection to dominant cooperation as a function of the simplicial structure of the population; and (iii) demonstrate that 2-simplex interactions are a source of strategy diversity, i.e. increasing the relative prevalence of group interactions always promotes diverse strategic identities of individuals. Our study constitutes a step forward in the quest for understanding the roots of cooperation and the mechanisms that sustain it in real-world and social environments.
Submitted 5 March, 2021;
originally announced March 2021.
-
Time-dependent Clearance of Cyclosporine in Adult Renal Transplant Recipients: A Population Pharmacokinetic Perspective
Authors:
Junjun Mao,
Xiaoyan Qiu,
Weiwei Qin,
Luyang Xu,
Ming Zhang,
Mingkang Zhong
Abstract:
Aim: The pharmacokinetic (PK) properties of cyclosporine (CsA) in renal transplant recipients are patient- and time-dependent. Knowledge of this time-related variability is necessary to maintain or achieve CsA target exposure. Here, we aimed to identify factors explaining variabilities in CsA PK properties and characterise time-dependent clearance (CL/F) by performing a comprehensive analysis of CsA PK factors using population PK (popPK) modelling of long-term follow-up data from our institution. Methods: In total, 3,674 whole-blood CsA concentrations from 183 patients who underwent initial renal transplantation were analysed using nonlinear mixed-effects modelling. The effects of potential covariates were selected according to a previous report and well-accepted theoretical mechanisms. Model-informed individualised therapeutic regimens were also conducted. Results: A two-compartment model adequately described the data and the estimated mean CsA CL/F was 32.6 L/h (5%). Allometrically scaled body size, haematocrit (HCT) level, CGC haplotype carrier status, and postoperative time may contribute to CsA PK variability. The CsA bioavailability in patients receiving a prednisolone dose (PD) of 80 mg was 20.6% lower than that in patients receiving 20 mg. A significant decrease (52.6%) in CL/F was observed as the HCT increased from 10.5% to 60.5%. The CL/F of the non-CGC haplotype carrier was 14.4% lower than that of the CGC haplotype carrier at 3 months post operation. CsA dose adjustments should be considered in different postoperative periods. Conclusions: By monitoring body size, HCT, PD, and CGC haplotype, changes in CsA CL/F over time could be predicted. Such information could be used to optimise CsA therapy.
Submitted 2 March, 2021;
originally announced March 2021.
-
A Fully Integrated Sensor-Brain-Machine Interface System for Restoring Somatosensation
Authors:
Xilin Liu,
Hongjie Zhu,
Tian Qiu,
Srihari Y. Sritharan,
Dengteng Ge,
Shu Yang,
Milin Zhang,
Andrew G. Richardson,
Timothy H. Lucas,
Nader Engheta,
Jan Van der Spiegel
Abstract:
Sensory feedback is critical to the performance of neural prostheses that restore movement control after neurological injury. Recent advances in direct neural control of paralyzed arms present new requirements for miniaturized, low-power sensor systems. To address this challenge, we developed a fully-integrated wireless sensor-brain-machine interface (SBMI) system for communicating key somatosensory signals, fingertip forces and limb joint angles, to the brain. The system consists of a tactile force sensor, an electrogoniometer, and a neural interface. The tactile force sensor features a novel optical waveguide on CMOS design for sensing. The electrogoniometer integrates an ultra low-power digital signal processor (DSP) for real-time joint angle measurement. The neural interface enables bidirectional neural stimulation and recording. Innovative designs of sensors and sensing interfaces, analog-to-digital converters (ADC) and ultra wide-band (UWB) wireless transceivers have been developed. The prototypes have been fabricated in 180nm standard CMOS technology and tested on the bench and in vivo. The developed system provides a novel solution for providing somatosensory feedback to next-generation neural prostheses.
Submitted 17 October, 2020;
originally announced October 2020.
-
The Effect of Population Size for Pathogen Transmission on Prediction of COVID-19 Pandemic Spread
Authors:
Xuqi Zhang,
Haiqi Liu,
Hanning Tang,
Mei Zhang,
Xuedong Yuan,
Xiaojing Shen
Abstract:
Extreme public health interventions play a critical role in mitigating the local and global prevalence and pandemic potential of COVID-19. Here, we use population size for pathogen transmission to measure the intensity of public health interventions, which is a key characteristic variable for nowcasting and forecasting of the epidemic. By formulating a hidden Markov dynamic system and using nonlinear filtering theory, we have developed a stochastic epidemic dynamic model under public health interventions. The model parameters and states are estimated in time from internationally available public data by combining an unscented filter and an interacting multiple model filter. Moreover, we consider the computability of the population size and provide its selection criterion. We estimate the mean of the basic reproductive number of China and of the rest of the globe except China (GEC) to be 2.46 (95% CI: 2.41-2.51) and 3.64 (95% CI: 3.55-3.72), respectively. We infer that the number of latent infections of GEC is about 7.47*10^5 (95% CI: 7.32*10^5-7.62*10^5) as of April 2, 2020. We predict that the peak of infections in hospitals of GEC may reach 3.00*10^6 on the present trajectory, i.e., if the population size for pathogen transmission and epidemic parameters remain unchanged. If the control intensity is strengthened, e.g., by a 50% or 75% reduction of the population size for pathogen transmission, the peak would decline to 1.84*10^6 or 1.27*10^6, respectively.
Submitted 22 April, 2020;
originally announced April 2020.
-
DS-GCNs: Connectome Classification Using Dynamic Spectral Graph Convolution Networks with Assistant Task Training
Authors:
Xiaodan Xing,
Qingfeng Li,
Hao Wei,
Minqing Zhang,
Yiqiang Zhan,
Xiang Sean Zhou,
Zhong Xue,
Feng Shi
Abstract:
Functional Connectivity (FC) matrices measure the regional interactions in the brain and have been widely used in neurological brain disease classification. However, an FC matrix is neither a natural image, which contains shape and texture information, nor a vector of independent features, which makes extracting efficient features from such matrices a challenging problem. A brain network, also called a connectome, naturally forms a graph structure whose nodes are brain regions and whose edges are interregional connectivity. Thus, in this study, we propose novel graph convolutional networks (GCNs) to extract efficient disease-related features from FC matrices. Considering the time-dependent nature of brain activity, we compute dynamic FC matrices with sliding windows and implement a graph-convolution-based LSTM (long short-term memory) layer to process dynamic graphs. Moreover, the demographics of patients are also used to guide the classification. However, unlike conventional methods in which personal information, i.e., gender and age, is added as extra inputs, we argue that this kind of approach may not actually improve classification performance, because such personal information is usually evenly distributed in a dataset. In this paper, we propose to use the demographic information as extra outputs and to share parameters among three networks predicting subject status, gender, and age, the latter two serving as assistant tasks. We tested the proposed architecture on the ADNI II dataset to classify Alzheimer's disease patients from normal controls. The classification accuracy, sensitivity, and specificity reach 0.90, 0.92, and 0.89 on the ADNI II dataset.
Submitted 10 December, 2019;
originally announced January 2020.
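The sliding-window step mentioned in the abstract above (computing dynamic FC matrices from regional time series) can be sketched as follows. This is a minimal illustration only: the abstract does not state the connectivity measure, window length, or stride, so the use of Pearson correlation and the names `pearson`, `dynamic_fc`, `window`, and `stride` are assumptions, and the GCN/LSTM stages are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def dynamic_fc(signals, window, stride):
    """Return one region-by-region correlation matrix per sliding window.

    `signals` is a list of R time series (one per brain region), all of length T.
    Pearson correlation is an assumed choice of connectivity measure.
    """
    length = len(signals[0])
    matrices = []
    for start in range(0, length - window + 1, stride):
        segment = [s[start:start + window] for s in signals]
        matrices.append([[pearson(a, b) for b in segment] for a in segment])
    return matrices


# Three toy "regions": the second tracks the first, the third mirrors it.
toy = [
    [0, 1, 2, 3, 4, 5, 6, 7],
    [0, 2, 4, 6, 8, 10, 12, 14],
    [7, 6, 5, 4, 3, 2, 1, 0],
]
fc_sequence = dynamic_fc(toy, window=4, stride=2)
```

Each matrix in `fc_sequence` could then serve as the weighted adjacency of one graph in a time-ordered sequence, which is the kind of input a graph-convolution-based recurrent layer would consume.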
-
Effect of protein binding on exposure of unbound and total mycophenolic acid: a population pharmacokinetic analysis in Chinese adult kidney transplant recipients
Authors:
Changcheng Sheng,
Qun Zhao,
Wanjie Niu,
Xiaoyan Qiu,
Ming Zhang,
Zheng Jiao
Abstract:
AIMS: A population pharmacokinetic (PK) analysis was performed to: (1) characterise the PK of unbound and total mycophenolic acid (MPA) and its 7-O-mycophenolic acid glucuronide (MPAG) metabolite, and (2) identify the clinically significant covariates that cause variability in the dose-exposure relationship to facilitate dose optimisation. METHODS: A total of 740 unbound MPA (uMPA), 741 total MPA (tMPA) and 734 total MPAG (tMPAG) concentration-time data from 58 Chinese kidney transplant patients were analysed using a nonlinear mixed-effect model. The influence of covariates was tested using a stepwise procedure. RESULTS: The PK of unbound MPA and MPAG were characterised by a two- and one-compartment model with first-order elimination, respectively. Apparent clearance of uMPA (CLuMPA/F) was estimated to be 852 L/h with a relative standard error (RSE) of 7.1%. The tMPA and uMPA were connected using a linear protein binding model, in which the protein binding rate constant (kB) increased non-linearly with the serum albumin (ALB) concentration. The estimated kB was 53.4 /h (RSE, 2.3%) for patients with ALB of 40 g/L. In addition, model-based simulation showed that changes in ALB substantially affected tMPA but not uMPA exposure. CONCLUSIONS: The established model adequately described the population PK characteristics of the uMPA, tMPA, and MPAG. The estimated CLuMPA/F and unbound fraction of MPA (FUMPA) in Chinese kidney transplant recipients were comparable to those published previously in Caucasians. We recommend monitoring uMPA instead of tMPA to optimise mycophenolate mofetil (MMF) dosing for patients with lower ALB levels.
Submitted 20 December, 2019;
originally announced December 2019.
-
Deep radiomic features from MRI scans predict survival outcome of recurrent glioblastoma
Authors:
Ahmad Chaddad,
Saima Rathore,
Mingli Zhang,
Christian Desrosiers,
Tamim Niazi
Abstract:
This paper proposes to use deep radiomic features (DRFs) from a convolutional neural network (CNN) to model fine-grained texture signatures in the radiomic analysis of recurrent glioblastoma (rGBM). We use DRFs to predict survival of rGBM patients with preoperative T1-weighted post-contrast MR images (n=100). DRFs are extracted from regions of interest labelled by a radiation oncologist and used to compare between short-term and long-term survival patient groups. Random forest (RF) classification is employed to predict survival outcome (i.e., short or long survival), as well as to identify highly group-informative descriptors. Classification using DRFs results in an area under the ROC curve (AUC) of 89.15% (p<0.01) in predicting rGBM patient survival, compared to 78.07% (p<0.01) when using standard radiomic features (SRF). These results indicate the potential of DRFs as a prognostic marker for patients with rGBM.
Submitted 15 November, 2019;
originally announced November 2019.
-
Exact power spectrum in a minimal hybrid model of stochastic gene expression oscillations
Authors:
Chen Jia,
Hong Qian,
Michael Q. Zhang
Abstract:
Stochastic oscillations in individual cells are usually characterized by a non-monotonic power spectrum with an oscillatory autocorrelation function. Here we develop an analytical approach to stochastic oscillations in a minimal hybrid model of stochastic gene expression including promoter state switching, protein synthesis and degradation, as well as a genetic feedback loop. The oscillations observed in our model are noise-induced since the deterministic theory predicts stable fixed points. The autocorrelation function, power spectrum, and steady-state distribution of protein concentration fluctuations are computed in closed form without making any approximations. Using the exactly solvable model, we illustrate sustained oscillations as a circular motion along a stochastic hysteresis loop induced by gene state switching. A triphasic stochastic bifurcation is observed as the strength of negative feedback increases, which reveals how stochastic bursts evolve into stochastic oscillations. In our model, oscillations tend to occur when the protein is relatively stable and when gene switching is relatively slow. Translational bursting is found to enhance the robustness and broaden the region of stochastic oscillations. These results provide deeper insights into R. Thomas' two conjectures for single-cell gene expression kinetics.
Submitted 7 February, 2024; v1 submitted 20 September, 2019;
originally announced September 2019.
-
Topological portraits of multiscale coordination dynamics
Authors:
Mengsen Zhang,
William D. Kalies,
J. A. Scott Kelso,
Emmanuelle Tognoli
Abstract:
Living systems exhibit complex yet organized behavior on multiple spatiotemporal scales. To investigate the nature of multiscale coordination in living systems, one needs a meaningful and systematic way to quantify the complex dynamics, a challenge in both theoretical and empirical realms. The present work shows how integrating approaches from computational algebraic topology and dynamical systems may help us meet this challenge. In particular, we focus on the application of multiscale topological analysis to coordinated rhythmic processes. First, theoretical arguments are introduced as to why certain topological features and their scale-dependency are highly relevant to understanding complex collective dynamics. Second, we propose a method to capture such dynamically relevant topological information using persistent homology, which allows us to effectively construct a multiscale topological portrait of rhythmic coordination. Finally, the method is put to the test in detecting transitions in real data from an experiment on rhythmic coordination in ensembles of interacting humans. The recurrence plots of topological portraits highlight collective transitions in coordination patterns that were elusive to more traditional methods. This sensitivity to collective transitions would be lost if the behavioral dynamics of individuals were treated as separate degrees of freedom instead of constituents of the topology that they collectively forge. Such multiscale topological portraits highlight collective aspects of coordination patterns that are irreducible to properties of individual parts. The present work demonstrates how the analysis of multiscale coordination dynamics can benefit from topological methods, thereby paving the way for further systematic quantification of complex, high-dimensional dynamics in living systems.
Submitted 19 September, 2019;
originally announced September 2019.
-
Neural Population Coding for Effective Temporal Classification
Authors:
Zihan Pan,
Jibin Wu,
Yansong Chua,
Malu Zhang,
Haizhou Li
Abstract:
Neural encoding plays an important role in faithfully describing temporally rich patterns, such as human speech and environmental sounds. For tasks that involve classifying such spatio-temporal patterns with Spiking Neural Networks (SNNs), how these patterns are encoded directly influences the difficulty of the task. In this paper, we compare several existing temporal and population coding schemes and evaluate them on both speech (TIDIGITS) and sound (RWCP) datasets. We show that, with population neural codings, the encoded patterns are linearly separable using the Support Vector Machine (SVM). We note that the population neural codings effectively project the temporal information onto the spatial domain, thus improving linear separability in the spatial dimension, achieving an accuracy of 95% and 100% for the TIDIGITS and RWCP datasets, respectively, when classified using the SVM. This observation suggests that an effective neural coding scheme greatly simplifies the classification problem such that a simple linear classifier would suffice. The above datasets are then classified using the Tempotron, an SNN-based classifier. SNN classification results agree with the SVM findings that population neural codings help to improve classification accuracy. Hence, for an SNN designed to recognize spatio-temporal patterns, effective neural encoding is just as important as the learning algorithm. It is an often neglected but powerful abstraction that deserves further study.
Submitted 25 September, 2019; v1 submitted 12 September, 2019;
originally announced September 2019.
-
Single-cell stochastic gene expression kinetics with coupled positive-plus-negative feedback
Authors:
Chen Jia,
Le Yi Wang,
George G. Yin,
Michael Q. Zhang
Abstract:
Here we investigate single-cell stochastic gene expression kinetics in a minimal coupled gene circuit with positive-plus-negative feedback. A triphasic stochastic bifurcation upon the increasing ratio of the positive and negative feedback strengths is observed, which reveals a strong synergistic interaction between positive and negative feedback loops. We discover that coupled positive-plus-negative feedback amplifies gene expression mean but reduces gene expression noise over a wide range of feedback strengths when promoter switching is relatively slow, stabilizing gene expression around a relatively high level. In addition, we study two types of macroscopic limits of the discrete chemical master equation model: the Kurtz limit applies to proteins with large burst frequencies and the Lévy limit applies to proteins with large burst sizes. We derive the analytic steady-state distributions of the protein abundance in a coupled gene circuit for both the discrete model and its two macroscopic limits, generalizing the results obtained in [Chaos 26:043108, 2016]. We also obtain the analytic time-dependent protein distribution for the classical Friedman-Cai-Xie random bursting model proposed in [Phys. Rev. Lett. 97:168302, 2006]. Our analytic results are further applied to study the structure of gene expression noise in a coupled gene circuit and a complete decomposition of noise in terms of five different biophysical origins is provided.
Submitted 25 October, 2019; v1 submitted 30 August, 2019;
originally announced September 2019.
-
Improving the Results of De novo Peptide Identification via Tandem Mass Spectrometry Using a Genetic Programming-based Scoring Function for Re-ranking Peptide-Spectrum Matches
Authors:
Samaneh Azari,
Bing Xue,
Mengjie Zhang,
Lifeng Peng
Abstract:
De novo peptide sequencing algorithms have been widely used in proteomics to analyse tandem mass spectra (MS/MS) and assign them to peptides, but quality-control methods to evaluate the confidence of de novo peptide sequencing are lagging behind. A fundamental part of a quality-control method is the scoring function used to evaluate the quality of peptide-spectrum matches (PSMs). Here, we propose a genetic programming (GP) based method, called GP-PSM, to learn a PSM scoring function for improving the rate of confident peptide identification from MS/MS data. The GP method learns from thousands of MS/MS spectra. Important characteristics of the goodness of the matches are extracted from the learning set and incorporated into the GP scoring functions. We compare GP-PSM with two other methods, Support Vector Regression (SVR) and Random Forest (RF). GP, RF, and SVR are each used to post-process the results of peptide identification by PEAKS, a commonly used de novo sequencing method. The results show that GP-PSM outperforms RF and SVR and discriminates accurately between correct and incorrect PSMs. It correctly assigns peptides to 10% more spectra on an evaluation dataset containing 120 MS/MS spectra and decreases the false positive rate (FPR) of peptide identification.
Submitted 11 August, 2019;
originally announced August 2019.
-
FindeR: Accelerating FM-Index-based Exact Pattern Matching in Genomic Sequences through ReRAM technology
Authors:
Farzaneh Zokaee,
Mingzhe Zhang,
Lei Jiang
Abstract:
Genomics is key to enabling precision medicine, ensuring global food security and enforcing wildlife conservation. The massive genomic data produced by various genome sequencing technologies presents a significant challenge for genome analysis. Because of errors from sequencing machines and genetic variations, approximate pattern matching (APM) is a must for practical genome analysis. Recent work proposes FPGA, ASIC and even process-in-memory-based accelerators to boost the APM throughput by accelerating dynamic-programming-based algorithms (e.g., Smith-Waterman). However, existing accelerators lack efficient hardware acceleration for exact pattern matching (EPM), an even more critical and essential function that is widely used in almost every step of genome analysis, including assembly, alignment, annotation and compression.
State-of-the-art genome analysis adopts the FM-Index that augments the space-efficient BWT with additional data structures permitting fast EPM operations. But the FM-Index is notorious for poor spatial locality and massive random memory accesses. In this paper, we propose a ReRAM-based process-in-memory architecture, FindeR, to enhance the FM-Index EPM search throughput in genomic sequences. We build a reliable and energy-efficient Hamming distance unit to accelerate the computing kernel of FM-Index search using commodity ReRAM chips without introducing extra CMOS logic. We further architect a full-fledged FM-Index search pipeline and improve its search throughput by lightweight scheduling on the NVDIMM. We also create a system library for programmers to invoke FindeR to perform EPMs in genome analysis. Compared to state-of-the-art accelerators, FindeR improves the FM-Index search throughput by $83\%\sim 30K\times$ and throughput per Watt by $3.5\times\sim 42.5K\times$.
Submitted 1 October, 2019; v1 submitted 10 July, 2019;
originally announced July 2019.
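The FM-Index exact pattern matching that the abstract above targets can be illustrated in software with a toy backward search. This is a plain-Python sketch of the textbook data structure (BWT plus a `C` table and an `Occ` table), not FindeR's ReRAM pipeline or its system library; the function names `build_fm_index` and `count_exact` are hypothetical, and the naive suffix sort is only suitable for short demo strings.

```python
def build_fm_index(text, sentinel="$"):
    """Build a toy FM-Index for `text`: returns (C table, Occ table, BWT length)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    # Naive suffix array by sorting suffixes; real tools use linear-time construction.
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (s[-1] wraps to the sentinel).
    bwt = "".join(s[i - 1] for i in suffix_array)
    alphabet = sorted(set(s))
    # c_table[ch] = number of characters in s that sort strictly before ch.
    c_table, running = {}, 0
    for ch in alphabet:
        c_table[ch] = running
        running += s.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return c_table, occ, len(bwt)


def count_exact(pattern, index):
    """Count exact occurrences of `pattern` using backward search."""
    c_table, occ, n = index
    low, high = 0, n  # half-open interval of matching rows in the suffix array
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        # Each step needs two Occ lookups at unrelated positions (low and high).
        low = c_table[ch] + occ[ch][low]
        high = c_table[ch] + occ[ch][high]
        if low >= high:
            return 0
    return high - low


index = build_fm_index("GATTACAGATTACA")
matches = count_exact("GATTACA", index)
```

The two `occ` lookups per pattern character land at positions that jump around the table from one step to the next, which is one way to see the poor spatial locality and random memory accesses the abstract attributes to FM-Index search.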
-
GA-Novo: De Novo Peptide Sequencing via Tandem Mass Spectrometry using Genetic Algorithm
Authors:
Samaneh Azari,
Bing Xue,
Mengjie Zhang,
Lifeng Peng
Abstract:
Proteomics is the large-scale analysis of proteins. The common method for identifying proteins and characterising their amino acid sequences is to digest the proteins into peptides, analyse the peptides using mass spectrometry and assign the resulting tandem mass spectra (MS/MS) to peptides using database search tools. However, database search algorithms are highly dependent on a reference protein database and cannot identify peptides and proteins not included in the database. Therefore, de novo sequencing algorithms have been developed to overcome this problem by directly reconstructing the peptide sequence of an MS/MS spectrum without using any protein database. Current de novo sequencing algorithms often fail to construct completely matched sequences and produce only partial matches. In this study, we propose a genetic-algorithm-based method, GA-Novo, to solve the complex optimisation task of de novo peptide sequencing, aiming at constructing full-length sequences. Given an MS/MS spectrum, GA-Novo optimises the amino acid sequences to best fit the input spectrum. On the testing dataset, GA-Novo outperforms PEAKS, the most commonly used software for this task, constructing 8% more fully matched peptide sequences and achieving 4% higher recall on partially matched sequences.
Submitted 2 February, 2019;
originally announced February 2019.
-
Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits
Authors:
Martin J. Zhang,
James Zou,
David Tse
Abstract:
The Monte Carlo (MC) permutation test is considered the gold standard for statistical hypothesis testing, especially when standard parametric assumptions are not clear or likely to fail. However, in modern data science settings where a large number of hypothesis tests need to be performed simultaneously, it is rarely used due to its prohibitive computational cost. In genome-wide association studies, for example, the number of hypothesis tests $m$ is around $10^6$ while the number of MC samples $n$ for each test could be greater than $10^8$, totaling more than $nm=10^{14}$ samples. In this paper, we propose Adaptive MC multiple Testing (AMT) to estimate MC p-values and control the false discovery rate in multiple testing. The algorithm outputs the same result as the standard full MC approach with high probability while requiring only $\tilde{O}(\sqrt{n}m)$ samples. This sample complexity is shown to be optimal. On a Parkinson GWAS dataset, the algorithm reduces the running time from 2 months for full MC to an hour. The AMT algorithm is derived based on the theory of multi-armed bandits.
Submitted 18 May, 2019; v1 submitted 1 February, 2019;
originally announced February 2019.
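For context on the cost the abstract above describes, the standard (non-adaptive) MC permutation p-value for a single hypothesis can be sketched as follows. This is the plain full-MC baseline, not the AMT algorithm; the two-sample difference-in-means statistic, the function name `mc_permutation_pvalue`, and the parameters `n_perm` and `seed` are assumptions chosen for illustration.

```python
import random


def mc_permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Estimate a two-sample permutation p-value for |mean(x) - mean(y)|.

    Draws `n_perm` random relabelings of the pooled data. Running this for
    every one of m hypotheses costs n_perm * m statistic evaluations.
    """
    rng = random.Random(seed)

    def statistic(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        if statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so the estimate is never 0.
    return (hits + 1) / (n_perm + 1)


clearly_different = mc_permutation_pvalue(list(range(10)), list(range(100, 110)))
identical = mc_permutation_pvalue([1, 2, 3, 4], [1, 2, 3, 4])
```

Because every hypothesis draws the same fixed number of permutations here, most of the work is spent on tests whose outcome is already clear after a few samples; that waste is what an adaptive allocation of samples aims to avoid.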
-
Connecting empirical phenomena and theoretical models of biological coordination across scales
Authors:
Mengsen Zhang,
Christopher Beetle,
J. A. Scott Kelso,
Emmanuelle Tognoli
Abstract:
Coordination is ubiquitous in living systems. Existing theoretical models of coordination -- from bacteria to brains -- focus on either gross statistics in large-scale systems ($N\rightarrow\infty$) or detailed dynamics in small-scale systems (mostly $N=2$). Both approaches have proceeded largely independent of each other. The present work bridges this gap with a theoretical model of biological coordination that captures key experimental observations of mid-scale social coordination at multiple levels of description. It also reconciles in a single formulation two well-studied models of large- and small-scale biological coordination (Kuramoto and extended Haken-Kelso-Bunz). The model adds second-order coupling (from extended Haken-Kelso-Bunz) to the Kuramoto model. We show that second-order coupling is indispensable for reproducing empirically observed phenomena and gives rise to a phase transition from mono- to multi-stable coordination across scales. This mono-to-multistable transition connects the emergence and growth of behavioral complexity in small and large systems.
Submitted 2 December, 2018;
originally announced December 2018.
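The idea in the abstract above of adding second-order coupling to the Kuramoto model can be sketched numerically. The abstract does not give the equations, so the form assumed here is $\dot\varphi_i = \omega_i - \frac{1}{N}\sum_j \left[a\sin(\varphi_i-\varphi_j) + b\sin 2(\varphi_i-\varphi_j)\right]$, with uniform coupling strengths `a` and `b`, a $1/N$ normalization, and forward-Euler integration; all names and parameter values are hypothetical.

```python
import cmath
import math


def simulate(phi0, omegas, a, b, dt=0.005, steps=4000):
    """Integrate the assumed phase model with forward Euler and return final phases.

    `a` scales first-order (Kuramoto) coupling and `b` scales the second-order
    term; both the functional form and the 1/N normalization are assumptions.
    """
    n = len(phi0)
    phi = list(phi0)
    for _ in range(steps):
        velocity = []
        for i in range(n):
            pull = 0.0
            for j in range(n):
                diff = phi[i] - phi[j]
                pull += a * math.sin(diff) + b * math.sin(2.0 * diff)
            velocity.append(omegas[i] - pull / n)
        phi = [p + dt * v for p, v in zip(phi, velocity)]
    return phi


def order_parameter(phi, harmonic=1):
    """Magnitude of the mean of exp(i * harmonic * phase); 1 means full alignment."""
    return abs(sum(cmath.exp(1j * harmonic * p) for p in phi)) / len(phi)


# First-order coupling only: phases that start within a 2-radian arc pull in-phase.
in_phase = simulate([0.0, 0.3, 0.9, 1.4, 1.8, 2.0], [1.0] * 6, a=1.0, b=0.0)

# Second-order coupling only: two clusters half a cycle apart stay apart.
two_clusters = [0.1, -0.1, 0.05, -0.05,
                math.pi + 0.1, math.pi - 0.1, math.pi + 0.05, math.pi - 0.05]
anti_phase = simulate(two_clusters, [1.0] * 8, a=0.0, b=1.0)
```

In this sketch the first run ends with `order_parameter(in_phase)` close to 1, while the second keeps the ordinary order parameter near 0 and `order_parameter(anti_phase, harmonic=2)` close to 1, a simple picture of how a second-order term can stabilize an anti-phase pattern alongside the in-phase one.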
-
Finding any Waldo: zero-shot invariant and efficient visual search
Authors:
Mengmi Zhang,
Jiashi Feng,
Keng Teck Ma,
Joo Hwee Lim,
Qi Zhao,
Gabriel Kreiman
Abstract:
Searching for a target object in a cluttered scene constitutes a fundamental challenge in daily vision. Visual search must be selective enough to discriminate the target from distractors, invariant to changes in the appearance of the target, efficient to avoid exhaustive exploration of the image, and must generalize to locate novel target objects with zero-shot training. Previous work has focused on searching for perfect matches of a target after extensive category-specific training. Here we show for the first time that humans can efficiently and invariantly search for natural objects in complex scenes. To gain insight into the mechanisms that guide visual search, we propose a biologically inspired computational model that can locate targets without exhaustive sampling and generalize to novel objects. The model provides an approximation to the mechanisms integrating bottom-up and top-down signals during search in natural scenes.
Submitted 17 July, 2018;
originally announced July 2018.
-
Classification of lung nodules in CT images based on Wasserstein distance in differential geometry
Authors:
Min Zhang,
Qianli Ma,
Chengfeng Wen,
Hai Chen,
Deruo Liu,
Xianfeng Gu,
Jie He,
Xiaoyin Xu
Abstract:
Lung nodules are commonly detected in screening for patients with a risk for lung cancer. Though the status of large nodules can be easily diagnosed by fine needle biopsy or bronchoscopy, small nodules are often difficult to classify on computed tomography (CT). Recent works have shown that shape analysis of lung nodules can be used to differentiate benign lesions from malignant ones, though existing methods are limited in their sensitivity and specificity. In this work we introduce a new 3D shape analysis within the framework of differential geometry to calculate the Wasserstein distance between benign and malignant lung nodules to derive an accurate classification scheme. The Wasserstein distance between the nodules is calculated based on our new spherical optimal mass transport. This new algorithm works directly on the sphere by using the spherical metric, which is much more accurate and efficient than previous methods. In the process of deformation, the area-distortion factor gives a probability measure on the unit sphere, which forms the Wasserstein space. From known cases of benign and malignant lung nodules, we can calculate a unique optimal mass transport map between their correspondingly deformed Wasserstein spaces. This transportation cost defines the Wasserstein distance between them and can be used to classify new lung nodules into either the benign or malignant class. To the best of our knowledge, this is the first work that utilizes Wasserstein distance for lung nodule classification. The advantage of the Wasserstein distance is that it is invariant under rigid motions and scalings; it thus intrinsically measures shape distance even when the underlying shapes are of high complexity, making it well suited to classifying lung nodules, which have different sizes, orientations, and appearances.
Submitted 29 June, 2018;
originally announced July 2018.
-
Relaxation rates of gene expression kinetics reveal the feedback signs of autoregulatory gene networks
Authors:
Chen Jia,
Hong Qian,
Min Chen,
Michael Q. Zhang
Abstract:
The transient response to a stimulus and subsequent recovery to a steady state are the fundamental characteristics of a living organism. Here we study the relaxation kinetics of autoregulatory gene networks based on the chemical master equation model of single-cell stochastic gene expression with nonlinear feedback regulation. We report a novel relation between the rate of relaxation, characterized by the spectral gap of the Markov model, and the feedback sign of the underlying gene circuit. When a network has no feedback, the relaxation rate is exactly the decay rate of the protein. We further show that positive feedback always slows down the relaxation kinetics while negative feedback always speeds it up. Numerical simulations demonstrate that this relation provides a possible method to infer the feedback topology of autoregulatory gene networks by using time-series data of gene expression.
Submitted 3 March, 2018;
originally announced March 2018.
-
Phonemic evidence reveals interwoven evolution of Chinese dialects
Authors:
Meng-Han Zhang,
Wu-Yun Pan,
Shi Yan,
Li Jin
Abstract:
Han Chinese experienced substantial population migrations and admixture in history, yet little is known about the evolutionary process of Chinese dialects. Here, we used phylogenetic approaches and admixture inference to explicitly decompose the underlying structure of the diversity of Chinese dialects, based on the total phoneme inventories of 140 dialect samples from seven traditional dialect groups: Mandarin, Wu, Xiang, Gan, Hakka, Min and Yue. We found a north-south gradient of phonemic differences in Chinese dialects induced by historical population migrations. We also quantified extensive horizontal language transfers among these dialects, corresponding to the complicated socio-genetic history in China. Finally, we identified that the middle-latitude dialects of Xiang, Gan and Hakka were formed by admixture with the other four dialects. Accordingly, the middle-latitude areas in China were a linguistic melting pot of northern and southern Han populations. Our study provides a detailed phylogenetic and historical context against the family-tree model in China.
Submitted 15 February, 2018;
originally announced February 2018.
-
Image Segmentation and Classification for Sickle Cell Disease using Deformable U-Net
Authors:
Mo Zhang,
Xiang Li,
Mengjia Xu,
Quanzheng Li
Abstract:
Reliable cell segmentation and classification from biomedical images is a crucial step for both scientific research and clinical practice. A major challenge for more robust segmentation and classification methods is the large variation in the size, shape and viewpoint of the cells, combined with the low image quality caused by noise and artifacts. To address this issue, in this work we propose a learning-based, simultaneous cell segmentation and classification method based on the deep U-Net structure with deformable convolution layers. The U-Net architecture for deep learning has been shown to offer precise localization for image semantic segmentation. Moreover, the deformable convolution layer enables free-form deformation of the feature learning process, thus making the whole network more robust to various cell morphologies and image settings. The proposed method is tested on microscopic red blood cell images from patients with sickle cell disease. The results show that U-Net with deformable convolution achieves the highest accuracy for segmentation and classification, compared with the original U-Net structure.
Submitted 29 October, 2017; v1 submitted 23 October, 2017;
originally announced October 2017.
-
Emergent Lévy behavior in single-cell stochastic gene expression
Authors:
Chen Jia,
Michael Q. Zhang,
Hong Qian
Abstract:
Single-cell gene expression is inherently stochastic; its emergent behavior can be defined in terms of the chemical master equation describing the evolution of the mRNA and protein copy numbers as the latter tends to infinity. We establish two types of "macroscopic limits": the Kurtz limit is consistent with the classical chemical kinetics, while the Lévy limit provides a theoretical foundation for an empirical equation proposed in [Phys. Rev. Lett. 97:168302, 2006]. Furthermore, we clarify the biochemical implications and ranges of applicability for various macroscopic limits and calculate a comprehensive analytic expression for the protein concentration distribution in autoregulatory gene networks. The relationship between our work and modern population genetics is discussed.
Submitted 24 October, 2017; v1 submitted 20 August, 2017;
originally announced August 2017.
-
Stochastic fluctuations can reveal the feedback signs of gene regulatory networks at the single-molecule level
Authors:
Chen Jia,
Peng Xie,
Min Chen,
Michael Q. Zhang
Abstract:
Understanding the relationship between spontaneous stochastic fluctuations and the topology of the underlying gene regulatory network is of fundamental importance for the study of single-cell stochastic gene expression. Here, by solving for the analytical steady-state distribution of the protein copy number in a general kinetic model of stochastic gene expression with nonlinear feedback regulation, we reveal the relationship between stochastic fluctuations and feedback topology at the single-molecule level, which provides novel insights into how and to what extent a feedback loop can enhance or suppress molecular fluctuations. Based on this relationship, we also develop an effective method to extract the topological information of a gene regulatory network from single-cell gene expression data. The theory is demonstrated by numerical simulations and, more importantly, validated quantitatively by single-cell data analysis of a synthetic gene circuit integrated in human kidney cells.
Submitted 24 October, 2017; v1 submitted 19 March, 2017;
originally announced March 2017.
-
Recovering Metabolic Networks using A Novel Hyperlink Prediction Method
Authors:
Muhan Zhang,
Zhicheng Cui,
Tolutola Oyetunde,
Yinjie Tang,
Yixin Chen
Abstract:
Studying metabolic networks is vital for many areas such as novel drugs and bio-fuels. For biologists, a key challenge is that many reactions are impractical or expensive to find through experiments. Our task is to recover the missing reactions. By exploiting the problem structure, we model reaction recovery as a hyperlink prediction problem, where each reaction is regarded as a hyperlink connecting its participating vertices (metabolites). Unlike the traditional link prediction problem, where two nodes form a link, a hyperlink can involve an arbitrary number of nodes. Since the cardinality of a hyperlink is variable, existing classifiers based on a fixed number of input features become infeasible. Traditional methods, such as common neighbors and the Katz index, are not applicable either, since they are restricted to pairwise similarities. In this paper, we propose a novel hyperlink prediction algorithm, called Matrix Boosting (MATBoost). MATBoost conducts inference jointly in the incidence space and adjacency space by performing an iterative completion-matching optimization. We carry out extensive experiments to show that MATBoost achieves state-of-the-art performance. For a metabolic network with 1805 metabolites and 2583 reactions, our algorithm can successfully recover nearly 200 reactions out of 400 missing reactions.
Submitted 21 November, 2016; v1 submitted 21 October, 2016;
originally announced October 2016.
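The hyperlink view of reactions in the abstract above, and the two spaces it mentions (incidence and adjacency), can be made concrete with a small sketch. This is not MATBoost. It only builds the two representations for a toy network, and the metabolite names, reactions, and helper functions are hypothetical.

```python
def incidence_matrix(metabolites, reactions):
    """Rows are metabolites, columns are reactions (hyperlinks).
    Entry is 1 when the metabolite participates in the reaction."""
    index = {m: k for k, m in enumerate(metabolites)}
    S = [[0] * len(reactions) for _ in metabolites]
    for col, reaction in enumerate(reactions):
        for m in reaction:
            S[index[m]][col] = 1
    return S

def adjacency_from_incidence(S):
    """Pairwise projection: A[i][j] counts the reactions shared by
    metabolites i and j (diagonal left at zero)."""
    n = len(S)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                A[i][j] = sum(x * y for x, y in zip(S[i], S[j]))
    return A

# Toy network: one three-metabolite reaction and one two-metabolite reaction.
metabolites = ["A", "B", "C", "D"]
reactions = [("A", "B", "C"), ("C", "D")]

S = incidence_matrix(metabolites, reactions)
A = adjacency_from_incidence(S)
for row in A:
    print(row)
```

The columns of `S` have different numbers of nonzero entries, which is the variable cardinality the abstract points to. The projection `A` keeps only pairwise co-occurrence, so it cannot tell one three-metabolite reaction apart from three separate pairwise links.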
-
Present Y chromosomes support the Persian ancestry of Sayyid Ajjal Shams al-Din Omar and Eminent Navigator Zheng He
Authors:
Chuan-Chao Wang,
Ling-Xiang Wang,
Manfei Zhang,
Dali Yao,
Li Jin,
Hui Li
Abstract:
Sayyid Ajjal is the ancestor of many Muslims in areas all across China. One of his descendants is the famous navigator of the Ming Dynasty, Zheng He, who led the largest armada in the world in the 15th century. The origin of Sayyid Ajjal's family remains unclear, although many studies have been done on this topic of Muslim history. In this paper, we studied the Y chromosomes of his present descendants, and found they all have haplogroup L1a-M76, proving a southern Persian origin.
Submitted 21 October, 2013;
originally announced October 2013.
-
Convergence of Y chromosome STR haplotypes from different SNP haplogroups compromises accuracy of haplogroup prediction
Authors:
Chuan-Chao Wang,
Ling-Xiang Wang,
Rukesh Shrestha,
Shaoqing Wen,
Manfei Zhang,
Xinzhu Tong,
Li Jin,
Hui Li
Abstract:
Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) are two kinds of commonly used markers in Y chromosome studies of forensic and population genetics. There has been increasing interest in the cost-saving strategy of using STR haplotypes to predict SNP haplogroups. However, the convergence of Y chromosome STR haplotypes from different haplogroups might compromise the accuracy of haplogroup prediction. Here, we compared the worldwide Y chromosome lineages at both haplogroup level and haplotype level to search for possible haplotype similarities among haplogroups. Similar haplotypes were found between haplogroups B and I2, C1 and E1b1b1, C2 and E1b1a1, H1 and J, L and O3a2c1, O1a and N, O3a1c and O3a2b, and M1 and O3a2, and those similarities reduce the accuracy of prediction.
Submitted 20 October, 2013;
originally announced October 2013.
-
The effects of environmental disturbances on tumor growth
Authors:
Ning Xing Wang,
Xiao Miao Zhang,
Xiao Bing Han
Abstract:
In this study, the analytic expressions of the steady probability distribution of tumor cells were established based on the steady state solution to the corresponding Fokker-Planck equation. Then, the effects of two uncorrelated white noises on tumor cell growth were investigated. It was found that the predation rate plays the main role in determining whether or not the noise is favorable for tumor growth.
Submitted 1 May, 2012;
originally announced May 2012.
-
Needed for completion of the human genome: hypothesis driven experiments and biologically realistic mathematical models
Authors:
Roderic Guigo,
Ewan Birney,
Michael Brent,
Emmanouil Dermitzakis,
Lior Pachter,
Hugues Roest Crollius,
Victor Solovyev,
Michael Q. Zhang
Abstract:
With the sponsorship of "Fundacio La Caixa" we met in Barcelona, November 21st and 22nd, to analyze the reasons why, after the completion of the human genome sequence, the identification of all protein coding genes and their variants remains a distant goal. Here we report on our discussions and summarize some of the major challenges that need to be overcome in order to complete the human gene catalog.
Submitted 6 October, 2004;
originally announced October 2004.
-
Super-paramagnetic clustering of yeast gene expression profiles
Authors:
G. Getz,
E. Levine,
E. Domany,
M. Q. Zhang
Abstract:
High-density DNA arrays, used to monitor gene expression at a genomic scale, have produced vast amounts of information which require the development of efficient computational methods to analyze them. The important first step is to extract the fundamental patterns of gene expression inherent in the data. This paper describes the application of a novel clustering algorithm, Super-Paramagnetic Clustering (SPC), to the analysis of gene expression profiles that were generated recently during a study of the yeast cell cycle. SPC was used to organize genes into biologically relevant clusters that are suggestive of their co-regulation. Some of the advantages of SPC are its robustness against noise and initialization, a clear signature of cluster formation and splitting, and an unsupervised self-organized determination of the number of clusters at each resolution. Our analysis revealed interesting correlated behavior of several groups of genes which has not been previously identified.
Submitted 17 November, 1999;
originally announced November 1999.