ULV: A robust statistical method for clustered data, with applications to multisubject, single-cell omics data

Authors: Mingyu Du, Kevin Johnston, Veronica Berrocal, Wei Li, Xiangmin Xu, Zhaoxia Yu

Abstract: Molecular and genomic technological advancements have greatly enhanced our understanding of biological processes by allowing us to quantify key biological variables such as gene expression, protein levels, and microbiome compositions. These breakthroughs have enabled us to achieve increasingly higher levels of resolution in our measurements, exemplified by our ability to comprehensively profile biological information at the single-cell level. However, the analysis of such data faces several critical challenges: limited number of individuals, non-normality, potential dropouts, outliers, and repeated measurements from the same individual. In this article, we propose a novel method, which we call U-statistic based latent variable (ULV). Our proposed method takes advantage of the robustness of rank-based statistics and exploits the statistical efficiency of parametric methods for small sample sizes. It is a computationally feasible framework that addresses all the issues mentioned above simultaneously. An additional advantage of ULV is its flexibility in modeling various types of single-cell data, including both RNA and protein abundance. The usefulness of our method is demonstrated in two studies: a single-cell proteomics study of acute myelogenous leukemia (AML) and a single-cell RNA study of COVID-19 symptoms. In the AML study, ULV successfully identified differentially expressed proteins that would have been missed by the pseudobulk version of the Wilcoxon rank-sum test. In the COVID-19 study, ULV identified genes associated with covariates such as age and gender, and genes that would be missed without adjusting for covariates. The differentially expressed genes identified by our method are less biased toward genes with high expression levels. Furthermore, ULV identified additional gene pathways likely contributing to the mechanisms of COVID-19 severity.
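
The abstract above compares ULV against a pseudobulk version of the Wilcoxon rank-sum test. The following is a minimal sketch of that baseline only, not of the authors' ULV method, whose details the abstract does not give. It assumes that "pseudobulk" means averaging the cell-level values within each subject, and it uses the Mann-Whitney U statistic, which is equivalent to the rank-sum statistic. The function names, the `(subject_id, value)` layout, and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def pseudobulk(cells):
    """Average cell-level values within each subject.

    `cells` is a list of (subject_id, value) pairs; this layout is an
    assumption made for the illustration.
    """
    by_subject = defaultdict(list)
    for subject_id, value in cells:
        by_subject[subject_id].append(value)
    return {subject: mean(values) for subject, values in by_subject.items()}


def mann_whitney_u(x, y):
    """U statistic of x versus y: pairs with x_i > y_j count 1, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Invented toy data: three subjects per group, one or two cells each.
cases = [("c1", 5.0), ("c1", 7.0), ("c2", 6.0), ("c2", 8.0), ("c3", 9.0)]
controls = [("h1", 1.0), ("h1", 3.0), ("h2", 4.0), ("h3", 2.0), ("h3", 6.0)]

case_means = list(pseudobulk(cases).values())
control_means = list(pseudobulk(controls).values())

u = mann_whitney_u(case_means, control_means)
# Dividing U by the number of pairs estimates P(case > control) + 0.5 * P(tie).
prob_index = u / (len(case_means) * len(control_means))
print(u, prob_index)
```

Collapsing each subject to a single value avoids treating cells from one individual as independent, but it discards within-subject information, which is one plausible reason a pseudobulk test could miss signals that a cell-level method detects.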

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2405.12645 [pdf, other]

Implementing feature binding through dendritic networks of a single neuron

Authors: Yuanhong Tang, Shanshan Jia, Tiejun Huang, Zhaofei Yu, Jian K. Liu

Abstract: A single neuron receives an extensive array of synaptic inputs through its dendrites, raising the fundamental question of how these inputs undergo integration and summation, culminating in the initiation of spikes in the soma. Experimental and computational investigations have revealed various modes of integration operations that include linear, superlinear, and sublinear summation. Interestingly, distinct neuron types exhibit diverse patterns of dendritic integration contingent upon the spatial distribution of dendrites. The functional implications of these specific integration modalities remain largely unexplored. In this study, we employ the Purkinje cell as a model system to investigate these intricate questions. Our findings reveal that Purkinje cells (PCs) generally exhibit sublinear summation across their expansive dendrites. The degree of sublinearity is dynamically modulated by both spatial and temporal input. Strong sublinearity necessitates that the synaptic distribution in PCs be globally scattered sensitive, whereas weak sublinearity facilitates the generation of complex firing patterns in PCs. Leveraging dendritic branches characterized by strong sublinearity as computational units, we demonstrate that a neuron can adeptly address the feature-binding problem. Collectively, these results offer a systematic perspective on the functional role of dendritic sublinearity, providing inspiration for a broader understanding of dendritic integration across various neuronal types.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.12519 [pdf, other]

MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph Generation

Authors: Zhaoning Yu, Hongyang Gao

Abstract: Graph Neural Networks (GNNs) have shown remarkable success in molecular tasks, yet their interpretability remains challenging. Traditional model-level explanation methods like XGNN and GNNInterpreter often fail to identify valid substructures like rings, leading to questionable interpretability. This limitation stems from XGNN's atom-by-atom approach and GNNInterpreter's reliance on average graph embeddings, which overlook the essential structural elements crucial for molecules. To address these gaps, we introduce an innovative Motif-bAsed GNN Explainer (MAGE) that uses motifs as fundamental units for generating explanations. Our approach begins with extracting potential motifs through a motif decomposition technique. Then, we utilize an attention-based learning method to identify class-specific motifs. Finally, we employ a motif-based graph generator for each class to create molecular graph explanations based on these class-specific motifs. This novel method not only incorporates critical substructures into the explanations but also guarantees their validity, yielding results that are human-understandable. Our proposed method's effectiveness is demonstrated through quantitative and qualitative assessments conducted on six real-world molecular datasets.

Submitted 21 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2405.08419

arXiv:2404.00111 [pdf, other]

Variational design of sensory feedback for powerstroke-recovery systems

Authors: Zhuojun Yu, Peter J. Thomas

Abstract: Although the raison d'etre of the brain is the survival of the body, there are relatively few theoretical studies of closed-loop rhythmic motor control systems. In this paper we provide a unified framework, based on variational analysis, for investigating the dual goals of performance and robustness in powerstroke-recovery systems. We augment two previously published closed-loop motor control models by equipping each model with a performance measure based on the rate of progress of the system relative to a spatially extended external substrate -- such as progress relative to the ground for a locomotor task. The sensitivity measure quantifies the ability of the system to maintain performance in response to external perturbations. Motivated by a search for optimal design principles for feedback control achieving the complementary requirements of efficiency and robustness, we discuss the performance-sensitivity patterns of the systems featuring different sensory feedback architectures. In a paradigmatic half-center oscillator (HCO)-motor system, we observe that the excitation-inhibition property of feedback mechanisms determines the sensitivity pattern while the activation-inactivation property determines the performance pattern. Moreover, we show that the nonlinearity of the sigmoid activation of feedback signals allows the existence of optimal combinations of performance and sensitivity. In a detailed hindlimb locomotor system, we find that a force-dependent feedback can simultaneously optimize both performance and robustness, while length-dependent feedback variations result in significant performance-versus-sensitivity tradeoffs. Thus, this work provides an analytical framework for studying feedback control of oscillations in nonlinear dynamical systems, leading to several insights that have the potential to inform the design of control or rehabilitation systems.

Submitted 29 March, 2024; originally announced April 2024.

Comments: 48 pages, 17 figures, 3 tables

arXiv:2401.08986 [pdf, other]

Rigid Protein-Protein Docking via Equivariant Elliptic-Paraboloid Interface Prediction

Authors: Ziyang Yu, Wenbing Huang, Yang Liu

Abstract: The study of rigid protein-protein docking plays an essential role in a variety of tasks such as drug design and protein engineering. Recently, several learning-based methods have been proposed for the task, exhibiting much faster docking speed than those computational methods. In this paper, we propose a novel learning-based method called ElliDock, which predicts an elliptic paraboloid to represent the protein-protein docking interface. To be specific, our model estimates elliptic paraboloid interfaces for the two input proteins respectively, and obtains the roto-translation transformation for docking by making two interfaces coincide. By its design, ElliDock is independently equivariant with respect to arbitrary rotations/translations of the proteins, which is an indispensable property to ensure the generalization of the docking process. Experimental evaluations show that ElliDock achieves the fastest inference time among all compared methods and is strongly competitive with current state-of-the-art learning-based models such as DiffDock-PP and Multimer particularly for antibody-antigen docking.

Submitted 17 January, 2024; originally announced January 2024.

Comments: ICLR 2024

arXiv:2401.03639 [pdf, ps, other]

Deep Learning for Visual Neuroprosthesis

Authors: Peter Beech, Shanshan Jia, Zhaofei Yu, Jian K. Liu

Abstract: The visual pathway involves complex networks of cells and regions which contribute to the encoding and processing of visual information. While some aspects of visual perception are understood, there are still many unanswered questions regarding the exact mechanisms of visual encoding and the organization of visual information along the pathway. This chapter discusses the importance of visual perception and the challenges associated with understanding how visual information is encoded and represented in the brain. Furthermore, this chapter introduces the concept of neuroprostheses: devices designed to enhance or replace bodily functions, and highlights the importance of constructing computational models of the visual pathway in the implementation of such devices. A number of such models, employing the use of deep learning models, are outlined, and their value to understanding visual coding and natural vision is discussed.

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2312.15387 [pdf, other]

MotifPiece: A Data-Driven Approach for Effective Motif Extraction and Molecular Representation Learning

Authors: Zhaoning Yu, Hongyang Gao

Abstract: Motif extraction is an important task in motif-based molecular representation learning. Previously, machine learning approaches employed either rule-based or string-based techniques to extract motifs. Rule-based approaches may extract motifs that aren't frequent or prevalent within the molecular data, which can lead to an incomplete understanding of essential structural patterns in molecules. String-based methods often lose the topological information inherent in molecules. This can be a significant drawback because topology plays a vital role in defining the spatial arrangement and connectivity of atoms within a molecule, which can be critical for understanding its properties and behavior. In this paper, we develop a data-driven motif extraction technique known as MotifPiece, which employs statistical measures to define motifs. To comprehensively evaluate the effectiveness of MotifPiece, we introduce a heterogeneous learning module. Our model shows an improvement compared to previously reported models. Additionally, we demonstrate that its performance can be further enhanced in two ways: first, by incorporating more data to aid in generating a richer motif vocabulary, and second, by merging multiple datasets that share enough motifs, allowing for cross-dataset learning.

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2306.11950 [pdf, other]

Mitigating Communication Costs in Neural Networks: The Role of Dendritic Nonlinearity

Authors: Xundong Wu, Pengfei Zhao, Zilin Yu, Lei Ma, Ka-Wa Yip, Huajin Tang, Gang Pan, Tiejun Huang

Abstract: Our comprehension of biological neuronal networks has profoundly influenced the evolution of artificial neural networks (ANNs). However, the neurons employed in ANNs exhibit remarkable deviations from their biological analogs, mainly due to the absence of complex dendritic trees encompassing local nonlinearity. Despite such disparities, previous investigations have demonstrated that point neurons can functionally substitute dendritic neurons in executing computational tasks. In this study, we scrutinized the importance of nonlinear dendrites within neural networks. By employing machine-learning methodologies, we assessed the impact of dendritic structure nonlinearity on neural network performance. Our findings reveal that integrating dendritic structures can substantially enhance model capacity and performance while keeping signal communication costs effectively restrained. This investigation offers pivotal insights that hold considerable implications for the development of future neural network accelerators.

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.05654 [pdf, other]

Spike timing reshapes robustness against attacks in spiking neural networks

Authors: Jianhao Ding, Zhaofei Yu, Tiejun Huang, Jian K. Liu

Abstract: The success of deep learning in the past decade is partially shrouded in the shadow of adversarial attacks. In contrast, the brain is far more robust at complex cognitive tasks. Utilizing the advantage that neurons in the brain communicate via spikes, spiking neural networks (SNNs) are emerging as a new type of neural network model, boosting the frontier of theoretical investigation and empirical application of artificial neural networks and deep learning. Neuroscience research proposes that the precise timing of neural spikes plays an important role in the information coding and sensory processing of the biological brain. However, the role of spike timing in SNNs is less considered and far from understood. Here we systematically explored the timing mechanism of spike coding in SNNs, focusing on the robustness of the system against various types of attacks. We found that SNNs can achieve higher robustness improvement using the coding principle of precise spike timing in neural encoding and decoding, facilitated by different learning rules. Our results suggest that the utility of spike timing coding in SNNs could improve the robustness against attacks, providing a new approach to reliable coding principles for developing next-generation brain-inspired deep learning.

Submitted 8 June, 2023; originally announced June 2023.

arXiv:2306.05257 [pdf, other]

doi 10.1093/bib/bbad235

Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction

Authors: Xuan Lin, Lichang Dai, Yafang Zhou, Zu-Guo Yu, Wen Zhang, Jian-Yu Shi, Dong-Sheng Cao, Li Zeng, Haowen Chen, Bosheng Song, Philip S. Yu, Xiangxiang Zeng

Abstract: Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug-drug interactions (DDIs). DDIs refer to a change in the effect of one drug due to the presence of another drug in the human body, which plays an essential role in drug discovery and clinical research. DDIs prediction through traditional clinical trials and experiments is an expensive and time-consuming process. To correctly apply the advanced AI and deep learning, the developer and user meet various challenges such as the availability and encoding of data resources, and the design of computational methods. This review summarizes chemical structure based, network based, NLP based and hybrid methods, providing an updated and accessible guide to the broad researchers and development community with different domain knowledge. We introduce widely-used molecular representation and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods by performing comparative experiments. We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDIs prediction.

Submitted 8 June, 2023; originally announced June 2023.

Comments: Accepted by Briefings in Bioinformatics

arXiv:2305.05861 [pdf]

Template-based eukaryotic genome editing directed by SviCas3

Authors: Wang-Yu Tong, Yong Li, Shou-Dong Ye, An-Jing Wang, Yan-Yan Tang, Mei-Li Li, Zhong-Fan Yu, Ting-Ting Xia, Qing-Yang Liu, Si-Qi Zhu

Abstract: RNA-guided gene editing based on the CRISPR-Cas system is currently the most effective genome editing technique. Here, we report that the SviCas3 from the subtype I-B-Svi Cas system in Streptomyces virginiae IBL14 is an RNA-guided and DNA-guided DNA endonuclease suitable for the HDR-directed gene and/or base editing of eukaryotic cell genomes. The genome editing efficiency of SviCas3 guided by DNA is no less than that of SviCas3 guided by RNA. In particular, t-DNA, as a template and a guide, does not require a proto-spacer-adjacent motif, demonstrating that CRISPR, as the basis for crRNA design, is not required for the SviCas3-mediated gene and base editing. This discovery will broaden our understanding of enzyme diversity in CRISPR-Cas systems, will provide important tools for the creation and modification of living things and the treatment of human genetic diseases, and will usher in a new era of DNA-guided gene editing and base editing.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 113 pages, 12 figures and 4 tables

arXiv:2301.07459 [pdf]

Danlu Tongdu tablets treat lumbar spinal stenosis through reducing reactive oxygen species and apoptosis by regulating CDK2/CDK4/CDKN1A expression

Authors: Xue Bai, Ayesha T. Tahir, Zhengheng Yu, Wenbo Cheng, Bo Zhang, Jun Kang

Abstract: Lumbar spinal stenosis (LSS) is caused by the compression of the nerve root or cauda equina nerve by stenosis of the lumbar spinal canal or intervertebral foramen, and is manifested as chronic low back and leg pain. Danlu Tongdu (DLTD) tablets can relieve chronic pain caused by LSS, but the molecular mechanism remains largely unknown. In this study, the potential molecular mechanism of DLTD tablets in the treatment of LSS was first predicted by a network pharmacology method. Results showed that DLTD functions in regulating anti-oxidative, apoptosis, and inflammation signaling pathways. Furthermore, the flow cytometry results showed that DLTD tablets efficiently reduced ROS content and inhibited rat neural stem cell apoptosis induced by hydrogen peroxide. DLTD also inhibited the mitochondrial membrane potential damage induced by hydrogen peroxide. ELISA analysis showed that DLTD induced the cell-cycle-related proteins CDK2 and CDK4 and reduced the CDKN1A protein expression level. Taken together, our study provided new insights into DLTD in treating LSS through reducing ROS content, decreasing apoptosis by inhibiting CDKN1A and promoting CDK2 and CDK4 expression levels.

Submitted 18 January, 2023; originally announced January 2023.

arXiv:2211.07834 [pdf, other]

doi 10.1162/neco_a_01586

Sensitivity to control signals in triphasic rhythmic neural systems: a comparative mechanistic analysis via infinitesimal local timing response curves

Authors: Zhuojun Yu, Jonathan E. Rubin, Peter J. Thomas

Abstract: Similar activity patterns may arise from model neural networks with distinct coupling properties and individual unit dynamics. These similar patterns may, however, respond differently to parameter variations and, specifically, to tuning of inputs that represent control signals. In this work, we analyze the responses resulting from modulation of a localized input in each of three classes of model neural networks that have been recognized in the literature for their capacity to produce robust three-phase rhythms: coupled fast-slow oscillators, near-heteroclinic oscillators, and threshold-linear networks. Triphasic rhythms, in which each phase consists of a prolonged activation of a corresponding subgroup of neurons followed by a fast transition to another phase, represent a fundamental activity pattern observed across a range of central pattern generators underlying behaviors critical to survival, including respiration, locomotion, and feeding. To perform our analysis, we extend the recently developed local timing response curve (lTRC), which allows us to characterize the timing effects due to perturbations, and we complement our lTRC approach with model-specific dynamical systems analysis. Interestingly, we observe disparate effects of similar perturbations across distinct model classes. Thus, this work provides an analytical framework for studying control of oscillations in nonlinear dynamical systems, and may help guide model selection in future efforts to study systems exhibiting triphasic rhythmic activity.

Submitted 15 June, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: 49 pages, 21 figures, 8 tables

Journal ref: Neural Computation, 2023, Volume 35, Issue 6, Page 1028-1085

arXiv:2206.02788 [pdf]

doi 10.1073/pnas.2118836119

Accurate Virus Identification with Interpretable Raman Signatures by Machine Learning

Authors: Jiarong Ye, Yin-Ting Yeh, Yuan Xue, Ziyang Wang, Na Zhang, He Liu, Kunyan Zhang, RyeAnne Ricker, Zhuohang Yu, Allison Roder, Nestor Perea Lopez, Lindsey Organtini, Wallace Greene, Susan Hafenstein, Huaguang Lu, Elodie Ghedin, Mauricio Terrones, Shengxi Huang, Sharon Xiaolei Huang

Abstract: Rapid identification of newly emerging or circulating viruses is an important first step toward managing the public health response to potential outbreaks. A portable virus capture device coupled with label-free Raman Spectroscopy holds the promise of fast detection by rapidly obtaining the Raman signature of a virus followed by a machine learning approach applied to recognize the virus based on its Raman spectrum, which is used as a fingerprint. We present such a machine learning approach for analyzing Raman spectra of human and avian viruses. A Convolutional Neural Network (CNN) classifier specifically designed for spectral data achieves very high accuracy for a variety of virus type or subtype identification tasks. In particular, it achieves 99% accuracy for classifying influenza virus type A vs. type B, 96% accuracy for classifying four subtypes of influenza A, 95% accuracy for differentiating enveloped and non-enveloped viruses, and 99% accuracy for differentiating avian coronavirus (infectious bronchitis virus, IBV) from other avian viruses. Furthermore, interpretation of neural net responses in the trained CNN model using a full-gradient algorithm highlights Raman spectral ranges that are most important to virus identification. By correlating ML-selected salient Raman ranges with the signature ranges of known biomolecules and chemical functional groups (for example, amide, amino acid, carboxylic acid), we verify that our ML model effectively recognizes the Raman signatures of proteins, lipids and other vital functional groups present in different viruses and uses a weighted combination of these signatures to identify viruses.

Submitted 5 June, 2022; originally announced June 2022.

Comments: 23 pages, 8 figures

Journal ref: Proceedings of the National Academy of Sciences of the United States of America (2022)

arXiv:2203.00854 [pdf, other]

FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours

Authors: Shenggan Cheng, Xuanlei Zhao, Guangyang Lu, Jiarui Fang, Zhongming Yu, Tian Zheng, Ruidong Wu, Xiwen Zhang, Jian Peng, Yang You

Abstract: Protein structure prediction helps to understand gene translation and protein function, which is of growing interest and importance in structural biology. The AlphaFold model, which used transformer architecture to achieve atomic-level accuracy in protein structure prediction, was a significant breakthrough. However, training and inference of the AlphaFold model are challenging due to its high computation and memory cost. In this work, we present FastFold, an efficient implementation of AlphaFold for both training and inference. We propose Dynamic Axial Parallelism and Duality Async Operations to improve the scaling efficiency of model parallelism. Besides, AutoChunk is proposed to reduce memory cost by over 80% during inference by automatically determining the chunk strategy. Experimental results show that FastFold reduces overall training time from 11 days to 67 hours and achieves 7.5X - 9.5X speedup for long-sequence inference. Furthermore, we scale FastFold to 512 GPUs and achieve an aggregate throughput of 6.02 PetaFLOP/s with 90.1% parallel efficiency.
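
The headline figures in the abstract above use mixed units (days versus hours). As a small worked check, and not a figure reported by the authors, the sketch below converts both to hours and derives the implied overall training speedup; the variable names are illustrative only.

```python
HOURS_PER_DAY = 24

baseline_hours = 11 * HOURS_PER_DAY  # 11 days of training, in hours
fastfold_hours = 67                  # FastFold training time from the abstract

# Implied wall-clock speedup and fraction of training time saved.
speedup = baseline_hours / fastfold_hours
fraction_saved = 1 - fastfold_hours / baseline_hours

print(f"{baseline_hours} h -> {fastfold_hours} h: "
      f"{speedup:.2f}x faster, {fraction_saved:.1%} less time")
```

Under these numbers the training speedup is roughly 3.9x, which is separate from the 7.5X - 9.5X speedup the abstract reports for long-sequence inference.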

Submitted 5 February, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

arXiv:2112.15552 [pdf, other]

doi 10.1109/JSSC.2021.3129993

Magnetoelectric Bio-Implants Powered and Programmed by a Single Transmitter for Coordinated Multisite Stimulation

Authors: Zhanghao Yu, Joshua C. Chen, Yan He, Fatima T. Alrashdan, Benjamin W. Avants, Amanda Singer, Jacob T. Robinson, Kaiyuan Yang

Abstract: This article presents a hardware platform including stimulating implants wirelessly powered and controlled by a shared transmitter (TX) for coordinated leadless multisite stimulation. The adopted novel single-TX, multiple-implant structure can flexibly deploy stimuli, improve system efficiency, easily scale stimulating channel quantity, and relieve efforts in device synchronization. In the proposed system, a wireless link leveraging magnetoelectric (ME) effect is co-designed with a robust and efficient system-on-chip (SoC) to enable reliable operation and individual programming of every implant. Each implant integrates a 0.8-mm² chip, a 6-mm² ME film, and an energy storage capacitor within a 6.2-mm³ size. ME power transfer is capable of safely transmitting milliwatt power to devices placed several centimeters away from the TX coil, maintaining good efficiency with size constraints, and tolerating 60 degree, 1.5-cm misalignment in angular and lateral movement. The SoC robustly operates with 2-V source amplitude variations that span a 40-mm TX-implant distance change, realizes individual addressability through physical unclonable function (PUF) IDs, and achieves 90% efficiency for 1.5-3.5-V stimulation with fully programmable stimulation parameters.

Submitted 31 December, 2021; originally announced December 2021.

Comments: This paper has been published in IEEE Journal of Solid-State Circuits, 2021

Journal ref: IEEE Journal of Solid-State Circuits, 2021

arXiv:2112.05884 [pdf, other]

Unraveling Single-Particle Trajectories Confined in Tubular Networks

Authors: Yunhao Sun, Zexi Yu, Christopher J. Obara, Keshav Mittal, Jennifer Lippincott-Schwarz, Elena F Koslover

Abstract: The analysis of single particle trajectories plays an important role in elucidating dynamics within complex environments such as those found in living cells. However, the characterization of intracellular particle motion is often confounded by confinement of the particles within non-trivial subcellular geometries. Here, we focus specifically on the case of particles undergoing Brownian motion within a tubular network, as found in some cellular organelles. An unraveling algorithm is developed to uncouple particle motion from the confining network structure, allowing for an accurate extraction of the diffusion coefficient, as well as differentiating between Brownian and fractional Brownian dynamics. We validate the algorithm with simulated trajectories and then highlight its application to an example system: analyzing the motion of membrane proteins confined in the tubules of the peripheral endoplasmic reticulum in mammalian cells. We show that these proteins undergo diffusive motion with a well-characterized diffusivity. Our algorithm provides a generally applicable approach for disentangling geometric morphology and particle dynamics in networked architectures.

Submitted 7 January, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

arXiv:2107.02995 [pdf, other]

doi 10.1109/TBCAS.2020.3037862

MagNI: A Magnetoelectrically Powered and Controlled Wireless Neurostimulating Implant

Authors: Zhanghao Yu, Joshua C. Chen, Fatima T. Alrashdan, Benjamin W. Avants, Yan He, Amanda Singer, Jacob T. Robinson, Kaiyuan Yang

Abstract: This paper presents the first wireless and programmable neural stimulator leveraging magnetoelectric (ME) effects for power and data transfer. Thanks to low tissue absorption, low misalignment sensitivity and high power transfer efficiency, the ME effect enables safe delivery of high power levels (a few milliwatts) at low resonant frequencies (~250 kHz) to mm-sized implants deep inside the body (30-mm depth). The presented MagNI (Magnetoelectric Neural Implant) consists of a 1.5-mm$^2$ 180-nm CMOS chip, an in-house built 4x2 mm ME film, an energy storage capacitor, and on-board electrodes on a flexible polyimide substrate with a total volume of 8.2 mm$^3$. The chip with a power consumption of 23.7 $\mu$W includes robust system control and data recovery mechanisms under source amplitude variations (1-V variation tolerance). The system delivers fully-programmable bi-phasic current-controlled stimulation with patterns covering 0.05-to-1.5-mA amplitude, 64-to-512-$\mu$s pulse width, and 0-to-200-Hz repetition frequency for neurostimulation.

Submitted 6 July, 2021; originally announced July 2021.

Comments: This work has been accepted to 2020 IEEE Transactions on Biomedical Circuits and Systems (TBioCAS)

Journal ref: IEEE Transactions on Biomedical Circuits and Systems (TBioCAS), Volume: 14, Issue: 6, Pages: 1241-1252, Dec. 2020

arXiv:2103.02163 [pdf, other]

To Deconvolve, or Not to Deconvolve: Inferences of Neuronal Activities using Calcium Imaging Data

Authors: Tong Shen, Gyorgy Lur, Xiangmin Xu, Zhaoxia Yu

Abstract: With the increasing popularity of calcium imaging data in neuroscience research, methods for analyzing calcium trace data are critical to address various questions. The observed calcium traces are either analyzed directly or deconvolved to spike trains to infer neuronal activities. When both approaches are applicable, it is unclear whether deconvolving calcium traces is a necessary step. In this article, we compare the performance of using calcium traces or their deconvolved spike trains for three common analyses: clustering, principal component analysis (PCA), and population decoding. Our simulations and applications to real data suggest that the estimated spike data outperform calcium trace data for both clustering and PCA. Although calcium trace data show higher predictability than spike data at each time point, spike history or cumulative spike counts are comparable to or better than calcium traces in population decoding.

Submitted 2 March, 2021; originally announced March 2021.

arXiv:2010.09690 [pdf]

doi 10.1109/ICPR48806.2021.9412266

SPA: Stochastic Probability Adjustment for System Balance of Unsupervised SNNs

Authors: Xingyu Yang, Mingyuan Meng, Shanlin Xiao, Zhiyi Yu

Abstract: Spiking neural networks (SNNs) receive widespread attention because of their low-power hardware characteristic and brain-like signal response mechanism, but currently, the performance of SNNs is still behind Artificial Neural Networks (ANNs). We build an information theory-inspired system called Stochastic Probability Adjustment (SPA) system to reduce this gap. The SPA maps the synapses and neurons of SNNs into a probability space where a neuron and all connected pre-synapses are represented by a cluster. The movement of synaptic transmitter between different clusters is modeled as a Brownian-like stochastic process in which the transmitter distribution is adaptive at different firing phases. We experimented with a wide range of existing unsupervised SNN architectures and achieved consistent performance improvements. The improvements in classification accuracy have reached 1.99% and 6.29% on the MNIST and EMNIST datasets respectively.

Submitted 6 May, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: Published at the 25th International Conference on Pattern Recognition (ICPR2020)

Journal ref: 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 6417-6424

arXiv:2001.10696 [pdf]

doi 10.1109/IJCNN48605.2020.9207161

Spiking Inception Module for Multi-layer Unsupervised Spiking Neural Networks

Authors: Mingyuan Meng, Xingyu Yang, Shanlin Xiao, Zhiyi Yu

Abstract: Spiking Neural Network (SNN), as a brain-inspired approach, is attracting attention due to its potential to produce ultra-high-energy-efficient hardware. Competitive learning based on Spike-Timing-Dependent Plasticity (STDP) is a popular method to train an unsupervised SNN. However, previous unsupervised SNNs trained through this method are limited to a shallow network with only one learnable layer and cannot achieve satisfactory results when compared with multi-layer SNNs. In this paper, we ease this limitation in three ways: 1) We propose a Spiking Inception (Sp-Inception) module, inspired by the Inception module in the Artificial Neural Network (ANN) literature. This module is trained through STDP-based competitive learning and outperforms the baseline modules on learning capability, learning efficiency, and robustness. 2) We propose a Pooling-Reshape-Activate (PRA) layer to make the Sp-Inception module stackable. 3) We stack multiple Sp-Inception modules to construct multi-layer SNNs. Our algorithm outperforms the baseline algorithms on the hand-written digit classification task, and reaches state-of-the-art results on the MNIST dataset among the existing unsupervised SNNs.

Submitted 28 September, 2020; v1 submitted 29 January, 2020; originally announced January 2020.

Comments: Published at the 2020 International Joint Conference on Neural Networks (IJCNN); Extended from arXiv:2001.01680

Journal ref: 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, United Kingdom, 2020, pp. 1-8

arXiv:2001.04064 [pdf, other]

Towards the Next Generation of Retinal Neuroprosthesis: Visual Computation with Spikes

Authors: Zhaofei Yu, Jian K. Liu, Shanshan Jia, Yichen Zhang, Yajing Zheng, Yonghong Tian, Tiejun Huang

Abstract: Neuroprosthesis, as one type of precision medicine device, aims to manipulate neuronal signals of the brain in a closed-loop fashion, together with receiving stimulus from the environment and controlling some part of our brain/body. In terms of vision, incoming information can be processed by the brain at millisecond intervals. The retina computes visual scenes and then sends its output as neuronal spikes to the cortex for further computation. Therefore, the neuronal signal of interest for retinal neuroprosthesis is the spike. Closed-loop computation in neuroprosthesis includes two stages: encoding stimulus to neuronal signal, and decoding it into stimulus. Here we review some of the recent progress about visual computation models that use spikes for analyzing natural scenes, including static images and dynamic movies. We hypothesize that for a better understanding of computational principles in the retina, one needs a hypercircuit view of the retina, in which different functional network motifs revealed in the cortex neuronal network should be taken into consideration for the retina. Different building blocks of the retina, including a diversity of cell types and synaptic connections, either chemical synapses or electrical synapses (gap junctions), make the retina an ideal neuronal network to adapt the computational techniques developed in artificial intelligence for modeling of encoding/decoding visual scenes. Altogether, one needs a systems approach of visual computation with spikes to advance the next generation of retinal neuroprosthesis as an artificial visual system.

Submitted 13 January, 2020; originally announced January 2020.

Comments: 15 pages, 5 figures

Journal ref: published 2019

arXiv:2001.01680 [pdf]

doi 10.1016/j.neucom.2021.02.027

High-parallelism Inception-like Spiking Neural Networks for Unsupervised Feature Learning

Authors: Mingyuan Meng, Xingyu Yang, Lei Bi, Jinman Kim, Shanlin Xiao, Zhiyi Yu

Abstract: Spiking Neural Networks (SNNs) are brain-inspired, event-driven machine learning algorithms that have been widely recognized in producing ultra-high-energy-efficient hardware. Among existing SNNs, unsupervised SNNs based on synaptic plasticity, especially Spike-Timing-Dependent Plasticity (STDP), are considered to have great potential in imitating the learning process of the biological brain. Nevertheless, the existing STDP-based SNNs have limitations in constrained learning capability and/or slow learning speed. Most STDP-based SNNs adopted a slow-learning Fully-Connected (FC) architecture and used a sub-optimal vote-based scheme for spike decoding. In this paper, we overcome these limitations with: 1) a design of high-parallelism network architecture, inspired by the Inception module in Artificial Neural Networks (ANNs); 2) use of a Vote-for-All (VFA) decoding layer as a replacement for the standard vote-based spike decoding scheme, to reduce the information loss in spike decoding; and 3) a proposed adaptive repolarization (resetting) mechanism that accelerates SNNs' learning by enhancing spiking activities. Our experimental results on two established benchmark datasets (MNIST/EMNIST) show that our network architecture resulted in superior performance compared to the widely used FC architecture and a more advanced Locally-Connected (LC) architecture, and that our SNN achieved competitive results with state-of-the-art unsupervised SNNs (95.64%/80.11% accuracy on the MNIST/EMNIST dataset) while having superior learning efficiency and robustness against hardware damage. Our SNN achieved great classification accuracy with only hundreds of training iterations, and random destruction of large numbers of synapses or neurons only led to negligible performance degradation.

Submitted 8 March, 2021; v1 submitted 2 December, 2019; originally announced January 2020.

Comments: Published at Neurocomputing

Journal ref: Neurocomputing, vol. 441, pp. 92-104, 2021

arXiv:1911.05584 [pdf]

doi 10.1093/bib/bbaa140

Tensor Decomposition with Relational Constraints for Predicting Multiple Types of MicroRNA-disease Associations

Authors: Feng Huang, Xiang Yue, Zhankun Xiong, Zhouxin Yu, Wen Zhang

Abstract: MicroRNAs (miRNAs) play crucial roles in multifarious biological processes associated with human diseases. Identifying potential miRNA-disease associations contributes to understanding the molecular mechanisms of miRNA-related diseases. Most of the existing computational methods mainly focus on predicting whether a miRNA-disease association exists or not. However, the roles of miRNAs in diseases are highly diverse; for instance, genetic variants of microRNA (mir-15) may affect the expression level of miRNAs, leading to B cell chronic lymphocytic leukemia, while circulating miRNAs (including mir-1246, mir-1307-3p, etc.) have the potential to detect breast cancer in the early stage. In this paper, we aim to predict multi-type miRNA-disease associations instead of taking them as binary. To this end, we innovatively represent miRNA-disease-type triplets as a tensor and introduce Tensor Decomposition methods to solve the prediction task. Experimental results on two widely-adopted miRNA-disease datasets, HMDD v2.0 and HMDD v3.2, show that tensor decomposition methods improve on a recent baseline by a large margin (up to 38% in top-1 F1). We further propose a novel method, Tensor Decomposition with Relational Constraints (TDRC), which incorporates biological features as relational constraints to further the existing tensor decomposition methods. Compared with two existing tensor decomposition methods, TDRC can produce better performance while being more efficient.

Submitted 9 March, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

Journal ref: Briefings in Bioinformatics, Volume 22, Issue 3, May 2021, bbaa140

arXiv:1911.00185 [pdf]

ItLnc-BXE: a Bagging-XGBoost-ensemble method with multiple features for identification of plant lncRNAs

Authors: Guangyan Zhang, Ziru Liu, Jichen Dai, Zilan Yu, Shuai Liu, Wen Zhang

Abstract: Motivation: Since long non-coding RNAs (lncRNAs) are involved in a wide range of functions in cellular and developmental processes, an increasing number of methods have been proposed for distinguishing lncRNAs from coding RNAs. However, most of the existing methods are designed for lncRNAs in animal systems, and only a few methods focus on plant lncRNA identification. Different from lncRNAs in animal systems, plant lncRNAs have distinct characteristics. It is desirable to develop a computational method for accurate and robust identification of plant lncRNAs. Results: Herein, we present a plant lncRNA identification method, ItLnc-BXE, which utilizes multiple features and the ensemble learning strategy. First, a diversity of lncRNA features is collected and filtered by feature selection to represent RNA transcripts. Then, several base learners are trained and further combined into a single meta-learner by ensemble learning, and thus an ItLnc-BXE model is constructed. ItLnc-BXE models are evaluated on datasets of six plant species; the results show that ItLnc-BXE outperforms other state-of-the-art plant lncRNA identification methods, achieving better and robust performances (AUC>95.91%). We also perform some experiments about cross-species lncRNA identification, and the results indicate that dicots-based and monocots-based models can be used to accurately identify lncRNAs in lower plant species, such as mosses and algae. Availability: source codes are available at https://github.com/BioMedicalBigDataMiningLab/ItLnc-BXE. Contact: zhangwen@mail.hzau.edu.cn (or) zhangwen@whu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.

Submitted 24 January, 2020; v1 submitted 31 October, 2019; originally announced November 2019.

Comments: 7 pages, 3 figures, 4 tables

arXiv:1904.13007 [pdf, other]

Reconstruction of Natural Visual Scenes from Neural Spikes with Deep Neural Networks

Authors: Yichen Zhang, Shanshan Jia, Yajing Zheng, Zhaofei Yu, Yonghong Tian, Siwei Ma, Tiejun Huang, Jian K. Liu

Abstract: Neural coding is one of the central questions in systems neuroscience for understanding how the brain processes stimulus from the environment. Moreover, it is also a cornerstone for designing algorithms of brain-machine interface, where decoding incoming stimulus is highly demanded for better performance of physical devices. Traditionally researchers have focused on functional magnetic resonance imaging (fMRI) data as the neural signals of interest for decoding visual scenes. However, our visual perception operates on a fast timescale of milliseconds in terms of an event termed neural spike. There are few studies of decoding by using spikes. Here we fulfill this aim by developing a novel decoding framework based on deep neural networks, named spike-image decoder (SID), for reconstructing natural visual scenes, including static images and dynamic videos, from experimentally recorded spikes of a population of retinal ganglion cells. The SID is an end-to-end decoder with one end as neural spikes and the other end as images, which can be trained directly such that visual scenes are reconstructed from spikes in a highly accurate fashion. Our SID also outperforms existing fMRI decoding models on the reconstruction of visual stimulus. In addition, with the aid of a spike encoder, we show that SID can be generalized to arbitrary visual scenes by using the image datasets of MNIST, CIFAR10, and CIFAR100. Furthermore, with a pre-trained SID, one can decode any dynamic videos to achieve real-time encoding and decoding of visual scenes by spikes. Altogether, our results shed new light on neuromorphic computing for artificial visual systems, such as event-based visual cameras and visual neuroprostheses.

Submitted 28 January, 2020; v1 submitted 29 April, 2019; originally announced April 2019.

Comments: 35 pages, 10 figures

ACM Class: I.2.6

arXiv:1902.08411 [pdf, other]

doi 10.1016/j.neunet.2020.03.003

Probabilistic Inference of Binary Markov Random Fields in Spiking Neural Networks through Mean-field Approximation

Authors: Yajing Zheng, Shanshan Jia, Zhaofei Yu, Tiejun Huang, Jian K. Liu, Yonghong Tian

Abstract: Recent studies have suggested that the cognitive process of the human brain is realized as probabilistic inference and can be further modeled by probabilistic graphical models like Markov random fields. Nevertheless, it remains unclear how probabilistic inference can be implemented by a network of spiking neurons in the brain. Previous studies have tried to relate the inference equation of binary Markov random fields to the dynamic equation of spiking neural networks through belief propagation algorithm and reparameterization, but they are valid only for Markov random fields with limited network structure. In this paper, we propose a spiking neural network model that can implement inference of arbitrary binary Markov random fields. Specifically, we design a spiking recurrent neural network and prove that its neuronal dynamics are mathematically equivalent to the inference process of Markov random fields by adopting mean-field theory. Furthermore, our mean-field approach unifies previous works. Theoretical analysis and experimental results, together with the application to image denoising, demonstrate that our proposed spiking neural network can get comparable results to that of mean-field inference.

Submitted 12 March, 2020; v1 submitted 22 February, 2019; originally announced February 2019.

Comments: Accepted in Neural Networks

arXiv:1811.02290 [pdf, other]

Revealing Fine Structures of the Retinal Receptive Field by Deep Learning Networks

Authors: Qi Yan, Yajing Zheng, Shanshan Jia, Yichen Zhang, Zhaofei Yu, Feng Chen, Yonghong Tian, Tiejun Huang, Jian K. Liu

Abstract: Deep convolutional neural networks (CNNs) have demonstrated impressive performance on many visual tasks. Recently, they became useful models for the visual system in neuroscience. However, it is still not clear what is learned by CNNs in terms of neuronal circuits. When a deep CNN with many layers is used for the visual system, it is not easy to compare the structure components of CNNs with possible neuroscience underpinnings due to highly complex circuits from the retina to higher visual cortex. Here we address this issue by focusing on single retinal ganglion cells with biophysical models and recording data from animals. By training CNNs with white noise images to predict neuronal responses, we found that fine structures of the retinal receptive field can be revealed. Specifically, the learned convolutional filters resemble biological components of the retinal circuit. This suggests that a CNN learning from one single retinal cell reveals a minimal neural network carried out in this cell. Furthermore, when CNNs learned from different cells are transferred between cells, there is a diversity of transfer learning performance, which indicates that CNNs are cell-specific. Moreover, when CNNs are transferred between different types of input images, here white noise vs. natural images, transfer learning shows a good performance, which implies that CNNs indeed capture the full computational ability of a single retinal cell for different inputs. Taken together, these results suggest that CNNs could be used to reveal structure components of neuronal circuits, and provide a powerful model for neural system identification.

Submitted 18 February, 2020; v1 submitted 6 November, 2018; originally announced November 2018.

Comments: updated version

arXiv:1808.03958 [pdf, ps, other]

Neural System Identification with Spike-triggered Non-negative Matrix Factorization

Authors: Shanshan Jia, Zhaofei Yu, Arno Onken, Yonghong Tian, Tiejun Huang, Jian K. Liu

Abstract: Neuronal circuits formed in the brain are complex with intricate connection patterns. Such complexity is also observed in the retina as a relatively simple neuronal circuit. A retinal ganglion cell receives excitatory inputs from neurons in previous layers as driving forces to fire spikes. Analytical methods are required that can decipher these components in a systematic manner. Recently a method termed spike-triggered non-negative matrix factorization (STNMF) has been proposed for this purpose. In this study, we extend the scope of the STNMF method. By using the retinal ganglion cell as a model system, we show that STNMF can detect various computational properties of upstream bipolar cells, including spatial receptive field, temporal filter, and transfer nonlinearity. In addition, we recover synaptic connection strengths from the weight matrix of STNMF. Furthermore, we show that STNMF can separate spikes of a ganglion cell into a few subsets of spikes where each subset is contributed by one presynaptic bipolar cell. Taken together, these results corroborate that STNMF is a useful method for deciphering the structure of neuronal circuits.

Submitted 1 March, 2020; v1 submitted 12 August, 2018; originally announced August 2018.

Comments: updated version

arXiv:1808.00675 [pdf, other]

Winner-Take-All as Basic Probabilistic Inference Unit of Neuronal Circuits

Authors: Zhaofei Yu, Yonghong Tian, Tiejun Huang, Jian K. Liu

Abstract: Experimental observations of neuroscience suggest that the brain works in a probabilistic way when computing information with uncertainty. This processing could be modeled as Bayesian inference. However, it remains unclear how Bayesian inference could be implemented at the level of neuronal circuits of the brain. In this study, we propose a novel general-purpose neural implementation of probabilistic inference based on a ubiquitous network of cortical microcircuits, termed winner-take-all (WTA) circuit. We show that each WTA circuit could encode the distribution of states defined on a variable. By connecting multiple WTA circuits together, the joint distribution can be represented for arbitrary probabilistic graphical models. Moreover, we prove that the neural dynamics of the WTA circuit are able to implement one of the most powerful inference methods in probabilistic graphical models, mean-field inference. We show that the synaptic drive of each spiking neuron in the WTA circuit encodes the marginal probability of the variable in each state, and the firing probability (or firing rate) of each neuron is proportional to the marginal probability. Theoretical analysis and experimental results demonstrate that the WTA circuits can obtain inference results comparable to those of the mean-field approximation. Taken together, our results suggest that the WTA circuit could be seen as the minimal inference unit of neuronal circuits.

Submitted 2 August, 2018; originally announced August 2018.

Comments: 10 pages, 4 figures

arXiv:1803.03910 [pdf, other]

A pathway-based kernel boosting method for sample classification using genomic data

Authors: Li Zeng, Zhaolong Yu, Hongyu Zhao

Abstract: The analysis of cancer genomic data has long suffered from "the curse of dimensionality". Sample sizes for most cancer genomic studies are a few hundred at most, while there are tens of thousands of genomic features studied. Various methods have been proposed to leverage prior biological knowledge, such as pathways, to more effectively analyze cancer genomic data. Most of the methods focus on testing marginal significance of the associations between pathways and clinical phenotypes. They can identify relevant pathways, but do not involve predictive modeling. In this article, we propose a Pathway-based Kernel Boosting (PKB) method for integrating gene pathway information for sample classification, where we use kernel functions calculated from each pathway as base learners and learn the weights through iterative optimization of the classification loss function. We apply PKB and several competing methods to three cancer studies with pathological and clinical information, including tumor grade, stage, tumor sites, and metastasis status. Our results show that PKB outperforms other methods, and identifies pathways relevant to the outcome variables.

Submitted 11 March, 2018; originally announced March 2018.

arXiv:1711.02837 [pdf, ps, other]

Revealing structure components of the retina by deep learning networks

Authors: Qi Yan, Zhaofei Yu, Feng Chen, Jian K. Liu

Abstract: Deep convolutional neural networks (CNNs) have demonstrated impressive performance on visual object classification tasks. In addition, they are useful models for predicting neuronal responses recorded in the visual system. However, there is still no clear understanding of what CNNs learn in terms of visual neuronal circuits. Visualizing CNN features to obtain possible connections to neuroscience underpinnings is not easy due to the highly complex circuits from the retina to higher visual cortex. Here we address this issue by focusing on single retinal ganglion cells with a simple model and electrophysiological recordings from salamanders. By training CNNs with white noise images to predict neural responses, we found that the convolutional filters learned in the end resemble biological components of the retinal circuit. Features represented by these filters tile the space of the conventional receptive field of retinal ganglion cells. These results suggest that CNNs could be used to reveal structure components of neuronal circuits.

Submitted 8 November, 2017; originally announced November 2017.

Comments: Presented at NIPS 2017 Symposium on Interpretable Machine Learning

arXiv:1607.04146 [pdf]

Molecular Mechanics of Chitin-Protein Interface

Authors: Zechuan Yu, Denvid Lau

Abstract: Chitin and protein are two main building blocks for many natural biomaterials. The interaction between chitin and protein critically determines the properties of the composite biological materials. As living organisms usually encounter complex ambient conditions, water, pH and ions are critical factors for the structural integrity of biomaterials. It is therefore essential to study the chitin-protein interface under different environmental conditions. Here, an atomistic model consisting of a chitin substrate and a protein filament is constructed, which is regarded as a representative of the chitin-protein interface existing in many chitin-based biomaterials. Based on this model, the mechanical properties of the chitin-protein interface under different moisture and pH values are investigated through molecular dynamics simulations. The results reveal a weakening effect of water towards the chitin-protein interface, as well as acidity, i.e. the protonated protein forms a stronger adhesion to chitin than that in the alkaline environment. In addition, the effect of the protein side-chain is studied and it is found that certain kinds of amino acid can form hydrophobic connections to the chitin surface, which means that these peptides partly dodge the weakening effect of water. Our observation indicates that terminuses and side-chains in protein are of importance in forming interfacial hydrogen bonds.
From our full atomistic models, we can observe some molecular mechanisms about how protein interacts with chitin in different conditions, which may spotlight the engineering on biomaterials with similar interfaces.

Submitted 24 July, 2016; v1 submitted 12 July, 2016; originally announced July 2016.

Comments: Conference proceeding for: Zechuan Yu and Denvid Lau (2014), "Molecular mechanics of chitin-protein interface", 7th World Congress of Biomechanics, 6-11 July, Boston, MA, USA

arXiv:1606.00157 [pdf, other]

CaMKII activation supports reward-based neural network optimization through Hamiltonian sampling

Authors: Zhaofei Yu, David Kappel, Robert Legenstein, Sen Song, Feng Chen, Wolfgang Maass

Abstract: Synaptic plasticity is implemented and controlled through over a thousand different types of molecules in the postsynaptic density and presynaptic boutons that assume a staggering array of different states through phosphorylation and other mechanisms. One of the most prominent molecules in the postsynaptic density is CaMKII, which is described in molecular biology as a "memory molecule" that can integrate through auto-phosphorylation Ca-influx signals on a relatively large time scale of dozens of seconds. The functional impact of this memory mechanism is largely unknown. We show that the experimental data on the specific role of CaMKII activation in dopamine-gated spine consolidation suggest a general functional role in speeding up reward-guided search for network configurations that maximize reward expectation. Our theoretical analysis shows that stochastic search could in principle even attain optimal network configurations by emulating one of the most well-known nonlinear optimization methods, simulated annealing. But this optimization is usually impeded by slowness of stochastic search at a given temperature. We propose that CaMKII contributes a momentum term that substantially speeds up this search. In particular, it allows the network to overcome saddle points of the fitness function. The resulting improved stochastic policy search can be understood on a more abstract level as Hamiltonian sampling, which is known to be one of the most efficient stochastic search methods.

Submitted 15 May, 2018; v1 submitted 1 June, 2016; originally announced June 2016.

Comments: 27 pages, 5 figures

arXiv:1509.06863 [pdf]

doi 10.1186/s12859-019-2714-8

On the parameters affecting dual-target-function evaluation of single-particle selection from cryo-electron micrographs

Authors: Zhou Yu, Wei Li Wang, Luis R. Castillo-Menendez, Joseph Sodroski, Youdong Mao

Abstract: In the analysis of frozen hydrated biomolecules by single-particle cryo-electron microscopy, template-based particle picking by a target function called fast local correlation (FLC) allows a large number of particle images to be automatically picked from micrographs. A second, independent target function based on maximum likelihood (ML) can be used to align the images and verify the presence of signal in the picked particles. Although the paradigm of this dual-target-function (DTF) evaluation of single-particle selection has been practiced in recent years, it remains unclear how the performance of this DTF approach is affected by the signal-to-noise ratio of the images and by the choice of references for FLC and ML. Here we examine this problem through a systematic study of simulated data, followed by experimental substantiation. We quantitatively pinpoint the critical signal-to-noise ratio (SNR), at which the DTF approach starts losing its ability to select and verify particles from cryo-EM micrographs. A Gaussian model is shown to be as effective in picking particles as a single projection view of the imaged molecule in the tested cases. For both simulated micrographs and real cryo-EM data of the 173-kDa glucose isomerase complex, we found that the use of a Gaussian model to initialize the target functions suppressed the detrimental effect of reference bias in template-based particle selection.
Given a sufficient signal-to-noise ratio in the images and the appropriate choice of references, the DTF approach can expedite the automated assembly of single-particle data sets.

Submitted 23 September, 2015; originally announced September 2015.

Comments: 62 pages, 11 figures. arXiv admin note: text overlap with arXiv:1309.2618

Journal ref: BMC Bioinformatics 2019; 20:169

arXiv:1509.00998 [pdf, other]

Sampling-based Causal Inference in Cue Combination and its Neural Implementation

Authors: Zhaofei Yu, Feng Chen, Jianwu Dong, Qionghai Dai

Abstract: Causal inference in cue combination is to decide whether the cues have a single cause or multiple causes. Although the Bayesian causal inference model explains the problem of causal inference in cue combination successfully, how causal inference in cue combination could be implemented by neural circuits is unclear. The existing method based on calculating the log posterior ratio with variable elimination has the problem of being unrealistic and task-specific. In this paper, we take advantage of the special structure of the Bayesian causal inference model and propose a hierarchical inference algorithm based on importance sampling. A simple neural circuit is designed to implement the proposed inference algorithm. Theoretical analyses and experimental results demonstrate that our algorithm converges to the accurate value as the sample size goes to infinity. Moreover, the neural circuit we design can be easily generalized to implement inference for other problems, such as the multi-stimuli cause inference and the same-different judgment.

Submitted 3 September, 2015; originally announced September 2015.

arXiv:0709.0778 [pdf]

Modular co-evolution of metabolic networks

Authors: Jing Zhao, Guo-Hui Ding, Lin Tao, Hong Yu, Zhong-Hao Yu, Jian-Hua Luo, Zhi-Wei Cao, Yi-Xue Li

Abstract: The architecture of biological networks has been reported to exhibit a high level of modularity, and to some extent, topological modules of networks overlap with known functional modules. However, how the modular topology of the molecular network affects the evolution of its member proteins remains unclear. In this work, the functional and evolutionary modularity of the Homo sapiens (H. sapiens) metabolic network was investigated from a topological point of view. Network decomposition shows that the metabolic network is organized in a highly modular core-periphery way, in which the core modules are tightly linked together and perform basic metabolism functions, whereas the periphery modules only interact with few modules and accomplish relatively independent and specialized functions. Moreover, over half of the modules exhibit co-evolutionary features and belong to specific evolutionary ages. Peripheral modules tend to evolve more cohesively and faster than core modules do. The correlation between functional, evolutionary and topological modularity suggests that the evolutionary history and functional requirements of metabolic systems have been imprinted in the architecture of metabolic networks. Such systems level analysis could demonstrate how the evolution of genes may be placed in a genome-scale network context, giving a novel perspective on molecular evolution.

Submitted 6 September, 2007; originally announced September 2007.

Comments: 26 pages, 7 figures

Journal ref: BMC Bioinformatics, 2007, 8:311

arXiv:q-bio/0701001 [pdf]

doi 10.1016/j.bbrc.2006.11.080

MicroRNAs preferentially target the genes with high transcriptional regulation complexity

Authors: Qinghua Cui, Zhenbao Yu, Youlian Pan, Enrico Purisima, Edwin Wang

Abstract: Over the past few years, microRNAs (miRNAs) have emerged as a new prominent class of gene regulatory factors that negatively regulate expression of approximately one-third of the genes in animal genomes at the post-transcriptional level. However, it is still unclear why some genes are regulated by miRNAs but others are not, i.e. what principles govern miRNA regulation in animal genomes. In this study, we systematically analyzed the relationship between transcription factors (TFs) and miRNAs in gene regulation. We found that the genes with more TF-binding sites have a higher probability of being targeted by miRNAs and have more miRNA-binding sites on average. This observation reveals that the genes with higher cis-regulation complexity are more coordinately regulated by TFs at the transcriptional level and by miRNAs at the post-transcriptional level. This is a potentially novel discovery of a mechanism for coordinated regulation of gene expression. Gene ontology analysis further demonstrated that such coordinated regulation is more prevalent in developmental genes.

Submitted 29 December, 2006; originally announced January 2007.

Comments: supplementary data available at http://www.bri.nrc.ca/wang

Journal ref: Biochem Biophys Res Commun., 352:733-738, 2007

arXiv:q-bio/0612034 [pdf]

doi 10.1038/msb4100089

Principles of microRNA regulation of a human cellular signaling network

Authors: Qinghua Cui, Zhenbao Yu, Enrico O. Purisima, Edwin Wang

Abstract: MicroRNAs (miRNAs) are endogenous 22-nucleotide RNAs, which suppress gene expression by selectively binding to the 3'-noncoding region of specific messenger RNAs through base-pairing. Given the diversity and abundance of miRNA targets, miRNAs appear to functionally interact with various components of many cellular networks. By analyzing the interactions between miRNAs and a human cellular signaling network, we found that miRNAs predominantly target positive regulatory motifs, highly connected scaffolds and most downstream network components such as signaling transcription factors, but less frequently target negative regulatory motifs, common components of basic cellular machines and most upstream network components such as ligands. In addition, when an adaptor has potential to recruit more downstream components, these components are more frequently targeted by miRNAs. This work uncovers the principles of miRNA regulation of signal transduction networks and implies a potential function of miRNAs for facilitating robust transitions of cellular response to extracellular signals and maintaining cellular homeostasis.

Submitted 17 December, 2006; originally announced December 2006.

Journal ref: Molecular Systems Biology, 2:46, 2006

arXiv:q-bio/0312039 [pdf]

RNA Binding Density on X-chromosome Differing from that on 22 Autosomes in Human

Authors: Zhanjun Lu, Ying Lu, Shuxia Song, Zhai Yu, Xiufang Wang

Abstract: To test whether the X-chromosome has unique genomic characteristics, the X-chromosome and 22 autosomes were compared for RNA binding density. Nucleotide sequences on the chromosomes were divided into 50 kb segments, each recoded as a set of frequency values of 7-nucleotide (7nt) strings using all possible 7nt strings (4^7 = 16384). 120 genes highly expressed in tonsil germinal center B cells were selected for calculating 7nt string frequency values of all introns (RNAs). The binding density of DNA segments and RNAs was determined by the amount of complement sequences. It was shown for the first time that the gene-poor and low gene expression X-chromosome had the lowest percentage of the DNA segments that can highly bind RNAs, whereas the gene-rich and high gene expression chromosome 19 had the highest percentage of the segments. On the basis of these results, it is proposed that the nonrandom properties of distribution of RNA highly binding DNA segments on the chromosomes provide strong evidence that lack of RNA highly binding segments may be a cause of X-chromosome inactivation.
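The recoding step described in this abstract, turning a segment into frequency values over all 4^7 = 16384 possible 7nt strings, can be illustrated with a minimal Python sketch. The function name `kmer_frequencies`, the lexicographic ordering of strings, and the choice to skip windows containing characters other than A, C, G, T are assumptions made for illustration; the abstract does not specify them, and this is not the authors' code.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=7):
    """Relative frequency of every possible k-mer over A, C, G, T.

    Hypothetical helper: windows containing other characters (e.g. N)
    are ignored, which is an assumption not stated in the abstract.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all 4**k strings so absent k-mers get an explicit 0.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return {m: 0.0 for m in kmers}
    return {m: counts[m] / total for m in kmers}


if __name__ == "__main__":
    segment = "ACGTACGTACGGTTAACCGGTTAACG"  # toy segment, not real data
    freqs = kmer_frequencies(segment, k=7)
    print(len(freqs))  # 16384 entries, one per possible 7nt string
```

A real 50 kb segment would simply be passed in place of the toy string; the output vector length stays fixed at 16384 regardless of segment length.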

Submitted 24 December, 2003; originally announced December 2003.

arXiv:physics/0207060 [pdf, ps, other]

doi 10.1103/PhysRevE.66.031910

Recognition of an organism from fragments of its complete genome

Authors: V. V. Anh, K. S. Lau, Z. G. Yu

Abstract: This paper considers the problem of matching a fragment to an organism using its complete genome. Our method is based on the probability measure representation of a genome. We first demonstrate that these probability measures can be modelled as recurrent iterated function systems (RIFS) consisting of four contractive similarities. Our hypothesis is that the multifractal characteristic of the probability measure of a complete genome, as captured by the RIFS, is preserved in its reasonably long fragments. We compute the RIFS of fragments of various lengths and random starting points, and compare with that of the original sequence for recognition using the Euclidean distance. A demonstration on five randomly selected organisms supports the above hypothesis.

Submitted 15 July, 2002; originally announced July 2002.

Comments: 9 pages,5 figures, Accepted for publication by Phys. Rev. E

Journal ref: Phys. Rev. E, vol. 66, (2002) 031910.

arXiv:physics/0108055 [pdf, ps, other]

doi 10.1103/PhysRevE.64.031903

Measure representation and multifractal analysis of complete genomes

Authors: Zu-Guo Yu, Vo Anh, Ka-Sing Lau

Abstract: This paper introduces the notion of measure representation of DNA sequences. Spectral analysis and multifractal analysis are then performed on the measure representations of a large number of complete genomes. The main aim of this paper is to discuss the multifractal property of the measure representation and the classification of bacteria. From the measure representations and the values of the $D_{q}$ spectra and related $C_{q}$ curves, it is concluded that these complete genomes are not random sequences. In fact, the spectral analyses performed indicate that these measure representations, considered as time series, exhibit strong long-range correlation. For substrings with length K=8, the $D_{q}$ spectra of all organisms studied are multifractal-like and sufficiently smooth for the $C_{q}$ curves to be meaningful. The $C_{q}$ curves of all bacteria resemble a classical phase transition at a critical point. But the 'analogous' phase transitions of chromosomes of non-bacterial organisms are different. Apart from Chromosome 1 of C. elegans, they exhibit the shape of a double-peaked specific heat function.

Submitted 28 August, 2001; originally announced August 2001.

Comments: 12 pages with 9 figures and 1 table

Journal ref: Phys. Rev. E, Vol. 64, 031903 (2001)

arXiv:physics/0108054 [pdf, ps, other]

doi 10.1088/0305-4470/34/36/301

Multifractal characterisation of complete genomes

Authors: Vo Anh, Ka-Sing Lau, Zu-Guo Yu

Abstract: This paper develops a theory for characterisation of DNA sequences based on their measure representation. The measures are shown to be random cascades generated by an infinitely divisible distribution. This probability distribution is uniquely determined by the exponent function in the multifractal theory of random cascades. Curve fitting to a large number of complete genomes of bacteria indicates that the Gamma density function provides an excellent fit to the exponent function, and hence to the probability distribution of the complete genomes.

Submitted 28 August, 2001; originally announced August 2001.

Comments: 16 pages with 4 figures and 1 table, To appear in J. Phys. A: Math. Gene

Journal ref: J. Phys. A: Math. Gen., vol 34, (2001) 7127-7139

arXiv:physics/0108053 [pdf, ps, other]

doi 10.1016/S0378-4371(01)00391-0

Multifractal characterisation of length sequences of coding and noncoding segments in a complete genome

Authors: Zu-Guo Yu, Vo Anh, Ka-Sing Lau

Abstract: The coding and noncoding length sequences constructed from a complete genome are characterised by multifractal analysis. The dimension spectrum $D_{q}$ and its derivative, the 'analogous' specific heat $C_{q}$, are calculated for the coding and noncoding length sequences of bacteria, where $q$ is the moment order of the partition sum of the sequences. From the shape of the $D_{q}$ and $C_{q}$ curves, it is seen that there exists a clear difference between the coding/noncoding length sequences of all organisms considered and a completely random sequence. The complexity of noncoding length sequences is higher than that of coding length sequences for bacteria. Almost all $D_{q}$ curves for coding length sequences are flat, so their multifractality is small whereas almost all $D_{q}$ curves for noncoding length sequences are multifractal-like. We propose to characterise the bacteria according to the types of the $C_{q}$ curves of their noncoding length sequences.

Submitted 28 August, 2001; originally announced August 2001.

Comments: 15 pages with 5 figures, Latex, Accepted for publication in Physica A

Journal ref: Physica A, vol 301, (2001) 351-361

arXiv:physics/0009008 [pdf, ps, other]

doi 10.1063/1.481180

One way to Characterize the compact structures of lattice protein model

Authors: Bin Wang, Zu-guo Yu

Abstract: In the study of protein folding, our understanding of protein structures is limited. In this paper we find one way to characterize the compact structures of the lattice protein model. A quantity called Partnum is given to each compact structure. The Partnum is compared with the recently introduced concept of Designability of protein structures. It is shown that the highly designable structures have, on average, an atypical number of local degrees of freedom. The statistical property of Partnum and its dependence on sequence length are also studied.

Submitted 2 September, 2000; originally announced September 2000.

Comments: 10 pages, 5 figures

Journal ref: J. Chem. Phys. Vol. 112, P. 6084

arXiv:physics/0006080 [pdf, ps, other]

doi 10.1016/S0960-0779(99)00208-8

A time series model of CDS sequences in complete genome

Authors: Zu-Guo Yu, Bin Wang

Abstract: A time series model of CDS sequences in complete genome is proposed. A map of DNA sequence to integer sequence is given. The correlation dimensions and Hurst exponents of CDS sequences in complete genome of bacteria are calculated. Using the average of correlation dimensions, some interesting results are obtained.

Submitted 30 June, 2000; originally announced June 2000.

Comments: 11 pages with 4 figures and one table, Chaos, Solitons and Fractals (2000)(to appear)

ACM Class: 87.10+e, 47.53+n

Journal ref: Chaos, Solitons and Fractals 12(3) (2001)519-526

arXiv:physics/0006079 [pdf, ps, other]

doi 10.1103/PhysRevE.63.011903

Correlation property of length sequences based on global structure of complete genome

Authors: Zu-Guo Yu, V. V. Anh, Bin Wang

Abstract: This paper considers three kinds of length sequences of the complete genome. Detrended fluctuation analysis, spectral analysis, and the mean distance spanned within time $L$ are used to discuss the correlation property of these sequences. The values of the exponents from these methods of these three kinds of length sequences of bacteria indicate that long-range correlations exist in most of these sequences. The correlations have a rich variety of behaviours including the presence of anti-correlations. Furthermore, using the exponent $γ$, it is found that these correlations are all linear ($γ=1.0\pm 0.03$). It is also found that these sequences exhibit $1/f$ noise in some interval of frequency ($f>1$). The length of this interval of frequency depends on the length of the sequence. The shape of the periodogram in $f>1$ exhibits some periodicity. The period seems to depend on the length and the complexity of the length sequence.

Submitted 20 October, 2000; v1 submitted 30 June, 2000; originally announced June 2000.

Comments: RevTex, 9 pages with 5 figures and 3 tables. Phys. Rev. E Jan. 1,2001 (to appear)

ACM Class: 87.10+e, 47.53+n

Journal ref: Physical Review E, vol 63, (2001) 11903

arXiv:physics/0006071 [pdf, ps, other]

doi 10.1016/S0960-0779(00)00147-8

Time series model based on global structure of complete genome

Authors: Zu-Guo Yu, Vo Anh

Abstract: A time series model based on the global structure of the complete genome is proposed. Three kinds of length sequences of the complete genome are considered. The correlation dimensions and Hurst exponents of the length sequences are calculated. Using these two exponents, some interesting results related to the problem of classification and evolution relationship of bacteria are obtained.

Submitted 28 June, 2000; originally announced June 2000.

Comments: 11 pages with 3 figures and 2 tables, Chaos, Solitons and Fractals (Accepted for publications)

ACM Class: 87.10+e, 47.53+n

Journal ref: Chaos, Solitons and Fractals, vol 12(10), (2001) 1827-1834

arXiv:physics/9910040 [pdf, ps, other]

doi 10.1016/S0960-0779(99)00141-1

Dimensions of fractals related to languages defined by tagged strings in complete genomes

Authors: Zu-Guo Yu, Bai-lin Hao, Hui-min Xie, Guo-Yi Chen

Abstract: A representation of frequency of strings of length K in complete genomes of many organisms in a square has led to seemingly self-similar patterns when K increases. These patterns are caused by under-represented strings with a certain "tag"-string and they define some fractals when K tends to infinity. The Box and Hausdorff dimensions of the limit set are discussed. Although the method proposed by Mauldin and Williams to calculate Box and Hausdorff dimension is valid in our case, a different and simpler method is proposed in this paper.

Submitted 26 October, 1999; originally announced October 1999.

Comments: 9 pages with two figures

Journal ref: Chaos, Solitons and Fractals 11(14) (2000) 2215-2222

arXiv:physics/9910039 [pdf, ps, other]

doi 10.1088/0253-6102/33/4/673

Rescaled range and transition matrix analysis of DNA sequences

Authors: Zu-Guo Yu, Guo-Yi Chen

Abstract: In this paper we treat some fractal and statistical features of DNA sequences. First, a fractal record model of DNA sequence is proposed by mapping DNA sequences to integer sequences, followed by R/S analysis of the model and computation of the Hurst exponents. Second, we consider transitions between the four kinds of bases within DNA. The transition matrix analysis of DNA sequences shows that some measures of complexity based on the transition proportion matrix are of interest. We use some measures of complexity to distinguish exons and introns. Regarding evolution, we find that for species of higher grade, the transition rate among the four kinds of bases goes further from the equilibrium.
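As a rough illustration of the transition proportion matrix mentioned in this abstract, the Python sketch below counts how often each base is followed by each other base and normalizes every row. The function name `transition_matrix`, the A, C, G, T row and column order, and the row-wise normalization are assumptions for illustration only; the abstract does not state how the authors define or normalize the matrix.

```python
def transition_matrix(sequence, alphabet="ACGT"):
    """Row-normalized proportions of base-to-next-base transitions.

    Hypothetical helper: entry [i][j] is the fraction of occurrences of
    alphabet[i] (as the first base of a pair) that are followed by
    alphabet[j]. Pairs containing other characters are skipped.
    """
    sequence = sequence.upper()
    index = {base: i for i, base in enumerate(alphabet)}
    n = len(alphabet)
    counts = [[0] * n for _ in range(n)]
    # Walk over adjacent pairs (s[0], s[1]), (s[1], s[2]), ...
    for a, b in zip(sequence, sequence[1:]):
        if a in index and b in index:
            counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        row_total = sum(row)
        # A base that never appears as the first of a pair gets a zero row.
        matrix.append([c / row_total if row_total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    for row in transition_matrix("AACGTTGCA"):  # toy sequence, not real data
        print(row)
```

Complexity measures such as those the abstract alludes to would then be computed from this matrix; which measures the authors used is not stated here, so none is sketched.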

Submitted 26 October, 1999; originally announced October 1999.

Comments: 8 pages with one figure. Communication in Theoretical Physics (2000) (to appear)

Journal ref: Comm. Theor. Phys. 33(4) (2000) 673-678

Showing 1–50 of 50 results for author: Yu, Z