-
Dimeric Drug Polymeric Micelles with Acid-Active Tumor Targeting and FRET-indicated Drug Release
Authors:
Xing Guo,
Lin Wang,
Kayla Duval,
Jing Fan,
Shaobing Zhou,
Zi Chen
Abstract:
Trans-activating transcriptional activator (TAT), a cell-penetrating peptide, has been extensively used for facilitating cellular uptake and nuclear targeting of drug delivery systems. However, the positively charged TAT peptide usually strongly interacts with serum components and undergoes substantial phagocytosis by the reticuloendothelial system, causing a short blood circulation time in vivo. In this work, an acid-active tumor targeting nanoplatform DA-TAT-PECL was developed to effectively inhibit the nonspecific interactions of TAT in the bloodstream. 2,3-dimethylmaleic anhydride (DA) was first used to convert the TAT amines to carboxylic acids; the resulting DA-TAT was then conjugated to obtain DA-TAT-PECL. After self-assembly into polymeric micelles, they were capable of circulating under physiological conditions for a long time and promoting cell penetration upon accumulation at the tumor site and de-shielding of the DA group. Moreover, camptothecin (CPT) was used as the anticancer drug and modified into a dimer (CPT)2-ss-Mal, in which two CPT molecules were connected by a reduction-labile maleimide thioether bond. The FRET signal between CPT and the maleimide thioether bond was monitored to visualize the drug release process, and effective targeted delivery of antitumor drugs was demonstrated. This pH/reduction dual-responsive micelle system provides a new platform for high fidelity cancer therapy.
Submitted 30 July, 2024;
originally announced July 2024.
-
BERT and LLMs-Based avGFP Brightness Prediction and Mutation Design
Authors:
X. Guo,
W. Che
Abstract:
This study aims to utilize Transformer models and large language models (such as GPT and Claude) to predict the brightness of Aequorea victoria green fluorescent protein (avGFP) and design mutants with higher brightness. Considering the time and cost associated with traditional experimental screening methods, this study employs machine learning techniques to enhance research efficiency. We first read and preprocessed a proprietary dataset containing approximately 140,000 protein sequences, including about 30,000 avGFP sequences. Subsequently, we constructed and trained a Transformer-based prediction model to screen and design new avGFP mutants that are expected to exhibit higher brightness.
Our methodology consists of two primary stages: first, the construction of a scoring model using BERT, and second, the screening and generation of mutants using mutation site statistics and large language models. Through the analysis of predictive results, we designed and screened 10 new high-brightness avGFP sequences. This study not only demonstrates the potential of deep learning in protein design but also provides new perspectives and methodologies for future research by integrating prior knowledge from large language models.
Submitted 30 July, 2024;
originally announced July 2024.
-
Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry
Authors:
Shiva Ebrahimi,
Xuan Guo
Abstract:
Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherent high multiplexing nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce DiaTrans, a deep-learning model based on transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. Casanovo-DIA enhances precision by 15.14% to 34.8%, recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our DiaTrans model holds considerable promise for uncovering novel peptides and profiling biological samples more comprehensively. Casanovo-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/DiaTrans.
Submitted 26 June, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Levenshtein Distance Embedding with Poisson Regression for DNA Storage
Authors:
Xiang Wei,
Alan J. X. Guo,
Sihan Sun,
Mengyi Wei,
Wei Yu
Abstract:
Efficient computation or approximation of Levenshtein distance, a widely-used metric for evaluating sequence similarity, has attracted significant attention with the emergence of DNA storage and other biological applications. Sequence embedding, which maps Levenshtein distance to a conventional distance between embedding vectors, has emerged as a promising solution. In this paper, a novel neural network-based sequence embedding technique using Poisson regression is proposed. We first provide a theoretical analysis of the impact of embedding dimension on model performance and present a criterion for selecting an appropriate embedding dimension. Under this embedding dimension, the Poisson regression is introduced by assuming the Levenshtein distance between sequences of fixed length following a Poisson distribution, which naturally aligns with the definition of Levenshtein distance. Moreover, from the perspective of the distribution of embedding distances, Poisson regression approximates the negative log likelihood of the chi-squared distribution and offers advancements in removing the skewness. Through comprehensive experiments on real DNA storage data, we demonstrate the superior performance of the proposed method compared to state-of-the-art approaches.
Submitted 13 December, 2023;
originally announced December 2023.
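As a point of reference for the entry above, the Levenshtein distance that the embedding approximates can be computed exactly with a dynamic program. The sketch below is a plain textbook implementation, not the authors' method; the function name `levenshtein` and the example strings are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Exact edit distance (insertions, deletions, substitutions all cost 1)."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical short DNA reads, for illustration only.
    print(levenshtein("ACGTAC", "AGTACC"))
```

The exact computation costs time proportional to the product of the two sequence lengths, which is the cost that embedding-based approximations aim to avoid.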
-
STGIC: a graph and image convolution-based method for spatial transcriptomic clustering
Authors:
Chen Zhang,
Junhui Gao,
Lingxin Kong,
Guangshuo Cao,
Xiangyu Guo,
Wei Liu
Abstract:
Spatial transcriptomic (ST) clustering employs spatial and transcription information to group spots that are spatially coherent and transcriptionally similar into the same spatial domain. Graph convolution networks (GCN) and graph attention networks (GAT), fed with an adjacency matrix derived from spatial coordinates and a feature matrix derived from transcription profiles, are often used to solve the problem. Our proposed method STGIC (spatial transcriptomic clustering with graph and image convolution) utilizes an adaptive graph convolution (AGC) to get high quality pseudo-labels and then resorts to a dilated convolution framework (DCF) applied to a virtual image converted from gene expression information and spatial coordinates of spots. The dilation rates and kernel sizes are set appropriately, and the updating of weight values in the kernels is made subject to the spatial distance from the position of the corresponding elements to the kernel centers, so that feature extraction of each spot is better guided by spatial distance to neighbor spots. Self-supervision realized by KL-divergence, spatial continuity loss and cross entropy calculated among spots with high confidence pseudo-labels make up the training objective of DCF. STGIC attains state-of-the-art (SOTA) clustering performance on the benchmark dataset of human dorsolateral prefrontal cortex (DLPFC). In addition, it is capable of depicting fine structures of other tissues from other species as well as guiding the identification of marker genes. STGIC is also expandable to Stereo-seq data with high spatial resolution.
Submitted 23 October, 2023; v1 submitted 19 March, 2023;
originally announced March 2023.
-
Inter-brain substrates of role switching during mother-child interaction
Authors:
Yamin Li,
Saishuang Wu,
Jiayang Xu,
Haiwa Wang,
Qi Zhu,
Wen Shi,
Yue Fang,
Fan Jiang,
Shanbao Tong,
Yunting Zhang,
Xiaoli Guo
Abstract:
Mother-child interaction is highly dynamic and reciprocal. Switching roles in these back-and-forth interactions serves as a crucial feature of reciprocal behaviors while the underlying neural entrainment is still not well-studied. Here, we designed a role-controlled cooperative task with dual EEG recording to study how differently two brains interact when mothers and children hold different roles. When children were actors and mothers were observers, mother-child inter-brain synchrony emerged within the theta oscillations and the frontal lobe, which highly correlated with children's attachment to their mothers. When their roles were reversed, this synchrony was shifted to the alpha oscillations and the central area and associated with mothers' perception of their relationship with their children. The results suggested an observer-actor neural alignment within the actor's oscillations, which was modulated by the actor-toward-observer emotional bonding. Our findings contribute to the understanding of how inter-brain synchrony is established and dynamically changed during mother-child reciprocal interaction.
Submitted 8 March, 2023;
originally announced March 2023.
-
Learning Personalized Brain Functional Connectivity of MDD Patients from Multiple Sites via Federated Bayesian Networks
Authors:
Shuai Liu,
Xiao Guo,
Shun Qi,
Huaning Wang,
Xiangyu Chang
Abstract:
Identifying functional connectivity biomarkers of major depressive disorder (MDD) patients is essential to advance understanding of the disorder mechanisms and early intervention. However, due to the small sample size and the high dimension of available neuroimaging data, the performance of existing methods is often limited. Multi-site data could enhance the statistical power and sample size, but they are often subject to inter-site heterogeneity and data-sharing policies. In this paper, we propose a federated joint estimator, NOTEARS-PFL, for simultaneous learning of multiple Bayesian networks (BNs) with continuous optimization, to identify disease-induced alterations in MDD patients. We incorporate information shared between sites and site-specific information into the proposed federated learning framework to learn personalized BN structures by introducing the group fused lasso penalty. We develop the alternating direction method of multipliers, where in the local update step, the neuroimaging data is processed at each local site. Then the learned network structures are transmitted to the center for the global update. In particular, we derive a closed-form expression for the local update step and use the iterative proximal projection method to deal with the group fused lasso penalty in the global update step. We evaluate the performance of the proposed method on both synthetic and real-world multi-site rs-fMRI datasets. The results suggest that the proposed NOTEARS-PFL yields greater effectiveness and accuracy than comparable methods.
Submitted 6 January, 2023;
originally announced January 2023.
-
Early Disease Stage Characterization in Parkinson's Disease from Resting-state fMRI Data Using a Long Short-term Memory Network
Authors:
Xueqi Guo,
Sule Tinaz,
Nicha C. Dvornek
Abstract:
Parkinson's disease (PD) is a common and complex neurodegenerative disorder with 5 stages in the Hoehn and Yahr scaling. Given the heterogeneity of PD, it is challenging to classify early stages 1 and 2 and detect brain function alterations. Functional magnetic resonance imaging (fMRI) is a promising tool in revealing functional connectivity (FC) differences and developing biomarkers in PD. Some machine learning approaches like support vector machine and logistic regression have been successfully applied in the early diagnosis of PD using fMRI data, which outperform classifiers based on manually selected morphological features. However, the early-stage characterization in FC changes has not been fully investigated. Given the complexity and non-linearity of fMRI data, we propose the use of a long short-term memory (LSTM) network to characterize the early stages of PD. The study included 84 subjects (56 in stage 2 and 28 in stage 1) from the Parkinson's Progression Markers Initiative (PPMI), the largest available public PD dataset. Under a repeated 10-fold stratified cross-validation, the LSTM model reached an accuracy of 71.63%, 13.52% higher than the best traditional machine learning method, indicating significantly better robustness and accuracy compared with other machine learning classifiers. We used the learned LSTM model weights to select the top brain regions that contributed to model prediction and performed FC analyses to characterize functional changes with disease stage and motor impairment to gain better insight into the brain mechanisms of PD.
Submitted 11 February, 2022;
originally announced February 2022.
-
A Weak Monotonicity Based Muscle Fatigue Detection Algorithm for a Short-Duration Poor Posture Using sEMG Measurements
Authors:
Xinliang Guo,
Lei Lu,
Mark Robinson,
Ying Tan,
Kusal Goonewardena,
Denny Oetomo
Abstract:
Muscle fatigue is usually defined as a decrease in the ability to produce force. Surface electromyography (sEMG) signals have been widely used to provide information about muscle activities, including detecting muscle fatigue by various data-driven techniques such as machine learning and statistical approaches. However, it is well known that sEMG signals are weak (low amplitude) with a low signal-to-noise ratio, so data-driven techniques cannot work well when the quality of the data is poor. In particular, the existing methods are unable to detect muscle fatigue coming from static poses. This work exploits the concept of weak monotonicity, which has been observed in the process of fatigue, to robustly detect muscle fatigue in the presence of measurement noise and human variations. Such a population trend methodology has shown its potential in muscle fatigue detection, as demonstrated by an experiment with a static pose.
Submitted 18 June, 2021;
originally announced June 2021.
-
Deep learning for peptide identification from metaproteomics datasets
Authors:
Xuan Guo,
Shichao Feng
Abstract:
Metaproteomics is becoming widely used in microbiome research for gaining insights into the functional state of the microbial community. Current metaproteomics studies are generally based on high-throughput tandem mass spectrometry (MS/MS) coupled with liquid chromatography. The identification of peptides and proteins from MS data involves the computational procedure of searching MS/MS spectra against a predefined protein sequence database and assigning top-scored peptides to spectra. Existing computational tools are still far from being able to extract all the information out of large MS/MS datasets acquired from metaproteome samples. In this paper, we proposed a deep-learning-based algorithm, called DeepFilter, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Compared with other post-processing tools, including Percolator, Q-ranker, PeptideProphet, and Iprophet, DeepFilter identified 20% and 10% more peptide-spectrum-matches and proteins, respectively, on marine microbial and soil microbial metaproteome samples with false discovery rate at 1%.
Submitted 23 September, 2020;
originally announced September 2020.
-
Generating Tertiary Protein Structures via an Interpretative Variational Autoencoder
Authors:
Xiaojie Guo,
Yuanqi Du,
Sivani Tadepalli,
Liang Zhao,
Amarda Shehu
Abstract:
Much scientific enquiry across disciplines is founded upon a mechanistic treatment of dynamic systems that ties form to function. A highly visible instance of this is in molecular biology, where an important goal is to determine functionally-relevant forms/structures that a protein molecule employs to interact with molecular partners in the living cell. This goal is typically pursued under the umbrella of stochastic optimization with algorithms that optimize a scoring function. Research repeatedly shows that current scoring functions, though steadily improving, correlate weakly with molecular activity. Inspired by recent momentum in generative deep learning, this paper proposes and evaluates an alternative approach to generating functionally-relevant three-dimensional structures of a protein. Though typically deep generative models struggle with highly-structured data, the work presented here circumvents this challenge via graph-generative models. A comprehensive evaluation of several deep architectures shows the promise of generative models in directly revealing the latent space for sampling novel tertiary structures, as well as in highlighting axes/factors that carry structural meaning and open the black box often associated with deep models. The work presented here is a first step towards interpretative, deep generative models becoming viable and informative complementary approaches to protein structure prediction.
Submitted 16 June, 2021; v1 submitted 8 April, 2020;
originally announced April 2020.
-
Prediction and analysis of Coronavirus Disease 2019
Authors:
Lin Jia,
Kewen Li,
Yu Jiang,
Xin Guo,
Ting Zhao
Abstract:
In December 2019, a novel coronavirus was found in a seafood wholesale market in Wuhan, China. WHO officially named this coronavirus as COVID-19. Since the first patient was hospitalized on December 12, 2019, China has reported a total of 78,824 confirmed COVID-19 cases and 2,788 deaths as of February 28, 2020. Wuhan's cumulative confirmed cases and deaths accounted for 61.1% and 76.5% of those in the whole of mainland China, making it the priority center for epidemic prevention and control. Meanwhile, 51 countries and regions outside China have reported 4,879 confirmed cases and 79 deaths as of February 28, 2020. The COVID-19 epidemic does great harm to people's daily lives and the country's economic development. This paper adopts three kinds of mathematical models, i.e., the Logistic model, the Bertalanffy model and the Gompertz model. The epidemic trends of SARS were first fitted and analyzed in order to prove the validity of the existing mathematical models. The results were then used to fit and analyze the situation of COVID-19. The prediction results of the three mathematical models differ for different parameters and in different regions. In general, the fitting effect of the Logistic model may be the best among the three models studied in this paper, while the fitting effect of the Gompertz model may be better than that of the Bertalanffy model. According to the current trend, based on the three models, the total number of people expected to be infected is 49852-57447 in Wuhan, 12972-13405 in non-Hubei areas and 80261-85140 in China, respectively. The total death toll is 2502-5108 in Wuhan, 107-125 in non-Hubei areas and 3150-6286 in China, respectively. COVID-19 will probably be over in late April 2020 in Wuhan and before late March 2020 in other areas.
Submitted 16 March, 2020; v1 submitted 11 March, 2020;
originally announced March 2020.
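The three growth curves named in the entry above are commonly written in the forms sketched below. The parameterization (final size `K`, shape parameters `b` and `c`, time `t` measured from the start of the outbreak) is an assumed textbook one and may differ from the one used in the paper; the numeric values are illustrative only and are not fitted results.

```python
import math


def logistic(t: float, K: float, b: float, c: float) -> float:
    """Logistic curve: symmetric S-shape that saturates at K."""
    return K / (1.0 + math.exp(b - c * t))


def gompertz(t: float, K: float, b: float, c: float) -> float:
    """Gompertz curve: asymmetric S-shape that saturates at K."""
    return K * math.exp(-b * math.exp(-c * t))


def bertalanffy(t: float, K: float, b: float, c: float) -> float:
    """Generalized Bertalanffy curve: starts at 0 and saturates at K."""
    return K * (1.0 - math.exp(-b * t)) ** c


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the curve shapes.
    for day in (0, 10, 20, 40, 80):
        print(day,
              round(logistic(day, 1000.0, 4.0, 0.2), 1),
              round(gompertz(day, 1000.0, 4.0, 0.1), 1),
              round(bertalanffy(day, 1000.0, 0.1, 3.0), 1))
```

All three curves level off at `K`, so a fitted `K` plays the role of the predicted final cumulative count; the models differ mainly in how quickly they approach it.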
-
COVID-19 Docking Server: A meta server for docking small molecules, peptides and antibodies against potential targets of COVID-19
Authors:
Ren Kong,
Guangbo Yang,
Rui Xue,
Ming Liu,
Feng Wang,
Jianping Hu,
Xiaoqiang Guo,
Shan Chang
Abstract:
Motivation: The coronavirus disease 2019 (COVID-19) caused by a new type of coronavirus has been emerging from China and has led to thousands of deaths globally since December 2019. Although many groups have engaged in studying the newly emerged virus and searching for treatments for COVID-19, the understanding of the COVID-19 target-ligand interactions represents a key challenge. Herein, we introduce COVID-19 Docking Server, a web server that predicts the binding modes between COVID-19 targets and ligands including small molecules, peptides and antibodies. Results: Structures of proteins involved in the virus life cycle were collected or constructed based on the homologs of coronavirus, and prepared for docking. The meta platform provides a free and interactive tool for the prediction of COVID-19 target-ligand interactions and subsequent drug discovery for COVID-19.
Submitted 7 August, 2020; v1 submitted 28 February, 2020;
originally announced March 2020.
-
The Nonequilibrium Mechanism of Noise Enhancer synergizing with Activator in HIV Latency Reactivation
Authors:
Xiaolu Guo,
Tao Tang,
Minxuan Duan,
Lei Zhang,
Hao Ge
Abstract:
Noise-modulating chemicals can synergize with transcriptional activators in reactivating latent HIV to eliminate latent HIV reservoirs. To understand the underlying biomolecular mechanism, we investigate a previous two-gene-state model and identify two necessary conditions for the synergy: an assumption of inhibition effect of transcription activators on noise enhancers; and frequent transitions to the gene non-transcription-permissive state. We then develop a loop-four-gene-state model with Tat transcription/translation and find that drug synergy is mainly determined by the magnitude and direction of energy input into the genetic regulatory kinetics of the HIV promoter. The inhibition effect of transcription activators is actually a phenomenon of energy dissipation in the nonequilibrium gene transition system. Overall, the loop-four-state model demonstrates that energy dissipation plays a crucial role in HIV latency reactivation, which might be useful for improving drug effects and identifying other synergies on lentivirus latency reactivation.
Submitted 12 March, 2022; v1 submitted 14 January, 2020;
originally announced January 2020.
-
Dynamics of Social Interactions and Agent Spreading in Social Insects Colonies: Effects of Environmental Events and Spatial Heterogeneity
Authors:
Xisohui Guo,
Jun Chen,
Asma Azizi,
Jennifer Fewell,
Yun Kang
Abstract:
The relationship between division of labor and individuals' spatial behavior in social insect colonies provides a useful context to study how social interactions influence the spreading of an agent (which could be information or a virus) across distributed agent systems. In social insect colonies, spatial heterogeneity associated with variations of individual task roles affects social contacts, and thus the way in which an agent moves through social contact networks. We used an Agent Based Model (ABM) to mimic three realistic scenarios of agent spreading in social insect colonies. Our model suggests that individuals within a specific task interact more, with the consequence that an agent could potentially spread rapidly within that group, while it spreads more slowly between task groups. Our simulations show a strong linear relationship between the degree of spatial heterogeneity and social contact rates, and that the spreading dynamics of agents follow a modified nonlinear logistic growth model with varied transmission rates for different scenarios. Our work provides important insights into the dual functionality of physical contacts. This dual functionality is often driven by variations of individual spatial behavior, and can have both inhibiting and facilitating effects on agent transmission rates depending on the environment. The results from our proposed model not only provide important insights into mechanisms that generate spatial heterogeneity, but also deepen our understanding of how social insect colonies balance the benefit and cost of physical contacts on the agents' transmission under varied environmental conditions.
Submitted 27 June, 2019;
originally announced June 2019.
-
Inference with Hybrid Bio-hardware Neural Networks
Authors:
Yuan Zeng,
Zubayer Ibne Ferdous,
Weixiang Zhang,
Mufan Xu,
Anlan Yu,
Drew Patel,
Xiaochen Guo,
Yevgeny Berdichevsky,
Zhiyuan Yan
Abstract:
To understand the learning process in brains, biologically plausible algorithms have been explored by modeling the detailed neuron properties and dynamics. On the other hand, simplified multi-layer models of neural networks have shown great success on computational tasks such as image classification and speech recognition. However, the computational models that can achieve good accuracy for these learning applications are very different from the bio-plausible models. This paper studies whether a bio-plausible model of an in vitro living neural network can be used to perform machine learning tasks and achieve good inference accuracy. A novel two-layer bio-hardware hybrid neural network is proposed. The biological layer faithfully models variations of synapses, neurons, and network sparsity in in vitro living neural networks. The hardware layer is a computational fully-connected layer that tunes parameters to optimize for accuracy. Several techniques are proposed to improve the inference accuracy of the proposed hybrid neural network. For instance, an adaptive pre-processing technique helps the proposed neural network to achieve good learning accuracy for different living neural network sparsity. The proposed hybrid neural network with realistic neuron parameters and variations achieves a 98.3% testing accuracy for the handwritten digit recognition task on the full MNIST dataset.
Submitted 5 September, 2019; v1 submitted 27 May, 2019;
originally announced May 2019.
-
Gamma band oscillations reflect sensory and affective dimensions of pain
Authors:
Yuanyuan Lyu,
Francesca Zidda,
Stefan Radev,
Hongcai Liu,
Xiaoli Guo,
Shanbao Tong,
Herta Flor,
Jamila Andoh
Abstract:
Pain is a multidimensional process that can be modulated by emotions; however, the mechanisms underlying this modulation are unknown. We used pictures with different emotional valence (negative, positive, neutral) as primes and applied painful electrical stimuli as targets to healthy participants. We assessed pain intensity and unpleasantness ratings and recorded electroencephalograms (EEG). We found that pain unpleasantness ratings, but not pain intensity ratings, were modulated by emotion, with increased ratings for negative and decreased ratings for positive pictures. We also found two consecutive gamma band oscillations (GBOs) related to pain processing from time-frequency analyses of the EEG signals. An early GBO had a cortical distribution contralateral to the painful stimulus, and its amplitude was positively correlated with intensity and unpleasantness ratings, but not with prime valence. The late GBO had a centroparietal distribution and its amplitude was larger for negative compared to neutral and positive pictures. The emotional modulation effect (negative versus positive) of the late GBO amplitude was positively correlated with pain unpleasantness. The early GBO might reflect the overall pain perception, possibly involving the thalamocortical circuit, while the late GBO might be related to the affective dimension of pain and top-down related processes.
Submitted 15 February, 2019;
originally announced February 2019.
-
JS-MA: A Jensen-Shannon Divergence Based Method for Mapping Genome-wide Associations on Multiple Diseases
Authors:
Xuan Guo
Abstract:
Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) have been regarded as a promising way to unravel the complex relationships between genotypes and phenotypes, particularly for common diseases. However, current multi-locus-based methods are insufficient, in terms of computational cost and discrimination power, to detect statistically significant interactions, and they lack the ability to find diverse genetic effects on multifarious diseases. In particular, multiple statistical tests for high-order epistasis ($ \geq $ 2 SNPs) raise huge analytical challenges because the computational cost increases exponentially with the number of SNPs in an epistatic module. In this paper, we develop a simple, fast and powerful method, named JS-MA, using the Jensen-Shannon divergence and a high-dimensional $ k $-means clustering algorithm for mapping genome-wide multi-locus epistatic interactions on multiple diseases. Compared with some state-of-the-art association mapping tools, our method is shown to be more powerful and efficient in experiments on systematic simulations. We also applied JS-MA to the GWAS datasets from WTCCC for two common diseases, i.e. Rheumatoid Arthritis and Type 1 Diabetes. JS-MA not only confirms some recently reported biologically meaningful associations but also identifies some novel findings. Therefore, we believe that our method is suitable and efficient for the full-scale analysis of multi-disease-related interactions in large GWASs.
Submitted 16 November, 2018;
originally announced November 2018.
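The Jensen-Shannon divergence named in the abstract above can be illustrated with a minimal sketch. This is not the JS-MA method itself, only the standard definition of the divergence between two discrete distributions. The function names and the genotype frequencies for cases and controls are hypothetical and chosen purely for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions, in bits.

    JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2.
    The result is symmetric and lies in [0, 1] when log base 2 is used.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical genotype frequencies (AA, Aa, aa) at one SNP; not real data.
cases = [0.50, 0.40, 0.10]
controls = [0.30, 0.50, 0.20]
score = js_divergence(cases, controls)
```

A larger score means the genotype distribution differs more between the two groups. Ranking SNPs or SNP combinations by a score of this kind is one plausible use, but the scoring and clustering actually used by JS-MA are described only in the paper.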
-
A Supervised STDP-based Training Algorithm for Living Neural Networks
Authors:
Yuan Zeng,
Kevin Devincentis,
Yao Xiao,
Zubayer Ibne Ferdous,
Xiaochen Guo,
Zhiyuan Yan,
Yevgeny Berdichevsky
Abstract:
Neural networks have shown great potential in many applications like speech recognition, drug discovery, image classification, and object detection. Neural network models are inspired by biological neural networks, but they are optimized to perform machine learning tasks on digital computers. The proposed work explores the possibilities of using living neural networks in vitro as basic computational elements for machine learning applications. A new supervised STDP-based learning algorithm is proposed in this work, which considers neuron engineering constraints. An accuracy of 74.7% is achieved on the MNIST benchmark for handwritten digit recognition.
Submitted 21 March, 2018; v1 submitted 30 October, 2017;
originally announced October 2017.
-
Towards a Mathematical Foundation of Immunology and Amino Acid Chains
Authors:
Wen-Jun Shen,
Hau-San Wong,
Quan-Wu Xiao,
Xin Guo,
Stephen Smale
Abstract:
We attempt to set a mathematical foundation of immunology and amino acid chains. To measure the similarities of these chains, a kernel on strings is defined using only the sequence of the chains and a good amino acid substitution matrix (e.g. BLOSUM62). The kernel is used in learning machines to predict binding affinities of peptides to human leukocyte antigen DR (HLA-DR) molecules. On both fixed allele (Nielsen and Lund 2009) and pan-allele (Nielsen et al. 2010) benchmark databases, our algorithm achieves state-of-the-art performance. The kernel is also used to define a distance on an HLA-DR allele set, based on which a clustering analysis precisely recovers the serotype classifications assigned by WHO (Nielsen and Lund 2009, and Marsh et al. 2010). These results suggest that our kernel relates the chain structure of both peptides and HLA-DR molecules well to their biological functions, and that it offers a simple, powerful and promising methodology for immunology and amino acid chain studies.
Submitted 25 June, 2012; v1 submitted 28 May, 2012;
originally announced May 2012.