Search | arXiv e-print repository

arXiv:2406.11906 [pdf, other]

NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics

Authors: Jingbo Zhou, Shaorong Chen, Jun Xia, Sizhe Liu, Tianze Ling, Wenjie Du, Yue Liu, Jianwei Yin, Stan Z. Li

Abstract: Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the high-throughput analysis of protein composition in biological tissues. Many deep learning methods have been developed for the \emph{de novo} peptide sequencing task, i.e., predicting the peptide sequence for the observed mass spectrum. However, two key challenges seriously hinder the further advancement of this important task. Firstly, since there is no consensus for the evaluation datasets, the empirical results in different research papers are often not comparable, leading to unfair comparison. Secondly, the current methods are usually limited to amino acid-level or peptide-level precision and recall metrics. In this work, we present the first unified benchmark NovoBench for \emph{de novo} peptide sequencing, which comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics. Recent impressive methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo and $π$-HelixNovo are integrated into our framework. In addition to amino acid-level and peptide-level precision and recall, we evaluate the models' performance in terms of identifying post-translational modifications (PTMs), efficiency and robustness to peptide length, noise peaks and missing fragment ratio, which are important influencing factors yet are seldom considered. Leveraging this benchmark, we conduct a large-scale study of current methods and report many insightful findings that open up new possibilities for future development. The benchmark will be open-sourced to facilitate future research and application.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2405.15206 [pdf, other]

Maximum Caliber Infers Effective Coupling and Response from Spiking Networks

Authors: Kevin S. Chen, Ying-Jen Yang

Abstract: The characterization of network and biophysical properties from neural spiking activity is an important goal in neuroscience. A framework that provides unbiased inference on causal synaptic interaction and single neural properties has been missing. Here we applied the stochastic dynamics extension of Maximum Entropy -- the Maximum Caliber Principle -- to infer the transition rates of network states. Effective synaptic coupling strength and neuronal response functions for various network motifs can then be computed. The inferred minimal model also enables leading-order reconstruction of inter-spike interval distribution. Our method is tested with numerically simulated spiking networks and applied to data from salamander retina.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.12144 [pdf]

Alterations of electrocortical activity during hand movements induced by motor cortex glioma

Authors: Yihan Wu, Tao Chang, Siliang Chen, Xiaodong Niu, Yu Li, Yuan Fang, Lei Yang, Yixuan Zong, Yaoxin Yang, Yuehua Li, Mengsong Wang, Wen Yang, Yixuan Wu, Chen Fu, Xia Fang, Yuxin Quan, Xilin Peng, Qiang Sun, Marc M. Van Hulle, Yanhui Liu, Ning Jiang, Dario Farina, Yuan Yang, Jiayuan He, Qing Mao

Abstract: Glioma cells can reshape functional neuronal networks by hijacking neuronal synapses, leading to partial or complete neurological dysfunction. These mechanisms have been previously explored for language functions. However, the impact of glioma on sensorimotor functions is still unknown. Therefore, we recruited a control group of patients with unaffected motor cortex and a group of patients with glioma-infiltrated motor cortex, and recorded high-density electrocortical signals during finger movement tasks. The results showed that glioma suppresses task-related synchronization in the high-gamma band and reduces the power across all frequency bands. The resulting atypical motor information transmission model with discrete signaling pathways and delayed responses disrupts the stability of neuronal encoding patterns for finger movement kinematics across various temporal-spatial scales. These findings demonstrate that gliomas functionally invade neural circuits within the motor cortex. This result advances our understanding of motor function processing in chronic disease states, which is important to advance the surgical strategies and neurorehabilitation approaches for patients with malignant gliomas.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.11096 [pdf]

MicroBundlePillarTrack, A Python package for automated segmentation, tracking, and analysis of pillar deflection in cardiac microbundles

Authors: Hiba Kobeissi, Xining Gao, Samuel J. DePalma, Jourdan K. Ewoldt, Miranda C. Wang, Shoshana L. Das, Javiera Jilberto, David Nordsletten, Brendon M. Baker, Christopher S. Chen, Emma Lejeune

Abstract: Movies of human induced pluripotent stem cell (hiPSC)-derived engineered cardiac tissue (microbundles) contain abundant information about structural and functional maturity. However, extracting these data in a reproducible and high-throughput manner remains a major challenge. Furthermore, it is not straightforward to make direct quantitative comparisons across the multiple in vitro experimental platforms employed to fabricate these tissues. Here, we present "MicroBundlePillarTrack," an open-source optical flow-based package developed in Python to track the deflection of pillars in cardiac microbundles grown on experimental platforms with two different pillar designs ("Type 1" and "Type 2" design). Our software is able to automatically segment both pillars, track their displacements, and output time-dependent metrics for contractility analysis, including beating amplitude and rate, contractile force, and tissue stress. Because this software is fully automated, it will allow for both faster and more reproducible analyses of larger datasets and it will enable more reliable cross-platform comparisons as compared to existing approaches that require manual steps and are tailored to one specific experimental platform. To complement this open-source software, we share a dataset of 1,540 brightfield example movies on which we have tested our software. Through sharing this data and software, our goal is to directly enable quantitative comparisons across labs, and facilitate future collective progress via the biomedical engineering open-source data and software ecosystem.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 8 main pages, 1 main figure, Supplementary Information included

MSC Class: 92F05; 74A05 ACM Class: J.2; J.3

arXiv:2405.04557 [pdf, other]

Determining cell population size from cell fraction in cell plasticity models

Authors: Yuman Wang, Shuli Chen, Jie Hu, Da Zhou

Abstract: Quantifying the size of cell populations is crucial for understanding biological processes such as growth, injury repair, and disease progression. Often, experimental data offer information in the form of relative frequencies of distinct cell types, rather than absolute cell counts. This emphasizes the need to devise effective strategies for estimating absolute cell quantities from fraction data. In response to this challenge, we present two computational approaches grounded in stochastic cell population models: the first-order moment method (FOM) and the second-order moment method (SOM). These methods explicitly establish mathematical mappings from cell fraction to cell population size using moment equations of the stochastic models. Notably, our investigation demonstrates that the SOM method obviates the requirement for a priori knowledge of the initial population size, highlighting the utility of incorporating variance details from cell proportions. The robustness of both the FOM and SOM methods was analyzed from different perspectives. Additionally, we extended the application of the FOM and SOM methods to various biological mechanisms within the context of cell plasticity models. Our methodologies not only assist in mitigating the inherent limitations of experimental techniques when only fraction data is available for detecting cell population size, but they also offer new insights into utilizing the stochastic characteristics of cell population dynamics to quantify interactions between different biomasses within the system.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.12865 [pdf, other]

A minimal model of boosting and waning in a recurrent seasonal epidemic

Authors: Siyu Chen, David Sankoff

Abstract: We propose a model of the immunity to a cyclical epidemic disease taking account not only of seasonal boosts during the infectious season, but also of residual immunity remaining from one season to the next. The focus is on the exponential waning process over successive cycles, imposed on the temporal distribution of infections or exposures over a season. This distribution, interacting with the waning function, is all that is necessary to reproduce, in mathematically closed form, the mechanical cycle of boosting and waning immunity characteristic of recurrent seasonal infectious disease. Distinct from epidemiological models predicting numbers of individuals moving between infectivity compartments, our result enables us to directly estimate parameters of waning and the infectivity distribution. We can naturally iterate the cyclical process to simulate immunity trajectories over many years and thus to quantify the strong relationship between residual immunity and the time elapsed between annual infectivity peaks.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2403.07013 [pdf, other]

AdaNovo: Adaptive \emph{De Novo} Peptide Sequencing with Conditional Mutual Information

Authors: Jun Xia, Shaorong Chen, Jingbo Zhou, Tianze Ling, Wenjie Du, Sizhe Liu, Stan Z. Li

Abstract: Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the analysis of protein composition in biological samples. Despite the development of various deep learning methods for identifying amino acid sequences (peptides) responsible for observed spectra, challenges persist in \emph{de novo} peptide sequencing. Firstly, prior methods struggle to identify amino acids with post-translational modifications (PTMs) due to their lower frequency in training data compared to canonical amino acids, further resulting in decreased peptide-level identification precision. Secondly, diverse types of noise and missing peaks in mass spectra reduce the reliability of training data (peptide-spectrum matches, PSMs). To address these challenges, we propose AdaNovo, a novel framework that calculates conditional mutual information (CMI) between the spectrum and each amino acid/peptide, using CMI for adaptive model training. Extensive experiments demonstrate AdaNovo's state-of-the-art performance on a 9-species benchmark, where the peptides in the training set are almost completely disjoint from the peptides of the test sets. Moreover, AdaNovo excels in identifying amino acids with PTMs and exhibits robustness against data noise. The supplementary materials contain the official code.

Submitted 15 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2402.17997 [pdf]

StaPep: an open-source tool for the structure prediction and feature extraction of hydrocarbon-stapled peptides

Authors: Zhe Wang, Jianping Wu, Mengjun Zheng, Chenchen Geng, Borui Zhen, Wei Zhang, Hui Wu, Zhengyang Xu, Gang Xu, Si Chen, Xiang Li

Abstract: Many tools exist for extracting structural and physicochemical descriptors from linear peptides to predict their properties, but similar tools for hydrocarbon-stapled peptides are lacking. Here, we present StaPep, a Python-based toolkit designed for generating 2D/3D structures and calculating 21 distinct features for hydrocarbon-stapled peptides. The current version supports hydrocarbon-stapled peptides containing 2 non-standard amino acids (norleucine and 2-aminoisobutyric acid) and 6 nonnatural anchoring residues (S3, S5, S8, R3, R5 and R8). Then we established a hand-curated dataset of 201 hydrocarbon-stapled peptides and 384 linear peptides with sequence information and experimental membrane permeability, to showcase StaPep's application in artificial intelligence projects. A machine learning-based predictor utilizing the above calculated features was developed with AUC of 0.85, for identifying cell-penetrating hydrocarbon-stapled peptides. StaPep's pipeline spans data retrieval, cleaning, structure generation, molecular feature calculation, and machine learning model construction for hydrocarbon-stapled peptides. The source codes and dataset are freely available on GitHub: https://github.com/dahuilangda/stapep_package.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 26 pages, 6 figures

arXiv:2402.04422 [pdf]

Immunogenic cell death triggered by pathogen ligands via host germ line-encoded receptors

Authors: Chuang Li, Yichen Wei, Chao Qin, Shifan Chen, Xiaolong Shao

Abstract: The strategic induction of cell death serves as a crucial immune defense mechanism for the eradication of pathogenic infections within host cells. Investigating the molecular mechanisms underlying immunogenic cell pathways has significantly enhanced our understanding of the host's immunity. This review provides a comprehensive overview of the immunogenic cell death mechanisms triggered by pathogen infections, focusing on the critical role of pattern recognition receptors. In response to infections, host cells dictate a variety of cell death pathways, including apoptosis, pyroptosis, necrosis, and lysosomal cell death, which are essential for amplifying immune responses and controlling pathogen dissemination. Key components of these mechanisms are host cellular receptors that recognize pathogen-associated ligands. These receptors activate downstream signaling cascades, leading to the expression of immunoregulatory genes and the production of antimicrobial cytokines and chemokines. Particularly, the inflammasome, a multi-protein complex, plays a pivotal role in these responses by processing pro-inflammatory cytokines and inducing pyroptotic cell death. Pathogens, in turn, have evolved strategies to manipulate these cell death pathways, either by inhibiting them to facilitate their replication or by triggering them to evade host defenses. This dynamic interplay between host immune mechanisms and pathogen strategies highlights the intricate co-evolution of microbial virulence and host immunity.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 30 pages, 3 figures

arXiv:2312.08987 [pdf, other]

doi 10.1038/s43588-023-00576-2

Unbiased organism-agnostic and highly sensitive signal peptide predictor with deep protein language model

Authors: Junbo Shen, Qinze Yu, Shenyang Chen, Qingxiong Tan, Jingcheng Li, Yu Li

Abstract: Signal peptide (SP) is a short peptide located in the N-terminus of proteins. It is essential to target and transfer transmembrane and secreted proteins to correct positions. Compared with traditional experimental methods to identify signal peptides, computational methods are faster and more efficient, which are more practical for analyzing thousands or even millions of protein sequences, especially for metagenomic data. Here we present Unbiased Organism-agnostic Signal Peptide Network (USPNet), a signal peptide classification and cleavage site prediction deep learning method that takes advantage of protein language models. We propose to apply label distribution-aware margin loss to handle data imbalance problems and use evolutionary information of protein to enrich representation and overcome species information dependence.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 23 pages 5 figures. Nat Comput Sci (2023)

arXiv:2311.17965 [pdf]

doi 10.1371/journal.pone.0019517

Defining Reference Sequences for Nocardia Species by Similarity and Clustering Analyses of 16S rRNA Gene Sequence Data

Authors: Manal Helal, Fanrong Kong, Sharon C. A. Chen, Michael Bain, Richard Christen, Vitali Sintchenko

Abstract: The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia. A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization. Results: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as 'centroids' in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578.

Submitted 29 November, 2023; originally announced November 2023.

ACM Class: I.2.6

Journal ref: PLoS ONE June 2011 | Volume 6 | Issue 6 | e19517

arXiv:2311.17964 [pdf]

Linear normalised hash function for clustering gene sequences and identifying reference sequences from multiple sequence alignments

Authors: Manal Helal, Fanrong Kong, Sharon C-A Chen, Fei Zhou, Dominic E Dwyer, John Potter, Vitali Sintchenko

Abstract: The aim of this study was to develop a method that would identify the cluster centroids and the optimal number of clusters for a given sensitivity level and could work equally well for the different sequence datasets. A novel method that combines the linear mapping hash function and multiple sequence alignment (MSA) was developed. This method takes advantage of the already sorted by similarity sequences from the MSA output, and identifies the optimal number of clusters, clusters cut-offs, and clusters centroids that can represent reference gene vouchers for the different species. The linear mapping hash function can map an already ordered by similarity distance matrix to indices to reveal gaps in the values around which the optimal cut-offs of the different clusters can be identified. The method was evaluated using sets of closely related (16S rRNA gene sequences of Nocardia species) and highly variable (VP1 genomic region of Enterovirus 71) sequences and outperformed existing unsupervised machine learning clustering methods and dimensionality reduction methods. This method does not require prior knowledge of the number of clusters or the distance between clusters, handles clusters of different sizes and shapes, and scales linearly with the dataset. The combination of MSA with the linear mapping hash function is a computationally efficient way of gene sequence clustering and can be a valuable tool for the assessment of similarity, clustering of different microbial genomes, identifying reference sequences, and for the study of evolution of bacteria and viruses.

Submitted 29 November, 2023; originally announced November 2023.

ACM Class: I.2.6

Journal ref: Microbial Informatics and Experimentation volume 2, Article number: 2 (2012) https://microbialinformaticsj.biomedcentral.com/counter/pdf/10.1186/2042-5783-2-2.pdf

arXiv:2311.10255 [pdf, other]

FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems

Authors: Shiyuan Luo, Juntong Ni, Shengyu Chen, Runlong Yu, Yiqun Xie, Licheng Liu, Zhenong Jin, Huaxiu Yao, Xiaowei Jia

Abstract: Modeling environmental ecosystems is critical for the sustainability of our planet, but is extremely challenging due to the complex underlying processes driven by interactions amongst a large number of physical variables. As many variables are difficult to measure at large scales, existing works often utilize a combination of observable features and locally available measurements or modeled values as input to build models for a specific study region and time period. This raises a fundamental question in advancing the modeling of environmental ecosystems: how to build a general framework for modeling the complex relationships amongst various environmental data over space and time? In this paper, we introduce a new framework, FREE, which maps available environmental data into a text space and then converts the traditional predictive modeling task in environmental science to the semantic recognition problem. The proposed FREE framework leverages recent advances in Large Language Models (LLMs) to supplement the original input features with natural language descriptions. This facilitates capturing the data semantics and also allows harnessing the irregularities of input features. When used for long-term prediction, FREE has the flexibility to incorporate newly collected observations to enhance future prediction. The efficacy of FREE is evaluated in the context of two societally important real-world applications, predicting stream water temperature in the Delaware River Basin and predicting annual corn yield in Illinois and Iowa. Beyond the superior predictive performance over multiple baseline methods, FREE is shown to be more data- and computation-efficient as it can be pre-trained on simulated data generated by physics-based models.

Submitted 19 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.07117 [pdf, other]

Olfactory learning alters navigation strategies and behavioral variability in C. elegans

Authors: Kevin S. Chen, Anuj K. Sharma, Jonathan W. Pillow, Andrew M. Leifer

Abstract: Animals adjust their behavioral response to sensory input adaptively depending on past experiences. The flexible brain computation is crucial for survival and is of great interest in neuroscience. The nematode C. elegans modulates its navigation behavior depending on the association of odor butanone with food (appetitive training) or starvation (aversive training), and will then climb up the butanone gradient or ignore it, respectively. However, the exact change in navigation strategy in response to learning is still unknown. Here we study the learned odor navigation in worms by combining precise experimental measurement and a novel descriptive model of navigation. Our model consists of two known navigation strategies in worms: biased random walk and weathervaning. We infer weights on these strategies by applying the model to worm navigation trajectories and the exact odor concentration it experiences. Compared to naive worms, appetitive trained worms up-regulate the biased random walk strategy, and aversive trained worms down-regulate the weathervaning strategy. The statistical model provides prediction with $>90 \%$ accuracy of the past training condition given navigation data, which outperforms the classical chemotaxis metric. We find that the behavioral variability is altered by learning, such that worms are less variable after training compared to naive ones. The model further predicts the learning-dependent response and variability under optogenetic perturbation of the olfactory neuron AWC$^\mathrm{ON}$. Lastly, we investigate neural circuits downstream from AWC$^\mathrm{ON}$ that are differentially recruited for learned odor-guided navigation. Together, we provide a new paradigm to quantify flexible navigation algorithms and pinpoint the underlying neural substrates.

Submitted 23 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.19624 [pdf, other]

Exploring Post-Training Quantization of Protein Language Models

Authors: Shuang Peng, Fei Yang, Ning Sun, Sheng Chen, Yanfeng Jiang, Aimin Pan

Abstract: Recent advancements in unsupervised protein language models (ProteinLMs), like ESM-1b and ESM-2, have shown promise in different protein prediction tasks. However, these models face challenges due to their high computational demands, significant memory needs, and latency, restricting their usage on devices with limited resources. To tackle this, we explore post-training quantization (PTQ) for ProteinLMs, focusing on ESMFold, a simplified version of AlphaFold based on ESM-2 ProteinLM. Our study is the first attempt to quantize all weights and activations of ProteinLMs. We observed that the typical uniform quantization method performs poorly on ESMFold, causing a significant drop in TM-Score when using 8-bit quantization. We conducted extensive quantization experiments, uncovering unique challenges associated with ESMFold, particularly highly asymmetric activation ranges before Layer Normalization, making representation difficult using low-bit fixed-point formats. To address these challenges, we propose a new PTQ method for ProteinLMs, utilizing piecewise linear quantization for asymmetric activation values to ensure accurate approximation. We demonstrated the effectiveness of our method in protein structure prediction tasks, demonstrating that ESMFold can be accurately quantized to low-bit widths without compromising accuracy. Additionally, we applied our method to the contact prediction task, showcasing its versatility. In summary, our study introduces an innovative PTQ method for ProteinLMs, addressing specific quantization challenges and potentially leading to the development of more efficient ProteinLMs with significant implications for various protein-related applications.
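To make the contrast drawn in the abstract above concrete, here is a minimal sketch that compares plain uniform quantization with a simple two-segment piecewise linear scheme on synthetic, highly asymmetric values. It is not the authors' PTQ method: the function names, the 99th-percentile breakpoint, the even split of levels between the two segments, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def uniform_quantize(x, lo, hi, levels):
    """Quantize x onto `levels` evenly spaced points in [lo, hi], then dequantize."""
    scale = (hi - lo) / (levels - 1)
    q = np.round((np.clip(x, lo, hi) - lo) / scale)
    return q * scale + lo

def piecewise_quantize(x, breakpoint, bits=8):
    """Two-segment piecewise linear quantization (illustrative assumption):
    half of the levels cover [min, breakpoint], half cover (breakpoint, max]."""
    half = 2 ** (bits - 1)
    out = np.empty_like(x)
    low = x <= breakpoint
    out[low] = uniform_quantize(x[low], x.min(), breakpoint, half)
    out[~low] = uniform_quantize(x[~low], breakpoint, x.max(), half)
    return out

rng = np.random.default_rng(0)
# Synthetic asymmetric "activations": a dense bulk plus a few large positive outliers.
x = np.concatenate([rng.normal(0.0, 1.0, 10_000), rng.uniform(20.0, 100.0, 50)])

uni = uniform_quantize(x, x.min(), x.max(), 2 ** 8)
pwl = piecewise_quantize(x, breakpoint=np.quantile(x, 0.99), bits=8)

mse_uniform = float(np.mean((x - uni) ** 2))
mse_piecewise = float(np.mean((x - pwl) ** 2))
print(f"uniform MSE:   {mse_uniform:.6f}")
print(f"piecewise MSE: {mse_piecewise:.6f}")
```

In this toy setting the uniform grid spends most of its levels on the sparse outlier range, while the piecewise scheme keeps a fine step size where most values lie.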

Submitted 30 October, 2023; originally announced October 2023.

Comments: 8 pages, 4 figures

arXiv:2310.18533 [pdf, other]

Evaluating the effects of high-throughput structural neuroimaging predictors on whole-brain functional connectome outcomes via network-based vector-on-matrix regression

Authors: Tong Lu, Yuan Zhang, Vince Lyzinski, Chuan Bi, Peter Kochunov, Elliot Hong, Shuo Chen

Abstract: The joint analysis of multimodal neuroimaging data is critical in the field of brain research because it reveals complex interactive relationships between neurobiological structures and functions. In this study, we focus on investigating the effects of structural imaging (SI) features, including white matter micro-structure integrity (WMMI) and cortical thickness, on the whole brain functional connectome (FC) network. To achieve this goal, we propose a network-based vector-on-matrix regression model to characterize the FC-SI association patterns. We have developed a novel multi-level dense bipartite and clique subgraph extraction method to identify which subsets of spatially specific SI features intensively influence organized FC sub-networks. The proposed method can simultaneously identify highly correlated structural-connectomic association patterns and suppress false positive findings while handling millions of potential interactions. We apply our method to a multimodal neuroimaging dataset of 4,242 participants from the UK Biobank to evaluate the effects of whole-brain WMMI and cortical thickness on the resting-state FC. The results reveal that the WMMI on corticospinal tracts and inferior cerebellar peduncle significantly affect functional connections of sensorimotor, salience, and executive sub-networks with an average correlation of 0.81 (p<0.001).

Submitted 27 October, 2023; originally announced October 2023.

Comments: 20 pages, 5 figures, 2 tables

arXiv:2310.18377 [pdf, other]

Large-scale Foundation Models and Generative AI for BigData Neuroscience

Authors: Ran Wang, Zhe Sage Chen

Abstract: Recent advances in machine learning have made revolutionary breakthroughs in computer games, image and natural language understanding, and scientific discovery. Foundation models and large-scale language models (LLMs) have recently achieved human-like intelligence thanks to BigData. With the help of self-supervised learning (SSL) and transfer learning, these models may potentially reshape the landscapes of neuroscience research and make a significant impact on the future. Here we present a mini-review on recent advances in foundation models and generative AI models as well as their applications in neuroscience, including natural language and speech, semantic memory, brain-machine interfaces (BMIs), and data augmentation. We argue that this paradigm-shift framework will open new avenues for many neuroscience research directions and discuss the accompanying challenges and opportunities.

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.13018 [pdf, other]

Getting aligned on representational alignment

Authors: Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell , et al. (5 additional authors not shown)

Abstract: Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the extent to which the representations formed by these diverse systems agree? Do similarities in representations then translate into similar behavior? How can a system's representations be modified to better match those of another system? These questions pertaining to the study of representational alignment are at the heart of some of the most active research areas in cognitive science, neuroscience, and machine learning. For example, cognitive scientists measure the representational alignment of multiple individuals to identify shared cognitive priors, neuroscientists align fMRI responses from multiple individuals into a shared representational space for group-level analyses, and ML researchers distill knowledge from teacher models into student models by increasing their alignment. Unfortunately, there is limited knowledge transfer between research communities interested in representational alignment, so progress in one field often ends up being rediscovered independently in another. Thus, greater cross-field communication would be advantageous. To improve communication between these fields, we propose a unifying framework that can serve as a common language between researchers studying representational alignment. We survey the literature from all three fields and demonstrate how prior work fits into this framework. Finally, we lay out open problems in representational alignment where progress can benefit all three of these fields. We hope that our work can catalyze cross-disciplinary collaboration and accelerate progress for all communities studying and developing information processing systems. We note that this is a working paper and encourage readers to reach out with their suggestions for future revisions.

Submitted 2 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: Working paper, changes to be made in upcoming revisions

arXiv:2310.12035 [pdf]

Tracking dynamic flow: Decoding flow fluctuations through performance in a fine motor control task

Authors: Bohao Tian, Shijun Zhang, Sirui Chen, Yuru Zhang, Kaiping Peng, Hongxing Zhang, Dangxiao Wang

Abstract: Flow, an optimal mental state merging action and awareness, significantly impacts our emotion, performance, and well-being. However, capturing its swift fluctuations on a fine timescale is challenging due to the sparsity of the existing flow detecting tools. Here we present a fine fingertip force control (F3C) task to induce flow, wherein the task challenge is set at a compatible level with personal skill, and to quantitatively track the flow state variations from synchronous motor control performance. We extract eight performance metrics from fingertip force sequence and reveal their significant differences under distinct flow states. Further, we built a learning-based flow decoder that aims to predict the continuous flow intensity during the user experiment through the selected performance metrics, taking the self-reported flow as the label. Cross-validation shows that the predicted flow intensity reaches significant correlation with the self-reported flow intensity (r=0.81). Based on the decoding results, we observe rapid oscillations in flow fluctuations during the intervals between sparse self-reporting probes. This study showcases the feasibility of tracking intrinsic flow variations with high temporal resolution using task performance measures and may serve as a foundation for future work aiming to take advantage of flow's dynamics to enhance performance and positive emotions.

Submitted 28 December, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.06191 [pdf]

Investigating the Correlation between Force Output, Strains, and Pressure for Active Skeletal Muscle Contractions

Authors: Karan Taneja, Xiaolong He, John Hodgson, Usha Sinha, Shantanu Sinha, J. S. Chen

Abstract: Experimental observations suggest that the force output of the skeletal muscle tissue can be correlated to the intra-muscular pressure generated by the muscle belly. However, pressure often proves difficult to measure through in-vivo tests. Simulations, on the other hand, offer a tool to model muscle contractions and analyze the relationship between muscle force generation and deformations as well as pressure outputs, enabling us to gain insight into correlations among experimentally measurable quantities such as principal and volumetric strains, and the force output. In this work, a correlation study is performed using Pearson's and Spearman's correlation coefficients on the force output of the skeletal muscle, the principal and volumetric strains experienced by the muscle and the pressure developed within the muscle belly as the muscle tissue undergoes isometric contractions due to varying activation profiles. The study reveals strong correlations between force output and the strains at all locations of the belly, irrespective of the type of activation profile used. This observation enables estimation of the contribution of various muscle groups to the total force by the experimentally measurable principal and volumetric strains in the muscle belly. It is also observed that pressure does not correlate well with force output due to stress relaxation near the boundary of muscle belly.
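As a reminder of how the two coefficients named in the abstract above differ, the following sketch computes both on made-up data. The strain and force values are hypothetical and are not taken from the paper; the rank helper assumes there are no tied values.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: strength of the linear association between x and y."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(v):
    """Ranks 0..n-1 of the entries of v (assumes no ties, for simplicity)."""
    order = np.argsort(v)
    r = np.empty(len(v))
    r[order] = np.arange(len(v))
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: force grows monotonically but nonlinearly with strain.
strain = np.linspace(0.01, 0.30, 50)
force = np.exp(20.0 * strain)

r_pearson = pearson(strain, force)
r_spearman = spearman(strain, force)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

Because the relationship is monotonic, the Spearman coefficient is at its maximum, while the Pearson coefficient is lower because the relationship is not linear. Reporting both, as the abstract describes, separates the two kinds of association.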

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.11642 [pdf]

High-content stimulated Raman histology of human breast cancer

Authors: Hongli Ni, Chinmayee Prabhu Dessai, Haonan Lin, Wei Wang, Shaoxiong Chen, Yuhao Yuan, Xiaowei Ge, Jianpeng Ao, Nolan Vild, Ji-Xin Cheng

Abstract: Histological examination is crucial for cancer diagnosis, including hematoxylin and eosin (H&E) staining for mapping morphology and immunohistochemistry (IHC) staining for revealing chemical information. Recently developed two-color stimulated Raman histology could bypass the complex tissue processing to mimic H&E-like morphology. Yet, the underlying chemical features are not revealed, compromising the effectiveness of prognostic stratification. Here, we present a high-content stimulated Raman histology (HC-SRH) platform that provides both morphological and chemical information for cancer diagnosis based on un-stained breast tissues. Through spectral unmixing in the C-H vibration window, HC-SRH can map unsaturated lipids, cellular protein, extracellular matrix, saturated lipid, and water in breast tissue. In this way, HC-SRH provides excellent contrast for various tissue components. Considering rapidness is important in clinical trials, we implemented spectral selective sampling to boost the speed of HC-SRH by one order. We also successfully demonstrated the HC-SRH in a clinical-compatible fiber laser-based SRS microscopy. With the widely rapid tuning capability of the advanced fiber laser, a clear chemical contrast of nucleic acid and solid-state ester is shown in the fingerprint result.

Submitted 20 September, 2023; originally announced September 2023.

Comments: 6 figures

arXiv:2309.08383 [pdf, other]

Dynamical Analysis of an Allelopathic Phytoplankton Model with Fear Effect

Authors: Shangming Chen, Fengde Chen, Vaibhava Srivastava, Rana D. Parshad

Abstract: This paper is the first to propose an allelopathic phytoplankton competition ODE model influenced by a fear effect based on natural biological phenomena. It is shown that the interplay of this fear effect and the allelopathic term causes rich dynamics in the proposed competition model, such as global stability, transcritical bifurcation, pitchfork bifurcation, and saddle-node bifurcation. We also consider the spatially explicit version of the model and prove analogous results. Numerical simulations verify the feasibility of the theoretical analysis. The results demonstrate that the primary cause of the extinction of non-toxic species is the fear of toxic species compared to toxins. Allelopathy only affects the density of non-toxic species. The discussion provides guidance for the conservation of species and the maintenance of biodiversity.

Submitted 15 September, 2023; originally announced September 2023.

Comments: arXiv admin note: text overlap with arXiv:2303.04919

arXiv:2309.03242 [pdf, other]

Automated Bioinformatics Analysis via AutoBA

Authors: Juexiao Zhou, Bin Zhang, Xiuying Chen, Haoyang Li, Xiaopeng Xu, Siyuan Chen, Xin Gao

Abstract: With the fast-growing and evolving omics data, the demand for streamlined and adaptable tools to handle the analysis continues to grow. In response to this need, we introduce Auto Bioinformatics Analysis (AutoBA), an autonomous AI agent based on a large language model designed explicitly for conventional omics data analysis. AutoBA simplifies the analytical process by requiring minimal user input while delivering detailed step-by-step plans for various bioinformatics tasks. Through rigorous validation by expert bioinformaticians, AutoBA's robustness and adaptability are affirmed across a diverse range of omics analysis cases, including whole genome sequencing (WGS), RNA sequencing (RNA-seq), single-cell RNA-seq, ChIP-seq, and spatial transcriptomics. AutoBA's unique capacity to self-design analysis processes based on input data variations further underscores its versatility. Compared with online bioinformatic services, AutoBA deploys the analysis locally, preserving data privacy. Moreover, different from the predefined pipeline, AutoBA has adaptability in sync with emerging bioinformatics tools. Overall, AutoBA represents a convenient tool, offering robustness and adaptability for complex omics data analysis.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2308.11809 [pdf, other]

Expressive probabilistic sampling in recurrent neural networks

Authors: Shirui Chen, Linxing Preston Jiang, Rajesh P. N. Rao, Eric Shea-Brown

Abstract: In sampling-based Bayesian models of brain function, neural activities are assumed to be samples from probability distributions that the brain uses for probabilistic computation. However, a comprehensive understanding of how mechanistic models of neural dynamics can sample from arbitrary distributions is still lacking. We use tools from functional analysis and stochastic differential equations to explore the minimum architectural requirements for $\textit{recurrent}$ neural circuits to sample from complex distributions. We first consider the traditional sampling model consisting of a network of neurons whose outputs directly represent the samples (sampler-only network). We argue that synaptic current and firing-rate dynamics in the traditional model have limited capacity to sample from a complex probability distribution. We show that the firing rate dynamics of a recurrent neural circuit with a separate set of output units can sample from an arbitrary probability distribution. We call such circuits reservoir-sampler networks (RSNs). We propose an efficient training procedure based on denoising score matching that finds recurrent and output weights such that the RSN implements Langevin sampling. We empirically demonstrate our model's ability to sample from several complex data distributions using the proposed neural dynamics and discuss its applicability to developing the next generation of sampling-based brain models.
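For readers unfamiliar with the Langevin sampling mentioned in the abstract above, the sketch below runs plain unadjusted Langevin dynamics with a score function that is known in closed form. It does not reproduce the reservoir-sampler network or the denoising score matching training. The standard normal target, the step size, and the function names are assumptions chosen so the result is easy to check.

```python
import numpy as np

def score_standard_normal(x):
    """Score (gradient of the log density) of a standard normal: d/dx log p(x) = -x."""
    return -x

def langevin_sample(score, n_chains=5000, n_steps=1000, step=0.01, seed=0):
    """Unadjusted Langevin dynamics, run for many independent chains in parallel:
    x <- x + step * score(x) + sqrt(2 * step) * standard normal noise."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=n_chains)  # deliberately poor initialization
    for _ in range(n_steps):
        noise = rng.standard_normal(n_chains)
        x = x + step * score(x) + np.sqrt(2.0 * step) * noise
    return x

samples = langevin_sample(score_standard_normal)
print(f"mean = {samples.mean():.3f}, variance = {samples.var():.3f}")
```

With a small step size the final states of the chains should have mean close to 0 and variance close to 1. In a learned setting the analytic score would be replaced by a trained estimate of it.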

Submitted 14 November, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.08562 [pdf, other]

Bayesian Inference of Phenotypic Plasticity of Cancer Cells Based on Dynamic Model for Temporal Cell Proportion Data

Authors: Shuli Chen, Yuman Wang, Da Zhou, Jie Hu

Abstract: Mounting evidence underscores the prevalent hierarchical organization of cancer tissues. At the foundation of this hierarchy reside cancer stem cells, a subset of cells endowed with the pivotal role of engendering the entire cancer tissue through cell differentiation. In recent times, substantial attention has been directed towards the phenomenon of cancer cell plasticity, where the dynamic interconversion between cancer stem cells and non-stem cancer cells has garnered significant interest. Since the task of detecting cancer cell plasticity from empirical data remains a formidable challenge, we propose a Bayesian statistical framework designed to infer phenotypic plasticity within cancer cells, utilizing temporal data on cancer stem cell proportions. Our approach is grounded in a stochastic model, adept at capturing the dynamic behaviors of cells. Leveraging Bayesian analysis, we explore the moment equation governing cancer stem cell proportions, derived from the Kolmogorov forward equation of our stochastic model. With the improved Euler method for ordinary differential equations, a new statistical method for parameter estimation in nonlinear ordinary differential equation models is developed, which also provides novel ideas for the study of compositional data. Extensive simulations robustly validate the efficacy of our proposed method. To further corroborate our findings, we apply our approach to analyze published data from SW620 colon cancer cell lines. Our results harmonize with \emph{in situ} experiments, thereby reinforcing the utility of our method in discerning and quantifying phenotypic plasticity within cancer cells.
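The improved Euler method referred to in the abstract above (also known as Heun's method) can be sketched as follows. The ODE used here is a hypothetical toy equation for a proportion p(t) with made-up rates a and b, chosen only because it has a closed-form solution to compare against. It is not the moment equation from the paper.

```python
import math

def improved_euler(f, t0, y0, h, n_steps):
    """Improved Euler (Heun) method: average the slope at the start of the step
    and the slope at the Euler-predicted end of the step."""
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)                  # slope at the current point
        k2 = f(t + h, y + h * k1)     # slope at the Euler prediction
        y = y + 0.5 * h * (k1 + k2)   # step with the averaged slope
        t = t + h
    return y

# Hypothetical toy ODE: dp/dt = a * (1 - p) - b * p, with made-up rates a and b.
a, b = 0.3, 0.7

def rhs(t, p):
    return a * (1.0 - p) - b * p

p0, t_end, n = 0.9, 5.0, 100
approx = improved_euler(rhs, 0.0, p0, t_end / n, n)

# Closed-form solution of the toy ODE, used only to check the numerical result.
p_star = a / (a + b)
exact = p_star + (p0 - p_star) * math.exp(-(a + b) * t_end)
print(f"improved Euler: {approx:.6f}, exact: {exact:.6f}")
```

The method is second-order accurate, so halving the step size h should reduce the error by roughly a factor of four.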

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2308.04610 [pdf, other]

MicroBundleCompute: Automated segmentation, tracking, and analysis of subdomain deformation in cardiac microbundles

Authors: Hiba Kobeissi, Javiera Jilberto, M. Çağatay Karakan, Xining Gao, Samuel J. DePalma, Shoshana L. Das, Lani Quach, Jonathan Urquia, Brendon M. Baker, Christopher S. Chen, David Nordsletten, Emma Lejeune

Abstract: Advancing human induced pluripotent stem cell derived cardiomyocyte (hiPSC-CM) technology will lead to significant progress ranging from disease modeling, to drug discovery, to regenerative tissue engineering. Yet, alongside these potential opportunities comes a critical challenge: attaining mature hiPSC-CM tissues. At present, there are multiple techniques to promote maturity of hiPSC-CMs including physical platforms and cell culture protocols. However, when it comes to making quantitative comparisons of functional behavior, there are limited options for reliably and reproducibly computing functional metrics that are suitable for direct cross-system comparison. In addition, the current standard functional metrics obtained from time-lapse images of cardiac microbundle contraction reported in the field (i.e., post forces, average tissue stress) do not take full advantage of the available information present in these data (i.e., full-field tissue displacements and strains). Thus, we present "MicroBundleCompute," a computational framework for automatic quantification of morphology-based mechanical metrics from movies of cardiac microbundles. Briefly, this computational framework offers tools for automatic tissue segmentation, tracking, and analysis of brightfield and phase contrast movies of beating cardiac microbundles. It is straightforward to implement, requires little to no parameter tuning, and runs quickly on a personal computer. In this paper, we describe the methods underlying this computational framework, show the results of our extensive validation studies, and demonstrate the utility of exploring heterogeneous tissue deformations and strains as functional metrics. With this manuscript, we disseminate "MicroBundleCompute" as an open-source computational tool with the aim of making automated quantitative analysis of beating cardiac microbundles more accessible to the community.

Submitted 20 February, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: 16 main pages, 7 main figures, Supplementary Information included as appendices

MSC Class: 92F05; 74A05 ACM Class: J.2; J.3

arXiv:2308.01921 [pdf, other]

Transferable Graph Neural Fingerprint Models for Quick Response to Future Bio-Threats

Authors: Wei Chen, Yihui Ren, Ai Kagawa, Matthew R. Carbone, Samuel Yen-Chi Chen, Xiaohui Qu, Shinjae Yoo, Austin Clyde, Arvind Ramanathan, Rick L. Stevens, Hubertus J. J. van Dam, Deyu Lu

Abstract: Fast screening of drug molecules based on the ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprint is a promising method for developing molecular docking surrogates with high throughput and great fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we trained graph neural fingerprint docking models for high-throughput virtual COVID-19 drug screening. The graph neural fingerprint models yield high prediction accuracy on docking scores with the mean squared error lower than $0.21$ kcal/mol for most of the docking targets, showing significant improvement over conventional circular fingerprint methods. To make the neural fingerprints transferable for unknown targets, we also propose a transferable graph neural fingerprint method trained on multiple targets. With comparable accuracy to target-specific graph neural fingerprint models, the transferable model exhibits superb training and data efficiency. We highlight that the impact of this study extends beyond the COVID-19 dataset, as our approach for fast virtual ligand screening can be easily adapted and integrated into a general machine learning-accelerated pipeline to battle future bio-threats.

Submitted 14 September, 2023; v1 submitted 17 July, 2023; originally announced August 2023.

Comments: 8 pages, 5 figures, 2 tables, accepted by ICLMA2023

ACM Class: I.2.1

arXiv:2303.15520 [pdf, other]

Learning Harmonic Molecular Representations on Riemannian Manifold

Authors: Yiqun Wang, Yuning Shen, Shi Chen, Lihao Wang, Fei Ye, Hao Zhou

Abstract: Molecular representation learning plays a crucial role in AI-assisted drug discovery research. Encoding 3D molecular structures through Euclidean neural networks has become the prevailing method in the geometric deep learning community. However, the equivariance constraints and message passing in Euclidean space may limit the network expressive power. In this work, we propose a Harmonic Molecular Representation learning (HMR) framework, which represents a molecule using the Laplace-Beltrami eigenfunctions of its molecular surface. HMR offers a multi-resolution representation of molecular geometric and chemical features on a 2D Riemannian manifold. We also introduce a harmonic message passing method to realize efficient spectral message passing over the surface manifold for better molecular encoding. Our proposed method shows comparable predictive power to current models in small molecule property prediction, and outperforms the state-of-the-art deep learning models for ligand-binding protein pocket classification and the rigid protein docking challenge, demonstrating its versatility in molecular representation learning.

Submitted 27 March, 2023; originally announced March 2023.

Comments: 25 pages including Appendix

arXiv:2303.04919 [pdf, other]

Dynamical Analysis of a Lotka-Volterra Competition Model with both Allee and Fear Effect

Authors: Shangming Chen, Fengde Chen, Vaibhava Srivastava, Rana D. Parshad

Abstract: Population ecology theory is replete with density dependent processes. However, trait-mediated or behavioral indirect interactions can either reinforce or oppose density-dependent effects. This paper presents the first two species competitive ODE and PDE systems where an Allee effect, which is a density dependent process, and the fear effect, which is non-consumptive and behavioral, are both present. The stability of the equilibria is discussed analytically using the qualitative theory of ordinary differential equations. It is found that the Allee effect and the fear effect change the extinction dynamics of the system and the number of positive equilibrium points, but they do not affect the stability of the positive equilibria. We also observe some special dynamics that induce bifurcations in the system by varying the Allee or fear parameter. Interestingly, we find that the Allee effect, working in conjunction with the fear effect, can bring about several qualitative changes to the dynamical behavior of the system with only the fear effect in place, in regimes of small fear. That is, for small amounts of the fear parameter, it can change a competitive exclusion type situation to a strong competition type situation. It can also change a weak competition type situation to a bi-stability type situation. However, for large fear regimes the Allee effect reinforces the dynamics driven by the fear effect. The analysis of the corresponding spatially explicit model is also presented. To this end the comparison principle for parabolic PDE is used. The conclusions of this paper have strong implications for conservation biology, biological control as well as the preservation of biodiversity.

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2302.09445 [pdf, other]

Partial differential equation-based inference of migration and proliferation mechanisms in cancer cell populations

Authors: Patrick C. Kinnunen, Siddhartha Srivastava, Zhenlin Wang, Kenneth K. Y. Ho, Brock A. Humphries, Siyi Chen, Jennifer J. Linderman, Gary D. Luker, Kathryn E. Luker, Krishna Garikipati

Abstract: Targeting signaling pathways that drive cancer cell migration or proliferation is a common therapeutic approach. A popular experimental technique, the scratch assay, measures the migration and proliferation-driven cell monolayer formation. Scratch assay analyses do not differentiate between migration and proliferation effects and do not attempt to measure dynamic effects. To improve upon these methods, we combine high-throughput scratch assays, continuous video microscopy, and variational system identification (VSI) to infer partial differential equation (PDE) models of cell migration and proliferation. We capture the evolution of cell density fields over time using live cell microscopy and automated image processing. We employ VSI techniques to identify cell density dynamics modeled with first-order kinetics of advection-diffusion-reaction systems. We present a comparison of our methods to results obtained using traditional inference approaches on previously analyzed 1-dimensional scratch assay data. We demonstrate the application of this pipeline on high throughput 2-dimensional scratch assays and find that decreasing serum levels can decrease random cell migration by approximately 20%. Our integrated experimental and computational pipeline can be adapted for automatically quantifying the effect of biological perturbations on cell migration and proliferation in various cell lines.

Submitted 18 February, 2023; originally announced February 2023.

arXiv:2301.06645 [pdf, other]

Analysis of a Reaction-Diffusion Susceptible-Infected-Susceptible Epidemic Patch Model Incorporating Movement Inside and Among Patches

Authors: Shanshan Chen, Yixiang Wu

Abstract: In this paper, we propose and analyze a reaction-diffusion susceptible-infected-susceptible (SIS) epidemic patch model. The individuals are assumed to reside in different patches, where they are able to move inside and among the patches. The movement of individuals inside the patches is described by diffusion terms, and the movement pattern among patches is modeled by an essentially nonnegative matrix. We define a basic reproduction number $\mathcal{R}_0$ for the model and show that it is a threshold value for disease extinction versus persistence. The monotone dependence of $\mathcal{R}_0$ on the movement rates of infected individuals is proved when the dispersal pattern is symmetric or non-symmetric. Numerical simulations are performed to illustrate the impact of the movement of individuals inside and among patches on the transmission of the disease.

Submitted 16 January, 2023; originally announced January 2023.

MSC Class: 92D30; 37N25; 92D40

arXiv:2301.06640 [pdf, other]

doi 10.1007/s00033-023-02009-6

On the impact of spatial heterogeneity and drift rate in a three-patch two-species Lotka-Volterra competition model over a stream

Authors: Shanshan Chen, Jie Liu, Yixiang Wu

Abstract: In this paper, we study a three-patch two-species Lotka-Volterra competition patch model over a stream network. The individuals are subject to both random and directed movements, and the two species are assumed to be identical except for the movement rates. The environment is heterogeneous, and the carrying capacity is larger in upstream locations. We treat one species as a resident species and investigate whether the other species can invade or not. Our results show that the spatial heterogeneity of environment and the magnitude of the drift rates have a large impact on the competition outcomes of the stream species.

Submitted 16 January, 2023; originally announced January 2023.

MSC Class: 92D25; 92D40; 34C12; 34D23; 37C65

arXiv:2301.06638 [pdf, other]

Evolution of dispersal in advective patchy environments with varying drift rates

Authors: Shanshan Chen, Jie Liu, Yixiang Wu

Abstract: In this paper, we study a two stream species Lotka-Volterra competition patch model with the patches aligned along a line. The two species are supposed to be identical except for the diffusion rates. For each species, the diffusion rates between patches are the same, while the drift rates vary. Our results show that the convexity of the drift rates has a significant impact on the competition outcomes: if the drift rates are convex, then the species with larger diffusion rate wins the competition; if the drift rates are concave, then the species with smaller diffusion rate wins the competition.

Submitted 16 January, 2023; originally announced January 2023.

MSC Class: 92D25; 92D40; 34C12; 34D23; 37C65

arXiv:2301.05905 [pdf, other]

Continuous odor profile monitoring to study olfactory navigation in small animals

Authors: Kevin S. Chen, Rui Wu, Marc H. Gershow, Andrew M. Leifer

Abstract: Olfactory navigation is observed across species and plays a crucial role in locating resources for survival. In the laboratory, understanding the behavioral strategies and neural circuits underlying odor-taxis requires a detailed understanding of the animal's sensory environment. For small model organisms like C. elegans and larval D. melanogaster, controlling and measuring the odor environment experienced by the animal can be challenging, especially for airborne odors, which are subject to subtle effects from airflow, temperature variation, and from the odor's adhesion, adsorption or reemission. Here we present a method to flexibly control and precisely measure airborne odor concentration in an arena with agar while imaging animal behavior. Crucially and unlike previous methods, our method allows continuous monitoring of the odor profile during behavior. We construct stationary chemical landscapes in an odor flow chamber through spatially patterned odorized air. The odor concentration is measured with a spatially distributed array of digital gas sensors. Careful placement of the sensors allows the odor concentration across the arena to be accurately inferred and continuously monitored at all points in time. We use this approach to measure the precise odor concentration that each animal experiences as it undergoes chemotaxis behavior and report chemotaxis strategies for C. elegans and D. melanogaster larvae populations under different spatial odor landscapes.

Submitted 14 January, 2023; originally announced January 2023.

arXiv:2212.03329 [pdf, other]

Enhancing Low-Density EEG-Based Brain-Computer Interfaces with Similarity-Keeping Knowledge Distillation

Authors: Xin-Yao Huang, Sung-Yu Chen, Chun-Shu Wei

Abstract: Electroencephalogram (EEG) has been one of the common neuromonitoring modalities for real-world brain-computer interfaces (BCIs) because of its non-invasiveness, low cost, and high temporal resolution. Recently, light-weight and portable EEG wearable devices based on low-density montages have increased the convenience and usability of BCI applications. However, loss of EEG decoding performance is often inevitable due to the reduced number of electrodes and coverage of scalp regions of a low-density EEG montage. To address this issue, we introduce knowledge distillation (KD), a learning mechanism developed for transferring knowledge/information between neural network models, to enhance the performance of low-density EEG decoding. Our framework includes a newly proposed similarity-keeping (SK) teacher-student KD scheme that encourages a low-density EEG student model to acquire the inter-sample similarity as in a pre-trained teacher model trained on high-density EEG data. The experimental results validate that our SK-KD framework consistently improves motor-imagery EEG decoding accuracy when the number of electrodes decreases for the input EEG data. For both common low-density headphone-like and headband-like montages, our method outperforms state-of-the-art KD methods across various EEG decoding model architectures. As the first KD scheme developed for enhancing EEG decoding, we foresee the proposed SK-KD framework to facilitate the practicality of low-density EEG-based BCI in real-world applications.

Submitted 6 December, 2022; originally announced December 2022.

arXiv:2210.14744 [pdf]

Quantitative elemental imaging in eukaryotic algae

Authors: Stefan Schmollinger, Si Chen, Sabeeha S. Merchant

Abstract: All organisms, fundamentally, are made from the same raw material, namely the elements of the periodic table. Biochemical diversity is achieved with how these elements are utilized, for what purpose and in which physical location. Determining elemental distributions, especially those of trace elements that facilitate metabolism as cofactors in the active centers of essential enzymes, can determine the state of metabolism, the nutritional status or the developmental stage of an organism. Photosynthetic eukaryotes, especially algae, are excellent subjects for quantitative analysis of elemental distribution. These microbes utilize unique metabolic pathways that require various trace nutrients at their core to enable its operation. Photosynthetic microbes also have important environmental roles as primary producers in habitats with limited nutrient supply or toxin contaminations. Accordingly, photosynthetic eukaryotes are of great interest for biotechnological exploitation, carbon sequestration and bioremediation, with many of the applications involving various trace elements and consequently affecting their quota and intracellular distribution. A number of diverse applications were developed for elemental imaging allowing subcellular resolution, with X-ray fluorescence microscopy (XFM) being at the forefront, enabling quantitative descriptions of intact cells in a non-destructive method. This Tutorial Review summarizes the workflow of a quantitative, single-cell elemental distribution analysis of a eukaryotic alga using XFM.

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.08016 [pdf, other]

Prediction of drug effectiveness in rheumatoid arthritis patients based on machine learning algorithms

Authors: Shengjia Chen, Nikunj Gupta, Woodward B. Galbraith, Valay Shah, Jacopo Cirrone

Abstract: Rheumatoid arthritis (RA) is an autoimmune condition caused when patients' immune system mistakenly targets their own tissue. Machine learning (ML) has the potential to identify patterns in patient electronic health records (EHR) to forecast the best clinical treatment to improve patient outcomes. This study introduced a Drug Response Prediction (DRP) framework with two main goals: 1) design a data processing pipeline to extract information from tabular clinical data, and then preprocess it for functional use, and 2) predict RA patients' responses to drugs and evaluate classification models' performance. We propose a novel two-stage ML framework based on European Alliance of Associations for Rheumatology (EULAR) criteria cutoffs to model drug effectiveness. Our model Stacked-Ensemble DRP was developed and cross-validated using data from 425 RA patients. The evaluation used a subset of 124 patients (30%) from the same data source. In the evaluation of the test set, two-stage DRP leads to improved classification accuracy over other end-to-end classification models for binary classification. Our proposed method provides a complete pipeline to predict disease activity scores and identify the group that does not respond well to anti-TNF treatments, thus showing promise in supporting clinical decisions based on EHR information.

Submitted 21 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: 13 pages, 5 figures, to be published in ICBBE 2022

arXiv:2208.10661 [pdf, other]

Therapeutic algebra of immunomodulatory drug responses at single-cell resolution

Authors: Jialong Jiang, Sisi Chen, Tiffany Tsou, Christopher S. McGinnis, Tahmineh Khazaei, Qin Zhu, Jong H. Park, Paul Rivaud, Inna-Marie Strazhnik, Eric D. Chow, David A. Sivak, Zev J. Gartner, Matt Thomson

Abstract: Therapeutic modulation of immune states is central to the treatment of human disease. However, how drugs and drug combinations impact the diverse cell types in the human immune system remains poorly understood at the transcriptome scale. Here, we apply single-cell mRNA-seq to profile the response of human immune cells to 502 immunomodulatory drugs alone and in combination. We develop a unified mathematical model that quantitatively describes the transcriptome scale response of myeloid and lymphoid cell types to individual drugs and drug combinations through a single inferred regulatory network. The mathematical model reveals how drug combinations generate novel macrophage and T-cell states by recruiting combinations of gene expression programs through both additive and non-additive drug interactions. A simplified drug response algebra allows us to predict the continuous modulation of immune cell populations between activated, resting and hyper-inhibited states through combinatorial drug dose titrations. Our results suggest that transcriptome-scale mathematical models could enable the design of therapeutic strategies for programming the human immune system using combinations of therapeutics.

Submitted 22 August, 2022; originally announced August 2022.

Comments: 19 pages, 5 figures

arXiv:2205.07673 [pdf, other]

ProNet DB: A proteome-wise database for protein surface property representations and RNA-binding profiles

Authors: Junkang Wei, Jin Xiao, Siyuan Chen, Licheng Zong, Xin Gao, Yu Li

Abstract: The rapid growth in the number of experimental and predicted protein structures and more complicated protein structures challenge users in computational biology for utilizing the structural information and protein surface property representation. Recently, AlphaFold2 released the comprehensive proteome of various species, and protein surface property representation plays a crucial role in protein-molecule interaction prediction such as protein-protein interaction, protein-nucleic acid interaction, and protein-compound interaction. Here, we proposed the first comprehensive database, namely ProNet DB, which incorporates multiple protein surface representations and RNA-binding landscape for more than 326,175 protein structures covering 16 model organism proteomes from AlphaFold Protein Structure Database (AlphaFold DB) and experimentally validated protein structures deposited in Protein Data Bank (PDB). For each protein, we provided the original protein structure, surface property representation including hydrophobicity, charge distribution, hydrogen bond, interacting face, and RNA-binding landscape such as RNA binding sites and RNA binding preference. To interpret protein surface property representation and RNA binding landscape intuitively, we also integrate Mol* and Online 3D Viewer to visualize the representation on the protein surface. The pre-computed features are available for the users instantaneously and boost computational biology development including molecular mechanism exploration, geometry-based drug discovery and novel therapeutics development. The server is now available on https://proj.cse.cuhk.edu.hk/aihlab/pronet/.

Submitted 7 August, 2023; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: 12 pages, 6 figures

arXiv:2204.13858 [pdf, other]

One-Way Matching of Datasets with Low Rank Signals

Authors: Shuxiao Chen, Sizun Jiang, Zongming Ma, Garry P. Nolan, Bokai Zhu

Abstract: We study one-way matching of a pair of datasets with low rank signals. Under a stylized model, we first derive information-theoretic limits of matching under a mismatch proportion loss. We then show that linear assignment with projected data achieves fast rates of convergence and sometimes even minimax rate optimality for this task. The theoretical error bounds are corroborated by simulated examples. Furthermore, we illustrate practical use of the matching procedure on two single-cell data examples.

Submitted 3 October, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

arXiv:2204.12595 [pdf]

doi 10.1371/journal.pcbi.1010421

Correcting motion induced fluorescence artifacts in two-channel neural imaging

Authors: Matthew S. Creamer, Kevin S. Chen, Andrew M. Leifer, Jonathan W. Pillow

Abstract: Imaging neural activity in a behaving animal presents unique challenges in part because motion from an animal's movement creates artifacts in fluorescence intensity time-series that are difficult to distinguish from neural signals of interest. One approach to mitigating these artifacts is to image two channels; one that captures an activity-dependent fluorophore, such as GCaMP, and another that captures an activity-independent fluorophore such as RFP. Because the activity-independent channel contains the same motion artifacts as the activity-dependent channel, but no neural signals, the two together can be used to remove the artifacts. Existing approaches for this correction, such as taking the ratio of the two channels, do not account for channel independent noise in the measured fluorescence. Moreover, no systematic comparison has been made of existing approaches that use two-channel signals. Here, we present Two-channel Motion Artifact Correction (TMAC), a method which seeks to remove artifacts by specifying a generative model of the fluorescence of the two channels as a function of motion artifact, neural activity, and noise. We further present a novel method for evaluating ground-truth performance of motion correction algorithms by comparing the decodability of behavior from two types of neural recordings; a recording that had both an activity-dependent fluorophore (GCaMP and RFP) and a recording where both fluorophores were activity-independent (GFP and RFP). A successful motion-correction method should decode behavior from the first type of recording, but not the second. We use this metric to systematically compare five methods for removing motion artifacts from fluorescent time traces. We decode locomotion from a GCaMP expressing animal 15x more accurately on average than from control when using TMAC inferred activity and outperform all other methods of motion correction tested.

Submitted 26 April, 2022; originally announced April 2022.

Comments: 11 pages, 3 figures

arXiv:2204.11026 [pdf]

Bioinformatic analysis for structure and function of Glutamine synthetase(GS)

Authors: Jiahao Ma, Guotong Xu, Le Ao, Siqi Chen, Jingze Liu

Abstract: Objective: To predict structure and function of Glutamine synthetase (GS) from Pseudoalteromonas sp. by bioinformatics technology, and to provide a theoretical basis for further study. Methods: Open reading frame (ORF) of GS sequence from Pseudoalteromonas sp. was obtained by ORF finder and was translated into amino acid residue. The structure domain was analyzed by Blast. By the method of analysis tools: Protparam, ProtScale, SignalP-4.0, TMHMM, SOPMA, SWISS-MODEL, NCBI SMART-BLAST and MEGA 7.0, the structure and function of the protein were predicted and analyzed. Results: The results showed that the sequence was GS with 468 amino acid residues, theoretical molecular weight was 51986.64 Da. The protein has the closest evolutionary status with Shewanella oneidensis. Then it had no signal peptide site and transmembrane domain. Secondary structure of GS contained 35.04% alpha-helix, 16.67% Extended chain, 5.34% beta-turn, 42.95% RandomCoil. Conclusions: This GS was a variety of biological functions of protein that may be used as a molecular samples of microbial nitrogen metabolism in extreme environments.

Submitted 23 April, 2022; originally announced April 2022.

Comments: 8 pages, 8 figures

arXiv:2204.06939 [pdf]

doi 10.1103/PhysRevE.105.064412

Prediction and Control of Focal Seizure Spread: Random Walk with Restart on Heterogeneous Brain Networks

Authors: Chen Wang, Sida Chen, Liang Huang, Lianchun Yu

Abstract: Whole-brain models offer a promising method of predicting seizure spread, which is critical for successful surgery treatment of focal epilepsy. Existing methods are largely based on structural connectome, which ignores the effects of heterogeneity in regional excitability of brains. In this study, we used a whole-brain model to show that heterogeneity in nodal excitability had a significant impact on seizure propagation in the networks, and compromised the prediction accuracy with structural connections. We then addressed this problem with an algorithm based on random walk with restart on graphs. We demonstrated that by establishing a relationship between the restarting probability and the excitability for each node, this algorithm could significantly improve the seizure spread prediction accuracy in heterogeneous networks, and was more robust against the extent of heterogeneity. We also strategized surgical seizure control as a process to identify and remove the key nodes (connections) responsible for the early spread of seizures from the focal region. Compared to strategies based on structural connections, virtual surgery with a strategy based on mRWER generated outcomes with a high success rate while maintaining low damage to the brain by removing fewer anatomical connections. These findings may have potential applications in developing personalized surgery strategies for epilepsy.

Submitted 14 April, 2022; originally announced April 2022.

arXiv:2204.01607 [pdf, other]

Modern Views of Machine Learning for Precision Psychiatry

Authors: Zhe Sage Chen, Prathamesh Kulkarni, Isaac R. Galatzer-Levy, Benedetta Bigio, Carla Nasca, Yu Zhang

Abstract: In light of the NIMH's Research Domain Criteria (RDoC), the advent of functional neuroimaging, novel technologies and methods provide new opportunities to develop precise and personalized prognosis and diagnosis of mental disorders. Machine learning (ML) and artificial intelligence (AI) technologies are playing an increasingly critical role in the new era of precision psychiatry. Combining ML/AI with neuromodulation technologies can potentially provide explainable solutions in clinical practice and effective therapeutic treatment. Advanced wearable and mobile technologies also call for the new role of ML/AI for digital phenotyping in mobile mental health. In this review, we provide a comprehensive review of the ML methodologies and applications by combining neuroimaging, neuromodulation, and advanced mobile technologies in psychiatry practice. Additionally, we review the role of ML in molecular phenotyping and cross-species biomarker identification in precision psychiatry. We further discuss explainable AI (XAI) and causality testing in a closed-human-in-the-loop manner, and highlight the ML potential in multimedia information extraction and multimodal data fusion. Finally, we discuss conceptual and practical challenges in precision psychiatry and highlight ML opportunities in future research.

Submitted 11 July, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

arXiv:2107.12609 [pdf]

Tracking Fast Neural Adaptation by Globally Adaptive Point Process Estimation for Brain-Machine Interface

Authors: Shuhang Chen, Xiang Zhang, Xiang Shen, Yifan Huang, Yiwen Wang

Abstract: Brain-machine interfaces (BMIs) help the disabled restore body functions by translating neural activity into digital commands to control external devices. Neural adaptation, where the brain signals change in response to external stimuli or movements, plays an important role in BMIs. When subjects purely use neural activity to brain-control a prosthesis, some neurons will actively explore a new tuning property to accomplish the movement task. The prediction of this neural tuning property can help subjects adapt more efficiently to brain control and maintain good decoding performance. Existing prediction methods track the slow change of the tuning property in the manual control, which is not suitable for the fast neural adaptation in brain control. In order to identify the active neurons in brain control and track their tuning property changes, we propose a globally adaptive point process method (GaPP) to estimate the neural modulation state from spike trains, decompose the states into the hyper preferred direction and reconstruct the kinematics in a dual-model framework. We implement the method on real data from rats performing a two-lever discrimination task under manual control and brain control. The results show our method successfully predicts the neural modulation state and identifies the neurons that become active in brain control. Compared to existing methods, ours tracks the fast changes of the hyper preferred direction from manual control to brain control more accurately and efficiently and reconstructs the kinematics better and faster.

Submitted 27 July, 2021; originally announced July 2021.

arXiv:2107.12243 [pdf, other]

Protein-RNA interaction prediction with deep learning: Structure matters

Authors: Junkang Wei, Siyuan Chen, Licheng Zong, Xin Gao, Yu Li

Abstract: Protein-RNA interactions are of vital importance to a variety of cellular activities. Both experimental and computational techniques have been developed to study the interactions. Due to the limitation of the previous database, especially the lack of protein structure data, most of the existing computational methods rely heavily on the sequence data, with only a small portion of the methods utilizing the structural information. Recently, AlphaFold has revolutionized the entire protein and biology field. Foreseeably, the protein-RNA interaction prediction will also be promoted significantly in the upcoming years. In this work, we give a thorough review of this field, surveying both the binding site and binding preference prediction problems and covering the commonly used datasets, features, and models. We also point out the potential challenges and opportunities in this field. This survey summarizes the development of the RBP-RNA interaction field in the past and foresees its future development in the post-AlphaFold era.

Submitted 23 November, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

arXiv:2107.05478 [pdf, ps, other]

Structural characteristics in network control of molecular multiplex networks

Authors: Cheng Yuan, Zu-Yu Qian, Shi-Ming Chen, Sen Nie

Abstract: Numerous real-world systems can be naturally modeled as multilayer networks, enabling an efficient way to characterize those complex systems. Much evidence in the context of system biology indicated that the collections between different molecular networks can dramatically impact the global network functions. Here, we focus on the molecular multiplex networks coupled by the transcriptional regulatory network (TRN) and protein-protein interaction (PPI) network, exploring the controllability and energy requiring in these types of molecular multiplex networks. We find that the driver nodes tend to avoid essential or pathogen-related genes. Yet, imposing the external inputs to these essential or pathogen-related genes can remarkably reduce the energy cost, implying their crucial role in network control. Moreover, we find that lower minimal driver nodes as well as energy requiring are associated with disassortative coupling between TRN and PPI networks. Our findings in several species provide comprehensive understanding of genes' roles in biology and network control.

Submitted 12 July, 2021; originally announced July 2021.

arXiv:2106.08317 [pdf, other]

Active feature selection discovers minimal gene sets for classifying cell types and disease states with single-cell mRNA-seq data

Authors: Xiaoqiao Chen, Sisi Chen, Matt Thomson

Abstract: Sequencing costs currently prohibit the application of single-cell mRNA-seq to many biological and clinical analyses. Targeted single-cell mRNA-sequencing reduces sequencing costs by profiling reduced gene sets that capture biological information with a minimal number of genes. Here, we introduce an active learning method (ActiveSVM) that identifies minimal but highly-informative gene sets that enable the identification of cell-types, physiological states, and genetic perturbations in single-cell data using a small number of genes. Our active feature selection procedure generates minimal gene sets from single-cell data through an iterative cell-type classification task where misclassified cells are examined at each round of analysis to identify maximally informative genes through an 'active' support vector machine (ActiveSVM) classifier. By focusing computational resources on misclassified cells, ActiveSVM scales to analyze data sets with over a million single cells. We demonstrate that ActiveSVM feature selection identifies gene sets that enable ~90% cell-type classification accuracy across a variety of data sets including cell atlas and disease characterization data sets. The method generalizes to reveal genes that respond to genetic perturbations and to identify region specific gene expression patterns in spatial transcriptomics data.
The discovery of small but highly informative gene sets should enable substantial reductions in the number of measurements necessary for application of single-cell mRNA-seq to clinical tests, therapeutic discovery, and genetic screens.

Submitted 12 February, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

Comments: 37 pages, 7 figures

arXiv:2104.04188 [pdf]

doi 10.1021/acsami.1c00495

Ammonia-induced Calcium Phosphate Nanostructure: A Potential Assay for Studying Osteoporosis and Bone Metastasis

Authors: Sijia Chen, Qiong Wang, Felipe Eltit, Yubin Guo, Michael Cox, Rizhi Wang

Abstract: Osteoclastic resorption of bone plays a central role in both osteoporosis and bone metastasis. A reliable in vitro assay that simulates osteoclastic resorption in vivo would significantly speed up the process of developing effective therapeutic solutions for those diseases. Here we report the development of a novel and robust nano-structured calcium phosphate coating with unique functions on the track-etched porous membrane by using an ammonia-induced mineralization (AiM) technique. The calcium phosphate coating uniformly covers one side of the PET membrane, enabling testing for osteoclastic resorption. The track-etched pores in the PET membrane allow calcium phosphate mineral pins to grow inside, which, on one hand, enhances coating integration with the membrane substrate, and on the other hand provides diffusion channels for delivering drugs from the lower chamber of a double-chamber cell culture system. The application of the processed calcium phosphate coating was first demonstrated as a drug screening device by using alendronate, a widely used drug for osteoporosis. It was confirmed that the delivery of alendronate significantly decreased both the number of monocyte-differentiated osteoclasts and coating resorption. To demonstrate the application in studying bone metastasis, we delivered PC3 prostate cancer conditioned medium and confirmed that both the differentiation of monocytes into osteoclasts and the osteoclastic resorption of the calcium phosphate coating were significantly enhanced.
This novel assay thus provides a new platform for studying osteoclastic activities and assessing drug efficacy in vitro.

Submitted 9 April, 2021; originally announced April 2021.

Journal ref: ACS Applied Materials & Interfaces Manuscript ID: am-2021-004953.R2

arXiv:2104.01175 [pdf, other]

doi 10.1039/D0LC01078B

Direct laser writing for cardiac tissue engineering: a microfluidic heart on a chip with integrated transducers

Authors: Rachael K. Jayne, M. Çağatay Karakan, Kehan Zhang, Noelle Pierce, Christos Michas, David J. Bishop, Christopher S. Chen, Kamil L. Ekinci, Alice E. White

Abstract: We have designed and fabricated a microfluidic-based platform for sensing mechanical forces generated by cardiac microtissues in a highly-controlled microenvironment. Our fabrication approach combines Direct Laser Writing (DLW) lithography with soft lithography. At the center of our platform is a cylindrical volume, divided into two chambers by a cylindrical polydimethylsiloxane (PDMS) shell. Cells are seeded into the inner chamber from a top opening, and the microtissue assembles onto tailor-made attachment sites on the inner walls of the cylindrical shell. The outer chamber is electrically and fluidically isolated from the inner one by the cylindrical shell and is designed for actuation and sensing purposes. Externally applied pressure waves to the outer chamber deform parts of the cylindrical shell and thus allow us to exert time-dependent forces on the microtissue. Oscillatory forces generated by the microtissue similarly deform the cylindrical shell and change the volume of the outer chamber, resulting in measurable electrical conductance changes. We have used this platform to study the response of cardiac microtissues derived from human induced pluripotent stem cells (hiPSC) under prescribed mechanical loading and pacing.

Submitted 2 April, 2021; originally announced April 2021.

Comments: Main article 15 pages, 6 figures, 1 table; supplementary 11 pages, 7 figures, 1 table, 6 movies

Journal ref: Lab on a Chip, 2021, 21, 1724 - 1737

Showing 1–50 of 91 results for author: Chen, S