Search | arXiv e-print repository

Exploring the Enigma of Neural Dynamics Through A Scattering-Transform Mixer Landscape for Riemannian Manifold

Authors: Tingting Dan, Ziquan Wei, Won Hwa Kim, Guorong Wu

Abstract: The human brain is a complex inter-wired system from which spontaneous functional fluctuations emerge. In spite of tremendous success in the experimental neuroscience field, a system-level understanding of how brain anatomy supports various neural activities remains elusive. Capitalizing on the unprecedented amount of neuroimaging data, we present a physics-informed deep model to uncover the coupling mechanism between brain structure and function through the lens of data geometry that is rooted in the widespread wiring topology of connections between distant brain regions. Since deciphering the puzzle of self-organized patterns in functional fluctuations is the gateway to understanding the emergence of cognition and behavior, we devise a geometric deep model to uncover manifold mapping functions that characterize the intrinsic feature representations of evolving functional fluctuations on the Riemannian manifold. In lieu of learning unconstrained mapping functions, we introduce a set of graph-harmonic scattering transforms to impose the brain-wide geometry on top of manifold mapping functions, which allows us to cast the manifold-based deep learning into an architecture reminiscent of MLP-Mixer (in computer vision) for the Riemannian manifold. As a proof-of-concept approach, we explore a neural-manifold perspective to understand the relationship between (static) brain structure and (dynamic) function, challenging the prevailing notion in cognitive neuroscience by proposing that neural activities are essentially excited by brain-wide oscillation waves living on the geometry of human connectomes, instead of being confined to focal areas.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 15 pages, 6 figures

MSC Class: 51H30; ACM Class: I.3.5

arXiv:2404.15805 [pdf, other]

Beyond ESM2: Graph-Enhanced Protein Sequence Modeling with Efficient Clustering

Authors: Shujian Jiao, Bingxuan Li, Lei Wang, Xiaojin Zhang, Wei Chen, Jiajie Peng, Zhongyu Wei

Abstract: Proteins are essential to life's processes, underpinning evolution and diversity. Advances in sequencing technology have revealed millions of proteins, underscoring the need for sophisticated pre-trained protein models for biological analysis and AI development. Facebook's ESM2, the most advanced protein language model to date, leverages a masked prediction task for unsupervised learning, crafting amino acid representations with notable biochemical accuracy. Yet it falls short in delivering functional protein insights, signaling an opportunity for enhancing representation quality. Our study addresses this gap by incorporating protein family classification into ESM2's training. This approach, augmented with a Community Propagation-Based Clustering Algorithm, improves global protein representations, while a contextual prediction task fine-tunes local amino acid accuracy. Significantly, our model achieved state-of-the-art results in several downstream experiments, demonstrating the power of combining global and local methodologies to substantially boost protein representation quality.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2312.08906 [pdf, other]

Using eye tracking to investigate what native Chinese speakers notice about linguistic landscape images

Authors: Zichao Wei, Yewei Qin

Abstract: Linguistic landscape is an important field in sociolinguistic research. Eye tracking is a common technology in psychological research. There are few cases of using eye movement to study linguistic landscape. This paper uses eye tracking technology to study the actual fixation on the linguistic landscape and finds that, in the two dimensions of fixation time and fixation count, the fixation of native Chinese speakers on the linguistic landscape is higher than on the general landscape. This paper argues that this phenomenon is due to the higher information density of linguistic landscapes. At the same time, the article also discusses other possible reasons for this phenomenon.

Submitted 2 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

ACM Class: J.4

arXiv:2305.01580 [pdf]

doi 10.5121/csit.2023.130303

Molecular design method based on novel molecular representation and variational auto-encoder

Authors: Li Kai, Li Ning, Zhang Wei, Gao Ming

Abstract: Based on the traditional VAE, a novel neural network model is presented, with the latest molecular representation, SELFIES, to improve the effect of generating new molecules. In this model, a multi-layer convolutional network and Fisher information are added to the original encoding layer to learn the data characteristics and guide the encoding process, which makes the features of the hidden layer more aggregated, and a Long Short-Term Memory (LSTM) neural network is integrated into the decoding layer for better data generation, which effectively solves the degradation phenomenon generated by the encoding layer and decoding layer of the original VAE model. Through experiments on ZINC molecular data sets, it is found that the similarity in the new VAE is 8.47% higher than that of the original one. SELFIES are better at generating a variety of molecules than the traditional molecular representation, SMILES. Experiments have shown that using SELFIES and the new VAE model presented in this paper can improve the effectiveness of generating new molecules.

Submitted 20 February, 2023; originally announced May 2023.

Comments: 13 pages, 7 figures, conference: NIAI

Journal ref: 4th International Conference on Natural Language Processing, Information Retrieval and AI (NIAI 2023), Volume 13, Number 03, February 2023, pp. 23-35, 2023. CS & IT - CSCP 2023

arXiv:2302.12177 [pdf, other]

EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction

Authors: Yang Zhang, Zhewei Wei, Ye Yuan, Chongxuan Li, Wenbing Huang

Abstract: Predicting the binding sites of target proteins plays a fundamental role in drug discovery. Most existing deep-learning methods consider a protein as a 3D image by spatially clustering its atoms into voxels and then feed the voxelized protein into a 3D CNN for prediction. However, the CNN-based methods encounter several critical issues: 1) defective in representing irregular protein structures; 2) sensitive to rotations; 3) insufficient to characterize the protein surface; 4) unaware of protein size shift. To address the above issues, this work proposes EquiPocket, an E(3)-equivariant Graph Neural Network (GNN) for binding site prediction, which comprises three modules: the first one to extract local geometric information for each surface atom, the second one to model both the chemical and spatial structure of protein and the last one to capture the geometry of the surface via equivariant message passing over the surface atoms. We further propose a dense attention output layer to alleviate the effect incurred by variable protein size. Extensive experiments on several representative benchmarks demonstrate the superiority of our framework to the state-of-the-art methods.

Submitted 22 July, 2024; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted to ICML 2024 (Oral)

arXiv:2302.07061 [pdf, other]

Do Deep Learning Methods Really Perform Better in Molecular Conformation Generation?

Authors: Gengmo Zhou, Zhifeng Gao, Zhewei Wei, Hang Zheng, Guolin Ke

Abstract: Molecular conformation generation (MCG) is a fundamental and important problem in drug discovery. Many traditional methods have been developed to solve the MCG problem, such as systematic searching, model-building, random searching, distance geometry, molecular dynamics, Monte Carlo methods, etc. However, they have some limitations depending on the molecular structures. Recently, there have been plenty of deep learning based MCG methods, which claim they largely outperform the traditional methods. However, to our surprise, we design a simple and cheap algorithm (parameter-free) based on the traditional methods and find it is comparable to or even outperforms deep learning based MCG methods on the widely used GEOM-QM9 and GEOM-Drugs benchmarks. In particular, our designed algorithm is simply the clustering of the RDKit-generated conformations. We hope our findings can help the community to revise the deep learning methods for MCG. The code of the proposed algorithm can be found at https://gist.github.com/ZhouGengmo/5b565f51adafcd911c0bc115b2ef027c.

Submitted 27 March, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

arXiv:2209.13014 [pdf, other]

Predicting Protein-Ligand Binding Affinity via Joint Global-Local Interaction Modeling

Authors: Yang Zhang, Gengmo Zhou, Zhewei Wei, Hongteng Xu

Abstract: The prediction of protein-ligand binding affinity is of great significance for discovering lead compounds in drug research. Facing this challenging task, most existing prediction methods rely on the topological and/or spatial structure of molecules and the local interactions while ignoring the multi-level inter-molecular interactions between proteins and ligands, which often leads to sub-optimal performance. To solve this issue, we propose a novel global-local interaction (GLI) framework to predict protein-ligand binding affinity. In particular, our GLI framework considers the inter-molecular interactions between proteins and ligands, which involve not only the high-energy short-range interactions between closed atoms but also the low-energy long-range interactions between non-bonded atoms. For each pair of protein and ligand, our GLI embeds the long-range interactions globally and aggregates local short-range interactions, respectively. Such a joint global-local interaction modeling strategy helps to improve prediction accuracy, and the whole framework is compatible with various neural network-based modules. Experiments demonstrate that our GLI framework outperforms state-of-the-art methods with simple neural network architectures and moderate computational costs.

Submitted 18 September, 2022; originally announced September 2022.

arXiv:2103.04376 [pdf, other]

Analyzing the Spatiotemporal Interaction and Propagation of ATN Biomarkers in Alzheimer's Disease using Longitudinal Neuroimaging Data

Authors: Qing Liu, Defu Yang, Jingwen Zhang, Ziming Wei, Guorong Wu, Minghan Chen

Abstract: Three major biomarkers, beta-amyloid (A), pathologic tau (T), and neurodegeneration (N), are recognized as valid proxies for neuropathologic changes of Alzheimer's disease. While there are extensive studies on cerebrospinal fluid biomarkers (amyloid, tau), the spatial propagation pattern across the brain is missing and their interactive mechanisms with neurodegeneration are still unclear. To this end, we aim to analyze the spatiotemporal associations between ATN biomarkers using large-scale neuroimaging data. We first investigate the temporal appearances of amyloid plaques, tau tangles, and neuronal loss by modeling the longitudinal transition trajectories. Second, we propose linear mixed-effects models to quantify the pathological interactions and propagation of ATN biomarkers at each brain region. Our analysis of the current data shows that there exists a temporal latency in the build-up of amyloid to the onset of tau pathology and neurodegeneration. The propagation pattern of amyloid can be characterized by its diffusion along the topological brain network. Our models provide sufficient evidence that the progression of pathological tau and neurodegeneration share a strong regional association, which is different from amyloid.

Submitted 7 March, 2021; originally announced March 2021.

Comments: 4 pages, 2 figures, to be published in IEEE ISBI 2021

arXiv:2001.00520 [pdf]

3D Deep Learning Enables Fast Imaging of Spines through Scattering Media by Temporal Focusing Microscopy

Authors: Zhun Wei, Josiah R. Boivin, Yi Xue, Xudong Chen, Peter T. C. So, Elly Nedivi, Dushan N. Wadduwage

Abstract: Today the gold standard for in vivo imaging through scattering tissue is the point-scanning two-photon microscope (PSTPM). Especially in neuroscience, PSTPM is widely used for deep-tissue imaging in the brain. However, due to sequential scanning, PSTPM is slow. Temporal focusing microscopy (TFM), on the other hand, focuses femtosecond pulsed laser light temporally, while keeping wide-field illumination, and is consequently much faster. However, due to the use of a camera detector, TFM suffers from the scattering of emission photons. As a result, TFM produces images of poor spatial resolution and signal-to-noise ratio (SNR), burying fluorescent signals from small structures such as dendritic spines. In this work, we present a data-driven deep learning approach to improve resolution and SNR of TFM images. Using a 3D convolutional neural network (CNN) we build a map from TFM to PSTPM modalities, to enable fast TFM imaging while maintaining high-resolution through scattering media. We demonstrate this approach for in vivo imaging of dendritic spines on pyramidal neurons in the mouse visual cortex. We show that our trained network rapidly outputs high-resolution images that recover biologically relevant features previously buried in the scattered fluorescence in the TFM images. In vivo imaging that combines TFM and the proposed 3D convolution neural network is one to two orders of magnitude faster than PSTPM but retains the high resolution and SNR necessary to analyze small fluorescent structures. The proposed 3D convolution deep network could also be potentially beneficial for improving the performance of many speed-demanding deep-tissue imaging applications such as in vivo voltage imaging.

Submitted 24 December, 2019; originally announced January 2020.

arXiv:1901.11418 [pdf, other]

Sequential Bayesian Detection of Spike Activities from Fluorescence Observations

Authors: Zhuangkun Wei, Bin Li, Weisi Guo, Wenxiu Hu, Chenglin Zhao

Abstract: Extracting and detecting spike activities from the fluorescence observations is an important step in understanding how neuron systems work. The main challenge lies in that the combination of the ambient noise with dynamic baseline fluctuation often contaminates the observations, thereby deteriorating the reliability of spike detection. This may be even worse in the face of the nonlinear biological process, the coupling interactions between spikes and baseline, and the unknown critical parameters of an underlying physiological model, in which erroneous estimations of parameters will affect the detection of spikes causing further error propagation. In this paper, we propose a random finite set (RFS) based Bayesian approach. The dynamic behaviors of spike sequence, fluctuated baseline and unknown parameters are formulated as one RFS. This RFS state is capable of distinguishing the hidden active/silent states induced by spike and non-spike activities respectively, thereby negating the interaction role played by spikes and other factors. Then, premised on the RFS states, a Bayesian inference scheme is designed to simultaneously estimate the model parameters, baseline, and crucial spike activities. Our results demonstrate that the proposed scheme can gain an extra 12% detection accuracy in comparison with the state-of-the-art MLSpike method.

Submitted 31 January, 2019; originally announced January 2019.

arXiv:1811.02459 [pdf, other]

Nonlinear Evolution via Spatially-Dependent Linear Dynamics for Electrophysiology and Calcium Data

Authors: Daniel Hernandez, Antonio Khalil Moretti, Ziqiang Wei, Shreya Saxena, John Cunningham, Liam Paninski

Abstract: Latent variable models have been widely applied for the analysis of time series resulting from experimental neuroscience techniques. In these datasets, observations are relatively smooth and possibly nonlinear. We present Variational Inference for Nonlinear Dynamics (VIND), a variational inference framework that is able to uncover nonlinear, smooth latent dynamics from sequential data. The framework is a direct extension of PfLDS, including a structured approximate posterior describing spatially-dependent linear dynamics, as well as an algorithm that relies on the fixed-point iteration method to achieve convergence. We apply VIND to electrophysiology, single-cell voltage and widefield imaging datasets with state-of-the-art results in reconstruction error. In single-cell voltage data, VIND finds a 5D latent space, with variables akin to those of Hodgkin-Huxley-like models. VIND's learned dynamics are further quantified by predicting future neural activity. VIND excels in this task, in some cases substantially outperforming current methods.

Submitted 16 June, 2020; v1 submitted 6 November, 2018; originally announced November 2018.

Comments: 8 figs, Accepted at NBDT

arXiv:1612.00525 [pdf, other]

A Noise-Filtering Approach for Cancer Drug Sensitivity Prediction

Authors: Turki Turki, Zhi Wei

Abstract: Accurately predicting drug responses to cancer is an important problem hindering oncologists' efforts to find the most effective drugs to treat cancer, which is a core goal in precision medicine. The scientific community has focused on improving this prediction based on genomic, epigenomic, and proteomic datasets measured in human cancer cell lines. Real-world cancer cell lines contain noise, which degrades the performance of machine learning algorithms. This problem is rarely addressed in the existing approaches. In this paper, we present a noise-filtering approach that integrates techniques from numerical linear algebra and information retrieval targeted at filtering out noisy cancer cell lines. By filtering out noisy cancer cell lines, we can train machine learning algorithms on better quality cancer cell lines. We evaluate the performance of our approach and compare it with an existing approach using the Area Under the ROC Curve (AUC) on clinical trial data. The experimental results show that our proposed approach is stable and also yields the highest AUC at a statistically significant level.

Submitted 5 December, 2016; v1 submitted 1 December, 2016; originally announced December 2016.

Comments: Accepted at NIPS 2016 Workshop on Machine Learning for Health

Showing 1–12 of 12 results for author: Wei, Z