Search | arXiv e-print repository

Neurodevelopmental disorders modeling using isogeometric analysis, dynamic domain expansion and local refinement

Authors: Kuanren Qian, Genesis Omana Suarez, Toshihiko Nambara, Takahisa Kanekiyo, Ashlee S. Liao, Victoria A. Webster-Wood, Yongjie Jessica Zhang

Abstract: Neurodevelopmental disorders (NDDs) have arisen as one of the most prevailing chronic diseases within the US. Often associated with severe adverse impacts on the formation of vital central and peripheral nervous systems during the neurodevelopmental process, NDDs are comprised of a broad spectrum of disorders, such as autism spectrum disorder, attention deficit hyperactivity disorder, and epilepsy, characterized by progressive and pervasive detriments to cognitive, speech, memory, motor, and other neurological functions in patients. However, the heterogeneous nature of NDDs poses a significant roadblock to identifying the exact pathogenesis, impeding accurate diagnosis and the development of targeted treatment planning. A computational NDDs model holds immense potential in enhancing our understanding of the multifaceted factors involved and could assist in identifying the root causes to expedite treatment development. To tackle this challenge, we introduce optimal neurotrophin concentration to the driving force and degradation of neurotrophin to the synaptogenesis process of a 2D phase field neuron growth model using isogeometric analysis to simulate neurite retraction and atrophy. The optimal neurotrophin concentration effectively captures the inverse relationship between neurotrophin levels and neurite survival, while its degradation regulates concentration levels. Leveraging dynamic domain expansion, the model efficiently expands the domain based on outgrowth patterns to minimize degrees of freedom. Based on truncated T-splines, our model simulates the evolving process of complex neurite structures by applying local refinement adaptively to the cell/neurite boundary. Furthermore, a thorough parameter investigation is conducted with detailed comparisons against neuron cell cultures in experiments, enhancing our fundamental understanding of the mechanisms underlying NDDs.

Submitted 3 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

Comments: 23 pages, 10 figures, 1 table

arXiv:2406.09989 [pdf, other]

Suppressing seizure via optimal electrical stimulation to the hub of epileptic brain network

Authors: Zhichao Liang, Guanyi Zhao, Yinuo Zhang, Weiting Sun, Jingzhe Lin, Jialin Wang, Quanying Liu

Abstract: Electrical stimulation of the seizure onset zone (SOZ) serves as an efficient approach to seizure suppression. Recently, the network propagation mechanisms of seizure dynamics have gained widespread attention. Compared with direct stimulation of the SOZ, other brain network-level approaches that can effectively suppress epileptic seizures remain under-explored. In this study, we introduce a platform equipped with a system identification module and a control strategy module to validate the effectiveness of the hub of the epileptic brain network in suppressing seizures. The identified surrogate dynamics show high predictive performance in reconstructing neural dynamics, which enables the model predictive framework to achieve accurate neural stimulation. Electrical stimulation of the hub of the epileptic brain network shows performance as remarkable as direct stimulation of the SOZ in suppressing seizure dynamics. Underpinned by network control theory, our platform offers a general tool for the validation of neural stimulation.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.08980 [pdf, other]

From Theory to Therapy: Reframing SBDD Model Evaluation via Practical Metrics

Authors: Bowen Gao, Haichuan Tan, Yanwen Huang, Minsi Ren, Xiao Huang, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

Abstract: Recent advancements in structure-based drug design (SBDD) have significantly enhanced the efficiency and precision of drug discovery by generating molecules tailored to bind specific protein pockets. Despite these technological strides, their practical application in real-world drug development remains challenging due to the complexities of synthesizing and testing these molecules. The reliability of the Vina docking score, the current standard for assessing binding abilities, is increasingly questioned due to its susceptibility to overfitting. To address these limitations, we propose a comprehensive evaluation framework that includes assessing the similarity of generated molecules to known active compounds, introducing a virtual screening-based metric for practical deployment capabilities, and re-evaluating binding affinity more rigorously. Our experiments reveal that while current SBDD models achieve high Vina scores, they fall short in practical usability metrics, highlighting a significant gap between theoretical predictions and real-world applicability. Our proposed metrics and dataset aim to bridge this gap, enhancing the practical applicability of future SBDD models and aligning them more closely with the needs of pharmaceutical research and development.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08961 [pdf, other]

SIU: A Million-Scale Structural Small Molecule-Protein Interaction Dataset for Unbiased Bioactivity Prediction

Authors: Yanwen Huang, Bowen Gao, Yinjun Jia, Hongbo Ma, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

Abstract: Small molecules play a pivotal role in modern medicine, and scrutinizing their interactions with protein targets is essential for the discovery and development of novel, life-saving therapeutics. The term "bioactivity" encompasses various biological effects resulting from these interactions, including both binding and functional responses. The magnitude of bioactivity dictates the therapeutic or toxic pharmacological outcomes of small molecules, rendering accurate bioactivity prediction crucial for the development of safe and effective drugs. However, existing structural datasets of small molecule-protein interactions are often limited in scale and lack systematically organized bioactivity labels, thereby impeding our understanding of these interactions and precise bioactivity prediction. In this study, we introduce a comprehensive dataset of small molecule-protein interactions, consisting of over a million binding structures, each annotated with real biological activity labels. This dataset is designed to facilitate unbiased bioactivity prediction. We evaluated several classical models on this dataset, and the results demonstrate that the task of unbiased bioactivity prediction is challenging yet essential.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.03406 [pdf]

LncRNA-disease association prediction method based on heterogeneous information completion and convolutional neural network

Authors: Wen-Yu Xi, Juan Wang, Yu-Lin Zhang, Jin-Xing Liu, Yin-Lian Gao

Abstract: Emerging research shows that lncRNA has crucial research value in a series of complex human diseases. Therefore, the accurate identification of lncRNA-disease associations (LDAs) is very important for the early warning and treatment of diseases. However, most existing methods have limitations in identifying nonlinear LDAs, and it remains a huge challenge to predict new LDAs. In this paper, a deep learning model based on a heterogeneous network and a convolutional neural network (CNN), named HCNNLDA, is proposed for lncRNA-disease association prediction. First, the heterogeneous network containing the lncRNA, disease, and miRNA nodes is constructed. The embedding matrix of a lncRNA-disease node pair is constructed according to various biological premises about lncRNAs, diseases, and miRNAs. Then, the low-dimensional feature representation is fully learned by the convolutional neural network. Finally, the XGBoost classifier model is trained to predict the potential LDAs. HCNNLDA obtains a high AUC value of 0.9752 and an AUPR of 0.9740 under 5-fold cross-validation. The experimental results show that the proposed model performs better than several recent prediction models. Meanwhile, the effectiveness of HCNNLDA in identifying novel LDAs is further demonstrated by case studies of three diseases. To sum up, HCNNLDA is a feasible computational model for predicting LDAs.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.16123 [pdf, other]

Retro-prob: Retrosynthetic Planning Based on a Probabilistic Model

Authors: Chengyang Tian, Yangpeng Zhang, Yang Liu

Abstract: Retrosynthesis is a fundamental but challenging task in organic chemistry, with broad applications in fields such as drug design and synthesis. Given a target molecule, the goal of retrosynthesis is to find out a series of reactions which could be assembled into a synthetic route which starts from purchasable molecules and ends at the target molecule. The uncertainty of reactions used in retrosynthetic planning, which is caused by hallucinations of backward models, has recently been noticed. In this paper we propose a succinct probabilistic model to describe such uncertainty. Based on the model, we propose a new retrosynthesis planning algorithm called retro-prob to maximize the successful synthesis probability of target molecules, which acquires high efficiency by utilizing the chain rule of derivatives. Experiments on the Paroutes benchmark show that retro-prob outperforms previous algorithms, retro* and retro-fallback, both in speed and in the quality of synthesis plans.

Submitted 25 May, 2024; originally announced May 2024.
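
Illustrative note: the abstract above refers to the "successful synthesis probability" of a target molecule without stating the model. The sketch below is not the retro-prob algorithm. It is a minimal illustration under assumptions that do not come from the abstract: the search graph is acyclic, each candidate reaction succeeds independently with a known feasibility probability, and purchasable molecules always succeed. The function name, data layout, and molecule labels are hypothetical.

```python
from math import prod


def success_probability(molecule, reactions, purchasable):
    """Probability that at least one route to `molecule` succeeds.

    Assumptions (illustrative only): the reaction graph is acyclic and
    all reactions succeed or fail independently of one another.
    `reactions` maps a molecule to a list of (feasibility, reactants).
    """
    if molecule in purchasable:
        return 1.0
    failure = 1.0  # probability that every candidate reaction fails
    for feasibility, reactants in reactions.get(molecule, []):
        # A reaction works if it is feasible and all reactants can be made.
        route_ok = feasibility * prod(
            success_probability(r, reactions, purchasable) for r in reactants
        )
        failure *= 1.0 - route_ok
    return 1.0 - failure


# Hypothetical toy graph: target T has two candidate reactions.
reactions = {
    "T": [(0.8, ["A", "B"]), (0.5, ["C"])],
    "C": [(0.5, ["A"])],
}
purchasable = {"A", "B"}
p_target = success_probability("T", reactions, purchasable)
```

In this toy graph the first reaction succeeds with probability 0.8 and the second with 0.5 × 0.5 = 0.25, so the target succeeds with probability 1 − 0.2 × 0.75 = 0.85.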

arXiv:2405.06649 [pdf, other]

ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction

Authors: Mingyu Jin, Haochen Xue, Zhenting Wang, Boming Kang, Ruosong Ye, Kaixiong Zhou, Mengnan Du, Yongfeng Zhang

Abstract: The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases. Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions, ignoring the broader context of nonphysical connections through intermediate proteins, thus limiting their effectiveness. The emergence of Large Language Models (LLMs) provides a new opportunity for addressing this complex biological challenge. By transforming structured data into natural language prompts, we can map the relationships between proteins into texts. This approach allows LLMs to identify indirect connections between proteins, tracing the path from upstream to downstream. Therefore, we propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time. Specifically, we propose Protein Chain of Thought (ProCoT), which replicates the biological mechanism of signaling pathways as natural language prompts. ProCoT considers a signaling pathway as a protein reasoning process, which starts from upstream proteins and passes through several intermediate proteins to transmit biological signals to downstream proteins. Thus, we can use ProCoT to predict the interaction between upstream proteins and downstream proteins. The training of ProLLM employs the ProCoT format, which enhances the model's understanding of complex biological problems. In addition to ProCoT, this paper also contributes to the exploration of embedding replacement of protein sites in natural language prompts, and instruction fine-tuning in protein knowledge datasets. We demonstrate the efficacy of ProLLM through rigorous validation against benchmark datasets, showing significant improvement over existing methods in terms of prediction accuracy and generalizability. The code is available at: https://github.com/MingyuJ666/ProLLM.

Submitted 30 March, 2024; originally announced May 2024.

arXiv:2404.18162 [pdf, other]

fMRI Exploration of Visual Quality Assessment

Authors: Yiming Zhang, Ying Hu, Xiongkuo Min, Yan Zhou, Guangtao Zhai

Abstract: Despite significant strides in visual quality assessment, the neural mechanisms underlying visual quality perception remain insufficiently explored. This study employed fMRI to examine brain activity during image quality assessment and identify differences in human processing of images with varying quality. Fourteen healthy participants underwent tasks assessing both image quality and content classification while undergoing functional MRI scans. The collected behavioral data was statistically analyzed, and univariate and functional connectivity analyses were conducted on the imaging data. The findings revealed that quality assessment is a more complex task than content classification, involving enhanced activation in high-level cognitive brain regions for fine-grained visual analysis. Moreover, the research showed the brain's adaptability to different visual inputs, adopting different strategies depending on the input's quality. In response to high-quality images, the brain primarily uses specialized visual areas for precise analysis, whereas with low-quality images, it recruits additional resources including higher-order visual cortices and related cognitive and attentional networks to decode and recognize complex, ambiguous signals effectively. This study pioneers the intersection of neuroscience and image quality research, providing empirical evidence through fMRI linking image quality to neural processing. It contributes novel insights into the human visual system's response to diverse image qualities, thereby paving the way for advancements in objective image quality assessment algorithms.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.16880 [pdf, other]

Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation

Authors: Yikun Zhang, Geyan Ye, Chaohao Yuan, Bo Han, Long-Kai Huang, Jianhua Yao, Wei Liu, Yu Rong

Abstract: Molecule-and-text cross-modal representation learning has emerged as a promising direction for enhancing the quality of molecular representation, thereby improving performance in various scientific fields, including drug discovery and materials science. Existing studies adopt a global alignment approach to learn the knowledge from different modalities. These global alignment approaches fail to capture fine-grained information, such as molecular fragments and their corresponding textual descriptions, which is crucial for downstream tasks. Furthermore, it is not feasible to model such information using a similar global alignment strategy, because existing datasets contain few paired annotations of local parts. In this paper, we propose Atomas, a multi-modal molecular representation learning framework to jointly learn representations from SMILES strings and text. We design a Hierarchical Adaptive Alignment model to concurrently learn the fine-grained fragment correspondence between two modalities and align these representations of fragments in three levels. Additionally, Atomas's end-to-end training framework incorporates the tasks of understanding and generating molecules, thereby supporting a wider range of downstream tasks. In the retrieval task, Atomas exhibits robust generalization ability and outperforms the baseline by 30.8% of recall@1 on average. In the generation task, Atomas achieves state-of-the-art results in both the molecule captioning task and the molecule generation task. Moreover, the visualization of the Hierarchical Adaptive Alignment model further confirms the chemical significance of our approach. Our codes can be found at https://anonymous.4open.science/r/Atomas-03C3.

Submitted 23 April, 2024; originally announced April 2024.
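
Illustrative note: the abstract above reports retrieval quality as recall@1. For readers unfamiliar with the metric, here is a minimal sketch of how recall@k is commonly computed from a query-by-candidate similarity matrix. It assumes that the correct candidate for query i sits at index i. The scores below are made-up toy values, not results from the paper.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose true match (same index) is in the top k."""
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted by descending similarity score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy scores: rows are queries (e.g. text), columns are candidates
# (e.g. molecules). Query 1 ranks the wrong candidate first.
scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]
r1 = recall_at_k(scores, k=1)
r2 = recall_at_k(scores, k=2)
```

Here two of the three queries retrieve their match at rank 1, and all three do so within the top 2.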

arXiv:2404.16866 [pdf, other]

Functional Protein Design with Local Domain Alignment

Authors: Chaohao Yuan, Songyou Li, Geyan Ye, Yikun Zhang, Long-Kai Huang, Wenbing Huang, Wei Liu, Jianhua Yao, Yu Rong

Abstract: The core challenge of de novo protein design lies in creating proteins with specific functions or properties, guided by certain conditions. Current models explore generating proteins using structural and evolutionary guidance, which only provide indirect conditions concerning functions and properties. However, textual annotations of proteins, especially the annotations for protein domains, which directly describe the protein's high-level functionalities, properties, and their correlation with target amino acid sequences, remain unexplored in the context of protein design tasks. In this paper, we propose Protein-Annotation Alignment Generation (PAAG), a multi-modality protein design framework that integrates the textual annotations extracted from protein databases for controllable generation in sequence space. Specifically, within a multi-level alignment module, PAAG can explicitly generate proteins containing specific domains conditioned on the corresponding domain annotations, and can even design novel proteins with flexible combinations of different kinds of annotations. Our experimental results underscore the superiority of the aligned protein representations from PAAG over 7 prediction tasks. Furthermore, PAAG demonstrates a nearly sixfold increase in generation success rate (24.7% vs 4.7% in zinc finger, and 54.3% vs 8.7% in the immunoglobulin domain) in comparison to the existing model.

Submitted 27 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.16357 [pdf, other]

Reverse engineering the brain input: Network control theory to identify cognitive task-related control nodes

Authors: Zhichao Liang, Yinuo Zhang, Jushen Wu, Quanying Liu

Abstract: The human brain receives complex inputs when performing cognitive tasks, which range from external inputs via the senses to internal inputs from other brain regions. However, the explicit inputs to the brain during a cognitive task remain unclear. Here, we present an input identification framework for reverse engineering the control nodes and the corresponding inputs to the brain. The framework is verified with synthetic data generated by a predefined linear system, indicating it can robustly reconstruct data and recover the inputs. Then we apply the framework to the real motor-task fMRI data from 200 human subjects. Our results show that the model with sparse inputs can reconstruct neural dynamics in motor tasks ($EV=0.779$) and the identified 28 control nodes largely overlap with the motor system. Underpinned by network control theory, our framework offers a general tool for understanding brain inputs.

Submitted 25 April, 2024; originally announced April 2024.
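
Illustrative note: the abstract above verifies its framework on data from a predefined linear system and reports a reconstruction score of EV = 0.779. The sketch below is not the authors' framework. It assumes that EV denotes explained variance, uses a hypothetical scalar system x[t+1] = a·x[t] + b·u[t] with a sparse input, and compares a reconstruction that uses the input against one that ignores it. All names and parameter values are illustrative.

```python
def simulate(a, b, inputs, x0=0.0):
    """Roll out the scalar linear system x[t+1] = a * x[t] + b * u[t]."""
    states = [x0]
    for u in inputs:
        states.append(a * states[-1] + b * u)
    return states


def explained_variance(observed, reconstructed):
    """EV = 1 - Var(observed - reconstructed) / Var(observed)."""
    n = len(observed)
    residuals = [o - r for o, r in zip(observed, reconstructed)]
    mean_obs = sum(observed) / n
    mean_res = sum(residuals) / n
    var_obs = sum((o - mean_obs) ** 2 for o in observed) / n
    var_res = sum((e - mean_res) ** 2 for e in residuals) / n
    return 1.0 - var_res / var_obs


# Hypothetical sparse input: two pulses, otherwise zero.
sparse_input = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
observed = simulate(0.9, 1.0, sparse_input)

# Reconstruction that knows the input versus one that assumes no input.
ev_with_input = explained_variance(observed, simulate(0.9, 1.0, sparse_input))
ev_without_input = explained_variance(observed, simulate(0.9, 1.0, [0.0] * 8))
```

With the correct input the reconstruction is exact and EV is 1. With the input dropped the model predicts a flat trajectory and EV falls to 0, which is why recovering the inputs matters for reconstruction.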

arXiv:2404.11759 [pdf, other]

Modelling infectious disease transmission dynamics in conference environments: An individual-based approach

Authors: Xue Liu, Yue Deng, Jingying Huang, Yuhong Zhang, Jinzhi Lei

Abstract: The global public health landscape is perpetually challenged by the looming threat of infectious diseases. Central to addressing this concern is the imperative to prevent and manage disease transmission during pandemics, particularly in unique settings. This study addresses the transmission dynamics of infectious diseases within conference venues, presenting a computational model designed to simulate transmission processes within a condensed timeframe (one day), beginning with sporadic cases. Our model intricately captures the activities of individual attendees within the conference venue, encompassing meetings, rest intervals, and meal breaks. While meetings entail proximity seating, rest and lunch periods allow attendees to interact with diverse individuals. Moreover, the restroom environment poses an additional avenue for potential infection transmission. Employing an individual-based model, we meticulously replicated the transmission dynamics of infectious diseases, with a specific emphasis on close-contact interactions between infected and susceptible individuals. Through comprehensive analysis of model simulations, we elucidated the intricacies of disease transmission dynamics within conference settings and assessed the efficacy of control strategies to curb disease dissemination. Ultimately, our study proffers a numerical framework for assessing the risk of infectious disease transmission during short-duration conferences, furnishing conference organizers with valuable insights to inform the implementation of targeted prevention and control measures.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 25 pages; 8 figures
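
Illustrative note: the abstract above describes an individual-based model of close-contact transmission over a single conference day. The sketch below is not the authors' model. It is a deliberately small individual-based simulation under assumed rules: a fixed set of initially infectious attendees, a few random close contacts per infectious attendee in each activity round, a constant per-contact transmission probability, and no onward transmission from newly exposed attendees within the day. Every parameter name and value is hypothetical.

```python
import random


def simulate_day(n_attendees=50, n_initial=1, rounds=6,
                 contacts_per_round=3, p_transmit=0.1, seed=0):
    """Return the cumulative number of infected attendees after each round.

    Assumption: attendees exposed during the day do not become
    infectious before the day ends, so only initial cases transmit.
    """
    rng = random.Random(seed)
    infectious = list(range(n_initial))
    infected = set(infectious)
    history = [len(infected)]
    for _ in range(rounds):  # e.g. meetings, rest intervals, meal breaks
        for person in infectious:
            others = [p for p in range(n_attendees) if p != person]
            # Each infectious attendee meets a few random other attendees.
            for contact in rng.sample(others, contacts_per_round):
                if contact not in infected and rng.random() < p_transmit:
                    infected.add(contact)
        history.append(len(infected))
    return history


baseline = simulate_day()
no_transmission = simulate_day(p_transmit=0.0)
```

Lowering `p_transmit` or `contacts_per_round` is one crude way to mimic a control measure such as masking or spaced seating in this toy setting.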

arXiv:2404.10354 [pdf]

Physical formula enhanced multi-task learning for pharmacokinetics prediction

Authors: Ruifeng Li, Dongzhan Zhou, Ancheng Shen, Ao Zhang, Mao Su, Mingqian Li, Hongyang Chen, Gang Chen, Yin Zhang, Shufei Zhang, Yuqiang Li, Wanli Ouyang

Abstract: Artificial intelligence (AI) technology has demonstrated remarkable potential in drug discovery, where pharmacokinetics plays a crucial role in determining the dosage, safety, and efficacy of new drugs. A major challenge for AI-driven drug discovery (AIDD) is the scarcity of high-quality data, which often requires extensive wet-lab work. A typical example of this is pharmacokinetic experiments. In this work, we develop a physical formula enhanced multi-task learning (PEMAL) method that predicts four key parameters of pharmacokinetics simultaneously. By incorporating physical formulas into the multi-task framework, PEMAL facilitates effective knowledge sharing and target alignment among the pharmacokinetic parameters, thereby enhancing the accuracy of prediction. Our experiments reveal that PEMAL significantly lowers the data demand, compared to typical Graph Neural Networks. Moreover, we demonstrate that PEMAL enhances the robustness to noise, an advantage that conventional Neural Networks do not possess. Another advantage of PEMAL is its high flexibility, which can be potentially applied to other multi-task machine learning scenarios. Overall, our work illustrates the benefits and potential of using PEMAL in AIDD and other scenarios with data scarcity and noise.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.05329 [pdf]

In silico bioactivity prediction of proteins interacting with graphene-based nanomaterials guides rational design of biosensor

Authors: Jing Ye, Minzhi Fan, Xiaoyu Zhang, Shasha Lu, Mengyao Chai, Yunshan Zhang, Xiaoyu Zhao, Shuang Li, Diming Zhang

Abstract: Graphene-based nanomaterials have attracted significant attention in recent years for their potential in biomedical and biotechnology applications, owing to their outstanding physical and chemical properties. However, their interaction mechanism with macro and micro biomolecules, and their impact on biological activity, still require more attention and further research in order to enhance their applicability in biosensors, etc. Herein, an integrated method has been developed to predict protein bioactivity performance when interacting with nanomaterials for protein-based biosensors. Molecular dynamics simulation and molecular docking techniques were combined to investigate several nanomaterials (C60 fullerene, single-walled carbon nanotube, pristine graphene, and graphene oxide) and their effect when interacting with proteins. The adsorption behavior, secondary structure changes, and protein bioactivity changes were simulated, and the results of the protein activity simulation were verified in combination with atomic force spectrum, circular dichroism spectrum, fluorescence, and electrochemical experiments. The best quantification alignment between bioactivity obtained by simulation and by experimental measurements was further explored. Two proteins, RNase A and Exonuclease III, served as analysis models for the proof of concept, and the prediction accuracy of protein bioactivity could reach up to 0.98.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.03516 [pdf]

Drug-target interaction prediction by integrating heterogeneous information with mutual attention network

Authors: Yuanyuan Zhang, Yingdong Wang, Chaoyong Wu, Lingmin Zhana, Aoyi Wang, Caiping Cheng, Jinzhong Zhao, Wuxia Zhang, Jianxin Chen, Peng Li

Abstract: Identification of drug-target interactions is an indispensable part of drug discovery. While conventional shallow machine learning and recent deep learning methods based on chemogenomic properties of drugs and target proteins have pushed this prediction performance improvement to a new level, these methods are still difficult to adapt to novel structures. Alternatively, large-scale biological and pharmacological data provide new ways to accelerate drug-target interaction prediction. Here, we propose DrugMAN, a deep learning model for predicting drug-target interaction by integrating multiplex heterogeneous functional networks with a mutual attention network (MAN). DrugMAN uses a graph attention network-based integration algorithm to learn network-specific low-dimensional features for drugs and target proteins by integrating four drug networks and seven gene/protein networks, respectively. DrugMAN then captures interaction information between drug and target representations by a mutual attention network to improve drug-target prediction. DrugMAN achieves the best prediction performance under four different scenarios, especially in real-world scenarios. DrugMAN spotlights heterogeneous information to mine drug-target interactions and can be a powerful tool for drug discovery and drug repurposing.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.00044 [pdf, other]

UAlign: Pushing the Limit of Template-free Retrosynthesis Prediction with Unsupervised SMILES Alignment

Authors: Kaipeng Zeng, Bo Yang, Xin Zhao, Yu Zhang, Fan Nie, Xiaokang Yang, Yaohui Jin, Yanyan Xu

Abstract: Motivation: Retrosynthesis planning poses a formidable challenge in the organic chemical industry. Single-step retrosynthesis prediction, a crucial step in the planning process, has witnessed a surge in interest in recent years due to advancements in AI for science. Various deep learning-based methods have been proposed for this task in recent years, incorporating diverse levels of additional chemical knowledge dependency. Results: This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction. By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules. Based on the fact that the majority of molecule structures remain unchanged during a chemical reaction, we propose a simple yet effective SMILES alignment technique to facilitate the reuse of unchanged structures for reactant generation. Extensive experiments show that our method substantially outperforms state-of-the-art template-free and semi-template-based approaches. Importantly, our template-free method achieves effectiveness comparable to, or even surpasses, established powerful template-based methods. Scientific contribution: We present a novel graph-to-sequence template-free retrosynthesis prediction pipeline that overcomes the limitations of Transformer-based methods in molecular representation learning and insufficient utilization of chemical information.
We propose an unsupervised learning mechanism for establishing product-atom correspondence with reactant SMILES tokens, achieving even better results than supervised SMILES alignment methods. Extensive experiments demonstrate that UAlign significantly outperforms state-of-the-art template-free methods and rivals or surpasses template-based approaches, with up to 5% (top-5) and 5.4% (top-10) increased accuracy over the strongest baseline.

Submitted 19 April, 2024; v1 submitted 24 March, 2024; originally announced April 2024.

arXiv:2403.18826 [pdf]

SAM-dPCR: Real-Time and High-throughput Absolute Quantification of Biological Samples Using Zero-Shot Segment Anything Model

Authors: Yuanyuan Wei, Shanhang Luo, Changran Xu, Yingqi Fu, Qingyue Dong, Yi Zhang, Fuyang Qu, Guangyao Cheng, Yi-Ping Ho, Ho-Pui Ho, Wu Yuan

Abstract: Digital PCR (dPCR) has revolutionized nucleic acid diagnostics by enabling absolute quantification of rare mutations and target sequences. However, current detection methodologies face challenges, as flow cytometers are costly and complex, while fluorescence imaging methods, relying on software or manual counting, are time-consuming and prone to errors. To address these limitations, we present SAM-dPCR, a novel self-supervised learning-based pipeline that enables real-time and high-throughput absolute quantification of biological samples. Leveraging the zero-shot SAM model, SAM-dPCR efficiently analyzes diverse microreactors with over 97.7% accuracy within a rapid processing time of 3.16 seconds. By utilizing commonly available lab fluorescence microscopes, SAM-dPCR facilitates the quantification of sample concentrations. The accuracy of SAM-dPCR is validated by the strong linear relationship observed between known and inferred sample concentrations. Additionally, SAM-dPCR demonstrates versatility through comprehensive verification using various samples and reactor morphologies. This accessible, cost-effective tool transcends the limitations of traditional detection methods or fully supervised AI models, marking the first application of SAM in nucleic acid detection or molecular diagnostics. By eliminating the need for annotated training data, SAM-dPCR holds great application potential for nucleic acid quantification in resource-limited settings.

Submitted 22 January, 2024; originally announced March 2024.

Comments: 23 pages, 6 figures

arXiv:2403.07297 [pdf]

Optical detection of bacterial cells on stainless-steel surface with a low-magnification light microscope

Authors: Yuzhen Zhang, Zili Gao, Lili He

Abstract: A rapid and cost-effective method for detecting bacterial cells on surfaces is critical to protecting public health in areas including food safety, clinical hygiene, and pharmacy quality. Herein, we first established an optical detection method based on a gold chip coated with 3-mercaptophenylboronic acid (3-MPBA) to capture bacterial cells, which allows for the detection and quantification of bacterial cells with a standard light microscope under a low-magnification (10-fold) objective lens. We then integrated the developed optical detection method with swab sampling to detect bacterial cells on stainless-steel surfaces. Using Salmonella enterica (SE1045) and Escherichia coli as model bacterial cells, we achieved a capture efficiency of up to 76.0% for SE1045 cells and 81.1% for E. coli cells at Log 3 CFU/mL under the optimized conditions. Our assay showed a good linear relationship between the concentration of bacterial cells and the cell count in images, with a limit of detection (LOD) of Log 3 CFU/mL for both SE1045 and E. coli cells. A further increase in sensitivity in detecting E. coli cells was achieved through a heat treatment, enabling the LOD to be pushed as low as Log 2 CFU/mL. Furthermore, the method was successfully applied to assessing bacterial contamination on stainless-steel surfaces after integration with swab collection, achieving a recovery rate of approximately 70%, which suggests future prospects for evaluating the cleanliness of surfaces.
The entire process was completed within around 2 hours, at a cost of merely 2 dollars per sample. Given that a standard light microscope costs around 250 dollars, our developed method has shown great potential in practical industrial applications for bacterial contamination control on surfaces in low-resource settings.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 38 pages, 13 figures, 1 table

arXiv:2403.06940 [pdf, other]

Conditional Score-Based Diffusion Model for Cortical Thickness Trajectory Prediction

Authors: Qing Xiao, Siyeop Yoon, Hui Ren, Matthew Tivnan, Lichao Sun, Quanzheng Li, Tianming Liu, Yu Zhang, Xiang Li

Abstract: Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression. Accurately forecasting CTh trajectories can significantly enhance early diagnosis and intervention strategies, providing timely care. However, the longitudinal data essential for these studies often suffer from temporal sparsity and incompleteness, presenting substantial challenges in modeling the disease's progression accurately. Existing methods are limited, focusing primarily on datasets without missing entries or requiring predefined assumptions about CTh progression. To overcome these obstacles, we propose a conditional score-based diffusion model specifically designed to generate CTh trajectories with the given baseline information, such as age, sex, and initial diagnosis. Our conditional diffusion model utilizes all available data during the training phase to make predictions based solely on baseline information during inference without needing prior history about CTh progression. The prediction accuracy of the proposed CTh prediction pipeline using a conditional score-based model was compared for sub-groups consisting of cognitively normal, mild cognitive impairment, and AD subjects. The Bland-Altman analysis shows our diffusion-based prediction model has a near-zero bias with a narrow 95% confidence interval compared to the ground-truth CTh in 6-36 months.
In addition, because our conditional diffusion model is stochastic and generative, we demonstrated an uncertainty analysis of patient-specific CTh prediction through multiple realizations.
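
The Bland-Altman summary mentioned in this abstract can be illustrated with a minimal sketch. It assumes the conventional definition (bias as the mean of the paired differences, with 95% limits at bias ± 1.96 sample standard deviations), which may differ in detail from what the authors computed. The function name `bland_altman` and the thickness values are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def bland_altman(predicted, observed):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Assumes the conventional Bland-Altman definition: the bias is the mean
    paired difference and the 95% limits are bias +/- 1.96 * sample SD.
    """
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [p - o for p, o in zip(predicted, observed)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # stdev() is the sample standard deviation
    return bias, bias - half_width, bias + half_width


# Hypothetical cortical thickness values in mm (illustrative only).
predicted = [2.50, 2.40, 2.65, 2.30]
observed = [2.48, 2.43, 2.60, 2.33]

bias, lower, upper = bland_altman(predicted, observed)
print(f"bias={bias:.4f} mm, limits=({lower:.4f}, {upper:.4f}) mm")
```

A bias close to zero with narrow limits would correspond to the kind of agreement the abstract reports; the sketch says nothing about the actual values in the study.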

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.02724 [pdf]

A genome-scale deep learning model to predict gene expression changes of genetic perturbations from multiplex biological networks

Authors: Lingmin Zhan, Yuanyuan Zhang, Yingdong Wang, Aoyi Wang, Caiping Cheng, Jinzhong Zhao, Wuxia Zhang, Peng Lia, Jianxin Chen

Abstract: Systematic characterization of biological effects to genetic perturbation is essential to the application of molecular biology and biomedicine. However, the experimental exhaustion of genetic perturbations on the genome-wide scale is challenging. Here, we present TranscriptionNet, a deep learning model that integrates multiple biological networks to systematically predict transcriptional profiles for three types of genetic perturbations, based on transcriptional profiles induced by genetic perturbations in the L1000 project: RNA interference (RNAi), clustered regularly interspaced short palindromic repeat (CRISPR) and overexpression (OE). TranscriptionNet performs better than existing approaches in predicting inducible gene expression changes for all three types of genetic perturbations. TranscriptionNet can predict transcriptional profiles for all genes in existing biological networks and increases perturbational gene expression changes for each type of genetic perturbation from a few thousand to 26,945 genes. TranscriptionNet demonstrates strong generalization ability when comparing predicted and true gene expression changes on different external tasks. Overall, TranscriptionNet can systemically predict transcriptional consequences induced by perturbing genes on a genome-wide scale and thus holds promise to systemically detect gene function and enhance drug development and target discovery.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.00020 [pdf, other]

Operators' cognitive performance under extreme hot-humid exposure and its physiological-psychological mechanism based on ECG, fNIRS, and Eye Tracking

Authors: Yan Zhang, Ming Jia, Meng Li, Jianyu Wang, Xiangmin Hu, Zhihui Xu, Tao Chen

Abstract: Operators' cognitive functions are impaired significantly under extreme heat stress, potentially resulting in more severe secondary disasters. This research investigated the impact of elevated temperature and humidity (25 60%RH, 30 70%RH, 35 80%RH, 40 90%RH) on the cognitive functions and performance of operators. Meanwhile, we explored the psychological-physiological mechanism underlying the change in performance by electrocardiogram (ECG), functional near-infrared spectroscopy (fNIRS), and eye tracking physiologically. Psychological aspects such as situation awareness, workload, and working memory were assessed. Eventually, we verified and extended the maximal adaptability model to the extreme condition. Unexpectedly, a temporary improvement in simple reaction tasks but rapid impairment in advanced cognitive functions (i.e. situation awareness, communication, working memory) was obtained above 35 WBGT. The best performance in a suitable environment was due to more effective activation in the prefrontal cortex (PFC). With temperature increasing, more mistakes occurred and comprehension was impaired due to drowsiness and lower arousal levels, according to evidence of compensatory effect in fNIRS. In the extreme environment, the enhanced PFC cooperation with higher functional connectivity resulted in a temporary improvement, while depressed activation in PFC, heavy physical load, and poor regulation of the cardiovascular system restricted it.
Our results provide a detailed study of the process of operators' performance and cognitive functions when encountering increasing heat stress, as well as its underlying mechanisms from a neuroergonomics perspective. This can contribute to a better understanding of the interaction between operators' performance and workplace conditions, and help to achieve a more reliable human-centered production system in the promising era of Industry 5.0.

Submitted 27 May, 2024; v1 submitted 28 February, 2024; originally announced March 2024.

arXiv:2402.17774 [pdf]

doi 10.1021/acsnano.4c02434

A paper-based multiplexed serological test to monitor immunity against SARS-CoV-2 using machine learning

Authors: Merve Eryilmaz, Artem Goncharov, Gyeo-Re Han, Hyou-Arm Joung, Zachary S. Ballard, Rajesh Ghosh, Yijie Zhang, Dino Di Carlo, Aydogan Ozcan

Abstract: The rapid spread of SARS-CoV-2 caused the COVID-19 pandemic and accelerated vaccine development to prevent the spread of the virus and control the disease. Given the sustained high infectivity and evolution of SARS-CoV-2, there is an ongoing interest in developing COVID-19 serology tests to monitor population-level immunity. To address this critical need, we designed a paper-based multiplexed vertical flow assay (xVFA) using five structural proteins of SARS-CoV-2, detecting IgG and IgM antibodies to monitor changes in COVID-19 immunity levels. Our platform not only tracked longitudinal immunity levels but also categorized COVID-19 immunity into three groups: protected, unprotected, and infected, based on the levels of IgG and IgM antibodies. We operated two xVFAs in parallel to detect IgG and IgM antibodies using a total of 40 uL of human serum sample in <20 min per test. After the assay, images of the paper-based sensor panel were captured using a mobile phone-based custom-designed optical reader and then processed by a neural network-based serodiagnostic algorithm. The trained serodiagnostic algorithm was blindly tested with serum samples collected before and after vaccination or infection, achieving an accuracy of 89.5%.
The competitive performance of the xVFA, along with its portability, cost-effectiveness, and rapid operation, makes it a promising computational point-of-care (POC) serology test for monitoring COVID-19 immunity, aiding in timely decisions on the administration of booster vaccines and general public health policies to protect vulnerable populations.

Submitted 18 February, 2024; originally announced February 2024.

Comments: 19 Pages, 4 Figures

Journal ref: ACS Nano (2024)

arXiv:2312.12402 [pdf, other]

Inferring geometrical dynamics of cell nucleus translocation

Authors: Sirine Amiri, Yirui Zhang, Andonis Gerardos, Cécile Sykes, Pierre Ronceray

Abstract: The ability of eukaryotic cells to squeeze through constrictions is limited by the stiffness of their large and rigid nucleus. However, migrating cells are often able to overcome this limitation and pass through constrictions much smaller than their nucleus, a mechanism that is not yet understood. This is what we address here through a data-driven approach using microfluidic devices where cells migrate through controlled narrow spaces of sizes comparable to the ones encountered in physiological situations. Stochastic Force Inference is applied to experimental nuclear trajectories and nuclear shape descriptors, resulting in equations that effectively describe this phenomenon of nuclear translocation. By employing a model where the channel geometry is an explicit parameter and by training it over experimental data with different sizes of constrictions, we ensure that the resulting equations are predictive to other geometries. Altogether, the approach developed here paves the way for a mechanistic and quantitative description of dynamical cell complexity during its motility.

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.04019 [pdf, other]

Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models

Authors: Yijie Zhang, Zhangyang Gao, Cheng Tan, Stan Z. Li

Abstract: Predicting protein stability changes induced by single-point mutations has been a persistent challenge over the years, attracting immense interest from numerous researchers. The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry, including drug development, protein evolution analysis, and enzyme synthesis. Despite the proposition of multiple methodologies aimed at addressing this issue, few approaches have successfully achieved optimal performance coupled with high computational efficiency. Two principal hurdles contribute to the existing challenges in this domain. The first is the complexity of extracting and aggregating sufficiently representative features from proteins. The second refers to the limited availability of experimental data for protein mutation analysis, further complicating the comprehensive evaluation of model performance on unseen data samples. With the advent of Large Language Models (LLMs), such as the ESM models in protein research, profound interpretation of protein features is now accessibly aided by enormous training data. LLMs can therefore facilitate a wide range of protein research. In our study, we introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
Furthermore, we have curated a dataset meticulously designed to preclude data leakage, corresponding to two extensively employed test datasets, to facilitate a more equitable model comparison.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.02953 [pdf]

Longitudinal Assessment of Seasonal Impacts and Depression Associations on Circadian Rhythm Using Multimodal Wearable Sensing

Authors: Yuezhou Zhang, Amos A Folarin, Shaoxiong Sun, Nicholas Cummins, Yatharth Ranjan, Zulqarnain Rashid, Callum Stewart, Pauline Conde, Heet Sankesara, Petroula Laiou, Faith Matcham, Katie M White, Carolin Oetzmann, Femke Lamers, Sara Siddi, Sara Simblett, Srinivasan Vairavan, Inez Myin-Germeys, David C. Mohr, Til Wykes, Josep Maria Haro, Peter Annas, Brenda WJH Penninx, Vaibhav A Narayan, Matthew Hotopf, et al. (2 additional authors not shown)

Abstract: Objective: This study aimed to explore the associations between depression severity and wearable-measured circadian rhythms, accounting for seasonal impacts and quantifying seasonal changes in circadian rhythms. Materials and Methods: Data used in this study came from a large longitudinal mobile health study. Depression severity (measured biweekly using the 8-item Patient Health Questionnaire [PHQ-8]) and behaviors (monitored by Fitbit) were tracked for up to two years. Twelve features were extracted from Fitbit recordings to approximate circadian rhythms. Three nested linear mixed-effects models were employed for each feature: (1) incorporating the PHQ-8 score as an independent variable; (2) adding the season variable; and (3) adding an interaction term between season and the PHQ-8 score. Results: This study analyzed 10,018 PHQ-8 records with Fitbit data from 543 participants. Upon adjusting for seasonal effects, higher PHQ-8 scores were associated with reduced activity, irregular behaviors, and delayed rhythms. Notably, the negative association with daily step counts was stronger in summer and spring than in winter, and the positive association with the onset of the most active continuous 10-hour period was significant only during summer. Furthermore, participants had shorter and later sleep, more activity, and delayed circadian rhythms in summer compared to winter.
Discussion and Conclusions: Our findings underscore the significant seasonal impacts on human circadian rhythms and their associations with depression and indicate that wearable-measured circadian rhythms have the potential to be the digital biomarkers of depression.
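
The three nested models listed in this abstract can be written out as a sketch. The notation is an assumption made for illustration: $y_{ij}$ is one circadian feature for participant $i$ at assessment $j$, $u_i$ is a participant-level random intercept, and $s$ ranges over the non-reference seasons. The abstract does not state the random-effects structure or any covariates, so those details are hypothetical.

```latex
\begin{align}
\text{Model 1:}\quad y_{ij} &= \beta_0 + \beta_1\,\mathrm{PHQ8}_{ij} + u_i + \varepsilon_{ij} \\
\text{Model 2:}\quad y_{ij} &= \beta_0 + \beta_1\,\mathrm{PHQ8}_{ij}
  + \sum_{s} \gamma_s\,\mathbb{1}[\mathrm{season}_{ij} = s] + u_i + \varepsilon_{ij} \\
\text{Model 3:}\quad y_{ij} &= \beta_0 + \beta_1\,\mathrm{PHQ8}_{ij}
  + \sum_{s} \gamma_s\,\mathbb{1}[\mathrm{season}_{ij} = s]
  + \sum_{s} \delta_s\,\mathrm{PHQ8}_{ij}\,\mathbb{1}[\mathrm{season}_{ij} = s]
  + u_i + \varepsilon_{ij}
\end{align}
```

Under this reading, $\beta_1$ in Model 2 is the PHQ-8 association adjusted for season, and the $\delta_s$ terms in Model 3 capture how that association differs between seasons.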

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.09261 [pdf, other]

Emerging Drug Interaction Prediction Enabled by Flow-based Graph Neural Network with Biomedical Network

Authors: Yongqi Zhang, Quanming Yao, Ling Yue, Xian Wu, Ziheng Zhang, Zhenxi Lin, Yefeng Zheng

Abstract: Accurately predicting drug-drug interactions (DDI) for emerging drugs, which offer possibilities for treating and alleviating diseases, with computational methods can improve patient care and contribute to efficient drug development. However, many existing computational methods require large amounts of known DDI information, which is scarce for emerging drugs. In this paper, we propose EmerGNN, a graph neural network (GNN) that can effectively predict interactions for emerging drugs by leveraging the rich information in biomedical networks. EmerGNN learns pairwise representations of drugs by extracting the paths between drug pairs, propagating information from one drug to the other, and incorporating the relevant biomedical concepts on the paths. The different edges on the biomedical network are weighted to indicate the relevance for the target DDI prediction. Overall, EmerGNN has higher accuracy than existing approaches in predicting interactions for emerging drugs and can identify the most relevant information on the biomedical network.

Submitted 15 November, 2023; originally announced November 2023.

Comments: Accepted by Nature Computational Science

arXiv:2311.06657 [pdf]

Combined certainty and uncertainty across development frees phenotypic variation in evolution

Authors: Yue Zhang

Abstract: Developmental bias plays a major role in phenotypic evolution. Some researchers have argued that phenotypes, regulated by development, can only evolve along a restricted trajectory under certain scenarios, such as the case for mammalian molar size ratios. However, this view has been challenged. Broadly speaking, sources for phenotypic variation remain largely unknown. The study here presents a generalized Inhibitory Cascade Model and explains that the original model described only means of phenotypes resulting from selection when viewed under a higher taxonomic scope. Consequently, I propose the combined property of development: certainty, when the prior intersegmental inhibition is strong, and uncertainty, when the opposite holds. This property potentially not only explains counterintuitively high levels of developmental instability, but also plays an essential role in generating phenotypic variation.

Submitted 14 November, 2023; v1 submitted 11 November, 2023; originally announced November 2023.

arXiv:2310.18533 [pdf, other]

Evaluating the effects of high-throughput structural neuroimaging predictors on whole-brain functional connectome outcomes via network-based vector-on-matrix regression

Authors: Tong Lu, Yuan Zhang, Vince Lyzinski, Chuan Bi, Peter Kochunov, Elliot Hong, Shuo Chen

Abstract: The joint analysis of multimodal neuroimaging data is critical in the field of brain research because it reveals complex interactive relationships between neurobiological structures and functions. In this study, we focus on investigating the effects of structural imaging (SI) features, including white matter micro-structure integrity (WMMI) and cortical thickness, on the whole brain functional connectome (FC) network. To achieve this goal, we propose a network-based vector-on-matrix regression model to characterize the FC-SI association patterns. We have developed a novel multi-level dense bipartite and clique subgraph extraction method to identify which subsets of spatially specific SI features intensively influence organized FC sub-networks. The proposed method can simultaneously identify highly correlated structural-connectomic association patterns and suppress false positive findings while handling millions of potential interactions. We apply our method to a multimodal neuroimaging dataset of 4,242 participants from the UK Biobank to evaluate the effects of whole-brain WMMI and cortical thickness on the resting-state FC. The results reveal that the WMMI on corticospinal tracts and inferior cerebellar peduncle significantly affect functional connections of sensorimotor, salience, and executive sub-networks with an average correlation of 0.81 (p<0.001).

Submitted 27 October, 2023; originally announced October 2023.

Comments: 20 pages, 5 figures, 2 tables

arXiv:2310.12035 [pdf]

Tracking dynamic flow: Decoding flow fluctuations through performance in a fine motor control task

Authors: Bohao Tian, Shijun Zhang, Sirui Chen, Yuru Zhang, Kaiping Peng, Hongxing Zhang, Dangxiao Wang

Abstract: Flow, an optimal mental state merging action and awareness, significantly impacts our emotion, performance, and well-being. However, capturing its swift fluctuations on a fine timescale is challenging due to the sparsity of the existing flow detecting tools. Here we present a fine fingertip force control (F3C) task to induce flow, wherein the task challenge is set at a compatible level with personal skill, and to quantitatively track the flow state variations from synchronous motor control performance. We extract eight performance metrics from fingertip force sequence and reveal their significant differences under distinct flow states. Further, we built a learning-based flow decoder that aims to predict the continuous flow intensity during the user experiment through the selected performance metrics, taking the self-reported flow as the label. Cross-validation shows that the predicted flow intensity reaches significant correlation with the self-reported flow intensity (r=0.81). Based on the decoding results, we observe rapid oscillations in flow fluctuations during the intervals between sparse self-reporting probes. This study showcases the feasibility of tracking intrinsic flow variations with high temporal resolution using task performance measures and may serve as a foundation for future work aiming to take advantage of flow's dynamics to enhance performance and positive emotions.

Submitted 28 December, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.10978 [pdf]

NeuroQuantify -- An Image Analysis Software for Detection and Quantification of Neurons and Neurites using Deep Learning

Authors: Ka My Dang, Yi Jia Zhang, Tianchen Zhang, Chao Wang, Anton Sinner, Piero Coronica, Joyce K. S. Poon

Abstract: The segmentation of cells and neurites in microscopy images of neuronal networks provides valuable quantitative information about neuron growth and neuronal differentiation, including the number of cells, neurites, neurite length and neurite orientation. This information is essential for assessing the development of neuronal networks in response to extracellular stimuli, which is useful for studying neuronal structures, for example, the study of neurodegenerative diseases and pharmaceuticals. However, automatic and accurate analysis of neuronal structures from phase contrast images has remained challenging. To address this, we have developed NeuroQuantify, an open-source software that uses deep learning to efficiently and quickly segment cells and neurites in phase contrast microscopy images. NeuroQuantify offers several key features: (i) automatic detection of cells and neurites; (ii) post-processing of the images for the quantitative neurite length measurement based on segmentation of phase contrast microscopy images, and (iii) identification of neurite orientations. The user-friendly NeuroQuantify software can be freely downloaded and installed from GitHub: https://github.com/StanleyZ0528/neural-image-segmentation.

Submitted 19 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.07464 [pdf]

Deep Learning Predicts Biomarker Status and Discovers Related Histomorphology Characteristics for Low-Grade Glioma

Authors: Zijie Fang, Yihan Liu, Yifeng Wang, Xiangyang Zhang, Yang Chen, Changjing Cai, Yiyang Lin, Ying Han, Zhi Wang, Shan Zeng, Hong Shen, Jun Tan, Yongbing Zhang

Abstract: Biomarker detection is an indispensable part in the diagnosis and treatment of low-grade glioma (LGG). However, current LGG biomarker detection methods rely on expensive and complex molecular genetic testing, for which professionals are required to analyze the results, and intra-rater variability is often reported. To overcome these challenges, we propose an interpretable deep learning pipeline, a Multi-Biomarker Histomorphology Discoverer (Multi-Beholder) model based on the multiple instance learning (MIL) framework, to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images and slide-level biomarker status labels. Specifically, by incorporating the one-class classification into the MIL framework, accurate instance pseudo-labeling is realized for instance-level supervision, which greatly complements the slide-level labels and improves the biomarker prediction performance. Multi-Beholder demonstrates superior prediction performance and generalizability for five LGG biomarkers (AUROC=0.6469-0.9735) in two cohorts (n=607) with diverse races and scanning protocols. Moreover, the excellent interpretability of Multi-Beholder allows for discovering the quantitative and qualitative correlations between biomarker status and histomorphology characteristics. Our pipeline not only provides a novel approach for biomarker prediction, enhancing the applicability of molecular treatments for LGG patients but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 47 pages, 6 figures

arXiv:2310.04563 [pdf, other]

Modeling the Risk of In-Person Instruction during the COVID-19 Pandemic

Authors: Brian Liu, Yujia Zhang, Shane G. Henderson, David B. Shmoys, Peter I. Frazier

Abstract: During the COVID-19 pandemic, safely implementing in-person indoor instruction was a high priority for universities nationwide. To support this effort at the University, we developed a mathematical model for estimating the risk of SARS-CoV-2 transmission in university classrooms. This model was used to evaluate combinations of feasible interventions for classrooms at the University during the pandemic and optimize the set of interventions that would allow higher occupancy levels, matching the pre-pandemic numbers of in-person courses. Importantly, we determined that requiring masking in dense classrooms with unrestricted seating with more than 90% of students vaccinated was easy to implement, incurred little logistical or financial cost, and allowed classes to be held at full capacity. A retrospective analysis at the end of the semester confirmed the model's assessment that the proposed classroom configuration would be safe. Our framework is generalizable and was used to support reopening decisions at Stanford University. In addition, our framework is flexible and applies to a wide range of indoor settings. It was repurposed for large university events and gatherings and could be used to support planning indoor space use to avoid transmission of infectious diseases across various industries, from secondary schools to movie theaters and restaurants.

Submitted 19 February, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2309.14404 [pdf]

pLMFPPred: a novel approach for accurate prediction of functional peptides integrating embedding from pre-trained protein language model and imbalanced learning

Authors: Zebin Ma, Yonglin Zou, Xiaobin Huang, Wenjin Yan, Hao Xu, Jiexin Yang, Ying Zhang, Jinqi Huang

Abstract: Functional peptides have the potential to treat a variety of diseases. Their good therapeutic efficacy and low toxicity make them ideal therapeutic agents. Artificial intelligence-based computational strategies can help quickly identify new functional peptides from collections of protein sequences and discover their different functions. Using protein language model-based embeddings (ESM-2), we developed a tool called pLMFPPred (Protein Language Model-based Functional Peptide Predictor) for predicting functional peptides and identifying toxic peptides. We also introduced SMOTE-TOMEK data synthesis sampling and Shapley value-based feature selection techniques to relieve data imbalance issues and reduce computational costs. On a validated independent test set, pLMFPPred achieved accuracy, Area under the curve - Receiver Operating Characteristics, and F1-Score values of 0.974, 0.99, and 0.974, respectively. Comparative experiments show that pLMFPPred outperforms current methods for predicting functional peptides. The experimental results suggest that the proposed method (pLMFPPred) can provide better performance in terms of Accuracy, Area under the curve - Receiver Operating Characteristics, and F1-Score than existing methods. pLMFPPred has achieved good performance in predicting functional peptides and represents a new computational method for predicting functional peptides.

Submitted 25 September, 2023; originally announced September 2023.

Comments: 20 pages, 5 figures, under review

arXiv:2309.10128 [pdf, other]

Markov Chain-Guided Graph Construction and Sampling Depth Optimization for EEG-Based Mental Disorder Detection

Authors: Yihan Wu, Tao Chang, Peng Xu, Yangsong Zhang

Abstract: Graph Neural Networks (GNNs) have received considerable attention since its introduction. It has been widely applied in various fields due to its ability to represent graph structured data. However, the application of GNNs is constrained by two main issues. Firstly, the "over-smoothing" problem restricts the use of deeper network structures. Secondly, GNNs' applicability is greatly limited when nodes and edges are not clearly defined and expressed, as is the case with EEG data. In this study, we proposed an innovative approach that harnesses the distinctive properties of the graph structure's Markov Chain to optimize the sampling depth of deep graph convolution networks. We introduced a tailored method for constructing graph structures specifically designed for analyzing EEG data, alongside the development of a vertex-level GNN classification model for precise detection of mental disorders. In order to verify the method's performance, we conduct experiments on two disease datasets using a subject-independent experiment scenario. For the Schizophrenia (SZ) data, our method achieves an average accuracy of 100% using only the first 300 seconds of data from each subject. Similarly, for Major Depressive Disorder (MDD) data, the method yields average accuracies of over 99%. These experiments demonstrate the method's ability to effectively distinguish between healthy control (HC) subjects and patients with mental disorders. We believe this method shows great promise for clinical diagnosis.

Submitted 18 September, 2023; originally announced September 2023.

Comments: 5 figures, 4 tables

arXiv:2309.05088 [pdf]

Towards Trustworthy Artificial Intelligence for Equitable Global Health

Authors: Hong Qin, Jude Kong, Wandi Ding, Ramneek Ahluwalia, Christo El Morr, Zeynep Engin, Jake Okechukwu Effoduh, Rebecca Hwa, Serena Jingchuan Guo, Laleh Seyyed-Kalantari, Sylvia Kiwuwa Muyingo, Candace Makeda Moore, Ravi Parikh, Reva Schwartz, Dongxiao Zhu, Xiaoqian Wang, Yiye Zhang

Abstract: Artificial intelligence (AI) can potentially transform global health, but algorithmic bias can exacerbate social inequities and disparity. Trustworthy AI entails the intentional design to ensure equity and mitigate potential biases. To advance trustworthy AI in global health, we convened a workshop on Fairness in Machine Intelligence for Global Health (FairMI4GH). The event brought together a global mix of experts from various disciplines, community health practitioners, policymakers, and more. Topics covered included managing AI bias in socio-technical systems, AI's potential impacts on global health, and balancing data privacy with transparency. Panel discussions examined the cultural, political, and ethical dimensions of AI in global health. FairMI4GH aimed to stimulate dialogue, facilitate knowledge transfer, and spark innovative solutions. Drawing from NIST's AI Risk Management Framework, it provided suggestions for handling AI risks and biases. The need to mitigate data biases from the research design stage, adopt a human-centered approach, and advocate for AI transparency was recognized. Challenges such as updating legal frameworks, managing cross-border data sharing, and motivating developers to reduce bias were acknowledged. The event emphasized the necessity of diverse viewpoints and multi-dimensional dialogue for creating a fair and ethical AI framework for equitable global health.

Submitted 10 September, 2023; originally announced September 2023.

Comments: 7 pages

arXiv:2308.11773 [pdf]

Identifying depression-related topics in smartphone-collected free-response speech recordings using an automatic speech recognition system and a deep learning topic model

Authors: Yuezhou Zhang, Amos A Folarin, Judith Dineley, Pauline Conde, Valeria de Angel, Shaoxiong Sun, Yatharth Ranjan, Zulqarnain Rashid, Callum Stewart, Petroula Laiou, Heet Sankesara, Linglong Qian, Faith Matcham, Katie M White, Carolin Oetzmann, Femke Lamers, Sara Siddi, Sara Simblett, Björn W. Schuller, Srinivasan Vairavan, Til Wykes, Josep Maria Haro, Brenda WJH Penninx, Vaibhav A Narayan, Matthew Hotopf, et al. (3 additional authors not shown)

Abstract: Language use has been shown to correlate with depression, but large-scale validation is needed. Traditional methods like clinic studies are expensive. So, natural language processing has been employed on social media to predict depression, but limitations remain: lack of validated labels, biased user samples, and no context. Our study identified 29 topics in 3919 smartphone-collected speech recordings from 265 participants using the Whisper tool and BERTopic model. Six topics with a median PHQ-8 greater than or equal to 10 were regarded as risk topics for depression: No Expectations, Sleep, Mental Therapy, Haircut, Studying, and Coursework. To elucidate the topic emergence and associations with depression, we compared behavioral (from wearables) and linguistic characteristics across identified topics. The correlation between topic shifts and changes in depression severity over time was also investigated, indicating the importance of longitudinally monitoring language use. We also tested the BERTopic model on a similar smaller dataset (356 speech recordings from 57 participants), obtaining some consistent results. In summary, our findings demonstrate specific speech topics may indicate depression severity. The presented data-driven workflow provides a practical approach to collecting and analyzing large-scale speech data from real-world settings for digital health research.

Submitted 5 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.05864 [pdf, other]

doi 10.1038/s41592-024-02233-6

The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions

Authors: Jun Ma, Ronald Xie, Shamini Ayyadhury, Cheng Ge, Anubha Gupta, Ritu Gupta, Song Gu, Yao Zhang, Gihun Lee, Joonkee Kim, Wei Lou, Haofeng Li, Eric Upschulte, Timo Dickscheid, José Guilherme de Almeida, Yixin Wang, Lin Han, Xin Yang, Marco Labagnara, Vojislav Gligorovski, Maxime Scheder, Sahand Jamal Rahi, Carly Kempster, Alice Pollitt, Leon Espinosa, et al. (15 additional authors not shown)

Abstract: Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyper-parameters in different experimental settings. Here, we present a multi-modality cell segmentation benchmark, comprising over 1500 labeled images derived from more than 50 diverse biological experiments. The top participants developed a Transformer-based deep-learning algorithm that not only exceeds existing methods but can also be applied to diverse microscopy images across imaging platforms and tissue types without manual parameter adjustments. This benchmark and the improved algorithm offer promising avenues for more accurate and versatile cell analysis in microscopy imaging.

Submitted 1 April, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: NeurIPS22 Cell Segmentation Challenge: https://neurips22-cellseg.grand-challenge.org/ . Nature Methods (2024)

arXiv:2308.01402 [pdf, other]

Machine Learning-guided Lipid Nanoparticle Design for mRNA Delivery

Authors: Daisy Yi Ding, Yuhui Zhang, Yuan Jia, Jiuzhi Sun

Abstract: While RNA technologies hold immense therapeutic potential in a range of applications from vaccination to gene editing, the broad implementation of these technologies is hindered by the challenge of delivering these agents effectively. Lipid nanoparticles have emerged as one of the most widely used delivery agents, but their design optimization relies on laborious and costly experimental methods. We propose to in silico optimize LNP design with machine learning models. On a curated dataset of 622 LNPs from published studies, we demonstrate the effectiveness of our model in predicting the transfection efficiency of unseen LNPs, with the multilayer perceptron achieving a classification accuracy of 98% on the test set. Our work represents a pioneering effort in combining ML and LNP design, offering significant potential for improving screening efficiency by computationally prioritizing LNP candidates for experimental validation and accelerating the development of effective mRNA delivery systems.

Submitted 28 August, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

Comments: The 2023 ICML Workshop on Computational Biology

arXiv:2307.06235 [pdf, other]

Multimodal Molecular Pretraining via Modality Blending

Authors: Qiying Yu, Yudi Zhang, Yuyan Ni, Shikun Feng, Yanyan Lan, Hao Zhou, Jingjing Liu

Abstract: Self-supervised learning has recently gained growing interest in molecular modeling for scientific tasks such as AI-assisted drug discovery. Current studies consider leveraging both 2D and 3D molecular structures for representation learning. However, relying on straightforward alignment strategies that treat each modality separately, these methods fail to exploit the intrinsic correlation between 2D and 3D representations that reflect the underlying structural characteristics of molecules, and only perform coarse-grained molecule-level alignment. To derive fine-grained alignment and promote structural molecule understanding, we introduce an atomic-relation level "blend-then-predict" self-supervised learning approach, MoleBLEND, which first blends atom relations represented by different modalities into one unified relation matrix for joint encoding, then recovers modality-specific information for 2D and 3D structures individually. By treating atom relationships as anchors, MoleBLEND organically aligns and integrates visually dissimilar 2D and 3D modalities of the same molecule at fine-grained atomic level, painting a more comprehensive depiction of each molecule. Extensive experiments show that MoleBLEND achieves state-of-the-art performance across major 2D/3D molecular benchmarks.
We further provide theoretical insights from the perspective of mutual-information maximization, demonstrating that our method unifies contrastive, generative (cross-modality prediction) and mask-then-predict (single-modality prediction) objectives into one single cohesive framework.

Submitted 8 October, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

arXiv:2307.00511 [pdf]

SUGAR: Spherical Ultrafast Graph Attention Framework for Cortical Surface Registration

Authors: Jianxun Ren, Ning An, Youjia Zhang, Danyang Wang, Zhenyu Sun, Cong Lin, Weigang Cui, Weiwei Wang, Ying Zhou, Wei Zhang, Qingyu Hu, Ping Zhang, Dan Hu, Danhong Wang, Hesheng Liu

Abstract: Cortical surface registration plays a crucial role in aligning cortical functional and anatomical features across individuals. However, conventional registration algorithms are computationally inefficient. Recently, learning-based registration algorithms have emerged as a promising solution, significantly improving processing efficiency. Nonetheless, there remains a gap in the development of a learning-based method that exceeds the state-of-the-art conventional methods simultaneously in computational efficiency, registration accuracy, and distortion control, despite the theoretically greater representational capabilities of deep learning approaches. To address the challenge, we present SUGAR, a unified unsupervised deep-learning framework for both rigid and non-rigid registration. SUGAR incorporates a U-Net-based spherical graph attention network and leverages the Euler angle representation for deformation. In addition to the similarity loss, we introduce fold and multiple distortion losses, to preserve topology and minimize various types of distortions. Furthermore, we propose a data augmentation strategy specifically tailored for spherical surface registration, enhancing the registration performance. Through extensive evaluation involving over 10,000 scans from 7 diverse datasets, we showed that our framework exhibits comparable or superior registration performance in accuracy, distortion, and test-retest reliability compared to conventional and learning-based methods.
Additionally, SUGAR achieves remarkable sub-second processing times, offering a notable speed-up of approximately 12,000 times in registering 9,000 subjects from the UK Biobank dataset in just 32 minutes. This combination of high registration performance and accelerated processing time may greatly benefit large-scale neuroimaging studies.

Submitted 2 July, 2023; originally announced July 2023.

arXiv:2306.13957 [pdf, other]

DiffDTM: A conditional structure-free framework for bioactive molecules generation targeted for dual proteins

Authors: Lei Huang, Zheng Yuan, Huihui Yan, Rong Sheng, Linjing Liu, Fuzhou Wang, Weidun Xie, Nanjun Chen, Fei Huang, Songfang Huang, Ka-Chun Wong, Yaoyun Zhang

Abstract: Advances in deep generative models shed light on de novo molecule generation with desired properties. However, molecule generation targeted for dual protein targets still faces formidable challenges including protein 3D structure data requisition for model training, auto-regressive sampling, and model generalization for unseen targets. Here, we proposed DiffDTM, a novel conditional structure-free deep generative model based on a diffusion model for dual targets based molecule generation to address the above issues. Specifically, DiffDTM receives protein sequences and molecular graphs as inputs instead of protein and molecular conformations and incorporates an information fusion module to achieve conditional generation in a one-shot manner. We have conducted comprehensive multi-view experiments to demonstrate that DiffDTM can generate drug-like, synthesis-accessible, novel, and high-binding affinity molecules targeting specific dual proteins, outperforming the state-of-the-art (SOTA) models in terms of multiple evaluation metrics. Furthermore, we utilized DiffDTM to generate molecules towards dopamine receptor D2 and 5-hydroxytryptamine receptor 1A as new antipsychotics. The experimental results indicate that DiffDTM can be easily plugged into unseen dual targets to generate bioactive molecules, addressing the issues of requiring insufficient active molecule data for training as well as the need to retrain when encountering new targets.

Submitted 24 June, 2023; originally announced June 2023.

arXiv:2306.10949 [pdf, ps, other]

Cooperation of myosin II in muscle contraction through nonlinear elasticity

Authors: Beibei Shen, Yunxin Zhang

Abstract: Myosin II plays a pivotal role in muscle contraction by generating force through the cooperative action of multiple motors on actin filaments. In this study, we integrate the nonlinear elasticity of the neck linker in individual myosin II and comprehensively investigate the evolution of cooperativity and dynamics at {\it microstate} and {\it mesostate} levels using a combined model of single and multiple motors. We find that a substantial proportion of actin-bound motors reside in the {\it mid-} and {\it post-power stroke} states, and our nonlinear model reveals their increased capacity for load sharing. Additionally, we systematically explore the impact of mechanical load and ATP concentration on myosin II motors. Notably, we observe that the average net distance of actin undergoes a transition from a weak load-sensitive regime at low ATP concentrations to a load-sensitive regime at higher ATP concentrations. Furthermore, increasing the load or raising the ATP concentration to saturation can enhance the efficiency and output power of myosin filament. Moreover, the efficiency of the myosin filament increases with the power stroke strength, reaching a maximum at a specific range, and subsequently declining beyond that threshold. Finally, we explore the mean run time/length and mean existence probability of myosin filament, shedding light on its overall behavior.

Submitted 19 June, 2023; originally announced June 2023.

Comments: 10 pages, 7 figures

arXiv:2306.07505 [pdf]

Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu, et al. (22 additional authors not shown)

Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with compensated advanced chronic liver disease. 305 patients were enrolled from 12 hospitals, and finally 265 patients were included, with 1136 liver stiffness measurement (LSM) images and 1042 spleen stiffness measurement (SSM) images generated by 2D-SWE. We leveraged deep learning methods to uncover associations between image features and patient risk, and thus conducted models to predict GEV and HRV. Results: A multi-modality Deep Learning Risk Prediction model (DLRP) was constructed to assess GEV and HRV, based on LSM and SSM images, and clinical information. Validation analysis revealed that the AUCs of DLRP were 0.91 for GEV (95% CI 0.90 to 0.93, p < 0.05) and 0.88 for HRV (95% CI 0.86 to 0.89, p < 0.01), which were significantly and robustly better than canonical risk indicators, including the value of LSM and SSM. Moreover, DLRP was better than the model using individual parameters, including LSM and SSM images. In HRV prediction, the 2D-SWE images of SSM outperform LSM (p < 0.01). Conclusion: DLRP shows excellent performance in predicting GEV and HRV over canonical risk indicators LSM and SSM.
Additionally, the 2D-SWE images of SSM provided more information for better accuracy in predicting HRV than the LSM.

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2306.01794 [pdf, other]

DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing

Authors: Yangtian Zhang, Zuobai Zhang, Bozitao Zhong, Sanchit Misra, Jian Tang

Abstract: Proteins play a critical role in carrying out biological functions, and their 3D structures are essential in determining their functions. Accurately predicting the conformation of protein side-chains given their backbones is important for applications in protein structure prediction, design and protein-protein interactions. Traditional methods are computationally intensive and have limited accuracy, while existing machine learning methods treat the problem as a regression task and overlook the restrictions imposed by the constant covalent bond lengths and angles. In this work, we present DiffPack, a torsional diffusion model that learns the joint distribution of side-chain torsional angles, the only degrees of freedom in side-chain packing, by diffusing and denoising on the torsional space. To avoid issues arising from simultaneous perturbation of all four torsional angles, we propose autoregressively generating the four torsional angles from $χ_1$ to $χ_4$ and training diffusion models for each torsional angle. We evaluate the method on several benchmarks for protein side-chain packing and show that our method achieves improvements of $11.9\%$ and $13.5\%$ in angle accuracy on CASP13 and CASP14, respectively, with a significantly smaller model size ($60\times$ fewer parameters). Additionally, we show the effectiveness of our method in enhancing side-chain predictions in the AlphaFold2 model. Code is available at https://github.com/DeepGraphLearning/DiffPack.

Submitted 15 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2305.19043 [pdf, other]

A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction

Authors: Guillaume Huguet, Alexander Tong, Edward De Brouwer, Yanlei Zhang, Guy Wolf, Ian Adelstein, Smita Krishnaswamy

Abstract: Diffusion-based manifold learning methods have proven useful in representation learning and dimensionality reduction of modern high dimensional, high throughput, noisy datasets. Such datasets are especially present in fields like biology and physics. While it is thought that these methods preserve underlying manifold structure of data by learning a proxy for geodesic distances, no specific theoretical links have been established. Here, we establish such a link via results in Riemannian geometry explicitly connecting heat diffusion to manifold distances. In this process, we also formulate a more general heat kernel based manifold embedding method that we call heat geodesic embeddings. This novel perspective makes clearer the choices available in manifold learning and denoising. Results show that our method outperforms existing state of the art in preserving ground truth manifold distances, and preserving cluster structure in toy datasets. We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure, where our method enables interpolation of withheld timepoints of data. Finally, we show that parameters of our more general method can be configured to give results similar to PHATE (a state-of-the-art diffusion based manifold learning method) as well as SNE (an attraction/repulsion neighborhood based method that forms the basis of t-SNE).

Submitted 30 May, 2023; originally announced May 2023.

Comments: 31 pages, 13 figures, 10 tables

arXiv:2305.05093 [pdf]

Prokaryotic genome editing based on the subtype I-B-Svi CRISPR-Cas system

Authors: Wang-Yu Tong, De-Xiang Yong, Xin Xu, Cai-Hua Qiu, Yan Zhang, Xing-Wang Yang, Ting-Ting Xia, Qing-Yang Liu, Su-Li Cao, Yan Sun, Xue Li

Abstract: Type I CRISPR-Cas systems are the most common among six types of CRISPR-Cas systems, however, non-self-targeting genome editing based on a single Cas3 of type I CRISPR-Cas systems has not been reported. Here, we present the subtype I-B-Svi CRISPR-Cas system (with three confirmed CRISPRs and a cas gene cluster) and genome editing based on this system found in Streptomyces virginiae IBL14. Importantly, like the animal-derived bacterial protein SpCas9 (1368 amino-acids), the single, compact, non-animal-derived bacterial protein SviCas3 (771 amino-acids) can also direct template-based microbial genome editing through the target cell's own homology-directed repair system, which breaks the view that the genome editing based on type I CRISPR-Cas systems requires a full Cascade. Notably, no off-target changes or indel-formation were detected in the analysis of potential off-target sites. This discovery broadens our understanding of the diversity of type I CRISPR-Cas systems and will facilitate new developments in genome editing tools.

Submitted 8 May, 2023; originally announced May 2023.

Comments: 113 pages, 10 figures, and 6 tables

arXiv:2305.03061 [pdf, other]

Mining fMRI Dynamics with Parcellation Prior for Brain Disease Diagnosis

Authors: Xiaozhao Liu, Mianxin Liu, Lang Mei, Yuyao Zhang, Feng Shi, Han Zhang, Dinggang Shen

Abstract: To characterize atypical brain dynamics under diseases, prevalent studies investigate functional magnetic resonance imaging (fMRI). However, most of the existing analyses compress rich spatial-temporal information as the brain functional networks (BFNs) and directly investigate the whole-brain network without neurological priors about functional subnetworks. We thus propose a novel graph learning framework to mine fMRI signals with topological priors from brain parcellation for disease diagnosis. Specifically, we 1) detect diagnosis-related temporal features using a "Transformer" for a higher-level BFN construction, and process it with a following graph convolutional network, and 2) apply an attention-based multiple instance learning strategy to emphasize the disease-affected subnetworks to further enhance the diagnosis performance and interpretability. Experiments demonstrate higher effectiveness of our method than compared methods in the diagnosis of early mild cognitive impairment. More importantly, our method is capable of localizing crucial brain subnetworks during the diagnosis, providing insights into the pathogenic source of mild cognitive impairment.

Submitted 4 May, 2023; originally announced May 2023.

Comments: 5 pages, 2 figures, conference paper, accepted by IEEE International Symposium on Biomedical Imaging (ISBI) 2023

arXiv:2304.09729 [pdf, other]

De novo reconstruction of satellite repeat units from sequence data

Authors: Yujie Zhang, Justin Chu, Haoyu Cheng, Heng Li

Abstract: Satellite DNA are long tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge on repeat structures. Applying SRF to real sequence data, we showed that SRF could reconstruct known satellites in human and well-studied model organisms. We also found satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents but are often underrepresented in assemblies. With the rapid progress on genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.

Submitted 19 April, 2023; originally announced April 2023.

arXiv:2303.14193 [pdf, other]

Quadratic Graph Attention Network (Q-GAT) for Robust Construction of Gene Regulatory Networks

Authors: Hui Zhang, Xuexin An, Qiang He, Yudong Yao, Yudong Zhang, Feng-Lei Fan, Yueyang Teng

Abstract: Gene regulatory relationships can be abstracted as a gene regulatory network (GRN), which plays a key role in characterizing complex cellular processes and pathways. Recently, graph neural networks (GNNs), as a class of deep learning models, have emerged as a useful tool to infer gene regulatory relationships from gene expression data. However, deep learning models have been found to be vulnerable to noise, which greatly hinders the adoption of deep learning in constructing GRNs, because high noise is often unavoidable in the process of gene expression measurement. Can we preferably prototype a robust GNN for constructing GRNs? In this paper, we give a positive answer by proposing a Quadratic Graph Attention Network (Q-GAT) with a dual attention mechanism. We study the changes in the predictive accuracy of Q-GAT and 9 state-of-the-art baselines by introducing different levels of adversarial perturbations. Experiments in the E. coli and S. cerevisiae datasets suggest that Q-GAT outperforms the state-of-the-art models in robustness. Lastly, we dissect why Q-GAT is robust through the signal-to-noise ratio (SNR) and interpretability analyses. The former informs that nonlinear aggregation of quadratic neurons can amplify useful signals and suppress unwanted noise, thereby facilitating robustness, while the latter reveals that Q-GAT can leverage more features in prediction thanks to the dual attention mechanism, which endows Q-GAT with the ability to confront adversarial perturbation. We have shared our code in https://github.com/Minorway/Q-GAT_for_Robust_Construction_of_GRN for readers' evaluation.

Submitted 4 November, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.07830 [pdf]

Emergent Bio-Functional Similarities in a Cortical-Spike-Train-Decoding Spiking Neural Network Facilitate Predictions of Neural Computation

Authors: Tengjun Liu, Yansong Chua, Yiwei Zhang, Yuxiao Ning, Pengfu Liu, Guihua Wan, Zijun Wan, Shaomin Zhang, Weidong Chen

Abstract: Despite its better bio-plausibility, goal-driven spiking neural network (SNN) has not achieved applicable performance for classifying biological spike trains, and showed little bio-functional similarities compared to traditional artificial neural networks. In this study, we proposed the motorSRNN, a recurrent SNN topologically inspired by the neural motor circuit of primates. By employing the motorSRNN in decoding spike trains from the primary motor cortex of monkeys, we achieved a good balance between classification accuracy and energy consumption. The motorSRNN communicated with the input by capturing and cultivating more cosine-tuning, an essential property of neurons in the motor cortex, and maintained its stability during training. Such training-induced cultivation and persistency of cosine-tuning was also observed in our monkeys. Moreover, the motorSRNN produced additional bio-functional similarities at the single-neuron, population, and circuit levels, demonstrating biological authenticity. Thereby, ablation studies on motorSRNN have suggested long-term stable feedback synapses contribute to the training-induced cultivation in the motor cortex. Besides these novel findings and predictions, we offer a new framework for building authentic models of neural computation.

Submitted 14 March, 2023; originally announced March 2023.

Showing 1–50 of 229 results for author: Zhang, Y