Search | arXiv e-print repository

Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases

Authors: Pengfei Zhang, Zhihang Zheng, Shichen Zhang, Minghao Yang, Shaojun Tang

Abstract: Compared with invasive examinations that require tissue sampling, respiratory sound testing is a non-invasive examination method that is safer and easier for patients to accept. In this study, we introduce Rene, a pioneering large-scale model tailored for respiratory sound recognition. Rene has been rigorously fine-tuned with an extensive dataset featuring a broad array of respiratory audio samples, targeting disease detection, sound pattern classification, and event identification. Our innovative approach applies a pre-trained speech recognition model to process respiratory sounds, augmented with patient medical records. The resulting multi-modal deep-learning framework addresses interpretability and real-time diagnostic challenges that have hindered previous respiratory-focused models. Benchmark comparisons reveal that Rene significantly outperforms existing models, achieving improvements of 10.27%, 16.15%, 15.29%, and 18.90% in respiratory event detection and audio classification on the SPRSound database. Disease prediction accuracy on the ICBHI database improved by 23% over the baseline in both mean average and harmonic scores. Moreover, we have developed a real-time respiratory sound discrimination system utilizing the Rene architecture. Employing state-of-the-art Edge AI technology, this system enables rapid and accurate responses for respiratory sound auscultation (https://github.com/zpforlove/Rene).

Submitted 6 June, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

arXiv:2310.10893 [pdf, other]

Active Learning Framework for Cost-Effective TCR-Epitope Binding Affinity Prediction

Authors: Pengfei Zhang, Seojin Bang, Heewook Lee

Abstract: T cell receptors (TCRs) are critical components of adaptive immune systems, responsible for responding to threats by recognizing epitope sequences presented on the host cell surface. Computational prediction of binding affinity between TCRs and epitope sequences using machine/deep learning has attracted intense attention recently. However, its success is hindered by the lack of large collections of annotated TCR-epitope pairs. Annotating their binding affinity requires expensive and time-consuming wet-lab evaluation. To reduce annotation cost, we present ActiveTCR, a framework that incorporates active learning and TCR-epitope binding affinity prediction models. Starting with a small set of labeled training pairs, ActiveTCR iteratively searches for unlabeled TCR-epitope pairs that are "worth" annotating. It aims to maximize performance gains while minimizing the cost of annotation. We compared four query strategies with a random sampling baseline and demonstrated that ActiveTCR reduces annotation costs by approximately 40%. Furthermore, we showed that providing ground truth labels of TCR-epitope pairs to query strategies can help identify and reduce more than 40% redundancy among already annotated pairs without compromising model performance, enabling users to train equally powerful prediction models with less training data. Our work is the first systematic investigation of data optimization for TCR-epitope binding affinity prediction.

Submitted 30 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 10 pages, 7 figures, this paper has been accepted for publication in the proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2023
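
The query step described in the ActiveTCR abstract can be illustrated with a minimal sketch. The abstract does not name its four query strategies, so entropy-based uncertainty sampling is used here purely as an assumed example; the function names and the probability values are hypothetical and are not taken from the paper or its code.

```python
import math

def binary_entropy(p):
    """Entropy (in nats) of a binary prediction with binding probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def select_queries(pool_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled pairs."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: binary_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]

# Hypothetical predicted binding probabilities for five unlabeled TCR-epitope pairs.
pool_probs = [0.98, 0.52, 0.10, 0.45, 0.80]

# Pairs whose predictions sit closest to 0.5 are sent for annotation first.
picked = select_queries(pool_probs, budget=2)
print(picked)
```

In a full loop, the selected pairs would be labeled, added to the training set, and the model retrained before the next round of querying.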

arXiv:2307.00511 [pdf]

SUGAR: Spherical Ultrafast Graph Attention Framework for Cortical Surface Registration

Authors: Jianxun Ren, Ning An, Youjia Zhang, Danyang Wang, Zhenyu Sun, Cong Lin, Weigang Cui, Weiwei Wang, Ying Zhou, Wei Zhang, Qingyu Hu, Ping Zhang, Dan Hu, Danhong Wang, Hesheng Liu

Abstract: Cortical surface registration plays a crucial role in aligning cortical functional and anatomical features across individuals. However, conventional registration algorithms are computationally inefficient. Recently, learning-based registration algorithms have emerged as a promising solution, significantly improving processing efficiency. Nonetheless, there remains a gap in the development of a learning-based method that exceeds the state-of-the-art conventional methods simultaneously in computational efficiency, registration accuracy, and distortion control, despite the theoretically greater representational capabilities of deep learning approaches. To address the challenge, we present SUGAR, a unified unsupervised deep-learning framework for both rigid and non-rigid registration. SUGAR incorporates a U-Net-based spherical graph attention network and leverages the Euler angle representation for deformation. In addition to the similarity loss, we introduce fold and multiple distortion losses, to preserve topology and minimize various types of distortions. Furthermore, we propose a data augmentation strategy specifically tailored for spherical surface registration, enhancing the registration performance. Through extensive evaluation involving over 10,000 scans from 7 diverse datasets, we showed that our framework exhibits comparable or superior registration performance in accuracy, distortion, and test-retest reliability compared to conventional and learning-based methods. Additionally, SUGAR achieves remarkable sub-second processing times, offering a notable speed-up of approximately 12,000 times in registering 9,000 subjects from the UK Biobank dataset in just 32 minutes. This combination of high registration performance and accelerated processing time may greatly benefit large-scale neuroimaging studies.

Submitted 2 July, 2023; originally announced July 2023.

arXiv:2306.12457 [pdf, other]

Deep Dynamic Epidemiological Modelling for COVID-19 Forecasting in Multi-level Districts

Authors: Ruhan Liu, Jiajia Li, Yang Wen, Huating Li, Ping Zhang, Bin Sheng, David Dagan Feng

Abstract: Objective: COVID-19 has spread worldwide and had a huge impact. Modeling the spread of COVID-19 is essential to understand the current condition and to formulate intervention measures. Epidemiological equations based on the SEIR model simulate disease development. The traditional parameter estimation method to solve SEIR equations cannot precisely fit real-world data due to different situations, such as social distancing policies and intervention strategies. Additionally, learning-based models achieve outstanding fitting performance, but cannot visualize mechanisms. Methods: Thus, we propose a deep dynamic epidemiological (DDE) method that combines epidemiological equations and deep-learning advantages to obtain high accuracy and visualization. The DDE contains deep networks to fit the effect function to simulate the ever-changing situations based on the neural ODE method in solving variants' equations, ensuring the fitting performance of multi-level areas. Results: We introduce four SEIR variants to fit different situations in different countries and regions. We compare our DDE method with traditional parameter estimation methods (Nelder-Mead, BFGS, Powell, Truncated Newton Conjugate-Gradient, Neural ODE) in fitting the real-world data in the cases of countries (the USA, Colombia, South Africa) and regions (Wuhan in China, Piedmont in Italy). Our DDE method achieves the best Mean Square Error and Pearson coefficient in all five areas. Further, compared with the state-of-the-art learning-based approaches, the DDE outperforms all techniques, including LSTM, RNN, GRU, Random Forest, Extremely Random Trees, and Decision Tree. Conclusion: DDE presents outstanding predictive ability and visualized display of the changes in infection rates in different regions and countries.

Submitted 21 June, 2023; originally announced June 2023.
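
For readers unfamiliar with the SEIR equations that the DDE abstract builds on, the sketch below integrates the classical fixed-parameter SEIR model with a simple forward-Euler scheme. It is not the DDE method: the abstract describes replacing fixed effects with a learned function, which is omitted here. The function name and all parameter values are illustrative assumptions.

```python
def simulate_seir(s0, e0, i0, r0, beta, sigma, gamma, days, dt=0.1):
    """Integrate the classical SEIR equations with forward Euler.

    beta: transmission rate, sigma: incubation rate (E -> I),
    gamma: recovery rate (I -> R). Returns one (S, E, I, R) tuple per day.
    """
    n = s0 + e0 + i0 + r0
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    history = [(s, e, i, r)]
    steps_per_day = round(1.0 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = gamma * i * dt        # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history

# Hypothetical outbreak: 10 infectious people in a population of 10,000.
history = simulate_seir(9990, 0, 10, 0, beta=0.5, sigma=0.2, gamma=0.1, days=60)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print(peak_day, round(history[peak_day][2]))
```

With constant beta, sigma, and gamma the curve cannot react to policy changes such as social distancing, which is the limitation the abstract attributes to traditional parameter estimation.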

arXiv:2306.07505 [pdf]

Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu , et al. (22 additional authors not shown)

Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with compensated advanced chronic liver disease. 305 patients were enrolled from 12 hospitals, and finally 265 patients were included, with 1136 liver stiffness measurement (LSM) images and 1042 spleen stiffness measurement (SSM) images generated by 2D-SWE. We leveraged deep learning methods to uncover associations between image features and patient risk, and thus constructed models to predict GEV and HRV. Results: A multi-modality Deep Learning Risk Prediction model (DLRP) was constructed to assess GEV and HRV, based on LSM and SSM images, and clinical information. Validation analysis revealed that the AUCs of DLRP were 0.91 for GEV (95% CI 0.90 to 0.93, p < 0.05) and 0.88 for HRV (95% CI 0.86 to 0.89, p < 0.01), which were significantly and robustly better than canonical risk indicators, including the value of LSM and SSM. Moreover, DLRP was better than the model using individual parameters, including LSM and SSM images. In HRV prediction, the 2D-SWE images of SSM outperform LSM (p < 0.01). Conclusion: DLRP shows excellent performance in predicting GEV and HRV over canonical risk indicators LSM and SSM. Additionally, the 2D-SWE images of SSM provided more information for better accuracy in predicting HRV than the LSM.

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2302.05450 [pdf]

A network-based biomarkers discovery of Cold/Hot ZHENG chronic gastritis and Cold/Hot herbs of formulae

Authors: Boyang Wang, Pan Chen, Peng Zhang, Shao Li

Abstract: Objective: To discover biomarkers and uncover the mechanism of Cold/Hot ZHENG (syndrome in traditional Chinese medicine) chronic gastritis (CG) and Cold/Hot herbs in traditional Chinese medicine (TCM) formulae on systematic biology. Background: CG is a common inflammatory disease and the diagnosis of CG in TCM can be classified into Cold ZHENG (Asthenic Cold) and Hot ZHENG (Excess Hot). However, the molecular features of Cold/Hot ZHENG in CG and the mechanism of Cold/Hot herbs in formulae for CG remained unclear. Methods: Based on data from 35 patients with Cold/Hot ZHENG CG and 3 scRNA-seq CG samples, we conducted analyses with transcriptomics datasets and algorithms to discover biomarkers for Cold/Hot ZHENG CG. We also collected 25 formulae (with traditional effects related to Cold/Hot ZHENG) for CG and the corresponding 89 Cold/Hot herbs (including Warm/Cool herbs) to discover features and construct target networks of Cold/Hot herbs on the basis of network target and enrichment analysis. Results: Biomarkers of Cold/Hot ZHENG CG represented by CCL2 and LEP suggested that Hot ZHENG CG might be characterized by over-inflammation and exuberant metabolism, and Cold ZHENG CG showed a trend of suppression in immune regulation and energy metabolism. Biomarkers of Cold/Hot ZHENG also showed significant changes in the progression of gastric cancer. Biomarkers and pathways of Hot herbs tended to regulate immune responses and energy metabolism, while those of Cold herbs were likely to participate in anti-inflammation effects. Conclusion: In this study, we found that the biomarkers and mechanism of Cold/Hot ZHENG CG and those of Cold/Hot herbs were closely related to the regulation of immunity and metabolism. These findings may reflect the mechanism, build bridges between multiple views of Cold/Hot ZHENG and Cold/Hot herbs, and provide a research paradigm for further achieving precision TCM.

Submitted 9 February, 2023; originally announced February 2023.

Comments: 17 pages (references not included), 7 figures

arXiv:2210.16654 [pdf, other]

Biological free energy transduction is an Achilles heel of mean-field transport theory

Authors: Kiriko Terai, Jonathon L. Yuly, Peng Zhang, David N. Beratan

Abstract: Studies of nanoscale biological transport often use a mean-field approximation that is exact only when the system is at equilibrium and there are no interactions between particles on different sites in the network. We explore the limitations of this approximation to describe many-particle transport in the context of enzyme function and biological transport networks. Our focus is on three bioenergetic networks: a linear electron transfer chain (as found in bacterial nanowires), a redox-coupled proton pump (as in complex IV of respiration), and a near reversible electron bifurcation network (as in complex III of respiration and other recently discovered structures). Away from equilibrium and with typical site-site interactions, we find that the mean-field approximation adequately describes linear transport chains. However, the mean-field approximation fails catastrophically to describe energy-transducing systems, as in the redox coupled proton pump and reversible electron bifurcation reactions. The mean-field approximation fails to capture the essential correlations that are needed to prevent slippage events and to produce efficient energy transduction.

Submitted 29 October, 2022; originally announced October 2022.

Comments: 45 pages, 12 figures

arXiv:2210.01169 [pdf, other]

Neural-network solutions to stochastic reaction networks

Authors: Ying Tang, Jiayu Weng, Pan Zhang

Abstract: The stochastic reaction network in which chemical species evolve through a set of reactions is widely used to model stochastic processes in physics, chemistry and biology. To characterize the evolving joint probability distribution in the state space of species counts requires solving a system of ordinary differential equations, the chemical master equation, where the size of the counting state space increases exponentially with the number of species types, making it challenging to investigate the stochastic reaction network. Here, we propose a machine-learning approach using the variational autoregressive network to solve the chemical master equation. Training the autoregressive network employs the policy gradient algorithm in the reinforcement learning framework, which does not require any data simulated in advance by another method. Different from simulating single trajectories, the approach tracks the time evolution of the joint probability distribution, and supports direct sampling of configurations and computing their normalized joint probabilities. We apply the approach to representative examples in physics and biology, and demonstrate that it accurately generates the probability distribution over time. The variational autoregressive network exhibits plasticity in representing the multimodal distribution, cooperates with the conservation law, enables time-dependent reaction rates, and is efficient for high-dimensional reaction networks while allowing a flexible upper count limit. The results suggest a general approach to investigate stochastic reaction networks based on modern machine learning.

Submitted 7 February, 2023; v1 submitted 29 September, 2022; originally announced October 2022.
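
To make the chemical master equation in the abstract above concrete, the sketch below integrates it directly for a single-species birth-death process on a truncated state space. This is the brute-force approach whose state space grows exponentially with the number of species; it is not the variational autoregressive network the authors propose. The reaction rates, the truncation limit, and the function names are illustrative assumptions.

```python
def cme_step(p, birth, death, dt):
    """One forward-Euler step of the master equation for a birth-death process.

    p[n] is the probability of having n molecules. Births occur at a constant
    rate `birth`; each molecule degrades at rate `death`.
    """
    n_max = len(p) - 1
    new_p = list(p)
    for n in range(n_max + 1):
        # Births out of the last state are disabled so probability is conserved.
        outflow = (birth if n < n_max else 0.0) + death * n
        inflow = 0.0
        if n > 0:
            inflow += birth * p[n - 1]
        if n < n_max:
            inflow += death * (n + 1) * p[n + 1]
        new_p[n] = p[n] + dt * (inflow - outflow * p[n])
    return new_p

def solve_birth_death(birth, death, n_max, t_end, dt=0.001):
    """Evolve the distribution from zero molecules at t = 0 up to t_end."""
    p = [0.0] * (n_max + 1)
    p[0] = 1.0
    for _ in range(round(t_end / dt)):
        p = cme_step(p, birth, death, dt)
    return p

p = solve_birth_death(birth=2.0, death=1.0, n_max=30, t_end=10.0)
mean_count = sum(n * pn for n, pn in enumerate(p))
print(round(mean_count, 3))
```

For this process the stationary distribution is Poisson with mean birth/death, so the computed mean should approach 2 at long times. With several species the list `p` would need one entry per joint configuration, which is the scaling problem the abstract refers to.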

arXiv:2206.04349 [pdf, other]

doi 10.1016/j.neucom.2020.10.117

Deep radiomic signature with immune cell markers predicts the survival of glioma patients

Authors: Ahmad Chaddad, Paul Daniel Mingli Zhang, Saima Rathore, Paul Sargos, Christian Desrosiers, Tamim Niazi

Abstract: Imaging biomarkers offer a non-invasive way to predict the response of immunotherapy prior to treatment. In this work, we propose a novel type of deep radiomic features (DRFs) computed from a convolutional neural network (CNN), which capture tumor characteristics related to immune cell markers and overall survival. Our study uses four MRI sequences (T1-weighted, T1-weighted post-contrast, T2-weighted and FLAIR) with corresponding immune cell markers of 151 patients with brain tumor. The proposed method extracts a total of 180 DRFs by aggregating the activation maps of a pre-trained 3D-CNN within labeled tumor regions of MRI scans. These features offer a compact, yet powerful representation of regional texture encoding tissue heterogeneity. A comprehensive set of experiments is performed to assess the relationship between the proposed DRFs and immune cell markers, and measure their association with overall survival. Results show a high correlation between DRFs and various markers, as well as significant differences between patients grouped based on these markers. Moreover, combining DRFs, clinical features and immune cell markers as input to a random forest classifier helps discriminate between short and long survival outcomes, with AUC of 72% and $p = 2.36 \times 10^{-5}$. These results demonstrate the usefulness of proposed DRFs as non-invasive biomarker for predicting treatment response in patients with brain tumors.

Submitted 9 June, 2022; originally announced June 2022.

Journal ref: Neurocomputing, Volume 469, 16 January 2022, Pages 366-375

arXiv:2111.09656 [pdf, other]

CLMB: deep contrastive learning for robust metagenomic binning

Authors: Pengfei Zhang, Zhengyuan Jiang, Yixuan Wang, Yu Li

Abstract: The reconstruction of microbial genomes from large metagenomic datasets is a critical procedure for finding uncultivated microbial populations and defining their microbial functional roles. To achieve that, we need to perform metagenomic binning, clustering the assembled contigs into draft genomes. Despite the existing computational tools, most of them neglect one important property of the metagenomic data, that is, the noise. To further improve the metagenomic binning step and reconstruct better metagenomes, we propose a deep Contrastive Learning framework for Metagenome Binning (CLMB), which can efficiently eliminate the disturbance of noise and produce more stable and robust results. Essentially, instead of denoising the data explicitly, we add simulated noise to the training data and force the deep learning model to produce similar and stable representations for both the noise-free data and the distorted data. Consequently, the trained model will be robust to noise and handle it implicitly during usage. CLMB outperforms the previous state-of-the-art binning methods significantly, recovering the most near-complete genomes on almost all the benchmarking datasets (up to 17% more reconstructed genomes compared to the second-best method). It also improves the performance of bin refinement, reconstructing 8-22 more high-quality genomes and 15-32 more middle-quality genomes than the second-best result. Impressively, in addition to being compatible with the binning refiner, single CLMB even recovers on average 15 more HQ genomes than the refiner of VAMB and Maxbin on the benchmarking datasets. CLMB is open-source and available at https://github.com/zpf0117b/CLMB/.

Submitted 18 November, 2021; originally announced November 2021.

Comments: 20 pages, 9 figures

ACM Class: I.2.1; J.3

arXiv:2011.07639 [pdf]

doi 10.1063/5.0037517

Determining the atomic charge of calcium ion requires the information of its coordination geometry in an EF-hand motif

Authors: Pengzhi Zhang, Jaebeom Han, Piotr Cieplak, Margaret S. Cheung

Abstract: It is challenging to parameterize the force field for calcium ions (Ca2+) in calcium-binding proteins because of their unique coordination chemistry that involves the surrounding atoms required for stability. In this work, we observed wide variation in Ca2+ binding loop conformations of the Ca2+-binding protein calmodulin (CaM), which adopts the most populated ternary structures determined from the MD simulations, followed by ab initio quantum mechanical (QM) calculations on all twelve amino acids in the loop that coordinate Ca2+ in aqueous solution. Ca2+ charges were derived by fitting to the electrostatic potential (ESP) in the context of a classical or polarizable force field (PFF). We discovered that the atomic radius of Ca2+ in conventional force fields is too large for the QM calculation to capture the variation in the coordination geometry of Ca2+ in its ionic form, leading to unphysical charges. Specifically, we found that the fitted atomic charges of Ca2+ in the context of PFF depend on the coordinating geometry of electronegative atoms from the amino acids in the loop. Although nearby water molecules do not influence the atomic charge of Ca2+, they are crucial for compensating for the coordination of Ca2+ due to the conformational flexibility in the EF-hand loop. Our method advances the development of force fields for metal ions and protein binding sites in dynamic environments.

Submitted 22 March, 2021; v1 submitted 15 November, 2020; originally announced November 2020.

Comments: The following article has been accepted by Journal of Chemical Physics

Journal ref: J. Chem. Phys. 154, 124104 (2021)

arXiv:2007.07886 [pdf, other]

Clinical connectivity map for drug repurposing: using laboratory tests to bridge drugs and diseases

Authors: Qianlong Wen, Ruoqi Liu, Ping Zhang

Abstract: Drug repurposing has attracted increasing attention from both the pharmaceutical industry and the research community. Many existing computational drug repurposing methods rely on preclinical data (e.g., chemical structures, drug targets), resulting in translational problems for clinical trials. In this study, we propose a clinical connectivity map framework for drug repurposing by leveraging laboratory tests to analyze complementarity between drugs and diseases. We establish clinical drug effect vectors (i.e., drug-laboratory test associations) by applying a continuous self-controlled case series model on longitudinal electronic health record data. We establish clinical disease sign vectors (i.e., disease-laboratory test associations) by applying a Wilcoxon rank sum test on large-scale national survey data. Finally, we compute a repurposing possibility score for each drug-disease pair by applying a dot product-based scoring function on clinical disease sign vectors and clinical drug effect vectors. We comprehensively evaluate 392 drugs for 6 important chronic diseases (e.g., asthma, coronary heart disease, type 2 diabetes). We discover not only known associations between diseases and drugs but also many hidden drug-disease associations. Moreover, we are able to explain the predicted drug-disease associations via the corresponding complementarity between laboratory tests of drug effect vectors and disease sign vectors. The proposed clinical connectivity map framework uses laboratory tests from electronic clinical information to bridge drugs and diseases, which is explainable and has better translational power than existing computational methods. Experimental results demonstrate the effectiveness of the proposed framework and suggest that our method could help identify drug repurposing opportunities, which will benefit patients by offering more effective and safer treatments.

Submitted 24 July, 2020; v1 submitted 14 July, 2020; originally announced July 2020.
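
The dot product-based scoring step mentioned in the clinical connectivity map abstract can be sketched as follows. The abstract does not give the exact formula, so the sign convention here (a negated dot product, so that a drug pushing laboratory tests in the direction opposite to the disease scores higher) is an assumption, as are the function name, the drug names, and the vector values.

```python
def repurposing_score(drug_effect, disease_sign):
    """Negated dot product of a drug effect vector and a disease sign vector.

    Each position corresponds to one laboratory test. A positive score means
    the drug tends to move the tests against the disease's direction.
    """
    if len(drug_effect) != len(disease_sign):
        raise ValueError("vectors must cover the same laboratory tests")
    return -sum(d * s for d, s in zip(drug_effect, disease_sign))

# Hypothetical disease: test 1 elevated, test 2 lowered, test 3 unchanged.
disease_sign = [1.0, -1.0, 0.0]

# Hypothetical drug effect vectors over the same three laboratory tests.
drug_effects = {
    "drug_a": [-0.8, 0.5, 0.1],   # counteracts both abnormal tests
    "drug_b": [0.6, -0.2, 0.0],   # pushes tests further in the disease direction
}

ranking = sorted(
    drug_effects,
    key=lambda name: repurposing_score(drug_effects[name], disease_sign),
    reverse=True,
)
print(ranking)
```

Because each term of the sum belongs to one laboratory test, the per-test contributions can be inspected to explain why a drug ranks highly, which matches the explainability the abstract claims.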

arXiv:2005.14052 [pdf]

doi 10.1073/pnas.2010815117

Universal Free Energy Landscape Produces Efficient and Reversible Electron Bifurcation

Authors: Jonathon L. Yuly, Peng Zhang, Carolyn E. Lubner, John W. Peters, David N. Beratan

Abstract: For decades, it was unknown how electron bifurcating systems in Nature prevented energy-wasting short-circuiting reactions that have large driving forces, so synthetic electron bifurcating molecular machines could not be designed and built. The underpinning free energy landscapes for electron bifurcation were also enigmatic. We predict that a simple and universal free energy landscape enables electron bifurcation, and we show that it enables high-efficiency bifurcation with limited short-circuiting (the EB-scheme). The landscape relies on steep free energy slopes in the two redox branches to insulate against short-circuiting without relying on nuanced changes in the microscopic rate constants for the short-circuiting reactions. The EB-scheme thus provides a blueprint for future campaigns to establish synthetic electron bifurcating machines.

Submitted 21 July, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

Comments: 28 pages, 7 figures

Journal ref: PNAS, 2020. Vol. 117 (35), 21045-21051

arXiv:2004.00908 [pdf, other]

doi 10.4208/csiam-am.2020-0006

Detecting Suspected Epidemic Cases Using Trajectory Big Data

Authors: Chuansai Zhou, Wen Yuan, Jun Wang, Haiyong Xu, Yong Jiang, Xinmin Wang, Qiuzi Han Wen, Pingwen Zhang

Abstract: Emerging infectious diseases are existential threats to human health and global stability. The recent outbreaks of the novel coronavirus COVID-19 have rapidly formed a global pandemic, causing hundreds of thousands of infections and huge economic loss. The WHO declares that more precise measures to track, detect and isolate infected people are among the most effective means to quickly contain the outbreak. Based on trajectories provided by big data and the mean field theory, we establish an aggregated risk mean field that contains information of all risk-spreading particles by proposing a spatio-temporal model named HiRES risk map. It has dynamic fine spatial resolution and high computation efficiency enabling fast update. We then propose an objective individual epidemic risk scoring model named HiRES-p based on HiRES risk maps, and use it to develop statistical inference and machine learning methods for detecting suspected epidemic-infected individuals. We conduct numerical experiments by applying the proposed methods to study the early outbreak of COVID-19 in China. Results show that the HiRES risk map has a strong ability to capture the global trend and local variability of the epidemic risk, and thus can be applied to monitor epidemic risk at country, province, city and community levels, as well as at specific high-risk locations such as hospitals and stations. The HiRES-p score seems to be an effective measurement of personal epidemic risk. The accuracy of both detection methods is above 90% when the population infection rate is under 20%, which indicates great application potential in epidemic risk prevention and control practice.

Submitted 15 April, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

Journal ref: CSIAM Transactions on Applied Mathematics. 1(2020).186-206

arXiv:1902.03429 [pdf]

Clustering Bioactive Molecules in 3D Chemical Space with Unsupervised Deep Learning

Authors: Chu Qin, Ying Tan, Shang Ying Chen, Xian Zeng, Xingxing Qi, Tian Jin, Huan Shi, Yiwei Wan, Yu Chen, Jingfeng Li, Weidong He, Yali Wang, Peng Zhang, Feng Zhu, Hongping Zhao, Yuyang Jiang, Yuzong Chen

Abstract: Unsupervised clustering has broad applications in data stratification, pattern investigation and new discovery beyond existing knowledge. In particular, clustering of bioactive molecules facilitates chemical space mapping, structure-activity studies, and drug discovery. These tasks, conventionally conducted by similarity-based methods, are complicated by data complexity and diversity. We explored the superior learning capability of deep autoencoders for unsupervised clustering of 1.39 million bioactive molecules into band-clusters in a 3-dimensional latent chemical space. These band-clusters, displayed by a space-navigation simulation software, band molecules of selected bioactivity classes into individual band-clusters possessing unique sets of common sub-structural features beyond structural similarity. These sub-structural features form the frameworks of the literature-reported pharmacophores and privileged fragments. Within each band-cluster, molecules are further banded into selected sub-regions with respect to their bioactivity target, sub-structural features and molecular scaffolds. Our method is potentially applicable for big data clustering tasks of different fields.

Submitted 9 February, 2019; originally announced February 2019.

arXiv:1812.04994 [pdf, ps, other]

Bayesian deep neural networks for low-cost neurophysiological markers of Alzheimer's disease severity

Authors: Wolfgang Fruehwirt, Adam D. Cobb, Martin Mairhofer, Leonard Weydemann, Heinrich Garn, Reinhold Schmidt, Thomas Benke, Peter Dal-Bianco, Gerhard Ransmayr, Markus Waser, Dieter Grossegger, Pengfei Zhang, Georg Dorffner, Stephen Roberts

Abstract: As societies around the world are ageing, the number of Alzheimer's disease (AD) patients is rapidly increasing. To date, no low-cost, non-invasive biomarkers have been established to advance the objectivization of AD diagnosis and progression assessment. Here, we utilize Bayesian neural networks to develop a multivariate predictor for AD severity using a wide range of quantitative EEG (QEEG) markers. The Bayesian treatment of neural networks both automatically controls model complexity and provides a predictive distribution over the target function, giving uncertainty bounds for our regression task. It is therefore well suited to clinical neuroscience, where data sets are typically sparse and practitioners require a precise assessment of the predictive uncertainty. We use data of one of the largest prospective AD EEG trials ever conducted to demonstrate the potential of Bayesian deep learning in this domain, while comparing two distinct Bayesian neural network approaches, i.e., Monte Carlo dropout and Hamiltonian Monte Carlo.

Submitted 13 December, 2018; v1 submitted 12 December, 2018; originally announced December 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

arXiv:1809.09553 [pdf]

Prediction of Coronary Heart Disease Using Routine Blood Tests

Authors: Ning Meng, Peng Zhang, Junfeng Li, Jun He, Jin Zhu

Abstract: Background: The objective of this study was to examine the association of routine blood test results with coronary heart disease (CHD) risk, to incorporate them into coronary prediction models and to compare the discrimination properties of this approach with other prediction functions. Methods and Results: This work was designed as a retrospective, single-center study of a hospital-based cohort. The 5060 CHD patients (2365 men and 2695 women) were 1 to 97 years old at baseline with 8 years (2009-2017) of medical records, 5051 health check-ups and 5075 cases of other diseases. We developed a two-layer Gradient Boosting Decision Tree (GBDT) model based on routine blood data to predict the risk of coronary heart disease, which could identify 86% of people with coronary heart disease. We built a dataset with 15,000 routine blood test results. Using this dataset, we trained the two-layer GBDT model to classify healthy status, coronary heart disease and other diseases. As a result of the classification after machine learning, we found that the sensitivity of detecting the health data was approximately 93% for all data, and the sensitivity of detecting CHD was 93% for disease data that included coronary heart disease. On this basis, we further visualized the correlation between routine blood results and related data items, and there was an obvious pattern in health and coronary heart disease in all data presentations, which can be used for clinical reference. Finally, we briefly analyzed the results above from the perspective of pathophysiology. Conclusions: Routine blood data provides more information about CHD than what we already know through the correlation between test results and related data items. A simple coronary disease prediction model was developed using a GBDT algorithm, which will allow physicians to predict CHD risk in patients without overt CHD.

Submitted 11 September, 2018; originally announced September 2018.

arXiv:1711.08359 [pdf, ps, other]

Riemannian tangent space mapping and elastic net regularization for cost-effective EEG markers of brain atrophy in Alzheimer's disease

Authors: Wolfgang Fruehwirt, Matthias Gerstgrasser, Pengfei Zhang, Leonard Weydemann, Markus Waser, Reinhold Schmidt, Thomas Benke, Peter Dal-Bianco, Gerhard Ransmayr, Dieter Grossegger, Heinrich Garn, Gareth W. Peters, Stephen Roberts, Georg Dorffner

Abstract: The diagnosis of Alzheimer's disease (AD) in routine clinical practice is most commonly based on subjective clinical interpretations. Quantitative electroencephalography (QEEG) measures have been shown to reflect neurodegenerative processes in AD and might qualify as affordable and thereby widely available markers to facilitate the objectivization of AD assessment. Here, we present a novel framework combining Riemannian tangent space mapping and elastic net regression for the development of brain atrophy markers. While most AD QEEG studies are based on small sample sizes and psychological test scores as outcome measures, here we train and test our models using data of one of the largest prospective EEG AD trials ever conducted, including MRI biomarkers of brain atrophy.

Submitted 22 November, 2017; originally announced November 2017.

Comments: Presented at NIPS 2017 Workshop on Machine Learning for Health

arXiv:1612.08444 [pdf]

doi 10.1038/ncomms13689

In vitro protease cleavage and computer simulations reveal the HIV-1 capsid maturation pathway

Authors: Jiying Ning, Gonca Erdemci-Tandogan, Ernest L Yufenyuy, Jef Wagner, Benjamin A Himes, Gongpu Zhao, Christopher Aiken, Roya Zandi, Peijun Zhang

Abstract: HIV-1 virions assemble as immature particles containing Gag polyproteins that are processed by the viral protease into individual components, resulting in the formation of mature infectious particles. There are two competing models for the process of forming the mature HIV-1 core: the disassembly and de novo reassembly model and the non-diffusional displacive model. To study the maturation pathway, we simulate HIV-1 maturation in vitro by digesting immature particles and assembled virus-like particles with recombinant HIV-1 protease and monitor the process with biochemical assays and cryoEM structural analysis in parallel. Processing of Gag in vitro is accurate and efficient and results in both soluble capsid protein and conical or tubular capsid assemblies, seemingly converted from immature Gag particles. Computer simulations further reveal probable assembly pathways of HIV-1 capsid formation. Combining the experimental data and computer simulations, our results suggest a sequential combination of both displacive and disassembly/reassembly processes for HIV-1 maturation.

Submitted 26 December, 2016; originally announced December 2016.

Journal ref: Nature Communications 2016; 7: 13689

arXiv:1410.5723 [pdf]

DNA methylation variation in Arabidopsis has a genetic basis and shows evidence of local adaptation

Authors: Manu J. Dubin, Pei Zhang, Dazhe Meng, Marie-Stanislas Remigereau, Edward J. Osborne, Francesco Paolo Casale, Phillip Drewe, André Kahles, Bjarni Vilhjálmsson, Joanna Jagoda, Selen Irez, Viktor Voronin, Qiang Song, Quan Long, Gunnar Rätsch, Oliver Stegle, Richard M. Clark, Magnus Nordborg

Abstract: Epigenome modulation in response to the environment potentially provides a mechanism for organisms to adapt, both within and between generations. However, neither the extent to which this occurs, nor the molecular mechanisms involved are known. Here we investigate DNA methylation variation in Swedish Arabidopsis thaliana accessions grown at two different temperatures. Environmental effects on DNA methylation were limited to transposons, where CHH methylation was found to increase with temperature. Genome-wide association mapping revealed that the extensive CHH methylation variation was strongly associated with genetic variants in both cis and trans, including a major trans-association close to the DNA methyltransferase CMT2. Unlike CHH methylation, CpG gene body methylation (GBM) on the coding region of genes was not affected by growth temperature, but was instead strongly correlated with the latitude of origin. Accessions from colder regions had higher levels of GBM for a significant fraction of the genome, and this was correlated with elevated transcription levels for the genes affected. Genome-wide association mapping revealed that this effect was largely due to trans-acting loci, a significant fraction of which showed evidence of local adaptation. These findings constitute the first direct link between DNA methylation and adaptation to the environment, and provide a basis for further dissecting how environmentally driven and genetically determined epigenetic variation interact and influence organismal fitness.

Submitted 21 October, 2014; originally announced October 2014.

Comments: 38 pages 4 figures

arXiv:1302.0189 [pdf, other]

doi 10.1109/ICCW.2013.6649458

Non-adaptive pooling strategies for detection of rare faulty items

Authors: Pan Zhang, Florent Krzakala, Marc Mézard, Lenka Zdeborová

Abstract: We study non-adaptive pooling strategies for detection of rare faulty items. Given a binary sparse N-dimensional signal x, how to construct a sparse binary MxN pooling matrix F such that the signal can be reconstructed from the smallest possible number M of measurements y=Fx? We show that a very low number of measurements is possible for random spatially coupled design of pools F. Our design might find application in genetic screening or compressed genotyping. We show that our results are robust with respect to the uncertainty in the matrix F when some elements are mistaken.

Submitted 1 February, 2013; originally announced February 2013.

Comments: 5 pages

Journal ref: IEEE International Conference on Communications Workshops (ICC 2013), Pages: 1409 - 1414, (2013)
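The measurement model y=Fx from the pooling abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the pools are drawn uniformly at random (not the spatially coupled design the abstract describes), the decoder is a simple elimination rule rather than the authors' reconstruction method, and the names `make_pools`, `measure`, and `eliminate` as well as all sizes are hypothetical.

```python
import random


def make_pools(n_items, n_pools, pool_size, rng):
    """Sparse binary pooling matrix F, stored as one set of item indices per pool (row)."""
    return [set(rng.sample(range(n_items), pool_size)) for _ in range(n_pools)]


def measure(pools, x):
    """Compute y = F x: each measurement counts the faulty items in its pool."""
    return [sum(x[i] for i in pool) for pool in pools]


def eliminate(pools, y, n_items):
    """Simple decoder (assumption): any item in a pool with y == 0 cannot be faulty."""
    cleared = set()
    for pool, count in zip(pools, y):
        if count == 0:
            cleared |= pool
    return set(range(n_items)) - cleared


# Hypothetical sizes: N = 200 items, 4 of them faulty, M = 40 pools of 20 items each.
rng = random.Random(0)
n_items = 200
faulty = set(rng.sample(range(n_items), 4))
x = [1 if i in faulty else 0 for i in range(n_items)]

pools = make_pools(n_items, n_pools=40, pool_size=20, rng=rng)
y = measure(pools, x)
candidates = eliminate(pools, y, n_items)

print(f"{len(y)} measurements leave {len(candidates)} candidate items out of {n_items}")
```

The elimination rule never discards a truly faulty item, because a pool containing one always has a nonzero count; it only narrows the candidate set, and a full reconstruction would need a stronger decoder.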

arXiv:q-bio/0611064 [pdf, ps, other]

doi 10.1140/epjb/e2007-00278-0

Frequency and phase synchronization of two coupled neurons with channel noise

Authors: L. C. Yu, Yong Chen, Pan Zhang

Abstract: We study the frequency and phase synchronization in two coupled identical and nonidentical neurons with channel noise. The occupation number method is used to model the neurons in the context of stochastic Hodgkin-Huxley model in which the strength of channel noise is represented by ion channel cluster size of the initiation region of neuron. It is shown that frequency synchronization only was achieved at arbitrary value of couple strength as long as two neurons' channel cluster sizes are the same. We also show that the relative phase of neurons can display profuse dynamic behavior under the combined action of coupling and channel noise. Both qualitative and quantitative descriptions are applied to describe the transitions between those behaviors. Relevance of our findings to controlling neural synchronization experimentally is discussed.

Submitted 6 November, 2007; v1 submitted 20 November, 2006; originally announced November 2006.

Comments: 8 pages, 10 figures

Journal ref: Eur. Phys. J. B 59, 249-257(2007)

arXiv:q-bio/0608037 [pdf, ps, other]

doi 10.1088/1367-2630/9/7/220

Network growth approach to macroevolution

Authors: Shao-Meng Qin, Yong Chen, Pan Zhang

Abstract: We propose a novel network growth model coupled with the competition interaction to simulate macroevolution. Our work shows that the competition plays an important role in macroevolution and it is more rational to describe the interaction between species by network structures. Our model presents a complete picture of the development of phyla and the splitting process. It is found that periodic mass extinction occurred in our networks without any extraterrestrial factors and the lifetime distribution of species is very close to fossil record. We also perturb networks with two scenarios of mass extinctions on different hierarchic levels in order to study their recovery.

Submitted 11 July, 2007; v1 submitted 25 August, 2006; originally announced August 2006.

Comments: 16 pages, 7 figures, published version

Journal ref: New J. Phys. 9 (2007) 220

Showing 1–23 of 23 results for author: Zhang, P