-
Dimeric Drug Polymeric Micelles with Acid-Active Tumor Targeting and FRET-indicated Drug Release
Authors:
Xing Guo,
Lin Wang,
Kayla Duval,
Jing Fan,
Shaobing Zhou,
Zi Chen
Abstract:
Trans-activating transcriptional activator (TAT), a cell-penetrating peptide, has been extensively used for facilitating cellular uptake and nuclear targeting of drug delivery systems. However, the positively charged TAT peptide usually interacts strongly with serum components and undergoes substantial phagocytosis by the reticuloendothelial system, resulting in a short blood circulation time in vivo. In this work, an acid-active tumor targeting nanoplatform, DA-TAT-PECL, was developed to effectively inhibit the nonspecific interactions of TAT in the bloodstream. 2,3-dimethylmaleic anhydride (DA) was first used to convert the TAT amines to carboxylic acids; the resulting DA-TAT was then conjugated to yield DA-TAT-PECL. After self-assembly, the polymeric micelles were capable of circulating under physiological conditions for a long time and of promoting cell penetration once they accumulated at the tumor site and the DA group was de-shielded. Moreover, camptothecin (CPT) was used as the anticancer drug and modified into a dimer, (CPT)2-ss-Mal, in which two CPT molecules were connected by a reduction-labile maleimide thioether bond. The FRET signal between CPT and the maleimide thioether bond was monitored to visualize the drug release process, and effective targeted delivery of antitumor drugs was demonstrated. This pH/reduction dual-responsive micelle system provides a new platform for high-fidelity cancer therapy.
Submitted 30 July, 2024;
originally announced July 2024.
-
MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction
Authors:
Yuyan Liu,
Sirui Ding,
Sheng Zhou,
Wenqi Fan,
Qiaoyu Tan
Abstract:
Molecular property prediction (MPP) is a fundamental and crucial task in drug discovery. However, prior methods are limited by the requirement for a large number of labeled molecules and by their restricted ability to generalize to unseen and new tasks, both of which are essential for real-world applications. To address these challenges, we present MolecularGPT for few-shot MPP. From the perspective of instruction tuning, we fine-tune large language models (LLMs) on curated molecular instructions spanning over 1000 property prediction tasks. This enables building a versatile and specialized LLM that can be adapted to novel MPP tasks without any fine-tuning through zero- and few-shot in-context learning (ICL). MolecularGPT exhibits competitive in-context reasoning capabilities across 10 downstream evaluation datasets, setting new benchmarks for few-shot molecular prediction tasks. More importantly, with just two-shot examples, MolecularGPT can outperform standard supervised graph neural network methods on 4 out of 7 datasets. It also outperforms state-of-the-art LLM baselines in the zero-shot setting, with up to a 16.6% increase in classification accuracy and a decrease of 199.17 in regression metrics (e.g., RMSE). This study demonstrates the potential of LLMs as effective few-shot molecular property predictors. The code is available at https://github.com/NYUSHCS/MolecularGPT.
Submitted 18 June, 2024;
originally announced June 2024.
-
Adapting Differential Molecular Representation with Hierarchical Prompts for Multi-label Property Prediction
Authors:
Linjia Kang,
Songhua Zhou,
Shuyan Fang,
Shichao Liu,
Wen Zhang
Abstract:
Accurate prediction of molecular properties is critical in the field of drug discovery. However, existing methods do not fully consider the fact that molecules in the real world usually possess multiple property labels, and complex high-order relationships may exist among these labels. Therefore, molecular representation learning models should generate differential molecular representations that consider multi-granularity correlation information among tasks. To this end, our research introduces a Hierarchical Prompted Molecular Representation Learning Framework (HiPM), which enhances the differential expression of tasks in molecular representations through task-aware prompts, and utilizes shared information among labels to mitigate negative transfer between different tasks. HiPM primarily consists of two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP). The MRE employs a hierarchical message-passing network architecture to capture molecular features at both the atomic and motif levels, while the TAP uses agglomerative hierarchical clustering to build a prompt tree that reflects the affinity and distinctiveness of tasks, enabling the model to effectively handle the complexity of multi-label property predictions. Extensive experiments demonstrate that HiPM achieves state-of-the-art performance across various multi-label datasets, offering a new perspective on multi-label molecular representation learning.
Submitted 28 May, 2024;
originally announced May 2024.
-
Group-specific discriminant analysis reveals statistically validated sex differences in lateralization of brain functional network
Authors:
Shuo Zhou,
Junhao Luo,
Yaya Jiang,
Haolin Wang,
Haiping Lu,
Gaolang Gong
Abstract:
Lateralization is a fundamental feature of the human brain, where sex differences have been observed. Conventional studies in neuroscience on sex-specific lateralization are typically conducted on univariate statistical comparisons between male and female groups. However, these analyses often lack effective validation of group specificity. Here, we formulate modeling sex differences in lateralization of functional networks as a dual-classification problem, consisting of first-order classification for left vs. right functional networks and second-order classification for male vs. female models. To capture sex-specific patterns, we develop the Group-Specific Discriminant Analysis (GSDA) for first-order classification. The evaluation on two public neuroimaging datasets demonstrates the efficacy of GSDA in learning sex-specific models from functional networks, achieving a significant improvement in group specificity over baseline methods. The major sex differences are in the strength of lateralization and the interactions within and between lobes. The GSDA-based method is generic in nature and can be adapted to other group-specific analyses such as handedness-specific or disease-specific analyses.
Submitted 8 April, 2024;
originally announced April 2024.
-
Dual-Space Optimization: Improved Molecule Sequence Design by Latent Prompt Transformer
Authors:
Deqian Kong,
Yuhao Huang,
Jianwen Xie,
Edouardo Honig,
Ming Xu,
Shuanghong Xue,
Pei Lin,
Sanping Zhou,
Sheng Zhong,
Nanning Zheng,
Ying Nian Wu
Abstract:
Designing molecules with desirable properties, such as drug-likeliness and high binding affinities towards protein targets, is a challenging problem. In this paper, we propose the Dual-Space Optimization (DSO) method that integrates latent space sampling and data space selection to solve this problem. DSO iteratively updates a latent space generative model and a synthetic dataset in an optimization process that gradually shifts the generative model and the synthetic data towards regions of desired property values. Our generative model takes the form of a Latent Prompt Transformer (LPT) where the latent vector serves as the prompt of a causal transformer. Our extensive experiments demonstrate effectiveness of the proposed method, which sets new performance benchmarks across single-objective, multi-objective and constrained molecule design tasks.
Submitted 26 February, 2024;
originally announced February 2024.
-
Clinical Applications of Plantar Pressure Measurement
Authors:
Kelsey Detels,
David Shin,
Harrison Wilson,
Shanni Zhou,
Andrew Chen,
Jessica Rosendorf,
Atta Taseh,
Bardiya Akhbari,
Joseph H. Schwab,
Hamid Ghaednia
Abstract:
Plantar pressure measurements can provide valuable insight into various health characteristics in patients. In this study, we describe different plantar pressure devices available on the market and their clinical relevance. Current devices are either platform-based or wearable and consist of a variety of sensor technologies: resistive, capacitive, piezoelectric, and optical. The measurements collected from any of these sensors can be utilized for a range of clinical applications including patients with diabetes, trauma, deformity and cerebral palsy, stroke, cervical myelopathy, ankle instability, sports injuries, and Parkinson's disease. However, the proper technology should be selected based on the clinical need and the type of tests being performed on the device. In this review, we provide the reader with a simple overview of the existing technologies, their advantages and disadvantages, and application examples for each. Moreover, we suggest new areas in orthopaedics in which plantar pressure mapping technology can be utilized for increased quality of care.
Submitted 9 January, 2024;
originally announced January 2024.
-
The Required Spatial Resolution to Assess Imbalance using Plantar Pressure Mapping
Authors:
Kelsey Detels,
Shanni Zhou,
Harrison Wilson,
Jessica Rosendorf,
Ghazal Shabestanipour,
Elias Ben Mellouk,
David Shin,
Joseph Schwab,
Hamid Ghaednia
Abstract:
Roughly 1/3 of adults older than 65 fall each year, resulting in more than 3 million emergency room visits, thousands of deaths, and over $50 billion in direct costs. The Centers for Disease Control and Prevention (CDC) estimates that 1/3 of falls are preventable with effective mitigation strategies, particularly for imbalance. Therefore, quantification of imbalance has been studied extensively in recent years. In this study, we investigate the feasibility of plantar pressure mapping in balance assessment through a healthy human subject study. We used an in-house, high-precision plantar pressure mapping device based on Frustrated Total Internal Reflection to measure subjects' sway during the Romberg test. From the measurements obtained from all subjects, we determined the minimum spatial resolution required for plantar pressure mapping devices in the assessment of balance. We conclude that most of the current devices on the market do not meet the requirements for imbalance measurements.
Submitted 18 April, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Molecular Property Prediction Based on Graph Structure Learning
Authors:
Bangyi Zhao,
Weixia Xu,
Jihong Guan,
Shuigeng Zhou
Abstract:
Molecular property prediction (MPP) is a fundamental but challenging task in the computer-aided drug discovery process. More and more recent works employ different graph-based models for MPP, which have made considerable progress in improving prediction performance. However, current models often ignore relationships between molecules, which could also be helpful for MPP. To this end, in this paper we propose a graph structure learning (GSL) based MPP approach, called GSL-MPP. Specifically, we first apply a graph neural network (GNN) over molecular graphs to extract molecular representations. Then, with molecular fingerprints, we construct a molecular similarity graph (MSG). Following that, we conduct graph structure learning on the MSG (i.e., molecule-level graph structure learning) to get the final molecular embeddings, which are the results of fusing both GNN-encoded molecular representations and the relationships among molecules, i.e., combining both intra-molecule and inter-molecule information. Finally, we use these molecular embeddings to perform MPP. Extensive experiments on seven benchmark datasets show that our method achieves state-of-the-art performance in most cases, especially on classification tasks. Further visualization studies also demonstrate the good molecular representations of our method.
Submitted 28 December, 2023;
originally announced December 2023.
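The molecular similarity graph (MSG) step in the GSL-MPP abstract above can be illustrated with a minimal sketch. The abstract does not state which similarity measure or edge rule is used, so the Tanimoto coefficient, the 0.5 threshold, the function names, and the toy fingerprints (sets of on-bit indices) below are all assumptions made for illustration.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def build_similarity_graph(fingerprints, threshold=0.5):
    """Return {(i, j): similarity} for molecule pairs at or above the threshold.

    The threshold rule is an assumed edge criterion, not taken from the paper.
    """
    edges = {}
    for (i, fp_i), (j, fp_j) in combinations(enumerate(fingerprints), 2):
        sim = tanimoto(fp_i, fp_j)
        if sim >= threshold:
            edges[(i, j)] = sim
    return edges


# Hypothetical fingerprints: molecules 0 and 1 share three of five distinct bits.
fingerprints = [{1, 2, 3, 4}, {2, 3, 4, 5}, {10, 11}]
graph = build_similarity_graph(fingerprints)
```

In this toy input only molecules 0 and 1 are connected; molecule 2 shares no bits with the others and stays isolated. A learned graph structure, as the abstract describes, would refine such an initial graph rather than keep it fixed.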
-
scRNA-seq Data Clustering by Cluster-aware Iterative Contrastive Learning
Authors:
Weikang Jiang,
Jinxian Wang,
Jihong Guan,
Shuigeng Zhou
Abstract:
Single-cell RNA sequencing (scRNA-seq) enables researchers to analyze gene expression at the single-cell level. One important task in scRNA-seq data analysis is unsupervised clustering, which helps identify distinct cell types, laying the foundation for other downstream analysis tasks. In this paper, we propose a novel method called Cluster-aware Iterative Contrastive Learning (CICL for short) for scRNA-seq data clustering, which utilizes an iterative representation learning and clustering framework to progressively learn the clustering structure of scRNA-seq data with a cluster-aware contrastive loss. CICL consists of a Transformer encoder, a clustering head, a projection head and a contrastive loss module. First, CICL extracts the feature vectors of the original and augmented data with the Transformer encoder. Then, in the clustering head, it computes the clustering centroids by K-means and employs the Student's t-distribution to assign pseudo-labels to all cells. The projection head uses a Multi-Layer Perceptron (MLP) to obtain projections of the augmented data. Finally, both pseudo-labels and projections are used in the contrastive loss to guide the model training. This process is repeated iteratively so that the clustering result progressively improves. Extensive experiments on 25 real-world scRNA-seq datasets show that CICL outperforms the SOTA methods. Concretely, on average, CICL surpasses the existing methods by 14% to 280% in terms of ARI and by 5% to 133% in terms of NMI.
Submitted 27 December, 2023;
originally announced December 2023.
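The clustering-head step in the CICL abstract above (centroids from K-means, then pseudo-labels via a Student's t-distribution) can be sketched as follows. The abstract gives no formula, so this sketch assumes the Student's t kernel commonly used in deep embedded clustering, q_ij proportional to (1 + ||z_i - mu_j||^2 / alpha)^(-(alpha + 1) / 2); the function name, the alpha default, and the toy centroids and cell embeddings are hypothetical.

```python
def student_t_assign(embeddings, centroids, alpha=1.0):
    """Soft-assign each embedding to the centroids with a Student's t kernel.

    Returns (soft_assignments, pseudo_labels). The kernel form is an
    assumption borrowed from deep embedded clustering, not from CICL itself.
    """
    soft, labels = [], []
    for z in embeddings:
        weights = []
        for mu in centroids:
            # Squared Euclidean distance between the embedding and the centroid.
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
            weights.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        q = [w / total for w in weights]  # normalise so each row sums to 1
        soft.append(q)
        # Pseudo-label = most probable cluster (first index wins on ties).
        labels.append(max(range(len(q)), key=q.__getitem__))
    return soft, labels


# Toy data: two 2-D centroids, as might come from a K-means run (made-up values).
centroids = [(0.0, 0.0), (4.0, 4.0)]
cells = [(0.5, 0.2), (3.8, 4.1), (2.0, 2.0)]
soft, labels = student_t_assign(cells, centroids)
```

The third cell is equidistant from both centroids, so its soft assignment is split evenly; such low-confidence cells are where an iterative scheme has the most room to revise pseudo-labels between rounds.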
-
Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules with Desirable Properties
Authors:
Siyuan Guo,
Jihong Guan,
Shuigeng Zhou
Abstract:
In the past decade, artificial intelligence-driven drug design and discovery has been a hot research topic, where an important branch is molecule generation by generative models, from GAN-based models and VAE-based models to the latest diffusion-based models. However, most existing models pursue only basic properties such as validity and uniqueness of the generated molecules, and a few go further to explicitly optimize a single important molecular property (e.g., QED or PlogP), which leaves most generated molecules of little use in practice. In this paper, we present a novel approach to generating molecules with desirable properties, which expands the diffusion model framework with multiple innovative designs. The novelty is two-fold. On the one hand, considering that the structures of molecules are complex and diverse, and that molecular properties are usually determined by some substructures (e.g., pharmacophores), we propose to perform diffusion on two structural levels, molecules and molecular fragments, with which a mixed Gaussian distribution is obtained for the reverse diffusion process. To get desirable molecular fragments, we develop a novel electronic-effect-based fragmentation method. On the other hand, we introduce two ways to explicitly optimize multiple molecular properties under the diffusion model framework. First, as potential drug molecules must be chemically valid, we optimize molecular validity by an energy-guidance function. Second, since potential drug molecules should be desirable in various properties, we employ a multi-objective mechanism to optimize multiple molecular properties simultaneously. Extensive experiments with two benchmark datasets, QM9 and ZINC250k, show that the molecules generated by our proposed method have better validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP than those generated by current SOTA models.
Submitted 5 October, 2023;
originally announced October 2023.
-
Genetic Analysis of Prostate Cancer with Computer Science Methods
Authors:
Yuxuan Li,
Shi Zhou
Abstract:
Metastatic prostate cancer is one of the most common cancers in men. In the advanced stages of prostate cancer, tumours can metastasise to other tissues in the body, which is fatal. In this thesis, we performed a genetic analysis of prostate cancer tumours at different metastatic sites using data science, machine learning and topological network analysis methods. We presented a general procedure for pre-processing gene expression datasets and pre-filtering significant genes by analytical methods. We then used machine learning models for further key gene filtering and secondary site tumour classification. Finally, we performed gene co-expression network analysis and community detection on samples from different prostate cancer secondary site types. In this work, 13 of the 14,379 genes were selected as the most metastatic prostate cancer related genes, achieving approximately 92% accuracy under cross-validation. In addition, we provide preliminary insights into the co-expression patterns of genes in gene co-expression networks. Project code is available at https://github.com/zcablii/Master_cancer_project.
Submitted 28 March, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Tensor-based Multimodal Learning for Prediction of Pulmonary Arterial Wedge Pressure from Cardiac MRI
Authors:
Prasun C. Tripathi,
Mohammod N. I. Suvon,
Lawrence Schobs,
Shuo Zhou,
Samer Alabed,
Andrew J. Swift,
Haiping Lu
Abstract:
Heart failure is a serious and life-threatening condition that can lead to elevated pressure in the left ventricle. Pulmonary Arterial Wedge Pressure (PAWP) is an important surrogate marker indicating high pressure in the left ventricle. PAWP is determined by Right Heart Catheterization (RHC) but it is an invasive procedure. A non-invasive method is useful in quickly identifying high-risk patients from a large population. In this work, we develop a tensor learning-based pipeline for identifying PAWP from multimodal cardiac Magnetic Resonance Imaging (MRI). This pipeline extracts spatial and temporal features from high-dimensional scans. For quality control, we incorporate an epistemic uncertainty-based binning strategy to identify poor-quality training samples. To improve the performance, we learn complementary information by integrating features from multimodal data: cardiac MRI with short-axis and four-chamber views, and Electronic Health Records. The experimental analysis on a large cohort of 1346 subjects who underwent the RHC procedure for PAWP estimation indicates that the proposed pipeline has a diagnostic value and can produce promising performance with significant improvement over the baseline in clinical practice (i.e., ΔAUC = 0.10, ΔAccuracy = 0.06, and ΔMCC = 0.39). The decision curve analysis further confirms the clinical utility of our method.
Submitted 6 April, 2024; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Molecular Property Prediction by Semantic-invariant Contrastive Learning
Authors:
Ziqiao Zhang,
Ailin Xie,
Jihong Guan,
Shuigeng Zhou
Abstract:
Contrastive learning has been widely used as a pretext task for self-supervised pre-trained molecular representation learning models in AI-aided drug design and discovery. However, existing methods that generate molecular views by noise-adding operations for contrastive learning may face the semantic inconsistency problem, which leads to false positive pairs and consequently poor prediction performance. To address this problem, in this paper we first propose a semantic-invariant view generation method by properly breaking molecular graphs into fragment pairs. Then, we develop a Fragment-based Semantic-Invariant Contrastive Learning (FraSICL) model based on this view generation method for molecular property prediction. The FraSICL model consists of two branches to generate representations of views for contrastive learning, while a multi-view fusion and an auxiliary similarity loss are introduced to make better use of the information contained in different fragment-pair views. Extensive experiments on various benchmark datasets show that with the least number of pre-training samples, FraSICL can achieve state-of-the-art performance compared with major existing counterpart models.
Submitted 13 March, 2023;
originally announced March 2023.
-
Activity Cliff Prediction: Dataset and Benchmark
Authors:
Ziqiao Zhang,
Bangyi Zhao,
Ailin Xie,
Yatao Bian,
Shuigeng Zhou
Abstract:
Activity cliffs (ACs), which are generally defined as pairs of structurally similar molecules that are active against the same bio-target but significantly different in binding potency, are of great importance to drug discovery. To date, the AC prediction problem, i.e., predicting whether a pair of molecules exhibits the AC relationship, has not been fully explored. In this paper, we first introduce ACNet, a large-scale dataset for AC prediction. ACNet curates over 400K Matched Molecular Pairs (MMPs) against 190 targets, including over 20K MMP-cliffs and 380K non-AC MMPs, and provides five subsets for model development and evaluation. Then, we propose a baseline framework to benchmark the predictive performance of molecular representations encoded by deep neural networks for AC prediction, and 16 models are evaluated in experiments. Our experimental results show that deep learning models can achieve good performance when they are trained on tasks with an adequate amount of data, while the imbalanced, low-data and out-of-distribution features of the ACNet dataset still make it challenging for deep neural networks to cope with. In addition, the traditional ECFP method shows a natural advantage on MMP-cliff prediction and outperforms the deep learning models on most of the data subsets. To the best of our knowledge, our work constructs the first large-scale dataset for AC prediction, which may stimulate the study of AC prediction models and prompt further breakthroughs in AI-aided drug discovery. The code and dataset can be accessed at https://drugai.github.io/ACNet/.
Submitted 15 February, 2023;
originally announced February 2023.
-
Biofilms as self-shaping growing nematics
Authors:
Japinder Nijjer,
Mrityunjay Kothari,
Changhao Li,
Thomas Henzel,
Qiuting Zhang,
Jung-Shen B. Tai,
Shuang Zhou,
Sulin Zhang,
Tal Cohen,
Jing Yan
Abstract:
Active nematics are the nonequilibrium analog of passive liquid crystals in which anisotropic units consume free energy to drive emergent behavior. Similar to liquid crystal (LC) molecules in displays, ordering and dynamics in active nematics are sensitive to boundary conditions; however, unlike passive liquid crystals, active nematics, such as those composed of living matter, have the potential to regulate their boundaries through self-generated stresses. Here, using bacterial biofilms confined by a hydrogel as a model system, we show how a three-dimensional, living nematic can actively shape itself and its boundary in order to regulate its internal architecture through growth-induced stresses. We show that biofilms exhibit a sharp transition in shape from domes to lenses upon changing environmental stiffness or cell-substrate friction, which is explained by a theoretical model considering the competition between confinement and interfacial forces. The growth mode defines the progression of the boundary, which in turn determines the trajectories and spatial distribution of cell lineages. We further demonstrate that the evolving boundary defines the orientational ordering of cells and the emergence of topological defects in the interior of the biofilm. Our findings reveal novel self-organization phenomena in confined active matter and provide strategies for guiding the development of programmed microbial consortia with emergent material properties.
Submitted 7 October, 2022;
originally announced October 2022.
-
Can Pre-trained Models Really Learn Better Molecular Representations for AI-aided Drug Discovery?
Authors:
Ziqiao Zhang,
Yatao Bian,
Ailin Xie,
Pengju Han,
Long-Kai Huang,
Shuigeng Zhou
Abstract:
Self-supervised pre-training is gaining increasing popularity in AI-aided drug discovery, leading to more and more pre-trained models with the promise that they can extract better feature representations for molecules. Yet, the quality of the learned representations has not been fully explored. In this work, inspired by the two phenomena of Activity Cliffs (ACs) and Scaffold Hopping (SH) in traditional Quantitative Structure-Activity Relationship (QSAR) analysis, we propose a method named Representation-Property Relationship Analysis (RePRA) to evaluate the quality of the representations extracted by the pre-trained model and visualize the relationship between the representations and properties. The concepts of ACs and SH are generalized from the structure-activity context to the representation-property context, and the underlying principles of RePRA are analyzed theoretically. Two scores are designed to measure the generalized ACs and SH detected by RePRA, so that the quality of representations can be evaluated. In experiments, representations of molecules from 10 target tasks generated by 7 pre-trained models are analyzed. The results indicate that the state-of-the-art pre-trained models can overcome some shortcomings of canonical Extended-Connectivity FingerPrints (ECFP), while the correlation between the basis of the representation space and specific molecular substructures is not explicit. Thus, some representations could be even worse than the canonical fingerprints. Our method enables researchers to evaluate the quality of molecular representations generated by their proposed self-supervised pre-trained models. Our findings can also guide the community to develop better pre-training techniques to regularize the occurrence of ACs and SH.
Submitted 21 August, 2022;
originally announced September 2022.
-
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations
Authors:
Yuanfeng Ji,
Lu Zhang,
Jiaxiang Wu,
Bingzhe Wu,
Long-Kai Huang,
Tingyang Xu,
Yu Rong,
Lanqing Li,
Jie Ren,
Ding Xue,
Houtim Lai,
Shaoyong Xu,
Jing Feng,
Wei Liu,
Ping Luo,
Shuigeng Zhou,
Junzhou Huang,
Peilin Zhao,
Yatao Bian
Abstract:
AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its extensive use in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with \emph{noise}, which is inevitable in real world AIDD applications.
In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction, which involves both a macromolecule (protein target) and a small molecule (drug compound). In contrast to only providing fixed datasets, DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms. Since the molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for \emph{graph OOD learning} problems. Extensive empirical studies have shown a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need to develop better schemes that can allow for OOD generalization under noise for AIDD.
Submitted 24 January, 2022;
originally announced January 2022.
-
Neuropsychiatric Disease Classification Using Functional Connectomics -- Results of the Connectomics in NeuroImaging Transfer Learning Challenge
Authors:
Markus D. Schirmer,
Archana Venkataraman,
Islem Rekik,
Minjeong Kim,
Stewart H. Mostofsky,
Mary Beth Nebel,
Keri Rosch,
Karen Seymour,
Deana Crocetti,
Hassna Irzan,
Michael Hütel,
Sebastien Ourselin,
Neil Marlow,
Andrew Melbourne,
Egor Levchenko,
Shuo Zhou,
Mwiza Kunda,
Haiping Lu,
Nicha C. Dvornek,
Juntang Zhuang,
Gideon Pinto,
Sandip Samal,
Jennings Zhang,
Jorge L. Bernal-Rusiel,
Rudolph Pienaar
, et al. (1 additional authors not shown)
Abstract:
Large, open-source consortium datasets have spurred the development of new and increasingly powerful machine learning approaches in brain connectomics. However, one key question remains: are we capturing biologically relevant and generalizable information about the brain, or are we simply overfitting to the data? To answer this, we organized a scientific challenge, the Connectomics in NeuroImaging Transfer Learning Challenge (CNI-TLC), held in conjunction with MICCAI 2019. CNI-TLC included two classification tasks: (1) diagnosis of Attention-Deficit/Hyperactivity Disorder (ADHD) within a pre-adolescent cohort; and (2) transference of the ADHD model to a related cohort of Autism Spectrum Disorder (ASD) patients with an ADHD comorbidity. In total, 240 resting-state fMRI time series averaged according to three standard parcellation atlases, along with clinical diagnosis, were released for training and validation (120 neurotypical controls and 120 ADHD). We also provided demographic information of age, sex, IQ, and handedness. A second set of 100 subjects (50 neurotypical controls, 25 ADHD, and 25 ASD with ADHD comorbidity) was used for testing. Models were submitted in a standardized format as Docker images through ChRIS, an open-source image analysis platform. Utilizing an inclusive approach, we ranked the methods based on 16 different metrics. The final rank was calculated using the rank product for each participant across all measures. Furthermore, we assessed the calibration curves of each method. Five participants submitted their model for evaluation, with one outperforming all other methods in both ADHD and ASD classification. However, further improvements are needed to reach the clinical translation of functional connectomics. We are keeping the CNI-TLC open as a publicly available resource for developing and validating new classification methodologies in the field of connectomics.
Submitted 25 November, 2020; v1 submitted 5 June, 2020;
originally announced June 2020.
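The rank-product aggregation mentioned in the abstract above can be illustrated with a minimal sketch. The participant labels, metric values, the higher-is-better convention, and the tie handling are all assumptions made for illustration; this is not the challenge's actual scoring code.

```python
def rank_product(scores):
    """scores: dict mapping participant -> list of metric values.

    Assumes higher is better on every metric (an illustrative convention).
    Returns a dict mapping participant -> product of per-metric ranks (1 = best).
    """
    names = list(scores)
    n_metrics = len(next(iter(scores.values())))
    products = {name: 1 for name in names}
    for m in range(n_metrics):
        for name in names:
            # Rank = 1 + number of participants strictly better on metric m,
            # so tied participants share the best available rank.
            rank = 1 + sum(scores[other][m] > scores[name][m] for other in names)
            products[name] *= rank
    return products


def final_ranking(scores):
    """Order participants by ascending rank product (smaller is better)."""
    rp = rank_product(scores)
    return sorted(rp, key=lambda name: rp[name])


# Hypothetical scores for three participants on three metrics.
example = {"A": [0.9, 0.8, 0.7], "B": [0.8, 0.9, 0.6], "C": [0.5, 0.4, 0.9]}
print(rank_product(example))
print(final_ranking(example))
```

A product of ranks rewards methods that place consistently well across all measures, rather than ones that excel on a single metric.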
-
Alpha-1 adrenergic receptor antagonists to prevent hyperinflammation and death from lower respiratory tract infection
Authors:
Allison Koenecke,
Michael Powell,
Ruoxuan Xiong,
Zhu Shen,
Nicole Fischer,
Sakibul Huq,
Adham M. Khalafallah,
Marco Trevisan,
Pär Sparen,
Juan J Carrero,
Akihiko Nishimura,
Brian Caffo,
Elizabeth A. Stuart,
Renyuan Bai,
Verena Staedtke,
David L. Thomas,
Nickolas Papadopoulos,
Kenneth W. Kinzler,
Bert Vogelstein,
Shibin Zhou,
Chetan Bettegowda,
Maximilian F. Konig,
Brett Mensh,
Joshua T. Vogelstein,
Susan Athey
Abstract:
In severe viral pneumonia, including Coronavirus disease 2019 (COVID-19), the viral replication phase is often followed by hyperinflammation, which can lead to acute respiratory distress syndrome, multi-organ failure, and death. We previously demonstrated that alpha-1 adrenergic receptor ($α_1$-AR) antagonists can prevent hyperinflammation and death in mice. Here, we conducted retrospective analyses in two cohorts of patients with acute respiratory distress (ARD, n=18,547) and three cohorts with pneumonia (n=400,907). Federated across two ARD cohorts, we find that patients exposed to $α_1$-AR antagonists, as compared to unexposed patients, had a 34% relative risk reduction for mechanical ventilation and death (OR=0.70, p=0.021). We replicated these methods on three pneumonia cohorts, all with similar effects on both outcomes. All results were robust to sensitivity analyses. These results highlight the urgent need for prospective trials testing whether prophylactic use of $α_1$-AR antagonists ameliorates lower respiratory tract infection-associated hyperinflammation and death, as observed in COVID-19.
Submitted 2 August, 2021; v1 submitted 21 April, 2020;
originally announced April 2020.
-
DS-GCNs: Connectome Classification Using Dynamic Spectral Graph Convolution Networks with Assistant Task Training
Authors:
Xiaodan Xing,
Qingfeng Li,
Hao Wei,
Minqing Zhang,
Yiqiang Zhan,
Xiang Sean Zhou,
Zhong Xue,
Feng Shi
Abstract:
Functional Connectivity (FC) matrices measure the regional interactions in the brain and have been widely used in neurological brain disease classification. However, an FC matrix is neither a natural image, which contains shape and texture information, nor a vector of independent features, which makes extracting efficient features from such matrices a challenging problem. A brain network, also called a connectome, naturally forms a graph structure whose nodes are brain regions and whose edges are interregional connectivity. Thus, in this study, we proposed novel graph convolutional networks (GCNs) to extract efficient disease-related features from FC matrices. Considering the time-dependent nature of brain activity, we computed dynamic FC matrices with sliding windows and implemented a graph-convolution-based LSTM (long short-term memory) layer to process dynamic graphs. Moreover, the demographics of patients were also used to guide the classification. However, unlike conventional methods in which personal information, i.e., gender and age, is added as extra inputs, we argue that this kind of approach may not actually improve the classification performance, because such personal information is usually balanced across the dataset. In this paper, we proposed to utilize the demographic information as extra outputs and to share parameters among three networks predicting subject status, gender and age, which serve as assistant tasks. We tested the performance of the proposed architecture on the ADNI II dataset to classify Alzheimer's disease patients from normal controls. The classification accuracy, sensitivity and specificity reach 0.90, 0.92 and 0.89 on the ADNI II dataset.
Submitted 10 December, 2019;
originally announced January 2020.
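The sliding-window computation of dynamic FC matrices mentioned in the abstract above can be sketched as follows. This assumes Pearson correlation as the connectivity measure and uses hypothetical window and stride parameters; the abstract does not specify these details, and the graph-convolutional LSTM itself is not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumed convention for constant signals
    return cov / sqrt(vx * vy)


def dynamic_fc(series, window, stride):
    """series: list of per-region time series, all of equal length.

    Returns one region-by-region correlation matrix per sliding window.
    """
    n_time = len(series[0])
    matrices = []
    for start in range(0, n_time - window + 1, stride):
        chunk = [s[start:start + window] for s in series]
        matrices.append([[pearson(a, b) for b in chunk] for a in chunk])
    return matrices


# Toy signals for three hypothetical regions.
toy = [[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12], [6, 5, 4, 3, 2, 1]]
print(len(dynamic_fc(toy, window=4, stride=2)))
```

Each matrix in the returned sequence can then be treated as the weighted adjacency of one graph in a time-ordered series of graphs.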
-
Logic and connectivity jointly determine criticality in biological gene regulatory networks
Authors:
Bryan C. Daniels,
Hyunju Kim,
Douglas Moore,
Siyu Zhou,
Harrison Smith,
Bradley Karas,
Stuart A. Kauffman,
Sara I. Walker
Abstract:
The complex dynamics of gene expression in living cells can be well-approximated using Boolean networks. The average sensitivity is a natural measure of stability in these systems: values below one indicate typically stable dynamics associated with an ordered phase, whereas values above one indicate chaotic dynamics. This yields a theoretically motivated adaptive advantage to being near the critical value of one, at the boundary between order and chaos. Here, we measure average sensitivity for 66 publicly available Boolean network models describing the function of gene regulatory circuits across diverse living processes. We find the average sensitivity values for these networks are clustered around unity, indicating they are near critical. In many types of random networks, mean connectivity <K> and the average activity bias of the logic functions <p> have been found to be the most important network properties in determining average sensitivity, and by extension a network's criticality. Surprisingly, many of these gene regulatory networks achieve the near-critical state with <K> and <p> far from that predicted for critical systems: randomized networks sharing the local causal structure and local logic of biological networks better reproduce their critical behavior than controlling for macroscale properties such as <K> and <p> alone. This suggests the local properties of genes interacting within regulatory networks are selected to collectively be near-critical, and this non-local property of gene regulatory network dynamics cannot be predicted using the density of interactions alone.
Submitted 3 May, 2018;
originally announced May 2018.
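The average sensitivity discussed in the abstract above can be computed for small Boolean functions by direct enumeration. The sketch below assumes a uniform average over all input states and uses toy logic functions; it is an illustration of the quantity, not the authors' analysis pipeline.

```python
from itertools import product


def average_sensitivity(f, k):
    """Average number of single-input flips that change the output of f.

    f takes a tuple of k bits (0/1); the average is taken uniformly over
    all 2**k input states (an assumed weighting).
    """
    total = 0
    for state in product((0, 1), repeat=k):
        for i in range(k):
            flipped = state[:i] + (1 - state[i],) + state[i + 1:]
            total += f(state) != f(flipped)
    return total / 2 ** k


def network_average_sensitivity(functions):
    """functions: list of (f, k) pairs, one per node; returns the node mean."""
    return sum(average_sensitivity(f, k) for f, k in functions) / len(functions)


def and_gate(s):
    return s[0] & s[1]


def xor_gate(s):
    return s[0] ^ s[1]


def constant(s):
    return 0


print(average_sensitivity(and_gate, 2))   # ordered-leaning logic
print(average_sensitivity(xor_gate, 2))   # chaotic-leaning logic
print(network_average_sensitivity([(and_gate, 2), (xor_gate, 2), (constant, 2)]))
```

In this toy network the three nodes average to a sensitivity of one, the critical value the abstract refers to, even though no single node's logic is tuned to it alone.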
-
EEG and ECG changes during deep-sea manned submersible operation
Authors:
Haifei Yang,
Lu Shi,
Feng Liu,
Yanmeng Zhang,
Baohua Liu,
Yangyang Li,
Zhongyuan Shi,
Shuyao Zhou
Abstract:
Background: Deep-sea manned submersible operation could induce mental workload and influence neurophysiological measures. Psychophysiological responses to submersible operation are not well known. The main aim of this study was to investigate changes in EEG and ECG components and subjective mental stress of pilots during submersible operation. Methods: Six experienced submersible pilots performed a 3 h submersible operation task composed of 5 subtasks. Electroencephalogram (EEG) and electrocardiogram (ECG) were recorded before the operation task, after 1.5 h and 2.5 h of operation, and after the task. Subjective ratings of mental stress were also collected at these time points. Results: HR and scores on the subjective stress scale increased during the task compared to baseline (P<0.05). The LF/HF ratio at 1.5 h was higher than at baseline (P<0.05) and at 2.5 h (P<0.05). Relative theta power at the Cz site increased (P<0.01) and relative alpha power decreased (P<0.01) at 2.5 h compared to values at baseline. The alpha attenuation coefficient (AAC, ratio of mean alpha power during eyes closed versus eyes open) at 2.5 h and after the task was lower compared to baseline and 1.5 h (P<0.05 or less). Conclusions: Submersible operation resulted in an increased HR in association with mental stress, alterations in autonomic activity and EEG changes that expressed variations in mental workload. Brain arousal level declined during the later operation period.
Submitted 1 July, 2017;
originally announced July 2017.
-
Continuous use of ERP-based BCIs with different visual angles in ALS patients
Authors:
Jing Jin,
Brendan Z. Allison,
Yu Zhang,
Yan Chen,
Sijie Zhou,
Yi Dong,
Xingyu Wang,
Andrzej Chchocki
Abstract:
Objective: Amyotrophic lateral sclerosis (ALS) is a rare disease, but is also one of the most common motor neuron diseases, and people of all races and ethnic backgrounds are affected. There is currently no cure. Brain computer interfaces (BCIs) can establish a communication channel directly between the brain and an external device by recognizing brain activities that reflect user intent. Therefore, this technology could help ALS patients in promoting functional independence through BCI-based speller systems and motor assistive devices. Methods: In this paper, two kinds of ERP-based speller systems were tested on 18 ALS patients to: (1) assess performance when they spelled 42 characters online continuously, without a break; and (2) compare performance between a matrix-based speller paradigm (MS-P, mean visual angle 6 degrees) and a new speller paradigm that used a larger visual angle, called the large visual angle speller paradigm (LS-P, mean visual angle 8 degrees). Results: Although results showed that there were no significant differences between the two paradigms in accuracy trend over continuous use (p>0.05), the fatigue during the LS-P condition was significantly lower than that of MS-P (p<0.05). Results also showed that continuous use slightly reduced the performance of this ERP-based BCI. Conclusion: 15 subjects obtained higher than 80% feedback accuracy (online output accuracy) and 9 subjects obtained higher than 90% feedback accuracy in one of the two paradigms, thus validating the BCI approaches in this study. Significance: Most ALS subjects in this study could spell effectively after continuous use of an ERP-based BCI. The new LS-P display may be easier for subjects to use, resulting in lower fatigue.
Submitted 27 June, 2017;
originally announced June 2017.
-
Hybrid spreading mechanisms and T cell activation shape the dynamics of HIV-1 infection
Authors:
Changwang Zhang,
Shi Zhou,
Elisabetta Groppelli,
Pierre Pellegrino,
Ian Williams,
Persephone Borrow,
Benjamin M. Chain,
Clare Jolly
Abstract:
HIV-1 can disseminate between susceptible cells by two mechanisms: cell-free infection following fluid-phase diffusion of virions, and highly efficient direct cell-to-cell transmission at immune cell contacts. The contribution of this hybrid spreading mechanism, which is also a characteristic of some important computer worm outbreaks, to HIV-1 progression in vivo remains unknown. Here we present a new mathematical model that explicitly incorporates the ability of HIV-1 to use hybrid spreading mechanisms and evaluate the consequences for HIV-1 pathogenesis. The model captures the major phases of the HIV-1 infection course of a cohort of treatment-naive patients and also accurately predicts the results of the Short Pulse Anti-Retroviral Therapy at Seroconversion (SPARTAC) trial. Using this model we find that hybrid spreading is critical to seed and establish infection, and that cell-to-cell spread and increased CD4+ T cell activation are important for HIV-1 progression. Notably, the model predicts that cell-to-cell spread becomes increasingly effective as infection progresses and thus may present a considerable treatment barrier. Deriving predictions of various treatments' influence on HIV-1 progression highlights the importance of earlier intervention and suggests that treatments effectively targeting cell-to-cell HIV-1 spread can delay progression to AIDS. This study suggests that hybrid spreading is a fundamental feature of HIV infection, and provides the mathematical framework incorporating this feature with which to evaluate future therapeutic strategies.
Submitted 31 March, 2015;
originally announced March 2015.
-
A maximum likelihood estimate of natural mortality for brown tiger prawn (Penaeus esculentus) in Moreton Bay (Australia)
Authors:
Marco Kienzle,
David Sterling,
Shijie Zhou,
You-Gan Wang
Abstract:
The delay difference model was implemented to fit 21 years of brown tiger prawn (Penaeus esculentus) catch in Moreton Bay by maximum likelihood to assess the status of this stock. Monte Carlo simulation testing of the stock assessment software, coded in C++, showed that the model could simultaneously estimate natural mortality in addition to catchability, recruitment and initial biomasses. Applied to logbook data collected from 1990 to 2010, this implementation of the delay difference model provided for the first time an estimate of natural mortality for brown tiger prawn in Moreton Bay, equal to $0.031 \pm 0.002$ week$^{-1}$. This estimate is approximately 30% lower than the value of natural mortality (0.045 week$^{-1}$) used in previous stock assessments of this species.
Submitted 27 January, 2015;
originally announced January 2015.
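As a quick check of the "approximately 30% lower" figure, the relative difference can be worked out from the two point estimates quoted in the abstract above (the quoted uncertainty of $\pm 0.002$ is ignored here):

```latex
\[
\frac{0.045 - 0.031}{0.045} = \frac{0.014}{0.045} \approx 0.31
\]
```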
-
Yeast caspase 1 suppresses the burst of reactive oxygen species and maintains mitochondrial stability in Saccharomyces cerevisiae
Authors:
Lin Du,
Xiaodan Huang,
Jian Tan,
Yongjun Lu,
Shining Zhou
Abstract:
Caspases are a family of cysteine proteases that play essential roles during apoptosis, and we presume some of them may also protect the cell from oxidative stress. We found that the absence of yeast caspase 1 (Yca1) in Saccharomyces cerevisiae leads to a more intense burst of mitochondrial reactive oxygen species (ROS). In addition, compared to wild type yeast cells, the ability of yca1 mutant cells to maintain mitochondrial activity is significantly reduced after either oxidative stress treatment or aging. During mitochondrial ROS burst, deletion of the yca1 gene delayed structural damage of a green fluorescent protein (GFP) reporter bound in the inner mitochondrial membrane. This work implies that yeast caspase 1 is closely connected to the oxidative stress response. We speculate that Yca1 can discriminate proteins damaged by oxidation and accelerate their hydrolysis to attenuate the ROS burst.
Submitted 14 January, 2015;
originally announced January 2015.
-
Optimizing Hybrid Spreading in Metapopulations
Authors:
Changwang Zhang,
Shi Zhou,
Joel C. Miller,
Ingemar J. Cox,
Benjamin M. Chain
Abstract:
Epidemic spreading phenomena are ubiquitous in nature and society. Examples include the spreading of diseases, information, and computer viruses. Epidemics can spread by local spreading, where infected nodes can only infect a limited set of direct target nodes, and by global spreading, where an infected node can infect every other node. In reality, many epidemics spread using a hybrid mixture of both types of spreading. In this study we develop a theoretical framework for studying hybrid epidemics, and examine the optimum balance between spreading mechanisms in terms of achieving the maximum outbreak size. We show the existence of critically hybrid epidemics where neither spreading mechanism alone can cause a noticeable spread but a combination of the two spreading mechanisms would produce an enormous outbreak. Our results provide new strategies for maximising beneficial epidemics and estimating the worst outcome of damaging hybrid epidemics.
Submitted 31 March, 2015; v1 submitted 25 September, 2014;
originally announced September 2014.
-
Semantic Context Forests for Learning-Based Knee Cartilage Segmentation in 3D MR Images
Authors:
Quan Wang,
Dijia Wu,
Le Lu,
Meizhu Liu,
Kim L. Boyer,
Shaohua Kevin Zhou
Abstract:
The automatic segmentation of human knee cartilage from 3D MR images is a useful yet challenging task due to the thin sheet structure of the cartilage with diffuse boundaries and inhomogeneous intensities. In this paper, we present an iterative multi-class learning method to segment the femoral, tibial and patellar cartilage simultaneously, which effectively exploits the spatial contextual constraints between bone and cartilage, and also between different cartilages. First, based on the fact that the cartilage grows only in certain areas of the corresponding bone surface, we extract distance features not only to the surface of the bone but, more informatively, to the densely registered anatomical landmarks on the bone surface. Second, we introduce a set of iterative discriminative classifiers in which, at each iteration, probability comparison features are constructed from the class confidence maps derived by previously learned classifiers. These features automatically embed the semantic context information between different cartilages of interest. Validated on a total of 176 volumes from the Osteoarthritis Initiative (OAI) dataset, the proposed approach demonstrates high robustness and accuracy of segmentation in comparison with existing state-of-the-art MR cartilage segmentation methods.
Submitted 22 April, 2014; v1 submitted 10 July, 2013;
originally announced July 2013.
-
Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
Authors:
Keith R. Bradnam,
Joseph N. Fass,
Anton Alexandrov,
Paul Baranay,
Michael Bechner,
İnanç Birol,
Sébastien Boisvert,
Jarrod A. Chapman,
Guillaume Chapuis,
Rayan Chikhi,
Hamidreza Chitsaz,
Wen-Chi Chou,
Jacques Corbeil,
Cristian Del Fabbro,
T. Roderick Docking,
Richard Durbin,
Dent Earl,
Scott Emrich,
Pavel Fedotov,
Nuno A. Fonseca,
Ganeshkumar Ganapathy,
Richard A. Gibbs,
Sante Gnerre,
Élénie Godzaridis,
Steve Goldstein
, et al. (66 additional authors not shown)
Abstract:
Background - The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results - In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions - Many current genome assemblers produced useful assemblies, containing a significant representation of their genes, regulatory sequences, and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
Submitted 27 June, 2013; v1 submitted 23 January, 2013;
originally announced January 2013.
-
Epidemic spreading with nonlinear infectivity in weighted scale-free networks
Authors:
Xiangwei Chu,
Zhongzhi Zhang,
Jihong Guan,
Shuigeng Zhou
Abstract:
In this paper, we investigate epidemic spreading for the SIR model in weighted scale-free networks with nonlinear infectivity, where the transmission rate in our analytical model is weighted. Concretely, we introduce the infectivity exponent $α$ and the weight exponent $β$ into the analytical SIR model, then examine the combined effects of $α$ and $β$ on the epidemic threshold and phase transition. We show that one can adjust the values of $α$ and $β$ to rebuild the epidemic threshold to a finite value, and it is observed that the steady epidemic prevalence $R$ grows in an exponential form in the early stage, then follows hierarchical dynamics. Furthermore, we find that the epidemic threshold and epidemic prevalence are more sensitive to $α$ than to $β$, which might offer useful information or new insights into epidemic spreading and the corresponding immunization schemes.
Submitted 5 March, 2009;
originally announced March 2009.
-
Fractal scale-free networks resistant to disease spread
Authors:
Zhongzhi Zhang,
Shuigeng Zhou,
Zou Tao,
Guisheng Chen
Abstract:
In contrast to the conventional wisdom that scale-free networks are prone to epidemic propagation, in this paper we show that disease spreading is inhibited in fractal scale-free networks. We first propose a novel network model and show that it simultaneously has the following rich topological properties: scale-free degree distribution, tunable clustering coefficient, "large-world" behavior, and fractal scaling. Existing network models do not display these characteristics. Then, we investigate the susceptible-infected-removed (SIR) model of the propagation of diseases in our fractal scale-free networks by mapping it to a bond percolation process. We find the existence of nonzero tunable epidemic thresholds by making use of the renormalization group technique, which implies that a power-law degree distribution does not suffice to characterize the epidemic dynamics on top of scale-free networks. We argue that the epidemic dynamics are determined by the topological properties, especially the fractality and its accompanying "large-world" behavior.
Submitted 20 April, 2008;
originally announced April 2008.
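The mapping from SIR dynamics to bond percolation mentioned in the abstract above can be illustrated with a minimal simulation sketch: each edge is kept independently with a transmission probability, and the outbreak is the set of nodes connected to the seed. The ring graph, the parameter name `transmissibility`, and the function name are assumptions for illustration; the paper's fractal network model and renormalization-group analysis are not reproduced here.

```python
import random
from collections import deque


def percolation_outbreak(edges, seed_node, transmissibility, rng):
    """Keep each edge independently with probability `transmissibility`,
    then return the set of nodes reachable from seed_node (the outbreak)."""
    neighbors = {}
    for u, v in edges:
        if rng.random() < transmissibility:
            neighbors.setdefault(u, []).append(v)
            neighbors.setdefault(v, []).append(u)
    # Breadth-first search over the retained (occupied) edges.
    reached = {seed_node}
    queue = deque([seed_node])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, []):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


# Toy example: a ring of five nodes (not the paper's fractal network).
ring = [(i, (i + 1) % 5) for i in range(5)]
rng = random.Random(0)
print(len(percolation_outbreak(ring, 0, 1.0, rng)))  # every edge retained
print(len(percolation_outbreak(ring, 0, 0.0, rng)))  # no edge retained
```

Averaging the outbreak size over many random draws at different transmissibility values gives a numerical view of whether a nonzero epidemic threshold exists on a given graph.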