
Showing 1–50 of 60 results for author: Tang, J

Searching in archive q-bio.
  1. arXiv:2406.13864

    cs.LG q-bio.BM

    Evaluating representation learning on the protein structure universe

    Authors: Arian R. Jamasb, Alex Morehead, Chaitanya K. Joshi, Zuobai Zhang, Kieran Didi, Simon V. Mathis, Charles Harris, Jian Tang, Jianlin Cheng, Pietro Lio, Tom L. Blundell

    Abstract: We introduce ProteinWorkshop, a comprehensive benchmark suite for representation learning on protein structures with Geometric Graph Neural Networks. We consider large-scale pre-training and downstream tasks on both experimental and predicted structures to enable the systematic evaluation of the quality of the learned structural representation and their usefulness in capturing functional relations…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ICLR 2024

  2. arXiv:2406.10146

    q-bio.QM

    Multimodal Radiomics Model for Predicting Gold Nanoparticles Accumulation in Mouse Tumors

    Authors: Jiajia Tang, Jie Zhang, Jiulou Zhang, Yuxia Tang, Hao Ni, Shouju Wang

    Abstract: Background: Nanoparticles can accumulate in solid tumors, serving as diagnostic or therapeutic agents for cancer. Clinical translation is challenging due to low accumulation in tumors and heterogeneity between tumor types and individuals. Tools to identify this heterogeneity and predict nanoparticle accumulation are needed. Advanced imaging techniques combined with radiomics and AI may offer a sol…

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2406.05347

    q-bio.BM cs.AI cs.LG

    MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training

    Authors: Bo Chen, Zhilei Bei, Xingyi Cheng, Pan Li, Jie Tang, Le Song

    Abstract: Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families. The accuracy of protein structure predictions is often compromised for protein sequences that lack sufficient homologous information to construct high-quality MSA. Although various methods have been proposed to generate virtual MSA under these conditions, they fall short in compre…

    Submitted 10 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  4. arXiv:2405.18605

    cs.CL cs.IR q-bio.MN

    BioBERT-based Deep Learning and Merged ChemProt-DrugProt for Enhanced Biomedical Relation Extraction

    Authors: Bridget T. McInnes, Jiawei Tang, Darshini Mahendran, Mai H. Nguyen

    Abstract: This paper presents a methodology for enhancing relation extraction from biomedical texts, focusing specifically on chemical-gene interactions. Leveraging the BioBERT model and a multi-layer fully connected network architecture, our approach integrates the ChemProt and DrugProt datasets using a novel merging strategy. Through extensive experimentation, we demonstrate significant performance improv…

    Submitted 28 May, 2024; originally announced May 2024.

  5. arXiv:2405.00751

    q-bio.QM cs.AI cs.LG

    F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE(3) Guided Flow Matching

    Authors: Shaoning Li, Yusong Wang, Mingyu Li, Jian Zhang, Bin Shao, Nanning Zheng, Jian Tang

    Abstract: Molecular dynamics (MD) is a crucial technique for simulating biological systems, enabling the exploration of their dynamic nature and fostering an understanding of their functions and properties. To address exploration inefficiency, emerging enhanced sampling approaches like coarse-graining (CG) and generative models have been employed. In this work, we propose a Frame-to-Frame genera…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by ICLR 2024 GEM workshop

  6. arXiv:2402.10433

    q-bio.BM cs.LG q-bio.QM

    Fusing Neural and Physical: Augment Protein Conformation Sampling with Tractable Simulations

    Authors: Jiarui Lu, Zuobai Zhang, Bozitao Zhong, Chence Shi, Jian Tang

    Abstract: The protein dynamics are common and important for their biological functions and properties, the study of which usually involves time-consuming molecular dynamics (MD) simulations in silico. Recently, generative models have been leveraged as a surrogate sampler to obtain conformation ensembles orders of magnitude faster and without requiring any simulation data (a "zero-shot" inference). Howev…

    Submitted 11 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Published at the GEM workshop, ICLR 2024

  7. arXiv:2402.07955

    q-bio.BM cs.LG

    ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation

    Authors: Zuobai Zhang, Jiarui Lu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang

    Abstract: Protein function annotation is an important yet challenging task in biology. Recent deep learning advancements show significant potential for accurate function prediction by learning from protein sequences and structures. Nevertheless, these predictor-based methods often overlook the modeling of protein similarity, an idea commonly employed in traditional approaches using sequence or structure ret…

    Submitted 10 February, 2024; originally announced February 2024.

  8. arXiv:2402.05856

    q-bio.BM cs.LG

    Structure-Informed Protein Language Model

    Authors: Zuobai Zhang, Jiarui Lu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang

    Abstract: Protein language models are a powerful tool for learning protein representations through pre-training on vast protein sequence datasets. However, traditional protein language models lack explicit structural supervision, despite its relevance to protein function. To address this issue, we introduce the integration of remote homology detection to distill structural information into protein language…

    Submitted 7 February, 2024; originally announced February 2024.

  9. arXiv:2401.17123

    cs.LG cs.AI q-bio.QM

    Unsupervised Discovery of Steerable Factors When Graph Deep Generative Models Are Entangled

    Authors: Shengchao Liu, Chengpeng Wang, Jiarui Lu, Weili Nie, Hanchen Wang, Zhuoxinran Li, Bolei Zhou, Jian Tang

    Abstract: Deep generative models (DGMs) have been widely developed for graph data. However, much less investigation has been carried out on understanding the latent space of such pretrained graph DGMs. These understandings possess the potential to provide constructive guidelines for crucial tasks, such as graph controllable generation. Thus in this work, we are interested in studying this problem and propos…

    Submitted 29 January, 2024; originally announced January 2024.

  10. arXiv:2401.11447

    cs.LG q-bio.QM

    Sequential Model for Predicting Patient Adherence in Subcutaneous Immunotherapy for Allergic Rhinitis

    Authors: Yin Li, Yu Xiong, Wenxin Fan, Kai Wang, Qingqing Yu, Liping Si, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Objective: Subcutaneous Immunotherapy (SCIT) is the long-lasting causal treatment of allergic rhinitis (AR). How to enhance the adherence of patients to maximize the benefit of allergen immunotherapy (AIT) plays a crucial role in the management of AIT. This study aims to leverage novel machine learning models to precisely predict the risk of non-adherence of AR patients and related local symptom s…

    Submitted 19 July, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: Frontiers in Pharmacology, research topic: Methods and Metrics to Measure Medication Adherence

  11. arXiv:2401.06199

    q-bio.QM cs.AI cs.LG

    xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein

    Authors: Bo Chen, Xingyi Cheng, Pan Li, Yangli-ao Geng, Jing Gong, Shen Li, Zhilei Bei, Xu Tan, Boyan Wang, Xin Zeng, Chiming Liu, Aohan Zeng, Yuxiao Dong, Jie Tang, Le Song

    Abstract: Protein language models have shown remarkable success in learning biological information from protein sequences. However, most existing models are limited by either autoencoding or autoregressive pre-training objectives, which makes them struggle to handle protein understanding and generation tasks concurrently. We propose a unified protein language model, xTrimoPGLM, to address these two types of…

    Submitted 11 January, 2024; originally announced January 2024.

  12. arXiv:2312.15252

    q-bio.BM cs.LG

    DTIAM: A unified framework for predicting drug-target interactions, binding affinities and activation/inhibition mechanisms

    Authors: Zhangli Lu, Chuqi Lei, Kaili Wang, Libo Qin, Jing Tang, Min Li

    Abstract: Accurate and robust prediction of drug-target interactions (DTIs) plays a vital role in drug discovery. Although extensive efforts have been invested in predicting novel DTIs, existing approaches still suffer from insufficient labeled data and cold start problems. More importantly, there is currently a lack of studies focusing on elucidating the mechanism of action (MoA) between drugs and targets.…

    Submitted 23 December, 2023; originally announced December 2023.

  13. arXiv:2312.00080

    q-bio.QM cs.LG

    PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design

    Authors: Chuanrui Wang, Bozitao Zhong, Zuobai Zhang, Narendra Chaudhary, Sanchit Misra, Jian Tang

    Abstract: Structure-based protein design has attracted increasing interest, with numerous methods being introduced in recent years. However, a universally accepted method for evaluation has not been established, since the wet-lab validation can be overly time-consuming for the development of new algorithms, and the in silico validation with recovery and perplexity metrics is efficient but may not…

    Submitted 29 November, 2023; originally announced December 2023.

    Comments: 13 pages

  14. arXiv:2310.10697

    q-bio.QM

    Synthetic IMU Datasets and Protocols Can Simplify Fall Detection Experiments and Optimize Sensor Configuration

    Authors: Jie Tang, Bin He, Junkai Xu, Tian Tan, Zhipeng Wang, Yanmin Zhou, Shuo Jiang

    Abstract: Falls represent a significant cause of injury among the elderly population. Extensive research has been devoted to the utilization of wearable IMU sensors in conjunction with machine learning techniques for fall detection. To address the challenge of acquiring costly training data, this paper presents a novel method that generates a substantial volume of synthetic IMU data with minimal real fall e…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 11 pages, 7 figures

  15. arXiv:2306.09375

    cs.LG physics.chem-ph q-bio.QM

    Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials

    Authors: Shengchao Liu, Weitao Du, Yanjing Li, Zhuoxinran Li, Zhiling Zheng, Chenru Duan, Zhiming Ma, Omar Yaghi, Anima Anandkumar, Christian Borgs, Jennifer Chayes, Hongyu Guo, Jian Tang

    Abstract: Artificial intelligence for scientific discovery has recently generated significant interest within the machine learning and scientific communities, particularly in the domains of chemistry, biology, and material discovery. For these scientific problems, molecules serve as the fundamental building blocks, and machine learning has emerged as a highly effective and powerful tool for modeling their g…

    Submitted 15 June, 2023; originally announced June 2023.

  16. arXiv:2306.07505

    q-bio.TO eess.IV

    Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

    Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu, et al. (22 additional authors not shown)

    Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with…

    Submitted 12 June, 2023; originally announced June 2023.

  17. arXiv:2306.03117

    q-bio.QM cs.LG q-bio.BM

    Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling

    Authors: Jiarui Lu, Bozitao Zhong, Zuobai Zhang, Jian Tang

    Abstract: The dynamic nature of proteins is crucial for determining their biological functions and properties, for which Monte Carlo (MC) and molecular dynamics (MD) simulations stand as predominant tools to study such phenomena. By utilizing empirically derived force fields, MC or MD simulations explore the conformational space through numerically evolving the system via Markov chain or Newtonian mechanics…

    Submitted 11 March, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Published as a conference paper at ICLR 2024, see https://openreview.net/forum?id=C4BikKsgmK

  18. arXiv:2306.01794

    q-bio.QM cs.LG

    DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing

    Authors: Yangtian Zhang, Zuobai Zhang, Bozitao Zhong, Sanchit Misra, Jian Tang

    Abstract: Proteins play a critical role in carrying out biological functions, and their 3D structures are essential in determining their functions. Accurately predicting the conformation of protein side-chains given their backbones is important for applications in protein structure prediction, design and protein-protein interactions. Traditional methods are computationally intensive and have limited accurac…

    Submitted 15 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  19. arXiv:2305.18407

    cs.LG cs.AI q-bio.BM

    A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining

    Authors: Shengchao Liu, Weitao Du, Zhiming Ma, Hongyu Guo, Jian Tang

    Abstract: Molecule pretraining has quickly become the go-to schema to boost the performance of AI-based drug discovery. Naturally, molecules can be represented as 2D topological graphs or 3D geometric point clouds. Although most existing pretraining methods focus on merely the single modality, recent research has shown that maximizing the mutual information (MI) between such two modalities enhances the molec…

    Submitted 28 May, 2023; originally announced May 2023.

  20. arXiv:2304.12825

    q-bio.BM cs.AI cs.LG

    GraphVF: Controllable Protein-Specific 3D Molecule Generation with Variational Flow

    Authors: Fang Sun, Zhihao Zhan, Hongyu Guo, Ming Zhang, Jian Tang

    Abstract: Designing molecules that bind to specific target proteins is a fundamental task in drug discovery. Recent models leverage geometric constraints to generate ligand molecules that bind cohesively with specific protein pockets. However, these models cannot effectively generate 3D molecules with 2D skeletal curtailments and property constraints, which are pivotal to drug potency and development. To ta…

    Submitted 23 February, 2023; originally announced April 2023.

    Comments: 15 pages, 8 figures

  21. arXiv:2303.06275

    q-bio.QM cs.LG

    A Systematic Study of Joint Representation Learning on Protein Sequences and Structures

    Authors: Zuobai Zhang, Chuanrui Wang, Minghao Xu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang

    Abstract: Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein functions. Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge. In contrast, structure-based methods leverage 3D structural informat…

    Submitted 18 October, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

  22. Single-Cell Multimodal Prediction via Transformers

    Authors: Wenzhuo Tang, Hongzhi Wen, Renming Liu, Jiayuan Ding, Wei Jin, Yuying Xie, Hui Liu, Jiliang Tang

    Abstract: The recent development of multimodal single-cell technology has made it possible to acquire multiple omics data from individual cells, thereby enabling a deeper understanding of cellular states and dynamics. Nevertheless, the proliferation of multimodal single-cell data also introduces tremendous challenges in modeling the complex interactions among different modalities. The recently advance…

    Submitted 13 October, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: CIKM 2023

    Journal ref: In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23), 2023, Birmingham, United Kingdom

  23. arXiv:2302.04611

    cs.LG cs.AI q-bio.QM stat.ML

    A Text-guided Protein Design Framework

    Authors: Shengchao Liu, Yanjing Li, Zhuoxinran Li, Anthony Gitter, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Arvind Ramanathan, Chaowei Xiao, Jian Tang, Hongyu Guo, Anima Anandkumar

    Abstract: Current AI-assisted protein design mainly utilizes protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in the text format describing proteins' high-level functionalities. Yet, whether the incorporation of such text data can help protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multi-modal framework tha…

    Submitted 12 August, 2024; v1 submitted 9 February, 2023; originally announced February 2023.

  24. arXiv:2301.12040

    q-bio.BM cs.LG

    ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts

    Authors: Minghao Xu, Xinyu Yuan, Santiago Miret, Jian Tang

    Abstract: Current protein language models (PLMs) learn protein representations mainly based on their sequences, thereby well capturing co-evolutionary information, but they are unable to explicitly acquire protein functions, which is the end goal of protein representation learning. Fortunately, for many proteins, their textual property descriptions are available, where their various functions are also descr…

    Submitted 4 July, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: Accepted by ICML 2023 (Oral), code and data released

  25. arXiv:2212.10789

    cs.LG cs.CL q-bio.QM stat.ML

    Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

    Authors: Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, Anima Anandkumar

    Abstract: There is increasing adoption of artificial intelligence in drug discovery. However, existing studies use machine learning to mainly utilize the chemical structures of molecules but ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Her…

    Submitted 29 January, 2024; v1 submitted 21 December, 2022; originally announced December 2022.

  26. arXiv:2210.12385

    q-bio.QM cs.AI

    Deep Learning in Single-Cell Analysis

    Authors: Dylan Molho, Jiayuan Ding, Zhaoheng Li, Hongzhi Wen, Wenzhuo Tang, Yixin Wang, Julian Venegas, Wei Jin, Renming Liu, Runze Su, Patrick Danaher, Robert Yang, Yu Leo Lei, Yuying Xie, Jiliang Tang

    Abstract: Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high-dimensional, sparse, heterogeneous, and have complicated dependency structures, making analyses using conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performan…

    Submitted 5 November, 2022; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: 77 pages, 11 figures, 15 tables, deep learning, single-cell analysis

  27. arXiv:2210.08761

    q-bio.BM cs.LG

    Protein Sequence and Structure Co-Design with Equivariant Translation

    Authors: Chence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, Jian Tang

    Abstract: Proteins are macromolecules that perform essential functions in all living organisms. Designing novel proteins with specific structures and desired functions has been a long-standing challenge in the field of bioengineering. Existing approaches generate both protein sequence and structure using either autoregressive models or diffusion models, both of which suffer from high inference costs. In thi…

    Submitted 2 March, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: Published as a conference paper at ICLR 2023, see https://openreview.net/forum?id=pRCMXcfdihq

  28. arXiv:2210.06069

    q-bio.BM cs.LG

    E3Bind: An End-to-End Equivariant Network for Protein-Ligand Docking

    Authors: Yangtian Zhang, Huiyu Cai, Chence Shi, Bozitao Zhong, Jian Tang

    Abstract: In silico prediction of the ligand binding pose to a given protein target is a crucial but challenging task in drug discovery. This work focuses on blind flexible self-docking, where we aim to predict the positions, orientations and conformations of docked molecules. Traditional physics-based methods usually suffer from inaccurate scoring functions and high inference cost. Recently, data-driven met…

    Submitted 1 June, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: International Conference on Learning Representations (ICLR 2023)

  29. arXiv:2209.15315

    cs.LG physics.chem-ph q-bio.BM q-bio.QM

    FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning

    Authors: Songtao Liu, Zhengkai Tu, Minkai Xu, Zuobai Zhang, Lu Lin, Rex Ying, Jian Tang, Peilin Zhao, Dinghao Wu

    Abstract: Retrosynthetic planning aims to devise a complete multi-step synthetic route from starting materials to a target molecule. Current strategies use a decoupled approach of single-step retrosynthesis models and search algorithms, taking only the product as the input to predict the reactants for each planning step and ignoring valuable context information along the synthetic route. In this work, we pr…

    Submitted 31 May, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: Accepted by ICML 2023

  30. arXiv:2207.05561

    cs.NE cs.AI cs.LG q-bio.NC

    Brain-inspired Graph Spiking Neural Networks for Commonsense Knowledge Representation and Reasoning

    Authors: Hongjian Fang, Yi Zeng, Jianbo Tang, Yuwei Wang, Yao Liang, Xin Liu

    Abstract: How neural networks in the human brain represent commonsense knowledge and complete related reasoning tasks is an important research topic in neuroscience, cognitive science, psychology, and artificial intelligence. Although the traditional artificial neural network using fixed-length vectors to represent symbols has gained good performance in some specific tasks, it is still a black box that lac…

    Submitted 11 July, 2022; originally announced July 2022.

  31. arXiv:2207.00821

    q-bio.BM cs.LG q-bio.MN

    PGMG: A Pharmacophore-Guided Deep Learning Approach for Bioactive Molecular Generation

    Authors: Huimin Zhu, Renyi Zhou, Jing Tang, Min Li

    Abstract: The rational design of novel molecules with desired bioactivity is a critical but challenging task in drug discovery, especially when treating a novel target family or understudied targets. Here, we propose PGMG, a pharmacophore-guided deep learning approach for bioactive molecule generation. Through the guidance of pharmacophore, PGMG provides a flexible strategy to generate bioactive molecules…

    Submitted 2 July, 2022; originally announced July 2022.

  32. arXiv:2206.13602

    cs.LG q-bio.QM

    Molecular Geometry Pretraining with SE(3)-Invariant Denoising Distance Matching

    Authors: Shengchao Liu, Hongyu Guo, Jian Tang

    Abstract: Molecular representation pretraining is critical in various applications for drug and material discovery due to the limited number of labeled molecules, and most existing work focuses on pretraining on 2D molecular graphs. However, the power of pretraining on 3D geometric structures has been less explored. This is owing to the difficulty of finding a sufficient proxy task that can empower the pret…

    Submitted 28 February, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

  33. arXiv:2206.08005

    cs.LG q-bio.QM

    Evaluating Self-Supervised Learning for Molecular Graph Embeddings

    Authors: Hanchen Wang, Jean Kaddour, Shengchao Liu, Jian Tang, Joan Lasenby, Qi Liu

    Abstract: Graph Self-Supervised Learning (GSSL) provides a robust pathway for acquiring embeddings without expert labelling, a capability that carries profound implications for molecular graphs due to the staggering number of potential molecules and the high cost of obtaining labels. However, GSSL methods are designed not for optimisation within a specific domain but rather for transferability across a vari…

    Submitted 18 October, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Camera ready, NeurIPS Benchmark 2023

  34. arXiv:2205.11279

    cs.LG q-bio.QM

    Tyger: Task-Type-Generic Active Learning for Molecular Property Prediction

    Authors: Kuangqi Zhou, Kaixin Wang, Jiashi Feng, Jian Tang, Tingyang Xu, Xinchao Wang

    Abstract: How to accurately predict the properties of molecules is an essential problem in AI-driven drug discovery, which generally requires a large amount of annotation for training deep learning models. Annotating molecules, however, is quite costly because it requires lab experiments conducted by experts. To reduce annotation cost, deep Active Learning (AL) methods are developed to select only the most…

    Submitted 23 May, 2022; originally announced May 2022.

  35. arXiv:2205.11274

    q-bio.MN stat.ME

    Single-cell gene regulatory network analysis for mixed cell populations with applications to COVID-19 single cell data

    Authors: Junjie Tang, Changhu Wang, Feiyi Xiao, Ruibin Xi

    Abstract: Gene regulatory network (GRN) refers to the complex network formed by regulatory interactions between genes in living cells. In this paper, we consider inferring GRNs in single cells based on single cell RNA sequencing (scRNA-seq) data. In scRNA-seq, single cells are often profiled from mixed populations and their cell identities are unknown. A common practice for single cell GRN analysis is to fi…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: 95 pages, 28 figures

  36. arXiv:2203.04695

    q-bio.BM cs.LG stat.ML

    Structured Multi-task Learning for Molecular Property Prediction

    Authors: Shengchao Liu, Meng Qu, Zuobai Zhang, Huiyu Cai, Jian Tang

    Abstract: Multi-task learning for molecular property prediction is becoming increasingly important in drug discovery. However, in contrast to other domains, the performance of multi-task learning in drug discovery is still not satisfying as the amount of labeled data for each task is too limited, which calls for additional data to complement the data scarcity. In this paper, we study multi-task learning for…

    Submitted 5 October, 2022; v1 submitted 22 February, 2022; originally announced March 2022.

  37. arXiv:2203.02923

    cs.LG q-bio.QM

    GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation

    Authors: Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, Jian Tang

    Abstract: Predicting molecular conformations from molecular graphs is a fundamental problem in cheminformatics and drug discovery. Recently, significant progress has been achieved with machine learning approaches, especially with deep generative models. Inspired by the diffusion process in classical non-equilibrium thermodynamics where heated particles will diffuse from original states to a noise distributi…

    Submitted 6 March, 2022; originally announced March 2022.

    Comments: Published as a conference paper at ICLR 2022 (https://openreview.net/forum?id=PzcvxEMzvQC)

  38. arXiv:2112.06567

    cs.LG cs.AI cs.SI q-bio.MN

    Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs

    Authors: Stephen Bonner, Ufuk Kirik, Ola Engkvist, Jian Tang, Ian P Barrett

    Abstract: Adoption of recently developed methods from machine learning has given rise to the creation of drug-discovery knowledge graphs (KG) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, are promising as they provide a more intuitive representation and are suitable for inference tasks such as predicting missing links. One comm…

    Submitted 18 March, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Briefings in Bioinformatics, 2022

  39. arXiv:2110.07728

    cs.LG cs.CV eess.IV q-bio.QM

    Pre-training Molecular Graph Representation with 3D Geometry

    Authors: Shengchao Liu, Hanchen Wang, Weiyang Liu, Joan Lasenby, Hongyu Guo, Jian Tang

    Abstract: Molecular graph representation learning is a fundamental problem in modern drug and material discovery. Molecular graphs are typically modeled by their 2D topological structures, but it has been recently discovered that 3D geometric information plays a more vital role in predicting molecular functionalities. However, the lack of 3D information in real-world scenarios has significantly impeded the…

    Submitted 29 May, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

  40. arXiv:2108.07435

    cs.LG cs.CL q-bio.BM

    Modeling Protein Using Large-scale Pretrain Language Model

    Authors: Yijia Xiao, Jiezhong Qiu, Ziang Li, Chang-Yu Hsieh, Jie Tang

    Abstract: Proteins are linked to almost every life process. Therefore, analyzing the biological structure and property of protein sequences is critical to the exploration of life, as well as disease detection and drug discovery. Traditional protein analysis methods tend to be labor-intensive and time-consuming. The emergence of deep learning models makes modeling data patterns in large quantities of data poss…

    Submitted 7 December, 2021; v1 submitted 17 August, 2021; originally announced August 2021.

    Comments: Accepted paper in Pretrain@KDD 2021 (The International Workshop on Pretraining: Algorithms, Architectures, and Applications)

  41. arXiv:2106.05388

    q-bio.NC q-bio.MN

    Neurological Consequences of COVID-19 Infection

    Authors: Jiabin Tang, Shivani Patel, Steve Gentleman, Paul Matthews

    Abstract: COVID-19 infections have well described systemic manifestations, especially respiratory problems. There are currently no specific treatments or vaccines against the current strain. With higher case numbers, a range of neurological symptoms are becoming apparent. The mechanisms responsible for these are not well defined, other than those related to hypoxia and microthrombi. We speculate that sustai…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: 19 pages, 4 figures

  42. arXiv:2105.07246  [pdf, other]

    cs.LG q-bio.BM

    An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming

    Authors: Minkai Xu, Wujie Wang, Shitong Luo, Chence Shi, Yoshua Bengio, Rafael Gomez-Bombarelli, Jian Tang

    Abstract: Predicting molecular conformations (or 3D structures) from molecular graphs is a fundamental problem in many applications. Most existing approaches are usually divided into two steps by first predicting the distances between atoms and then generating a 3D structure through optimizing a distance geometry problem. However, the distances predicted with such two-stage approaches may not be able to con…

    Submitted 2 June, 2021; v1 submitted 15 May, 2021; originally announced May 2021.

    Comments: Accepted by ICML 2021

  43. arXiv:2105.03902  [pdf, other]

    cs.LG physics.chem-ph q-bio.BM

    Learning Gradient Fields for Molecular Conformation Generation

    Authors: Chence Shi, Shitong Luo, Minkai Xu, Jian Tang

    Abstract: We study a fundamental problem in computational chemistry known as molecular conformation generation, trying to predict stable 3D structures from 2D molecular graphs. Existing machine learning approaches usually first predict distances between atoms and then generate a 3D structure satisfying the distances, where noise in predicted distances may induce extra errors during 3D coordinate generation.…

    Submitted 7 June, 2021; v1 submitted 9 May, 2021; originally announced May 2021.

    Comments: ICML 2021, Long talk

  44. arXiv:2012.05716  [pdf, other]

    q-bio.QM cs.LG

    Utilising Graph Machine Learning within Drug Discovery and Development

    Authors: Thomas Gaudelet, Ben Day, Arian R. Jamasb, Jyothish Soman, Cristian Regep, Gertrude Liu, Jeremy B. R. Hayter, Richard Vickers, Charles Roberts, Jian Tang, David Roblin, Tom L. Blundell, Michael M. Bronstein, Jake P. Taylor-King

    Abstract: Graph Machine Learning (GML) is receiving growing interest within the pharmaceutical and biotechnology industries for its ability to model biomolecular structures, the functional relationships between them, and integrate multi-omic datasets - amongst other data types. Herein, we present a multidisciplinary academic-industrial review of the topic within the context of drug discovery and development…

    Submitted 10 February, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

    Comments: 19 pages, 7 figures, 2 tables

  45. Drug repurposing for COVID-19 using graph neural network and harmonizing multiple evidence

    Authors: Kanglin Hsieh, Yinyin Wang, Luyao Chen, Zhongming Zhao, Sean Savitz, Xiaoqian Jiang, Jing Tang, Yejin Kim

    Abstract: Amid the pandemic of 2019 novel coronavirus disease (COVID-19) infected by SARS-CoV-2, a vast amount of drug research for prevention and treatment has been quickly conducted, but these efforts have been unsuccessful thus far. Our objective is to prioritize repurposable drugs using a drug repurposing pipeline that systematically integrates multiple SARS-CoV-2 and drug interactions, deep graph neura…

    Submitted 1 February, 2022; v1 submitted 23 September, 2020; originally announced September 2020.

    Comments: 13 pages

    Journal ref: Sci Rep 11, 23179 (2021)

  46. arXiv:1912.12587  [pdf]

    q-bio.BM physics.app-ph physics.atm-clus

    Highly fluorescent copper nanoclusters for sensing and bioimaging

    Authors: Yu An, Ying Ren, Jing Tang, Jun Chen, Baisong Chang

    Abstract: Metal nanoclusters (NCs), typically consisting of a few to tens of metal atoms, bridge the gap between organometallic compounds and crystalline metal nanoparticles. As their size approaches the Fermi wavelength of electrons, metal NCs exhibit discrete energy levels, which in turn results in the emergence of intriguing physical and chemical (or physicochemical) properties, especially strong fluores…

    Submitted 29 December, 2019; originally announced December 2019.

  47. arXiv:1907.04765  [pdf, other]

    q-bio.PE

    Evaluating bird collision risk of a high-speed railway crossing the habitat of the crested ibis (Nipponia nippon) in Qinling Mountains, China

    Authors: Han Hu, Junqing Tang, Yi Wang, Hongfeng Zhang, Dong Wu, Yingchun Lin, Lina Su, Yan Liu, Wei Zhang, Chao Wang, Xiaomin Wu

    Abstract: Bird collisions with high-speed transport modes is a vital topic on vehicle safety and wildlife protection, especially when high-speed trains, with an average speed of 250km/h, have to run across the habitat of an endangered bird species. This paper evaluates the bird-train collision risk associated with a recent high-speed railway project in Qinling Mountains, China, for the crested ibis (Nipponi…

    Submitted 10 July, 2019; originally announced July 2019.

    Comments: 25 pages, 6 figures, preprint for submission to Transportation Research Part D: Transport and Environment

  48. arXiv:1906.01513  [pdf]

    physics.med-ph eess.SP physics.bio-ph physics.comp-ph q-bio.QM

    Custom Edge-Element FEM Solver and its Application to Eddy-Current Simulation of Realistic 2M-Element Human Brain Phantom

    Authors: Wuliang Yin, Mingyang Lu, Jiawei Tang, Qian Zhao, Zhijie Zhang, Kai Li, Yan Han, Anthony Peyton

    Abstract: Extensive research papers of three-dimensional computational techniques are widely used for the investigation of human brain pathophysiology. Eddy current analyzing could provide an indication of conductivity change within a biological body. A significant obstacle to current trend analyses is the development of a numerically stable and efficiency-finite element scheme that performs well at low fre…

    Submitted 30 May, 2019; originally announced June 2019.

  49. arXiv:1905.00534  [pdf, other]

    stat.ML cs.LG cs.SI q-bio.QM

    Drug-Drug Adverse Effect Prediction with Graph Co-Attention

    Authors: Andreea Deac, Yu-Hsiang Huang, Petar Veličković, Pietro Liò, Jian Tang

    Abstract: Complex or co-existing diseases are commonly treated using drug combinations, which can lead to higher risk of adverse side effects. The detection of polypharmacy side effects is usually done in Phase IV clinical trials, but there are still plenty which remain undiscovered when the drugs are put on the market. Such accidents have been affecting an increasing proportion of the population (15% in th…

    Submitted 1 May, 2019; originally announced May 2019.

    Comments: 8 pages, 5 figures

  50. arXiv:1903.05947  [pdf, other]

    q-bio.PE

    Designing wildlife crossing structures for ungulates in a desert landscape: A case study in China

    Authors: Bin Zhang, Junqing Tang, Yi Wang, Hongfeng Zhang, Gang Xu, Yu Lin, Xiaomin Wu

    Abstract: This paper reports on the design of wildlife crossing structures (WCSs) along a new expressway in China, which exemplifies the country's increasing efforts on wildlife protection in infrastructure projects. The expert knowledge and field surveys were used to determine the target species in the study area and the quantity, locations, size, and type of the WCSs. The results on relative abundance ind…

    Submitted 14 March, 2019; originally announced March 2019.

    Comments: 20 pages, 7 figures, Submit to Transportation Research Part D