-
Generative Active Learning for the Search of Small-molecule Protein Binders
Authors:
Maksym Korablyov,
Cheng-Hao Liu,
Moksh Jain,
Almer M. van der Sloot,
Eric Jolicoeur,
Edward Ruediger,
Andrei Cristian Nica,
Emmanuel Bengio,
Kostiantyn Lapchevskyi,
Daniel St-Cyr,
Doris Alexandra Schuetz,
Victor Ion Butoi,
Jarrid Rector-Brooks,
Simon Blackburn,
Leo Feng,
Hadi Nekoei,
SaiKrishna Gottipati,
Priyesh Vijayan,
Prateek Gupta,
Ladislav Rampášek,
Sasikanth Avancha,
Pierre-Luc Bacon,
William L. Hamilton,
Brooks Paige,
Sanchit Misra, et al. (9 additional authors not shown)
Abstract:
Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeliness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.
Submitted 2 May, 2024;
originally announced May 2024.
-
Target-Specific De Novo Peptide Binder Design with DiffPepBuilder
Authors:
Fanhao Wang,
Yuzhe Wang,
Laiyi Feng,
Changsheng Zhang,
Luhua Lai
Abstract:
Despite the exciting progress in target-specific de novo protein binder design, peptide binder design remains challenging due to the flexibility of peptide structures and the scarcity of protein-peptide complex structure data. In this study, we curated a large synthetic dataset, referred to as PepPC-F, from the abundant protein-protein interface data and developed DiffPepBuilder, a de novo target-specific peptide binder generation method that utilizes an SE(3)-equivariant diffusion model trained on PepPC-F to co-design peptide sequences and structures. DiffPepBuilder also introduces disulfide bonds to stabilize the generated peptide structures. We tested DiffPepBuilder on 30 experimentally verified strong peptide binders with available protein-peptide complex structures. DiffPepBuilder was able to effectively recall the native structures and sequences of the peptide ligands and to generate novel peptide binders with improved binding free energy. We subsequently conducted de novo generation case studies on three targets. In both the regeneration test and case studies, DiffPepBuilder outperformed AfDesign and RFdiffusion coupled with ProteinMPNN, in terms of sequence and structure recall, interface quality, and structural diversity. Molecular dynamics simulations confirmed that the introduction of disulfide bonds enhanced the structural rigidity and binding performance of the generated peptides. As a general peptide binder de novo design tool, DiffPepBuilder can be used to design peptide binders for given protein targets with three dimensional and binding site information.
Submitted 30 April, 2024;
originally announced May 2024.
-
CloudBrain-NMR: An Intelligent Cloud Computing Platform for NMR Spectroscopy Processing, Reconstruction and Analysis
Authors:
Di Guo,
Sijin Li,
Jun Liu,
Zhangren Tu,
Tianyu Qiu,
Jingjing Xu,
Liubin Feng,
Donghai Lin,
Qing Hong,
Meijin Lin,
Yanqin Lin,
Xiaobo Qu
Abstract:
Nuclear Magnetic Resonance (NMR) spectroscopy has served as a powerful analytical tool for studying molecular structure and dynamics in chemistry and biology. However, the processing of raw data acquired from NMR spectrometers and the subsequent quantitative analysis involve various specialized tools, which requires comprehensive knowledge of programming and NMR. In particular, emerging deep learning tools are hard to use widely in NMR because of their complicated computational setup. Thus, NMR processing is not an easy task for chemists and biologists. In this work, we present CloudBrain-NMR, an intelligent online cloud computing platform designed for NMR data reading, processing, reconstruction, and quantitative analysis. The platform is conveniently accessed through a web browser, eliminating the need for any program installation on the user side. CloudBrain-NMR uses parallel computing with graphics processing units and central processing units, resulting in significantly shortened computation time. Furthermore, it incorporates state-of-the-art deep learning-based algorithms offering comprehensive functionalities that allow users to complete the entire processing procedure without relying on additional software. This platform has empowered NMR applications with advanced artificial intelligence processing. CloudBrain-NMR is freely accessible at https://csrc.xmu.edu.cn/CloudBrain.html
Submitted 12 September, 2023;
originally announced September 2023.
-
Machine learning traction force maps of cell monolayers
Authors:
Changhao Li,
Luyi Feng,
Yang Jeong Park,
Jian Yang,
Ju Li,
Sulin Zhang
Abstract:
Cellular force transmission across a hierarchy of molecular switchers is central to mechanobiological responses. However, current cellular force microscopies suffer from low throughput and resolution. Here we introduce and train a generative adversarial network (GAN) to paint out traction force maps of cell monolayers with high fidelity to experimental traction force microscopy (TFM). The GAN treats traction force mapping as an image-to-image translation problem, where its generative and discriminative neural networks are simultaneously cross-trained on hybrid experimental and numerical datasets. In addition to capturing the colony-size and substrate-stiffness dependent traction force maps, the trained GAN predicts asymmetric traction force patterns for multicellular monolayers seeded on substrates with a stiffness gradient, implicating collective durotaxis. Further, the neural network can extract the experimentally inaccessible, hidden relationship between substrate stiffness and cell contractility, which underlies cellular mechanotransduction. Trained solely on datasets for epithelial cells, the GAN can be extrapolated to other contractile cell types using only a single scaling factor. The digital TFM serves as a high-throughput tool for mapping out cellular forces of cell monolayers and paves the way toward data-driven discoveries in cell mechanobiology.
Submitted 19 April, 2023;
originally announced April 2023.
-
Identify Hidden Spreaders of Pandemic over Contact Tracing Networks
Authors:
Shuhong Huang,
Jiachen Sun,
Ling Feng,
Jiarong Xie,
Dashun Wang,
Yanqing Hu
Abstract:
COVID-19 infection cases have surged globally, devastating both society and the economy. A key factor contributing to the sustained spreading is the presence of a large number of asymptomatic or hidden spreaders, who mix among the susceptible population without being detected or quarantined. Here we propose an effective non-pharmacological intervention method for detecting asymptomatic spreaders in contact-tracing networks, and validate it on the empirical COVID-19 spreading network in Singapore. We find that, using purely physical spreading equations, the hidden spreaders of COVID-19 can be identified with remarkable accuracy. Specifically, based on the unique characteristics of COVID-19 spreading dynamics, we propose a computational framework capturing the transition probabilities among different infectious states in a network, and extend it to an efficient algorithm to identify asymptomatic individuals. Our simulation results indicate that a screening method using our prediction outperforms machine learning algorithms, e.g. graph neural networks, that are designed as baselines in this work, as well as the random screening of an infection's closest contacts that was widely used by China in its early outbreak. Furthermore, our method provides high precision even with incomplete information about the contact-tracing networks. Our work can be of critical importance to non-pharmacological interventions for COVID-19, especially with the increasing adoption of contact tracing measures using various new technologies. Beyond COVID-19, our framework can be useful for other epidemic diseases that also feature asymptomatic spreading.
Submitted 16 March, 2021;
originally announced March 2021.
-
Robust Nucleus Detection with Partially Labeled Exemplars
Authors:
Linqing Feng,
Jun Ho Song,
Jiwon Kim,
Soomin Jeong,
Jin Sung Park,
Jinhyun Kim
Abstract:
Quantitative analysis of cell nuclei in microscopic images is an essential yet challenging source of biological and pathological information. The major challenge is accurate detection and segmentation of densely packed nuclei in images acquired under a variety of conditions. Mask R-CNN-based methods have achieved state-of-the-art nucleus segmentation. However, the current pipeline requires fully annotated training images, which are time consuming to create and sometimes noisy. Importantly, nuclei often appear similar within the same image. This similarity could be utilized to segment nuclei with only partially labeled training examples. We propose a simple yet effective region-proposal module for the current Mask R-CNN pipeline to perform few-exemplar learning. To capture the similarities between unlabeled regions and labeled nuclei, we apply decomposed self-attention to learned features. On the self-attention map, we observe strong activation at the centers and edges of all nuclei, including unlabeled nuclei. On this basis, our region-proposal module propagates partial annotations to the whole image and proposes effective bounding boxes for the bounding box-regression and binary mask-generation modules. Our method effectively learns from unlabeled regions thereby improving detection performance. We test our method with various nuclear images. When trained with only 1/4 of the nuclei annotated, our approach retains a detection accuracy comparable to that from training with fully annotated data. Moreover, our method can serve as a bootstrapping step to create full annotations of datasets, iteratively generating and correcting annotations until a predetermined coverage and accuracy are reached. The source code is available at https://github.com/feng-lab/nuclei.
Submitted 13 November, 2019; v1 submitted 23 July, 2019;
originally announced July 2019.
-
Prey selection of Amur tigers in relation to the spatiotemporal overlap with prey across the Sino-Russian border
Authors:
Hailong Dou,
Haitao Yang,
James L. D. Smith,
Limin Feng,
Tianming Wang,
Jianping Ge
Abstract:
The endangered Amur tiger is confined primarily to a narrow area along the border with Russia in Northeast China. Little is known about the foraging strategies of this small subpopulation in Hunchun Nature Reserve on the Chinese side of the border; at this location, the prey base and land use patterns are distinctly different from those in the larger population of the Sikhote-Alin Mountains of Russia. Using dietary analysis of scats and camera-trapping data from Hunchun Nature Reserve, we assessed spatiotemporal overlap of tigers and their prey and identified prey selection patterns to enhance understanding of the ecological requirements of tigers in Northeast China. Results indicated that wild prey constituted 94.9% of the total biomass consumed by tigers; domestic livestock represented 5.1% of the diet. Two species, wild boar and sika deer, collectively represented 83% of the biomass consumed by tigers. Despite lower spatial overlap of tigers and wild boar compared to tigers and sika deer, tigers preferentially preyed on boar, likely facilitated by high temporal overlap in activity patterns. Tigers exhibited significant spatial overlap with sika deer, likely favoring a high level of tiger predation on this large-sized ungulate. However, tigers did not prefer roe deer (Capreolus pygargus) and showed a low spatial overlap with roe deer. Overall, our results suggest that tiger prey selection is determined by prey body size and by the overlap in tiger and prey use of time or space. We also suggest that strategies designed to minimize livestock forays into forested lands may be important for decreasing livestock depredation by tigers. This study offers a framework to simultaneously integrate food habit analysis with the distribution of predators and prey through time and space to provide a comprehensive understanding of foraging strategies of large carnivores.
Submitted 28 March, 2019; v1 submitted 26 October, 2018;
originally announced October 2018.
-
Effects of free-ranging livestock on sympatric herbivores at fine spatiotemporal scales
Authors:
Rongna Feng,
Xinyue Lu,
Tianming Wang,
Jiawei Feng,
Yifei Sun,
Wenhong Xiao,
Yu Guan,
Limin Feng,
James L. D. Smith,
Jianping Ge
Abstract:
Understanding wildlife-livestock interactions is crucial for the design and management of protected areas that aim to conserve large mammal communities undergoing conflicts with humans worldwide. An example of the need to quantify the strength and direction of species interactions is the conservation of big cats in newly established protected areas in China. Currently, free-ranging livestock degrade the food and habitat of the endangered Amur tiger and Amur leopard in the forest landscapes of Northeast China, but quantitative assessments of how livestock affect the use of habitat by the major ungulate prey of these predators are very limited. Here, we examined livestock-ungulate interactions using large-scale camera-trap data in the newly established Tiger and Leopard National Park in Northeast China, which borders Russia. We used N-mixture models, two-species occupancy models and activity pattern overlap to understand the effects of cattle grazing on three ungulate species (wild boar, roe deer and sika deer) at a fine spatiotemporal scale. Our results showed that incorporating the biotic interactions with cattle had significant negative effects on encounters with three ungulates; sika deer were particularly displaced as more cattle encroached on forest habitat, as they exhibited low levels of co-occurrence with cattle in terms of habitat use. These results, combined with spatiotemporal overlap, suggested fine-scale avoidance behaviours, and they can help to refine strategies for the conservation of tigers, leopards and their prey in human-dominated transboundary landscapes. Progressively controlling cattle and the impact of cattle on biodiversity while simultaneously addressing the economic needs of local communities should be key priority actions for the Chinese government.
Submitted 23 January, 2020; v1 submitted 26 October, 2018;
originally announced October 2018.
-
Origin and Quantitative Control of Sertoli Cells
Authors:
Lixin Feng,
Yongguang Yang
Abstract:
The Sertoli cell is the "nurse" in the testes that regulates germ cell proliferation and differentiation. One Sertoli cell supports a certain number of germ cells during these processes. Thus, it is the determinant of male reproductive capability. Sertoli cells originate from the primitive gonads during the embryonic stage, and their proliferation continues throughout the pre-pubertal stage. The proliferation and final density of Sertoli cells in the testis are regulated by hormones and local factors through autocrine, paracrine, and endocrine mechanisms. In this concise minireview, we summarize the most recent progress in the study of factors and signaling pathways that participate in regulating the proliferation and function of Sertoli cells.
Submitted 2 June, 2017;
originally announced June 2017.
-
Non-trivial Resource Amount Requirement in the Early Stage for Containing Fatal Diseases
Authors:
Xiaolong Chen,
Tianshou Zhou,
Ling Feng,
Junhao Liang,
Fredrik Liljeros,
Shlomo Havlin,
Yanqing Hu
Abstract:
During epidemic control, containment of the disease is usually achieved by devoting more resources to shortening the duration of infectiousness. However, the impact of this resource expenditure has not been studied quantitatively. Using well-documented cholera data, we observe empirically that the recovery rate, which is related to the duration of infectiousness, has a strong positive correlation with the average resource devoted to infected individuals. By incorporating this relation, we build a novel model and find that insufficient resources lead to an abrupt increase in the infected population size, in marked contrast with the continuous phase transitions previously believed to occur. Counterintuitively, this abrupt phase transition is more pronounced in less contagious diseases, which usually correspond to the most fatal ones. Furthermore, we find that even for a single infection source, public resources must reach a significant amount, proportional to the whole population size, to ensure epidemic containment. Our findings provide a theoretical foundation for efficient epidemic containment strategies in the early stage.
Submitted 27 January, 2018; v1 submitted 1 November, 2016;
originally announced November 2016.