
Showing 1–24 of 24 results for author: Jaakkola, T

Searching in archive q-bio.
  1. arXiv:2402.18396  [pdf, other]

    q-bio.BM cs.LG

    Deep Confident Steps to New Pockets: Strategies for Docking Generalization

    Authors: Gabriele Corso, Arthur Deng, Benjamin Fry, Nicholas Polizzi, Regina Barzilay, Tommi Jaakkola

    Abstract: Accurate blind docking has the potential to lead to new biological breakthroughs, but for this promise to be realized, docking methods must generalize well across the proteome. Existing benchmarks, however, fail to rigorously assess generalizability. Therefore, we develop DockGen, a new benchmark based on the ligand-binding domains of proteins, and we show that existing machine learning-based dock…

    Submitted 28 February, 2024; originally announced February 2024.

    Journal ref: International Conference on Learning Representations 2024

  2. arXiv:2402.05841  [pdf, other]

    q-bio.BM cs.LG

    Dirichlet Flow Matching with Applications to DNA Sequence Design

    Authors: Hannes Stark, Bowen Jing, Chenyu Wang, Gabriele Corso, Bonnie Berger, Regina Barzilay, Tommi Jaakkola

    Abstract: Discrete diffusion or flow models could enable faster and more controllable sequence generation than autoregressive models. We show that naïve linear flow matching on the simplex is insufficient toward this goal since it suffers from discontinuities in the training target and further pathologies. To overcome this, we develop Dirichlet flow matching on the simplex based on mixtures of Dirichlet dis…

    Submitted 30 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Published at ICML 2024. (Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024)

  3. arXiv:2402.04997  [pdf, other]

    stat.ML cs.LG q-bio.QM

    Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design

    Authors: Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, Tommi Jaakkola

    Abstract: Combining discrete and continuous data is an important capability for generative models. We present Discrete Flow Models (DFMs), a new flow-based model of discrete data that provides the missing link in enabling flow-based generative models to be applied to multimodal continuous and discrete data problems. Our key insight is that the discrete equivalent of continuous space flow matching can be rea…

    Submitted 5 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 60 pages, 11 figures, 6 tables; ICML 2024

  4. arXiv:2402.04845  [pdf, other]

    q-bio.BM cs.LG

    AlphaFold Meets Flow Matching for Generating Protein Ensembles

    Authors: Bowen Jing, Bonnie Berger, Tommi Jaakkola

    Abstract: The biological functions of proteins often depend on dynamic structural ensembles. In this work, we develop a flow-based generative modeling approach for learning and sampling the conformational landscapes of proteins. We repurpose highly accurate single-state predictors such as AlphaFold and ESMFold and fine-tune them under a custom flow matching framework to obtain sequence-conditioned generative…

    Submitted 7 February, 2024; originally announced February 2024.

  5. arXiv:2401.04082  [pdf, other]

    q-bio.QM cs.LG stat.ML

    Improved motif-scaffolding with SE(3) flow matching

    Authors: Jason Yim, Andrew Campbell, Emile Mathieu, Andrew Y. K. Foong, Michael Gastegger, José Jiménez-Luna, Sarah Lewis, Victor Garcia Satorras, Bastiaan S. Veeling, Frank Noé, Regina Barzilay, Tommi S. Jaakkola

    Abstract: Protein design often begins with the knowledge of a desired function from a motif which motif-scaffolding aims to construct a functional protein around. Recently, generative models have achieved breakthrough success in designing scaffolds for a range of motifs. However, generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend Fr…

    Submitted 18 July, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Preprint. Code: https://github.com/microsoft/frame-flow

    Journal ref: Transactions on Machine Learning Research 2024

  6. arXiv:2312.04323  [pdf, other]

    q-bio.BM cs.LG

    Equivariant Scalar Fields for Molecular Docking with Fast Fourier Transforms

    Authors: Bowen Jing, Tommi Jaakkola, Bonnie Berger

    Abstract: Molecular docking is critical to structure-based virtual screening, yet the throughput of such workflows is limited by the expensive optimization of scoring functions involved in most docking algorithms. We explore how machine learning can accelerate this process by learning a scoring function with a functional form that allows for more rapid optimization. Specifically, we define the scoring funct…

    Submitted 7 December, 2023; originally announced December 2023.

  7. arXiv:2312.02447  [pdf, other]

    q-bio.BM stat.ML

    Fast non-autoregressive inverse folding with discrete diffusion

    Authors: John J. Yang, Jason Yim, Regina Barzilay, Tommi Jaakkola

    Abstract: Generating protein sequences that fold into an intended 3D structure is a fundamental step in de novo protein design. De facto methods utilize autoregressive generation, but this eschews higher order interactions that could be exploited to improve inference speed. We describe a non-autoregressive alternative that performs inference using a constant number of calls resulting in a 23 times speed up w…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: NeurIPS Machine learning for Structural Biology workshop

  8. arXiv:2312.00718  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Removing Biases from Molecular Representations via Information Maximization

    Authors: Chenyu Wang, Sharut Gupta, Caroline Uhler, Tommi Jaakkola

    Abstract: High-throughput drug screening -- using cell imaging or gene expression measurements as readouts of drug effect -- is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce s…

    Submitted 1 December, 2023; originally announced December 2023.

  9. arXiv:2310.05297  [pdf, other]

    q-bio.QM

    Fast protein backbone generation with SE(3) flow matching

    Authors: Jason Yim, Andrew Campbell, Andrew Y. K. Foong, Michael Gastegger, José Jiménez-Luna, Sarah Lewis, Victor Garcia Satorras, Bastiaan S. Veeling, Regina Barzilay, Tommi Jaakkola, Frank Noé

    Abstract: We present FrameFlow, a method for fast protein backbone generation using SE(3) flow matching. Specifically, we adapt FrameDiff, a state-of-the-art diffusion model, to the flow-matching generative modeling paradigm. We show how flow matching can be applied on SE(3) and propose modifications during training to effectively learn the vector field. Compared to FrameDiff, FrameFlow requires five times…

    Submitted 10 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: Preprint

  10. arXiv:2307.00494  [pdf, other]

    q-bio.BM cs.LG q-bio.QM stat.ML

    Improving Protein Optimization with Smoothed Fitness Landscapes

    Authors: Andrew Kirjner, Jason Yim, Raman Samusevich, Shahar Bracha, Tommi Jaakkola, Regina Barzilay, Ila Fiete

    Abstract: The ability to engineer novel proteins with higher fitness for a desired property would be revolutionary for biotechnology and medicine. Modeling the combinatorially large space of sequences is infeasible; prior methods often constrain optimization to a small mutational radius, but this drastically limits the design space. Instead of heuristics, we propose smoothing the fitness landscape to facili…

    Submitted 2 March, 2024; v1 submitted 2 July, 2023; originally announced July 2023.

    Comments: ICLR 2024. Code: https://github.com/kirjner/GGS

  11. arXiv:2304.03889  [pdf, other]

    q-bio.BM cs.LG

    DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models

    Authors: Mohamed Amine Ketata, Cedrik Laue, Ruslan Mammadov, Hannes Stärk, Menghua Wu, Gabriele Corso, Céline Marquet, Regina Barzilay, Tommi S. Jaakkola

    Abstract: Understanding how proteins structurally interact is crucial to modern biology, with applications in drug discovery and protein design. Recent machine learning methods have formulated protein-small molecule docking as a generative problem with significant performance boosts over both traditional and deep learning baselines. In this work, we propose a similar approach for rigid protein-protein docki…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: ICLR Machine Learning for Drug Discovery (MLDD) Workshop 2023

  12. arXiv:2304.02198  [pdf, other]

    q-bio.BM cs.LG physics.bio-ph

    EigenFold: Generative Protein Structure Prediction with Diffusion Models

    Authors: Bowen Jing, Ezra Erives, Peter Pao-Huang, Gabriele Corso, Bonnie Berger, Tommi Jaakkola

    Abstract: Protein structure prediction has reached revolutionary levels of accuracy on single structures, yet distributional modeling paradigms are needed to capture the conformational ensembles and flexibility that underlie biological function. Towards this goal, we develop EigenFold, a diffusion generative modeling framework for sampling a distribution of structures from a given protein sequence. We defin…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: ICLR MLDD workshop 2023

  13. arXiv:2302.02277  [pdf, other]

    cs.LG q-bio.QM stat.ML

    SE(3) diffusion model with application to protein backbone generation

    Authors: Jason Yim, Brian L. Trippe, Valentin De Bortoli, Emile Mathieu, Arnaud Doucet, Regina Barzilay, Tommi Jaakkola

    Abstract: The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusi…

    Submitted 22 May, 2023; v1 submitted 4 February, 2023; originally announced February 2023.

    Journal ref: International Conference on Machine Learning (ICML) 2023

  14. arXiv:2210.01776  [pdf, other]

    q-bio.BM cs.LG physics.bio-ph

    DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking

    Authors: Gabriele Corso, Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola

    Abstract: Predicting the binding structure of a small molecule ligand to a protein -- a task known as molecular docking -- is critical to drug design. Recent deep learning methods that treat docking as a regression problem have decreased runtime compared to traditional search-based methods but have yet to offer substantial improvements in accuracy. We instead frame molecular docking as a generative modeling…

    Submitted 11 February, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: International Conference on Learning Representations (ICLR 2023)

  15. arXiv:2207.06616  [pdf, other]

    q-bio.BM cs.LG

    Antibody-Antigen Docking and Design via Hierarchical Equivariant Refinement

    Authors: Wengong Jin, Regina Barzilay, Tommi Jaakkola

    Abstract: Computational antibody design seeks to automatically create an antibody that binds to an antigen. The binding affinity is governed by the 3D binding interface where antibody residues (paratope) closely interact with antigen residues (epitope). Thus, predicting 3D paratope-epitope complex (docking) is the key to finding the best paratope. In this paper, we propose a new model called Hierarchical Eq…

    Submitted 13 July, 2022; originally announced July 2022.

  16. arXiv:2206.04119  [pdf, other]

    q-bio.BM cs.LG stat.ML

    Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem

    Authors: Brian L. Trippe, Jason Yim, Doug Tischer, David Baker, Tamara Broderick, Regina Barzilay, Tommi Jaakkola

    Abstract: Construction of a scaffold structure that supports a desired motif, conferring protein function, shows promise for the design of vaccines and enzymes. But a general solution to this motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds.…

    Submitted 19 March, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: Appearing in ICLR 2023. Code available: github.com/blt2114/ProtDiff_SMCDiff

  17. arXiv:2206.01729  [pdf, other]

    physics.chem-ph cs.LG q-bio.BM

    Torsional Diffusion for Molecular Conformer Generation

    Authors: Bowen Jing, Gabriele Corso, Jeffrey Chang, Regina Barzilay, Tommi Jaakkola

    Abstract: Molecular conformer generation is a fundamental task in computational chemistry. Several machine learning approaches have been developed, but none have outperformed state-of-the-art cheminformatics methods. We propose torsional diffusion, a novel diffusion framework that operates on the space of torsion angles via a diffusion process on the hypertorus and an extrinsic-to-intrinsic score model. On…

    Submitted 28 February, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  18. arXiv:2202.05146  [pdf, other]

    q-bio.BM cs.LG

    EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

    Authors: Hannes Stärk, Octavian-Eugen Ganea, Lagnajit Pattanaik, Regina Barzilay, Tommi Jaakkola

    Abstract: Predicting how a drug-like molecule binds to a specific protein target is a core problem in drug discovery. An extremely fast computational binding method would enable key applications such as fast virtual screening or drug engineering. Existing methods are computationally expensive as they rely on heavy candidate sampling coupled with scoring, ranking, and fine-tuning steps. We challenge this par…

    Submitted 4 June, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: 39th International Conference on Machine Learning (ICML 2022). Also accepted at ICLR 2022 GTRL and at ICLR 2022 MLDD as spotlight

    Journal ref: 39th International Conference on Machine Learning (ICML 2022)

  19. arXiv:2111.01009  [pdf, other]

    q-bio.BM cs.LG

    Fragment-based Sequential Translation for Molecular Optimization

    Authors: Benson Chen, Xiang Fu, Regina Barzilay, Tommi Jaakkola

    Abstract: Searching for novel molecular compounds with desired properties is an important problem in drug discovery. Many existing frameworks generate molecules one atom at a time. We instead propose a flexible editing paradigm that generates molecules using learned molecular fragments--meaningful substructures of molecules. To do so, we train a variational autoencoder (VAE) to encode molecular fragments in…

    Submitted 26 October, 2021; originally announced November 2021.

  20. arXiv:2110.04624  [pdf, other]

    q-bio.BM cs.LG

    Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design

    Authors: Wengong Jin, Jeremy Wohlwend, Regina Barzilay, Tommi Jaakkola

    Abstract: Antibodies are versatile proteins that bind to pathogens like viruses and stimulate the adaptive immune system. The specificity of antibody binding is determined by complementarity-determining regions (CDRs) at the tips of these Y-shaped proteins. In this paper, we propose a generative model to automatically design the CDRs of antibodies with enhanced binding specificity or neutralization capabili…

    Submitted 27 January, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

    Comments: Accepted to ICLR 2022

  21. arXiv:2011.04651  [pdf, other]

    q-bio.BM cs.LG q-bio.QM

    Discovering Synergistic Drug Combinations for COVID with Biological Bottleneck Models

    Authors: Wengong Jin, Regina Barzilay, Tommi Jaakkola

    Abstract: Drug combinations play an important role in therapeutics due to their better efficacy and reduced toxicity. Recent approaches have applied machine learning to identify synergistic combinations for cancer, but they are not applicable to new diseases with limited combination data. Given that drug synergy is closely tied to biological targets, we propose a biological bottleneck model that jointl…

    Submitted 28 November, 2020; v1 submitted 8 November, 2020; originally announced November 2020.

    Comments: Accepted to NeurIPS 2020 Machine Learning for Molecules Workshop

  22. arXiv:2006.08532  [pdf, other]

    q-bio.BM cs.CV cs.LG eess.IV q-bio.QM

    Improved Conditional Flow Models for Molecule to Image Synthesis

    Authors: Karren Yang, Samuel Goldman, Wengong Jin, Alex Lu, Regina Barzilay, Tommi Jaakkola, Caroline Uhler

    Abstract: In this paper, we aim to synthesize cell microscopy images under different molecular interventions, motivated by practical applications to drug development. Building on the recent success of graph neural networks for learning molecular embeddings and flow-based models for image generation, we propose Mol2Image: a flow-based generative model for molecule to cell image synthesis. To generate cell fe…

    Submitted 15 June, 2020; originally announced June 2020.

    MSC Class: 92-08

  23. arXiv:2005.03004  [pdf, other]

    q-bio.QM cs.LG stat.ML

    Adaptive Invariance for Molecule Property Prediction

    Authors: Wengong Jin, Regina Barzilay, Tommi Jaakkola

    Abstract: Effective property prediction methods can help accelerate the search for COVID-19 antivirals either through accurate in-silico screens or by effectively guiding on-going at-scale experimental efforts. However, existing prediction tools have limited ability to accommodate scarce or fragmented training data currently available. In this paper, we introduce a novel approach to learn predictors that ca…

    Submitted 5 May, 2020; originally announced May 2020.

  24. arXiv:1511.04486  [pdf, other]

    stat.ME q-bio.GN q-bio.QM

    Modeling Persistent Trends in Distributions

    Authors: Jonas Mueller, Tommi Jaakkola, David Gifford

    Abstract: We present a nonparametric framework to model a short sequence of probability distributions that vary both due to underlying effects of sequential progression and confounding noise. To distinguish between these two types of variation and estimate the sequential-progression effects, our approach leverages an assumption that these effects follow a persistent trend. This work is motivated by the rece…

    Submitted 24 May, 2017; v1 submitted 13 November, 2015; originally announced November 2015.

    Comments: To appear in: Journal of the American Statistical Association

    Journal ref: Journal of the American Statistical Association, 113(523):1296-1310, 2018