Abstract
The binding complexes formed by proteins and small molecule ligands are ubiquitous and critical to life. Despite recent advancements in protein structure prediction, existing algorithms are so far unable to systematically predict the binding ligand structures along with their regulatory effects on protein folding. To address this discrepancy, we present NeuralPLexer, a computational approach that can directly predict protein–ligand complex structures solely using protein sequence and ligand molecular graph inputs. NeuralPLexer adopts a deep generative model to sample the three-dimensional structures of the binding complex and their conformational changes at an atomistic resolution. The model is based on a diffusion process that incorporates essential biophysical constraints and a multiscale geometric deep learning system to iteratively sample residue-level contact maps and all heavy-atom coordinates in a hierarchical manner. NeuralPLexer achieves state-of-the-art performance compared with all existing methods on benchmarks for both protein–ligand blind docking and flexible binding-site structure recovery. Moreover, owing to its specificity in sampling both ligand-free-state and ligand-bound-state ensembles, NeuralPLexer consistently outperforms AlphaFold2 in terms of global protein structure accuracy on both representative structure pairs with large conformational changes and recently determined ligand-binding proteins. NeuralPLexer predictions align with structure determination experiments for important targets in enzyme engineering and drug discovery, suggesting its potential for accelerating the design of functional proteins and small molecules at the proteome scale.
Data availability
All datasets and predictions used to generate the reported results are available on Code Ocean (ref. 86) and also on Zenodo at https://doi.org/10.5281/zenodo.10373581.
Code availability
The code, scripts and interactive data analysis notebooks are available on Code Ocean (ref. 86) and also on GitHub at https://github.com/zrqiao/NeuralPLexer.
References
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Ovchinnikov, S. et al. Protein structure determination using metagenome sequence data. Science 355, 294–298 (2017).
Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Baek, M. et al. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 21, 117–121 (2024).
Chowdhury, R. et al. Single-sequence protein structure prediction using a language model and deep learning. Nat. Biotechnol. 40, 1617–1623 (2022).
Wang, W., Peng, Z. & Yang, J. Single-sequence protein structure prediction using supervised transformer protein language models. Nat. Comput. Sci. 2, 804–814 (2022).
Wu, R. et al. High-resolution de novo structure prediction from primary sequence. Preprint at https://www.biorxiv.org/content/10.1101/2022.07.21.500999v1 (2022).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Zhang, Y. et al. Benchmarking refined and unrefined AlphaFold2 structures for hit discovery. J. Chem. Inf. Model. 63, 1656–1667 (2023).
Wong, F. et al. Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery. Mol. Syst. Biol. 18, e11081 (2022).
Jones, D. T. & Thornton, J. M. The impact of AlphaFold2 one year on. Nat. Methods 19, 15–20 (2022).
Henzler-Wildman, K. & Kern, D. Dynamic personalities of proteins. Nature 450, 964–972 (2007).
Nussinov, R. & Tsai, C.-J. Allostery in disease and in drug discovery. Cell 153, 293–305 (2013).
Ayaz, P. et al. Structural mechanism of a drug-binding process involving a large conformational change of the protein target. Nat. Commun. 14, 1885 (2023).
Lane, T. J. Protein structure prediction has reached the single-structure frontier. Nat. Methods 20, 170–173 (2023).
Moore, A. R., Rosenberg, S. C., McCormick, F. & Malek, S. Ras-targeted therapies: is the undruggable drugged? Nat. Rev. Drug Discov. 19, 533–552 (2020).
Draper-Joyce, C. J. et al. Positive allosteric mechanisms of adenosine A1 receptor-mediated analgesia. Nature 597, 571–576 (2021).
Sadybekov, A. V. & Katritch, V. Computational approaches streamlining drug discovery. Nature 616, 673–685 (2023).
Shaw, D. E. et al. Atomic-level characterization of the structural dynamics of proteins. Science 330, 341–346 (2010).
Shan, Y. et al. How does a small molecule bind at a cryptic binding site? PLoS Comput. Biol. 18, e1009817 (2022).
Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 34, 8780–8794 (2021).
Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (Curran Associates, Inc., 2017).
Zvyagin, M. et al. GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics. Int. J. High Perform. Comput. Appl. 37, 683–705 (2023).
Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
Bepler, T. & Berger, B. Learning the protein language: evolution, structure, and function. Cell Syst. 12, 654–669 (2021).
Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).
Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
Wu, K. E. et al. Protein structure generation via folding diffusion. Preprint at https://arxiv.org/abs/2209.15611 (2022).
Lin, Y. & AlQuraishi, M. Generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds. Preprint at https://arxiv.org/abs/2301.12485 (2023).
Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. In International Conference on Learning Representations (2022).
Lu, W. et al. TankBind: trigonometry-aware neural networks for drug-protein binding structure prediction. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 7236–7249 (Curran Associates, Inc., 2022).
Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. S. DiffDock: diffusion steps, twists, and turns for molecular docking. In The Eleventh International Conference on Learning Representations (2023).
Nakata, S., Mori, Y. & Tanaka, S. End-to-end protein–ligand complex structure generation with diffusion-based generative models. BMC Bioinformatics 24, 233 (2023).
Schneuing, A. et al. Structure-based drug design with equivariant diffusion models. Preprint at https://arxiv.org/abs/2210.13695 (2022).
Alayrac, J.-B. et al. Flamingo: a visual language model for few-shot learning. Adv. Neural Inf. Process. Syst. 35, 23716–23736 (2022).
Wang, R., Fang, X., Lu, Y., Yang, C.-Y. & Wang, S. The PDBbind database: methodologies and updates. J. Med. Chem. 48, 4111–4119 (2005).
Davis, I. W. & Baker, D. RosettaLigand docking with full ligand and receptor flexibility. J. Mol. Biol. 385, 381–392 (2009).
Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
Eliel, E. L. & Wilen, S. H. Stereochemistry of Organic Compounds (John Wiley & Sons, 1994).
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning Vol. 37 (eds Bach, F. & Blei, D.) 2256–2265 (PMLR, 2015).
Song, Y. et al. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (2021).
Shin, Y. et al. Discovery of N-(1-acryloylazetidin-3-yl)-2-(1H-indol-1-yl)acetamides as covalent inhibitors of KRASG12C. ACS Med. Chem. Lett. 10, 1302–1308 (2019).
Polizzi, N. F. & DeGrado, W. F. A defined structural unit enables de novo design of small-molecule-binding proteins. Science 369, 1227–1233 (2020).
Meller, A. et al. Predicting locations of cryptic pockets from single protein structures using the PocketMiner graph neural network. Nat. Commun. 14, 1177 (2023).
Best, R. B., Hummer, G. & Eaton, W. A. Native contacts determine protein folding mechanisms in atomistic simulations. Proc. Natl Acad. Sci. USA 110, 17874–17879 (2013).
Karelina, M., Noh, J. J. & Dror, R. O. How accurately can one predict drug binding modes using AlphaFold models? eLife https://doi.org/10.7554/elife.89386.1 (2023).
Chen, C.-Y., Chang, Y.-C., Lin, B.-L., Huang, C.-H. & Tsai, M.-D. Temperature-resolved cryo-EM uncovers structural bases of temperature-dependent enzyme functions. J. Am. Chem. Soc. 141, 19983–19987 (2019).
Lee, M.-Y. et al. Harnessing the power of an X-ray laser for serial crystallography of membrane proteins crystallized in lipidic cubic phase. IUCrJ 7, 976–984 (2020).
García-Nafría, J., Lee, Y., Bai, X., Carpenter, B. & Tate, C. G. Cryo-EM structure of the adenosine A2A receptor coupled to an engineered heterotrimeric G protein. eLife 7, e35946 (2018).
Bertheleme, N., Singh, S., Dowell, S. J., Hubbard, J. & Byrne, B. Loss of constitutive activity is correlated with increased thermostability of the human adenosine A2A receptor. Br. J. Pharmacol. 169, 988–998 (2013).
Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).
Wishart, D. S. et al. HMDB 5.0: the human metabolome database for 2022. Nucleic Acids Res. 50, D622–D631 (2022).
Irwin, J. J. & Shoichet, B. K. ZINC – a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 45, 177–182 (2005).
Zang, C. & Wang, F. MoFlow: an invertible flow model for generating molecular graphs. In Proc. 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 617–626 (2020).
Fu, T. et al. Differentiable scaffolding tree for molecule optimization. In International Conference on Learning Representations (2022).
Plested, A. J. Structural mechanisms of activation and desensitization in neurotransmitter-gated ion channels. Nat. Struct. Mol. Biol. 23, 494–502 (2016).
Kondor, R. I. & Lafferty, J. Diffusion kernels on graphs and other discrete structures. In Proc. 19th International Conference on Machine Learning, 315–322 (2002).
Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In Proceedings of the 38th International Conference on Machine Learning Vol. 139 (eds Meila, M. & Zhang, T.) 9323–9332 (PMLR, 2021).
Brandstetter, J., Hesselink, R., van der Pol, E., Bekkers, E. J. & Welling, M. Geometric and physical quantities improve E(3) equivariant message passing. In International Conference on Learning Representations (2022).
Li, Y., Wu, J., Tedrake, R., Tenenbaum, J. B. & Torralba, A. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. In International Conference on Learning Representations (2019).
Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L. & Dror, R. Learning from protein structure with geometric vector perceptrons. In International Conference on Learning Representations (2021).
Shen, T. et al. E2Efold-3D: end-to-end deep learning method for accurate de novo RNA 3D structure prediction. Preprint at https://arxiv.org/abs/2207.01586 (2022).
Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at https://arxiv.org/abs/2205.15019 (2022).
Meucci, A. Review of statistical arbitrage, cointegration, and multivariate Ornstein–Uhlenbeck. SSRN: https://ssrn.com/abstract=1404905 (2009).
Song, Y. & Ermon, S. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) (Curran Associates, Inc., 2019).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Karras, T., Aittala, M., Aila, T. & Laine, S. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems (2022).
Yang, J., Roy, A. & Zhang, Y. BioLiP: a semi-manually curated database for biologically relevant ligand–protein interactions. Nucleic Acids Res. 41, D1096–D1103 (2012).
Pándy-Szekeres, G. et al. GPCRdb in 2023: state-specific structure models using AlphaFold2 and new ligand resources. Nucleic Acids Res. 51, D395–D402 (2023).
Del Alamo, D., Sala, D., Mchaourab, H. S. & Meiler, J. Sampling alternative conformational states of transporters and receptors with AlphaFold2. eLife 11, e75751 (2022).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Ahdritz, G. et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Preprint at https://www.biorxiv.org/content/10.1101/2022.11.20.517210v3 (2022).
Yan, X. et al. PointSite: a point cloud segmentation tool for identification of protein ligand binding atoms. J. Chem. Inf. Model. 62, 2835–2845 (2022).
Krivák, R. & Hoksza, D. P2Rank: machine learning-based tool for rapid and accurate prediction of ligand binding sites from protein structure. J. Cheminform. 10, 39 (2018).
McNutt, A. T. et al. GNINA 1.0: molecular docking with deep learning. J. Cheminform. 13, 43 (2021).
Yu, Y. et al. Uni-Dock: GPU-accelerated docking enables ultralarge virtual screening. J. Chem. Theory Comput. 19, 3336–3345 (2023).
Yu, Y., Lu, S., Gao, Z., Zheng, H. & Ke, G. Do deep learning models really outperform traditional approaches in molecular docking? Preprint at https://arxiv.org/abs/2302.07134 (2023).
Stärk, H., Ganea, O., Pattanaik, L., Barzilay, D. R. & Jaakkola, T. EquiBind: geometric deep learning for drug binding structure prediction. In Proceedings of the 39th International Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 20503–20521 (PMLR, 2022).
Mariani, V., Biasini, M., Barbato, A. & Schwede, T. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 29, 2722–2728 (2013).
Robin, X. et al. Continuous Automated Model EvaluatiOn (CAMEO) – perspectives on the future of fully automated evaluation of structure prediction methods. Proteins 89, 1977–1986 (2021).
Biasini, M. et al. OpenStructure: an integrated software framework for computational structural biology. Acta Crystallogr. D Biol. Crystallogr. 69, 701–709 (2013).
Rappé, A. K., Casewit, C. J., Colwell, K., Goddard III, W. A. & Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 114, 10024–10035 (1992).
Qiao, Z., Nie, W., Vahdat, A., Miller III, T. F. & Anandkumar, A. State-specific protein-ligand complex structure prediction with a multi-scale deep generative model. Code Ocean https://doi.org/10.24433/CO.9870737.v1 (2023).
Acknowledgements
Z.Q. acknowledges graduate research funding from Caltech and partial support from the Amazon-Caltech AI4Science fellowship. T.F.M. acknowledges partial support from the Caltech DeLogi fund, and A.A. acknowledges support from a Caltech Bren professorship. We thank M. Welborn, F. R. Manby, C. Zhang and V. Bhethanabotla for discussions on the work and for comments on the manuscript. We thank A. Meller and J. Borowsky for sharing the PocketMiner dataset.
Author information
Contributions
Z.Q., W.N., A.V., T.F.M. and A.A. conceived and designed the experiments. Z.Q. performed the experiments. Z.Q., W.N., A.V., T.F.M. and A.A. analysed the data. Z.Q. contributed analysis tools. Z.Q. and A.A. wrote the paper.
Ethics declarations
Competing interests
Z.Q. and T.F.M. are currently employees of Iambic Therapeutics or its affiliates. A provisional patent application related to this work has been filed (US Patent App. provisional 63/496,899). The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Shigenori Tanaka, Anastassis Perrakis and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Structure prediction accuracy on all targets.
Comparing AlphaFold2 (AF2), NeuralPLexer, and NeuralPLexer (no ligand) in terms of TM-score against all structure prediction targets described in this study, including PocketMiner and recent structures. All NeuralPLexer results shown in this figure are obtained using the LSA-SDE sampler and are based on the structure with the highest average protein pLDDT among the 8 generated structures for each prediction target.
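The selection rule in this caption (report the generated structure with the highest average protein pLDDT) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_by_mean_plddt` and the toy confidence values are hypothetical, and the per-residue pLDDT scores are assumed to be available as one list of floats per generated structure.

```python
from statistics import mean


def select_by_mean_plddt(samples):
    """Return (index, mean pLDDT) of the sample with the highest average pLDDT.

    `samples` is a list with one entry per generated structure; each entry is
    a list of per-residue pLDDT values (hypothetical input layout).
    """
    if not samples:
        raise ValueError("need at least one generated structure")
    scores = [mean(plddt) for plddt in samples]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy example with 3 structures; the caption describes 8 per prediction target.
toy_samples = [
    [70.0, 80.0, 90.0],
    [85.0, 88.0, 91.0],
    [60.0, 95.0, 75.0],
]
best_index, best_score = select_by_mean_plddt(toy_samples)
```

In this toy input the second structure (index 1) has the highest mean confidence and would be the one compared against the reference by TM-score.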
Supplementary information
Supplementary Information
Supplementary results and discussions and Algorithms 1–12, Figs. 1–5 and Tables 1–6.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qiao, Z., Nie, W., Vahdat, A. et al. State-specific protein–ligand complex structure prediction with a multiscale deep generative model. Nat Mach Intell 6, 195–208 (2024). https://doi.org/10.1038/s42256-024-00792-z
DOI: https://doi.org/10.1038/s42256-024-00792-z