Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.5555/3666122.3666858guideproceedingsArticle/Chapter ViewAbstractPublication PagesnipsConference Proceedingsconference-collections
research-article

Full-atom protein pocket design via iterative refinement

Published: 30 May 2024 Publication History
  • Get Citation Alerts
  • Abstract

    The design and optimization of functional proteins that bind specific ligand molecules is paramount in therapeutics and bio-engineering. A critical yet formidable task in this endeavor is the design of the protein pocket, which is the cavity region of the protein where the ligand binds. Current methods are plagued by inefficient generation, inadequate context modeling of the ligand molecule, and the inability to generate side-chain atoms. Here, we present the Full-Atom Iterative Refinement (FAIR) method, designed to address these challenges by facilitating the co-design of protein pocket sequences, specifically residue types, and their corresponding 3D structures. FAIR operates in two steps, proceeding in a coarse-to-fine manner (transitioning from protein backbone to atoms, including side chains) for a full-atom generation. In each iteration, all residue types and structures are simultaneously updated, a process termed full-shot refinement. In the initial stage, the residue types and backbone coordinates are refined using a hierarchical context encoder, complemented by two structure refinement modules that capture both inter-residue and pocket-ligand interactions. The subsequent stage delves deeper, modeling the side-chain atoms of the pockets and updating residue types to ensure sequence-structure congruence. Concurrently, the structure of the binding ligand is refined across iterations to accommodate its inherent flexibility. Comprehensive experiments show that FAIR surpasses existing methods in designing superior pocket sequences and structures, producing average improvement exceeding 10% in AAR and RMSD metrics.

    References

    [1]
    Rebecca F Alford, Andrew Leaver-Fay, Jeliazko R Jeliazkov, Matthew J O'Meara, Frank P DiMaio, Hahnbeom Park, Maxim V Shapovalov, P Douglas Renfrew, Vikram K Mulligan, Kalli Kappel, et al. The rosetta all-atom energy function for macromolecular modeling and design. Journal of chemical theory and computation, 13(6):3031-3048, 2017.
    [2]
    Amr Alhossary, Stephanus Daniel Handoko, Yuguang Mu, and Chee-Keong Kwoh. Fast, accurate, and reliable molecular docking with quickvina 2. Bioinformatics, 31(13):2214-2216, 2015.
    [3]
    Ivan Anishchenko, Samuel J Pellock, Tamuka M Chidyausiku, Theresa A Ramelot, Sergey Ovchinnikov, Jingzhou Hao, Khushboo Bafna, Christoffer Norn, Alex Kang, Asim K Bera, et al. De novo protein design by deep network hallucination. Nature, 600(7889):547-552, 2021.
    [4]
    Amos Bairoch. The enzyme database in 2000. Nucleic acids research, 28(1):304-305, 2000.
    [5]
    Matthew J Bick, Per J Greisen, Kevin J Morey, Mauricio S Antunes, David La, Banumathi Sankaran, Luc Reymond, Kai Johnsson, June I Medford, and David Baker. Computational design of environmental sensors for the potent opioid fentanyl. Elife, 6:e28909, 2017.
    [6]
    F Edward Boas and Pehr B Harbury. Potential energy functions for protein design. Current opinion in structural biology, 17(2):199-204, 2007.
    [7]
    Yue Cao, Payel Das, Vijil Chenthamarakshan, Pin-Yu Chen, Igor Melnyk, and Yang Shen. Fold2seq: A joint sequence (ld)-fold (3d) embedding-based generative model for protein design. In International Conference on Machine Learning, pages 1261-1271. PMLR, 2021.
    [8]
    Yaoxi Chen, Quan Chen, and Haiyan Liu. Depact and pacmatch: A workflow of designing de novo protein pockets to bind small molecules. Journal of Chemical Information and Modeling, 62(4):971-985, 2022.
    [9]
    Justas Dauparas, Ivan Anishchenko, Nathaniel Bennett, Hua Bai, Robert J Ragotte, Lukas F Milles, Basile IM Wicky, Alexis Courbet, Rob J de Haas, Neville Bethel, et al. Robust deep learning-based protein sequence design using proteinmpnn. Science, 378(6615):49-56, 2022.
    [10]
    Christopher M Dobson. Protein folding and misfolding. Nature, 426(6968):884-890, 2003.
    [11]
    Jiayi Dou, Lindsey Doyle, Per Jr Greisen, Alberto Schena, Hahnbeom Park, Kai Johnsson, Barry L Stoddard, and David Baker. Sampling and energy evaluation challenges in ligand binding protein design. Protein Science, 26(12):2426-2437, 2017.
    [12]
    Xing Du, Yi Li, Yuan-Ling Xia, Shi-Meng Ai, Jing Liang, Peng Sang, Xing-Lai Ji, and Shu-Qun Liu. Insights into protein-ligand interactions: mechanisms, models, and methods. International journal of molecular sciences, 17(2):144, 2016.
    [13]
    Peter Eastman, Jason Swails, John D Chodera, Robert T McGibbon, Yutong Zhao, Kyle A Beauchamp, Lee-Ping Wang, Andrew C Simmonett, Matthew P Harrigan, Chaya D Stern, et al. Openmm 7: Rapid development of high performance algorithms for molecular dynamics. PLoS computational biology, 13(7):e1005659, 2017.
    [14]
    Noelia Ferruz and Birte Höcker. Controllable protein design with language models. Nature Machine Intelligence, 4(6):521-532, 2022.
    [15]
    Noelia Ferruz, Steffen Schmidt, and Birte Höcker. Protgpt2 is a deep unsupervised language model for protein design. Nature communications, 13(1):4348, 2022.
    [16]
    Paul G Francoeur, Tomohide Masuda, Jocelyn Sunseri, Andrew Jia, Richard B Iovanisci, Ian Snyder, and David R Koes. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. Journal of chemical information and modeling, 60(9):4200-4215, 2020.
    [17]
    Wenhao Gao, Sai Pooja Mahajan, Jeremias Sulam, and Jeffrey J Gray. Deep learning in protein structural modeling and design. Patterns, 1(9):100142, 2020.
    [18]
    Anum A Glasgow, Yao-Ming Huang, Daniel J Mandell, Michael Thompson, Ryan Ritterson, Amanda L Loshbaugh, Jenna Pellegrino, Cody Krivacic, Roland A Pache, Kyle A Barlow, et al. Computational design of a modular protein sense-response system. Science, 366(6468):1024-1028, 2019.
    [19]
    Jiaqi Guan, Wesley Wei Qian, Xingang Peng, Yufeng Su, Jian Peng, and Jianzhu Ma. 3d equivariant diffusion for target-aware molecule generation and affinity prediction. In International Conference on Learning Representations, 2023.
    [20]
    Chloe Hsu, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer, and Alexander Rives. Learning inverse folding from millions of predicted structures. ICML, 2022.
    [21]
    Liegi Hu, Mark L Benson, Richard D Smith, Michael G Lerner, and Heather A Carlson. Binding moad (mother of all databases). Proteins: Structure, Function, and Bioinformatics, 60(3):333-340, 2005.
    [22]
    Po-Ssu Huang, Scott E Boyken, and David Baker. The coming of age of de novo protein design. Nature, 537(7620):320-327, 2016.
    [23]
    Wenbing Huang, Jiaqi Han, Yu Rong, Tingyang Xu, Fuchun Sun, and Junzhou Huang. Equivariant graph mechanics networks with constraints. arXiv preprint arXiv:2203.06442, 2022.
    [24]
    Peter J Huber. Robust estimation of a location parameter. Breakthroughs in statistics: Methodology and distribution, pages 492-518, 1992.
    [25]
    Du Q Huynh. Metrics for 3d rotations: Comparison and analysis. Journal of Mathematical Imaging and Vision, 35(2):155-164, 2009.
    [26]
    John Ingraham, Vikas Garg, Regina Barzilay, and Tommi Jaakkola. Generative models for graph-based protein design. Advances in neural information processing systems, 32, 2019.
    [27]
    Lin Jiang, Eric A. Althoff, Fernando R. Clemente, Lindsey Doyle, Daniela Röthlisberger, Alexandre Zanghellini, Jasmine L. Gallaher, Jamie L. Betker, Fujie Tanaka, Carlos F. Barbas, Donald Hilvert, Kendall N. Houk, Barry L. Stoddard, and David Baker. De novo computational design of retro-aldol enzymes. Science, 319(5868):1387-1391, 2008.
    [28]
    Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Antibody-antigen docking and design via hierarchical structure refinement. In ICML, pages 10217-10227. PMLR, 2022.
    [29]
    Wengong Jin, Jeremy Wohlwend, Regina Barzilay, and Tommi Jaakkola. Iterative refinement graph neural network for antibody sequence-structure co-design. ICLR, 2022.
    [30]
    Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael JL Townshend, and Ron Dror. Learning from protein structure with geometric vector perceptrons. ICLR, 2021.
    [31]
    John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with alphafold. Nature, 596(7873):583-589, 2021.
    [32]
    Judith Klein-Seetharaman, Maki Oikawa, Shaun B Grimshaw, Julia Wirmer, Elke Duchardt, Tadashi Ueda, Taiji Imoto, Lorna J Smith, Christopher M Dobson, and Harald Schwalbe. Long-range interactions within a nonnative protein. Science, 295(5560):1719-1722, 2002.
    [33]
    Xiangzhe Kong, Wenbing Huang, and Yang Liu. Conditional antibody design as 3d equivariant graph translation. ICLR, 2023.
    [34]
    Xiangzhe Kong, Wenbing Huang, and Yang Liu. End-to-end full-atom antibody design. arXiv preprint arXiv:2302.00203, 2023.
    [35]
    Ivan V Korendovych and William F DeGrado. De novo protein design, a retrospective. Quarterly reviews of biophysics, 53:e3, 2020.
    [36]
    Brian Kuhlman and Philip Bradley. Advances in protein structure prediction and design. Nature Reviews Molecular Cell Biology, 20(11):681-697, 2019.
    [37]
    Meng Liu, Youzhi Luo, Kanji Uchino, Koji Maruhashi, and Shuiwang Ji. Generating 3d molecules for target protein binding. ICML, 2022.
    [38]
    Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, and Di He. One transformer can understand both 2d & 3d molecular data. ICLR, 2022.
    [39]
    Shitong Luo, Yufeng Su, Xingang Peng, Sheng Wang, Jian Peng, and Jianzhu Ma. Antigen-specific antibody design and optimization with diffusion-based generative models. NeurIPS, 2022.
    [40]
    Youzhi Luo and Shuiwang Ji. An autoregressive flow model for 3d molecular geometry generation from scratch. In ICLR, 2021.
    [41]
    James A Maier, Carmenza Martinez, Koushik Kasavajhala, Lauren Wickstrom, Kevin E Hauser, and Carlos Simmerling. ff14sb: improving the accuracy of protein side chain and backbone parameters from ff99sb. Journal of chemical theory and computation, 11(8):3696-3713, 2015.
    [42]
    Christoph Malisi, Marcel Schumann, Nora C Toussaint, Jorge Kageyama, Oliver Kohlbacher, and Birte Höcker. Binding pocket optimization by computational protein design. PloS one, 7(12):e52505, 2012.
    [43]
    Gilles Marcou and Didier Rognan. Optimizing fragment and scaffold docking by use of molecular interaction fingerprints. Journal of chemical information and modeling, 47(1):195-207, 2007.
    [44]
    Erxue Min, Runfa Chen, Yatao Bian, Tingyang Xu, Kangfei Zhao, Wenbing Huang, Peilin Zhao, Junzhou Huang, Sophia Ananiadou, and Yu Rong. Transformer for graphs: An overview from architecture perspective. arXiv preprint arXiv:2202.08455, 2022.
    [45]
    Jakob Noske, Josef Paul Kynast, Dominik Lemm, Steffen Schmidt, and Birte Höcker. Pocketoptimizer 2.0: A modular framework for computer-aided ligand-binding design. Protein Science, 32(1):e4516, 2023.
    [46]
    Sergey Ovchinnikov and Po-Ssu Huang. Structure-based protein design with deep learning. Current opinion in chemical biology, 65:136-144, 2021.
    [47]
    Xingang Peng, Shitong Luo, Jiaqi Guan, Qi Xie, Jian Peng, and Jianzhu Ma. Pocket2mol: Efficient molecular sampling based on 3d protein pockets. ICML, 2022.
    [48]
    Nicholas F Polizzi and William F DeGrado. A defined structural unit enables de novo design of small-molecule-binding proteins. Science, 369(6508):1227-1233, 2020.
    [49]
    Matthew Ragoza, Tomohide Masuda, and David Ryan Koes. Generating 3d molecules conditional on receptor binding sites with deep generative models. Chemical science, 13(9):2701-2713, 2022.
    [50]
    Anthony K Rappé, Carla J Casewit, KS Colwell, William A Goddard III, and W Mason Skiff. Uff, a full periodic table force field for molecular mechanics and molecular dynamics simulations. Journal of the American chemical society, 114(25):10024-10035, 1992.
    [51]
    Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C Lawrence Zitnick, Jerry Ma, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15):e2016239118, 2021.
    [52]
    Daniela Röthlisberger, Olga Khersonsky, Andrew M Wollacott, Lin Jiang, Jason DeChancie, Jamie Betker, Jasmine L Gallaher, Eric A Althoff, Alexandre Zanghellini, Orly Dym, et al. Kemp elimination catalysts by computational enzyme design. Nature, 453(7192):190-195, 2008.
    [53]
    Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In European semantic web conference, pages 593-607. Springer, 2018.
    [54]
    Arne Schneuing, Yuanqi Du, Charles Harris, Arian Jamasb, Ilia Igashov, Weitao Du, Tom Blundell, Pietro Lió, Carla Gomes, Max Welling, Michael Bronstein, and Bruno Correia. Structure-based drug design with equivariant diffusion models. arXiv preprint arXiv:2210.13695, 2022.
    [55]
    Maxim V Shapovalov and Roland L Dunbrack Jr. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure, 19(6):844-858, 2011.
    [56]
    Chence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, and Jian Tang. Protein sequence and structure co-design with equivariant translation. ICLR, 2023.
    [57]
    Ryan K Spencer, Glenn L Butterfoss, John R Edison, James R Eastwood, Stephen Whitelam, Kent Kirshenbaum, and Ronald N Zuckermann. Stereochemistry of polypeptoid chain configurations. Biopolymers, 110(6):e23266, 2019.
    [58]
    Martin Steinegger and Johannes Söding. Mmseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature biotechnology, 35(11):1026-1028, 2017.
    [59]
    H Stephen Stoker. General, organic, and biological chemistry. Cengage Learning, 2015.
    [60]
    Arthur G Street and Stephen L Mayo. Computational protein design. Structure, 7(5):R105-R109, 1999.
    [61]
    Oleg Trott and Arthur J Olson. Autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry, 31(2):455-461, 2010.
    [62]
    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
    [63]
    Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tieyan Liu. On layer normalization in the transformer architecture. In International Conference on Machine Learning, pages 10524-10533. PMLR, 2020.
    [64]
    Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, and Jian Tang. Geodiff: A geometric diffusion model for molecular conformation generation. ICLR, 2022.
    [65]
    Xihong Yang, Jiaqi Jin, Siwei Wang, Ke Liang, Yue Liu, Yi Wen, Suyuan Liu, Sihang Zhou, Xinwang Liu, and En Zhu. Dealmvc: Dual contrastive calibration for multi-view clustering. In Proceedings of the 31th ACM International Conference on Multimedia, 2023.
    [66]
    Xihong Yang, Cheng Tan, Yue Liu, Ke Liang, Siwei Wang, Sihang Zhou, Jun Xia, Stan Z Li, Xinwang Liu, and En Zhu. Convert: Contrastive graph clustering with reliable augmentation. In Proceedings of the 31th ACM International Conference on Multimedia, 2023.
    [67]
    Alexandre Zanghellini, Lin Jiang, Andrew M Wollacott, Gong Cheng, Jens Meiler, Eric A Althoff, Daniela Röthlisberger, and David Baker. New algorithms and an in silico benchmark for computational enzyme design. Protein Science, 15(12):2785-2794, 2006.
    [68]
    Zaixi Zhang, Qi Liu, Qingyong Hu, and Chee-Kong Lee. Hierarchical graph transformer with adaptive node sampling. Advances in Neural Information Processing Systems, 2022.
    [69]
    Zaixi Zhang, Qi Liu, Chee-Kong Lee, Chang-Yu Hsieh, and Enhong Chen. An equivariant generative framework for molecular graph-structure co-design. Chemical Science, 14(31):8380-8392, 2023.
    [70]
    Zaixi Zhang, Yaosen Min, Shuxin Zheng, and Qi Liu. Molecule generation for target protein binding with structural motifs. In ICLR, 2023.
    [71]
    Zaixi Zhang, Jiaxian Yan, Qi Liu, Enhong Chen, and Marinka Zitnik. A systematic survey in geometric deep learning for structure-based drug design. arXiv preprint arXiv:2306.11768, 2023.

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image Guide Proceedings
    NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
    December 2023
    80772 pages

    Publisher

    Curran Associates Inc.

    Red Hook, NY, United States

    Publication History

    Published: 30 May 2024

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 0
      Total Downloads
    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 27 Jul 2024

    Other Metrics

    Citations

    View Options

    View options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media