Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3583780.3614893acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Geometric Graph Learning for Protein Mutation Effect Prediction

Published: 21 October 2023 Publication History

Abstract

Proteins govern a wide range of biological systems. Evaluating the changes in protein properties upon protein mutation is a fundamental application of protein design, where modeling the 3D protein structure is a principal task for AI-driven computational approaches. Existing deep learning (DL) approaches represent the protein structure as a 3D geometric graph and simplify the graph modeling to different degrees, thereby failing to capture the low-level atom patterns and high-level amino acid patterns simultaneously. In addition, limited training samples with ground truth labels and protein structures further restrict the effectiveness of DL approaches. In this paper, we propose a new graph learning framework, Hierarchical Graph Invariant Network (HGIN), a fine-grained and data-efficient graph neural encoder for encoding protein structures and predicting the mutation effect on protein properties. For fine-grained modeling, HGIN hierarchically models the low-level interactions of atoms and the high-level interactions of amino acid residues by Graph Neural Networks. For data efficiency, HGIN preserves the invariant encoding for atom permutation and coordinate transformation, which is an intrinsic inductive bias of property prediction that bypasses data augmentations. We integrate HGIN into a Siamese network to predict the quantitative effect on protein properties upon mutations. Our approach outperforms 9 state-of-the-art approaches on 3 protein datasets. More inspiringly, when predicting the neutralizing ability of human antibodies against COVID-19 mutant viruses, HGIN achieves an absolute improvement of 0.23 regarding the Spearman coefficient.

References

[1]
https://github.com/ddofer/ProFET.
[2]
https://github.com/FowlerLab/Envision2017.
[3]
https://life.bsc.es/pid/skempi2.
[4]
https://github.com/tommyhuangthu/EvoEF2.
[5]
Helen M Berman, John Westbrook, Zukang Feng, Gary Gilliland, Talapady N Bhat, Helge Weissig, Ilya N Shindyalov, and Philip E Bourne. The protein data bank. Nucleic acids research, 28(1):235--242, 2000.
[6]
Muhao Chen, Chelsea Ju, Guangyu Zhou, Xuelu Chen, Tianran Zhang, Kai-Wei Chang, Carlo Zaniolo, and Wei Wang. Multifaceted protein-protein interaction prediction based on siamese residual rcnn. Bioinformatics, 35(14):i305--i314, 07 2019.
[7]
Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. In NIPS 2014 Workshop on Deep Learning, 2014.
[8]
Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rehawi, Wang Yu, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, and Burkhard Rost. Prottrans: Towards cracking the language of lifes code through self-supervised deep learning and high performance computing. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1--1, 2021.
[9]
Shuheng Fang, Kangfei Zhao, Guanghua Li, and Jeffrey Xu Yu. Community search: A meta-learning approach. In Proc. ICDE'23, pages 2358--2371. IEEE, 2023.
[10]
Alex Fout, Jonathon Byrd, Basir Shariat, and Asa Ben-Hur. Protein interface prediction using graph convolutional networks. In Proc. NIPS, pages 6530--6539, 2017.
[11]
Octavian-Eugen Ganea, Xinyuan Huang, Charlotte Bunne, Yatao Bian, Regina Barzilay, Tommi S. Jaakkola, and Andreas Krause. Independent SE(3)-equivariant models for end-to-end rigid protein docking. In International Conference on Learning Representations, 2022.
[12]
Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Proc. ICML, volume 70 of Proceedings of Machine Learning Research, pages 1263--1272. PMLR, 2017.
[13]
Vanessa E Gray, Ronald J Hause, Jens Luebeck, Jay Shendure, and Douglas M Fowler. Quantitative missense variant effect prediction using large-scale mutagenesis data. Cell systems, 6(1):116--124, 2018.
[14]
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735--1780, 1997.
[15]
Xiaoqiang Huang, Robin Pearce, and Yang Zhang. Evoef2: accurate and fast energy function for computational protein design. Bioinform., 36(4):1135--1142, 2020.
[16]
Justina Jankauskaite, Brian Jiménez-García, Justas Dapkunas, Juan Fernández-Recio, and Iain H. Moal. SKEMPI 2.0: an updated benchmark of changes in protein-protein binding energy, kinetics and thermodynamics upon mutation. Bioinform., 35(3):462--469, 2019.
[17]
Yuli Jiang, Yu Rong, Hong Cheng, Xin Huang, Kangfei Zhao, and Junzhou Huang. Query driven-graph neural networks for community search: From non-attributed, attributed, to interactive attributed. Proc. VLDB Endow., 15(6):1243--1255, 2022.
[18]
Rui Jiao, Jiaqi Han, Wenbing Huang, Yu Rong, and Yang Liu. Energy-motivated equivariant pretraining for 3d molecular graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 8096--8104, 2023.
[19]
Wengong Jin, Jeremy Wohlwend, Regina Barzilay, and Tommi S. Jaakkola. Iterative refinement graph neural network for antibody sequence-structure co-design. In International Conference on Learning Representations, 2022.
[20]
Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael John Lamarre Townshend, and Ron O. Dror. Learning from protein structure with geometric vector perceptrons. In Proc. ICLR. OpenReview.net, 2021.
[21]
John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin ?ídek, Anna Potapenko, et al. Highly accurate protein structure prediction with alphafold. Nature, 596(7873):583--589, 2021.
[22]
Yash Khemchandani, Stephen O'Hagan, Soumitra Samanta, Neil Swainston, Timothy J. Roberts, Danushka Bollegala, and Douglas B. Kell. Deepgraphmolgen, a multi-objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach. J. Cheminformatics, 12(1):53, 2020.
[23]
Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In Proc. ICLR. OpenReview.net, 2017.
[24]
Johannes Klicpera, Janek Groß, and Stephan Günnemann. Directional message passing for molecular graphs. In Proc. ICLR. OpenReview.net, 2020.
[25]
Jia Li, Yu Rong, Hong Cheng, Helen Meng, Wenbing Huang, and Junzhou Huang. Semi-supervised graph classification: A hierarchical graph perspective. In The World Wide Web Conference, pages 972--982, 2019.
[26]
Shuangli Li, Jingbo Zhou, Tong Xu, Liang Huang, Fan Wang, Haoyi Xiong, Weili Huang, Dejing Dou, and Hui Xiong. Structure-aware interactive graph neural networks for the prediction of protein-ligand binding affinity. In Proc. KDD, pages 975--985. ACM, 2021.
[27]
Xianggen Liu, Yunan Luo, Pengyong Li, Sen Song, and Jian Peng. Deep geometric representations for modeling effects of mutations on protein-protein binding affinity. PLoS Comput. Biol., 17(8), 2021.
[28]
Yi Liu, Hao Yuan, Lei Cai, and Shuiwang Ji. Deep learning of high-order interactions for protein interface prediction. In Proc. KDD, pages 679--687. ACM, 2020.
[29]
Yunan Luo, Guangde Jiang, Tianhao Yu, Yang Liu, Lam Vo, Hantian Ding, Yufeng Su, Wesley Wei Qian, Huimin Zhao, and Jian Peng. ECNet is an evolutionary context-integrated deep learning framework for protein engineering. Nature Communications, 12(1), September 2021.
[30]
Hehuan Ma, Yatao Bian, Yu Rong, Wenbing Huang, Tingyang Xu, Weiyang Xie, Geyan Ye, and Junzhou Huang. Cross-dependent graph neural networks for molecular property prediction. Bioinformatics, 38(7):2003--2009, 2022.
[31]
Dan Ofer and Michal Linial. Profet: Feature engineering captures high-level protein functions. Bioinform., 31(21):3429--3436, 2015.
[32]
Xingang Peng, Shitong Luo, Jiaqi Guan, Qi Xie, Jian Peng, and Jianzhu Ma. Pocket2mol: Efficient molecular sampling based on 3d protein pockets. In Inter- national Conference on Machine Learning, 2022.
[33]
Matthew Ragoza, Joshua E. Hochuli, Elisa Idrobo, Jocelyn Sunseri, and David Ryan Koes. Protein-ligand scoring with convolutional neural networks. J. Chem. Inf. Model., 57(4):942--957, 2017.
[34]
Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John F. Canny, Pieter Abbeel, and Yun S. Song. Evaluating protein transfer learning with TAPE. In Proc. NeurIPS, pages 9686--9698, 2019.
[35]
Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. Self-supervised graph transformer on large-scale molecular data. Advances in Neural Information Processing Systems, 33:12559--12571, 2020.
[36]
Yu Rong, Wenbing Huang, Tingyang Xu, and Junzhou Huang. Dropedge: Towards deep graph convolutional networks on node classification. In International Conference on Learning Representations, 2019.
[37]
Victor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E(n) equivariant graph neural networks. In Proc. ICML, volume 139 of Proceedings of Machine Learning Research, pages 9323--9332. PMLR, 2021.
[38]
Sisi Shan, Shitong Luo, Ziqing Yang, Junxian Hong, Yufeng Su, Fan Ding, Lili Fu, Chenyu Li, Peng Chen, Jianzhu Ma, Xuanling Shi, Qi Zhang, Bonnie Berger, Linqi Zhang, and Jian Peng. Deep learning guided optimization of human antibody against sars-cov-2 variants with broad neutralization. Proceedings of the National Academy of Sciences, 119(11):e2122954119, 2022.
[39]
Marta M. Stepniewska-Dziubinska, Piotr Zielenkiewicz, and Pawel Siedlecki. Development and evaluation of a deep learning model for protein-ligand binding affinity prediction. Bioinform., 34(21):3666--3674, 2018.
[40]
Jeanne Trinquier, Guido Uguzzoni, Andrea Pagnani, Francesco Zamponi, and Martin Weigt. Efficient generative modeling of protein sequences using simple autoregressive models. Nature communications, 12(1):1--11, 2021.
[41]
Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(11), 2008.
[42]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proc. NIPS, pages 5998--6008, 2017.
[43]
Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In Proc. ICLR, 2018.
[44]
Ning Wu, Wayne Xin Zhao, Jingyuan Wang, and Dayan Pan. Learning effective road network representation with hierarchical graph neural networks. In Proc. KDD, pages 6--14. ACM, 2020.
[45]
Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. IEEE Trans. Neural Networks Learn. Syst., 32(1):4--24, 2021.
[46]
Tian Xia and Wei-Shinn Ku. Geometric graph representation learning on protein structure prediction. In Proc. KDD, pages 1873--1883. ACM, 2021.
[47]
Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, and Stefano Soatto. Learning hierarchical graph neural networks for image clustering. In Proc. ICCV, pages 3447--3457. IEEE, 2021.
[48]
Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In Proc. ICLR. OpenReview.net, 2019.
[49]
Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, and Tie-Yan Liu. Do transformers really perform badly for graph representation? In Proc. NeurIPS, pages 28877--28888, 2021.
[50]
Jiaxuan You, Bowen Liu, Zhitao Ying, Vijay S. Pande, and Jure Leskovec. Graph convolutional policy network for goal-directed molecular graph generation. In Proc. NeurIPS, pages 6412--6422, 2018.
[51]
Junchi Yu, Tingyang Xu, Yu Rong, Junzhou Huang, and Ran He. Structure-aware conditional variational auto-encoder for constrained molecule optimization. Pattern Recognit., 126:108581, 2022.
[52]
Ziwei Zhang, Peng Cui, and Wenwu Zhu. Deep learning on graphs: A survey. IEEE Trans. Knowl. Data Eng., 34(1):249--270, 2022.
[53]
Chenguang Zhao, Tong Liu, and Zheng Wang. Panda2: protein function prediction using graph neural networks. NAR genomics and bioinformatics, 4(1), 2022.
[54]
Kangfei Zhao, Jeffrey Xu Yu, Qiyan Li, Hao Zhang, and Yu Rong. Learned sketch for subgraph counting: a holistic approach. The VLDB Journal, pages 1--26, 2023.
[55]
Kangfei Zhao, Zhiwei Zhang, Yu Rong, Jeffrey Xu Yu, and Junzhou Huang. Finding critical users in social communities via graph convolutions. IEEE Trans. Knowl. Data Eng., 35(1):456--468, 2023.
[56]
Yingwen Zhao, Zhihao Yang, Yongkai Hong, Yumeng Yang, Lei Wang, Yin Zhang, Hongfei Lin, and Jian Wang. Protein function prediction with functional and topological knowledge of gene ontology. IEEE Transactions on NanoBioscience, 2023.
[57]
Guangyu Zhou, Muhao Chen, Chelsea JT Ju, Zheng Wang, Jyun-Yu Jiang, and Wei Wang. Mutation effect estimation on protein--protein interactions using deep contextualized representation learning. NAR genomics and bioinformatics, 2(2):lqaa015, 2020.

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023
5508 pages
ISBN:9798400701245
DOI:10.1145/3583780
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 October 2023

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. geometric graph learning
  2. graph neural network

Qualifiers

  • Research-article

Conference

CIKM '23
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 380
    Total Downloads
  • Downloads (Last 12 months)380
  • Downloads (Last 6 weeks)18
Reflects downloads up to 18 Aug 2024

Other Metrics

Citations

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media