DOI: 10.1145/3627673.3679704 (CIKM Conference Proceedings, research article)

Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors

Published: 21 October 2024

Abstract

Drug-Target binding Affinity (DTA) prediction is essential for drug discovery. Although deep learning methods have been applied to DTA prediction, their accuracy remains suboptimal. In this work, inspired by the recent success of retrieval methods, we propose kNN-DTA, a non-parametric, embedding-based retrieval method applied on top of a pre-trained DTA prediction model, which extends the power of the DTA model at no or negligible cost. Unlike existing methods, we introduce two ways of aggregating neighbors, one in the embedding space and one in the label space, and integrate them into a unified framework. Specifically, we propose a label aggregation with pair-wise retrieval and a representation aggregation with point-wise retrieval of the nearest neighbors. The method runs only in the inference phase and can efficiently boost DTA prediction performance with no training cost. In addition, we propose an extension, Ada-kNN-DTA, which performs instance-wise, adaptive aggregation with lightweight learning. Results on four benchmark datasets show that kNN-DTA brings significant improvements and outperforms previous state-of-the-art (SOTA) results; e.g., on the BindingDB IC50 and Ki testbeds, kNN-DTA sets new records of 0.684 and 0.750 RMSE, respectively. The extended Ada-kNN-DTA further reduces the RMSE to 0.675 and 0.735. These results demonstrate the effectiveness of our method, and results in other settings, together with comprehensive studies and analyses, further show the potential of the kNN-DTA approach.
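The abstract names two retrieval-based aggregations but does not state their formulas. The following is a minimal, pure-Python sketch of the general kNN idea, not the authors' implementation. It assumes (none of this is taken from the paper) that neighbors are found by exact squared-L2 search over embeddings produced by the frozen pre-trained model, that neighbor weights are a softmax over negative distances with a temperature, and that the retrieved signal is linearly interpolated with the model's own output. All function and parameter names (`label_aggregation`, `representation_aggregation`, `lam`, `alpha`, `temperature`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch of kNN-style aggregation. The distance, weighting and
# interpolation choices are assumptions, not the paper's exact equations.
Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(query: Vector, keys: List[Vector], k: int) -> List[Tuple[float, int]]:
    """Exact brute-force search: the k closest (distance, index) pairs, nearest first."""
    scored = sorted((squared_l2(query, key), i) for i, key in enumerate(keys))
    return scored[:k]


def softmax_weights(dists: List[float], temperature: float) -> List[float]:
    """Map distances to weights summing to 1; closer neighbors get larger weights."""
    logits = [-d / temperature for d in dists]
    top = max(logits)  # shift by the max logit for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_aggregation(pair_emb: Vector, pair_keys: List[Vector], labels: List[float],
                      model_pred: float, k: int = 4, temperature: float = 1.0,
                      lam: float = 0.5) -> float:
    """Pair-wise retrieval: blend the model's affinity with neighbors' known affinities.

    Assumption: pair_emb embeds a (drug, target) pair; pair_keys and labels form a
    datastore of training pairs and their measured affinities.
    """
    hits = knn(pair_emb, pair_keys, k)
    weights = softmax_weights([d for d, _ in hits], temperature)
    knn_pred = sum(w * labels[i] for w, (_, i) in zip(weights, hits))
    return lam * knn_pred + (1.0 - lam) * model_pred


def representation_aggregation(emb: Vector, keys: List[Vector], k: int = 4,
                               temperature: float = 1.0, alpha: float = 0.5) -> List[float]:
    """Point-wise retrieval: mix one drug (or one target) embedding with its neighbors'."""
    hits = knn(emb, keys, k)
    weights = softmax_weights([d for d, _ in hits], temperature)
    neighbor_avg = [sum(w * keys[i][j] for w, (_, i) in zip(weights, hits))
                    for j in range(len(emb))]
    return [alpha * n + (1.0 - alpha) * x for n, x in zip(neighbor_avg, emb)]


if __name__ == "__main__":
    # Toy datastore: three pair embeddings with made-up affinity labels.
    pair_keys = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    labels = [6.0, 7.0, 9.0]
    print(label_aggregation([0.5, 0.5], pair_keys, labels, model_pred=8.0, k=2))  # 7.25
    print(representation_aggregation([0.0, 0.0], [[2.0, 2.0]], k=1))  # [1.0, 1.0]
```

With `lam`, `alpha` and `temperature` fixed by hand, nothing is trained, which mirrors the inference-only setting described in the abstract. Ada-kNN-DTA's instance-wise adaptive aggregation would instead predict such coefficients per query with a small learned module; this sketch does not attempt that. The brute-force search is for illustration only; a large datastore would normally call for an approximate nearest-neighbor index.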

        Published In

        CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
        October 2024
        5705 pages
        ISBN:9798400704369
        DOI:10.1145/3627673
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. drug-target affinity
        2. k nearest neighbor
        3. retrieval

        Conference

        CIKM '24

        Acceptance Rates

        Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

        Article Metrics (downloads up to 24 Dec 2024)

        • Total citations: 0
        • Total downloads: 77
        • Downloads (last 12 months): 77
        • Downloads (last 6 weeks): 39
