research-article

Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors

Authors:

Rui Yan

CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management

Pages 1856 - 1866

https://doi.org/10.1145/3627673.3679704

Published: 21 October 2024

Abstract

Drug-Target binding Affinity (DTA) prediction is essential for drug discovery. Despite the application of deep learning methods to DTA prediction, the achieved accuracy remains suboptimal. In this work, inspired by the recent success of retrieval methods, we propose kNN-DTA, a non-parametric embedding-based retrieval method applied to a pre-trained DTA prediction model, which can extend the power of the DTA model at no or negligible cost. Unlike existing methods, we introduce two neighbor aggregation schemes, one in embedding space and one in label space, and integrate them into a unified framework. Specifically, we propose a label aggregation with pair-wise retrieval and a representation aggregation with point-wise retrieval of the nearest neighbors. The method runs at inference time and can efficiently boost DTA prediction performance with no training cost. In addition, we propose an extension, Ada-kNN-DTA, an instance-wise and adaptive aggregation with lightweight learning. Results on four benchmark datasets show that kNN-DTA brings significant improvements, outperforming previous state-of-the-art (SOTA) results; e.g., on the BindingDB IC₅₀ and K_i testbeds, kNN-DTA obtains new records of RMSE 0.684 and 0.750, respectively. The extended Ada-kNN-DTA further improves the performance to 0.675 and 0.735 RMSE. These results strongly support the effectiveness of our method. Results in other settings and comprehensive studies/analyses also show the great potential of our kNN-DTA approach.
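The abstract does not state the aggregation formulas, so the following is only a minimal sketch of what inference-time label aggregation over nearest neighbors could look like. It assumes a datastore of pair embeddings (`keys`) with known affinities (`labels`), squared Euclidean distance, softmax weighting with a `temperature`, and a fixed interpolation weight `lam`, in the style of kNN-LM and kNN-MT. All of these names and choices are illustrative assumptions, not the authors' implementation. The representation aggregation and the adaptive Ada-kNN-DTA variant are not covered. An `rmse` helper shows the metric reported above.

```python
import numpy as np


def knn_label_aggregate(query, keys, labels, model_pred,
                        k=4, temperature=1.0, lam=0.5):
    """Blend a model prediction with a distance-weighted mean of neighbor labels.

    query:      (d,) embedding of the drug-target pair being scored
    keys:       (n, d) embeddings of datastore pairs (hypothetical datastore)
    labels:     (n,) known affinities for the datastore pairs
    model_pred: scalar prediction from the pre-trained model
    """
    query = np.asarray(query, dtype=float)
    keys = np.asarray(keys, dtype=float)
    labels = np.asarray(labels, dtype=float)

    # Squared Euclidean distance from the query to every datastore key.
    dists = np.sum((keys - query) ** 2, axis=1)
    # Indices of the k closest keys.
    nearest = np.argsort(dists)[:k]
    # Softmax over negative distances: closer neighbors get larger weights.
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    knn_pred = float(weights @ labels[nearest])
    # Assumed fixed interpolation between retrieval and model predictions.
    return lam * knn_pred + (1.0 - lam) * float(model_pred)


def rmse(pred, target):
    """Root mean squared error between predictions and targets."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


if __name__ == "__main__":
    # Toy datastore; the values are made up for illustration only.
    keys = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
    labels = [1.0, 3.0, 100.0]
    blended = knn_label_aggregate([0.5, 0.0], keys, labels,
                                  model_pred=4.0, k=2, lam=0.5)
    print(blended)
```

Because the aggregation only reads stored embeddings and labels, it needs no gradient updates, which is consistent with the abstract's claim of no training cost. An exhaustive distance scan is used here for clarity; a similarity-search index would normally replace it for a large datastore.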

Index Terms

Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors
1. Applied computing
  1. Life and medical sciences
    1. Bioinformatics
    2. Computational biology
2. Computing methodologies
  1. Machine learning

Published In

CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management

October 2024

5705 pages

ISBN: 9798400704369

DOI: 10.1145/3627673

General Chairs:
Edoardo Serra (Boise State University, USA),
Francesca Spezzano (Boise State University, USA)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Intelligent Social Governance Platform, Major Innovation & Planning Interdisciplinary Platform for the Double-First Class Initiative, Renmin University of China
Research Funds of Renmin University of China
Beijing Outstanding Young Scientist Program
Fundamental Research Funds for the Central Universities
Outstanding Innovative Talents Cultivation Funded Programs 2023 of Renmin University of China
National Natural Science Foundation of China

Conference

CIKM '24

Sponsor:

SIGIR

CIKM '24: The 33rd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2024

Boise, ID, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Article Metrics

Total Citations: 0
Total Downloads: 77

Downloads (Last 12 months): 77
Downloads (Last 6 weeks): 39

Reflects downloads up to 24 Dec 2024
