Abstract
The binding of proteins to DNA is crucial for biological processes and drug development. Current computational methods are limited by the high cost of data acquisition, complex processing pipelines, and the incomplete representations produced by manually designed feature extraction. To address these limitations, a method based on the sequence information of DNA-binding proteins is proposed. First, features are extracted by combining manually designed features with features from pre-trained models. Second, deep learning methods are used to capture local sequence features and long-range dependencies within the sequence, respectively. Finally, an attention mechanism is introduced to integrate the features and learn their weights. On the test set, the performance obtained with the latest protein language model is compared with that of mainstream methods; the MCC (Matthews correlation coefficient) value of the proposed method is improved by 22.1% on average. The comparison results demonstrate the efficiency and accuracy of the method.
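For readers unfamiliar with the metric reported above, the following is a minimal sketch of how the MCC (Matthews correlation coefficient) is computed for binary per-residue labels. It uses the standard textbook definition and is not taken from the paper's code; the function name `matthews_corrcoef` and the toy labels are illustrative assumptions.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Standard binary MCC; 1 = binding residue, 0 = non-binding (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is taken as 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical toy example: 8 residues, 2 of which truly bind DNA.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
print(round(matthews_corrcoef(y_true, y_pred), 4))  # 0.3333
```

Because binding residues are usually a small minority of all residues, a predictor that labels everything as non-binding can reach high accuracy while scoring an MCC of 0, which is one reason MCC is a common headline metric for this task.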
Acknowledgments
The authors are grateful for the laboratory equipment and configuration, which provided timely help in analyzing a large amount of data. Funding from the National Natural Science Foundation (grant number: 62377036) and the Tianjin Research Innovation Project for Postgraduate Students (project number: 2022SKYZ104) is gratefully acknowledged. We thank Dr. Zhang, professor at Tianjin University of Science and Technology, for his help with this study.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shan, K., Zhang, X., Song, C. (2024). Prediction of Protein-DNA Binding Sites Based on Protein Language Model and Deep Learning. In: Huang, DS., Pan, Y., Zhang, Q. (eds) Advanced Intelligent Computing in Bioinformatics. ICIC 2024. Lecture Notes in Computer Science, vol 14882. Springer, Singapore. https://doi.org/10.1007/978-981-97-5692-6_28
DOI: https://doi.org/10.1007/978-981-97-5692-6_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5691-9
Online ISBN: 978-981-97-5692-6
eBook Packages: Computer Science, Computer Science (R0)