
Prediction of Protein-DNA Binding Sites Based on Protein Language Model and Deep Learning

  • Conference paper
Advanced Intelligent Computing in Bioinformatics (ICIC 2024)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 14882)

Abstract

Protein-DNA binding is crucial to many biological processes and to drug development. Current computational methods are limited by the high cost of data acquisition, complex processing pipelines, and the incomplete representations produced by manually designed feature extraction. To address this, a feature extraction method based on DNA-binding protein sequence information is proposed that combines manually designed features with features from pre-trained models. Deep learning methods are then used to capture local sequence features and long-range dependencies within the sequence, respectively. Finally, an attention mechanism is introduced to integrate the features and learn their weights. On the test set, the performance of the approach based on the latest protein language model is compared with that of mainstream methods, and the proposed method improves the Matthews correlation coefficient (MCC) by 22.1% on average. These comparison results demonstrate the efficiency and accuracy of the method.
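The headline result above is reported as a gain in the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). As a reminder of what that metric measures, here is a minimal sketch that computes MCC for binary per-residue labels. It assumes 1 marks a DNA-binding residue and 0 a non-binding one. The function name `mcc` and this label encoding are illustrative choices, not taken from the chapter, and the chapter's exact evaluation protocol is not reproduced here.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary per-residue labels.

    Assumed (illustrative) encoding: 1 = DNA-binding residue, 0 = non-binding.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1  # binding residue correctly predicted as binding
        elif t == 0 and p == 1:
            fp += 1  # non-binding residue predicted as binding
        elif t == 0 and p == 0:
            tn += 1  # non-binding residue correctly predicted as non-binding
        else:
            fn += 1  # binding residue missed by the predictor

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # If any marginal count is zero the ratio is undefined; report 0.0 by convention.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 0, 0, 1, 0]
    pred = [1, 0, 0, 0, 1, 0, 1, 0]
    print(round(mcc(truth, pred), 4))  # prints 0.4667 (TP=2, TN=4, FP=1, FN=1)
```

MCC suits this task because binding residues are usually a small minority of all residues, so plain accuracy can be misleading. For example, if a predictor labels every residue of `[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]` as non-binding, it scores 90% accuracy. Its MCC is 0 under the zero-denominator convention used in this sketch.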

Acknowledgments

The authors acknowledge the laboratory's equipment and computing configuration, which enabled the timely analysis of large amounts of data. Funding from the National Natural Science Foundation (grant number: 62377036) and the Tianjin Research Innovation Project for Postgraduate Students (project number: 2022SKYZ104) is gratefully acknowledged. We thank Dr. Zhang, a professor at Tianjin University of Science and Technology, for his help with this study.

Author information

Corresponding author

Correspondence to Xiankun Zhang.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shan, K., Zhang, X., Song, C. (2024). Prediction of Protein-DNA Binding Sites Based on Protein Language Model and Deep Learning. In: Huang, DS., Pan, Y., Zhang, Q. (eds) Advanced Intelligent Computing in Bioinformatics. ICIC 2024. Lecture Notes in Computer Science, vol 14882. Springer, Singapore. https://doi.org/10.1007/978-981-97-5692-6_28

  • DOI: https://doi.org/10.1007/978-981-97-5692-6_28

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5691-9

  • Online ISBN: 978-981-97-5692-6

  • eBook Packages: Computer Science, Computer Science (R0)
