Sequence-Based Prediction of Hot Spots in Protein-RNA Complexes Using an Ensemble Approach

Zhao, Le; Zhang, Sijia; Xia, Junfeng

doi:10.1007/978-3-030-26763-6_55

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11643)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

RNA-binding hot spots are dominant and fundamental residues that contribute most to the binding free energy of protein-RNA interfaces. Because experimental methods for identifying hot spots are expensive and time-consuming, efficient computational approaches are needed to predict hot spots on a large scale. In this work, we propose a sequence-based machine learning method to predict hot spots in protein-RNA complexes. We extracted 83 relatively independent physicochemical features from the set of 544 properties in AAindex1. Each physicochemical feature was combined with the predicted relative accessible surface area (RASA) and a substitution probability feature from the Blocks Substitution Matrix (BLOSUM) to train models with a support vector machine (SVM) and the k-nearest neighbor (k-NN) algorithm. Combinations of the 166 individual models were explored, and the 33 top-performing models were selected to construct the final ensemble classifier by majority voting. The ensemble classifier outperformed state-of-the-art computational methods, yielding an F1 score of 0.742 and an AUC of 0.824 on the independent test set.
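
The majority-voting step and the F1 metric named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `majority_vote` and `f1_score`, the toy predictions, and the toy labels are all hypothetical, and each base model is assumed to output a binary label per residue (1 = hot spot, 0 = non-hot spot). The count of 166 individual models is consistent with 83 features each trained with two learners (SVM and k-NN); that reading is an inference from the abstract, not a statement in it.

```python
from typing import List, Sequence


def majority_vote(model_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary (0/1) predictions from several models, one vote each.

    A residue is labeled 1 only if strictly more than half of the models
    vote 1. With an odd number of models (e.g. 33) ties cannot occur.
    """
    n_models = len(model_predictions)
    if n_models == 0:
        raise ValueError("need at least one model")
    n_residues = len(model_predictions[0])
    if any(len(p) != n_residues for p in model_predictions):
        raise ValueError("all models must predict the same residues")
    combined = []
    for i in range(n_residues):
        votes_for_hot_spot = sum(p[i] for p in model_predictions)
        combined.append(1 if 2 * votes_for_hot_spot > n_models else 0)
    return combined


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Assumed reading of the abstract: 83 features x 2 learners = 166 models.
N_FEATURES = 83
LEARNERS = ("SVM", "k-NN")
N_INDIVIDUAL_MODELS = N_FEATURES * len(LEARNERS)

# Toy data (hypothetical): three base models, four residues.
toy_predictions = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
toy_labels = [1, 1, 0, 0]

ensemble = majority_vote(toy_predictions)
print(ensemble, f1_score(toy_labels, ensemble))
```

In this toy case each of the first three residues receives two of three votes, so the ensemble labels them as hot spots even though no single model does. That shows how voting can change individual predictions.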

Acknowledgement

This work was supported by the National Natural Science Foundation of China (61672037, 21601001, and 11835014), the Anhui Provincial Outstanding Young Talent Support Plan (gxyqZD2017005), the Young Wanjiang Scholar Program of Anhui Province, the Recruitment Program for Leading Talent Team of Anhui Province (2019-16), the China Postdoctoral Science Foundation Grant (2018M630699) and the Anhui Provincial Postdoctoral Science Foundation Grant (2017B325).

Author information

Authors and Affiliations

School of Computer Science and Technology, Anhui University, Hefei, 230601, Anhui, China
Le Zhao & Junfeng Xia
Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
Sijia Zhang & Junfeng Xia

Corresponding author

Correspondence to Sijia Zhang.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
Polytechnic University of Bari, Bari, Italy
Vitoantonio Bevilacqua
University of Wollongong, North Wollongong, NSW, Australia
Prashan Premaratne

About this paper

Cite this paper

Zhao, L., Zhang, S., Xia, J. (2019). Sequence-Based Prediction of Hot Spots in Protein-RNA Complexes Using an Ensemble Approach. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2019. Lecture Notes in Computer Science, vol 11643. Springer, Cham. https://doi.org/10.1007/978-3-030-26763-6_55

DOI: https://doi.org/10.1007/978-3-030-26763-6_55
Published: 24 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26762-9
Online ISBN: 978-3-030-26763-6
eBook Packages: Computer Science, Computer Science (R0)