Abstract
RNA-binding hot spots are the key residues that contribute most to the binding free energy of protein-RNA interfaces. Because experimental methods for identifying hot spots are expensive and time-consuming, efficient computational approaches are needed to predict hot spots on a large scale. In this work, we propose a sequence-based machine learning method to predict hot spots in protein-RNA complexes. We extracted 83 relatively independent physicochemical features from the 544 properties in AAindex1. Each physicochemical feature was combined with the predicted relative accessible surface area (RASA) and a substitution probability feature from the Blocks Substitution Matrix (BLOSUM) to train models with a support vector machine (SVM) and the k-nearest neighbor algorithm (k-NN). Combinations of the resulting 166 individual models were explored, and the 33 top-performing models were selected to construct the final ensemble classifier by majority voting. The ensemble classifier outperformed state-of-the-art computational methods, yielding an F1 score of 0.742 and an AUC of 0.824 on the independent test set.
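The majority-voting step described in the abstract can be illustrated with a minimal sketch. It assumes each selected model emits a binary label per residue (1 = hot spot, 0 = non-hot spot). The function names and the toy predictions are hypothetical and are not taken from the paper. With 33 voters, as in the paper's ensemble, the count is odd and ties cannot occur. In this sketch a tie with an even number of voters would default to non-hot spot.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model binary labels into one label per residue.

    predictions[m][i] is model m's label for residue i
    (1 = hot spot, 0 = non-hot spot). Hypothetical helper for illustration.
    """
    if not predictions:
        raise ValueError("need at least one model")
    n_residues = len(predictions[0])
    if any(len(p) != n_residues for p in predictions):
        raise ValueError("all models must label the same residues")
    combined = []
    for votes in zip(*predictions):
        # A residue is called a hot spot only if strictly more than
        # half of the models vote 1.
        combined.append(1 if 2 * sum(votes) > len(votes) else 0)
    return combined


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up): three models labelling four residues.
model_outputs = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
ensemble = majority_vote(model_outputs)
print(ensemble)  # [1, 1, 1, 0]
```

In practice the individual labels would come from trained SVM and k-NN models rather than hand-written lists. The voting logic is independent of how those labels are produced.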
Acknowledgement
This work was supported by the National Natural Science Foundation of China (61672037, 21601001, and 11835014), the Anhui Provincial Outstanding Young Talent Support Plan (gxyqZD2017005), the Young Wanjiang Scholar Program of Anhui Province, the Recruitment Program for Leading Talent Team of Anhui Province (2019-16), the China Postdoctoral Science Foundation Grant (2018M630699) and the Anhui Provincial Postdoctoral Science Foundation Grant (2017B325).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, L., Zhang, S., Xia, J. (2019). Sequence-Based Prediction of Hot Spots in Protein-RNA Complexes Using an Ensemble Approach. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2019. Lecture Notes in Computer Science, vol 11643. Springer, Cham. https://doi.org/10.1007/978-3-030-26763-6_55
DOI: https://doi.org/10.1007/978-3-030-26763-6_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26762-9
Online ISBN: 978-3-030-26763-6
eBook Packages: Computer Science, Computer Science (R0)