Abstract
Effective feature extraction methods play a very important role in predicting multisite protein subcellular locations. With the progress of many proteome projects, more and more proteins are being annotated with more than one subcellular location. However, compared with single-site proteins, predicting the subcellular localization of multiplex proteins is far more difficult and complicated. To improve multisite prediction quality, it is necessary to incorporate different feature extraction methods. In this paper, a feature combination method that uses the 20 dimensions of entropy density in place of the former 20 dimensions of amphiphilic pseudo amino acid composition (AmPseAAC) is applied to two different datasets. This differs from simple feature fusion, which merely adds the dimensions of one feature set to those of another. On the basis of this novel feature combination method, we adopt the multi-label k-nearest neighbors (ML-KNN) algorithm, together with a variant that assigns different weights to different attributes, called wML-KNN, to predict multiplex protein subcellular locations. The best overall accuracy is 61.11 % on dataset S1, taken from the Virus-mPLoc predictor, and 82.03 % on dataset S2, taken from Gpos-mPLoc.
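The replacement-style combination described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: that the entropy density of amino acid i is -p_i log(p_i) / H, where p_i is the residue frequency and H is the Shannon entropy of the composition, and that the "former 20 dimensions" of AmPseAAC are its first 20 (composition) components. The function names entropy_density and combine_by_replacement are hypothetical, chosen for illustration only.

```python
import math

# The 20 standard amino acids, in a fixed (alphabetical one-letter) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def entropy_density(sequence):
    """Return a 20-dimensional entropy density vector (assumed definition).

    Each component is -p_i * log(p_i) / H, where p_i is the frequency of
    amino acid i in the sequence and H is the Shannon entropy of the
    composition. Non-standard residue letters are ignored.
    """
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    freqs = [c / total for c in counts]
    # Per-residue entropy terms; a zero frequency contributes 0 by convention.
    terms = [-p * math.log(p) if p > 0 else 0.0 for p in freqs]
    entropy = sum(terms)
    if entropy == 0:
        # Only one residue type present: the density is undefined, return zeros.
        return [0.0] * len(AMINO_ACIDS)
    return [t / entropy for t in terms]


def combine_by_replacement(ampseaac, density):
    """Swap the first 20 components of an AmPseAAC vector for entropy density.

    The total dimension is unchanged, unlike concatenation, which would add
    20 dimensions to the feature vector.
    """
    if len(density) != 20 or len(ampseaac) < 20:
        raise ValueError("expected 20 density values and at least 20 AmPseAAC values")
    return list(density) + list(ampseaac[20:])
```

For example, given a hypothetical 24-dimensional AmPseAAC vector, combine_by_replacement returns a 24-dimensional vector whose last four components are untouched, whereas concatenating the two feature sets would yield 44 dimensions.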
Acknowledgment
This research was partially supported by the Science and Technology Foundation of University of Jinan (Grant No. XKY1402), Shandong Provincial Natural Science Foundation, China, under Grant ZR2015JL025, the Youth Project of National Natural Science Fund (Grant No. 61302128), the Youth Science and Technology Star Program of Jinan City (201406003), the Natural Science Foundation of Shandong Province (ZR2011FL022, ZR2013FL002), the Scientific Research Fund of Jinan University (XKY1410, XKY1411), the Program for Scientific research innovation team in Colleges and Universities of Shandong Province (2012–2015), and the Shandong Provincial Key Laboratory of Network Based Intelligent Computing.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, L., Wang, D., Chen, Y., Qiao, S., Zhao, Y., Cong, H. (2016). Feature Combination Methods for Prediction of Subcellular Locations of Proteins with Both Single and Multiple Sites. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science, vol 9771. Springer, Cham. https://doi.org/10.1007/978-3-319-42291-6_19
DOI: https://doi.org/10.1007/978-3-319-42291-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42290-9
Online ISBN: 978-3-319-42291-6
eBook Packages: Computer Science (R0)