Abstract
Predicting subcellular localization is critical for analyzing the mechanisms and functions of proteins and for biological research more broadly. A series of efficient methods have been proposed to identify subcellular localization, but challenges remain. In this paper, a novel feature extraction method, denoted F-Dipe, is proposed to identify subcellular localization. F-Dipe, which is based on the dipeptide pseudo amino acid composition method, improves the performance of multi-site prediction by increasing the focus information of proteins. In addition, a convolutional neural network (CNN) is used to predict the subcellular localization of multi-site virus proteins. The multi-label k-nearest neighbor algorithm (MLKNN) serves as a base classifier to verify the performance of F-Dipe and the CNN. With the MLKNN predictor, the best overall accuracy of F-Dipe on dataset S is 59.92%, higher than the 57.14% obtained with the pseudo amino acid based feature method (PseAAC). With the CNN predictor, the best overall accuracy of F-Dipe on dataset S is 62.3%, better than the 59.92% obtained with MLKNN.
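As a rough illustration of the kind of sequence feature that dipeptide-based methods start from, the following minimal sketch computes a plain 400-dimensional dipeptide composition vector. It is not the F-Dipe method itself, whose details are not given in the abstract. The function name `dipeptide_composition`, the choice to skip pairs containing non-standard residues, and the example sequence are all assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the relative frequency of each of the 400 dipeptides.

    Hypothetical helper: overlapping adjacent pairs are counted, and pairs
    containing a non-standard residue (e.g. 'X') are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or made only of non-standard residues.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Example with a made-up sequence (not taken from the paper's dataset).
features = dipeptide_composition("MKVLAAGIVK")
```

A vector of this kind could then be passed to a multi-label classifier; any focus information added by F-Dipe would extend or reweight it in ways the abstract does not specify.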
Acknowledgment
This research was supported by the National Key Research and Development Program of China (No. 2016YFC0106000), the National Natural Science Foundation of China (Grant Nos. 61302128, 61573166, 61572230, 61671220, 61640218), the Youth Science and Technology Star Program of Jinan City (201406003), the Natural Science Foundation of Shandong Province (ZR2013FL002), the Shandong Distinguished Middle-aged and Young Scientist Encourage and Reward Foundation, China (Grant No. ZR2016FB14), the Project of Shandong Province Higher Educational Science and Technology Program, China (Grant No. J16LN07), the Shandong Province Key Research and Development Program, China (Grant No. 2016GGX101022), and the Research Fund for the Doctoral Program of University of Jinan (No. XBS1604).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wang, L., Wang, D., Zhao, Y., Chen, Y. (2017). Prediction of Subcellular Localization of Multi-site Virus Proteins Based on Convolutional Neural Networks. In: Huang, DS., Jo, KH., Figueroa-García, J. (eds) Intelligent Computing Theories and Application. ICIC 2017. Lecture Notes in Computer Science, vol 10362. Springer, Cham. https://doi.org/10.1007/978-3-319-63312-1_53
DOI: https://doi.org/10.1007/978-3-319-63312-1_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63311-4
Online ISBN: 978-3-319-63312-1
eBook Packages: Computer Science, Computer Science (R0)