research-article

Transductive Learning for Multi-Label Protein Subchloroplast Localization Prediction

Published: 01 January 2017

Abstract

Predicting the localization of chloroplast proteins at the sub-subcellular level is an essential yet challenging step toward elucidating their functions. Most existing subchloroplast localization predictors are limited to single-location proteins and ignore multi-location chloroplast proteins. Although recent studies have produced some multi-location chloroplast predictors, they usually perform poorly. This paper proposes an ensemble transductive learning method to tackle this multi-label classification problem. Specifically, for each protein in a dataset, its composition-based sequence information and profile-based evolutionary information are extracted. Each of the two kinds of features is then compared with those of the other proteins in the dataset. The comparisons yield two similarity vectors, which are combined by weighting to form an ensemble feature vector. A transductive learning model based on the least squares and nearest neighbor algorithms is proposed to process the ensemble features. We refer to the resulting predictor as EnTrans-Chlo. Experimental results on a stringent benchmark dataset and a novel dataset demonstrate that EnTrans-Chlo significantly outperforms state-of-the-art predictors and, in particular, gains more than 4 percent absolute improvement in overall actual accuracy. For readers’ convenience, EnTrans-Chlo is freely available online at http://bioinfo.eie.polyu.edu.hk/EnTransChloServer/.
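
The feature-ensemble idea in the abstract can be illustrated with a minimal sketch. It is not the EnTrans-Chlo implementation: the abstract does not give the similarity measure, the weighting scheme, or the decision rule, so the cosine similarity, the `weight`, `k`, and `theta` parameters, the toy sequences, and the location names below are all assumptions chosen for illustration. The profile-based similarity vector is supplied as placeholder numbers because computing evolutionary profiles requires external tools, and the least squares component of the model is omitted.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino-acid composition of a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def cosine(u, v):
    """Cosine similarity (an assumed measure); 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_vector(query, dataset):
    """Compare one feature vector with that of every protein in the dataset."""
    return [cosine(query, other) for other in dataset]


def ensemble(sim_a, sim_b, weight=0.5):
    """Weighted combination of two similarity vectors (weight is hypothetical)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(sim_a, sim_b)]


def predict_labels(ens, label_sets, k=2, theta=0.5):
    """Multi-label vote over the k most similar labelled proteins.

    A label is kept when its summed similarity reaches theta times the best
    label score. Both k and theta are illustrative choices.
    """
    nearest = sorted(range(len(ens)), key=lambda i: ens[i], reverse=True)[:k]
    scores = {}
    for i in nearest:
        for label in label_sets[i]:
            scores[label] = scores.get(label, 0.0) + ens[i]
    if not scores:
        return set()
    best = max(scores.values())
    return {label for label, s in scores.items() if s >= theta * best}


# Toy data: sequences and location names are made up for illustration.
train_seqs = ["AAAAKKKK", "AAAKKKKK", "WWWWYYYY"]
train_labels = [{"stroma"}, {"stroma", "thylakoid"}, {"envelope"}]
query = "AAAAKKKA"

comp_db = [composition(s) for s in train_seqs]
sim_comp = similarity_vector(composition(query), comp_db)
sim_prof = [0.9, 0.8, 0.1]  # placeholder for profile-based similarities
ens = ensemble(sim_comp, sim_prof, weight=0.5)
predicted = predict_labels(ens, train_labels)
print(sorted(predicted))
```

Representing each protein by its similarities to the other proteins in the dataset, rather than by raw features, is what lets two heterogeneous feature types be merged by a simple weighted sum: both similarity vectors have one entry per dataset protein regardless of the dimensionality of the underlying features.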

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 14, Issue 1
January 2017
229 pages

Publisher

IEEE Computer Society Press
Washington, DC, United States