A Novel Ensemble Technique for Protein Subcellular Location Prediction

Chapter in: Ensembles in Machine Learning Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 373)

Abstract

In this chapter we present an ensemble classifier that performs multi-class classification by combining several kernel classifiers through a Decision Directed Acyclic Graph (DDAG). Each base classifier, called K-TIPCAC, relies mainly on projecting the given points onto the Fisher subspace, which is estimated from the training data by means of a novel technique. The proposed multi-class classifier is applied to protein subcellular location prediction, one of the most difficult multi-class prediction problems in modern computational biology. Although many methods have been proposed in the literature to solve this problem, all existing approaches are affected by some limitations, so the problem is still open. Experimental results clearly indicate that the proposed technique, called DDAG K-TIPCAC, performs as well as, if not better than, state-of-the-art ensemble methods aimed at multi-class classification of highly unbalanced data.
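The abstract combines two mechanisms: pairwise binary classifiers arranged in a DDAG, and base classifiers that decide by projecting points onto a Fisher subspace. The sketch below illustrates both, under clearly stated assumptions. The base model is a plain linear two-class Fisher discriminant (a hypothetical stand-in named `FisherBinary`), not K-TIPCAC, whose kernel and subspace-estimation technique are not described here. The `DDAG` class follows the usual one-vs-one decision DAG scheme, in which each node compares the first and last surviving classes and eliminates the loser. The class names, the `reg` parameter and the synthetic data are illustrative assumptions, not the authors' code.

```python
import numpy as np
from itertools import combinations


class FisherBinary:
    """Two-class linear Fisher discriminant (hypothetical stand-in).

    NOT K-TIPCAC: no kernel and no isotropic-PCA estimate of the
    Fisher subspace; it only illustrates "project, then threshold".
    """

    def __init__(self, reg=1e-6):
        self.reg = reg  # ridge term keeping the within-class scatter invertible

    def fit(self, X, y):
        X0, X1 = X[y == 0], X[y == 1]
        m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
        D0, D1 = X0 - m0, X1 - m1
        # Regularised within-class scatter matrix S_w
        Sw = D0.T @ D0 + D1.T @ D1 + self.reg * np.eye(X.shape[1])
        # Fisher direction w = S_w^{-1} (m1 - m0) spans the 1-D Fisher subspace
        self.w_ = np.linalg.solve(Sw, m1 - m0)
        # Decision threshold: midpoint of the two projected class means
        self.t_ = (self.w_ @ (m0 + m1)) / 2.0
        return self

    def predict(self, X):
        # Project onto the Fisher direction; 1 means "second class of the pair"
        return (X @ self.w_ > self.t_).astype(int)


class DDAG:
    """Decision DAG over all pairwise (one-vs-one) binary classifiers."""

    def __init__(self, make_base):
        self.make_base = make_base  # factory returning an unfitted binary model

    def fit(self, X, y):
        self.classes_ = np.unique(y)  # sorted, so every pair (a, b) has a < b
        self.models_ = {}
        for a, b in combinations(self.classes_, 2):
            mask = (y == a) | (y == b)
            target = (y[mask] == b).astype(int)  # 1 -> class b, 0 -> class a
            self.models_[(a, b)] = self.make_base().fit(X[mask], target)
        return self

    def _predict_one(self, x):
        candidates = list(self.classes_)
        # Each node tests the first vs. last surviving class and drops the
        # loser, so K classes need exactly K - 1 binary evaluations.
        while len(candidates) > 1:
            a, b = candidates[0], candidates[-1]
            if self.models_[(a, b)].predict(x[None, :])[0] == 1:
                candidates.pop(0)  # b wins: eliminate a
            else:
                candidates.pop()   # a wins: eliminate b
        return candidates[0]

    def predict(self, X):
        return np.array([self._predict_one(x) for x in X])


# Demo on synthetic, well-separated 2-D blobs (illustrative data only)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)

model = DDAG(FisherBinary).fit(X, y)
pred = model.predict(centers)
train_acc = float(np.mean(model.predict(X) == y))
```

With K classes the DAG trains K(K-1)/2 pairwise models but evaluates only K - 1 of them per prediction, which keeps the combination cheap at test time; swapping in a different binary model only requires passing another factory to `DDAG`.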

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Rozza, A., Lombardi, G., Re, M., Casiraghi, E., Valentini, G., Campadelli, P. (2011). A Novel Ensemble Technique for Protein Subcellular Location Prediction. In: Okun, O., Valentini, G., Re, M. (eds) Ensembles in Machine Learning Applications. Studies in Computational Intelligence, vol 373. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22910-7_9

  • DOI: https://doi.org/10.1007/978-3-642-22910-7_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22909-1

  • Online ISBN: 978-3-642-22910-7

  • eBook Packages: Engineering, Engineering (R0)
