Abstract
In this chapter we present an ensemble classifier that performs multi-class classification by combining several kernel classifiers through a Decision Directed Acyclic Graph (DDAG). Each base classifier, called K-TIPCAC, is mainly based on projecting the given points onto the Fisher subspace, which is estimated on the training data by means of a novel technique. The proposed multi-class classifier is applied to the task of protein subcellular location prediction, one of the most difficult multi-class prediction problems in modern computational biology. Although many methods have been proposed in the literature to solve this problem, all existing approaches are affected by some limitations, so the problem is still open. Experimental results clearly indicate that the proposed technique, called DDAG K-TIPCAC, performs as well as, if not better than, state-of-the-art ensemble methods aimed at multi-class classification of highly unbalanced data.
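As an illustration of how a DDAG combines binary classifiers into a multi-class decision, the following is a minimal sketch of the generic elimination scheme, not the chapter's K-TIPCAC implementation. The function `ddag_predict`, the helper `make_nearest_centre`, and the toy one-dimensional class centres are hypothetical names and data introduced only for this example. Any binary classifier that returns one of its two class labels could be substituted for the toy one.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a pairwise classifier maps a feature vector to one of its two class labels.
PairwiseClassifier = Callable[[Sequence[float]], int]


def ddag_predict(
    x: Sequence[float],
    classes: Sequence[int],
    pairwise: Dict[Tuple[int, int], PairwiseClassifier],
) -> int:
    """Evaluate a DDAG by successive elimination of candidate classes.

    At each node the first and last remaining candidates are compared by
    their pairwise classifier and the losing class is discarded, so a
    problem with N classes needs N - 1 binary evaluations per sample.
    """
    candidates: List[int] = list(classes)
    while len(candidates) > 1:
        a, b = candidates[0], candidates[-1]
        key = (a, b) if (a, b) in pairwise else (b, a)
        winner = pairwise[key](x)
        if winner == a:
            candidates.pop()    # class b is eliminated
        else:
            candidates.pop(0)   # class a is eliminated
    return candidates[0]


def make_nearest_centre(a: int, b: int, centres: Dict[int, float]) -> PairwiseClassifier:
    """Toy stand-in for a trained binary classifier (hypothetical):
    picks whichever of the two classes has the nearer 1-D centre."""
    def classify(x: Sequence[float]) -> int:
        return a if abs(x[0] - centres[a]) <= abs(x[0] - centres[b]) else b
    return classify


# Toy data: three classes with centres on a line (illustrative only).
centres = {0: 0.0, 1: 1.0, 2: 2.0}
pairwise = {
    (a, b): make_nearest_centre(a, b, centres)
    for a in centres for b in centres if a < b
}

print(ddag_predict([1.2], [0, 1, 2], pairwise))
```

Because each node discards exactly one class, the number of binary evaluations grows linearly with the number of classes, whereas voting over all pairs would grow quadratically.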
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Rozza, A., Lombardi, G., Re, M., Casiraghi, E., Valentini, G., Campadelli, P. (2011). A Novel Ensemble Technique for Protein Subcellular Location Prediction. In: Okun, O., Valentini, G., Re, M. (eds) Ensembles in Machine Learning Applications. Studies in Computational Intelligence, vol 373. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22910-7_9
DOI: https://doi.org/10.1007/978-3-642-22910-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22909-1
Online ISBN: 978-3-642-22910-7
eBook Packages: Engineering, Engineering (R0)