Abstract
Recent advances in pattern recognition have led to many classification algorithms being applied to the protein fold prediction problem. In this paper, a recently introduced method called Rotation Forest, which builds an ensemble of classifiers based on bootstrap sampling and feature extraction, is implemented and applied to this problem. Rotation Forest is a straightforward extension of bagging algorithms that aims to promote diversity within the ensemble through feature extraction using Principal Component Analysis (PCA). We compare the performance of the employed method with other meta classifiers based on boosting and bagging algorithms, namely AdaBoost.M1, LogitBoost, Bagging and Random Forest. Experimental results show that Rotation Forest achieves higher protein fold prediction accuracy than the other applied meta classifiers, as well as than previous works found in the literature.
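To make the rotation step concrete, the following is a minimal sketch of how one PCA-based rotation matrix could be built for a single ensemble member. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the helper name `rotation_matrix`, the parameters `n_subsets` and `sample_frac`, and the synthetic data are all hypothetical, and the random class-subset selection used in the published Rotation Forest algorithm is omitted for brevity.

```python
import numpy as np

def rotation_matrix(X, n_subsets=3, sample_frac=0.75, rng=None):
    """Build one block-structured PCA rotation matrix (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    n_samples, n_features = X.shape
    # Randomly partition the feature indices into disjoint subsets.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for idx in subsets:
        # Bootstrap sample of the rows, restricted to this feature subset.
        n_draw = max(2, int(sample_frac * n_samples))
        rows = rng.choice(n_samples, size=n_draw, replace=True)
        block = X[np.ix_(rows, idx)]
        # PCA via eigenvectors of the covariance matrix; all components kept.
        cov = np.atleast_2d(np.cov(block, rowvar=False))
        _, vecs = np.linalg.eigh(cov)
        # Place the loadings so that rows and columns follow the original feature order.
        R[np.ix_(idx, idx)] = vecs
    return R

# Synthetic stand-in data (assumption): 60 samples, 6 features.
demo_rng = np.random.default_rng(0)
X = demo_rng.normal(size=(60, 6))
R = rotation_matrix(X, n_subsets=3, rng=1)
X_rot = X @ R  # a base classifier would be trained on these rotated features
```

In a full ensemble, each member would draw its own rotation matrix with a different random seed and train a base classifier (for example a decision tree) on `X @ R`; the differing rotations are what promote diversity among the members. Because every principal component is retained, each rotation is an orthogonal change of basis and discards no information.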
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dehzangi, A., Phon-Amnuaisuk, S., Manafi, M., Safa, S. (2010). Using Rotation Forest for Protein Fold Prediction Problem: An Empirical Study. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2010. Lecture Notes in Computer Science, vol 6023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12211-8_19
DOI: https://doi.org/10.1007/978-3-642-12211-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12210-1
Online ISBN: 978-3-642-12211-8
eBook Packages: Computer Science, Computer Science (R0)