Abstract
The ability to explain a decision is important for the acceptance of machine learning technology, especially in applications such as bioinformatics. Support vector machines (SVMs) have shown strong generalization ability in a number of application areas, including protein structure prediction. However, an SVM is a black-box model. A decision tree, on the other hand, is highly comprehensible. This paper presents a novel approach to rule generation for understanding protein secondary structure prediction that integrates the merits of both support vector machines and decision trees. The approach combines an SVM with a decision tree into a new algorithm called SVM_DT. Experiments on protein secondary structure prediction using the RS126 data set show that SVM_DT is much more comprehensible than an SVM. Moreover, the generalization ability of SVM_DT is better than that of a decision tree and similar to that of an SVM. Hence, SVM_DT can be used not only for prediction but also for guiding biological experiments.
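The abstract does not spell out how SVM_DT joins the two models, so the following is only a rough sketch of one generic way to pair them, not the paper's algorithm: train an SVM, relabel the training data with the SVM's own predictions, fit a decision tree to those labels, and read the tree off as if-then rules. Everything here is an assumption made for illustration: the Pegasos-style linear SVM, the Gini-based tree, the function names, the feature names `f0` and `f1`, and the toy data, which stands in for real encoded protein windows.

```python
# Sketch only: a generic "train an SVM, then fit a tree to its predictions"
# pipeline. It is NOT the SVM_DT algorithm from the paper; all names and
# data below are hypothetical.

def train_linear_svm(X, y, epochs=200, lam=0.01):
    """Linear SVM via Pegasos-style subgradient descent (labels are +1/-1)."""
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def gini(labels):
    p = sum(1 for l in labels if l == 1) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(X, y, depth=0, max_depth=3):
    """Axis-aligned binary tree; a leaf is an int label, a node is a tuple."""
    majority = 1 if sum(y) >= 0 else -1
    if len(set(y)) == 1 or depth >= max_depth:
        return majority
    best = None
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0  # midpoint between neighbouring values
            left = [yi for xi, yi in zip(X, y) if xi[f] <= thr]
            right = [yi for xi, yi in zip(X, y) if xi[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:  # no feature has two distinct values
        return majority
    _, f, thr = best
    left = [(xi, yi) for xi, yi in zip(X, y) if xi[f] <= thr]
    right = [(xi, yi) for xi, yi in zip(X, y) if xi[f] > thr]
    return (f, thr,
            build_tree([x for x, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([x for x, _ in right], [l for _, l in right], depth + 1, max_depth))

def tree_predict(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def tree_to_rules(node, names, conds=()):
    """Flatten the tree into one if-then rule per leaf."""
    if not isinstance(node, tuple):
        premise = " AND ".join(conds) if conds else "TRUE"
        return ["IF %s THEN class = %+d" % (premise, node)]
    f, thr, left, right = node
    return (tree_to_rules(left, names, conds + ("%s <= %.2f" % (names[f], thr),))
            + tree_to_rules(right, names, conds + ("%s > %.2f" % (names[f], thr),)))

def svm_then_tree(X, y, max_depth=3):
    """Fit the SVM, relabel X with its predictions, then fit the tree to those."""
    w, b = train_linear_svm(X, y)
    svm_labels = [svm_predict(w, b, x) for x in X]
    return (w, b), build_tree(X, svm_labels, max_depth=max_depth)

# Hypothetical toy data: two numeric features, binary labels.
TOY_X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.2, 0.3],
         [0.8, 0.9], [0.9, 0.7], [0.7, 0.8], [0.6, 0.9]]
TOY_Y = [-1, -1, -1, -1, 1, 1, 1, 1]

if __name__ == "__main__":
    _, toy_tree = svm_then_tree(TOY_X, TOY_Y)
    for rule in tree_to_rules(toy_tree, ["f0", "f1"]):
        print(rule)
```

Fitting the tree to the SVM's predictions rather than to the original labels means the rules describe what the SVM has learned, which is one plausible reading of how a combined model could keep SVM-like generalization while gaining readable rules. Protein secondary structure prediction is usually a three-class problem (helix, strand, coil), so a realistic version would need a multi-class scheme on top of this binary sketch.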
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
He, J., Hu, HJ., Harrison, R., Tai, P.C., Dong, Y., Pan, Y. (2005). Understanding Protein Structure Prediction Using SVM_DT. In: Chen, G., Pan, Y., Guo, M., Lu, J. (eds) Parallel and Distributed Processing and Applications - ISPA 2005 Workshops. ISPA 2005. Lecture Notes in Computer Science, vol 3759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11576259_23
DOI: https://doi.org/10.1007/11576259_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29770-3
Online ISBN: 978-3-540-32115-6
eBook Packages: Computer Science (R0)