Abstract
Poly-transformation extends the idea of ensemble learning to the transformation step of Knowledge Discovery in Databases (KDD): multiple transformations of the data are made before learning (data mining) is applied. The theoretical basis for poly-transformation is the same as that for other combining methods: combining predictors whose errors are uncorrelated reduces the overall error. It is not possible to demonstrate the utility of poly-transformation on standard benchmark datasets, because no pre-transformed versions of such data exist. We therefore demonstrate its utility on a single well-known hard problem for which we have expertise: predicting protein secondary structure from primary structure. We applied four different transformations of the data, each justified by biological background knowledge. We then applied four different learning methods (linear discrimination, back-propagation, C5.0, and learning vector quantization) to each of the four transformations individually, and combined the predictions from the different transformations to form the poly-transformation predictions. Each of the learning methods produced significantly higher accuracy with poly-transformation than with only a single transformation. Poly-transformation is the basis of the secondary structure prediction method Prof, one of the most accurate existing methods for this problem.
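As a concrete illustration of the scheme described in the abstract, the Python sketch below trains one base learner per transformation of the raw data and combines their outputs by majority vote. It is a minimal sketch, not the implementation behind Prof: the list of transformation functions, the use of logistic regression as a stand-in base learner, and integer-encoded class labels are all assumptions made purely for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def poly_transform_fit(X_raw, y, transformations):
        # Fit one classifier per transformation (view) of the raw data.
        # `transformations` is a list of hypothetical feature-construction
        # functions; logistic regression is a stand-in base learner.
        models = []
        for transform in transformations:
            X_t = transform(X_raw)
            clf = LogisticRegression(max_iter=1000).fit(X_t, y)
            models.append((transform, clf))
        return models

    def poly_transform_predict(models, X_raw):
        # Combine the per-transformation predictions by majority vote
        # (assumes integer-encoded class labels, e.g. 0 = helix,
        # 1 = strand, 2 = coil).
        votes = np.stack([clf.predict(transform(X_raw))
                          for transform, clf in models])
        return np.apply_along_axis(
            lambda column: np.bincount(column).argmax(), 0, votes)

The abstract does not specify how the per-transformation predictions are combined, so majority voting here is simply one common choice; averaging class probabilities or a trained combining classifier would fit the same framework.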
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
King, R.D., Ouali, M. (2004). Poly-transformation. In: Yang, Z.R., Yin, H., Everson, R.M. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2004. IDEAL 2004. Lecture Notes in Computer Science, vol 3177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28651-6_15
DOI: https://doi.org/10.1007/978-3-540-28651-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22881-3
Online ISBN: 978-3-540-28651-6