Abstract
We present a method, called equivalence learning, which applies a two-class classification approach to object pairs defined within a multi-class scenario. The underlying idea is that instead of classifying objects into their respective classes, we classify object pairs as either equivalent (belonging to the same class) or non-equivalent (belonging to different classes). The method vectorises the similarity between objects and applies a machine learning algorithm (SVM, ANN, LogReg, Random Forests) to learn the differences between equivalent and non-equivalent object pairs. We also define a unique kernel function that can be obtained via equivalence learning. Using a small dataset of archaeal, bacterial and eukaryotic 3-phosphoglycerate-kinase sequences, we found that the classification performance of equivalence learning slightly exceeds that of several simple machine learning algorithms, at the price of a minimal increase in time and space requirements.
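The pair-based idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the objects are already numeric feature vectors (the paper works with protein sequences), it uses a per-feature absolute difference as a hypothetical pair vectorisation, and it substitutes a plain gradient-descent logistic regression for the learners named above. All function names and the toy data are invented for illustration.

```python
import numpy as np

def pair_vector(x, y):
    # Hypothetical vectorisation of a pair: per-feature absolute difference.
    # It is symmetric, so the pair (x, y) and the pair (y, x) get the same vector.
    return np.abs(x - y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_pairs(X, labels):
    # One vector and one equivalence label (1.0 = same class) per unordered pair.
    i, j = np.triu_indices(len(X), k=1)
    P = pair_vector(X[i], X[j])
    t = (labels[i] == labels[j]).astype(float)
    return P, t

def fit_logreg(P, t, lr=0.1, steps=2000):
    # Two-class learner on pair vectors (a stand-in for SVM, ANN, etc.).
    A = np.hstack([P, np.ones((len(P), 1))])  # last column is the bias term
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        w -= lr * A.T @ (sigmoid(A @ w) - t) / len(t)
    return w

def equivalence_score(w, x, Y):
    # Estimated probability that x is equivalent to each row of Y.
    P = pair_vector(x, np.atleast_2d(Y))
    return sigmoid(P @ w[:-1] + w[-1])

def classify(w, x, X, labels):
    # Assign x to the class whose members it is, on average, most equivalent to.
    s = equivalence_score(w, x, X)
    classes = np.unique(labels)
    return classes[np.argmax([s[labels == c].mean() for c in classes])]

# Toy multi-class data: three well-separated groups of ten 2-D points each.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
labels = np.repeat(np.arange(3), 10)
X = centres[labels] + 0.2 * rng.standard_normal((30, 2))

P, t = make_pairs(X, labels)
w = fit_logreg(P, t)
train_acc = ((sigmoid(P @ w[:-1] + w[-1]) > 0.5) == (t == 1.0)).mean()
print("pair-level training accuracy:", train_acc)
print("predicted class of a new point:", classify(w, np.array([5.0, 0.1]), X, labels))
```

The learned score acts as a trained similarity function between objects. In this sketch nothing guarantees that it is a valid (positive semi-definite) kernel, so it should not be read as the kernel function the paper defines.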
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kertész-Farkas, A., Kocsor, A., Pongor, S. (2007). Equivalence Learning in Protein Classification. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_62
DOI: https://doi.org/10.1007/978-3-540-73499-4_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science, Computer Science (R0)