Abstract
A new class of kernels has been developed for vectors derived from the k-peptide composition coding scheme for protein sequences. Each kernel defines a biological similarity between two mapped k-peptide coding vectors. The mapping transforms a k-peptide coding vector into a new vector by means of a matrix formed from the high BLOSUM scores associated with pairs of k-peptides. Used with support vector machines, the new kernels are evaluated against the conventional k-peptide coding scheme (k ≤ 3) for predicting the subcellular localization of proteins in Gram-negative bacteria. In 5-fold cross-validation, the new method outperforms all the other methods compared.
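The abstract does not give the exact construction, so the following is only a minimal sketch of the general idea under stated assumptions. It builds a k-peptide composition vector, forms a matrix that keeps only high-scoring k-peptide pairs, and takes the inner product of the two mapped vectors as the kernel. The 3-letter alphabet, the `TOY_SCORES` values (which are made up and are not BLOSUM entries), the additive per-position pair score, the `threshold` parameter, and all function names are hypothetical choices for illustration.

```python
from itertools import product

import numpy as np

# Toy 3-letter alphabet with made-up substitution scores.
# These are NOT real BLOSUM values; they only stand in for them.
ALPHABET = "AKL"
TOY_SCORES = {
    ("A", "A"): 4, ("K", "K"): 5, ("L", "L"): 4,
    ("A", "K"): -1, ("A", "L"): -1, ("K", "L"): -2,
}


def score(a, b):
    """Symmetric lookup of the toy substitution score for two residues."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a)))


def kpeptides(k):
    """All k-peptides over the toy alphabet, in a fixed order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def composition(seq, k):
    """k-peptide composition: relative frequency of each k-peptide in seq."""
    peptides = kpeptides(k)
    index = {p: i for i, p in enumerate(peptides)}
    x = np.zeros(len(peptides))
    for i in range(len(seq) - k + 1):
        x[index[seq[i:i + k]]] += 1
    total = x.sum()
    return x / total if total else x


def high_score_matrix(k, threshold):
    """Matrix of k-peptide pair scores; entries below threshold are zeroed.

    Assumption: a pair's score is the sum of its per-position scores.
    """
    peptides = kpeptides(k)
    m = np.zeros((len(peptides), len(peptides)))
    for i, p in enumerate(peptides):
        for j, q in enumerate(peptides):
            s = sum(score(a, b) for a, b in zip(p, q))
            if s >= threshold:
                m[i, j] = s
    return m


def mapped_kernel(x, y, m):
    """Inner product of the mapped vectors m @ x and m @ y."""
    return float((m @ x) @ (m @ y))


if __name__ == "__main__":
    k = 2
    m = high_score_matrix(k, threshold=4)
    x = composition("AKLAK", k)
    y = composition("KKLAA", k)
    print(mapped_kernel(x, y, m))
```

Because this sketch computes the kernel as an inner product of linearly mapped vectors, it is symmetric and positive semidefinite by construction, which is the property an SVM kernel needs. The kernels described in the chapter may differ in how the matrix is built and how the mapped vectors are compared.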
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lei, Z., Dai, Y. (2005). A Class of New Kernels Based on High-Scored Pairs of k-Peptides for SVMs and Its Application for Prediction of Protein Subcellular Localization. In: Priami, C., Zelikovsky, A. (eds) Transactions on Computational Systems Biology II. Lecture Notes in Computer Science, vol 3680. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11567752_3
DOI: https://doi.org/10.1007/11567752_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29401-6
Online ISBN: 978-3-540-31661-9
eBook Packages: Computer Science (R0)