Abstract
We describe and empirically evaluate machine learning methods for predicting zinc binding sites from protein sequences. We start by observing that a data set consisting of single residues as examples is affected by autocorrelation, and we propose an ad hoc remedy in which sequentially close pairs of candidate residues are classified as being jointly involved in the coordination of a zinc ion. We develop a kernel for this particular type of data that can handle variable-length gaps between candidate coordinating residues. Our empirical evaluation on a data set of non-redundant protein chains shows that explicitly modeling the correlation between residues close in sequence yields a significant improvement in prediction performance.
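As a rough illustration of the pairing idea described above, the following sketch enumerates pairs of candidate residues that lie close together in a sequence. It is not the authors' implementation: the candidate residue set (`CANDIDATES`), the function name `candidate_pairs`, the `max_gap` threshold, and the example sequence are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Tuple

# Assumption: residue types treated as candidate zinc ligands.
# The abstract does not list them; this set is illustrative only.
CANDIDATES = frozenset("CHDE")


def candidate_pairs(sequence: str, max_gap: int = 7) -> List[Tuple[int, int]]:
    """Return 0-based index pairs (i, j), i < j, of candidate residues
    separated by at most `max_gap` intervening residues.

    `max_gap` is a hypothetical threshold, not a value from the paper.
    """
    positions = [i for i, aa in enumerate(sequence) if aa in CANDIDATES]
    pairs = []
    for a, i in enumerate(positions):
        for j in positions[a + 1:]:
            if j - i - 1 > max_gap:
                break  # positions are sorted, so later ones are even farther
            pairs.append((i, j))
    return pairs


# Made-up example sequence; each returned pair would become one example
# to be classified as jointly coordinating a zinc ion or not.
pairs = candidate_pairs("MKCPECGKSFSQKSNLQKHQRTH", max_gap=3)
```

Each pair, together with the variable-length gap between its two residues, is the kind of example the gap-aware kernel mentioned in the abstract would need to compare.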
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Menchetti, S., Passerini, A., Frasconi, P., Andreini, C., Rosato, A. (2006). Improving Prediction of Zinc Binding Sites by Modeling the Linkage Between Residues Close in Sequence. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science, vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_26
DOI: https://doi.org/10.1007/11732990_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33295-4
Online ISBN: 978-3-540-33296-1
eBook Packages: Computer Science, Computer Science (R0)