Abstract
In the human proteome, about 5,000 proteins lack experimentally validated functional information. In this work we tackle human protein function prediction with three distinct supervised learning schemes: one-versus-all classification, tournament learning, and multi-label learning. The target values of the supervised models are the nodes of a subset of the Gene Ontology, which is widely used as a benchmark for functional prediction. On an independent dataset that includes very difficult cases, average recall over the first 50 ranked predictions reached a reasonable level; average precision, however, was quite low.
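The evaluation summarized above, recall and precision over the first 50 ranked predictions averaged across proteins, can be illustrated with a minimal sketch. The abstract does not give the exact metric definitions, so the formulas below are the standard recall-at-k and precision-at-k and are an assumption here; the function names, the protein identifiers, and the Gene Ontology term identifiers are hypothetical and serve only as illustration.

```python
from typing import Dict, Sequence, Set, Tuple


def recall_precision_at_k(ranked_terms: Sequence[str],
                          true_terms: Set[str],
                          k: int = 50) -> Tuple[float, float]:
    """Recall and precision over the top-k ranked terms for one protein.

    Assumed definitions (not taken from the paper):
      recall    = hits in top k / number of true terms
      precision = hits in top k / number of terms actually returned (<= k)
    """
    top_k = ranked_terms[:k]
    hits = sum(1 for term in top_k if term in true_terms)
    recall = hits / len(true_terms) if true_terms else 0.0
    precision = hits / len(top_k) if top_k else 0.0
    return recall, precision


def mean_recall_precision(predictions: Dict[str, Sequence[str]],
                          annotations: Dict[str, Set[str]],
                          k: int = 50) -> Tuple[float, float]:
    """Average the per-protein recall and precision over all annotated proteins."""
    recalls, precisions = [], []
    for protein, true_terms in annotations.items():
        ranked = predictions.get(protein, [])
        r, p = recall_precision_at_k(ranked, true_terms, k)
        recalls.append(r)
        precisions.append(p)
    n = len(recalls)
    if n == 0:
        return 0.0, 0.0
    return sum(recalls) / n, sum(precisions) / n


if __name__ == "__main__":
    # Hypothetical ranked predictions (best first) and true annotations.
    predictions = {
        "protein_A": ["GO:0005515", "GO:0003677", "GO:0016301", "GO:0005524"],
        "protein_B": ["GO:0003677", "GO:0005524"],
    }
    annotations = {
        "protein_A": {"GO:0005515", "GO:0005524"},
        "protein_B": {"GO:0005524"},
    }
    print(mean_recall_precision(predictions, annotations, k=3))
```

Under these assumed definitions, a protein with only a few true terms cannot score a precision above (number of true terms) / 50 once 50 predictions are returned, even when every true term is recovered. That arithmetic is one possible reason for reasonable recall alongside low precision; the abstract itself does not state the cause.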
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bologna, G., Veuthey, AL., Pagni, M., Lane, L., Bairoch, A. (2011). A Preliminary Study on the Prediction of Human Protein Functions. In: Ferrández, J.M., Álvarez Sánchez, J.R., de la Paz, F., Toledo, F.J. (eds) Foundations on Natural and Artificial Computation. IWINAC 2011. Lecture Notes in Computer Science, vol 6686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21344-1_35
DOI: https://doi.org/10.1007/978-3-642-21344-1_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21343-4
Online ISBN: 978-3-642-21344-1
eBook Packages: Computer Science, Computer Science (R0)