DOI: 10.1007/978-3-642-12683-3_34
Article

Leveraging sequence classification by taxonomy-based multitask learning

Published: 25 April 2010

Abstract

In this work we consider an inference task that biologists are very good at: deciphering biological processes by bringing together knowledge obtained from experiments on various organisms, while respecting the differences and commonalities among these organisms. We look at this problem from a sequence analysis point of view, where we aim to solve the same classification task in different organisms. We investigate the challenge of combining information from several organisms, where we take the relation between the organisms to be defined by a tree structure derived from their phylogeny. Multitask learning, a machine learning technique that has recently received considerable attention, considers the problem of learning across tasks that are related to each other. We treat each organism as one task and present three novel multitask learning methods to handle situations in which the relationships among tasks can be described by a hierarchy. These algorithms are designed for large-scale applications and are therefore applicable to problems with a large number of training examples, which are frequently encountered in sequence analysis. We perform experimental analyses on synthetic data sets to illustrate the properties of our algorithms. Moreover, we consider a problem from genomic sequence analysis, namely splice site recognition, to illustrate the usefulness of our approach. We show that intelligently combining data from 15 eukaryotic organisms can indeed significantly improve prediction performance compared to traditional learning approaches. From a broader perspective, we expect that algorithms like the ones presented in this work have the potential to complement and enrich the strategies of homology-based sequence analysis that are currently the quasi-standard in biological sequence analysis.
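
The abstract does not spell out the three methods, so the following is only an illustrative sketch of the general idea of treating each organism as a task and letting a tree over the tasks control how much information is shared. It is not the authors' algorithm. It assumes a generic multitask kernel of the form K((x, s), (z, t)) = sim(s, t) * k(x, z), where k is a simple spectrum (k-mer count) kernel and sim is a hypothetical task similarity computed from shared ancestors in a toy taxonomy. The taxonomy, the organism names, and all function names are made up for illustration.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); the root has parent None.
PARENT = {
    "root": None,
    "animals": "root",
    "plants": "root",
    "worm": "animals",
    "fly": "animals",
    "cress": "plants",
}

def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def task_similarity(s, t):
    """Assumed similarity: Jaccard overlap of the two root paths.

    Identical tasks score 1.0; tasks that only share the root score lowest.
    """
    a, b = set(path_to_root(s)), set(path_to_root(t))
    return len(a & b) / len(a | b)

def spectrum_kernel(x, z, k=3):
    """Count the k-mers shared by sequences x and z (with multiplicity)."""
    cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    cz = Counter(z[i:i + k] for i in range(len(z) - k + 1))
    return sum(count * cz[kmer] for kmer, count in cx.items())

def multitask_kernel(x, s, z, t, k=3):
    """Sequence similarity scaled by how closely the two tasks are related."""
    return task_similarity(s, t) * spectrum_kernel(x, z, k)

if __name__ == "__main__":
    # Same pair of sequences, compared across closer and more distant organisms.
    print(multitask_kernel("ACGTACG", "worm", "ACGT", "worm"))
    print(multitask_kernel("ACGTACG", "worm", "ACGT", "fly"))
    print(multitask_kernel("ACGTACG", "worm", "ACGT", "cress"))
```

A kernel of this kind could be passed to any kernel classifier that accepts a precomputed Gram matrix; examples from closely related organisms then influence each other more strongly than examples from distant ones.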

Published In

RECOMB'10: Proceedings of the 14th Annual international conference on Research in Computational Molecular Biology
April 2010
579 pages
ISBN:3642126820
  • Editor: Bonnie Berger

Publisher

Springer-Verlag

Berlin, Heidelberg

Cited By

  • (2024) The use of multi-task learning in cybersecurity applications: a systematic literature review. Neural Computing and Applications 36:35 (22053-22079). DOI: 10.1007/s00521-024-10436-3. Online publication date: 1-Dec-2024
  • (2015) Hierarchical Bayesian Inference and Recursive Regularization for Large-Scale Classification. ACM Transactions on Knowledge Discovery from Data 9:3 (1-23). DOI: 10.1145/2629585. Online publication date: 13-Apr-2015
  • (2013) Recursive regularization for large-scale classification with hierarchical and graphical dependencies. Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (257-265). DOI: 10.1145/2487575.2487644. Online publication date: 11-Aug-2013
  • (2011) Hierarchical multitask structured output learning for large-scale sequence segmentation. Proceedings of the 25th International Conference on Neural Information Processing Systems (2690-2698). DOI: 10.5555/2986459.2986759. Online publication date: 12-Dec-2011
