Large Margin Methods for Structured Output Prediction

  • Chapter
Computational Intelligence Paradigms

Part of the book series: Studies in Computational Intelligence (SCI, volume 137)

Abstract

Many real-life data problems require effective classification algorithms that can model structural dependencies between multiple labels and perform classification in a multivariate setting, i.e., one in which complex, non-scalar predictions must be produced for each input vector. Examples of such tasks range from natural language parsing to speech recognition, machine translation, image segmentation, handwritten character recognition, and gene prediction.

Recently, many algorithms have been developed in this direction in the machine learning community. They are commonly referred to as structured output learning approaches. The main idea behind them is to produce an effective and flexible representation of the data that exploits general dependencies between labels. It has been shown that in many applications structured prediction methods outperform models that do not directly represent the correlation between inputs and output labels.

Among the variety of approaches developed in the last few years, large margin methods in particular deserve attention, since they have proved successful in several tasks. These techniques are based on a tight integration of Support Vector Machines (SVMs) and probabilistic graphical models (PGMs): they combine the ability to learn in high-dimensional feature spaces, typical of kernel methods, with the algorithmic efficiency and the flexibility in representing data inherited from PGMs.
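
As a concrete reference point (a standard formulation from the structured SVM literature, summarized here rather than quoted from this chapter; the symbols w, φ, Δ, C, ξ follow that literature's conventional notation), these methods learn a weight vector w over a joint feature map φ(x, y) and require the correct output y_i to outscore every alternative y by a margin that grows with a structured loss Δ:

    \min_{w,\, \xi \ge 0} \ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
    \quad \text{s.t.} \quad
    \langle w, \phi(x_i, y_i) \rangle - \langle w, \phi(x_i, y) \rangle \ \ge\ \Delta(y_i, y) - \xi_i
    \qquad \forall i,\ \forall y \neq y_i

The exponentially many constraints (one per candidate output y) are exactly what the PGM structure makes tractable: φ and Δ decompose over the graph, so inference can be carried out by dynamic programming.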

In this paper we review some of the most recent large margin methods summarizing the main theoretical results, addressing some important computational issues, and presenting the most successful applications. Specifically, we show results in the context of biological sequence alignment and for sequence labeling and parsing in the natural language processing field. We finally discuss some of the main challenges in this new and promising research field.
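
To make the setting concrete, here is a minimal sketch (illustrative code, not taken from the chapter; the function names and NumPy-based representation are assumptions for illustration) of a structured perceptron for sequence labeling, a simple precursor of the large margin methods reviewed in this chapter. It shows the two ingredients every such method needs: a decoder (here, Viterbi) that finds the best-scoring output structure, and a mistake-driven update that pushes the score of the gold structure above that of the predicted one.

    import numpy as np

    def viterbi(emission, transition):
        # emission: (T, K) per-position label scores; transition: (K, K)
        # score for moving from label i to label j.
        # Returns the highest-scoring label sequence as a list of ints.
        T, K = emission.shape
        score = np.empty((T, K))
        back = np.zeros((T, K), dtype=int)
        score[0] = emission[0]
        for t in range(1, T):
            cand = score[t - 1][:, None] + transition + emission[t][None, :]
            back[t] = cand.argmax(axis=0)
            score[t] = cand.max(axis=0)
        path = [int(score[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    def perceptron_epoch(X, Y, W, trans):
        # One training pass. X: list of (T, D) feature matrices; Y: list of
        # gold label sequences; W: (K, D) per-label emission weights;
        # trans: (K, K) transition scores. Updated in place and returned.
        for x, y in zip(X, Y):
            y_hat = viterbi(x @ W.T, trans)   # decode with current model
            if y_hat != list(y):              # update only on mistakes
                for t in range(len(y)):
                    W[y[t]] += x[t]           # reward gold label features
                    W[y_hat[t]] -= x[t]       # penalize predicted ones
                for t in range(1, len(y)):
                    trans[y[t - 1], y[t]] += 1.0
                    trans[y_hat[t - 1], y_hat[t]] -= 1.0
        return W, trans

Large margin methods keep this computational skeleton but replace plain decoding with loss-augmented inference (adding Δ(y_i, y) to each candidate's score) and the perceptron update with a regularized quadratic program.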

Editor information

Lakhmi C. Jain, Mika Sato-Ilic, Maria Virvou, George A. Tsihrintzis, Valentina Emilia Balas, Canicious Abeynayake

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ricci, E., Perfetti, R. (2008). Large Margin Methods for Structured Output Prediction. In: Jain, L.C., Sato-Ilic, M., Virvou, M., Tsihrintzis, G.A., Balas, V.E., Abeynayake, C. (eds) Computational Intelligence Paradigms. Studies in Computational Intelligence, vol 137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79474-5_5

  • DOI: https://doi.org/10.1007/978-3-540-79474-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-79473-8

  • Online ISBN: 978-3-540-79474-5

  • eBook Packages: Engineering, Engineering (R0)
