Large Margin Methods for Structured Output Prediction

  • Chapter
Computational Intelligence Paradigms

Part of the book series: Studies in Computational Intelligence (SCI, volume 137)

Abstract

Many real-life data problems require effective classification algorithms that can model structural dependencies between multiple labels and perform classification in a multivariate setting, i.e., one in which complex, non-scalar predictions must be produced for each input vector. Examples of such tasks range from natural language parsing to speech recognition, machine translation, image segmentation, handwritten character recognition, and gene prediction.

Recently, many algorithms have been developed in this direction in the machine learning community. They are commonly referred to as structured output learning approaches. The main idea behind them is to produce an effective and flexible representation of the data that exploits general dependencies between labels. It has been shown that in many applications structured prediction methods outperform models that do not directly represent the correlation between inputs and output labels.

Among the variety of approaches developed in the last few years, large margin methods in particular deserve attention, since they have proved successful in several tasks. These techniques are based on a tight integration of Support Vector Machines (SVMs) and probabilistic graphical models (PGMs): they combine the ability to learn in high-dimensional feature spaces, typical of kernel methods, with the algorithmic efficiency and the flexibility in representing data inherited from PGMs.
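
As a concrete reference point (a standard formulation from the structured SVM literature, summarized here rather than quoted from this chapter; the symbols w, φ, Δ, C, ξ follow that literature's conventional notation), these methods learn a weight vector w over a joint feature map φ(x, y) and require the correct output y_i to outscore every alternative y by a margin that grows with a structured loss Δ:

    \min_{w,\, \xi \ge 0} \ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
    \quad \text{s.t.} \quad
    \langle w, \phi(x_i, y_i) \rangle - \langle w, \phi(x_i, y) \rangle \ \ge\ \Delta(y_i, y) - \xi_i
    \qquad \forall i,\ \forall y \neq y_i

The exponentially many constraints (one per candidate output y) are exactly what the PGM structure makes tractable: φ and Δ decompose over the graph, so inference can be carried out by dynamic programming.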

In this paper we review some of the most recent large margin methods summarizing the main theoretical results, addressing some important computational issues, and presenting the most successful applications. Specifically, we show results in the context of biological sequence alignment and for sequence labeling and parsing in the natural language processing field. We finally discuss some of the main challenges in this new and promising research field.
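
To make the setting concrete, here is a minimal sketch (illustrative code, not taken from the chapter; the function names and NumPy-based representation are assumptions for illustration) of a structured perceptron for sequence labeling, a simple precursor of the large margin methods reviewed in this chapter. It shows the two ingredients every such method needs: a decoder (here, Viterbi) that finds the best-scoring output structure, and a mistake-driven update that pushes the score of the gold structure above that of the predicted one.

    import numpy as np

    def viterbi(emission, transition):
        # emission: (T, K) per-position label scores; transition: (K, K)
        # score for moving from label i to label j.
        # Returns the highest-scoring label sequence as a list of ints.
        T, K = emission.shape
        score = np.empty((T, K))
        back = np.zeros((T, K), dtype=int)
        score[0] = emission[0]
        for t in range(1, T):
            cand = score[t - 1][:, None] + transition + emission[t][None, :]
            back[t] = cand.argmax(axis=0)
            score[t] = cand.max(axis=0)
        path = [int(score[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    def perceptron_epoch(X, Y, W, trans):
        # One training pass. X: list of (T, D) feature matrices; Y: list of
        # gold label sequences; W: (K, D) per-label emission weights;
        # trans: (K, K) transition scores. Updated in place and returned.
        for x, y in zip(X, Y):
            y_hat = viterbi(x @ W.T, trans)   # decode with current model
            if y_hat != list(y):              # update only on mistakes
                for t in range(len(y)):
                    W[y[t]] += x[t]           # reward gold label features
                    W[y_hat[t]] -= x[t]       # penalize predicted ones
                for t in range(1, len(y)):
                    trans[y[t - 1], y[t]] += 1.0
                    trans[y_hat[t - 1], y_hat[t]] -= 1.0
        return W, trans

Large margin methods keep this computational skeleton but replace plain decoding with loss-augmented inference (adding Δ(y_i, y) to each candidate's score) and the perceptron update with a regularized quadratic program.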

Editor information

Lakhmi C. Jain, Mika Sato-Ilic, Maria Virvou, George A. Tsihrintzis, Valentina Emilia Balas, Canicious Abeynayake

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ricci, E., Perfetti, R. (2008). Large Margin Methods for Structured Output Prediction. In: Jain, L.C., Sato-Ilic, M., Virvou, M., Tsihrintzis, G.A., Balas, V.E., Abeynayake, C. (eds) Computational Intelligence Paradigms. Studies in Computational Intelligence, vol 137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79474-5_5

  • DOI: https://doi.org/10.1007/978-3-540-79474-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-79473-8

  • Online ISBN: 978-3-540-79474-5

  • eBook Packages: Engineering, Engineering (R0)
