Syntactic Chunking Across Different Corpora

Xu, Weiqun; Carletta, Jean; Moore, Johanna

doi:10.1007/11965152_15

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

Syntactic chunking has been a well-defined and well-studied task since its introduction in 2000 as the CoNLL shared task. Although further effort has gone into improving chunking performance, the experimental data has been restricted, with few exceptions, to (part of) the Wall Street Journal data adopted in the shared task. It remains an open question how these successful chunking technologies extend to other data, which may differ in genre or domain and in the amount of annotation available. In this paper we first train chunkers with three classifiers on three different data sets and test them on four data sets. We also vary the size of the training data systematically to show the data requirements of chunkers. Three findings emerge: there is no significant difference between these state-of-the-art classifiers; training on plentiful data from the same corpus (Switchboard) yields results comparable to those of Wall Street Journal chunkers, even when the underlying material is spoken; and the results achieved with a large amount of unmatched training data can be obtained with a very modest amount of matched training data.
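The abstract does not say how the chunkers are scored. As a minimal sketch, assuming the CoNLL-2000 shared-task convention of IOB-style chunk tags and chunk-level precision, recall and F1, the comparison between a gold and a predicted tag sequence could look like the following. The function names `extract_chunks` and `chunk_f1` and the toy tag sequences are illustrative assumptions, not taken from the paper.

```python
from typing import List, Set, Tuple

# (chunk type, start index, end index exclusive); hypothetical helper type
Chunk = Tuple[str, int, int]


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect (type, start, end) spans from a sequence of IOB2 tags."""
    chunks: Set[Chunk] = set()
    start, ctype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing chunk
        prefix, _, label = tag.partition("-")
        # An open chunk ends at O, at any B- tag, or when the type changes.
        if ctype is not None and (prefix != "I" or label != ctype):
            chunks.add((ctype, start, i))
            start, ctype = None, None
        # A chunk starts at B-, or at an I- tag when no chunk is open.
        if prefix in ("B", "I") and ctype is None:
            start, ctype = i, label
    return chunks


def chunk_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Chunk-level precision, recall and F1: a chunk counts as correct
    only if both its type and its exact boundaries match the gold chunk."""
    gold_chunks, pred_chunks = extract_chunks(gold), extract_chunks(pred)
    correct = len(gold_chunks & pred_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Toy example (invented): the prediction misses the final noun phrase.
gold = ["B-NP", "I-NP", "B-VP", "B-NP", "O"]
pred = ["B-NP", "I-NP", "B-VP", "O", "O"]
precision, recall, f1 = chunk_f1(gold, pred)
print(precision, recall, f1)
```

Under this assumption, a cross-corpus experiment of the kind the abstract describes would compute such a score for each pairing of training corpus, test corpus and training-set size.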

Author information

Authors and Affiliations

HCRC and ICCS, School of Informatics, University of Edinburgh
Weiqun Xu, Jean Carletta & Johanna Moore

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
National Institute of Standards and Technology, 100 Bureau Drive Stop 8940, Gaithersburg, MD, 20899
Jonathan G. Fiscus

About this paper

Cite this paper

Xu, W., Carletta, J., Moore, J. (2006). Syntactic Chunking Across Different Corpora. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_15

DOI: https://doi.org/10.1007/11965152_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science, Computer Science (R0)