Syntactic Chunking Across Different Corpora

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Abstract

Syntactic chunking has been a well-defined and well-studied task since its introduction in 2000 as the CoNLL shared task. Although further effort has been spent on improving chunking performance, the experimental data has, with few exceptions, been restricted to (part of) the Wall Street Journal data adopted in the shared task. It remains an open question how these successful chunking techniques extend to other data, which may differ in genre or domain and in the amount of available annotation. In this paper we first train chunkers with three classifiers on three different data sets and test them on four data sets. We also vary the size of the training data systematically to show the data requirements of chunkers. We find that there is no significant difference between the state-of-the-art classifiers; that training on plentiful data from the same corpus (Switchboard) yields results comparable to Wall Street Journal chunkers even when the underlying material is spoken; and that the results obtained from a large amount of unmatched training data can also be reached with a very modest amount of matched training data.
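In the CoNLL-2000 setting the abstract refers to, each sentence is represented as a sequence of (token, POS tag, chunk tag) triples with BIO-style chunk tags (B-/I-/O). The sketch below is not one of the three classifiers compared in the paper; it only illustrates that data representation together with the shared task's simple baseline of assigning each POS tag its most frequent chunk tag. The toy sentences and the `train_baseline`/`predict` helpers are illustrative assumptions, not part of the paper.

```python
from collections import Counter, defaultdict

# Toy data in CoNLL-2000 column format: (token, POS tag, BIO chunk tag).
# The real shared-task data comes from the Wall Street Journal portion of
# the Penn Treebank; these sentences are only illustrative.
TRAIN = [
    [("He", "PRP", "B-NP"), ("reckons", "VBZ", "B-VP"),
     ("the", "DT", "B-NP"), ("deficit", "NN", "I-NP"),
     ("will", "MD", "B-VP"), ("narrow", "VB", "I-VP"), (".", ".", "O")],
]
TEST = [
    [("The", "DT", "B-NP"), ("market", "NN", "I-NP"),
     ("rallied", "VBD", "B-VP"), (".", ".", "O")],
]

def train_baseline(sentences):
    """Baseline: remember the chunk tag most often seen with each POS tag."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for _tok, pos, chunk in sent:
            counts[pos][chunk] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}

def predict(model, sentence):
    """Tag each token with the majority chunk tag of its POS (default: O)."""
    return [model.get(pos, "O") for _tok, pos, _gold in sentence]

model = train_baseline(TRAIN)
for sent in TEST:
    pred = predict(model, sent)
    gold = [chunk for _tok, _pos, chunk in sent]
    print(list(zip([tok for tok, _, _ in sent], pred, gold)))
```

Replacing `predict` with a learned sequence classifier is where the systems compared in the paper come in; the data format and the chunk-level evaluation stay the same regardless of which corpus supplies the training material.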

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xu, W., Carletta, J., Moore, J. (2006). Syntactic Chunking Across Different Corpora. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_15

  • DOI: https://doi.org/10.1007/11965152_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69267-6

  • Online ISBN: 978-3-540-69268-3

  • eBook Packages: Computer Science
