An Integrated Statistical Model for Tagging and Chunking Unrestricted Text

  • Conference paper
Text, Speech and Dialogue (TSD 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1902)

Abstract

In this paper, we present a corpus-based approach to tagging and chunking. The formalism is based on stochastic finite-state automata, so it can accommodate n-gram models or any stochastic finite-state automaton learnt with grammatical inference techniques. Because the models in our system are learnt automatically, the system is flexible and portable across languages and chunk definitions. To show the viability of the approach, we present tagging and chunking results for different combinations of bigrams and more complex automata learnt with the Error Correcting Grammatical Inference (ECGI) algorithm. The experiments were carried out on the Wall Street Journal corpus for English and on the Lexesp corpus for Spanish.
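
To make the simplest case mentioned in the abstract concrete, the following is a minimal sketch of a bigram part-of-speech tagger decoded with the Viterbi algorithm. It is an illustration only, not the authors' system: it covers tagging but not chunking or ECGI-learnt automata, and the function names (`train_bigram_tagger`, `log_prob`, `viterbi`), the add-one smoothing, and the toy training data are all assumptions introduced here.

```python
import math
from collections import defaultdict


def train_bigram_tagger(tagged_sentences):
    """Count tag-bigram transitions and tag-to-word emissions (hypothetical helper)."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
        trans[prev]["</s>"] += 1  # sentence-end marker
    return trans, emit, vocab


def log_prob(counts, context, event, n_events):
    """Add-one smoothed log P(event | context); smoothing choice is an assumption."""
    row = counts.get(context, {})
    return math.log((row.get(event, 0) + 1) / (sum(row.values()) + n_events))


def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence for `words` under the bigram model."""
    if not words:
        return []
    tags = sorted(emit)
    n_tags = len(tags) + 1    # possible next events: every tag plus "</s>"
    n_words = len(vocab) + 1  # known words plus one slot for unknown words
    # best[i][t] = (log score of the best path ending in tag t at position i, previous tag)
    best = [{}]
    for t in tags:
        score = log_prob(trans, "<s>", t, n_tags) + log_prob(emit, t, words[0], n_words)
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit, t, words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans, p, t, n_tags) + e, p)
                for p in tags
            )
    # Pick the best final tag, taking the transition to "</s>" into account.
    last = max(tags, key=lambda t: best[-1][t][0] + log_prob(trans, t, "</s>", n_tags))
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])  # follow backpointers
    return path[::-1]


# Toy data using Penn Treebank style tags; not taken from the paper's corpora.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
model = train_bigram_tagger(corpus)
print(viterbi(["the", "cat", "barks"], *model))  # ['DT', 'NN', 'VBZ']
```

A bigram model of this kind is itself a stochastic finite-state automaton whose states are the tags, which is why the formalism described in the abstract can mix such models with automata learnt by other means.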

This work has been supported by the Spanish Research Project TIC97-0671-C02-01/02.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pla, F., Molina, A., Prieto, N. (2000). An Integrated Statistical Model for Tagging and Chunking Unrestricted Text. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2000. Lecture Notes in Computer Science, vol 1902. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45323-7_3

  • DOI: https://doi.org/10.1007/3-540-45323-7_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41042-3

  • Online ISBN: 978-3-540-45323-9

  • eBook Packages: Springer Book Archive
