
Head-Driven Statistical Models for Natural Language Parsing

Published: 01 December 2003

Abstract

This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models.
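The abstract describes generating a parse tree as a head-centered, top-down sequence of decisions, with each decision's probability conditioned on lexical heads. As a rough illustration of that idea (not the paper's actual models, which add distance features, subcategorization frames, and smoothing), the sketch below decomposes a lexicalized rule into a head-child choice followed by independently generated left and right modifier sequences, each terminated by a STOP symbol; all counts and category names are invented toy data.

```python
from collections import defaultdict

# Maximum-likelihood counts for a toy head-driven model. A rule such as
# S(bought) -> NP(week) NP(IBM) VP(bought) is broken into: choose the head
# child (VP) given (parent, headword), then generate left modifiers (NP, NP,
# STOP) and right modifiers (STOP) independently, conditioned on the context.
head_counts = defaultdict(int)   # (parent, headword, head_child) -> count
parent_counts = defaultdict(int) # (parent, headword) -> count
mod_counts = defaultdict(int)    # (parent, head_child, headword, dir, mod) -> count
mod_context = defaultdict(int)   # (parent, head_child, headword, dir) -> count

def observe(parent, headword, head_child, left_mods, right_mods):
    """Record one lexicalized rule occurrence in the counts."""
    parent_counts[(parent, headword)] += 1
    head_counts[(parent, headword, head_child)] += 1
    for direction, mods in (("L", left_mods), ("R", right_mods)):
        for m in list(mods) + ["STOP"]:
            mod_counts[(parent, head_child, headword, direction, m)] += 1
            mod_context[(parent, head_child, headword, direction)] += 1

def rule_prob(parent, headword, head_child, left_mods, right_mods):
    """P(rule) = P_h(head_child | parent, headword)
               * prod over modifiers and STOP of P_m(mod | context)."""
    p = head_counts[(parent, headword, head_child)] / parent_counts[(parent, headword)]
    for direction, mods in (("L", left_mods), ("R", right_mods)):
        for m in list(mods) + ["STOP"]:
            p *= (mod_counts[(parent, head_child, headword, direction, m)]
                  / mod_context[(parent, head_child, headword, direction)])
    return p

# One observation of S(bought) -> NP NP VP(bought): VP is the head child,
# two NP modifiers to its left, none to its right.
observe("S", "bought", "VP", ["NP", "NP"], [])
# P_h = 1; left side = (2/3)(2/3)(1/3); right side = 1; total = 4/27 ≈ 0.148.
print(rule_prob("S", "bought", "VP", ["NP", "NP"], []))
```

Note that even with a single observed rule the probability is below 1: under the independence assumptions, each left-modifier decision is estimated separately, which is exactly what makes the parameter space tractable compared with estimating whole rules.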



Published In

Computational Linguistics, Volume 29, Issue 4, December 2003, 158 pages
ISSN: 0891-2017
EISSN: 1530-9312

Publisher

MIT Press, Cambridge, MA, United States


Qualifiers

  • Article

Cited By

  • (2024) The Impact of State Merging on Predictive Accuracy in Probabilistic Tree Automata. Journal of Computer and System Sciences, 146:C. DOI: 10.1016/j.jcss.2024.103563. Online publication date: 1-Dec-2024.
  • (2023) The Impact of State Merging on Predictive Accuracy in Probabilistic Tree Automata: Dietze's Conjecture Revisited. Fundamentals of Computation Theory, pages 74-87. DOI: 10.1007/978-3-031-43587-4_6. Online publication date: 18-Sep-2023.
  • (2022) Dependency and Span, Cross-Style Semantic Role Labeling on PropBank and NomBank. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(6):1-16. DOI: 10.1145/3526214. Online publication date: 12-Nov-2022.
  • (2022) Multitask Pointer Network for Multi-representational Parsing. Knowledge-Based Systems, 236:C. DOI: 10.1016/j.knosys.2021.107760. Online publication date: 25-Jan-2022.
  • (2022) Unsupervised and Few-Shot Parsing from Pretrained Language Models. Artificial Intelligence, 305:C. DOI: 10.1016/j.artint.2022.103665. Online publication date: 1-Apr-2022.
  • (2021) I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical Theory. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(2):1-32. DOI: 10.1145/3472295. Online publication date: 18-Nov-2021.
  • (2021) Multi-level Chunk-based Constituent-to-Dependency Treebank Transformation for Tibetan Dependency Parsing. ACM Transactions on Asian and Low-Resource Language Information Processing, 20(2):1-12. DOI: 10.1145/3424247. Online publication date: 30-Mar-2021.
  • (2020) Improving Natural Language Parser Accuracy by Unknown Word Replacement. Proceedings of the International Conference on Computing Advancements, pages 1-7. DOI: 10.1145/3377049.3377124. Online publication date: 10-Jan-2020.
  • (2019) Designing High Accuracy Statistical Machine Translation for Sign Language Using Parallel Corpus. Journal of Information Technology Research, 12(2):134-158. DOI: 10.4018/JITR.2019040108. Online publication date: 1-Apr-2019.
  • (2018) Neural Character-level Dependency Parsing for Chinese. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pages 5205-5212. DOI: 10.5555/3504035.3504673. Online publication date: 2-Feb-2018.
