
Head-Driven Statistical Models for Natural Language Parsing

Published: 01 December 2003

Abstract

This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models.
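The abstract describes generating a parse tree as a head-centered, top-down sequence of decisions, with each decision's probability conditioned on lexical heads. As a rough illustration of that idea (not the paper's actual models, which add distance features, subcategorization frames, and smoothing), the sketch below decomposes a lexicalized rule into a head-child choice followed by independently generated left and right modifier sequences, each terminated by a STOP symbol; all counts and category names are invented toy data.

```python
from collections import defaultdict

# Maximum-likelihood counts for a toy head-driven model. A rule such as
# S(bought) -> NP(week) NP(IBM) VP(bought) is broken into: choose the head
# child (VP) given (parent, headword), then generate left modifiers (NP, NP,
# STOP) and right modifiers (STOP) independently, conditioned on the context.
head_counts = defaultdict(int)   # (parent, headword, head_child) -> count
parent_counts = defaultdict(int) # (parent, headword) -> count
mod_counts = defaultdict(int)    # (parent, head_child, headword, dir, mod) -> count
mod_context = defaultdict(int)   # (parent, head_child, headword, dir) -> count

def observe(parent, headword, head_child, left_mods, right_mods):
    """Record one lexicalized rule occurrence in the counts."""
    parent_counts[(parent, headword)] += 1
    head_counts[(parent, headword, head_child)] += 1
    for direction, mods in (("L", left_mods), ("R", right_mods)):
        for m in list(mods) + ["STOP"]:
            mod_counts[(parent, head_child, headword, direction, m)] += 1
            mod_context[(parent, head_child, headword, direction)] += 1

def rule_prob(parent, headword, head_child, left_mods, right_mods):
    """P(rule) = P_h(head_child | parent, headword)
               * prod over modifiers and STOP of P_m(mod | context)."""
    p = head_counts[(parent, headword, head_child)] / parent_counts[(parent, headword)]
    for direction, mods in (("L", left_mods), ("R", right_mods)):
        for m in list(mods) + ["STOP"]:
            p *= (mod_counts[(parent, head_child, headword, direction, m)]
                  / mod_context[(parent, head_child, headword, direction)])
    return p

# One observation of S(bought) -> NP NP VP(bought): VP is the head child,
# two NP modifiers to its left, none to its right.
observe("S", "bought", "VP", ["NP", "NP"], [])
# P_h = 1; left side = (2/3)(2/3)(1/3); right side = 1; total = 4/27 ≈ 0.148.
print(rule_prob("S", "bought", "VP", ["NP", "NP"], []))
```

Note that even with a single observed rule the probability is below 1: under the independence assumptions, each left-modifier decision is estimated separately, which is exactly what makes the parameter space tractable compared with estimating whole rules.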



Published In

Computational Linguistics, Volume 29, Issue 4, December 2003, 158 pages
ISSN: 0891-2017
EISSN: 1530-9312

Publisher

MIT Press, Cambridge, MA, United States


Qualifiers

  • Article

Cited By

  • (2024) The Impact of State Merging on Predictive Accuracy in Probabilistic Tree Automata. Journal of Computer and System Sciences, 146:C. DOI: 10.1016/j.jcss.2024.103563. Online publication date: 1-Dec-2024.
  • (2023) The Impact of State Merging on Predictive Accuracy in Probabilistic Tree Automata: Dietze's Conjecture Revisited. Fundamentals of Computation Theory, pages 74-87. DOI: 10.1007/978-3-031-43587-4_6. Online publication date: 18-Sep-2023.
  • (2022) Dependency and Span, Cross-Style Semantic Role Labeling on PropBank and NomBank. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(6):1-16. DOI: 10.1145/3526214. Online publication date: 12-Nov-2022.
  • (2022) Multitask Pointer Network for Multi-representational Parsing. Knowledge-Based Systems, 236:C. DOI: 10.1016/j.knosys.2021.107760. Online publication date: 25-Jan-2022.
  • (2022) Unsupervised and Few-Shot Parsing from Pretrained Language Models. Artificial Intelligence, 305:C. DOI: 10.1016/j.artint.2022.103665. Online publication date: 1-Apr-2022.
  • (2021) I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical Theory. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(2):1-32. DOI: 10.1145/3472295. Online publication date: 18-Nov-2021.
  • (2021) Multi-level Chunk-based Constituent-to-Dependency Treebank Transformation for Tibetan Dependency Parsing. ACM Transactions on Asian and Low-Resource Language Information Processing, 20(2):1-12. DOI: 10.1145/3424247. Online publication date: 30-Mar-2021.
  • (2020) Improving Natural Language Parser Accuracy by Unknown Word Replacement. Proceedings of the International Conference on Computing Advancements, pages 1-7. DOI: 10.1145/3377049.3377124. Online publication date: 10-Jan-2020.
  • (2019) Designing High Accuracy Statistical Machine Translation for Sign Language Using Parallel Corpus. Journal of Information Technology Research, 12(2):134-158. DOI: 10.4018/JITR.2019040108. Online publication date: 1-Apr-2019.
  • (2018) Neural Character-level Dependency Parsing for Chinese. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pages 5205-5212. DOI: 10.5555/3504035.3504673. Online publication date: 2-Feb-2018.
