DOI: 10.1145/3377049.3377124

Improving Natural Language Parser Accuracy by Unknown Word Replacement

Published: 20 March 2020 Publication History


Natural language parsers are the basis for further understanding of content written in natural language. They have been shown to be effective in many NLP tasks, such as machine translation, sentiment analysis, and document classification. Existing state-of-the-art parsers, such as Charniak [9], Collins [11], Stanford, and OpenNLP, have been shown to achieve F-scores ranging from 85 to 92 percent. Parser accuracy is hampered to a major extent by unknown and unseen words. In this paper we present a novel method for improving accuracy by incorporating knowledge about unknown words from an external source. Experimental results show that our technique improves accuracy, and that the improvement depends on the number of known words present in the model during training. We achieve an improvement of more than one percent on some parsers.
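The general idea of unknown word replacement can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `replace_unknown_words`, the toy vocabulary, and the hand-written synonym table (standing in for an external lexical resource such as WordNet) are all assumptions made for illustration. The sketch swaps each token the parser has not seen in training for a known synonym before the sentence is handed to the parser.

```python
from typing import Dict, Iterable, List, Set


def replace_unknown_words(
    tokens: Iterable[str],
    known_vocab: Set[str],
    synonyms: Dict[str, List[str]],
) -> List[str]:
    """Swap each out-of-vocabulary token for its first known synonym, if any.

    known_vocab and synonyms are hypothetical stand-ins for the parser's
    training vocabulary and an external lexical resource.
    """
    result = []
    for token in tokens:
        key = token.lower()
        if key in known_vocab:
            result.append(token)
            continue
        # Pick the first candidate that the parser has seen in training.
        replacement = next(
            (cand for cand in synonyms.get(key, []) if cand in known_vocab),
            None,
        )
        # Leave the token untouched when no known substitute exists.
        result.append(replacement if replacement is not None else token)
    return result


# Toy data, invented for this example only.
known_vocab = {"the", "dog", "ate", "a", "big", "meal"}
synonyms = {"devoured": ["consumed", "ate"], "enormous": ["huge", "big"]}
sentence = ["The", "dog", "devoured", "an", "enormous", "meal"]

print(replace_unknown_words(sentence, known_vocab, synonyms))
# ['The', 'dog', 'ate', 'an', 'big', 'meal']
```

In this toy run "devoured" and "enormous" are replaced by known synonyms, while "an" stays unchanged because the synonym table offers no known substitute. A real system would also need to restrict candidates to the same part of speech so that the replacement does not change the sentence structure being parsed.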


How many words are there in the english language. https://en.oxforddictionaries.com/explore/how-many-words-are-there-in-the-english-language/.
Updates to the oed. https://public.oed.com/updates//.
Wordnet. https://wordnet.princeton.edu/.
Adwait Ratnaparkhi. A maximum entropy model for part-of-speech tagging. In Proceedings of the Empirical Methods in Natural Language Processing Conference, 1996.
James Allen. Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc., 1987.
Daniel M. Bikel. Intricacies of collins' parsing model, 2004.
Ezra Black, Fred Jelinek, John Lafferty, David M. Magerman, Robert Mercer, and Salim Roukos. Towards history-based grammars: Using richer models for probabilistic parsing. In Proceedings of the 31st Annual Meeting on Association for Computational Linguistics, ACL '93, pages 31--37, Stroudsburg, PA, USA, 1993. Association for Computational Linguistics.
Eugene Charniak. Statistical parsing with a context-free grammar and word statistics. Proceedings of the 14th National Conference on Artificial Intelligence, 1997.
Eugene Charniak. A Maximum-Entropy-Inspired Parser. 1st North American chapter of the Association for Computational Linguistics conference (NAACL' 2000), 2000.
Danqi Chen and Christopher Manning. A Fast and Accurate Dependency Parser using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
Michael Collins. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, 1999.
Michael Collins. Head-driven statistical models for natural language parsing, 2003.
Kyle D Dent and Sharoda A Paul. Through the twitter glass: Detecting questions in micro-text. Analyzing Microtext, 11:05, 2011.
Evangelos Dermatas and George Kokkinakis. Automatic stochastic tagging of natural language texts. Comput. Linguist., 21(2):137--163, June 1995.
Jason Eisner and Giorgio Satta. Efficient parsing for bilexical context-free grammars and head automaton grammars. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, pages 457--464. Association for Computational Linguistics, 1999.
F. Jelinek, J. Lafferty, D. Magerman, R. Mercer, A. Ratnaparkhi, and S. Roukos. Decision tree parsing using a hidden derivation model. In Proceedings of the Workshop on Human Language Technology, HLT '94, pages 272--277, Stroudsburg, PA, USA, 1994. Association for Computational Linguistics.
Daniel Jurafsky. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Computational Linguistics, 2000.
Adam Kilgarriff and Christiane Fellbaum. WordNet: An Electronic Lexical Database. Language, 2000.
Dan Klein and Christopher D Manning. Fast exact inference with a factored model for natural language parsing. Advances in Neural Information Processing Systems 15 (NIPS 2002), 2003.
David M. Magerman. Learning grammatical structure using statistical decision-trees. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 1996.
Mitchell Marcus, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. The penn treebank: annotating predicate argument structure. In Proceedings of the workshop on Human Language Technology, pages 114--119. Association for Computational Linguistics, 1994.
Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of english: The penn treebank. Computational linguistics, 19(2):313--330, 1993.
Hermann Ney. Dynamic programming parsing for context-free grammars in continuous speech recognition. IEEE Transactions on Signal Processing, 39(2):336--340, 1991.
Joakim Nivre. Parsing with pcfgs. 2013.
Steven Pinker. Language learnability and language development, with new commentary by the author, volume 7. Harvard University Press, 2009.
Sujith Ravi, Kevin Knight, and Radu Soricut. Automatic Prediction of Parser Accuracy. Computational Linguistics, 2008.
R Socher and Cc Lin. Parsing natural scenes and natural language with recursive neural networks. International Conference on Machine Learning, 2011.
Andreas Stolcke. An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational linguistics, 21(2):165--201, 1995.
R. Thompson and T. Booth. Applying probability measures to abstract languages. IEEE Transactions on Computers, 22:442--450, 05 1973.
R Weischedel, R Schwartz, J Palmucci, M Meteer, and L Ramshaw. Coping with ambiguity and unknown words through probabilistic models. Computational Linguistics, 1993.

    Published In

    ICCA 2020: Proceedings of the International Conference on Computing Advancements
    January 2020
    517 pages


    Association for Computing Machinery

    New York, NY, United States


    • Short-paper
    • Research
    • Refereed limited

