Improving Natural Language Parser Accuracy by Unknown Word Replacement

Authors:

Khandaker Tabin Hasan

ICCA 2020: Proceedings of the International Conference on Computing Advancements

Article No.: 3, Pages 1 - 7

https://doi.org/10.1145/3377049.3377124

Published: 20 March 2020

Abstract

Natural language parsers are the basis for further understanding of content written in natural language. Parsers have been shown to be effective in many NLP tasks, such as machine translation, sentiment analysis, and document classification. Existing state-of-the-art parsers, such as Charniak [9], Collins [11], Stanford, and OpenNLP, have been shown to achieve F-scores ranging from 85 to 92 percent. Their accuracy is hampered to a major extent by unknown and unseen words. In this paper we present a novel method for improving parser accuracy by incorporating knowledge about unknown words from an external source. Experimental results show that our technique improves accuracy, and that the size of the improvement depends on the number of known words present in the model during training. We achieve an improvement of more than one percent on some parsers.
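The abstract describes the approach only at a high level and does not spell out the replacement procedure. As a minimal sketch, assuming that the parser's training vocabulary is available as a lower-cased set and that a simple synonym table (a hypothetical stand-in for an external lexical source such as a thesaurus or WordNet [3]) supplies candidate substitutes, unknown-word replacement before parsing could look like the following. The function names `replace_unknown_words` and `restore_words` and the `synonyms` table are illustrative assumptions, not the author's implementation.

```python
from typing import Dict, List, Optional, Set, Tuple


def replace_unknown_words(
    tokens: List[str],
    vocab: Set[str],
    synonyms: Dict[str, List[str]],
) -> Tuple[List[str], Dict[int, str]]:
    """Replace each out-of-vocabulary token with a known synonym, if one exists.

    `vocab` is the (lower-cased) set of words the parser saw in training;
    `synonyms` is a hypothetical stand-in for an external lexical resource.
    Returns the rewritten tokens and a map {position: original word}.
    """
    rewritten: List[str] = []
    replaced: Dict[int, str] = {}
    for i, tok in enumerate(tokens):
        if tok.lower() in vocab:
            rewritten.append(tok)  # known word: pass through unchanged
            continue
        # Take the first candidate from the external source that the
        # parser's model actually saw during training.
        candidate: Optional[str] = next(
            (s for s in synonyms.get(tok.lower(), []) if s in vocab), None
        )
        if candidate is None:
            rewritten.append(tok)  # no known substitute: leave it unknown
        else:
            rewritten.append(candidate)
            replaced[i] = tok
    return rewritten, replaced


def restore_words(leaves: List[str], replaced: Dict[int, str]) -> List[str]:
    """Put the original words back at the positions that were substituted."""
    return [replaced.get(i, leaf) for i, leaf in enumerate(leaves)]


if __name__ == "__main__":
    vocab = {"the", "cat", "sat", "on", "mat"}
    synonyms = {"feline": ["cat", "kitty"], "rug": ["carpet", "mat"]}
    tokens = ["The", "feline", "sat", "on", "the", "rug"]
    rewritten, replaced = replace_unknown_words(tokens, vocab, synonyms)
    print(rewritten)  # ['The', 'cat', 'sat', 'on', 'the', 'mat']
    print(restore_words(rewritten, replaced))  # original tokens again
```

The sketch substitutes one token for one token, so the leaves of the parser's output line up position by position with the input and the original words can be restored without touching the tree structure. Multiword substitutes (WordNet writes multiword lemmas with underscores, e.g. `hot_dog`) would need extra alignment handling, and a practical system would likely also prefer substitutes that share the unknown word's part of speech.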

References

[1]

How many words are there in the English language? https://en.oxforddictionaries.com/explore/how-many-words-are-there-in-the-english-language/.

[2]

Updates to the OED. https://public.oed.com/updates//.

[3]

WordNet. https://wordnet.princeton.edu/.

[4]

Adwait Ratnaparkhi. A maximum entropy model for part-of-speech tagging. In Proceedings of the Empirical Methods in Natural Language Processing Conference, 1996.

[5]

James Allen. Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc., 1987.

[6]

Daniel M. Bikel. Intricacies of Collins' parsing model, 2004.

[7]

Ezra Black, Fred Jelinek, John Lafferty, David M. Magerman, Robert Mercer, and Salim Roukos. Towards history-based grammars: Using richer models for probabilistic parsing. In Proceedings of the 31st Annual Meeting on Association for Computational Linguistics, ACL '93, pages 31--37, Stroudsburg, PA, USA, 1993. Association for Computational Linguistics.

[8]

Eugene Charniak. Statistical parsing with a context-free grammar and word statistics. Proceedings of the 14th National Conference on Artificial Intelligence, 1997.

[9]

Eugene Charniak. A Maximum-Entropy-Inspired Parser. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (NAACL 2000), 2000.

[10]

Danqi Chen and Christopher Manning. A Fast and Accurate Dependency Parser using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.

[11]

Michael Collins. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, 1999.

[12]

Michael Collins. Head-driven statistical models for natural language parsing, 2003.

[13]

Kyle D Dent and Sharoda A Paul. Through the Twitter glass: Detecting questions in micro-text. Analyzing Microtext, 11:05, 2011.

[14]

Evangelos Dermatas and George Kokkinakis. Automatic stochastic tagging of natural language texts. Comput. Linguist., 21(2):137--163, June 1995.

[15]

Jason Eisner and Giorgio Satta. Efficient parsing for bilexical context-free grammars and head automaton grammars. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, pages 457--464. Association for Computational Linguistics, 1999.

[16]

F. Jelinek, J. Lafferty, D. Magerman, R. Mercer, A. Ratnaparkhi, and S. Roukos. Decision tree parsing using a hidden derivation model. In Proceedings of the Workshop on Human Language Technology, HLT '94, pages 272--277, Stroudsburg, PA, USA, 1994. Association for Computational Linguistics.

[17]

Daniel Jurafsky. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Computational Linguistics, 2000.

[18]

Adam Kilgarriff and Christiane Fellbaum. WordNet: An Electronic Lexical Database. Language, 2000.

[19]

Dan Klein and Christopher D Manning. Fast exact inference with a factored model for natural language parsing. Advances in Neural Information Processing Systems 15 (NIPS 2002), 2003.

[20]

David M. Magerman. Learning grammatical structure using statistical decision-trees. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 1996.

[21]

Mitchell Marcus, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. The Penn Treebank: annotating predicate argument structure. In Proceedings of the Workshop on Human Language Technology, pages 114--119. Association for Computational Linguistics, 1994.

[22]

Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313--330, 1993.

[23]

Hermann Ney. Dynamic programming parsing for context-free grammars in continuous speech recognition. IEEE Transactions on Signal Processing, 39(2):336--340, 1991.

[24]

Joakim Nivre. Parsing with PCFGs. 2013.

[25]

Steven Pinker. Language learnability and language development, with new commentary by the author, volume 7. Harvard University Press, 2009.

[26]

Sujith Ravi, Kevin Knight, and Radu Soricut. Automatic Prediction of Parser Accuracy. Computational Linguistics, 2008.

[27]

R Socher and Cc Lin. Parsing natural scenes and natural language with recursive neural networks. International Conference on Machine Learning, 2011.

[28]

Andreas Stolcke. An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics, 21(2):165--201, 1995.

[29]

R. Thompson and T. Booth. Applying probability measures to abstract languages. IEEE Transactions on Computers, 22:442--450, May 1973.

[30]

R Weischedel, R Schwartz, J Palmucci, M Meteer, and L Ramshaw. Coping with ambiguity and unknown words through probabilistic models. Computational Linguistics, 1993.

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing → Machine translation

Published In

ICCA 2020: Proceedings of the International Conference on Computing Advancements

January 2020

517 pages

ISBN:9781450377782

DOI:10.1145/3377049

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Conference

ICCA 2020

ICCA 2020: International Conference on Computing Advancements

January 10 - 12, 2020

Dhaka, Bangladesh