DOI: 10.3115/1119176.1119211

Memory-based named entity recognition using unannotated data

Published: 31 May 2003

Abstract

We used the memory-based learner TiMBL (Daelemans et al., 2002) to find names in English and German newspaper text. A first system used only the training data and a number of gazetteers. The results show that gazetteers are not beneficial for the English data, while they are for the German data. Type-token generalization was also applied, but it reduced performance. The second system used gazetteers derived from the unannotated corpus, as well as the ratio of capitalized to uncapitalized uses of each word. These strategies increased performance.
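The capitalization feature mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition, so everything below is an assumption: the function name `capitalization_ratios` is hypothetical, the input is taken to be a pre-tokenized unannotated corpus, and the value computed is the fraction of a word's occurrences that are capitalized (chosen instead of a raw capitalized/uncapitalized quotient to avoid division by zero). It is not the authors' implementation.

```python
from collections import Counter


def capitalization_ratios(tokens):
    """Map each lowercased word to the fraction of its occurrences that are capitalized.

    Hypothetical helper for illustration; the paper's exact feature may differ.
    """
    capitalized = Counter()
    total = Counter()
    for token in tokens:
        # Skip empty strings, punctuation, and numbers.
        if not token[:1].isalpha():
            continue
        key = token.lower()
        total[key] += 1
        if token[0].isupper():
            capitalized[key] += 1
    return {word: capitalized[word] / total[word] for word in total}


# Toy stand-in for an unannotated corpus (assumed whitespace-tokenized).
tokens = "Bush met the press in Washington . The bush was green .".split()
ratios = capitalization_ratios(tokens)
print(ratios["bush"])        # 0.5
print(ratios["washington"])  # 1.0
```

A word that is almost always capitalized in a large unannotated corpus is more likely to be a name than one that is capitalized only occasionally, for example at the start of a sentence. This sketch counts sentence-initial tokens like any others, which is a simplification.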

References

[1] Michael Collins. 2002. Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 489-496, Philadelphia.
[2] Walter Daelemans and Véronique Hoste. 2002. Evaluation of Machine Learning Methods for Natural Language Processing Tasks. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002), pages 755-760, Las Palmas, Gran Canaria.
[3] Walter Daelemans, Jakub Zavrel, Ko van der Sloot, and Antal van den Bosch. 2002. TiMBL: Tilburg Memory Based Learner, version 4.3, Reference Guide. ILK Technical Report 02-10, ILK. Available from http://ilk.kub.nl/downloads/pub/papers/ilk0210.ps.gz.
[4] Fien De Meulder, Walter Daelemans, and Véronique Hoste. 2002. A Named Entity Recognition System for Dutch. In M. Theune, A. Nijholt, and H. Hondrop, editors, Computational Linguistics in the Netherlands 2001. Selected Papers from the Twelfth CLIN Meeting, pages 77-88, Amsterdam - New York. Rodopi.

Published In

CONLL '03: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
May 2003
213 pages

Publisher

Association for Computational Linguistics

United States

Cited By

  • (2008) Named entity normalization in user generated content. Proceedings of the second workshop on Analytics for noisy unstructured text data, pages 23-30. DOI: 10.1145/1390749.1390755. Online publication date: 24-Jul-2008
  • (2006) Empirical study on the performance stability of named entity recognition model across domains. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 509-516. DOI: 10.5555/1610075.1610145. Online publication date: 22-Jul-2006
  • (2005) Mining knowledge from text using information extraction. ACM SIGKDD Explorations Newsletter 7:1, pages 3-10. DOI: 10.1145/1089815.1089817. Online publication date: 1-Jun-2005
  • (2004) Chinese named entity recognition based on multilevel linguistic features. Proceedings of the First international joint conference on Natural Language Processing, pages 90-99. DOI: 10.1007/978-3-540-30211-7_10. Online publication date: 22-Mar-2004
  • (2003) Introduction to the CoNLL-2003 shared task. Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4, pages 142-147. DOI: 10.3115/1119176.1119195. Online publication date: 31-May-2003
