DOI: 10.1145/1458082.1458102
research-article

Web-scale named entity recognition

Published: 26 October 2008

Abstract

Automatic recognition of named entities such as people, places, organizations, books, and movies across the entire web presents a number of challenges, both of scale and scope. Data for training general named entity recognizers is difficult to come by, and efficient machine learning methods are required once we have found hundreds of millions of labeled observations. We present an implemented system that addresses these issues, including a method for automatically generating training data, and a multi-class online classification training method that learns to recognize not only high-level categories such as place and person, but also more fine-grained categories such as soccer players, birds, and universities. The resulting system gives precision and recall performance comparable to that obtained for more limited entity types in much more structured domains such as company recognition in newswire, even though web documents often lack consistent capitalization and grammatical sentence construction.
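
The abstract names a multi-class online classification training method but does not give its update rule. As a rough illustration of what online multi-class training over sparse features can look like, the sketch below uses a plain multiclass perceptron as a stand-in; it is not the system described in the paper. The class name, the feature names (`ctx:...`, `cap:...`), and the toy observations are all hypothetical.

```python
from collections import defaultdict


class OnlineMulticlassPerceptron:
    """Illustrative online multi-class learner (a stand-in, not the paper's method)."""

    def __init__(self, labels):
        self.labels = list(labels)
        # One sparse weight vector per label: label -> feature -> weight.
        self.weights = {label: defaultdict(float) for label in self.labels}

    def score(self, features, label):
        w = self.weights[label]
        return sum(w.get(name, 0.0) * value for name, value in features.items())

    def predict(self, features):
        # Ties are broken in favor of the label listed first.
        return max(self.labels, key=lambda label: self.score(features, label))

    def update(self, features, gold):
        """Consume one labeled observation; return True if it was misclassified."""
        guess = self.predict(features)
        if guess == gold:
            return False
        # Mistake-driven update: promote the gold label, demote the wrong guess.
        for name, value in features.items():
            self.weights[gold][name] += value
            self.weights[guess][name] -= value
        return True


# Hypothetical toy observations: sparse context features and a fine-grained label.
TOY_STREAM = [
    ({"ctx:plays_for": 1.0, "cap:yes": 1.0}, "soccer_player"),
    ({"ctx:nests_in": 1.0, "cap:no": 1.0}, "bird"),
    ({"ctx:enrolled_at": 1.0, "cap:yes": 1.0}, "university"),
]

model = OnlineMulticlassPerceptron(["soccer_player", "bird", "university"])
for _ in range(5):  # a few passes over the stream
    for features, label in TOY_STREAM:
        model.update(features, label)
```

Because each observation is processed once and only the touched features are updated, memory and time per step depend on the number of active features rather than on the size of the training set, which is the property that makes online methods attractive at the scale the abstract mentions.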

Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
October 2008
1562 pages
ISBN:9781595939913
DOI:10.1145/1458082
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. named entity recognition
  2. text mining
  3. web mining

Qualifiers

  • Research-article

Conference

CIKM08: Conference on Information and Knowledge Management
October 26 - 30, 2008
Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
